OPINION 15 May 2014

Is big data going soft?

Market researchers have long been confident that there are some questions big data can’t answer. But new work in the field suggests that confidence might be misplaced, says Colin Strong.

But big data faces its own set of challenges. The reported failure of Google Flu Trends to accurately predict flu levels may prove to be a turning point in popular perceptions of the supposed infallibility of big data. There are also questions about representativeness. The idea that we can easily replace carefully designed samples with an ‘N=All’ approach is starting to look a little naive.

But while big data might be looking a little weaker in some respects, in others it is strengthening its hand. We in market research have felt pretty secure that even if big data can capture everything about what we do, organisations will still need market research to explain why we do it. So here’s the bad news. Evidence is mounting that big data can tell us a lot about the soft issues that we had assumed were the preserve of market research.

One example that I have written about previously was work undertaken by Cambridge University and the Microsoft Research Centre, which found that Facebook Likes can be used to predict a variety of personal attributes including religion, politics, race and sexual orientation. Their research involved 58,000 Facebook users in the US who completed a psychometric questionnaire through an app called ‘myPersonality’. Those taking the test were asked to provide the researchers with access to their Facebook data. The team were able to create some highly predictive models using these Likes. For example, they were able to identify male sexuality and sort African-Americans from Caucasian Americans, Christians from Muslims and Republicans from Democrats. There were also some pretty impressive figures for predicting relationship status and substance abuse.

Another example is a study by researchers at Cornell University, who analysed over 1.5 million geotagged tweets from almost 10,000 people in the US. They wanted to understand if the content of the tweets themselves could be used to predict the location of the user, as identified from the geotagging. So they divided the data set in two, using 90% of the tweets to train their algorithm and the remaining 10% to test it against. What they found was that tweets contained an awful lot of information about the likely location of the user. Some of it was obvious, such as tweets that were generated by the location-based social networking site Foursquare, thus giving exact location. Other tweets contained references to the city they were in. And others made reference to events that were taking place in their location. As a result of all this information, they were able to create an algorithm that correctly predicted people’s home cities 68% of the time, their home state 70% of the time and their time zone 80% of the time.
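For readers less familiar with the train-and-test approach described above, here is a minimal sketch in Python. It is not the Cornell team's method, which is not detailed here: the word-count classifier and the function names (`split_holdout`, `train_word_votes`, `predict_city`, `accuracy`) are assumptions chosen purely for illustration. The idea is simply that a model learns from 90% of the tweets and is then scored on the 10% it has never seen.

```python
import random
from collections import Counter, defaultdict


def split_holdout(records, train_fraction=0.9, seed=0):
    """Shuffle a copy of the records and split them into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def train_word_votes(train):
    """Count how often each word appears in tweets from each city.

    Hypothetical toy model: each record is a (tweet_text, home_city) pair.
    """
    votes = defaultdict(Counter)
    for text, city in train:
        for word in text.lower().split():
            votes[word][city] += 1
    return votes


def predict_city(votes, text):
    """Add up the city counts for every known word and return the top city.

    Returns None when no word in the tweet was seen during training.
    """
    tally = Counter()
    for word in text.lower().split():
        tally.update(votes.get(word, {}))
    return tally.most_common(1)[0][0] if tally else None


def accuracy(votes, test):
    """Share of held-out tweets whose predicted city matches the real one."""
    if not test:
        return 0.0
    hits = sum(1 for text, city in test if predict_city(votes, text) == city)
    return hits / len(test)
```

Scoring on the held-out 10% matters because a model tested on the same tweets it learned from would look far more accurate than it really is.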

Finally, a paper published in Nature last year found that the lifestyles of mobile phone users could be predicted from their patterns of movement. This was based on an analysis of mobile phone location data, using the data automatically generated as the device pings the network at regular intervals, regardless of whether it is in use or not. The analysis found it was possible to allocate to 95% of users a unique ‘fingerprint’ based on their movements, so that it was possible to accurately predict at what time of the day individuals would be in a certain neighbourhood or town. When linking this information to mapping data, it was then possible to infer a lot about that individual’s lifestyle.

It is highly likely that these studies represent merely the tip of the iceberg of activity that is underway in this area. It is usually only academic researchers who place their findings in the public domain and make them available for peer review, and academics often struggle to get access to big data assets. So we can assume that this sort of activity is being widely undertaken by many data-intensive industries including, of course, database marketing organisations.

So is this a threat or an opportunity for market research? While it is hard to see how the current level of understanding in this area (as outlined above) could directly displace much of the existing portfolio of market research survey work, the issue is less to do with what is possible now than with the direction this is going in. In a very short space of time we have reached a position where some fairly basic, but intimate, information has been accurately inferred from our data trails. What if we can start inferring levels of customer satisfaction from digital behaviour? What if we can start inferring consumers’ needs before they have expressed them? That might sound far-fetched but it is exactly what Google is exploring with its anticipatory systems such as Google Now.

Yet again, this is a reinforcement of the need for market research to engage with a much wider set of tools than that of its traditional repertoire. We need an understanding of the academic literature around consumer behaviour, access to and ability to handle large scale data sets, and a facility to leverage this to meet business needs. These sound like skills which reside in our industry and as such we should be perfectly placed to meet this exciting challenge. But only if we see this as an opportunity and act before others do.

  • Colin Strong is head of industry, GfK UK
  • Editor’s note: Article updated on 16/05/14 at author’s request

5 Comments

6 years ago

Once again, Colin nails it. By bringing big data analysis and tools inside our remit, we enter upon a very exciting time for MR. I completely endorse his recommendation that the industry engage with academia to understand the sorts of tools and applications they are using. I will be attending a conference at Michigan State University next week which will consider this and the talent we will need in the future. May it be the first of many where practitioners, clients, associations and academia get together to grapple with these issues.

6 years ago

Thanks Simon for those comments. I should also add an addendum to the article that MR is also well placed to explore these issues as there are potential privacy implications which, from my experience at GfK, the industry is very well placed to manage. Our interest is in the aggregate not the individual which creates a fundamentally different orientation to these issues than is perhaps present elsewhere.

6 years ago

What's your view on this as a future when users of FB/Twitter are locking down their own details and being less explicit (an assumption rather than fact!) about the things they talk about? Are people still likely to use 'like' but probably less likely to say 'I am in Ibiza this week party on dudes' for fear of coming home to a burgled household?

6 years ago

"Big data vs. MR" is a false dichotomy, as if these are two competing solutions to market and customer insights. I am firmly in the camp of "both" even though most of my career has been on the MR side. Big data is in the hype cycle where early adopters and pundits tout it as a replacement for every "insight technology" that came before it. The truth is that it is an exciting and powerful new tool enabled by massive computational power but we need to see it correctly as a complement and not a substitute for [proper] attitudinal research. More on this in a recent LinkedIn thread in the Customer Experience Professionals Association group: http://linkd.in/1jaebbK.

6 years ago

Hi Brian. I agree with your point that Big Data vs MR is a false dichotomy. My concern is that much of the MR community does not embrace the challenge / opportunities of Big Data and we should be getting it more integrated into our repertoire. Not least because of my point that it is starting to look like an attitudinal as well as behavioural tool.
