Last month, I attended Esomar’s Big Data World conference in Berlin, which (naturally) covered so-called ‘big data’ and how it pertains to market research. What follows is a synthesis of the main points I took from presentations and discussions, along with my opinions. As to what’s signal and what’s noise – well, that’s the great challenge.
Big Data ≈ the web, c.2000
It’s common knowledge that we’re now producing astronomical, Borgesian quantities of data. At the conference, statistics abounded: 500 million tweets a day; 10,000 sensors in an aeroplane’s wing; more data produced in the past two years than in all previous human history; exabytes; zettabytes... But what to do with it?
In scratching around for an analogy, the best I could come up with was the web, specifically e-commerce, around the turn of the millennium. Specialists have been in the game for a while and developed serious expertise; bigger and more forward-looking organisations have teams devoted to it; while the rest of us could be forgiven for wondering, perhaps somewhat fearfully, if and how to get started, and what getting started would even mean.
It’s instructive to think about big data this way, because it takes it beyond being a buzzword, a thing to have or a problem to solve, to something far larger and more profound than any of us, and for which you – agency or client – therefore need a strategy. At Big Data World we were told cautionary tales about (mercifully nameless) organisations spending seven-figure sums collating all the data they ever thought they’d need, without adequately considering how they’d use it. Then spending the same again correcting the mistake.
Another decent analogy: social media, c.2010.
Where is all this data?
In research, our data comes from people, who typically come to us via two places: clients and panels.
For the former, the answer is obvious: clients hold more data than ever, and we as researchers need to ask for it. Otherwise, we are wasting their customers’ time, and potentially reducing information quality (where client data is more accurate, e.g. spend). And I see no fundamental reason why panels couldn’t provide demographics, purchasing behaviour, mobile activity, even Internet of Things data.
A common industry complaint is survey length: who takes a 20-minute survey? But if 10 of those minutes could be replaced by information better captured elsewhere, the rest could focus on the ‘why’ – the attitudes and proposition responses we can’t get anywhere else.
Enhancing respondent-level analysis is the most immediate challenge for quantitative market research, and a good starting point for approaching big data more generally. At Relish, we’ve recognised the need to develop our own dedicated analytics offer, specifically to provide new approaches based on data integration.
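To make that concrete, here is a minimal sketch in R (the language that came up repeatedly at the conference – more on that below) of what respondent-level integration might look like. Everything here is invented for illustration – the data frames, field names and values are not from any real client file – but it shows the basic idea: once a client-held variable such as spend is joined on, the survey no longer needs to ask for it.

# Hypothetical survey responses, keyed by a shared customer ID
survey <- data.frame(
  customer_id  = 1:6,
  satisfaction = c(7, 9, 4, 8, 6, 10)   # 0-10 attitudinal rating from the questionnaire
)

# Hypothetical client-held behavioural data for the same customers
client_spend <- data.frame(
  customer_id  = 1:6,
  annual_spend = c(320, 910, 150, 760, 400, 1200)   # from client records, not asked in the survey
)

# Respondent-level integration: attach observed spend to survey answers
integrated <- merge(survey, client_spend, by = "customer_id")

# Attitudes can now be analysed against actual rather than claimed behaviour
cor(integrated$satisfaction, integrated$annual_spend)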
Are our jobs safe?
Meaning, will big data, machine learning and so on supplant traditional quantitative market research? Given that many have spent years reading the industry’s last rites, while others potter along seemingly oblivious to the potential cataclysm, let’s aim for more nuance.
One of the limitations of relying solely on big data is that a data-driven model is predicated on history repeating itself. A case study presented at the conference illustrates the point: Sky’s reward-based referral scheme had changed relatively little in recent years, and Sky’s data team had created a propensity model for referral – but, by necessity, this was based on the existing scheme.
Our hypothesis was that different propositions could unlock new ‘headroom’. The solution was a ‘reverse segmentation’ marrying Sky’s data to survey responses to new propositions, which led to a 25% increase in communication response among key segments. This gives cause for optimism: primary research still has an essential role.
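Purely for illustration – this is a simplified sketch in R on simulated data, with invented proposition and behavioural variables, not the actual Sky model – a ‘reverse segmentation’ of this kind broadly means segmenting people on their stated reactions to new propositions, then profiling those segments against the behavioural data the client already holds:

set.seed(42)
n <- 300

# Simulated survey reactions (1-5 appeal scores) to three hypothetical new propositions
proposition_scores <- data.frame(
  prop_cashback   = sample(1:5, n, replace = TRUE),
  prop_free_month = sample(1:5, n, replace = TRUE),
  prop_charity    = sample(1:5, n, replace = TRUE)
)

# Simulated stand-ins for the behavioural variables a client would hold
behavioural <- data.frame(
  tenure_years  = round(runif(n, 0, 10), 1),
  monthly_spend = round(runif(n, 20, 120), 2)
)

# Segment on stated appeal of the new propositions...
segments <- kmeans(scale(proposition_scores), centers = 3)$cluster

# ...then profile each segment on observed behaviour to find the 'headroom' groups
# worth targeting with tailored communications
aggregate(behavioural, by = list(segment = segments), FUN = mean)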
However, some proposition testing can be done in a real-life environment. By combining browsing and purchase information, showing different adverts to users as they browse in their everyday lives, and examining real uplift in sales, it is possible to identify successful adverts, and even to segment users according to other behaviours. Though this approach is by no means universally applicable or effective, it does away with the traditional research survey altogether.
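Again as a sketch only – simulated data in R, with a made-up control group and adverts rather than any real campaign – the underlying logic is an uplift comparison: purchase rates among users exposed to each advert versus a control, with an ordinary significance test on the difference.

set.seed(1)
n <- 5000

# Simulated exposure: each user randomly sees one of two adverts, or none (control)
exposure <- sample(c("control", "advert_A", "advert_B"), n, replace = TRUE)

# Simulated purchases, with advert_A assumed to add 2 points to a 5% base rate
purchase <- rbinom(n, 1, 0.05 +
                     ifelse(exposure == "advert_A", 0.02,
                            ifelse(exposure == "advert_B", 0.005, 0)))

# Purchase rate by exposure group; uplift is the gap versus control
rates <- tapply(purchase, exposure, mean)
rates - rates["control"]

# A simple significance check on advert_A versus control
prop.test(
  x = c(sum(purchase[exposure == "advert_A"]), sum(purchase[exposure == "control"])),
  n = c(sum(exposure == "advert_A"), sum(exposure == "control"))
)

In practice the exposure would come from ad-serving logs and the purchase flag from transaction data rather than simulation, but the comparison itself is no more exotic than this.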
Should we all be learning R?
In a world with this much data, the traditional researcher’s toolkit of Excel plus maybe an SPSS package is looking decidedly outmoded. At Big Data World, the programming language most often cited was R, and there’s no doubting its power. But obtaining any sort of proficiency in R, or another language, requires a significant time investment, and ideally a STEM background. In short, those currently in the industry may not find it feasible to up-skill.
The bigger question is: where should these skills sit? Reed Cundiff of Microsoft gave the brilliant analogy of traditional researchers as farmers and data scientists as miners – both valuable, and it may not make sense for any individual to specialise in both. And there will always be a role for interpretation, communication and judgement. What is essential is that traditional researchers make themselves aware of big data tools and approaches, and consider how best to incorporate them, even if they can’t deliver them personally.
To return at last to our e-commerce analogy, the high street still exists, but to be successful nowadays a physical outlet must learn to justify itself against its online competitors, while complementing its own online store if it has one.
Joe Catling is head of analytics at Relish