OPINION 25 June 2014

Avoiding the Big Data traps


Ipsos MORI Digital’s Claire Emes says many of the shortcomings of Big Data can be overcome by applying some of the basic principles of survey research and statistics.


Much has been written on the pros and cons of Big Data; in fact the White House recently published a report titled ‘Big Data: Seizing Opportunities, Preserving Values’ which examines how big data is changing the way we live and work.

What follows is certainly not a definitive guide. Rather, it is a brief critique of Big Data from a researcher’s perspective, and a look at how, by applying some of the basic principles of survey research and statistics, we can overcome many of its shortcomings and unlock its value.

Size isn’t everything

In his book, The Signal and the Noise: Why So Many Predictions Fail but Some Don’t, Nate Silver suggests the quantity of information in the world is increasing by 2.5 quintillion bytes per day, but the amount of useful information almost certainly isn’t. He explains that most of it is just noise, and the noise is increasing faster than the signal. There are so many hypotheses to test and so many data sets to mine, but according to Silver there is only a relatively constant amount of objective truth to find.

It’s quality not quantity

Taking the principle of bigger isn’t always better one step further, I’d suggest it’s not only not better, it can actually be worse. A number of proponents of Big Data refer to a Big Data set as one where ‘N = All’, where we no longer have to sample as we have access to the entire background population. But is ‘N = All’ really a good description of most available data sets? Do we ever really have all of the data?

As the economist Tim Harford and Microsoft’s Kate Crawford, among others, point out, most Big Data sets contain systematic biases. It takes careful thought to identify and correct for these skews. Big data sets can seem comprehensive but ‘N = All’ is often a seductive illusion.

Think of social media. It is in principle possible to record and analyse every message on Twitter and use it to draw conclusions about the public mood, but even if we analyse every tweet, Twitter users are not representative of the population as a whole. According to Ipsos MORI’s tech tracker, only 15% of the UK population are on Twitter, and they are disproportionately young and from higher social grades. In most situations, we’d be better off analysing a far smaller but representative sample of the population we wish to understand.
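
To make the point concrete, here is a minimal simulation sketch. Every number in it is a hypothetical assumption chosen for illustration (the share of young people, the support rates, the platform membership rates); none of it is Ipsos MORI data, although the membership rates are set so that roughly 15% of the simulated population is on the platform. It compares an "every user" estimate with a random sample of 1,000 people.

```python
import random

def simulate(pop_size=200_000, sample_size=1_000, seed=0):
    """Compare a large skewed data set with a small random sample.

    All rates below are made-up assumptions for illustration only.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(pop_size):
        young = rng.random() < 0.30                          # assumed share of young people
        supports = rng.random() < (0.70 if young else 0.40)  # assumed opinion by age group
        on_platform = rng.random() < (0.35 if young else 0.06)  # assumed membership by age group
        population.append((supports, on_platform))

    # The figure we actually want: support across the whole population.
    true_rate = sum(s for s, _ in population) / pop_size

    # "N = All" of the platform: every user, but only the users.
    platform_users = [s for s, on in population if on]
    platform_rate = sum(platform_users) / len(platform_users)

    # A much smaller simple random sample of the whole population.
    sample = rng.sample(population, sample_size)
    sample_rate = sum(s for s, _ in sample) / sample_size

    return true_rate, platform_rate, len(platform_users), sample_rate

if __name__ == "__main__":
    true_rate, platform_rate, n_platform, sample_rate = simulate()
    print(f"True support in population:     {true_rate:.3f}")
    print(f"All {n_platform} platform users:  {platform_rate:.3f}")
    print(f"Random sample of 1,000:         {sample_rate:.3f}")
```

Under these assumptions the platform data set is many times larger than the random sample, yet its estimate sits further from the truth, because the skew towards younger users does not shrink as the data set grows.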

Known unknowns and unknown unknowns

Another issue is that if we rely on ‘found’ data alone, we’re constrained by what exists. As Nate Silver pointed out in his interview with Ipsos MORI’s CEO, Ben Page, “The credit rating agencies in advance of the crunch had millions of observations on individual mortgages, but all from a period when housing prices were increasing”.

It can be risky to rely entirely on past observable behaviour and algorithms. If you have no idea what is behind a correlation, you have no idea what might cause that correlation to break down. This is particularly concerning when people feel that they can be more certain about their predictions because the size of the data set means that they’ve got the numbers to back it up.

Perhaps we could go so far as to suggest that Big Data can be dangerous. Big Data can mean big errors. The data can be wrong or misleading, but more often than not the errors lie in the interpretation rather than in the data themselves. This is frightening if authorities wrongly predict a health scare (or fail to predict one) and frustrating if a company tries to sell you something you already have or simply aren’t interested in.

Further, Big Data models do not just predict, they can make things happen by creating a behavioural loop. A person feeds in data, which is collected by an algorithm which then presents the person with choices, so steering behaviour. This can create efficiencies but it’s easy to see how this could result in yet more data skews or could be abused.

So Big Data can be unwieldy, misleading and possibly even hazardous but, despite this, we and many of our clients are genuinely excited about the opportunities it presents. Many of the projects we’re undertaking today leverage Big Data sources and techniques and we expect this to apply to even more of our work in the future.

Our experience suggests there are some key principles we need to consider to ensure we don’t fall into any Big Data traps.

  1. Ask the right questions. As with any study, a crucial element in managing a Big Data project is asking the right questions, and in particular how you define the problem.
  2. Evaluate the data. Once we’ve established what data we need, it’s important that we evaluate the quality of the data. As with most things, what you get out is only as good as the data you put in.
  3. Be aware of your assumptions. Not only do we need to understand the data and where it has come from, but we also need to consider the assumptions behind any models that the data is fed into and how these may differ from reality.
  4. Underpin with statistics. We should apply the rules of small data to Big Data – we need to understand any skews in the data and the uncertainty around our predictions. This is familiar territory. As researchers we underpin our work with statistics, we are used to dealing with bias, and we present our survey data with confidence intervals. Big Data should be no different.
  5. Consider the privacy implications. The research industry is well placed to address the weighty issues surrounding privacy and data protection because anonymity and respect for the individual are core considerations in our work.
  6. Combine data sources. Our experience has demonstrated that we create the most value by combining data sources: a client’s operational data, third-party data, ‘found’ or public data, data we’ve collected passively via monitors and – more often than not – survey data (as we’ve found Big Data raises as many questions as it answers). This ‘data mash-up’ provides a more holistic picture of the ‘what’ and the ‘why’, allowing us to generate richer insights.
  7. Tell a story. Finally, the most sophisticated data analysis in the world won’t cut through if you don’t find a compelling way to communicate your findings.
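
As an illustration of the fourth principle, the sketch below shows two standard small-data tools: a margin of error for a proportion, and a simple re-weighting of group results to known population shares. It is not a description of Ipsos MORI's methods. The function names and all figures are hypothetical, and the margin-of-error formula uses the normal approximation and assumes a simple random sample.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from a
    simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

def poststratify(group_rates, population_shares):
    """Re-weight per-group results to known population shares.

    group_rates:       proportion observed within each group of the data set
    population_shares: each group's share of the target population (sums to 1)
    """
    return sum(group_rates[g] * population_shares[g] for g in population_shares)

if __name__ == "__main__":
    # Hypothetical figures: a skewed data set in which young people are over-represented.
    group_rates = {"young": 0.70, "older": 0.40}
    population_shares = {"young": 0.30, "older": 0.70}

    weighted = poststratify(group_rates, population_shares)
    print(f"Weighted estimate: {weighted:.2f}")

    # Sampling error shrinks as n grows...
    print(f"Margin of error, n = 1,000:     {margin_of_error(0.5, 1_000):.4f}")
    print(f"Margin of error, n = 1,000,000: {margin_of_error(0.5, 1_000_000):.4f}")
    # ...but the interval says nothing about bias from an unrepresentative source.
```

A sample of 1,000 gives a margin of error of about three percentage points at p = 0.5, while a million records give a margin of around a tenth of a point. That narrow interval only reflects sampling error, which is why the skews still have to be understood and corrected separately.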

In summary, while Big Data may not be the answer to all our questions, it can certainly provide a very useful contribution and, when combined with other sources of insight, helps us develop a deeper understanding of people’s motivations and behaviour.

Claire Emes is head of Ipsos MORI Digital

2 Comments

10 years ago

Small minds = small industry.


10 years ago

I enjoyed this article. This one is bookmarked. ;)
