FEATURE | 10 May 2016

Finding the super-predictors



Techniques for asking people to make predictions vary, and some individuals are consistently more accurate than others – if only they can be identified, says Jon Puleston


Before you read this article, can I ask you to make a prediction: do you think it will rain in London next Wednesday?

The traditional approach to making predictions has been to ask a representative target group a question and aggregate the answers, relying on their views being reflective of the wider population. But it can be costly and difficult to reach truly representative audiences, and we have always struggled with the gap between what people say and what they actually do, which can be quite different.

In the late 1990s, prediction market protocols gained attention as a more efficient means of making predictions. They are based on the theory of the wisdom of crowds: if you ask a group to forecast an outcome and their opinions are all independent and unbiased, the prediction errors of individuals should cancel each other out, and what remains is a distillation of their collective knowledge.
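To make the error-cancellation idea concrete, here is a minimal simulation sketch – not the protocol described in this article – in which the notional true value and the noise level of individual guesses are assumed purely for illustration:

```python
import random

# Minimal sketch of the wisdom-of-crowds idea: if individual estimates are
# independent and unbiased, their errors tend to cancel when aggregated.
# The true value and noise level are illustrative assumptions only.

random.seed(42)
true_value = 100.0   # the quantity the crowd is asked to estimate
noise = 30.0         # spread of each individual's error (assumed)

def crowd_estimate(n):
    """Average n independent, unbiased individual guesses."""
    guesses = [random.gauss(true_value, noise) for _ in range(n)]
    return sum(guesses) / n

for n in (1, 10, 100, 1000):
    estimate = crowd_estimate(n)
    print(f"crowd of {n:4d}: estimate = {estimate:6.1f}, "
          f"error = {abs(estimate - true_value):5.1f}")
```

With unbiased, independent errors, the aggregate estimate tightens as the crowd grows; the cognitive biases discussed below are precisely what breaks that independence assumption.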

This approach is far less reliant on a balanced sample, and a predictive perspective can allow people to look at situations more objectively than the often subjective view we take of our own behaviour.

It is a very effective technique for certain types of research prediction task: it can be uncannily accurate for things such as price prediction, and is particularly effective for advertising copy evaluation. We have found that prediction protocols like this can deliver results that correlate highly with traditional rating techniques while requiring far smaller samples – around 40% smaller.

The nagging problem with this methodology, though, is that it's not always reliable. In fairly extensive experiments testing prediction markets, we found around 10% of ad predictions were miscalled. This might seem relatively small, but it is a worrying level of error and, using these prediction techniques for other research tasks, we have seen error rates rise as high as 100%.

cognitive biases

Making predictions is not straightforward. The majority of people asked the question ‘Will it rain in London next week?’ will predict rain, because of a popular misapprehension that it rains a lot in London. Well, yes, it does, but even in winter there is a less than 50% chance of rain on any given day, so statistically crowd weather predictions are fairly useless.

This is an example of the cognitive biases – or short-cuts in thinking processes – that people use to make predictions, causing network errors that can trip up crowd forecasts. When testing advertising, the one-in-10 occasions when predictions go wrong tend to be those where respondents are evaluating ads for more famous brands: people predict the ads will be liked more simply because the brand is famous.

If you ask people to predict who will win an election, for example, their own political preferences badly corrupt things. Even the wording of a question can knock a prediction over. Do more people in the UK own dogs or cats? There is a 20% shift in predictions if you simply change the order of the wording to ‘cats or dogs’.

This may be why prediction market protocols have not seen a wider uptake in the research community – we simply can’t always trust them.

But I believe a solution to these problems is emerging from the pioneering academic work of Philip Tetlock who, for the past 15 years, has studied the science of group forecasting, conducting a series of large-scale, long-term forecasting tournaments.

Three years ago I joined one of these tournaments, the Good Judgment Project, in which, along with 3,000+ other people, I was challenged to make a series of ongoing geopolitical predictions. Over several years, the researchers tracked how successful each person was at predicting; see Tetlock’s book Superforecasting for the outcomes.

Essentially what Tetlock and his team discovered was that among the crowd are some forecasters who are significantly and consistently better at making predictions than others – he named these individuals ‘superforecasters’. Small groups of these superforecasters working in teams could significantly and consistently out-predict the larger crowds.

Inspired by this methodological approach, two years ago we started to conduct some of our own – albeit more modest – longitudinal prediction experiments on our panels. We wanted to see if this approach could be applied to consumer research.

Could we isolate some panellists who were better at making predictions on certain topics? In essence, we started to see that the prediction performance of our sample was indeed stratified. Some respondents were generally better at making certain types of predictions than others. We were able to isolate people who were slightly better at spotting good ads; better at predicting who would leave The X Factor; better at assessing the market price of products; and we even found some people who were better at predicting the weather.

By removing the poor predictors causing network errors, we found prediction quality could be significantly improved and forecasts started to become a lot more reliable.
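As a rough sketch of what such crowd filtering can look like in practice – the audit records, hit-rate threshold and minimum number of calls below are hypothetical, not the panel system described here:

```python
from collections import defaultdict

# Hypothetical audit records: (panellist_id, prediction, actual_outcome)
# collected over successive survey waves.
history = [
    ("p1", True, True), ("p1", False, False), ("p1", True, True),
    ("p2", True, False), ("p2", True, True), ("p2", False, True),
    ("p3", True, True), ("p3", True, True), ("p3", False, False),
]

def hit_rates(records):
    """Fraction of correct calls per panellist across audited waves."""
    hits, totals = defaultdict(int), defaultdict(int)
    for pid, predicted, actual in records:
        totals[pid] += 1
        hits[pid] += int(predicted == actual)
    return {pid: hits[pid] / totals[pid] for pid in totals}

def super_predictors(records, min_rate=0.66, min_calls=3):
    """Keep only panellists with enough audited calls and a high hit rate."""
    rates = hit_rates(records)
    counts = defaultdict(int)
    for pid, _, _ in records:
        counts[pid] += 1
    return [pid for pid, rate in rates.items()
            if rate >= min_rate and counts[pid] >= min_calls]

print(super_predictors(history))  # e.g. ['p1', 'p3'] with the toy data above
```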

no short-cuts

So we found a solution – work out who the best people are at making predictions on any one topic and ask only them. But the story doesn’t end there. Identifying good predictors was easier in some cases than in others. Working out whether someone would be good at predicting the weather was easy – we just had to ask them if they had seen the weather forecast. But in most cases there are no short-cuts like this; the only way to find out was to go to the trouble of asking people to make a series of up-front predictions – usually over a series of surveys – and auditing their performance. In Tetlock’s experiments, for example, each person had to make at least 20 predictions before their reliability could be determined, which took at least a year.

The problem therefore moves from a theoretical challenge to a practical one: how to motivate people to take part in an extended series of surveys so we could find out who was good at making predictions and – once we had found the good predictors – how to keep hold of them.

Typical re-invite participation rates in surveys on our panel can be less than 60%, so the number completing five surveys in a row can fall to below 25%. To get more participants to take part repeatedly, we had to adopt a more ‘respondent-centred’ approach to survey design – one that was, first and foremost, rewarding.
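As a rough illustration of how that drop-off compounds – assuming, purely for the sake of the arithmetic, a constant re-invite rate at every wave:

```python
# Rough compounding of wave-on-wave retention, assuming a constant re-invite
# participation rate. At ~60% per wave, the four repeat invitations after the
# first survey leave roughly 0.6**4, or about 13%, completing all five waves.
rate = 0.60
for wave in range(1, 6):
    retained = rate ** (wave - 1)
    print(f"wave {wave}: ~{retained:.0%} of the original respondents remain")
```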

As a result, we have developed an approach to survey design we’ve named ‘surveytainment’. Thinking about surveys the way a publisher thinks about a magazine – injecting editorial content alongside advertising – we added content specifically to engage respondents. Surveys became a more game-like experience, positioned as quizzes and challenges, where respondents would only find out the answers if they returned to complete the next wave. Respondents were treated like a community, with participants’ thoughts and opinions shared back to them.

We have been able to increase five-in-a-row repeat participation to levels reaching 80%, which means we can start to apply this crowd-filtering methodology practically.

These techniques mean completely rethinking assumptions about sample sizes. The sample needed to produce the most reliable prediction depends entirely on the distribution curve of the prediction skills of the base sample. In many instances micro-samples of better predictors have the potential to produce far more stable predictions than much larger groups.

Twenty predictors who can each predict with 65% reliability will collectively generate a more stable prediction than 100 people who can only predict at 55% reliability. Roughly speaking, for every five-percentage-point improvement in the prediction reliability of your sample, you need only half the sample size to achieve the same overall reliability.
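A simple way to sanity-check that trade-off is to simulate it, assuming each predictor makes an independent binary call and the group goes with the majority – a simplification of any real prediction protocol:

```python
import random

def group_accuracy(n, p, trials=20000):
    """Probability that a strict majority of n independent predictors,
    each correct with probability p, calls a binary outcome correctly."""
    wins = 0
    for _ in range(trials):
        correct = sum(random.random() < p for _ in range(n))
        wins += correct * 2 > n   # strict majority got it right
    return wins / trials

random.seed(1)
print(f"20 predictors at 65%:  ~{group_accuracy(20, 0.65):.0%} group reliability")
print(f"100 predictors at 55%: ~{group_accuracy(100, 0.55):.0%} group reliability")
```

In this toy set-up the smaller, better-calibrated group comes out ahead of the larger one, in line with the figures above.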

We are now exploring the application of this crowd distillation approach across a range of consumer categories, with the aim of isolating experts on different topics.

This approach to prediction does require more upfront investment, more creative approaches to engage respondents, and a different mind-set on sample sizes, but it has the potential to transform many aspects of consumer research.

Jon Puleston is vice-president of innovation at Lightspeed GMI
