OPINION | 10 July 2019

What time do you call this?


Ryan Howard argues that implicit reaction time (IRT) is a technique best put to sleep.

Daniel Kahneman’s Thinking, Fast and Slow popularised the age-old dual-process theory. Within months of its publication, the prevailing opinion was that marketers were ‘missing the other half of the coin’. Where was the insight into gut reactions, the unconscious and automatic? Quantitative researchers looked to techniques such as the implicit association test (IAT) to reveal what customers were ‘really’ thinking.

The IAT is built on the premise that the harder one must work to override a gut reaction, the slower the response, or the more likely one is to make an error. The years have not been kind to this otherwise credible idea. An analysis of 492 studies subsequently found scant evidence linking differences in IAT scores to related behaviours1. Expensive to administer, exhausting to complete and lacking benchmarks, implicit techniques with potential had already lost their commercial footing.
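
The IAT premise above is usually operationalised as a latency-difference score. As a hedged sketch (the data, function name and simplification are illustrative, not from the article or the cited studies), it amounts to scaling the gap in mean response times between incongruent and congruent pairings by the spread of all latencies:

```python
# Simplified sketch of an IAT-style latency score: the gap between mean
# response times in "incompatible" (incongruent) and "compatible" (congruent)
# blocks, scaled by the pooled standard deviation of all latencies.
# All figures below are invented for illustration.
from statistics import mean, stdev

def d_score(compatible_ms, incompatible_ms):
    """Latency gap divided by the pooled SD of all response times."""
    pooled_sd = stdev(compatible_ms + incompatible_ms)
    return (mean(incompatible_ms) - mean(compatible_ms)) / pooled_sd

compatible = [620, 655, 590, 700, 640]     # faster, congruent pairings (ms)
incompatible = [780, 820, 760, 900, 810]   # slower, incongruent pairings (ms)
print(round(d_score(compatible, incompatible), 2))
```

A larger score is read as greater effort spent overriding the gut reaction; the meta-analytic point in the text is that such scores turn out to predict related behaviour poorly.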

Implicit reaction time (IRT) won out. It is easy to administer, covers many attributes and is fun to complete. When Simpson Carpenter trialled IRT in 2012, the results were initially encouraging and plausible. When identical questionnaires were re-asked, less so. Results differed so substantially, and the data proved so resistant to even severe cleaning and mathematical torture, that IRT was put on ice. Test-retest reliability is a prerequisite for validity: plainly said, if one cannot return a similar result on the second time of asking, one is not measuring anything real. No matter how insightful the results may appear, there is no getting around this.

While it is conceivable that response time can be used to detect duplicity, this is not the context in which it is applied within IRT. Rather, IRT has no rules or incorrect answers to navigate, no struggle between head and heart. So it is hardly surprising to discover that evidence in support of IRT needs to cite experiments that are not based on reaction time.

Pounding the keyboard or screen as fast as one can, as IRT instructs, still means that answers pass through highly variable degrees of logical, deliberate filtering, at which point IRT has no advantage over any other form of stated response. Furthermore, dual-process theory maintains that fast mental processing does not necessarily mean that System 1 is at work2.

More recently, I’m guided by the bold and comprehensive GfK paper3, which concluded:

  • The higher differentiation seen in IRT was attributed to higher error rather than reliable differences
  • IRT suffered low reliability and validity
  • Stated measures showed stronger relationships with purchase likelihood and recommendation

At best, IRT is too blunt an instrument; at worst, a reading or concentration test, perhaps the drunk cousin of direct questioning: incoherent, fumbling up the stairs, falling at every hurdle. Even if this were not so, our hasty judgements, mental blocks, consideration sets, impulse buys and instincts about brands are not functions of reaction time. In this sense, IRT is akin to using a stopwatch, instead of a map, to plot a route.

More troubling is that IRT’s popularity has perpetuated the illusion that these faster, sometimes faulty, sometimes genius, emotionally tinged inclinations hide ‘under the skin’ as clear, stable and distinct concepts, with shared expression across situations and individuals. The hope is that if enough layers were peeled away, these inclinations could be revealed, then quantified. Of course, the neurological and psychological literature strongly refutes this possibility. ‘From everything we know, humans definitely do not work like that,’ would be the polite summary.

Skip forward to the IJMR lecture on Monday (8 July), in which Dr Ali Goode, once the leading advocate of IRT within the UK, added: “The safest claim is that IRT is a measure of top-of-mind confidence – it does not measure the unconscious and it’s unhelpful to suggest it does anything of the sort. Given that reaction time adds so much noise, IRT should demonstrate that there are no confounding variables at play, and more importantly, why it should differ from self-reporting… It’s probably time to call time on IRT, or at least be extremely careful how the data is interpreted. It has to be used under advisement depending on the task that it is being applied to. There are better solutions.”

This is not to say that reaction time has no place in modern market research – a case can always be made, but it would be limited. The burden of proof is high and too easily challenged. To my mind, it seems like a lot of faff to collect data we cannot, even under ideal conditions, be confident in using. Ultimately, this is not how the industry creates value.

So, it is time to hand IRT a pint of water, put it to bed and find consolation in GfK’s third conclusion above; the strength of sober, reflective questioning. Post-rationalised answers still do an impressive job predicting and diagnosing, measuring much of what is possible to know. Goodnight IRT. It was fun.

Ryan Howard is director of advanced analytics at Simpson Carpenter


1. Forscher, P., Lai, C.K., Axt, J.R., Ebersole, C.R., Herman, M., Devine, P.G., & Nosek, B. (2017). A meta-analysis of change in implicit bias.

2. Evans, J. (2012). Questions and challenges for the new psychology of reasoning. Thinking & Reasoning, 18(1), 5–31.

3. Dieckmann, A., Unfried, M., Schreder, R., & Kissel, K. (2018). How valid are response-time measures for capturing implicit brand attitudes? GfK Working Paper Series, 7.

1 Comment


Hi Ryan. We've used IRT for >12 years in commercial practice. As our founders are scientists and academics, we regularly conduct research on research and we've never found any issues with replicability. Ali Goode knows our approach and is supportive. Our approach has been successful in finding associations between brands and motivations that distinguish brand equities and which correlate highly with 'willingness to pay'. This was the foundation of, for example, the advertising brief that led to T-Mobile's flash mob 'dance' ad in 2009 which, according to the IPA Effectiveness Award-winning paper, grew sales by 49%. It's also been successful in identifying brands' distinctive assets. One of the world's biggest FMCG companies, which has its own behavioural science team staffed by academics and scientists, has chosen to use our approach having tried and tested many others. I'd question the value of GfK's finding that 'Stated measures showed stronger relationships with purchase likelihood and recommendation', because the last meta-analysis of purchase intent predicting actual behaviour that I saw was from TNS and showed a correlation of .27, which is hardly inspiring. We find much higher predictive power from IRT measures of implicit 'wanting'. In summary, I think it's a shame that you've had such poor experience of IRT. Based on our experience, I can only suggest that your, and GfK's, conclusions have been drawn from using paradigms that are not optimised.
