OPINION | 23 January 2020

If data isn’t truth, then what is it?


Datasets are social constructs and require an interdisciplinary approach to be properly unpacked, writes Mihajlo Popesku.


As an article in Forbes last year pointed out, ‘data isn’t truth’. It is better described as the raw material of models that seek to represent reality, and which thereby allow us to falsify a given proposition. Data is central to any scientific exercise because it helps to build evidence: this is what we really mean by ‘data science’.

Standing in the way of this outcome are multiple sources of bias – in the design, collection, use and deployment of datasets. Carl Jung wrote: "It is easier to go to Mars or the moon than to penetrate one’s own being." The same is ultimately true of data, especially data related to human activity, which is so often self-reported or man-made.

We clearly need the best data – not simply the most readily available or the cheapest – and the most effective combination of datasets. Yet these choices are usually informed by a hypothesis about what might be useful. The accidental discoveries of penicillin and X-rays are a lesson in the value of ‘by-product’ results that do not conform to expectations.

Yet pure, deductive data-modelling also has the power to lash out unexpectedly. Sometimes this means challenging our own biases in uncomfortable ways and often it means presenting conclusions which are simply unusable. This could mean being too objective: as in the case of the programmers’ joke about the algorithm that suggests you jump off a cliff because all your friends have.

That such outcomes are still so commonplace shows that data science remains in its infancy when it comes to providing objective models of human behaviour. Even so, the temptation to fall back on more inductive, evidence-based models should be resisted. Such models are weaker precisely because they are conditioned not to lash out in the same way; they offer only incremental advances, rather than holding up the type of mirror that might deliver a step-change.

In vitro veritas?

For the purposes of this article, I will focus on data drawn from social media. It is becoming apparent that cognitive dissonance is much greater online: people are willing to hold and propagate views far more fiercely than in real life. Online surveys have asked participants to comment on policies that do not exist, and they obliged, hiding behind online personas. The challenge is therefore to reverse-engineer people’s social reality from its online equivalent.

Self-reported survey data can have a further distorting effect, through social desirability bias, ‘satisficing’, acquiescence, non-attitudes and priming. It is necessary to triangulate methodologically which biases come into play, and therefore which results we can trust. Echo chambers exacerbate these self-selective processes, as it is easy to block out disagreement and curate one’s own environment.

How can we step back from these traps, towards methodologies that more accurately reflect social reality rather than its online counterpart? 

Thinking outside the black box

The best approach is to stop thinking of such data and datasets as purely scientific ‘evidence’. They are social constructs, created by layers of decision-making by those who create, populate, acquire and deploy them. Treating datasets as such necessitates a more interdisciplinary approach to unpacking them – drawing, at the very least, on marketing science, applied social research, psychology and behavioural economics.

Although this moves us back towards a ‘mixed-inductive’ model, it helps us get to the stage where such data modelling becomes useful in real-world situations. This can mean either getting the ‘human element’ into data models ("No, I will not jump off a cliff because my friends have"); or out of it, for example, weeding out the magnified transmission of unconscious biases. Without taking such measures, we are not going to reach an objective view of the social realities behind the datasets.

Seeking out truly strong, multi-sourced insights may further demand the creation of open data platforms to allow researchers to separate genuine signals from the noise. Such platforms could be very powerful when it comes to mapping critical issues such as the spread of disease, crime, or radicalisation. 

Such cross-pollination can lead to profound, applicable insights which allow us to bring our models to life.

For instance, the application of a common overarching insight from ‘psychology of judgement’ literature – Kahneman’s System 1 versus System 2 – can be more helpful in understanding reputation than the accumulation of ever more ‘blind’ data. In relevant cases, this gives rise to the question of whether marketers should engage with the prevailing system – or attempt to overcome it by appealing to the other system. Failing to factor in such issues is more likely to result in a model which ‘lashes out’ unexpectedly, and therefore provides a poor basis for intervention.

Building in this type of interdisciplinary approach should mean engaging with ‘black swan’ results in a different way. After all, replicability is only a goal if it advances us towards actionable real-world insights. It is preferable that models ‘learn to fail’ rather than seek replicability as a goal in its own right.

Yet greater nuance also means accepting a more probabilistic view of the world, one that takes multiple outcomes into account. The binary, deterministic results that have so far dominated statistical thinking can be out of place in the human world. In their place, we are likely to move towards Bayesian-type statistical thinking around insights, allowing us to ask not simply whether the data is ‘true’, but to what extent we can rely on the learnings derived from our research and data analysis.
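A minimal sketch of what that Bayesian shift looks like in practice (an illustration of the general technique, not a method from this article; the probabilities are assumed for the example): instead of asking whether a survey finding is ‘true’, we assign it a prior degree of confidence and update that confidence as corroborating evidence arrives.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) via Bayes' theorem."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1 - prior)
    return numerator / denominator

# Illustrative numbers only: we start 50/50 on whether a survey finding
# is genuine. An independent dataset corroborates it; corroboration is
# likelier if the finding is genuine (0.8) than if it is an artefact of
# bias (0.3).
posterior = bayes_update(prior=0.5,
                         p_evidence_if_true=0.8,
                         p_evidence_if_false=0.3)
print(round(posterior, 3))  # 0.727
```

Each further corroborating (or contradicting) source feeds the posterior back in as the new prior, so the question becomes not "is it true?" but "how much confidence has this finding earned?".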

The idea of ‘big data’ has become an idol of the present age. Data is being accumulated faster than it can be understood or deployed. The real challenge is not simply to build a giant and then stand on its shoulders to enjoy the view: it is to get the giant to walk.

Mihajlo Popesku is head of research at Auspex International