FEATURE
17 November 2021

The history of the data economy: The new kings and queens of data


Data is now the fuel that drives business – identifying potential markets, shaping new products, and targeting consumers. Impact has partnered with Significance, the magazine of the Royal Statistical Society, to jointly publish a series exploring the past, present and future of the data economy. This third part tells the story of the evolution of social media, which created rich and detailed data sources and positioned tech giants as data economies in their own right. By Timandra Harkness.


Until February 2004, a ‘face book’ was a paper directory that US students received to help them get to know each other, with names, photographs and a few biographical details. That was until Harvard University student Mark Zuckerberg had the idea of creating an online version.

Well over a thousand Harvard students signed up within 24 hours of TheFacebook’s launch. Today, Facebook (which dropped the ‘the’ in 2005) claims to have 2.85 billion monthly active users worldwide – more than half the world’s internet users and getting on for a third of the planet’s human population. Total revenue in 2020 was more than $85bn.

Any platform that attracts an audience of more than a billion people a day could expect to make money from advertising, but what makes Facebook’s advertising space so valuable is the ability to target the right pairs of eyes, and what makes that possible is data.

The ability to gather data from someone’s online behaviour, to build a profile of them and target them with online adverts is older than Facebook or its social media predecessors MySpace, Friendster and SixDegrees. In 1996, writer Melanie Warner described the now-familiar feeling in an article for Fortune magazine: “You sign on to your favourite website and voila! – up pops an ad for Happy Times Cruise Lines… Sure enough, you work in Connecticut, and you’ve been thinking about vacationing in the Mediterranean. But how do they know that? Whoever they are.”

As Warner goes on to explain, “they” are probably DoubleClick, an advertising broker launched in March of that year. By July, it had profiles for four million people, and 25 major websites on its books. “The next time you log on to a DoubleClick site, its software notes your email address, checks out your user profile, and uploads an ad customised for you – within milliseconds of your signing on,” she writes.

Underlying some modern iterations of this advertising system is a process known as real-time bidding (RTB), still used today by companies including Google, which bought DoubleClick in 2008. When you log on to a web page linked to an ad network, the network’s software will parse available information about you from your logins, cookies on your computer, and so on, and create a ‘bid request’ at an ad exchange.

“On that exchange, different advertisers will bid for the right to fill that space on your website,” says marketing analytics consultant Andrew Willshire. “The ad exchange will weigh up all the potential bidders, and whoever has bid the highest will get the right to serve that impression to the viewer. This process happens in thousandths of a second, which is why the adverts look like they are there the whole time to the user.”
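To make the mechanics concrete, the auction logic Willshire describes can be sketched in a few lines of code. The sketch below is a simplified illustration only: the advertiser names, prices and the second-price payment rule are assumptions chosen for the example, not a description of any real exchange.

```python
# A simplified sketch of real-time bidding (RTB) auction logic.
# All names, prices and the pricing rule here are illustrative
# assumptions, not any real ad exchange's implementation.

from dataclasses import dataclass


@dataclass
class BidRequest:
    """What the exchange parses about a visitor (logins, cookies, etc.)
    and broadcasts to bidders; in this sketch the bids are pre-made."""
    user_id: str
    attributes: dict


@dataclass
class Bid:
    """One advertiser's offer for a single ad impression."""
    advertiser: str
    price: float
    creative: str  # the ad to serve if this bid wins


def run_auction(bids: list[Bid]) -> tuple[Bid, float] | None:
    """Rank bids and pick a winner for one impression.

    The winner here pays the second-highest price, a rule many
    exchanges historically used; pricing rules vary in practice.
    """
    if not bids:
        return None  # no demand for this impression
    ranked = sorted(bids, key=lambda b: b.price, reverse=True)
    winner = ranked[0]
    price_paid = ranked[1].price if len(ranked) > 1 else winner.price
    return winner, price_paid


# One impression, three hypothetical bidders.
request = BidRequest("user-123", {"interests": ["cruises"], "region": "Connecticut"})
bids = [
    Bid("HappyTimesCruiseLines", 2.40, "cruise-banner"),
    Bid("AcmeShoes", 1.10, "shoe-banner"),
    Bid("GreenTeaCo", 0.75, "matcha-banner"),
]
result = run_auction(bids)
if result is not None:
    winner, price = result
    print(f"{winner.advertiser} wins and pays ${price:.2f}")
```

Real exchanges add fraud checks, floor prices, budget pacing and strict latency limits; the sketch captures only the ranking-and-pricing step that happens in those few milliseconds.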

Much the same thing happens when you log on to social platforms: you’ll see content that advertisers have bid for, based on your profile. The difference is that social media sites can know their audience like no advertising platform ever before.  

Changing the world?

The advent of social media in the early 21st century was rapid. By 2011, a UK government report estimated that three in five internet users also used social media, up from under one in five in 2007. Market researchers, social researchers and others quickly took notice. The aforementioned UK government report, The Use of Social Media for Research and Analysis: A Feasibility Study, argued that “when compared to traditional surveys, social media data offer considerable advantages in terms of how quickly results are delivered, the scale at which results can be brought in, and (potentially) how cheaply they can be obtained”.

Jake Steadman was one of the many researchers excited about using secondary data from social media and similar sources. “I probably went in just naively assuming it was going to change the world,” he says. After years of working for marketing and research agencies, Steadman went to O2 as its first head of real-time research. (He has since held roles at Twitter and Deliveroo.)

“No-one really knew what it was,” he says of his new role. “But it meant they [O2] recognised this new and emerging insight source, which was social data, but they didn’t really know how to access it or how to use it. And nor did I, I’d never done it before. But we both decided to take a bit of a leap of faith.”

Steadman is frank about his early enthusiasm for social media data. “I think I arrogantly assumed it was going to replace everything,” he says. Nor was he alone in thinking that. Consultant Ray Poynter says: “There was probably a lot of optimism around how much we could do with social media data. One of the mantras that people talked about was, ‘Why talk to some of us when you can listen to all of us?’ There is a lot of sense in that.”

As Poynter explains, social media presented organisations with the means to listen to “real customers talking about their real experiences, on topics we had not thought to ask about or we had not prioritised”.

“It’s fantastic at answering questions you didn’t ask,” he says.

Indeed, there are some questions you would never, or could never, put in a survey. For example, it’s easy to find out what sort of products people currently like just by asking them. But, if you want to create a new product, the ultimate goal should be to figure out what people might like in future.

“Take, for example, matcha tea,” says Steve King, chief executive of data analytics company Black Swan. Matcha is the green powder used in the Japanese tea ceremony, prepared by growing green tea in very particular conditions and then grinding it finely.

“It really has been around for ages,” says King. But then it began to be drunk in different forms and in different places.

Black Swan’s data-gathering is designed to use big data techniques to pick up on weak signals, like a few visitors to San Francisco trying a new drink in a hip café and then raving about it online.

“People start making [matcha tea] and selling it in a cool cafe,” says King, “then the little cool brands start running quite small production lines – higher costs, but a bit more agile. So, you’ll see it in the cool organic shops. From there, it begins to be a slightly larger trend, and then your bigger companies pick it up – and the joy of data is that you saw that whole thing.”

Social media data is good for identifying long-term trends, and innovations that straddle contexts. However, says Poynter, there are still questions that are better answered by researchers doing the asking. Hypotheticals, adverts or products that don’t exist yet can’t be tested by passive observation of electronic word of mouth.

Of course, no matter how many social media users rave about a product like matcha tea, it is unlikely that these views are representative of the broader population.

As Steadman ultimately discovered, after the initial giddy wave of social media enthusiasm had passed: “O2, like every brand, did segmentations, and brand measurement, and customer experience measurement and all those kinds of metrics. Social data sits alongside those. It doesn’t replace them; it augments and adds cultural context. But you still need to have statistically robust measures in place.”

Even Black Swan, whose focus is social media data, combines the latter with other forms of secondary data, and surveys, to get a multi-dimensional picture. But for King, social media data has advantages over surveys, because you don’t influence responses by the way you ask the questions. “By going bottom-up rather than top-down, you’ve got more granular data that allows you to build models, and then build algorithms and prediction,” he says.

This mix of big-picture and granular detail also appeals to statistician Simon Raper, founder of Coppelia Machine Learning and Analytics. He uses the example of viewer recommendations for a streaming service. “If I want to understand what the average customer is doing, or something broadly about viewing patterns, I can just take a sample and I don’t have to go crazy on the big data stuff.” But, he says, “the huge amount of data is going to make it possible for me to answer questions about some niche viewing, like Japanese horror.”

Raper likens it to a pixelated image. “If you’ve got a picture of a crowd, a low-resolution picture doesn’t really matter if you just want to make out the shape of the crowd, or groups within it. But if you want to zoom in on someone’s face, then you need a really high-definition image.” It’s almost “paradoxical”, he says, that one of the things we can do with enormous amounts of data and processing power is to focus on the smaller details.
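Raper’s analogy lends itself to a small simulation. The numbers below are invented purely for illustration: suppose one viewer in 500 watches the niche genre, and compare how well samples of different sizes bring that segment into focus.

```python
# Toy simulation of the resolution analogy, with invented numbers:
# a broad average is stable even in a small sample, but a niche
# segment (0.2% of viewers) only comes into focus at full scale.

import random

random.seed(0)

POPULATION = 1_000_000
NICHE_RATE = 0.002  # assumed share of viewers who watch the niche genre

# 1 = watches the niche genre, 0 = does not
viewers = [1 if random.random() < NICHE_RATE else 0 for _ in range(POPULATION)]

for n in (1_000, 10_000, 1_000_000):
    sample = random.sample(viewers, n)
    hits = sum(sample)
    print(f"sample of {n:>9,}: {hits:>5} niche viewers "
          f"(estimated share {hits / n:.3%})")
```

A sample of 1,000 typically contains only a couple of niche viewers, far too few to model their tastes, while the full dataset holds around 2,000 of them: enough data points to ‘zoom in’ on the face in the crowd.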

The privacy question

The capacity to zoom in on the individual brings us back to one of the most powerful, and disquieting, aspects of social media – especially when it is being used by the same platform to collect data and to target advertising. Though the predictive strength of data such as Facebook ‘likes’ has been overstated, we undoubtedly share more unsolicited information about ourselves, in digital form, than any humans before us.

Add to that the capacity of analytical algorithms to combine datasets, to find statistical relationships and to infer interests and demographic details from other data, and market segmentation turns into microsegmentation.

In November 2012, The Guardian reported how Barack Obama’s re-election team “used cookies to serve targeted digital adverts to voters’ computers, honing the message according to the individual’s age, gender, occupation, interests and voting history”.

Yet microtargeting came to be viewed somewhat less favourably after the successful Brexit campaign in the UK and Donald Trump’s election as US president in 2016.

Cambridge Analytica, which claimed to be able to combine data profiling with psychological profiling techniques, became the focus for a new awareness of how political campaigning uses social media data. Although the Information Commissioner’s Office (ICO) concluded that Cambridge Analytica was not involved in the UK referendum on leaving the EU, Facebook was fined £500,000 by the regulator for failing to protect its users’ data from misuse.

While the majority of internet users continue to use social media and other apps that track and profile them, half of US adults report having avoided a product or service because of privacy concerns, according to an April 2020 Pew Research Center report. In both the US and the UK, tech companies have been called to account by elected politicians.

One response to increased customer concern and regulation has been for social media companies to monopolise the data they collect. “Facebook in particular has become very, very reluctant to sell information,” says Poynter. And to the surprise of some, the tech giants have also been willing to change their data-gathering habits, even to go beyond what the law demands in some cases. Google, for instance, has proposed to block third-party cookies – the type used to track internet activity and target ads – from its Chrome internet browser from 2023 (a number of rival browsers already do so).

“I think everybody was surprised, including the people working at Google, that Google basically did more than it needed to do for GDPR [the General Data Protection Regulation],” says researcher Simeon Duckworth. “Everybody imagines, if you take away cookies, the internet will fall apart. Now Google has actually gone and said that it will do that.”

Duckworth is sceptical that data-based profiling and targeting is always effective, especially for large advertisers who want to reach a mass audience: the extra expense of tools like microtargeting and A/B testing to refine a marketing message may not be justified by extra effectiveness. Small and niche advertisers are most likely to see results from narrowing down their target audience, but they’re also less likely to be able to afford it.

For this reason, Duckworth does not think that tougher regulation will destroy the business model of digital advertising. “It’s not going to kill the internet, but it is going to redistribute it towards bigger platforms, bigger publishers, away from the smaller ones,” he says. Large companies like Google and Facebook will have the scale to work within new laws using new techniques. Smaller companies will lose access to data and the benefits of targeting a niche audience.

So, what is the future of the data economy? A fairer, more transparent, less manipulative internet environment? Or simply one in which a handful of big tech companies run the show, and smaller players go under or sell out? The final article in this series will seek to answer these and other questions.

You can download the complete four-part ‘History of the data economy’ here.
