FEATURE | 26 January 2022

The history of the data economy: The future of data



Data is now the fuel that drives business – identifying potential markets, shaping new products and targeting consumers. Impact has partnered with Significance, the magazine of the Royal Statistical Society, to jointly publish a series exploring the past, present and future of the data economy. In this fourth and final part, Timandra Harkness considers what the coming years have in store for the data industries.


Do you want to feel special? Go to coveryourtracks.eff.org and click the ‘Test Your Browser’ button. That’s how I found out that my web browser fingerprint is unique among the 220,694 that the Electronic Frontier Foundation tested in the previous 45 days.

This was a surprise. It means that even if I refuse tracking cookies – which I do – advertisers can still follow me around different websites, using a combination of innocuous details such as my browser version, screen size, graphics setup and system fonts.
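To see why those innocuous details add up, here is a toy sketch (in Python, with made-up attribute values, not the EFF's actual method) of how a tracker might combine them into a single stable identifier:

```python
import hashlib
import json

def browser_fingerprint(attributes: dict) -> str:
    """Combine individually innocuous browser details into one identifier."""
    # Serialise the attributes in a stable order, so the same browser
    # always produces the same fingerprint on every site it visits.
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

# Illustrative values only - a real tracker reads these from the browser.
fp = browser_fingerprint({
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/95.0",
    "screen": "1920x1080x24",
    "timezone": "Europe/London",
    "fonts": ["Arial", "DejaVu Sans", "Liberation Serif"],
    "webgl_renderer": "Mesa Intel(R) UHD Graphics",
})
print(fp)  # deterministic: the same attributes always yield the same ID
```

No cookie is needed: with enough attributes, the combination is unique often enough to follow one browser across sites.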

In short, getting rid of third-party cookies, as Google has promised to do from its Chrome browser by late 2023, will not bring the online data economy to a screeching halt. But that doesn’t mean things will carry on as before. Major changes are afoot in the data-driven industries, spurred by privacy concerns, tightening regulation, and technological advances.

Federated learning

In this series, we followed the progress of statistical and computing methods for drawing insights from data: from sampling to constructing an ‘n=all’ whole population model; from Victorian techniques of regression and dimension reduction to machine-learning models, the detailed workings of which are mysterious even to those who programme them.

The next challenge for those using data to understand people will be to preserve people’s privacy and autonomy while drawing conclusions – if not about them personally, about relevant populations. Take Google’s cookie announcement. You might, cynically, suggest that it’s merely a way for Google to monopolise the ability to target adverts to you. But some of Google’s ‘Privacy Sandbox’ proposals have the potential to radically change how researchers and marketers work.

Fledge, developed by Google, is one potential solution to the problem of, say, being repeatedly targeted by car ads after you’ve researched a car online. It lets the user’s own browser automatically tag a topic of interest for a specified length of time. No central data store will flag the browser or shopper. It’s like wearing a lanyard at an event that says “talk to me about cars” but which you can take off at the end of the day, instead of being added to somebody’s marketing list forever.

Another of Google’s ideas is FLoC, or federated learning of cohorts. Like Fledge, FLoC moves away from the idea of allowing ad tech companies to amass browsing data on individual web users in a centralised pool. Instead, it creates many ‘cohorts’ of web browsers, grouped by patterns of activity. The web user’s own browser then calculates which of these cohorts corresponds most closely to its recent browsing history. That browser-selected cohort is used to target relevant ads to the browser, not the person, who can remain anonymous throughout. The system preserves practical anonymity by letting each user hide in a cohort of thousands of individuals.
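Google's early FLoC prototype assigned cohorts with a locality-sensitive hash (SimHash) of browsing history, so that similar histories tend to map to the same cohort number. As a rough illustration only (toy code, not Google's implementation), a browser might compute its own cohort like this:

```python
import hashlib

def simhash_cohort(domains, bits=8):
    """Toy SimHash: map a browser's visited domains to a small cohort ID.
    Similar browsing histories tend to produce the same cohort number,
    so each browser hides among the many others in its cohort."""
    counts = [0] * bits
    for d in domains:
        # Hash each domain to 32 bits; each bit votes +1 or -1.
        h = int.from_bytes(hashlib.md5(d.encode()).digest()[:4], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    # The cohort ID keeps only the sign of each vote tally.
    return sum((1 << i) for i in range(bits) if counts[i] > 0)

history_a = ["news.example", "cars.example", "weather.example"]
history_b = ["news.example", "cars.example", "sport.example"]
print(simhash_cohort(history_a), simhash_cohort(history_b))
# similar histories usually land in the same or a nearby cohort
```

The key privacy property is that the calculation happens inside the browser: only the small cohort number, shared with thousands of other browsers, ever leaves the device.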

The ‘federated learning’ in FLoC refers to a way of training machine-learning programs without amassing a large quantity of centralised data. The program is given access to many smaller databases, each of which trains the model locally, with only the results centralised and aggregated.
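A simplified sketch (a toy one-parameter model, not any real deployment) shows the idea: each device improves a shared model on its own data, and only the improved weights, never the raw data, are pooled and averaged:

```python
def local_update(w, data, lr=0.1):
    """Train locally on one device's private data; only the
    updated weight ever leaves the device."""
    for x, y in data:
        grad = 2 * (w * x - y) * x   # gradient of squared error for y ≈ w*x
        w -= lr * grad
    return w

def federated_round(global_w, devices):
    """The server sends the current model out, each device trains it
    locally, and the server averages the returned weights."""
    local_ws = [local_update(global_w, data) for data in devices]
    return sum(local_ws) / len(local_ws)   # federated averaging

# Three devices, each privately holding samples of the same trend y = 3x.
devices = [[(x, 3 * x) for x in (1, 2)] for _ in range(3)]
w = 0.0
for _ in range(20):
    w = federated_round(w, devices)
print(round(w, 2))  # close to 3.0, learned without pooling any raw data
```

The central server only ever sees weights; the individual (x, y) pairs stay on the devices that generated them.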

Such systems have a particular appeal for companies such as Google and Apple, who make not only apps but ecosystems. These companies really aren’t that interested in collecting data on us, as individuals, argues digital rights researcher Michael Veale: “What they want is the ability to do calculations over your data.” He gives the example of voice assistants, such as Apple’s Siri. This app learns to recognise a user’s voice to improve an iPhone user’s experience. By teaching Siri to recognise their voice, however, the user might have helped train the average Siri model – the one that comes pre-installed on every iPhone – to do a better job of understanding a specific accent.

The improvements you help make as an iPhone user “don’t reveal anything about you”, says Veale. “They’re just improvements to that average model that came to you.” And it is the improvements, not the voice data, that are sent to Apple’s central algorithm, where “they can be aggregated up and synthesised into a societal improvement, which then gets downloaded again to everybody’s device, and vice versa. That’s federated learning,” says Veale. “And that’s private insofar as you’re not sending the data, you’re just sending the way that the model learned to get a bit better from your data.”

You can see how this shift in approach might benefit corporations such as Google and Apple, which make, if not always the physical device, certainly the operating system on which a device runs. The user controls their own data, but Google and Apple will be the gatekeepers through which all insights from it are drawn.

Personal data stores

This is not the only possible model of the future. How about one in which your data sits not on a phone or computer made by a Silicon Valley giant, but in a small wooden box on a shelf in your house? That was the vision of a delightfully quirky project called BBC Box.

In 2019, a research and development team within the BBC created a hexagonal box – with a whiff of Doctor Who’s Tardis – containing a Raspberry Pi computer that ran a personal data-management system named Databox. The idea was that personal data from a range of different digital services would be stored within the BBC Box, and it would be up to the user to decide which other apps and computer programs could access and process that data. For example, the BBC developed its own ‘Profiler’ app that would produce an anonymised profile of the box’s owner. That profile – but not the data – could be exported by the user to a system to produce recommendations of TV shows the user might like.

“Starting from the premise that we’re the BBC, and we have a duty of care – not just to our contributors, but to our audience as well – preserving people’s privacy is part of that duty of care,” says Bill Thompson, principal research engineer at BBC R&D. “We are examining models for developing audience insights that don’t require us to know anything about you, but that let you tell us enough about yourself… [to build a model giving] a more granular and useful understanding of our audience than we would get by knowing about you particularly.”

Obviously, it’s not necessary to have a physical container in which to store your data. A virtual container would work just as well. The internet of the future could be a honeycomb of data cells, each one containing an individual’s personal data.

The creator of the world wide web, Tim Berners-Lee, is looking at exactly that model. Concerned that the internet has become a machine for monetised surveillance, rather than an ecosystem of cooperative sharing, he has been working on a new vision of the web, called Solid – with the name derived from the phrase ‘social linked data’. Solid started at the Massachusetts Institute of Technology and now has its own start-up – Inrupt – to take it closer to fruition.

Meanwhile, the same BBC R&D team that built BBC Box is working on an experimental pod-based personal data store (PDS) approach to recommendations, called My PDS. Like the BBC Box, the idea is that each ‘pod’ pulls together data from different sources – BBC iPlayer, Spotify and Netflix – to create a media profile that the user can, if they want, share with other BBC apps, such as BBC Sounds.

All of these projects are experimental prototypes. In Europe, such ideas have been given a leg-up thanks to ‘the right to data portability’ enshrined in the General Data Protection Regulation (GDPR). This right is described by the UK Information Commissioner’s Office as allowing “individuals to obtain and reuse their personal data for their own purposes across different services”.

Many prototype PDS designs have been built to facilitate this sort of sharing and reuse of data. Some include a dashboard for terms and conditions, so users can be alerted if these change after data has been shared. Others include a token that travels with the data, like a watermark in a digital photograph, specifying what permission has been agreed for its use.
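One way to make such a travelling token tamper-evident is a cryptographic tag computed over the agreed permissions. This is an illustrative sketch of the idea (a hypothetical scheme, not any shipping PDS design), using a keyed hash:

```python
import hashlib
import hmac
import json

def attach_permissions(data: dict, permissions: dict, key: bytes) -> dict:
    """Bundle personal data with a permissions token that travels with it,
    like a watermark: a keyed hash over the agreed terms of use."""
    token_body = json.dumps(permissions, sort_keys=True)
    tag = hmac.new(key, token_body.encode(), hashlib.sha256).hexdigest()
    return {"data": data, "permissions": permissions, "token": tag}

def permissions_intact(bundle: dict, key: bytes) -> bool:
    """Recompute the tag; any edit to the permissions breaks the match."""
    token_body = json.dumps(bundle["permissions"], sort_keys=True)
    expected = hmac.new(key, token_body.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, bundle["token"])

key = b"pds-owner-secret"          # hypothetical key held by the data owner
bundle = attach_permissions(
    {"viewing_history": ["drama", "news"]},
    {"use": "recommendations only", "expires": "2022-12-31"},
    key,
)
print(permissions_intact(bundle, key))  # True while the terms are untouched
```

If a recipient later rewrites the permissions – say, widening "recommendations only" to marketing – the recomputed tag no longer matches, and the change is detectable.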

One app, CitizenMe, lets individuals collect data about themselves in a PDS and offer it to places where it could be useful. “The first place is market consumer insights, obviously,” says chief executive StJohn Deakins, “because if you’ve got a large cohort of people with lots and lots of deep multivariate personal data, you can drive a huge amount of insight off the top of that.”

CitizenMe users might receive offers to share data and answer questions for cash, and they can donate data for good causes, or participate in studies that give them information about themselves. Deakins says he learned that “people don’t really care about the data, but they care about the stories that data tells – especially about themselves, or people they’re close to.”

Liz Brandt, chief executive at Ctrl-Shift, sees many opportunities arising from GDPR’s right to data portability. For individuals, greater ownership and control of their personal data could allow them to demand a share of the benefits from its use. For businesses and researchers, greater user control might mean that data quality improves, and that they are no longer getting messy, out-of-date or deliberately misleading information from unwilling subjects.

Based on a Ctrl-Shift report, Brandt thinks the UK economy “can gain £27.8bn in productivity and efficiency through data portability”, but just as important, she thinks, are “the new innovative things you can do with it”. Realising this potential, though, will require a change to the existing data economy – not just in the UK, but internationally.

This is one thing on which everyone seems to agree: the need for a new system of regulation, of interoperability for apps and programs, and of shared infrastructure to make all the parts work together.

Where to now?

It is tempting to believe that, in the near future, each of us will have our data tucked away in an individual account, over which we have complete control. Certainly, regulation, consumer inclination and technology are converging towards an expectation of greater privacy and control for the person concerned. If this future does come to pass, the challenge for regulators will be to turn their attention from data to infrastructure. If just a few companies control the systems within which our personal data stores operate, they will arguably have as much power as today’s data giants, albeit with less liability for when things go wrong.

It is, however, misleading to think of all data about an individual as ‘belonging to’ that individual. If I use my ‘smart’ railcard to move around a city, every tap in and out of a station generates data for the transport company, and they’re not going to stop collecting that data centrally, because they need it to operate their systems, predict demand, and perhaps to understand more about who they are not serving. Nor is my bank going to stop keeping a record of all my transactions.

What is likely to happen is a growing separation between ‘personal data’ and ‘aggregated data about people’. Regulators are already pushing that division by increasing the risks to those who collect, store and use personally identifiable information (PII), with hefty fines for misuse or leaks. Reputational risks, too, mean large tech companies have a vested interest in not collecting PII if they can avoid it.

Indeed, Veale believes that, over time, “data is going to become less relevant”. For organisations of all stripes, the value of data has always been the information about relationships that it captures: between people, between people and companies, and even between people and their devices. Understanding, targeting and influencing people is the end goal, not amassing vast piles of ones and zeroes. When extracting value from data can be done at the point where the data is generated – via on-device processing, for example – why should companies need to accumulate data themselves? They can turn their attention instead to “convincing people to integrate more and more of this stuff” – things such as smart watches, digital assistants, and smart refrigerators – “into their homes, lives, bodies”, says Veale. “Then these companies are just intermediaries.”

“They try to get in the middle of stuff,” Veale explains, “whether it’s your messages, your payments, your social connections and friends, between you and your music, you and your cooker; because this cumulative power can give them the options to shape your environment, shape your behaviour, shape your interaction to extract money from you.”

If this all sounds like the future of the ‘data economy’ is already here, well, maybe it is – parts of it, at least – but you can bet there’s more to come.

You can download the complete four-part ‘History of the data economy’ here.

Timandra Harkness is a presenter, writer and comedian, and author of Big data: does size matter?
