FEATURE 3 June 2024

Is this the real life? The rise of synthetic data

With the artificial intelligence revolution gathering pace, synthetic data is the next new kid on the block for the research sector. Liam Kay-McClean reports

How time flies. It has barely been a year and a half since generative artificial intelligence (AI) chatbots, powered by large language models (LLMs), became mainstream, but their impact on the world and the research industry is starting to look profound. And there is more to come, with synthetic data predicted by some to be the next stage of the generative AI revolution.

Synthetic data is, broadly, information that is artificially generated, typically by algorithms, rather than produced by real-world events. Among other uses, it can help validate mathematical models or train machine-learning models and LLMs. In a research context, its use could eventually extend to creating synthetic personas that replace or augment human research respondents.

The Industrial Revolution of the late 18th and early 19th centuries showed the appeal of replacing low-yield and time-consuming hand-production techniques with factory and machine-made goods, and some argue that synthetic data could help to stimulate a similar revolution in data production and processing. The benefits could be numerous, reducing the time and cost of insight generation and freeing up practitioners for more strategic thinking and in-depth analysis. There is even an argument that it could help address the data-quality crisis facing the industry.

“It ought to help the average market researcher to be more strategic, as it is providing quick access to a useful data source, so less time is spent on process and more time can be spent on building hypotheses, interrogating data and developing insight,” says Phil Sutcliffe, managing partner at Nexxt Intelligence. “I would hope that, if synthetic data becomes widely used as a cheaper data source, it frees up cash so the primary research that does happen can be more effective.”

Ansie Collier, global director of innovation at MMR Research, says the early-stage product-innovation process could also benefit from synthetic data. “There is often a need to explore different options and prototypes, but the budget to produce or test all of that with consumers is a big barrier,” she explains.

“If you are in the research and development space, the potential of using synthetic data is not to make the final decisions, but to narrow down your options so that, when you do engage with consumers, you get closer to the ‘why’ and a deeper understanding about what drives decisions. You are focusing that investment and optimising spend where it matters most.”

Businesses are already taking a look, says Michael Hess, co-founder and chief executive at Emporia Research. “I don’t imagine a world in three to five years where we are not relying on synthetic data in some part of the research process. We are already seeing large brands testing the waters.”

There is considerable caution, however. Synthetic data is largely untested and at an early stage of development, and for now it is unlikely to reach the accuracy needed to serve as an authentic replacement for human responses, or to build an effective simulacrum of a human.

Andrew Cooper, founder and chief executive at Verve, says comparisons between LLM-based survey responses and those from humans – with the model’s predictions compared with a later survey of a sample of research participants – showed promise, but indicated that the technology is not yet at the stage where it is providing authentically human responses.

However, he feels that synthetic data has more potential to reduce burdens on respondents, rather than acting as a replacement. This could include replacing complicated, long or repetitive research tasks with AI-led research, using synthetic data. “If we can create really strong simulations of human beings, we could use AI to do onerous, dull or practical insight,” Cooper says. “If you spend £100m on an aeroplane, you don’t need to put trainee pilots in it and hope they don’t crash it. Rather, have a simulation that helps them learn a lot and, when they eventually get in the aircraft, it all goes well.

“Why don’t we simulate consumers in a robust way, so we can ask it the dull, dumb and learning questions, and then save the real people for the good stuff?” This could, in turn, he argues, make research more interesting for respondents and, ultimately, lead to better results.

The data fed into synthetic models must be of robust quality, in keeping with the ‘garbage in, garbage out’ rule of IT. Researchers also need to be aware that synthetic respondents could end up perpetuating biases present in existing datasets.

There is also the related issue of how to keep synthetic data up to date with real-world changes – the models are, in essence, ‘backward looking’ rather than predicting future changes in society and among consumers. For example, could a persona built on a 10-year-old synthetic database lose its ability to aid research looking at emerging consumer trends?

Will synthetic data render the market research industry obsolete in future? Cooper does not think that will be the case. “I see AI as a high-performance sports car – it has the ability to get us from A to B really quickly, but, unless you’re a skilled driver and know how to use it well, it is very easy to spin out on the first corner and fall off a very deep cliff.

“AI can augment the quality of the research you get or diminish it. Human intelligence and cultural intelligence is very important to maximise what the AI can bring.”

According to Steve Phillips, chief executive at Zappi, it is unlikely that buyers would want to replace spend on human-led research with synthetic data to inform major business decisions.

“I think that most clients, in most situations, would not trust a purely synthetic data answer,” he says. “Not because it is factually inaccurate, but because they believe they have created something that synthetic data couldn’t answer, as they have created something new. So, I don’t think it will take over the industry.

“But, like AI, it will change it. You need to think about how you adapt and survive, and how you take the opportunities. If you try to ignore it, like AI it will have negative consequences.”