Great primary data a prerequisite for synthetic data
Speaking at last week’s webinar on synthetic data’s potential to reshape market research, Debrah Harding, managing director at the Market Research Society, said that synthetic data is sometimes seen as a way of “getting away from other problems, such as data quality” but argued “it is much more complicated than that and much more nuanced”.
The webinar followed a recent Strat7 white paper that evaluated the performance and reliability of synthetic data in market research and found that the technology struggles when subjected to complex analysis.
The paper raised concerns about some effects of using synthetic data, such as boosted responses lacking the logical consistency of answers from real people, and recommended that synthetic data be limited to no more than 5% of the overall sample and, even then, be used only to boost underrepresented demographic groups.
Harding told the GRBN webinar: “In order to be able to do synthetic data, you need brilliant primary data. And to have great primary data, you still need to address all of these issues like data quality.
“The interconnectedness of the conversation of all of these issues needs to be clearly articulated.”
Harding added that more work was needed to make sure that different methods of using synthetic data in research were properly defined so that clients could better distinguish between its different uses.
“There is an important risk that we need to be managing that is not to get to a situation where all of this data has all been combined in one pool – primary data and synthetic data, and you can’t tell the difference,” Harding said.
“There needs to be clear, blue water between the synthetic and the primary data so we can continually be sure that whatever is being shared with clients, they know exactly what they are getting.”
Hasdeep Sethi, group AI product lead and data science director at Strat7, said that while “synthetic data does have uses”, “it can’t be seen as a replacement for real human data, it has to work alongside it”.
He added: “There’s no getting away from the fact that synthetic data is ultimately based on real data in some format.
“Work with it in some way, but you can’t just blindly trust in either direction that it is a great tool or a bad tool – there is a lot of nuance there. Use it on a case-by-case basis.”
Sethi said that Strat7’s research characterised synthetic data as “effectively trying to impute the most likely response”, which results in fewer outliers in the data. “The nuanced insights you get from running full-scale research might get lost if a synthetic sample is big enough.”
The industry needed definitions and nomenclature around the use of synthetic data, he added, to make sure that “people have a common understanding on what they are getting”.
Sethi concluded: “This technology is not going away – it can’t be dismissed forever. There will always be good applications for it, but there will be studies where you need that precision, therefore the answer is don’t have any synthetic, and there are going to be cases where you just want some direction on where to go next, and having a big set of error bands is fine.”
Melanie Courtright, chief executive at the Insights Association, said that there was a need to expand testing of synthetic data tools, and warned that data privacy rules still applied as “there’s a lot of people who don’t want their answers, data and information to be training datasets”.
Courtright added: “Synthetic data still requires really great primary data at a great quality. We still need to solve for that, and it requires even more on the data and compliance world, and data security becomes an even bigger thing.
“Imagine someone breaching one data set versus someone breaching a training data set – those are very different things and very different outcomes. We don’t want our profession to have one of those breaches and suddenly have a public black eye.”
Yogesh Chavda, founder at Y2S Consulting, said that synthetic data had been around for years, and marketing and market research needed to think carefully about how to use it properly.
“Unfortunately, in the market research world, the conversation has been about replacement of real data as opposed to augmentation with real data to be able to simulate and generate scenario planning exercises,” he explained.
“Right now, we have a gap in the marketing and market research worlds where we don’t have those simulation tools – for the most part, they don’t exist. I see it as a big opportunity for the market research industry to start building those simulation tools.”
Chavda added: “We should be optimistic and be willing to verify what we’re doing. What I don’t want people to take away from the conversation is ‘synthetic data is bad, don’t do it’. I think [that] would be the wrong message to send the entire industry.”

Research Live is published by MRS.