The right safeguards can protect public opinion polls from AI pollution

A recent study warned that sophisticated AI agents can now imitate humans convincingly in online surveys – replicating mouse movements, keystrokes and other behavioural signals while producing responses purportedly indistinguishable from those of real participants. It’s a real concern.
But as someone who researches online methodology and data quality for a living, I believe the threat has been blown significantly out of proportion. Well-designed surveys, run on platforms with robust quality protections, can effectively mitigate this risk, if it even materialises at scale in the first place. The supposed AI-agent apocalypse, and with it the end of online survey research, has been massively overstated.
What the study actually shows
The study in question, published in PNAS by Dartmouth political scientist Sean J. Westwood, makes a valid contribution: it demonstrates what is possible with modern AI agents in closed environments. I'll also concede that poorly designed surveys, run on platforms lacking adequate protection, might indeed be vulnerable to fake responses.
But the paper demonstrates technical feasibility, not large-scale reality. Ask yourself: have you seen any published, peer-reviewed research documenting large-scale infiltration of real surveys by AI agents? No such research exists.
It’s also worth clarifying that there are two distinct threats researchers tend to conflate. The first is a human participant using an AI to help answer questions, most notably open-text questions, which are easy to copy into an LLM before pasting its response back into the survey. The second is a fully autonomous bot posing as a human and taking the entire survey. The first is real, relatively common, and fairly straightforward to police. The second – the scenario driving the current panic – is largely imagined.
Westwood’s paper focuses on the second threat, but the agent he describes was purpose-built for the survey by a team of academics, and the author has not shared the full code for independent testing. More importantly, the agent still exhibited tells that could signal LLM automation, such as high levels of general knowledge (nearly 75% accuracy on US state capitals, far beyond typical human performance) and suspicious uniformity in refusing certain tasks.
In a real-world scenario, you wouldn’t have the luxury of adjusting the prompt to avoid such tells, and without that tuning, the agent’s ability to modulate its responses convincingly becomes considerably harder to guarantee. That makes it a far cry from swarms of bots autonomously flooding online panels at scale.
Passing a survey is not the same as beating a platform
Demonstrating that an agent can complete a single survey convincingly is not the same as demonstrating it can operate undetected within an actual platform ecosystem.
First-party survey platforms – closed, managed environments that recruit, verify and maintain their own pools of human participants – operate at an entirely different level of scrutiny. These are the platforms used by academic researchers, by companies gathering consumer or expert insight, and by political pollsters. Prolific is one; CloudResearch Connect is another.
On these platforms, participants have persistent identities, participation histories built up over months or years, and verified device and account records. Population-level monitoring tracks anomalies not just within a single response, but across an entire pattern of participation. An agent that looks human on one survey would still need to build a credible account from scratch and avoid triggering signals specifically designed to detect coordinated synthetic activity. Most platforms would catch that kind of activity with ease.
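To make that concrete, here is a minimal sketch, in Python, of what a population-level check might look like. The field names, thresholds and flag labels are illustrative assumptions, not any platform’s actual schema or logic.

```python
from statistics import mean, pstdev

def population_level_flags(account: dict) -> list[str]:
    """Flag accounts whose cross-survey behaviour looks synthetic.

    `account` is an illustrative record of one participant's full
    history -- the field names are assumptions, not a real schema.
    """
    flags = []
    durations = [survey["seconds_to_complete"] for survey in account["surveys"]]

    # Humans are noisy: near-identical completion times across many
    # surveys is a classic signature of scripted activity.
    if len(durations) >= 10 and pstdev(durations) / mean(durations) < 0.05:
        flags.append("uniform_completion_times")

    # A brand-new account completing surveys at machine pace has no
    # credible history to vouch for it.
    if account["days_since_signup"] < 7 and len(durations) > 50:
        flags.append("implausible_volume_for_account_age")

    # Many accounts sharing one device fingerprint suggests a bot farm.
    if account["accounts_on_same_device"] > 1:
        flags.append("shared_device_fingerprint")

    return flags
```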
The economics don't add up
Even setting aside the technical barriers, the economics simply don’t make sense. Survey fraud only pays at scale, and well-run platforms make scale extremely difficult by limiting each respondent to a single account and by controlling the flow of opportunities so that participants aren’t flooded with endless tasks.
Survey participation is not especially lucrative to begin with, and detection means permanent exclusion, forfeiture of earnings and blacklisting across platforms. The better a platform’s defences, the worse that risk-reward calculation becomes. The existence of a capable tool does not guarantee its widespread adoption when users have little to gain and a lot to lose.
Detection is a solvable problem
The Westwood study tests its agent against quality checks most researchers know well and that have proven effective at screening out low-quality human respondents: attention checks, open-text screening and response-consistency measures. These are useful, but they are the bare minimum of fraud detection, not the cutting edge.
Sophisticated platform-level checks – such as device fingerprinting, browser-environment analysis, how a user physically interacts with their device, and network-level signals – are far harder to fool, and are close to 100% effective at distinguishing automated activity from genuine human participation. The Westwood study demonstrates that agents can pass the easy tests. It does not demonstrate that they can pass the ones that actually matter.
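As a toy illustration of how such layers might combine, consider the sketch below. The signal names and weights are hypothetical; the point is that an agent has to beat every layer at once, and real platforms tune far richer models against labelled traffic.

```python
def bot_likelihood(signals: dict) -> float:
    """Combine independent platform-level signals into a rough score.

    Signal names and weights are hypothetical illustrations; passing
    one layer is worthless if the others still fire.
    """
    score = 0.0
    if signals.get("headless_browser"):                # browser environment
        score += 0.4
    if signals.get("datacenter_ip"):                   # network-level signal
        score += 0.3
    if signals.get("no_pointer_microjitter"):          # interaction physics
        score += 0.2
    if signals.get("fingerprint_reuse_count", 0) > 1:  # device fingerprint
        score += 0.3
    return min(score, 1.0)

# Example: a headless browser on a datacenter IP scores roughly 0.7
# before its survey answers are even read.
print(bot_likelihood({"headless_browser": True, "datacenter_ip": True}))
```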
We put this to the test ourselves. We built a survey on Qualtrics designed to mimic a standard research study, incorporating questions from established psychology tasks, personality scales and custom items. Detection methods included: attention checks; consistency checks; reverse-shibboleth items (questions that are easy for an LLM to answer but that most humans would not know); cognitive traps; mouse tracking; Qualtrics’ reCAPTCHA; and Prolific’s bot authenticity check.
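To show what a reverse-shibboleth check can look like in code, here is a minimal sketch. The items, answers and threshold are hypothetical stand-ins rather than the items we actually fielded.

```python
# Reverse-shibboleth items: trivial for an LLM, but questions most
# humans would not know offhand. Unusually high accuracy is the tell.
REVERSE_SHIBBOLETHS = {
    "What is the atomic number of rhenium?": "75",
    "In what year was the Treaty of Nerchinsk signed?": "1689",
    "What is the capital of Kiribati?": "south tarawa",
}

def shibboleth_accuracy(responses: dict[str, str]) -> float:
    """Share of reverse-shibboleth items answered exactly correctly."""
    correct = sum(
        1 for question, answer in REVERSE_SHIBBOLETHS.items()
        if responses.get(question, "").strip().lower() == answer
    )
    return correct / len(REVERSE_SHIBBOLETHS)

def flag_for_review(responses: dict[str, str], threshold: float = 0.67) -> bool:
    """High accuracy warrants human review, not automatic rejection."""
    return shibboleth_accuracy(responses) >= threshold
```

In practice a check like this sits alongside the behavioural signals described above, so no single lucky answer decides an outcome on its own.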
We ran 125 high-approval human participants through the study, alongside five AI agents – GPT, Claude, Perplexity, Gemini and an in-house agent built by our engineers – each completing the survey 25 times.
The bot authenticity check distinguished between humans and agents with 100% accuracy. CloudResearch independently reported similar results: 100% detection of AI agents, with genuine human respondents identified at greater than 99% accuracy. To Professor Westwood, we extend an open invitation: run your agent against our checks. If it truly is indistinguishable from humans, that will prove it.
Human-like, compared to what?
Perhaps the most fundamental oversight in the Westwood paper is that it claims to demonstrate a human-like agent, but never actually compares the agent’s responses to those of real human participants. There is no human benchmark. The agent is evaluated against subjective quality checks, not against the messy, inconsistent, sometimes contradictory ways that real people actually complete surveys.
Without a direct comparison, the claim of human-likeness is self-referential: the agent looks human-like by the author’s own definition of what human-like should look like. This is a significant methodological gap that has gone almost entirely unremarked upon in the commentary surrounding the paper.
Vigilance, not panic
None of this should be read as dismissing Westwood’s work. It is a legitimate warning shot that the research industry would be foolish to ignore. Agents will get more capable, cheaper to build and harder to detect. The right response is investment in stronger platform-level defences, collaborative sharing of detection methods across the industry, and ongoing stress-testing against the best adversarial tools available.
What is not warranted is the conclusion that online survey research is broken or dying. That narrative is being driven far more by headlines than by anything the paper demonstrates.
How many people declaring the end of online research have read the study in full? The nuance matters. We owe it to our field, and to the decisions that depend on good data, to engage with the evidence carefully, rather than react to the loudest interpretation of it.
Andrew Gordon is a staff researcher at Prolific