FEATURE 30 March 2012

Behind the scenes at Google Consumer Surveys


Brian Tarran picks through the details of Google’s ‘surveywall’, how it works – and why it may be too much for some consumers to get over.

But largely missed amid the launch brouhaha is a paper Google published to explain how Consumer Surveys work and why they are “more accurate than both probability and non-probability panels”.

Here are the basics: a consumer wishing to gain access to a website’s premium content must answer one or two questions to pass what Google describes as a “surveywall”. Surveys can be longer than one or two questions, of course, but only one or two questions are presented per content request. If two questions are asked, the first is always a screener requiring a simple yes/no response. So in a 20-question survey, one respondent is likely to answer just one question, or 5% of the total, unless he or she runs into a lot of surveywalls.


This has its advantages, though. Google says the average response rate in trials was 16.75%, which it compares with less than 1% for most internet intercept surveys, 7-14% for telephone surveys and 15% for internet panels. For two-question screener surveys, however, the completion rate averages 9.25%.

To help reduce the burden on respondents, Google says the surveys won’t explicitly ask for demographic and location information; instead it infers these from IP addresses, the presence of a cookie belonging to its ad network DoubleClick, and other sources. “Income and urban density can be computed by mapping the location to census tracts and using the census data to infer income and urban density,” write the authors of the paper, Paul McDonald, Matt Mohebbi and Brett Slatkin. “Gender and age group can be inferred from the types of pages the respondent has previously visited in the Google Display Network” – again using the DoubleClick cookie.

Google uses this inferred demographic and location data to employ stratified sampling, with the target population for internet access obtained from the Current Population Survey’s (CPS) internet use supplement, dated October 2010. Post-stratification weighting is then used to compensate “for sample deficiencies”.
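As a rough illustration of what stratified sampling against population targets involves, the sketch below splits a target number of responses across strata in proportion to their population share. The article does not say which allocation rule Google uses, so proportional allocation is an assumption here, and the age bands, shares and function name are hypothetical values chosen only for the example.

```python
def proportional_quotas(population_shares, target_n):
    """Split target_n responses across strata in proportion to population share.

    population_shares: dict of stratum -> share of the target population
    (shares are assumed to sum to 1).
    """
    raw = {s: share * target_n for s, share in population_shares.items()}
    # Round every quota down first, then hand out the leftover responses
    # to the strata with the largest fractional remainders.
    quotas = {s: int(v) for s, v in raw.items()}
    shortfall = target_n - sum(quotas.values())
    by_remainder = sorted(raw, key=lambda s: raw[s] - quotas[s], reverse=True)
    for s in by_remainder[:shortfall]:
        quotas[s] += 1
    return quotas


# Hypothetical population shares, not figures from the CPS or the Google paper.
shares = {"18-34": 0.5, "35-54": 0.3, "55+": 0.2}
quotas = proportional_quotas(shares, 1001)
print(quotas)
```

In practice the quotas act as targets rather than guarantees, which is why a weighting step is still needed afterwards.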

McDonald et al write: “Although Consumer Surveys attempts to build an optimal allocation of respondents to each question over the life of the survey, this is not always possible in practice due to additional constraints such as completing the survey in a timely manner, publisher inventory at the time and competition with other surveys”. The CPS population data used in sampling is also used in weighting.
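The weighting step can be sketched in a few lines. This is a minimal, single-variable version of post-stratification, assuming each respondent has already been assigned to one stratum; the strata, population shares and answers are invented for illustration and are not taken from the paper, which weights on several inferred characteristics.

```python
from collections import Counter


def post_stratification_weights(sample_strata, population_shares):
    """Weight per stratum = population share / share of the achieved sample."""
    counts = Counter(sample_strata)
    n = len(sample_strata)
    return {s: population_shares[s] / (c / n) for s, c in counts.items()}


def weighted_proportion(answers, sample_strata, weights):
    """Weighted share of respondents who answered yes (True)."""
    total = sum(weights[s] for s in sample_strata)
    yes = sum(weights[s] for a, s in zip(answers, sample_strata) if a)
    return yes / total


# Hypothetical sample: younger respondents are over-represented (6 of 10)
# relative to an assumed 30% share of the target population.
strata = ["18-34"] * 6 + ["35+"] * 4
answers = [True, True, True, False, False, False, True, False, False, False]
population = {"18-34": 0.3, "35+": 0.7}

weights = post_stratification_weights(strata, population)
unweighted = sum(answers) / len(answers)
weighted = weighted_proportion(answers, strata, weights)
print(weights, unweighted, weighted)
```

Here the over-represented group is weighted down and the under-represented group weighted up, so the weighted estimate moves towards what a sample matching the population mix would have shown.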

In testing the accuracy of Consumer Surveys, Google commissioned two survey research firms to run identical, three-question online surveys about media consumption – one using probability-based sampling, the other non-probability – requiring 1,000-2,000 responses for each question from US adults 18 and over.

Results were then matched against three benchmarks – video on demand, digital video recorder and satellite dish usage in American households, as measured by a semi-annual random digit dial telephone survey of 200,000 respondents. Post-stratification weighting was employed, with the non-probability panel data weighted against the latest Census, while the probability sample and Consumer Surveys were weighted against the CPS.

Google says it found Consumer Surveys’ average absolute error to be 3.25%, against 4.7% for the probability sample and 5.87% for the non-probability sample.
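For readers unfamiliar with the measure, average absolute error can be computed as below. The sketch assumes the figure is the mean of the absolute differences, in percentage points, between each survey estimate and its benchmark; the article does not spell out the paper's exact formula, and the numbers used here are made up rather than taken from Google's results.

```python
def average_absolute_error(estimates, benchmarks):
    """Mean absolute difference between estimates and benchmarks.

    Both arguments are dicts of question -> percentage.
    """
    errors = [abs(estimates[q] - benchmarks[q]) for q in benchmarks]
    return sum(errors) / len(errors)


# Hypothetical percentages for the three benchmark questions.
benchmarks = {"video_on_demand": 20.0, "dvr": 40.0, "satellite_dish": 30.0}
estimates = {"video_on_demand": 23.0, "dvr": 38.0, "satellite_dish": 31.0}

error = average_absolute_error(estimates, benchmarks)
print(error)
```

Because the differences are taken as absolute values, an overestimate on one question cannot cancel out an underestimate on another.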

The verdict: more accurate, improved response rates and, Google suggests, “respondents that may be more representative than respondents of more traditional internet surveys” thanks to recruitment that takes place via publisher sites, rather than panels.

But the search giant isn’t afraid to confront the limitations of its approach. “Since Google Consumer Surveys only allows one-question or screening two-question surveys, analysis of the relationship between survey questions are difficult and sometimes not even possible,” according to the paper. There’s also the in-built internet user profile bias – younger, more educated with higher incomes – and the limitations of the publisher network, which is large but not all-encompassing.

And then there’s the biggest question of all. How does a person surfing the web react when stumbling on to a website that wants to know which type of credit card he or she uses, or whether he or she uses a particular healthcare product? Google attempts to mitigate this by reassuring the user that it is Google asking the question, not a spammer or phisher, but does that really help? This is a company often accused of knowing too much about web users, and now it wants to know more? Some might find the ‘surveywall’ too much to get over.

  • For more thoughts on Google’s new service, see Lenny Murphy’s post at the Greenbook Blog and Tom Ewing’s at the Blackbeard Blog. And check out the full Google whitepaper here.


11 years ago

A two-question survey can then be extended to as many questions as needed, since a responder's 'identity' is not so hard to track. Each responder can just answer their own two questions, and the data can then be grouped and sorted by all kinds of parameters.


11 years ago

Does the paper comment on the accuracy of the demographic analysis? Or just 'overall' accuracy rates of the predicted data? How accurate is 'inferred' demographic analysis based on cookie and IP? (I'm not an expert so I'm genuinely asking!)
