The American Association for Public Opinion Research put some big methodological issues under the microscope at this year’s annual conference in Boston. Researchscape’s Jeffrey Henning reviews the highlights.
On trust, truth and transparency
Where most research conferences giddily chase hype, the annual conference of AAPOR doggedly pursues truth. The American Association for Public Opinion Research is focused on methodological rigour, research on research and relentless improvement. This year’s conference ended Sunday in Boston. Specific highlights for me were research transparency, the role of journalism, non-probability samples, weighting, and reaction to Google Consumer Surveys.
As I wrote in these pages in 2010, AAPOR is committed to increasing the transparency of public opinion research, and the Transparency Initiative committee reported its progress. While individuals who are AAPOR members have committed to disclosure, organisations themselves are not AAPOR members and have not made any commitment: as a result, many organisations with AAPOR members do not disclose everything required. The Transparency Initiative will make this an organisational responsibility. Highlighting the need for greater transparency, Natalie Jackson presented a Marist Institute study showing that just 9% of pollsters reporting 2012 Presidential election results disclosed all of AAPOR’s survey disclosure items.
Transparency is important to help journalists better understand the quality of the surveys they are reporting on – and how seriously to take the results. In a panel on the role of blogs in public opinion research dissemination, I discussed the need for better education of journalists about the basics of polling and how few journalists ask key questions about polls. An exception is Marjorie Connelly of The New York Times, who shared how her team vets surveys for Times reporters and bloggers.
One of the key questions that reporters are supposed to ask is whether and how survey results are weighted. Stas Kolenikov of Abt SRBI and Trent Buskirk of The Nielsen Company discussed the need to do at least post-stratification weighting on demographic variables for non-probability surveys. Julia Clark and Neal El-Dash of Ipsos Public Affairs looked at the performance of different calibration models for a non-probability survey. While they had used a Bayesian estimate last year in the run-up to the 2012 Presidential election, with the actual electoral results now in hand they examined 8 possible calibration models: unweighted, 6 weighting schemes, and their Bayesian estimate. The only weighting method that performed worse than no weights at all was weighting by demographic variables within states.
An AAPOR task force on non-probability sampling, however, stated in its report, “It is not clear whether weighting adjustments or related procedures can allow researchers to make accurate population estimates based on non-probability samples, such as opt-in web panels.” The report appears well worth reading in its entirety (I confess to being only on page 44 so far). At a presentation of the report summary, co-chaired by J. Michael Brick and Reg Baker, audience opinions were heated, with some feeling the industry was missing out by not taking non-probability sampling more seriously while others felt such sampling had nothing serious to offer. It’s clear that some attendees would only trust the results of a focus group if it were probability-based!
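For readers who have not worked with the post-stratification weighting mentioned above, here is a minimal sketch of the idea in Python: each respondent is weighted by the ratio of their group's population share to its sample share. The age groups, population shares and answers are invented for illustration, and the function names are my own; none of this comes from the presentations.

```python
from collections import Counter


def post_stratification_weights(sample_groups, population_shares):
    """Return one weight per respondent: population share / sample share of their group."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return [population_shares[g] / (counts[g] / n) for g in sample_groups]


def weighted_mean(values, weights):
    """Weighted average of the survey answers."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample of 10 respondents that under-represents younger people.
groups = ["18-34"] * 2 + ["35-54"] * 3 + ["55+"] * 5
answers = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]  # 1 = "yes", 0 = "no"

# Assumed population shares for the same groups (illustrative only).
shares = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

weights = post_stratification_weights(groups, shares)
unweighted = sum(answers) / len(answers)
weighted = weighted_mean(answers, weights)

print(f"unweighted yes share: {unweighted:.3f}")
print(f"weighted yes share:   {weighted:.3f}")
```

Weighting of this kind only corrects for the variables it uses, which is the task force's point: matching the population on demographics does not guarantee that the answers match too.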
Fishing for respondents
While the task force’s report only briefly touched on river sampling, it was discussed in multiple sessions, including recruiting from the river of Facebook, Craigslist and Google visitors. The most discussed method of river sampling was Google Consumer Surveys (GCS). Scott Keeter of Pew Research Center took pains to emphasise that Pew remains committed to rigorous, probability-based sampling for all major work but wants to look at non-probability sampling for particular purposes; Pew plans to continue to use GCS for quick reaction polls, for testing questions and for fielding open-ended questions to inform development of closed-end questions. In poster sessions, NORC and RTI both described tests of Google Consumer Surveys: NORC found GCS estimates of cell phone usage by demographics to be inaccurate, but RTI and Nielsen found GCS estimates to have an average absolute error of 5.6 across 11 items, outperforming a probability-based panel (6.4) and an opt-in panel (12.1).
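The average absolute error quoted in comparisons like these is simple to compute: take the gap between each survey estimate and its benchmark, ignore the sign, and average across items. A minimal sketch follows, with made-up benchmark and survey figures (in percentage points) rather than the numbers from any of the studies.

```python
def average_absolute_error(estimates, benchmarks):
    """Mean of |estimate - benchmark| across items, in the same units as the inputs."""
    if not estimates or len(estimates) != len(benchmarks):
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(e - b) for e, b in zip(estimates, benchmarks)) / len(estimates)


# Hypothetical benchmark values (e.g. from official statistics), in percent.
benchmarks = [62.0, 35.0, 18.0, 47.0]

# Hypothetical estimates from two surveys for the same four items.
survey_a = [60.0, 38.0, 17.0, 49.0]
survey_b = [70.0, 29.0, 25.0, 40.0]

error_a = average_absolute_error(survey_a, benchmarks)
error_b = average_absolute_error(survey_b, benchmarks)

print(f"survey A average absolute error: {error_a:.1f} points")
print(f"survey B average absolute error: {error_b:.1f} points")
```

Because the result is an average over whichever items are included, the choice of items matters a great deal when comparing methods.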
Jon Krosnick of Stanford University pulled many of the conference themes together in his presentation. He began with this quote from the task force’s report: “Sampling methods used with opt-in panels have evolved significantly over time… Research evaluations of older methods of non-probability sampling from panels may have little relevance to the current methods being used.” Krosnick countered that in fact these issues have been studied for many years and there is little to indicate things are different today. In fact, last year he fielded his largest assessment of accuracy yet, comparing 27 benchmark questions across 9 studies as well as 17 benchmarks using “a river sample”. The results were similar to prior research: contrary to concerns about the degradation of probability sampling, the RDD sample and the Internet probability sample were the most accurate methods, within 2 percentage points once weighted. The 7 opt-in panels were less accurate, with absolute error ranging from 5 to 9 percentage points and with weighting not systematically improving results.
River sampling was by far the least accurate; while Krosnick didn’t name the provider, which gave him complimentary access, he noted that it was used on news sites to allow visitors to continue to read articles and that it capped question length (nudge, nudge, know who I mean?). The length limit was why only 17 benchmarks were used. Krosnick also pointed out how easy it would be to cherry-pick the items that would make the river sample’s results look fabulous.
To weight or not to weight? To use Google Consumer Surveys for point estimates or not? Those were just some of the topics debated. Since the conference ran 8 concurrent sessions at each of 11 different time periods, there are over 8.5 billion different combinations of sessions (8 choices in each of 11 slots) that any one of the 1,100 attendees could have experienced. All the more reason not to take my impressions alone but to check out recaps from Mark Blumenthal, Reg Baker, Annie Pettit and Margaret Roller. And take it from this sample of one: AAPOR 2014 is worth adding to your calendar.
Jeffrey Henning is president of Researchscape International