OPINION
16 September 2011

Confirmation bias – or, Why I love this paper by Boyd & Crawford

Danah Boyd, “one of the most influential women in technology”, warns researchers: just because social media data is accessible doesn’t make it ethical to use.

Call it confirmation bias – I’m sure it plays a part – but I recommend you read this new paper by Danah Boyd of Microsoft Research and Kate Crawford of the University of New South Wales. In it, they pose a series of questions that challenge the all-pervading positivity about Big Data and its supposed benefits to society and business.

The term ‘Big Data’ has been bandied about a great deal during the research industry’s recent debate about social media data, privacy and ethics, and Boyd and Crawford have plenty to say that will interest researchers wrestling with these topics.

I was particularly drawn to the questions they ask about the ethics of Big Data and the way researchers are seeking to use ‘public’ data from social media sites.

Market research industry bodies have been criticised in some quarters for publishing social media research guidelines that stick to established ethical principles: gaining the informed consent of research participants and protecting their anonymity. The bodies have been called antiquated and out of touch, but that is not a charge anyone could level at Boyd, whom Fast Company magazine named one of the most influential women in technology in 2009.

Yet the points she and Crawford raise in their paper are broadly similar to the concerns research industry bodies seek to address.

They write:

“With Big Data emerging as a research field, little is understood about the ethical implications of the research being done. Should someone be included as part of a large aggregate of data? What if someone’s ‘public’ blog post is taken out of context and analysed in a way that the author never imagined? What does it mean for someone to be spotlighted or to be analysed without knowing it? Who is responsible for making certain that individuals and communities are not hurt by the research process? What does consent look like?”

Boyd and Crawford even address – albeit indirectly – the major bugbear to come out of the industry’s own big privacy debate: the suggestion, in a Market Research Society discussion paper, that consent should be sought for every individual whose social media data is used as part of a research project.

“It may be unreasonable to ask researchers to obtain consent from every person who posts a tweet,” write Boyd and Crawford, “but it is unethical for researchers to justify their actions as ethical simply because the data is accessible. Just because content is publicly accessible doesn’t mean that it was meant to be consumed by just anyone… There is a considerable difference between being in public and being public, which is rarely acknowledged by Big Data researchers.”

I make similar points in my own comment piece, and in the discussion thread that followed it (hence the confirmation bias I mentioned at the start of this post), but Boyd and Crawford set out the issues better than I ever could. The paper is due to be presented next week at an Oxford Internet Institute symposium. Thanks to Tom Ewing for sharing the link via Twitter.