OPINION 12 October 2010

A close scrape

A Wall Street Journal article about web scraping prompts a return to the question of whether social media data is easy pickings for researchers or a potential minefield of bad PR.

The Wall Street Journal has a story today about Nielsen, and how its Buzzmetrics social media monitoring business was found “scraping” content from a website called PatientsLikeMe, where people suffering from diseases and mental disorders can go to share their experiences and support others like them.

Nielsen had apparently created a user account, which anyone can do, to access the site and begin collecting data. You can read the story in full here, but here’s an interesting excerpt:

“I felt totally violated,” says Bilal Ahmed, a 33-year-old resident of Sydney, Australia, who used PatientsLikeMe to connect with other people suffering from depression… After PatientsLikeMe told users about the break-in, Mr. Ahmed deleted all his posts, plus a list of drugs he uses. “It was very disturbing to know that your information is being sold,” he says. Nielsen says it no longer scrapes sites requiring an individual account for access, unless it has permission.

I picked up on this as it serves to support a question we’ve asked previously on Research (here and here). That is: just because social media data is there to be mined, does that mean researchers should? And should researchers do so without asking, whether the data is in the public domain or not?

The Journal tries to muster some outrage at Nielsen’s actions, but it is hard to chide the company for anything other than a breach of the site’s user agreement – and the fact that it didn’t ask permission to carry out its research.

PatientsLikeMe co-founder Ben Heywood blogged:

“While this was not a security breach, it was a clear violation of our User Agreement (which expressly forbids this type of activity) and, more significantly, a violation of the community’s trust. Your Account Information (e.g. your names and emails) was NOT in danger of being stolen. It is likely that the forum information that was “scraped” would be sold as part of that company’s internet monitoring product.”

But here’s the kicker:

“In fact, we sell a similar service, PatientsLikeMeListen™, to our clients so they better understand the voice of the patient.”

Heywood tells the Wall Street Journal 218 members quit following publication of his blog post: whether this is because of Nielsen’s actions or PatientsLikeMe’s own data-selling practices is not known.

Still, the website has put the information out there so that members can make an educated decision about what they share about themselves and how that data is likely to be used.

Today, following publication of the Journal article, Heywood wrote:

“We believe this incident (and this article) have spurred an important ongoing discussion about what is right, just and appropriate regarding how companies operate in this new networked world.”

It’s a discussion the research industry should be having too. Here is a good place to start.

@RESEARCH LIVE

9 Comments

14 years ago

The more probing question isn’t so much whether this new industry will take hold (it already has), but how it will play out? That is, what, specifically, will come to be the ‘acceptable practices’ of it? Can we one day expect a 400-level college course called “Facebook Stalking Ethics?” http://scallywagandvagabond.com/2010/10/scrapers-is-facebook-stalking-now-a-viable-career-option/

14 years ago

‘Carouser’ makes a very valid point: “The more probing question isn’t so much whether this new industry will take hold (it already has), but how it will play out? That is, what, specifically, will come to be the ‘acceptable practices’ of it?” As it so happens, the Market Research Standards Board is keen to address this very issue and currently has some new online guidelines out for consultation that deal directly with this area. As Chair of MRSB, I would welcome input to the debate. All details can be found here… http://www.mrs.org.uk/standards/consultation.htm

14 years ago

I must admit I was very disappointed to hear this. The public internet is the public internet. But when a password-protected site is accessed as if it were a public site, I have a real problem with it. Though current MRA, CASRO, and MRIA standards may not specifically mention web crawling, they do mention privacy and permission. I can't fathom how those standards could be read in a way that made this type of behaviour acceptable. I look forward to the more explicit standards which these organizations will be releasing in short order. I can only hope that there is more to the situation that outsiders like me are unaware of.

14 years ago

In my view, even the "public internet" is a grey area. When did anybody give researchers permission to hang around social media picking up other people's scraps? If you asked them, do you think the average person would approve? It makes the whole industry look slightly unsavoury. I'm not sure the argument that "it's public so anyone has the right to look at it" holds water. Does that really imply that anyone has the right to USE such material in a systematic way for unintended purposes, without the express permission of the "publisher"? Good manners and good taste should surely prevail, whatever the absolute letter of the law.

14 years ago

ESOMAR makes this very clear in the Guidelines on Passive Data Collection, specifically Section 3.6. It raises a more general question of enforceability. The ESOMAR Guidelines are only that: guidelines. There is nothing to stop anyone going against what ESOMAR or other bodies recommend.

14 years ago

Dan is right that the ESOMAR Guideline on Passive Data Collection, Observation and Recording addresses the issue. However, I think encouraging better behaviour is not just about enforcement; it is also about persuading online researchers to use their common sense. The Observation guideline says that a researcher joining a restricted group – a “walled garden” – should always announce their presence and objectives and seek the permission either of the moderator, if there is one, or the members of the group. ESOMAR is expanding on this advice in a new guideline on the use of social media in research. Common sense suggests that people will be concerned if they think they are being secretly observed or “spied on” in a space which they regard as private. I think the problem is as much about a lack of understanding of appropriate behaviour in the online space as it is one of weak enforcement. In the 1930s, researchers working for Mass-Observation used hidden cameras and recorded private conversations as part of their standard approach to measuring public opinion. Researchers have not done that for half a century because of the concern and resentment it causes among the public, who are the very people whose co-operation we need when doing research. Many online researchers and marketers have not yet learned that the principles that apply in the real world are just the same in the virtual world. Hopefully, researchers will take the opportunity created by this news (and no doubt there will be more problems like it in the future) to educate colleagues in appropriate behaviour.

The revised ESOMAR Guideline for Online Research, which has just been circulated for consultation and which will be published in the next few weeks, says: “there should be three over-riding guiding principles for online researchers. First, treat the respondent (or the person who is willing to participate in a survey) with respect. Researchers need to create a relationship with the public based on trust, respect and reciprocity by ensuring that people who participate in an online survey have a good experience. Second, researchers must be sensitive to consumer concerns and remain mindful that market research depends for its success on public confidence. Researchers should avoid activities and technology practices that could undermine public confidence in the market research industry. Third, researchers must remain diligent in maintaining the distinction between research and commercial activities such as direct marketing or advertisement targeting. Where researchers are involved with activities which use research approaches such as interviews but which are not intended solely for research purposes, they must not describe this as market, social or opinion research.”

If we can persuade online researchers to observe these principles, which the rest of the research community has been following for years, it will help us maintain public support for our work.

14 years ago

A call for more common sense. Agreed. Where can we buy that?

14 years ago

What sort of a company is Nielsen? I now have negative preconceptions about it. I now suspect the company is unsound. In other words I wouldn't trust it with a barge-pole. What has Nielsen done to regain the trust it has squandered? Simply saying it won't do it again is not enough - surely? The issue is - how did they get into the position of doing this in the first place? Is there a good long account of this issue from Nielsen's point of view?

14 years ago

Fully agree with the references to ESOMAR guiding principles. Password-protected areas are not public information, and should never be used as if the authors had authorized it. On the other hand, I can't see anything wrong in using published information, such as readers' mail published in paper newspapers, or posts on public bulletin boards. "Using" does not mean re-publishing it, but quotations should be allowed (as long as they do not give new access to personal data), as for any published material. I understand the issue is not AS clear as this - for example, what about board pages that were once published but have since been taken offline? I don't know. The line between published and non-published material should nevertheless remain the guideline, IMHO. http://www.i-s-e-e.com
