OPINION
12 October 2010

A close scrape

A Wall Street Journal article about web scraping prompts a return to the question of whether social media data is easy pickings for researchers or a potential minefield of bad PR.

The Wall Street Journal has a story today about Nielsen, and how its Buzzmetrics social media monitoring business was found “scraping” content from a website called PatientsLikeMe, where people suffering from diseases and mental disorders can go to share their experiences and support others like them.

Nielsen had apparently created a user account, which anyone can do, to access the site and begin collecting data. The Journal’s story is worth reading in full, but here’s an interesting excerpt:

“I felt totally violated,” says Bilal Ahmed, a 33-year-old resident of Sydney, Australia, who used PatientsLikeMe to connect with other people suffering from depression… After PatientsLikeMe told users about the break-in, Mr. Ahmed deleted all his posts, plus a list of drugs he uses. “It was very disturbing to know that your information is being sold,” he says. Nielsen says it no longer scrapes sites requiring an individual account for access, unless it has permission.

I picked up on this as it serves to support a question we’ve asked previously on Research (here and here). That is: just because social media data is there to be mined, does that mean researchers should mine it? And should they do so without asking, whether the data is in the public domain or not?

The Journal tries to muster some outrage at Nielsen’s actions, but it is hard to chide the company for anything other than a breach of the site’s user agreement – and the fact that it didn’t ask permission to carry out its research.

PatientsLikeMe co-founder Ben Heywood blogged:

“While this was not a security breach, it was a clear violation of our User Agreement (which expressly forbids this type of activity) and, more significantly, a violation of the community’s trust. Your Account Information (e.g. your names and emails) was NOT in danger of being stolen. It is likely that the forum information that was “scraped” would be sold as part of that company’s internet monitoring product.”

But here’s the kicker:

“In fact, we sell a similar service, PatientsLikeMeListen™, to our clients so they better understand the voice of the patient.”

Heywood tells the Wall Street Journal that 218 members quit following publication of his blog post. Whether this is because of Nielsen’s actions or PatientsLikeMe’s own data-selling practices is not known.

Still, the website has put the information out there so that members can make an educated decision about what they share about themselves and how that data is likely to be used.

Today, following publication of the Journal article, Heywood wrote:

“We believe this incident (and this article) have spurred an important ongoing discussion about what is right, just and appropriate regarding how companies operate in this new networked world.”

It’s a discussion the research industry should be having too. Here is a good place to start.