Jane Bainbridge
Editorial

27 June 2016

The language of the internet

Johannes Eichstaedt mines social data to determine the psychological state of populations, with some compelling findings. He spoke to Jane Bainbridge.


Throughout his academic career, Johannes Eichstaedt has always worked with data – it’s just that initially, as a physicist, it was particle physics data rather than psychological data he was processing. But when he realised that he “didn’t much care for working in a particle accelerator”, Eichstaedt switched to psychology.

Now, as a data scientist in psychology at the University of Pennsylvania, and co-founder of the World Well-Being Project, his time is spent using natural language processing to measure well-being among populations.

His research has included using tweets to predict heart disease and Facebook statuses to identify depression.

Language patterns

In the case of heart disease and Twitter, he was part of the team of scientists that analysed more than 50,000 tweeted words to characterise community-level psychological correlates of dying from atherosclerotic heart disease (AHD) in the US.

The language patterns identified as risk factors reflected negative social relationships, disengagement and negative emotions such as anger; while positive emotions and psychological engagement emerged as protective factors. In their findings, published in Psychological Science, the researchers found that “a cross-sectional model based only on Twitter language predicted AHD mortality significantly better than a model combining 10 common demographic and socioeconomic risk factors, including smoking, diabetes and obesity”.

Twitter topics that positively correlated with county-level AHD mortality included hostility and aggression; hate and interpersonal tension; and boredom and fatigue. In comparison, the topics that negatively correlated were skilled occupations, positive experiences and optimism.
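To make the idea of a topic "correlating" with county-level mortality concrete, here is a minimal sketch in Python. It computes a plain Pearson correlation between how much a topic is used in each county and that county's mortality rate. The county values, the variable names and the choice of Pearson correlation are all assumptions made for illustration; they are not the study's data or necessarily its exact method.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented toy values for five hypothetical counties (NOT real data):
# share of tweeted words belonging to each topic, and AHD deaths per 100,000.
hostility_share = [0.010, 0.014, 0.018, 0.022, 0.030]
optimism_share = [0.030, 0.026, 0.021, 0.015, 0.012]
ahd_mortality = [40.0, 46.0, 51.0, 55.0, 66.0]

r_hostility = pearson(hostility_share, ahd_mortality)
r_optimism = pearson(optimism_share, ahd_mortality)

print(f"hostility topic vs mortality: r = {r_hostility:+.2f}")
print(f"optimism topic vs mortality:  r = {r_optimism:+.2f}")
```

With these made-up numbers the hostility topic comes out positively correlated with mortality (a risk factor in the article's terms) and the optimism topic negatively correlated (a protective factor).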

“It’s not that Twitter has some magical prediction power that other variables don’t have. It’s an extremely good predictor of income and education and of communities where people smoke – so it picks up predictors of health behaviour, and then it adds a sliver of psychological causation that the other variables don’t seem to be getting at,” says Eichstaedt.

A latent Dirichlet allocation (LDA) algorithm crunches the data by working with 2,000 language clusters, or topics, that distil what people talk about in their Facebook statuses or tweets.

But how accurate is this social media data? Eichstaedt says there are two biases in it – sample bias and desirability bias. However, he says the sample bias is overestimated: “The median age on Twitter is 32 and for the US population it’s about 36/37”, adding that once the sample is big enough, the model re-stratifies the sample to be more representative.
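The article does not say how the re-stratification is done. One common way to make a skewed sample more representative is post-stratification weighting, sketched below in Python. The age groups, counts, population shares and scores are invented for illustration, and the function names are hypothetical; this is an assumption about the general technique, not a description of the team's model.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight per group = population share / sample share."""
    total = sum(sample_counts.values())
    weights = {}
    for group, pop_share in population_shares.items():
        count = sample_counts.get(group, 0)
        if count == 0:
            raise ValueError(f"no sample members in group {group!r}")
        weights[group] = pop_share / (count / total)
    return weights


def weighted_mean(group_means, sample_counts, weights):
    """Mean of a per-group score after applying the group weights."""
    numerator = sum(
        group_means[g] * sample_counts[g] * weights[g] for g in weights
    )
    denominator = sum(sample_counts[g] * weights[g] for g in weights)
    return numerator / denominator


# Invented toy numbers (NOT real data): the sample skews young.
sample_counts = {"under_35": 600, "35_plus": 400}
population_shares = {"under_35": 0.4, "35_plus": 0.6}
# Hypothetical average score per group, e.g. a negative-emotion measure.
group_means = {"under_35": 0.2, "35_plus": 0.5}

weights = poststratification_weights(sample_counts, population_shares)
raw = sum(group_means[g] * sample_counts[g] for g in sample_counts) / sum(
    sample_counts.values()
)
adjusted = weighted_mean(group_means, sample_counts, weights)

print(f"unweighted mean: {raw:.2f}")
print(f"re-weighted mean: {adjusted:.2f}")
```

In this toy case the over-represented younger group is weighted down and the older group weighted up, so the adjusted average moves towards what the full population would show.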

He says there’s some evidence that people misrepresent themselves (desirability bias), in particular by suppressing negative emotion, but that “the variance between people is still highly interpretable”.

Facebook data

And there are differences between the media in terms of data. With Facebook data, the users have to give permission, which means “if you get 50,000 in a sample that’s amazing – generally data from Facebook users is 3 – 10 times as good as Twitter users”.

For his depression study he used Facebook data.

“For psychological insight, Facebook is preferable; it’s just you can’t get its data for that many people.” But in this case he was using data collected by someone else, which he reinterpreted to understand depression.

Looking forward, he thinks diabetes, which seems to have a lot of behavioural predictors, might be an area worth researching – not that all areas of wellbeing research are ripe for social media data analysis.

“As long as your data is big enough it will always work; the question is, will it improve on other methods? And there the answer is sometimes no; when trying to predict something like cancer, it didn’t work because income and education appear to be a much better predictor than what’s happening on Twitter.”

We hope you enjoyed this article.
Research Live is published by MRS.
