Troubled #hearts -- in 140 Characters


I joined Twitter in 2008, and I've always been impressed by the diversity of this floating conversation. People will just as soon tweet about dinner as the sorry state of American politics, and they are by turns thoughtful and shallow, original and fraudulent, snide and generous of spirit. In 140 characters or fewer, users reflect the range of human emotion, from joy to rage, wonder to boredom, cynicism to hopefulness.

Individual Twitter users can obviously reveal a lot about their lives and feelings, even in terse tweets. But what about very large numbers of tweets, by many people in many places? Is it possible that aggregate Twitter patterns might also be revealing in some useful way? Could Twitter offer snapshots of communities as well as individuals?

A large team of University of Pennsylvania scientists has been exploring this possibility. Led by psychological scientist Johannes Eichstaedt and information scientist Hansen Schwartz, the researchers wondered if the vast amount of language contained in tweets might be a valuable public health resource -- specifically, if this linguistic bonanza might offer a way to gauge a community's risk for heart disease.

Scientists have identified many of the key risk factors for heart disease, such as smoking, inactivity, obesity and hypertension, and these insights have significantly reduced the risk of dying from the world's leading killer. Psychological traits such as chronic stress and depression are also important risk factors, while optimism and social support are known to be protective. These psychological characteristics often affect entire communities, putting large numbers of people at risk for disease. Community-wide interventions could improve health, but assessing community risk is difficult and expensive.

That's where Twitter comes in. The Penn scientists are pioneers in an emerging field called digital epidemiology, and their aim is to use social media as a cheap and flexible method to assess the psychological traits -- and thus health risks -- of entire communities. To test this method's potential, the scientists collected 148 million tweets from across the U.S., sorted into their 1,347 counties of origin. The scientists also gathered socioeconomic and demographic data on these counties, which are home to 88 percent of Americans.

They used two different methods to analyze the language in each county's aggregate tweets, posted over 10 months in 2009 and 2010. They measured specific words and topics, both negative (hostility, cursing, aggression, boredom and fatigue) and positive (wonder, hope, triumph, opportunity), and used these linguistic patterns to characterize communities at risk for heart disease. They then compared these risk patterns to the actual mortality rates for each county, obtained from the Centers for Disease Control and Prevention. The idea was to see if the disease-relevant information contained in a given county's Twitter language predicted heart disease mortality.
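For readers who like to see the mechanics, here is a minimal sketch of the general idea: count how often words from one category turn up in each county's tweets, then correlate those rates with county mortality. It is an illustration under stated assumptions, not the researchers' actual pipeline. The word list, the function names `category_rate` and `pearson`, and the three toy counties are all invented for the example.

```python
import math
import re

# Hypothetical mini-lexicon, invented for illustration; not the study's word list.
ANGER_WORDS = {"hate", "angry", "mad", "furious"}


def category_rate(tweets, lexicon):
    """Share of all word tokens in a batch of tweets that appear in the lexicon."""
    tokens = [tok for tweet in tweets for tok in re.findall(r"[a-z']+", tweet.lower())]
    if not tokens:
        return 0.0
    return sum(tok in lexicon for tok in tokens) / len(tokens)


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up toy counties: (tweets, heart disease deaths per 100,000 residents).
toy_counties = {
    "County A": (["I hate this traffic", "so mad right now"], 210.0),
    "County B": (["what a wonderful morning", "great game tonight"], 150.0),
    "County C": (["angry about the news", "nice dinner with friends"], 180.0),
}

# One anger rate per county, lined up with that county's mortality figure.
rates = [category_rate(tweets, ANGER_WORDS) for tweets, _ in toy_counties.values()]
mortality = [deaths for _, deaths in toy_counties.values()]
anger_mortality_r = pearson(rates, mortality)
```

With real data, the same two steps would run over well over a thousand counties and many word categories, and the toy numbers here say nothing about the study's actual results.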

And it did, clearly. As reported in a forthcoming article in the journal Psychological Science, language reflecting negative relationships, negative emotions, disengagement and (especially) anger was significantly correlated with heart disease mortality. This held true even after controlling for income and education, suggesting that Twitter language captures important information not accounted for by socioeconomic status. By contrast, positive emotions and engagement were associated with lower heart disease mortality. Engagement with life -- considered a key component of successful aging -- emerged as the most potent protective factor.
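What does "controlling for" a factor mean in practice? One textbook approach, shown here as a sketch and not necessarily the method the researchers used, is a partial correlation: the link between two measures that remains once a third measure's influence on both has been removed. The function names and the toy numbers are invented for the example, and it handles a single control variable, whereas the study accounted for both income and education.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear influence of zs.

    Uses the standard first-order partial correlation formula.
    """
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Made-up numbers for four imaginary counties.
anger_rate = [1.0, 2.0, 3.0, 4.0]      # hypothetical language score
mortality = [1.0, 3.0, 2.0, 4.0]       # hypothetical mortality index
income = [1.0, -1.0, -1.0, 1.0]        # hypothetical control variable

raw_r = pearson(anger_rate, mortality)
adjusted_r = partial_corr(anger_rate, mortality, income)
```

In this toy case the control variable is unrelated to either measure, so the adjusted figure matches the raw one. When a control variable does explain part of the link, the adjusted correlation changes, and a relationship that survives the adjustment is one the control variable alone cannot account for.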

What's more, Twitter language was a better predictor of heart disease mortality than 10 common demographic and behavioral risk factors, including such infamous ones as smoking and high blood pressure.

It's interesting to note that the typical Twitter user is 31, considerably younger than those at risk for heart disease. So the people tweeting are not the people dying. This suggests that the young adults' tweets are revealing the combined psychological character of their community, which in turn predicts aggregate health outcomes. In short, the language of Twitter may offer a window into the potent influence of community character, and may prove to be a valuable tool for public health.

Follow Wray Herbert's reporting on psychological science in The Huffington Post and on Twitter at @wrayherbert.