By Kate Starbird, University of Washington, Human Centered Design & Engineering; and Emma Spiro, University of Washington, iSchool
The problem of “fake news” on social media has received considerable attention recently, with some claiming that widespread propagation of misinformation had an impact on the U.S. election. Though these arguments may be somewhat overstated, there is little doubt that the modern information-sharing environment facilitates the spread of rumors, misinformation, and politically slanted disinformation. An important aspect of this issue involves the roles and responsibilities of the social media companies whose platforms are increasingly mediating public discourse. Is it possible to detect fake news and other misinformation spreading through social media? And if so, what can or should social media platforms do about it?
In research funded by the National Science Foundation, we have been studying a related phenomenon—online rumoring during crisis events—for three years. This project has dual goals: to better understand how rumors spread during crisis events and to develop methods for automatically detecting rumors in social media, specifically Twitter. Though somewhat distinct from the problems of fake news and political propaganda, our work does intersect with these issues and may provide insight into how to address them.
In the crisis context, rumors—which can be thought of as stories of unverified truth value—are not necessarily intentional or malicious, but are often part of the natural sense-making process that occurs as people work to collectively process imperfect information. And indeed, sense-making rumors are one prominent type of online rumor in this context. But we also encounter intentionally false rumors that seem to be introduced into the space with the primary motivation of viral spread. And we see our share of conspiracy theories as well, including ones that share commonalities with the “fake news” stories that were prominent during the 2016 election season—i.e., propagation by online sites and accounts that have a strong political agenda. Rumors of these different types tend to propagate differently through and across social media, and methods for identifying them could be optimized for a specific type.
The challenge of detecting online misinformation can be approached from several directions. A common method uses machine learning (ML) to automatically assess the credibility of posts. In machine learning, computer algorithms learn how to categorize data through the analysis of many examples, or “training data.” Most ML algorithms are designed to look at certain features of the data—features that are selected by the algorithm’s designers. For example, researchers have shown three types of features to be useful for determining the veracity or credibility of tweets: message-based features (content of the posts), user-based features (profile content, friend and following relationships, content and timing of previous posts), and network-based features (reshares, comments, linked-to domains).
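To make these three feature families concrete, here is a minimal sketch of a feature extractor. The field names (`followers_count`, `retweet_count`, etc.) are illustrative stand-ins, not the exact fields used in our work or in any particular Twitter API payload:

```python
def extract_features(tweet):
    """Compute the three feature families for one tweet.

    `tweet` is a hypothetical dict; real API payloads differ.
    """
    text = tweet["text"]
    user = tweet["user"]
    # Message-based features: properties of the post's content.
    message = {
        "length": len(text),
        "num_exclamations": text.count("!"),
        "has_url": "http" in text,
        "has_question_mark": "?" in text,
    }
    # User-based features: the author's profile and history.
    user_features = {
        "followers": user["followers_count"],
        "following": user["friends_count"],
        "account_age_days": user["account_age_days"],
        "prior_posts": user["statuses_count"],
    }
    # Network-based features: how the post travels through the graph.
    network = {
        "retweets": tweet["retweet_count"],
        "replies": tweet["reply_count"],
    }
    return {**message, **user_features, **network}
```

A vector of values like these, computed for many labeled examples, is what a classifier would be trained on.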
Some of the most promising approaches seek to utilize the work of the online crowd to help in the detection process. One idea we’ve explored—building on prior work—involves leveraging the “self-correcting” crowd. This approach assumes that misinformation may propagate in myriad ways, but that the ways people challenge or correct false rumors are more consistent. In our research, we attempted to train machine learning algorithms to identify rumor-correcting tweets—and from there to find the rumors they were correcting. Unfortunately, our research on the 2013 Boston Marathon Bombings showed that for many rumors the crowd correction tends to be relatively small and lags (temporally) behind the rumor. However, another type of content—messages that express uncertainty about the rumor’s veracity—often appears earlier and at higher volumes than explicit corrections. This suggests that expressed uncertainty in message content could be a useful feature for automatic early detection.
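As a toy illustration of the uncertainty signal, one could flag hedging language in message text. The lexicon below is invented for this example; our actual work used trained ML classifiers rather than keyword lists, which would miss many phrasings and misfire on others:

```python
import re

# Toy lexicon of hedging phrases -- illustrative only, not the
# classifier used in the research described above.
UNCERTAINTY_PATTERNS = [
    r"\bunconfirmed\b",
    r"\ballegedly\b",
    r"\brumou?r\b",
    r"\bnot sure if\b",
    r"is this (true|real)",
]

def expresses_uncertainty(text: str) -> bool:
    """Return True if the text hedges on a claim's veracity."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in UNCERTAINTY_PATTERNS)
```

The fraction of early messages that trip a detector like this could then serve as one input feature for flagging a story as a possible rumor before explicit corrections arrive.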
Other potential methods for detecting misinformation include explicit recommendation systems (e.g., asking users to set a flag indicating a message contains false information) as well as formal crowdsourcing efforts that distribute messages to paid or volunteer crowdworkers and ask them to rate each post’s credibility. The most successful solutions are likely to be hybrid ones that integrate automated ML algorithms based on a variety of features with real-time feedback from people to catch errors and refine the models. Researchers have experimented with a system like this for determining the credibility of tweet content, using paid and volunteer crowdworkers to calibrate ML models. Social media platforms could utilize similar human-in-the-loop solutions using paid employees or crowdworkers, though each has its drawbacks.
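One simple way such a hybrid could combine its two signals is to blend the model’s score with the crowd’s average rating, routing large disagreements to human review. This is a sketch under assumed conventions (scores in [0, 1], a made-up shrinkage weight), not a description of any deployed system:

```python
def hybrid_credibility(ml_score, crowd_ratings, disagreement_threshold=0.4):
    """Blend a model score with crowd ratings; flag big disagreements.

    ml_score: the model's credibility estimate in [0, 1].
    crowd_ratings: list of human ratings in [0, 1] (may be empty).
    Returns (blended_score, needs_review).
    """
    if not crowd_ratings:
        # No human signal yet -- fall back to the model alone.
        return ml_score, False
    crowd_score = sum(crowd_ratings) / len(crowd_ratings)
    # A large model/crowd gap suggests either a model error or a
    # manipulated crowd; route the post to a human reviewer.
    needs_review = abs(ml_score - crowd_score) > disagreement_threshold
    # Weight the crowd more as ratings accumulate (simple shrinkage;
    # the constant 3 is an arbitrary choice for illustration).
    weight = len(crowd_ratings) / (len(crowd_ratings) + 3)
    blended = (1 - weight) * ml_score + weight * crowd_score
    return blended, needs_review
```

The review flag is where the “refine the models” step would hook in: posts the model and crowd disagree on are exactly the labeled examples most useful for retraining.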
One difficulty is that all of these techniques, if made transparent to the public, can be “gamed” by those seeking to avoid detection. This is especially true of the techniques that rely upon feedback from the “crowd,” which can be exploited by organizations that mobilize crowdworkers to promote content they agree with and/or demote content they don’t like. But this also applies to purely automated solutions—any public disclosure of the tools and strategies for detecting misinformation can be used by the perpetrators to develop methods for circumventing the filters.
Detecting misinformation is only a small part of the “fake news” problem. A more complex, socio-technical question is what to do next, after a fake story or false rumor has been detected. Should the system simply remove that information? Are we comfortable with that kind of censorship? We’d guess most of us would say no. But there may be other ways to help slow or stop the propagation of misinformation, perhaps by providing through the social media platform some signals related to a post’s perceived veracity—e.g., automatically generated credibility scores and/or links to places where the misinformation is being challenged. We believe that one potential solution (and an important research direction) is to redesign these platforms to help people become better at discerning information credibility “on their own”—through features that support both better decision-making in the moment and increased information literacy over the long term.