Bing, Google and other search engines are very, very good at detecting potentially problematic drug combinations. Better, apparently, than the Food and Drug Administration.
In 2011, analysts at Stanford University data-mined the FDA's Adverse Event Reporting System records and discovered that hyperglycemia is a previously unknown side effect of combining the antidepressant paroxetine and the anti-cholesterol drug pravastatin. Inspired by this finding, a separate team of scientists, led by Dr. Ryen White of Microsoft Research, decided to see if there was evidence of this adverse side effect in online search records from 2010, before the effects were known to the FDA.
The findings, published Wednesday in the Journal of the American Medical Informatics Association, show that online search data from 2010 revealed the problematic side effects of combining paroxetine and pravastatin well before that information was publicly known. In follow-up experiments, the team showed that online search data alone could also detect little-known hyperglycemic side effects for 31 other drugs.
In the course of the experiment, White's team analyzed "82 million drug, symptom and condition queries" mined from the 2010 search records of six million people using all three major search engines -- Google, Bing and Yahoo. The team found that 10 percent of all the people who searched for information on paroxetine and pravastatin later searched for information related to hyperglycemia. That's in contrast to 5 percent of users who searched only for information on paroxetine and 4 percent of users who searched only for information on pravastatin. The New York Times reported this week that the researchers were "surprised by the strength of the 'signal' that they detected in the searches and argued that it would be a valuable tool for the FDA to add to its current system for tracking adverse effects."
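To put those percentages side by side, here is a minimal Python sketch that compares the three reported rates as simple ratios. It uses only the rounded figures quoted above; the variable and function names are hypothetical, and a plain ratio is an illustration of the size of the gap, not the statistical method the researchers used.

```python
# Percent of searchers who later looked up hyperglycemia-related terms,
# using the rounded figures quoted in the article.
PCT_BOTH_DRUGS = 10        # searched for paroxetine and pravastatin
PCT_PAROXETINE_ONLY = 5    # searched for paroxetine only
PCT_PRAVASTATIN_ONLY = 4   # searched for pravastatin only


def rate_ratio(pct_exposed, pct_baseline):
    """Return how many times larger one percentage is than another."""
    if pct_baseline <= 0:
        raise ValueError("baseline percentage must be positive")
    return pct_exposed / pct_baseline


# Hypothetical names: ratios of the two-drug group to each one-drug group.
ratio_vs_paroxetine = rate_ratio(PCT_BOTH_DRUGS, PCT_PAROXETINE_ONLY)
ratio_vs_pravastatin = rate_ratio(PCT_BOTH_DRUGS, PCT_PRAVASTATIN_ONLY)

print(f"Both drugs vs. paroxetine only:  {ratio_vs_paroxetine:.1f}x")
print(f"Both drugs vs. pravastatin only: {ratio_vs_pravastatin:.1f}x")
```

On these rounded numbers, people who searched for both drugs were about twice as likely to search for hyperglycemia-related terms as people who searched for paroxetine alone, and two and a half times as likely as those who searched for pravastatin alone.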
The new study is the latest in a series of high-profile studies indicating the power of Internet data mining. In similar studies, data analysts used Internet-generated information to predict the success of unreleased films via information from Wikipedia and Box Office Mojo, and to sort real news from rumors on Twitter.
The efficacy of data mining does raise privacy concerns, however. In January, an MIT team showed that it was possible, via information analytics, to match sequenced DNA to supposedly anonymous donors. The study prompted some hysteria: the blog io9 speculated that in the near future, data mining might produce "advertisements based on your DNA."