A data fabrication scandal has rocked the media and academic worlds. In December 2014, many media outlets reported on a seemingly groundbreaking study that purported to show how conversations can change opinions about same-sex marriage. Fast forward six months, and there are allegations that the data used in the study were faked. Stories like this can lead to public mistrust of data and science, especially for the many people who don’t use data or statistics in everyday life. How are people supposed to know what data to trust?
Here’s the good news: You can determine whether to trust the data yourself, without relying on experts with often-differing opinions, even if you’ve never opened a statistics book. What follows are six simple questions that anyone can ask and answer to assess a data-based article. The answers to these questions wouldn’t have rooted out fraud on the scale alleged in the opinion-change experiment -- even experts didn’t catch that at first -- but in the vast majority of cases that don’t involve fraud, these questions will give you a good indication of the data quality.
1. Where did the data come from?
The source of the data shouldn’t be a mystery. Think of the source of the data the same way you’d think of a source for any piece of information -- after all, “data” are simply many pieces of information. If the source of the data is “anonymous” or not provided, there’d better be a good reason.
Not all sources are equal, however. If an analysis shows that Democrats did something bad, and the data were collected by a Republican-affiliated group, be skeptical. If a toothpaste company collected data showing that its brand is preferred over all other brands, be skeptical.
2. Why were the data collected?
Some people or organizations have an agenda for collecting data beyond telling an unbiased, completely truthful story. They might be interested in a certain outcome and could tailor data collection to that desired outcome. If the data were collected specifically to address the issue in the article, be sure the source is trustworthy. If the purpose for collecting the data doesn’t seem to have anything to do with the analysis you’re reading, be skeptical. If the purpose for collecting the data is to promote the company, a product or a candidate, be skeptical.
3. How were the data collected?
Data collection processes can be fairly boring to read about, even if they’re simple. However, it’s important that there’s some information about how the data were collected so readers know the data didn’t grow on a tree and weren’t simply made up. The details of data collection can be fabricated as well, but providing a description of how it was done at least indicates that the data were actually collected. And if the way the data were collected doesn’t seem to match the goals of the analysis -- for example, an analysis of how women feel about health care uses data collected from all adults, men and women alike -- you know to be skeptical.
4. How did the writer get the data?
If the data were collected by someone other than the author, how the author got access to the data is important. If the data are publicly available, or the author makes them publicly available, that’s good. It means anyone can try to recreate the analysis, and if something is wrong in the original analysis, you’re more likely to find out. If the data are available only to specific people, or not available at all, be skeptical.
5. Do the conclusions make sense?
The conclusions drawn in the article should be related to where the data came from, why they were collected and other details the author provides in the article. If you don’t understand how the author reached a particular conclusion, be skeptical.
That doesn’t mean you should ignore any conclusion that goes against what you think. Keep in mind that data allow us to study things at a much larger scale than our own personal experience, which means data analysis may highlight patterns that you don’t see in your own life. But that doesn’t mean the data are inaccurate -- it simply means the world is complex. Relatedly, because the world is complex, any conclusion that seems too simple or perfect should elicit some skepticism.
6. Is there a discussion of shortcomings, complications or uncertainty?
It’s extremely rare for a single analysis to yield a definitive conclusion with no room for question. Virtually every time someone analyzes data or does any kind of statistics, there is some flaw or shortcoming, and there is always uncertainty about any statistical conclusion. These things should be acknowledged. If an author seems to claim that their conclusion is the only possible interpretation of the data, be skeptical of that claim.
In the case of the same-sex marriage study, these questions would have generated some skepticism -- the original data were not publicly available prior to the study’s publication (question No. 4) and the findings seemed too good to be true (question No. 5) -- but the alleged fraud was undetectable using only these questions. Again, though, that situation is rare.
Generally, if you’re satisfied with the answers to these six questions, that’s a good indication you can believe the claims made in the article. If not, you can ask more questions. The best part is that you decided for yourself.