More than a century ago, Mark Twain famously said, "There are lies, damned lies, and statistics." It's a great quote that is more true than ever. As I wrote in Too Big to Ignore, in an era of Big Data, there is tremendous opportunity and arguably more incentive to create, ignore, and pervert information.
Against this backdrop, I recently read Standard Deviations: Flawed Assumptions, Tortured Data, and Other Ways to Lie with Statistics by Gary Smith. It's a refreshing take on the practice of bastardizing numbers. What's more, this is no theoretical text. Smith's examples shine a much-needed light on how many ostensibly intelligent folks either intentionally or inadvertently use statistics in questionable ways.
I recently sat down with him to discuss his book.
PS: What inspired you to write the book?
GS: This book's title refers to Ronald Coase's cynical comment that, "If you torture the data long enough, it will confess." Thirty years ago, calling someone a "data miner" was an insult comparable to being accused of plagiarism. Today, people advertise themselves as data miners. We are told that government, business, finance, medicine, law, and our daily lives are improved by data miners ransacking data to discover truth.
Too little time is spent distinguishing between good data and rubbish, between good research and nonsense. We make bad decisions because we assume that computers are infallible; no matter what kind of garbage we put in, computers will spit out gospel.
PS: Jim Collins sold quite a few copies of Good to Great. Still, you claim that his methodology was flawed. Explain.
GS: Collins freely admits that he began his study with no idea why some companies do better than others. After he identified eleven companies with extraordinary stock returns, he looked for common characteristics and attached catchy names, like Level 5 Leadership.
When Collins wrote that he "developed all of the concepts ... by making empirical deductions directly from the data," he was boasting that his study was unbiased and professional. In fact, he was admitting that his research was pure data mining: deriving theories from data, instead of testing theories with data.
The fundamental problem is that when we look at any group of companies, the best or the worst, we can always find common characteristics. Every one of the eleven companies selected by Collins happens to have either an i or an r in its name, and several have both an i and an r. Is the key to going from good to great to make sure that your company's name has an i or r in it? Of course not.
Finding common characteristics after the companies have been selected is inevitable and uninteresting. The interesting question is whether the common characteristics identified by Collins are of any use in predicting which companies will succeed in the future. The short answer is no. Since the publication of his book, five of his eleven stocks have done better than the overall stock market, and six have done worse.
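Smith's point about after-the-fact patterns can be illustrated with a small simulation. The sketch below is not from the book or from Collins's study: the companies, traits, and returns are all hypothetical random numbers, and the function name `mine_for_pattern` and its parameters are made up for illustration. It picks eleven "winners" by past return, searches a pile of meaningless yes/no traits for the one the winners share most often, and then checks how those winners do in a second, independent period.

```python
import random


def mine_for_pattern(n_companies=500, n_traits=200, n_top=11, seed=0):
    """Pick past winners, then search for a trait they share (all data is random)."""
    rng = random.Random(seed)

    # Hypothetical data: every trait is a coin flip and every return is noise,
    # so by construction no trait has any influence on performance.
    traits = [[rng.random() < 0.5 for _ in range(n_traits)]
              for _ in range(n_companies)]
    past = [rng.gauss(0, 1) for _ in range(n_companies)]
    future = [rng.gauss(0, 1) for _ in range(n_companies)]

    # Step 1: select the companies with the best past returns.
    winners = sorted(range(n_companies), key=lambda c: past[c], reverse=True)[:n_top]

    # Step 2 (the data-mining step): look for the trait most winners share.
    def winners_with(trait):
        return sum(traits[c][trait] for c in winners)

    best_trait = max(range(n_traits), key=winners_with)
    shared = winners_with(best_trait)

    # Step 3: see how the same winners do in the independent future period,
    # measured against (roughly) the median future return.
    median_future = sorted(future)[n_companies // 2]
    beat_median = sum(future[c] > median_future for c in winners)

    return shared, beat_median


if __name__ == "__main__":
    shared, beat_median = mine_for_pattern()
    print(f"Winners sharing the 'best' trait: {shared} of 11")
    print(f"Winners beating the median afterward: {beat_median} of 11")
```

Because the search runs over many traits, some trait will usually be shared by most of the winners even though none of them matters. The later performance of those winners is drawn independently of how they were chosen, which is the difference between deriving a theory from data and testing it on fresh data.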
PS: As Mark Twain famously noted, people have lied with statistics for a very long time. Is the problem getting worse? Why or why not?
GS: Yes, much worse. Lying with statistics used to mean using an arsenal of tricks to tweak the data; for example, not adjusting for inflation, looking at carefully selected time periods, or stretching graphs. Now, armed with lightning-fast computers, people can massage, distort, and mangle mountains of data in ways that were unimaginable before.
In addition, the massaging, distorting, and mangling are often done by well-intentioned researchers who are blissfully unaware of the mischief they are committing. That is how well-respected people came up with the now-discredited ideas that coffee causes pancreatic cancer and that people can be healed by positive energy from self-proclaimed healers living thousands of miles away.
PS: What do you hope that people gain from reading the book?
GS: The combination of Big Data and big computers can be big trouble.
If a theory doesn't make sense, be skeptical. For example, a study published in one of the world's top medical journals concluded that Japanese and Chinese Americans are susceptible to heart attacks on the fourth day of every month. The supposed reason is that in Japanese, Mandarin, and Cantonese, the words for four and death are pronounced very similarly, so that the stress experienced on the fourth day of the month is comparable to being pursued down a dark alley by a vicious dog.
No, I am not making this up. This study was taken seriously and reported worldwide even though it is preposterous and insulting. It is also false.