Driven by Data

Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom. – Clifford Stoll and Gary Schubert

Data play an increasingly important role in our lives these days. Partly the cause is vastly improved technology: We have new tools to collect massive amounts of data, ever-expanding storage capability, and much faster computers along with more sophisticated algorithms for analysis. But partly the cause is cultural: Our society has chosen to emphasize data (and their fruits, usually some derived statistics) as evidence that we view as superior to ‘mere’ observation or judgment. This bias is apparent in the language we use to describe good decision-making: it should be data driven, involve metrics and focus on ‘measurable outcomes.’ This last phrase was even codified in law by the Government Performance and Results Act of 1993 (GPRA).

But while most people are fascinated by the power of data, some skeptics worry about its unanticipated consequences. In a famous 1976 paper, the social scientist Donald Campbell made the observation that the very act of collecting data about human activities can change the data itself:

"The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."

A prominent recent example comes from data on acceptance rates, which U.S. News uses in ranking colleges. While a lower acceptance rate ought to indicate a more prestigious institution, a college can game the system by enticing large numbers of unqualified students to apply, artificially lowering its acceptance rate. Other examples of corruption abound, from crime statistics to best-seller lists and television sweeps. When we collect data on human activities, we frequently change the way humans behave.

Around the same time, Nobel Prize-winning economist Herbert A. Simon noted that data and quantitative measures often do "not even remotely describe the processes that human beings use for making decisions in complex situations." Simon's initial examples concerned public policy, where data and statistics alone cannot answer basic questions (for example, whether it is better to invest in new facilities or in additional staff for public parks). There are many examples of complex social policy decisions that cannot be settled by data, no matter how massive.

These two concerns (the corruption of quantitative measures and the inability of even massive amounts of data to capture social complexity) are particularly worrisome given our current obsession with data-driven education. We capture large amounts of data about standardized test scores for many students over many years, and we calculate various statistics based on those data. We use those statistics to measure the quality of teachers, the worth of schools, and the efficacy of education policies. But these measures are easily corrupted when educators focus only on the tests or even when they engage in outright cheating. And any measure based on test scores alone captures only a small part of the complexity of education. Education has suffered on both counts. The lesson is not that tests are bad or useless, but rather that test score data offer only a circumscribed view of education.

These concerns predate the recent explosion of ‘big data,’ but they are amplified by that explosion. The sophisticated mathematics we use to analyze vast collections of data sometimes disguises weak data. The elaborate algorithms we use to derive statistics are sometimes based on faulty assumptions or hidden biases. When policy makers uncritically rely on data, but do not understand the mathematics used to analyze it, they can make decisions that appear to be objective and scientific without actually being so.

While this is a particular problem in social science, especially in education, it affects science itself when, for example, the practice of medicine is governed by doctor ‘report cards’ or the value of research is measured by mysterious numbers derived from citation counts.

Data are indispensable, both in science and social science, but no matter how alluring, numbers are not a priori superior to other forms of evidence. Often knowledge, understanding and wisdom are more important.

(A version of this essay first appeared in the 2014 Annual Report of the Simons Foundation.)
