When I asked a group of colleagues to suggest topics for blogs, several said to take material from Reading Educational Research: How to Avoid Getting Statistically Snookered, a book designed to make non-researchers wiser consumers of research and statistics. Well, I don't know. Say "statistics" and some people's eyes glaze over while others freeze with deer-in-the-headlights fear. So this is an experiment. Let me know what you think. If you want to comment off blog, reach me at firstname.lastname@example.org.
I had wanted the words "commission reports" and "op-eds" in the title because that's where ideologues and spinmeisters do their most devious work, but the publisher said it would make the cover look too crowded. The 32 "Principles of Data Interpretation" around which the book is organized certainly apply to commission reports and op-eds. One principle is "Watch out for Simpson's Paradox." Simpson's Paradox pops up often in social science research (put "simpson's paradox" into Google and you get almost 37,000 items).
Simpson's Paradox occurs whenever the whole group shows one pattern but subgroups show a different pattern. We can see this in SAT trends. In 2005, the national average verbal SAT score was 508, just four points higher than the 504 from 1981 (the first year the College Board released data by ethnicity). School critics moaned that SAT scores were stagnant (neutral observers might have used the word "stable"). But when we look at scores using the College Board's ethnic categories, a different picture emerges:
So, while the gain for the total sample is only 4 points, every single ethnic group has a double-digit advance, and some of those advances are quite large. That's the paradox: How can the subgroups show such impressive improvement while the whole group stands still?
To understand that outcome, let's look at some hypothetical test scores:
Assume the 500's at Time 1 are the SAT scores of white students and the 400's are the SAT scores of minority students. The average score for the whole group is 480. Time 2 is some later point (exactly when doesn't matter) with a different set of students. Assume that the 510's are the SAT scores of white students and that the 430's are the SAT scores of minority students.
So, white students gained ten points from Time 1 to Time 2, 500 to 510. Minority students gained more, 30 points, 400 to 430. But the average at Time 2 is only 478, two points lower than at Time 1. Professor Simpson strikes.
The explanation lies in the changing makeup of the test-taking groups. At Time 1, only 20% of the test takers were minorities. At Time 2, they make up 40% of the group. Their scores are improving, but they are still lower than whites' scores. So as minorities become a larger and larger proportion of the test takers, their improving-but-lower scores hold down the overall average or, in this case, actually cause it to fall.
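For readers who like to check the arithmetic, here is a minimal Python sketch of the hypothetical example. It assumes, purely for illustration, ten test takers at each time, split to match the shares given above (20% minority at Time 1, 40% at Time 2); the names `time1`, `time2`, and `all_scores` are made up for this sketch.

```python
from statistics import mean

# Illustrative assumption: ten test takers at each time, split to match the
# shares in the text (20% minority at Time 1, 40% at Time 2).
time1 = {"white": [500] * 8, "minority": [400] * 2}
time2 = {"white": [510] * 6, "minority": [430] * 4}

def all_scores(groups):
    """Pool every subgroup's scores into one list for the whole group."""
    return [score for scores in groups.values() for score in scores]

# Each subgroup improves...
for group in time1:
    before, after = mean(time1[group]), mean(time2[group])
    print(f"{group}: {before} -> {after} ({after - before:+})")

# ...but the pooled average falls, because the lower-scoring group grew.
print(f"everyone: {mean(all_scores(time1))} -> {mean(all_scores(time2))}")
```

Run as written, it should print gains of +10 and +30 for the two subgroups and an overall average that slips from 480 to 478.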
This hypothetical example very much reflects reality: In 1981, whites made up 85% of all SAT test takers; in 2005, only 63%. If we looked at the results for the National Assessment of Educational Progress (NAEP), we'd see the same thing.
For example, in 2004, 9-year-olds scored 11 points higher in reading than they did in 1971, NAEP's first year. White 9-year-olds scored 12 points higher, black students 30 points higher and Hispanic students 22 points higher (because of the way NAEP samples students, Asian and American Indian students do not form groups large enough to generate reliable average scores).
Simpson's Paradox can also apply to events that happen in the present. For instance, a study of death rates at two hospitals might find that Hospital A has a higher rate than Hospital B. But it could be that Hospital A specializes in handling people who are seriously ill. If we were to look at the death rates broken down by the condition of the patients when they arrived at the hospital, Hospital A might turn out to be the safer place to go.
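To make the hospital case concrete, here is a small Python sketch. Every count in it is invented for illustration (these are not data from any real hospital), and the helper names `death_rate` and `overall_rate` are hypothetical. Hospital A takes mostly seriously ill patients and Hospital B mostly mild cases.

```python
# Hypothetical counts, invented for illustration only:
# hospital -> condition on arrival -> (patients admitted, deaths)
hospitals = {
    "A": {"serious": (800, 80), "mild": (200, 2)},
    "B": {"serious": (200, 30), "mild": (800, 16)},
}

def death_rate(admitted, deaths):
    """Share of admitted patients who died."""
    return deaths / admitted

def overall_rate(strata):
    """Death rate for the hospital as a whole, pooling all conditions."""
    admitted = sum(n for n, _ in strata.values())
    deaths = sum(d for _, d in strata.values())
    return death_rate(admitted, deaths)

for name, strata in hospitals.items():
    by_condition = ", ".join(
        f"{cond} {death_rate(n, d):.1%}" for cond, (n, d) in strata.items()
    )
    print(f"Hospital {name}: overall {overall_rate(strata):.1%} ({by_condition})")
```

With these made-up numbers, Hospital A's overall death rate (8.2%) is higher than Hospital B's (4.6%), yet A does better for seriously ill patients (10% vs. 15%) and for mild cases (1% vs. 2%). The overall figure mostly reflects who walks in the door.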
Look for Simpson's Paradox whenever you have reason to think the composition of a group is changing over time. Look for it also whenever you think the subgroups might differ on some important variable like the condition of patients entering hospitals.
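One way to act on that advice is to split a change in an overall average into a part due to the subgroups themselves improving and a part due to the mix of subgroups shifting. The Python sketch below uses a simple decomposition of that kind (a common bookkeeping identity, not a method taken from the book), applied to the hypothetical Time 1 and Time 2 scores used earlier. The function names `decompose_change` and `looks_like_simpson` are illustrative, and the second one only checks the strong form of the paradox, where every subgroup moves one way and the whole group moves the other.

```python
from fractions import Fraction

def decompose_change(before, after):
    """Split the change in an overall mean into two parts.

    before/after map subgroup -> (share of the whole group, subgroup mean);
    shares in each snapshot should sum to 1, and both use the same subgroups.
    within:      change from subgroup means moving, weighted by the later shares
    composition: change from subgroup shares moving, valued at the earlier means
    The two parts add up exactly to the change in the overall mean.
    """
    within = sum(after[g][0] * (after[g][1] - before[g][1]) for g in before)
    composition = sum((after[g][0] - before[g][0]) * before[g][1] for g in before)
    return within, composition

def looks_like_simpson(before, after):
    """True when every subgroup moves one way but the overall mean moves the other."""
    overall = sum(s * m for s, m in after.values()) - sum(s * m for s, m in before.values())
    gains = [after[g][1] - before[g][1] for g in before]
    return (all(x > 0 for x in gains) and overall < 0) or (
        all(x < 0 for x in gains) and overall > 0
    )

# The hypothetical SAT example: (share of test takers, average score)
before = {"white": (Fraction(80, 100), 500), "minority": (Fraction(20, 100), 400)}
after = {"white": (Fraction(60, 100), 510), "minority": (Fraction(40, 100), 430)}

within, composition = decompose_change(before, after)
print(within, composition, within + composition)
print(looks_like_simpson(before, after))
```

For the hypothetical scores, the subgroups' own gains are worth +18 points, the shift toward the lower-scoring group is worth -20, and together they give the 2-point drop. Exact fractions are used so the parts add up without rounding noise.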