My colleagues and I have been reviewing a great deal of research lately, as you may have noticed in recent blogs on secondary reading and in our work on our web site, Evidence for ESSA, which summarizes research on all of elementary and secondary reading and math according to ESSA evidence standards. In the course of this work, I've noticed some interesting trends, with truly revolutionary implications.
The first is that reports of rigorous research are appearing very, very fast. In our secondary reading review, there were 64 studies that met our very stringent standards. Fifty-five of these used random assignment, and even the nine quasi-experiments all specified assignment to experimental or control conditions in advance. We eliminated all researcher-made measures. But the most interesting fact is that of the 64 studies, 19 had publication or report dates of 2015 or 2016, and 51 have appeared since 2011. This surge of recent, rigorous studies was greatly helped by the publication of many studies funded by the federal Striving Readers program, but Striving Readers was not the only factor. Seven of the studies were from England, funded by the Education Endowment Foundation (EEF). Others were funded by the Institute of Education Sciences at the U.S. Department of Education (IES), the federal Investing in Innovation (i3) program, and many publishers, who are increasingly realizing that the future of education belongs to those with evidence of effectiveness. With respect to i3 and EEF, we are only at the front edge of seeing the fruits of these substantial investments, as there are many more studies in the pipeline right now, adding to the continuing build-up in the number and quality of studies started by IES and other funders. Looking more broadly at all subjects and grade levels, there is an unmistakable conclusion: high-quality research on practical programs in elementary and secondary education is arriving in amounts we never could have imagined just a few years ago.
Another unavoidable conclusion from the flood of rigorous research is that in large-scale randomized experiments, effect sizes are modest. In a recent review I did with my colleague Alan Cheung, we found that the mean effect size for large, randomized experiments across all of elementary and secondary reading, math, and science is only +0.13, much smaller than effect sizes from smaller studies and from quasi-experiments. However, unlike small and quasi-experimental studies, rigorous experiments using standardized outcome measures replicate. These effect sizes may not be enormous, but you can take them to the bank.
In our secondary reading review, we found an extraordinary example of this. The University of Kansas has an array of programs for struggling readers in middle and high schools, collectively called the Strategic Instruction Model, or SIM. In the Striving Readers grants, several states and districts used methods based on SIM. In all, we found six large, randomized experiments and one large quasi-experiment (which matched experimental and control groups). The effect sizes across the seven studies ranged from a low of 0.00 to a high of +0.15, but most clustered closely around the weighted mean of +0.09. This consistency was remarkable given that the contexts varied considerably. Some studies were in middle schools, some in high schools, some in both. Some studies gave students an extra period of reading each day, some did not. Some studies went for multiple years, some did not. Settings included inner-city and rural locations, and all parts of the U.S.
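For readers curious about what a "weighted mean" effect size means in practice, here is a minimal sketch. It assumes simple sample-size weighting (reviews sometimes use inverse-variance weighting instead), and the effect sizes and sample sizes below are invented for illustration; they are not the actual SIM study data.

```python
# Hypothetical illustration: combining effect sizes from several studies
# into a single weighted mean, where larger studies count for more.
# All numbers below are made up for illustration only.

effect_sizes = [0.00, 0.05, 0.08, 0.09, 0.10, 0.12, 0.15]  # hypothetical ES per study
sample_sizes = [1200, 800, 1500, 2000, 900, 1100, 700]     # hypothetical N per study

# Weighted mean = sum of (effect size x sample size) / total sample size
weighted_mean = sum(es * n for es, n in zip(effect_sizes, sample_sizes)) / sum(sample_sizes)

print(f"Weighted mean effect size: {weighted_mean:+.2f}")
```

The point of weighting is that a study of 2,000 students tells us more than a study of 700, so its result pulls the average harder; an unweighted average would treat them equally.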
One might well argue that the SIM findings are depressing, because the effect sizes were quite modest (though usually statistically significant). This may be true, but once we can replicate meaningful impacts, we can also start to make solid improvements. Replication is the hallmark of a mature science, and we are getting there. If we know how to replicate our findings, then the developers of SIM and many other programs can create better and better programs over time with confidence that once designed and thoughtfully implemented, better programs will reliably produce better outcomes, as measured in large, randomized experiments. This means a lot.
Of course, large, randomized studies may also be reliable in telling us what does not work, or does not work yet. When researchers get zero impacts and then seek funding to try the same treatment again, hoping for better luck, they and their funders are sure to be disappointed. Researchers who find zero impacts may learn a lot, which may help them create something new that will, in fact, move the needle. But they then have to use those lessons to do something meaningfully different if they expect to see meaningfully different outcomes.
Our reviews are finding that in every subject and grade level, there are programs right now that meet high standards of evidence and produce reliable impacts on student achievement. Increasing numbers of these proven programs have been replicated with important positive outcomes in multiple high-quality studies. If all 52,000 Title I schools adopted and implemented the best of these programs, those that reliably produce impacts of more than +0.20, the U.S. would soon rise in international rankings, achievement gaps would be cut in half, and we would have a basis for further gains as research and development build on what works to create approaches that work better. And better. And then better still.
There is bipartisan, totally non-political support for the idea that America's schools should be using evidence to enhance outcomes. However a school came into being, whoever governs it, whoever attends it, wherever it is located, at the end of the day the school exists to make a difference in the lives of children. In every school there are teachers, principals, and parents who want and need to ensure that every child succeeds. Research and development does not solve all problems, but it helps leverage the efforts of all educators and parents so that they can have a maximum positive impact on their children's learning. We have to continue to invest in that research and development, especially as we get smarter about what works and what does not, and as we get smarter about research designs that can produce reliable, replicable outcomes. Ones you can take to the bank.
This blog is sponsored by the Laura and John Arnold Foundation.