There is a very old joke that goes like this:
What's the second-worst thing to find in your apple? A worm.
What's the worst? Half a worm.
The ESSA evidence standards provide clearer definitions of “strong,” “moderate,” and “promising” levels of evidence than have ever existed in law or regulation. Yet they still leave room for interpretation. The problem is one of balance: define “evidence-based” too narrowly and too few programs will qualify, but define it too broadly and the term loses its meaning.
We've already experienced what happens with a too-permissive definition of evidence. In No Child Left Behind, "scientifically-based research" was famously mentioned 110 times. Its impact, however, was minimal, because everyone soon realized that the term "scientifically-based" could be applied to just about anything.
Today, we are in a much better position than we were in 2002 to insist on relatively strict evidence of effectiveness, both because we have better agreement about what constitutes evidence of effectiveness and because we have a far greater number of programs that would meet a high standard. The ESSA definitions are a good consensus example. Essentially, they define programs with "strong evidence of effectiveness" as those with at least one randomized study showing positive impacts using rigorous methods, and "moderate evidence of effectiveness" as those with at least one quasi-experimental study. “Promising” is less well-defined, but requires at least one correlational study with a positive outcome.
This is where the half-a-worm concept comes in: we should not stretch the definition of "evidence-based" any further. For example, ESSA also recognizes programs supported only by a "strong theory." To me, that goes too far and begins to water down the concept. What program in all of education could not justify a "strong theory of action"?
Further, even in the top categories, there are important questions about what qualifies. In school-level studies, should we insist on school-level analyses (e.g., HLM)? Every methodologist would say yes, as do I, but ESSA does not specify this. Should we accept researcher-made measures? I say no, based on a great deal of evidence indicating that such measures inflate effects.
Fortunately, due to investments made by IES, i3, and other funders, the number of programs that meet strict standards has grown rapidly. Our Evidence for ESSA website (www.evidenceforessa.org) has so far identified 101 PK-12 reading and math programs, using strict standards consistent with ESSA definitions. Among these, more than 60% meet the “strong” standard. There are enough proven programs in every subject and grade level to give educators real choices. And we add more each week.
This large number of programs meeting strict evidence standards means that insisting on rigorous evaluations, within reason, does not leave us with too few programs to choose among. We can have our apple pie and eat it, too.
I'd love to see federal programs of all kinds encourage the use of programs with rigorous evidence of effectiveness. But I'd rather see a few programs that meet a strict definition of “proven” than a lot of programs that meet only a loose one. Twenty good apples are much better than applesauce of dubious origins!
This blog is sponsored by the Laura and John Arnold Foundation