Daniel Koretz’ The Testing Charade: Pretending to Make Schools Better may be the best book on testing since his Measuring Up: What Educational Testing Really Tells Us. We should all be grateful to Koretz’ editor who told him to stop “pulling your punches.” That’s why The Testing Charade “finally” uses “honest adjectives to describe the harm high-stakes testing has done to students and teachers.”
That being said, Koretz had been correct to use “carefully measured” academic language in his earlier discussions of education policy. Since he was such a respected scholar, even the most true-believing, accountability-driven reformers had to listen to Koretz’ advice. He also had to be diplomatic in order to negotiate access to data that school systems carefully guard, and advise superintendents and other education leaders. In some of the most valuable parts of the book, Koretz is thus able to explain the edu-politics that created a testing regime that remains “Beyond All Reason.” (Emphasis is Koretz’)
These conversations illustrate why Koretz had to conclude his analysis with a reminder that thirty years ago he and other social scientists warned that test-based accountability “wouldn’t succeed.” The stakes attached to tests were much smaller back then, but he predicted that even those milder accountability systems would “face only three options: cheat, find other ways to cut corners, or fail.” However, neither Koretz nor anyone else “predicted just how extreme the failures of test-based reform would be.” He didn’t anticipate cheating on the scale that it occurred. He expected bad test prep, but he “didn’t expect states and districts would openly peddle it to their teachers.”
Koretz understood that instruction would be displaced by test prep. But he writes mournfully:
I didn’t foresee just how much time testing and test prep would swallow or that filling students’ time with interim tests and test prep would become the new normal. And I didn’t foresee that test-based accountability would fundamentally corrupt the notion of good teaching, to the point where many people can’t see the difference between test prep and good instruction.
Finally, Koretz predicted test score inflation but he didn’t anticipate such “jaw-dropping” gamesmanship. And the “absurdity” of evaluating teachers the way that the Duncan administration and Gates Foundation pushed never occurred to him. Koretz was willing to provide expert witness testimony for the lawsuit challenging Florida’s evaluation law, where teachers were held accountable for outcomes of students whom they didn’t teach, but he notes that “any reasonably intelligent fourth-grader would recognize that Florida’s policy is absurd.”
The book’s first invaluable sets of insights come from Koretz’ account of trying to advise policy-makers, such as Arne Duncan and edu-philanthropy staff members. The pushback from these sincere non-educators often was angry and/or condescending. Staffers would often jump into the weeds of their policy planning but remain uninterested in the social science which should have informed their policies. Not understanding that the setting of unachievable goals for data-driven growth would create “perverse incentives,” reformers just made their targets up. Perhaps most alarming, top policy makers brushed off warnings about the unintended harm their metrics would produce.
The part of The Testing Charade that taught me the most was Koretz’ accounts of trying to enlighten school leaders. His proposals for studying the validity of scores were often brushed off by districts that handed that duty to their consultants. Worse, Koretz offered an inspired opportunity for assessing the damage that would be done by test prep. He provided 125 superintendents with a packet of test prep lessons. All were “boring as sin.” Some would jack up scores, but without providing any redeeming benefit. Some were “potentially OK,” and some were “completely unacceptable.” He asked the administrators to classify the different test prep methods as good or bad.
About a third of the superintendents “got the point,” and would say things like “if you think this is bad, you should see what goes on in some of our schools.” Most “labeled virtually everything as ‘good,’” and many became visibly angry.
A different but comparable pattern emerged when the experiment was conducted with former teachers, who were no longer pressured to engage in trickery. He was told that “this exercise would make no sense to many young teachers.” A former teacher said that the bad test prep was “precisely the sort of thing that she had been told explicitly is good instruction.” Koretz’ students told him that he was misunderstanding test prep as something that competes with good instruction, because, “in their experience, raising scores had become the end goal, the mark of a ‘good’ teacher.” (Emphasis is Koretz’)
The book’s most dire prediction, the worst case where “the test becomes the curriculum,” has come true. But some of today’s teachers see that transgression as a virtue. Today’s teachers are often taught that the test “should define what they teach.”
In my experience with Oklahoma City in the 1990s, testing wasn’t a burden. Students and teachers told horror stories about teach-to-the-test in the 1980s, but we’d survived that. The district’s testing experts were candid about the inevitability of No Child Left Behind corrupting data by replacing norm-referenced tests with criterion-referenced tests. NCLB didn’t have that much teeth to it, mostly creating the prospect of one headline after another proclaiming “failed schools.” But I was invited to share an analysis with superintendents about the Ohio NCLB plan. Because of that state’s Electoral College votes in the upcoming 2004 election, it would be given the most leeway for “kicking the can down the road.” We were told that our job was to delay the day of test-driven judgment until western Republican governors demanded that NCLB be amended.
In my experience, NCLB derailed the real improvement my poor district had been achieving. The real damage was done when we briefly endured a rookie superintendent from the Broad Academy, and when graduation exams and the Obama administration started to hold individuals accountable. But I also understood that the less punitive reforms in the 1990s paved the way for the panic inspired by NCLB, and the Bush administration reforms wore educators down, leading to the disasters of the Obama years. Koretz helps this saddened but incorrigible Obama supporter understand these patterns more deeply.
I especially welcome Koretz’ accounts of why the setting of unreasonable targets opened the door for the worst episodes of “jukin the stats.” Clearly they encouraged systems to collaborate with consultants “who earned their living selling tests.” The Testing Charade is especially timely now that states are crafting the new ESSA plans. In perhaps his most heretical recommendation, Koretz says that goals should be doable in the real world.
Even though it will make me sound like a dinosaur to say so, I especially appreciate Koretz’ challenge to narrowing the curriculum and to the curriculum alignment and pacing practices that we used to ridicule. As educators used to be able to say, those guidelines encourage “in one ear and out the other,” skin-deep instruction. Instead, our diverse students “need measures that are not too closely aligned with each other.” And Koretz is also wise in saying, “we need to curtail sharply the use of the ‘interim’ or ‘benchmark’ assessments.” One of the most tragic debacles I have witnessed during the NCLB era was prompted by benchmark testing. About 40% of our students who were subject to benchmarking dropped out, and about half of our instructional time was wasted on those assessments. (Emphasis is Koretz’)
Aside from the question as to what produced Trumpism, I’m most dismayed and perplexed by the question of why we haven’t rejected test-based accountability. I’m grateful to Koretz for helping me understand the answer, but I have to admit that the truth hurts worse than I expected. One of his answers involves the multiple ways that systems and policy-makers have sabotaged efforts to study the outcomes that reforms produced. It’s not just hubris but also self-interest that keeps reformers from addressing the real question. Reformers make exaggerated claims about the gains produced by test-based policies but, more importantly, they ignore the “massive” damage they inflicted. If we faced facts, “It would be hard to justify continuing an approach that does so much damage while creating so little benefit.” So, what does it say about us when we stay the course?
Essentially, we’ve dumped toxic testing on our children. Due to the Opt Out movement, and with the ESSA, the demand that this poison be spilled in our buildings has been countered. The question is whether educators will engage in a meaningful cleanup, and restore the environment so that meaningful learning can flourish.
Koretz closes with nine smart principles, including: “Pay Attention to Context;” “Don’t Expect Schools to Do It All;” “Accept the Need for Human Judgment;” and Linda Darling-Hammond’s advice to “Stop Just Kicking the Dog Harder.” As I’ve indicated, I especially appreciate his recommendation, “Set Reasonable Targets,” and I would think that even reformers should be willing to “Monitor More than Student Achievement,” and “Create Counterbalancing Incentives.” One of Koretz’ recommendations is, “Monitor, Evaluate, and Revise.” In other words, policy makers should “try out,” as opposed to just “try” new things. Instead of pretending that we know all the answers, educators should evaluate experiments and make adjustments based on what happens in the real world.
I’m afraid Koretz also answers the question of why schools mostly refused to change course in a meaningful manner. Many educators have only taught in an age of data-driven reform. Many have never worked in an environment where holistic teaching and learning were widely practiced. My hope is that a new generation of teachers, who experienced test-based pedagogies during their public school careers, will lead a meaningful transformation. Having experienced reform in schools where children were treated like test scores, the next generation of educators is more likely to “Pay Attention to Other Important Stuff,” like our children.