I knew in advance that things would be weird at this debate, symposium, whatever with William Sanders. About three years ago in my Phi Delta Kappan research column, I summarized an article critical of Sanders' value-added assessment model. Sanders visited upon me a two-hour telephone explanation-harangue of why I was wrong. He visited upon the editors a lonnnggg letter. On learning that I would be the other principal speaker, Sanders insisted that he have the last word. Otherwise, he wouldn't come. Can you accept these terms, my host at North Carolina State wanted to know. I said I could.
Sanders made a presentation that had virtually nothing to do with what he is known for: using value-added assessment (VAA) to determine which teachers are effective. I had been less than enamored of this from the beginning, since his first model used off-the-shelf items from McGraw-Hill's CTBS. Wait, you're using norm-referenced test items to pass judgment on teachers? Oh, please. In this talk, though, he did not consider the background knowledge of the listeners, most of whom were teachers hearing about value-added for the first time, and one could almost see the bullets of jargon zipping past their ears.
A value-added model tests students at the beginning of the year and at the end. The change in test scores over the year is the "value" that has been added. The question then becomes: how much of this added value does the teacher account for (as opposed to what is added by parents, community, etc.)?
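The core arithmetic is simple enough to sketch. Here is a toy illustration, with invented student names and scores, of the gain computation described above; real models such as TVAAS layer mixed-effects statistical machinery on top of this:

```python
# The "value added" for each student is simply the end-of-year score minus
# the beginning-of-year score; a naive class-level estimate is the mean of
# those gains. (All numbers are hypothetical.)

fall = {"Ana": 42, "Ben": 55, "Cal": 38, "Dee": 61}
spring = {"Ana": 50, "Ben": 58, "Cal": 49, "Dee": 63}

gains = {name: spring[name] - fall[name] for name in fall}
class_value_added = sum(gains.values()) / len(gains)

print(gains)               # per-student gains: {'Ana': 8, 'Ben': 3, 'Cal': 11, 'Dee': 2}
print(class_value_added)   # naive class-level "value added": 6.0
```

The hard question, as noted above, is not computing the gain but deciding how much of it the teacher, rather than parents or community, accounts for.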
My points were these:
VAA makes more sense than the current successive-cohorts system for determining AYP (Adequate Yearly Progress). It makes more sense to follow kids over time, although if the goal remains 100% proficiency, the whole operation remains nuts.
VAA is circular: it defines effective teachers as those who raise test scores, then uses test score gains to determine who's an effective teacher.
Aside from Sanders, those working in VAA (Henry Braun, Howard Wainer, Dan McCaffrey, Dale Ballou, J. R. Lockwood, Haggai Kupermintz, from all of whom I had quotes) acknowledge that it cannot permit causal inferences about individual teachers. At best, it is a beginning step to identify teachers who might need additional professional development.
It is regressive in that it reinforces the idea that schools have teachers in boxes with 25 kids. Sanders claims his technique can deal with team-taught classes, but even if that is true, and he offered no data, it misses the dynamic of schools. As Kupermintz put it, "The TVAAS model represents teacher effects as independent, additive and linear. Educational communities that value collaborations, team teaching, interdisciplinary curricula and promote student autonomy and active participation may find [it of little use]. It regards teachers as independent actors and students as passive recipients of teacher 'effects'..." In fact, because estimates based on fewer students are shrunken toward the average, as class size gets smaller the TVAAS makes it harder for a teacher to look either outstanding or ineffective.
Sanders' model improperly assumes that educational tests form equal-interval scales. They do not, and no amount of finagling with item response theory will fix that. On a thermometer, a true equal-interval scale, the amount of heat needed to go from 10 degrees to 11 is the same as that needed to go from 110 to 111. On a test, it might require very different amounts of "achievement" to gain a point on different parts of the scale. Sanders believes that using NCEs cures this (ha). The model also presumes that the teacher "effect" persists -- like a diamond, it lasts undiminished forever. I'd like to run that by a few cognitive psychologists. And it presumes that academic achievement is unidimensional.
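To make the equal-interval point concrete, consider a hypothetical raw-score-to-scale-score conversion table (the numbers are invented for illustration, not from any actual test). Unlike degrees on a thermometer, one additional correct answer does not move the scale score by a fixed amount:

```python
# Hypothetical conversion table for a test whose reporting scale is NOT
# equal-interval: near the middle of the scale, one more correct answer
# moves the scale score a little; near the top, the same single item
# moves it a lot.
scale_score = {58: 500, 59: 503, 60: 506,   # middle of the raw-score range
               78: 610, 79: 625, 80: 650}   # top of the raw-score range

middle_jump = scale_score[59] - scale_score[58]  # one item is worth 3 points here
top_jump = scale_score[79] - scale_score[78]     # the same one item is worth 15 here
print(middle_jump, top_jump)
```

If a one-point gain means different things at different places on the scale, then averaging and comparing gains, which value-added models do constantly, treats unequal things as equal.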
And, perhaps most crucially, it presumes that students and teachers are randomly assigned to classes and overlooks that they are not. Many people choose a school by choosing where to live and within districts they sometimes choose a school other than the neighborhood school. Teachers with seniority get to choose what school or what classes they teach. They don't usually choose hard-to-teach kids. And parents exert pressure--here, parents kill to get their kids into Pat Welsh's high school writing classes. Big changes in test scores might well reflect these deviations from randomness as much as anything teachers do in their classrooms. Value-added models typically act as if this isn't important. It is.
Worst, even ignoring its failures, value-added might not give stable results. In an article in the Spring 2007 issue of the Journal of Educational Measurement, J. R. Lockwood and others found that using a test of mathematical procedures, they could generate a list of effective teachers. Using a test of math problem solving, they could generate another list of effective teachers. But they weren't the same lists!
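The instability is easy to see in miniature. This sketch ranks the same four teachers by mean student gain on two different math tests; the teacher labels and gain figures are invented to show the point, not taken from the Lockwood study:

```python
# Mean student gains per teacher on two different math tests (hypothetical).
procedures_gain = {"T1": 9.0, "T2": 4.0, "T3": 7.0, "T4": 2.0}
problem_solving_gain = {"T1": 3.0, "T2": 8.0, "T3": 5.0, "T4": 6.0}

def rank(gains):
    """Return teachers ordered from most to least 'effective' by mean gain."""
    return sorted(gains, key=gains.get, reverse=True)

print(rank(procedures_gain))       # ['T1', 'T3', 'T2', 'T4']
print(rank(problem_solving_gain))  # ['T2', 'T4', 'T3', 'T1']
```

Same teachers, same students, same subject -- but which teachers look "effective" depends on which test you hand out.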
Value-added is currently being oversold. At the Battelle for Children website, one could read, "Combining value-added analysis and improved high school assessments will lead to improved high school graduation rates, increased rigor in academic content, high college going rates, less college remediation and increased teacher accountability." And how many validity studies support these assertions?
Sanders' 15 minutes of last word was a rambling, illogical lecture of the type a father might visit on a prodigal son. The sponsors were embarrassed; the audience was pissed. At the reception that followed, Sanders for a while sorta took over a group I was talking with. I concluded that Sanders has an extremely limited yet extremely rigid idea of how schools work (his doctorate is in biostatistics, and he worked with the Atomic Energy Commission and in agriculture until the late '80s), rejects any conclusion counter to his own, and, in spite of his age, somewhere around 75, is as defensive as any novice.