Evaluating Evaluation

measuring tape of the tailor...

When discussing the current flock of evaluation tools, both for teachers and for schools, the defense always seems to work back around to this:

How are we going to know which teachers and schools are doing well or not? It's better than nothing, and we have to do something. 

It's a specious argument. If I'm on a sidewalk bleeding and broken and a random stranger approaches with a running chainsaw offering to chop away, I am not going to say, "Sure, go ahead. It'll be better than nothing." Very often things that are "better than nothing" are, in fact, worse than nothing, and I would argue that VAM (value-added modeling), for instance, is as destructive as any sidewalk chainsaw medical practice. And it's worth noting that reformsters never use the "better than nothing" argument to justify leaving an ineffective teacher or school in place.

That said, we cannot simply go to a system in which the taxpayers insert their money into a big, black box marked "School" and trust that everything inside the box is hunky-dory (a system used in public schools decades ago and in charter schools currently). "Take our word for it" is not an accountability system.

I have some ideas about how to run an evaluation system, but rather than push those today (still saving them for when I retire to start my million-dollar consulting business), let's do something else. We know there are lots of ideas out there about how to evaluate schools and teachers. How can we tell the good plans from the bad plans? What are the characteristics of a good evaluation system? How can we evaluate the evaluation?

Here are the traits that are essential for a useful, viable, good evaluation system.

Give the community voice

One of the challenges of evaluating schools and teachers is that we have about seventy-eight gadzillion ideas about what schools and teachers are supposed to be doing. Teachers often find themselves in the position of a person who thought she was hired to bake pies and finds herself in trouble for having ugly upholstery on her couch.

When it comes to the question of what the schools and teachers are supposed to be doing, the primary voice that must be heard is the voice of the community that the school serves. In other words, any evaluation system that involves outside folks or government officials coming into a community and saying, "Be quiet. We will tell you what your schools should be doing" is a crappy system.

Give a survey, form a committee, do regular outreach, put community leaders in positions of power-- whatever you do, your evaluation system must be based on the priorities set and chosen by all the members of the community. If some guy in the state capital or a policy-making group thinks those priorities are "wrong," that's tough. Welcome to democracy.

Embrace the chaos

Somewhere between the deeply sainted Mrs. McAwesometeach and the widely loathed Mrs. O'Suxbuckets are many teachers living in a greyer area (and, honestly, you can still find students who hated the former and who loved the latter). Teacher and school performance vary over time and from student to student, and the complex constellation of skills involved in teaching guarantees that there are a million different ways to be good at the job.

Any system that draws a hard, bright line between the effective and the ineffective is a crappy system, because no such line can be drawn. Can we find individuals on both extremes about whom we can have clear agreement? Probably. But building any system on the assumption that we can clearly and decisively sort every single school and teacher is a fool's game.

This includes systems that try to distribute teachers or schools on a bell curve. The bell curve guarantees that, even if all the teachers in a school are awesome, some of them will be labeled "sucky" by the system. This also includes any system that tries to reduce teacher ratings to a single score and then tries to create a cut-off line for those scores.

I know some of you want solid hard numberfied data on schools and teachers. You can't have it. You just can't. You can't go through your neighborhood and give each couple a hard data numerical rating of their marriage, and you can't give each of your children a hard data numerical rating on how swell they are and rank their family standing accordingly.

The fundamental basis of education is relationships. You can roughly sort into "probably keepers," "probably not keepers," and "somewhere in the middle." The more precise you attempt to make your system, the more mistakes your system is going to make and the more your system is going to warp and twist and generally screw up your school.

Neither carrots nor sticks, but helping hands

The purpose of an evaluation system is to make the school better. Isn't it? I mean, was there some other purpose I'm missing? No? Good.

The stack-ranking, reward-and-punish systems completely abdicate systemic responsibility for improvement. They are bosses that say, "Hey, something needs to be fixed here. You, buddy!! You figure out how to fix this right now, or else" or "Hey, this needs to work better. I've got a fiver here for the first person who can get it fixed for me." In both cases, the system sloughs off all responsibility for analyzing, addressing or ameliorating any problems-- it just pushes all of that off on the evaluatees. It is the world's worst coach-- "Hey, you suck. Get back in there and suck less, somehow."

An evaluation system should produce actionable recommendations, and it should result in the necessary assistance to pursue those actions. It does not help to say to a school, "Hey, you are too poor. Be less poor, will you?" Find the problem, and get help for the problem from the appropriate source.

That means that a good evaluation system must also value--

Richness over granules

We've been saying it over and over, but it needs to be repeated until policy makers act as if they get it: a single poorly-constructed narrow standardized test of reading and math does not give us any useful information about teacher performance.

When I was student teaching, my co-op worked with me daily, and my supervisor worked with me weekly. In my first year, that same supervisor worked with me monthly. He knew tons about me in the classroom, not just in terms of pedagogical techniquery, but how I interacted with different kinds of students in different sorts of situations. His knowledge of my skills (or lack thereof) was rich and deep, and resulted in direct coaching that was specifically tailored to me and my needs. It allowed us to address what I needed as a teacher quickly and effectively.

Compare that to someone handing me (or my principal) a bunch of student scores and saying, in effect, "Your kids' scores last year weren't good enough. Make sure they're better this year."

Information that is rich, deep, and personal is the key to driving meaningful improvement and growth. We know that to be true for students; why would it not be true for teachers and schools?

"Multiple measures" are generally a dodge by the same folks who believe that only numbers count as information. Multiple observations. Walk-throughs. Student and alumni interviews and questionnaires. Peer review. Gather a ton of information-- not data, but information. And then, the hardest part.


All of that information has to be weighed, sewn together, and judged by a live local human being.

I understand the desire to get human judgment out of the system. I am well aware that there are Jerks With Power out there, and that a big JWP can make an ungodly mess.

But you cannot create an unbiased system. You can't. Systems that are set in stone and automatically triggered by data points are just a faceless form of human bias, with every mechanized lever of the system an expression of the biases of the person who designed it (and who doesn't actually have to look anybody in the face when the system implements the designer's judgment).

You cannot take judgment out of the system. What you can do is take it out from behind the curtains and machinery that try to obscure it. What you can do is put it in the hands of professionals who understand their field well enough to put the work ahead of personal bias. What you can do is create a system where there are redundancies (many people's judgment is weighed) and counterbalances as simple as having to deliver your judgment face-to-face instead of through a digitized report from faceless software. What you can do is create a system where the people who exercise judgment have to own it.

Your goal is not a system devoid of human judgment, but a system where that judgment reflects the priorities of the community, the realities of the school, teachers and students, and the professionalism of the person making the call. Your goal is a culture of support and excellence and humanity, not one of data, punishment and fear.

Simple enough

Come up with a system that includes all of these features, and I think you may have something worthwhile, something that can actually help grow schools and teachers who are the best they can be. It will be far better than better than nothing.

Originally posted at Curmudgucation