The allegations of fraud leveled by Daily Kos founder Markos (Kos) Moulitsas and the analysis of Mark Grebner, Michael Weissman and Jonathan Weissman are compelling and troubling. As Doug Rivers wrote here earlier today, they demonstrate that "something is seriously amiss" in the Research 2000 data. All of us who care about polling data need to consider the larger issues raised by their analysis and their allegations.
The most urgent question a lot of non-statisticians have been asking is this: how damning is the evidence? The short answer is that some of the patterns uncovered by Grebner, Weissman and Weissman have no obvious explanation consistent with what passes for standard survey practice (even given the generous mix of art and science at work in pre-election polling). They demand a more complete explanation.
Of the patterns uncovered by Grebner et al., the easiest to describe to non-statisticians -- and for my money the most inexplicable -- involves the strange matching pairs of odd or even numbers. They examined the many cross-tabulations of results among men and among women posted to Daily Kos. If the result for any given answer category among men (such as the percentage favorable) was an even number, the result among women was also an even number. If the result among men was an odd number, the result among women was also an odd number. They found that strange consistency of odd or even numbers in 776 of 778 pairs of results that they examined.
Put simply, there is virtually no possibility that this pattern occurred by chance. Your odds of winning $27 million in the Powerball lottery tonight are vastly greater. Some automated process created the pattern. What that process was, we do not know.
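For readers who want a feel for the arithmetic, here is a rough sketch of how small that chance is. It rests on a simplifying assumption of mine, not a claim from the Grebner, Weissman and Weissman analysis: that each men/women pair would match in parity about half the time, independently of every other pair. The function name and the 50 percent figure are illustrative only.

```python
from fractions import Fraction
from math import comb, log10


def parity_match_tail(n_pairs: int, n_matches: int) -> Fraction:
    """Chance of seeing at least n_matches parity matches in n_pairs.

    Assumption (illustrative, not from the original analysis): each pair
    matches independently with probability 1/2.
    """
    # Count the ways to get n_matches, n_matches + 1, ..., n_pairs matches.
    favorable = sum(comb(n_pairs, k) for k in range(n_matches, n_pairs + 1))
    # Every one of the 2 ** n_pairs match/mismatch sequences is equally likely.
    return Fraction(favorable, 2 ** n_pairs)


# The counts reported above: 776 matching pairs out of 778 examined.
p = parity_match_tail(778, 776)

# The number is far too small to read as a decimal, so report its order
# of magnitude (math.log10 accepts arbitrarily large Python integers).
log10_p = log10(p.numerator) - log10(p.denominator)
print(f"log10 of the probability: {log10_p:.1f}")
```

Under that assumption the probability has well over 200 zeros after the decimal point. The true matching rate under ordinary survey practice may not be exactly one half, but no plausible adjustment closes a gap of that size.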
While there are many true statisticians who design samples and analyze survey data, very few do the kind of forensic data analysis that Grebner, Weissman and Weissman have presented. One universally respected expert in this field is University of Michigan Professor Walter Mebane (Disclosure: Mebane was my independent study advisor at Michigan 25 years ago). I emailed him last night for his reaction.
Mebane says he finds the evidence presented "convincing," though whether the polls are "fraudulent" as Kos claims "is unclear...Could be some kind of smoothing algorithm is being used, either smoothing over time or toward some prior distribution."
When I asked about the specific patterns reported by Grebner et al., he replied:
None of these imply that no new data informed the numbers reported for each poll, but if there were new data for each poll the data seems to have been combined with some other information---which is not necessarily bad practice depending on the goal of the polling---and then jittered.
In other words, again, the strange patterns in the Research 2000 data suggest they were produced by some sort of weighting or statistical process, though it is unclear exactly what that process was.
As such, I want to echo the statement issued this morning by the National Council on Public Polls calling for "full disclosure of all relevant information" about the Research 2000 polls in question:
"Releasing this information will allow everyone to make a judgment based on the facts," [NCPP President Evans] Witt added. "Failure to release information leaves allegations unanswered and unanswerable."
In the absence of that disclosure, and unless and until the parties have their day in court, it is also important that we give the Grebner, Weissman and Weissman analysis the respect it deserves and subject it to a thorough "peer review" online. It is all too easy to use a blog to lob sensational accusations at suspicious characters, especially when those accusations are grounded in subjects that are "all but impossible for a lay-person to be able to investigate" unless "you have a degree in statistics" (to quote our colleagues at The Hotline earlier today).
The courts have discovery and cross-examination; academic journals have a slow process of anonymous review. Online, we provide such review through reader comments and deeper analysis posted by "peers" who critique work in something much closer to real time. Examples I've seen already include the comments earlier today by Doug Rivers and the blog post by David Shor. Grebner et al. have made a compelling case, but it is vital that we kick the tires on their work before leaping to conclusions. Remember, the truly "full disclosure" that a lawsuit's discovery process will certainly provide may take months or even years to occur.
We will all have more to say on this subject in the days ahead, but for the moment, I want to echo a point Josh Marshall made yesterday. Research 2000 was not the creation of Daily Kos, nor was it the product of a business model built on ignoring the mainstream media and disseminating data over the internet. "They've been around for some time," Marshall wrote, "and had developed a pretty solid reputation." Their clients included local television stations plus the following daily newspapers (according to the Research 2000 web site): The Bergen Record, The Raleigh News & Observer, The Concord Monitor, The Manchester Journal Inquirer, The New London Day, The Reno-Gazette, The Fort Lauderdale Sun-Sentinel, The Spokesman-Review, and The St. Louis Post-Dispatch.
A colleague asked me yesterday about the "upshot of this situation, how bad is it going to be for the [polling] industry?" The answer depends on where the evidence leads us, of course, but the early implications are ominous. The polling industry cannot simply continue on a business-as-usual course. We must push for complete disclosure as a matter of routine and we need to develop better objective standards for what qualifies as a trustworthy poll.
PS: The Atlantic Wire's Max Fisher has a thorough summary of the first wave of online commentary on the DailyKos/Research 2000 controversy. I'd also recommend the short-but-sweet commentary from Washington Post pollster Jon Cohen:
However this dispute turns out, there's a new, blazing light on the rampant confusion about the right ways to judge poll quality. Saving the longer discussion, one thing is clear: to assess quality, one needs to know the facts. At this point, too little is currently known about the Daily Kos/Research 2000 poll to make definitive statements. (Research 2000 has a record of releasing more information about their polling than some other prolific providers.)