For all the challenges facing the election polling industry, one of the most basic remains figuring out whether people will actually vote. Pollsters hoping to call elections correctly need to include Americans who will show up at the ballot box and screen out those who won't.
Those who probably won’t vote tend to favor different candidates than those who are more likely to. A new report from Pew Research shows that how pollsters determine which camp someone falls into can result in completely different predictions about who will win.
Recent elections show the hazards of erring toward either side. Pollsters including Gallup miscalled the 2012 election in large part because they didn't expect Obama's coalition of young and minority voters to turn out as strongly as it did. In the 2014 midterms, some pollsters included too many Democrats and missed a Republican wave.
That 2014 miss prompted Pew Research to dig into its own polling data to figure out what caused its polls to underestimate Republican support. The organization found that one of the biggest factors was how likely voters were identified. How pollsters made that call could change the estimate by up to 8 percentage points.
But there’s not much consensus on how best to identify likely voters. Pew’s newly released report details how the same polling data on 2014 House races can shift from a 2-point Democratic lead to a 6-point Republican lead, depending on how likely voters are chosen. The data itself wasn’t biased toward either party; the differences are all in the likely voter calculation.
In testing 14 commonly used ways to identify likely voters, Pew found that any attempt to screen for likely voters is an improvement over including all registered voters, or even all those who say that they intend to vote, "both of which include far too many people who ultimately will not cast a ballot," according to the report.
Not all methods were equally accurate, though. One major difference is between reaching interviewees through a method called “random digit dialing” (which, as the name suggests, involves calling randomly generated phone numbers) and “registration-based sampling,” which pulls respondents from a database of registered voters that includes information about their voting history. Methods that incorporated voter file data tended to produce the most accurate results.
Pew notes that while there's been more blurring of the lines in recent years, the public polls released by media and university pollsters tend to use random digit dialing before “narrowing down to prospective voters by asking people a series of questions that gauge interest in the election, past voting behavior and intention to vote.” Campaign pollsters, who usually work on behalf of private clients, tend to use the registration-based methods, looking at information in the databases on their interviewees' past voting behavior rather than relying on respondents to tell them about their voting history.
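To make the difference between the two approaches concrete, here is a minimal sketch of a likely voter screen. It is not any of the 14 methods Pew tested: the respondent fields, the scoring weights, the cutoff and the eight-person sample are all invented for illustration. It scores each respondent on interest, stated intention and past voting, then compares the margin when past voting comes from the respondent's own account with the margin when it comes from a voter-file record.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    choice: str               # "D" or "R"
    interest: int             # 0-2, self-reported interest in the election
    says_will_vote: bool      # stated intention to vote
    says_voted_before: bool   # self-reported past voting
    file_voted_before: bool   # past voting per a voter-file record (hypothetical field)


def screen_score(r: Respondent, use_voter_file: bool) -> int:
    """Hypothetical likelihood score; the weights are arbitrary, not Pew's."""
    past = r.file_voted_before if use_voter_file else r.says_voted_before
    return r.interest + int(r.says_will_vote) + 2 * int(past)


def margin(sample, cutoff: int, use_voter_file: bool = False) -> float:
    """Democratic minus Republican share, in percentage points,
    among respondents whose score is at or above the cutoff."""
    likely = [r for r in sample if screen_score(r, use_voter_file) >= cutoff]
    if not likely:
        raise ValueError("no respondents pass the screen")
    dem = sum(r.choice == "D" for r in likely)
    rep = sum(r.choice == "R" for r in likely)
    return 100.0 * (dem - rep) / len(likely)


# Made-up respondents, not real polling data.
SAMPLE = [
    Respondent("D", 2, True, True, True),
    Respondent("D", 1, True, True, False),   # says they voted; file disagrees
    Respondent("D", 0, True, False, False),
    Respondent("D", 1, False, False, False),
    Respondent("R", 2, True, True, True),
    Respondent("R", 1, True, True, True),
    Respondent("R", 2, True, False, True),   # forgot a past vote; file has it
    Respondent("D", 1, True, True, False),   # says they voted; file disagrees
]

print(margin(SAMPLE, cutoff=0))                       # everyone: 25.0
print(margin(SAMPLE, cutoff=4))                       # self-reported history: 20.0
print(margin(SAMPLE, cutoff=4, use_voter_file=True))  # voter-file history: -50.0
```

The swing in this toy sample is exaggerated on purpose. The point is only that the same set of interviews can produce a lead for either party depending on which respondents the screen keeps.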
The bad news is that there’s no guarantee the same likely voter decisions that worked best in 2014 will work best in 2016 or other future elections. Pew’s conclusion is that choosing likely voters “is likely to continue to vex pollsters, especially if no official record of past voting is available as an input to the models.”