The HuffPost presidential forecast model uses state and national polls from the HuffPost Pollster database to predict who will become the next commander in chief of the U.S. The process averages the polls, estimates the probable outcome in each state and then calculates the likelihood that the leading candidate wins at least 270 electoral votes on Nov. 8.
Step 1: Poll Averaging
We estimate the probability of a win in each state’s presidential race by using Pollster’s Bayesian Kalman filter model to average publicly available polls in the HuffPost Pollster database. Briefly, Kalman filter models combine “noisy” data ― which is not completely precise ― into a single estimate of the underlying “signal” ― that is, what’s actually happening. For HuffPost, that means the model looks for trends in the polls and produces its best estimate of the polling average.
That model runs 100,000 simulations of the polling data to find the most likely polling average. The model needs starting values and information that tells the simulations how to work. These are the “priors” for the model. Many Bayesian models ― including the Pollster averaging model as it’s implemented for our charts ― use “uninformed” priors that don’t affect the model or provide any background information.
However, we do use information from previous elections in these priors to make predictions in our presidential model. The model is predicting vote share proportions for each candidate, so we need information on how elections have turned out in the past. We use Cook Political Report’s ratings for current and past elections to create our priors.
The values for our 2016 priors are based on an analysis of state-level Cook presidential race ratings issued in July or August of the election years 2004 through 2012. We pooled all presidential races rated “toss-up” from 2004 to 2012 and calculated the average and standard deviation of the actual vote proportions for each candidate. Then we did the same calculations for races rated “solid Democrat,” “solid Republican,” “likely Democrat,” “likely Republican,” “lean Democrat” and “lean Republican” ― all of the different Cook ratings.
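The pooling described above amounts to a group-by followed by a mean and standard deviation. Here is a minimal sketch in Python. The function name `priors_by_rating` and the input layout are assumptions made for illustration, and the vote shares are placeholder values, not real election results.

```python
from collections import defaultdict
from statistics import mean, stdev


def priors_by_rating(races):
    """Pool races by Cook rating and summarize one candidate's vote share.

    races: iterable of (rating, vote_share) pairs, one per past state race.
    Returns {rating: (mean_share, std_share)}.
    """
    grouped = defaultdict(list)
    for rating, share in races:
        grouped[rating].append(share)
    # stdev needs at least two races per rating.
    return {rating: (mean(shares), stdev(shares))
            for rating, shares in grouped.items()}


# Placeholder shares for illustration only, NOT actual results.
example_races = [
    ("toss-up", 0.49), ("toss-up", 0.51), ("toss-up", 0.50),
    ("solid Democrat", 0.60), ("solid Democrat", 0.62),
]
priors = priors_by_rating(example_races)
print(priors["toss-up"])
```

The same calculation would be repeated for the other candidate's vote shares, giving one prior mean and spread per candidate for each rating.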
These priors start the simulations, and then polling data is incorporated to make the estimates more precise. The priors typically become inconsequential once the polling data is added, but the information is helpful if there aren’t very many polls. The model begins running simulations to calculate a candidate’s estimates on the first date of the first poll. It incorporates the polls available for each subsequent day, pulling in additional surveys as it continues toward the current date — at which time all of the polls meeting HuffPost’s criteria are being considered. Newer polls are more influential in a given day’s average than older polls, because older polls are inherently less reliable, more uncertain measures of the current state of the race.
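To show how a prior gets overwhelmed by polling data, here is a simplified sketch of a one-dimensional Kalman filter in Python. It is not the Pollster model, which is Bayesian and estimated by simulation. This version uses the closed-form update for a single candidate's share, and the function name, parameters and input numbers are all assumptions chosen for illustration.

```python
def kalman_average(observations, obs_var, process_var, prior_mean, prior_var):
    """Filter daily poll readings into a running estimate of the true share.

    observations: one entry per day, a poll share or None if no poll that day.
    obs_var: assumed variance (noise) of a single poll.
    process_var: assumed day-to-day variance of true opinion.
    Returns a list of (mean, variance) tuples, one per day.
    """
    mean, var = prior_mean, prior_var
    estimates = []
    for y in observations:
        # Predict step: opinion may drift, so uncertainty grows each day.
        var += process_var
        if y is not None:
            # Update step: weight the new poll by how uncertain we were.
            gain = var / (var + obs_var)
            mean += gain * (y - mean)
            var *= (1.0 - gain)
        estimates.append((mean, var))
    return estimates


# Hypothetical inputs: a vague prior centered on 50 percent, polls on days 1 and 3.
estimates = kalman_average([0.48, None, 0.52], obs_var=0.0004,
                           process_var=0.0001, prior_mean=0.5, prior_var=0.01)
for day, (m, v) in enumerate(estimates, start=1):
    print(day, round(m, 4), round(v, 6))
```

With a wide prior, the first poll pulls the estimate almost all the way to the polled value, which is why priors matter mainly when polls are scarce. On the day without a poll the variance grows, and because that growth compounds day by day, older polls end up counting for less than newer ones.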
In states where there are fewer than five available polls in 2016 or fewer than two polls since July 2016, we use Cook Political Report ratings to estimate where the race stands.
Step 2: Estimating The Probable Election Day Outcome For Each State
The individual state probabilities are produced relatively simply from the state’s poll average and how undecideds in the state might affect the outcome.
The HuffPost Pollster charts stop on the current date, but for this forecast we run the simulations out to Election Day, Nov. 8. Since we don’t have polling data for the future, the model assumes voter intentions generally continue along their current trajectories. But without new data, the outcomes of the races get less certain as time goes on, meaning the leading candidate’s probability of winning goes down.
We also incorporate how undecided voters might affect the outcome. At the state level, we assume that a third of the undecided voters won’t vote, but the other two-thirds might. So we add two-thirds of today’s undecided proportion to the state’s uncertainty by increasing the margin of error.
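One way to picture the undecided adjustment is the Python sketch below. It rests on several assumptions the text does not spell out: that two-thirds of the undecided share is added directly to a 95 percent margin of error, that the candidate's lead is roughly normally distributed, and that the win probability is the chance the lead ends up above zero. The function name `state_win_probability` is hypothetical.

```python
from statistics import NormalDist


def state_win_probability(lead, margin_of_error, undecided):
    """Chance the leader stays ahead once undecideds widen the uncertainty.

    lead: leader's share minus the opponent's share, as a proportion.
    margin_of_error: assumed 95 percent margin of error on the lead.
    undecided: share of respondents who are undecided today.
    """
    # Assumption: two-thirds of undecideds are added straight to the margin.
    adjusted_moe = margin_of_error + (2.0 / 3.0) * undecided
    # Convert a 95 percent margin into a standard deviation.
    sd = adjusted_moe / 1.96
    # Probability that the true lead is greater than zero.
    return 1.0 - NormalDist(mu=lead, sigma=sd).cdf(0.0)


# Hypothetical state: 4-point lead, 3-point margin, with and without undecideds.
print(round(state_win_probability(0.04, 0.03, 0.00), 3))
print(round(state_win_probability(0.04, 0.03, 0.09), 3))
```

The same lead produces a lower win probability when more voters are undecided, which is the effect the adjustment is meant to capture.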
Step 3: Estimating The Electoral College Outcome
Finally, we simulate the election 10 million times using the state-by-state averages plus information from national polls and correlations between the states.
For the national election simulation, we assume (again) that one-third of undecided voters won’t vote. But instead of just adding in the undecideds from each state, we consider the undecideds at the national level as well. We add one-third of the state-level undecideds and one-third of the national undecideds to the state’s margin of error. National values are calculated using the same poll-averaging model to average polls from the most accurate pollsters in 2012.
We use the national numbers to increase uncertainty since there’s a possibility the polls will be wrong. In past presidential elections, polls have been off from the actual results by around 3-4 percentage points, but the error can vary significantly by year. So instead of using past values, we use variance in this year’s national polls to quantify by how much the polls might miss the election results. This assumes that the error in polls is correlated with how widely the polls vary leading up to Election Day ― which it typically is, according to comparisons of past presidential polls and outcomes. We add the variance of the national average into the variance for the electoral vote counts.
With these adjustments in place, we simulate the election using Monte Carlo simulations. In a typical simulation, the computer would pick a random number representing one possible outcome of the race, then compare that number to the probability of Hillary Clinton winning in that state. Then a different random number would be selected for the next state.
However, we know state outcomes aren’t independent in a national election. What happens in Florida, for example, is very likely related to what happens in North Carolina. To account for this, we first calculated the expected correlations among the state-by-state averages by finding the correlations of the Democratic vote shares in each state from 1932 to 2012. Then, instead of generating independent random numbers for each state, we tell the model to generate correlated random numbers that follow the historical pattern. This process forces the state outcomes to be correlated, but still reflect what the polling in that state says is happening.
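A common way to generate correlated random numbers between 0 and 1 is to draw correlated normal values and pass them through the normal cumulative distribution function. The Python sketch below shows that technique. Whether the forecast uses exactly this method is an assumption, the helper names are hypothetical, and the 0.8 correlation is a placeholder rather than a value from the historical data.

```python
import random
from math import sqrt
from statistics import NormalDist


def cholesky(matrix):
    """Lower-triangular factor L with L times its transpose equal to matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def correlated_uniforms(lower, rng):
    """One draw per state, each in [0, 1], correlated across states."""
    n = len(lower)
    # Independent standard normal draws, one per state.
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Mixing them with the Cholesky factor induces the target correlation.
    mixed = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    # The normal CDF maps each value back onto the 0-to-1 scale.
    std_normal = NormalDist()
    return [std_normal.cdf(x) for x in mixed]


# Placeholder correlation between two hypothetical states.
corr = [[1.0, 0.8], [0.8, 1.0]]
lower = cholesky(corr)
rng = random.Random(2016)
print(correlated_uniforms(lower, rng))
```

Each state's number is still uniformly distributed on its own, so comparing it with that state's win probability behaves exactly as it would with an independent draw. Only the tendency of states to move together changes.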
The correlated numbers generated for each state are compared to the probability of Clinton winning in that state. If the number is lower than or the same as the probability, that “spin” counts as a Clinton win and awards that state’s electoral votes to her. Otherwise it’s a Trump win, and he gets the electoral votes. For example, if Clinton has a 35 percent chance of winning in Florida according to the model, a random number from 0 to 35 would award Florida’s 29 electoral votes to Clinton, but a number from 36 to 100 would give them to Trump.
The model does these calculations across all the states, then adds up the number of electoral votes for each candidate in each election simulation. The proportion of times Clinton wins 270 or more electoral votes is the probability Clinton becomes president. The proportion of times Trump wins 270 or more electoral votes is the probability Trump becomes president. The probability of a tie, which would throw the election to the House of Representatives, is the proportion of times Clinton and Trump both receive 269 electoral votes.
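The tallying step can be sketched in a few lines of Python. For brevity this version draws an independent random number for each state, whereas the forecast described above uses correlated draws. The function name, the two-state map and the probabilities are hypothetical placeholders.

```python
import random


def simulate_election(win_probs, electoral_votes, n_sims, rng):
    """Estimate each candidate's chance of an Electoral College majority.

    win_probs: {state: probability that Clinton wins the state}.
    electoral_votes: {state: electoral votes}.
    Returns the proportion of simulations won by each side, plus ties.
    """
    total = sum(electoral_votes.values())
    needed = total // 2 + 1  # 270 when the total is 538
    clinton_wins = trump_wins = ties = 0
    for _ in range(n_sims):
        # A draw below the state's probability counts as a Clinton win.
        # (For continuous draws, "below" and "at or below" are equivalent.)
        clinton_ev = sum(ev for state, ev in electoral_votes.items()
                         if rng.random() < win_probs[state])
        trump_ev = total - clinton_ev
        if clinton_ev >= needed:
            clinton_wins += 1
        elif trump_ev >= needed:
            trump_wins += 1
        else:
            ties += 1  # both candidates hold exactly half the votes
    return {"clinton": clinton_wins / n_sims,
            "trump": trump_wins / n_sims,
            "tie": ties / n_sims}


# Toy two-state map with placeholder numbers, not forecast values.
result = simulate_election({"A": 0.35, "B": 0.70}, {"A": 29, "B": 29},
                           n_sims=10_000, rng=random.Random(8))
print(result)
```

In the toy map each state carries the same number of electoral votes, so a split produces a tie, the analogue of a 269-269 result.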
The full forecast can be viewed here.