A Macroscopic View of Models

PoliStat | Nov. 3, 2018, 1:01 p.m.

ORACLE and FiveThirtyEight give the Democrats a similar chance of taking over the House. Beyond that headline number, how do the two models’ predictions differ?

We sampled predictions from ORACLE and from FiveThirtyEight’s Classic Model for October 30th, 2018. We then normalized FiveThirtyEight’s vote shares to include only the Republican and Democratic candidates so that they would be comparable to ORACLE’s. Below are the twenty districts with the greatest vote share differences between the two forecasts.


CA-05: 0.222626
CA-32: 0.146246
CA-19: 0.106554
OH-07: 0.066015
KS-04: 0.063636
WV-01: 0.062063
IL-08: 0.061607
VA-09: 0.060538
IL-17: 0.059798
IN-09: 0.058255
CO-02: 0.057170
IN-03: 0.056988
MA-06: 0.056951
IN-01: 0.056602
MA-02: 0.054880
CA-52: 0.054259
IL-10: 0.053880
GA-02: 0.053605
PA-11: 0.053488
CO-04: 0.053068
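The normalization and ranking behind this list can be sketched roughly as follows. This is a minimal illustration, not the code we ran: the function names, the dictionary layout, and the "XX-" districts are all hypothetical, and the difference is assumed to be FiveThirtyEight minus ORACLE so that negative values mean ORACLE predicts the higher Democratic share.

```python
def two_party_share(dem, rep):
    """Rescale the Democratic share so that the Democratic and
    Republican shares alone sum to 1 (third parties are dropped)."""
    total = dem + rep
    if total <= 0:
        raise ValueError("two-party total must be positive")
    return dem / total


def largest_differences(oracle, fte, n=20):
    """Rank districts by absolute difference in Democratic two-party share.

    oracle: dict mapping district -> Democratic two-party share (assumed layout)
    fte:    dict mapping district -> (raw Democratic share, raw Republican share)
    Returns a list of (district, signed difference), largest magnitude first.
    """
    diffs = {}
    for district, (dem, rep) in fte.items():
        if district in oracle:
            # Assumed sign convention: FiveThirtyEight minus ORACLE.
            diffs[district] = two_party_share(dem, rep) - oracle[district]
    ranked = sorted(diffs.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:n]


# Made-up numbers for illustration only; these are not real forecasts.
oracle = {"XX-01": 0.50, "XX-02": 0.60, "XX-03": 0.45}
fte = {"XX-01": (0.49, 0.49), "XX-02": (0.40, 0.40), "XX-03": (0.45, 0.45)}

for district, diff in largest_differences(oracle, fte, n=2):
    print(district, round(diff, 6))
```

Sorting by absolute value while keeping the signed difference preserves the direction of each disagreement for later analysis.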


Before anything else, we must filter out anomalies in the data. FiveThirtyEight’s CA-05 forecast is incorrect: their raw CSV file predicts a vote share of 0.9998 for the Democrat, but the site shows 0.804. Additionally, some districts have multiple candidates from the same party running, most notably LA-01. We exclude these districts from the rest of our analysis.
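One way to apply this filter is sketched below. It assumes the forecast rows are available as (district, party, share) tuples, which is a hypothetical layout rather than the actual structure of either data source, and the shares shown are placeholders.

```python
from collections import Counter


def districts_to_exclude(candidates, known_bad=()):
    """Return the set of districts to drop before comparing forecasts.

    candidates: iterable of (district, party, share) rows (assumed layout)
    known_bad:  districts flagged by hand, e.g. for a data error
    """
    # Count how many candidates each party fields in each district.
    counts = Counter((district, party) for district, party, _ in candidates)
    multi_candidate = {
        district for (district, _party), n in counts.items() if n > 1
    }
    return multi_candidate | set(known_bad)


# Placeholder rows for illustration; the shares are not real forecast values.
rows = [
    ("LA-01", "R", 0.5),
    ("LA-01", "R", 0.2),
    ("LA-01", "D", 0.3),
    ("XX-01", "D", 0.5),
    ("XX-01", "R", 0.5),
]

excluded = districts_to_exclude(rows, known_bad=["CA-05"])
kept = [row for row in rows if row[0] not in excluded]
print(sorted(excluded))
```

Keeping the hand-flagged districts in an explicit argument makes the manual exclusions visible instead of burying them in the data.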

Let’s first examine the distribution of vote share differences. Negative differences indicate that ORACLE predicts a higher vote share for Democrats, and vice versa.

As expected, the differences are centered around zero. The data are slightly skewed left: given the sign convention above, the largest disagreements tend to fall on the side where ORACLE predicts the higher Democratic vote share. Beyond that, the other distinct feature is the spike in values at zero, which can be explained by the districts with a single candidate. Since that candidate faces no challenger, both ORACLE and FiveThirtyEight assign a vote share of 1.0, so the difference is exactly zero.
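The direction of the skew can also be checked numerically rather than by eye. The sketch below computes the moment-based sample skewness (third central moment divided by the 1.5 power of the second) on made-up differences; the function name and the numbers are illustrative assumptions, not our data.

```python
def skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5.

    Negative results indicate a longer left tail, positive results a
    longer right tail, and zero a symmetric sample.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


# Made-up differences for illustration: one large negative value, several zeros.
differences = [-0.03, 0.0, 0.0, 0.01, 0.01, 0.01]

print(skewness(differences) < 0)
print(sum(1 for d in differences if d == 0.0), "districts with zero difference")
```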

Next, let’s examine possible explanations for these vote share differences. Although FiveThirtyEight does not publish their model’s source code, they undoubtedly incorporate polling in their model. ORACLE’s model assigns more weight to polling in districts with more polls; this also appears to be true of FiveThirtyEight’s. This motivates us to examine how the number of polls in a district affects the difference between the two predictions. Each point represents a district.

There is a clear relationship between the number of polls and the variance of the vote share differences: the more polls used for a district, the closer the two models’ predictions are. It is worth noting that ORACLE and FiveThirtyEight use the exact same polls. Although we have fewer data points at higher poll counts, they visibly cluster closer to zero. So what does this mean? The other aspects of the models (prior elections, district fundamentals, etc.) seem to differ more between the two models. As polling gains greater weight with more polls, the models’ overall predictions become more similar.
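The narrowing spread can be quantified by grouping districts by poll count and measuring the spread of the differences within each group. The following is a minimal sketch under the assumption that each district is reduced to a (number of polls, difference) pair; the helper name and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev


def spread_by_poll_count(records):
    """Population standard deviation of differences for each poll count.

    records: iterable of (num_polls, difference) pairs, one per district
    Returns a dict ordered by increasing poll count.
    """
    groups = defaultdict(list)
    for num_polls, diff in records:
        groups[num_polls].append(diff)
    # A group with a single district has a spread of 0.0 by definition.
    return {count: pstdev(diffs) for count, diffs in sorted(groups.items())}


# Made-up pairs for illustration only.
records = [
    (0, -0.06), (0, 0.06), (0, 0.0),
    (3, -0.01), (3, 0.01),
]

for count, spread in spread_by_poll_count(records).items():
    print(count, round(spread, 4))
```

With few districts at the higher poll counts, the per-group spread there rests on very little data, so it is better read as a rough guide than as a precise estimate.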