Blairvoyance and CANTOR

PoliStat | Nov. 3, 2018, 5:52 p.m.


Introduction

    Nate Silver’s 538 uses a system called CANTOR “to infer what polling would say in unpolled or lightly polled districts, given what it says in similar districts.” CANTOR uses the k-nearest neighbors algorithm, a supervised machine learning technique, fed with demographic, geographic, and political statistics from each district. CANTOR also accounts for national trends: if Republican incumbents are doing poorly in polled districts, for example, that pattern is generalized to unpolled districts. 538 does not provide further details, but it does publish each district’s 10 most similar districts.
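
    538 does not publish CANTOR’s code, but the core idea can be sketched with a toy k-nearest-neighbors estimator. Everything below (the feature choices, the numbers, the inverse-distance weighting standing in for 538’s undisclosed similarity score) is our own illustration, not 538’s actual implementation:

import numpy as np

# Hypothetical inputs: one row of demographic/political features per
# polled district, plus each district's adjusted poll margin
# (D+ positive, R+ negative). All numbers are illustrative.
polled_features = np.array([
    [0.62, 0.11, 54000.0],   # e.g. % white, % college, median income
    [0.55, 0.18, 61000.0],
    [0.71, 0.09, 48000.0],
    [0.58, 0.22, 67000.0],
])
polled_margins = np.array([4.2, 9.1, -6.3, 12.5])

def cantor_like_estimate(target, features, margins, k=3):
    """Estimate an unpolled district's margin from its k nearest polled
    neighbors, weighted by inverse distance (a stand-in for 538's
    undisclosed similarity score)."""
    mu, sigma = features.mean(axis=0), features.std(axis=0)
    z = (features - mu) / sigma          # standardize so income doesn't
    t = (target - mu) / sigma            # swamp the percentage features
    dist = np.linalg.norm(z - t, axis=1)
    nearest = np.argsort(dist)[:k]
    weights = 1.0 / (dist[nearest] + 1e-9)
    return np.average(margins[nearest], weights=weights)

print(cantor_like_estimate(np.array([0.60, 0.15, 58000.0]),
                           polled_features, polled_margins))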

    Blairvoyance, on the other hand, does not use similarity scores. Each input demographic gets its own dimension, producing a 59-dimensional space: 58 dimensions for the demographics plus 1 for the poll result. A piecewise-polynomial surface is fitted through the polled districts’ points using the RBF kernel, which is essentially a measure of distance. Blairvoyance then finds the point on this surface closest to the unpolled district, which is represented as a vertical line through the poll-result dimension; that point corresponds to an inferred “poll result.” The weight of this result decreases as the district’s partisanship increases, because polls are biased toward close races, so Blairvoyance applies less well to partisan districts.
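
    As a sketch only: the described procedure (fit an RBF surface over the demographic dimensions, then read the poll-result dimension off at the unpolled district’s coordinates) can be approximated with SciPy’s RBFInterpolator. The kernel choice and all data here are assumptions for illustration, not Blairvoyance’s actual configuration:

import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)

# Stand-ins for the 58 demographic dimensions of each polled district
# (58 demographics + 1 poll-result dimension = the 59-dimensional space).
X_polled = rng.random((40, 58))        # 40 polled districts, 58 features
y_polled = rng.normal(0.0, 0.1, 40)    # their poll margins

# Fit an RBF surface through demographic space. The Gaussian kernel is
# one common choice; the post does not say which kernel Blairvoyance uses.
surface = RBFInterpolator(X_polled, y_polled, kernel="gaussian", epsilon=1.0)

# "Dropping a vertical line through the poll-result dimension" amounts to
# evaluating the fitted surface at the unpolled district's demographics.
x_unpolled = rng.random((1, 58))
print(surface(x_unpolled)[0])          # the inferred "poll result"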

    Relying on the fundamentals alone ignores any recent events that may affect specific races. The purpose of these systems is therefore to let the model incorporate recent events, since they are built on polling results, which measure current opinion. In this blog post, we analyze and critique both CANTOR and Blairvoyance.


An inherent flaw in comparing demographically similar districts

    The first problem is that although two districts may be demographically similar, their candidates may differ significantly. One may have a long-time incumbent; another, a low-effort challenger who does no fundraising. An example is Oregon’s 5th district, where Kurt Schrader, the incumbent since 2009, is running against Mark Callahan, who has never held elected office. Schrader has also raised roughly 27 times as much money as Callahan (who raised a mere $20,000). Yet somehow CANTOR labels this district as very similar to Washington’s 3rd district, where the challenger, Carolyn Long, raised even more money ($2,058,000) than the incumbent, Jaime Herrera Beutler.


Are CANTOR’s predictions reasonable?

    An additional problem is that 538 is not transparent about CANTOR, so we cannot see how it arrives at particular predictions from its input data. We selected 77 close races using Ballotpedia’s list of battleground districts. For each, 538 provides the top 10 similar districts along with their similarity scores and adjusted poll ratings. In 15 of these districts, no weighted average of the top 10 districts’ ratings can possibly produce the CANTOR rating.

    Consider Pennsylvania’s 6th. CANTOR somehow gives this district a rating of D+19.1, yet none of its ten most similar districts is anywhere near that Democratic. Since a weighted average must lie between the smallest and largest of its inputs, the only conceivable explanation is that many extremely Democratic districts appear farther down the similarity list. That would not be good either: it would mean CANTOR is considering too many districts, ones dissimilar enough that they should not factor into the rating at all.




    The table below shows the districts where CANTOR’s prediction cannot conceivably be a weighted average of the top 10 districts. The “Diff” column is the difference between CANTOR’s prediction and an average of the top 10 districts weighted linearly by similarity score. The dark red row shows a district where CANTOR’s prediction is mathematically possible but still 16.8% away from our calculated weighted average.
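
    The check itself is simple: a weighted average can never fall outside the range of its inputs, so any CANTOR rating beyond the top-10 list’s minimum or maximum is unexplainable by that list. A sketch of the calculation, with placeholder numbers standing in for one district’s scraped values:

import numpy as np

# Placeholder values for one district's top-10 list: similarity scores
# and the neighbors' adjusted poll margins (D+ positive).
similarity = np.array([99, 97, 96, 94, 93, 91, 90, 88, 87, 85])
margins = np.array([6.0, 4.5, 8.1, 2.3, 5.6, 7.2, 3.9, 9.0, 1.8, 6.4])

cantor_rating = 19.1   # e.g. Pennsylvania 6th's D+19.1

# Linearly weighted (by similarity score) average of the top 10,
# and the "Diff" column from the table.
weighted_avg = np.average(margins, weights=similarity)
print(f"Diff = {cantor_rating - weighted_avg:+.1f}")

# A weighted average is bounded by its inputs, so a rating outside
# [min, max] cannot come from the top-10 list at all.
if not margins.min() <= cantor_rating <= margins.max():
    print("CANTOR's rating cannot be any weighted average of the top 10.")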




    An extreme example is New York’s 15th, the most Democratic district in the country. CANTOR predicts D+92.4 even though no weighted average of polled districts can reach that number, since no polled district is nearly that Democratic. We therefore conclude that CANTOR must be doing something behind the scenes that we are not aware of.


Perhaps CANTOR takes the similar districts’ overall rating, not just their adjusted poll rating?

    Consider Florida 26 and Florida 27, which are very similar demographically and therefore affect each other through CANTOR: each district’s polling average feeds into the other’s overall prediction. If CANTOR had Florida 26 use Florida 27’s overall rating and vice versa, each rating would be defined in terms of the other, producing an infinite recursion. So this cannot be what CANTOR does.
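
    Schematically (our notation, not 538’s), such a scheme would define each rating in terms of the other:

\[
\text{overall}_{26} = f(\text{polls}_{26}, \text{overall}_{27}), \qquad \text{overall}_{27} = f(\text{polls}_{27}, \text{overall}_{26})
\]

    Substituting either equation into the other never terminates, which is why the ratings cannot be computed this way.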


How much weight is given to CANTOR for each district’s prediction?

    For the Lite model, 538 relies solely on CANTOR when polls are unavailable, while in heavily polled districts such as Kentucky’s 6th, CANTOR gets no weight at all. Since a majority of districts are unpolled, the Lite model leans heavily on CANTOR overall. For both the Classic and Deluxe models, CANTOR’s weight varies from 3% (lots of polls) to 10% (no polls). This weight is fairly insignificant; the fundamentals and polls are usually weighted much more heavily than CANTOR. Additionally, the Classic model has been consistently more accurate than the Lite model (95.4% vs. 94.2%, based on results since 1998). This implies either that the fundamentals are more accurate than CANTOR, or that CANTOR has a detrimental effect on the prediction’s accuracy.


Blairvoyance’s inaccuracy in partisan districts

    A core problem with Blairvoyance is that it tends to predict margins close to zero. Pollsters are more interested in close races, so Blairvoyance’s input data consists mostly of the less partisan districts; moreover, any sort of average tends to pull results toward the middle. Thus Blairvoyance cannot produce very partisan output.

    About 320 of the 435 districts can be described as “safe” for either the Democrats or the Republicans, meaning one candidate’s probability of winning exceeds 95%. This corresponds to a margin of about 18.4% (or -18.4%). Yet as of November 1st, Blairvoyance predicted a margin greater than 18.4% (or less than -18.4%) in only 54 districts. Blairvoyance is therefore wildly inaccurate in partisan districts, as it predicts the races as being much closer than they actually are.
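
    The 18.4% threshold is consistent with assuming roughly normally distributed margin errors (an assumption of ours, not necessarily the model’s), under which a 95% win probability sits about 1.645 standard deviations away from a tied race:

from scipy.stats import norm

# Our assumption: margin errors are roughly normal with standard
# deviation sigma, so a 95% win probability means the predicted margin
# sits about 1.645 sigma away from a tie.
z_95 = norm.ppf(0.95)            # about 1.645
safe_margin = 18.4               # the "safe" threshold quoted above

sigma = safe_margin / z_95       # implied uncertainty, about 11.2 points
print(f"implied sigma = {sigma:.1f} points")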


Blairvoyance slightly favoring the Republicans

    Another issue is that Blairvoyance is skewed toward the Republicans. The mean margin is -0.049 with a standard deviation of 0.1223; a one-sample t-test shows that Blairvoyance is definitely lopsided (t = -8.35, with a p-value far below any conventional threshold). The histogram also shows that a disproportionately high number of districts are predicted as close races. (An accurate histogram of the margins should be bimodal.)
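
    The t-statistic follows directly from these summary statistics (taking n = 435, i.e. assuming every district is in the sample):

import numpy as np
from scipy.stats import t

mean, sd, n = -0.049, 0.1223, 435   # n = 435 assumes every district is used

t_stat = mean / (sd / np.sqrt(n))            # about -8.36
p_value = 2 * t.sf(abs(t_stat), df=n - 1)    # two-sided p-value
print(f"t = {t_stat:.2f}, p = {p_value:.1e}")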




Blairvoyance’s weighting scheme

    To address this issue, Blairvoyance is given a much lower weight in partisan districts. Suppose the weight of a 1-day-old A-rated poll is \(w_a\), and bigmood is the prediction based on the fundamentals and the generic-ballot shift. Then the weight is \(w_a \times (1 - 2|\text{bigmood} - 0.5|)\). However, this weight still seems too high. Consider a safe district with a margin of D+28, such as Oregon’s 1st: the fundamentals predict about 64%, so the poll weight is \(w_a \times 0.72\), which is still significant; its overall weight in Oregon 1st’s prediction is about 25%. Compare this to 538’s Classic model, which gives CANTOR a weight of only 3% to 10%.
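
    The weighting formula is simple enough to state directly in code (\(w_a\) is left symbolic; bigmood is expressed as a two-party vote share):

def blairvoyance_weight(bigmood, w_a=1.0):
    """Blairvoyance's weight as described above: full weight w_a in a
    50/50 district, falling linearly to zero as bigmood approaches
    0 or 1."""
    return w_a * (1 - 2 * abs(bigmood - 0.5))

print(blairvoyance_weight(0.50))   # 1.00: toss-up, full weight
print(blairvoyance_weight(0.64))   # 0.72: the D+28 Oregon 1st example
print(blairvoyance_weight(0.60))   # 0.80: edge of the 40-60% band below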


Blairvoyance and battleground districts

    Now consider the districts where bigmood is between 40% and 60%, where Blairvoyance is weighted heavily (more than \(0.8 w_a\)). The average difference between Blairvoyance and bigmood in these districts is 4.2%, which is quite small. So perhaps Blairvoyance is helpful for fine-tuning the predictions in swing districts.


Blairvoyance and the national prediction

    Overall, Blairvoyance lowers the standard deviation of the prediction, shifts all predictions closer to 50%, and introduces a slight Republican bias. Since a lower standard deviation makes us more certain of the result, while predictions closer to the center make us less certain, these two effects may partly cancel each other out.


Conclusion

    CANTOR’s main problem is a lack of transparency about where its numbers come from. This is worrying given that, in 538’s Lite model, CANTOR carries 100% of the weight in districts without polling. Blairvoyance’s issue is that pollsters usually poll closer races, so its predictions are skewed toward the center, making it inaccurate for solidly Democratic or Republican districts.


Sources

https://sites.berry.edu/vbissonnette/index/stats-homework/
https://projects.fivethirtyeight.com/2018-midterm-election-forecast/house/
https://fivethirtyeight.com/methodology/how-fivethirtyeights-house-and-senate-models-work/
https://ballotpedia.org/U.S._House_battlegrounds,_2018
http://www.centerforpolitics.org/crystalball/2018-house/