When We Say 70 Percent, It Really Means 70 Percent

One of FiveThirtyEight’s goals has always been to get people to think more carefully about probability. When we’re forecasting an upcoming election or sporting event, we’ll go to great lengths to analyze and explain the sources of real-world uncertainty and the extent to which events — say, a Senate race in Texas and another one in Florida — are correlated with one another. We’ll spend a lot of time working on how to build robust models that don’t suffer from p-hacking or overfitting and which will perform roughly as well when we’re making new predictions as when we’re backtesting them. There’s a lot of science in this, as well as a lot of art. We really care about the difference between a 60 percent chance and a 70 percent chance.

That’s not always how we’re judged, though. Both our fans and our critics sometimes look at our probabilistic forecasts as binary predictions. Not only might they not care about the difference between a 60 percent chance and a 70 percent chance, they sometimes treat a 55 percent chance the same way as a 95 percent one.

There are also frustrating moments related to the sheer number of forecasts that we put out — for instance, forecasts of hundreds of U.S. House races, or dozens of presidential primaries, or the thousands of NBA games in a typical season. If you want to make us look bad, you’ll have a lot of opportunities to do so because some — many, actually — of these forecasts will inevitably be “wrong.”

Sometimes, there are more sophisticated-seeming criticisms. “Sure, your forecasts are probabilistic,” people who think they’re very clever will say. “But all that means is that you can never be wrong. Even a 1 percent chance happens sometimes, after all. So what’s the point of it all?”

I don’t want to make it sound like we’ve had a rough go of things overall.1 But we do think it’s important that our forecasts are successful on their own terms — that is, in the way that we have always said they should be judged. That’s what our latest project — “How Good Are FiveThirtyEight Forecasts?” — is all about.

That way is principally via calibration. Calibration measures whether, over the long run, events occur about as often as you say they’re going to occur. For instance, of all the events that you forecast as having an 80 percent chance of happening, they should indeed occur about 80 out of 100 times; that’s good calibration. If these events happen only 60 out of 100 times, you have problems — your forecasts aren’t well-calibrated and are overconfident. But it’s just as bad if they occur 98 out of 100 times, in which case your forecasts are underconfident.
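To make that concrete, here's a minimal sketch (in Python, and not FiveThirtyEight's actual code) of how you might check calibration for a batch of probabilistic forecasts, rounding each one to the nearest 5 percent:

```python
import numpy as np

def calibration_table(probs, outcomes, bin_width=0.05):
    """Compare stated probabilities with observed frequencies, bin by bin."""
    probs, outcomes = np.asarray(probs), np.asarray(outcomes)
    bins = np.round(probs / bin_width) * bin_width  # round to nearest 5 percent
    for b in np.unique(bins):
        mask = bins == b
        print(f"forecast ~{b:.0%}: occurred {outcomes[mask].mean():.0%} "
              f"of the time (n={mask.sum():,})")

# Toy data: 10,000 simulated forecasts that are well-calibrated by design,
# since each event occurs with exactly its stated probability.
rng = np.random.default_rng(0)
p = rng.uniform(0, 1, 10_000)
calibration_table(p, rng.uniform(0, 1, 10_000) < p)
```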

Calibration isn’t the only thing that matters when judging a forecast. Skilled forecasting also requires discrimination — that is, distinguishing relatively more likely events from relatively less likely ones. (If at the start of the 68-team NCAA men’s basketball tournament, you assigned each team a 1 in 68 chance of winning, your forecast would be well-calibrated, but it wouldn’t be a skillful forecast.) Personally, I also think it’s important how a forecast lines up relative to reasonable alternatives, e.g., how it compares with other models or the market price or the “conventional wisdom.” If you say there’s a 29 percent chance of event X occurring when everyone else says 10 percent or 2 percent or simply never really entertains X as a possibility, your forecast should probably get credit rather than blame if the event actually happens. But let’s leave that aside for now. (I’m not bitter or anything. OK, maybe I am.)

The catch about calibration is that it takes a fairly large sample size to measure it properly. If you have just 10 events that you say have an 80 percent chance of happening, you could pretty easily have them occur five out of 10 times or 10 out of 10 times as the result of chance alone. Once you get up to dozens or hundreds or thousands of events, these anomalies become much less likely.
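The binomial distribution makes the point directly. This quick sketch (using scipy, purely as an illustration) shows how often you'd see those anomalies with 10 events versus 1,000:

```python
from scipy.stats import binom

p = 0.8
# With only 10 events forecast at 80 percent, flukes are common:
print(binom.pmf(10, 10, p))  # all 10 occur: ~0.11
print(binom.cdf(5, 10, p))   # five or fewer occur: ~0.03
# With 1,000 such events, the observed rate strays more than five
# points from 80 percent only very rarely:
print(binom.cdf(749, 1000, p) + binom.sf(850, 1000, p))  # well under 0.1%
```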

But the thing is, FiveThirtyEight has made thousands of forecasts. We’ve been issuing forecasts of elections and sporting events for a long time — for more than 11 years, since the first version of the site was launched in March 2008. The interactive lists almost all of the probabilistic sports and election forecasts that we’ve designed and published since then. You can see how all our U.S. House forecasts have done, for example, or our men’s and women’s March Madness predictions. There are NFL games and of course presidential elections. There are a few important notes about the scope of what’s included in the footnotes,2 and for the years before FiveThirtyEight was acquired by ESPN/Disney/ABC News in 2013 — when our record-keeping wasn’t as good — we sometimes had to rely on archived versions of the site to verify exactly what forecast was published at what time.

What you’ll find, though, is that our calibration has generally been very, very good. For instance, the 5,589 events (sports and politics combined) that we said had a 70 percent chance of happening (rounded to the nearest 5 percent) in fact occurred 71 percent of the time. And the 55,853 events3 that we said had about a 5 percent chance of occurring happened 4 percent of the time.

We did discover a handful of cases where we weren’t entirely satisfied with a model’s performance. For instance, our NBA game forecasts have historically been a bit overconfident in lopsided matchups — e.g., teams that were supposed to win 85 percent of the time in fact won only 79 percent of the time. These aren’t huge discrepancies, but given a large enough sample, some of them are on the threshold of being statistically significant. In the particular case of the NBA, we substantially redesigned our model before this season, so we’ll see how the new version does.4

Our forecasts of elections have actually been a little bit underconfident, historically. For instance, candidates who we said were supposed to win 75 percent of the time have won 83 percent of the time. These differences are generally not statistically significant, given that election outcomes are highly correlated and that we issue dozens of forecasts (one every day, and sometimes using several different versions of a model) for any given race. But we do think underconfidence can be a problem if replicated over a large enough sample, so it’s something we’ll keep an eye out for.

It’s just not true, though, that there have been an especially large number of upsets in politics relative to polls or forecasts (or at least not relative to FiveThirtyEight’s forecasts). In fact, there have been fewer upsets than our forecasts expected.

There’s a lot more to explore in the interactive, including Brier skill scores for each of our forecasts, which do account for discrimination as well as calibration. We’ll continue to update the interactive as elections or sporting events are completed.
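For reference, here's one standard way to compute a Brier skill score: a sketch that scores forecasts against a no-skill baseline that always predicts the base rate. The interactive's exact choice of reference forecast may differ.

```python
import numpy as np

def brier_skill_score(probs, outcomes):
    """1 = perfect foresight, 0 = no better than forecasting the base rate."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    bs = np.mean((probs - outcomes) ** 2)                # Brier score (lower is better)
    bs_ref = np.mean((outcomes.mean() - outcomes) ** 2)  # base-rate baseline
    return 1 - bs / bs_ref

# The uniform 1-in-68 NCAA forecast from earlier: well-calibrated but
# zero skill, because it never separates favorites from longshots.
print(brier_skill_score([1 / 68] * 68, [1] + [0] * 67))  # 0.0
```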

None of this ought to mean that FiveThirtyEight or our forecasts — which are a relatively small part of what we do — are immune from criticism or that our models can’t be improved. We’re studying ways to improve all the time.

But we’ve been publishing forecasts for more than a decade now, and although we’ve sometimes tried to do an after-action report following a big election or sporting event, this is the first time we’ve studied all of our forecast models in a comprehensive way. So we were relieved to discover that our forecasts really do what they’re supposed to do. When we say something has a 70 percent chance of occurring, it doesn’t mean that it will always happen, and it isn’t supposed to. But empirically, 70 percent in a FiveThirtyEight forecast really does mean about 70 percent, 30 percent really does mean about 30 percent, 5 percent really does mean about 5 percent, and so forth. Our forecasts haven’t always been right, but they’ve been right just about as often as they’re supposed to be right.

How Bernie’s 2020 Map Might Change Without The #NeverHillary Vote

Bernie Sanders picked up support in some unusual places during his 2016 campaign to be the Democratic presidential nominee. The self-described democratic socialist won states such as Oklahoma and Nebraska that are typically associated with right-of-center policy views. He also did surprisingly well with self-described conservative voters — granted, a small-ish part9 of the Democratic primary electorate — picking up almost a third of their votes. Perhaps less surprisingly given that Sanders isn’t technically a Democrat, he performed really well with independent voters, winning them by roughly a 2:1 margin over Hillary Clinton.

So as Sanders launches his 2020 campaign as a candidate with both formidable strengths and serious challenges, his biggest problem might seem to be that there’s more competition for his base this time around, with Massachusetts Sen. Elizabeth Warren and others also competing for the leftmost part of the Democratic electorate. An equally big problem for Sanders, however, is that voters this time around have more alternatives to Hillary Clinton — left, right and center — to choose from.

Roughly one-quarter of Sanders’s support in Democratic primaries and caucuses in 2016 came from #NeverHillary voters: people who didn’t vote for Clinton in the 2016 general election and who had no intention of doing so. (The #NeverHillary label is a little snarky, but it’s also quite literal: These are people who never voted for Clinton despite being given two opportunities to do so, in the primary and the general election.) This finding comes from the Cooperative Congressional Election Study, a poll of more than 50,000 voters conducted by YouGov in conjunction with Harvard University. The CCES asked voters who they voted for in both the primaries and the general election; it also asked voters who didn’t vote in the general election who they would have chosen if they had voted. Here’s the overall breakdown of what Sanders primary voters did in November 2016.10

What Bernie Sanders primary voters did in November 2016
Voted for Hillary Clinton 74.3%
Voted for Donald Trump 12.0
Voted for Gary Johnson 3.2
Voted for Jill Stein 4.5
Voted for other candidates or voted but didn’t recall 2.5
Didn’t vote but said they would have voted for Clinton 1.6
Didn’t vote and didn’t say they would have voted Clinton 1.9

All of the categories other than voting for Clinton, or not voting but saying they would have voted for Clinton, count as #NeverHillary voters.

Source: COOPERATIVE CONGRESSIONAL ELECTION STUDY

About 74 percent of Sanders’s primary voters also voted for Clinton in November 2016. Another 2 percent didn’t vote but said on the CCES that they would have voted for Clinton if they had voted; it doesn’t seem fair to consider them anti-Clinton voters, so we won’t include them in the #NeverHillary camp. The remaining 24 percent of Sanders voters were #NeverHillary in the general election, however. Of these, about half voted for Trump, while the other half voted for Gary Johnson, Jill Stein or another third-party candidate, or didn’t vote at all.11

Overall, Sanders won 43 percent of the popular vote in Democratic primaries and caucuses in 2016. If 24 percent of that 43 percent were #NeverHillary voters, Sanders’s real base (the 76 percent of his voters who weren’t #NeverHillary) was more like 33 percent of the overall Democratic electorate. That isn’t nothing — it could easily be enough for a plurality in a divided field — and there were plenty of Clinton voters who liked Sanders, so he could pick up some of their votes too. But it does jibe with polls showing that Sanders and Warren together have around 30 percent of the Democratic primary electorate in 2020 and not the 43 percent that Sanders got in 2016.

You might be tempted to think that these #NeverHillary voters are leftists who thought Clinton was too much of a pro-corporate, warmongering centrist. But relatively few of them were. Less than a fifth of them voted for Stein, for example. Instead, these voters were disproportionately likely to describe themselves as moderate or conservative. Among the 31 percent of self-described conservatives who voted for Sanders in the Democratic primaries, more than half were #NeverHillary voters, for example. A large minority of the independents and Republicans who supported Sanders were #NeverHillary voters as well.

#NeverHillary voters were conservative, not super liberal

The ideological and partisan breakdown of #NeverHillary voters in the 2016 Democratic primaries

The “Pro-Sanders” and “#NeverHillary” columns break Sanders’s vote share into its two components.

Group Clinton Sanders Pro-Sanders** #NeverHillary
Very liberal 45.2% 54.6% 46.9% 7.7%
Liberal 55.6 43.7 39.4 4.3
Somewhat liberal 59.4 40.2 32.7 7.5
Middle-of-the-road 60.2 38.7 24.9 13.8
Conservative* 66.5 31.3 14.9 16.4

Group Clinton Sanders Pro-Sanders** #NeverHillary
Democrats 66.2% 32.9% 28.8% 4.1%
Independents and Republicans 33.6 65.0 37.9 27.1

* Includes voters who described themselves as “conservative,” “somewhat conservative” or “very conservative.”
** Sanders voters who voted for Clinton in the general election or didn’t vote but said they would have voted for Clinton.

Source: COOPERATIVE CONGRESSIONAL ELECTION STUDY

A more complicated way to characterize the #NeverHillary vote is via regression analysis. Using the CCES — which permits fairly intricate regression model designs because of its large sample size — I took all of Sanders’s primary voters in 2016 and evaluated a host of variables to see what predicted whether they were #NeverHillary in the general election.

The most significant variables were, first, whether the voter was a Democrat, and second and third, two policy questions that have proven to be highly predictive of voter preferences in the past: whether the voter thinks that white people benefit from their race and whether the voter wanted to repeal the Affordable Care Act. Non-Democrats, voters who didn’t think whites benefited from their race, and voters who wanted to repeal the ACA were much more likely to be #NeverHillary voters. Voters who were rural or poor, lived in the South or the Northeast, were born-again Christians, described themselves as conservative, or were military veterans were also somewhat more likely to be #NeverHillary, other factors held equal. Black people, Hispanics, women, liberals, millennials, union members and voters with four-year college degrees were less likely to be #NeverHillary voters.
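As an illustration of the kind of model involved, here's a sketch in Python using three of the predictors described above, with hypothetical variable names and simulated data standing in for the actual CCES extract:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the CCES: one row per Sanders primary voter,
# with made-up 0/1 predictors and made-up effect sizes.
rng = np.random.default_rng(0)
n = 5_000
df = pd.DataFrame({
    "democrat":       rng.integers(0, 2, n),
    "whites_benefit": rng.integers(0, 2, n),
    "repeal_aca":     rng.integers(0, 2, n),
})
# Bake in the pattern described above: non-Democrats, ACA repealers and
# voters who don't think whites benefit are likelier to be #NeverHillary.
log_odds = (-1.5 - 1.0 * df["democrat"] - 0.8 * df["whites_benefit"]
            + 0.9 * df["repeal_aca"])
df["never_hillary"] = (rng.uniform(size=n) <
                       1 / (1 + np.exp(-log_odds))).astype(int)

fit = smf.logit("never_hillary ~ democrat + whites_benefit + repeal_aca",
                data=df).fit()
print(fit.params)  # recovered coefficients should match the signs baked in above
```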

In addition, some factors related to the primary calendar affected the #NeverHillary vote. After Trump won the Indiana primary, effectively wrapping up the Republican nomination, more anti-Clinton voters filtered into the Democratic primaries. And the #NeverHillary vote was lower in states where an open Republican primary or caucus was held on the same date as the Democratic one. This implies that a fair number of #NeverHillary voters would actually have preferred to vote in the Republican primary. But if they couldn’t, because the Republican primary was closed or wasn’t held on the same date, they voted in the Democratic primary (for Sanders or another Democrat and against Clinton) instead.

We can also evaluate the geographic breakdown of the #NeverHillary vote. In each state, we can estimate the anti-Clinton vote in two ways, either by directly measuring it (e.g., 19 percent of Sanders voters the CCES surveyed in Illinois were #NeverHillary) or through the regression technique that I used above (which is similar to an MRP analysis). Without getting too much into the weeds, I used a blend of the two methods in each state based on the sample size of Sanders voters there; the direct measurement is more reliable in states with a large sample, while the regression method is better in states with a smaller one. The table below shows where the largest share of Sanders voters (as well as voters who chose another Democratic candidate apart from Clinton and Sanders12) were anti-Clinton voters:

Sanders benefited from #NeverHillary voters in red states

The breakdown of Sanders and #NeverHillary voters in the 2016 Democratic primaries

The last three columns show #NeverHillary voters as a share of each state’s primary electorate, split by whether they voted for Sanders or another non-Clinton candidate.

State Sanders’s share of pop. vote Share of Sanders voters who were #NeverHillary Voted Sanders Other Total
Alaska 79.6% 49.8% 39.7% 0.1% 39.7%
W.Va. 51.4 45.2 23.2 7.1 30.4
Okla. 51.9 42.3 21.9 3.7 25.6
Vt. 86.0 28.3 24.3 0.2 24.5
Idaho 78.0 30.4 23.8 0.4 24.2
Neb. 57.1 42.0 24.0 0.0 24.0
Utah 79.2 29.6 23.4 0.3 23.7
Ky. 46.3 37.9 17.6 3.9 21.4
Ore. 56.2 32.1 18.1 1.0 19.0
R.I. 54.7 32.1 17.6 1.2 18.8
Mont. 51.6 31.8 16.4 2.4 18.8
N.D. 64.2 19.6 12.6 5.7 18.3
Hawaii 69.8 25.9 18.1 0.1 18.2
Maine 64.3 28.0 18.0 0.1 18.1
Kan. 67.7 26.4 17.9 0.0 17.9
N.H. 60.1 27.5 16.6 1.2 17.8
S.D. 49.0 34.8 17.1 0.0 17.1
Nev. 47.3 35.1 16.6 0.0 16.6
Del. 39.2 36.8 14.4 0.6 15.0
Wash. 72.7 19.3 14.0 0.1 14.1
Mo. 49.4 25.8 12.7 0.6 13.3
Md. 33.8 31.4 10.6 2.0 12.7
Mass. 48.5 24.4 11.8 0.9 12.7
La. 23.2 40.8 9.4 3.2 12.6
Calif. 46.0 24.2 11.1 0.5 11.6
Ind. 52.5 22.2 11.6 0.0 11.6
Mich. 49.7 21.1 10.5 1.2 11.6
Pa. 43.5 25.1 10.9 0.5 11.4
Ariz. 41.4 24.2 10.0 1.3 11.3
N.C. 40.9 20.9 8.5 2.6 11.1
Minn. 61.7 17.5 10.8 0.0 10.8
Wis. 56.6 18.6 10.5 0.2 10.7
Conn. 46.4 20.8 9.6 1.0 10.6
N.Y. 42.0 25.1 10.5 0.0 10.5
N.M. 48.5 20.8 10.1 0.0 10.1
Ark. 30.0 23.9 7.2 2.2 9.4
Ill. 48.6 18.4 8.9 0.5 9.4
Fla. 33.3 23.8 7.9 1.3 9.2
N.J. 36.6 24.2 8.8 0.1 9.0
Ohio 43.1 19.5 8.4 0.4 8.8
Tenn. 32.5 22.7 7.4 0.8 8.2
Iowa 49.6 15.4 7.6 0.3 8.0
S.C. 26.0 28.8 7.5 0.3 7.8
Va. 35.2 21.3 7.5 0.3 7.8
Colo. 59.0 11.7 6.9 0.4 7.3
Texas 33.2 19.0 6.3 0.9 7.2
Ala. 19.2 25.5 4.9 1.7 6.5
D.C. 20.8 28.0 5.8 0.4 6.2
Ga. 28.2 19.4 5.5 0.3 5.7
Wyo. 56.7 9.3 5.3 0.1 5.4
Miss. 16.6 14.8 2.5 0.5 3.0

Source: COOPERATIVE CONGRESSIONAL ELECTION STUDY
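As a rough sketch of the blending step described above (the weighting constant k and all the numbers here are illustrative assumptions, not the values the analysis actually used):

```python
def blended_estimate(direct, modeled, n, k=50):
    """Shrink a state's direct survey estimate toward the regression-based
    estimate; the bigger the state's sample, the more the direct number counts."""
    w = n / (n + k)  # weight on the direct measurement
    return w * direct + (1 - w) * modeled

# A state with 400 surveyed Sanders voters leans on the direct number;
# one with 25 leans on the regression.
print(blended_estimate(0.19, 0.15, n=400))  # ~0.186
print(blended_estimate(0.19, 0.15, n=25))   # ~0.163
```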

The largest number of #NeverHillary voters, as a share of the Democratic primary electorate, were in Alaska, West Virginia, Oklahoma, Vermont, Idaho, Nebraska, Utah and Kentucky. Other than in Vermont, where extreme loyalty to Sanders generated a large number of write-in votes for Sanders and other candidates in the general election, those are obviously really red and largely rural states. Apart from Kentucky, they were also all states won by Sanders in the primaries.

Although there may have been something of a market for a populist candidate in these states, it’s also likely that Sanders benefited from being the only alternative to Clinton. In fact, there are several states where the #NeverHillary vote pushed Sanders over the top and where the pro-Sanders vote alone wouldn’t have been enough for him to win. These are Indiana, Michigan, Montana, Nebraska, Oklahoma, Oregon, Rhode Island and West Virginia.

The good news for Sanders is that the states where he benefited the most from the #NeverHillary vote — especially in Appalachia and in the Interior West — award relatively few delegates. So they’re places that he can potentially afford to lose. It does mean, however, that Sanders will have to hit his mark in his other strong regions, including New England (where Warren will provide fierce competition), the Upper Midwest (where Sen. Amy Klobuchar of Minnesota could create problems in her home state and Wisconsin) and the Pacific Northwest (where Sanders would prefer that candidates like Washington Gov. Jay Inslee and former Colorado Gov. John Hickenlooper not enter the race).

It also means that Sanders won’t just be competing against other progressives but also against relatively moderate candidates. If #NeverHillary voters from 2016 are again looking for an anti-establishment candidate, Sanders could still fit the bill. If they want a moderate instead, however, they’ll have a lot more choices than they did in 2016 in the form of candidates like Klobuchar and (if they enter the race) Joe Biden and Beto O’Rourke. It’s also possible that #NeverHillary voters were mostly motivated by sexism, in which case any of the male candidates could stand to benefit.

None of this dooms Sanders by any means. In fact, he probably benefits on balance from a divided field, in which his extremely loyal base gives him a high floor of support. But a multi-way race is way different than a two-way one, so Sanders’s coalition may not be all that similar to what we saw in 2016.

