We Analyzed 40 Years Of Primary Polls. Even Early On, They’re Fairly Predictive.

Over the past few weeks, FiveThirtyEight has explored who led in early primary polls of presidential cycles from 1972 to 2016 and who went on to win the nomination. And what we’ve seen is that national surveys conducted in the year before a presidential primary are relatively good indicators of which candidates will advance to the general election, especially when polling averages are adjusted to reflect how well known each candidate was. Now, in the third and final part of our series, we are going to analyze 40-plus years of polls to better understand their predictive power.

There are a number of ways to tackle this question, but one relatively easy way to see how predictive early polls are is to compare a candidate’s polling average to their eventual share of the national primary vote. And we found that as a candidate’s polling average increased, their vote share in the primaries also tended to increase. In the chart below, for the calendar year before the primaries began, we averaged each candidate’s polls in the first half of the year (January through June) and in the second half of the year (July through December), and then plotted those two averages against the share of votes each person won in the next year’s primaries, for every competitive nomination process from 1972 to 2016. The correlation is pretty strong for both halves of the year, though polls from the second half of the year matched the outcomes a little better, which is not surprising: those polls were conducted closer to the start of primary season.
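To make that comparison concrete, here is a minimal sketch of how such a correlation could be computed. It assumes a plain Pearson correlation (the text does not say which measure was used), and the rows in `toy_rows` are made-up placeholders for illustration, not the polling data behind the chart. The names `pearson` and `toy_rows` are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up placeholder rows, NOT the article's data:
# (first-half poll avg %, second-half poll avg %, primary vote share %)
toy_rows = [
    (40.0, 45.0, 55.0),
    (25.0, 20.0, 22.0),
    (12.0, 15.0, 10.0),
    (6.0, 4.0, 0.0),   # a candidate who never ran counts as zero percent
    (1.0, 2.0, 3.0),
]
first_half = [row[0] for row in toy_rows]
second_half = [row[1] for row in toy_rows]
vote_share = [row[2] for row in toy_rows]

# One correlation per half of the year, each against the same vote shares.
r_first = pearson(first_half, vote_share)
r_second = pearson(second_half, vote_share)
print(f"first half r = {r_first:.2f}, second half r = {r_second:.2f}")
```

Running the same function once per half of the year is what lets the two periods be compared on equal footing.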

But it’s easier to see trends if we group some candidates together rather than looking at them all individually, so let’s sort candidates into six big buckets based on their polling average. That clearly shows us that candidates with higher polling averages were also more likely to win higher shares of the primary vote and, therefore, the nomination. Those polling at 35 percent or higher rarely lost the nomination, regardless of whether they attained those heights in the first or second half of the year. They also, on average, won more than half the national primary vote. But those polling below 20 percent in either the first half or second half of the year had at best a 1-in-10 chance of clinching the nomination, and they rarely won a sizable chunk of the popular vote.

High polling averages foreshadowed lots of primary votes

Candidates’ share of the national primary vote by average polling level in the first half of the year before the presidential primaries and polling average in the second half of that year, 1972-2016

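| Poll avg. | First half: share who became nominee | First half: avg. primary vote share | Second half: share who became nominee | Second half: avg. primary vote share |
| --- | --- | --- | --- | --- |
| 35%+ | 75% | 57% | 83% | 57% |
| 20%-35% | 36% | 27% | 25% | 25% |
| 10%-20% | 9% | 8% | 9% | 12% |
| 5%-10% | 3% | 7% | 10% | 10% |
| 2%-5% | 5% | 5% | 0% | 4% |
| Under 2% | 1% | 2% | 1% | 1% |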

We included everyone we had polling data for, no matter how likely or unlikely they were to run. If a candidate didn’t run or dropped out before voting began, they were counted as winning zero percent of the primary vote.

Sources: Polls, CQ Roll Call, Dave Leip’s Atlas of U.S. Presidential Elections
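The bucketing behind the table above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the code used for the analysis: the treatment of exact boundary values (for example, whether 35 percent falls in the top bucket) is a guess, the names `BUCKETS`, `bucket_for` and `summarize` are hypothetical, and the candidates in `toy` are invented placeholders.

```python
# Bucket lower bounds and labels, matching the six rows of the table.
# Assumption: a value exactly on a boundary goes into the higher bucket.
BUCKETS = [
    (35.0, "35%+"),
    (20.0, "20%-35%"),
    (10.0, "10%-20%"),
    (5.0, "5%-10%"),
    (2.0, "2%-5%"),
    (0.0, "Under 2%"),
]

def bucket_for(poll_avg):
    """Return the label of the first bucket whose lower bound poll_avg meets."""
    for lower, label in BUCKETS:
        if poll_avg >= lower:
            return label
    raise ValueError("polling average must be non-negative")

def summarize(candidates):
    """candidates: iterable of (poll_avg, vote_share, became_nominee) tuples.

    Returns {label: (share_who_became_nominee_pct, avg_vote_share_pct)}.
    """
    groups = {}
    for poll_avg, vote_share, nominee in candidates:
        groups.setdefault(bucket_for(poll_avg), []).append((vote_share, nominee))
    summary = {}
    for label, rows in groups.items():
        n = len(rows)
        nominee_pct = 100.0 * sum(1 for _, won in rows if won) / n
        avg_vote = sum(share for share, _ in rows) / n
        summary[label] = (nominee_pct, avg_vote)
    return summary

# Invented placeholder candidates, NOT the article's data.
toy = [
    (42.0, 60.0, True),
    (38.0, 30.0, False),
    (22.0, 25.0, False),
    (7.0, 0.0, False),   # did not run: counted as zero percent of the vote
    (1.0, 2.0, False),
]
table = summarize(toy)
for _, label in BUCKETS:
    if label in table:
        nominee_pct, avg_vote = table[label]
        print(f"{label:>9}: {nominee_pct:.0f}% became nominee, "
              f"{avg_vote:.0f}% avg. vote share")
```

Candidates who never ran stay in the data with a vote share of zero, which mirrors the note under the table and pulls down the averages in the lower buckets.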

We can also take these polling averages and estimate the probability of a candidate winning a party’s nomination using a logistic regression. And as you can see, candidates polling above 20 percent, whether in the first half of the year (the orange line) or the second half (the purple line), have a higher probability of winning the nomination than those polling below that mark. In fact, the results for the first and second half of the year are nearly identical: in the second half of the year, candidates with the same polling average had a slightly lower win probability, but we’re talking about a maximum difference of less than 4 percentage points. There are certainly more sophisticated ways one could look at this data, but even these simple methods can show that polls conducted this far out from primary season still have a reasonable amount of predictive power.
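For readers who want to see the mechanics, here is a minimal sketch of a one-variable logistic regression fit by plain gradient descent. It is not the model specification used for the chart, which the text does not spell out. The learning rate, step count, the rescaling of polls into tens of points, and the data in `toy_polls` and `toy_won` are all assumptions made for illustration.

```python
from math import exp

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by batch gradient descent on log loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y   # gradient of log loss wrt the logit
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Invented placeholder data, NOT the article's data:
# polling averages in percent and whether the candidate won the nomination.
toy_polls = [45, 40, 30, 25, 22, 15, 12, 8, 6, 3, 2, 1]
toy_won = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# Rescale to tens of points so gradient descent behaves well.
b0, b1 = fit_logistic([p / 10.0 for p in toy_polls], toy_won)

def win_probability(poll_avg):
    """Estimated chance of winning the nomination at a given polling average."""
    return sigmoid(b0 + b1 * poll_avg / 10.0)

for level in (5, 20, 35):
    print(f"polling at {level}%: estimated win probability "
          f"{win_probability(level):.2f}")
```

Fitting the same function separately on first-half and second-half averages would give two curves that can be compared point by point, which is the kind of comparison the paragraph above describes.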

We can go a step further and improve our analysis by accounting for a candidate’s level of name recognition. In previous installments of this series, we rated candidates’ fame on a five-tier scale, and this time we’re using those previous ratings to split our polling data into two roughly equal groups: candidates with high name recognition and those with low name recognition. This gives us a broader understanding of whether being well known influenced a candidate’s chances of winning the nomination. (We also limited this part of our analysis to just the first half of the year to see what role name recognition played very early in the cycle.)

And as you can see, well-known candidates who polled in the double digits tended to win a higher share of the primary vote. But candidates who had high name recognition while only polling in the single digits were generally in trouble. Of the 84 highly recognized candidates who polled below 10 percent in surveys from the first half of the year before the primaries, only President Trump went on to win his party’s nomination. And Trump was an unusual case — Republicans started out with strongly negative views of him but quickly changed their tune even though they were already familiar with him. Meanwhile, candidates with lower name recognition in the first half of the year only occasionally advanced to the general election, and in each case, it was on the Democratic side — George McGovern in 1972, Jimmy Carter in 1976, Michael Dukakis in 1988 and Bill Clinton in 1992.

Name recognition makes a big difference

Candidates’ share of the national primary vote by average polling level in the first half of the year before the presidential primaries and whether they had high or low name recognition, 1972-2016

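| Poll avg. | High name recognition: share who became nominee | High name recognition: avg. primary vote share | Low name recognition: share who became nominee | Low name recognition: avg. primary vote share |
| --- | --- | --- | --- | --- |
| 35%+ | 75% | 57% | | |
| 20%-35% | 36% | 27% | | |
| 10%-20% | 9% | 8% | | |
| 5%-10% | 0% | 4% | 14% | 19% |
| 2%-5% | 5% | 3% | 5% | 6% |
| Under 2% | 0% | 0% | 2% | 2% |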

We included everyone we had polling data for, no matter how likely or unlikely they were to run. If a candidate didn’t run or dropped out before voting began, they were counted as winning zero percent of the primary vote.

Sources: Polls, CQ Roll Call, Dave Leip’s Atlas of U.S. Presidential Elections

In fact, we can use a logistic regression to estimate a high- and low-name-recognition candidate’s chance of winning the nomination based on their polling average (much like we did above, but last time we didn’t sort candidates into categories based on name recognition). And as you can see in the chart below, a low-name-recognition candidate didn’t stand much of a chance of winning unless they were able to climb past 10 percent in the polls in the first half of the year before the primaries. If they were able to hit that mark, then their odds of winning were slightly less than 1 in 4, which put them ahead of a high-name-recognition candidate polling at the same level.

Intuitively, this makes sense — relatively few unknown candidates could poll as high as 10 percent this far out in the election cycle. But for those who could get that much support even though only a small share of people knew about them, their polling numbers signaled a great deal of potential. Take Dukakis in the 1988 cycle: His polling average was about 8 percent in the first half of 1987, and we estimated that his average name recognition was somewhere around 20 percent. Not a bad polling average when you consider that most respondents didn’t know who he was.

In other words, a candidate’s adjusted polling average — polling average divided by name recognition, which we delved into at length in the first two parts of this series — is a decent proxy for teasing out the strength of a candidate, especially early in the election cycle. By accounting for how well known a candidate is, we can get a better read on the field in front of us, including here in the 2020 election cycle. As primary season draws nearer, we’ll be keeping an eye on any candidates with low name recognition who still manage to win a significant chunk of support in the polls.
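As a quick worked example of that adjustment, the sketch below divides a polling average by name recognition, using the rough Dukakis figures quoted earlier (about 8 percent in the polls, about 20 percent name recognition). It assumes name recognition enters the division as a fraction of respondents; the function name `adjusted_polling_average` is hypothetical, and the result is an illustration of the arithmetic rather than a figure reported in the series.

```python
def adjusted_polling_average(poll_avg_pct, name_recognition_pct):
    """Polling average divided by name recognition.

    Both inputs are percentages. Assumption: name recognition is treated as
    a fraction (20 percent -> 0.20) in the division.
    """
    if not 0 < name_recognition_pct <= 100:
        raise ValueError("name recognition must be in (0, 100]")
    # Multiply first so whole-number inputs divide exactly.
    return poll_avg_pct * 100.0 / name_recognition_pct

# Rough Dukakis figures from the text: ~8% polling, ~20% name recognition.
dukakis = adjusted_polling_average(8.0, 20.0)
print(f"adjusted polling average: {dukakis:.0f}")

# A universally known candidate's polling average is left unchanged.
print(f"fully known candidate at 8%: {adjusted_polling_average(8.0, 100.0):.0f}")
```

Under that assumption, an 8 percent polling average among the roughly one-in-five respondents who knew the candidate works out to about 40, which is why a modest raw number can signal a lot of potential.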