How big is one-tenth of one percent?
Tonight the five leading candidates in the race for a U.S. Senate seat in Louisiana will take the stage for a televised debate. Unlike the last debate, a sixth candidate will join them: the noted white supremacist David Duke.
To avoid the chaos of having two dozen candidates on stage, Raycom required candidates to reach five percent in a recent Mason-Dixon Polling & Research poll to qualify for the debate. Duke hit 5.1 percent.
One-tenth of one percent…in one poll.
The decision to put Duke on the stage has sparked controversy on the campus of Dillard University, the historically black institution that serves as the site of the debate, and beyond. The university's president even called the poll 'rigged.'
Large candidate fields create obvious problems for debates. Just take a look at the debates in the Republican presidential primary. Debate sponsors, therefore, look for ways to cull the crowd to something more manageable. Often they turn to polls as a measure of viability.
This is a bad idea for at least three reasons.
Qualifying on one-tenth of one percent is no better than deciding based on a coin toss
When you have a margin of error of four points, as the Raycom/Mason Dixon poll does, one-tenth of one percent basically amounts to nothing.
The margin of error is how we measure the uncertainty that stems from the fact that surveys interview randomly selected samples of voters rather than interviewing everyone who will actually vote in the election. The Raycom/Mason Dixon poll includes 625 respondents. If they had talked with 625 different respondents instead of the 625 particular respondents they actually interviewed, the results would have been a little different. On average, we would expect the results from all (quality) polls to come out to whatever the real level of support for Duke is among voters. But any one poll will differ somewhat from that value and from any other one poll. The margin of error gives us a sense of how much the results bounce around from sample to sample.
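For a poll of this size, the four-point figure can be reproduced with the textbook formula for a 95 percent margin of error. This is a minimal sketch that assumes simple random sampling and the conventional worst case of 50 percent support; the function name and its defaults are illustrative, and Mason-Dixon's own calculation may differ.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p
    estimated from a simple random sample of n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(625)
print(f"{moe:.1%}")  # 3.9%, which rounds to the reported four points
```

Because the sample size sits under the square root, quadrupling the sample would only cut the margin of error in half.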
The margin of error does not tell us how likely it is that Duke actually has the support of 5.1 percent of voters. It does let us say, for any particular level of actual support, what the chances are of ending up with 5.1 percent for Duke in a poll.
For example, let’s imagine Duke’s actual level of support was just a hair under the debate threshold at 4.9 percent. If Raycom knew that, then presumably they would not let him on stage. They do not know that, so they do a poll of 625 likely voters with a four-point margin of error. What are the chances that someone who does not meet the threshold polls above the threshold in this case? It is basically a coin toss: 49 percent.
That’s the best-case scenario for a candidate who falls below the threshold. Even when they are further from the threshold, there is still a decent shot the poll will put them over based on random chance alone.
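One way to put a number on a question like this is sketched below. It assumes simple random sampling and a normal approximation to the sampling distribution, and the function name and defaults are hypothetical. The post does not say how its own percentages were calculated, and weighting or likely-voter screening would change the spread, so the output will not necessarily match the figures quoted here.

```python
import math
from statistics import NormalDist

def chance_poll_exceeds(true_support, threshold=0.05, n=625):
    """Probability that a single poll's estimate lands above the threshold.

    Assumes simple random sampling and a normal approximation;
    both are simplifying assumptions, not the pollster's method.
    """
    std_error = math.sqrt(true_support * (1 - true_support) / n)
    sampling_dist = NormalDist(mu=true_support, sigma=std_error)
    return 1 - sampling_dist.cdf(threshold)

just_under = chance_poll_exceeds(0.049)  # true support a hair below 5%
just_over = chance_poll_exceeds(0.051)   # true support a hair above 5%
print(f"true support 4.9%: {just_under:.0%} chance of polling above 5%")
print(f"true support 5.1%: {1 - just_over:.0%} chance of polling below 5%")
```

Under these assumptions, a candidate sitting a tenth of a point on either side of the line lands on the wrong side of it a little under half the time.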
We do not know the actual level of support for Duke, but a reasonable place to start is by looking at all the polls. Duke’s support in the Raycom/Mason Dixon poll is on the high end of what we’ve seen across the polls released in the last two months. Most polls have him at two or three percent. This does not mean something is wrong with the Raycom/Mason Dixon poll. Random sampling means you sometimes get outliers…and 5.1 percent is not even that far outside what we would expect to see from time to time.
As of this writing, the Huffington Post polling average has Duke at two points, and the FiveThirtyEight average has him at 3.8 points. They use different sets of polls and different methods of computing their averages, but they agree that Duke’s level of support is probably under five percent.
If Duke’s actual level of support among the electorate is two points, as the Huffington Post average suggests, he would fall below the Raycom threshold. Even so, about one in every five polls should have him above five points based on random chance alone.
Similarly, if his actual level of support is 3.8 percent, as the FiveThirtyEight average suggests, he would still be below the Raycom threshold, yet we would expect about 44 percent of polls to have him above it.
Based on the randomness of sampling, there is a very good chance that someone who is actually under the threshold would qualify for the debate.
By the same logic, a candidate whose actual level of support is above five percent could fall short of the threshold in any particular poll. Let’s take Rob Maness as an example. The polling averages have him under five points, but let’s imagine that his actual support is just above the threshold for qualification. There would still be a 34 percent chance that he would show up below the threshold in this poll.
Pollsters make decisions that shape the outcomes
What I have written so far is a best-case scenario in which the only reason a candidate’s result in a single poll might differ from his or her actual support is random error from sampling. There is much, much more to it.
Polling results do not come from on high untouched by human hands. Pollsters have to make real decisions about how to process raw data once collected. Raw data from a single sample could look demographically different from the population you care about due to random sampling and non-response. Perhaps the sample has a few more white respondents than the typical electorate, or a few fewer women. Pollsters deal with this by weighting the data to a known demographic profile of their target population. The sample will also include a significant number of respondents who will not vote. Pollsters want to screen them out, which makes sense if you want to make a statement about what the electorate thinks.
These decisions about how to weight data and identify likely voters are crucial steps in the process. Good pollsters should take these steps, but there is no single best way to decide how exactly to do them. There are better and worse ways, to be sure, but even high-quality pollsters will make different decisions and – here’s the key part – those decisions shape the results.
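A toy example shows how a weighting choice alone can move a result. The sketch below uses the simplest form of weighting, scaling each group to its assumed share of the electorate. Every number in it is made up for illustration and none comes from a real poll; the group labels and both target profiles are hypothetical. Real pollsters weight on several variables at once and also apply likely-voter screens.

```python
def weighted_support(sample, target_shares):
    """Estimate candidate support after weighting each group
    to its assumed share of the target population.

    sample: {group: (respondents, supporters)}
    target_shares: {group: population share}, summing to 1
    """
    estimate = 0.0
    for group, (respondents, supporters) in sample.items():
        support_rate = supporters / respondents
        estimate += target_shares[group] * support_rate
    return estimate

# Hypothetical raw sample of 1,000 respondents (not real poll data).
sample = {"group_a": (700, 49), "group_b": (300, 3)}

# Two hypothetical, equally defensible guesses about the electorate.
profile_1 = {"group_a": 0.65, "group_b": 0.35}
profile_2 = {"group_a": 0.70, "group_b": 0.30}

total_respondents = sum(r for r, _ in sample.values())
total_supporters = sum(s for _, s in sample.values())
raw = total_supporters / total_respondents
est_1 = weighted_support(sample, profile_1)
est_2 = weighted_support(sample, profile_2)
print(f"raw {raw:.1%}, profile 1 {est_1:.1%}, profile 2 {est_2:.1%}")
```

With these made-up numbers, the same raw sample yields 4.9 percent under the first profile and 5.2 percent under the second, so the choice of weighting target alone moves the candidate across a five percent cutoff.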
Recently the New York Times Upshot provided raw data from a single presidential poll to five highly reputable pollsters or statisticians and asked them to make their own decisions about how to process the data. Despite starting with the same raw data and the apparent reasonableness of each group’s approach to processing the data, the five groups came up with five different sets of results from the poll. The results spanned from Clinton +4 to Trump +1.
Five points! This is not the random sampling error that the margin of error gives us some clue about. It is a whole other source of variability called the design effect. Most poll coverage ignores it (in part because most pollsters ignore it).
The folks at Mason Dixon or Raycom made decisions about how to weight the data and screen for likely voters. We do not know exactly why they settled on the choices they made or how the results would differ if they made other (perhaps equally reasonable) choices about weighting and likely voter screening.
However, we can be pretty sure their decisions amount to more than one-tenth of one percent.
Polls – especially polls of primaries in Louisiana – often miss by more than one-tenth of one percent
Finally, polls in Louisiana’s primaries for U.S. Senate and governor elections tend to miss the final result by a bit more than the typical poll in other states. For some reason our primaries are just a bit more difficult to poll accurately. I have written about this before.
At the end of the day, all these issues mean the decision to put one candidate on stage and leave another off because of where they fall in relation to the threshold is an arbitrary one that leaves a lot to randomness and to the decisions of pollsters.
None of this means that the Raycom/Mason Dixon poll is a ‘bad’ poll. But that’s exactly the point. Even if the poll were conducted exceptionally well and the pollster made reasonable choices about weighting and likely voter screening, it would remain incredibly naïve to use it in this way for the reasons I just laid out.
A poll doesn’t have to be ‘rigged’ for this to be a bad way to use it.