The 1948 presidential election, the 41st in U.S. history, pitted the heavily favored Republican candidate, Governor Thomas E. Dewey of New York, against the underdog Democratic incumbent, Harry S. Truman. Three of the top polls, including the Gallup Poll, predicted that Dewey would win by a significant margin. The polls were so trusted that the Chicago Daily Tribune printed its headline "DEWEY DEFEATS TRUMAN" before the election results were fully tallied. Yet when all the votes were counted, Truman had won 303 electoral votes to Dewey's 189. Truman also won 49.6% of the popular vote to Dewey's 45.1%, and the rest is history.
Following a detailed analysis of the polling practices that led to the upset, the Social Science Research Council identified four reasons for this dismal prediction failure:
1. The polling was terminated two weeks before the election.
2. The polling sample was obtained by selecting names randomly from the telephone book. The pollsters failed to recognize that telephone owners in those days skewed toward the more educated, thereby biasing the sample.
3. The statistical model assumed that undecided voters would split between the candidates in the same proportions as decided voters.
4. The survey measured voters' candidate preferences but not the likelihood that a respondent would actually cast a vote.
Problems 2 and 4 continue to plague pollsters as they attempt to predict the upcoming presidential election. Although sampling techniques are more refined today than in the past, achieving a random sample that is representative of the entire population is extremely difficult, because it requires numerous assumptions about demographics. Which candidate will the Hispanic vote, representing about 20% of the U.S. population, favor? Will that differ between males and females? Does it depend on the voter's level of education?
A random sample, if closely representative of the entire population, can be an excellent predictor of the behavior of that population. For example, the Nielsen ratings, which assess the television viewing preferences of more than 140 million American households, are based on a survey of merely 25,000 households, i.e., about 0.02% of them. (The panel is slated to grow to 41,000 households in January 2025 to gain more precision.) The many assumptions required to establish a random sample that mirrors the entire population of voters seriously undermine the precision of any prediction. Furthermore, polls of electoral preferences have not been able to measure with any degree of accuracy the likelihood that a voter who prefers a particular candidate will get off the couch and vote.
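The striking thing about the Nielsen numbers is that, under a genuinely simple random sample, the precision of the estimate depends on the sample size, not on the size of the population being measured. A minimal sketch of the standard 95% margin-of-error formula makes the point; the `margin_of_error` helper and the 1.96 z-value are textbook illustration, not Nielsen's actual methodology.

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a 95% confidence interval for a proportion,
    assuming a simple random sample (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# Worst case p = 0.5. Note that the 140-million-household population
# never enters the formula; only the panel size n does.
moe_25k = margin_of_error(0.5, 25_000)   # current panel size
moe_41k = margin_of_error(0.5, 41_000)   # planned panel size

print(f"n=25,000: about ±{moe_25k:.2%}")
print(f"n=41,000: about ±{moe_41k:.2%}")
```

The 25,000-household panel already pins the estimate down to well under one percentage point, and the planned expansion to 41,000 tightens it only modestly, which is why tiny sampling fractions can still be remarkably precise.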
Adding to this uncertainty is the fact that many people in the Southeastern states of Georgia, the Carolinas, Alabama, and Florida have been devastated by Hurricanes Helene and Milton and may be unable to reach a polling station, or may be swamped with bigger problems that keep them from voting. There is also the possibility that one of the candidates could "misspeak" and cause a major swing among undecided voters.
When pollsters talk about "the margin of error," they are basing those estimates on assumptions about the statistical distribution of the errors, again relying on somewhat suspect assumptions. So Mark Barabak's assertion about an "educated guess" is not far off the mark when applied to presidential elections. Of course, when statistical predictions are based on random samples that involve fewer assumptions (such as the Nielsen ratings), they can be remarkably reliable.
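The reason the published margin of error can mislead is that it quantifies only random sampling noise, under the assumption that the sample was drawn at random from the whole electorate. A small simulation, with all numbers invented purely for illustration, shows what happens when that assumption fails in the way the 1948 telephone-book sample did: the estimate is precise but systematically wrong, and the true value lies far outside the nominal error band.

```python
import random

random.seed(2024)  # reproducibility for this illustration only

# Hypothetical electorate: 30% own telephones and favor the front-runner
# at 60%; the other 70% favor him at only 40%. True overall support:
true_support = 0.30 * 0.60 + 0.70 * 0.40   # = 0.46

# Poll 10,000 people, but draw them only from the telephone book,
# i.e., only from the 60%-support subgroup.
n = 10_000
sample = [random.random() < 0.60 for _ in range(n)]
estimate = sum(sample) / n

# Nominal 95% margin of error, computed as if the sample were random.
moe = 1.96 * (estimate * (1 - estimate) / n) ** 0.5

print(f"estimate {estimate:.3f} ± {moe:.3f}; true value {true_support:.3f}")
```

The nominal margin of error here is under one percentage point, yet the estimate misses the true value by roughly fourteen points: no amount of additional sampling from a biased frame fixes a biased frame, which is exactly the "educated guess" problem.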