@seawulf575 “though I’ve never seen one that skews WAY right.”
There are a few, but they never get cited and are not included in any of the poll aggregators or election models that I’m familiar with. So while they exist, they are entirely inconsequential. There aren’t a whole lot that skew way to the left, either. The problem is that some models include the ones that do. Again, I return to SurveyMonkey. Their polls are notoriously unreliable and skew way to the left, yet for some reason The New York Times takes them seriously.
“First off, they all said Hillary would win.”
I completely understand why you think this, but I don’t think it is accurate. The problem is twofold: (1) statistics don’t work the way that people think they work, and (2) a lot of media outlets promote mistaken ideas about how statistics work. So while I don’t think it is true that the polls said Clinton would win, I don’t think you are at fault for believing that they did say that. The blame for this misconception lies at the feet of the people presenting the statistical data, not the people it is presented to.
What I mean is this: most media outlets said that the polls were predicting a Clinton victory, but that’s not actually what the polls said if you dig into the numbers. The best analyses suggested that Clinton had about a 70% chance of winning and Trump had a 30% chance of winning. But things that have a 30% chance of happening actually happen all the time. There was only a 30% chance of rain in my area today, and it’s been pouring since 6:00 am.
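If it helps to make that concrete, here’s a quick sketch in Python (my own illustration, not anything a pollster or forecaster actually runs) that just simulates a 30%-probability event a large number of times:

```python
import random

# Simulate a large number of independent events that each have a 30% chance
# of happening, and count how often they actually occur.
TRIALS = 100_000
P_EVENT = 0.30  # a 30% chance of rain, or a 30% chance of a Trump win

hits = sum(1 for _ in range(TRIALS) if random.random() < P_EVENT)
print(f"The 30% event happened in {hits / TRIALS:.1%} of trials")
# Prints roughly 30%, i.e., nearly a third of the time.
```

Thirty percent is roughly the chance of rolling a 1 or a 2 on a single die. Nobody is shocked when that happens.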
Unfortunately, most media outlets didn’t have great election forecast models. They estimated that Clinton had a 90–99% chance of winning (which the numbers did not actually bear out), and they took that to mean she was definitely going to win (which obviously turned out to be wrong). But the whole point of giving something odds other than 0% or 100% is that there is uncertainty. Anyone who claims that something is definitely going to happen when there is still uncertainty is being irresponsible, and a lot of election forecast modelers were irresponsible in 2016.
If you have a good statistical model, it should be “wrong” about as often as its odds suggest. That is, if you have ten races with 70/30 odds, then the person who has the 30% shot should win about three of those ten races. If the person with the 70% shot wins all ten races, then you probably have a bad model; it should have given the winners better odds. But this isn’t how the data is usually presented. We’re told that a 70% chance is a prediction of victory rather than an assessment of probability.
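Here’s a rough sketch of that calibration idea, again in Python and again just my own illustration rather than any forecaster’s actual method: simulate a lot of ten-race sets where the favorite genuinely has a 70% chance, and see how the underdog does.

```python
import random
from collections import Counter

# Simulate many sets of ten races where the favorite truly has a 70% chance.
# If the forecast is well calibrated, the underdog should win about 3 of every
# 10 races, and a clean sweep by the favorite should be rare (0.7**10 is ~2.8%).
SETS = 100_000
RACES_PER_SET = 10
P_FAVORITE = 0.70

underdog_wins_per_set = Counter()
for _ in range(SETS):
    underdog_wins = sum(
        1 for _ in range(RACES_PER_SET) if random.random() >= P_FAVORITE
    )
    underdog_wins_per_set[underdog_wins] += 1

average = sum(k * n for k, n in underdog_wins_per_set.items()) / SETS
print(f"Average underdog wins per ten races: {average:.2f}")            # ~3.0
print(f"Favorite swept all ten: {underdog_wins_per_set[0] / SETS:.1%}")  # ~2.8%
```

The underdog winning a few of those races isn’t the model failing; it’s the model doing exactly what it said it would do.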
If we take this into account, the 2016 polls were about as accurate as polls in previous years. The failure was in some of the models (particularly the Upshot model created by The New York Times) and in a lot of the reporting. We cannot expect everyone to have a degree in statistics or even a strong background in the subject, however, so it is the responsibility of the people presenting the data to contextualize it. That did not happen in 2016, and it isn’t really happening in 2020, either.
“they say the guy that can only get barely 2% of the Republican votes would suddenly be the big favorite to win the general election. That doesn’t make any sense at all.”
I completely agree with you on this point. But again, this was more a failure of punditry than the actual numbers. Like you said, they were trying to sway public opinion instead of reporting public opinion. But it was the pundits, not the polls, that were doing this.
“And let’s face it…you can make polls say whatever you like. You get to choose the questions, you get to choose the population to test, you get to do all the manipulation you like.”
You can, and some pollsters do. But plenty of pollsters take great care to avoid manipulation to the extent that they can. I studied survey construction and statistical modeling in college (it was a big part of my minor). It’s not easy, and a lot of the problems that creep into polls are either unintentional or come from legacy issues (e.g., when people continue to use old questions with known flaws because they allow an apples-to-apples comparison with previous polls).
“Now the one I saw that I DID like was the one that asked the question if the people would be honest about who they were going to vote for. There was a huge number that said no. That question, by itself, threw doubt into every poll they had done.”
Well, it’s one way of calibrating the confidence interval. A lot of polls include questions that try to uncover how direct and truthful a respondent is being, though that is obviously a much more direct way of trying to figure it out. Whether and to what extent it casts the poll into doubt, however, depends at least in part on how the pollster attempts to compensate for untruthful respondents. The raw data of a poll tells us very little. A large part of both the art and the science is found in how the data is processed into something useful.
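As a toy example of what that processing can look like, here’s a minimal weighting sketch in Python. Everything in it (the group names, the population shares, the responses) is invented for illustration, and real pollsters use far more elaborate adjustments; the point is just that the raw tallies and the published numbers are not the same thing.

```python
# Toy example: demographic weighting of raw poll responses.
# All numbers here are invented for illustration.

# Each respondent: (demographic group, stated candidate preference)
responses = [
    ("college",     "A"), ("college",     "A"), ("college",     "B"),
    ("college",     "A"), ("non_college", "B"), ("non_college", "A"),
    ("non_college", "B"), ("non_college", "B"),
]

# Assumed share of each group in the electorate.
population_share = {"college": 0.40, "non_college": 0.60}

# Share of each group in the raw sample.
sample_counts = {}
for group, _ in responses:
    sample_counts[group] = sample_counts.get(group, 0) + 1
sample_share = {g: n / len(responses) for g, n in sample_counts.items()}

# Weight each respondent so the sample's group mix matches the population's.
def weight(group):
    return population_share[group] / sample_share[group]

weighted_support = {}
total_weight = 0.0
for group, choice in responses:
    w = weight(group)
    weighted_support[choice] = weighted_support.get(choice, 0.0) + w
    total_weight += w

for choice, w in sorted(weighted_support.items()):
    print(f"Candidate {choice}: {w / total_weight:.1%}")
```

In this toy sample the raw responses split 50/50, but after weighting the sample to match the assumed electorate, the split comes out to roughly 45/55. That gap between the raw data and the published estimate is where a lot of the art and the science lives.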