My fantasy about what the term “ordinary least squares” might mean to someone who just happens across it
For me, when I think just about the words, it sounds like a kind of oxymoron about a person. Who is the least square? Of those who are least square, who is the most ordinary? Somehow I picture a guy from the Midwest with a short haircut who goes to church, but is secretly a jazz musician, or something like that.
Description of the statistical term: Ordinary Least Squares
Ordinary Least Squares, or OLS, is the most common statistical method for estimating how one thing changes along with another, for example, how weight changes with height. You take a bunch of measurements of people’s height and weight. As you might predict, using your common sense, the taller a person is, the more they tend to weigh. However, you don’t know how strong that relationship is on average, that is, how much extra weight typically goes with each extra bit of height.
If you take your observations and plot them on a graph, you’ll see a trend: the points for the shortest, lightest people fall on the lower left, and those for the tallest, heaviest people on the upper right. In general, it looks kind of like a cloud of points (I’ve provided a link to an example below).
What we want is an equation that describes the average relationship between height and weight. Essentially, this equation draws an imaginary straight line through the cloud of observations (scroll down a little to figures 4.1 and 4.2 to see this “scatter plot”).
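To make the idea of “an equation that draws a line” a bit more concrete, here is a minimal Python sketch. The function name predicted_weight and the two numbers in it are made up by me for illustration; they are not fitted to any real data.

```python
# A straight line is described by two numbers:
#   intercept: where the line sits when height is zero
#   slope:     how much weight goes up for each extra unit of height
# Both values below are invented for illustration (height in cm, weight in kg).
INTERCEPT = -76.5
SLOPE = 0.85


def predicted_weight(height_cm, intercept=INTERCEPT, slope=SLOPE):
    """Return the weight the line predicts for a given height."""
    return intercept + slope * height_cm


for h in (150, 170, 190):
    print(h, "cm ->", round(predicted_weight(h), 1), "kg")
```

Give the function a height and it hands back the weight that sits on the line at that height. Finding good values for the two numbers is the whole job of OLS.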
This imaginary line represents the best “fit” to the cloud of observations: it is placed so that the sum of the squares of the vertical distances from each point to the line is the smallest (the least) it can be. You square in order to deal with the problem of negative and positive distances. You don’t want them to cancel each other out, and the square of a negative number is always a positive number.
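Here is a small Python sketch of why the squaring matters. The measurements and the two candidate lines are invented for illustration, and the helper names residuals and sum_of_squares are my own.

```python
# Invented measurements, for illustration only (height in cm, weight in kg).
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 75, 86]


def residuals(slope, intercept, xs, ys):
    """Vertical distance from each point to the line (positive = point above the line)."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def sum_of_squares(slope, intercept, xs, ys):
    """Add up the squared vertical distances; smaller means a closer fit."""
    return sum(r ** 2 for r in residuals(slope, intercept, xs, ys))


# Two hypothetical candidate lines through the same cloud of points,
# each given as (slope, intercept).
candidates = {"line A": (0.85, -76.5), "line B": (0.90, -85.0)}

for name, (slope, intercept) in candidates.items():
    plain = sum(residuals(slope, intercept, heights, weights))
    squared = sum_of_squares(slope, intercept, heights, weights)
    print(f"{name}: plain sum = {plain:.2f}, sum of squares = {squared:.2f}")
```

With these made-up numbers, the plain sums both come out at about zero because the positive and negative distances cancel, while the sums of squares differ (about 7.5 for line A and about 10 for line B). Only the squared version tells the two lines apart.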
Thus, “ordinary” least squares. You have to use different techniques when the relationship between the factors is in the shape of a curve, or if there are other problems that make OLS give you results that don’t make sense when you apply common sense. However, in many cases OLS works well, which is why it is the usual first choice. Strictly speaking, the “ordinary” is there to distinguish this plain version from variants such as weighted or generalized least squares. There are also other techniques for fitting these imaginary lines that relate various factors to each other, such as Maximum Likelihood.
To create these lines, you have to perform an awful lot of addition, multiplication and division operations. For a single straight line there is a direct formula, so no guessing is needed there. The guessing comes in when you decide which factors to include and what shape the relationship takes, and you may end up testing several different formulas. Computers are really good at this repetitive kind of task, so these days, we let computers figure this shit out for us, and we interpret the meaning of the resulting line.
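For the simplest case, one straight line through two columns of numbers, the arithmetic can be sketched in a few lines of Python. This uses the standard textbook formula for a least squares line with a single predictor. The function name fit_line and the data are my own invention for illustration, not real measurements.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y move together, and how much x moves on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented measurements, for illustration only (height in cm, weight in kg).
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 75, 86]

slope, intercept = fit_line(heights, weights)
print("slope:", round(slope, 2), "intercept:", round(intercept, 1))
```

With this invented data the slope comes out at about 0.85, meaning roughly 0.85 kg of extra weight for each extra cm of height. Nothing in there goes beyond adding, subtracting, multiplying and dividing; real data sets just mean many more of those steps.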
So, this is my first attempt to explain a statistical concept, and I have no idea whether it makes sense to you or not. Please let me know. I find that a lot of people are afraid of statistics, partly because it uses these equations that have very strange symbols in them, and look completely un-understandable. The truth is that statistics is based on your intuitions about the relationship between things, and it is calculated by millions of simple operations (addition and multiplication and division) that you can do in your sleep.
Since it is intuitive, we often find scientists “proving” what we think are perfectly obvious things—such as the relationship between height and weight. However, sometimes the numbers don’t prove what our intuition says, such as the relationship between race and intelligence. One hundred years ago, people thought the relationship was obvious. Certain races were smarter than others. As it turns out, common knowledge was wrong. Intelligence has no relationship to race.