A linear model is a statistical tool that describes the relationship between a variable you want to predict and one or more variables you think influence it, using an equation that adds those inputs together, each multiplied by a coefficient. It’s one of the most widely used frameworks in statistics, forming the backbone of techniques from simple regression to complex multi-variable analyses.
The Core Idea
Statisticians think of data as having two parts: a systematic component (the pattern) and a noise component (the randomness). A linear model captures the pattern by expressing an outcome as a combination of inputs, each multiplied by a weight that reflects how much influence that input has. Whatever the model can’t explain gets lumped into an error term; its observed counterpart in a fitted model is called the residual.
In the simplest case, you have one predictor variable and one outcome. Say you want to predict someone’s weight based on their height. The model looks like this: weight equals an intercept, plus a slope multiplied by height, plus some error. The intercept is the baseline value of your outcome when the predictor is zero. The slope tells you how much the outcome changes for every one-unit increase in the predictor. If the slope is 2.5, that means each additional inch of height is associated with 2.5 more pounds of weight, on average.
The error term captures everything the model doesn’t account for: genetics, diet, measurement imprecision, and countless other factors. It represents the gap between what the model predicts and what actually happens.
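The height-and-weight model above can be sketched in a few lines of NumPy. The data here are invented purely for illustration, and the slope and intercept come from the standard closed-form least-squares formulas:

```python
import numpy as np

# Hypothetical height (inches) and weight (pounds) observations.
heights = np.array([62.0, 64, 66, 68, 70, 72, 74])
weights = np.array([120.0, 136, 142, 155, 165, 172, 185])

# Closed-form least-squares estimates: slope = cov(x, y) / var(x),
# intercept chosen so the line passes through the point of means.
slope = np.cov(heights, weights, ddof=1)[0, 1] / np.var(heights, ddof=1)
intercept = weights.mean() - slope * heights.mean()

# Residuals: the part of each weight the fitted line does not explain.
residuals = weights - (intercept + slope * heights)

print(f"weight ~ {intercept:.1f} + {slope:.2f} * height")
```

With an intercept in the model, the residuals always average out to exactly zero on the data used for fitting, which is a useful sanity check.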
Simple vs. Multiple Regression
When your model has a single predictor, it’s called simple linear regression. When you add more predictors, it becomes multiple linear regression. The jump from one to two or more predictors is straightforward conceptually. Instead of one slope, you get several, each describing how its corresponding variable relates to the outcome while holding the others constant.
For example, you might predict a person’s blood pressure using their age, body weight, sodium intake, and exercise frequency. The model assigns a separate coefficient to each of those predictors, letting you isolate the individual contribution of each one. Everything you learn about the simple case (one predictor) extends to the multiple case with only minor modifications.
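A minimal sketch of the blood-pressure example, again with made-up numbers: stacking the predictors into a design matrix (with a leading column of ones for the intercept) lets a single least-squares solve estimate every coefficient at once.

```python
import numpy as np

# Hypothetical data: age (years), weight (kg), sodium (g/day),
# exercise (days/week), and systolic blood pressure as the outcome.
predictors = np.array([
    [45, 80, 3.2, 1],
    [52, 92, 4.0, 0],
    [38, 70, 2.5, 4],
    [60, 88, 3.8, 2],
    [47, 75, 2.9, 3],
    [55, 95, 4.5, 0],
], dtype=float)
blood_pressure = np.array([128.0, 142, 115, 138, 122, 148])

# Prepend a column of ones so the model includes an intercept term.
X = np.column_stack([np.ones(len(blood_pressure)), predictors])

# One least-squares solve returns the intercept plus one slope
# per predictor, each interpreted "holding the others constant".
coefs, *_ = np.linalg.lstsq(X, blood_pressure, rcond=None)
print(coefs)  # [intercept, b_age, b_weight, b_sodium, b_exercise]
```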
How the Best-Fit Line Is Found
The most common method for fitting a linear model is called ordinary least squares, or OLS. The idea is intuitive: draw a line through your data, then measure the vertical distance between each data point and the line. Square those distances (so negatives don’t cancel out positives), add them up, and find the line that makes that total as small as possible. That line is your “best-fitting line.”
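The minimization idea can be checked numerically. This sketch (with simulated data) computes the sum of squared residuals at the OLS solution and confirms that nudging either the intercept or the slope only makes the total worse:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 3.0 + 2.5 * x + rng.normal(0, 1, 50)  # simulated straight-line data

def ssr(intercept, slope):
    """Sum of squared vertical distances from the points to the line."""
    return np.sum((y - (intercept + slope * x)) ** 2)

# OLS estimates via the closed-form solution.
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

# Any nearby line has a strictly larger sum of squared residuals.
assert ssr(b0, b1) < ssr(b0 + 0.1, b1)
assert ssr(b0, b1) < ssr(b0, b1 + 0.1)
print(f"minimum SSR: {ssr(b0, b1):.2f}")
```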
OLS has a powerful mathematical property. When certain conditions are met, it produces the most precise estimates possible among all methods that use a linear combination of the data and don’t systematically over- or under-estimate the true values. In technical terms, it’s the “best linear unbiased estimator,” a result known as the Gauss–Markov theorem. This is why OLS became the default approach for fitting linear models.
What “Linear” Actually Means
A common misconception is that a linear model can only describe straight-line relationships. In reality, “linear” refers to how the model handles its parameters (the coefficients), not necessarily the shape of the relationship. You can include squared terms, logarithms, or other transformations of your predictors and still have a linear model, as long as the coefficients combine in a straightforward additive way. A model predicting salary from years of experience and years-of-experience-squared is still linear in its parameters, even though it traces a curve on a graph.
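The salary example can be sketched with simulated data: the design matrix gains a squared column, but because the coefficients still combine additively, ordinary least squares applies unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
experience = rng.uniform(0, 30, 80)
# Hypothetical salary (in $1000s) that rises and then flattens:
# a curved relationship generated from a quadratic trend plus noise.
salary = 40 + 3.0 * experience - 0.05 * experience**2 + rng.normal(0, 2, 80)

# The squared term is just another column; the model stays linear
# in its coefficients, so the same least-squares solve works.
X = np.column_stack([np.ones_like(experience), experience, experience**2])
coefs, *_ = np.linalg.lstsq(X, salary, rcond=None)
print(coefs)  # [intercept, linear term, quadratic term]
```

The fitted quadratic coefficient comes out negative, recovering the downward curvature even though the fitting machinery is identical to the straight-line case.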
Assumptions Behind the Model
Linear models work well when several conditions hold. The errors should average out to zero, meaning the model isn’t systematically biased in one direction. The errors should have roughly constant spread across all values of the predictors, a property called homoscedasticity. When you plot the residuals against predicted values, you want to see a flat, random scatter. If the residuals fan out or form a funnel shape, that signals the spread is changing, which can make your results unreliable.
The errors should also be independent of each other. Data collected over time, for instance, can violate this if today’s value is influenced by yesterday’s. Finally, for many of the standard tests to work properly, the errors should follow a roughly bell-shaped (normal) distribution. Mild violations of these assumptions are often tolerable, but severe violations can distort your conclusions.
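A crude numerical stand-in for eyeballing a residual plot, using simulated well-behaved data: check that the residuals center on zero and that their spread is similar in the lower and upper halves of the fitted values.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 200)
y = 1.0 + 2.0 * x + rng.normal(0, 1, 200)  # constant error spread

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
fitted = b0 + b1 * x
resid = y - fitted

# Residuals should average out to (essentially) zero...
print(f"mean residual: {resid.mean():.2e}")

# ...and show similar spread across the range of fitted values.
# A ratio far from 1 would suggest a funnel shape (heteroscedasticity).
order = np.argsort(fitted)
lower, upper = resid[order[:100]], resid[order[100:]]
print(f"spread ratio: {lower.std() / upper.std():.2f}")
```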
Measuring How Well the Model Fits
The most common measure of model fit is R-squared, also called the coefficient of determination. It ranges from 0 to 1 and tells you what proportion of the variation in your outcome is explained by the predictors. An R-squared of 0.85 means the model accounts for 85% of the variation in the data, with the remaining 15% left unexplained.
If R-squared equals 1, every data point falls exactly on the predicted line, which almost never happens with real data. If R-squared equals 0, the predictors explain none of the variation, and the model’s predictions are no better than simply guessing the average. In practice, what counts as a “good” R-squared depends entirely on the field. In physics, you might expect values above 0.99. In social science research, 0.30 could be considered useful.
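R-squared follows directly from its definition: one minus the ratio of unexplained variation to total variation. A sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 100)
y = 2.0 + 1.5 * x + rng.normal(0, 1, 100)  # strong linear signal plus noise

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
pred = b0 + b1 * x

ss_res = np.sum((y - pred) ** 2)       # unexplained variation (residuals)
ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean
r_squared = 1 - ss_res / ss_tot
print(f"R-squared = {r_squared:.3f}")
```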
Testing Whether a Relationship Is Real
Finding a slope in your data doesn’t automatically mean the relationship is meaningful. It could be noise. To distinguish signal from noise, statisticians test whether each coefficient is significantly different from zero. The logic goes like this: if there were truly no relationship, how likely would you be to see a slope this large just by chance?
This is done by dividing the estimated coefficient by its standard error to get a test statistic, then calculating a p-value. A p-value below 0.05 is the conventional threshold for declaring statistical significance. In one example from Penn State, researchers testing whether age predicted a distance-related outcome found a test statistic of negative 7.09 and a p-value effectively equal to zero, providing strong evidence that the two variables were linearly related.
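A sketch of that calculation on simulated data with a genuinely negative slope. The standard error comes from the usual simple-regression formula, and the two-sided p-value uses a normal approximation (via `math.erfc`), which is close to the exact t-distribution value once the sample is reasonably large:

```python
import math
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 10, 60)
y = 5.0 - 1.2 * x + rng.normal(0, 2, 60)  # true slope is negative

n = len(x)
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# Standard error of the slope from the residual variance.
s2 = np.sum(resid**2) / (n - 2)
se_b1 = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))

t_stat = b1 / se_b1  # estimated coefficient divided by its standard error
# Two-sided p-value, normal approximation to the t-distribution.
p_value = math.erfc(abs(t_stat) / math.sqrt(2))
print(f"t = {t_stat:.2f}, p = {p_value:.2e}")
```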
Beyond the Standard Linear Model
The standard linear model assumes the outcome is continuous and that errors follow a normal distribution. But many real-world outcomes don’t fit this mold. A yes/no outcome (did the patient recover?) or a count (how many accidents occurred?) requires a different approach.
This is where generalized linear models come in. They extend the linear framework by allowing different error distributions and by using a “link function” that transforms the outcome so the linear equation still works. For binary outcomes, logistic regression uses a logit link, which converts probabilities into a scale that can range from negative to positive infinity. For count data, Poisson regression uses a log link. In each case, the right-hand side of the equation stays linear in the parameters, but the left-hand side is transformed to match the type of data you’re working with. The standard linear model is actually a special case of this broader family, where the link function is simply the identity (no transformation needed) and the errors are normal.
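A minimal sketch of logistic regression on simulated binary data, fitted here by plain gradient ascent on the log-likelihood rather than a library routine. The right-hand side `X @ beta` stays linear in the parameters; the logit link maps it onto the probability scale.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(-3, 3, 300)
# Hypothetical yes/no outcomes generated from a true logistic model.
p_true = 1 / (1 + np.exp(-(0.5 + 1.5 * x)))
y = rng.binomial(1, p_true)

X = np.column_stack([np.ones_like(x), x])
beta = np.zeros(2)

# Gradient ascent on the log-likelihood: the gradient is X'(y - p),
# where p is the inverse-logit of the linear predictor.
for _ in range(5000):
    p = 1 / (1 + np.exp(-X @ beta))
    beta += 0.01 * X.T @ (y - p) / len(y)

print(beta)  # intercept and slope on the log-odds scale
```

Each fitted coefficient is interpreted on the log-odds scale: a one-unit increase in the predictor shifts the log-odds of the outcome by the slope, holding everything else constant.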
Where Linear Models Show Up
Linear models are everywhere. In medicine, researchers use simple linear regression to test whether a particular eye measurement predicts errors in post-surgical vision outcomes. In economics, multiple regression helps isolate the effect of education on earnings after controlling for experience, location, and industry. In marketing, it quantifies how ad spending across different channels relates to sales revenue.
The appeal is partly interpretive. Each coefficient has a clear meaning: it’s the expected change in the outcome for a one-unit change in that predictor, holding everything else constant. Few statistical methods offer that combination of flexibility, mathematical rigor, and plain-English interpretability, which is why linear models remain a starting point for nearly every quantitative analysis.

