How to Interpret Linear Regression Results

Interpreting linear regression means reading each piece of the output, from coefficients to p-values to residual plots, and understanding what it tells you about the relationship between your variables. A regression table can look intimidating, but every number answers a specific question: how strong is the relationship, is it real or random noise, and does the model actually fit the data well? Here’s how to read each part.

What the Coefficients Actually Tell You

The coefficient (often labeled “B” or “Estimate”) for each predictor tells you how much the average outcome changes when that predictor increases by one unit, holding everything else constant. The key word is “average.” A coefficient of 2.18 means that for each one-unit increase in X, the average value of Y increases by about 2.18 units. It does not mean every individual observation will shift by exactly that amount. This distinction matters because regression models the conditional mean of Y, not any single data point.

The intercept is the predicted average value of Y when all predictors equal zero. Sometimes this is meaningful (predicting test scores when study hours equal zero). Other times it’s nonsensical (predicting weight when height equals zero). In those cases, treat the intercept as a mathematical anchor for the line rather than something worth interpreting on its own.
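To make this concrete, here is a minimal sketch in plain NumPy, using made-up data where the true slope is 2.18 and the true intercept is 50 (both numbers and variable names are illustrative, not from any real dataset):

```python
import numpy as np

# Hypothetical data: hours studied (X) vs. test score (Y)
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, size=200)
scores = 50 + 2.18 * hours + rng.normal(0, 5, size=200)

# Fit Y = b0 + b1*X by ordinary least squares
X = np.column_stack([np.ones_like(hours), hours])  # prepend intercept column
coef, *_ = np.linalg.lstsq(X, scores, rcond=None)
intercept, slope = coef
# slope comes out near 2.18: each extra hour raises the AVERAGE score
# by about 2.18 points, not every individual score
# intercept comes out near 50: the predicted average score at zero hours
```

The fitted slope and intercept recover the values used to generate the data, up to sampling noise, which is exactly the "conditional mean" interpretation described above.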

Unstandardized vs. Standardized Coefficients

Unstandardized coefficients are in the original units of your variables: dollars, years, kilograms. They’re the most intuitive to explain. “Each additional year of education is associated with $3,200 more in annual income” is easy for anyone to grasp.

Standardized coefficients (sometimes called beta weights) convert everything to standard deviations. A standardized coefficient of 0.45 means a one-standard-deviation increase in that predictor is associated with a 0.45-standard-deviation change in the outcome. The appeal is comparison: if your predictors are measured in completely different units (years of schooling vs. IQ score, for example), standardized coefficients put them on a common scale. Within the same model and sample, a larger standardized coefficient indicates a stronger relative association.

That said, standardized coefficients have real limitations. They depend on the variability of your specific sample, so comparing them across different studies or populations can be misleading. If standardizing genuinely helps you think about a predictor in terms of standard deviations, use it. Otherwise, stick with unstandardized coefficients for clearer communication.
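Standardizing is just z-scoring every variable before fitting. A quick sketch with simulated data (the income model and its parameters are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
education = rng.normal(14, 2, n)     # years of schooling
iq = rng.normal(100, 15, n)          # different units entirely
income = 5000 + 3200 * education + 200 * iq + rng.normal(0, 8000, n)

def zscore(v):
    return (v - v.mean()) / v.std()

# Standardize everything, then refit: the slopes become beta weights
Xz = np.column_stack([np.ones(n), zscore(education), zscore(iq)])
beta, *_ = np.linalg.lstsq(Xz, zscore(income), rcond=None)
# beta[1] and beta[2] are now in standard-deviation units and can be
# compared directly; the intercept is zero because everything is centered
```

In this simulated setup education carries the larger standardized weight, even though its raw coefficient (3,200 per year) and IQ's (200 per point) are not directly comparable.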

Reading P-Values and T-Statistics

Each coefficient in your output comes with a t-statistic and a p-value. These test a simple question: is this coefficient meaningfully different from zero, or could random chance explain it? The t-statistic measures how many standard errors the coefficient sits away from zero. Larger t-values correspond to smaller p-values, which means stronger evidence that the relationship is real.

The standard threshold is a p-value below 0.05. If a coefficient’s p-value is 0.03, you can reject the idea that the true coefficient is zero and call it statistically significant. If the p-value is 0.42, the data don’t give you enough evidence to conclude that predictor has a real effect. Keep in mind that a large coefficient does not guarantee a small p-value: the t-statistic depends on the standard error as well, so a precisely estimated small effect can be more significant than a noisily estimated large one.

One thing p-values don’t tell you is whether the effect is practically meaningful. A coefficient can be statistically significant but tiny in real-world terms, especially with large sample sizes. Always look at the size of the coefficient alongside its p-value.
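The t-statistic and p-value can be computed by hand from the fitted model. A sketch with simulated data (NumPy for the fit, scipy.stats for the t-distribution; the data are made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)   # true slope is 0.5

X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
df = n - X.shape[1]                       # residual degrees of freedom
sigma2 = resid @ resid / df               # estimated residual variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = coef / se                       # how many SEs from zero
p_values = 2 * stats.t.sf(np.abs(t_stats), df)  # two-sided p-values
```

With 100 observations and a true slope of 0.5 against unit noise, the slope's t-statistic comfortably clears the usual cutoff and its p-value falls well below 0.05.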

How to Assess Model Fit With R-Squared

R-squared tells you the proportion of variation in your outcome variable that the model explains. An R-squared of 0.72 means your predictors collectively account for 72% of the variation in Y. The remaining 28% is unexplained, captured in the residuals. Higher is generally better, but what counts as “good” depends entirely on your field. In physics, you might expect R-squared above 0.95. In social science, 0.30 can be perfectly respectable.

The problem with plain R-squared is that it increases every time you add a predictor, even if that predictor is useless. Add a column of random numbers to your model and R-squared will still tick upward slightly. This makes it unreliable for comparing models with different numbers of predictors.

Adjusted R-squared fixes this by penalizing you for adding variables that don’t improve the model. It uses a formula that accounts for both the number of data points (N) and the number of predictors (K). If you add a useful predictor, adjusted R-squared goes up. If you add a useless one, it goes down. When you’re comparing models or deciding whether an extra variable belongs, adjusted R-squared is the better number to watch. It will always be less than or equal to regular R-squared.
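The "useless predictor" experiment is easy to run yourself. A sketch in plain NumPy (data and seed are arbitrary):

```python
import numpy as np

def r_squared(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def adj_r_squared(X, y):
    n, k = X.shape[0], X.shape[1] - 1   # k = predictors, minus the intercept
    r2 = r_squared(X, y)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

rng = np.random.default_rng(3)
n = 60
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)
junk = rng.normal(size=n)               # pure noise, unrelated to y

X1 = np.column_stack([np.ones(n), x])          # real predictor only
X2 = np.column_stack([np.ones(n), x, junk])    # real predictor + junk
# r_squared(X2, y) is never below r_squared(X1, y), even though "junk"
# explains nothing; adj_r_squared applies the (n-1)/(n-k-1) penalty
# and typically drops when a useless predictor is added.
```

Running the comparison shows plain R-squared ticking up with the junk column while the adjusted version stays at or below it, which is why adjusted R-squared is the number to watch when comparing models.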

Interpreting Categorical Predictors

When one of your predictors is a category (like gender, region, or treatment group), the software converts it into one or more binary variables. One category becomes the reference group, and every other category gets a coefficient representing the difference from that reference.

If your reference group is “control” and the coefficient for the “treatment” group is 4.7, that means the treatment group’s average outcome is 4.7 units higher than the control group’s, holding other variables constant. The intercept, in this setup, represents the predicted average for the reference group (when all other predictors equal zero).

Which group serves as the reference matters for how you read the output but doesn’t change the underlying model. If you’re confused about what the coefficients mean, check which category your software dropped as the baseline.
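Dummy coding is simple enough to do by hand, which makes the interpretation transparent. A sketch with a simulated two-group experiment (group labels and effect size are invented):

```python
import numpy as np

rng = np.random.default_rng(4)
groups = np.array(["control"] * 50 + ["treatment"] * 50)
effect = np.where(groups == "treatment", 4.7, 0.0)   # true group difference
y = 10 + effect + rng.normal(0, 1, 100)

# "control" is the reference group: it gets no column of its own
treat = (groups == "treatment").astype(float)        # 0/1 dummy variable
X = np.column_stack([np.ones(100), treat])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
# coef[0] comes out near 10.0: the average outcome for the reference group
# coef[1] comes out near 4.7: the treatment-minus-control difference
```

Swapping which group is the dummy would flip the sign of the difference and shift the intercept, but every predicted group mean would stay the same.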

What Residual Plots Reveal

Residuals are the gaps between your model’s predictions and the actual data. Plotting them is the most direct way to check whether your model’s assumptions hold. You’re looking for specific patterns, and each one signals a different problem.

In a residuals vs. fitted values plot, you want to see points scattered randomly around a horizontal line at zero, with no obvious shape. If the points trace a curve, such as a U shape, that suggests a non-linear relationship your model isn’t capturing. You may need a polynomial term or a transformation of one of your variables.

In a scale-location plot (sometimes called spread-location), you’re checking whether the spread of residuals stays roughly constant across all predicted values. If the residuals fan out, getting wider as the fitted values increase, that’s heteroscedasticity. It means your model’s predictions are more precise for some ranges of the outcome than others. This doesn’t bias your coefficients, but it does make your standard errors (and therefore your p-values) unreliable. Weighted regression or a transformation of the outcome variable can help.
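The curvature pattern from a misspecified model is easy to reproduce. The sketch below fits a straight line to deliberately quadratic data, then applies a crude numeric stand-in for the visual check: with a U-shaped residual pattern, residuals are positive at the extremes of the fitted values and negative in the middle (all data are simulated):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
x = rng.uniform(0, 10, n)
y = 1 + x**2 + rng.normal(0, 1, n)     # the true relationship is quadratic

# Fit a (misspecified) straight line and inspect the residuals
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
resid = y - fitted

# Average residual in the lowest quarter of fitted values vs. the middle
# half: a U shape shows up as positive tails and a negative middle,
# exactly what a missing x**2 term produces
q1, q3 = np.quantile(fitted, [0.25, 0.75])
low = resid[fitted < q1].mean()
mid = resid[(fitted >= q1) & (fitted <= q3)].mean()
```

In practice you would simply plot `resid` against `fitted` and look; the quantile comparison just shows that the pattern is real structure, not noise.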

Spotting Influential Data Points

A single unusual observation can drag your entire regression line in its direction. Two tools help you find these troublemakers.

Leverage measures how far a data point’s predictor values are from the center of the data. High-leverage points sit at the extremes of X. They have the potential to be influential but aren’t necessarily a problem if they follow the overall pattern.

Cook’s distance combines leverage with the size of the residual to measure how much the regression results would change if you removed that single point. The common guideline: a Cook’s distance greater than 0.5 deserves a closer look, and a value greater than 1 is quite likely to be genuinely influential. When you find a high-influence point, investigate it. Is it a data entry error? An unusual but valid case? Removing it blindly is bad practice, but understanding why it’s influential often reveals something important about your data.
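Both diagnostics come straight from the hat matrix and the residuals. A sketch that plants one extreme, off-pattern point in simulated data and then computes leverage and Cook's distance by the standard formulas (the data are made up):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50
x = rng.normal(0, 1, n)
y = 2 * x + rng.normal(0, 1, n)
x[0], y[0] = 8.0, -20.0                  # one extreme point far off the trend

X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
p = X.shape[1]                           # number of fitted parameters
H = X @ np.linalg.inv(X.T @ X) @ X.T     # hat matrix
leverage = np.diag(H)                    # distance from the center of X
mse = resid @ resid / (n - p)
# Cook's distance: residual size scaled by leverage
cooks = (resid**2 / (p * mse)) * (leverage / (1 - leverage) ** 2)
```

The planted point dominates: its Cook's distance is far above 1, while the well-behaved points sit near zero, which is the pattern you scan for before deciding whether a point is an error or a valid extreme case.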

Checking for Multicollinearity

Multicollinearity means two or more of your predictors are highly correlated with each other. When this happens, the model struggles to separate their individual effects, inflating the standard errors of the coefficients and making them unstable. Your overall model might fit well, but the individual coefficients become unreliable.

The Variance Inflation Factor (VIF) quantifies this for each predictor. A VIF of 1 means no correlation with other predictors. The higher the VIF, the more that predictor overlaps with others. The traditional rule of thumb uses a VIF of 10 as the danger threshold, but recent research has pushed back on this. Simulation studies show that a VIF of 10 is not strict enough, and meaningful multicollinearity problems can appear with VIF values between 3 and 5. Some researchers flag anything above 4 as potentially problematic. There is no universal consensus, but treating 5 as a conservative threshold is more reliable than waiting until 10.

If you find high VIF values, your options include removing one of the correlated predictors, combining them into a single variable, or using a technique like ridge regression that handles collinearity more gracefully.
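The VIF for a predictor is 1 / (1 − R²), where that R² comes from regressing the predictor on all the other predictors. A sketch that implements this definition directly and tests it on one nearly duplicated predictor (the variables are simulated):

```python
import numpy as np

def vif(X):
    """VIF for each column of X (predictor columns only, no intercept)."""
    n, k = X.shape
    out = []
    for j in range(k):
        # Regress column j on all the other predictors plus an intercept
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        tss = ((X[:, j] - X[:, j].mean()) ** 2).sum()
        r2 = 1 - resid @ resid / tss
        out.append(1 / (1 - r2))
    return np.array(out)

rng = np.random.default_rng(7)
n = 200
a = rng.normal(size=n)
b = a + 0.1 * rng.normal(size=n)        # nearly a copy of a: collinear
c = rng.normal(size=n)                  # independent of both
X = np.column_stack([a, b, c])
# vif(X): a and b flag each other with large values (around 100 in this
# setup), while c stays near the no-correlation baseline of 1
```

Whatever threshold you adopt (5, or the traditional 10), the point of the calculation is the same: a large VIF means that predictor is mostly reconstructable from the others, so its own coefficient is poorly pinned down.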

Reporting Your Results

If you’re writing up regression results for a paper or report, a few conventions keep things clear. Report coefficients, standard errors, t-statistics, and p-values for each predictor. Include R-squared and adjusted R-squared for the overall model. Report exact p-values to two or three decimal places (p = .03, p = .006), except when they fall below .001, in which case simply write p < .001.

For the F-statistic that tests whether your overall model is significant, include the degrees of freedom. Present R-squared values to two decimal places. If you’re following APA style, italicize statistical symbols like R², t, F, and p, but you don’t need to define these standard abbreviations for your audience.

The most effective results sections pair the numbers with plain-language interpretation. After reporting that B = 3.2, p = .004, tell the reader what that means in context: “Each additional year of experience was associated with a $3,200 increase in average salary, and this relationship was statistically significant.”