What Is Model Fit in Regression? A Clear Explanation

Model fit in regression measures how well your statistical model captures the patterns in your data. It’s the gap between what your model predicts and what actually happened. A model with good fit produces predictions that land close to the real values, while a model with poor fit misses the mark in ways that suggest it’s either too simple, too complex, or built on the wrong assumptions. Assessing model fit is how you decide whether your regression equation is trustworthy enough to draw conclusions from.

What Model Fit Actually Measures

When you run a regression, you’re fitting a line (or curve) through a cloud of data points. No line passes through every point perfectly, so there’s always some discrepancy between your model’s predictions and the observed values. Model fit quantifies that discrepancy. The smaller and more random those errors are, the better the fit.

Think of it this way: if you’re trying to predict someone’s weight from their height, your regression line will overestimate some people and underestimate others. Model fit asks whether those misses are small and scattered randomly, or whether they reveal a systematic problem, like the relationship actually being curved rather than straight. The errors themselves, called residuals, are the raw material for almost every model fit assessment.

R-Squared: The Most Common Measure

R-squared (also called the coefficient of determination) is the metric most people encounter first. It tells you the proportion of variation in your outcome that your predictors explain. An R-squared of 0.75 means your model accounts for 75% of the variation in the outcome, while 25% remains unexplained.

R-squared falls between 0 and 1 for ordinary least squares models with an intercept. If it equals 1, every data point sits exactly on the regression line. If it equals 0, your predictors explain nothing at all, and the model does no better than a flat line predicting the mean outcome for everyone. In practice, you’ll land somewhere in between.
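As a concrete sketch, R-squared can be computed directly from the residuals. The observations and predictions below are made up for illustration:

```python
# Minimal sketch: computing R-squared by hand.
# R-squared = 1 - (unexplained variation / total variation).

def r_squared(y, y_pred):
    """Proportion of variation in y explained by the predictions."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)                   # total variation
    ss_residual = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))   # unexplained
    return 1 - ss_residual / ss_total

# Hypothetical observations and predictions from some fitted line
y      = [2.0, 4.1, 5.9, 8.2, 10.0]
y_pred = [2.1, 4.0, 6.0, 8.0, 10.1]

print(round(r_squared(y, y_pred), 3))   # close to 1: the misses are small
```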

What counts as a “good” R-squared depends entirely on your field. In the physical sciences and engineering, values above 0.70 are typically expected, and researchers in physics or chemistry often consider 0.70 to 0.99 a good result. In the social sciences and psychology, where human behavior introduces enormous variability, values as low as 0.10 to 0.30 are often considered acceptable. An R-squared of 0.25 might be excellent in one context and embarrassing in another.

The Limits of R-Squared

R-squared has a well-known flaw: it never decreases when you add more predictors to your model, even if those predictors are meaningless. Throw in a completely random variable and R-squared will stay the same or inch upward. This makes it unreliable for comparing models with different numbers of predictors.

There’s a deeper problem, too. R-squared says nothing about prediction error. Even when a model’s prediction errors stay exactly the same, R-squared can swing anywhere between 0 and 1 just by changing the range of your input variable, because a wider range inflates the total variation to be explained. A model can have a high R-squared and still make poor predictions, or a low R-squared and still be the best available description of a messy relationship. Mean squared error is often a better gauge of how close your predictions actually land.
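The range-dependence is easy to demonstrate. The two made-up datasets below share identical errors, and therefore identical mean squared error, yet their R-squared values differ sharply:

```python
# Sketch: same residuals (same MSE), different x-range, different R-squared.

def r2_and_mse(y, y_pred):
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))
    return 1 - ss_residual / ss_total, ss_residual / len(y)

errors   = [0.5, -0.5, 0.5, -0.5]   # identical misses in both datasets
narrow_x = [0, 1, 2, 3]
wide_x   = [0, 10, 20, 30]

# True relationship y = 2x, observed with the fixed errors above
narrow_y = [2 * x + e for x, e in zip(narrow_x, errors)]
wide_y   = [2 * x + e for x, e in zip(wide_x, errors)]

r2_narrow, mse_narrow = r2_and_mse(narrow_y, [2 * x for x in narrow_x])
r2_wide,   mse_wide   = r2_and_mse(wide_y,   [2 * x for x in wide_x])

# MSE is identical, but the wider x-range inflates total variation,
# so the wide dataset's R-squared is much closer to 1.
print(mse_narrow == mse_wide, r2_narrow, r2_wide)
```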

Adjusted R-Squared

Adjusted R-squared fixes the biggest weakness of regular R-squared by penalizing model complexity. It accounts for both the number of predictors and the number of observations. When you add a predictor that doesn’t genuinely improve explanatory power, adjusted R-squared decreases, effectively punishing you for making the model more complicated without making it more accurate.

This makes adjusted R-squared far more useful when you’re deciding between models. If one model uses three predictors and another uses seven, comparing their regular R-squared values is misleading. Comparing their adjusted R-squared values gives you a fairer picture of which model actually explains the data better relative to its complexity.
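The standard formula makes the penalty explicit. The sample size, predictor counts, and R-squared values below are hypothetical:

```python
# Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1).

def adjusted_r_squared(r2, n, p):
    """n = observations, p = predictors (not counting the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# A 7-predictor model with a slightly higher raw R-squared can still lose
# to a 3-predictor model once complexity is penalized:
adj_three = adjusted_r_squared(0.60, n=30, p=3)
adj_seven = adjusted_r_squared(0.61, n=30, p=7)
print(round(adj_three, 3), round(adj_seven, 3))
```

Note that adjusted R-squared is always below raw R-squared; the gap widens as predictors are added relative to the number of observations.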

The F-Test for Overall Significance

R-squared tells you how much variation your model explains, but it doesn’t tell you whether that amount is statistically meaningful. The F-test fills that gap. It tests whether your model, taken as a whole, does a better job than the simplest possible alternative: a flat line that just predicts the average outcome for everyone.

The null hypothesis of the F-test is that none of your predictors matter, that you’d do just as well ignoring them entirely. A large F-statistic (with a small p-value) means the error in your full model is substantially smaller than the error in the “no predictors” model, giving you confidence that your regression captures something real. If the two models produce similar error, there’s no reason to prefer the more complex one.

The F-test is especially useful in multiple regression, where individual predictors might look insignificant on their own but contribute meaningfully as a group.
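For least-squares regression, the F-statistic can be computed from the same sums of squares that underlie R-squared. The data below are made up for illustration:

```python
# Sketch of the overall F-statistic: explained variation per predictor
# divided by unexplained variation per residual degree of freedom.

def f_statistic(y, y_pred, p):
    """p = number of predictors in the full model."""
    n = len(y)
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)                   # mean-only model error
    ss_residual = sum((yi - yp) ** 2 for yi, yp in zip(y, y_pred))   # full model error
    explained = (ss_total - ss_residual) / p
    unexplained = ss_residual / (n - p - 1)
    return explained / unexplained

y      = [2.0, 4.1, 5.9, 8.2, 10.0]
y_pred = [2.1, 4.0, 6.0, 8.0, 10.1]
print(f_statistic(y, y_pred, p=1))   # large: far better than the mean-only model
```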

Residual Plots: Seeing Fit Problems Visually

Numbers alone can hide important problems. Residual plots let you see them. The most common version plots residuals (prediction errors) on the vertical axis against fitted values (predictions) on the horizontal axis. It’s the single most useful diagnostic tool in regression.

A well-behaved residual plot has three characteristics. The residuals bounce randomly around the zero line, suggesting the linear relationship assumption is reasonable. They form a roughly horizontal band, suggesting the spread of errors stays consistent across all predicted values. And no single point stands dramatically apart from the rest.

When the plot shows a curved pattern instead of a random scatter, the relationship between your variables probably isn’t linear, and a straight-line model is missing something systematic. When the residuals fan out like a cone, getting wider as predictions increase, the variance of your errors isn’t constant; this problem is called heteroscedasticity. Both patterns indicate poor model fit that R-squared alone might not reveal.
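The curved-pattern case can be seen numerically without a plot: fitting a straight line to deliberately quadratic (made-up) data leaves residuals whose signs trace a U shape:

```python
# Sketch: residuals from a straight-line fit to curved data.

xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]   # the true relationship is quadratic

# Closed-form simple linear regression
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
print(residuals)   # positive at both ends, negative in the middle: a U shape
```

Plotted against the fitted values, these residuals would form exactly the curved band that signals a missing nonlinear term.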

AIC and BIC for Comparing Models

When you’re choosing between competing models, especially models that aren’t simply nested versions of each other, information criteria offer a principled way to compare them. The two most common are the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC).

Both balance two things: how well the model fits the data and how many parameters it uses. A model with more parameters will almost always fit the training data better, but at the cost of added complexity. AIC and BIC penalize that complexity, rewarding models that achieve good fit with fewer moving parts. Lower values indicate better models.

The key difference between them is how harshly they penalize complexity. BIC applies a stronger penalty that grows with sample size, so it tends to favor simpler models than AIC does. Neither gives you an absolute measure of fit. They only tell you which model is better relative to the others you’re comparing. Their biggest advantage is that the models being compared don’t need to be nested, meaning one doesn’t have to be a simplified version of the other.
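Under the usual Gaussian-error assumption, both criteria reduce to simple least-squares formulas (additive constants dropped). The residual sums of squares and parameter counts below are hypothetical:

```python
import math

# Least-squares forms of AIC and BIC, additive constants dropped.

def aic_bic(ss_residual, n, k):
    """n = observations, k = fitted parameters (including the intercept)."""
    fit_term = n * math.log(ss_residual / n)   # rewards lower residual error
    return fit_term + 2 * k, fit_term + k * math.log(n)

n = 50
aic_small, bic_small = aic_bic(ss_residual=100.0, n=n, k=3)  # simpler model
aic_big,   bic_big   = aic_bic(ss_residual=95.0,  n=n, k=7)  # slightly better fit

# With n = 50, log(n) is about 3.9 > 2, so BIC punishes the four extra
# parameters harder than AIC does.
print(aic_small < aic_big, bic_small < bic_big)
```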

Overfitting and Underfitting

Model fit isn’t just about getting close to your current data. A model that fits the training data too perfectly often performs terribly on new data, a problem called overfitting. The model has memorized the noise and quirks of the specific dataset rather than learning the underlying pattern. The telltale sign is a big gap between training accuracy and test accuracy: the model looks great on data it’s seen before but falls apart on anything new.

Underfitting is the opposite problem. The model is too simple to capture the real patterns in the data, so it performs poorly on both training data and new data. If accuracy is low across the board and doesn’t improve during training, the model probably needs more predictors, a different functional form, or both.

The goal is a model that sits between these extremes: complex enough to capture genuine relationships, simple enough to generalize beyond the data it was built on. Adjusted R-squared, AIC, and BIC all push you in this direction by penalizing unnecessary complexity, but splitting your data into training and testing sets remains the most direct way to check whether your model generalizes.
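A minimal sketch of that direct check, using a closed-form simple linear fit on made-up data and comparing training error against holdout error:

```python
# Sketch: fit on a training set, then compare train vs. test MSE.
# A large gap between the two would suggest overfitting.

def fit_line(xs, ys):
    """Closed-form simple linear regression (one predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def mse(xs, ys, slope, intercept):
    return sum((y - (slope * x + intercept)) ** 2
               for x, y in zip(xs, ys)) / len(xs)

# Hypothetical split: hold out two points and check the fit generalizes
train_x, train_y = [0, 1, 2, 3, 4], [0.1, 2.2, 3.9, 6.1, 8.0]
test_x,  test_y  = [5, 6], [10.2, 11.9]

slope, intercept = fit_line(train_x, train_y)
print(mse(train_x, train_y, slope, intercept),
      mse(test_x, test_y, slope, intercept))   # both small and comparable
```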

Putting It All Together

No single metric tells you everything about model fit. R-squared gives you a quick sense of explanatory power but can mislead you about prediction accuracy and is blind to model complexity. Adjusted R-squared corrects for complexity but still shares some of R-squared’s limitations. The F-test tells you whether your model is statistically better than nothing. Residual plots reveal structural problems that summary statistics miss entirely. AIC and BIC help you choose between competing models.

In practice, assessing model fit means looking at several of these tools together. A model with a reasonable R-squared, a significant F-test, clean residual plots, and a low AIC relative to alternatives is one you can feel confident about. A model where these indicators disagree is one worth investigating further.