What Does It Mean When a Residual Is Positive?

A positive residual means the actual value is higher than what the model predicted. In other words, the prediction fell short of reality. If you’re working with a regression line on a graph, a positive residual shows up as a data point sitting above the line.

How Residuals Are Calculated

A residual is simply the difference between what actually happened and what your model said would happen. The formula is:

Residual = Observed value − Predicted value

If a student studied 15 hours and your regression model predicts they’d score about 75 on an exam, but they actually scored 80, the residual is 80 − 75 = 5. That’s a positive residual of 5 points. The model underestimated their performance by 5 points.

A negative residual is the opposite: the model overestimated. If it predicted 75 but the student scored 70, the residual would be −5. Zero means the prediction was exactly right.
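The study-hours example above can be sketched in a few lines of Python. The regression line here is hypothetical: the intercept and slope are simply chosen so the model predicts a score of 75 at 15 hours of study.

```python
# Residual = observed value - predicted value.
# Intercept and slope are hypothetical, chosen so that 15 hours
# of study predicts a score of 75.
def predict_score(hours, intercept=30.0, slope=3.0):
    """Toy fitted line: predicted exam score from hours studied."""
    return intercept + slope * hours

predicted = predict_score(15)          # 30 + 3 * 15 = 75.0
positive_residual = 80 - predicted     # actual above prediction: +5.0
negative_residual = 70 - predicted     # actual below prediction: -5.0
print(positive_residual, negative_residual)
```

The sign alone tells you the direction of the miss: positive means the model came in low, negative means it came in high.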

What a Single Positive Residual Tells You

One positive residual on its own isn’t a problem. In any dataset, roughly half the residuals will be positive and half will be negative. In fact, when you use standard regression (ordinary least squares) with an intercept term, the residuals are mathematically guaranteed to sum to zero. Positive and negative residuals balance each other out by design.
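You can verify the sum-to-zero property directly. This sketch fits an ordinary least squares line (with an intercept) to some arbitrary made-up data and checks the sum of the residuals; the data itself is just noise around a line and carries no special meaning.

```python
import numpy as np

# Fit an OLS line (with intercept) to noisy data and confirm the
# residuals sum to zero, up to floating-point rounding.
rng = np.random.default_rng(0)
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(scale=3.0, size=x.size)

slope, intercept = np.polyfit(x, y, deg=1)   # least-squares straight-line fit
residuals = y - (intercept + slope * x)

print(residuals.sum())   # effectively zero
```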

The size of a positive residual matters more than its direction. A residual of 0.5 in a dataset where most values range from 0 to 100 is trivially small. A residual of 30 in that same dataset deserves a closer look. Statistical software often flags data points where the standardized residual (the residual divided by an estimate of its standard deviation) exceeds 2 in absolute value. A standardized residual larger than 3 is widely considered an outlier, meaning the observed value is unusually far from what the model expected.
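A minimal version of that flagging rule looks like this. It uses the simplest form of standardization, dividing each residual by the residuals’ sample standard deviation; full regression diagnostics also adjust for each point’s leverage, which is omitted here. The residual values are made up for illustration.

```python
import numpy as np

# Flag observations whose standardized residual exceeds 2 in
# absolute value (simple standardization, no leverage adjustment).
residuals = np.array([1.2, -0.8, 0.5, -1.1, 0.9, -0.4, 30.0, -1.6])
standardized = residuals / residuals.std(ddof=1)
flagged = np.flatnonzero(np.abs(standardized) > 2)
print(flagged)   # index of the unusually large residual
```

Only the residual of 30 gets flagged; the rest are well within the ordinary scatter.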

Patterns of Positive Residuals Signal a Problem

Individual positive residuals are normal. Clusters of them are not. If you plot your residuals and see that they’re consistently positive in one region (say, for low values of your predictor variable) and consistently negative in another, your model is systematically off. It’s underpredicting in some places and overpredicting in others.

This pattern typically means a straight line doesn’t fit your data well. The classic example: residuals are positive for small values, negative for medium values, and positive again for large values. That curved pattern tells you the true relationship between your variables is nonlinear, and a straight regression line is missing the shape of the data. A model with a curve or additional terms would fit better.
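The curved pattern is easy to reproduce with toy data. Here a straight line is fit to points generated from a parabola (y = x²); the residuals come out positive at both ends of the range and negative in the middle, exactly the signature described above.

```python
import numpy as np

# Fit a straight line to data from a curve (y = x^2) and inspect
# the sign pattern of the residuals: positive at the extremes,
# negative in the middle -- a missed nonlinear relationship.
x = np.linspace(-3, 3, 13)
y = x ** 2                       # the true relationship is a parabola

slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

print(np.sign(residuals))
```

A random scatter of signs would suggest the line fits; this systematic run of signs says it doesn’t.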

If your residuals are positive across the board (or nearly so), that’s a different issue. It means your model is consistently underpredicting, which can happen if important variables are missing from the model or if the data has shifted since the model was built.

Positive Residuals on a Graph

On a scatterplot with a regression line, a positive residual is the vertical distance from the line up to the data point. The point sits above the line because the real value exceeded the prediction. The further above the line, the larger the positive residual.

On a residual plot (where residuals are plotted against predicted values or the predictor variable), positive residuals appear above the horizontal zero line. In a well-fitting model, you want to see residuals scattered randomly above and below zero with no visible pattern. If most points in a particular range cluster above the zero line, the model is underpredicting in that range.

When a Large Positive Residual Changes Your Model

A data point with a large positive residual can sometimes pull the regression line toward it, changing predictions for every other point in the dataset. Statisticians measure this pull using a metric called Cook’s distance, which combines two things: how large the residual is and how far the data point’s predictor value sits from the center of the data (its leverage). A point that scores high on both, with a big residual and an unusual position along the horizontal axis, can shift the entire regression line when it’s included.

This is why checking for influential points matters. A single unusually high observation might represent a data entry error, a special case that doesn’t belong in your sample, or a genuine extreme value. Removing it and refitting the model lets you see how much it was driving your results. If the regression line barely moves, the point wasn’t influential despite its large residual. If the line shifts substantially, that one observation was doing a lot of the work.

Residuals vs. Errors

You’ll sometimes see “residual” and “error” used interchangeably, but they’re technically different. An error is the difference between an observed value and the true population value you’ll never actually know. A residual is the difference between an observed value and the estimate your model produces from sample data. Since you can never observe the true population value directly, you can never calculate the true error. Residuals are the practical, observable version of errors, and they’re what you actually work with when evaluating a model.