How to Find Predicted Value From a Regression Equation

To find a predicted value from a regression equation, you plug your known value of X (the independent variable) into the equation and solve for Y. If your regression equation is Y = 3 + 2X and you want the predicted value when X is 5, the answer is 3 + 2(5) = 13. That’s the core of it, but the details matter depending on what kind of regression you’re working with and where your coefficients come from.

The Simple Linear Regression Equation

A simple linear regression equation takes the form:

Predicted Y = b₀ + b₁(X)

Here, b₀ is the y-intercept, which is the predicted value of Y when X equals zero. b₁ is the slope, which tells you how much Y changes for every one-unit increase in X. X is the independent variable you’re using to make the prediction. Once you know these three pieces, the math is just arithmetic: multiply the slope by your X value, then add the intercept.

For example, say you ran a regression predicting weekly sales based on advertising spend, and you got this equation: Predicted Sales = 200 + 4.5(Ad Spend). To predict sales when ad spend is $1,000, you’d calculate 200 + 4.5(1000) = 4,700. Your predicted sales figure is $4,700.
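The arithmetic above can be sketched in a few lines of Python, using the article's made-up coefficients (200 and 4.5):

```python
# Coefficients from the worked example: Predicted Sales = 200 + 4.5 * (Ad Spend)
intercept = 200.0
slope = 4.5

def predict_sales(ad_spend):
    """Plug X into the fitted equation: predicted Y = b0 + b1 * X."""
    return intercept + slope * ad_spend

print(predict_sales(1000))  # 200 + 4.5 * 1000 = 4700.0
```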

Where to Find the Coefficients

If you’re pulling coefficients from statistical software, the exact labels vary but follow a pattern. In SPSS, look for the column labeled “B” in the coefficients table: the row labeled “(Constant)” is your intercept (b₀), and the rows for each predictor variable give you the slopes. In R, the summary() output shows a “Coefficients” table whose “Estimate” column holds the intercept and slopes. Excel’s regression output uses similar terminology.

The key is finding two things: the constant (intercept) and the coefficient for each predictor. Once you have those numbers, you can write out the full equation yourself and plug in any X value you want.
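If you’d rather fit the line in code than read coefficients off a software printout, a one-line fit with NumPy recovers the same two numbers. This is a minimal sketch on made-up data that follows the earlier sales equation exactly:

```python
import numpy as np

# Made-up data lying exactly on Y = 200 + 4.5 * X, for illustration
x = np.array([100.0, 200.0, 300.0, 400.0, 500.0])
y = 200.0 + 4.5 * x

# A degree-1 polynomial fit is simple linear regression;
# polyfit returns [slope, intercept] (highest power first)
slope, intercept = np.polyfit(x, y, deg=1)
print(intercept, slope)  # approximately 200.0 and 4.5
```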

Multiple Regression: More Than One Predictor

When your model has more than one independent variable, the equation expands but the process stays the same. A multiple regression equation looks like this:

Predicted Y = b₀ + b₁(X₁) + b₂(X₂) + … + bₖ(Xₖ)

Each coefficient tells you the effect of that particular variable on Y after adjusting for all the other variables. So b₁ indicates how much larger you’d expect Y to be for a case that’s identical to another except for being one unit higher in X₁.

To get the predicted value, multiply each X by its corresponding coefficient, add them all together, and add the intercept. If your equation is Predicted GPA = 1.2 + 0.02(SAT Score) + 0.3(Study Hours), and a student has an SAT of 1200 and studies 15 hours per week, the predicted GPA is 1.2 + 0.02(1200) + 0.3(15) = 1.2 + 24 + 4.5 = 29.7. (This is a made-up example; obviously real GPA models would have different scales.)
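The multiply-and-sum step generalizes naturally to code. Here is a small sketch using the made-up GPA coefficients from the example above, with the predictors stored by name:

```python
intercept = 1.2
# Made-up coefficients from the article's GPA example
coefs = {"sat_score": 0.02, "study_hours": 0.3}

def predict(values):
    """Predicted Y = b0 + b1*X1 + b2*X2 + ... (one term per predictor)."""
    return intercept + sum(coefs[name] * x for name, x in values.items())

print(predict({"sat_score": 1200, "study_hours": 15}))  # about 29.7
```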

The coefficient of determination, R², tells you the proportion of variation in Y that’s explained by all the X variables combined (often reported as a percentage). A higher R² generally means your predicted values will be closer to the actual values.
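R² can be computed directly from predicted values: one minus the residual sum of squares divided by the total sum of squares. A short sketch on a small made-up dataset:

```python
import numpy as np

# Made-up data that is nearly, but not exactly, linear
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

slope, intercept = np.polyfit(x, y, deg=1)
predicted = intercept + slope * x

ss_res = np.sum((y - predicted) ** 2)   # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
r_squared = 1 - ss_res / ss_tot
print(r_squared)  # close to 1 for nearly linear data
```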

Handling Categorical Variables

When one of your predictors is a category rather than a number (like treatment group, gender, or region), the regression uses dummy variables coded as 0 or 1. If you have three treatment levels, the software creates two dummy variables (always one fewer than the number of categories). The category that doesn’t get its own variable is the “reference” group, and its predicted value equals the intercept alone.

Say your regression output gives you: Y = 5.5 – 4.0(Level 1) – 2.0(Level 2). To predict Y for someone in Level 1, set Level 1 = 1 and Level 2 = 0, giving you 5.5 – 4.0(1) – 2.0(0) = 1.5. For someone in Level 2, set Level 1 = 0 and Level 2 = 1, giving you 5.5 – 4.0(0) – 2.0(1) = 3.5. For Level 3 (the reference group), both dummies are 0, so the predicted value is simply 5.5.
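The dummy-coding logic above can be written out explicitly. This sketch uses the example's coefficients, with Level 3 as the reference group:

```python
intercept = 5.5
coef_level1 = -4.0  # dummy coefficient for Level 1
coef_level2 = -2.0  # dummy coefficient for Level 2

def predict(level):
    """Encode the category as two 0/1 dummies; Level 3 is the reference group."""
    d1 = 1 if level == 1 else 0
    d2 = 1 if level == 2 else 0
    return intercept + coef_level1 * d1 + coef_level2 * d2

print(predict(1), predict(2), predict(3))  # 1.5 3.5 5.5
```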

Using Excel Without Writing the Equation

If you’d rather skip the manual calculation, Excel has built-in functions that do this for you. The FORECAST.LINEAR function takes three arguments: the X value you want to predict for, your range of known Y values, and your range of known X values. So if your X data is in cells A2:A20 and your Y data is in B2:B20, and you want to predict Y when X is 50, you’d type:

=FORECAST.LINEAR(50, B2:B20, A2:A20)

Excel fits the regression line behind the scenes and returns the predicted value directly. The older FORECAST function works identically but is being phased out in favor of FORECAST.LINEAR. Both handle simple linear regression only. For multiple regression predictions, you’ll need to use the regression coefficients from Excel’s Data Analysis tool and calculate manually, or use an add-in.
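Outside Excel, the same fit-then-predict operation is easy to reproduce. This sketch mirrors what FORECAST.LINEAR does, using NumPy on made-up known X and Y ranges:

```python
import numpy as np

# Made-up "known" data lying on Y = X + 5
known_x = np.array([10.0, 20.0, 30.0, 40.0])
known_y = np.array([15.0, 25.0, 35.0, 45.0])

# Fit the line behind the scenes, then evaluate it at the new X,
# the same two steps FORECAST.LINEAR(50, known_y, known_x) performs
slope, intercept = np.polyfit(known_x, known_y, deg=1)
prediction = intercept + slope * 50
print(prediction)  # approximately 55.0
```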

Checking Your Prediction With Residuals

Once you have predicted values, you can compare them to actual observed values using residuals. A residual is simply the actual value minus the predicted value:

Residual = Observed Y – Predicted Y

A positive residual means the actual value was higher than what the model predicted. A negative residual means it was lower. If you’re checking a single prediction against a known outcome, this tells you exactly how far off the model was. Across your whole dataset, patterns in residuals can reveal whether the regression model is a good fit or whether it’s systematically over- or under-predicting in certain ranges.
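Computing residuals is a one-line subtraction per observation. A minimal sketch with made-up observed and predicted sales figures:

```python
observed  = [4700, 5100, 4400]
predicted = [4600, 5200, 4500]  # made-up outputs from a fitted model

# Residual = observed Y - predicted Y
residuals = [obs - pred for obs, pred in zip(observed, predicted)]
print(residuals)  # [100, -100, -100]
# Positive: the model under-predicted; negative: it over-predicted
```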

When Predictions Become Unreliable

Regression equations are built from a specific range of observed data, and predictions work best when your X values fall within that range. Predicting within the range is called interpolation. Predicting outside it is called extrapolation, and it’s where things get risky.

If your data covers advertising budgets from $500 to $5,000, using the equation to predict sales at $3,000 is reasonable. Predicting at $50,000 is extrapolation, and the linear relationship that held in your data range may not hold at extreme values. Research in ecology has shown that extrapolation can produce substantial positive bias, meaning models tend to overestimate when pushed beyond their data boundaries. This applies across fields, not just ecology.

With multiple regression, extrapolation is harder to spot because you could be within the range of each individual variable but at an unusual combination of values that the model never actually encountered. As a practical rule, the further your input values are from the center of your original data, the less confidence you should place in the prediction.
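For a single predictor, the range check described above is simple enough to automate. A sketch using the article's advertising-budget range:

```python
def check_extrapolation(x_new, x_train_min, x_train_max):
    """Flag predictions that fall outside the observed X range."""
    if x_new < x_train_min or x_new > x_train_max:
        return "extrapolation: treat this prediction with caution"
    return "interpolation: within the observed data range"

print(check_extrapolation(3000, 500, 5000))   # interpolation
print(check_extrapolation(50000, 500, 5000))  # extrapolation
```

For multiple regression, no check this simple is sufficient, since a point can be inside every one-variable range yet far from any observed combination.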