A regression equation is a mathematical formula that describes the relationship between variables. In its simplest form, it looks like this: y = b₀ + b₁x. That’s it: an outcome variable on the left, and on the right, a starting value plus a coefficient multiplied by a predictor variable. If you’ve seen the slope-intercept form from algebra (y = mx + b), you already recognize the structure. Regression equations build on that same idea, with some added notation depending on the type of regression.
The Simple Linear Regression Equation
The most common regression equation relates one predictor to one outcome. The population version looks like this:
Y = β₀ + β₁x + ε
Each piece has a specific meaning:
- Y is the outcome (or dependent variable), the thing you’re trying to predict or explain.
- β₀ (beta-zero) is the intercept, the value of Y when x equals zero.
- β₁ (beta-one) is the slope, representing how much Y changes for every one-unit increase in x.
- x is the predictor (or independent variable), the input you’re using to make predictions.
- ε (epsilon) is the error term, capturing the random noise or variability that the equation can’t explain.
The Greek letters β₀ and β₁ represent the “true” population values, which you never actually know. When you run a regression on real data, your software estimates those values and swaps in regular letters. The estimated version typically looks like ŷ = b₀ + b₁x, where b₀ and b₁ are the calculated estimates and ŷ (pronounced “y-hat”) is the predicted value. The hat symbol over the y signals that it’s a prediction, not an observed data point. The error term drops out of this version because you’re describing the best-fit line itself, not the scatter around it.
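To see where the estimates come from, the least-squares formulas can be computed by hand. A minimal sketch in Python, using made-up data that falls exactly on a line so the answer is easy to verify:

```python
# Least-squares estimates for y-hat = b0 + b1*x.
# Made-up data generated exactly from y = 2 + 3x, so the fit is exact.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [5.0, 8.0, 11.0, 14.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Slope: sum of cross-deviations over sum of squared x-deviations.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
# Intercept: forces the line through the point of means.
b0 = y_bar - b1 * x_bar

print(b0, b1)  # -> 2.0 3.0
```

The intercept formula guarantees that the fitted line always passes through the point (x̄, ȳ), which is a handy sanity check on any fit.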
What the Slope and Intercept Tell You
The intercept (b₀) is the predicted value of your outcome when the predictor equals zero. Sometimes that’s meaningful, like predicting a baseline test score before any hours of studying. Other times it’s just a mathematical anchor point with no real-world interpretation, like predicting house price when square footage is zero.
The slope (b₁) is the more useful number. It tells you the expected change in the outcome for every one-unit increase in the predictor. If you’re predicting weight based on height, a slope of 4.2 means that for every additional inch of height, the predicted weight increases by 4.2 pounds. A negative slope means the outcome decreases as the predictor increases.
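A tiny sketch of that interpretation, using the 4.2 pounds-per-inch slope from the example (the intercept of −150 is a hypothetical placeholder, since the text doesn't give one):

```python
# Predicted weight from height, using the 4.2 lb/inch slope from the text.
# The intercept of -150 is a made-up placeholder, not from the example.
b0, b1 = -150.0, 4.2

def predicted_weight(height_in):
    return b0 + b1 * height_in

# Each additional inch adds exactly the slope to the prediction:
print(round(predicted_weight(70) - predicted_weight(69), 1))  # -> 4.2
```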
Multiple Regression: More Than One Predictor
Most real-world problems involve more than one predictor, and the equation simply extends by adding terms. The general form looks like this:
Y = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ + ε
Each x represents a different predictor variable, and each β represents that predictor’s individual effect on the outcome. The subscripts just number them: x₁ might be square footage, x₂ might be number of bathrooms, x₃ might be age of the house, and so on. This is no longer a line through two-dimensional space. It describes a surface or higher-dimensional shape, but the logic is identical.
One important detail: each coefficient in a multiple regression is interpreted while holding all other predictors constant. So β₁ represents the expected change in Y for a one-unit increase in x₁, assuming every other predictor stays the same. This is why these are sometimes called “adjusted” or “partial” regression coefficients.
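A minimal sketch of a multiple regression fit, using NumPy's least-squares solver on noise-free synthetic data so the coefficients come back exactly:

```python
import numpy as np

# Synthetic data generated exactly from y = 1 + 2*x1 + 3*x2 (no noise),
# so the recovered coefficients match the true values.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so the intercept b0 is estimated too.
X1 = np.column_stack([np.ones(len(X)), X])
coefs, *_ = np.linalg.lstsq(X1, y, rcond=None)

print(coefs)  # intercept first, then one coefficient per predictor
```

The column of ones is the standard trick that lets the intercept be treated as just another coefficient.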
What a Filled-In Equation Looks Like
With actual data, the Greek letters and subscripts are replaced with concrete numbers. Here’s what a real estate pricing model looks like after running the regression:
Price = 504,928 − 65.854(Size) + 33.051(Age) + 5,130(Bathrooms) − 17,727(Bedrooms) + 77.091(HOA) + …
Each number in front of a variable is the estimated coefficient for that predictor. To use the equation, you plug in a value for every predictor in the model for a specific house. If a home is 2,000 square feet and 15 years old with 2 bathrooms, you multiply each of those values by its coefficient, do the same for the remaining predictors, add the intercept, and get a predicted price. The negative coefficient on Bedrooms means that, after controlling for everything else in the model, adding a bedroom is associated with a lower price. That might seem counterintuitive, but it reflects the reality that more bedrooms in the same square footage often means smaller rooms.
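The plug-in arithmetic can be sketched directly. Note that the published equation ends in “…”, so this uses only the terms shown, and the bedroom count of 3 and HOA value of 200 are made-up inputs:

```python
# Partial predicted price using only the coefficients shown in the text.
# The original model has additional terms (the trailing "..."), so this
# is not the full model's output. Bedrooms = 3 and HOA = 200 are
# hypothetical inputs, not values from the text.
price = (504_928
         - 65.854 * 2000    # Size (sq ft)
         + 33.051 * 15      # Age (years)
         + 5_130 * 2        # Bathrooms
         - 17_727 * 3       # Bedrooms
         + 77.091 * 200)    # HOA
print(round(price))  # -> 346213
```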
A simpler version of the same dataset produced an equation with just one predictor: Price = 305,911 + 23.422(HOA). That’s easier to read and interpret, but captures far less of what actually drives home prices.
Population Parameters vs. Sample Estimates
You’ll see two different notation styles depending on the context, and they mean slightly different things. Greek letters like β₀ and β₁ refer to the true population parameters, the exact values that would describe the relationship if you could measure every single case in existence. You never know these values directly.
When you collect a sample and run your analysis, you get estimates of those parameters. These are written as b₀ and b₁ (plain Latin letters) or sometimes as β̂₀ and β̂₁ (betas with hats). The fitted equation using these estimates is written as ŷ = b₀ + b₁x₁ + b₂x₂ + … + bₖxₖ. If you’re reading a research paper and see betas, the author is usually describing the theoretical model. If you see b-values or betas with hats, they’re reporting results from actual data.
Residuals: The Gap Between Prediction and Reality
No regression equation predicts perfectly. The difference between what you actually observe (y) and what the equation predicts (ŷ) is called a residual. It’s written as e = y − ŷ. If your equation predicts a house will sell for $320,000 and it actually sells for $335,000, the residual is $15,000.
Residuals are related to but distinct from the error term (ε) in the theoretical equation. The error term represents the true unknown gap between each data point and the actual population relationship, which you can never see. The residual is your observable estimate of that gap, calculated as the distance between each data point and your fitted line. When statisticians evaluate whether a regression model is working well, residuals are one of the first things they examine.
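Computing residuals is a one-liner. The first pair below reproduces the $15,000 example from above; the other values are made up:

```python
# Residuals: e = y - y_hat, the gap between observed and predicted values.
observed  = [335_000, 310_000, 298_000]   # actual sale prices (made up)
predicted = [320_000, 315_000, 300_000]   # model predictions (made up)

residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
print(residuals)  # -> [15000, -5000, -2000]
```

Plots of residuals against predicted values are one of the standard diagnostics statisticians use to judge whether a model fits well.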
Logistic Regression Looks Different
Not all regression equations predict a number. When the outcome is a yes-or-no category (did the customer buy or not, does the patient have the condition or not), you use logistic regression. The equation takes on a different appearance because it predicts the probability of an outcome rather than a raw value.
The core of logistic regression uses what’s called the logit, which is the natural logarithm of the odds:
ln(p / (1 − p)) = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ
The right side looks identical to a regular regression equation. The left side is what changes. Here, p is the probability of the outcome occurring, and (1 − p) is the probability of it not occurring. Their ratio gives you the odds, and taking the logarithm of the odds converts a probability, which is stuck between 0 and 1, into a scale that ranges from negative infinity to positive infinity, matching the unbounded right-hand side.
If you rearrange this equation to solve for p directly, you get the sigmoid form: p = exp(β₀ + β₁x₁ + …) / (1 + exp(β₀ + β₁x₁ + …)). This version always produces a result between 0 and 1, which makes sense for a probability, and it traces the S-shaped curve that gives logistic regression its characteristic look when graphed.
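The sigmoid form is short enough to write out directly; a minimal sketch:

```python
import math

def sigmoid(z):
    """Convert the linear predictor (b0 + b1*x1 + ...) into a probability."""
    return math.exp(z) / (1 + math.exp(z))

print(sigmoid(0))   # -> 0.5 (log-odds of 0 means a 50/50 chance)
print(sigmoid(3))   # close to 1
print(sigmoid(-3))  # close to 0
```

Pushing the linear predictor toward positive or negative infinity drives the probability toward 1 or 0, but it never quite reaches either bound.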
Reading Regression Output
In practice, you’ll rarely see a regression equation written out in one neat line. Statistical software presents results in a table, with one row per predictor. Each row shows the coefficient (the b-value), a standard error, a test statistic, and a p-value. To reconstruct the equation, you take the intercept from the top row, then add each predictor’s coefficient multiplied by that variable. The equation is hiding in the coefficient column of the output table.
Regardless of whether the equation has one predictor or twenty, uses linear or logistic form, or describes a population or a sample, the underlying structure stays consistent: an intercept plus a weighted combination of predictors. Once you recognize that pattern, every regression equation you encounter will look familiar.

