What Is the Equation of the Line of Best Fit?

The line of best fit is expressed as ŷ = b₀ + b₁x, where b₀ is the y-intercept and b₁ is the slope. This equation draws a straight line through a scatter plot of data points in a way that minimizes the sum of the squared vertical distances between the line and the points. You can then plug in any x value to get a predicted y value.

If that notation looks unfamiliar, it’s the statistical version of the formula you likely learned in algebra: y = mx + b. The logic is identical. The slope tells you how much y changes for every one-unit increase in x, and the y-intercept tells you where the line crosses the vertical axis.
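Here’s the equation in action as a quick Python sketch. The intercept and slope values are hypothetical, just to show how prediction works:

```python
# Hypothetical fitted line: intercept b0 = 12.0, slope b1 = 5.2
b0, b1 = 12.0, 5.2

def predict(x):
    """Return the predicted value (y-hat) for a given x."""
    return b0 + b1 * x

print(predict(4))  # predicted y when x = 4
```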

How the Equation Differs From y = mx + b

In algebra, y = mx + b describes a perfect line. Every point sits exactly on it. In statistics, data is messy, so the line of best fit uses a hat symbol (ŷ) instead of plain y. That hat is important: it signals that the value is a prediction, not an exact measurement. The actual data points will scatter above and below the line.

The notation also swaps out m and b for b₁ (slope) and b₀ (y-intercept). In more formal textbooks, you’ll see Greek letters: β₁ for the true population slope and β₀ for the true intercept, with b₁ and b₀ as the estimates you calculate from your sample. For most practical purposes, you’re working with the sample version: ŷ = b₀ + b₁x.

What the Slope and Intercept Actually Tell You

The slope (b₁) is the heart of the equation. It tells you how much the predicted value of y changes for every one-unit increase in x. If you’re plotting hours of study against exam scores and the slope is 5.2, each additional hour of study is associated with a 5.2-point increase in the predicted score.

A positive slope means the line angles upward from left to right. A negative slope means it angles downward, indicating that as x increases, y decreases.

The y-intercept (b₀) is the predicted value of y when x equals zero. Sometimes this has a meaningful interpretation, like a baseline measurement. Other times it’s just a mathematical anchor for the line. If your x values never get close to zero in practice, the intercept may not mean anything useful on its own.
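To make the slope interpretation concrete, here’s a small Python sketch using hypothetical intercept and slope values for the study-hours example:

```python
b0, b1 = 30.0, 5.2  # hypothetical fit: hours studied vs. exam score

score_3h = b0 + b1 * 3  # predicted score after 3 hours of study
score_4h = b0 + b1 * 4  # predicted score after 4 hours of study

# Moving x up by one unit moves the prediction by exactly the slope.
print(round(score_4h - score_3h, 1))  # 5.2
```

Note also that the intercept (30.0 here) is the predicted score at zero hours of study, which in this case does have a sensible baseline reading.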

How the Line Is Calculated

The method behind the line of best fit is called ordinary least squares, or OLS. The idea is straightforward: for every data point, measure the vertical distance between that point and the line. Square each of those distances (so negative gaps don’t cancel out positive ones), then add them all up. The “best” line is the one where that total is as small as possible.

The slope formula that achieves this is:

b₁ = Σ(xᵢ – x̄)(yᵢ – ȳ) / Σ(xᵢ – x̄)²

In plain terms, you take each data point, measure how far its x value is from the average x and how far its y value is from the average y, multiply those two distances together, and add up all those products. Then you divide by the sum of the squared x-distances. The result is the slope that produces the least total error.

Once you have the slope, the intercept follows naturally. The line of best fit always passes through the point (x̄, ȳ), the averages of your x and y values. So b₀ = ȳ – b₁(x̄). You don’t need to memorize these formulas for most real work, since calculators and spreadsheets handle the arithmetic, but understanding the logic helps you trust (or question) the output.
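The two formulas above translate directly into code. This is a minimal Python sketch with made-up data, not a production routine (spreadsheets and libraries like NumPy handle this for you):

```python
xs = [1, 2, 3, 4, 5]       # hypothetical hours of study
ys = [52, 57, 61, 68, 73]  # hypothetical exam scores

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# b1 = sum((xi - x_bar) * (yi - y_bar)) / sum((xi - x_bar)**2)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)

# The line always passes through (x_bar, y_bar), so:
b0 = y_bar - b1 * x_bar

print(round(b1, 2), round(b0, 2))  # 5.3 46.3
```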

How to Tell If the Line Fits Well

Having an equation doesn’t mean it’s a good one. The standard measure of fit quality is R², called the coefficient of determination. It ranges from 0 to 1 and represents the proportion of variation in y that the line explains. An R² of 0.85 means 85% of the variation in your data is captured by the linear relationship, while 15% is unexplained scatter. An R² of 0 means the line explains nothing at all.

As a rough benchmark, an R² above 0.5 is generally considered a meaningful relationship, though what counts as “good” depends on the field. Predicting the orbit of a planet might require R² near 1.0. Predicting human behavior with R² of 0.3 might be genuinely useful.

R² is the square of the correlation coefficient (r), which ranges from -1 to 1 and indicates both the strength and direction of the relationship. A correlation of -0.9 means a strong negative trend; squaring it gives an R² of 0.81, showing the line still explains most of the variation even though the slope is negative.
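Continuing with the same hypothetical data, r and R² can be computed from the same deviation sums used for the slope:

```python
xs = [1, 2, 3, 4, 5]       # hypothetical hours of study
ys = [52, 57, 61, 68, 73]  # hypothetical exam scores

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

r = sxy / (sxx * syy) ** 0.5  # correlation coefficient, between -1 and 1
r_squared = r ** 2            # proportion of variation in y explained

print(round(r, 3), round(r_squared, 3))  # 0.997 0.993
```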

Assumptions the Data Should Meet

The line of best fit assumes three things about your data. First, the relationship between x and y is actually linear. If your scatter plot curves, a straight line will systematically miss the pattern. Second, the spread of data points above and below the line should be roughly consistent across all values of x. If the points fan out like a cone (tight on one end, wide on the other), the line’s predictions will be unreliable in the wide zone. Third, the deviations from the line should follow a roughly normal distribution, meaning most points cluster near the line with fewer points far away.

Violating these assumptions doesn’t make the math break. You can always calculate a line. But the predictions and any conclusions you draw from the slope become less trustworthy.
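One quick diagnostic is to compute the residuals (each point’s deviation from the line) and look for patterns. A rough Python sketch, reusing hypothetical data and a hypothetical fitted line:

```python
xs = [1, 2, 3, 4, 5]       # hypothetical x values
ys = [52, 57, 61, 68, 73]  # hypothetical y values
b0, b1 = 46.3, 5.3         # hypothetical fitted intercept and slope

# Residual = observed y minus predicted y at the same x.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(residuals)
# A curve in the residuals suggests the relationship isn't linear;
# a fan shape suggests the spread isn't consistent across x.
```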

Using the Equation for Predictions

Once you have ŷ = b₀ + b₁x, you can plug in x values to generate predictions. This works best for values of x that fall within the range of your original data, a practice called interpolation. If your data covers ages 20 through 60, predicting a value for age 35 is reasonable because you have observed data on both sides of it.

Predicting outside your data range is called extrapolation, and it’s risky. The linear pattern you observed may not hold beyond the data you measured. A classic example: if you model heart rate during the first six minutes of exercise, the line might show a steep upward slope. Extending that line to 30 minutes would predict an impossibly high heart rate. Extending it backward to 10 minutes before exercise could predict a negative heart rate. The math will happily produce these numbers, but they’re meaningless. The further you extrapolate from your data, the less you should trust the result.
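The heart-rate example can be sketched in a few lines of Python (all numbers hypothetical):

```python
# Hypothetical line fit to minutes 1 through 6 of exercise.
b0, b1 = 70.0, 15.0  # resting-level intercept, steep upward slope

def predicted_heart_rate(minutes):
    return b0 + b1 * minutes

print(predicted_heart_rate(3))    # 115.0 (within the data range: plausible)
print(predicted_heart_rate(30))   # 520.0 (far beyond it: impossibly high)
print(predicted_heart_rate(-10))  # -80.0 (before the data: meaningless)
```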

Real-World Applications

The same equation scales to complex problems. In health research, for example, scientists have used regression equations to estimate physical fitness from easy-to-measure variables. One large-scale study of over 250,000 Korean adults produced an equation predicting grip strength from gender, age, body mass index, and body fat percentage: grip strength = 37.138 – (10.190 × gender) + (0.988 × BMI) – (0.457 × body fat%) – (0.042 × age). Each coefficient is a slope that tells you exactly how much the predicted grip strength changes per unit of that variable, holding the others constant.

This is multiple linear regression, an extension of the basic equation to include more than one x variable. The core idea is the same. You still have an intercept, slopes, and predicted values. The equation just grows: ŷ = b₀ + b₁x₁ + b₂x₂ + b₃x₃, and so on. Whether you’re working with one predictor or ten, the foundation is the same straight-line relationship between each input and the output.
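As an illustration, the grip-strength equation above can be coded directly. One caveat: the gender coding (which category is 0 and which is 1) isn’t specified here, so treat that detail as an assumption:

```python
def grip_strength(gender, age, bmi, body_fat_pct):
    """Predicted grip strength from the study's published coefficients.

    gender is a 0/1 code; which group maps to which value is an
    assumption, since the coding isn't specified here.
    """
    return (37.138
            - 10.190 * gender
            + 0.988 * bmi
            - 0.457 * body_fat_pct
            - 0.042 * age)

print(round(grip_strength(0, 40, 24.0, 20.0), 2))  # 50.03
```

Each coefficient plays the role of a b₁ slope for its own variable, and 37.138 is the b₀ intercept.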