How to Make a Best Fit Line: Excel, Python & More

A best fit line (also called a trendline or regression line) is a straight line drawn through a scatter plot that comes as close as possible to all the data points. It works by minimizing the total distance between each point and the line itself, giving you the single straight line that best represents the overall trend in your data. Whether you’re working by hand, in a spreadsheet, on a graphing calculator, or in code, the core idea is the same.

How the Math Works

Every straight line follows the equation y = mx + b, where m is the slope and b is the y-intercept. To find the best fit line, you need to calculate both values from your data. The method used is called least squares regression: it finds the line where the sum of all squared vertical distances between each data point and the line is as small as possible. Squaring those distances prevents positive and negative gaps from canceling each other out, and it penalizes larger gaps more heavily.

The slope tells you how much y changes for each one-unit increase in x. You calculate it by multiplying the correlation between your x and y values by the ratio of their standard deviations (the standard deviation of y divided by the standard deviation of x). Once you have the slope, the y-intercept is straightforward: take the mean of your y values and subtract the slope multiplied by the mean of your x values. This guarantees the line passes through the center point of your data.

You don’t usually need to do this arithmetic yourself. But understanding it helps you recognize what the line actually represents and why different tools sometimes give slightly different-looking results depending on how they handle your data.

Making a Best Fit Line in Excel

Excel builds a best fit line directly into any scatter chart. Start by selecting your data and inserting a scatter plot. Once the chart appears, click the + icon at the top right corner of the chart and check Trendline. Excel defaults to a linear trendline, which is exactly a best fit line.

If you need more control, select your data series in the chart, click the Chart Design tab, then go to Add Chart Element and choose Trendline. From the dropdown you can pick Linear, Exponential, Logarithmic, Polynomial, Power, or Moving Average. For a standard best fit line, choose Linear. To display the equation and a goodness-of-fit value on the chart, click More Trendline Options and check the boxes for “Display Equation on chart” and “Display R-squared value on chart.”

One note: Excel only shows the Trendline option after you’ve selected a specific data series. If your chart has multiple data series and you haven’t clicked on one, the option won’t appear.

Making a Best Fit Line in Google Sheets

Google Sheets follows a similar approach. Create a scatter chart from your data, then double-click the chart to open the editor. Click Customize, then Series. Scroll down to find the Trendline section and toggle it on. The default type is Linear.

Google Sheets gives you options to change the line color, opacity, and thickness. Under the label dropdown, you can choose to display the equation (y = mx + b format) directly on the chart. To see how well the line fits your data, enable the R-squared display, which shows a value between 0 and 1. This option only appears if your chart includes a legend. The closer R-squared is to 1, the more tightly your data follows the line.

Making a Best Fit Line on a TI-84 Calculator

Graphing calculators are common in math and science classes, and the TI-84 has a built-in linear regression function. First, enter your x-values into list L1 and your y-values into list L2 using the STAT menu and choosing Edit.

Then press STAT, arrow right to CALC, and select 4: LinReg(ax+b). You need to tell the calculator where your data lives: press 2nd then 1 (for L1), type a comma, then press 2nd then 2 (for L2). To store the equation so you can graph it, press VARS, arrow right to Y-VARS, choose 1: Function, and then select Y1. Press ENTER, and the calculator displays the slope (a), y-intercept (b), and correlation coefficient (r). The equation is now stored in Y1, so you can press GRAPH to see the line plotted over your scatter plot.

Making a Best Fit Line in Python

If you’re working with data in Python, NumPy’s polyfit function handles linear regression in a single line of code. Set the degree to 1 for a straight line:

import numpy as np
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.0, 5.8, 8.2, 9.9])
m, b = np.polyfit(x, y, 1)

This returns two values: the slope (m) and y-intercept (b). You can then plot the original data as a scatter plot and overlay the line using matplotlib by generating predicted y-values with y_line = m * x + b. The polyfit function uses the same least squares method described above, minimizing the squared distance between each data point and the resulting line.

Drawing a Best Fit Line by Hand

When you need to estimate a best fit line without technology, plot all your points on graph paper first. Look at the overall direction of the data, then place a ruler so that roughly equal numbers of points fall above and below the line. The line should pass through the general center of the cloud of points, not connect the first and last points.

A good check: the line should pass near the point defined by the mean of all your x-values and the mean of all your y-values. Calculate those two averages, mark that coordinate, and make sure your line runs through or very close to it. This won’t be as precise as a calculated regression, but it produces a reasonable estimate for homework, lab reports, or quick visual analysis.

How to Tell If Your Line Fits Well

The R-squared value (written as R²) measures how much of the variation in your data the line explains. It ranges from 0 to 1. An R² of 0.85 means the line accounts for 85% of the variability in your y-values.

What counts as a “good” R² depends entirely on the field. In physics and engineering, values above 0.70 are typically expected because the relationships being measured are tightly controlled. In finance, values between 0.40 and 0.70 are considered strong. Social sciences and psychology often work with R² values as low as 0.10 to 0.30 because human behavior introduces much more variability. In ecology, 0.20 to 0.50 is often acceptable. Clinical medical research considers anything above 0.15 to 0.20 meaningful.

If your R² is low, that doesn’t necessarily mean you did something wrong. It may mean the relationship between your variables isn’t linear, or that other factors are influencing the outcome that your model doesn’t include.

How Outliers Affect the Line

A single unusual data point can dramatically shift your best fit line, but only under certain conditions. An outlier is a point whose y-value doesn’t follow the general pattern. A high-leverage point is one with an extreme x-value, sitting far to the left or right of the rest of your data. Points that are both outliers and high-leverage are the most dangerous: in one Penn State example, removing a single such point changed the slope from 3.32 to 5.12, a massive shift.

Points that are outliers but sit in the middle range of x-values tend to have much less effect on the slope. Before removing any data point, make sure there’s a legitimate reason (a measurement error, a data entry mistake, or a fundamentally different condition). Deleting inconvenient points just to improve your R² produces misleading results.

When a Straight Line Is the Wrong Choice

A best fit line assumes four things about your data. First, the relationship between x and y is actually linear, not curved. Second, the spread of points around the line is roughly equal across the entire range of x (not fanning out or narrowing). Third, the data points are independent of each other, meaning one observation doesn’t influence the next. Fourth, the scatter around the line follows a roughly normal, bell-shaped distribution at any given x-value.

If your scatter plot shows a clear curve, a linear best fit line will misrepresent the trend. In that case, consider a polynomial, logarithmic, or exponential trendline instead. Both Excel and Google Sheets offer these as alternative trendline types in the same menu where you select Linear. If your data fans out as x increases (forming a cone shape), the line may still look reasonable, but the predictions at higher x-values will be less reliable than those at lower values.