Plotting a regression line means drawing the single straight line that best fits a set of data points on a scatter plot. The line is calculated using the method of least squares, which minimizes the sum of the squared vertical distances between the data points and the line. You can plot one by hand with a formula, or let software do it in a few clicks.
The Formula Behind the Line
A regression line follows the equation ŷ = b₀ + b₁x, where b₁ is the slope and b₀ is the y-intercept. The slope tells you how much y changes for each one-unit increase in x, and the intercept is where the line crosses the y-axis.
The slope is calculated as b₁ = r × (s_y / s_x), where r is the correlation coefficient between your two variables, s_y is the standard deviation of your y values, and s_x is the standard deviation of your x values. Once you have the slope, the intercept is b₀ = ȳ − b₁ × x̄, using the means of x and y. This guarantees the regression line always passes through the point (x̄, ȳ), the center of your data.
How to Calculate It by Hand
Start by finding the mean of your x values and the mean of your y values. Then calculate the standard deviation of each. Next, compute the correlation coefficient r, which measures how tightly your points cluster around a straight line (ranging from -1 to +1). Plug those three numbers into the slope formula, then use the slope and the two means to solve for the intercept.
Once you have the equation, pick two x values within your data range, calculate the corresponding ŷ values, plot those two points, and connect them with a straight line. That line extends across your scatter plot and represents the best linear fit for your data.
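The by-hand steps above can be sketched in a few lines of Python. This is a minimal example using NumPy for the means, standard deviations, and correlation; the small dataset is made up for illustration:

```python
import numpy as np

# Hypothetical example data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Step 1: means and sample standard deviations
x_bar, y_bar = x.mean(), y.mean()
s_x, s_y = x.std(ddof=1), y.std(ddof=1)

# Step 2: correlation coefficient r
r = np.corrcoef(x, y)[0, 1]

# Step 3: slope b1 = r * (s_y / s_x), then intercept b0 = y_bar - b1 * x_bar
b1 = r * (s_y / s_x)
b0 = y_bar - b1 * x_bar

# Step 4: evaluate y-hat at two x values inside the data range;
# connecting these two points draws the regression line
x_ends = np.array([x.min(), x.max()])
y_ends = b0 + b1 * x_ends
```

For this data the slope works out to 1.99 and the intercept to 0.05, and the line passes through (x̄, ȳ) = (3, 6.02) as the formula guarantees.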
Plotting in Excel
Excel makes this nearly automatic. First, select your data and insert a scatter chart. Then click the + icon at the upper right corner of the chart and check the “Trendline” box. Excel draws a linear trendline through your data immediately.
To see the actual equation and how well the line fits, hover over the Trendline option in that same menu, click the small arrow, and select “More options.” In the Format Trendline panel, check both “Display Equation on chart” and “Display R-squared value on chart.” The equation gives you the slope and intercept, and the R-squared value tells you how well the line explains your data.
Plotting in Python
Python’s Seaborn library has a function called regplot() that plots both a scatter plot and a fitted regression line in one step. Pass in your x and y data, and it draws the points, the best-fit line, and a shaded confidence band around it. If you need to plot regression lines across multiple subgroups, lmplot() combines the same regression plotting with a grid layout so you can compare groups side by side.
Seaborn also provides residplot(), which plots the residuals (the vertical distances between each point and the line) so you can check whether a straight line is actually appropriate for your data. If you prefer working directly with Matplotlib, you can calculate the slope and intercept using NumPy’s polyfit function, then plot the line manually over your scatter plot for full control over styling.
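The Matplotlib-plus-polyfit approach mentioned above looks roughly like this. The synthetic data is invented for the example; with real data you would pass your own arrays, and `seaborn.regplot(x=x, y=y)` would produce the scatter, line, and confidence band in a single call instead:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen (no display needed)
import matplotlib.pyplot as plt

# Synthetic data: y ≈ 2x + 1 plus noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# polyfit with degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

fig, ax = plt.subplots()
ax.scatter(x, y, label="data")
ax.plot(x, intercept + slope * x, color="red",
        label=f"y = {slope:.2f}x + {intercept:.2f}")
ax.legend()
```

Doing the fit yourself with `polyfit` and plotting the line manually gives you full control over colors, labels, and styling that `regplot` abstracts away.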
Plotting on a TI-84 Calculator
Enter your x values into list L₁ and y values into L₂. To see a scatter plot first, press 2nd then Y= to open STAT PLOT, turn on Plot 1, and press ZOOM then select ZoomStat to auto-scale the window.
To add the regression line, press STAT, arrow right to CALC, and choose option 4: LinReg(ax+b). Tell the calculator where your data lives by entering L₁, L₂, then store the result in Y₁ by pressing VARS, selecting Y-VARS, then Function, then Y₁. Press ENTER to run the calculation, then press GRAPH. The regression line appears directly on top of your scatter plot.
Check Your Data Before You Plot
A regression line only makes sense if your data actually follows a roughly linear pattern. Before fitting the line, look at your scatter plot. If the points curve, fan out, or form clusters, a straight line will misrepresent the relationship. Four conditions need to hold for a linear regression to be valid: the relationship between x and y is linear, the data points are independent of each other, the spread of points around the line is roughly even across all x values, and the residuals (the errors) are approximately normally distributed.
The quickest way to verify these is a residual plot, which graphs the difference between each observed y value and the value the line predicts. In a good residual plot, the points bounce randomly around zero with no visible pattern. If you see a curve, a funnel shape, or obvious clusters, a straight line isn’t the right model for your data.
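The residuals themselves are easy to compute from the fitted line; this NumPy sketch (with made-up data) produces the values you would plot against x:

```python
import numpy as np

# Hypothetical data for illustration
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8, 12.1])

slope, intercept = np.polyfit(x, y, 1)

# Residual = observed y minus the y the line predicts
residuals = y - (intercept + slope * x)
```

Least-squares residuals always sum to essentially zero by construction, so the sum tells you nothing; what matters is whether the residuals show a curve, funnel, or cluster when plotted against x.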
How to Tell If the Line Fits Well
The R-squared value (written as R²) is the standard measure of fit. It ranges from 0 to 1 and tells you what proportion of the variation in y is explained by x. An R² of 0.9 means the regression line accounts for 90% of the variability in your data, with 10% left unexplained. An R² close to 0 means the line captures almost none of the pattern. In simple linear regression, R² is just the square of the correlation coefficient r.
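Both routes to R² can be checked numerically: the definition 1 − SS_res/SS_tot and the shortcut of squaring r give the same answer for a simple linear fit. A small sketch with invented data:

```python
import numpy as np

# Hypothetical example data
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

# Definition: R^2 = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Shortcut for simple linear regression: square the correlation
r = np.corrcoef(x, y)[0, 1]
```

For this data both calculations give an R² of about 0.997, meaning the line explains nearly all of the variation.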
The Confidence Band Around the Line
Many tools, including Seaborn and statistical software like Minitab, draw a shaded region around the regression line. This is typically a 95% confidence band, and it represents the range where the true average y value likely falls for each x. The band is narrowest near the center of your data (around x̄) and widens toward the edges, reflecting greater uncertainty when you move away from the bulk of your observations.
A prediction band is wider still. While the confidence band estimates where the average falls, the prediction band estimates where a single new data point might land. Both are useful for understanding how precise your regression line really is.
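Both bands follow from standard formulas for simple linear regression, and the geometry described above (prediction band wider, confidence band narrowest at x̄) falls straight out of them. A sketch using NumPy and SciPy's t-distribution, with invented data:

```python
import numpy as np
from scipy import stats

# Hypothetical example data
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.3, 3.8, 6.1, 8.2, 9.7, 12.4, 13.9, 16.2])
n = x.size

slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))   # residual standard error
sxx = np.sum((x - x.mean()) ** 2)
t_crit = stats.t.ppf(0.975, df=n - 2)       # 95% two-sided critical value

x0 = np.linspace(x.min(), x.max(), 50)
y0 = intercept + slope * x0

# Confidence band: uncertainty in the MEAN response at each x
se_mean = s * np.sqrt(1 / n + (x0 - x.mean()) ** 2 / sxx)
# Prediction band: adds the scatter of a SINGLE new observation
se_pred = s * np.sqrt(1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)

ci_lo, ci_hi = y0 - t_crit * se_mean, y0 + t_crit * se_mean
pi_lo, pi_hi = y0 - t_crit * se_pred, y0 + t_crit * se_pred
```

The `1 +` inside `se_pred` is what makes the prediction band strictly wider, and the `(x0 - x.mean())²` term is why both bands pinch in at the center of the data and flare out at the edges.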
How Outliers Affect the Line
Not all outliers distort a regression line equally. A point that has an unusual y value but sits in the middle of your x range often has little effect on the slope. In one Penn State analysis, removing such an outlier only changed the slope from 5.04 to 5.12. But a point that is extreme in both x and y can be highly influential. In another example from the same analysis, a single extreme point dragged the slope from 5.12 down to 3.32 and dropped the R² from 97% to 55%.
The key distinction is leverage. A data point with an extreme x value has high leverage, meaning it has the potential to pull the line toward itself. When a high-leverage point also doesn’t follow the general trend, it becomes influential and can seriously distort your results. You can spot these by scanning your scatter plot for points that sit far from the cluster in the x direction, then checking whether the regression line shifts noticeably when you remove them.
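The remove-and-refit check described above takes only a couple of lines. This sketch uses a made-up dataset that follows y ≈ 2x, plus one high-leverage point that sits far out in x and off the trend:

```python
import numpy as np

# Five points near y = 2x, plus one extreme point at x = 20
# whose y value (15) falls well below the trend's prediction (~40)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 20.0])
y = np.array([2.1, 4.0, 5.9, 8.1, 10.0, 15.0])

slope_all, _ = np.polyfit(x, y, 1)        # fit with the extreme point
slope_without, _ = np.polyfit(x[:-1], y[:-1], 1)  # fit without it
```

Here the slope is about 1.99 without the extreme point and drops below 0.6 with it included: a single high-leverage point that ignores the trend drags the whole line toward itself.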

