The null hypothesis for linear regression states that the slope of the regression line equals zero. Written formally, it’s H₀: β₁ = 0. This means the predictor variable (x) has no linear relationship with the outcome variable (y). If the null hypothesis is true, knowing x tells you nothing useful about y.
What the Null Hypothesis Actually Says
A simple linear regression model takes the form y = β₀ + β₁x + ε, where β₀ is the y-intercept, β₁ is the slope, and ε is a random error term. The slope is the key piece: it tells you how much y changes, on average, for each one-unit increase in x. When you set up a null hypothesis of H₀: β₁ = 0, you’re proposing that the true slope in the population is zero. In other words, x has no linear effect on y, and the best prediction for y is just a flat horizontal line at the average value of y.
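To make that last point concrete, here is a minimal sketch in Python. The data are made up purely for illustration (the text gives none); it shows that if β₁ really were zero, the least-squares fit collapses to the same prediction, the mean of y, for every value of x.

```python
from statistics import mean

# Hypothetical data (made up for illustration)
x = [1, 2, 3, 4, 5]
y = [2.1, 2.9, 3.8, 5.2, 5.9]

# If beta1 = 0, the model reduces to y = beta0 + error, and the
# least-squares fit of that flat line is simply the sample mean of y.
y_bar = mean(y)
predictions_under_h0 = [y_bar] * len(x)  # same prediction regardless of x
print(predictions_under_h0)
```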
The alternative hypothesis, H₁: β₁ ≠ 0, is the claim you’re usually trying to support. It says there is a linear relationship between x and y: the true slope is something other than zero. This is a two-tailed test by default, because the slope could be positive or negative. If you have a directional prediction (say, you expect the relationship to be positive), you can use a one-tailed alternative like H₁: β₁ > 0, but the two-tailed version is standard.
How the Slope Is Tested
To decide whether to reject the null hypothesis, regression uses a t-test on the slope coefficient. The test statistic is calculated as:
t = b₁ / SE(b₁)
Here, b₁ is the slope estimate from your sample data, and SE(b₁) is the standard error of that estimate, which captures how much the slope would vary across different random samples. The logic is simple: if the true slope were zero, how surprising would your observed slope be? A large t-value means your observed slope is far from zero relative to the noise in the data, which makes the null hypothesis harder to defend.
This t-statistic follows a t-distribution with n − 2 degrees of freedom, where n is your sample size. You lose two degrees of freedom because the model estimates two parameters: the intercept and the slope. From the t-value, software calculates a p-value. If that p-value falls below your chosen significance level (commonly 0.05), you reject the null hypothesis and conclude that the relationship between x and y is statistically significant. A p-value above 0.05 means the data are consistent with a zero slope, so you fail to reject the null.
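The whole calculation can be sketched end to end in plain Python. The dataset is made up for illustration, and the critical value 3.182 is the standard two-tailed t cutoff for α = 0.05 with 3 degrees of freedom (n − 2 here):

```python
from math import sqrt
from statistics import mean

# Hypothetical data (made up for illustration)
x = [1, 2, 3, 4, 5]
y = [2.1, 2.9, 3.8, 5.2, 5.9]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = s_xy / s_xx            # slope estimate from the sample
b0 = y_bar - b1 * x_bar     # intercept estimate

# Residual variance uses n - 2 degrees of freedom,
# because two parameters (intercept and slope) were estimated.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s2 = sum(r ** 2 for r in residuals) / (n - 2)

se_b1 = sqrt(s2 / s_xx)     # standard error of the slope
t = b1 / se_b1              # test statistic, df = n - 2 = 3

# Two-tailed critical value for alpha = 0.05, df = 3, is about 3.182;
# |t| beyond that means p < 0.05, so reject H0.
print(f"b1 = {b1:.3f}, SE = {se_b1:.4f}, t = {t:.2f}")
```

With these toy numbers the observed slope sits many standard errors from zero, so the null hypothesis would be rejected.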
It’s worth noting that the p-value is not a measure of how strong the relationship is. A tiny p-value with a very large sample might correspond to a slope that’s statistically significant but practically meaningless. Always look at the actual slope estimate and the R² value alongside the p-value.
The Null Hypothesis for Multiple Regression
When your model includes more than one predictor, there are two levels of null hypothesis testing. First, each individual predictor has its own null hypothesis: H₀: βₖ = 0, tested with its own t-test. This tells you whether that specific predictor contributes to the model after accounting for all the other predictors.
Second, there’s an overall F-test for the entire model. The null hypothesis here is that all slope coefficients equal zero simultaneously. You can think of it as comparing two models: a “reduced” model that only includes the intercept (predicting y with just its average), and the “full” model with all your predictors. In formal notation:
- Null (reduced model): y = β₀ + ε (no predictors matter)
- Alternative (full model): y = β₀ + β₁x₁ + β₂x₂ + … + ε (at least one predictor matters)
If the F-test is significant, at least one of your predictors has a real relationship with the outcome. If it’s not significant, the model as a whole doesn’t explain y better than simply using the mean. For simple linear regression with just one predictor, the F-test and the t-test on the slope give the same result: the F statistic equals the square of the t statistic, and the p-values match.
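That one-predictor equivalence can be checked numerically. The sketch below (made-up data again) builds the F statistic exactly as the text describes, as a comparison between the intercept-only "reduced" model and the "full" model, and confirms it equals the square of the slope's t statistic:

```python
from math import sqrt
from statistics import mean

# Hypothetical data (made up for illustration)
x = [1, 2, 3, 4, 5]
y = [2.1, 2.9, 3.8, 5.2, 5.9]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

# Residual sum of squares for each model
sse_full = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sse_reduced = sum((yi - y_bar) ** 2 for yi in y)  # intercept-only model

# F: improvement from adding the one predictor, scaled by residual variance
F = (sse_reduced - sse_full) / 1 / (sse_full / (n - 2))

# t-test on the slope, as in the previous section
t = b1 / sqrt((sse_full / (n - 2)) / s_xx)

print(f"F = {F:.2f}, t^2 = {t ** 2:.2f}")
```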
What About the Intercept?
Most regression output also shows a null hypothesis test for the intercept: H₀: β₀ = 0. This tests whether the predicted value of y is zero when x equals zero. In practice, this test is often irrelevant. If your data don’t include values near x = 0, or if x = 0 doesn’t make real-world sense (like testing whether height predicts weight when height is zero), the intercept’s p-value is meaningless.
Only interpret the intercept test when two conditions are met: your data actually include observations near x = 0, and it’s scientifically plausible that the outcome could be zero at that point. Otherwise, treat the intercept as a mathematical necessity for fitting the line and ignore its p-value.
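One standard remedy, not covered above but worth knowing, is to center the predictor by subtracting its mean. The slope is unchanged, and the intercept becomes the predicted y at the average x, which is always interpretable. A minimal sketch with made-up height/weight-style numbers:

```python
from statistics import mean

# Hypothetical data (made up for illustration); x values are all far
# from zero, so the raw intercept would be meaningless.
x = [60, 62, 65, 70, 72]
y = [120, 131, 140, 152, 160]

x_bar = mean(x)
xc = [xi - x_bar for xi in x]  # centered predictor: its mean is 0

s_xx = sum(v ** 2 for v in xc)
b1 = sum(v * yi for v, yi in zip(xc, y)) / s_xx  # slope, same as uncentered
b0 = mean(y) - b1 * mean(xc)  # mean(xc) is 0, so the intercept is mean(y)

print(f"intercept after centering = {b0:.1f}, mean of y = {mean(y):.1f}")
```

After centering, the intercept test asks whether the average outcome differs from zero at the average predictor value, a question that actually makes sense.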
Assumptions That Must Hold
The null hypothesis test only gives valid results if four assumptions about your data are met. Mild violations won’t ruin your analysis, but serious violations can make your p-values unreliable.
- Linearity: The relationship between x and y is actually linear, not curved. If you plot the residuals (the differences between observed and predicted values), they should scatter randomly around zero with no pattern.
- Normality: The residuals follow a roughly normal distribution. This matters most with small samples. With large samples, the t-test is robust to non-normal residuals.
- Equal variance: The spread of residuals stays consistent across all values of x. If the residuals fan out or narrow as x increases, your standard errors (and therefore your p-values) may be wrong.
- Independence: Each observation is independent of the others. This is about how you collected the data, not something you can fix after the fact. Repeated measurements on the same person or data collected over time often violate this assumption.
Before interpreting any null hypothesis test in regression, check these assumptions using residual plots. If the assumptions are badly violated, the p-value next to your slope might be telling you something that isn’t true.
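A quick, plot-free version of those checks can be sketched in Python. With made-up data, it computes the residuals and two crude diagnostics: the residual mean (always essentially zero for least squares, so patterns in sign and spread are what matter) and a rough comparison of residual spread between the lower and upper halves of x as a stand-in for eyeballing a fan shape. Real diagnostics should still use residual plots, as noted above.

```python
from statistics import mean

# Hypothetical data (made up for illustration)
x = [1, 2, 3, 4, 5]
y = [2.1, 2.9, 3.8, 5.2, 5.9]

x_bar, y_bar = mean(x), mean(y)
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Least-squares residuals average to (essentially) zero by construction;
# a trend or curve in a residual plot is what signals non-linearity.
print("mean residual:", round(mean(residuals), 10))

# Crude equal-variance check: compare average |residual| in the lower
# and upper halves of x. A large imbalance hints at fanning residuals.
lower = [abs(r) for xi, r in zip(x, residuals) if xi <= x_bar]
upper = [abs(r) for xi, r in zip(x, residuals) if xi > x_bar]
print("mean |residual|, lower vs upper half of x:", mean(lower), mean(upper))
```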

