When to Reject the Null Hypothesis in a T-Test

You reject the null hypothesis in a t-test when the absolute value of your calculated t-value exceeds the critical value for your chosen significance level, or equivalently, when your p-value falls below that significance threshold (typically 0.05). This means the difference you observed in your data is unlikely enough under the assumption of “no effect” that you can confidently say something real is going on.

What the Null Hypothesis Actually Claims

The null hypothesis is the default assumption that there’s no meaningful difference or effect. In a t-test, it states that two group means are equal, or that a single group’s mean equals some specific value. Your entire analysis is built around testing whether the data gives you enough reason to abandon this default position.

For example, if you’re testing whether a new teaching method improves exam scores compared to the standard method, the null hypothesis says both methods produce the same average score. The alternative hypothesis says they don’t. The t-test gives you a structured way to decide between these two positions using your data.

The Two Ways to Make the Decision

Comparing the T-Value to the Critical Value

When you run a t-test, you get a t-value, which measures how far your sample result is from what the null hypothesis predicts, scaled by the variability in your data. A t-value of 0 means your data perfectly matches what you’d expect if the null were true. The further your t-value gets from 0, the more your data conflicts with the null hypothesis.

To decide whether your t-value is “far enough,” you compare it to a critical value from the t-distribution table. This critical value depends on two things: your significance level (alpha) and your degrees of freedom, which are determined by your sample size. For a significance level of 0.05 and a two-tailed test with 20 degrees of freedom, the critical value is approximately 2.086. If your calculated t-value is greater than 2.086 or less than -2.086, you reject the null hypothesis.
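The decision rule can be sketched in a few lines of Python. The exam scores below are hypothetical, invented to give 21 observations (so 20 degrees of freedom), and the critical value 2.086 is taken from the t-table figure quoted above; nothing here comes from a real study.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t = (sample mean - hypothesized mean) / (s / sqrt(n))."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))

# Hypothetical exam scores from 21 students (df = 21 - 1 = 20)
scores = [74, 81, 69, 90, 77, 85, 72, 88, 79, 83,
          76, 91, 68, 84, 80, 73, 87, 78, 82, 75, 86]

t = one_sample_t(scores, mu0=75)  # H0: the true mean score is 75
critical = 2.086                  # two-tailed, alpha = 0.05, df = 20
reject = abs(t) > critical        # rejection rule: |t| beyond the critical value
```

Note that the comparison uses the absolute value of t, which is what makes this a two-tailed decision: a t-value of -2.5 rejects just as a t-value of +2.5 does.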

Using the P-Value

The p-value approach gives you the same answer in a more intuitive way. The p-value tells you the probability of getting a result as extreme as yours (or more extreme) if the null hypothesis were actually true. A p-value of 0.03 means there’s only a 3% chance of seeing data this extreme in a world where the null hypothesis holds.

The rule is straightforward: if the p-value is less than or equal to your significance level, reject the null hypothesis. If your alpha is 0.05 and your p-value is 0.03, you reject. If your p-value is 0.08, you don’t. Both methods, the critical value approach and the p-value approach, always produce the same conclusion for the same data and significance level.
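As an illustration of where a p-value comes from, here is a small sketch that computes a two-tailed p-value by numerically integrating the t-distribution's density over the tail, then applies the rejection rule. The density formula is the standard Student's t pdf; the integration bounds and step count are arbitrary choices that give a few digits of accuracy, not a substitute for a proper statistics library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    c /= math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, upper=100.0, steps=50_000):
    """P(|T| >= |t|): trapezoidal integration of one tail, doubled."""
    a, b = abs(t), upper
    h = (b - a) / steps
    tail = 0.5 * (t_pdf(a, df) + t_pdf(b, df))
    tail += sum(t_pdf(a + i * h, df) for i in range(1, steps))
    return 2 * tail * h

p = two_tailed_p(2.45, df=24)  # roughly 0.02
alpha = 0.05
reject = p <= alpha            # the p-value rule: reject when p <= alpha
```

In practice you would read the p-value straight off your statistics software's output; the point of the sketch is that the p-value is just tail area under the t-distribution beyond your observed t-value.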

One-Tailed vs. Two-Tailed Tests

The type of test you run affects when rejection happens. A two-tailed test checks whether the means differ in either direction, so you split your significance level across both tails of the distribution. With alpha at 0.05, each tail gets 0.025. This means your t-value has to be more extreme to reach significance because you’re guarding against differences in both directions.

A one-tailed test checks for a difference in only one direction, such as whether a drug increases recovery speed rather than simply changes it. All of your alpha goes into one tail, making it easier to reject the null hypothesis in that specific direction. You’d use a one-tailed test only when you have a strong prior reason to expect the effect in one direction and a difference in the other direction would be meaningless to your question. If there’s any chance a result in the opposite direction matters, use a two-tailed test.
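The practical consequence is visible in the critical values. The two values below are standard t-table entries for 20 degrees of freedom at alpha = 0.05, and the t-value of 1.90 is a made-up result chosen to land between them.

```python
# Critical values from a standard t-table, df = 20, alpha = 0.05
crit_two_tailed = 2.086  # alpha split: 0.025 in each tail
crit_one_tailed = 1.725  # all 0.05 in a single tail

t = 1.90  # hypothetical result, in the predicted direction

one_tailed_rejects = t > crit_one_tailed         # True
two_tailed_rejects = abs(t) > crit_two_tailed    # False
```

The same data rejects under the one-tailed rule but not under the two-tailed rule, which is exactly why the choice of test must be justified before the data is seen.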

Choosing a Significance Level

The most common significance level is 0.05, meaning you’re willing to accept a 5% chance of rejecting the null hypothesis when it’s actually true (a false positive, or Type I error). This threshold isn’t a law of nature. It’s a convention that balances the risk of false positives against the ability to detect real effects.

In fields where false positives are especially costly, stricter thresholds are used. Particle physics uses a significance level equivalent to roughly 0.0000003 (the “five sigma” standard). Some medical research and genetics studies use 0.01 or 0.001. In exploratory social science research, 0.05 remains standard, and some researchers even argue for 0.10 in early-stage work where missing a real effect is more costly than a false alarm.

You should set your significance level before looking at your data. Choosing it after you’ve calculated the p-value introduces bias, because you’d be tempted to pick whichever threshold gives you the result you want.

What Degrees of Freedom Do

Degrees of freedom determine the shape of the t-distribution you’re comparing against, and they directly affect the critical value. With small samples, the t-distribution has heavier tails, meaning the critical value is larger and harder to exceed. As your sample size grows, degrees of freedom increase, the t-distribution narrows, and the critical value shrinks closer to the values you’d see in a normal distribution.

For a one-sample t-test, degrees of freedom equal your sample size minus one. For an independent two-sample t-test, it’s roughly the total of both sample sizes minus two (the exact calculation varies depending on whether you assume equal variances). For a paired t-test, it’s the number of pairs minus one. A study with 10 participants has a critical value around 2.26 for a two-tailed test at the 0.05 level, while a study with 100 participants has a critical value around 1.98. Larger samples make it easier to detect real differences.
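The degrees-of-freedom formulas above are mechanical enough to sketch directly. The helper function below is illustrative (the name and string labels are my own), and the two-sample case uses the equal-variance pooled formula mentioned in the text; Welch's unequal-variance version computes df differently.

```python
def degrees_of_freedom(test, n1, n2=None):
    """df for common t-tests (equal-variance formula for two-sample)."""
    if test in ("one_sample", "paired"):
        return n1 - 1          # sample size (or number of pairs) minus one
    if test == "two_sample":
        return n1 + n2 - 2     # both sample sizes, minus one per group
    raise ValueError(f"unknown test type: {test}")

df_small = degrees_of_freedom("one_sample", 10)       # 9
df_large = degrees_of_freedom("two_sample", 50, 50)   # 98

# Two-tailed critical values at alpha = 0.05 from a standard t-table:
# df = 9 -> about 2.262, df = 98 -> about 1.984
```

The shrinking critical value with growing df is the heavier-tails effect described above: with df = 9 you need |t| beyond roughly 2.26, while with df = 98 roughly 1.98 suffices.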

A Worked Example

Suppose you’re comparing the average commute time of remote workers who moved to a new city (sample of 25 people, mean of 22 minutes) against a known population average of 26 minutes. Your calculated t-value comes out to -2.45, and you’re running a two-tailed test at the 0.05 significance level.

With 24 degrees of freedom (25 minus 1), the critical value from the t-table is approximately 2.064. Since the absolute value of your t-statistic (2.45) exceeds 2.064, you reject the null hypothesis. The corresponding p-value would be roughly 0.02, which is below 0.05, confirming the same decision. You’d conclude that remote workers who relocated have an average commute time significantly different from the population average of 26 minutes.
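The arithmetic of this example can be checked in a few lines. The example doesn't state the sample standard deviation, so the last step back-solves the value that the reported t-statistic implies; that figure is derived here, not given in the scenario.

```python
import math

n, sample_mean, mu0 = 25, 22.0, 26.0
t = -2.45         # t-statistic reported in the example
critical = 2.064  # two-tailed, alpha = 0.05, df = 24

reject = abs(t) > critical  # True -> reject the null hypothesis

# Sample standard deviation implied by the reported t-value,
# from t = (mean - mu0) / (s / sqrt(n)) rearranged for s:
s = (sample_mean - mu0) * math.sqrt(n) / t  # about 8.16 minutes
```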

What Rejection Actually Means

Rejecting the null hypothesis does not prove your alternative hypothesis is true. It means the data is inconsistent enough with “no difference” that you’re comfortable acting as though a real difference exists. There’s still a small probability (equal to your significance level) that you’ve made a Type I error and rejected a true null hypothesis.

Equally important: failing to reject the null hypothesis doesn’t prove there’s no difference. It means your data didn’t provide strong enough evidence to rule out the possibility of no difference. The effect might be real but too small for your sample size to detect, which is a problem of statistical power. Small samples can easily miss moderate effects.

Statistical significance also doesn’t tell you whether the difference matters in practical terms. A study with 10,000 participants might find a statistically significant difference of 0.2 points on a 100-point scale. That’s real in a statistical sense but probably irrelevant in practice. Always pair significance testing with effect size measures or confidence intervals to judge whether the result is meaningful in context.

Common Mistakes That Lead to Wrong Decisions

One frequent error is running multiple t-tests on the same dataset without adjusting the significance level. If you compare five groups using ten separate t-tests at the 0.05 level, your overall chance of at least one false positive jumps to around 40%. When comparing more than two groups, an ANOVA with post-hoc corrections is the appropriate approach.
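The 40% figure follows from basic probability: if each test independently has a 5% false-positive rate, the chance that at least one of ten tests fires is one minus the chance that none do. A quick sketch, including the simplest (Bonferroni) correction for comparison:

```python
alpha = 0.05
tests = 10  # all pairwise comparisons among 5 groups: C(5, 2) = 10

# Chance of at least one false positive across all ten tests,
# assuming the tests are independent (a simplification):
familywise = 1 - (1 - alpha) ** tests  # about 0.401

# Bonferroni correction: run each test at alpha / tests instead
bonferroni_alpha = alpha / tests  # 0.005 per test
```

The independence assumption is an approximation (tests on overlapping groups are correlated), but the qualitative point stands: uncorrected multiple testing inflates the false-positive rate dramatically.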

Another mistake is treating a p-value just above 0.05 as fundamentally different from one just below it. A p-value of 0.049 and a p-value of 0.051 represent nearly identical evidence. The 0.05 cutoff is a decision tool, not a bright line separating truth from falsehood. Report your exact p-value and let the context inform interpretation rather than treating the threshold as absolute.

Finally, violating the assumptions of the t-test can make your rejection decision unreliable. The t-test assumes your data is roughly normally distributed (less important with larger samples, thanks to the central limit theorem), that observations are independent, and in the two-sample version, that variances are reasonably similar across groups. When these assumptions break down badly, consider non-parametric alternatives like the Mann-Whitney U test, which don’t require normality.