When Do You Reject the Null Hypothesis?

You reject the null hypothesis when your p-value is less than or equal to your chosen significance level, called alpha. In most fields, alpha is set at 0.05, meaning you accept up to a 5% chance of rejecting a null hypothesis that is actually true. This single rule drives the vast majority of statistical decisions in science, medicine, and social research.

The P-Value Rule

Every hypothesis test starts with a null hypothesis, which is essentially the “nothing interesting is happening” assumption. Maybe a new drug works no better than a placebo, or two groups have no real difference. Your goal is to determine whether your data provides strong enough evidence to reject that assumption.

The p-value measures how likely you’d be to see results at least as extreme as yours if the null hypothesis were actually true. A small p-value means your data would be very unlikely under the null hypothesis, which is evidence against it. The decision rule is straightforward: if p ≤ alpha, reject the null hypothesis. If p > alpha, don’t reject it.
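The decision rule above can be sketched in a few lines of Python. This is a minimal illustration using a one-sample z-test with made-up data and a made-up null mean; for a sample this small, a t-test would be the more appropriate choice in practice, but the normal approximation keeps the example simple.

```python
from statistics import NormalDist, mean, stdev
from math import sqrt

def z_test_p_value(sample, null_mean):
    """Two-tailed p-value for H0: population mean == null_mean (normal approximation)."""
    n = len(sample)
    z = (mean(sample) - null_mean) / (stdev(sample) / sqrt(n))
    # Probability of a result at least as extreme as this, in either direction,
    # if the null hypothesis were true
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
sample = [5.1, 4.9, 5.6, 5.3, 4.8, 5.7, 5.2, 5.4, 5.0, 5.5]  # hypothetical data
p = z_test_p_value(sample, null_mean=4.5)
print("reject H0" if p <= alpha else "fail to reject H0")
```

The entire "rule" is the final line: a single comparison between p and alpha.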

The most common alpha level is 0.05, a convention dating back to the statistician Ronald Fisher, who suggested that a 1-in-20 probability was a reasonable cutoff for calling something significant. Some fields use stricter thresholds. Medical research sometimes requires 0.01, and particle physics famously uses roughly 0.0000003 (the “five sigma” standard). The alpha you choose before running your test determines how much evidence you need.

The Critical Value Method

There’s a second way to make the same decision, and it gives identical results. Instead of comparing p-values to alpha, you compare your calculated test statistic to a cutoff called the critical value. If your test statistic is more extreme than the critical value, you reject the null hypothesis. If it’s less extreme, you don’t.

Think of it this way: the critical value carves out a “rejection region” on the edges of your statistical distribution. If your result lands in that region, it’s far enough from what you’d expect under the null hypothesis that you reject it. Both the p-value approach and the critical value approach are mathematically equivalent, just two different angles on the same question.
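The equivalence of the two approaches is easy to verify numerically. A sketch, using a hypothetical test statistic for a two-tailed z-test at alpha = 0.05:

```python
from statistics import NormalDist

alpha = 0.05
z_stat = 2.1  # hypothetical calculated test statistic

# Critical value: the cutoff leaving alpha/2 in each tail of the distribution
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96

# Decision by the critical-value method
reject_by_critical = abs(z_stat) >= z_crit

# Decision by the p-value method
p = 2 * (1 - NormalDist().cdf(abs(z_stat)))
reject_by_p = p <= alpha

# The two rules always produce the same decision
print(reject_by_critical, reject_by_p)
```

Whatever value of `z_stat` you try, the two booleans agree, because |z| ≥ z_crit and p ≤ alpha describe the same region of the distribution.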

One-Tailed vs. Two-Tailed Tests

Where you place the rejection region depends on the type of question you’re asking. A two-tailed test checks whether a value is significantly different in either direction. At an alpha of 0.05, it splits the significance level in half, placing 0.025 in each tail of the distribution. Your result needs to land in the top 2.5% or bottom 2.5% to qualify as significant.

A one-tailed test concentrates all of alpha in one direction. If you’re only interested in whether a treatment is better (not just different), the full 0.05 goes into one tail, making it easier to reject the null hypothesis in that specific direction. The tradeoff is that you can’t detect an effect in the opposite direction. For a symmetric distribution, the one-tailed p-value is exactly half the two-tailed p-value, so a result that isn’t quite significant in a two-tailed test might cross the threshold in a one-tailed test.
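The halving relationship is easy to see directly. In this sketch, a hypothetical test statistic of 1.8 misses the two-tailed cutoff but clears the one-tailed one:

```python
from statistics import NormalDist

z = 1.8  # hypothetical test statistic, in the hypothesized direction

p_two = 2 * (1 - NormalDist().cdf(z))  # two-tailed: both directions count
p_one = 1 - NormalDist().cdf(z)        # one-tailed: only this direction counts

# For a symmetric distribution, the one-tailed p-value is exactly half
print(p_two <= 0.05, p_one <= 0.05)
```

Here `p_two` is about 0.072 (not significant at 0.05) while `p_one` is about 0.036 (significant), which is precisely the borderline case described above.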

Non-inferiority trials in medicine use a specific version of one-tailed testing. Here, the null hypothesis is flipped: it assumes the new treatment is worse than the standard. Rejection means you’ve shown the new treatment isn’t meaningfully inferior. These trials typically use a one-sided significance level of 0.025.
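A non-inferiority test shifts the null boundary by a margin rather than placing it at zero. The sketch below uses entirely hypothetical numbers (margin, observed difference, and standard error) with a normal approximation:

```python
from statistics import NormalDist

margin = 0.10  # hypothetical non-inferiority margin
diff = -0.02   # hypothetical observed difference (new minus standard)
se = 0.03      # hypothetical standard error of that difference
alpha = 0.025  # one-sided significance level typical of these trials

# H0: the new treatment is worse by at least the margin (diff <= -margin).
# The test statistic is measured against that shifted boundary.
z = (diff - (-margin)) / se
p_one_sided = 1 - NormalDist().cdf(z)

print("non-inferior" if p_one_sided <= alpha else "inconclusive")
```

Rejecting this flipped null means the data are inconsistent with the new treatment being worse than the standard by more than the margin.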

The Confidence Interval Shortcut

You can also decide whether to reject the null hypothesis by looking at a confidence interval. If a 95% confidence interval for your estimate does not contain the null value (usually zero for a difference, or one for a ratio), you would reject the null hypothesis at the 0.05 level. If the null value falls inside the interval, you would not reject.

This works because of a direct mathematical relationship: the set of values a hypothesis test would not reject is exactly the confidence interval. So checking whether a value falls inside or outside the interval is a valid test. Many researchers prefer reporting confidence intervals because they also show the range of plausible effect sizes, not just a yes-or-no decision.
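The interval check can be sketched directly. The estimated difference and its standard error below are hypothetical; the 95% interval uses the usual normal-based formula:

```python
from statistics import NormalDist

diff = 1.2  # hypothetical estimated difference between two groups
se = 0.5    # hypothetical standard error of that difference

z975 = NormalDist().inv_cdf(0.975)  # about 1.96
lo, hi = diff - z975 * se, diff + z975 * se

# Null value for a difference is zero; reject if the interval excludes it
null_value = 0.0
reject = not (lo <= null_value <= hi)
print(f"95% CI: ({lo:.2f}, {hi:.2f}) -> {'reject' if reject else 'fail to reject'} H0")
```

A bonus of this route: the interval (0.22, 2.18) communicates the plausible range of effect sizes, not just the reject/don't-reject verdict.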

What Can Go Wrong

Rejecting the null hypothesis doesn’t guarantee you’re right. A Type I error happens when you reject a true null hypothesis, essentially a false alarm. Your alpha level is the maximum probability of this happening. At alpha = 0.05, you accept up to a 5% chance of incorrectly rejecting the null hypothesis over the long run.

The opposite mistake, a Type II error, happens when you fail to reject a null hypothesis that is actually false. You miss a real effect. The probability of avoiding this mistake is called statistical power: the probability of correctly rejecting a false null hypothesis. Low power usually comes from small sample sizes. The fewer observations you have, the harder it is to detect real differences, and the more likely you are to miss them.
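The link between sample size and power can be made concrete. This sketch approximates the power of a two-sided z-test for a hypothetical standardized effect size of 0.3, under a normal approximation:

```python
from statistics import NormalDist
from math import sqrt

def power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect_size is the standardized effect (true mean shift in SD units).
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)  # mean of the z statistic under the alternative
    # P(|Z| > z_crit) when Z ~ Normal(shift, 1)
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)

for n in (10, 50, 200):
    print(n, round(power(0.3, n), 3))
```

With ten observations the test catches this effect only a small fraction of the time; with two hundred it almost always does, which is exactly the "low power from small samples" problem.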

Why Sample Size Matters

Larger samples make it easier to reject the null hypothesis, but that’s not always a good thing. Statistical significance depends on both the size of the effect and the size of the sample. With enough data, even a tiny, practically meaningless difference will produce a p-value below 0.05.

A well-known example: a massive trial with over 22,000 participants found that aspirin reduced heart attacks with a p-value below 0.00001, highly significant statistically. But aspirin did not reduce overall cardiovascular death. The enormous sample size made a small effect detectable, which raises the question of whether the finding, on its own, justified changing medical practice. As one widely cited paper put it, “sometimes a statistically significant result means only that a huge sample size was used.”

P-values are considered “confounded” by sample size for this reason. They blend together two things you’d want to evaluate separately: how big the effect is and how much data you have.
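You can watch this confounding happen. In this sketch, the standardized effect size is held fixed at a trivially small 0.01 while the hypothetical sample size grows; only n changes, yet the p-value collapses:

```python
from statistics import NormalDist
from math import sqrt

def p_two_sided(effect_size, n):
    """Two-sided p-value for a fixed standardized effect observed with sample size n."""
    z = effect_size * sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same tiny effect, three very different sample sizes
for n in (100, 10_000, 1_000_000):
    print(n, p_two_sided(0.01, n))
```

The effect never changes, but at a million observations it sails past any conventional alpha, which is why a p-value alone cannot tell you whether a finding is practically meaningful.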

Statistical vs. Clinical Significance

Rejecting the null hypothesis tells you a difference probably exists. It does not tell you the difference matters. This distinction between statistical significance and clinical significance is one of the most important concepts in applied research.

A clinically significant result is one that actually improves a patient’s quality of life: better physical function, longer remission, less pain, improved mood. A statistically significant result only means the math checks out. Many outcomes can clear the statistical bar without being clinically relevant. A blood pressure drug that lowers readings by 1 mmHg might produce a significant p-value in a large trial, but no doctor would change a treatment plan based on that difference.

The American Statistical Association has pushed researchers to move beyond treating p ≤ 0.05 as a simple pass/fail gate. Their 2016 statement encouraged broader interpretation of results, including effect sizes, confidence intervals, and practical importance, rather than relying on a p-value alone. The core message: rejecting the null hypothesis is the start of interpretation, not the end of it.