How to Reject the Null Hypothesis with P-Value

You reject the null hypothesis when your p-value is less than or equal to your significance level (alpha). In most studies, alpha is set at 0.05, so any p-value at or below 0.05 leads to rejection. If the p-value is greater than alpha, you do not reject the null hypothesis. That single comparison is the core decision rule, but understanding what it actually means requires a bit more context.

What the Null and Alternative Hypotheses Are

Every statistical test starts with two competing statements. The null hypothesis proposes that there is no effect, no difference, or no association between the things you’re studying. The alternative hypothesis is what you’re actually trying to demonstrate: that a treatment works better, that two groups differ, or that a relationship exists.

Counterintuitively, you don’t set out to prove the alternative hypothesis directly. Instead, you assume the null hypothesis is true and then ask how likely your observed data would be under that assumption. The p-value answers that question.

What the P-Value Actually Tells You

The p-value is the probability that you would see results as extreme as (or more extreme than) what your study found, assuming the null hypothesis is true. A p-value of 0.03, for instance, means there’s a 3% chance of seeing data at least this extreme if there really were no effect at all.

A common mistake is reading that backward. A p-value of 0.03 does not mean there’s a 3% probability the null hypothesis is true. It also doesn’t mean your results happened “by chance” 3% of the time. It’s a statement about how incompatible your data are with the assumption of no effect. The smaller the p-value, the greater that incompatibility.
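To make the definition concrete, here is a minimal sketch in Python (assuming scipy is available; the test statistic is a made-up number for illustration). For a two-sided z-test, the p-value is the probability mass in both tails beyond the observed statistic:

```python
from scipy import stats

# Hypothetical two-sided z-test with an observed statistic of 2.17.
# The p-value is the probability of a statistic at least this extreme
# in either tail, computed under the assumption that the null is true.
z = 2.17
p_value = 2 * stats.norm.sf(abs(z))  # sf() gives the upper-tail probability
print(round(p_value, 3))  # → 0.03
```

A statistic of about 2.17 corresponds to the p-value of 0.03 discussed above; a larger statistic would push more probability into the tails' complement and shrink the p-value.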

The Step-by-Step Decision Process

Here’s how the full process works in practice:

  • State your hypotheses. Define the null hypothesis (no effect) and the alternative hypothesis (the effect you expect).
  • Choose your alpha level before collecting data. This is your threshold for what counts as “unlikely enough” to reject the null. Most researchers use 0.05, meaning they accept a 5% risk of rejecting a null hypothesis that is actually true.
  • Run your statistical test. Whether it’s a t-test, chi-square, or regression, the test produces a p-value from your data.
  • Compare the p-value to alpha. If p ≤ alpha, reject the null hypothesis in favor of the alternative. If p > alpha, do not reject the null hypothesis.
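The comparison in the final step is simple enough to express directly in code. A minimal sketch (the function name is just illustrative):

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the core decision rule: reject when p <= alpha."""
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

print(decide(0.03))  # → reject the null hypothesis
print(decide(0.12))  # → fail to reject the null hypothesis
```

Note that the boundary case counts as rejection: a p-value exactly equal to alpha satisfies p ≤ alpha.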

The key detail many people miss: alpha must be set before you look at the data. Choosing your threshold after seeing the results introduces bias and defeats the purpose of the test.

Why 0.05 Is the Standard (and When It Shouldn’t Be)

The 0.05 threshold means the researcher is willing to be wrong about 5% of the time, or 1 time in 20. Historically, statisticians concluded this was acceptable for many applications, and it stuck. Most published research articles still specify an alpha of 0.05.

But 0.05 isn’t sacred. When the stakes of a wrong conclusion are high, researchers tighten the threshold. An alpha of 0.01 means you will tolerate only a 1% risk of a false positive before rejecting the null. In that case, a p-value of 0.02 would not be enough to reject, even though it would pass at the 0.05 level. The choice depends on how costly a false conclusion would be in your specific situation.
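The same p-value can therefore lead to opposite decisions under different thresholds, as a two-line check shows:

```python
# A p-value of 0.02 clears the conventional 0.05 threshold
# but fails a stricter 0.01 threshold.
p = 0.02
for alpha in (0.05, 0.01):
    verdict = "reject" if p <= alpha else "do not reject"
    print(f"alpha = {alpha}: {verdict}")
# → alpha = 0.05: reject
# → alpha = 0.01: do not reject
```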

Why “Fail to Reject” Is Not the Same as “Accept”

You’ll notice the language is always “fail to reject” rather than “accept” the null hypothesis. That distinction matters. When your p-value is above alpha, you haven’t proven the null hypothesis is true. You simply don’t have strong enough evidence against it. Your study may have been too small to detect a real effect, or the effect may genuinely not exist. The data can’t tell you which.

Type I and Type II Errors

Two kinds of mistakes can happen during hypothesis testing. A Type I error (false positive) occurs when you reject the null hypothesis even though it’s actually true. Your alpha level is the maximum probability you’re willing to accept for this error. At alpha = 0.05, you’re allowing up to a 5% chance of a false positive.

A Type II error (false negative) occurs when you fail to reject the null hypothesis even though it’s actually false. The probability of this error is called beta. Lowering alpha to reduce false positives makes false negatives more likely unless you also increase your sample size. In practice, researchers balance both risks based on what matters more in their context. If falsely claiming an effect would be dangerous, a stricter alpha is appropriate. If missing a real effect would be the bigger problem, keeping beta low takes priority.
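Both error rates can be seen directly in simulation. The sketch below (illustrative numbers; assumes numpy and scipy) runs many t-tests on data where the null is true, then on data where a real difference of 0.5 exists:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, trials = 0.05, 30, 2000

# Type I error rate: both groups come from the same distribution,
# so every rejection is a false positive. The rate should land near alpha.
type_i = sum(
    stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0, 1, n)).pvalue <= alpha
    for _ in range(trials)
) / trials

# Type II error rate (beta): a real difference of 0.5 exists, so every
# failure to reject is a false negative. Beta shrinks as n grows.
type_ii = sum(
    stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0.5, 1, n)).pvalue > alpha
    for _ in range(trials)
) / trials

print(f"Type I rate:  {type_i:.3f}")   # close to 0.05
print(f"Type II rate: {type_ii:.3f}")  # roughly 0.5 at this sample size
```

With 30 observations per group and an effect of half a standard deviation, beta sits near 50%: the test misses the real effect about half the time, which is why sample size matters as much as the alpha level.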

A Small P-Value Doesn’t Mean a Big Effect

One of the most important things to understand about p-values is what they don’t tell you. A p-value measures the strength of the evidence against the null hypothesis, but it says nothing about how large or meaningful the effect is. With a big enough sample, even a tiny, practically meaningless difference will produce a statistically significant p-value.

Consider a study with 10,000 participants that finds a new drug lowers blood pressure by 0.5 points more than a placebo. The p-value might be well below 0.05, but a half-point difference is clinically irrelevant. The statistical test detected a real difference, yet that difference is too small to matter in anyone’s treatment. This is the distinction between statistical significance and practical significance. As statistician Jacob Cohen put it, the primary product of research should be a measure of effect size, not a p-value. Knowing that an effect exists is only useful when you also know how large it is.
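The blood-pressure scenario can be reproduced from summary statistics alone. A sketch (the means, standard deviation, and group sizes are illustrative, assuming 10,000 participants per arm):

```python
from scipy import stats

# Hypothetical trial: the drug arm averages just 0.5 points lower than
# placebo (sd = 12 in both groups), with 10,000 participants per arm.
result = stats.ttest_ind_from_stats(
    mean1=139.5, std1=12.0, nobs1=10_000,
    mean2=140.0, std2=12.0, nobs2=10_000,
)
print(f"p-value: {result.pvalue:.4f}")  # statistically significant

# Cohen's d puts the effect size in context: the difference is negligible.
cohens_d = (140.0 - 139.5) / 12.0
print(f"Cohen's d: {cohens_d:.3f}")  # a trivially small effect
```

The test comfortably rejects the null, yet the standardized effect size is about 0.04, far below even the conventional “small” threshold of 0.2. Both numbers are needed to interpret the result.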

The American Statistical Association has emphasized that scientific conclusions should not be based solely on whether a p-value crosses a specific threshold. Study design, measurement quality, external evidence, and effect size all matter. A p-value is one piece of the puzzle, not the final answer.

A Quick Example

Suppose you’re testing whether a new teaching method improves exam scores compared to the standard approach. Your null hypothesis is that there’s no difference in scores between the two methods. You set alpha at 0.05 before collecting data.

After running your test, you get a p-value of 0.008. Since 0.008 is less than 0.05, you reject the null hypothesis and conclude that the teaching methods produce different results. If instead the p-value had been 0.12, you would fail to reject the null. You wouldn’t claim the methods are equal, just that your data didn’t provide sufficient evidence of a difference.
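The whole workflow fits in a few lines. A sketch with made-up exam scores (any real analysis would use your own data and verify the t-test’s assumptions):

```python
from scipy import stats

# Made-up exam scores for illustration.
standard = [72, 75, 78, 70, 74, 76, 73, 71, 77, 74]
new_method = [80, 83, 79, 85, 82, 78, 84, 81, 80, 83]

alpha = 0.05  # chosen before looking at the data
result = stats.ttest_ind(new_method, standard)

if result.pvalue <= alpha:
    print(f"p = {result.pvalue:.4f}: reject the null hypothesis")
else:
    print(f"p = {result.pvalue:.4f}: fail to reject the null hypothesis")
```

With these particular scores the gap between groups is large relative to their spread, so the p-value falls well below alpha and the null is rejected.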