You reject the null hypothesis when your p-value is less than or equal to your chosen significance level (alpha), and you fail to reject it when the p-value is greater than alpha. Most studies set alpha at 0.05, meaning that before you can reject the null hypothesis, there must be at most a 5% probability of seeing results at least as extreme as yours under the assumption that the null hypothesis is true. That comparison is the core decision rule, but applying it correctly requires understanding the full testing process and what your decision actually means.
The Five-Step Testing Process
Every hypothesis test follows the same basic sequence, regardless of whether you’re running a t-test, z-test, or chi-square test:
- State your hypotheses. The null hypothesis is your default assumption, usually that there’s no effect or no difference. The alternative hypothesis is what you’re trying to find evidence for.
- Compute the test statistic. Using your sample data, calculate a number that summarizes how far your observed result is from what the null hypothesis predicts.
- Determine the p-value. This tells you the probability of getting a result at least as extreme as yours if the null hypothesis were actually true.
- Make a decision. Compare the p-value to alpha and either reject or fail to reject the null hypothesis.
- State a real-world conclusion. Translate the statistical decision into a plain-language statement about your research question.
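The five steps above can be sketched in a few lines of Python. This is a minimal illustration using a two-sample t-test from SciPy; the data and the alpha of 0.05 are assumptions chosen for the example, not values from any real study.

```python
# A sketch of the five-step process using a two-sample t-test.
# The data and alpha are illustrative assumptions.
from scipy import stats

alpha = 0.05  # chosen significance level

# Step 1: H0: the two group means are equal; Ha: they differ.
group_a = [5.1, 4.9, 5.4, 5.0, 5.3, 4.8, 5.2]
group_b = [5.6, 5.8, 5.5, 6.0, 5.7, 5.9, 5.4]

# Steps 2-3: compute the test statistic and its p-value.
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Step 4: compare the p-value to alpha.
decision = "reject H0" if p_value <= alpha else "fail to reject H0"

# Step 5: translate into plain language.
print(f"t = {t_stat:.2f}, p = {p_value:.4f} -> {decision}")
```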
The P-Value Approach
The p-value approach is the most common method. If your p-value is less than or equal to alpha, you reject the null hypothesis in favor of the alternative. If your p-value is greater than alpha, you do not reject the null hypothesis.
Say you’re testing whether a new drug lowers blood pressure compared to a placebo, with alpha set at 0.05. You collect your data, run the test, and get a p-value of 0.03. Because 0.03 is less than 0.05, you reject the null hypothesis and conclude there’s statistically significant evidence that the drug has an effect. If you’d gotten a p-value of 0.12 instead, you’d fail to reject the null, meaning your data didn’t provide strong enough evidence to conclude the drug works.
Common alpha levels are 0.05, 0.01, and 0.10. A smaller alpha makes it harder to reject the null hypothesis, which reduces your chance of a false positive but increases your chance of missing a real effect.
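The decision rule itself is a one-line comparison. This sketch uses the p-values from the drug example above and also shows how a stricter alpha can flip the call:

```python
def decide(p_value, alpha):
    """Return the hypothesis-test decision for a given p-value and alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

print(decide(0.03, 0.05))  # 0.03 <= 0.05, so reject H0
print(decide(0.12, 0.05))  # 0.12 > 0.05, so fail to reject H0
print(decide(0.03, 0.01))  # a stricter alpha flips the decision
```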
The Critical Value Approach
Instead of comparing p-values, you can compare your test statistic directly to a cutoff called the critical value. If your test statistic is more extreme than the critical value, you reject the null hypothesis. If it’s less extreme, you fail to reject.
What “more extreme” means depends on the direction of your test. In a right-tailed test (where the alternative hypothesis predicts a value greater than the null), your test statistic needs to exceed the critical value on the right side of the distribution. In a left-tailed test, it needs to fall below the critical value on the left side. In a two-tailed test, it needs to be extreme in either direction, so there are two critical values to check against.
For example, with a sample size of 15 (so 14 degrees of freedom) and alpha at 0.05, the critical value for a one-tailed t-test is about 1.76. If your calculated test statistic is 2.1, that’s more extreme than 1.76, so you reject the null. If it’s 1.3, you don’t. The p-value and critical value approaches always produce the same decision at a given alpha level.
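The critical-value check from this example can be reproduced with SciPy’s t-distribution quantile function (a sketch, assuming a right-tailed test with 14 degrees of freedom):

```python
# One-tailed t-test: n = 15, so 14 degrees of freedom, alpha = 0.05.
from scipy import stats

alpha, df = 0.05, 14
critical = stats.t.ppf(1 - alpha, df)  # right-tail cutoff, ~1.76

for t_stat in (2.1, 1.3):
    decision = "reject H0" if t_stat > critical else "fail to reject H0"
    print(f"t = {t_stat}: critical = {critical:.2f} -> {decision}")
```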
The Confidence Interval Method
A third way to make the same decision uses confidence intervals. If you build a 95% confidence interval around your estimate and it does not contain the null hypothesis value (often zero for a difference, or one for a ratio), you reject the null at the 0.05 significance level. If the interval does contain the null value, you fail to reject.
This method has a practical advantage: it shows the range of plausible values for the true effect, giving you information about both direction and size. A hypothesis test gives you a yes-or-no answer, while the confidence interval tells you how big the effect might be.
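Here is a sketch of the interval check for a difference in two means, using illustrative data and a pooled-degrees-of-freedom approximation for the 95% interval. If zero falls outside the interval, you reject at the 0.05 level:

```python
# Confidence-interval check for a difference in means.
# Sample data are illustrative; df uses a pooled approximation.
import numpy as np
from scipy import stats

a = np.array([5.1, 4.9, 5.4, 5.0, 5.3, 4.8, 5.2])
b = np.array([5.6, 5.8, 5.5, 6.0, 5.7, 5.9, 5.4])

diff = b.mean() - a.mean()
se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
df = len(a) + len(b) - 2                 # pooled degrees of freedom
margin = stats.t.ppf(0.975, df) * se     # half-width of the 95% interval
lo, hi = diff - margin, diff + margin

print(f"95% CI for the difference: ({lo:.2f}, {hi:.2f})")
print("reject H0" if not (lo <= 0 <= hi) else "fail to reject H0")
```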
Why It’s “Fail to Reject,” Not “Accept”
This language trips up almost everyone, but the distinction matters. When your p-value is too large to reject the null hypothesis, you haven’t proven the null is true. You’ve only shown that your data didn’t provide enough evidence against it. Statisticians use “fail to reject” specifically to preserve this distinction.
The reason is mathematical. A null hypothesis typically states that some value equals exactly zero (or some other specific number). To actually prove that, you would need an estimate with zero bias and infinite precision, which is impossible with real data. Because every estimate carries sampling variability, traditional statistical tests simply cannot demonstrate the absence of an effect. Failing to reject a null hypothesis does not support the conclusion that no meaningful association exists. It may just mean your sample was too small to detect one.
Type I and Type II Errors
Every decision you make in hypothesis testing carries risk of being wrong in one of two ways.
A Type I error (false positive) happens when you reject a null hypothesis that’s actually true. Your alpha level directly controls the probability of this error. Setting alpha at 0.05 means you’ve accepted a 5% chance of incorrectly rejecting a true null hypothesis.
A Type II error (false negative) happens when you fail to reject a null hypothesis that’s actually false. The probability of this error is called beta. Many studies set beta at 0.20, which means accepting a 20% chance of missing a real effect. The flip side of beta is statistical power (1 minus beta), so a beta of 0.20 gives you 80% power, meaning an 80% chance of correctly detecting an effect that truly exists.
These two errors pull in opposite directions. Making alpha stricter (say, 0.01 instead of 0.05) protects you from false positives but makes false negatives more likely unless you increase your sample size. Designing a good study means choosing alpha, beta, and sample size together so that both error rates stay acceptable.
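A small Monte Carlo simulation makes both error rates concrete. The sketch below repeatedly runs a two-sample t-test, first under a true null (where the rejection rate approximates alpha) and then under a real shift of 0.8 standard deviations (where the rejection rate approximates power). The sample size, shift, and trial count are illustrative assumptions:

```python
# Monte Carlo sketch of Type I error rate and power.
# Under H0 (no shift), the rejection rate should be close to alpha.
# Under a real shift, the rejection rate estimates power (1 - beta).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, trials = 0.05, 30, 2000

def rejection_rate(true_shift):
    rejections = 0
    for _ in range(trials):
        a = rng.normal(0, 1, n)
        b = rng.normal(true_shift, 1, n)
        _, p = stats.ttest_ind(a, b)
        if p <= alpha:
            rejections += 1
    return rejections / trials

type_i = rejection_rate(0.0)   # close to alpha
power = rejection_rate(0.8)    # chance of detecting a 0.8 SD shift
print(f"Type I rate ~ {type_i:.3f}, power ~ {power:.3f}")
```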
When You Fail to Reject: What to Consider
A common mistake is treating a failure to reject as proof that nothing is going on. Before drawing that conclusion, consider whether your study had enough statistical power. Many experiments use sample sizes too small to detect real effects, and the result is a non-significant p-value that reflects inadequate data rather than a true absence of effect.
If you specifically want to provide evidence that an effect is absent or negligibly small, standard hypothesis testing isn’t the right tool. Equivalence testing and other specialized methods exist for that purpose. The standard framework is designed to find evidence of effects, not to confirm their absence.
What P-Values Don’t Tell You
The American Statistical Association released a formal statement clarifying several widespread misunderstandings about p-values. The key points are worth internalizing if you’re making decisions based on hypothesis tests.
A p-value does not measure the probability that the null hypothesis is true. It measures how incompatible your data are with a specific statistical model. A p-value of 0.03 does not mean there’s a 3% chance the null hypothesis is true. It means that if the null were true, you’d see data this extreme only 3% of the time.
Statistical significance also doesn’t tell you anything about the size or practical importance of an effect. A tiny, meaningless difference can produce a highly significant p-value if the sample is large enough, and a large, important difference can fail to reach significance if the sample is too small. Whenever possible, report effect sizes and confidence intervals alongside your p-value so that readers can judge both the statistical and practical significance of your findings.
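You can see this directly by holding the observed difference fixed and varying only the sample size. The sketch below computes the two-sided p-value for the same tiny mean difference (0.05 standard deviations) at two sample sizes; the numbers are illustrative:

```python
# The same observed difference yields very different p-values
# depending only on sample size. Values are illustrative.
import math
from scipy import stats

diff, sd = 0.05, 1.0  # a negligible observed difference: 0.05 SD
results = {}
for n in (20, 20000):
    se = sd * math.sqrt(2 / n)   # standard error of a two-sample difference
    t = diff / se
    results[n] = 2 * stats.t.sf(abs(t), df=2 * n - 2)  # two-sided p-value
    print(f"n = {n}: t = {t:.2f}, p = {results[n]:.2e}")
```

At n = 20 the same difference is nowhere near significance; at n = 20,000 it is overwhelmingly significant, even though its practical importance is unchanged.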
Scientific conclusions shouldn’t rest on whether a p-value crosses a single threshold. The difference between a p-value of 0.049 and 0.051 is trivial, yet one leads to “reject” and the other to “fail to reject.” Treat the decision rule as a guide, not a bright line that separates truth from fiction.

