You reject the null hypothesis when your p-value is less than your pre-set significance level, called alpha. The most common alpha threshold is 0.05, meaning you reject the null hypothesis when the p-value falls below 0.05. That single rule drives most statistical decision-making in science, medicine, and social research, but understanding what it actually means (and what it doesn’t) matters just as much as memorizing the cutoff.
The Basic Decision Rule
Before you run any statistical test, you choose a significance level (alpha). This is the maximum risk of a false positive you’re willing to accept. Then you run your test, get a p-value, and compare the two numbers:
- P-value less than alpha: reject the null hypothesis. Your result is statistically significant.
- P-value greater than or equal to alpha: fail to reject the null hypothesis. Your result is not statistically significant.
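The decision rule above is simple enough to express as a few lines of Python (a minimal sketch; the function name and return strings are just for illustration):

```python
def decide(p_value, alpha=0.05):
    """Apply the basic decision rule: compare the p-value to alpha."""
    if p_value < alpha:
        return "reject"          # statistically significant
    return "fail to reject"      # not statistically significant

print(decide(0.03))   # below alpha -> "reject"
print(decide(0.05))   # equal to alpha -> "fail to reject"
```

Note that a p-value exactly equal to alpha falls on the "fail to reject" side, matching the strict inequality in the rule.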
Notice the phrasing: you “fail to reject” rather than “accept” the null hypothesis. That distinction is intentional. A non-significant result doesn’t prove the null hypothesis is true. It simply means your data weren’t strong enough to rule it out.
Why 0.05 Is the Standard Threshold
The 0.05 cutoff has been the default in most fields for decades. It means you’re allowing a 5% chance of concluding there’s an effect when there actually isn’t one. In pharmaceutical research, a stricter threshold of 0.01 (a 1% chance) is more common because the stakes of a false positive (approving an ineffective drug) are higher. The FDA uses the same framework: reject the null hypothesis if the p-value is below alpha, which typically means demonstrating that a drug’s effect is real and not due to chance.
Some researchers have pushed to make the default even stricter. A 2017 proposal published in Nature Human Behaviour recommended changing the standard threshold from 0.05 to 0.005 for claims of new discoveries, arguing that too many findings at the 0.05 level fail to replicate. In particle physics, the threshold has long been far more extreme, roughly 1 in 3.5 million, because claiming a new particle exists demands near-certainty.
The key point: alpha isn’t fixed by some law of nature. You choose it based on how costly a false alarm would be in your specific situation.
What the P-Value Actually Tells You
A p-value answers a narrow question: if the null hypothesis were true (if there really were no effect), how likely would you be to see data at least as extreme as what you observed? A p-value of 0.03, for example, means there’s a 3% probability of getting results at least this extreme by chance alone, assuming no real effect exists.
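You can make this definition concrete by simulating it. The sketch below (hypothetical scenario: 60 heads in 100 coin flips, null hypothesis of a fair coin) estimates a two-sided p-value by repeatedly simulating the null and counting how often the result is at least as extreme as the one observed:

```python
import random

random.seed(1)

# Hypothetical observation: 60 heads in 100 flips of a supposedly fair coin.
observed, n, trials = 60, 100, 20_000

extreme = 0
for _ in range(trials):
    # Simulate one experiment under the null hypothesis (fair coin).
    heads = sum(random.random() < 0.5 for _ in range(n))
    # Count it if the deviation from 50 is at least as large as observed.
    if abs(heads - n / 2) >= abs(observed - n / 2):
        extreme += 1

p_value = extreme / trials
print(f"simulated p-value: {p_value:.3f}")
```

The simulated value lands near 0.057 (the exact two-sided binomial answer), just above the 0.05 cutoff. Notice that the entire calculation assumes the null is true, which is why the p-value cannot also be the probability that the null is true.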
That definition is easy to twist into something it isn’t. Two of the most common misreadings are “the p-value is the probability that your results occurred by chance” and “the p-value is the probability that the null hypothesis is true.” Both are wrong. The p-value is calculated on the assumption that the null hypothesis is already true, so it can’t simultaneously tell you the probability of that assumption being correct. The American Statistical Association issued a formal statement in 2016 specifically to correct these misunderstandings, emphasizing that a p-value does not measure the probability that the studied hypothesis is true or that the data were produced by random chance alone.
The Trade-Off Between False Positives and False Negatives
Choosing your alpha level involves a trade-off between two types of errors. A Type I error (false positive) happens when you reject a null hypothesis that is actually true. You conclude there’s an effect, but there isn’t one. Your alpha level is the maximum probability you’re willing to accept for this kind of mistake. Setting alpha at 0.05 means you’re comfortable with up to a 5% chance of a false positive.
A Type II error (false negative) goes the other direction: you fail to reject a null hypothesis that is actually false. A real effect exists, but your test missed it. The probability of a Type II error is called beta, and statistical power (1 minus beta) is the likelihood your test will correctly detect a real effect.
Here’s the tension: lowering alpha to reduce false positives automatically makes false negatives more likely, unless you compensate by increasing your sample size. If you move your threshold from 0.05 to 0.005, you need stronger evidence to reject the null hypothesis, which means smaller real effects are more likely to slip through undetected. Researchers typically choose a low alpha when false positives are especially dangerous and accept more false-negative risk, or they choose a higher alpha when missing a real effect would be the bigger problem.
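The trade-off can be seen numerically. The sketch below uses the standard normal-approximation power formula for a two-sided z-test (the effect size of 0.3 and sample size of 50 are illustrative choices, not from the text):

```python
from statistics import NormalDist

def power_two_sided(effect, n, alpha):
    """Approximate power of a two-sided one-sample z-test.
    effect: true mean difference in standard-deviation units."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)   # critical value for this alpha
    shift = effect * n ** 0.5            # how far the true mean shifts the statistic
    # Probability the test statistic lands beyond either critical value.
    return nd.cdf(-z_crit - shift) + 1 - nd.cdf(z_crit - shift)

for alpha in (0.05, 0.005):
    print(f"alpha={alpha}: power={power_two_sided(0.3, 50, alpha):.2f}")
```

With these numbers, tightening alpha from 0.05 to 0.005 drops power from roughly 0.56 to roughly 0.25: the false-positive rate falls by a factor of ten, but the chance of missing this real effect more than doubles unless the sample size grows.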
Statistical Significance Is Not the Same as Importance
A p-value below 0.05 tells you the data are hard to reconcile with an effect of zero. It does not tell you the effect is large, meaningful, or worth caring about. With a big enough sample, even a trivially small difference will produce a significant p-value. A study with 50,000 participants might find that a new teaching method raises test scores by 0.2 points on a 100-point scale, with p = 0.001. Statistically significant, yes. Practically meaningful, probably not.
This is why effect size matters alongside the p-value. Effect size measures how large the difference or relationship actually is, independent of sample size. As one widely cited editorial in the Journal of Graduate Medical Education put it: “Statistical significance is the least interesting thing about the results. You should describe the results in terms of measures of magnitude, not just does a treatment affect people, but how much does it affect them.” A complete picture requires both: the p-value tells you whether an effect exists, and the effect size tells you whether anyone should care.
How Confidence Intervals Complement the P-Value
Confidence intervals offer another way to make the same reject-or-not decision while giving you more information. A 95% confidence interval provides a range of plausible values for the true population parameter. If that range does not include the null value (typically zero for a difference between groups, or 1 for a ratio like an odds ratio), the result is statistically significant at the 0.05 level. If the interval does include the null value, it’s not significant.
The advantage of confidence intervals is that they show magnitude and precision at a glance. A confidence interval of 2.1 to 8.7 for a difference in blood pressure tells you the effect is significant (the interval doesn’t include zero), the effect could be as small as about 2 points or as large as nearly 9, and the estimate has a fair amount of uncertainty. A p-value alone would only tell you the first part. The narrower the confidence interval, the smaller the p-value tends to be, because both reflect how precisely your study estimated the true effect.
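The check is mechanical: compute the interval, then see whether it contains the null value. A minimal sketch, using a normal-approximation 95% CI on hypothetical blood-pressure differences (the data values are invented for illustration):

```python
from statistics import NormalDist, mean, stdev

def ci_95(data):
    """Normal-approximation 95% CI for the mean.
    (A t-based interval would be more appropriate for small samples.)"""
    m = mean(data)
    se = stdev(data) / len(data) ** 0.5        # standard error of the mean
    z = NormalDist().inv_cdf(0.975)            # ~1.96
    return m - z * se, m + z * se

# Hypothetical per-patient differences in blood pressure (mmHg).
diffs = [5.4, 2.1, 6.3, 4.8, 3.9, 7.2, 5.1, 4.4, 6.0, 3.5]
lo, hi = ci_95(diffs)
significant = not (lo <= 0 <= hi)   # significant iff the CI excludes zero
print(f"95% CI: ({lo:.1f}, {hi:.1f}), significant: {significant}")
```

Here the interval excludes zero, so the result is significant at the 0.05 level, and the interval's width shows the precision of the estimate in the same glance.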
Putting It Into Practice
When you’re reading a study or running your own analysis, the sequence looks like this: set your alpha level before collecting data, run the appropriate statistical test, compare the resulting p-value to alpha, and reject the null hypothesis only if the p-value comes in below your threshold. But don’t stop there. Check the effect size to see if the result is large enough to matter. Look at the confidence interval to understand the range of plausible values. And consider whether the study had enough participants to detect a meaningful effect in the first place.
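The full sequence can be sketched as one function that reports all three quantities together: the p-value, the effect size, and the confidence interval. This is an illustrative normal-approximation version (the group data are invented; for small real samples you would use a t-test):

```python
from statistics import NormalDist, mean, stdev

def analyze(group_a, group_b, alpha=0.05):
    """Compare two groups: p-value, Cohen's d, and CI for the mean difference.
    Normal approximation throughout; a sketch, not a full t-test."""
    nd = NormalDist()
    diff = mean(group_a) - mean(group_b)
    se = (stdev(group_a) ** 2 / len(group_a)
          + stdev(group_b) ** 2 / len(group_b)) ** 0.5
    p = 2 * (1 - nd.cdf(abs(diff / se)))               # two-sided p-value
    pooled_sd = ((stdev(group_a) ** 2 + stdev(group_b) ** 2) / 2) ** 0.5
    d = diff / pooled_sd                               # effect size (Cohen's d)
    margin = nd.inv_cdf(1 - alpha / 2) * se
    ci = (diff - margin, diff + margin)                # 95% CI for the difference
    return {"p": p, "reject": p < alpha, "d": d, "ci": ci}

# Hypothetical outcome scores for two groups.
treatment = [14.2, 15.1, 13.8, 16.0, 14.9, 15.5, 14.4, 15.8, 13.9, 15.2]
control   = [12.1, 13.0, 11.8, 12.9, 12.4, 13.3, 11.9, 12.7, 12.2, 13.1]
result = analyze(treatment, control)
print(result)
```

Reading the output in the recommended order: the p-value answers "is there an effect?", Cohen's d answers "how big is it?", and the interval answers "how precisely do we know?"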
The American Statistical Association’s 2016 statement captured this well in six principles, two of which are worth highlighting. First, scientific conclusions should not be based only on whether a p-value crosses a specific threshold. Second, a p-value by itself does not provide a good measure of evidence regarding a hypothesis. The p-value is one piece of a larger puzzle. It tells you whether your data are incompatible with the assumption of no effect, but the size, direction, and real-world relevance of that effect require tools beyond the p-value alone.