You reject the null hypothesis when your test statistic is more extreme than the critical value. In practical terms, this means your calculated value falls in the “rejection region,” the outer tail(s) of the distribution where results are unlikely enough to count as evidence against the null hypothesis. The critical value is the boundary line: cross it, and you reject; fall short, and you don’t.
The Core Decision Rule
Every hypothesis test produces a test statistic, a single number that summarizes how far your sample result sits from what the null hypothesis predicts. The critical value marks the threshold on that same scale. If your test statistic lands farther from center than the critical value, you reject the null hypothesis. If it lands closer to center, you fail to reject.
What “farther from center” means depends on the direction of your test:
- Right-tailed test (alternative hypothesis says the true value is greater): reject if your test statistic is greater than the positive critical value.
- Left-tailed test (alternative hypothesis says the true value is less): reject if your test statistic is less than the negative critical value.
- Two-tailed test (alternative hypothesis says the true value is simply different): reject if your test statistic is less than the negative critical value or greater than the positive critical value.
For example, with 15 degrees of freedom and a significance level of 0.05, a right-tailed t-test has a critical value of about 1.75. If your calculated t-statistic comes out to 2.1, that’s past the boundary, so you reject. If it comes out to 1.3, you don’t. For a two-tailed test with the same degrees of freedom, the critical values shift outward to roughly ±2.13 because the 5% rejection area is split between both tails.
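As a quick numerical check, assuming SciPy is available, `scipy.stats.t.ppf` (the inverse CDF of the t-distribution) reproduces these thresholds, and the decision rule is a single comparison:

```python
# Reproduce the critical values from the example above (df = 15, alpha = 0.05).
# Assumes SciPy is installed; t.ppf(q, df) is the inverse CDF of the t-distribution.
from scipy.stats import t

df = 15
alpha = 0.05

# Right-tailed test: all of alpha sits in the upper tail.
t_one = t.ppf(1 - alpha, df)        # about 1.753

# Two-tailed test: alpha is split, with alpha/2 in each tail.
t_two = t.ppf(1 - alpha / 2, df)    # about 2.131

def reject_right_tailed(stat, critical):
    """Reject the null when the statistic lands past the boundary."""
    return stat > critical

print(round(t_one, 3), round(t_two, 3))
print(reject_right_tailed(2.1, t_one))   # 2.1 is past the boundary: reject
print(reject_right_tailed(1.3, t_one))   # 1.3 falls short: fail to reject
```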
What the Rejection Region Actually Represents
The rejection region is the portion of the sampling distribution where outcomes are so unlikely under the null hypothesis that you treat them as evidence the null is wrong. If your significance level (alpha) is 0.05, the rejection region covers 5% of the distribution’s total area. The critical value is simply the cutoff point where that 5% begins.
In a one-tailed test, all 5% sits in a single tail. In a two-tailed test, it’s split: 2.5% in the upper tail and 2.5% in the lower tail. This is why two-tailed critical values are always larger in absolute terms than one-tailed critical values at the same alpha level. You need a more extreme result to reject when you’re splitting your rejection area across both ends of the distribution.
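This relationship can be verified with Python's standard library alone: the area beyond the critical value(s) equals alpha, whether all of it sits in one tail or is split across two.

```python
# The rejection region's area equals alpha. Using the standard normal (Z)
# distribution via the standard library's statistics.NormalDist:
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# One-tailed at alpha = 0.05: 5% of the area lies beyond 1.645.
upper_tail = 1 - z.cdf(1.645)

# Two-tailed at alpha = 0.05: 2.5% beyond +1.960 plus 2.5% below -1.960.
both_tails = 2 * (1 - z.cdf(1.960))

print(round(upper_tail, 4), round(both_tails, 4))  # both close to 0.05
```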
Common Critical Values to Know
For large samples where you’re using the standard normal (Z) distribution, these critical values come up repeatedly:
- Alpha = 0.10, one-tailed: 1.282
- Alpha = 0.05, one-tailed: 1.645
- Alpha = 0.05, two-tailed: 1.960
- Alpha = 0.01, one-tailed: 2.326
- Alpha = 0.01, two-tailed: 2.576
These Z critical values apply when you know the population standard deviation or when your sample is large enough that the t-distribution effectively becomes the normal distribution. Even at 100 degrees of freedom, the two-tailed t critical value at alpha = 0.05 is 1.984, still a few hundredths above the normal value of 1.960; the two coincide only as the degrees of freedom grow toward infinity. With smaller samples, you need the t-distribution, which produces larger critical values to account for extra uncertainty.
How Degrees of Freedom Change the Threshold
When you use a t-test, the critical value depends on your degrees of freedom, calculated as your sample size minus one (df = n − 1). Fewer degrees of freedom means a wider, flatter distribution and a larger critical value. This makes it harder to reject the null hypothesis with small samples, which is appropriate since small samples give less reliable estimates.
Consider a two-tailed test at alpha = 0.05. With just 5 degrees of freedom (a sample of 6), your critical value is 2.571. With 30 degrees of freedom, it drops to 2.042. With 100, it falls to 1.984, close to the Z value of 1.960. The practical takeaway: the same test statistic might lead to rejection with a large sample but not with a small one, because the critical value bar is set higher when you have less data.
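A short sketch, again assuming SciPy is installed, shows the two-tailed threshold dropping toward the Z value as degrees of freedom grow:

```python
# Two-tailed t critical values at alpha = 0.05 shrink as df grows.
# Assumes SciPy is installed.
from scipy.stats import t

Z_TWO_TAILED = 1.960  # standard normal value for comparison
for df in (5, 30, 100):
    cv = t.ppf(1 - 0.05 / 2, df)  # upper critical value; lower is its negative
    print(f"df = {df:>3}: critical value = {cv:.3f}  (Z: {Z_TWO_TAILED})")
```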
When to Use Z vs. T Critical Values
The Z-test requires that you already know the population’s standard deviation, that the data is continuous and roughly normally distributed, and that the sample was drawn randomly. In practice, knowing the population standard deviation is rare, so most real-world tests with a single mean use the t-distribution instead.
The t-distribution handles the additional uncertainty of estimating the standard deviation from your sample. As samples grow larger, the t-distribution converges to the normal distribution, which is why Z critical values are sometimes described as t critical values at infinite degrees of freedom.
Critical Value Method vs. P-Value Method
The critical value approach and the p-value approach always produce the same conclusion. They’re two ways of applying the same logic. With the critical value method, you compare your test statistic to a threshold. With the p-value method, you compare a probability to your significance level: if p is less than alpha, you reject.
These are mathematically equivalent. If your test statistic falls beyond the critical value, then by definition the p-value is smaller than alpha. If the test statistic falls short of the critical value, the p-value is larger than alpha. The critical value method is often easier to work with by hand because you look up one number in a table and make a direct comparison. The p-value method is what most software reports, and it gives you additional information about how strong the evidence is, not just whether it crosses the threshold.
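A small standard-library sketch illustrates the equivalence for a two-tailed Z-test: whichever statistic you feed in, the two rules reach the same verdict.

```python
# The two decision rules agree: |stat| > critical  if and only if  p < alpha.
from statistics import NormalDist

z = NormalDist()  # standard normal
alpha = 0.05
critical = z.inv_cdf(1 - alpha / 2)  # two-tailed Z critical value, about 1.960

for stat in (2.3, 1.5):
    p_value = 2 * (1 - z.cdf(abs(stat)))      # two-tailed p-value
    reject_by_critical = abs(stat) > critical
    reject_by_p = p_value < alpha
    # Both methods always produce the same conclusion.
    assert reject_by_critical == reject_by_p
    print(f"stat = {stat}: p = {p_value:.4f}, reject = {reject_by_critical}")
```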
Why 0.05 Is Not the Only Option
Most textbooks and research papers default to an alpha of 0.05, making 1.96 (two-tailed Z) one of the most recognized numbers in statistics. But this convention is just that: a convention. The American Statistical Association has noted that the habitual use of 0.05 persists partly through inertia.
The 0.05 threshold, when paired with the conventional target of 80% power (a false-negative rate of 0.20), implicitly treats a false positive (rejecting a true null hypothesis) as roughly four times worse than a false negative (failing to detect a real effect). That ratio makes sense in some contexts and not others. In medical testing where a false positive leads to unnecessary surgery, you might want alpha at 0.01 or lower. In an exploratory study where missing a real finding is costly, 0.10 could be reasonable. Choosing your alpha level thoughtfully, rather than defaulting to 0.05, produces better decisions. The critical value changes accordingly: a stricter alpha pushes the critical value farther into the tails, requiring stronger evidence to reject.
With very large samples, this matters even more. A large dataset can produce a statistically significant result at the 0.05 level for effects so small they have no practical importance. In those situations, lowering alpha prevents you from rejecting the null hypothesis over trivial differences.

