What Is a Two-Sided Hypothesis Test and When to Use It

A two-sided hypothesis test (also called a two-tailed test) checks whether a value is different from a specific number in either direction, higher or lower. Unlike a one-sided test, which only looks for an effect in one direction, a two-sided test asks: “Is there any difference at all?” This makes it the default choice in most scientific research, because it doesn’t assume ahead of time which direction the results will go.

How a Two-Sided Test Is Set Up

Every hypothesis test starts with two competing statements. The null hypothesis says there’s no difference or no effect. The alternative hypothesis says there is one. In a two-sided test, the alternative hypothesis uses a “not equal to” sign rather than “greater than” or “less than.” For example, if you’re testing whether a coin is fair, the setup looks like this:

  • Null hypothesis: The probability of heads equals 0.50.
  • Alternative hypothesis: The probability of heads does not equal 0.50.

That “does not equal” is what makes it two-sided. You’re open to the possibility that the coin could be biased toward heads or toward tails. You don’t need to pick a direction before you start.
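The coin setup above can be checked directly with an exact binomial calculation. This is a minimal sketch using only the Python standard library; the "double the smaller tail" convention used here is one common way to make an exact test two-sided, not the only one:

```python
from math import comb

def binomial_two_sided_p(heads, flips, p0=0.5):
    """Two-sided p-value for H0: P(heads) = p0, using the
    'double the smaller tail' convention (capped at 1.0)."""
    # exact probability of each possible head count under the null
    pmf = [comb(flips, k) * p0**k * (1 - p0) ** (flips - k)
           for k in range(flips + 1)]
    lower = sum(pmf[: heads + 1])  # P(X <= observed heads)
    upper = sum(pmf[heads:])       # P(X >= observed heads)
    return min(1.0, 2 * min(lower, upper))

# illustrative data: 60 heads in 100 flips of a supposedly fair coin
p = binomial_two_sided_p(60, 100)
```

Because the alternative is "not equal to 0.50," a surplus of heads and a surplus of tails are treated symmetrically: whichever tail the data land in, the other tail's risk is accounted for by the doubling.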

How Alpha Gets Split Between Two Tails

The significance level, usually set at 0.05 (or 5%), is the risk you’re willing to accept of rejecting the null hypothesis when it is actually true (a Type I error). In a two-sided test, that 0.05 gets divided equally between the two ends of the distribution: 0.025 (2.5%) sits in the upper tail and 0.025 sits in the lower tail.

This split has a practical consequence: it’s harder to reach statistical significance with a two-sided test than with a one-sided test at the same alpha level. Your test statistic needs to be more extreme to land in one of those smaller 2.5% rejection zones. For a large sample using the normal distribution, the critical values at the 0.05 significance level are ±1.96. Your test statistic must fall below −1.96 or above +1.96 to reject the null hypothesis. For comparison, a one-sided test at the same alpha level only requires a critical value of 1.645.
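Those critical values come straight from the normal distribution’s quantile function, so the split is easy to verify. A quick sketch using the standard library’s `statistics.NormalDist`:

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal, mean 0, sd 1

# two-sided: alpha is split, so alpha/2 sits in each tail
two_sided_crit = z.inv_cdf(1 - alpha / 2)  # about 1.96
# one-sided: all of alpha goes into a single tail
one_sided_crit = z.inv_cdf(1 - alpha)      # about 1.645

print(f"two-sided: +/-{two_sided_crit:.3f}, one-sided: {one_sided_crit:.3f}")
```

The larger two-sided cutoff is exactly the "harder to reach significance" effect described above: the same alpha buys a smaller rejection zone in each tail.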

Calculating the P-Value

The p-value tells you how likely your observed result (or something more extreme) would be if the null hypothesis were true. In a two-sided test, you need to account for extreme results in both directions. For a symmetric test statistic such as z or t, the rule is straightforward: the two-sided p-value is twice the p-value of the corresponding one-sided test.

Say you run a test and get a test statistic of 2.10. If the one-sided p-value for that result is 0.018, the two-sided p-value is 0.036. You’d compare that 0.036 to your significance level of 0.05. Since 0.036 is smaller, you reject the null hypothesis and conclude there’s a statistically significant difference.

If your test statistic had been 1.80 instead, the one-sided p-value might be 0.036, making the two-sided p-value 0.072. That’s above 0.05, so you would fail to reject the null hypothesis.
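Both worked examples above can be reproduced by doubling the one-sided tail area of a standard normal. A minimal sketch (treating the test statistics as z-statistics, which is an assumption; the text doesn’t name the specific test):

```python
from statistics import NormalDist

def two_sided_p(z_stat):
    """Two-sided p-value for a z statistic:
    double the one-sided tail beyond |z|."""
    one_sided = 1 - NormalDist().cdf(abs(z_stat))
    return 2 * one_sided

print(round(two_sided_p(2.10), 3))  # the first worked example
print(round(two_sided_p(1.80), 3))  # the second worked example
```

Taking the absolute value first is what makes the calculation direction-free: a statistic of −2.10 yields the same p-value as +2.10.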

A Real-World Example

Researchers studying whether maternal smoking affects children’s IQ scores used a two-sided test. Their null hypothesis stated that mean IQ scores were the same for children of mothers who smoked 10 or more cigarettes per day during pregnancy and children of non-smoking mothers. The alternative hypothesis stated that the means were not the same.

Why two-sided? Because before analyzing the data, the researchers acknowledged the theoretical possibility that smoking could be associated with higher or lower IQ scores. A two-sided test was the honest approach. The result turned out to be statistically significant, and the confidence interval then provided evidence of which direction the difference fell. This illustrates an important point: the p-value from a two-sided test tells you that a significant difference exists, but you look at the data itself (or the confidence interval) to determine the direction.

The Link to Confidence Intervals

Two-sided hypothesis tests and confidence intervals are two sides of the same coin. A 95% confidence interval corresponds directly to a two-sided test at the 0.05 significance level. The relationship works like this: if the value you’re testing against (say, zero for “no difference”) falls outside your 95% confidence interval, you would reject the null hypothesis in the two-sided test. If it falls inside, you would fail to reject.

This connection makes confidence intervals a useful companion to two-sided tests. The test gives you a yes-or-no answer about statistical significance, while the confidence interval shows you the plausible range of the true value and helps you judge whether the difference is large enough to matter in practice.
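That correspondence is easy to demonstrate for a z-based interval. In the sketch below, `estimate`, `std_error`, and `null_value` are illustrative parameter names, not anything from the text:

```python
from statistics import NormalDist

def ci_and_test(estimate, std_error, null_value=0.0, alpha=0.05):
    """95% CI (for alpha=0.05) and the matching two-sided z-test decision."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
    lo = estimate - z_crit * std_error
    hi = estimate + z_crit * std_error
    # null value outside the interval <=> reject H0 in the two-sided test
    reject = not (lo <= null_value <= hi)
    return (lo, hi), reject

# illustrative numbers: an estimate 2.1 standard errors above zero
(lo, hi), reject = ci_and_test(2.1, 1.0)
```

An estimate 2.1 standard errors from the null rejects, and one 1.8 standard errors away does not, matching the p-value examples (0.036 versus 0.072) earlier in this article.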

When to Use a Two-Sided Test

A two-sided test is appropriate whenever you don’t have a strong, pre-specified reason to look in only one direction. In practice, this covers most situations. If you’re comparing a new teaching method to a standard one, a two-sided test lets you detect whether the new method is better or worse. If you’re testing whether a drug changes blood pressure, a two-sided test catches both increases and decreases.

One-sided tests are reserved for cases where only one direction matters or where theory strongly predicts a specific direction. For example, you might use a one-sided test if you’re only interested in whether a new material is stronger than an existing one, and a weaker result is irrelevant to your decision. But many journals and regulatory agencies prefer or require two-sided tests, precisely because they’re more conservative and don’t bake assumptions about direction into the analysis.

The tradeoff is statistical power. Because a two-sided test splits its rejection region across both tails, it requires a larger sample size or a bigger effect to detect a significant result compared to a one-sided test. If you’re designing a study and you’re confident about the direction of the effect, a one-sided test gives you more power to detect it. But if there’s any chance the effect could go the other way, a two-sided test protects you from missing it entirely.