When to Use a Pooled T-Test vs. Welch’s

A pooled t-test is the right choice when you’re comparing the means of two independent groups and their population variances are roughly equal. That equal-variance condition is the key deciding factor. When it holds, pooling gives you more statistical power than the alternative (Welch’s t-test). When it doesn’t hold, pooling can produce misleading results, especially if your group sizes also differ.

What the Pooled T-Test Actually Does

The pooled t-test (also called Student’s t-test for independent samples) combines the variability from both groups into a single “pooled” estimate of variance. It does this by taking a weighted average of the two sample variances, where each group’s contribution is weighted by its degrees of freedom (its sample size minus one). The total degrees of freedom equal the combined sample sizes minus two.
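As a minimal sketch of the pooling arithmetic, the function below computes the pooled variance, the t statistic, and the degrees of freedom using only Python’s standard library. The helper name `pooled_t` and the toy data are hypothetical illustrations. The sketch stops short of a p-value, which would need a t-distribution CDF.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Student's (pooled) two-sample t statistic.

    Returns (t, df, pooled_variance). Hypothetical helper for illustration.
    """
    n1, n2 = len(x), len(y)
    # Weighted average of the two sample variances, each weighted by n - 1
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    # Standard error of the difference in means under the equal-variance model
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (mean(x) - mean(y)) / se
    # Total degrees of freedom: combined sample size minus two
    df = n1 + n2 - 2
    return t, df, sp2


x = [1, 2, 3, 4, 5]     # hypothetical data, sample variance 2.5
y = [3, 5, 7, 9, 11]    # hypothetical data, sample variance 10
t, df, sp2 = pooled_t(x, y)
print(f"pooled variance = {sp2}, t = {t:.4f}, df = {df}")
```

With equal group sizes, as here, the pooled variance is simply the average of the two sample variances: (2.5 + 10) / 2 = 6.25.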

This pooling step is the core difference between a pooled t-test and Welch’s t-test. Welch’s version keeps the two variance estimates separate and uses an approximation (the Welch–Satterthwaite formula) that lowers the degrees of freedom to compensate. That adjustment costs you a small amount of statistical power, which is why pooling is preferable when the assumption actually holds.
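For comparison, here is a standard-library sketch of Welch’s statistic and the Welch–Satterthwaite degrees of freedom. The helper name `welch_t` and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's two-sample t statistic with Welch-Satterthwaite degrees of freedom.

    Returns (t, df). Hypothetical helper for illustration.
    """
    n1, n2 = len(x), len(y)
    # Variances stay separate: one squared standard error per group
    v1, v2 = variance(x) / n1, variance(y) / n2
    t = (mean(x) - mean(y)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation: usually non-integer, never above n1 + n2 - 2
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


x = [1, 2, 3, 4, 5]     # hypothetical data, sample variance 2.5
y = [3, 5, 7, 9, 11]    # hypothetical data, sample variance 10
t, df = welch_t(x, y)
print(f"t = {t:.4f}, df = {df:.3f}")
```

Because the two groups are the same size here, Welch’s t comes out identical to the pooled t. Only the degrees of freedom differ: about 5.88 instead of the pooled test’s n1 + n2 - 2 = 8.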

The Four Assumptions You Need to Meet

Before using a pooled t-test, four conditions should be reasonably satisfied:

Independence of observations. Subjects in one group can’t also appear in the other group, and no subject in either group can influence subjects in the other. Violating this produces inaccurate p-values regardless of which t-test you choose.
Random sampling. Your data should represent the population you’re drawing conclusions about.
Approximate normality. The variable you’re measuring should follow a roughly normal distribution within each group. Non-normal distributions, particularly those that are heavily skewed or have thick tails, reduce the test’s power. With moderate or large samples, though, mild violations of normality still tend to produce accurate p-values.
Equal variances (homoscedasticity). The spread of data in both groups should be approximately the same. This is the assumption unique to the pooled version, and the one that matters most for deciding between pooled and Welch’s.

Why Equal Variances Matter So Much

The pooled t-test tolerates mild departures from normality reasonably well, but unequal variances are a real problem. When the true population variances differ, the pooled formula can produce incorrect Type I error rates, meaning you’ll reject a true null hypothesis more (or less) often than your stated significance level suggests. The result is a p-value you can’t trust.

This problem gets worse when your two groups have different sample sizes. With unequal sample sizes, the pooled variance formula gives more weight to the variance from the larger group. If that larger group also happens to have the smaller variance, the test becomes liberal (too many false positives). If the larger group has the bigger variance, the test becomes conservative (fewer false positives than your significance level implies, and less power to detect real effects). When both groups have the same sample size, unequal variances cause less damage because the weighting is balanced. In fact, with equal group sizes the pooled and Welch t statistics are identical, and only their degrees of freedom differ.

So the practical rule: if your sample sizes are equal, the pooled t-test is fairly robust to moderate variance differences. If your sample sizes are unequal, even modest variance differences can distort your results.
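A quick simulation makes the liberal and conservative cases concrete. The sketch below assumes normally distributed data with equal true means, so every rejection is a false positive. The group sizes, standard deviations, 5,000 replications, and the helper name `type1_rates` are illustrative assumptions. It uses NumPy and SciPy’s `scipy.stats.ttest_ind`, whose `equal_var` flag switches between the pooled and Welch versions.

```python
import numpy as np
from scipy import stats


def type1_rates(n1, sd1, n2, sd2, reps=5000, alpha=0.05, seed=0):
    """Simulated Type I error rates (pooled, Welch) for normal data with equal means."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated study; both groups have true mean 0
    a = rng.normal(0.0, sd1, size=(reps, n1))
    b = rng.normal(0.0, sd2, size=(reps, n2))
    # axis=1 runs one t-test per row, i.e. per simulated study
    p_pooled = stats.ttest_ind(a, b, axis=1, equal_var=True).pvalue
    p_welch = stats.ttest_ind(a, b, axis=1, equal_var=False).pvalue
    # The null is true, so the rejection rate estimates the Type I error rate
    return float(np.mean(p_pooled < alpha)), float(np.mean(p_welch < alpha))


scenarios = {
    "small group, large variance": (10, 4.0, 40, 1.0),
    "large group, large variance": (10, 1.0, 40, 4.0),
    "equal sizes, unequal variances": (25, 4.0, 25, 1.0),
}
for label, (n1, sd1, n2, sd2) in scenarios.items():
    pooled, welch = type1_rates(n1, sd1, n2, sd2)
    print(f"{label:32s} pooled = {pooled:.3f}  Welch = {welch:.3f}")
```

You would expect the pooled rate to land well above 0.05 in the first scenario and well below it in the second. In the equal-size scenario it should stay much closer to 0.05. Welch’s rate should hover near 0.05 in all three. Exact figures depend on the seed.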

How to Check for Equal Variances

Several formal tests exist for checking whether two groups share the same variance. Levene’s test is the most commonly used because it doesn’t require that your data be perfectly normal. It works by measuring how far each observation falls from its group mean, then testing (with a one-way ANOVA) whether those absolute deviations differ on average between the groups. For data that are approximately normal, Bartlett’s test is more powerful at detecting variance differences, but it’s sensitive to non-normality, making it less reliable as a general-purpose tool.
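In Python, SciPy provides both tests as `scipy.stats.levene` and `scipy.stats.bartlett`. Below is a minimal sketch on simulated data, where the sample sizes, means, and standard deviations are arbitrary assumptions. Note that SciPy’s `levene` centers on the median by default (the Brown–Forsythe variant). Pass `center="mean"` for Levene’s original version.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical data: same mean, standard deviations 1 and 5
group_a = rng.normal(loc=10.0, scale=1.0, size=50)
group_b = rng.normal(loc=10.0, scale=5.0, size=50)

# Levene's original test: ANOVA on absolute deviations from each group's mean
lev = stats.levene(group_a, group_b, center="mean")
# Brown-Forsythe variant (SciPy's default): deviations from each group's median
bf = stats.levene(group_a, group_b, center="median")
# Bartlett's test: more powerful under normality, but sensitive to non-normality
bart = stats.bartlett(group_a, group_b)

for name, res in [("Levene (mean)", lev), ("Brown-Forsythe", bf), ("Bartlett", bart)]:
    print(f"{name:15s} statistic = {res.statistic:8.3f}  p = {res.pvalue:.3g}")
```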

A simpler approach: just look at the ratio of the two sample variances. A common rule of thumb is that if the larger variance is no more than about twice the smaller one, pooling is reasonable.
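The variance-ratio rule of thumb is easy to script. Here is a minimal standard-library sketch. The helper names are hypothetical, and `max_ratio` defaults to 2.0 to mirror the “about twice” guideline rather than any fixed standard.

```python
from statistics import variance


def variance_ratio(x, y):
    """Ratio of the larger sample variance to the smaller one (always >= 1)."""
    v1, v2 = variance(x), variance(y)
    return max(v1, v2) / min(v1, v2)


def pooling_looks_reasonable(x, y, max_ratio=2.0):
    """Rule-of-thumb check: pool only if the variance ratio is at most max_ratio."""
    return variance_ratio(x, y) <= max_ratio


x = [1, 2, 3, 4, 5]     # hypothetical data, variance 2.5
y = [3, 5, 7, 9, 11]    # hypothetical data, variance 10 (ratio 4 against x)
z = [2, 3, 5, 6, 7]     # hypothetical data, variance 4.3 (ratio 1.72 against x)
print(variance_ratio(x, y), pooling_looks_reasonable(x, y))
print(variance_ratio(x, z), pooling_looks_reasonable(x, z))
```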

There’s a catch with the formal testing approach, though. Using a preliminary test like Levene’s to decide which t-test to run (pooled if the variance test is non-significant, Welch’s if it’s significant) can itself inflate your overall Type I error rate. You’re making a decision conditional on one test’s outcome, which changes the statistical properties of the second test. A 2017 paper in the International Review of Social Psychology demonstrated through simulations that this two-step procedure performs worse under realistic conditions than simply using Welch’s t-test every time.
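The mechanism can be illustrated with a toy simulation. It compares a “Levene first, then choose” rule against always running Welch’s test, with equal true means so that every rejection is a false positive. This is not the design of the 2017 paper. The group sizes, standard deviations, the use of alpha = 0.05 for the preliminary Levene test, and the helper name `two_step_vs_welch` are all assumptions.

```python
import numpy as np
from scipy import stats


def two_step_vs_welch(n1, sd1, n2, sd2, reps=2000, alpha=0.05, seed=1):
    """Simulated Type I error rates: (Levene-then-choose procedure, always-Welch)."""
    rng = np.random.default_rng(seed)
    two_step_rejects = 0
    welch_rejects = 0
    for _ in range(reps):
        # Both groups have true mean 0, so any rejection is a false positive
        a = rng.normal(0.0, sd1, size=n1)
        b = rng.normal(0.0, sd2, size=n2)
        p_welch = stats.ttest_ind(a, b, equal_var=False).pvalue
        # Assumed two-step rule: pool only if Levene's test is non-significant
        if stats.levene(a, b, center="mean").pvalue < alpha:
            p_two_step = p_welch
        else:
            p_two_step = stats.ttest_ind(a, b, equal_var=True).pvalue
        two_step_rejects += int(p_two_step < alpha)
        welch_rejects += int(p_welch < alpha)
    return two_step_rejects / reps, welch_rejects / reps


two_step, welch = two_step_vs_welch(n1=10, sd1=2.0, n2=40, sd2=1.0)
print(f"two-step = {two_step:.3f}, always Welch = {welch:.3f}")
```

In this setup the smaller group has the larger variance, so whenever Levene’s test misses the difference, the pooled test it falls back on tends to be liberal. That pushes the two-step error rate above the always-Welch rate.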

When Pooling Gives You an Edge

If the equal-variance assumption genuinely holds, the pooled t-test is the more powerful option. “More powerful” means it’s better at detecting a real difference between groups when one exists. The power advantage comes from the extra degrees of freedom: pooling uses all available data to estimate a single variance, giving you a slightly more precise estimate than keeping the variances separate.

In practice, this advantage is modest. For large samples, the difference in power between pooled and Welch’s tests is negligible because both tests have plenty of degrees of freedom. The power gain from pooling is most noticeable with small samples, which is also when verifying the equal-variance assumption is hardest (because small samples give unstable variance estimates).
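A simulation sketch can show how small the power gap is when variances really are equal. It assumes normal data with standard deviation 1 in both groups and equal group sizes. The effect sizes (1.5 for n = 5, 0.5 for n = 50), the replication count, and the helper name are illustrative assumptions, chosen so that power is moderate in both cases.

```python
import numpy as np
from scipy import stats


def power_pooled_vs_welch(n, effect, reps=5000, alpha=0.05, seed=2):
    """Simulated power (pooled, Welch) with n per group, equal variances (sd = 1)."""
    rng = np.random.default_rng(seed)
    # A real difference of `effect` exists between the group means
    a = rng.normal(0.0, 1.0, size=(reps, n))
    b = rng.normal(effect, 1.0, size=(reps, n))
    p_pooled = stats.ttest_ind(a, b, axis=1, equal_var=True).pvalue
    p_welch = stats.ttest_ind(a, b, axis=1, equal_var=False).pvalue
    return float(np.mean(p_pooled < alpha)), float(np.mean(p_welch < alpha))


for n, effect in [(5, 1.5), (50, 0.5)]:
    pooled, welch = power_pooled_vs_welch(n, effect)
    print(f"n = {n:2d} per group: pooled power = {pooled:.3f}, Welch power = {welch:.3f}")
```

With equal group sizes the two t statistics are identical, so any power difference comes purely from the degrees of freedom. That is why the gap should be visible at n = 5 and nearly vanish at n = 50.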

The scenarios where a pooled t-test clearly makes sense:

You have strong prior reason to believe the variances are equal (for example, the two groups were sampled from the same population or process, or previous studies with large samples consistently show similar variances).
Your sample sizes are equal or very close to equal, which provides a natural buffer against mild variance differences.
Your samples are small enough that the extra degrees of freedom from pooling meaningfully improve power, and you have good evidence supporting the equal-variance assumption.

Why Many Researchers Default to Welch’s

There’s a growing consensus, particularly in psychology and biostatistics, that Welch’s t-test should be the default choice. The logic is straightforward: when variances are equal, Welch’s t-test performs almost identically to the pooled version. When variances are unequal, Welch’s maintains accurate error rates while the pooled test does not. You give up very little by defaulting to Welch’s, but you risk real problems by defaulting to pooled.

This reasoning has already shaped how statistical software handles the choice. R’s built-in t.test() function sets var.equal = FALSE by default, meaning it runs Welch’s test unless you explicitly request pooling. To use the pooled version in R, you need to set var.equal = TRUE. SPSS takes a different approach: it outputs both versions side by side and lets you decide which to report based on a Levene’s test result included in the output.
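Python’s SciPy makes the opposite default choice from R: `scipy.stats.ttest_ind` pools variances unless you pass `equal_var=False`. Here is a minimal sketch running both versions on the same hypothetical toy data.

```python
from scipy import stats

x = [1, 2, 3, 4, 5]     # hypothetical data
y = [3, 5, 7, 9, 11]    # hypothetical data

# SciPy's default is equal_var=True (pooled), unlike R's var.equal = FALSE
pooled = stats.ttest_ind(x, y)                  # Student's pooled t-test
welch = stats.ttest_ind(x, y, equal_var=False)  # Welch's t-test

print(f"pooled: t = {pooled.statistic:.4f}, p = {pooled.pvalue:.4f}")
print(f"Welch:  t = {welch.statistic:.4f}, p = {welch.pvalue:.4f}")
```

Because the groups are the same size, both calls report the same t statistic. Welch’s p-value is somewhat larger because it uses fewer degrees of freedom. If you work in both R and Python, it is worth setting the flag explicitly rather than relying on either default.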

Simulation studies show that when variances differ, particularly in combination with unequal sample sizes, the pooled test can produce inaccurate Type I error rates and invalid inferences. The equal-variance assumption is often violated in real-world research, and preliminary checks don’t always detect those violations reliably. For these reasons, many statisticians now treat Welch’s as the safer all-purpose option and reserve the pooled test for situations where equal variances are well-established.

A Practical Decision Framework

If you’re deciding between the two tests for a specific analysis, the choice comes down to how confident you are in the equal-variance assumption and how much the power difference matters to you.

Use the pooled t-test when you have equal (or nearly equal) sample sizes, your data are approximately normal, and you have good reason to believe the variances are similar, either from a formal test, a variance ratio close to 1, or prior knowledge about the populations. In this scenario, you get slightly more power and a cleaner analysis.

Use Welch’s t-test when your sample sizes are unequal, when you’re unsure about the variance assumption, or when you simply want a safer default that protects against incorrect p-values. The power cost is small, and the protection against inflated error rates is real. If you’re writing up results for publication, using Welch’s by default is increasingly accepted and often preferred by reviewers who are aware of the risks of unnecessary pooling.
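If you want to make this decision explicit in an analysis script, it can be encoded as a small helper. This is a hypothetical sketch, not an established rule. The function name is made up, the `variances_expected_equal` flag stands in for your own judgment (prior knowledge, a variance ratio close to 1, or a formal check), and the 1.2 cutoff for “nearly equal” sample sizes is an assumption.

```python
def choose_t_test(n1, n2, variances_expected_equal, max_size_ratio=1.2):
    """Return "pooled" or "welch" following the decision framework in this section.

    variances_expected_equal: True only when you have good evidence that the
        population variances are similar.
    max_size_ratio: assumed cutoff for treating group sizes as nearly equal.
    """
    sizes_nearly_equal = max(n1, n2) / min(n1, n2) <= max_size_ratio
    if sizes_nearly_equal and variances_expected_equal:
        return "pooled"
    # Unequal sizes or any doubt about the variances: use the safer default
    return "welch"


print(choose_t_test(20, 22, variances_expected_equal=True))
print(choose_t_test(20, 60, variances_expected_equal=True))
print(choose_t_test(20, 20, variances_expected_equal=False))
```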