The homogeneity of variance assumption is the requirement that the groups you’re comparing in a statistical test all have roughly the same spread of scores around their means. When you run a t-test or an ANOVA, the math behind those tests pools variance across groups into a single estimate. If one group’s data is tightly clustered while another’s is widely scattered, that pooled estimate becomes misleading, and your p-values can’t be trusted.
This assumption also goes by the name “homoscedasticity,” and its violation is called “heteroscedasticity.” Understanding when it matters, how to check for it, and what to do when it’s violated will save you from drawing wrong conclusions from your data.
Why This Assumption Exists
T-tests and ANOVAs work by comparing the differences between group means to the variability within those groups. To do this efficiently, they combine the variance from all groups into one pooled number. That pooling only makes sense if the groups actually have similar variances. If Group A has a variance of 5 and Group B has a variance of 500, averaging those together produces a number that accurately represents neither group.
The formal requirement is straightforward: the population variances of all groups being compared are equal. In notation, that’s σ₁² = σ₂² = σ₃² and so on. In practice, perfect equality never happens with real data, so the question becomes how much inequality your analysis can tolerate before results go wrong.
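To make the pooling problem concrete, here's a minimal sketch with made-up numbers (Python standard library only); with equal group sizes, pooling amounts to averaging the sample variances, and the average represents neither group:

```python
# Why pooling misleads: averaging the variances of two very different
# groups (equal sizes, made-up data, standard library only).
from statistics import variance

group_a = [10, 11, 12, 13, 14]   # sample variance = 2.5
group_b = [0, 15, 30, 45, 60]    # sample variance = 562.5

# With equal group sizes, the pooled variance is just the average.
pooled = (variance(group_a) + variance(group_b)) / 2
print(pooled)  # 282.5 -- far from both 2.5 and 562.5
```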
Which Tests Require It
The assumption applies to the most commonly used tests in statistics:
- Independent samples t-test: compares the means of two groups
- One-way ANOVA: compares means across three or more groups
- Factorial ANOVA: compares means across multiple grouping variables
- Linear regression: assumes the scatter of data points around the regression line stays constant across all predicted values
In regression, the assumption takes a slightly different form. Instead of comparing group variances, you’re checking that the residuals (the gaps between predicted and actual values) have consistent spread across the full range of your predictor variable. The underlying principle is the same: the error term should behave consistently, not grow or shrink depending on where you look in the data.
What Happens When It’s Violated
Violating this assumption distorts your results in two main ways. First, it inflates the Type I error rate, meaning you’re more likely to declare a statistically significant difference when none actually exists. Your test tells you p = .03, so you reject the null hypothesis, but the true probability of that result occurring by chance is actually much higher. Second, it can reduce statistical power, making it harder to detect real differences that do exist.
The severity depends on your sample sizes. When groups have equal sample sizes, ANOVA is fairly robust to moderate violations. The problems get serious when you have both unequal variances and unequal group sizes. If the larger variance happens to be in the smaller group, Type I error rates climb especially fast. If the larger variance is in the larger group, the test becomes overly conservative, missing real effects.
A Quick Rule of Thumb
You don’t need a formal test to get a first impression. Compare the standard deviations of your groups. If the ratio of the largest standard deviation to the smallest is below 2, the assumption is likely reasonable. A group with a standard deviation of 8 compared to a group with a standard deviation of 12 gives a ratio of 1.5, which is within bounds. A ratio of 4 or higher is a clear red flag.
This rule works as a screening tool, but it’s not a substitute for a proper test when the stakes matter.
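As a sketch, that screen is a one-liner in Python (the helper name and data here are invented; standard library only):

```python
# Quick screen: ratio of largest to smallest group standard deviation.
# sd_ratio is a hypothetical helper; the data are made up.
from statistics import stdev

def sd_ratio(*groups):
    """Return max(SD) / min(SD) across the supplied groups."""
    sds = [stdev(g) for g in groups]
    return max(sds) / min(sds)

group_a = [10, 12, 14, 16, 18]   # SD ~ 3.16
group_b = [5, 11, 17, 23, 29]    # SD ~ 9.49

print(round(sd_ratio(group_a, group_b), 2))  # 3.0 -- above 2, so run a formal test
```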
How to Test for It
Two formal tests are used most often, and they have different strengths.
Bartlett’s test compares group variances through a statistic that contrasts the logarithm of the pooled variance with the average of the logarithms of each group’s individual variance. It’s a powerful test when your data are normally distributed. The catch is that it’s sensitive to non-normality. With skewed data, Bartlett’s test produces false positive rates near .10 across most sample sizes, roughly double the standard .05 threshold. That means it may tell you your variances are unequal when the real problem is just skewness in your data.
Levene’s test is the more practical choice for most situations. It works by computing the absolute deviations of each observation from its group mean (or median), then running an ANOVA on those deviations. It handles non-normal data much better than Bartlett’s test, with error rates closer to .06 at high sample sizes even with skewed distributions. For data that aren’t normally distributed, Levene’s test is the better option. For normally distributed data with more than two groups, Bartlett’s test has a slight edge in detection power.
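In Python, both tests are one call each in SciPy (assuming it’s installed), as scipy.stats.bartlett and scipy.stats.levene. The data below are invented, with one group visibly more spread out:

```python
# Bartlett's and Levene's tests via SciPy on made-up data.
# center="median" selects the robust (Brown-Forsythe) flavor of Levene's test.
from scipy import stats

tight = [23.1, 24.5, 22.8, 25.0, 23.7, 24.2]
wide = [18.0, 31.5, 22.4, 27.9, 15.6, 30.1]

b_stat, b_p = stats.bartlett(tight, wide)
l_stat, l_p = stats.levene(tight, wide, center="median")
print(f"Bartlett: p = {b_p:.4f}")
print(f"Levene:   p = {l_p:.4f}")  # a small p in either test suggests unequal variances
```

With skewed data, trust Levene’s answer over Bartlett’s, for the reasons above.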
Visual Diagnosis
Plotting your data often reveals variance problems faster than any test. For group comparisons, side-by-side boxplots show you immediately whether one group’s data is more spread out than another’s. If one box is three times the height of another, you have a variance problem.
For regression, the standard diagnostic is a residuals vs. fitted values plot. You plot your model’s predicted values on the x-axis and the residuals on the y-axis. If the assumption holds, the points form a roughly horizontal band of even width around zero. If you see a funnel or fan shape, where the scatter gets wider as fitted values increase, that’s heteroscedasticity. This visual check is often more informative than a formal test because it shows you the pattern and severity of the problem, not just a yes-or-no answer.
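If you want a number to go with the picture, one rough trick (a heuristic, not a formal test) is to rank-correlate the absolute residuals with the fitted values; a funnel shows up as a clearly positive correlation. A sketch with NumPy and SciPy, on simulated data whose noise deliberately grows with x:

```python
# Rough funnel detector: rank-correlate |residuals| with fitted values.
# Simulated data with noise proportional to x (deliberately heteroscedastic).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = np.linspace(1, 10, 200)
y = 2.0 * x + rng.normal(scale=0.5 * x)  # noise SD grows from 0.5 to 5

slope, intercept = np.polyfit(x, y, 1)   # ordinary least squares fit
fitted = slope * x + intercept
abs_resid = np.abs(y - fitted)

rho, p = stats.spearmanr(fitted, abs_resid)
print(f"Spearman rho = {rho:.2f}")  # positive rho is the funnel signature
```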
What to Do When Variances Are Unequal
Use a Test That Doesn’t Require It
The most straightforward solution is switching to a test designed for unequal variances. Welch’s t-test replaces the standard t-test for two-group comparisons and is now the default in many statistical software packages. It adjusts both the test statistic and the degrees of freedom based on each group’s individual variance rather than pooling them together.
For three or more groups, Welch’s ANOVA does the same thing. It assigns a weight to each group based on its sample size and observed variance, reducing the influence of groups with high variability. This makes it reliable even with small samples and substantially different variances. Many statisticians now recommend using Welch’s versions by default, since they perform nearly as well as the standard versions when variances are equal and much better when they’re not.
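In SciPy, Welch’s t-test is the equal_var=False switch on scipy.stats.ttest_ind (made-up data below). SciPy’s f_oneway has no equivalent switch, so for the three-plus-group case you’d reach for a dedicated implementation such as pingouin’s welch_anova:

```python
# Welch vs. pooled t-test on made-up data with unequal spread and sizes.
from scipy import stats

small_tight = [50.2, 51.1, 49.8, 50.5, 50.9]                    # n=5, low spread
large_loose = [42.0, 58.3, 47.5, 61.2, 39.8, 55.0, 44.1, 60.7]  # n=8, high spread

t_w, p_w = stats.ttest_ind(small_tight, large_loose, equal_var=False)  # Welch
t_p, p_p = stats.ttest_ind(small_tight, large_loose, equal_var=True)   # pooled
print(f"Welch:  t = {t_w:.3f}, p = {p_w:.4f}")
print(f"Pooled: t = {t_p:.3f}, p = {p_p:.4f}")  # note the different t and df handling
```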
Transform the Data
Sometimes a mathematical transformation can stabilize the variances across groups. The two most common options are the square root transformation and the log transformation. The square root works well when variance increases proportionally with the mean. The log transformation is more aggressive and is particularly useful when standard deviations are two to four times the size of the mean, a pattern common in health expenditure data, reaction time studies, and other right-skewed measurements.
The tradeoff is interpretability. Once you log-transform your data and run your analysis, the results describe differences on the log scale, not the original scale. Converting back requires a retransformation step that introduces its own complications, especially when heteroscedasticity is present in the transformed data too.
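A sketch of the idea with the standard library, using invented data where one condition is an exact tenfold scaling of the other, so spread grows with the mean and the log transform removes the imbalance completely:

```python
# Log transform stabilizing variance: made-up data where spread
# scales with the mean (standard library only).
import math
from statistics import stdev

low_dose = [10, 12, 15, 11, 13]
high_dose = [100, 120, 150, 110, 130]   # each value is 10x the low-dose value

ratio_raw = stdev(high_dose) / stdev(low_dose)
ratio_log = (stdev([math.log(v) for v in high_dose])
             / stdev([math.log(v) for v in low_dose]))
print(f"SD ratio, raw scale: {ratio_raw:.1f}")  # 10.0 -- badly unequal
print(f"SD ratio, log scale: {ratio_log:.1f}")  # 1.0 -- equalized
```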
Use Non-Parametric Tests
Non-parametric alternatives like the Mann-Whitney U test (for two groups) or the Kruskal-Wallis test (for three or more) don’t assume any particular distribution shape. They work with ranks rather than raw values, which makes them naturally less sensitive to variance differences. These are a reasonable fallback when transformations don’t fix the problem and Welch’s versions aren’t available for your specific design. Keep in mind that non-parametric tests answer a slightly different question: they ask whether values in one group tend to rank higher than values in another, rather than comparing means, so they’re not a perfect substitute if your research question is specifically about mean differences.
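Both are single calls in SciPy (invented data again, with the groups clearly separated):

```python
# Rank-based tests via SciPy: Mann-Whitney U for two groups,
# Kruskal-Wallis for three or more (made-up data).
from scipy import stats

g1 = [3.1, 4.2, 2.8, 5.0, 3.6]
g2 = [6.4, 7.1, 5.9, 8.2, 6.8]
g3 = [4.5, 5.2, 4.9, 6.1, 5.5]

u_stat, u_p = stats.mannwhitneyu(g1, g2, alternative="two-sided")
h_stat, h_p = stats.kruskal(g1, g2, g3)
print(f"Mann-Whitney U: p = {u_p:.4f}")
print(f"Kruskal-Wallis: p = {h_p:.4f}")
```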
When It Matters Most
The practical impact of violating this assumption scales with the severity of the imbalance and your study design. Three factors make violations particularly damaging: large differences in group variances (ratios above 4:1), unequal sample sizes across groups, and small samples overall. If your groups are roughly the same size and your variance ratio stays below 2:1, standard tests hold up well. If you’re running an unbalanced design with small groups and widely different variances, you should treat this assumption seriously or default to Welch’s versions from the start.

