Levene’s test tells you whether the spread of scores in two or more groups is roughly the same. You interpret it by looking at the p-value: if it’s greater than 0.05, you treat the variances as equal; if it’s 0.05 or less, the variances differ significantly. That single decision point determines which version of a follow-up test (such as a t-test or ANOVA) you should use.
What the Test Actually Measures
Levene’s test checks whether the variability within each group in your data is similar enough to treat as equal. Many common statistical tests, including the independent-samples t-test and one-way ANOVA, assume that the groups you’re comparing have roughly the same variance. If one group’s scores are tightly clustered and another group’s scores are widely scattered, that assumption is violated, and your results may not be reliable. Levene’s test is a formal way to check before you proceed.
The null hypothesis is straightforward: all group variances are equal. The alternative hypothesis is that at least one group’s variance differs from the others. The test produces an F-statistic, which follows an F-distribution. A large F value means the groups differ more in their spread than you’d expect by chance.
How the Test Works Under the Hood
Levene’s test doesn’t compare the variances directly. Instead, it transforms your data into a simpler problem. For each data point, it calculates how far that value is from its group’s center (either the group mean or median). These distances, called absolute deviations, replace the original data. Then a standard one-way ANOVA is run on those deviations. If one group’s deviations are systematically larger than another’s, the ANOVA picks up that difference, and a larger average deviation means more variance in that group.
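The transformation above can be sketched in a few lines of plain Python, with no statistics libraries. The sample scores are invented for illustration; the `center` argument switches between the mean-based and median-based variants discussed later.

```python
from statistics import mean, median

def levene_statistic(groups, center=median):
    """Levene/Brown-Forsythe F-statistic for a list of groups.
    center=median gives the Brown-Forsythe variant; center=mean the original."""
    # Step 1: replace each value with its absolute deviation from the group center
    devs = [[abs(x - center(g)) for x in g] for g in groups]
    # Step 2: run a one-way ANOVA on those deviations
    k = len(devs)                              # number of groups
    n = sum(len(d) for d in devs)              # total sample size
    grand = mean([x for d in devs for x in d]) # grand mean of all deviations
    ss_between = sum(len(d) * (mean(d) - grand) ** 2 for d in devs)
    ss_within = sum(sum((x - mean(d)) ** 2 for x in d) for d in devs)
    # F = between-group mean square / within-group mean square
    return (ss_between / (k - 1)) / (ss_within / (n - k))

tight = [10, 11, 10, 12, 11, 10]   # tightly clustered scores
spread = [2, 20, 5, 18, 9, 15]     # widely scattered scores
f_stat = levene_statistic([tight, spread])
print(round(f_stat, 2))
```

Because the scattered group's deviations are systematically larger, the ANOVA on the deviations produces a large F, exactly as described above.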
This transformation is what makes Levene’s test more versatile than older alternatives like Bartlett’s test, which requires your data to be normally distributed. Because Levene’s test works with absolute deviations rather than raw variances, it holds up well even when your data is skewed or has heavy tails.
Reading the Output in SPSS and Other Software
In SPSS, Levene’s test appears automatically when you run an independent-samples t-test. The output table has two key columns for the test itself:
- F: The F-statistic. A larger value suggests bigger differences between group variances.
- Sig.: The p-value. This is the number you use to make your decision.
In R, you’ll typically see columns labeled “Df” (degrees of freedom), “F value,” and “Pr(>F)” (the p-value). The degrees of freedom in the numerator equal the number of groups minus one, and the denominator degrees of freedom equal the total sample size minus the number of groups. You don’t need to calculate anything from these yourself; they’re used behind the scenes to determine the p-value.
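Those two degrees-of-freedom formulas are easy to verify yourself. A quick arithmetic sketch, using made-up group sizes:

```python
# Degrees of freedom for Levene's test, as reported in R's output.
group_sizes = [20, 20, 20]   # three hypothetical groups of 20
k = len(group_sizes)         # number of groups
n = sum(group_sizes)         # total sample size
df_numerator = k - 1         # groups minus one
df_denominator = n - k       # total sample size minus number of groups
print(df_numerator, df_denominator)
```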
The Two Possible Outcomes
There are only two interpretations, and they depend entirely on whether the p-value crosses the 0.05 threshold.
If the p-value is greater than 0.05, you fail to reject the null hypothesis. This means there’s not enough evidence to say the variances are different. You can proceed with the standard version of your test, such as the equal-variances t-test or a regular one-way ANOVA. For example, if Levene’s test returns F = 1.45 and Sig. = 0.23, the variances are considered approximately equal.
If the p-value is 0.05 or less, you reject the null hypothesis. The variances are significantly different across groups. In SPSS, this means you should read the second row of the t-test output, labeled “Equal variances not assumed,” which uses Welch’s correction. This adjusted version changes the degrees of freedom to account for the unequal spread, giving you a more accurate p-value for your main comparison. For an ANOVA, you’d switch to Welch’s ANOVA instead of the standard version.
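The decision rule in these two paragraphs boils down to a single comparison. A minimal sketch, with a hypothetical helper name and the conventional 0.05 cutoff:

```python
def pick_t_test_row(levene_p, alpha=0.05):
    """Return which SPSS t-test row to read, given Levene's p-value."""
    if levene_p > alpha:
        return "Equal variances assumed"       # pooled t-test
    return "Equal variances not assumed"       # Welch's correction

print(pick_t_test_row(0.23))    # variances treated as equal
print(pick_t_test_row(0.001))   # variances significantly different
```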
A Worked Example
Suppose you’re comparing test scores between two teaching methods. You run an independent-samples t-test in SPSS, and the Levene’s test row shows F = 31.47 and Sig. = 0.000. (SPSS displays p-values smaller than .001 as 0.000, so you would report this as p < .001.) Because the p-value is far below 0.05, you reject the null hypothesis. The two groups have significantly different variances, so you use the “Equal variances not assumed” row for your t-test results.
Now imagine a different dataset where Levene’s test returns F = 0.82 and Sig. = 0.37. Since 0.37 is above 0.05, you have no evidence that the variances differ. You use the “Equal variances assumed” row for your t-test. In both cases, Levene’s test itself is not the main analysis. It’s a preliminary check that tells you which version of the main analysis to trust.
Mean-Based vs. Median-Based Versions
There are actually several flavors of Levene’s test, and the difference comes down to how “group center” is defined. The original version uses the group mean to calculate deviations. A modification known as the Brown-Forsythe test uses the group median instead. The median-based version is more robust when your data is skewed, because the median isn’t pulled toward outliers the way the mean is.
Most modern software defaults to the median-based version or gives you the option to choose. If your data is roughly symmetric, both versions give similar results. If your data is noticeably skewed, the median-based version is the safer choice. In R’s leveneTest() function from the car package, the default center is the median. SPSS, by contrast, uses the original mean-based version in its t-test output, so its result can differ slightly from R’s default.
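A quick illustration of why the median-based version is more robust: a single outlier drags the mean toward it but barely moves the median. The scores are invented.

```python
from statistics import mean, median

scores = [1, 2, 2, 3, 3, 4, 40]   # right-skewed: one extreme value
print(round(mean(scores), 2))     # pulled well above the bulk of the data
print(median(scores))             # stays at the middle of the data
```

Because deviations are measured from this center, an outlier-inflated mean distorts every deviation in the group, while a median keeps them honest.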
Common Mistakes When Interpreting the Test
The most frequent error is reading the p-value backward. A significant Levene’s test (p ≤ 0.05) does not mean your variances are equal. It means the opposite: variances are unequal. This trips people up because in most analyses, a significant result is what you’re hoping for. With Levene’s test, a non-significant result (p > 0.05) is typically the outcome you want, since it means the equal-variance assumption holds.
Another common mistake is treating Levene’s test as definitive proof that variances are or aren’t equal. Like any hypothesis test, it’s influenced by sample size. With very large samples, trivially small differences in variance can produce a significant result. With very small samples, even meaningful differences might not reach significance. If your groups are similar in size, most follow-up tests (especially Welch’s versions) are fairly robust to moderate variance differences regardless of what Levene’s test says. Some statisticians recommend defaulting to Welch’s t-test or Welch’s ANOVA in all cases, treating the equal-variance versions as a special case you only use when you have strong reason to believe variances are truly equal.
What to Do After Each Result
If Levene’s test is not significant, you proceed with the standard equal-variances version of your analysis. For a two-group comparison, that’s the pooled t-test. For multiple groups, it’s the standard one-way ANOVA with post-hoc tests like Tukey’s HSD that assume equal variances.
If Levene’s test is significant, you switch to methods that don’t assume equal variances. For two groups, use Welch’s t-test (the “Equal variances not assumed” row in SPSS). For multiple groups, use Welch’s ANOVA, and for post-hoc comparisons, the Games-Howell test is a common choice because it doesn’t require equal variances. These adjusted methods modify the degrees of freedom to reflect the actual variance structure of your data, producing more accurate p-values when groups differ in spread.
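The degrees-of-freedom adjustment these methods rely on is the Welch-Satterthwaite equation. A minimal plain-Python sketch of Welch's t-statistic and its adjusted df, with invented scores:

```python
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variances t-test."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)    # sample variances (n - 1 denominator)
    se2_a, se2_b = va / na, vb / nb      # squared standard errors per group
    t = (mean(a) - mean(b)) / (se2_a + se2_b) ** 0.5
    # Welch-Satterthwaite df: shrinks when variances or sample sizes are unequal
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df

method_a = [78, 82, 80, 79, 81, 83]   # tightly clustered scores
method_b = [60, 95, 70, 88, 55, 92]   # widely scattered scores
t, df = welch_t(method_a, method_b)
print(round(t, 2), round(df, 2))
```

Note how the adjusted df lands near 5 rather than the pooled value of 10: the high-variance group dominates the standard error, and the correction discounts the degrees of freedom accordingly.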