How to Interpret Levene’s Test: P-Values Explained

Levene’s test checks whether two or more groups have equal variances, and you interpret it primarily through its p-value. If the p-value is greater than 0.05, you can treat the variances as roughly equal. If it’s 0.05 or below, the variances differ enough to matter, and you’ll need to adjust your analysis accordingly. That single decision point drives most of what you’ll do with the result.

What Levene’s Test Actually Measures

Many statistical tests, especially ANOVA and the independent samples t-test, assume that the groups you’re comparing have similar amounts of spread in their data. Levene’s test is a formal way to check that assumption before you proceed.

The null hypothesis states that all group variances are equal. The alternative hypothesis states that at least one pair of groups has unequal variances. The test doesn’t tell you which groups differ or by how much. It simply flags whether the assumption of equal variance (called homogeneity of variance) holds across your data.

Under the hood, the test works by calculating how far each data point falls from its group’s center, then running an ANOVA on those distances. If one group’s data points tend to be much farther from the center than another’s, the test picks up on that difference. The original version of the test uses the group mean as the center point. A popular modification, the Brown-Forsythe version, uses the group median instead. Most statistical software defaults to the median-based version because it performs better with real-world data that isn’t perfectly symmetrical.
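The mechanics above can be sketched in a few lines of Python. This is a minimal, stdlib-only illustration of the median-based (Brown-Forsythe) calculation, not a replacement for library routines; the function name `brown_forsythe_F` is my own, and the two groups are made-up data.

```python
from statistics import fmean, median

def brown_forsythe_F(*groups):
    """Levene's statistic (Brown-Forsythe variant): an ANOVA F computed
    on each point's absolute distance from its group's median."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Distance of every data point from its group's center (the median)
    z = [[abs(x - median(g)) for x in g] for g in groups]
    group_means = [fmean(zg) for zg in z]
    grand_mean = fmean([v for zg in z for v in zg])
    # Between-group spread of those distances...
    ss_between = sum(len(zg) * (m - grand_mean) ** 2
                     for zg, m in zip(z, group_means))
    # ...versus within-group spread
    ss_within = sum((v - m) ** 2
                    for zg, m in zip(z, group_means) for v in zg)
    df1, df2 = k - 1, n_total - k
    return (ss_between / df1) / (ss_within / df2)

# Group 2 is ten times more spread out than group 1
F = brown_forsythe_F([1, 2, 3], [10, 20, 30])  # F = 324/101, about 3.21
```

In practice you would call `scipy.stats.levene`, which computes the same F and also returns the p-value from the F(df1, df2) distribution.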

Reading the P-Value

The output of Levene’s test gives you an F statistic and a p-value. The F statistic itself rarely matters for your decision. Focus on the p-value.

  • P-value above 0.05: You fail to reject the null hypothesis. The variances are not significantly different, and you can proceed with your planned analysis (standard ANOVA, pooled t-test, Tukey’s HSD) as normal.
  • P-value at or below 0.05: You reject the null hypothesis. The variances are significantly unequal, and you should switch to methods that don’t assume equal variance.
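The two branches above can be written as a simple check using `scipy.stats.levene`, which is median-centered by default. The group data here is invented to have obviously unequal spread:

```python
from scipy.stats import levene

tight = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]           # small spread
wide = [0, 100, 0, 100, 0, 100, 0, 100, 0, 100]  # large spread

stat, p = levene(tight, wide)  # Brown-Forsythe by default (center='median')
if p > 0.05:
    plan = "standard ANOVA / pooled t-test / Tukey's HSD"
else:
    plan = "Welch's ANOVA / Welch's t-test / Games-Howell"

print(f"F = {stat:.2f}, p = {p:.4g} -> {plan}")
```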

A common mistake is treating a non-significant Levene’s test as proof that variances are perfectly equal. It’s not. It simply means the test didn’t find strong enough evidence of a difference. With small sample sizes, Levene’s test may lack the statistical power to detect real differences in variance. This is worth keeping in mind if your groups are small (under 20 or so per group) and the p-value lands somewhere between 0.05 and 0.20.

When the Result Is Significant

A significant Levene’s test means your groups have unequal variances, and any analysis that assumes equal variance will produce unreliable results. Here’s what to do depending on your situation.

If you were planning a one-way ANOVA, switch to Welch's ANOVA. It adjusts the degrees of freedom to account for unequal variances and doesn't require the homogeneity assumption. In R, the oneway.test() function applies the Welch correction by default (var.equal = FALSE). In SPSS, tick the Welch checkbox under Options in the One-Way ANOVA dialog, and the Welch result appears alongside the standard ANOVA output.
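SciPy has no built-in Welch ANOVA, so as an illustration of what the adjustment does, here is a stdlib-only sketch of Welch's (1951) F statistic. The function name `welch_anova` is my own, and the formula shown is the standard textbook form under that assumption:

```python
from statistics import fmean, variance

def welch_anova(*groups):
    """Welch's ANOVA: F statistic and degrees of freedom, weighting each
    group by n / s^2 so unequal variances never get pooled."""
    k = len(groups)
    n = [len(g) for g in groups]
    m = [fmean(g) for g in groups]
    w = [ni / variance(g) for ni, g in zip(n, groups)]  # precision weights
    W = sum(w)
    grand = sum(wi * mi for wi, mi in zip(w, m)) / W    # weighted grand mean
    # Correction term shared by the denominator and df2
    c = sum((1 - wi / W) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    num = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, m)) / (k - 1)
    den = 1 + 2 * (k - 2) * c / (k ** 2 - 1)
    df1, df2 = k - 1, (k ** 2 - 1) / (3 * c)
    return num / den, df1, df2

# Sanity check: identical group means give F = 0 no matter how
# different the variances are
F, df1, df2 = welch_anova([4, 5, 6], [3, 5, 7], [2, 5, 8])
```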

If you were planning an independent samples t-test, use Welch’s t-test instead of the pooled (Student’s) version. Most software already reports both versions side by side, so you just read the correct row.
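To make the difference concrete, here is a stdlib-only sketch of the Welch t statistic with its Welch-Satterthwaite degrees of freedom; the function name `welch_t` and the data are mine:

```python
from math import sqrt
from statistics import fmean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.
    No pooled variance, so unequal spreads are handled directly."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)   # per-group sample variances
    se2 = va / na + vb / nb             # squared standard error of the difference
    t = (fmean(a) - fmean(b)) / sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

In practice `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic and adds the p-value; that `equal_var=False` row is the "correct row" to read in most software output.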

For post-hoc pairwise comparisons after ANOVA, the standard options like Tukey’s HSD assume equal variances. When that assumption is violated, switch to a method designed for unequal variances: Games-Howell, Tamhane’s T2, Dunnett’s T3, or Dunnett’s C. Of these, Games-Howell is the most widely used and is available in most major software packages.

When the Result Is Not Significant

A non-significant result is straightforward. You proceed with whatever analysis you originally planned. Standard ANOVA, pooled t-tests, and Tukey’s HSD all remain valid options. There’s nothing else you need to do or report beyond noting that the assumption was checked and met.

That said, some statisticians argue you should use Welch’s methods by default regardless of what Levene’s test says. The reasoning is that Welch’s corrections perform nearly as well as standard tests when variances are equal, while protecting you when they aren’t. If you’re unsure, using Welch’s approach is a safe bet in either direction.

Sample Size Can Distort the Result

Levene’s test is sensitive to sample size in two ways that can mislead you.

With very large samples (hundreds or thousands per group), the test becomes extremely sensitive. It may flag tiny, practically meaningless differences in variance as statistically significant. If your Levene’s test is significant but the actual variance values look similar when you inspect them, the significance may be driven by sample size rather than a real problem. In this case, look at the ratio of the largest to smallest group variance. A ratio under 2:1 is generally considered acceptable for ANOVA even if Levene’s test flags it.
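The ratio check above takes one line per step; this stdlib-only sketch uses a function name and data of my own invention:

```python
from statistics import variance

def max_variance_ratio(*groups):
    """Ratio of the largest to the smallest sample variance across groups."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)

# Sample variances are 2.5 and 10, so the ratio is 4:1,
# above the rough 2:1 guideline
ratio = max_variance_ratio([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```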

Large differences in group sizes create a separate problem. When one group has far more observations than another, the original Levene’s test (mean-based) can reject the null hypothesis even when the variances truly are equal. This is a false positive driven by the imbalance, not by actual heterogeneity. The Brown-Forsythe modification (median-based) handles this better, which is one reason it became the default in most software.

With small samples, the opposite happens. The test lacks power to detect real differences in variance, so it may give you a reassuring non-significant result when the variances actually do differ. If your groups have fewer than about 20 observations each, consider supplementing Levene’s test with a visual check: plot your data by group using boxplots or dot plots and compare the spread visually.
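If a plotting library isn't handy, the same numbers a boxplot draws can be compared directly. This stdlib-only sketch (invented group data) prints each group's interquartile range and overall range as a quick spread check:

```python
from statistics import quantiles

def five_number_summary(g):
    """The numbers a boxplot draws: min, Q1, median, Q3, max."""
    q1, q2, q3 = quantiles(g, n=4)  # default 'exclusive' method
    return min(g), q1, q2, q3, max(g)

for name, g in [("control", [4, 5, 5, 6, 7, 8]),
                ("treatment", [1, 3, 6, 8, 11, 14])]:
    lo, q1, med, q3, hi = five_number_summary(g)
    print(f"{name:>9}: IQR = {q3 - q1:.2f}, range = {hi - lo}")
```

A treatment IQR several times the control IQR is the same signal a side-by-side boxplot would show visually.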

Mean-Based vs. Median-Based Versions

When you see “Levene’s test” in software output, it may be using the group mean or the group median as the center point, and this matters for how robust the result is.

The original 1960 formulation by Howard Levene used group means. This version works well when your data is roughly normally distributed, but it becomes unreliable when data is skewed. In 1974, Brown and Forsythe showed that substituting the group median for the mean made the test far more robust to non-normal distributions. Their version performs better with skewed data because the median isn’t pulled toward outliers the way the mean is.

In R, the leveneTest() function from the car package defaults to the median-based version. You can switch to the original by specifying center = mean. In SPSS, the reported Levene’s test uses the mean-based version. If your data is noticeably skewed, the median-based version gives more trustworthy results. One edge case: when data has extremely heavy tails (very peaked distributions with far-flung outliers), both versions become conservative, meaning they may fail to detect real variance differences.
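The same switch exists in Python via the `center` argument of `scipy.stats.levene`. The skewed sample data here is made up; a single large value pulls the first group's mean well above its median:

```python
from scipy.stats import levene

# Right-skewed data: the outlier 30 drags the mean of `a` away
# from its median
a = [1, 1, 2, 2, 3, 3, 4, 30]
b = [1, 2, 2, 3, 3, 4, 4, 5]

stat_med, p_med = levene(a, b, center='median')  # Brown-Forsythe (default)
stat_mean, p_mean = levene(a, b, center='mean')  # original 1960 version
```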

Reporting Levene’s Test

In a write-up, report the F statistic, degrees of freedom, and p-value. A typical sentence looks like this: “Levene’s test indicated equal variances across groups, F(2, 57) = 1.34, p = .27.” If the test was significant, note it and state what adjustment you made: “Levene’s test indicated unequal variances, F(2, 57) = 4.82, p = .01, so Welch’s ANOVA was used.”
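If you report these sentences often, the formatting (including APA's convention of dropping the leading zero on p) is easy to automate. The helper name `report_levene` is my own:

```python
def report_levene(stat, df1, df2, p):
    """Format a Levene's test result in APA style (no leading zero on p)."""
    p_text = "< .001" if p < 0.001 else f"= {p:.2f}".replace("0.", ".")
    return f"F({df1}, {df2}) = {stat:.2f}, p {p_text}"

line = report_levene(1.337, 2, 57, 0.271)
print(line)  # F(2, 57) = 1.34, p = .27
```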

You don’t need to describe the test in detail or explain how it works. Reviewers and readers familiar with statistics will recognize the name. The important thing is that you tested the assumption, reported the result, and adjusted your analysis if needed.