The null hypothesis for ANOVA states that all group means in your comparison are equal. Written formally for three groups, it looks like this: H₀: μ₁ = μ₂ = μ₃. If you’re comparing five groups, it extends to μ₁ = μ₂ = μ₃ = μ₄ = μ₅, and so on. ANOVA (analysis of variance) exists specifically to test this claim, and everything about how the test works flows from this single starting assumption.
The Null and Alternative Hypotheses
The null hypothesis is your default position: any differences you see in group averages are just random noise, not real effects. In plain terms, you’re saying “these groups all come from populations with the same mean.” It doesn’t matter whether you’re comparing three drug dosages, four teaching methods, or six fertilizer brands. The null hypothesis always takes the same form: all population means are equal.
The alternative hypothesis is not that every group differs from every other group. It simply states that at least one group mean is different. This is a crucial distinction that trips people up. If ANOVA gives you a significant result, you know something is different somewhere, but you don’t yet know which specific groups differ from each other. That’s a separate step.
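The one-way test described above can be sketched in a few lines. This is a minimal example using hypothetical scores for three groups; a significant result would only tell you that at least one mean differs, not which one:

```python
from scipy import stats

# Hypothetical data: scores from three groups.
group1 = [85, 86, 88, 75, 78, 94, 98, 79, 71, 80]
group2 = [91, 92, 93, 85, 87, 84, 82, 88, 95, 96]
group3 = [79, 78, 88, 94, 92, 85, 83, 85, 82, 81]

# H0: mu1 = mu2 = mu3; H1: at least one group mean differs.
f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```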
How ANOVA Actually Tests the Null
ANOVA works by comparing two types of variability in your data. The first is between-group variance: how much the group averages spread out from the overall average. The second is within-group variance: how much individual data points scatter around their own group’s average. The test calculates an F-statistic, which is the ratio of between-group variance to within-group variance.
If the null hypothesis is true and all groups really do have the same population mean, the between-group variance should be roughly the same size as the within-group variance, giving you an F-value close to 1. A larger F-value means the group averages are more spread out than you’d expect from random variation alone. The bigger the F, the stronger the evidence against the null hypothesis.
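The F-ratio described above can be computed by hand from the two sums of squares. A sketch with small hypothetical groups, checked against scipy's built-in one-way ANOVA:

```python
import numpy as np

# Hypothetical data: three groups of equal size.
groups = [np.array([4.0, 5.0, 6.0, 5.0]),
          np.array([7.0, 8.0, 6.0, 7.0]),
          np.array([5.0, 6.0, 7.0, 6.0])]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)                 # number of groups
n_total = len(all_values)      # total number of observations

# Between-group sum of squares: how far group means sit from the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: scatter of points around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)        # between-group variance (mean square)
ms_within = ss_within / (n_total - k)    # within-group variance (mean square)
f_stat = ms_between / ms_within
print(f"F = {f_stat:.3f}")
```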
You then compare this F-value to a threshold based on your chosen significance level. Most researchers use p < 0.05, a convention dating back to the statistician R.A. Fisher, though stricter thresholds like 0.01 or 0.001 are common when stronger evidence is needed. If your p-value falls below the threshold, you reject the null hypothesis and conclude that not all group means are equal.
Null Hypotheses in Two-Way and Three-Way ANOVA
A one-way ANOVA has a single null hypothesis because it tests one factor. When you add more factors, the number of null hypotheses multiplies. A two-way ANOVA tests three separate null hypotheses:
- Main effect of Factor A: there is no difference in means across the levels of the first factor.
- Main effect of Factor B: there is no difference in means across the levels of the second factor.
- Interaction effect: there is no interaction between Factor A and Factor B, meaning the effect of one factor doesn’t change depending on the level of the other.
A three-way ANOVA tests seven null hypotheses: three main effects, three two-way interactions, and one three-way interaction. Each null hypothesis follows the same logic: no difference in means for the main effects, no interaction for the interaction terms. Each gets its own F-statistic and p-value, and you evaluate them independently.
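The counting pattern generalizes: a factorial ANOVA tests one null hypothesis per non-empty subset of factors, so k factors give 2ᵏ − 1 hypotheses. A small sketch (the function name is just for illustration):

```python
from itertools import combinations

def anova_null_hypotheses(factors):
    """Enumerate every effect tested in a factorial ANOVA:
    one null hypothesis per non-empty subset of factors."""
    effects = []
    for r in range(1, len(factors) + 1):
        effects.extend(combinations(factors, r))
    return effects

effects = anova_null_hypotheses(["A", "B", "C"])
for e in effects:
    kind = "main effect" if len(e) == 1 else f"{len(e)}-way interaction"
    print(" x ".join(e), "-", kind)
print(len(effects), "null hypotheses")   # 2^3 - 1 = 7
```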
Assumptions That Must Hold
The null hypothesis test is only valid if your data meet certain conditions. ANOVA assumes three things. First, the observations in each group are independent of each other, meaning one participant’s score doesn’t influence another’s. Second, the data in each group come from a normally distributed population. Third, all groups share the same variance, a property called homogeneity of variance. Even if the null hypothesis is false and the group means truly differ, the spread of scores within each group should be roughly equal.
You can check the equal-variance assumption with Levene’s test, which has its own null hypothesis: all group variances are equal (σ₁² = σ₂² = σ₃² = … = σₖ²). If Levene’s test comes back significant, the equal-variance assumption is violated, and you may need a corrected version of ANOVA. Normality is typically checked with visual methods like Q-Q plots or formal tests. ANOVA is fairly robust to mild violations of normality, especially with larger sample sizes, but severe skew or heavy-tailed distributions can distort your results.
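Levene's test is available in scipy. A sketch with hypothetical groups where the third is deliberately more spread out, so the test should flag the violation (Welch's ANOVA is one commonly used correction when it does):

```python
from scipy import stats

# Hypothetical groups; group3 has a much larger spread than the others.
group1 = [4.1, 5.0, 4.8, 5.2, 4.9, 5.1]
group2 = [5.5, 5.8, 5.6, 5.9, 5.7, 5.4]
group3 = [2.0, 9.0, 3.5, 8.5, 1.5, 9.5]

# Levene's H0: all group variances are equal.
stat, p_value = stats.levene(group1, group2, group3)
if p_value < 0.05:
    print("Equal-variance assumption violated; consider a corrected ANOVA (e.g. Welch's).")
else:
    print("No evidence against equal variances.")
```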
What Happens After You Reject the Null
Rejecting the null hypothesis tells you that at least one group mean differs, but it doesn’t tell you which pairs of groups are actually different. To answer that, you run post-hoc comparison tests. These are pairwise tests (comparing group 1 vs. group 2, group 1 vs. group 3, and so on) that adjust for the fact that you’re making multiple comparisons at once. Without that adjustment, you’d inflate your chances of a false positive with every additional comparison.
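One simple way to sketch the pairwise step is independent t-tests with a Bonferroni adjustment (dividing α by the number of comparisons); Tukey's HSD is another common choice. The data here are hypothetical:

```python
from itertools import combinations
from scipy import stats

# Hypothetical data for three groups.
samples = {
    "g1": [85, 86, 88, 75, 78, 94, 98, 79, 71, 80],
    "g2": [91, 92, 93, 85, 87, 84, 82, 88, 95, 96],
    "g3": [79, 78, 88, 94, 92, 85, 83, 85, 82, 81],
}

pairs = list(combinations(samples, 2))
alpha = 0.05
alpha_adj = alpha / len(pairs)   # Bonferroni: divide alpha by number of comparisons

results = {}
for a, b in pairs:
    t, p = stats.ttest_ind(samples[a], samples[b])
    results[(a, b)] = (p, p < alpha_adj)
    print(f"{a} vs {b}: p = {p:.4f}, significant at adjusted alpha: {p < alpha_adj}")
```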
You should also look beyond statistical significance and consider effect size. A common measure for ANOVA is eta squared (η²), which tells you what proportion of the total variability in your data is explained by group membership. An η² below 0.01 is negligible. Values between 0.01 and 0.06 are considered small, 0.06 to 0.14 medium, and anything above 0.14 large. A statistically significant result with a tiny effect size means the group differences are real but practically unimportant.
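Eta squared falls out of the same sums of squares used for the F-statistic: SS_between divided by SS_total. A sketch with hypothetical data, using the thresholds above:

```python
import numpy as np

# Hypothetical data: three small groups.
groups = [np.array([4.0, 5.0, 6.0, 5.0]),
          np.array([7.0, 8.0, 6.0, 7.0]),
          np.array([5.0, 6.0, 7.0, 6.0])]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_total = ((all_values - grand_mean) ** 2).sum()

eta_sq = ss_between / ss_total   # proportion of total variability explained by group

def label(e):
    if e < 0.01: return "negligible"
    if e < 0.06: return "small"
    if e < 0.14: return "medium"
    return "large"

print(f"eta^2 = {eta_sq:.3f} ({label(eta_sq)})")
```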
What Failing to Reject Actually Means
If your F-test produces a p-value above your significance threshold, you fail to reject the null hypothesis. This does not prove the group means are equal. It means your data didn’t provide strong enough evidence to conclude they differ. The distinction matters. You might have too few participants to detect a real but small difference, or the variability within your groups might be so large that it drowns out the signal. Failing to reject is not the same as confirming the null. It’s an inconclusive result that says, given this data and this sample size, you can’t rule out that all groups are the same.
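The sample-size point can be illustrated with a quick simulation: draw groups from populations whose means genuinely differ by a small amount, and run the same ANOVA at two sample sizes. The population means and seed here are arbitrary choices for the sketch:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Three populations whose true means differ slightly (sd = 1 for all).
true_means = [10.0, 10.3, 10.6]

def run_anova(n_per_group):
    # Simulate one experiment and return its ANOVA p-value.
    groups = [rng.normal(mu, 1.0, n_per_group) for mu in true_means]
    return stats.f_oneway(*groups).pvalue

p_small = run_anova(5)      # small sample: often underpowered for this effect
p_large = run_anova(500)    # large sample: the same true effect is easy to detect
print(f"n=5 per group:   p = {p_small:.4f}")
print(f"n=500 per group: p = {p_large:.4f}")
```

The true means never change between the two runs; only the sample size does, which is why a non-significant result at small n cannot be read as proof that the null is true.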