You reject the null hypothesis in ANOVA when your p-value is less than or equal to your chosen significance level, typically 0.05. This means the differences you observed between group means are unlikely to have occurred by chance alone, and at least one group mean is significantly different from the others.
What the Null Hypothesis States in ANOVA
In a one-way ANOVA, the null hypothesis says there is no difference in the population means across the groups you’re comparing. If you’re testing whether three different diets lead to different weight loss, the null hypothesis claims all three diets produce the same average result. The alternative hypothesis simply states that the means are not all the same, meaning at least one group differs.
Notice the alternative hypothesis doesn’t tell you which group is different or by how much. ANOVA is a global test. It answers one question: are all group means equal, or not?
How the F-Statistic Drives the Decision
ANOVA works by comparing two types of variance: the spread between group means and the spread within each group. The F-statistic is the ratio of between-group variance to within-group variance. A large F-value means the groups differ from each other more than the individual observations differ within each group.
If the null hypothesis were true and all groups came from the same population, you’d expect the between-group variance and within-group variance to be roughly equal, giving an F-value close to 1. The further your F-value climbs above 1, the stronger the evidence that something real is separating the groups. Your software compares this F-value against the F-distribution (shaped by your specific degrees of freedom) to produce a p-value.
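The variance ratio described above can be computed by hand and checked against SciPy's built-in one-way ANOVA. This is a minimal sketch using made-up weight-loss numbers for three hypothetical diets, not data from any real study:

```python
# Sketch: the one-way ANOVA F-statistic by hand and via scipy.stats.f_oneway.
# The three groups below are invented illustration values.
import numpy as np
from scipy import stats

groups = [
    np.array([5.1, 4.8, 6.2, 5.5, 5.9]),   # diet A
    np.array([6.8, 7.1, 6.5, 7.4, 6.9]),   # diet B
    np.array([5.0, 5.3, 4.7, 5.6, 5.2]),   # diet C
]

k = len(groups)                       # number of groups
N = sum(len(g) for g in groups)       # total observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: how far each group mean sits from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: how far observations sit from their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = (between-group variance) / (within-group variance)
f_manual = (ss_between / (k - 1)) / (ss_within / (N - k))
f_scipy, p_value = stats.f_oneway(*groups)

print(f"F (by hand) = {f_manual:.3f}, F (SciPy) = {f_scipy:.3f}, p = {p_value:.4f}")
```

The two F-values agree exactly, which is a useful sanity check that the mean-squares arithmetic matches what the software reports.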
The Two Decision Rules
There are two equivalent ways to make the reject-or-not decision; they always lead to the same conclusion.
P-value approach: If your p-value is less than or equal to your alpha level (α), reject the null hypothesis. If the p-value is greater than alpha, do not reject it. With the standard α = 0.05, a p-value of 0.03 means you reject the null, while a p-value of 0.12 means you do not.
Critical value approach: Compare your calculated F-statistic to the critical F-value from an F-distribution table at your chosen alpha. If your F-statistic exceeds the critical value, reject the null. The critical value depends on two numbers: the numerator degrees of freedom (the number of groups minus one, or k − 1) and the denominator degrees of freedom (the total number of observations minus the number of groups, or N − k).
Most researchers today use the p-value approach because statistical software reports it automatically, but the logic is identical either way.
Choosing Your Alpha Level
The alpha level is the threshold you set before running the test. It represents the probability of rejecting the null hypothesis when it’s actually true (a false positive). The most common choice across fields is α = 0.05, a convention dating back to Ronald Fisher. At this level, you accept a 5% chance of incorrectly concluding that group means differ.
Some researchers argue for stricter thresholds, particularly in fields where false positives carry serious consequences. You might see α = 0.01 or even α = 0.005 in medical or genetics research. The key rule: choose your alpha before you look at the data. Adjusting it after seeing results undermines the entire framework.
Assumptions to Check Before Testing
An ANOVA result is only trustworthy if your data meet four core assumptions. Violating them can inflate your F-statistic or produce misleading p-values, potentially leading you to reject the null hypothesis when you shouldn’t.
- Independence: Observations in each group must be independent of one another. This is handled through proper study design and randomization, not a statistical test.
- Normality of residuals: The differences between each observation and its group mean should follow a roughly normal distribution. You can check this with a Shapiro-Wilk test on the residuals. ANOVA is fairly robust to mild violations of normality, especially with larger sample sizes.
- Homogeneity of variance: The spread of data within each group should be roughly equal. Levene’s test evaluates this. If variances are unequal, alternatives like Welch’s ANOVA can adjust for the imbalance.
- Additivity: The effects in your model should combine in a straightforward, additive way. This matters more in complex designs with blocking factors.
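The two testable assumptions above can be checked directly with SciPy. This sketch uses simulated data (independence and additivity are design questions, so no statistical test appears for them):

```python
# Sketch: checking normality of residuals (Shapiro-Wilk) and homogeneity of
# variance (Levene) on simulated groups; data are made up for illustration.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
groups = [rng.normal(loc=m, scale=1.0, size=20) for m in (5.0, 5.5, 6.0)]

# Residuals: each observation minus its own group mean
residuals = np.concatenate([g - g.mean() for g in groups])
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# Levene's test compares spread across the groups
levene_stat, levene_p = stats.levene(*groups)

print(f"Shapiro-Wilk p = {shapiro_p:.3f}  (p > .05: no evidence against normality)")
print(f"Levene p = {levene_p:.3f}  (p > .05: no evidence of unequal variances)")
```

If Levene's test flags unequal variances, `scipy.stats` also offers a Welch-style alternative via `f_oneway`'s assumptions being relaxed in dedicated routines or in statsmodels; the point here is simply that both checks run on the same data you feed to the ANOVA.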
What Rejection Actually Tells You
Rejecting the null hypothesis means you have statistical evidence that not all group means are equal. It does not tell you which groups differ from each other, how large the differences are, or whether those differences matter in practice. A very large sample can produce a statistically significant result even when the actual differences between groups are trivially small.
This is where effect size comes in. Eta-squared (η²) measures what proportion of the total variation in your data is explained by the grouping variable. Standard benchmarks: values below 0.01 are negligible, 0.01 to 0.06 is small, 0.06 to 0.14 is medium, and 0.14 or above is large. A significant ANOVA with an η² of 0.02 means the group differences are real but explain very little of the overall picture.
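Eta-squared falls straight out of the same sums of squares used for the F-statistic. A minimal sketch, reusing the invented diet data from earlier:

```python
# Sketch: eta-squared as the between-group share of total variation.
# Groups are invented illustration values.
import numpy as np

groups = [
    np.array([5.1, 4.8, 6.2, 5.5, 5.9]),
    np.array([6.8, 7.1, 6.5, 7.4, 6.9]),
    np.array([5.0, 5.3, 4.7, 5.6, 5.2]),
]
grand_mean = np.concatenate(groups).mean()

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_total = ss_between + ss_within

eta_squared = ss_between / ss_total   # proportion of variation explained by grouping
print(f"eta-squared = {eta_squared:.3f}")
```

Because it is a proportion, η² always lands between 0 and 1, which makes the small/medium/large benchmarks above easy to apply at a glance.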
What to Do After Rejecting the Null
Once you’ve established that at least one group mean differs, the natural next question is: which ones? ANOVA itself won’t answer that. You need post-hoc pairwise comparison tests, which systematically compare every possible pair of groups while controlling for the increased risk of false positives that comes from running multiple tests.
Tukey’s honestly significant difference (HSD) is the most widely used method when you want to compare all pairs of groups. It controls the overall false-positive rate across all comparisons and is straightforward to interpret. The Bonferroni correction is a more general alternative that works by dividing your alpha level by the number of comparisons. It’s more conservative, meaning it’s less likely to find differences, but it applies to a broader range of testing situations beyond just pairwise comparisons.
Both approaches are fully post-hoc, meaning you don’t need to specify your comparisons in advance. You run the ANOVA, see a significant result, and then explore which specific group differences are driving it.
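A sketch of both post-hoc approaches on the same invented diet data, using `scipy.stats.tukey_hsd` (available in SciPy 1.8+). The Bonferroni portion is a simple illustration built from pairwise t-tests at an adjusted alpha, not a dedicated library routine:

```python
# Sketch: Tukey's HSD and a Bonferroni-style alternative after a significant
# one-way ANOVA. Groups are invented illustration values.
import numpy as np
from scipy import stats

diet_a = np.array([5.1, 4.8, 6.2, 5.5, 5.9])
diet_b = np.array([6.8, 7.1, 6.5, 7.4, 6.9])
diet_c = np.array([5.0, 5.3, 4.7, 5.6, 5.2])

# Tukey's HSD: all pairwise comparisons with family-wise error control built in
result = stats.tukey_hsd(diet_a, diet_b, diet_c)
print(result)   # table of pairwise mean differences and adjusted p-values

# Bonferroni illustration: test each pair at alpha divided by the number of pairs
alpha, n_pairs = 0.05, 3
pairs = [("A", diet_a, "B", diet_b), ("A", diet_a, "C", diet_c), ("B", diet_b, "C", diet_c)]
for name_x, x, name_y, y in pairs:
    t, p = stats.ttest_ind(x, y)
    print(f"{name_x} vs {name_y}: p = {p:.4f}, "
          f"reject at {alpha / n_pairs:.4f}: {p <= alpha / n_pairs}")
```

With three groups there are three pairwise comparisons, so the Bonferroni threshold here is 0.05 / 3 ≈ 0.0167 per test.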
Reporting Your Results
If you’re writing up results in APA format, the standard way to report an ANOVA looks like this: F(2, 45) = 6.34, p = .002. The first number in parentheses is the between-groups degrees of freedom (k − 1), the second is the within-groups degrees of freedom (N − k), followed by the F-value and the exact p-value to two or three decimal places. For very small p-values, report p < .001 rather than writing out a string of zeros.
Including the effect size alongside your F-test makes your results more informative. A reader can see not just that the difference was statistically significant, but how meaningful it was. Reporting both the p-value and η² gives the full picture: the p-value answers “is there a difference?” and the effect size answers “how big is it?”
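Assembling the APA string from raw numbers is a one-liner worth getting right, since the leading zero is dropped from p-values and η² (both are bounded by 1). A sketch, with the numbers taken from the hypothetical example above plus an invented effect size:

```python
# Sketch: formatting ANOVA results in APA style. Values are hypothetical,
# mirroring the F(2, 45) example in the text; eta-squared is invented.
f_value, df_between, df_within, p_value, eta_sq = 6.34, 2, 45, 0.002, 0.22

# APA drops the leading zero for statistics that cannot exceed 1
p_text = "p < .001" if p_value < 0.001 else f"p = {p_value:.3f}".replace("0.", ".", 1)
eta_text = f"η² = {eta_sq:.2f}".replace("0.", ".", 1)

report = f"F({df_between}, {df_within}) = {f_value:.2f}, {p_text}, {eta_text}"
print(report)   # F(2, 45) = 6.34, p = .002, η² = .22
```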

