A one-way ANOVA is the right statistical test when you want to compare the means of three or more groups that differ along a single factor. If you’re testing whether a new drug works at three different dosages, whether students from four different schools score differently on the same exam, or whether plants grow taller under five types of fertilizer, one-way ANOVA is the standard tool. It tells you whether at least one group’s average is significantly different from the others.
Why Not Just Run Multiple T-Tests?
The most common question is: why not just compare groups two at a time using t-tests? The answer comes down to error accumulation. Every time you run a statistical test at the typical 0.05 significance level, you accept a 5% chance of a false positive. That seems reasonable for a single test. But when you run multiple tests on the same data, those chances stack up fast.
The math is straightforward. The probability of getting at least one false positive across N independent tests is 1 − (1 − 0.05)^N. With just two tests, your false positive rate climbs to 9.75%, nearly double the intended 5%. With three groups, you’d need three pairwise comparisons, pushing the rate to about 14%. By the time you’re comparing five or six groups (10 or 15 pairwise comparisons), the chance of at least one spurious “finding” exceeds 40%. This inflated rate is called the family-wise error rate, and it’s the core reason ANOVA exists. A one-way ANOVA runs a single overall test, keeping your false positive rate at 5% regardless of how many groups you’re comparing.
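The error accumulation is easy to compute directly; a minimal sketch (the function name is ours):

```python
# Family-wise error rate: probability of at least one false positive
# across m independent tests, each run at significance level alpha.
def familywise_error_rate(m, alpha=0.05):
    return 1 - (1 - alpha) ** m

for groups in (3, 4, 5, 6):
    m = groups * (groups - 1) // 2  # number of pairwise comparisons
    print(f"{groups} groups -> {m} tests -> FWER = {familywise_error_rate(m):.1%}")
```

For six groups (15 comparisons) the family-wise error rate is already above 50%, which is why "just run more t-tests" breaks down quickly.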
What One-Way ANOVA Actually Tests
The null hypothesis is simple: all group means are equal. If you have three groups, that’s μ₁ = μ₂ = μ₃. The alternative hypothesis is that at least one group mean differs from the others. This is an important distinction. A significant ANOVA result does not mean all groups are different from each other. It means the data are inconsistent with every group having the same average. The scenario where groups 1 and 2 are identical but group 3 is different would still trigger a significant result.
A common mistake is writing the alternative hypothesis as “all means are different.” That’s too narrow. It would miss cases like μ₁ = 5, μ₂ = 5, μ₃ = 10, where two groups are equal but one differs. The correct framing: “the population means are not all equal.”
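That exact scenario, two equal groups and one that differs, can be illustrated with SciPy’s `f_oneway` (the numbers here are made up):

```python
from scipy import stats

# Hypothetical measurements for three groups; group1 and group2 are
# drawn to look alike, group3 is shifted upward.
group1 = [20.1, 21.5, 19.8, 22.0, 20.6]
group2 = [20.3, 21.0, 19.5, 21.8, 20.9]
group3 = [24.2, 25.1, 23.8, 24.9, 25.5]

# H0: all three population means are equal.
f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p rejects "all means equal" -- driven here entirely by
# group3, even though group1 and group2 are nearly identical.
```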
The Four Assumptions You Need to Check
One-way ANOVA produces reliable results only when four conditions are met. Violating them can lead to misleading p-values.
- Independence: Each observation must be unrelated to every other observation. Measurements from one participant shouldn’t influence another’s. If you’re measuring the same people multiple times, you need a repeated-measures approach instead.
- Continuous outcome variable: The thing you’re measuring (your dependent variable) needs to be on an interval or ratio scale, meaning the difference between values is meaningful. Test scores, blood pressure readings, and reaction times all qualify. Categories or rankings do not.
- Normal distribution: The data within each group should be approximately normally distributed. With sample sizes above 30 per group, ANOVA is robust enough to handle moderate departures from normality. With smaller samples, you can check using a Shapiro-Wilk test or by visually inspecting histograms and Q-Q plots.
- Equal variances across groups: The spread of data in each group should be roughly similar. This is called homogeneity of variance. You can test it formally using Levene’s test, which works well even when your data aren’t perfectly normal. If you’re confident your data are normally distributed, Bartlett’s test has slightly more power. When this assumption is violated, Welch’s ANOVA is a solid alternative.
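The last two checks are one-liners in SciPy; a sketch with made-up data:

```python
from scipy import stats

# Hypothetical data for three groups with similar spreads.
a = [12.1, 13.4, 11.8, 12.9, 13.1, 12.5]
b = [14.2, 13.9, 14.8, 14.1, 13.7, 14.5]
c = [11.9, 12.6, 13.3, 12.2, 12.8, 13.0]

# Levene's test: H0 is that all groups have equal variance.
# A small p-value means the homogeneity assumption is in trouble.
stat, p = stats.levene(a, b, c)
print(f"Levene W = {stat:.3f}, p = {p:.3f}")

# Shapiro-Wilk normality check, run per group: H0 is normality.
w, p_norm = stats.shapiro(a)
print(f"Shapiro-Wilk W = {w:.3f}, p = {p_norm:.3f}")
```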
Real-World Scenarios
One-way ANOVA fits any design where a single categorical variable (the “factor”) divides subjects into groups and you’re measuring a continuous outcome. A pharmaceutical company testing whether three dosages of a pain reliever produce different levels of pain reduction would use one-way ANOVA with dosage level as the factor and pain score as the outcome. An educator comparing average test performance across four teaching methods uses one-way ANOVA with teaching method as the factor.
In clinical research, common applications include comparing blood glucose levels across three diet plans, comparing recovery times for patients receiving different types of physical therapy, or comparing cognitive test scores among groups exposed to different sleep durations. The key pattern is always the same: one factor, three or more groups, one continuous measurement.
If your design involves two factors simultaneously (say, both dosage and age group), you’d need a two-way ANOVA instead. If you’re comparing only two groups, an independent-samples t-test is sufficient: with exactly two levels, the ANOVA F statistic equals the square of the t statistic, so the two tests give identical p-values.
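The two-group equivalence is easy to verify numerically (made-up data):

```python
from scipy import stats

x = [5.1, 4.8, 5.6, 5.0, 4.7, 5.3]
y = [5.9, 6.2, 5.7, 6.0, 6.4, 5.8]

t_stat, t_p = stats.ttest_ind(x, y)   # two-sided, pooled variances
f_stat, f_p = stats.f_oneway(x, y)    # one-way ANOVA with two levels

# With exactly two groups, F = t^2 and the p-values coincide.
print(f"t^2 = {t_stat**2:.4f}, F = {f_stat:.4f}")
print(f"t-test p = {t_p:.6f}, ANOVA p = {f_p:.6f}")
```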
What to Do After a Significant Result
A significant ANOVA tells you that at least one group differs, but not which one. To find out where the differences lie, you need a post-hoc test. Three are commonly used, each suited to a different situation.
Tukey’s HSD (honestly significant difference) is the best choice when you want to compare every group to every other group, which is the most common scenario. It’s designed specifically for all pairwise comparisons and gives exact results when group sizes are equal.
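A sketch of all-pairwise comparisons using SciPy’s `tukey_hsd` (available in SciPy 1.8 and later), with made-up data in which only the third group differs:

```python
from scipy import stats

# Hypothetical data: a and b are alike, c is shifted upward.
a = [10.2, 9.8, 10.5, 10.1, 9.9]
b = [10.0, 10.3, 9.7, 10.4, 10.2]
c = [12.1, 11.8, 12.4, 12.0, 11.9]

# Overall ANOVA first; a significant result justifies the post-hoc step.
f_stat, p = stats.f_oneway(a, b, c)
print(f"ANOVA: F = {f_stat:.2f}, p = {p:.4f}")

# Tukey's HSD locates which pairs actually differ.
res = stats.tukey_hsd(a, b, c)
print(res.pvalue)  # matrix of pairwise p-values
# Expect a vs b non-significant, a vs c and b vs c significant.
```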
The Bonferroni correction works by dividing your significance threshold by the number of comparisons you’re making. If you’re running 6 comparisons at the 0.05 level, each individual test uses a threshold of 0.05/6 = 0.0083. It’s a good general-purpose method, though slightly more conservative than Tukey’s when you’re doing all pairwise comparisons.
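The correction itself is simple enough to apply by hand; a sketch with pairwise t-tests on made-up data (the group names are ours):

```python
from itertools import combinations
from scipy import stats

groups = {
    "low":  [4.1, 3.9, 4.3, 4.0, 4.2],
    "mid":  [4.2, 4.0, 4.4, 4.1, 4.3],
    "high": [6.1, 5.9, 6.3, 6.0, 6.2],
}

pairs = list(combinations(groups, 2))
alpha_adj = 0.05 / len(pairs)   # Bonferroni-adjusted threshold per test

results = {}
for g1, g2 in pairs:
    _, p = stats.ttest_ind(groups[g1], groups[g2])
    results[(g1, g2)] = p
    print(f"{g1} vs {g2}: p = {p:.4f}, significant = {p < alpha_adj}")
```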
Scheffé’s method is the most conservative option but also the most flexible. It protects against false positives for all possible contrasts between means, not just simple pairwise comparisons. That makes it useful when you want to test more complex combinations, like whether the average of groups 1 and 2 together differs from group 3. The trade-off is reduced statistical power, meaning you need larger differences to reach significance.
Measuring How Big the Difference Is
A significant p-value tells you that group differences are unlikely to be due to chance, but it says nothing about whether those differences are practically meaningful. For that, you need an effect size measure. The most common one for ANOVA is eta-squared (η²), which represents the proportion of total variation in your data that’s explained by group membership.
The standard benchmarks: an η² of 0.01 is a small effect (group membership explains about 1% of the variation), 0.06 is medium, and 0.14 is large. A study comparing three exercise programs might find a statistically significant difference in weight loss with an η² of 0.02, meaning the type of program explains only 2% of the variation in outcomes. That’s technically significant but probably not important enough to change clinical recommendations. Reporting effect size alongside your p-value gives a much more complete picture of your results.
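Eta-squared isn’t returned by SciPy’s `f_oneway`, but it’s straightforward to compute from the sums of squares; a minimal sketch with made-up data (the function name is ours):

```python
import numpy as np

def eta_squared(*groups):
    """Proportion of total variation explained by group membership:
    SS_between / SS_total."""
    all_vals = np.concatenate(groups)
    grand_mean = all_vals.mean()
    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = ((all_vals - grand_mean) ** 2).sum()
    return ss_between / ss_total

g1 = [3.0, 2.5, 3.5, 2.8, 3.2]
g2 = [3.1, 2.9, 3.4, 2.7, 3.3]
g3 = [4.0, 3.8, 4.3, 3.9, 4.1]
print(f"eta^2 = {eta_squared(g1, g2, g3):.3f}")
```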
When to Choose a Different Test
One-way ANOVA isn’t always the right call. If your data badly violate the normality assumption and your sample sizes are small, the Kruskal-Wallis test is a nonparametric alternative that doesn’t require normally distributed data. If you’re measuring the same subjects under multiple conditions (before treatment, during, and after), a repeated-measures ANOVA accounts for the fact that observations from the same person are correlated. And if your outcome variable is categorical rather than continuous, such as whether patients improved or didn’t, you’d use a chi-square test instead.
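The nonparametric route follows the same call pattern; a sketch of Kruskal-Wallis with made-up data:

```python
from scipy import stats

# Hypothetical data for three groups.
a = [1.2, 1.5, 1.1, 1.8, 1.3]
b = [2.1, 2.4, 2.0, 2.3, 2.6]
c = [5.2, 5.8, 5.5, 5.1, 6.0]

# Kruskal-Wallis: rank-based analogue of one-way ANOVA, no normality
# assumption. H0: all groups come from the same distribution.
h_stat, p = stats.kruskal(a, b, c)
print(f"H = {h_stat:.2f}, p = {p:.4f}")
```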
The decision tree is simple. Count your groups: if it’s exactly two, use a t-test. Count your factors: if it’s more than one, use a two-way or multi-factor ANOVA. Check your assumptions: if normality fails, consider Kruskal-Wallis. If everything lines up with three or more groups, one factor, and a continuous outcome that meets the assumptions, one-way ANOVA is the correct and most powerful choice.

