How to Interpret One-Way ANOVA Results Step by Step

A one-way ANOVA tells you whether the averages of three or more groups are different enough to rule out random chance. The result comes as an F-statistic and a p-value, but the output table contains several other numbers that, together, paint a fuller picture of your data. Here’s how to read each piece and what to do once you have your answer.

What the ANOVA Is Actually Testing

The one-way ANOVA tests a single question: are all group means equal, or is at least one group different? The null hypothesis says every group comes from the same population and any differences you see are just noise. The alternative hypothesis says at least one group mean differs from the others. Notice what it does not tell you: it won’t say which group is different or by how much. It only flags that a difference exists somewhere.
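As a quick illustration, SciPy's `f_oneway` function runs this entire test in one call. The three score lists below are made up for the example; substitute your own groups.

```python
from scipy import stats

# Three hypothetical groups of scores (any number of groups >= 2 works)
group_a = [23, 25, 21, 27, 24]
group_b = [30, 28, 33, 29, 31]
group_c = [22, 26, 24, 25, 23]

# f_oneway returns the F-statistic and the p-value in one call
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

A small p-value here only tells you the three means are not all equal; it does not say which group stands out, which is exactly the limitation described above.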

Reading the ANOVA Table Row by Row

Most software produces a table with two main rows: one labeled “Between Groups” (sometimes called “Model” or “Factor”) and one labeled “Within Groups” (sometimes called “Error” or “Residual”). Each row contains a Sum of Squares, degrees of freedom, and a Mean Square. Understanding these in order makes the F-statistic intuitive rather than mysterious.

Sum of Squares (SS). The between-groups SS measures how far each group’s average is from the overall average of all your data. A large number means the groups sit far apart. The within-groups SS measures how spread out individual observations are inside their own groups. A large number means there’s a lot of variability that has nothing to do with which group someone belongs to.

Degrees of freedom (df). For the between-groups row, degrees of freedom equal the number of groups minus one. If you’re comparing four treatments, that’s 3. For the within-groups row, degrees of freedom equal the total number of observations minus the number of groups. So 40 participants across four groups gives you 36. These numbers matter because they scale the sums of squares into fair comparisons.

Mean Square (MS). This is simply the sum of squares divided by its degrees of freedom. The between-groups mean square (MSB) equals SS(Between) divided by (number of groups minus 1). The within-groups mean square (MSE) equals SS(Within) divided by (total observations minus number of groups). Mean squares convert raw variability into a per-unit measure so the two sources can be compared directly.
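The table quantities are simple enough to compute by hand. The sketch below, using only the standard library and hypothetical data, builds each one in the order just described:

```python
from statistics import mean

# Hypothetical data: three groups of five observations each
groups = [
    [23, 25, 21, 27, 24],
    [30, 28, 33, 29, 31],
    [22, 26, 24, 25, 23],
]

all_obs = [x for g in groups for x in g]
grand_mean = mean(all_obs)

# Between-groups SS: distance of each group mean from the grand mean,
# weighted by the group's size
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Within-groups SS: spread of observations around their own group mean
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1            # k - 1
df_within = len(all_obs) - len(groups)  # N - k

ms_between = ss_between / df_between
ms_within = ss_within / df_within

print(f"SSB = {ss_between:.1f}, SSW = {ss_within:.1f}")
print(f"MSB = {ms_between:.2f}, MSE = {ms_within:.2f}")
```

Dividing `ms_between` by `ms_within` at this point gives the F-statistic discussed in the next section.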

The F-Statistic: What It Means

The F-statistic is the ratio of the between-groups mean square to the within-groups mean square. In plain terms, it compares the variation between your groups to the variation within your groups. An F near 1 means the groups differ about as much as you’d expect from random noise alone. The further F climbs above 1, the stronger the evidence that something real is driving the group differences.

For example, if your ANOVA table shows F = 3.629 with degrees of freedom of 2 and 87, you’d compare that to the critical value for your chosen significance level. At an alpha of 0.05, the critical F-value for those degrees of freedom is about 3.10. Because 3.629 exceeds 3.10, you’d reject the null hypothesis. In practice, you rarely need to look up critical values yourself because your software reports the exact p-value.
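If you do want the critical value, the inverse CDF of the F distribution gives it directly. A minimal sketch using SciPy, plugging in the example numbers above:

```python
from scipy.stats import f

f_observed = 3.629
df_between, df_within = 2, 87
alpha = 0.05

# Critical value: the point beyond which only alpha of the
# F distribution's probability mass lies
f_critical = f.ppf(1 - alpha, df_between, df_within)
print(f"Critical F = {f_critical:.2f}")

reject_null = f_observed > f_critical
print("Reject null:", reject_null)
```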

Interpreting the P-Value

The p-value tells you the probability of seeing an F-statistic this large (or larger) if all groups truly had the same mean. A small p-value means that outcome is unlikely under the null hypothesis, so you reject it. The standard threshold in most fields is 0.05, meaning you accept a 5% chance of declaring a difference when none actually exists. Some fields use a stricter cutoff of 0.01 for extra confidence.

If your p-value is below your chosen alpha, the result is statistically significant. You can say that the group means are not all equal. If the p-value is above alpha, you do not have enough evidence to conclude the groups differ. That doesn’t prove the groups are identical; it means your data can’t distinguish any real difference from noise given your sample size.
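The same F distribution gives the exact p-value. Using SciPy's survival function with the example's F-statistic and degrees of freedom:

```python
from scipy.stats import f

f_observed = 3.629
df_between, df_within = 2, 87

# sf (survival function) gives P(F >= observed) under the null
p_value = f.sf(f_observed, df_between, df_within)
print(f"p = {p_value:.4f}")

alpha = 0.05
significant = p_value < alpha
print("Significant at 0.05:", significant)
```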

Effect Size: How Big Is the Difference?

A significant p-value tells you a difference exists but says nothing about whether it’s meaningful. Effect size fills that gap. The most common effect size for ANOVA is eta-squared (η²), which represents the proportion of total variability in your data that’s explained by group membership. You can calculate it by dividing the between-groups sum of squares by the total sum of squares.

Cohen’s widely used benchmarks for eta-squared are:

  • Small effect: η² around 0.01, meaning group membership explains about 1% of the variation
  • Medium effect: η² around 0.06, explaining about 6%
  • Large effect: η² around 0.14, explaining 14% or more

A study could produce a highly significant p-value with a tiny effect size, especially with large samples. That combination means the difference is real but practically trivial. Reporting effect size alongside your p-value gives a much more honest picture of your results. Some researchers prefer omega-squared, which uses the same benchmarks but corrects for the slight upward bias of eta-squared, particularly in small samples.
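Both effect sizes fall straight out of the ANOVA table. The sums of squares below are hypothetical, chosen so that F works out to the 3.63 with df (2, 87) used earlier:

```python
# Hypothetical sums of squares from an ANOVA table
ss_between = 72.6
ss_within = 870.0
ss_total = ss_between + ss_within

df_between = 2
n_total = 90
df_within = n_total - 3          # N - k, with k = 3 groups
ms_within = ss_within / df_within

# Eta-squared: share of total variability explained by group membership
eta_sq = ss_between / ss_total

# Omega-squared: same idea, with a small-sample bias correction
omega_sq = (ss_between - df_between * ms_within) / (ss_total + ms_within)

print(f"eta^2 = {eta_sq:.3f}, omega^2 = {omega_sq:.3f}")
```

Note that omega-squared comes out a little smaller than eta-squared, which is exactly the bias correction at work.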

Checking Assumptions Before You Trust the Results

An ANOVA result is only reliable if three assumptions hold. Before interpreting your output, verify each one.

Independence. Each observation must be unrelated to every other observation. This is a design issue, not something you test statistically. If the same person appears in two groups, or if participants influenced each other’s responses, the assumption is violated.

Normality. The data within each group should be roughly normally distributed. You can check this visually with a histogram or formally with a Shapiro-Wilk test. ANOVA is fairly robust to mild departures from normality, especially with larger samples, but severely skewed data can distort results.

Equal variances. The spread of scores should be similar across groups. Levene’s test is the most common check for this. If it comes back significant (p < 0.05), your groups have unequal variances and you may need a corrected version of the test, such as Welch’s ANOVA, rather than the standard one.
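SciPy covers both of the statistical checks. A short sketch with hypothetical group data (independence, as noted above, is a design question and has no test):

```python
from scipy import stats

groups = [
    [23, 25, 21, 27, 24],
    [30, 28, 33, 29, 31],
    [22, 26, 24, 25, 23],
]

# Normality: Shapiro-Wilk within each group (p < .05 flags a violation)
for i, g in enumerate(groups, start=1):
    w, p = stats.shapiro(g)
    print(f"Group {i}: Shapiro-Wilk p = {p:.3f}")

# Equal variances: Levene's test across all groups at once
lev_stat, lev_p = stats.levene(*groups)
print(f"Levene p = {lev_p:.3f}")
```

A significant Levene result would point you toward Welch's ANOVA, which is available in some dedicated statistics packages rather than in SciPy itself.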

Post-Hoc Tests: Finding Which Groups Differ

A significant ANOVA result tells you at least one group is different but not which one. Post-hoc tests make pairwise comparisons between every combination of groups while controlling for the increased risk of false positives that comes with multiple comparisons. You should only run post-hoc tests after getting a significant overall ANOVA result.

The right post-hoc test depends on your data:

  • Tukey’s test compares every possible pair of groups and is the default choice when group sizes are equal. The Tukey-Kramer modification handles unequal group sizes.
  • Bonferroni correction is more conservative and works well when you have many comparisons but only expect one or two to be meaningful. It’s stricter than Tukey, so it’s less likely to detect small real differences.
  • Dunnett’s test is designed for studies where you’re comparing several experimental groups against a single control group. It only tests those specific pairs, not every combination.
  • Scheffé’s method is the most conservative option and best suited for exploratory analysis when you don’t have strong predictions about which groups should differ.
  • The Games-Howell test is the usual choice when the equal-variances assumption is violated, as long as each group has at least six observations.

Each post-hoc test produces its own set of p-values, one for each pair of groups. A significant pairwise p-value tells you those two specific groups have different means.
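If you work in Python, SciPy (version 1.8 or later) ships a Tukey HSD implementation. With the same kind of hypothetical data as before:

```python
from scipy.stats import tukey_hsd

group_a = [23, 25, 21, 27, 24]
group_b = [30, 28, 33, 29, 31]
group_c = [22, 26, 24, 25, 23]

# tukey_hsd compares every pair of groups, controlling the
# family-wise error rate across all comparisons
result = tukey_hsd(group_a, group_b, group_c)
print(result)  # table of pairwise mean differences and p-values

# Individual pairwise p-values sit in the result.pvalue matrix
print(f"a vs b: p = {result.pvalue[0][1]:.4f}")
print(f"a vs c: p = {result.pvalue[0][2]:.4f}")
```

In this made-up data, groups a and c have nearly identical means while b sits well above both, so only the pairs involving b come out significant.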

Presenting Your Results

When reporting ANOVA results, the standard format includes the F-statistic, degrees of freedom, p-value, and effect size. A typical write-up looks like: “There was a significant effect of treatment on recovery time, F(2, 87) = 3.63, p = .03, η² = .08.” The numbers in parentheses after F are the between-groups and within-groups degrees of freedom, respectively.

For visual presentation, a plot showing each group mean with error bars (typically plus or minus one standard error) communicates the key finding at a glance. Bar plots with error bars are another common choice. Both highlight where the group means sit relative to each other and how much uncertainty surrounds each estimate. Box plots and dot plots are useful for exploring your data before analysis, but mean-and-error-bar plots tend to be clearer when presenting final ANOVA results to an audience.
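The error-bar values themselves are just each group's mean and standard error of the mean. A small stdlib-only sketch, with hypothetical group labels:

```python
from statistics import mean, stdev
from math import sqrt

groups = {
    "Control": [23, 25, 21, 27, 24],
    "Drug A":  [30, 28, 33, 29, 31],
    "Drug B":  [22, 26, 24, 25, 23],
}

# Standard error of the mean: sample SD divided by sqrt(n)
means = {name: mean(d) for name, d in groups.items()}
sems = {name: stdev(d) / sqrt(len(d)) for name, d in groups.items()}

for name in groups:
    print(f"{name}: mean = {means[name]:.1f} +/- {sems[name]:.2f} (SE)")

# These (mean, SEM) pairs are what you would pass to a plotting call
# such as matplotlib's plt.errorbar(labels, means, yerr=sems, fmt="o")
```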