How to Read an ANOVA Table and Interpret Results

An ANOVA table has six columns (the source labels plus five numbers) that work together to answer one question: do the groups you’re comparing have meaningfully different averages, or could the differences be due to random chance? Once you know what each column represents and how the numbers flow from left to right, the whole table clicks into place.

The Rows: Sources of Variation

Before looking at any numbers, start with the rows. A standard one-way ANOVA table has three: Factor (sometimes labeled “Between”), Error (sometimes labeled “Within”), and Total.

The Factor row captures how much the group averages differ from the overall average of all data points combined (the grand mean). If you’re comparing test scores across three teaching methods, this row measures the spread among those three method averages. The Error row captures the natural scatter of individual data points around their own group’s average. This is the variation your factor of interest can’t explain. The Total row is simply the overall variation in your entire dataset, ignoring group membership entirely. It equals the Factor and Error rows added together.

The Columns, Left to Right

Each column builds on the one before it. Reading them in order is the easiest way to follow the logic.

Degrees of Freedom (DF): This column tells you how many independent pieces of information go into each calculation. For the Factor row, the degrees of freedom equal the number of groups minus one. If you have four groups, that’s 3. For the Error row, it’s the total number of observations minus the number of groups. So 40 data points across 4 groups gives you 36. The Total row’s degrees of freedom equal the total number of observations minus one (39 in this example). A quick sanity check: the Factor and Error degrees of freedom should always add up to the Total.
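The degree-of-freedom arithmetic above can be checked in a few lines of Python, using the hypothetical four-group, 40-observation example:

```python
k = 4   # number of groups (hypothetical)
n = 40  # total observations (hypothetical)

df_factor = k - 1  # between-groups DF: 3
df_error = n - k   # within-groups DF: 36
df_total = n - 1   # total DF: 39

# Sanity check from the text: Factor and Error DF must add up to Total.
assert df_factor + df_error == df_total
print(df_factor, df_error, df_total)  # 3 36 39
```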

Sum of Squares (SS): This is the raw measure of variation for each source. SS Between quantifies how far the group means sit from the grand mean, weighted by each group’s size. SS Error quantifies how much individual data points scatter around their own group means. SS Total is the sum of both. Again, the first two rows should add up to the third. A large SS Between relative to SS Error hints that the groups genuinely differ, but you need the next columns to confirm.
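A minimal sketch of the three sums of squares, computed with NumPy on made-up scores for three teaching methods:

```python
import numpy as np

# Hypothetical scores for three teaching methods.
groups = [
    np.array([78.0, 84, 81, 90, 77]),
    np.array([85.0, 92, 88, 95, 91]),
    np.array([70.0, 73, 68, 75, 72]),
]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

# SS Between: spread of the group means around the grand mean,
# weighted by each group's size.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# SS Error: scatter of each point around its own group mean.
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)

# SS Total: scatter of every point around the grand mean.
ss_total = ((all_scores - grand_mean) ** 2).sum()

# The first two should add up to the third (up to rounding).
assert abs((ss_between + ss_error) - ss_total) < 1e-9
```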

Mean Square (MS): A sum of squares by itself is hard to interpret because it depends on how many data points contributed. Mean Square corrects for that by dividing the sum of squares by its degrees of freedom. So MS Between equals SS Between divided by its DF, and MS Error equals SS Error divided by its DF. You’ll notice the Total row has no Mean Square. It isn’t needed because the comparison that matters is between the other two rows.

F-Statistic (F): This is the ratio of MS Between to MS Error. It directly compares the variation explained by your factor to the unexplained variation. An F value near 1 means the groups differ about as much as you’d expect from random noise alone. The further F climbs above 1, the stronger the evidence that real group differences exist. Only the Factor row gets an F value.
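The Mean Square and F arithmetic can be sketched with hypothetical SS and DF values:

```python
# Hypothetical sums of squares and degrees of freedom.
ss_between, ss_error = 120.0, 380.0
df_between, df_error = 3, 36

# Mean Square: sum of squares divided by its degrees of freedom.
ms_between = ss_between / df_between  # 40.0
ms_error = ss_error / df_error        # about 10.56

# F-statistic: explained variation relative to unexplained variation.
f_stat = ms_between / ms_error        # about 3.79
print(round(f_stat, 2))  # 3.79
```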

P-Value (P): The final column converts the F-statistic into a probability. It answers: if the groups truly had identical population means, how likely would you be to see an F value this large or larger? A p-value below your chosen significance level (most commonly 0.05) means you reject the idea that all group means are equal. A p-value of 0.03, for instance, means there’s only a 3% chance of seeing an F this large or larger if the groups were actually the same.
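If your software reports only F, the p-value can be recovered from the F distribution’s survival function. A sketch with hypothetical F and DF values, assuming SciPy is available:

```python
from scipy import stats

# Hypothetical F-statistic and degrees of freedom.
f_stat = 3.79
df_between, df_error = 3, 36

# Survival function: P(F >= f_stat) under the null hypothesis.
p_value = stats.f.sf(f_stat, df_between, df_error)

# For this hypothetical F, the p-value lands below 0.05.
assert p_value < 0.05
```

For raw data, `scipy.stats.f_oneway` computes the F-statistic and p-value in a single call.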

How to Judge Statistical Significance

The significance level, or alpha, is set before you look at the results. The most common threshold is 0.05, though some fields use 0.01 or 0.10. The rule is straightforward: if your p-value is less than your significance level, you reject the null hypothesis that all group means are equal. If the p-value is greater, you cannot reject it.

A common mistake is treating a p-value just above 0.05 as “almost significant.” The threshold is a line you draw in advance. A p-value of 0.051 and one of 0.30 both lead to the same conclusion: you don’t have enough evidence to say the groups differ at the 0.05 level. That said, a significant p-value only tells you that at least one group differs from at least one other. It doesn’t tell you which groups differ or by how much.

What a Significant Result Does Not Tell You

A low p-value confirms that some difference exists, but it leaves two important questions unanswered. First, which specific groups are different from each other? If you compared four diets and got a significant F-test, you know the diets aren’t all equivalent, but you don’t know whether Diet A beat Diet B specifically. That requires post-hoc tests, such as Tukey’s HSD or Bonferroni-corrected pairwise comparisons, which compare every pair of groups while adjusting for the increased risk of false positives that comes with multiple comparisons.

Second, statistical significance doesn’t tell you whether the difference is practically meaningful. A study with thousands of participants can produce a tiny p-value for a trivially small difference. That’s where effect size comes in.

Measuring Effect Size With Eta-Squared

Many ANOVA outputs include or allow you to calculate eta-squared, which tells you what proportion of the total variation in your data is explained by the factor. The formula is simple: divide SS Between by SS Total. The result ranges from 0 to 1.

General benchmarks for interpreting eta-squared: 0.01 is a small effect, 0.06 is medium, and 0.14 is large. So if your SS Between is 120 and SS Total is 500, eta-squared is 0.24, meaning the factor accounts for 24% of the variation in outcomes. That’s a large effect. Even without these benchmarks, thinking in terms of “percentage of variation explained” gives you practical context that a p-value alone never provides.
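The worked example above reduces to a single division:

```python
# Hypothetical sums of squares from the worked example.
ss_between, ss_total = 120.0, 500.0

# Eta-squared: proportion of total variation explained by the factor.
eta_squared = ss_between / ss_total
print(eta_squared)  # 0.24, i.e. 24% of the variation explained
```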

Reading a Two-Way ANOVA Table

When two factors are tested simultaneously, the table gains additional rows. A two-way ANOVA examining treatment and gender, for example, would have a row for treatment, a row for gender, a row for the interaction between them, and the usual Error and Total rows. Each of the first three rows gets its own SS, MS, F, and p-value.

The interaction row is the one that trips people up. A significant interaction means the effect of one factor depends on the level of the other. In a treatment-by-gender example, you might find that the treatment raises scores for men but lowers them for women. The main effect rows for treatment and gender could both look nonsignificant because those opposing effects cancel out when averaged. The interaction row catches this pattern. When an interaction is significant, interpret the main effects cautiously, because the story isn’t as simple as “treatment helps” or “gender matters.” Instead, the combination of both variables is what drives the outcome.
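A toy numeric sketch of this cancellation, using hypothetical cell means for a 2-by-2 treatment-by-gender design:

```python
# Hypothetical cell means: the treatment raises scores for men
# and lowers them for women by the same amount.
means = {
    ("control", "men"): 50.0,
    ("treated", "men"): 60.0,
    ("control", "women"): 50.0,
    ("treated", "women"): 40.0,
}

# Main effect of treatment: average treated mean minus average control mean.
treated = (means[("treated", "men")] + means[("treated", "women")]) / 2
control = (means[("control", "men")] + means[("control", "women")]) / 2
print(treated - control)  # 0.0 -- the main effect vanishes

# Interaction: the treatment effect differs by gender.
effect_men = means[("treated", "men")] - means[("control", "men")]       # +10
effect_women = means[("treated", "women")] - means[("control", "women")]  # -10
print(effect_men - effect_women)  # 20.0 -- a large interaction
```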

Checking the Assumptions Behind the Table

The numbers in an ANOVA table are only trustworthy if the observations are independent (a matter of study design) and two further assumptions hold: the groups have roughly equal variance, and the residuals (the distances between each data point and its group mean) follow a roughly normal distribution.

Equal variance means the spread of scores within each group is similar. If one group’s data points are tightly clustered and another’s are wildly scattered, the F-test can give misleading results. Levene’s test is the most common formal check for this. It essentially runs its own ANOVA on the absolute deviations from each group mean, asking whether those deviations differ across groups.
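SciPy exposes Levene’s test directly. This sketch feeds it two hypothetical groups with deliberately different spreads; `center='mean'` matches the classic version built on absolute deviations from each group mean:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical groups: one tightly clustered, one widely scattered.
tight = rng.normal(loc=50, scale=2, size=30)
scattered = rng.normal(loc=50, scale=12, size=30)

stat, p = stats.levene(tight, scattered, center="mean")
# A small p-value here flags unequal variances, so the F-test
# in the main ANOVA table may be unreliable.
```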

Normality is checked on the residuals, not on the raw data. A histogram of those residuals should look roughly bell-shaped, and a Q-Q plot (which compares your residuals to what a perfect normal distribution would look like) should fall close to a straight diagonal line. Formal tests like the Shapiro-Wilk test can supplement your visual inspection. ANOVA is fairly robust to mild violations of normality, especially with larger sample sizes, but severe skew or heavy tails can compromise your results.
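A sketch of checking normality on the residuals rather than the raw data, using simulated groups and SciPy’s Shapiro-Wilk test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical groups with different means but a shared normal shape.
groups = [rng.normal(loc=m, scale=5, size=25) for m in (50, 55, 60)]

# Residuals: each point's distance from its own group mean.
# Pooling the raw data instead would mix three bumps together
# and wrongly look non-normal.
residuals = np.concatenate([g - g.mean() for g in groups])

stat, p = stats.shapiro(residuals)
# A large p-value is consistent with (but does not prove) normality.
```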

Most statistical software presents these diagnostics alongside or after the ANOVA table. Getting into the habit of checking them before interpreting the F and p columns will save you from drawing conclusions the data can’t support.