A two-way ANOVA produces three separate results: a main effect for each of your two independent variables and an interaction effect between them. Interpreting these results means reading each one in the right order, starting with the interaction, because it changes how you should read everything else.
The Three Results You Get
Every two-way ANOVA output includes three F-statistics, each with its own p-value. Suppose you’re testing whether a drug and a type of therapy both affect depression scores. Your output will show:
- Main effect of Factor A (drug): Did the drug make a difference, averaging across all therapy conditions?
- Main effect of Factor B (therapy): Did the therapy make a difference, averaging across all drug conditions?
- Interaction effect (A × B): Did the effect of the drug depend on which therapy patients received?
Each result includes an F-value, degrees of freedom, and a p-value. A typical format looks like F(1, 92) = 361.55, p < .001. The first number in parentheses is the degrees of freedom for the factor, and the second is the degrees of freedom for error (your total sample size minus the number of groups, i.e., cells, in the design).
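Those degrees of freedom follow directly from the design. A quick sketch, using hypothetical cell counts (a 2 × 2 design with 24 patients per cell, so N = 96):

```python
# Degrees of freedom for a balanced two-way ANOVA (hypothetical 2 x 2 design).
a_levels, b_levels, n_per_cell = 2, 2, 24
N = a_levels * b_levels * n_per_cell  # total sample size: 96

df_a = a_levels - 1                   # main effect of factor A
df_b = b_levels - 1                   # main effect of factor B
df_ab = df_a * df_b                   # interaction
df_error = N - a_levels * b_levels    # N minus the number of cells: 92

print(f"F({df_a}, {df_error})")       # the F(1, 92) shape seen in the output
```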
Start With the Interaction
The interaction effect is the most important line in your output, and you should always interpret it first. If the interaction is statistically significant (typically p < .05), it means the effect of one factor changes depending on the level of the other factor. In the drug-and-therapy example, a significant interaction tells you the benefit of the drug was not the same for patients who received therapy versus those who didn’t.
When the interaction is significant, the main effects can be misleading. A main effect represents the average difference across all levels of the other factor, but if the relationship between your variables shifts at different levels, that average may not describe any group accurately. Think of it this way: if a drug works well with therapy but barely works without it, the “average effect of the drug” is a number that doesn’t reflect either situation.
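A small numeric sketch of that last point, using made-up cell means for score improvement:

```python
# Hypothetical cell means (improvement in depression scores).
improvement = {
    ("drug", "therapy"): 12.0, ("drug", "no_therapy"): 2.0,
    ("placebo", "therapy"): 3.0, ("placebo", "no_therapy"): 1.0,
}

# Drug effect within each therapy condition:
with_therapy = improvement[("drug", "therapy")] - improvement[("placebo", "therapy")]           # 9.0
without_therapy = improvement[("drug", "no_therapy")] - improvement[("placebo", "no_therapy")]  # 1.0

# The averaged "main effect" of drug (balanced design) describes neither group:
main_effect = (with_therapy + without_therapy) / 2  # 5.0
```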
When the interaction is not significant, your main effects are straightforward to interpret. A significant main effect for Factor A means Factor A produced a real difference in your outcome, regardless of Factor B, and vice versa.
How to Read Main Effects
A main effect compares the marginal means of one factor, collapsing across all levels of the other factor. If your main effect for drug is significant at F(1, 92) = 361.55, p < .001, that tells you the drug group had meaningfully different scores from the placebo group when you pool everyone together regardless of therapy condition.
One technical detail worth knowing: the main effects in a two-way ANOVA are not the same as running two separate one-way ANOVAs or t-tests. The two-way ANOVA partitions out the interaction effect before estimating main effects, which gives you a cleaner estimate of each factor’s independent contribution. This is one of the core advantages of using a two-way design rather than analyzing each factor separately.
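To make that partitioning concrete, here is a minimal by-hand sums-of-squares decomposition for a balanced 2 × 2 design, using small made-up samples (four scores per cell). The invariant worth noticing is that SS_total splits exactly into SS_A + SS_B + SS_A×B + SS_error:

```python
from itertools import product
from statistics import mean

# Hypothetical scores, four per cell; keys are (drug, therapy) levels.
data = {
    ("drug", "therapy"):    [14, 12, 13, 11],
    ("drug", "none"):       [6, 5, 7, 6],
    ("placebo", "therapy"): [7, 8, 6, 7],
    ("placebo", "none"):    [5, 4, 6, 5],
}
a_levels, b_levels, n = ["drug", "placebo"], ["therapy", "none"], 4

grand = mean(y for ys in data.values() for y in ys)
cell = {k: mean(v) for k, v in data.items()}
a_mean = {a: mean(y for b in b_levels for y in data[(a, b)]) for a in a_levels}
b_mean = {b: mean(y for a in a_levels for y in data[(a, b)]) for b in b_levels}

# Main-effect SS: deviations of each factor's marginal means from the grand mean.
ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
# Interaction SS: what's left in the cell means after removing both main effects.
ss_ab = n * sum((cell[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                for a, b in product(a_levels, b_levels))
# Error SS: within-cell variation around each cell mean.
ss_error = sum((y - cell[k]) ** 2 for k, ys in data.items() for y in ys)
ss_total = sum((y - grand) ** 2 for ys in data.values() for y in ys)

df_error = len(a_levels) * len(b_levels) * (n - 1)
f_a = (ss_a / 1) / (ss_error / df_error)  # each effect has df = 1 in a 2 x 2
```

In practice your statistics package does this for you, but the decomposition shows why the two-way error term is cleaner than a one-way analysis of the same factor: variance attributable to the other factor and to the interaction has already been removed from it.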
What to Do With a Significant Interaction
A significant interaction tells you something interesting is happening, but it doesn’t tell you exactly where. To pin down the pattern, you need a simple effects analysis. This means testing the effect of one factor at each individual level of the other factor. In the drug-and-therapy example, you would test the drug versus placebo comparison separately for the therapy group and separately for the no-therapy group.
Think of it as replacing “by” with “for every level of.” You’re studying the simple effect of drug for every level of therapy. This tells you whether the drug worked in both therapy conditions or only in one. You can also flip it around and test the effect of therapy separately for the drug group and the placebo group.
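As a sketch of what that looks like in code, assuming `scipy` is available and using made-up raw scores (the data and group labels are hypothetical; in a full analysis you would also apply a multiple-comparison correction to these tests):

```python
from scipy.stats import ttest_ind

# Hypothetical depression-improvement scores, split by therapy condition.
scores = {
    ("drug", "therapy"):    [14, 13, 15, 14],
    ("placebo", "therapy"): [7, 8, 7, 8],
    ("drug", "none"):       [6, 5, 7, 6],
    ("placebo", "none"):    [6, 7, 5, 6],
}

# Simple effect of drug *for each level of* therapy:
res_therapy = ttest_ind(scores[("drug", "therapy")], scores[("placebo", "therapy")])
res_none = ttest_ind(scores[("drug", "none")], scores[("placebo", "none")])

print(f"with therapy:    t = {res_therapy.statistic:.2f}, p = {res_therapy.pvalue:.4f}")
print(f"without therapy: t = {res_none.statistic:.2f}, p = {res_none.pvalue:.4f}")
```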
When your factors have more than two levels, simple effects analysis may require post-hoc comparisons to identify which specific group differences are driving the effect. The Bonferroni correction is a reliable choice here, especially with smaller sample sizes. Tukey’s HSD is a popular alternative when you’re comparing every possible pair of means, but recent evidence suggests it can lose its ability to control false positives when sample sizes are small. Limiting yourself to a smaller set of planned comparisons with a Bonferroni correction is often the safer approach.
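The correction itself is simple arithmetic: divide your alpha by the number of comparisons, or equivalently multiply each p-value by that count and cap it at 1. A sketch with hypothetical p-values:

```python
# Bonferroni correction for m planned comparisons (p-values are made up).
alpha, p_values = 0.05, [0.001, 0.02, 0.04]
m = len(p_values)

adjusted_alpha = alpha / m                        # compare raw p-values to this
adjusted_p = [min(p * m, 1.0) for p in p_values]  # or adjust the p-values instead

significant = [p < adjusted_alpha for p in p_values]
print(adjusted_alpha, adjusted_p, significant)
```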
Using Interaction Plots
An interaction plot graphs the group means with one factor on the x-axis and separate lines for each level of the other factor. These plots are the fastest way to see what your interaction actually looks like. If the lines are roughly parallel, there’s likely no interaction. If they converge, diverge, or cross, an interaction is present.
The crossing pattern matters. In an ordinal interaction, the lines may converge or diverge but don’t actually cross within the range of your data. One group is always higher than the other, just by varying amounts. In a disordinal (or crossover) interaction, the lines actually cross: one group scores higher in one condition but lower in the other. A disordinal interaction is a stronger signal that main effects alone can’t describe your data, because the direction of the effect literally reverses.
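For a 2 × 2 design, these three patterns can be told apart directly from the cell means: the lines are parallel when the gap between them is the same at both x-axis levels, and the interaction is disordinal when the gap changes sign. A small sketch (the function name and example means are hypothetical):

```python
def classify_interaction(means):
    """means[i][j]: row i = line (factor A level), column j = x-axis (factor B level)."""
    gap_left = means[0][0] - means[1][0]   # gap between the lines at the first B level
    gap_right = means[0][1] - means[1][1]  # gap at the second B level
    if gap_left == gap_right:
        return "parallel: no interaction"
    if gap_left * gap_right >= 0:
        return "ordinal: lines converge or diverge but do not cross"
    return "disordinal: lines cross, the direction of the effect reverses"

print(classify_interaction([[10, 12], [6, 8]]))   # parallel
print(classify_interaction([[10, 14], [6, 8]]))   # ordinal
print(classify_interaction([[10, 6], [6, 10]]))   # disordinal
```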
Always pair the plot with your statistical tests. Eyeballing a graph can suggest an interaction, but only the F-test and simple effects analysis confirm it.
Checking Effect Size
Statistical significance tells you whether an effect is unlikely to be due to chance, but effect size tells you whether it's meaningful. Most software reports partial eta-squared (ηp²) alongside each F-test. The standard benchmarks from Cohen's widely used guidelines are: 0.01 is a small effect, 0.06 is medium, and 0.14 is large.
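When software reports only F and the degrees of freedom, partial eta-squared can be recovered from them. A small sketch, using the drug main effect from the running example:

```python
def partial_eta_squared(f, df_effect, df_error):
    # Algebraically equivalent to SS_effect / (SS_effect + SS_error).
    return (f * df_effect) / (f * df_effect + df_error)

eta = partial_eta_squared(361.55, 1, 92)
print(round(eta, 2))  # 0.8 -- a large effect by Cohen's benchmarks
```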
A main effect can be highly significant (very low p-value) but explain only a tiny fraction of the variance in your outcome. This is common with large sample sizes, where even trivial differences become statistically detectable. Reporting effect sizes for both main effects and the interaction gives you, and your readers, a clearer picture of which factors actually matter in practical terms.
Check Your Assumptions First
Before interpreting any of these results, your data need to meet the same assumptions as a one-way ANOVA. The outcome variable should be approximately normally distributed within each group, and the variance should be roughly equal across all groups (homogeneity of variance). Most statistical software can run a Levene’s test for equal variances and produce normality plots or a Shapiro-Wilk test.
Minor violations of normality are usually tolerable, especially with larger samples, because the F-test is fairly robust. Unequal variances are more problematic and can inflate your false positive rate. If the largest group variance is more than about three or four times the smallest, consider using a corrected version of the test or a nonparametric alternative.
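That variance check is easy to script; a sketch with hypothetical group scores:

```python
from statistics import variance

# Hypothetical scores per group; compare largest to smallest sample variance.
groups = {
    "drug/therapy":    [14, 12, 13, 11, 15],
    "drug/none":       [6, 5, 7, 6, 8],
    "placebo/therapy": [7, 8, 6, 7, 9],
    "placebo/none":    [5, 4, 6, 5, 7],
}
variances = {name: variance(scores) for name, scores in groups.items()}
ratio = max(variances.values()) / min(variances.values())

# A ratio above roughly 3-4 suggests a corrected test or a nonparametric one.
print(round(ratio, 2), "flag" if ratio > 3 else "ok")
```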
Reporting Your Results
Standard APA format reports each effect as F(df1, df2) = value, p = value, along with the effect size. For the drug-and-therapy example, you’d write something like: “There was a significant main effect of drug, F(1, 92) = 361.55, p < .001, η² = .80, and a significant main effect of therapy, F(1, 92) = 65.59, p < .001, η² = .42. The drug × therapy interaction was also significant, F(1, 92) = 24.30, p < .001, η² = .21.”
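String formatting handles the fussy APA details: two decimals for F, and no leading zero on p or η² since neither can exceed 1. A hypothetical helper:

```python
def no_leading_zero(x, digits=2):
    """APA style: drop the leading zero for statistics bounded by 1 (p, eta-squared)."""
    return f"{x:.{digits}f}".lstrip("0")

def apa_effect(label, df1, df2, f, p, eta_sq):
    p_text = "p < .001" if p < 0.001 else f"p = {no_leading_zero(p, 3)}"
    return f"{label}, F({df1}, {df2}) = {f:.2f}, {p_text}, η² = {no_leading_zero(eta_sq)}"

print(apa_effect("main effect of drug", 1, 92, 361.55, 0.0001, 0.80))
```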
After reporting the omnibus results, describe the interaction in plain language and present the simple effects analysis that breaks it down. Include the group means and standard deviations so readers can see the actual scores, not just the test statistics. A well-labeled interaction plot alongside the numbers makes the pattern immediately clear to anyone scanning your results.

