MANOVA (Multivariate Analysis of Variance) results tell you whether groups differ across multiple outcome variables simultaneously. Interpretation follows a specific sequence: check assumptions first, read the multivariate test statistics, assess effect size, then run follow-up tests to pinpoint where the differences lie. If you skip steps or read the output out of order, you risk drawing conclusions your data doesn’t actually support.
What MANOVA Actually Tests
MANOVA is the multivariate version of ANOVA. Where a standard ANOVA tests whether groups differ on a single outcome, MANOVA tests whether groups differ when you consider multiple outcomes together. It does this by comparing vectors of means (called centroids) for each group against the overall grand centroid, similar to how ANOVA compares group means to the grand mean.
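A group centroid is simply the vector of means, one mean per dependent variable. A minimal sketch in plain Python, using made-up scores on two outcome variables (the group names and numbers are illustrative only):

```python
# Toy data: three groups, each subject measured on two dependent variables (DV1, DV2).
groups = {
    "control": [(4.0, 10.0), (5.0, 12.0), (6.0, 11.0)],
    "treat_a": [(7.0, 14.0), (8.0, 15.0), (9.0, 13.0)],
    "treat_b": [(5.0, 18.0), (6.0, 19.0), (7.0, 20.0)],
}

def centroid(rows):
    """Mean vector across rows: one mean per dependent variable."""
    n = len(rows)
    return tuple(sum(r[j] for r in rows) / n for j in range(len(rows[0])))

# One centroid per group, plus the grand centroid MANOVA compares them against.
group_centroids = {g: centroid(rows) for g, rows in groups.items()}
grand_centroid = centroid([r for rows in groups.values() for r in rows])
```

Here each group's centroid is a point in two-dimensional outcome space; MANOVA asks whether those points are farther from the grand centroid than chance would predict.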
The key advantage over running several separate ANOVAs is that MANOVA accounts for the correlations between your dependent variables. Running multiple ANOVAs inflates your chance of a false positive with each additional test. MANOVA handles all the dependent variables in one shot, preserving your overall error rate while also detecting patterns of group differences that individual ANOVAs might miss entirely.
Check Assumptions Before Interpreting Anything
Before you look at the main results, your output likely includes assumption tests. Two matter most: multivariate normality and homogeneity of covariance matrices.
Box’s M test checks whether the covariance matrices are equal across your groups. This test is notoriously sensitive, so researchers typically use a stricter significance threshold of .001 rather than the usual .05. Because Box’s M reacts to non-normality as well as to genuinely unequal covariance matrices, a significant result (p < .001) suggests that the equal covariance assumption, the normality assumption, or both may be violated, which can distort your main results. A non-significant Box’s M gives you no reason to reject the assumption that covariance matrices are equal, and you can proceed with more confidence.
Levene’s test, which you may also see in your output, checks the simpler question of whether variances are equal for each dependent variable individually. If Box’s M is significant, Levene’s test can help you figure out whether the problem is with the variances, the covariances, or both.
The Four Multivariate Test Statistics
Most software outputs four test statistics at once: Wilks’ Lambda, Pillai’s Trace, Hotelling’s Trace, and Roy’s Largest Root. Each one tests the same null hypothesis (that the group centroids are equal) but uses a slightly different mathematical approach. When assumptions are met and sample sizes are balanced, all four typically agree. When they disagree, your choice of which to report matters.
Wilks’ Lambda is the most commonly reported. It ranges from 0 to 1, where values closer to 0 indicate larger group differences. A small Lambda with a significant p-value means the groups differ meaningfully across your dependent variables. Wilks’ Lambda tends to perform well with normally distributed data, particularly when you have three or more dependent variables or when variances are unequal.
Pillai’s Trace is the most robust option when assumptions are questionable. It holds up better with unequal sample sizes, non-normal distributions, and violations of the equal covariance assumption. Many applied researchers default to Pillai’s Trace when Box’s M test is significant. Larger values indicate greater group differences.
Hotelling’s Trace works well when group differences are concentrated along a single dimension (one way in which the groups differ). Larger values indicate bigger differences.
Roy’s Largest Root is the most powerful when groups truly differ on only one underlying dimension, but it is also the most sensitive to assumption violations. It can perform well with normally distributed data and equal variances, but it’s the riskiest choice when conditions aren’t ideal.
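Despite their different behavior, all four statistics are functions of the same quantities: the eigenvalues of E⁻¹H, the product of the inverse error matrix and the hypothesis matrix. Given those eigenvalues, the formulas are short. A sketch with made-up eigenvalues (note that software packages differ on whether Roy's statistic is reported as the largest eigenvalue itself or as λ/(1 + λ); check your package's documentation):

```python
def multivariate_stats(eigenvalues):
    """Compute the four MANOVA test statistics from the
    eigenvalues of E^-1 H (assumed already computed)."""
    wilks = 1.0
    for lam in eigenvalues:
        wilks /= (1.0 + lam)                       # product of 1/(1 + lambda)
    pillai = sum(lam / (1.0 + lam) for lam in eigenvalues)
    hotelling = sum(eigenvalues)
    roy = max(eigenvalues)                         # largest-eigenvalue convention
    return wilks, pillai, hotelling, roy

# Hypothetical eigenvalues for illustration:
wilks, pillai, hotelling, roy = multivariate_stats([0.5, 0.25])
```

The formulas make the interpretive differences concrete: Wilks' Lambda multiplies across all dimensions (so it shrinks toward 0 as differences grow), Pillai's and Hotelling's Trace sum across dimensions, and Roy's statistic looks only at the single strongest dimension.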
Which Statistic to Report
If your data are reasonably normal, sample sizes are balanced, and Box’s M is non-significant, Wilks’ Lambda is a safe default. If you have unequal group sizes, non-normal data, or a significant Box’s M test, go with Pillai’s Trace. Research comparing these statistics across different conditions consistently finds that Pillai’s Trace maintains appropriate error rates under the widest range of violations, while Wilks’ Lambda is a strong performer when variances are unequal but data are closer to normal.
Reading the P-Value and F-Statistic
Each multivariate test statistic gets converted to an approximate F-statistic with degrees of freedom and a p-value. This is the core of your interpretation. If p < .05 (or whatever alpha level you set), you reject the null hypothesis and conclude that the groups differ on the combined set of dependent variables.
A typical line in your output might read: Wilks’ Lambda = .76, F(6, 144) = 3.48, p = .003. This tells you there is a statistically significant multivariate effect. The groups are not the same when you consider all your outcome variables together. But it does not tell you which specific variables drive the difference, or which groups differ from which. That requires follow-up tests.
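For Wilks' Lambda, the conversion to an approximate F is usually done with Rao's approximation. A pure-arithmetic sketch (the function and the example numbers are illustrative, not from any particular dataset; here p is the number of dependent variables, df_h the hypothesis degrees of freedom, and df_e the error degrees of freedom):

```python
import math

def wilks_to_f(lam, p, df_h, df_e):
    """Rao's F approximation for Wilks' Lambda.
    p: number of dependent variables; df_h: hypothesis df (groups - 1);
    df_e: error df (N - groups). Returns (F, df1, df2)."""
    denom = p**2 + df_h**2 - 5
    t = math.sqrt((p**2 * df_h**2 - 4) / denom) if denom > 0 else 1.0
    df1 = p * df_h
    w = df_e + df_h - (p + df_h + 1) / 2
    df2 = w * t - (p * df_h - 2) / 2
    lam_t = lam ** (1.0 / t)
    f = (1.0 - lam_t) / lam_t * (df2 / df1)
    return f, df1, df2

# Two DVs, three groups, N = 30 -> df_h = 2, df_e = 27:
f, df1, df2 = wilks_to_f(0.64, p=2, df_h=2, df_e=27)
```

With one dependent variable the approximation reduces to the exact univariate F, which is a useful sanity check on any implementation.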
Assessing Effect Size
A significant p-value tells you the effect is unlikely to be zero. Effect size tells you whether it’s large enough to care about. The most common effect size measure in MANOVA output is partial eta squared, which estimates the proportion of variance in the dependent variables explained by your grouping variable. Conventional benchmarks are:
- 0.01 is considered a small effect
- 0.06 is a medium effect
- 0.14 or larger is a large effect
So if your output shows partial eta squared = .09, your grouping variable explains about 9% of the variance in the dependent variables combined, which falls in the medium range. With large samples, even trivial effects can produce significant p-values, so always check effect size alongside significance.
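The arithmetic behind that number is straightforward: partial eta squared is SS_effect / (SS_effect + SS_error). A minimal sketch with made-up sums of squares chosen to land on the .09 from the example above:

```python
def partial_eta_squared(ss_effect, ss_error):
    """Proportion of variance attributable to the effect:
    SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)

def label_effect(pes):
    """Conventional benchmarks: .01 small, .06 medium, .14 large."""
    if pes >= 0.14:
        return "large"
    if pes >= 0.06:
        return "medium"
    if pes >= 0.01:
        return "small"
    return "negligible"

pes = partial_eta_squared(ss_effect=18.0, ss_error=182.0)  # 18 / 200 = 0.09
```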
Follow-Up Tests After a Significant MANOVA
A significant MANOVA is only the first step. It tells you the groups differ somewhere across your dependent variables, but not where. You need follow-up analyses to find the specific differences.
Follow-Up ANOVAs
The most common approach is to run a separate one-way ANOVA for each dependent variable. This tells you which specific outcome variables show significant group differences. However, running multiple ANOVAs reintroduces the multiple-comparison problem that MANOVA was designed to avoid.
To control for this, apply a Bonferroni correction: divide your alpha level by the number of dependent variables. If you have four dependent variables and use alpha = .05, your adjusted threshold becomes .05 / 4 = .0125. Only consider an individual ANOVA significant if its p-value falls below .0125. This is conservative, but it prevents you from claiming differences that are really just statistical noise.
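The correction itself is one line of arithmetic. A sketch with hypothetical p-values from four follow-up ANOVAs (the variable names and p-values are made up for illustration):

```python
alpha = 0.05
# Hypothetical p-values, one follow-up ANOVA per dependent variable:
anova_p = {"dv1": 0.004, "dv2": 0.030, "dv3": 0.011, "dv4": 0.450}

# Bonferroni: divide alpha by the number of tests.
adjusted_alpha = alpha / len(anova_p)  # 0.05 / 4 = 0.0125
significant = {dv: p for dv, p in anova_p.items() if p < adjusted_alpha}
```

Note that dv2 (p = .030) would count as significant at the uncorrected .05 level but does not survive the corrected threshold; that is exactly the kind of claim the correction exists to block.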
Discriminant Function Analysis
A more sophisticated follow-up is Discriminant Function Analysis (DFA). While follow-up ANOVAs examine each dependent variable in isolation, DFA identifies the linear combinations of dependent variables that best separate the groups. This preserves the multivariate nature of the original analysis and can reveal that it’s a specific pattern across variables, not any single variable alone, that distinguishes the groups. A systematic review of published research found that DFA was the most common multivariate follow-up to MANOVA, though it appeared in only about 5% of studies that reported post-hoc tests, with many researchers opting for the simpler ANOVA route instead.
Post-Hoc Pairwise Comparisons
If your grouping variable has more than two levels and a follow-up ANOVA is significant, you still need pairwise comparisons (like Tukey or Bonferroni) to determine which specific groups differ from each other. This is the same logic as in univariate ANOVA: a significant F-test tells you the groups aren’t all equal, but not which pairs are different.
Reporting MANOVA Results
In APA format, you report the test statistic name, its value, the approximate F-ratio with both degrees of freedom, the p-value, and the effect size. The format mirrors how you report ANOVA results but specifies which multivariate statistic you used.
For example: “There was a significant effect of treatment on the combined dependent variables, Pillai’s Trace = .34, F(6, 144) = 5.43, p < .001, partial η² = .10.” Follow this with the results of your follow-up ANOVAs for each dependent variable, including their own F-values, degrees of freedom, p-values (compared against the Bonferroni-corrected threshold), and effect sizes.
If the overall MANOVA is not significant, you generally stop there. Running follow-up ANOVAs after a non-significant multivariate test inflates your false-positive rate and undermines the reason you ran a MANOVA in the first place.
Common Pitfalls
The most frequent mistake is ignoring the multivariate test entirely and jumping straight to individual ANOVAs. If you planned to examine each variable separately all along, you didn’t need MANOVA. The whole point is to evaluate the variables as a set first.
Another common error is including too many dependent variables. MANOVA works best when your dependent variables are moderately correlated (roughly .3 to .7). If they’re uncorrelated, separate ANOVAs are more appropriate. If they’re very highly correlated (above .9), you have redundant variables that add noise without adding information, and you should consider dropping one or combining them.
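A quick way to check is to compute the pairwise correlations among your dependent variables before running the analysis. A pure-Python Pearson correlation sketch (in practice you would use your stats package; the data here are invented):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of floats."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

dv1 = [2.0, 4.0, 5.0, 7.0, 9.0]
dv2 = [3.0, 2.0, 8.0, 6.0, 7.0]
r = pearson_r(dv1, dv2)
# |r| roughly in the .3 to .7 band suggests the pair is a
# reasonable candidate for joint analysis in a MANOVA;
# |r| above about .9 flags redundancy.
```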
Finally, pay attention to your sample size relative to the number of dependent variables and groups. MANOVA requires more observations per cell than ANOVA does. A common guideline is to have at least as many observations in your smallest group as you have dependent variables, though more is always better for statistical power.

