What Is a Mixed Model ANOVA and How Does It Work?

A mixed model ANOVA is a statistical test that analyzes data with two types of independent variables at once: one that differs between groups of participants (a between-subjects factor) and one that involves the same participants being measured multiple times (a within-subjects factor). It gets the name “mixed” because it combines these two factor types in a single analysis. If you’re studying whether a new therapy reduces anxiety, and you measure both a treatment group and a control group at three time points, a mixed model ANOVA is the tool designed for exactly that kind of data.

Between-Subjects and Within-Subjects Factors

The two building blocks of a mixed model ANOVA are its two factor types, and understanding the difference is essential. A between-subjects factor divides participants into separate groups. Each person belongs to only one group, like “treatment” or “control,” “drug A” or “drug B,” or “beginner” versus “expert.” The comparison here is across different people.

A within-subjects factor measures the same people under multiple conditions or at multiple time points. The classic example is time: you test participants at baseline, then again at one month, then again at three months. Because each person provides data at every time point, the measurements are linked. This dependency between observations is what makes the analysis different from a standard two-way ANOVA, where every data point comes from a separate person.

The mixed model ANOVA accounts for this dependency. It recognizes that repeated measurements from the same person tend to be more similar to each other than measurements from different people, and it adjusts the analysis accordingly.
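This dependency is easy to see in simulated data. The sketch below (hypothetical numbers, not from any real study) gives each person a stable individual level plus measurement noise at two time points; the two measurements end up strongly correlated across people:

```python
# Simulated illustration: repeated measurements from the same person correlate
# because each person carries a stable individual level across time points.
import numpy as np

rng = np.random.default_rng(42)
n = 200                                      # participants
person_level = rng.normal(50, 10, n)         # stable individual differences
time1 = person_level + rng.normal(0, 5, n)   # measurement noise at time 1
time2 = person_level + rng.normal(0, 5, n)   # measurement noise at time 2

# Within-person dependency: scores at the two time points are correlated
r = np.corrcoef(time1, time2)[0, 1]
print(round(r, 2))  # well above zero (about 0.8 with these settings)
```

A standard two-way ANOVA would treat all of these scores as independent, which is exactly the assumption the repeated measurements violate.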

Why the Interaction Effect Matters Most

A mixed model ANOVA produces three F tests: a main effect for the between-subjects factor, a main effect for the within-subjects factor, and an interaction between them. Of these three, the interaction is typically the most important finding.

The interaction tells you whether the groups changed at different rates or in different patterns over time. Imagine a study comparing a treatment group and a control group, measured before and after an intervention. The main effect of time might show that scores improved overall from pretest to posttest. The main effect of group might show that the treatment group had higher scores on average. But neither of those findings answers the real question: did the treatment group improve more than the control group?

That’s what the interaction reveals. On a graph, the interaction shows up as non-parallel lines. If both groups improved at the same rate, the lines tracking their scores over time would run parallel, and the interaction would not be significant. If the treatment group improved while the control group stayed flat, the lines would diverge, producing a significant interaction. When this happens, reporting only the main effects can be misleading. An overall improvement over time might be driven entirely by the treatment group, making the “main effect of time” an inaccurate summary of what actually occurred.

When the interaction is significant, the next step is to break it apart with follow-up comparisons. These typically involve testing the effect of time separately within each group, or comparing the groups at each individual time point, to pinpoint exactly where the differences lie.
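Those follow-up comparisons can be run with ordinary t-tests, as in this sketch on hypothetical pre/post scores (all data simulated; in practice you would also correct for multiple comparisons, e.g. with a Bonferroni adjustment):

```python
# Simple-effects follow-ups on hypothetical pre/post anxiety scores.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pre_treat  = rng.normal(30, 5, 40)
post_treat = pre_treat - rng.normal(8, 3, 40)   # treatment group improves
pre_ctrl   = rng.normal(30, 5, 40)
post_ctrl  = pre_ctrl - rng.normal(0, 3, 40)    # control group stays flat

# Effect of time within each group (paired, because same people twice)
t_treat, p_treat = stats.ttest_rel(pre_treat, post_treat)
t_ctrl,  p_ctrl  = stats.ttest_rel(pre_ctrl,  post_ctrl)

# Group difference at each time point (independent samples)
t_pre,  p_pre  = stats.ttest_ind(pre_treat,  pre_ctrl)
t_post, p_post = stats.ttest_ind(post_treat, post_ctrl)

print(f"time within treatment: p = {p_treat:.4f}")
print(f"time within control:   p = {p_ctrl:.4f}")
```

The choice of test mirrors the design: comparisons within a group reuse the same people (paired test), while comparisons between groups at one time point involve different people (independent test).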

How It Differs From a Repeated Measures ANOVA

A repeated measures ANOVA handles within-subjects factors only. It works when every participant goes through every condition and there are no separate groups to compare. A mixed model ANOVA extends this by adding the between-subjects grouping variable. If your study has groups and repeated measurements, you need the mixed version.

There’s also an important practical difference. The term “mixed model” sometimes refers to a more flexible class of analysis called linear mixed-effects models. These models share the same logic but handle real-world data problems more gracefully. A traditional mixed ANOVA requires complete data from every participant at every time point. If someone misses one measurement, their entire dataset is typically excluded. Linear mixed-effects models can use all available data from each participant, even when some time points are missing. They can also model individual differences in how participants respond over time, making them a better fit for messy, real-world datasets.
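The missing-data difference is concrete. In this toy dataset (invented values), one participant misses one measurement; complete-case analysis drops that person entirely, while a mixed-effects approach could still use their remaining rows:

```python
# Listwise deletion vs. using all available data, on a toy long-format dataset.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "time":    ["t1", "t2", "t3"] * 3,
    "score":   [10, 12, 15, 11, np.nan, 14, 9, 10, 12],  # subject 2 missed t2
})

# Traditional mixed ANOVA: keep only subjects with complete data
complete = df.groupby("subject").filter(lambda s: s["score"].notna().all())
print(complete["subject"].nunique())   # 2 subjects survive listwise deletion

# Linear mixed-effects model: every non-missing row remains usable
usable = df.dropna(subset=["score"])
print(len(usable))                     # 8 of 9 observations retained
```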

Assumptions You Need to Check

A mixed model ANOVA rests on several assumptions. The outcome variable must be continuous and approximately normally distributed within each group at each time point. There should be no extreme outliers in any of the repeated measurements. The groups should have roughly equal variances (homogeneity of variance), and the between-subjects factor needs clearly defined, mutually exclusive groups.

The most distinctive assumption is sphericity, which requires that the variability in the differences between all pairs of time points is roughly equal. For instance, if you measure participants at three time points, the spread of difference scores between time 1 and time 2 should be similar to the spread between time 1 and time 3, and between time 2 and time 3. This is a strong assumption that real data frequently violate.
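You can check this idea by hand: compute the variance of each pairwise difference score. The simulated data below (hypothetical values) is built so that change accumulates across steps, which makes the time 1 vs. time 3 differences more variable than the adjacent pairs — a typical sphericity violation:

```python
# Variances of pairwise difference scores across three time points.
import numpy as np

rng = np.random.default_rng(7)
n = 60
t1 = rng.normal(50, 10, n)
t2 = t1 + rng.normal(2, 4, n)    # one step of change
t3 = t2 + rng.normal(2, 4, n)    # a second, independent step

var_12 = np.var(t1 - t2, ddof=1)
var_13 = np.var(t1 - t3, ddof=1)
var_23 = np.var(t2 - t3, ddof=1)

# Sphericity requires these three variances to be roughly equal;
# here var_13 is built to be larger because two steps of change accumulate.
print(round(var_12, 1), round(var_13, 1), round(var_23, 1))
```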

Mauchly’s test is the standard check for sphericity. If it comes back significant, it means the assumption is violated and your p-values for the within-subjects effects may be artificially low, making you more likely to claim a significant result that isn’t real. Two common corrections exist: the Greenhouse-Geisser and the Huynh-Feldt. Both work by reducing the degrees of freedom used to calculate the p-value, making the test more conservative. The Greenhouse-Geisser correction is the more cautious of the two and is generally the safer default choice.
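The quantity behind both corrections is an epsilon value estimated from the covariance matrix of the repeated measurements. A minimal sketch of the Greenhouse-Geisser version (the standard formula, implemented here from scratch on simulated data): epsilon equals 1 when sphericity holds exactly and shrinks toward 1/(k − 1) as violations worsen, and both degrees-of-freedom terms get multiplied by it.

```python
# Greenhouse-Geisser epsilon, computed from the double-centered covariance
# matrix of the repeated measurements. Illustrative sketch, not library code.
import numpy as np

def greenhouse_geisser_epsilon(data):
    """data: (n_subjects, k_timepoints) array of repeated measurements."""
    k = data.shape[1]
    S = np.cov(data, rowvar=False)           # k x k covariance of time points
    C = np.eye(k) - np.ones((k, k)) / k      # centering matrix
    Sc = C @ S @ C                           # double-centered covariance
    return np.trace(Sc) ** 2 / ((k - 1) * np.sum(Sc ** 2))

rng = np.random.default_rng(1)
# Compound-symmetric data (subject effect + iid noise), so sphericity holds
scores = rng.normal(50, 10, (40, 3)) + rng.normal(0, 5, (40, 1))
eps = greenhouse_geisser_epsilon(scores)

# Corrected degrees of freedom: multiply both df terms by epsilon
k, n = 3, 40
print(round(eps, 2), round(eps * (k - 1), 2), round(eps * (k - 1) * (n - 1), 1))
```

Because this sample satisfies sphericity by construction, the estimated epsilon lands near 1 and the correction barely changes the degrees of freedom.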

Linear mixed-effects models sidestep the sphericity problem entirely. Instead of assuming equal variability across all time-point comparisons, they let you specify different patterns of correlation between repeated measurements, making them more flexible when this assumption is questionable.

A Concrete Example

Suppose a researcher wants to know whether a new teaching method improves student test scores more than a traditional one. Students in School A use the new method; students in School B use the traditional approach. Both groups take tests at the start of the semester, at the midpoint, and at the end.

The between-subjects factor is teaching method (new versus traditional), because each student experiences only one. The within-subjects factor is time (start, midpoint, end), because each student is tested at all three points. The mixed model ANOVA would test whether scores change over the semester (main effect of time), whether one group scores higher overall (main effect of teaching method), and critically, whether the new method produces faster or greater improvement than the traditional one (the interaction).

If the interaction is significant, the researcher would then run follow-up tests to clarify the pattern. Perhaps both groups started at similar levels, but by the end of the semester, the new-method group pulled ahead. That specific pattern only becomes visible when you dig into the interaction rather than relying on the main effects alone.
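For a balanced design like this one, the whole analysis can be computed from sums of squares. The sketch below simulates the two-school study (all numbers invented: 30 students per school, scores climbing faster under the new method) and derives the three F statistics from scratch. It is illustrative only; a real analysis would use a statistics package and check assumptions first.

```python
# From-scratch mixed ANOVA for a balanced 2 (method) x 3 (time) design.
import numpy as np

rng = np.random.default_rng(3)
g, n, k = 2, 30, 3                      # groups, students per group, time points
subj = rng.normal(0, 5, (g, n, 1))      # stable student-level differences
noise = rng.normal(0, 4, (g, n, k))
trend = np.array([[0.0, 4.0, 8.0],      # new method: scores climb quickly
                  [0.0, 1.0, 2.0]])     # traditional: scores barely move
y = 70 + subj + trend[:, None, :] + noise   # shape (g, n, k)

gm = y.mean()
ss_subjects = k * ((y.mean(axis=2) - gm) ** 2).sum()
ss_group    = n * k * ((y.mean(axis=(1, 2)) - gm) ** 2).sum()
ss_subj_in_grp = ss_subjects - ss_group          # error term for the group test
ss_time     = g * n * ((y.mean(axis=(0, 1)) - gm) ** 2).sum()
ss_cells    = n * ((y.mean(axis=1) - gm) ** 2).sum()
ss_interact = ss_cells - ss_group - ss_time
ss_error    = ((y - gm) ** 2).sum() - ss_subjects - ss_time - ss_interact

df_group, df_sig = g - 1, g * (n - 1)
df_time, df_int, df_err = k - 1, (g - 1) * (k - 1), g * (n - 1) * (k - 1)

F_group = (ss_group / df_group) / (ss_subj_in_grp / df_sig)
F_time  = (ss_time / df_time) / (ss_error / df_err)
F_int   = (ss_interact / df_int) / (ss_error / df_err)
print(f"F_group({df_group},{df_sig}) = {F_group:.2f}")
print(f"F_time({df_time},{df_err}) = {F_time:.2f}")
print(f"F_interaction({df_int},{df_err}) = {F_int:.2f}")
```

Note the two separate error terms: the group effect is tested against subject-to-subject variability within groups, while time and the interaction are tested against the within-subject residual.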

Reading the Output

Most statistical software produces two ANOVA tables for a mixed model analysis: one for within-subjects effects and one for between-subjects effects. Both tables contain the same key columns. The F-statistic is the test value that tells you whether an effect is large relative to the variability in your data. The p-value tells you the probability of seeing a result this extreme if there were truly no effect. And partial eta-squared is the most commonly reported effect size, representing the proportion of variance in your outcome that’s explained by each factor after accounting for the other factors.

When reporting results, you’d state the F-statistic with its degrees of freedom, the p-value, and partial eta-squared for each of the three effects. A typical write-up might read: “The interaction between group and time was significant (F(2, 148) = 7.01, p = .001, partial eta-squared = .087).” The degrees of freedom for any main effect equal the number of levels of that factor minus one. For the interaction, you multiply the degrees of freedom of the two main effects together.
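That bookkeeping is simple enough to write down directly. In the sketch below, a hypothetical total of 76 participants is chosen because it is consistent with the reported F(2, 148) (the write-up above does not state the sample size):

```python
# Degrees-of-freedom formulas for a balanced mixed ANOVA.
def mixed_anova_dfs(n_groups, n_timepoints, n_total):
    df_group = n_groups - 1
    df_time = n_timepoints - 1
    df_interaction = df_group * df_time
    df_within_error = (n_total - n_groups) * df_time  # error for time & interaction
    return df_group, df_time, df_interaction, df_within_error

# 2 groups, 3 time points, 76 participants reproduces the reported df pair
print(mixed_anova_dfs(2, 3, 76))  # (1, 2, 2, 148)
```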

Partial eta-squared values are useful for gauging practical significance beyond statistical significance. A significant p-value tells you an effect probably exists; partial eta-squared tells you how large it is. Values around .01 are generally considered small, .06 medium, and .14 or above large, though these benchmarks vary by field. Keep in mind that partial eta-squared can be influenced by study design choices like how many time points you include or how reliably you measure your outcome, so comparing values across very different studies requires caution.
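Partial eta-squared can also be recovered from a reported F statistic and its degrees of freedom, since eta_p² = SS_effect / (SS_effect + SS_error) and F = (SS_effect/df1) / (SS_error/df2). Applying that identity to the example write-up's F(2, 148) = 7.01 reproduces its effect size:

```python
# Recovering partial eta-squared from a reported F and its degrees of freedom.
def partial_eta_squared(f_stat, df1, df2):
    return (f_stat * df1) / (f_stat * df1 + df2)

pes = partial_eta_squared(7.01, 2, 148)
print(round(pes, 3))  # 0.087, matching the reported value
```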