What Is Repeated Measures? A Research Design Explained

Repeated measures is a research design in which the same participants are measured multiple times, either under different conditions or at different time points. Instead of assigning separate groups of people to each condition (a “between-subjects” design), a repeated measures approach tests every participant in every condition. This makes it one of the most efficient and statistically powerful ways to detect real differences in an experiment.

How It Works

In a between-subjects experiment, you might split 60 people into three groups of 20, with each group experiencing a different condition. In a repeated measures design, all 60 people go through all three conditions. Each person serves as their own comparison point, which means individual quirks like natural ability, baseline health, or personality differences get factored out of the analysis. What’s left is a cleaner signal of how the conditions themselves differ.

The simplest everyday example: a taste test where you try three brands of coffee and rate each one. You’re the same person across all three tastings, so differences in your ratings reflect the coffee, not random variation between different groups of tasters.

Why Researchers Prefer It

The biggest advantage is statistical power. Because measurements from the same person are naturally correlated, repeated measures designs can strip away the “noise” of individual differences that would otherwise muddy the results. In technical terms, the error term in the analysis shrinks, making it easier to detect a real effect if one exists. A standard analysis that ignores this correlation would be less sensitive, like trying to hear a conversation in a noisier room.
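The arithmetic behind that shrinking error term can be sketched in a few lines. For two conditions measured on the same people with within-person correlation ρ, the standard deviation of the paired difference is √(σ₁² + σ₂² − 2ρσ₁σ₂), which falls below the independent-groups value whenever ρ is positive. The numbers below are hypothetical, chosen only for illustration:

```python
import math

def diff_sd(sd1, sd2, rho):
    """Standard deviation of (X - Y) when X and Y have correlation rho."""
    return math.sqrt(sd1**2 + sd2**2 - 2 * rho * sd1 * sd2)

# Hypothetical scores with SD 10 in each condition.
between = diff_sd(10, 10, 0.0)  # independent groups: about 14.14
within = diff_sd(10, 10, 0.7)   # same people, rho = 0.7: about 7.75
```

With a within-person correlation of 0.7, the noise against which a condition effect must be detected is roughly halved, which is the power advantage described above.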

The practical payoff is dramatic: you need far fewer participants. With just two follow-up measurements per person, sample size requirements can drop by roughly 44% compared to a single-measurement design. With three follow-up measurements, that reduction reaches 56%, and with four, around 61%. For clinical trials, where recruiting patients is expensive and slow, this efficiency is a major reason repeated measures designs are so common. The FDA recognizes crossover studies, a type of repeated measures design, as valid evidence for drug applications.
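The exact percentages above depend on the analysis model and the strength of the within-person correlation. As a rough illustration of the mechanism only, here is the variance of a per-person mean across k follow-up measurements under a simple equal-correlation assumption (the ρ = 0.5 value is hypothetical, and this toy formula is not the model behind the figures quoted above):

```python
def variance_factor(k, rho):
    """Variance of a per-person mean of k equally correlated measurements,
    relative to the variance of a single measurement."""
    return (1 + (k - 1) * rho) / k

# Hypothetical within-person correlation of 0.5:
for k in (1, 2, 3, 4):
    reduction = 1 - variance_factor(k, 0.5)
    print(f"{k} measurement(s): variance reduced by {reduction:.0%}")
```

The pattern matches the article’s point: each extra measurement per person buys a further reduction, but with diminishing returns.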

Repeated Measures in Medical Research

In medicine, repeated measures show up in two main forms. The first is the crossover trial, where every patient receives each treatment being compared, just in a different order. A 36-week trial comparing two eye injection drugs for diabetic macular edema used this approach with only 56 patients. Each patient received both drugs across three treatment periods, with the order randomized. Because every patient experienced both treatments, the trial could detect meaningful differences with a fraction of the participants a traditional parallel-group trial would need.

The second common form is longitudinal measurement, where the same outcome (blood pressure, pain score, cognitive function) is tracked over time in the same patients. Anesthesia, critical care, and pain research frequently rely on this approach to investigate how outcomes change over weeks or months, and whether treatment groups diverge in those trajectories. The key statistical requirement in both cases is accounting for the correlation between measurements taken from the same person. Ignoring that correlation produces unreliable results.

The Main Drawback: Order Effects

When the same person goes through multiple conditions, the order in which they experience those conditions can skew results. This is the central vulnerability of repeated measures designs, and it takes several forms.

  • Carryover effects: Being tested in one condition changes how a participant responds in a later condition. A drug’s lingering effect, for instance, might still be active when the next treatment begins.
  • Fatigue effects: Participants perform worse in later conditions simply because they’re tired or bored from repeated testing.
  • Practice effects: Participants get better at a task over time, making later conditions look artificially superior.

Researchers manage these problems primarily through counterbalancing, which means varying the order of conditions across participants so that no single condition always comes first or last. A common method is the Latin Square design, where every condition appears in every position an equal number of times. For six conditions, a Williams design generates a specific 6×6 grid of sequences that ensures each condition follows every other condition exactly once, balancing out immediate sequential (first-order carryover) effects. For an odd number of conditions, the design doubles in size (2n sequences rather than n) to achieve the same balance.
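For an even number of conditions, a Williams square can be built with a simple zigzag-plus-cyclic-shift recipe. The sketch below (conditions labeled 0 to n−1) is one standard construction, not the only one:

```python
def williams_square(n):
    """Williams design for an even number n of conditions: n sequences in
    which every condition appears once per position and follows every
    other condition exactly once."""
    assert n % 2 == 0, "an odd n needs two mirrored squares (2n sequences)"
    # Zigzag first row: 0, n-1, 1, n-2, 2, ...
    first, lo, hi = [], 0, n - 1
    for i in range(n):
        if i % 2 == 0:
            first.append(lo)
            lo += 1
        else:
            first.append(hi)
            hi -= 1
    # Remaining rows are cyclic shifts of the first (add r, mod n).
    return [[(c + r) % n for c in first] for r in range(n)]

square = williams_square(6)
# Carryover balance: every ordered pair of conditions occurs exactly once.
pairs = [(row[i], row[i + 1]) for row in square for i in range(5)]
assert len(set(pairs)) == 6 * 5
```

The final assertion confirms the balance the article describes: across the six sequences, each of the 30 ordered pairs of conditions appears exactly once, so no condition systematically inherits the aftereffects of another.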

In crossover drug trials, washout periods (gaps between treatments where no drug is given) also help reduce carryover.

Statistical Assumptions to Be Aware Of

The most commonly used analysis for repeated measures data is a repeated measures ANOVA, and it comes with a specific set of requirements. The outcome being measured must be continuous (like a score or a time, not a yes/no category). There should be no major outliers in any of the repeated measurements, and the data should be roughly normally distributed.

The trickiest assumption is called sphericity: the idea that the variability in differences between all pairs of conditions is roughly equal. If your experiment compares conditions A, B, and C, the spread of A-minus-B scores should be similar to the spread of A-minus-C scores and B-minus-C scores. This assumption is frequently violated in real data. A statistical check called Mauchly’s test can flag whether the assumption has been violated, and if so, corrections like the Greenhouse-Geisser or Huynh-Feldt adjustment modify the analysis to compensate. The design also requires a balanced number of measurements per participant; everyone needs to be measured at every time point.
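The pairwise-difference variances that sphericity is about can be computed directly. An informal check on made-up data (five hypothetical participants under conditions A, B, and C):

```python
from statistics import variance

# Hypothetical scores for five participants under conditions A, B, C.
A = [8, 7, 9, 5, 6]
B = [7, 9, 8, 6, 7]
C = [4, 8, 7, 9, 5]

def diff_var(x, y):
    """Variance of the within-person difference scores x - y."""
    return variance(a - b for a, b in zip(x, y))

# Sphericity holds (roughly) when these three numbers are similar.
print(diff_var(A, B), diff_var(A, C), diff_var(B, C))
```

Here the three variances come out quite unequal (1.8, 9.3, and 5.2), the kind of pattern that Mauchly’s test would flag and a Greenhouse-Geisser correction would compensate for.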

What Happens When Participants Drop Out

Missing data is a bigger headache in repeated measures designs than in simpler experiments, precisely because the power of the design depends on having complete data from each person across all conditions. If a participant misses even one time point, a traditional repeated measures ANOVA has to drop that participant entirely (listwise deletion), wasting every measurement they did provide.

Modern approaches handle this with statistical models that can work with partial data rather than throwing out an entire participant’s records. Mixed-effects models, sometimes called hierarchical linear models, use a technique called maximum likelihood estimation that naturally accommodates missing observations. Another widely used approach is multiple imputation, which fills in plausible values for missing data points based on patterns in the rest of the dataset, then runs the analysis multiple times to account for the uncertainty of those estimates. Both methods adjust for observable differences between participants who stayed in the study and those who dropped out, making them far more reliable than older strategies like simply deleting incomplete cases, provided that dropout depends only on information the study actually recorded (the “missing at random” assumption).
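The cost of listwise deletion versus keeping partial data can be seen in a toy example. This sketch uses hypothetical scores (`None` marks a missed visit) and only counts what each strategy retains; it does not fit a mixed model:

```python
# Hypothetical pain scores at three visits; None = missed visit.
patients = {
    "p1": [6, 4, 3],
    "p2": [7, 5, None],  # dropped out before visit 3
    "p3": [5, None, 2],  # missed the middle visit
    "p4": [8, 6, 5],
}

# Complete-case (listwise-deletion) analysis keeps only p1 and p4.
complete = [v for v in patients.values() if None not in v]

# A likelihood-based analysis draws on every observed value;
# here we just compute the per-visit means it would have available.
per_visit = [
    [v[t] for v in patients.values() if v[t] is not None]
    for t in range(3)
]
means = [sum(col) / len(col) for col in per_visit]
```

Complete-case analysis keeps 6 of the 10 observed values here; a mixed-effects model or multiple imputation would make use of all 10.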

Repeated Measures vs. Longitudinal Studies

These two terms overlap, but they’re not identical. A longitudinal study specifically tracks outcomes over time to see how they change. A repeated measures design is broader: it includes longitudinal tracking, but also covers experiments where the same person is tested under different conditions at roughly the same point in time (like comparing three different keyboard layouts in a single lab session). In that case, the “repetition” isn’t about time passing; it’s about the same person providing data under multiple conditions.

In longitudinal studies, the repeated measurements happen because following patients over time is the scientific goal. In non-longitudinal repeated measures experiments, collecting multiple measurements from the same person is often more about efficiency than about studying change over time. Both share the same core statistical challenge: measurements from the same person are correlated, and any valid analysis must account for that.