What Is an Interrupted Time Series Design and When to Use It

An interrupted time series (ITS) design is a research method that tracks an outcome over time, before and after some intervention or event, to determine whether the intervention had an effect beyond what was already happening. It’s one of the strongest study designs available when a randomized experiment isn’t possible, ranking second only to randomized controlled trials in its ability to control for bias when a well-chosen control group is included.

ITS is widely used in public health and policy research. Think of questions like: Did a new law reduce overdose deaths? Did a hospital’s hand-hygiene campaign lower infection rates? In each case, you can’t randomly assign people to live under different laws or policies, but you can watch what was happening before the change and compare it to what happened after.

How the Design Works

The core idea is straightforward. You collect repeated measurements of some outcome at regular intervals, like monthly hospital admissions or quarterly death counts. At some known point in time, an intervention occurs: a policy takes effect, a program launches, a regulation changes. The data before that point form one segment, and the data after form another. You then ask whether the pattern in the post-intervention segment looks different from what you’d expect based on the pre-intervention trend.

This is where the concept of the counterfactual comes in. The counterfactual is a projection of what would have happened if the intervention never occurred, based on the existing pre-intervention trend. Because you can never actually observe what would have happened in a world without the intervention, researchers use a statistical model to predict it. The gap between this predicted “no intervention” line and what actually happened after the intervention is the estimated effect.
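The counterfactual logic can be sketched numerically. In this minimal illustration (all numbers invented, and the series kept noiseless so the arithmetic is transparent), we fit the pre-intervention trend, project it forward, and measure the gap:

```python
import numpy as np

# Hypothetical monthly outcome: a steady decline before the
# intervention, then an extra 10-unit drop afterwards.
t = np.arange(24)                      # 24 monthly time points
pre = t < 12                           # intervention after month 11
y = 100 - 2.0 * t - 10.0 * (~pre)      # observed series

# Fit a straight line to the pre-intervention segment only.
slope, intercept = np.polyfit(t[pre], y[pre], 1)

# Counterfactual: extend the pre-intervention trend forward.
counterfactual = intercept + slope * t[~pre]

# Estimated effect = actual post values minus the projection.
effect = (y[~pre] - counterfactual).mean()
print(round(effect, 1))  # -10.0: the built-in drop is recovered
```

Real data would add noise around both segments, so the projected line and the estimated effect would carry uncertainty; the geometry, though, is exactly this.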

What ITS Actually Measures

The most common statistical approach for ITS is segmented regression, which estimates two key effects. The first is a change in level: an immediate jump (or drop) in the outcome right after the intervention. The second is a change in trend: a shift in how quickly the outcome is rising or falling over time compared to the pre-intervention trajectory.

Consider a law restricting paracetamol pack sizes, aimed at reducing poisoning deaths. A level change would show up as a sudden drop in deaths the quarter after the law took effect. A trend change would show up as a steeper decline in deaths per quarter compared to whatever decline was already happening before the law. An intervention might produce one or both of these effects, and distinguishing between them matters. A level change suggests an immediate impact, while a trend change suggests the intervention gradually altered behavior or outcomes over months or years.
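A minimal segmented regression can be written with plain numpy. The quarterly series below is hypothetical and noiseless, with a known level change and trend change built in, so the fit recovers them exactly:

```python
import numpy as np

# Hypothetical quarterly deaths: baseline 50, falling 0.5/quarter,
# then an immediate drop of 5 (level change) and an extra decline
# of 1.0/quarter (trend change) after the law takes effect.
t = np.arange(20)
post = (t >= 10).astype(float)         # 1 after the intervention
since = np.maximum(t - 10, 0)          # quarters since intervention
y = 50 - 0.5 * t - 5.0 * post - 1.0 * since

# Segmented regression design matrix:
# intercept, pre-trend, level change, trend change.
X = np.column_stack([np.ones_like(t, dtype=float), t, post, since])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, pre_trend, level, trend = coef
print(round(level, 1), round(trend, 1))  # -5.0 -1.0
```

The `post` dummy captures the immediate jump and the `since` counter captures the change in slope; in practice the same model is fitted with standard errors that account for autocorrelation.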

Data Requirements

ITS needs enough data points on both sides of the intervention to reliably estimate trends. The general recommendation is at least 12 time points before and 12 after the intervention. This gives the model enough information to separate a real signal from random noise and to detect issues like autocorrelation, where each data point is correlated with the ones around it. The Cochrane Effective Practice and Organisation of Care group sets a lower floor for systematic reviews, accepting studies with as few as three data points before and after, but this is a minimum for inclusion rather than a standard for strong evidence.

The data points need to be measured at consistent intervals: weekly, monthly, quarterly, or annually. Irregular spacing creates problems for the statistical models that underpin the analysis.

Threats to Validity

The biggest threat to an ITS study is history bias. This happens when some other event occurs around the same time as the intervention and affects the outcome. If a hospital launches a new infection-control program the same month that a new antibiotic becomes available, any drop in infection rates could be due to either change, and the ITS design alone can’t tell the difference.

Other threats include maturation (natural changes in the population over time that have nothing to do with the intervention), regression to the mean (if the intervention was triggered by an unusually extreme value that would have normalized on its own), and instrumentation changes (if the way the outcome is measured shifts partway through the study, such as a new diagnostic code being adopted). Attrition can also be a concern if the people being tracked systematically drop out in ways related to the intervention or the outcome.

Adding a Control Group

A controlled interrupted time series (CITS) addresses many of these threats by adding a comparison group that wasn’t exposed to the intervention. For example, if a smoking ban was implemented in one state, researchers might use a neighboring state without the ban as a control series. If deaths drop in the intervention state but stay flat in the control state, that’s much stronger evidence that the ban caused the change. If deaths drop in both states, something other than the ban is likely responsible.

The logic is simple but powerful: a lack of effect in a well-chosen control group makes the case for causation considerably stronger. When the results of a basic ITS and a controlled ITS agree, researchers can be more confident that the association is causal. This is why CITS designs are considered second only to randomized experiments in their ability to control for bias.
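One simple way to sketch this logic (hypothetical numbers, numpy only) is to run the segmented regression on the treated-minus-control difference, which cancels anything that affected both series:

```python
import numpy as np

# Hypothetical monthly death rates in two states. Both share a
# downward drift plus a common shock at month 12 (history bias);
# only the intervention state gets an extra 4-unit drop from the ban.
t = np.arange(24)
post = (t >= 12).astype(float)
common = 30 - 0.3 * t - 2.0 * post     # affects both states equally
control = common.copy()
treated = common - 4.0 * post          # the ban's own effect

# CITS check: the treated-minus-control difference removes the
# shared shock, isolating the ban's effect.
diff = treated - control
X = np.column_stack([np.ones_like(t, dtype=float), t, post])
coef, *_ = np.linalg.lstsq(X, diff, rcond=None)
print(round(coef[2], 1))  # -4.0: the ban's effect, shock removed
```

Notice that a naive single-series ITS on `treated` alone would attribute the full 6-unit drop to the ban; the control series is what separates the ban's 4 units from the 2 units of coincidental history.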

How ITS Differs From Difference-in-Differences

Readers exploring quasi-experimental methods often encounter difference-in-differences (DiD) alongside ITS, and the two are easy to confuse. The key distinction is structural. A standard ITS is a one-group design: you track a single population over time and look for a break in its trend. DiD is a two-group design that compares changes between an intervention group and a control group across two time periods.

DiD doesn’t require a long series of observations. It can work with just one measurement before and one after the intervention for each group. ITS, by contrast, needs many repeated measurements but doesn’t inherently require a control group. DiD relies on a parallel trends assumption, meaning the two groups would have followed the same trajectory in the absence of the intervention. This assumption can’t be formally verified with the collected data, which is a notable limitation. ITS instead relies on the assumption that the pre-intervention trend would have continued unchanged, which can at least be visually and statistically examined.
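The structural contrast shows in how little data DiD needs. With hypothetical group means (one pre and one post measurement per group), the whole estimator is a few lines of arithmetic:

```python
# Classic 2x2 difference-in-differences with invented means.
treated_pre, treated_post = 20.0, 14.0
control_pre, control_post = 18.0, 16.0

# Each group's own change over time...
treated_change = treated_post - treated_pre   # -6.0
control_change = control_post - control_pre   # -2.0

# ...and the difference of those differences is the DiD estimate,
# valid only under the parallel-trends assumption.
did = treated_change - control_change
print(did)  # -4.0
```

The control group's change stands in for the counterfactual that an ITS would instead project from its own pre-intervention trend.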

Handling Autocorrelation and Seasonality

Time series data come with statistical wrinkles that ordinary regression can’t handle well. The most common is autocorrelation: data points close together in time tend to be more similar to each other than distant ones. Monthly flu hospitalizations in January and February, for instance, are more alike than those in January and July. Standard segmented regression assumes that the residuals (the gaps between predicted and actual values) are independent of each other, which often isn’t true for time series.
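A quick way to spot autocorrelated residuals, sketched with numpy (the sine wave below stands in for a seasonal pattern the model failed to capture):

```python
import numpy as np

# Hypothetical residuals from a segmented regression: a sine wave
# mimics unmodelled seasonality, so neighbours move together.
resid = np.sin(np.linspace(0, 4 * np.pi, 48))

# Lag-1 autocorrelation: each residual against the next one.
r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]

# Durbin-Watson statistic: near 2 suggests independence; values
# well below 2 indicate positive autocorrelation, the usual case.
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
print(r1 > 0.5, dw < 2)  # True True: strong positive autocorrelation
```

When residuals look like this, the naive standard errors from ordinary regression are too small, making effects appear more significant than they are.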

Two alternatives address this directly. ARIMA models (autoregressive integrated moving average) explicitly account for the relationships between consecutive observations and the errors from past predictions. They’re considered a more flexible tool than segmented regression for evaluating health interventions. Generalized additive models (GAMs) capture nonlinear relationships without requiring the researcher to specify their exact shape, making them particularly useful when the model might be misspecified. Simulation studies have found that ARIMA tends to perform more consistently across different effect sizes and in the presence of seasonality, while GAMs are more robust when the underlying model structure is wrong.
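To see how an autocorrelation-aware fit changes things, the sketch below estimates an intervention step on simulated AR(1) errors using a Cochrane-Orcutt quasi-difference, the same whitening idea an ARIMA(1,0,0)-with-regressor model applies (all data simulated; numpy only, so this is a stand-in for a full ARIMA fit, not one):

```python
import numpy as np

# Simulated series: AR(1) noise around a mean that drops by
# 3 units when the intervention takes effect at t = 150.
rng = np.random.default_rng(42)
n = 300
post = (np.arange(n) >= 150).astype(float)
u = np.zeros(n)
for i in range(1, n):                  # AR(1) errors, phi = 0.6
    u[i] = 0.6 * u[i - 1] + rng.normal(0, 0.2)
y = 10.0 - 3.0 * post + u

# Step 1: naive OLS, then estimate residual autocorrelation.
X = np.column_stack([np.ones(n), post])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
rho = np.corrcoef(resid[:-1], resid[1:])[0, 1]

# Step 2: quasi-difference outcome and regressors with rho,
# which whitens the AR(1) structure before re-fitting.
ys = y[1:] - rho * y[:-1]
Xs = X[1:] - rho * X[:-1]
bs, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
print(round(bs[1], 1))  # step estimate, close to the true -3.0
```

The point estimate barely moves here, but the whitening step is what makes the standard errors honest; dedicated ARIMA routines estimate the autoregressive and intervention terms jointly rather than in two passes.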

In software, Stata offers a dedicated command called itsa that performs interrupted time series analysis for single and multiple group comparisons, with options to handle autocorrelation and heteroskedasticity. R users commonly perform segmented regression using general linear modeling packages, with additional tools available for ARIMA modeling and more complex specifications.

When ITS Is the Right Choice

ITS is especially well suited for evaluating population-level interventions where randomization is impractical or unethical: laws, regulations, public health campaigns, system-wide policy changes. It works best when the intervention has a clearly defined start date, when outcome data are collected routinely at regular intervals (like administrative health records), and when enough pre-intervention data exist to establish a reliable baseline trend.

It’s less appropriate when the intervention rolls out gradually with no clear boundary, when fewer than three pre- or post-intervention data points are available, or when complex patterns of instability in the data make trend estimation unreliable. In these situations, other quasi-experimental approaches may be better suited to the question at hand.