What Level of Evidence Is a Quasi-Experimental Study?

A quasi-experimental study is classified as Level II evidence in one of the most widely used nursing and healthcare evidence hierarchies, the Johns Hopkins Evidence-Based Practice model. That places it one tier below randomized controlled trials (Level I) and one tier above observational and descriptive studies (Level III). Understanding why it lands there requires knowing what makes a quasi-experimental study different from a true experiment and how various frameworks judge its strengths and limitations.

Where It Falls in the Evidence Hierarchy

The Johns Hopkins Evidence-Based Practice model, one of the most commonly referenced scales in healthcare, assigns quasi-experimental studies to Level II. Level I is reserved for randomized controlled trials (RCTs) and systematic reviews of RCTs. Level II also includes systematic reviews that combine RCTs with quasi-experimental studies, or that review quasi-experimental studies alone, with or without meta-analysis. Level III covers non-experimental and qualitative research.

Other hierarchies position quasi-experimental evidence similarly. In the traditional “evidence pyramid” used across many disciplines, quasi-experimental designs sit below RCTs but above cohort studies, case-control studies, and expert opinion. The exact label varies (some scales call it Level II, others Level III depending on how many tiers they use), but the relative position stays the same: stronger than purely observational research, weaker than randomized trials.

Why It Ranks Below an RCT

The single feature that separates a quasi-experimental study from a true experiment is randomization. In an RCT, participants are randomly assigned to either the treatment group or the control group, which distributes known and unknown differences between people roughly equally across both groups. A quasi-experimental study tests an intervention but does not randomly assign participants. Instead, groups might be formed based on which hospital ward a patient is already on, which school a student attends, or which clinic they visit.

Without random assignment, the groups being compared may differ in ways the researchers can’t fully account for. This is known as selection bias, and it’s the primary reason quasi-experimental studies rank lower than RCTs. If the treatment group happens to be younger, healthier, or more motivated than the comparison group, the results may reflect those differences rather than the effect of the intervention itself.

Beyond selection bias, quasi-experimental designs are vulnerable to several other threats to internal validity, first cataloged by the psychologist Donald T. Campbell in 1957 and still central to research methods today:

  • History: outside events that affect the outcome.
  • Maturation: natural changes in participants over time.
  • Testing effects: participants performing differently simply because they’ve been measured before.
  • Statistical regression: extreme scores drifting toward the average on repeat measurement.
  • Instrument decay: changes in how outcomes are measured over time.
  • Participant dropout: people leaving the study in ways that may differ between groups.

True experiments with random assignment control for most of these automatically. Quasi-experimental designs require careful study planning and statistical techniques to address them, and they can’t always do so completely.

Why It Ranks Above Observational Research

Despite its limitations, a quasi-experimental study still involves a deliberate intervention. Researchers actively do something to one group and compare the results, which puts it a meaningful step above purely observational designs, like cohort studies or cross-sectional surveys, where researchers simply watch what happens without intervening. That active manipulation of a variable is what keeps quasi-experimental research closer to the experimental end of the spectrum and earns it a higher evidence ranking.

A quasi-experimental study also typically includes a comparison group, even if that group wasn’t formed through randomization. Having a comparison group, even a non-equivalent one, provides a stronger basis for drawing causal conclusions than a single-group study with no comparison at all.

How the GRADE Framework Handles It Differently

Not every evidence system uses a fixed ladder. The GRADE framework (Grading of Recommendations, Assessment, Development and Evaluation), widely used in clinical guideline development, takes a different approach. Instead of permanently assigning a study to a tier based on its design, GRADE starts with a baseline and then adjusts up or down based on the quality of the specific study.

Under GRADE, evidence from observational and quasi-experimental designs generally starts at a low certainty rating, while RCTs start at high certainty. But the rating can shift. If a quasi-experimental study shows a very large treatment effect, a clear dose-response relationship (where more of the treatment produces more of the effect), or if all plausible biases would have reduced the observed effect rather than inflated it, the certainty rating can be upgraded. Conversely, even an RCT can be downgraded if it has a high risk of bias, imprecise or inconsistent results, or indirect evidence that doesn’t quite match the clinical question being asked.

This means a well-designed quasi-experimental study can sometimes carry more weight in GRADE than a poorly conducted randomized trial. The framework rewards execution, not just design.
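The start-at-a-baseline-then-adjust logic can be sketched in a few lines of code. This is an illustrative simplification, not the GRADE methodology itself: the function name, the integer scoring, and the single-step adjustments are hypothetical, though the four certainty levels and the upgrade/downgrade factors named in the comments come from the framework.

```python
# Simplified sketch of GRADE-style certainty rating (illustrative only).
# The four levels are GRADE's; the scoring mechanics here are hypothetical.

LEVELS = ["very low", "low", "moderate", "high"]

def grade_certainty(design, downgrades=0, upgrades=0):
    """Start at a baseline set by study design, then shift the rating.

    design: "rct" starts at "high"; observational and quasi-experimental
            designs start at "low".
    downgrades: steps down for risk of bias, inconsistency, indirectness,
                or imprecision.
    upgrades: steps up for a large effect, a dose-response gradient, or
              plausible biases that would only shrink the observed effect.
    """
    start = 3 if design == "rct" else 1
    index = max(0, min(len(LEVELS) - 1, start - downgrades + upgrades))
    return LEVELS[index]

# A quasi-experimental study upgraded for a very large effect:
print(grade_certainty("quasi-experimental", upgrades=1))  # moderate
# An RCT downgraded twice (high risk of bias plus indirectness):
print(grade_certainty("rct", downgrades=2))               # low
```

The two example calls mirror the point above: a strong quasi-experimental study can end up rated higher than a flawed randomized trial.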

When Quasi-Experimental Designs Are the Best Option

Quasi-experimental studies exist because randomization isn’t always possible. Sometimes the barrier is ethical: you can’t randomly assign people to smoke, experience poverty, or skip a vaccine during an outbreak. Sometimes it’s practical: a hospital implementing a new safety protocol across all its units can’t randomize which patients receive the new protocol and which don’t, because the change applies to everyone in that setting.

Education research relies heavily on quasi-experimental designs because schools rarely allow students to be randomly shuffled between classrooms or curricula. Public health interventions, policy changes, and community-level programs face the same constraints. In these fields, a quasi-experimental study with a well-chosen comparison group is often the strongest evidence that can realistically be produced.

Common Quasi-Experimental Designs

  • Non-equivalent groups design: Two existing groups (for example, two hospital units or two classrooms) are compared, with one receiving the intervention and the other serving as a comparison. Researchers measure outcomes in both groups, but since people weren’t randomly assigned, the groups may differ in important ways from the start.
  • Pretest-posttest design: A single group is measured before and after an intervention. Any change is attributed to the intervention, though outside events or natural changes over time could also explain the results.
  • Interrupted time series: Researchers collect data at multiple points before and after an intervention is introduced. The pattern of change over time helps distinguish the intervention’s effect from trends that were already underway. This is one of the stronger quasi-experimental designs because the repeated measurements make it easier to spot pre-existing trends.

Each design handles the threats to validity differently. An interrupted time series, for instance, deals well with maturation and historical trends because you can see the trajectory before the intervention started. A non-equivalent groups design is more vulnerable to selection bias but stronger than a single-group study because it at least offers a point of comparison.
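The interrupted time series idea is usually analyzed with segmented regression: fit a pre-intervention trend, then test whether the level or slope changes at the intervention point. The sketch below uses entirely synthetic, made-up numbers (a +0.5-per-month pre-existing trend and a level drop of 4 at month 12) just to show the mechanics with plain NumPy.

```python
import numpy as np

# Segmented regression sketch for an interrupted time series.
# All data are synthetic; the numbers are invented for illustration.
# Model: outcome = b0 + b1*time + b2*after + b3*time_since_intervention
rng = np.random.default_rng(0)
t = np.arange(24, dtype=float)          # 24 monthly measurements
after = (t >= 12).astype(float)         # intervention introduced at month 12
t_since = np.where(t >= 12, t - 12, 0.0)

# Pre-existing upward trend (+0.5/month) plus a level drop of 4 at month 12
y = 10 + 0.5 * t - 4 * after + rng.normal(0, 0.3, size=t.size)

X = np.column_stack([np.ones_like(t), t, after, t_since])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2, b3 = coef
print(f"pre-intervention trend: {b1:.2f}/month, level change: {b2:.2f}")
```

Because the pre-intervention slope (`b1`) is estimated separately from the level change (`b2`), a trend that was already underway is not mistaken for an intervention effect, which is exactly why this design handles maturation and history better than a simple pretest-posttest comparison.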

What This Means in Practice

If you’re evaluating research for a class, a clinical question, or a guideline, the key takeaway is straightforward: a quasi-experimental study provides moderate-strength evidence. It’s not as convincing as a well-run randomized trial, but it’s substantially stronger than a case report, expert opinion, or observational study with no intervention. In the Johns Hopkins model specifically, it’s Level II. In GRADE, its rating depends on how well the study was executed and whether specific quality factors push the certainty up or down.

When reading a quasi-experimental study, the most important question to ask is how well the researchers addressed selection bias. Did they match the groups on key characteristics? Did they use statistical methods to adjust for baseline differences? Did they measure outcomes at multiple time points? The answers to those questions determine whether the study lives up to its Level II potential or falls short of it.
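One of those checks, statistical adjustment for baseline differences, can be demonstrated with a small simulation. The data below are fabricated so that older patients are more likely to end up in the treatment group (selection bias) while the true treatment effect is +2; the naive group comparison mixes the two, and a regression that includes the baseline covariate recovers something closer to the truth. This is a minimal sketch of covariate adjustment, not a substitute for the fuller methods (matching, propensity scores) a real study would report.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
age = rng.normal(60, 10, n)

# Selection bias: older patients are more likely to receive the treatment
treated = (age + rng.normal(0, 5, n) > 60).astype(float)

# True treatment effect is +2; age also independently raises the outcome
outcome = 0.3 * age + 2 * treated + rng.normal(0, 1, n)

# Naive comparison confounds the treatment effect with the age imbalance
naive = outcome[treated == 1].mean() - outcome[treated == 0].mean()

# Regression adjustment: outcome ~ intercept + treated + age
X = np.column_stack([np.ones(n), treated, age])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
adjusted = coef[1]

print(f"naive difference: {naive:.2f}, adjusted estimate: {adjusted:.2f}")
```

Here the naive difference substantially overstates the true effect, while the adjusted estimate lands near +2. A study that reports only the naive comparison has not dealt with selection bias; one that adjusts for well-chosen baseline covariates has at least addressed the differences it measured.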