What Is a Root Cause Analysis in Healthcare?

A root cause analysis (RCA) is a structured investigation method used in healthcare to figure out why a serious patient safety event happened and how to prevent it from happening again. The defining principle is that it looks at system failures, not individual mistakes. Rather than asking “who messed up?”, an RCA asks “what about our systems allowed this error to reach the patient?” Hospitals, clinics, and other healthcare organizations use RCAs after events like wrong-site surgeries, medication errors, patient falls resulting in serious harm, and unexpected deaths.

Systems Thinking Over Individual Blame

The core philosophy behind RCA is that healthcare errors almost never come down to one person making one bad decision. They result from a chain of smaller failures in the broader system. The Agency for Healthcare Research and Quality distinguishes between two types of errors that RCA aims to uncover: active errors, which happen at the point of contact between a person and the system (a nurse selecting the wrong medication from a drawer, for example), and latent errors, which are hidden problems baked into the system itself (like two medications stored side by side in nearly identical packaging).

This systems-first approach matters because punishing individuals doesn’t fix the conditions that made the error possible. If a pharmacist misreads a prescription because the handwriting was illegible, firing the pharmacist doesn’t address the fact that the next pharmacist will face the same illegible handwriting. An RCA would trace back to the ordering system and recommend electronic prescribing or standardized order forms.

The Swiss Cheese Model

One of the most useful ways to understand how errors reach patients is the Swiss cheese model, developed by psychologist James Reason. Picture several slices of Swiss cheese stacked in a row. Each slice represents a safety barrier: a double-check by a second nurse, a barcode scan before administering medication, a surgical timeout to confirm the correct site. Every barrier has holes, meaning no single safeguard is perfect. An error reaches a patient only when the holes in multiple slices happen to line up at the same moment.

In theory, it should be rare for all those holes to align. In practice, healthcare systems often share what accident analysts call a “common failure mode,” where the same underlying problem (understaffing, poor equipment design, inconsistent training) weakens several barriers at once. When that happens, the holes are no longer random. They’re already partially aligned, and it takes only one more gap for harm to occur. RCA is designed to find those shared weak points and close them.
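
The arithmetic behind this can be sketched in a few lines of Python. The failure probabilities below are invented for illustration, not measured rates, and the function names are hypothetical. When barriers fail independently, the chance that an error passes all of them is the product of the individual failure probabilities. A common failure mode breaks that independence by raising every barrier's failure probability at the same time.

```python
from math import prod

def p_error_reaches_patient(barrier_failure_probs):
    """Probability that every barrier fails, assuming the barriers fail independently."""
    return prod(barrier_failure_probs)

def p_with_common_failure_mode(p_shared, probs_when_present, probs_when_absent):
    """Mix two scenarios: the shared problem (e.g. understaffing) is present or absent."""
    return (p_shared * prod(probs_when_present)
            + (1 - p_shared) * prod(probs_when_absent))

# Illustrative numbers only: three barriers that each fail 5% of the time.
independent = p_error_reaches_patient([0.05, 0.05, 0.05])

# Same barriers, but on 10% of shifts a shared problem pushes each failure rate to 40%.
shared = p_with_common_failure_mode(0.10, [0.40] * 3, [0.05] * 3)

print(f"independent barriers: {independent:.6f}")
print(f"common failure mode:  {shared:.6f}")
```

With these made-up inputs, the shared weakness makes harm roughly 50 times more likely than the independent case, even though it is present on only a tenth of shifts.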

How the Investigation Works

An RCA follows a general sequence, though the specific tools vary by organization. It typically begins with building a detailed timeline of the event: what happened, when, who was involved, and what decisions were made at each step. Teams use process flow diagrams to map the sequence of actions and identify where things went off track.

From there, the team works to identify all possible contributing factors. A common tool for this is the fishbone diagram (also called an Ishikawa diagram), which organizes potential causes into major categories. In healthcare, these branches typically include equipment and supply factors, environmental factors, rules and policy factors, and people and staffing factors. Each branch gets explored for deeper causes.

The team then gathers additional information through interviews, chart reviews, and sometimes surveys of the staff involved. The goal is to separate contributing factors (things that made the error more likely) from true root causes (the underlying conditions that, if fixed, would prevent recurrence). Finally, the team develops corrective actions and a plan for measuring whether those actions actually work. A Pareto chart helps prioritize them. It applies the Pareto principle, the rule of thumb that roughly 80% of problems tend to stem from about 20% of causes, so the team focuses resources on the factors with the greatest impact.
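
As a rough sketch of that prioritization step, the Python below ranks causes by frequency and picks the smallest leading group that accounts for about 80% of the tallied problems. The report data, cause labels, and function names are all hypothetical, and a real team would weigh severity as well as frequency.

```python
from collections import Counter

def pareto_ranking(cause_labels):
    """Return (cause, count, cumulative_share) tuples, most frequent cause first."""
    counts = Counter(cause_labels)
    total = sum(counts.values())
    ranked = []
    running = 0
    for cause, count in counts.most_common():
        running += count
        ranked.append((cause, count, running / total))
    return ranked

def vital_few(ranked, threshold=0.8):
    """Smallest leading set of causes whose cumulative share reaches the threshold."""
    selected = []
    for cause, _count, cumulative in ranked:
        selected.append(cause)
        if cumulative >= threshold:
            break
    return selected

# Hypothetical tally of contributing factors from 20 medication-event reports.
reports = (["look-alike packaging"] * 10 + ["interruptions during prep"] * 7
           + ["illegible order"] + ["pump programming"] + ["label printer fault"])

ranked = pareto_ranking(reports)
for cause, count, cumulative in ranked:
    print(f"{cause:28s} {count:2d}  {cumulative:.0%}")
print("focus on:", vital_few(ranked))
```

In this invented tally, two of the five causes cover 85% of the reports, so those two are where the team would look first.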

The 5 Whys Technique

One of the simplest and most widely used RCA tools is the “5 Whys,” which involves repeatedly asking “why did this happen?” until the team reaches a root cause rather than a surface-level explanation. The test at each step is straightforward: if you fixed this particular problem, would the event still be likely to recur? If yes, you haven’t found the root cause yet. Keep asking.

A non-medical example from CMS illustrates this well. Say you got a flat tire because you drove over nails on your garage floor. Why were there nails? A box on the shelf got wet and fell apart. Why was the box wet? There’s a leak in the roof. If you had stopped at “nails on the floor” and simply swept them up, the leaking roof would eventually cause the same problem again. In healthcare, the logic is identical. A medication error caused by a nurse working a 16-hour shift isn’t solved by retraining the nurse. You need to ask why the nurse was working 16 hours, which might reveal scheduling policies, staffing shortages, or budget decisions that are the real root cause. It often takes three to five rounds of “why,” though sometimes more.
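
The stopping rule can be written down as a small Python sketch using the flat-tire example. The `Why` class and `find_root_cause` function are hypothetical names, and the true/false answers encode the team's judgment at each step, not anything the code can work out on its own.

```python
from dataclasses import dataclass

@dataclass
class Why:
    cause: str
    recurs_if_only_this_is_fixed: bool  # the team's judgment, not a computed value

def find_root_cause(chain):
    """Walk the answers in order; stop at the first cause whose fix would prevent recurrence."""
    for depth, step in enumerate(chain, start=1):
        if not step.recurs_if_only_this_is_fixed:
            return depth, step.cause
    return None  # chain exhausted without a root cause: keep asking "why?"

flat_tire = [
    Why("nails on the garage floor", True),
    Why("box of nails on the shelf got wet and fell apart", True),
    Why("leak in the roof", False),
]

print(find_root_cause(flat_tire))      # (3, 'leak in the roof')
print(find_root_cause(flat_tire[:2]))  # None: stopping early finds no root cause
```

The second call shows what happens when the team stops at the wet box: every answer so far would still allow a recurrence, so the analysis is not finished.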

Who Conducts the Analysis

RCAs are performed by multidisciplinary teams, not by a single manager or department head. The VA’s National Center for Patient Safety, which runs one of the most established RCA programs in the country, emphasizes that frontline staff are usually in the best position to identify both problems and solutions. Teams typically include clinicians who understand the clinical workflow, quality and safety staff who know the investigation methodology, and representatives from any department involved in the event. Leadership support is also considered essential, not to direct the investigation, but to ensure the team has the authority and resources to implement real changes.

Strong Actions vs. Weak Actions

Not all corrective actions are equally effective. RCA teams use an “action hierarchy” to rank their recommendations from strongest to weakest. The distinction comes down to how much the fix depends on human behavior.

  • Stronger actions physically eliminate the hazard or make the error nearly impossible. Examples include redesigning a physical space, introducing engineering controls that force the correct action (like a connector that physically cannot attach to the wrong port), simplifying a process, or standardizing equipment across the organization.
  • Intermediate actions reduce risk but still rely partly on people doing the right thing. These include adding redundancy (for example, two clinicians independently performing the same calculation), reducing workload, implementing checklists, removing look-alike and sound-alike medications from the same storage area, and simulation-based training with regular refreshers.
  • Weaker actions depend almost entirely on human memory and compliance. Sending out a new policy memo, holding a one-time training session, adding a warning label, or requiring a double-check in which one person simply reviews another’s work all fall into this category. They’re the easiest to implement but the least likely to prevent the next event.

A common criticism of RCA programs is that they produce too many weaker actions. Writing a new policy feels productive, but it changes nothing if the system conditions that caused the error remain in place. The strongest RCAs push for architectural, engineering, or process-level changes that don’t rely on anyone remembering to follow a new rule.
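
One way a team might check its own plan against this criticism is to tally its recommendations by tier. The Python sketch below assumes the tiers from the list above; the short labels in `ACTION_STRENGTH` and the sample plan are hypothetical shorthand, not an official taxonomy.

```python
from collections import Counter

# Tiers follow the action hierarchy described above; labels are illustrative shorthand.
ACTION_STRENGTH = {
    "physical redesign": "stronger",
    "forcing function": "stronger",
    "process simplification": "stronger",
    "equipment standardization": "stronger",
    "redundancy": "intermediate",
    "workload reduction": "intermediate",
    "checklist": "intermediate",
    "separate look-alike drugs": "intermediate",
    "simulation training": "intermediate",
    "policy memo": "weaker",
    "one-time training": "weaker",
    "warning label": "weaker",
    "double-check": "weaker",
}

def strength_mix(action_types):
    """Share of a plan's actions that falls in each tier."""
    tiers = Counter(ACTION_STRENGTH[a] for a in action_types)
    total = sum(tiers.values())
    return {tier: tiers.get(tier, 0) / total
            for tier in ("stronger", "intermediate", "weaker")}

# Hypothetical corrective action plan.
plan = ["policy memo", "one-time training", "warning label", "checklist"]
mix = strength_mix(plan)
print(mix)
if mix["stronger"] == 0:
    print("No stronger actions: the plan relies on memory and compliance.")
```

A plan like this one, three-quarters weaker actions and no stronger ones, is the pattern critics describe.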

When RCA Is Required

The Joint Commission, which accredits most U.S. hospitals, requires a comprehensive systematic analysis (most commonly an RCA) after any sentinel event. A sentinel event is defined as a patient safety event that results in death, severe harm, or permanent harm, and that isn’t simply the natural progression of a patient’s illness. All sentinel events must undergo analysis regardless of whether the organization reports them to the Joint Commission.

If a sentinel event is reported, the organization has 45 business days from the event (or from becoming aware of it) to complete a thorough analysis and submit a corrective action plan. If the Joint Commission finds the response inadequate, the organization gets an additional 15 business days to revise and resubmit. Failure to submit within an additional 45 days beyond the original deadline can affect the organization’s accreditation status.
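
Because these windows are counted in business days, the calendar dates are easy to misjudge. The Python sketch below is one way to estimate them. It assumes that counting starts the day after the trigger date and that only weekends are skipped; holidays are ignored, and how the Joint Commission treats them is not stated here. The dates are hypothetical.

```python
from datetime import date, timedelta

def add_business_days(start, days):
    """Advance by the given number of weekdays (Monday to Friday). Holidays are ignored."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current

aware_date = date(2024, 3, 4)  # hypothetical: a Monday on which the event became known
analysis_due = add_business_days(aware_date, 45)
print("analysis and action plan due:", analysis_due)  # 2024-05-06

feedback_date = date(2024, 5, 20)  # hypothetical: response judged inadequate on this date
revision_due = add_business_days(feedback_date, 15)
print("revised response due:", revision_due)  # 2024-06-10
```

Forty-five business days works out to nine calendar weeks when no holidays intervene, which is why the first deadline here lands exactly 63 days after the start.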

RCA²: Closing the Action Gap

Traditional RCA has faced valid criticism for producing detailed analyses that sit in binders and never translate into real change. In response, the National Patient Safety Foundation introduced a framework called RCA² (root cause analysis and action) in 2015. The added emphasis on “action” is the point. RCA² provides strategies for conducting more efficient investigations and, more importantly, tools for evaluating whether the corrective actions that come out of those investigations are actually implemented and sustained. The renaming was deliberate: an analysis that doesn’t lead to meaningful system change hasn’t fulfilled its purpose.

Legal Protections for RCA Findings

One reason healthcare workers are willing to speak candidly during an RCA is that the findings are generally protected from legal discovery. Under peer review privilege laws, the proceedings, records, and recommendations of a peer review organization are confidential and cannot be obtained through lawsuits. A witness who participates in an RCA can still testify about things they personally know, but they cannot be compelled to reveal what was discussed during the review itself or what opinions they formed as a result. Documents that exist independently (like a patient’s medical record) don’t become protected just because they were presented during an RCA, but the analysis itself, including the team’s conclusions and recommendations, is shielded. These protections vary by state, but the underlying principle is consistent: if people fear legal consequences for honest reporting, they stop reporting, and the system loses its ability to learn from mistakes.