Fault tree analysis (FTA) is a top-down method for figuring out all the possible causes of a system failure. You start with one specific unwanted event, like a power outage or a safety system not working, and work backward through every combination of equipment failures, human errors, and environmental conditions that could lead to it. The result is a visual diagram shaped like an inverted tree, with the main failure at the top and root causes branching out below.
Originally developed by Bell Telephone Laboratories in 1962 for the U.S. Air Force’s Minuteman missile system, the technique was later adopted and extensively applied by Boeing. It has since become a standard tool in nuclear power, aerospace, chemical processing, healthcare, and any industry where understanding failure pathways is critical to safety.
How a Fault Tree Works
Every fault tree starts with a single event at the top, called the “top event.” This is the failure you want to prevent or understand. It might be something like “reactor cooling system fails” or “surgical procedure performed on wrong site.” From there, you break the problem down level by level, asking: what conditions or failures could directly cause this event?
Each level introduces more specific faults, connected by logic gates that define the relationships between them. You keep branching downward until you reach root causes that can't be broken down any further. The finished diagram maps every failure pathway the analysis has identified leading to the top event, from the most obvious single-point failures to obscure combinations of small problems that wouldn't be dangerous on their own.
Logic Gates: AND vs. OR
The two most important building blocks of a fault tree are AND gates and OR gates. They control how failures combine as you move up the tree.
An OR gate means the output event happens if any one of the inputs occurs. If a pump can fail because of a motor burnout, a jammed valve, or a power loss, those three causes sit below an OR gate. Any single one is enough to cause the pump failure above it. OR gates typically mark the more dangerous spots in a system because they offer multiple separate paths to failure.
An AND gate means the output event happens only if all of the inputs occur together. For example, a backup generator system might fail only if the primary generator fails and the secondary generator also fails simultaneously. Both conditions must be true. AND gates represent built-in redundancy, where the system can tolerate one failure as long as everything else holds.
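The two gate rules map directly onto Python's built-in `any` and `all`. The sketch below encodes the pump and backup-generator examples from the text; the function names `or_gate` and `and_gate` are illustrative and do not come from any particular FTA tool.

```python
def or_gate(*inputs: bool) -> bool:
    """OR gate: the output event occurs if at least one input event occurs."""
    return any(inputs)


def and_gate(*inputs: bool) -> bool:
    """AND gate: the output event occurs only if every input event occurs."""
    return all(inputs)


# Pump example: any one of three causes is enough to fail the pump.
pump_fails = or_gate(
    False,  # motor burnout
    True,   # jammed valve
    False,  # power loss
)

# Backup generator example: the system fails only if both generators fail.
power_system_fails = and_gate(
    True,   # primary generator failed
    False,  # secondary generator still running
)

print(pump_fails, power_system_fails)
```

Here the jammed valve alone fails the pump, while the single generator failure is absorbed by the redundant unit.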
This distinction has a practical consequence for safety. When you see an OR gate in your tree, you know that eliminating any single input below it won't prevent the output, because the other inputs can still trigger it on their own. When you see an AND gate, reliably preventing any one input blocks the output entirely, and strengthening any input makes the overall failure less likely, because every input has to fail for the problem to occur.
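The claim above can be checked by brute force: permanently prevent one input, then try every combination of the others. This is a minimal sketch; `can_still_fail` is a hypothetical helper name, and Python's `any`/`all` stand in for OR/AND gates.

```python
from __future__ import annotations

from collections.abc import Callable
from itertools import product


def can_still_fail(
    gate: Callable[[tuple[bool, ...]], bool], n_inputs: int, blocked: int
) -> bool:
    """Return True if the gate output can still occur when input `blocked`
    is permanently prevented, checking every combination of the other inputs."""
    for combo in product([False, True], repeat=n_inputs):
        if combo[blocked]:
            continue  # skip states where the blocked input occurs
        if gate(combo):
            return True
    return False


# any/all accept a single iterable, so they work directly as gate functions.
print(can_still_fail(any, 3, blocked=0))  # OR: output still reachable
print(can_still_fail(all, 2, blocked=0))  # AND: output is blocked
```

Exhaustive enumeration doubles in cost with every added input, so it only suits small gates like these; real analyses reason about the tree structure instead.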
Standard Symbols
Fault trees use a small set of standardized shapes so that anyone trained in the method can read a diagram regardless of the industry it came from.
- Rectangle (intermediate event): A failure that results from combinations of lower-level events. These sit in the middle layers of the tree and always have a logic gate beneath them showing what causes them.
- Circle (basic event): A root-cause failure at the lowest level of the tree. These are the elementary faults that can’t be broken down further, like a specific component wearing out or an operator skipping a step.
- Diamond (undeveloped event): A fault that could theoretically be broken down further but isn’t, either because there’s not enough information available or because further detail isn’t needed for the analysis.
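The three symbols translate naturally into a small tree data structure. The sketch below is one possible encoding, assuming hypothetical class names (`BasicEvent`, `UndevelopedEvent`, `IntermediateEvent`) and treating an undeveloped event as a leaf, just like a basic event. In the example, "power loss" is marked undeveloped purely for illustration, as if the supply side were outside the scope of the analysis.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal, Union


@dataclass(frozen=True)
class BasicEvent:
    """Circle: a root-cause fault that is not broken down further."""
    name: str


@dataclass(frozen=True)
class UndevelopedEvent:
    """Diamond: a fault that could be developed further but is treated as a leaf."""
    name: str


@dataclass(frozen=True)
class IntermediateEvent:
    """Rectangle: an event caused by lower-level events through a logic gate."""
    name: str
    gate: Literal["AND", "OR"]
    inputs: tuple[Node, ...]


Node = Union[BasicEvent, UndevelopedEvent, IntermediateEvent]


def occurs(node: Node, failed: set[str]) -> bool:
    """Return True if `node` occurs, given the names of the leaf events that occurred."""
    if isinstance(node, (BasicEvent, UndevelopedEvent)):
        return node.name in failed
    results = [occurs(child, failed) for child in node.inputs]
    return all(results) if node.gate == "AND" else any(results)


pump = IntermediateEvent(
    "pump fails",
    "OR",
    (BasicEvent("motor burnout"), BasicEvent("jammed valve"), UndevelopedEvent("power loss")),
)
print(occurs(pump, {"power loss"}), occurs(pump, set()))
```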
Qualitative vs. Quantitative Analysis
A fault tree can be used in two distinct ways, and many teams use both.
Qualitative analysis focuses on identifying all the possible failure pathways without assigning numbers. The key output here is a list of "minimal cut sets": combinations of basic events that together cause the top event, and from which no event can be removed without losing that ability. A minimal cut set with just one event is a single point of failure, meaning one component or error alone can bring the whole system down. A minimal cut set with three events means those three things all have to go wrong together. Just mapping these out often reveals vulnerabilities that weren't obvious before.
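Minimal cut sets can be derived mechanically from the gate logic: an OR gate passes up each child's cut sets unchanged, an AND gate merges one cut set from every child, and any set that contains a smaller cut set is then discarded. The sketch below is a naive expansion under assumed conventions (a gate is a `("AND" | "OR", [children])` tuple, a basic event is a string, and events A to D are hypothetical). It can grow exponentially on large trees, which is why production tools rely on more efficient methods such as binary decision diagrams.

```python
from itertools import product


def cut_sets(node):
    """Return all cut sets of `node` as a set of frozensets of basic-event names."""
    if isinstance(node, str):  # basic event
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any single child's cut set is enough on its own.
        return set().union(*child_sets)
    # AND: choose one cut set from every child and merge them.
    return {frozenset().union(*combo) for combo in product(*child_sets)}


def minimal_cut_sets(node):
    """Drop every cut set that strictly contains another cut set."""
    sets = cut_sets(node)
    return {s for s in sets if not any(other < s for other in sets)}


# Hypothetical tree: top = A OR (B AND C) OR (A AND D)
top = ("OR", ["A", ("AND", ["B", "C"]), ("AND", ["A", "D"])])
mcs = minimal_cut_sets(top)
print(sorted(sorted(s) for s in mcs))
print("single points of failure:", [set(s) for s in mcs if len(s) == 1])
```

The pathway {A, D} is dropped because A alone already causes the top event, so A shows up as a single point of failure.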
Quantitative analysis goes a step further by assigning failure probabilities to each basic event and calculating the overall probability of the top event occurring. The math follows directly from the logic gates, under the usual assumption that the input events are independent. For an AND gate, you multiply the probabilities of the inputs together. For an OR gate, you add them (with corrections for overlap in more precise calculations). When basic-event probabilities are small, the probability of the top event is approximately the sum of the individual minimal cut set probabilities, a shortcut known as the rare-event approximation. This lets you put a number on how likely a catastrophic failure actually is, and it shows which components contribute the most risk so you can prioritize improvements where they matter most.
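The gate arithmetic and the cut-set shortcut can be compared side by side. In this sketch the probabilities are made up for illustration, and the tree is the hypothetical "pump fails OR (gen1 AND gen2 fail)". The OR gate uses the exact independent-events form, one minus the probability that no input occurs, which is what "adding with corrections for overlap" works out to. Two caveats: multiplying at AND gates is only valid if the inputs really are independent (common-cause failures break this), and the bottom-up gate calculation is only exact when no basic event appears in more than one branch.

```python
import math


def gate_probability(gate: str, probs: list[float]) -> float:
    """Probability of a gate's output, assuming its inputs are independent."""
    if gate == "AND":
        return math.prod(probs)
    # OR: one minus the probability that none of the inputs occurs.
    return 1.0 - math.prod(1.0 - p for p in probs)


def rare_event_approximation(cut_sets, p):
    """Approximate the top-event probability as the sum of cut-set probabilities."""
    return sum(math.prod(p[e] for e in cs) for cs in cut_sets)


# Hypothetical failure probabilities, for illustration only.
p = {"pump": 1e-3, "gen1": 1e-2, "gen2": 1e-2}
cut_sets = [{"pump"}, {"gen1", "gen2"}]

exact = gate_probability("OR", [p["pump"], gate_probability("AND", [p["gen1"], p["gen2"]])])
approx = rare_event_approximation(cut_sets, p)
print(f"exact={exact:.7f} approx={approx:.7f}")

# Rank cut sets by their share of the approximate total to see where risk concentrates.
for cs in sorted(cut_sets, key=lambda c: -math.prod(p[e] for e in c)):
    share = math.prod(p[e] for e in cs) / approx
    print(sorted(cs), f"{share:.0%}")
```

The approximation slightly overestimates the exact value because it double-counts the case where both cut sets occur at once, which is negligible when probabilities are small.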
Where FTA Is Used
Nuclear power has been one of the heaviest users of fault tree analysis since the 1970s. The Nuclear Regulatory Commission’s Fault Tree Handbook remains one of the most referenced guides on the method. In this context, FTA feeds into broader probabilistic risk assessments that evaluate everything from coolant system reliability to containment integrity.
Aerospace relies on FTA for both aircraft design and mission-critical systems. When a new aircraft component is proposed, engineers build fault trees to verify that no realistic combination of failures can lead to a catastrophic outcome, and that redundancy is sufficient.
Healthcare has increasingly adopted the technique for patient safety. One notable application involved analyzing wrong-site surgery, where researchers built a fault tree with 35 faults (25 of them basic events) covering the entire process from operating room scheduling through intraoperative confirmation. The analysis identified five key intermediate failure points: OR scheduling, patient confirmation on the day of surgery, site marking, the time-out process, and intraoperative imaging. With these points mapped out, individual hospitals or surgical specialties can adapt the tree to their own settings and target the weakest links in their verification process.
Chemical plants, automotive manufacturing, and software systems all use variations of FTA as well, often as part of regulatory compliance or internal safety management programs.
Strengths of Fault Tree Analysis
FTA’s biggest advantage is that it forces systematic thinking about failure. Rather than relying on intuition or experience to guess what might go wrong, the method walks you through every logical pathway. Complex systems with thousands of components become manageable because the tree structure organizes them hierarchically.
The visual format also makes it a powerful communication tool. A fault tree diagram can be understood by engineers, managers, and regulators without specialized training in probability theory. This makes it especially useful in multidisciplinary teams where people from different backgrounds need to agree on where the risks are and what to do about them. The same diagram that an engineer uses for probability calculations can help a project manager justify design changes or budget allocation for safety improvements.
FTA also offers flexibility. You can use it purely as a qualitative thinking exercise to identify failure modes, or you can layer on quantitative data for rigorous probability estimates. It works for trade-off studies, where you compare the safety impact of different design options, and for root cause analysis after an incident has already occurred.
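A trade-off study can be as simple as writing down each design's minimal cut sets and comparing the resulting top-event estimates. The sketch below assumes two hypothetical backup-power designs, including a transfer switch that is not in the original text, with made-up probabilities, and uses the rare-event approximation described earlier.

```python
import math

# Hypothetical failure probabilities, for illustration only.
p = {"gen": 1e-2, "gen_a": 1e-2, "gen_b": 1e-2, "transfer_switch": 1e-3}

# Each candidate design is described by its minimal cut sets for "loss of backup power".
designs = {
    "single generator": [{"gen"}],
    "two generators + transfer switch": [{"gen_a", "gen_b"}, {"transfer_switch"}],
}


def top_event_probability(cut_sets, p):
    """Rare-event approximation: sum the probabilities of each minimal cut set."""
    return sum(math.prod(p[e] for e in cs) for cs in cut_sets)


for name, cut_sets in designs.items():
    print(f"{name}: {top_event_probability(cut_sets, p):.2e}")
```

Under these assumed numbers, redundancy cuts the estimate by roughly an order of magnitude, but the transfer switch becomes a new single point of failure that dominates the remaining risk, which is exactly the kind of finding a trade-off study is meant to surface.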
Limitations to Keep in Mind
Fault tree analysis can be costly and time-consuming, especially for large systems. A complex industrial process might generate a tree with hundreds or thousands of basic events, and constructing it accurately requires deep knowledge of the system being analyzed.
The method also struggles with partial failures. A component is modeled as either working or failed, with no middle ground. In reality, a sensor might be degraded but still providing some useful data, or a valve might be partially stuck. Capturing these gray areas requires workarounds that add complexity.
Human error is another challenge. While human actions can be included as basic events, modeling the full range of human behavior (fatigue, distraction, misunderstanding of procedures) is harder than modeling mechanical component failures, where historical reliability data is more readily available.
Finally, the results of a large fault tree can be difficult to verify independently. Errors in the tree’s logic, like missing a failure mode or incorrectly assigning a gate type, propagate through the entire analysis. Peer review and software tools help, but checking a complex tree by hand is a significant undertaking. Carnegie Mellon’s Software Engineering Institute offers an open-source modeling framework for building fault trees digitally, and various commercial tools exist for large-scale industrial applications.

