When evaluating the causality of an adverse event, the core question is whether a drug or device actually caused the harm, or whether something else explains it. Clinicians and safety professionals answer this by systematically weighing timing, alternative explanations, what happened when the drug was stopped, and whether the reaction is already known to occur with that product. Several structured tools exist to standardize this process, each approaching the question from a slightly different angle.
The Key Factors in Any Causality Assessment
Regardless of which formal tool is used, every causality evaluation comes back to the same handful of questions. Did the event happen after the drug was given, and was the timing consistent with how that drug works in the body? Could the patient’s underlying disease, another medication, or something else entirely explain what happened? Did the event improve when the drug was stopped? And is this type of reaction already documented for this drug?
Two concepts sit at the heart of this process: dechallenge and rechallenge. Dechallenge refers to what happens when the suspected drug is withdrawn. A “positive dechallenge” means the adverse event resolved or improved after stopping the drug, which supports a causal link. A “negative dechallenge” means the event continued on its own course regardless. Rechallenge, reintroducing the drug to see whether the reaction returns, provides the strongest confirmation of a causal relationship, though it is rarely done deliberately because of the ethical problem of knowingly re-exposing someone to potential harm.
The WHO-UMC Scale: Four Levels of Certainty
The World Health Organization’s Uppsala Monitoring Centre (WHO-UMC) system classifies causality into four main categories. An event is considered “certain” when it has a plausible time relationship to the drug, cannot be explained by the patient’s disease or other medications, resolves in a way that makes pharmacological sense when the drug is withdrawn, and (if needed) recurs on rechallenge. “Probable/likely” requires a reasonable time relationship and an event unlikely to be caused by other factors, but does not require rechallenge. “Possible” applies when the timing fits but the event could also be explained by the disease or other drugs, and information about withdrawal may be unclear. “Unlikely” is reserved for events where the timing alone makes a drug connection improbable, and other explanations are more convincing.
This scale is widely used in global pharmacovigilance because it’s straightforward and doesn’t require scoring arithmetic. It relies on clinical judgment applied within a defined framework.
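The four categories can be sketched as a simple decision function. This is a minimal illustration, assuming each clinical judgment has been collapsed to a yes/no answer; the function name and parameters are hypothetical, and the full WHO-UMC system also includes categories for unassessable and conditional cases.

```python
def who_umc_category(plausible_timing, other_cause_plausible,
                     positive_dechallenge, positive_rechallenge):
    """Minimal sketch of the WHO-UMC four-category decision logic.

    Each argument collapses a clinical judgment into a yes/no answer;
    real assessments are made on much fuller case information.
    """
    if not plausible_timing:
        # Timing alone makes a drug connection improbable.
        return "unlikely"
    if not other_cause_plausible and positive_dechallenge:
        # The event cannot be explained by disease or other drugs and
        # resolved on withdrawal; a positive rechallenge upgrades it.
        return "certain" if positive_rechallenge else "probable/likely"
    # Timing fits, but other explanations remain or withdrawal
    # information is unclear.
    return "possible"
```

For example, a case with plausible timing, no convincing alternative explanation, a positive dechallenge, and a positive rechallenge would land in the “certain” category; remove the rechallenge and it drops to “probable/likely.”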
The Naranjo Scale: A Scored Questionnaire
The Naranjo Adverse Drug Reaction Probability Scale takes a more structured approach by asking 10 specific yes/no questions and assigning point values to each answer. Points range from -1 to +2 per question, and the total score falls somewhere between -4 and +13. A score of 9 or higher means the reaction is considered definite. Scores of 5 to 8 indicate probable, 1 to 4 indicate possible, and 0 or below means the association is doubtful.
The questions themselves walk through the same core factors: whether there are previous conclusive reports of this reaction, whether the event appeared after the drug was given, whether it improved when the drug was withdrawn, and whether it reappeared on rechallenge. The advantage of this approach is consistency. Two different evaluators reviewing the same case should, in theory, arrive at similar scores. In practice, some subjectivity still creeps in, particularly on questions where the answer is “do not know.”
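The score-to-category mapping is entirely mechanical and can be expressed directly (the function name is illustrative; the thresholds are the standard Naranjo cutoffs described above):

```python
def naranjo_category(total_score):
    """Map a Naranjo total score (range -4 to +13) to its causality
    category using the standard cutoffs."""
    if total_score >= 9:
        return "definite"
    if total_score >= 5:
        return "probable"
    if total_score >= 1:
        return "possible"
    return "doubtful"  # a score of 0 or below
```

The determinism of this mapping is exactly why two evaluators who answer the questions the same way must reach the same conclusion; disagreement can only enter through the answers themselves.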
RUCAM: Built for Liver Injury
Some types of adverse events are common enough and complex enough to warrant their own specialized tool. Drug-induced liver injury is one, and the Roussel Uclaf Causality Assessment Method (RUCAM) was designed specifically for it. RUCAM scores seven categories covering eight separate factors: time to onset, the course of the injury after stopping the drug, patient risk factors (two separate scores), whether other drugs could be responsible, whether non-drug causes of liver injury have been ruled out, previous information about the drug’s liver toxicity, and the response to rechallenge.
Each factor has its own point range, some as wide as -3 to +3. This granularity matters because liver injury can be caused by dozens of things: viral hepatitis, alcohol use, autoimmune conditions, herbal supplements, and other medications the patient may be taking simultaneously. RUCAM forces the evaluator to work through each alternative explanation systematically rather than relying on a general impression.
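A RUCAM-style total can be sketched as a sum of per-factor scores, each validated against its allowed range before being added. The factor names below mirror the list above, but the ranges shown are illustrative placeholders: the actual point ranges are fixed by the method and differ by factor and by the pattern of liver injury.

```python
def rucam_total(scores, ranges):
    """Sum per-factor scores after checking each against its allowed
    range. `ranges` maps factor name -> (min, max)."""
    total = 0
    for factor, value in scores.items():
        low, high = ranges[factor]
        if not low <= value <= high:
            raise ValueError(f"{factor} score {value} outside [{low}, {high}]")
        total += value
    return total

# Hypothetical uniform ranges for illustration only; the real method
# assigns each factor its own, narrower range.
example_ranges = {factor: (-3, 3) for factor in [
    "time_to_onset", "course_after_cessation", "risk_factor_1",
    "risk_factor_2", "concomitant_drugs", "non_drug_causes",
    "previous_information", "rechallenge_response",
]}
```

Forcing every factor through an explicit range check mirrors what the paper form does in practice: the evaluator cannot skip an alternative explanation without it showing up as a missing or invalid score.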
Expert Judgment vs. Algorithms vs. Probabilistic Models
A French pharmacovigilance study compared three fundamentally different approaches to causality assessment: consensual expert judgment (a panel of specialists reaching agreement), an algorithmic method (a structured decision tree used in French pharmacovigilance since 1985), and a probabilistic model based on a logistic function. The results revealed striking trade-offs.
The algorithmic approach had high specificity (0.92) but poor sensitivity (0.42), meaning it was good at correctly identifying events that were not drug-caused but missed many that were. The probabilistic method showed the opposite pattern: high sensitivity (0.96) but low specificity (0.42), catching nearly all true reactions but also flagging many false positives. Expert judgment, meanwhile, failed to discriminate in 10 of the cases evaluated. Agreement between all three methods was poor overall, aligning only for events that expert consensus already considered drug-induced.
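The trade-off can be made concrete with the standard confusion-matrix definitions. The counts below are invented to reproduce the algorithmic method’s reported 0.42/0.92 profile; they are not taken from the study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN): the share of truly drug-caused
    events the method catches. Specificity = TN / (TN + FP): the share
    of non-drug-caused events it correctly rules out."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts chosen to match the reported algorithmic profile.
sens, spec = sensitivity_specificity(tp=42, fn=58, tn=92, fp=8)
# sens = 0.42, spec = 0.92
```

Swapping the false-negative and false-positive counts reproduces the probabilistic method’s mirror-image profile, which is the trade-off the study observed.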
The practical takeaway is that no single method is universally best. Algorithmic tools like Naranjo are conservative and structured, making them useful for standardized reporting. Probabilistic models may be better suited for automated screening of large databases where missing a real signal is the bigger risk. Expert judgment remains important for complex individual cases but is harder to scale and reproduce.
Confounding Factors That Complicate the Picture
The biggest challenge in causality assessment is separating the drug’s effects from everything else happening in the patient’s body. Hospitalized patients are particularly difficult to evaluate because they often have multiple diseases, take several medications simultaneously, and undergo procedures that could independently cause the same symptoms being investigated. Research on trigger-based detection systems has found that these confounding variables, particularly the clinical conditions of inpatients, consistently degrade detection performance, leading tools to underestimate the true causal association between drugs and adverse events.
Common confounders include the natural progression of the disease being treated, interactions between multiple drugs, pre-existing organ damage that makes a patient more vulnerable, and lifestyle factors like alcohol use. This is why structured tools explicitly ask evaluators to consider and rule out alternative explanations before assigning a causality category.
Population-Level Causality: The Bradford Hill Criteria
Individual case assessment is only one piece of the puzzle. When looking at whether a drug causes harm across an entire population, epidemiologists turn to the nine Bradford Hill criteria: strength of association, consistency across different studies, specificity of the effect, temporality (exposure must precede the outcome), biological gradient (higher doses causing more harm), plausibility based on known biology, coherence with existing knowledge, experimental evidence, and analogy to similar known relationships.
These criteria were originally developed for occupational and environmental health but are now applied in pharmacovigilance and pharmacoepidemiology. They don’t function as a checklist where every box must be ticked. Instead, they provide a framework for weighing evidence. Temporality is the only criterion considered absolutely necessary: if the drug wasn’t given before the event occurred, causality is off the table.
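That asymmetry, one mandatory criterion and eight that merely add weight, can be sketched as follows, assuming each criterion has been reduced to a satisfied/not-satisfied judgment (the names and function are illustrative, not a formal scoring system):

```python
HILL_CRITERIA = [
    "strength", "consistency", "specificity", "temporality",
    "biological_gradient", "plausibility", "coherence",
    "experiment", "analogy",
]

def weigh_hill_evidence(satisfied):
    """Return a crude weight-of-evidence count over the nine criteria.
    Temporality is a hard requirement; the other eight add weight but
    are not individually required."""
    if not satisfied.get("temporality", False):
        return 0  # exposure did not precede the outcome: no causal case
    return sum(1 for criterion in HILL_CRITERIA
               if satisfied.get(criterion, False))
```

A strong association with no temporality scores zero; a weaker pattern of evidence that does satisfy temporality can still accumulate weight, which reflects how the criteria are used as a framework rather than a checklist.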
Regulatory Reporting Thresholds
Causality assessment doesn’t happen in a vacuum. Regulatory frameworks define when adverse events must be reported, and the threshold is deliberately low. In the United States, a serious adverse event is one that is life-threatening, results in permanent impairment of a body function or permanent damage to a body structure, or requires medical or surgical intervention to prevent such permanent harm. Device user facilities must report deaths and serious injuries within 10 working days. Manufacturers face an even tighter window of 5 working days when the event necessitates remedial action to prevent unreasonable risk to public health.
Critically, the reporting standard is “reasonably suggests” a causal connection, not proof. A manufacturer must report an event that reasonably suggests their device may have caused or contributed to a death or serious injury. This means reports are filed at the “possible” level of causality or even lower, and deeper assessment happens afterward. The system is designed to capture signals early, accepting that many reported events will ultimately turn out to be unrelated to the product.
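The working-day windows above can be sketched with a simple counter that skips weekends. This is a simplification: it ignores public holidays, which a real compliance calendar must account for, and the function name is illustrative.

```python
from datetime import date, timedelta

def reporting_deadline(awareness_date, working_days):
    """Return the date that falls `working_days` working days (Mon-Fri)
    after `awareness_date`, skipping weekends but not public holidays."""
    current = awareness_date
    remaining = working_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 .. Friday=4
            remaining -= 1
    return current

# A user facility aware of a death on Mon 2024-01-01 has 10 working
# days; a manufacturer's 5-day remedial-action window ends sooner.
print(reporting_deadline(date(2024, 1, 1), 10))  # 2024-01-15
print(reporting_deadline(date(2024, 1, 1), 5))   # 2024-01-08
```

The example dates are arbitrary; the point is that weekends stretch a “10 working day” window to two full calendar weeks, which matters when tracking compliance.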

