P(B|A) represents the conditional probability of event B occurring given that event A has already occurred. The vertical bar “|” is read as “given,” so the full expression is spoken aloud as “the probability of B given A.” It answers a specific question: if you already know A happened, how likely is B?
The Formula Behind the Notation
P(B|A) is calculated by dividing the probability of both events happening together by the probability of event A alone:
P(B|A) = P(A and B) / P(A)
This formula only works when P(A) is greater than zero, which makes intuitive sense. You can’t ask “given that A happened” if A is impossible. The numerator, P(A and B), captures the overlap between the two events. The denominator, P(A), scales everything relative to A. You’re essentially zooming in on the world where A is true and asking how much of that world also contains B.
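The formula can be checked by brute-force counting. The events here are hypothetical, chosen only for illustration: roll two fair dice, let A be "the first die is even" and B be "the sum is 8."

```python
from fractions import Fraction

# Hypothetical events for illustration: roll two fair dice.
# A = "the first die is even", B = "the sum is 8".
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]

a = [o for o in outcomes if o[0] % 2 == 0]   # outcomes where A holds
a_and_b = [o for o in a if sum(o) == 8]      # outcomes where A and B both hold

# P(B|A) = P(A and B) / P(A)
p_a = Fraction(len(a), len(outcomes))
p_a_and_b = Fraction(len(a_and_b), len(outcomes))
print(p_a_and_b / p_a)  # 1/6
```

For comparison, the unconditional P(B) is 5/36, so in this example learning A nudges the probability of B upward to 1/6.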
Why It Works: The Shrinking Sample Space
The key idea behind conditional probability is that learning new information shrinks the set of possibilities you’re considering. Normally, you’d calculate a probability against the entire sample space, meaning every possible outcome. But once you know event A occurred, the sample space shrinks to just the outcomes inside A. You’re no longer asking “how likely is B out of everything?” You’re asking “how likely is B out of only the outcomes where A happened?”
A Venn diagram makes this concrete. Imagine two overlapping circles, one for A and one for B. Without any conditions, each circle’s size relative to the whole rectangle represents its probability. But P(B|A) ignores everything outside circle A. Your new “whole world” is circle A, and you’re measuring what fraction of it overlaps with circle B.
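The same shrinking can be done literally in code, again with hypothetical events: condition on A by discarding every outcome outside A, then measure B's share of what remains.

```python
from fractions import Fraction

# Hypothetical example: one fair die.
# A = "the roll is even", B = "the roll is at least 4".
sample_space = [1, 2, 3, 4, 5, 6]

# Learning A shrinks the sample space to just the outcomes inside A...
shrunk = [x for x in sample_space if x % 2 == 0]   # [2, 4, 6]

# ...and P(B|A) is simply B's share of that smaller world.
p_b_given_a = Fraction(sum(1 for x in shrunk if x >= 4), len(shrunk))
print(p_b_given_a)  # 2/3
```

Counting B inside the shrunken space gives exactly the same answer as the ratio formula P(A and B) / P(A); they are two views of one computation.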
A Medical Example
Conditional probability shows up constantly in medical testing. Consider sensitivity, a measure of how well a diagnostic test catches a disease. Sensitivity is a conditional probability: the probability of getting a positive test result given that the patient actually has the disease.
For exercise electrocardiograms used to detect coronary artery disease, studies found that about 70% of patients who had confirmed disease also had a positive test result. That means P(positive test | disease) = 0.70. Notice the direction matters here. This does not tell you the probability that a patient with a positive test actually has the disease. That’s a different conditional probability, with the events flipped.
The same test had a false-positive rate of about 15%, meaning 15% of patients without coronary artery disease still got a positive result. That’s P(positive test | no disease) = 0.15. These two conditional probabilities together define how useful the test is, but neither one alone tells you what a positive result means for any individual patient.
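One way to see why neither number alone settles what a positive result means is to count patients directly. This sketch assumes a pre-test disease probability of 30% (the figure used later in this article) purely for illustration, applied to a hypothetical group of 1,000 patients:

```python
# Natural-frequency sketch of the two conditional probabilities above:
# sensitivity P(positive | disease) = 0.70 and false-positive rate
# P(positive | no disease) = 0.15. The 30% prevalence is an assumed
# illustration value.
patients = 1000
diseased = int(patients * 0.30)            # 300 patients with the disease
healthy = patients - diseased              # 700 patients without it

true_positives = int(diseased * 0.70)      # 210 diseased patients test positive
false_positives = int(healthy * 0.15)      # 105 healthy patients test positive

# Sensitivity answers "positive given disease"...
print(true_positives / diseased)                                   # 0.7
# ...which is not the same as "disease given positive":
print(round(true_positives / (true_positives + false_positives), 2))  # 0.67
```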
The Most Common Mistake: Flipping the Condition
P(B|A) and P(A|B) are not the same thing. This confusion is so widespread it has a name in legal contexts: the prosecutor’s fallacy. It involves confusing the probability that someone is guilty given the evidence with the probability of the evidence given that someone is guilty. Those sound similar but can produce wildly different numbers.
Here’s why. Suppose a DNA test has a one-in-a-million false match rate. P(match | innocent) = 0.000001. It’s tempting to flip that and conclude P(innocent | match) = 0.000001, meaning the suspect is almost certainly guilty. But that ignores how many people were tested, how common the crime is, and other base-rate information. The probability of the evidence given guilt is not the probability of guilt given the evidence. Whenever you see P(B|A), pay close attention to which event is before the bar and which is after it. Swapping them changes the question entirely.
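A rough back-of-the-envelope sketch shows how large the gap can be. All the numbers here are assumptions for illustration: a database of 10 million people containing exactly one true source, and a test that always matches the true source.

```python
# Hypothetical sketch of why P(match | innocent) != P(innocent | match).
# Assumed numbers: 10 million people searched, exactly one true source,
# one-in-a-million false match rate, and no missed matches.
population = 10_000_000
false_match_rate = 1e-6

expected_false_matches = (population - 1) * false_match_rate  # ~10 innocent matches
true_matches = 1

# Among the people who match, most are innocent:
p_guilty_given_match = true_matches / (true_matches + expected_false_matches)
print(round(p_guilty_given_match, 2))  # 0.09 -- nowhere near 0.999999
```

Even with a one-in-a-million test, a match in a large search pool leaves guilt far from certain, because the base rate of innocent people dwarfs the single guilty one.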
What Happens When Events Are Independent
If knowing that A occurred doesn’t change the likelihood of B at all, the two events are independent. Mathematically, this means P(B|A) simply equals P(B). Whether or not A happens is irrelevant to B.
For example, if you flip a coin and roll a die, knowing the coin landed heads tells you nothing about whether the die shows a six. P(six | heads) = P(six) = 1/6. Independence is actually defined this way in probability: two events are independent precisely when conditioning on one doesn’t change the probability of the other.
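The coin-and-die claim can be verified by enumerating the joint sample space and checking that conditioning on heads leaves the probability of a six unchanged:

```python
from fractions import Fraction

# Enumerate all 12 equally likely (coin, die) outcomes.
outcomes = [(c, d) for c in ("H", "T") for d in range(1, 7)]

p_six = Fraction(sum(1 for c, d in outcomes if d == 6), len(outcomes))

heads = [(c, d) for c, d in outcomes if c == "H"]    # condition on heads
p_six_given_heads = Fraction(sum(1 for c, d in heads if d == 6), len(heads))

print(p_six, p_six_given_heads)  # 1/6 1/6 -- conditioning changes nothing
```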
How P(B|A) Connects to Bayes’ Theorem
Bayes’ theorem is the formal tool for reversing a conditional probability, going from P(B|A) to P(A|B) or vice versa. The formula is:
P(A|B) = P(B|A) × P(A) / P(B)
Each piece has a role. P(A) is called the prior probability, your belief about A before considering B. P(B|A) is called the likelihood, how probable the evidence B is if A were true. P(A|B) is the posterior probability, your updated belief about A after learning B occurred. P(B) in the denominator is the overall probability of the evidence; it acts as a normalizing constant that keeps the posterior a valid probability between 0 and 1.
Returning to the medical example: if the pre-test probability of coronary artery disease is 0.30, and the exercise electrocardiogram has a sensitivity of 0.70 with a false-positive rate of 0.15, Bayes’ theorem combines all of this to show the probability of disease given a positive test is about 0.67. The test moved the probability from 30% to roughly two-thirds, a meaningful update but far from certainty. Without Bayes’ theorem, you’d be stuck with P(positive | disease) and no way to answer the question patients actually care about: P(disease | positive).
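That calculation is a direct application of the formula. A minimal sketch using this section's numbers (prior 0.30, sensitivity 0.70, false-positive rate 0.15), with P(B) expanded by the law of total probability:

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive) via Bayes' theorem.

    P(positive) is built from the two conditional probabilities by
    total probability: P(+|D)P(D) + P(+|no D)P(no D).
    """
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Numbers from the text above:
print(round(posterior(0.30, 0.70, 0.15), 2))  # 0.67
```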
Reading the Notation at a Glance
- P(B|A): the probability of B happening, assuming A has already happened
- The vertical bar “|”: means “given” or “conditional on”
- Left side of the bar (B): the event whose probability you’re calculating
- Right side of the bar (A): the event you’re treating as known, i.e., as having already occurred
Whenever you encounter this notation, start by identifying what’s on each side of the bar. The event on the right is your given information, your new reality. The event on the left is what you want to know. That framing will keep you from accidentally flipping the two, which is the single most common source of errors in applied probability.

