What Is a Bernoulli Distribution in Statistics?

A Bernoulli distribution is the simplest probability distribution in statistics. It models any situation with exactly two outcomes: success or failure, yes or no, 1 or 0. Flip a coin, check whether a patient responds to treatment, or see if a website visitor clicks a button. Each of these is a single Bernoulli trial, and the distribution describes the probability of each result.

How It Works

The entire distribution revolves around one number: p, the probability of success. If you’re flipping a fair coin, p = 0.5. If you’re modeling whether a customer makes a purchase and historically 8% do, p = 0.08. The probability of failure is simply 1 minus p, often written as q.

A Bernoulli random variable can only take two values. It equals 1 (success) with probability p, and 0 (failure) with probability 1 − p. That’s the whole distribution. There’s no middle ground, no range of outcomes. It’s binary by definition.
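The two-value probability rule above can be written directly as code. This is a minimal sketch (the function names are ours, not from any particular library), using the text's customer-purchase example with p = 0.08:

```python
import random

def bernoulli_pmf(k: int, p: float) -> float:
    """P(X = k) for a Bernoulli(p) variable; k must be 0 or 1."""
    if k not in (0, 1):
        raise ValueError("a Bernoulli variable only takes the values 0 or 1")
    return p if k == 1 else 1 - p

def bernoulli_trial(p: float) -> int:
    """Simulate one trial: 1 (success) with probability p, else 0 (failure)."""
    return 1 if random.random() < p else 0

# The customer-purchase example: 8% of customers historically buy.
print(bernoulli_pmf(1, 0.08))  # probability of success: 0.08
print(bernoulli_pmf(0, 0.08))  # probability of failure: 1 - p = q
```

Note that the error raised for any `k` other than 0 or 1 mirrors the point in the text: there is no middle ground, so the distribution is undefined anywhere else.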

The Three Requirements

For something to qualify as a Bernoulli trial, it needs to meet three conditions:

  • Two outcomes only. Every trial produces either a success or a failure. Nothing else.
  • Constant probability. The probability of success, p, stays the same from one trial to the next. If you’re testing whether emails bounce, the bounce rate can’t shift between sends.
  • Independence. The result of one trial doesn’t influence the next. Drawing a card from a deck and replacing it before the next draw is independent. Drawing without replacing is not.

These conditions matter because violating any one of them means the Bernoulli model no longer fits your data. If outcomes affect each other or if the probability drifts over time, you need a different distribution.
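The card-drawing contrast can be made concrete with a few exact fractions. This sketch tracks the probability that the second card drawn is an ace: with replacement the trial probability is constant, while without replacement it depends on what the first draw produced, which is exactly the independence violation described above.

```python
from fractions import Fraction

# Success = drawing an ace. A standard deck has 4 aces among 52 cards.
p_first_ace = Fraction(4, 52)

# With replacement: the deck is restored, so every draw has the same p.
p_second_with_replacement = Fraction(4, 52)

# Without replacement: the second draw's probability depends on the first outcome.
p_second_given_first_ace = Fraction(3, 51)      # one ace already removed
p_second_given_first_not = Fraction(4, 51)      # all four aces still in the deck

print(p_second_with_replacement)                        # 1/13, unchanged
print(p_second_given_first_ace, p_second_given_first_not)  # 1/17 vs 4/51
```

Because those two conditional probabilities differ, the second draw is not independent of the first, and a sequence of draws without replacement is not a sequence of Bernoulli trials.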

Mean and Variance

Because the distribution has only one parameter, its statistical properties are straightforward. The mean (expected value) of a Bernoulli random variable is simply p. If there’s a 30% chance of success, the average outcome across many trials converges to 0.3.

The variance, which measures how spread out the results are, equals p(1 − p). This peaks at 0.25 when p = 0.5, meaning a fair coin flip has the maximum possible uncertainty for a binary outcome. As p moves toward 0 or 1, the variance shrinks. An event that almost always happens (or almost never does) has very little variability.
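Both formulas are one-liners, so they are easy to check numerically. This sketch prints the mean and variance for a few values of p, then simulates 100,000 trials at p = 0.3 to show the empirical average converging toward p:

```python
import random

def bernoulli_stats(p: float) -> tuple:
    """Theoretical mean and variance of a Bernoulli(p) variable."""
    return p, p * (1 - p)

# Variance peaks at p = 0.5 and shrinks toward the extremes.
for p in (0.1, 0.3, 0.5, 0.9):
    mean, var = bernoulli_stats(p)
    print(f"p={p}: mean={mean}, variance={var:.2f}")

# Simulation check: the average of many trials converges toward p.
random.seed(42)
p = 0.3
samples = [1 if random.random() < p else 0 for _ in range(100_000)]
print(sum(samples) / len(samples))  # close to 0.3
```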

Connection to the Binomial Distribution

The Bernoulli distribution is the building block for something larger. If you repeat a Bernoulli trial multiple times and count the total number of successes, you get a binomial distribution. Formally, a binomial random variable with n trials and success probability p is the sum of n independent Bernoulli random variables.

Say you flip a coin 20 times. Each individual flip follows a Bernoulli distribution. The total number of heads across all 20 flips follows a binomial distribution. A Bernoulli distribution is just the special case of a binomial where n = 1: a single trial.
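The sum relationship gives a direct way to generate binomial samples without any special machinery. This sketch builds each binomial draw literally as a sum of n Bernoulli trials, using the 20-flip example:

```python
import random
from collections import Counter

random.seed(0)

def binomial_sample(n: int, p: float) -> int:
    """One binomial draw, built as the sum of n independent Bernoulli trials."""
    return sum(1 if random.random() < p else 0 for _ in range(n))

# The 20-flip example: count heads across 20 fair coin flips, many times over.
heads_counts = [binomial_sample(20, 0.5) for _ in range(10_000)]
print(Counter(heads_counts).most_common(3))  # mass concentrates near n * p = 10
```

Setting n = 1 in `binomial_sample` recovers a plain Bernoulli trial, which is the special case noted above.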

This relationship is useful because it lets you build up complex probability models from the simplest possible unit. Once you understand one yes/no trial, you can scale it to model hundreds or thousands.

Where It Shows Up in Practice

Bernoulli distributions appear whenever you’re tracking a binary outcome across a population or over time. In clinical trials, researchers model whether each patient responds to a treatment as a Bernoulli outcome, with the response probability as the key parameter they’re trying to estimate. Phase II oncology trials, for example, often use single-arm designs where each patient’s result is coded as treatment success or failure.

In machine learning, the Bernoulli distribution underpins classification algorithms like the Bernoulli Naive Bayes classifier, which is commonly used for text classification and spam detection. It works by treating the presence or absence of each word in a document as a separate Bernoulli variable. Does this email contain the word “invoice”? Yes or no. That binary feature, repeated across many words, feeds the classifier’s predictions.
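The word-presence idea can be sketched from scratch in a few lines. The vocabulary, the per-class word probabilities, and the messages below are all invented for illustration (as if estimated from labeled training data); this is the scoring logic of a Bernoulli Naive Bayes model, not any particular library's implementation:

```python
import math

# Toy vocabulary and made-up Bernoulli parameters: the probability that each
# word appears in a document of each class.
vocab = ["invoice", "meeting", "winner", "free"]
p_word = {
    "spam": {"invoice": 0.4, "meeting": 0.1, "winner": 0.7, "free": 0.8},
    "ham":  {"invoice": 0.3, "meeting": 0.6, "winner": 0.05, "free": 0.1},
}
prior = {"spam": 0.5, "ham": 0.5}

def log_score(words: set, label: str) -> float:
    """Bernoulli NB log-likelihood: every vocab word contributes,
    whether present (probability p) or absent (probability 1 - p)."""
    score = math.log(prior[label])
    for w in vocab:
        p = p_word[label][w]
        score += math.log(p if w in words else 1 - p)
    return score

def classify(text: str) -> str:
    words = set(text.lower().split())
    return max(prior, key=lambda label: log_score(words, label))

print(classify("free winner"))      # spam, under these toy parameters
print(classify("meeting invoice"))  # ham
```

The distinctive Bernoulli-model detail is that *absent* words also contribute evidence, through the 1 − p term, which is what separates this variant from a count-based multinomial model.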

Quality control in manufacturing uses Bernoulli trials to model defective versus non-defective items on a production line. A/B testing on websites treats each visitor’s behavior (clicked or didn’t click, converted or didn’t convert) as a Bernoulli trial. Insurance companies model whether a policyholder files a claim in a given year the same way.
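In all of these applications, the practical task is the same: estimate p from observed trials. As a sketch with hypothetical A/B-test numbers (48 clicks out of 1,000 visitors), the click-through rate estimates p, and a standard normal-approximation interval quantifies the uncertainty:

```python
import math

# Hypothetical A/B result: each visit is one Bernoulli trial.
clicks, visitors = 48, 1000
p_hat = clicks / visitors  # sample proportion estimates p

# Normal-approximation 95% interval for p; the standard error uses the
# Bernoulli variance p(1 - p) divided by the number of trials.
se = math.sqrt(p_hat * (1 - p_hat) / visitors)
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
print(f"p_hat = {p_hat:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The interval width shrinks with the square root of the number of trials, which is why larger A/B tests give sharper estimates of the underlying rate.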

A Brief Origin

The distribution is named after Jakob Bernoulli (1654–1705), a Swiss mathematician who developed foundational ideas about probability between 1684 and 1690. His major work, Ars Conjectandi, introduced what we now call the Law of Large Numbers, which explains why the observed frequency of an event converges to its true probability over many trials. The book was published in 1713, eight years after his death.

Why It Matters

The Bernoulli distribution is worth understanding not because it’s complicated, but because it’s foundational. Nearly every binary question you can ask in data analysis starts here. Will this part fail? Will this voter turn out? Will this ad get clicked? Each of these is a Bernoulli trial, and recognizing that gives you a precise mathematical framework to describe uncertainty, estimate probabilities, and make predictions. More advanced distributions like the binomial, geometric, and negative binomial all grow directly from this one simple yes-or-no model.