A discrete probability distribution is a mathematical model that lists every possible outcome of a random event and assigns a probability to each one. Think of rolling a single die: there are exactly six outcomes (1 through 6), each with a probability of 1/6. That complete list of outcomes and their probabilities is a discrete probability distribution. It only works when the outcomes are countable, separate values, like the number of heads in ten coin flips or the number of emails you receive in an hour.
Two Rules Every Distribution Must Follow
For a set of probabilities to qualify as a valid discrete probability distribution, it must satisfy two conditions. First, every individual probability has to be between 0 and 1. No outcome can have a negative chance of occurring, and no outcome can be more than certain. Second, when you add up the probabilities of all possible outcomes, the total must equal exactly 1. This makes intuitive sense: something has to happen, and the distribution needs to account for every possibility.
If you’re looking at a table of outcomes and probabilities and the numbers don’t sum to 1, or any value is negative, you’re not looking at a legitimate probability distribution.
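The two rules above are easy to check mechanically. Here is a minimal sketch in Python; the helper name `is_valid_distribution` and the example tables are illustrative, not from any particular library.

```python
def is_valid_distribution(pmf, tol=1e-9):
    """Check the two rules: every probability in [0, 1], total equal to 1.

    `pmf` maps each outcome to its probability; `tol` absorbs
    floating-point rounding when summing the probabilities.
    """
    probs = pmf.values()
    return all(0.0 <= p <= 1.0 for p in probs) and abs(sum(probs) - 1.0) < tol

fair_die = {face: 1/6 for face in range(1, 7)}   # valid: six outcomes, sums to 1
bad_table = {1: 0.5, 2: 0.6}                      # invalid: sums to 1.1

print(is_valid_distribution(fair_die))   # True
print(is_valid_distribution(bad_table))  # False
```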
What Makes It “Discrete”
The word “discrete” refers to the type of values the random variable can take. A discrete random variable lands on specific, countable values. The number of customers who walk into a store today could be 0, 1, 2, 3, and so on. You’ll never get 2.7 customers. Compare that to a continuous random variable like temperature, which can be 72.0°F, 72.001°F, or any value along an unbroken range. Continuous distributions require different math (integration instead of summation) and produce smooth curves instead of individual bars.
“Countable” doesn’t necessarily mean finite. A discrete variable can technically take on infinitely many values, as long as those values can be listed in sequence. The number of coin flips until you get your first heads could be 1, 2, 3, stretching out to infinity. That’s still discrete because you could list every possibility, even though the list never ends.
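The "flips until first heads" example is known as a geometric distribution, with PMF P(X = k) = (1 − p)^(k−1) · p. A short sketch (the variable names are illustrative) shows that even with infinitely many outcomes, the probabilities still total 1 in the limit:

```python
# Geometric distribution: number of fair-coin flips until the first heads.
# P(X = k) = (1 - p)**(k - 1) * p for k = 1, 2, 3, ...
p = 0.5  # probability of heads on each flip

# Summing just the first 50 terms already gets extremely close to 1,
# even though the full list of outcomes never ends.
partial = sum((1 - p) ** (k - 1) * p for k in range(1, 51))
print(partial)  # very nearly 1.0
```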
The Probability Mass Function
The probability mass function, or PMF, is the formula or table that tells you the probability of each specific outcome. If you ask “what’s the probability that X equals exactly 3?” the PMF gives you the answer. For any value that isn’t a possible outcome, the PMF returns zero. For a standard die, the PMF evaluated at 7 is simply 0.
There’s a related tool called the cumulative distribution function (CDF), which answers a slightly different question: “what’s the probability that X is less than or equal to some value?” It works by adding up all the PMF values at or below that point. For a die roll, the CDF at 4 would be the probability of rolling a 1, 2, 3, or 4, which is 4/6 or about 0.667.
When you graph a CDF for a discrete variable, it looks like a staircase. The function stays flat between possible values and then jumps up at each outcome. The size of each jump matches the probability of that outcome in the PMF.
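The die example above can be written out directly. This is a sketch with hypothetical helper names, but the numbers match the text: the PMF is 0 at impossible values, and the CDF at 4 is 4/6.

```python
def die_pmf(x):
    """Probability that a fair six-sided die shows exactly x.

    Returns 0 for any value that is not a possible outcome (e.g. 7 or 2.5).
    """
    return 1/6 if x in {1, 2, 3, 4, 5, 6} else 0.0

def die_cdf(x):
    """Probability that the roll is less than or equal to x.

    Adds up the PMF values at or below x, as described above.
    """
    return sum(die_pmf(k) for k in range(1, 7) if k <= x)

print(die_pmf(7))            # 0.0 -- 7 is not a possible outcome
print(round(die_cdf(4), 3))  # 0.667 -- P(roll <= 4) = 4/6
```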
Visualizing Discrete Distributions
The standard way to display a discrete distribution is with a probability histogram. The possible outcomes sit along the horizontal axis, and each outcome gets a bar whose height equals its probability. Unlike histograms built from collected data, a probability histogram is theoretical: the bar heights come from the distribution’s rules, not from observations. Each bar has a width of 1, so the area of a bar equals its probability, and the total area of all bars sums to 1.
Mean and Variance
Two numbers summarize the core behavior of any discrete distribution. The expected value (or mean) tells you the long-run average outcome if you repeated the random process forever. You calculate it by multiplying each possible value by its probability and adding up the results. For a fair die, that’s (1 × 1/6) + (2 × 1/6) + … + (6 × 1/6) = 3.5. You’ll never actually roll a 3.5, but over thousands of rolls, your average will hover there.
Variance measures how spread out the outcomes are around that mean. For each possible value, you take the difference between that value and the mean, square it, multiply by the probability of that value, and sum everything up. A small variance means outcomes cluster tightly around the mean. A large variance means they’re scattered. The square root of the variance gives you the standard deviation, which is often easier to interpret because it’s in the same units as the original variable.
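Both calculations are one-line sums over the PMF. A sketch for the fair die, following the formulas just described (the values 35/12 ≈ 2.9167 for the variance and ≈ 1.71 for the standard deviation follow from the arithmetic, not from the text):

```python
# PMF of a fair six-sided die.
pmf = {face: 1/6 for face in range(1, 7)}

# Expected value: sum of (value x probability) over all outcomes.
mean = sum(x * p for x, p in pmf.items())

# Variance: sum of (value - mean)^2 x probability over all outcomes.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Standard deviation: square root of the variance, in the original units.
std_dev = variance ** 0.5

print(mean)                # 3.5
print(round(variance, 4))  # 2.9167  (exactly 35/12)
print(round(std_dev, 2))   # 1.71
```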
Common Discrete Distributions
Binomial Distribution
The binomial distribution models situations where you repeat the same yes-or-no trial a fixed number of times, each trial is independent, and the probability of success stays constant. Flipping a coin 100 times and counting heads is the classic example. The distribution tells you the probability of getting exactly 0 heads, exactly 1 head, exactly 2, and so on up to 100. It’s defined by two parameters: the number of trials and the probability of success on each trial.
In practice, binomial distributions show up whenever you’re counting successes out of a fixed number of attempts. Quality control inspectors checking a batch of 50 parts for defects, marketers tracking how many of 1,000 recipients open an email, or medical researchers recording how many patients in a trial respond to treatment are all working with binomial setups.
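The binomial PMF has a closed form: P(X = k) = C(n, k) · p^k · (1 − p)^(n−k), where C(n, k) counts the ways to choose which k of the n trials succeed. A minimal sketch using only the standard library (the function name is illustrative):

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 50 heads in 100 fair coin flips.
print(round(binomial_pmf(50, 100, 0.5), 4))  # about 0.0796

# The PMF values over all outcomes 0..100 sum to 1, as every
# valid distribution must.
total = sum(binomial_pmf(k, 100, 0.5) for k in range(101))
print(round(total, 6))  # 1.0
```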
Poisson Distribution
The Poisson distribution counts how many times an event occurs in a fixed interval of time or space, when those events happen independently at a roughly constant average rate. How many calls a help desk receives per hour, how many typos appear on a page, or how many accidents occur at an intersection per month are all Poisson scenarios. It’s defined by a single parameter: the average rate of occurrence.
The Poisson distribution is closely related to the binomial: it arises as a limiting case rather than the other way around. When the number of trials is very large and the probability of success on each trial is very small, the binomial distribution closely matches a Poisson distribution whose rate equals the average number of successes. A common rule of thumb asks for 100 or more trials with the average number of successes at 10 or below. This is why the Poisson works well for rare events observed over many opportunities.
Bernoulli Distribution
The simplest discrete distribution has just two outcomes: success or failure, 1 or 0. A single coin flip, a single quality inspection, a single free throw. The Bernoulli distribution is really just a binomial distribution with one trial, but it serves as the building block for more complex models.
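Because there are only two outcomes, the Bernoulli PMF fits in a few lines. A sketch (the function name is illustrative) that also confirms it matches a one-trial binomial:

```python
from math import comb

def bernoulli_pmf(x, p):
    """P(X = 1) = p, P(X = 0) = 1 - p; zero for any other value."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0

def binomial_pmf(k, n, p):
    """Binomial PMF, for comparison with the one-trial case."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# A Bernoulli trial is exactly a binomial with n = 1.
print(bernoulli_pmf(1, 0.3) == binomial_pmf(1, 1, 0.3))  # True
print(bernoulli_pmf(0, 0.3) == binomial_pmf(0, 1, 0.3))  # True
```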
Where Discrete Distributions Are Used
Discrete distributions are practical tools across a wide range of fields. In finance, they’re used in options pricing models and in forecasting the probability of market shocks or recessions, where analysts assign probabilities to distinct scenarios. Insurance companies use them to model the number of claims expected in a given period. In biology, researchers use them to predict how many organisms in a sample will carry a particular gene. Manufacturing teams use them to estimate the number of defective items in a production run and set acceptable quality thresholds.
Any time you’re working with countable outcomes and want to know how likely each one is, you’re working with a discrete probability distribution. The math stays the same whether you’re counting goals in a soccer match or server crashes in a data center. Define the possible outcomes, assign valid probabilities that sum to 1, and you have a complete model for making predictions.