Creating a probability distribution means listing every possible outcome of an event and assigning each outcome a probability, where every probability is between 0 and 1 and they all sum to exactly 1. Whether you’re working from raw data, building a theoretical model, or using spreadsheet software, the core process follows the same logic. Here’s how to do it.
Two Types of Distributions
Before you start, you need to know which type of distribution fits your situation, because the approach differs for each.
A discrete distribution covers outcomes you can count: the number of defective items in a shipment, the number of heads in 10 coin flips, or the number of customers who walk into a store each hour. There’s a finite (or countably infinite) set of possible values.
A continuous distribution covers outcomes you measure on a scale: height, weight, temperature, time until failure. The variable can land on any value within a range, including fractions and decimals. You can’t list every possible value individually, so instead of assigning probability to each exact number, you work with ranges and use a curve (called a density function) to describe how likely different intervals are.
Building a Discrete Distribution From Data
This is the most hands-on method and the best place to start if you’re learning. You collect data, count how often each outcome occurs, and convert those counts into probabilities.
Step 1: List All Possible Outcomes
Identify every distinct value your variable can take. If you’re looking at the number of goals scored per soccer match, your outcomes might be 0, 1, 2, 3, 4, 5, and so on. Write them in a column.
Step 2: Count Frequencies
Go through your data and tally how many times each outcome occurred. Say you observed 50 matches: 0 goals happened 8 times, 1 goal happened 14 times, 2 goals happened 15 times, 3 goals happened 9 times, 4 goals happened 3 times, and 5 goals happened 1 time.
Step 3: Convert Counts to Probabilities
Divide each frequency by the total number of observations. With 50 matches, the probability of 0 goals is 8/50 = 0.16, the probability of 1 goal is 14/50 = 0.28, and so on. Each value is your experimental probability for that outcome.
Step 4: Validate the Distribution
Check two rules that every valid probability distribution must satisfy. First, no probability can be negative. Second, all probabilities must add up to 1. If your values sum to 0.98 or 1.03 due to rounding, adjust so the total is exactly 1. If the numbers don’t follow these rules, you don’t have a valid distribution.
Once validated, you can display the result as a table (outcome in one column, probability in the other) or as a bar chart where the height of each bar represents the probability of that outcome. The function that assigns each outcome its probability is called a probability mass function, and the table or bar chart is how you display it.
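The four steps above can be sketched in a few lines of Python, using the soccer data from the example (the language choice is just for illustration; a spreadsheet works equally well):

```python
# Step 1 & 2: outcomes (goals per match) and observed frequencies over 50 matches
frequencies = {0: 8, 1: 14, 2: 15, 3: 9, 4: 3, 5: 1}
total = sum(frequencies.values())  # 50 observations

# Step 3: divide each count by the total to get experimental probabilities
distribution = {goals: count / total for goals, count in frequencies.items()}

# Step 4: validate -- no negative probabilities, and they must sum to 1
assert all(p >= 0 for p in distribution.values())
assert abs(sum(distribution.values()) - 1.0) < 1e-9

print(distribution)  # {0: 0.16, 1: 0.28, 2: 0.3, 3: 0.18, 4: 0.06, 5: 0.02}
```

The assertions are exactly the two validation rules from Step 4; if either fails, the numbers don't form a valid distribution.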
Using a Theoretical Model Instead
When you don’t have data, or when your situation matches a well-known pattern, you can build a distribution from a formula rather than from observations. Two of the most common models are the binomial and the normal.
The Binomial Distribution
Use this when you’re counting successes in a fixed number of independent trials, each with the same probability of success. Classic examples: the number of patients who respond to a treatment out of 20, the number of defective parts in a batch of 100, or the number of sales calls that convert out of 50 attempts.
You need two pieces of information (called parameters): the number of trials (n) and the probability of success on any single trial (p). With those, the probability of getting exactly x successes is calculated using the binomial formula, which multiplies the probability of x successes by the probability of (n minus x) failures, then accounts for all the different orderings those successes and failures could occur in.
For example, if you flip a fair coin 10 times (n = 10, p = 0.5), you can compute the probability of getting exactly 0 heads, exactly 1 head, exactly 2 heads, all the way up to 10 heads. Plot those probabilities against each outcome and you have your binomial distribution. The shape will be symmetric when p is 0.5, and skewed when p is closer to 0 or 1.
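The coin-flip example translates directly into code. This is a minimal sketch of the binomial formula described above, with `math.comb` counting the orderings:

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials,
    each with success probability p."""
    # comb(n, x) counts the orderings; the powers give one ordering's probability
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Fair coin flipped 10 times: probability of each head count from 0 to 10
dist = [binomial_pmf(x, 10, 0.5) for x in range(11)]

print(round(dist[5], 4))    # P(exactly 5 heads) = 252/1024 ≈ 0.2461
print(round(sum(dist), 10)) # the eleven probabilities sum to 1.0
```

Plotting `dist` against 0 through 10 gives the symmetric shape the text describes; change `p` to 0.1 and the same code produces a right-skewed distribution.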
The Normal Distribution
This is the classic bell curve, and it’s the go-to model for continuous data like blood pressure readings, exam scores, or manufacturing measurements. It’s defined by just two parameters: the mean (center of the curve) and the standard deviation (how spread out the data is).
The mean determines where the peak sits on the number line. The standard deviation controls width: a small standard deviation produces a tall, narrow bell, while a large one produces a flat, wide bell. A useful rule of thumb is that about 68% of values fall within one standard deviation of the mean, about 95% fall within two, and about 99.7% fall within three.
To create a normal distribution, you estimate the mean and standard deviation from your data (or specify them based on what you know about the situation), then use the normal density function to generate the curve. In practice, almost nobody computes this by hand. Software does it instantly.
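As a quick illustration of the two parameters and the 68-95-99.7 rule, Python's standard library ships a `statistics.NormalDist` class (the mean and standard deviation below are made-up exam-score values, not from the text):

```python
from statistics import NormalDist

# Hypothetical exam scores: mean 70, standard deviation 10
scores = NormalDist(mu=70, sigma=10)

# Height of the density curve at the mean -- the peak of the bell
peak = scores.pdf(70)

# Rule of thumb: probability of landing within one standard deviation
within_one = scores.cdf(80) - scores.cdf(60)
print(round(within_one, 3))  # ≈ 0.683
```

Subtracting two cumulative probabilities is how you get the probability of a range, which is the only kind of probability a continuous distribution assigns.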
Creating Distributions in Excel
Spreadsheet software makes this straightforward, especially for binomial and normal distributions.
For a binomial distribution, use the BINOM.DIST function. The syntax is BINOM.DIST(number_s, trials, probability_s, cumulative). The first argument is the number of successes you want the probability for, the second is total trials, the third is the probability of success on each trial, and the fourth is TRUE or FALSE. Set it to FALSE to get the probability of that exact number of successes. Set it to TRUE to get the probability of that many successes or fewer (the cumulative probability).
To build a full distribution table, create a column with every possible number of successes (0 through n), then use BINOM.DIST in the next column for each value. Highlight the table and insert a bar chart for a quick visual.
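If you'd rather script it than use a spreadsheet, here is a rough Python analogue of BINOM.DIST; the wrapper function and its name are my own, but the argument order mirrors Excel's:

```python
from math import comb

def binom_dist(number_s, trials, probability_s, cumulative):
    """Rough Python analogue of Excel's BINOM.DIST."""
    pmf = lambda k: comb(trials, k) * probability_s**k * (1 - probability_s)**(trials - k)
    if cumulative:
        # TRUE in Excel: probability of number_s successes or fewer
        return sum(pmf(k) for k in range(number_s + 1))
    # FALSE in Excel: probability of exactly number_s successes
    return pmf(number_s)

# =BINOM.DIST(2, 10, 0.5, FALSE): exactly 2 heads in 10 fair flips
print(round(binom_dist(2, 10, 0.5, False), 4))  # 45/1024 ≈ 0.0439
# =BINOM.DIST(2, 10, 0.5, TRUE): 2 heads or fewer
print(round(binom_dist(2, 10, 0.5, True), 4))   # 56/1024 ≈ 0.0547
```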
For a normal distribution, use NORM.DIST. It takes four arguments: the x value, the mean, the standard deviation, and TRUE/FALSE for cumulative. Setting the last argument to FALSE gives you the height of the density curve at that point, which is useful for plotting. Setting it to TRUE gives you the probability of being at or below that value, which is what you typically need for practical questions like “what’s the chance a measurement falls below 50?”
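The same mapping works for NORM.DIST. A sketch of the equivalent call in Python, including the "below 50" question from the text (the mean of 55 and standard deviation of 5 are assumed values for the example):

```python
from statistics import NormalDist

def norm_dist(x, mean, standard_dev, cumulative):
    """Rough Python analogue of Excel's NORM.DIST."""
    d = NormalDist(mu=mean, sigma=standard_dev)
    # TRUE: probability of being at or below x; FALSE: density height at x
    return d.cdf(x) if cumulative else d.pdf(x)

# =NORM.DIST(50, 55, 5, TRUE): chance a measurement falls at or below 50
print(round(norm_dist(50, 55, 5, True), 4))  # ≈ 0.1587
```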
Checking Whether Your Model Fits
After choosing a theoretical distribution, you should verify that it actually matches your data. The simplest approach is visual: plot your observed data as a histogram alongside the theoretical curve and see if the shapes align.
For a more rigorous check, statistical goodness-of-fit tests compare your data against a theoretical distribution and tell you whether the difference is small enough to be explained by random chance. The chi-square test is the most widely used; it works on discrete data directly and on continuous data once you group it into bins. The Kolmogorov-Smirnov test is another option for continuous distributions, though it’s more sensitive to differences near the center of the distribution than at the extremes, and it requires the distribution to be fully specified in advance (you can’t estimate parameters from the same data you’re testing). For checking whether data follows a normal distribution specifically, the Shapiro-Wilk test is a common choice.
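The chi-square statistic itself is simple to compute: sum (observed − expected)² / expected across the outcomes. This sketch compares hypothetical head-count data against a fair-coin binomial model (the observed counts are invented for illustration, and a real analysis would pool bins with very small expected counts before testing):

```python
from math import comb

# Hypothetical: head counts observed over 100 rounds of 10 coin flips,
# indexed by number of heads 0..10
observed = [0, 1, 5, 12, 20, 25, 19, 11, 5, 2, 0]
n = sum(observed)  # 100 rounds

# Expected counts under a fair-coin binomial model (10 trials, p = 0.5)
expected = [n * comb(10, k) * 0.5**10 for k in range(11)]

# Chi-square statistic: sum of (observed - expected)^2 / expected.
# In practice, pool outcomes whose expected count is small (commonly < 5)
# before computing this, or the test's approximation breaks down.
chi_sq = sum((o - e)**2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 2))
```

You would then compare the statistic against a critical value from a chi-square table for your degrees of freedom; a statistic below the critical value means the differences are consistent with random chance.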
A poor fit doesn’t mean you did something wrong. It means your data follows a different pattern than the one you assumed, and you should try another distribution type.
Practical Uses for Probability Distributions
Building a probability distribution isn’t just an academic exercise. In finance, analysts assign probability distributions to risk factors like stock returns or interest rate changes, then use those distributions to estimate expected losses, optimize portfolios, and develop strategies to hedge against worst-case scenarios. In manufacturing, distributions help quantify risks around supplier reliability, production delays, and defect rates, letting companies prioritize where to invest in quality control. In healthcare, distributions model patient outcomes, helping researchers design clinical trials with the right number of participants to detect a treatment effect.
The common thread is that a probability distribution turns vague uncertainty (“this could go badly”) into a specific, quantified picture of what’s likely, what’s possible, and what’s rare. That picture is what lets you make informed decisions rather than guesses.