How to Run a Monte Carlo Simulation: 5 Steps

A Monte Carlo simulation works by running a model thousands of times with randomly generated inputs, then analyzing the spread of results to understand risk and uncertainty. Instead of plugging in a single “best guess” for each variable, you define a range of possibilities for each one, let the computer sample from those ranges repeatedly, and end up with a distribution of outcomes rather than a single number. Here’s how to build and run one from scratch.

The Core Idea Behind Monte Carlo

Every Monte Carlo simulation follows the same basic logic. You model a system by describing each uncertain variable as a probability distribution: a function that defines what values are possible and how likely each one is. Then you repeatedly sample random values from those distributions, calculate the result each time, and tally the outcomes. After enough repetitions, the collection of results tells you not just what’s likely to happen, but how likely each outcome is.

Think of it like rolling dice thousands of times and recording every result. No single roll tells you much, but after 10,000 rolls you have a reliable picture of the odds. The same principle applies whether you’re modeling investment returns, project timelines, engineering tolerances, or drug trial outcomes.
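The dice analogy is easy to check directly. A minimal sketch in Python (NumPy assumed; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Roll a fair six-sided die 10,000 times and record every result.
rolls = rng.integers(1, 7, size=10_000)

# After enough rolls, each face's observed frequency approaches 1/6 ≈ 0.167.
for face in range(1, 7):
    freq = np.mean(rolls == face)
    print(f"face {face}: {freq:.4f}")
```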

Five Steps to Run a Simulation

1. Define the Problem and Your Goal

Start by identifying what you’re trying to estimate. This might be the probability of a project finishing late, the expected profit from a product launch, or the chance that a portfolio loses more than 10% in a year. Your goal determines what you model and what output you care about.

2. Identify Your Uncertain Variables

List every input that could vary. For a product launch, that might be unit demand, raw material cost, and selling price. For each variable, choose a probability distribution that reflects reality. A variable with equal chance of falling anywhere in a range gets a uniform distribution. One that clusters around an average with some spread gets a normal (bell curve) distribution. Sales data that skews right might use a lognormal distribution. The distribution you pick matters: it encodes your assumptions about what’s plausible.
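The three distribution choices above can be written down with scipy.stats; the parameter values here are illustrative, not recommendations:

```python
from scipy import stats

# Uniform: equally likely anywhere between 40 and 60 (loc = min, scale = width).
price = stats.uniform(loc=40, scale=20)

# Normal: clusters around a mean of 1000 with spread (std dev) of 150.
demand = stats.norm(loc=1000, scale=150)

# Lognormal: right-skewed and never below zero. s is the shape parameter
# (sigma of the underlying normal); scale = exp(mu).
cost = stats.lognorm(s=0.5, scale=10)

# Each frozen distribution exposes the same interface: .rvs() to sample,
# plus .pdf(), .cdf(), .mean(), and so on.
samples = demand.rvs(size=5, random_state=0)
print(samples)
```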

3. Build the Model

Write the formula or set of formulas that connects your inputs to your output. If you’re modeling profit, that might be: Profit = (Price × Demand) − (Fixed Costs + Variable Cost × Demand). The model can be simple arithmetic or a complex chain of calculations. What matters is that it accepts your uncertain variables as inputs and produces the quantity you care about as output.
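In vectorized code, that profit formula applies element-wise to whole arrays of sampled inputs at once. A sketch with made-up distributions and parameters:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 10_000

# Sample the uncertain inputs (illustrative distributions and parameters).
price = rng.uniform(80, 120, size=n)          # selling price per unit
demand = rng.normal(1000, 150, size=n)        # units sold
variable_cost = rng.uniform(25, 35, size=n)   # cost per unit
fixed_costs = 20_000                          # known, so held constant

# Profit = (Price × Demand) − (Fixed Costs + Variable Cost × Demand),
# evaluated for all 10,000 scenarios in one vectorized expression.
profit = price * demand - (fixed_costs + variable_cost * demand)

print(f"mean profit: {profit.mean():,.0f}")
```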

4. Run the Simulation

For each iteration, the computer draws a random value for every uncertain variable from its assigned distribution, plugs those values into your model, and records the result. Repeat this thousands of times. Each iteration is one possible version of reality. One thousand iterations is a bare minimum for rough estimates; 10,000 to 100,000 is more common for reliable results. The error in your estimate shrinks in proportion to the square root of the number of iterations, so going from 1,000 to 10,000 runs shrinks your error by a factor of roughly three.

5. Analyze the Output Distribution

After all iterations complete, you have a dataset of outcomes. Plot a histogram to see the shape of the distribution: where results cluster, how wide the spread is, and whether the tails are fat or thin. Calculate summary statistics like the mean, median, standard deviation, and specific percentiles. If you want to know the chance that profit exceeds $50,000, count how many iterations produced a result above that threshold and divide by the total number of iterations.
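Steps 4 and 5 together take only a few lines in Python. The model and the $50,000 threshold below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n = 10_000

# Step 4: sample the inputs and run the model n times (illustrative numbers).
price = rng.uniform(80, 120, size=n)
demand = rng.normal(1000, 150, size=n)
profit = price * demand - (20_000 + 30 * demand)  # fixed cost 20k, unit cost 30

# Step 5: summarize the distribution of outcomes.
print(f"mean:   {profit.mean():,.0f}")
print(f"median: {np.median(profit):,.0f}")
print(f"std:    {profit.std():,.0f}")
print(f"5th / 95th pct: {np.percentile(profit, 5):,.0f} / {np.percentile(profit, 95):,.0f}")

# Chance that profit exceeds $50,000: count exceedances, divide by total.
p_above = np.mean(profit > 50_000)
print(f"P(profit > $50k): {p_above:.3f}")
```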

How to Read Your Results

The histogram of your output is the most intuitive tool. It shows you the full range of possible outcomes and how frequently each one occurred across your iterations. A tight, tall peak means your outcome is fairly predictable. A wide, flat spread means high uncertainty.

For risk analysis, percentiles are especially useful. The 5th percentile tells you the outcome that’s worse than 95% of all scenarios, giving you a “bad case” benchmark. In finance, this concept is formalized as Value at Risk (VaR): the 99% VaR is the loss level that would only be exceeded 1% of the time. You calculate it by sorting all your simulated outcomes from smallest to largest and picking the value at the relevant position. Expected Shortfall goes a step further, telling you the average loss in those worst-case scenarios, which helps you understand how bad things get when they do go wrong.
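As a sketch, here is VaR and Expected Shortfall computed on a simulated loss distribution. The normal loss model and its parameters are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Simulated daily losses in dollars (positive = loss); illustrative model.
losses = rng.normal(loc=0, scale=10_000, size=100_000)

# 99% VaR: the loss level exceeded in only 1% of scenarios, i.e. the
# 99th percentile of the sorted loss outcomes.
var_99 = np.percentile(losses, 99)

# Expected Shortfall: the average loss within that worst 1% of scenarios.
es_99 = losses[losses > var_99].mean()

print(f"99% VaR: {var_99:,.0f}")
print(f"99% ES:  {es_99:,.0f}")
```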

A cumulative distribution function (CDF) is another useful view. It plots, for any given value on the x-axis, the probability that your outcome falls at or below that value. This lets you quickly answer questions like “What’s the probability my project costs less than $200,000?” by reading the y-axis value at $200,000.
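An empirical CDF is just the sorted outcomes plotted against their rank, and the $200,000 question can be read straight off the data. The lognormal cost model below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Illustrative simulated project costs: lognormal, median around $180k.
costs = 180_000 * rng.lognormal(mean=0.0, sigma=0.3, size=10_000)

# Empirical CDF: after sorting, the i-th smallest outcome has CDF value i/n.
sorted_costs = np.sort(costs)
cdf = np.arange(1, len(sorted_costs) + 1) / len(sorted_costs)

# "What's the probability my project costs less than $200,000?"
p_under_200k = np.mean(costs <= 200_000)
print(f"P(cost <= $200k): {p_under_200k:.3f}")
```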

Running a Simulation in Excel

You don’t need programming skills to run a Monte Carlo simulation. Excel’s built-in functions handle it well for straightforward models. The key function is RAND(), which generates a random number between 0 and 1 each time the spreadsheet recalculates. You can transform this into any distribution you need. For a normal distribution, wrap it in NORM.INV(RAND(), mean, standard_deviation). For a uniform distribution between two bounds, use lower + RAND() × (upper − lower).
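The NORM.INV(RAND(), …) pattern is inverse-transform sampling: a uniform 0-to-1 draw pushed through a distribution's inverse CDF. For comparison, the same trick in Python (NumPy and SciPy assumed; parameters illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=5)
u = rng.random(size=10_000)  # like RAND(): uniform between 0 and 1

# Equivalent of NORM.INV(RAND(), 100, 15): the normal inverse CDF (ppf).
normal_draws = stats.norm.ppf(u, loc=100, scale=15)

# Equivalent of lower + RAND() * (upper - lower): uniform between two bounds.
uniform_draws = 40 + u * (60 - 40)

print(normal_draws.mean(), uniform_draws.mean())
```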

The trick to running thousands of iterations efficiently is Excel’s Data Table feature. Set up your model so that a single cell contains your output formula. In a column, list trial numbers from 1 to 1,000. In the cell one row above and one column to the right of the first trial number, enter a reference to your output cell. Then select the range spanning both columns, go to Data > What-If Analysis > Data Table, and point the column input cell to any blank cell. When you click OK, Excel recalculates your model 1,000 times, each with fresh random values, and fills the table with results. You can then use the AVERAGE, STDEV.S, and PERCENTILE.INC functions on that column to get your summary statistics.

Press F9 to regenerate the entire simulation with new random draws. Each time, your results will shift slightly. If they shift a lot between regenerations, you need more iterations.

Running a Simulation in Python

Python gives you more power and flexibility, especially for large simulations or complex models. The two essential libraries are NumPy for generating random samples and SciPy for working with probability distributions and statistical analysis.

A basic simulation in Python follows this pattern:

  • Define distributions using scipy.stats, which offers dozens of continuous and discrete probability distributions. For example, scipy.stats.norm(loc=100, scale=15) creates a normal distribution with a mean of 100 and standard deviation of 15.
  • Sample from them using the .rvs() method or NumPy’s random module. Generating 10,000 samples is as simple as calling .rvs(size=10000).
  • Calculate your output by applying your model formula to the sampled arrays. NumPy handles element-wise math on arrays, so this runs in a single line of code even for 100,000 iterations.
  • Analyze results using NumPy’s percentile, mean, and std functions. For confidence intervals, scipy.stats.bootstrap resamples your results to give you a range around any statistic.
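Strung together, the four steps above look like this. The distributions reuse the norm(loc=100, scale=15) example from the first bullet; the price range is illustrative:

```python
import numpy as np
from scipy import stats

# 1. Define distributions.
demand = stats.norm(loc=100, scale=15)
price = stats.uniform(loc=8, scale=4)  # uniform between 8 and 12

# 2. Sample from them.
n = 10_000
d = demand.rvs(size=n, random_state=0)
p = price.rvs(size=n, random_state=1)

# 3. Calculate the output: element-wise math over whole arrays at once.
revenue = p * d

# 4. Analyze results.
print(f"mean:  {revenue.mean():.1f}")
print(f"std:   {revenue.std():.1f}")
print(f"5th / 95th pct: {np.percentile(revenue, 5):.1f} / {np.percentile(revenue, 95):.1f}")
```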

For quasi-Monte Carlo sampling, which fills the input space more evenly than purely random draws and converges faster, SciPy’s scipy.stats.qmc module provides low-discrepancy sequences like Sobol and Halton. These can give you the same accuracy with fewer iterations.
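A minimal Sobol sketch: qmc generates evenly spread points in the unit hypercube, which you then map through each input distribution's inverse CDF. The model and parameters are illustrative:

```python
import numpy as np
from scipy import stats
from scipy.stats import qmc

# Scrambled Sobol points fill the unit square more evenly than random draws.
sampler = qmc.Sobol(d=2, scramble=True, seed=0)
u = sampler.random_base2(m=13)  # 2^13 = 8192 points, shape (8192, 2)

# Map each column through an inverse CDF to get the input distributions.
demand = stats.norm.ppf(u[:, 0], loc=1000, scale=150)
price = stats.uniform.ppf(u[:, 1], loc=40, scale=20)

revenue = price * demand
print(f"mean revenue: {revenue.mean():,.0f}")
```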

How Many Iterations You Need

The accuracy of a Monte Carlo estimate improves at a rate proportional to 1/√N, where N is the number of iterations. This means quadrupling your iterations only cuts the error in half. Going from 1,000 to 10,000 iterations meaningfully tightens your estimates, but going from 100,000 to 1,000,000 yields diminishing returns for most practical purposes.

A good rule of thumb: run your simulation, then double the number of iterations and check whether your key statistics (mean, standard deviation, 5th percentile) change meaningfully. If doubling iterations shifts your mean by less than 1%, you have enough. For most business and engineering applications, 10,000 iterations produce stable results. For estimating rare events in the tails of a distribution, like the 99th percentile, you need more, often 50,000 to 100,000, because fewer data points fall in those extreme regions.
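That doubling check is easy to automate. A sketch on a toy normal model, using the 1% tolerance from the rule of thumb above:

```python
import numpy as np

rng = np.random.default_rng(seed=11)

def simulate(n):
    """One simulation run of a toy model: the mean of n sampled outcomes."""
    outcomes = rng.normal(loc=100, scale=15, size=n)
    return outcomes.mean()

# Double the iterations until the key statistic shifts by less than 1%.
n = 1_000
prev = simulate(n)
while True:
    n *= 2
    curr = simulate(n)
    if abs(curr - prev) / abs(prev) < 0.01:
        break
    prev = curr

print(f"stable at n = {n:,}, mean ≈ {curr:.1f}")
```

In practice you would apply the same check to every statistic you care about, not just the mean, since tail percentiles stabilize more slowly.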

Common Pitfalls That Skew Results

The most frequent mistake is choosing the wrong distribution for an input variable. Assuming a normal distribution when real-world data is skewed or has fat tails will understate extreme outcomes. If you have historical data, plot it before choosing a distribution. If you’re working from expert judgment, triangular distributions (defined by a minimum, most likely, and maximum value) are a practical starting point.
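A triangular distribution needs only three numbers from an expert. A sketch with NumPy (the week estimates are illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=4)

# Expert judgment: minimum 8 weeks, most likely 12, maximum 20.
duration = rng.triangular(left=8, mode=12, right=20, size=10_000)

# The mean of a triangular distribution is (min + mode + max) / 3.
print(f"simulated mean: {duration.mean():.2f}  (theory: {(8 + 12 + 20) / 3:.2f})")
```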

Ignoring correlations between variables is another major error. If your model includes both interest rates and bond prices, those are not independent: when one rises, the other falls. Treating correlated variables as independent produces output distributions that are too narrow and underestimate risk. You can handle this by defining a correlation matrix for your inputs and using techniques like the Cholesky decomposition to generate correlated random samples. Autoregressive models, which build in a relationship between each value and the previous one, help when simulating variables like bond yields that exhibit momentum over time.
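One way to generate correlated normal inputs is to Cholesky-decompose the correlation matrix and mix independent draws through it. The −0.6 correlation below is illustrative:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n = 100_000

# Desired correlation between interest-rate and bond-price shocks.
corr = np.array([[1.0, -0.6],
                 [-0.6, 1.0]])

# Cholesky factor L satisfies L @ L.T == corr.
L = np.linalg.cholesky(corr)

# Independent standard normals, mixed to induce the target correlation.
z = rng.standard_normal(size=(n, 2))
correlated = z @ L.T

# Check: the sample correlation should sit close to -0.6.
print(np.corrcoef(correlated[:, 0], correlated[:, 1])[0, 1])
```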

Finally, the quality of your inputs determines the quality of your outputs. A beautifully coded simulation with 100,000 iterations still produces misleading results if your assumed means, standard deviations, or distributions don’t reflect reality. Spend more time validating your assumptions than optimizing your code.