What Is Uncertainty Quantification and How Does It Work?

Uncertainty quantification (UQ) is the science of measuring how much we don’t know in a prediction or model. Whenever an engineer simulates how a bridge handles wind, a climate scientist projects future temperatures, or an AI model diagnoses a medical image, the output carries some degree of doubt. UQ puts numbers on that doubt, turning vague confidence into something measurable and actionable.

Two Types of Uncertainty

Not all uncertainty is the same. UQ distinguishes between two fundamentally different kinds: aleatoric uncertainty and epistemic uncertainty. Understanding the difference matters because they require different strategies to manage.

Aleatoric uncertainty comes from the data itself. It’s the natural randomness built into a system. Think of rolling dice, measuring wind speed, or recording patient blood pressure readings. No matter how good your instruments or how many measurements you take, some variation is inherent. You can’t eliminate it. You can only characterize it.

Epistemic uncertainty comes from the model or the analyst. It reflects gaps in knowledge: not enough training data, a simplified equation, or an assumption that doesn’t perfectly match reality. Unlike aleatoric uncertainty, epistemic uncertainty can shrink. Collect more data, refine the model, or remove a flawed assumption, and the epistemic portion drops. This is why separating the two types is so valuable. It tells you whether gathering more information will actually help, or whether you’ve hit a floor set by the randomness of the system itself.

A well-designed UQ analysis keeps these two sources independent. Aleatoric uncertainty should look the same regardless of which modeling technique you use, because it belongs to the data. Epistemic uncertainty, by contrast, depends directly on how the model is built and which quantification method you choose.
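A small numerical sketch makes the distinction concrete. The scenario and every number below are hypothetical: repeated noisy readings of a fixed quantity, where the scatter of individual readings is the aleatoric part and the uncertainty in the estimated mean is the epistemic part.

```python
import numpy as np

# Hypothetical scenario: repeated noisy readings of a fixed quantity.
# The scatter of individual readings (aleatoric) never shrinks, while the
# uncertainty in the estimated mean (epistemic) drops as data accumulates.
rng = np.random.default_rng(0)
true_value, noise_sd = 10.0, 2.0  # made-up system parameters

for n in (10, 1_000, 100_000):
    readings = rng.normal(true_value, noise_sd, size=n)
    aleatoric = readings.std(ddof=1)      # ~noise_sd, the irreducible floor
    epistemic = aleatoric / np.sqrt(n)    # standard error of the estimated mean
    print(f"n={n:6d}  aleatoric ≈ {aleatoric:.2f}  epistemic ≈ {epistemic:.4f}")
```

However many readings you take, the aleatoric column hovers around the noise level of 2.0, while the epistemic column keeps falling, which is exactly the "floor set by the randomness of the system" described above.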

How Monte Carlo Simulation Works

The most widely used UQ technique is Monte Carlo simulation. The core idea is simple: instead of feeding a model one set of inputs and getting one output, you feed it thousands of slightly different inputs, each drawn randomly from realistic ranges, and watch how the outputs spread out. That spread is your uncertainty.

In practice, the steps look like this. First, define probability distributions for each input variable, such as a bell curve centered on your best estimate of material strength, with a width reflecting how much that value could realistically vary. Then use a random number generator to sample a value for each input. Run the model with those sampled inputs and record the result. Repeat this thousands or tens of thousands of times. The collection of outputs forms a histogram that shows not just the most likely answer, but the full range of plausible answers and how probable each one is.
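The recipe above fits in a few lines of code. The toy model here (bending stress as applied moment divided by section modulus) and all of its distribution parameters are made up for illustration:

```python
import numpy as np

# Toy Monte Carlo propagation. The model (bending stress = moment / section
# modulus) and every distribution parameter here are hypothetical.
rng = np.random.default_rng(42)
n = 10_000

# Step 1: a probability distribution for each uncertain input.
moment = rng.normal(50e3, 5e3, n)        # applied moment [N*m]
modulus = rng.normal(2.0e-4, 1.0e-5, n)  # section modulus [m^3]

# Steps 2-4: sample, run the model, record the result, repeat
# (vectorized here instead of an explicit loop).
stress = moment / modulus                # [Pa]

# Step 5: the spread of the outputs is the uncertainty.
lo, hi = np.percentile(stress, [2.5, 97.5]) / 1e6
print(f"mean stress : {stress.mean() / 1e6:.1f} MPa")
print(f"std dev     : {stress.std() / 1e6:.1f} MPa")
print(f"95% range   : {lo:.1f} to {hi:.1f} MPa")
```

The histogram of `stress` is the deliverable: not one answer, but the full range of plausible answers and their relative likelihoods.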

Monte Carlo methods are popular because they work with virtually any model, no matter how complex. You don’t need to rewrite equations or simplify the physics. The tradeoff is computational cost: if a single model run takes hours, running it 10,000 times can be impractical. That’s where surrogate models come in.

Surrogate Models for Speed

When the original model is too expensive to run thousands of times, researchers build a lightweight stand-in called a surrogate model. Two of the most common approaches are Polynomial Chaos Expansion (PCE) and Gaussian process models (sometimes called kriging).

PCE works by representing the model’s output as a series of polynomial terms that depend on the uncertain inputs. Once those polynomial coefficients are calculated from a relatively small number of full model runs, the surrogate can be evaluated almost instantly. PCE tends to be especially good at capturing the global behavior of a system, meaning the broad trends in how outputs respond to inputs across their full range.
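A minimal one-dimensional sketch of the idea, using exp(x) as a stand-in for an expensive model with a standard-normal input and NumPy's probabilists' Hermite basis (the degree and the 50-run training budget are illustrative choices, not recommendations):

```python
import numpy as np
from numpy.polynomial.hermite_e import hermevander, hermeval

# Minimal 1-D PCE sketch. exp(x) stands in for an expensive simulation with a
# standard-normal input; degree and sample count are illustrative choices.
rng = np.random.default_rng(1)
expensive_model = np.exp

degree = 6
x_train = rng.normal(size=50)            # a small number of full model runs
y_train = expensive_model(x_train)

# Fit the Hermite-basis coefficients by least squares.
design = hermevander(x_train, degree)    # shape (50, degree + 1)
coeffs, *_ = np.linalg.lstsq(design, y_train, rcond=None)

# The surrogate is now almost free to evaluate anywhere in the input range.
x_test = np.linspace(-2.0, 2.0, 200)
max_err = np.max(np.abs(hermeval(x_test, coeffs) - expensive_model(x_test)))
print(f"max surrogate error on [-2, 2]: {max_err:.4f}")
```

Fifty runs of the "expensive" model buy a polynomial that tracks it closely across the bulk of the input distribution, which is the global-behavior strength mentioned above.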

Gaussian process models take a different approach, fitting a flexible statistical surface through the known model outputs and providing a built-in estimate of how confident the surrogate is at any given point. They excel at capturing local features, fine details in how the output changes in specific regions of the input space. Researchers have even combined the two into hybrid methods that leverage global accuracy from PCE and local precision from Gaussian processes.
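A bare-bones Gaussian process surrogate can be written in plain NumPy; a real study would use a dedicated GP library with tuned hyperparameters, so treat the kernel choice and fixed length scale below as illustrative:

```python
import numpy as np

# Bare-bones Gaussian process surrogate in plain NumPy. The RBF kernel and
# fixed length scale are hand-picked for this toy problem; a real study
# would use a dedicated GP library and tuned hyperparameters.
def rbf(a, b, length=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)

expensive_model = np.sin                  # stand-in for a costly simulation
x_train = np.linspace(0.0, 2.0 * np.pi, 8)
y_train = expensive_model(x_train)
x_test = np.linspace(0.0, 2.0 * np.pi, 100)

K = rbf(x_train, x_train) + 1e-8 * np.eye(len(x_train))  # jitter for stability
K_star = rbf(x_test, x_train)

# Posterior mean and variance; the variance is the surrogate's own
# built-in confidence estimate at each test point.
mean = K_star @ np.linalg.solve(K, y_train)
var = 1.0 - np.einsum("ij,ji->i", K_star, np.linalg.solve(K, K_star.T))
max_err = np.max(np.abs(mean - expensive_model(x_test)))

print(f"max |mean - sin(x)| on test grid : {max_err:.4f}")
print(f"variance at a training point     : {var[0]:.2e} (near zero)")
```

Note the second output: at points where the full model has actually been run, the surrogate reports almost no uncertainty about itself, and that self-assessment grows between and beyond the training runs.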

Sensitivity Analysis and Sobol Indices


Once you’ve quantified the total uncertainty in a model’s output, the next question is: which inputs contribute the most? Sensitivity analysis answers this by ranking how much each input variable drives the output uncertainty.

The most rigorous version is global sensitivity analysis using Sobol indices. Sobol's method decomposes the total variance of the output into contributions from each input and, if desired, from interactions between pairs or groups of inputs. A “main effect” index tells you how much of the output variance is caused by a single input acting alone. A “total effect” index adds in all the interactions that input participates in. If one input has a Sobol total index of 0.6, it's responsible for 60% of the output variance, either directly or through its interactions with other inputs.
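One simple way to estimate main-effect indices is the "pick-freeze" trick: compare model outputs from two input samples that share one frozen input. The toy model below is chosen so the exact answers are known in advance (S1 = 4/5, S2 = 1/5); dedicated packages such as SALib implement more efficient estimators.

```python
import numpy as np

# Pick-freeze estimate of first-order Sobol indices (illustrative sketch).
# Toy model Y = 2*X1 + X2 with independent standard-normal inputs, so the
# exact answers are S1 = 4/5 and S2 = 1/5.
def model(x1, x2):
    return 2.0 * x1 + x2

rng = np.random.default_rng(7)
n = 200_000
a = rng.normal(size=(n, 2))    # first independent input sample
b = rng.normal(size=(n, 2))    # second independent input sample

y_a = model(a[:, 0], a[:, 1])
var_y = y_a.var()

indices = []
for i, name in enumerate(("X1", "X2")):
    mixed = b.copy()
    mixed[:, i] = a[:, i]      # "freeze" input i, resample everything else
    y_mixed = model(mixed[:, 0], mixed[:, 1])
    s_i = np.cov(y_a, y_mixed)[0, 1] / var_y
    indices.append(s_i)
    print(f"S_{name} ≈ {s_i:.3f}")
```

The estimates land near 0.8 and 0.2: the first input carries four times the variance of the second, so it is where better data would pay off.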

This is enormously practical. If one input dominates the uncertainty, you know exactly where to invest resources: get better data for that variable, or redesign the system to be less sensitive to it. Inputs with near-zero Sobol indices can often be fixed at their best-guess values without meaningfully affecting the results, simplifying the model.

UQ in Climate Projections

Climate science is one of the fields where UQ has the most visible impact. Global climate models project future temperature and precipitation, but those projections carry substantial uncertainty from multiple sources. Under a high-emissions scenario, the uncertainty in projected global annual mean temperature by 2100 is roughly ±3.8°C (one standard deviation), and for precipitation it’s about ±244 mm.

Where does that uncertainty come from? Early in a projection (say, a few decades out), nearly all of it, around 99%, comes from model uncertainty: different climate models produce different answers. By 2100, model uncertainty's share drops to about 39%, while scenario uncertainty (which emissions path humanity actually follows) grows to account for 61%. This decomposition echoes the aleatoric/epistemic distinction and has real policy implications. It tells decision-makers that reducing disagreement between climate models matters most for near-term planning, while long-term projections hinge on emissions choices.

Researchers also use bias correction techniques to narrow these ranges. Methods like quantile mapping and spatial disaggregation adjust raw model outputs against observed data, and the best-performing methods significantly reduce the spread in projections.

UQ in Engineering Design

Engineers have traditionally handled uncertainty with safety factors: design a beam to hold twice the expected load, and hope that covers the unknowns. UQ offers a more precise alternative called reliability-based design. Instead of a blanket multiplier, reliability-based methods calculate the actual probability that a structure will fail under realistic ranges of loading, material properties, and construction tolerances.

This approach produces structures that are no heavier than those designed with safety factors, while meeting the same reliability requirements. In many cases, they’re lighter, because the safety factor method is a blunt instrument. It applies the same margin everywhere, over-protecting against well-understood variables while potentially under-protecting against poorly understood ones. Reliability-based design allocates margins where the uncertainty actually lives.
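The failure probability at the heart of this approach is typically estimated with the same Monte Carlo machinery described earlier. The load and strength distributions below are entirely hypothetical:

```python
import numpy as np

# Hypothetical reliability calculation: instead of applying a fixed safety
# factor, estimate the probability that load exceeds capacity directly.
rng = np.random.default_rng(3)
n = 1_000_000

capacity = rng.normal(100.0, 8.0, n)  # member strength [kN], illustrative
load = rng.normal(60.0, 12.0, n)      # applied load [kN], illustrative

p_fail = np.mean(load > capacity)
print(f"estimated failure probability: {p_fail:.2e}")
```

Here the ratio of mean strength to mean load is about 1.67, which looks like a comfortable safety factor, but the quantity that actually drives the design decision is the failure probability, and it responds to the spread of both distributions, not just their means.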

UQ in Machine Learning and AI

Standard deep learning models produce a prediction but no honest measure of how confident that prediction is. This is a serious limitation in safety-critical applications. In medical diagnosis, for example, an AI system that flags a tumor should also report how certain it is. When certainty is low, the patient can be referred to a human physician for further evaluation.

Bayesian neural networks address this by treating the model's internal parameters not as fixed numbers, but as probability distributions. Instead of learning one set of weights, the network learns a range of plausible weights, and predictions reflect that range. The result is a prediction with a built-in confidence interval. Bayesian approaches have been shown in a number of studies to improve not just uncertainty estimates but also raw prediction accuracy compared to standard networks.
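The idea is easiest to see in a model small enough to solve exactly. The one-weight Bayesian regression below is a stand-in for a full Bayesian neural network, with made-up data and priors; it shows a weight becoming a distribution and a prediction inheriting an uncertainty band:

```python
import numpy as np

# One-weight Bayesian regression as a stand-in for a Bayesian neural network
# (which requires approximations; this tiny model has an exact posterior).
# All data and prior choices here are made up for illustration.
rng = np.random.default_rng(5)
x = rng.uniform(-1.0, 1.0, 30)
y = 1.5 * x + rng.normal(0.0, 0.2, size=30)   # true weight 1.5, noise sd 0.2

# Conjugate posterior for w in y = w*x + noise, with prior w ~ N(0, 1)
# and known noise variance.
noise_var, prior_var = 0.2**2, 1.0
post_var = 1.0 / (1.0 / prior_var + (x @ x) / noise_var)
post_mean = post_var * (x @ y) / noise_var

# A prediction carries both epistemic (weight) and aleatoric (noise) parts.
x_new = 0.8
pred_sd = np.sqrt(x_new**2 * post_var + noise_var)
print(f"posterior weight  : {post_mean:.2f} ± {np.sqrt(post_var):.3f}")
print(f"prediction at 0.8 : {post_mean * x_new:.2f} ± {2 * pred_sd:.2f} (~95%)")
```

With more data, the weight's posterior spread would shrink toward zero while the noise term would not, so the prediction band bottoms out at the aleatoric floor, the same split introduced at the start of this article.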

Computing these distributions exactly is mathematically intractable for large networks, so practitioners rely on approximation techniques. Variational inference fits a simpler distribution to approximate the true one. Markov chain Monte Carlo methods sample from the distribution directly, with the Metropolis-Hastings algorithm among the most widely used variants. More recent architectures go further, estimating the full shape of the predictive distribution rather than just a mean and variance, which matters when outcomes aren't symmetrically distributed.
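A toy Metropolis-Hastings sampler shows the sampling idea at its simplest. The target here is a standard normal density rather than a real posterior over network weights, and the proposal scale is an arbitrary choice:

```python
import numpy as np

# Toy random-walk Metropolis-Hastings sampler. The target is a standard
# normal density; real BNN inference would target a posterior over weights.
rng = np.random.default_rng(11)

def log_target(w):
    return -0.5 * w**2        # log N(0, 1) up to an additive constant

samples, w = [], 0.0
for _ in range(50_000):
    proposal = w + rng.normal(0.0, 1.0)
    # Accept with probability min(1, target(proposal) / target(w)).
    if np.log(rng.uniform()) < log_target(proposal) - log_target(w):
        w = proposal          # otherwise keep the current state
    samples.append(w)

samples = np.array(samples[5_000:])   # discard burn-in
print(f"sample mean ≈ {samples.mean():.3f}, sample sd ≈ {samples.std():.3f}")
```

The retained samples reproduce the target's mean and standard deviation; in a real application, those samples would be plausible weight settings, and running the network under each one yields the spread of predictions.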

Software Tools

The Uncertainty Toolbox, developed with support from the U.S. Department of Energy, is among the most popular open-source repositories on GitHub for predictive uncertainty quantification and calibration. It provides tools for evaluating how well a model's uncertainty estimates match reality, visualizing confidence intervals, and recalibrating predictions when they're over- or under-confident. It's used across fusion energy research, climate modeling, and machine learning development. Beyond this, UQ capabilities are built into broader scientific computing ecosystems, with libraries in Python and MATLAB covering Monte Carlo sampling, polynomial chaos, Gaussian processes, and Sobol analysis as standard components.