A moment generating function (MGF) is a formula that encodes all the “moments” of a random variable (its mean, its variance, and every higher moment) into a single expression. If you know the MGF, you can extract any moment you need by taking derivatives, and you can even identify the entire probability distribution from it. It’s one of the most powerful tools in probability theory, used heavily in statistics, actuarial science, and anywhere you need to work with sums of random variables.
The Basic Idea Behind Moments
Before diving into the function itself, it helps to understand what “moments” are. The first moment of a random variable is its mean (expected value). The second moment relates to how spread out the values are, which connects to variance. Higher moments capture things like skewness (lopsidedness) and kurtosis (how heavy the tails are). Together, the full set of moments paints a complete picture of how a random variable behaves.
Calculating these moments one at a time can be tedious. The moment generating function packages all of them into a single expression, letting you pull out whichever moment you need through differentiation.
The Definition
The MGF of a random variable X is defined as the expected value of e raised to the power tX:
M(t) = E[e^(tX)]
Here, t is just a helper variable. You’re not plugging in a specific number for t right away. Instead, you treat t as a free parameter, compute the expectation, and get a function of t as your result. This function is the MGF.
For the MGF to exist, this expectation has to be finite in some interval around t = 0. Specifically, there must be some small value h where E[e^(tX)] is finite for all t between -h and h. If that condition isn’t met, the MGF doesn’t exist for that distribution. Distributions with very heavy tails (where extreme values are relatively likely) often fail this condition, which is one limitation of MGFs.
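To make the definition concrete, here is a minimal sketch in Python that computes the MGF of a fair six-sided die straight from the definition (the die is just an illustrative choice; any variable with known outcome probabilities works the same way):

```python
import math

# MGF of a fair six-sided die, computed directly from the definition:
# M(t) = E[e^(tX)] = sum over outcomes k of P(X = k) * e^(t*k)
def die_mgf(t):
    return sum(math.exp(t * k) / 6 for k in range(1, 7))

print(die_mgf(0.0))   # every MGF equals 1 at t = 0, since E[e^0] = E[1] = 1
print(die_mgf(0.1))   # for other t, the result is a value of a function of t
```

Note that evaluating at t = 0 always returns 1, which is a handy sanity check on any MGF you derive.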
How to Extract Moments From It
The key property that makes the MGF useful: the nth moment of X equals the nth derivative of the MGF, evaluated at t = 0.
E[X^n] = M^(n)(0)
So to find the mean, you take the first derivative of the MGF and plug in t = 0. To find E[X²] (which you need for variance), take the second derivative and plug in t = 0. The third derivative at zero gives you E[X³], and so on. This is where the name comes from: the function literally “generates” moments when you differentiate it.
For example, once you have the first and second moments, you can compute the variance using the familiar relationship: Var(X) = E[X²] – (E[X])². Instead of setting up separate integrals or summations for each moment, you do one computation to find the MGF, then differentiate as many times as you need.
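As a quick sanity check on this recipe, the sketch below approximates the first two derivatives of an MGF at t = 0 with finite differences and recovers the mean and variance. It assumes a Bernoulli variable with p = 0.3 (an arbitrary illustrative choice), whose MGF is (1 − p) + p·e^t:

```python
import math

def mgf(t, p=0.3):
    # MGF of a Bernoulli(p) variable: M(t) = (1 - p) + p * e^t
    return (1 - p) + p * math.exp(t)

h = 1e-4
m1 = (mgf(h) - mgf(-h)) / (2 * h)            # M'(0)  ~ E[X]
m2 = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h**2  # M''(0) ~ E[X^2]
var = m2 - m1**2                             # Var(X) = E[X^2] - (E[X])^2

print(m1, var)  # ~ 0.3 and 0.21, matching E[X] = p and Var(X) = p(1 - p)
```

In practice you would differentiate the MGF symbolically, but the numerical version shows that the moments really are sitting inside the function, waiting to be extracted at t = 0.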
The Uniqueness Property
One of the most important results in probability theory is that the MGF uniquely determines a distribution. If two random variables have MGFs that are equal in some neighborhood around t = 0, those two random variables have the same distribution. Period.
This matters because it gives you a strategy for identifying distributions. If you’re working with a complicated random variable and you manage to derive its MGF, you can compare it against known MGFs. If it matches the MGF of, say, a normal distribution, then your variable is normally distributed, regardless of how it was constructed. You don’t need to work out the full probability density function.
Why Sums of Random Variables Get Easier
The property that makes MGFs especially practical is how they handle sums of independent random variables. If X and Y are independent, the MGF of their sum is simply the product of their individual MGFs:
M_(X+Y)(t) = M_X(t) · M_Y(t)
This extends to any number of independent variables: for X₁, X₂, …, Xₙ, you just multiply all their MGFs together. This is far simpler than the alternative, which involves computing a convolution integral to find the distribution of the sum directly.
This property shows up constantly in practice. If you’re adding up many independent measurements, modeling total insurance claims, or analyzing the sum of random signals, the MGF turns a difficult problem into multiplication. And once you have the MGF of the sum, you can use the uniqueness property to identify what distribution the sum follows.
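One way to see the product rule in action is a small Monte Carlo check. The sketch below (the rates 2 and 5, the value of t, the sample size, and the seed are all arbitrary illustrative choices) estimates E[e^(t(X+Y))] for two independent exponential variables and compares it with the product of the two individual MGF estimates:

```python
import math
import random

rng = random.Random(42)
n = 200_000
t = 0.5  # must stay below both rates for these exponential MGFs to exist

# independent samples: X ~ Exponential(rate 2), Y ~ Exponential(rate 5)
xs = [rng.expovariate(2.0) for _ in range(n)]
ys = [rng.expovariate(5.0) for _ in range(n)]

# MGF of the sum, estimated directly
mgf_sum = sum(math.exp(t * (x + y)) for x, y in zip(xs, ys)) / n
# product of the two individual MGF estimates
product = (sum(math.exp(t * x) for x in xs) / n) * \
          (sum(math.exp(t * y) for y in ys) / n)

print(mgf_sum, product)  # both close to (2/1.5) * (5/4.5), about 1.48
```

The two estimates agree (up to sampling noise) with each other and with the closed-form product λ₁/(λ₁ − t) · λ₂/(λ₂ − t), exactly as the multiplication rule predicts.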
MGFs for Common Distributions
Each standard probability distribution has its own MGF. Here are three of the most commonly encountered ones:
- Normal distribution with mean μ and variance σ²: the MGF is exp(μt + σ²t²/2). The quadratic term in the exponent is a signature of the normal distribution.
- Poisson distribution with rate λ: the MGF is exp(λ(e^t – 1)). You can verify this gives a mean of λ by differentiating once and evaluating at t = 0.
- Exponential distribution with rate λ: the MGF is λ/(λ – t), defined only for t less than λ. This restriction on t is a concrete example of how MGFs don’t always exist for all values of t.
These formulas are worth recognizing because of the uniqueness property. If you derive an MGF and it matches one of these forms, you’ve identified the distribution.
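A quick numerical check of the listed formulas: differentiating each closed form at t = 0 should return the distribution’s mean (μ for the normal, λ for the Poisson, 1/λ for the exponential). The parameter values below are illustrative choices:

```python
import math

# closed-form MGFs from the list above, with assumed example parameters
mgfs = {
    "normal, mu=1, sigma^2=4": (lambda t: math.exp(t + 2 * t * t), 1.0),
    "poisson, lam=3":          (lambda t: math.exp(3 * (math.exp(t) - 1)), 3.0),
    "exponential, lam=2":      (lambda t: 2 / (2 - t), 0.5),  # valid only for t < 2
}

h = 1e-4
slopes = {}
for name, (M, mean) in mgfs.items():
    slopes[name] = (M(h) - M(-h)) / (2 * h)  # numerical M'(0): should equal the mean
    print(name, "->", round(slopes[name], 4), "expected", mean)
```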
When the MGF Doesn’t Exist
Not every distribution has a moment generating function. The issue is that e^(tX) can blow up when X takes very large values and t is positive. Distributions with “fat tails,” where extreme outcomes have relatively high probability, often cause E[e^(tX)] to be infinite for any t other than zero. The Cauchy distribution is a classic example.
When this happens, there’s an alternative called the characteristic function, which replaces t with it (where i is the imaginary unit from complex numbers). The characteristic function is defined as E[e^(itX)], and it always exists for every distribution and every value of t. The tradeoff is that it involves complex numbers, which makes computation less intuitive. But it preserves the same uniqueness property and the same multiplication rule for sums of independent variables.
There’s also the probability generating function, which is designed specifically for discrete random variables taking non-negative integer values. It’s defined as E[t^X] and is commonly used in contexts like counting processes. All three functions (MGF, characteristic function, probability generating function) serve similar roles: encoding distributional information in a way that simplifies calculations. The MGF is the most accessible of the three when it exists.
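As a small illustration of the probability generating function, the sketch below represents the PGF of a fair die as a list of polynomial coefficients, where index k holds P(X = k). Multiplying two PGFs is then just polynomial multiplication, which directly yields the distribution of the sum of two dice (the die is an illustrative choice):

```python
# PGF of a fair six-sided die as polynomial coefficients: index k holds P(X = k)
die = [0.0] + [1 / 6] * 6   # P(X = 0) = 0, P(X = 1..6) = 1/6 each

def pgf_product(p, q):
    # multiplying two PGFs means convolving their coefficient lists,
    # which is exactly the distribution of the sum of independent variables
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

two_dice = pgf_product(die, die)
print(two_dice[7])  # P(two dice sum to 7) = 6/36
```

The same multiplication rule that MGFs enjoy for independent sums shows up here as ordinary polynomial arithmetic, which is part of why PGFs are so convenient for counting problems.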
A Simple Walkthrough
Suppose you have two independent Poisson random variables, X₁ with rate 3 and X₂ with rate 5, and you want to know the distribution of their sum. Using MGFs, you multiply:
M_(X₁+X₂)(t) = exp(3(e^t – 1)) · exp(5(e^t – 1)) = exp(8(e^t – 1))
That result is the MGF of a Poisson distribution with rate 8. By uniqueness, X₁ + X₂ is Poisson with rate 8. No convolution integrals, no summing over all possible combinations. The entire argument takes three lines. This kind of shortcut is the reason MGFs appear throughout probability and statistics, any time you need to characterize what happens when independent random quantities combine.
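The identification can also be checked by simulation. The sketch below (the sample size, seed, and sampler are arbitrary illustrative choices) draws the two Poisson variables with a simple standard-library sampler and confirms that their sum has mean and variance close to 8, as a Poisson(8) variable must:

```python
import math
import random

def poisson_sample(lam, rng):
    # Knuth's multiply-uniforms method, adequate for small rates like 3 and 5
    target = math.exp(-lam)
    k, p = 0, 1.0
    while p > target:
        p *= rng.random()
        k += 1
    return k - 1

rng = random.Random(0)
n = 100_000
sums = [poisson_sample(3, rng) + poisson_sample(5, rng) for _ in range(n)]

mean = sum(sums) / n
var = sum((s - mean) ** 2 for s in sums) / n
print(mean, var)  # both close to 8, consistent with a Poisson(8) distribution
```

Matching mean and variance is of course weaker evidence than the MGF argument, which pins down the entire distribution in one step; the simulation is just a reassuring cross-check.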

