A transition matrix is a square grid of numbers that describes how a system moves between different states over time. Each number in the matrix represents the probability (or proportion) of shifting from one state to another. If you’re tracking whether customers stay with a brand or switch to a competitor, whether a patient’s disease progresses or improves, or how weather patterns change day to day, a transition matrix captures all of those possible shifts in one compact table.
How a Transition Matrix Is Structured
A transition matrix is always square, meaning it has the same number of rows and columns. That number is determined by how many “states” your system has. A state is simply a condition or situation at a specific point in time. If you’re modeling weather as sunny, cloudy, or rainy, you have three states, so your transition matrix is 3×3.
Each entry in the matrix sits at the intersection of a row and a column, and it tells you the probability of moving from one particular state to another. For example, under the common convention where rows represent starting states, the entry in the “sunny” row and “rainy” column represents the chance that a sunny day is followed by a rainy day. Every entry must be between 0 and 1, since each one is a probability. And because the system has to end up in some state, all the entries leaving any given state must add up to exactly 1.
There are two common conventions. In a “right stochastic” matrix, each row represents a starting state and the row entries sum to 1. In a “left stochastic” (or column stochastic) matrix, each column represents a starting state and the column entries sum to 1. Both describe the same idea, just oriented differently. You’ll encounter both in textbooks, so it helps to check which convention is being used before reading the numbers.
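As a small sketch in Python with NumPy, here is what a 3×3 weather matrix looks like under the row-stochastic convention. The specific probabilities are made up for illustration:

```python
import numpy as np

# A hypothetical weather transition matrix (row-stochastic convention:
# rows are the starting state, columns are the next state).
# States, in order: sunny, cloudy, rainy. Numbers are illustrative only.
P = np.array([
    [0.7, 0.2, 0.1],   # sunny  -> sunny, cloudy, rainy
    [0.3, 0.4, 0.3],   # cloudy -> sunny, cloudy, rainy
    [0.2, 0.4, 0.4],   # rainy  -> sunny, cloudy, rainy
])

# Every entry is a probability, and each row sums to exactly 1.
assert np.all((P >= 0) & (P <= 1))
assert np.allclose(P.sum(axis=1), 1.0)
```

Under the column-stochastic convention you would simply transpose this matrix, and the columns would sum to 1 instead.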
The Connection to Markov Chains
Transition matrices are the mathematical backbone of Markov chains. A Markov chain is a process where the next state depends only on the current state, not on the full history of how you got there. This “memoryless” property is what makes the math tractable: you only need to know where the system is right now to predict where it goes next.
The transition matrix P contains what are called one-step transition probabilities. The entry P(i,j) gives the probability of moving from state i to state j in a single time step. What makes this powerful is that you can predict multiple steps into the future through simple matrix multiplication. To find the probability of going from state i to state j in exactly n steps, you raise the matrix P to the nth power and read off the (i, j) entry. The resulting matrix, P raised to the nth power, gives you every n-step transition probability at once.
This means that if you know today’s state and have the transition matrix, you can compute the probability distribution over all possible states at any future time point, just by repeated multiplication.
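A quick sketch of this in NumPy, using an illustrative three-state weather matrix (the numbers are invented for the example):

```python
import numpy as np

# Illustrative row-stochastic weather matrix.
# States, in order: sunny (0), cloudy (1), rainy (2).
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.4, 0.4],
])

# P^3 contains every 3-step transition probability at once.
P3 = np.linalg.matrix_power(P, 3)

# Entry (0, 2): probability that a sunny day is rainy exactly 3 days later.
print(P3[0, 2])

# Each row of P^n is still a probability distribution over states.
assert np.allclose(P3.sum(axis=1), 1.0)
```

Given today's state as a probability vector v (e.g. `[1, 0, 0]` for "definitely sunny"), the distribution n steps ahead is just `v @ np.linalg.matrix_power(P, n)`.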
Steady State: Where the System Settles
One of the most useful things about transition matrices is that many systems eventually settle into a stable pattern called a stationary distribution. This is a set of probabilities across states that no longer changes when you apply the transition matrix one more time. Mathematically, the stationary distribution π satisfies the equation π = πP, meaning multiplying by P leaves it unchanged.
Not every transition matrix reaches a steady state, but most well-behaved ones do. Specifically, if every state can eventually be reached from every other state (the chain is “irreducible”) and the system doesn’t cycle through states in a fixed, repeating loop (it’s “aperiodic”), then the powers of the matrix converge. As you raise P to higher and higher powers, each row of the resulting matrix approaches the same stationary distribution, regardless of where the system started.
For practical purposes, this means you can answer questions like: “In the long run, what fraction of days will be rainy?” or “Over time, what share of customers will end up with each brand?” You can find the stationary distribution either by multiplying the matrix by itself many times until the numbers stabilize, or by solving for the special vector that remains unchanged under multiplication by P.
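Both approaches can be sketched in a few lines of NumPy. This uses the same kind of illustrative three-state matrix as before (numbers invented for the example):

```python
import numpy as np

# Illustrative row-stochastic matrix; states: sunny, cloudy, rainy.
P = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.4, 0.4],
])

# Method 1: raise P to a high power; every row approaches pi.
pi_power = np.linalg.matrix_power(P, 50)[0]

# Method 2: solve pi = pi P directly. Rearranged, (P^T - I) pi = 0,
# plus the constraint that pi sums to 1; solve by least squares.
n = P.shape[0]
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.concatenate([np.zeros(n), [1.0]])
pi_solve, *_ = np.linalg.lstsq(A, b, rcond=None)

# Both methods agree, and pi is unchanged by one more step.
assert np.allclose(pi_power, pi_solve, atol=1e-6)
assert np.allclose(pi_solve @ P, pi_solve)
```

The matrix-power route is the easier one to remember; the linear-solve route is exact and avoids choosing "how many times is enough."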
How to Build One From Real Data
If you have observed data, constructing a transition matrix is straightforward. You count how many times each transition actually occurred, then divide by the total number of transitions leaving that state. If you observed 100 sunny days and 30 of them were followed by cloudy days, the estimated probability of sunny-to-cloudy is 30/100, or 0.3.
Formally, the estimated probability of transitioning from state i to state j equals the number of observed i-to-j transitions divided by the total number of transitions out of state i. This approach, rooted in maximum likelihood estimation, gives you the most likely transition matrix given your data. The more observations you have, the more reliable these estimates become.
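The count-and-divide recipe is a few lines in Python. This sketch uses a short, made-up sequence of daily weather observations:

```python
import numpy as np

# Hypothetical sequence of observed daily weather states.
states = ["sunny", "cloudy", "rainy"]
idx = {s: i for i, s in enumerate(states)}
observations = ["sunny", "sunny", "cloudy", "rainy", "rainy",
                "cloudy", "sunny", "cloudy", "cloudy", "rainy"]

# Count each observed i -> j transition between consecutive days.
counts = np.zeros((3, 3))
for a, b in zip(observations, observations[1:]):
    counts[idx[a], idx[b]] += 1

# Divide each row by the total transitions leaving that state:
# this is the maximum likelihood estimate of the transition matrix.
P_hat = counts / counts.sum(axis=1, keepdims=True)

assert np.allclose(P_hat.sum(axis=1), 1.0)
```

Note that in real data a state may have few (or zero) outgoing transitions, which makes its row estimate noisy or undefined; with sparse data it is common to add a small smoothing count to every cell before dividing.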
Applications in Health and Biology
Transition matrices are widely used to model disease progression. Researchers define states like “healthy,” “early-stage disease,” “advanced disease,” and “remission,” then estimate the probabilities of moving between them over fixed time intervals. This approach has been applied to conditions ranging from diabetes to cancer, helping clinicians understand typical progression patterns and evaluate how treatments might shift patients toward better outcomes.
In one example from diabetes research, scientists used a type of model built on transition matrices to track how patients moved through stages defined by combinations of autoantibody markers. The model identified three distinct progression trajectories, each following different state transition patterns, with some patients accumulating markers one by one and others developing them simultaneously. This kind of modeling helps identify which patients are on a faster or slower path toward full disease.
In genetics and bioinformatics, transition matrices show up as substitution matrices that describe how DNA bases or amino acids change over evolutionary time. The well-known PAM and BLOSUM matrices used in sequence alignment tools like BLAST are essentially transition matrices tailored to different evolutionary distances. PAM matrices model closely related sequences, while BLOSUM matrices handle more distantly related ones. When you run a sequence search to find related genes or proteins, these matrices are scoring every possible substitution behind the scenes.
A Simple Worked Example
Suppose you run a subscription service and customers are either “active” or “cancelled” each month. From your data, you find that 90% of active customers stay active the next month and 10% cancel. Of cancelled customers, 30% resubscribe and 70% stay cancelled. Your transition matrix looks like this:
- Active → Active: 0.9
- Active → Cancelled: 0.1
- Cancelled → Active: 0.3
- Cancelled → Cancelled: 0.7
Each row sums to 1. To predict the state distribution two months from now, you multiply this matrix by itself. To find the long-run proportion of active versus cancelled customers, you solve for the stationary distribution. In this case, it works out to 75% active and 25% cancelled in the long run, regardless of how many customers you start with in each group. That single matrix, just four numbers, tells you both the short-term dynamics and the long-term equilibrium of your entire system.
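Both claims can be checked numerically in NumPy. The matrix below is exactly the one from the example; everything else follows from it:

```python
import numpy as np

# Transition matrix from the worked example. Rows are this month's state,
# columns next month's. Order: [active, cancelled].
P = np.array([
    [0.9, 0.1],
    [0.3, 0.7],
])

# Two-month transition probabilities: multiply the matrix by itself.
P2 = P @ P

# Long-run distribution: raise P to a high power; every row converges to pi.
pi = np.linalg.matrix_power(P, 100)[0]
print(pi)  # approximately [0.75, 0.25]: 75% active, 25% cancelled

assert np.allclose(pi, [0.75, 0.25], atol=1e-6)
```

You can verify the stationary distribution by hand, too: π = πP requires 0.1 · π_active = 0.3 · π_cancelled, so active customers outnumber cancelled ones three to one.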

