What Is a Markov Chain? Definition and How It Works

A Markov chain is a mathematical model that predicts what happens next in a sequence based only on the current state, ignoring everything that came before. If today is sunny, the chain uses only that fact to estimate tomorrow’s weather. It doesn’t care whether yesterday was sunny, rainy, or snowy. This “memoryless” property is the defining feature, and it makes Markov chains surprisingly useful across fields from weather forecasting to text prediction to financial modeling.

The Core Idea: Only Now Matters

Imagine you’re watching a system that can be in one of several states at any given moment. A Markov chain says that the probability of moving to any future state depends entirely on the current state and nothing else. The full history of how the system arrived at its current state is irrelevant. Mathematicians call this the Markov property.

Think of it like a board game where you roll dice to move. Your next position depends on where you are right now and what you roll. It doesn’t matter how you landed on your current square, whether you got there on your first turn or your fiftieth. That’s the memoryless quality in action. A Markov chain is a particular kind of stochastic process, which just means a system that evolves randomly over time according to probability rules; the Markov property is what distinguishes it from stochastic processes in general.

Where the Name Comes From

The concept is named after Andrey Markov, a Russian mathematician who published foundational work on dependent sequences of events in the early 1900s. In 1913, Markov demonstrated his idea with a creative experiment: he analyzed 20,000 consecutive letters from Alexander Pushkin’s verse novel “Eugene Onegin,” classifying each as either a vowel or a consonant. He found that the probability of a vowel appearing was about 43.2% overall, but it shifted dramatically based on the previous letter. A vowel followed another vowel only 12.8% of the time, while a vowel followed a consonant 66.3% of the time. By showing that each letter’s identity depended on the one before it, Markov gave scholars a concrete method for modeling sequences where events are linked rather than independent.
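Markov’s counting procedure is easy to reproduce on any stretch of English text. The sketch below uses an invented sample sentence rather than Pushkin’s text, but the tallying logic is the same: classify each letter, then count how often a vowel follows a vowel versus a consonant.

```python
# Tally conditional vowel frequencies, in the spirit of Markov's
# 1913 experiment. The sample string is invented for illustration.
text = "it was a bright cold day in april and the clocks were striking"
letters = [c for c in text if c.isalpha()]

VOWELS = set("aeiou")
after_vowel = {"vowel": 0, "consonant": 0}
after_consonant = {"vowel": 0, "consonant": 0}

for prev, cur in zip(letters, letters[1:]):
    kind = "vowel" if cur in VOWELS else "consonant"
    if prev in VOWELS:
        after_vowel[kind] += 1
    else:
        after_consonant[kind] += 1

p_vowel_after_vowel = after_vowel["vowel"] / sum(after_vowel.values())
p_vowel_after_consonant = after_consonant["vowel"] / sum(after_consonant.values())
print(p_vowel_after_vowel, p_vowel_after_consonant)
```

Even on a sample this tiny, the two conditional probabilities come out very different, which is exactly the dependence Markov was demonstrating.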

How Transition Probabilities Work

Every Markov chain is defined by its states and its transition probabilities, the chances of moving from one state to another. These probabilities are organized in a grid called a transition matrix.

A simple weather model illustrates this well. Suppose the weather has two states: sunny and rainy. The transition matrix might look like this:

  • If today is sunny: 40% chance tomorrow is sunny, 60% chance tomorrow is rainy
  • If today is rainy: 30% chance tomorrow is sunny, 70% chance tomorrow is rainy

Notice that the probabilities in each row add up to 100%. This is a hard rule for any valid transition matrix. Since the system must go somewhere (including possibly staying in the same state), the probabilities of all possible next states must sum to one. To predict two days ahead, you apply the matrix twice. To predict a week ahead, you apply it seven times. Each step uses only the output of the previous step, never reaching further back.
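The multi-step prediction described above is just repeated matrix multiplication. A minimal NumPy sketch of the weather model, with rows and columns ordered [sunny, rainy] to match the bullets:

```python
import numpy as np

# Transition matrix: rows are today's state, columns are tomorrow's.
# Row order: [sunny, rainy]; each row sums to 1.
P = np.array([[0.4, 0.6],   # today sunny -> 40% sunny, 60% rainy
              [0.3, 0.7]])  # today rainy -> 30% sunny, 70% rainy

today = np.array([1.0, 0.0])                  # we know today is sunny
tomorrow = today @ P                          # one step ahead
two_days = today @ P @ P                      # apply the matrix twice
week = today @ np.linalg.matrix_power(P, 7)   # apply it seven times

print(tomorrow)   # [0.4, 0.6]
print(two_days)   # [0.34, 0.66]
```

Note that each multiplication uses only the previous step’s distribution, never the earlier history, which is the Markov property expressed in linear algebra.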

Types of States

Not all states in a Markov chain behave the same way. Some states are “absorbing,” meaning once the chain enters them, it never leaves. Think of a game of Monopoly: bankruptcy is an absorbing state. You can reach it, but you can’t come back from it.

Other states are classified as recurrent or transient. A recurrent state is one the chain will return to with certainty, given enough time. A transient state is one the chain might never revisit: there’s always some positive probability, however small, that the chain wanders away from it permanently. Understanding which states are recurrent and which are transient tells you a lot about the chain’s long-term behavior.
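A short simulation makes absorption concrete. The three-state chain below is invented for illustration: states 0 and 1 are transient, state 2 is absorbing, and every run eventually gets stuck in state 2.

```python
import random

# Invented example: states 0 and 1 are transient, state 2 is absorbing.
TRANSITIONS = {
    0: [(0, 0.5), (1, 0.3), (2, 0.2)],
    1: [(0, 0.4), (1, 0.4), (2, 0.2)],
    2: [(2, 1.0)],  # absorbing: once here, the chain never leaves
}

def step(state, rng):
    states, weights = zip(*TRANSITIONS[state])
    return rng.choices(states, weights=weights)[0]

rng = random.Random(42)
final_states = []
for _ in range(200):          # run 200 independent chains
    state = 0
    for _ in range(1000):     # plenty of steps for absorption
        state = step(state, rng)
    final_states.append(state)

print(set(final_states))  # every run ends absorbed in state 2
```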

Reaching a Steady State

One of the most powerful features of many Markov chains is that they eventually settle into a predictable pattern called a stationary distribution. No matter where the chain starts, after enough steps, the probability of being in each state stabilizes to fixed values.

This doesn’t happen automatically. Two conditions must be met. First, the chain must be irreducible, meaning it’s possible to get from any state to any other state eventually. There are no isolated clusters of states that the chain gets trapped in. Second, the chain must be aperiodic, meaning it doesn’t cycle through states in a rigid repeating pattern. When both conditions hold, the chain has a single unique steady state, and it converges to that distribution over time. The convergence is typically geometric: the gap between the chain’s current distribution and the steady state shrinks by a roughly constant factor at each step, so even a modest number of steps gets very close.

In the weather example, this would mean that regardless of whether you start on a sunny day or a rainy day, after running the model long enough, you’d arrive at the same long-run percentages for sunny and rainy days.
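For the two-state weather matrix used earlier, those long-run percentages work out to 1/3 sunny and 2/3 rainy, and you can find them by simply iterating the matrix until the distribution stops changing. A sketch with NumPy, starting from both possible initial conditions:

```python
import numpy as np

P = np.array([[0.4, 0.6],   # sunny -> sunny/rainy
              [0.3, 0.7]])  # rainy -> sunny/rainy

dist_a = np.array([1.0, 0.0])  # start: certainly sunny today
dist_b = np.array([0.0, 1.0])  # start: certainly rainy today

for _ in range(50):  # iterate until (numerically) converged
    dist_a = dist_a @ P
    dist_b = dist_b @ P

print(dist_a)  # ~[0.3333, 0.6667]
print(dist_b)  # same values: the starting point no longer matters
```

Both runs land on the same distribution, which is the stationary distribution pi satisfying pi = pi P.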

Discrete Time vs. Continuous Time

Most introductory examples use discrete-time Markov chains, where transitions happen at regular intervals: step 1, step 2, step 3, and so on. The weather model above checks the state once per day.

Continuous-time Markov chains, by contrast, allow transitions at any moment. Instead of fixed intervals, the time spent in each state follows an exponential distribution, characterized by a rate parameter. The system might stay in one state for 3.7 seconds, then jump to another for 0.2 seconds, then stay there for 12.1 seconds. The key Markov property still holds: how long the system has already been in a state doesn’t affect when it will leave. Continuous-time chains are common in modeling things like the arrival of customers at a service counter, radioactive decay, or the firing patterns of neurons.
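The memoryless holding times can be sampled directly, since the exponential distribution is the only continuous distribution with that no-memory property. Below is a minimal sketch of an invented two-state on/off system with made-up rate parameters; real models would fit the rates to data.

```python
import random

# Invented continuous-time chain: two states with made-up exit rates.
# rate = expected departures per unit time; mean holding time = 1 / rate.
RATES = {"on": 2.0, "off": 0.5}

def simulate(total_time, rng):
    """Yield (state, holding_time) pairs until total_time is used up."""
    t, state = 0.0, "on"
    while t < total_time:
        hold = rng.expovariate(RATES[state])  # memoryless holding time
        yield state, hold
        t += hold
        state = "off" if state == "on" else "on"

rng = random.Random(7)
history = list(simulate(100.0, rng))
on_times = [h for s, h in history if s == "on"]
print(sum(on_times) / len(on_times))  # should be near 1/2.0 = 0.5
```

The average observed holding time in the "on" state comes out near 1 divided by its rate, as the exponential model predicts.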

Real-World Applications

Text Prediction and Language

When your phone suggests the next word as you type, a version of Markov chain logic is at work. A basic model looks at the last one or two words you typed and predicts the most likely next word based on patterns in a large body of text. A “first-order” model uses only the single previous word. A “second-order” model uses the previous two words, giving more context and better predictions. Markov chains are also used in speech recognition, handwriting recognition, data compression, and spam filtering, all cases where predicting the next element in a sequence from recent context is valuable.
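A first-order word model like the one described here takes only a few lines: count which word follows which in training text, then suggest the most common successor. A toy sketch (the training sentence is invented; real predictors train on enormous corpora):

```python
from collections import Counter, defaultdict

# Toy training corpus (invented).
corpus = ("the cat sat on the mat and the cat ate and "
          "the dog sat on the rug").split()

# First-order model: map each word to a counter of its successors.
successors = defaultdict(Counter)
for prev, cur in zip(corpus, corpus[1:]):
    successors[prev][cur] += 1

def predict(word):
    """Most likely next word after `word`, or None if unseen."""
    if word not in successors:
        return None
    return successors[word].most_common(1)[0][0]

print(predict("the"))  # "cat": it follows "the" most often here
```

A second-order version would key the counter on pairs of words instead of single words, trading a much larger table for better context.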

Monte Carlo Sampling

One of the most influential applications is Markov Chain Monte Carlo, or MCMC. This is a technique for exploring complex probability distributions that are too difficult to calculate directly. The idea is to construct a Markov chain whose steady-state distribution matches the distribution you’re trying to study. Then you run the chain for a long time, collect the states it visits, and use those samples as a representative picture of the target distribution. MCMC is especially important in Bayesian statistics, where researchers need to estimate probability distributions over model parameters. It allows scientists to characterize distributions without needing to solve every equation analytically, making it practical to fit complex models to real data.
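The core MCMC loop is remarkably short. The sketch below uses the Metropolis algorithm, one standard MCMC method, to draw samples from a standard normal distribution; in practice the target would be a distribution you can only evaluate up to a normalizing constant, which is exactly when MCMC earns its keep.

```python
import math
import random

def target(x):
    """Unnormalized density of a standard normal: exp(-x^2 / 2)."""
    return math.exp(-0.5 * x * x)

def metropolis(n_samples, step_size=1.0, seed=0):
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step_size)  # symmetric proposal
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < target(proposal) / target(x):
            x = proposal
        samples.append(x)  # record the state whether or not we moved
    return samples

samples = metropolis(50_000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # should be near 0 and 1 for a standard normal
```

The chain’s stationary distribution is the target by construction, so the visited states approximate draws from it.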

Finance and Economics

Credit rating agencies model the probability that a company’s credit rating will change from one grade to another over the next year. These transition matrices help estimate default risk. Stock market models sometimes treat market conditions (bull, bear, neutral) as states in a Markov chain, using historical data to estimate the likelihood of shifting between regimes.

Biology and Genetics

Hidden Markov models, an extension where the states aren’t directly observed, are widely used in genomics to identify gene sequences. The chain transitions between hidden states (like “inside a gene” and “outside a gene”) while producing observable outputs (the nucleotide letters A, T, C, G). Algorithms then work backward from the observed sequence to infer the most likely hidden states.
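That backward-working inference is usually done with the Viterbi algorithm, a dynamic-programming method that recovers the most likely hidden state path. A toy sketch: all probabilities below are invented for illustration (gene regions modeled as GC-rich), not trained values.

```python
import math

# Invented toy HMM: hidden states emit nucleotides with different biases.
STATES = ["gene", "nongene"]
START = {"gene": 0.5, "nongene": 0.5}
TRANS = {"gene": {"gene": 0.9, "nongene": 0.1},
         "nongene": {"gene": 0.1, "nongene": 0.9}}
EMIT = {"gene":    {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},  # GC-rich
        "nongene": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1}}

def viterbi(observed):
    """Most likely hidden state path for an observed nucleotide string."""
    # best[s] = (log-prob of best path ending in state s, that path)
    best = {s: (math.log(START[s]) + math.log(EMIT[s][observed[0]]), [s])
            for s in STATES}
    for symbol in observed[1:]:
        new_best = {}
        for s in STATES:
            prev, (lp, path) = max(
                ((p, best[p]) for p in STATES),
                key=lambda item: item[1][0] + math.log(TRANS[item[0]][s]))
            new_best[s] = (lp + math.log(TRANS[prev][s])
                           + math.log(EMIT[s][symbol]), path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

path = viterbi("GCGCAATT")
print(path)  # GC-rich prefix labeled "gene", AT-rich suffix "nongene"
```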

Limitations Worth Knowing

The memoryless property is both the greatest strength and the biggest limitation of Markov chains. Many real systems do depend on their history. A patient’s health outcome often depends not just on their current condition but on how long they’ve been sick, what treatments they’ve already tried, and their baseline before getting ill. In these cases, a simple first-order Markov chain is too crude. Higher-order chains (which look back two, three, or more steps) can capture more history, but the number of state combinations they must track grows exponentially with each added step. At some point, other modeling tools become a better fit.

Despite this, the simplicity of the Markov assumption is exactly what makes it so widely used. For a huge range of problems, “only now matters” turns out to be a close enough approximation to produce useful, actionable results.