What Are Variance and Covariance in Statistics?

Variance measures how spread out a single set of numbers is from its average. Covariance measures how two sets of numbers move together. Both are foundational concepts in statistics, and understanding them unlocks everything from portfolio investing to machine learning.

Variance: Measuring Spread

Variance answers a simple question: how far do values in a dataset typically fall from the average? To calculate it, you take each data point, subtract the mean, square that difference, and then average all the squared differences together. Squaring serves two purposes: it eliminates negative signs (so deviations above and below the mean don’t cancel out), and it gives extra weight to values that are far from the center.

If you’re working with an entire population, you divide by N (the total number of data points). If you’re working with a sample, you divide by N minus 1 instead. That small adjustment, known as Bessel’s correction, accounts for the fact that deviations measured from the sample’s own mean tend to understate the true spread of the population the sample came from. Most real-world analysis uses the sample version, since you’re rarely measuring every single member of a group.
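
The two divisors can be sketched in a few lines of Python. The scores list is made-up illustration data and the function names are hypothetical; the standard library's statistics module appears only as a cross-check.

```python
import statistics

def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def sample_variance(values):
    """Same sum of squared deviations, but dividing by N - 1."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

# Hypothetical data, chosen only for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = population_variance(scores)   # divides by 8
samp_var = sample_variance(scores)      # divides by 7

print(pop_var, statistics.pvariance(scores))
print(samp_var, statistics.variance(scores))
```

The sample figure comes out slightly larger than the population figure for the same numbers, which is the effect of the smaller divisor.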

A low variance means the data clusters tightly around the mean. A high variance means the values are scattered widely. One quirk worth knowing: variance is always expressed in squared units. If your data is in dollars, the variance is in “dollars squared,” which isn’t intuitive. That’s why people often take the square root of variance to get the standard deviation, which returns the result to the original units.
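
As a rough illustration of the units point, the sketch below computes a sample variance and its square root. The sales figures are invented for the example.

```python
import math

# Hypothetical daily sales in dollars, for illustration only.
sales = [100.0, 120.0, 90.0, 110.0, 80.0]

mean = sum(sales) / len(sales)

# Sample variance: the units here are dollars squared.
variance = sum((s - mean) ** 2 for s in sales) / (len(sales) - 1)

# Standard deviation: the square root brings the units back to dollars.
std_dev = math.sqrt(variance)

print(variance, std_dev)
```

The variance of 250 is in dollars squared, while the standard deviation of roughly 15.8 can be read directly as dollars.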

Covariance: Measuring Joint Movement

Covariance extends the idea of variance to two variables at once. Instead of asking “how does X spread around its mean?” it asks “when X is above its mean, does Y tend to be above its mean too?” The calculation is similar to variance, but instead of squaring each deviation, you multiply the deviation of X by the deviation of Y for each paired observation, then average those products.

For a sample, the formula looks like this: sum up (each X value minus the X mean) times (each Y value minus the Y mean), then divide by N minus 1. For a population, you divide by N.

Consider a quick example. If you have four years of economic growth rates (2.1%, 2.5%, 4.0%, 3.6%) and matching S&P 500 returns (8%, 12%, 14%, 10%), you first find the mean of each: 3.05% for growth (about 3.1%) and 11% for returns. Then for each year, you multiply the two deviations together and sum the products, which gives 4.6. Because these four years are a sample, you divide that sum by N minus 1 (here, 3). The result is about 1.53, a positive number, telling you that higher economic growth years tended to coincide with higher stock returns.
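
The arithmetic in this example can be checked with a short Python sketch. The function name is hypothetical, and the code assumes the four years are treated as a sample, so the sum of products is divided by N minus 1.

```python
def sample_covariance(xs, ys):
    """Sum of paired deviation products, divided by N - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Figures from the example, in percent.
growth = [2.1, 2.5, 4.0, 3.6]
returns = [8.0, 12.0, 14.0, 10.0]

cov = sample_covariance(growth, returns)
print(round(cov, 2))  # prints 1.53
```

The deviation products sum to 4.6, and dividing by 3 gives about 1.53.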

What the Sign Tells You

The sign of a covariance value is its most useful feature. A positive covariance means the two variables tend to move in the same direction: when one rises above its mean, the other usually does too. A negative covariance means they move in opposite directions: when one goes up, the other tends to go down. A covariance near zero suggests no consistent linear relationship between the two variables.

You can picture this on a scatter plot. If most data points land in the upper-right and lower-left quadrants (both variables above their means, or both below), the covariance is positive. If the points cluster in the upper-left and lower-right quadrants (one variable high while the other is low), the covariance is negative. When the points are scattered evenly across all four quadrants, the covariance approaches zero.
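
The quadrant picture can be approximated in code by checking, for each point, whether the two deviations share a sign. This is only a sketch: the data is invented and the function name is hypothetical.

```python
from collections import Counter

def side_counts(xs, ys):
    """Tally points by whether X and Y fall on the same side of their means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    counts = Counter()
    for x, y in zip(xs, ys):
        product = (x - mean_x) * (y - mean_y)
        if product > 0:
            counts["same side"] += 1        # upper-right or lower-left
        elif product < 0:
            counts["opposite sides"] += 1   # upper-left or lower-right
        else:
            counts["on a mean line"] += 1
    return counts

# Hypothetical paired observations.
xs = [1, 2, 3, 4, 6]
ys = [2, 1, 5, 3, 9]

counts = side_counts(xs, ys)
print(dict(counts))
```

Here three points sit in the same-side quadrants and two in the opposite-side quadrants. The count alone does not settle the sign, though: covariance sums the products themselves, so points far from both means weigh more than points near the center.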

Why Raw Covariance Is Hard to Interpret

Unlike variance, where the magnitude directly tells you about spread, the magnitude of covariance is difficult to interpret on its own. A covariance of 500 between height and weight doesn’t mean much unless you know the scales of both variables. The units of covariance are the product of the two variables’ units (centimeters times kilograms, for instance), which makes comparisons across different datasets nearly impossible.

This is why statisticians standardize covariance into the correlation coefficient, which divides the covariance by the product of the two variables’ standard deviations. Correlation always falls between negative 1 and positive 1, giving you both the direction and the strength of the relationship on a universal scale. A correlation of 0.9 is a strong positive relationship regardless of whether you’re measuring temperatures, test scores, or stock prices. Covariance tells you the direction; correlation tells you the direction and strength.
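
A small sketch can show why the standardized version is easier to compare. The height and weight values are invented, and the function names are hypothetical. Switching heights from centimeters to meters changes the covariance by a factor of 100 but leaves the correlation unchanged.

```python
import math

def sample_covariance(xs, ys):
    """Sum of paired deviation products, divided by N - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    std_x = math.sqrt(sample_covariance(xs, xs))  # cov(X, X) is var(X)
    std_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (std_x * std_y)

# Hypothetical measurements, for illustration only.
heights_cm = [150.0, 160.0, 170.0, 180.0]
weights_kg = [52.0, 58.0, 69.0, 77.0]
heights_m = [h / 100 for h in heights_cm]

cov_cm = sample_covariance(heights_cm, weights_kg)  # units: cm * kg
cov_m = sample_covariance(heights_m, weights_kg)    # units: m * kg
corr_cm = correlation(heights_cm, weights_kg)       # unitless
corr_m = correlation(heights_m, weights_kg)         # unitless

print(cov_cm, cov_m)
print(corr_cm, corr_m)
```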

Key Differences at a Glance

Number of variables: Variance describes one variable. Covariance describes the relationship between two.
Units: Variance is in squared units of the original variable. Covariance is in the product of the two variables’ units.
Range: Variance is always zero or positive (squared values can’t be negative). Covariance can be positive, negative, or zero.
What it measures: Variance measures spread. Covariance measures directional co-movement.
Special case: The covariance of a variable with itself equals its variance.

The Variance-Covariance Matrix

When you’re working with more than two variables, variance and covariance get organized into a single table called the variance-covariance matrix. The diagonal of the matrix contains the variance of each variable. Every off-diagonal cell contains the covariance between a pair of variables. For three variables, you’d have a 3-by-3 grid: three variances along the diagonal and six covariance values filling the rest (three unique pairs, each appearing twice since the matrix is symmetric).
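
The layout of a 3-by-3 matrix can be sketched in plain Python. The three columns are invented data and the function names are hypothetical; in practice a numerical library would normally do this work.

```python
import statistics

def sample_covariance(xs, ys):
    """Sum of paired deviation products, divided by N - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def covariance_matrix(columns):
    """Build a square table with one row and one column per variable."""
    names = list(columns)
    return [
        [sample_covariance(columns[a], columns[b]) for b in names]
        for a in names
    ]

# Hypothetical observations of three variables.
data = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 5.0, 9.0],
    "z": [9.0, 7.0, 4.0, 2.0],
}

matrix = covariance_matrix(data)
for name, row in zip(data, matrix):
    print(name, [round(value, 2) for value in row])

# The diagonal holds each variable's sample variance.
diagonal = [matrix[i][i] for i in range(len(matrix))]
variances = [statistics.variance(values) for values in data.values()]
```

Each diagonal entry matches the variable's own sample variance, and each off-diagonal entry equals its mirror image across the diagonal.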

This matrix is the backbone of multivariate statistics. It compactly describes how every variable in a dataset relates to every other variable, and it’s the starting input for techniques like principal component analysis, linear regression with multiple predictors, and portfolio optimization.

How Finance Uses Both Concepts

One of the clearest real-world applications comes from Modern Portfolio Theory. Variance represents the risk of a single asset: a stock with high variance has volatile, unpredictable returns. Covariance captures how two assets’ returns move relative to each other.

The total risk of a two-asset portfolio depends on three things: the variance of each asset (weighted by how much of your money is in each one) and the covariance between them. If two assets have a negative covariance, meaning one tends to go up when the other goes down, their movements partly offset each other, and with suitable weights the portfolio’s overall risk can fall below what either asset carries alone. This is the mathematical foundation of diversification. Even if each individual asset is highly volatile, a strong negative covariance between them can considerably lower the portfolio’s combined volatility.
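
The standard two-asset formula makes this concrete: portfolio variance equals the squared weight of each asset times its variance, plus twice the product of the two weights and the covariance. The sketch below uses invented variances and covariances and a hypothetical function name, and it assumes the two weights sum to 1.

```python
def portfolio_variance(weight_a, var_a, var_b, cov_ab):
    """Variance of a two-asset portfolio whose weights sum to 1."""
    weight_b = 1.0 - weight_a
    return (
        weight_a ** 2 * var_a
        + weight_b ** 2 * var_b
        + 2 * weight_a * weight_b * cov_ab
    )

# Hypothetical return variances, for illustration only.
var_a = 0.04
var_b = 0.09

# Same 50/50 split, once with negative and once with positive covariance.
with_negative_cov = portfolio_variance(0.5, var_a, var_b, cov_ab=-0.03)
with_positive_cov = portfolio_variance(0.5, var_a, var_b, cov_ab=0.03)

print(with_negative_cov, with_positive_cov)
```

With these made-up inputs, the negative-covariance portfolio has a variance of about 0.0175, lower than either asset alone, while the positive-covariance portfolio comes out around 0.0475.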

This is why investors spread money across different industries and asset classes. They’re not just hoping for the best. They’re deliberately seeking low or negative covariance between holdings to push their portfolio toward what’s called the efficient frontier, the set of portfolios that delivers the maximum expected return for any given level of risk.