A correlation coefficient is a number between -1 and +1 that tells you how strongly two things are related. If you measured the height and weight of 100 people, the correlation coefficient would capture, in a single number, how closely those two measurements move together. A value of +1 means they move in perfect lockstep, -1 means one goes up exactly as the other goes down, and 0 means there’s no relationship at all.
What the Numbers Actually Tell You
The correlation coefficient, usually written as “r,” works on a scale from -1 to +1. The sign tells you the direction of the relationship, and the size of the number tells you the strength.
A positive correlation means both variables increase together. Taller people tend to weigh more, so height and weight have a positive correlation. A negative correlation means one variable goes up while the other goes down. The more hours you spend sitting each day, the lower your cardiovascular fitness tends to be.
A value of exactly 0 means the two variables have no linear relationship whatsoever. You’d see this if you plotted shoe size against, say, the number of books someone read last year. The dots on the graph would be scattered randomly with no visible pattern.
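The coefficient can be computed directly from paired measurements. Here is a minimal sketch in plain Python, using the standard Pearson formula (covariance divided by the product of the standard deviations); the height and weight numbers are made up for illustration:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation: covariance of x and y divided by
    the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

heights = [160, 165, 170, 175, 180]   # cm, illustrative values
weights = [55, 62, 66, 74, 79]        # kg, illustrative values
r = pearson(heights, weights)          # close to +1: near lockstep
```

Because taller people in this toy dataset are consistently heavier, the result lands very close to +1.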
How to Judge Weak, Moderate, and Strong
There’s no single universal cutoff for what counts as “strong” or “weak.” Different fields use slightly different guidelines, which can cause confusion. A commonly used framework in psychology (from Dancey and Reidy) breaks it down like this:
- 0.1 to 0.3: Weak
- 0.4 to 0.6: Moderate
- 0.7 to 0.9: Strong
In medicine, the thresholds tend to be stricter. A correlation of 0.5 might only be considered “fair,” and values below 0.2 are labeled “poor.” In political science, the bar is more generous, with 0.4 already considered “strong.” These same ranges apply to negative values. A correlation of -0.8 is just as strong as +0.8; it simply runs in the opposite direction.
The general rule proposed by statistician Douglas Altman is straightforward: below 0.2 is poor and above 0.8 is very good, with the range in between climbing through fair, moderate, and good.
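One way to make the psychology benchmarks above concrete is a small labeling function. This is a sketch using the Dancey and Reidy bands quoted earlier; the exact boundary values, and the "negligible" label for values below 0.1, are conventions that vary by field:

```python
def describe_strength(r):
    """Rough verbal label for a correlation, using the
    Dancey & Reidy psychology benchmarks. The sign is
    ignored: -0.8 is as strong as +0.8."""
    size = abs(r)
    if size >= 0.7:
        return "strong"
    if size >= 0.4:
        return "moderate"
    if size >= 0.1:
        return "weak"
    return "negligible"   # below the weak band; convention varies

describe_strength(-0.8)   # "strong", same as +0.8
describe_strength(0.25)   # "weak"
```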
Squaring It Tells You Even More
One of the most useful tricks with a correlation coefficient is squaring it. The result, called r-squared, tells you the percentage of variation in one variable that’s explained by the other. If the correlation between exercise frequency and resting heart rate is -0.7, then r-squared is 0.49, meaning about 49% of the differences in resting heart rate can be accounted for by differences in exercise frequency. The other 51% comes from genetics, diet, stress, and everything else.
This is why a correlation of 0.5, which sounds decent, only explains 25% of the variation. It’s a helpful reality check. Even moderately strong correlations leave a lot unexplained.
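The arithmetic is simple enough to check directly:

```python
def variance_explained(r):
    """Share of variation in one variable accounted for by
    the other: simply r squared."""
    return round(r ** 2, 2)

variance_explained(-0.7)   # 0.49: about 49% explained, 51% left over
variance_explained(0.5)    # 0.25: a "decent" 0.5 leaves 75% unexplained
```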
Pearson vs. Spearman: Two Common Types
The two most widely used versions are the Pearson and Spearman coefficients. Both produce a number between -1 and +1, but they measure slightly different things.
The Pearson coefficient measures how well two variables follow a straight-line relationship. It works best when both variables are continuous (like blood pressure or temperature) and the data is roughly normally distributed, meaning most values cluster near the middle with fewer at the extremes. The Spearman coefficient is more flexible. Instead of measuring a straight-line relationship, it picks up any consistent trend, even if it curves. It works by ranking the data points and correlating those ranks, which makes it better suited for data that’s skewed or contains ordinal rankings (like pain rated on a 1-to-10 scale).
In practice, Pearson and Spearman often give similar results. But when the data has outliers or doesn’t follow a bell curve, Spearman is typically the safer choice.
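The difference shows up clearly on data that rises consistently but not in a straight line. A sketch, with Spearman implemented the standard way as Pearson applied to ranks (the ranking helper assumes no tied values, for simplicity):

```python
from math import sqrt

def pearson(xs, ys):
    # Covariance divided by the product of standard deviations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs)
                      * sum((y - my) ** 2 for y in ys))

def ranks(xs):
    # Rank each value 1..n; assumes no ties for simplicity.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out

def spearman(xs, ys):
    # Spearman = Pearson correlation of the ranks.
    return pearson(ranks(xs), ranks(ys))

x = list(range(1, 9))
y = [v ** 3 for v in x]   # perfectly monotone, but curved

pearson(x, y)    # about 0.93: the curve costs it
spearman(x, y)   # exactly 1.0: the trend never reverses
```

Because y always increases with x, the ranks match perfectly and Spearman reports 1.0, while Pearson is pulled below 1 by the curvature.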
Why Outliers Can Distort the Result
A single extreme data point can dramatically change a Pearson correlation. Research using simulated datasets has shown that one outlier can reduce a correlation by 50% or even reverse its direction entirely, turning what looks like a positive relationship into a negative one. In a classic demonstration by the statistician Francis Anscombe, four completely different datasets all produced the same Pearson correlation of 0.81, even though one of them had its result driven almost entirely by a single outlier.
Spearman correlations are less sensitive to outliers but not immune. When extreme data points are present, specialized methods that detect and downweight outliers perform more accurately than either standard approach. If you’re reading a study that reports a correlation, it’s worth checking whether the researchers addressed outliers, especially with small sample sizes where one unusual observation carries more weight.
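A toy example makes the outlier effect vivid. Here five points lie in a perfect positive line, and a single extreme sixth point flips the Pearson correlation to strongly negative (the numbers are invented for demonstration, not drawn from any study):

```python
from math import sqrt

def pearson(xs, ys):
    # Covariance divided by the product of standard deviations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs)
                      * sum((y - my) ** 2 for y in ys))

x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 4, 5]          # perfect positive relationship: r = 1

x_out = x + [20]             # one extreme observation added
y_out = y + [-20]

pearson(x, y)                # 1.0
pearson(x_out, y_out)        # strongly negative: one point flipped the sign
```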
Sample Size Matters More Than You’d Think
A correlation calculated from 10 people is far less reliable than one calculated from 500. With small samples, the true correlation in the broader population could be much higher or lower than what the study found. For example, a study of just 20 people reporting a correlation of 0.6 carries a 95% confidence interval stretching roughly from 0.2 to 0.8. That's a wide range spanning from "weak" to "strong" depending on the field.
This is where p-values come in. A p-value attached to a correlation tells you the probability that you'd see a correlation at least this strong by pure chance if there were actually no relationship. A small p-value (typically below 0.05) suggests the correlation is statistically significant, meaning it's unlikely to be a fluke. But here's the catch: with a very large sample, even tiny, practically meaningless correlations reach statistical significance. A correlation of 0.08 between two variables will be statistically significant with 10,000 data points, yet it explains less than 1% of the variation. Significance doesn't equal importance.
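Both the uncertainty range and the p-value can be sketched with the standard Fisher z-transformation. This uses the usual normal approximation (standard error 1 over the square root of n minus 3), which is an approximation rather than an exact test, but it works well for moderate sample sizes:

```python
from math import atanh, tanh, sqrt, erf

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson
    correlation via the Fisher z-transformation."""
    z = atanh(r)
    se = 1 / sqrt(n - 3)
    return tanh(z - z_crit * se), tanh(z + z_crit * se)

def fisher_p(r, n):
    """Two-sided p-value for the null of zero correlation,
    using the normal approximation to the Fisher z statistic."""
    z_stat = abs(atanh(r)) * sqrt(n - 3)
    # Normal tail probability via the error function.
    return 2 * (1 - 0.5 * (1 + erf(z_stat / sqrt(2))))

fisher_ci(0.6, 20)      # roughly (0.21, 0.82): a wide range from n = 20
fisher_p(0.08, 10000)   # tiny p-value despite a trivial correlation
```

The second call illustrates the catch above: with 10,000 data points, a correlation of 0.08 is wildly significant even though it explains almost nothing.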
Correlation Does Not Mean Causation
This is the single most important thing to remember about any correlation coefficient. A strong correlation between two variables does not prove that one causes the other. There are three common reasons a correlation can be misleading.
The first is coincidence. With enough variables, some will correlate by sheer chance. There are famously strong correlations between completely unrelated things, like per-capita cheese consumption and the number of people who die tangled in bedsheets. The numbers line up, but there’s obviously no causal link.
The second is confounding. Two variables might correlate because a third, hidden variable drives both of them. Ice cream sales and drowning deaths are positively correlated, but ice cream doesn’t cause drowning. Hot weather increases both.
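Confounding is easy to reproduce in a simulation. In this hypothetical sketch, ice cream sales and drownings are each driven by temperature plus independent noise, with no causal link between them, yet they come out strongly correlated (all numbers and units are invented):

```python
import random
from math import sqrt

def pearson(xs, ys):
    # Covariance divided by the product of standard deviations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs)
                      * sum((y - my) ** 2 for y in ys))

random.seed(1)
weather = [random.uniform(0, 30) for _ in range(500)]   # daily temperature
ice_cream = [w + random.gauss(0, 3) for w in weather]   # driven by heat
drownings = [w + random.gauss(0, 3) for w in weather]   # also driven by heat

r = pearson(ice_cream, drownings)   # strongly positive, no causal link
```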
The third is reverse causality. You might observe that people who take a certain supplement have better health outcomes and assume the supplement helps. But it could be that healthier people are more likely to take supplements in the first place.
To establish that one thing actually causes another, three conditions need to be met: the cause must come before the effect in time, the two must be statistically related, and no third variable can explain the link. Statistical tools like regression can adjust for known confounders, but they can’t account for factors the researchers didn’t measure. This is why randomized controlled trials, not observational correlations, are the gold standard for proving causation.
Reading Correlations in Everyday Life
You’ll encounter correlation coefficients in health news, financial reports, psychology studies, and product reviews. When you see one, ask yourself three questions. First, how strong is it? Use the rough benchmarks above and remember to square it to see how much variation is actually explained. Second, how large was the sample? A correlation from a handful of data points is a rough estimate at best. Third, could something else explain the relationship? A high correlation is a starting point for investigation, not a conclusion.
If you’re plotting data yourself in a spreadsheet, look at the scatter plot before calculating the coefficient. A curved relationship, clusters, or a few extreme points far from the main group can all produce misleading numbers. The correlation coefficient summarizes the data in one number, which is both its power and its limitation.
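A curved relationship is the classic case where the single number misleads. In this small sketch, y depends completely on x (a U-shape), yet the Pearson correlation is exactly zero because the left and right halves cancel out:

```python
from math import sqrt

def pearson(xs, ys):
    # Covariance divided by the product of standard deviations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs)
                      * sum((y - my) ** 2 for y in ys))

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]   # a perfect U-shape: total dependence

r = pearson(x, y)   # 0: the linear summary misses the pattern entirely
```

A scatter plot would reveal the U-shape at a glance, which is exactly why it's worth looking before trusting the coefficient.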

