Pearson’s correlation is a statistical measure that tells you how strongly two variables are related in a straight-line pattern. It produces a single number, called r, that ranges from -1 to +1. A value of +1 means the two variables move perfectly together, -1 means they move in perfectly opposite directions, and 0 means there’s no linear relationship at all. It’s one of the most widely used tools in statistics, showing up in everything from medical research to economics to psychology.
What the r Value Actually Tells You
The Pearson correlation coefficient, r, captures two things at once: the direction of a relationship and its strength. The sign tells you the direction. A positive r means that as one variable increases, the other tends to increase too. Height and weight are a classic example: taller people generally weigh more. A negative r means the variables move in opposite directions. The more time a person spends exercising, for instance, the lower their body fat percentage tends to be.
The size of the number tells you how tightly the data points cluster around a straight line. An r of 0.95 means the points fall almost exactly on a line, while an r of 0.25 means there’s a loose, scattered relationship with a lot of variation. An r of zero means knowing one variable gives you no linear information about the other, though a strong nonlinear relationship can still hide behind an r of zero. Coffee consumption and IQ, for example, have essentially no correlation.
How to Interpret the Strength of r
There’s no single universal scale for labeling a correlation as “strong” or “weak,” but several widely used frameworks overlap enough to give you a working guide. Jacob Cohen’s benchmarks, commonly used in the behavioral sciences, classify an r of 0.10 as a small effect, 0.30 as medium, and 0.50 as large. In psychology, Dancey and Reidy treat 0.7 and above as strong, 0.4 to 0.6 as moderate, and 0.1 to 0.3 as weak. Medical research tends to use stricter labels, calling 0.7 merely “moderate” and reserving “very strong” for values above 0.9.
Which framework matters depends on your field. In tightly controlled physics experiments, an r of 0.7 might be disappointing. In social science, where human behavior is messy and influenced by hundreds of variables, an r of 0.3 can represent a meaningful and useful finding. Context matters more than any fixed label.
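If you need to apply one of these scales programmatically, Cohen’s cutoffs can be encoded in a few lines. This helper is a sketch, not a standard API: the function name and the “negligible” label below 0.10 are my own choices, not part of Cohen’s scheme.

```python
def cohen_label(r):
    """Label |r| using Jacob Cohen's effect-size benchmarks."""
    size = abs(r)  # strength ignores direction
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"

print(cohen_label(0.35))   # medium
print(cohen_label(-0.62))  # large: the sign only gives direction
```

Taking the absolute value first reflects the point above: the sign carries direction, the magnitude carries strength.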
How It’s Calculated
The basic idea behind the formula is straightforward, even if the math looks intimidating. For each data point, the formula asks: how far is this x value from the average x, and how far is this y value from the average y? It multiplies those two distances together for every point, adds them up, and then divides by a scaling factor that keeps the result between -1 and +1.
That scaling factor is what makes r “unitless.” Whether you’re measuring height in centimeters or inches, income in dollars or euros, the resulting r value is the same. This is one of the reasons Pearson’s correlation is so versatile. You can directly compare the strength of a height-weight correlation to the strength of an education-income correlation, even though the underlying measurements are completely different.
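That arithmetic is short enough to write out directly. The sketch below is a minimal pure-Python implementation; the function name and the sample height/weight numbers are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: summed products of deviations, divided by a scaling factor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How far each x is from the average x, times how far each y is from the average y
    products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The scaling factor that keeps the result between -1 and +1
    scale = math.sqrt(sum((x - mean_x) ** 2 for x in xs)
                      * sum((y - mean_y) ** 2 for y in ys))
    return products / scale

heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 60, 66, 70, 78]
heights_in = [h / 2.54 for h in heights_cm]  # same data, different units

print(round(pearson_r(heights_cm, weights_kg), 3))  # 0.995
print(round(pearson_r(heights_in, weights_kg), 3))  # 0.995: r is unitless
```

Converting centimeters to inches rescales the deviations and the scaling factor by exactly the same amount, which is why the units cancel out of the final number.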
Four Assumptions You Need to Check
Pearson’s correlation only gives meaningful results when certain conditions are met. Ignoring these assumptions can lead to misleading conclusions.
- Continuous data. Both variables need to be measured on a continuous scale, like temperature, time, weight, or income. You can’t meaningfully compute a Pearson correlation for categories like “yes/no” or “low/medium/high.”
- Linear relationship. Pearson’s r only measures straight-line relationships. If the true pattern is a curve, r can badly underestimate (or miss entirely) a real connection between two variables. A quick scatterplot is the easiest way to check this.
- No major outliers. Even a single extreme data point can dramatically distort the result. One outlier can reduce a correlation by 50% or completely reverse its direction, turning a positive relationship into a negative one on paper.
- Roughly normal distribution. Both variables should be approximately normally distributed. Strictly speaking, normality matters most when you run significance tests on r, but heavily skewed data can still distort the coefficient itself.
Why Outliers Are Especially Dangerous
The sensitivity of Pearson’s r to outliers is worth emphasizing because it catches people off guard. A famous illustration called Anscombe’s quartet shows four completely different datasets that all produce essentially the same Pearson correlation of r ≈ 0.816. One dataset has a clear linear pattern. Another follows a smooth curve that no straight line fits well. A third is linear except for a single outlier that drags down what would otherwise be a near-perfect correlation, and in the fourth, a single extreme point manufactures the correlation almost single-handedly. The lesson is that you should always look at your data visually before trusting a correlation number.
Research on this problem has shown that a single outlier can shift the estimated correlation by well over a full unit on the -1 to +1 scale, enough to take a strong positive correlation and make it appear moderately negative. In practice, this means that running a Pearson correlation without first checking a scatterplot is risky.
Correlation Does Not Mean Causation
This is the most important thing to understand about any correlation, Pearson’s included. A strong r value tells you two variables move together, but it says nothing about why. Ice cream sales and drowning deaths are positively correlated, not because ice cream causes drowning, but because both increase during hot summer months. The hidden third variable (temperature) drives both.
Establishing causation requires a different kind of study design, typically a controlled experiment where researchers manipulate one variable and observe the effect on another. Correlation is a starting point for investigation, not a finish line.
When to Use Spearman’s Correlation Instead
If your data doesn’t meet Pearson’s assumptions, Spearman’s rank correlation is often the better choice. While Pearson measures strictly linear relationships and requires roughly normally distributed data, Spearman measures any monotonic relationship, meaning any pattern where one variable consistently increases (or decreases) as the other increases, even if the pattern isn’t a straight line.
Spearman works by converting your actual data values into ranks and then calculating the correlation on those ranks. This makes it naturally resistant to outliers and usable with ordinal data (like survey responses on a 1-to-5 scale). A useful diagnostic: if your Pearson correlation is weak but your Spearman correlation is strong, the relationship between your variables is likely real but nonlinear. If the data isn’t normally distributed and shows a consistent upward or downward trend, Spearman is the more appropriate tool.
Real-World Examples
Pearson’s correlation shows up constantly in published research. In medicine, researchers use it to quantify links between variables like blood pressure and age, or between medication dosage and symptom reduction. In education, it helps measure how strongly study hours predict exam performance. In economics, it can capture the relationship between unemployment rates and consumer spending.
Some everyday relationships are intuitive. Height and weight have a well-known positive correlation. Time spent watching TV and exam scores tend to have a negative correlation. But r also helps identify surprising non-relationships. Coffee consumption and intelligence, for example, show essentially zero correlation, meaning your daily caffeine habit isn’t making you smarter or slower.
The real power of Pearson’s r is its simplicity. A single number summarizes how two variables relate, making it easy to communicate findings and compare the strength of different relationships across studies. Just remember that the number only tells part of the story, and a scatterplot should always come first.

