When to Use Pearson Correlation: Key Conditions

Pearson correlation is the right tool when you want to measure the strength of a straight-line (linear) relationship between two continuous, numeric variables. If your data meets a few key conditions, like being measured on a numeric scale and roughly following a bell curve, Pearson’s r gives you a single number between -1 and +1 that captures both the direction and strength of that relationship. Outside those conditions, a different method is usually a better fit.

What Pearson Correlation Actually Measures

Pearson’s r quantifies how closely two variables move together in a linear pattern. An r of +1 means a perfect positive relationship: every data point falls exactly on an upward-sloping straight line. An r of -1 means a perfect inverse relationship: every point falls exactly on a downward-sloping line. An r near 0 means no linear pattern exists between the two variables.

The key word is “linear.” If the real relationship between your variables is curved, Pearson’s r will underestimate or completely miss it. Two variables could be strongly related in a U-shape, for example, and Pearson’s r might return something close to zero. So before running the test, you need to plot your data. If the scatterplot looks like a straight line (or a rough cloud tilting in one direction), Pearson is appropriate. If the pattern curves, it isn’t.
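This failure mode is easy to demonstrate. The sketch below (assuming NumPy and SciPy are installed) computes Pearson's r for a perfectly linear relationship and for a perfect U-shape; the U-shape is a strong, deterministic relationship, yet r comes out essentially zero:

```python
import numpy as np
from scipy import stats

x = np.linspace(-3, 3, 101)           # symmetric around zero
y_line = 2 * x + 1                    # perfectly linear relationship
y_curve = x ** 2                      # perfect U-shape

r_line, _ = stats.pearsonr(x, y_line)
r_curve, _ = stats.pearsonr(x, y_curve)

print(f"{r_line:.2f}")     # 1.00
print(f"{r_curve:.2f}")    # ~0.00: a real relationship, invisible to Pearson
```

The U-shape cancels itself out: the downward half and the upward half contribute equal and opposite amounts to the covariance. A scatterplot would reveal the pattern instantly, which is exactly why plotting first matters.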

Four Conditions Your Data Needs to Meet

Pearson correlation works reliably when four assumptions hold:

  • Continuous, numeric data. Both variables need to be measured on an interval or ratio scale, meaning the numbers represent actual quantities with equal spacing. Examples include height in centimeters, temperature in degrees, test scores, income in dollars, or blood pressure readings. If one of your variables is a ranking (like a satisfaction rating from 1 to 5) or a category (like “smoker” vs. “non-smoker”), Pearson is not the right choice.
  • Linear relationship. The association between your two variables should follow a roughly straight-line pattern when plotted. A quick scatterplot will tell you this in seconds.
  • No major outliers. Pearson’s r is highly sensitive to extreme values. A single outlier among just 10 data points can dramatically inflate or deflate the correlation, producing a number that misrepresents the actual pattern. Research in robust statistics has shown that one extreme observation can result in a “highly inaccurate summary of the data.” Always check your scatterplot for points that sit far away from the rest.
  • Approximate normality. Both variables should be roughly bell-shaped in their distributions. Strictly speaking, normality matters most for calculating the p-value and confidence intervals rather than the correlation number itself. But if your data is heavily skewed, significance testing becomes unreliable.
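The outlier sensitivity mentioned above is worth seeing concretely. This sketch (synthetic data, assuming NumPy and SciPy) builds a tight positive trend from 10 points, then adds one wild observation:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = np.arange(10, dtype=float)
y = x + rng.normal(0, 0.5, 10)        # tight positive linear trend

r_clean, _ = stats.pearsonr(x, y)

# Append a single extreme point to the 10 observations:
x_out = np.append(x, 9.0)
y_out = np.append(y, -40.0)
r_out, _ = stats.pearsonr(x_out, y_out)

print(f"without outlier: r = {r_clean:.2f}")  # close to +1
print(f"with outlier:    r = {r_out:.2f}")    # collapses, here even flipping sign
```

One point out of eleven turns a near-perfect positive correlation into a negative one, which is precisely why the scatterplot check is non-negotiable.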

A related condition is homoscedasticity: the spread of data points around the trend line should be roughly even across the full range. If the scatter fans out wider at one end (like a megaphone shape), the correlation coefficient can be misleading.
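One rough numeric check for uneven spread, sketched below with synthetic data: fit a line, split the residuals at the median of x, and compare their spread with Levene's test. The helper name `residual_spread_pvalue` is mine, not a standard API, and a scatterplot remains the primary diagnostic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.uniform(1, 10, 300)

def residual_spread_pvalue(x, y):
    # Fit a line, then compare residual spread in the lower vs. upper
    # half of x using Levene's test (a small p suggests uneven spread).
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    mask = x <= np.median(x)
    return stats.levene(resid[mask], resid[~mask]).pvalue

y_even = 2 * x + rng.normal(0, 1, 300)        # spread is even by construction
y_fan = 2 * x + rng.normal(0, 1, 300) * x     # "megaphone": spread grows with x

print(residual_spread_pvalue(x, y_even))
print(residual_spread_pvalue(x, y_fan))       # tiny p: clear heteroscedasticity
```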

Real-World Examples Where Pearson Fits

Pearson correlation shows up across nearly every field that works with numeric data. In healthcare, a researcher might examine whether vitamin D levels correlate with parathyroid hormone levels, both measured on continuous scales. One published analysis found a strong inverse correlation of r = -0.76 between the two, meaning that as vitamin D went up, parathyroid hormone went down in a consistent linear pattern.

In obstetrics, a hospital might look at whether family income predicts birth weight. In finance, analysts use Pearson correlations to measure how closely two stock prices move together over time. In psychology, researchers routinely use it to test whether scores on one personality measure relate to scores on another. In education, you might correlate hours of study time with exam scores. All of these involve two continuous measurements and a plausible linear relationship.

How to Interpret the Strength of r

Cohen’s widely used guidelines from 1988 provide a simple framework. An r of 0.10 is considered a small effect, 0.30 is medium, and 0.50 is large. These same thresholds apply to negative values: r = -0.50 is just as strong as r = +0.50, only in the opposite direction.
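Cohen's thresholds translate directly into a small labeling helper. The function below is an illustrative sketch, not a standard library routine; the "negligible" label for values below 0.10 is my own addition:

```python
def cohen_label(r):
    # Cohen (1988) benchmarks; magnitude only, since sign carries direction
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"   # below Cohen's "small" cutoff (my label, not Cohen's)

print(cohen_label(-0.50))   # large
print(cohen_label(0.25))    # small
```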

These benchmarks are useful starting points, but context matters. In some fields, an r of 0.30 is impressive because the variables are hard to measure precisely. In others, anything below 0.70 is considered weak. The descriptive labels researchers attach to correlation values (words like “strong,” “moderate,” “weak”) vary from one paper to the next, which is why explicitly reporting the actual r value is important rather than relying only on a label.

One common mistake is confusing statistical significance with practical strength. A correlation can be statistically significant (meaning it’s unlikely to have occurred by chance) while still being very small. With a large enough sample, even an r of 0.08 can reach significance. The p-value tells you whether the correlation is real; the r value tells you whether it matters.
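The sample-size effect on significance can be computed directly. Pearson's r converts to a t-statistic via the standard transformation t = r·√((n − 2)/(1 − r²)); the sketch below (helper name `p_from_r` is mine) shows the same tiny r = 0.08 flipping from clearly non-significant to highly significant purely because n grew:

```python
import numpy as np
from scipy import stats

def p_from_r(r, n):
    # Two-sided p-value for Pearson's r via the t transformation
    t = r * np.sqrt((n - 2) / (1 - r**2))
    return 2 * stats.t.sf(abs(t), df=n - 2)

print(p_from_r(0.08, 30))    # large p: nowhere near significant
print(p_from_r(0.08, 5000))  # tiny p: "significant", yet a trivially small effect
```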

How Many Data Points You Need

Small samples make Pearson correlation unreliable. With only 10 or 15 observations, a single unusual data point can swing r dramatically in either direction, and the test won’t have enough statistical power to detect a real relationship even if one exists.

Power analysis gives a more precise answer. To detect a moderate effect size (r = 0.30) with 80% power, the standard recommendation is a minimum of about 85 paired observations. For detecting smaller effects, you need even more. A practical rule of thumb from recent research suggests that 149 paired observations is generally adequate for both parametric and non-parametric correlation analyses, providing enough power to detect at least a moderate relationship with acceptable precision.
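The 85-observation figure can be reproduced with the standard Fisher z approximation for correlation power analysis. This is a sketch under that approximation (the function name is mine), not the only way to run the calculation:

```python
import math
from scipy import stats

def pearson_sample_size(r, alpha=0.05, power=0.80):
    # Approximate required n via Fisher's z transformation
    z_alpha = stats.norm.ppf(1 - alpha / 2)   # two-tailed critical value
    z_beta = stats.norm.ppf(power)
    z_r = math.atanh(r)                       # Fisher z of the target effect size
    return math.ceil(((z_alpha + z_beta) / z_r) ** 2 + 3)

print(pearson_sample_size(0.30))   # 85
print(pearson_sample_size(0.50))   # 30 -- larger effects need fewer observations
```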

If your dataset is small, say under 30 pairs, treat any Pearson result cautiously and pay extra attention to outliers, since each data point carries outsized influence.

When to Use Spearman Instead

Spearman’s rank correlation is the most common alternative, and the decision between the two comes down to a few practical questions.

Choose Spearman when your data is ordinal (ranked categories like “mild, moderate, severe”), when the distribution is clearly not bell-shaped, or when the relationship is monotonic but not linear. A monotonic relationship means that as one variable increases, the other consistently increases (or decreases), but not necessarily at a constant rate. A curve that always goes upward but gets steeper over time is monotonic but nonlinear. Spearman handles this perfectly; Pearson does not.
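An exponential curve makes the contrast concrete. In the sketch below (assuming NumPy and SciPy), the relationship is perfectly monotonic, so Spearman returns 1; Pearson, which insists on a straight line, falls noticeably short:

```python
import numpy as np
from scipy import stats

x = np.linspace(0, 5, 50)
y = np.exp(x)                         # monotonic but strongly nonlinear

r_pearson, _ = stats.pearsonr(x, y)
r_spearman, _ = stats.spearmanr(x, y)

print(f"Pearson:  {r_pearson:.3f}")   # noticeably below 1: the curve is not a line
print(f"Spearman: {r_spearman:.3f}")  # 1.000: the ranks agree perfectly
```

Spearman only sees that every increase in x comes with an increase in y; it is indifferent to how steep the curve gets.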

Spearman is also more resistant to outliers because it works with the ranks of the data rather than the raw values. An extreme outlier gets reduced to the highest or lowest rank, limiting its influence. If your scatterplot shows a few extreme points you can’t justify removing, Spearman is the safer bet.

Interestingly, research has shown that Pearson’s coefficient can sometimes detect monotonic trends even in non-normal data, which is why some analysts run both tests and compare. But as a general rule: if the data is continuous, roughly normal, and the scatterplot looks linear, use Pearson. If any of those conditions fail, start with Spearman.

Correlation Does Not Imply Causation

This warning is so frequently repeated that it risks being tuned out, but it’s genuinely important for interpreting any Pearson result. A strong correlation between two variables does not mean one causes the other. The classic examples make this obvious: countries that consume more chocolate per capita have more Nobel Prize winners, and weeks with higher ice cream sales also have more drowning incidents. In both cases, a third variable (national wealth, warm weather) drives both measurements independently.

Pearson correlation tells you that two things move together. Figuring out whether one actually causes the other requires a different study design entirely, typically an experiment where you manipulate one variable and observe changes in the other while controlling for outside influences.

How to Report Pearson Results

If you’re writing up results for a paper, thesis, or report, include three components: the r value, the sample size or degrees of freedom, and the p-value. A standard format looks like this: r(102) = -0.76, p < .001. The number in parentheses is the degrees of freedom (usually sample size minus 2), the r value shows direction and strength, and the p-value indicates statistical significance.
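If you report many correlations, a small formatter keeps the style consistent. This is an illustrative helper of my own (the `p < .001` cutoff and the leading-zero-stripped p convention follow common APA-style practice):

```python
def report_pearson(r, n, p):
    """Format a Pearson result in the conventional r(df) = value, p style."""
    df = n - 2                                   # degrees of freedom = n - 2
    p_text = "p < .001" if p < 0.001 else f"p = {p:.3f}".replace("0.", ".")
    return f"r({df}) = {r:.2f}, {p_text}"

print(report_pearson(-0.76, 104, 0.0003))   # r(102) = -0.76, p < .001
print(report_pearson(0.12, 50, 0.042))      # r(48) = 0.12, p = .042
```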

Always report the actual r value rather than just stating that a correlation was “significant.” Two correlations can both be statistically significant while having very different practical meanings. An r of 0.12 and an r of 0.76 tell completely different stories, even if both have p-values below 0.05.