The right correlation test depends on two things: what type of data you have and whether your data meets certain assumptions. For most people working with two continuous, normally distributed variables, Pearson’s r is the standard choice. But if your data is ranked, skewed, or categorical, you’ll need a different test. Here’s how to decide.
Start With a Scatterplot
Before running any test, plot your two variables on a scatterplot. This single step prevents the most common mistake in correlation analysis: applying a linear test to a non-linear relationship. You’re looking for the general shape the points make. If they trend upward or downward in a roughly straight line, a linear correlation test is appropriate. If they curve, fan out, or form a U-shape, a standard correlation coefficient will either underestimate the relationship or miss it entirely.
A scatterplot also reveals outliers. Even a single extreme data point can substantially distort Pearson’s correlation, making a weak relationship look strong or a strong one look weak. Spotting these points visually before choosing your test saves you from misleading results.
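The distortion a single outlier causes is easy to demonstrate numerically as well. Here's a minimal sketch using NumPy and SciPy with simulated data (the variable names and values are illustrative, not from any real dataset):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
x = rng.normal(50, 10, 30)
y = rng.normal(50, 10, 30)   # independent of x, so the true correlation is ~0

r_clean, _ = pearsonr(x, y)

# Add one extreme point far from the cloud of data
x_out = np.append(x, 200.0)
y_out = np.append(y, 200.0)
r_out, _ = pearsonr(x_out, y_out)

print(f"r without the outlier: {r_clean:.2f}")   # typically near zero
print(f"r with one outlier:    {r_out:.2f}")     # pulled strongly toward +1
```

Thirty uncorrelated points plus one extreme point are enough to manufacture a "strong" correlation out of nothing, which is exactly why the visual check comes first.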
Pearson’s r: The Default for Continuous Data
Pearson’s product-moment correlation coefficient (r) measures the strength and direction of a straight-line relationship between two continuous variables. It ranges from -1 (perfect negative relationship) to +1 (perfect positive relationship), with 0 meaning no linear association at all.
To use Pearson’s r appropriately, your data needs to check three boxes:
- Both variables are continuous and measured on an interval or ratio scale (think: weight, temperature, blood pressure, income).
- The relationship is linear. Pearson’s r only captures straight-line associations. A strong curved relationship can return a low r value, which would be misleading.
- Both variables are approximately normally distributed. Extreme skew or heavy tails in either variable inflate or dampen the correlation. Pearson’s r is also particularly sensitive to outliers.
You can check normality with a Shapiro-Wilk test. If the p-value is above 0.05, your data is consistent with a normal distribution; if it falls below 0.05, the data departs significantly from normality and you should consider a non-parametric alternative. One caveat: with large samples, Shapiro-Wilk flags even trivial departures from normality, so pair it with a visual check. Histograms and Q-Q plots work well for this.
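That workflow, normality check followed by the correlation itself, looks like this in SciPy (the data here is simulated for illustration):

```python
import numpy as np
from scipy.stats import pearsonr, shapiro

rng = np.random.default_rng(0)
x = rng.normal(120, 15, 100)           # e.g. systolic blood pressure
y = 0.5 * x + rng.normal(0, 5, 100)    # a linearly related variable

# Shapiro-Wilk normality check on each variable
shapiro_x = shapiro(x)
shapiro_y = shapiro(y)

r, p = pearsonr(x, y)
print(f"Shapiro p-values: {shapiro_x.pvalue:.3f}, {shapiro_y.pvalue:.3f}")
print(f"Pearson r = {r:.2f}, p = {p:.3g}")
```

If either Shapiro p-value falls below 0.05, the sections below cover the rank-based alternatives.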
Spearman’s Rho: For Ranked or Skewed Data
Spearman’s rank correlation (ρ or rs) is the go-to alternative when Pearson’s assumptions don’t hold. Instead of working with raw values, it converts your data to ranks and then calculates the correlation on those ranks. This makes it resistant to outliers and skewed distributions.
Use Spearman’s rho when:
- Your data is ordinal. Survey responses on a Likert scale (e.g., 1 = strongly disagree to 5 = strongly agree), pain severity rankings, or education levels are all ordinal. Pearson’s r treats the gaps between values as equal, which doesn’t make sense for ranked categories.
- Your continuous data isn’t normally distributed. If your Shapiro-Wilk test is significant or your scatterplot shows heavy skew, Spearman handles it without requiring data transformation.
- The relationship is monotonic but not linear. A monotonic relationship means that as one variable increases, the other consistently increases (or consistently decreases), but not necessarily at a constant rate. Spearman’s rho detects these curved-but-consistent trends that Pearson would undervalue.
One important nuance: Pearson’s r can sometimes still detect monotonic trends in non-normal data, so it’s not true that it only works with perfectly normal distributions. But Spearman is the more appropriate and defensible choice when assumptions are violated, and it will outperform Pearson specifically when the underlying pattern is non-linear but monotonic.
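The monotonic-but-not-linear case is easy to see with a toy example, assuming SciPy is available:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

x = np.arange(1.0, 31.0)
y = x ** 3          # perfectly monotonic, but strongly non-linear

r, _ = pearsonr(x, y)
rho, _ = spearmanr(x, y)

print(f"Pearson r    = {r:.3f}")    # below 1: understates the relationship
print(f"Spearman rho = {rho:.3f}")  # exactly 1: the ranks agree perfectly
```

The relationship between x and y here is deterministic, yet Pearson's r falls short of 1 because the trend curves. Spearman's rho, working on ranks, reports it as perfect.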
Kendall’s Tau: Best for Small Samples
Kendall’s tau-b is another rank-based correlation that works under the same conditions as Spearman’s rho: ordinal or non-normal data with a monotonic relationship. The practical difference is sample size. Kendall’s tau requires fewer observations to produce a reliable estimate. For a moderate effect size, Kendall’s tau-b needs roughly 65 observations to achieve a precise confidence interval, while Spearman’s rho needs about 149 under the same conditions.
Kendall’s tau also handles tied ranks (cases where multiple observations share the same value) more gracefully than Spearman’s rho. If you’re working with a small dataset, say under 30 observations, or your ordinal scale has few categories and many ties, Kendall’s tau-b is the stronger choice. The trade-off is that the tau value itself tends to be lower than the equivalent Spearman value for the same data, so the two aren’t directly comparable.
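As a sketch of the ties scenario, here are both rank statistics on a small, made-up pair of 5-point Likert items (SciPy's `kendalltau` computes tau-b by default):

```python
from scipy.stats import kendalltau, spearmanr

# Small ordinal sample with many ties: two 5-point Likert items (made-up data)
satisfaction = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 2, 3, 4, 5, 1]
loyalty      = [1, 1, 2, 2, 3, 4, 4, 5, 5, 4, 2, 3, 5, 5, 2]

tau, p_tau = kendalltau(satisfaction, loyalty)   # tau-b: corrects for ties
rho, p_rho = spearmanr(satisfaction, loyalty)

print(f"Kendall tau-b = {tau:.2f} (p = {p_tau:.4f})")
print(f"Spearman rho  = {rho:.2f}")   # usually larger for the same data
```

Note how rho comes out larger than tau on the same data, which is the non-comparability mentioned above: report whichever one you chose, not a mix of both.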
Point-Biserial: One Binary, One Continuous Variable
If one of your variables has only two categories (male/female, treatment/control, pass/fail) and the other is continuous, the point-biserial correlation is the correct test. It’s mathematically equivalent to Pearson’s r applied to this specific situation, but it’s designed for the case where one variable is genuinely dichotomous rather than measured on a scale.
A common example: testing whether income differs by gender. The binary variable is coded as 0 and 1, the continuous variable stays on its original scale, and the result is interpreted the same way as Pearson’s r, with values closer to -1 or +1 indicating a stronger association.
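A minimal sketch with simulated group data (the group labels, means, and spreads are assumptions for illustration):

```python
import numpy as np
from scipy.stats import pointbiserialr

rng = np.random.default_rng(7)
group = np.repeat([0, 1], 50)               # e.g. control = 0, treatment = 1
income = np.where(group == 0,
                  rng.normal(52_000, 8_000, 100),
                  rng.normal(60_000, 8_000, 100))

r_pb, p = pointbiserialr(group, income)
print(f"point-biserial r = {r_pb:.2f}, p = {p:.4f}")
```

Because the point-biserial is Pearson's r under the hood, `pearsonr(group, income)` would return the same coefficient; `pointbiserialr` simply makes the intent explicit.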
Phi and Cramér’s V: For Categorical Variables
When both variables are categorical (nominal), none of the tests above apply. You’re no longer measuring correlation in the traditional sense but rather the strength of association between categories. The typical approach is to run a chi-square test first to determine whether a significant association exists, then measure its strength.
For a simple 2×2 table (two variables, each with two categories), the phi coefficient (φ) quantifies the strength of association. For larger tables, where at least one variable has three or more categories, Cramér’s V is the appropriate measure. Both range from 0 (no association) to 1 (perfect association), though neither captures direction the way Pearson or Spearman can, since nominal categories have no inherent order.
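The chi-square-then-strength sequence is short in code. A sketch for a hypothetical 2×3 table, with Cramér's V computed from its standard formula V = sqrt(chi2 / (n * (k - 1))):

```python
import numpy as np
from scipy.stats import chi2_contingency

# 2x3 contingency table: treatment arm (rows) x outcome category (columns)
table = np.array([[30, 15, 5],
                  [10, 25, 15]])

chi2, p, dof, expected = chi2_contingency(table)

n = table.sum()
k = min(table.shape)                      # the smaller table dimension
cramers_v = np.sqrt(chi2 / (n * (k - 1)))

print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
print(f"Cramér's V = {cramers_v:.2f}")
# For a 2x2 table, this same formula reduces to the phi coefficient.
```

Recent SciPy versions also offer `scipy.stats.contingency.association(table, method="cramer")`, which computes V directly.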
How to Interpret the Strength of a Correlation
Getting a correlation coefficient is only half the job. You also need to know whether the value is meaningfully large or trivially small. The most widely used benchmarks come from Jacob Cohen’s guidelines:
- r = .10: Small effect. A real but weak association that might not be noticeable in practice.
- r = .30: Medium effect. A moderate relationship with practical relevance in many fields.
- r = .50: Large effect. A strong association that’s usually obvious in the data.
These thresholds apply to Pearson’s r and can serve as rough guides for Spearman’s rho as well. Keep in mind that context matters. In fields like physics or engineering, an r of .50 might be considered weak, while in psychology or medicine, the same value could represent one of the strongest known relationships. Always interpret your coefficient within the norms of your discipline.
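If you want those benchmarks in code, a tiny hypothetical helper (not part of any library) covers them; note that only the absolute value matters for strength, since the sign just indicates direction:

```python
def cohen_label(r: float) -> str:
    """Rough effect-size label for a correlation, per Cohen's benchmarks.
    (Hypothetical helper, not part of any library.)"""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"

print(cohen_label(0.52))   # large
print(cohen_label(-0.35))  # medium (the sign only indicates direction)
```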
What to Do About Outliers
If your scatterplot reveals outliers that you can’t justify removing (they’re real data, not entry errors), you have several options beyond simply switching to Spearman. A Winsorized correlation replaces the most extreme values with less extreme ones at a set percentile, reducing their pull on the result without deleting them. A skipped correlation uses a robust method to identify the central cluster of data and ignores points that fall far outside it.
Simulation studies comparing these approaches have found that all three (Spearman, Winsorized, and skipped correlations) substantially outperform Pearson’s r when outliers are present. Spearman’s rho is the simplest to implement and the most widely understood, making it the practical first choice for most researchers dealing with messy data.
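A Winsorized correlation is straightforward with SciPy's `winsorize`. A sketch on simulated data with one planted outlier (the 10% limits are a common choice, not a universal rule):

```python
import numpy as np
from scipy.stats import pearsonr
from scipy.stats.mstats import winsorize

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 50)
y = 0.6 * x + rng.normal(0, 1, 50)   # true correlation around .5
x[0], y[0] = 10.0, -10.0             # one extreme, influential point

r_raw, _ = pearsonr(x, y)

# Clamp the lowest and highest 10% of each variable to the cutoff values
xw = np.asarray(winsorize(x, limits=[0.1, 0.1]))
yw = np.asarray(winsorize(y, limits=[0.1, 0.1]))
r_win, _ = pearsonr(xw, yw)

print(f"raw Pearson r:        {r_raw:.2f}")   # dragged down by the outlier
print(f"Winsorized Pearson r: {r_win:.2f}")   # close to the true relationship
```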
Quick Decision Guide
- Two continuous, normally distributed variables with a linear relationship: Pearson’s r
- Two continuous variables that are skewed or have outliers: Spearman’s rho
- Two ordinal variables (ranked categories): Spearman’s rho or Kendall’s tau-b
- Small sample size (under 30) with ordinal or non-normal data: Kendall’s tau-b
- One binary variable, one continuous variable: Point-biserial correlation
- Two nominal (categorical) variables: Chi-square test, then phi (2×2) or Cramér’s V (larger tables)
Reporting Your Results
If you’re writing up your findings, APA style has specific formatting conventions for correlation statistics. Report the correlation coefficient to two decimal places without a leading zero, since correlations can never exceed 1 (e.g., r = .45, not r = 0.45). Report exact p-values to two or three decimal places (p = .003), except when the value is below .001, in which case you write p < .001. Italicize statistical symbols like r, p, and N.
A typical write-up looks like: “There was a strong positive correlation between hours of sleep and test performance, r = .52, p < .001, N = 120.” For Spearman’s, replace r with rs. Always report your sample size, since the same correlation coefficient can be statistically significant with 200 participants and meaningless with 10.
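The formatting rules above are mechanical enough to automate. A hypothetical helper (not a library function) that handles the leading-zero and p < .001 conventions:

```python
def apa_corr(r: float, p: float, n: int, symbol: str = "r") -> str:
    """Format a correlation result in APA style (hypothetical helper)."""
    r_str = f"{r:.2f}".replace("0.", ".", 1)          # drop the leading zero
    if p < 0.001:
        p_str = "p < .001"
    else:
        p_str = f"p = {p:.3f}".replace("0.", ".", 1)  # exact p, no leading zero
    return f"{symbol} = {r_str}, {p_str}, N = {n}"

print(apa_corr(0.52, 0.0004, 120))   # r = .52, p < .001, N = 120
print(apa_corr(-0.45, 0.003, 50))    # r = -.45, p = .003, N = 50
```

Italics for r, p, and N still need to be applied in your document, since plain strings can't carry formatting.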

