What Is a Scree Plot and How to Interpret It

A scree plot is a simple line graph that helps you decide how many components or factors to keep when reducing a large dataset down to its most important parts. It plots eigenvalues (a measure of how much variance each component explains) on the vertical axis against the component number on the horizontal axis. The goal is to spot where the line flattens out, telling you that additional components aren’t adding much useful information.

What a Scree Plot Shows

Imagine you have a dataset with dozens or even hundreds of variables. Techniques like principal component analysis (PCA) and factor analysis compress those variables into a smaller set of components, ranked by how much of the original data’s variation each one captures. The first component always explains the most variability, the second explains the next most, and so on. A scree plot puts all of these in order so you can see the pattern at a glance.

The vertical axis represents eigenvalues. An eigenvalue is essentially a score for each component: the higher it is, the more variance that component accounts for. When the analysis is run on standardized variables (that is, on the correlation matrix), an eigenvalue of 1 means the component explains as much variance as a single original variable would, and the eigenvalues add up to the number of variables. The horizontal axis is just the component number (1, 2, 3, and so on). When you connect the dots, you typically see a steep drop at the start that gradually levels off into a flat tail.
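To make the eigenvalue scale concrete, here is a minimal Python sketch, assuming NumPy is available. The synthetic data and the function name scree_eigenvalues are illustrative assumptions, not part of any particular package. It takes eigenvalues from the correlation matrix, which is the setting where an eigenvalue of 1 corresponds to one variable’s worth of variance.

```python
import numpy as np

def scree_eigenvalues(X):
    """Return the eigenvalues of the correlation matrix of X, largest first."""
    corr = np.corrcoef(X, rowvar=False)   # columns are variables
    eigvals = np.linalg.eigvalsh(corr)    # symmetric input, returned in ascending order
    return eigvals[::-1]                  # reverse to descending for a scree plot

# Synthetic data (an assumption for illustration): 200 rows and 5 variables,
# where the first three variables share one common signal.
rng = np.random.default_rng(0)
signal = rng.standard_normal((200, 1))
X = np.hstack([
    signal + 0.5 * rng.standard_normal((200, 3)),  # three correlated variables
    rng.standard_normal((200, 2)),                 # two unrelated variables
])

eigenvalues = scree_eigenvalues(X)
print(eigenvalues)        # values to place on the vertical axis
print(eigenvalues.sum())  # equals the number of variables, up to rounding
```

Because the correlation matrix has ones on its diagonal, the eigenvalues always sum to the number of variables, which is why 1 works as a "one variable" benchmark here.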

The “Elbow” Rule

The key to reading a scree plot is finding the “elbow,” the point where the steep decline shifts to a relatively flat line. Components to the left of that elbow capture meaningful patterns in your data. Components to the right are mostly noise. You retain the ones before the bend and discard the rest.
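The elbow is normally judged by eye, but a rough numerical stand-in can help when you want a starting point. The sketch below, assuming NumPy, keeps the components that sit before the single largest drop between consecutive eigenvalues. This "largest drop" heuristic and the eigenvalues used are illustrative assumptions, not a standard procedure, and it can easily disagree with a careful visual reading.

```python
import numpy as np

def elbow_by_largest_drop(eigenvalues):
    """Crude elbow heuristic: count the components before the biggest single drop."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    drops = -np.diff(eigenvalues)       # drops[i] is the fall from component i+1 to i+2
    return int(np.argmax(drops)) + 1    # number of components left of that drop

# Hypothetical eigenvalues for an 8-variable dataset, sorted largest first.
example_eigenvalues = [3.0, 2.2, 1.6, 0.4, 0.3, 0.2, 0.2, 0.1]
n_keep = elbow_by_largest_drop(example_eigenvalues)
print(n_keep)  # 3: the biggest drop is between components three and four
```

In many real datasets the largest drop comes right after the first component, so treat the result as a prompt to look at the plot rather than an answer.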

Psychologist Raymond Cattell introduced this technique in 1966. He named it after “scree,” a geological term for the pile of loose rocks that accumulates at the base of a cliff. In the plot, the steep part of the curve is the cliff face, and the flat tail is the rubble. You want the cliff, not the rubble.

For example, if a scree plot shows large eigenvalues for the first three components and then drops sharply before leveling off at component four, you would retain three components. Those three capture the most significant variability in the data. A common complementary guideline is to keep enough components to account for at least 80% of the total variance, although that cutoff is a convention rather than a fixed rule and varies by field.
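The variance guideline is simple arithmetic on the eigenvalues. Here is a minimal sketch, assuming NumPy, with hypothetical eigenvalues and an illustrative function name (components_for_variance). It finds the smallest number of components whose cumulative share of variance reaches the threshold.

```python
import numpy as np

def components_for_variance(eigenvalues, threshold=0.80):
    """Smallest number of leading components whose cumulative share reaches threshold."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    # argmax on a boolean array returns the index of the first True
    n_components = int(np.argmax(cumulative >= threshold)) + 1
    return n_components, cumulative

# Hypothetical eigenvalues for an 8-variable dataset, sorted largest first.
example_eigenvalues = [3.0, 2.2, 1.6, 0.4, 0.3, 0.2, 0.2, 0.1]
n_components, cumulative = components_for_variance(example_eigenvalues)
print(n_components)  # 3: the first three components cover about 85% of the variance
```

With these example values the threshold and the elbow agree on three components. When they disagree, that is a signal to look more closely rather than to pick one rule mechanically.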

Why It Matters for Data Analysis

Real datasets are often unwieldy. A study might collect 1,000 measurements on 50 samples, but much of that information overlaps or is redundant. PCA can compress those 1,000 variables into a handful of components that still retain most of the meaningful variation. The scree plot is what tells you where “a handful” actually falls. In one example from chemometrics research, a dataset with 1,000 original variables was reduced to just 20 components while still preserving the majority of the variability.

This kind of reduction makes downstream analysis faster, easier to visualize, and less prone to overfitting (where a model captures noise instead of real patterns). But the decision of how many components to keep is critical. Too many and you’re dragging noise along. Too few and you lose real signal.

How to Make One

Most statistical software generates scree plots automatically as part of PCA or factor analysis. In R, the psych package has a dedicated scree() function that plots eigenvalues for both principal components and factors. In Python, you can run PCA through scikit-learn and then plot each component’s eigenvalue (stored in the fitted model’s explained_variance_ attribute) using matplotlib. Even spreadsheet tools like Excel can produce a scree plot if you calculate the eigenvalues first and create a basic line chart.
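The Python route can be sketched as follows, assuming scikit-learn, matplotlib, and NumPy are installed. The synthetic dataset and the output filename are assumptions made for illustration. The variables are standardized first so that the eigenvalues are on roughly the "one variable equals 1" scale.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): 150 rows and 8 variables
# driven by 2 underlying factors plus noise.
rng = np.random.default_rng(42)
latent = rng.standard_normal((150, 2))
loadings = rng.standard_normal((2, 8))
X = latent @ loadings + 0.5 * rng.standard_normal((150, 8))

X_std = StandardScaler().fit_transform(X)  # put every variable on the same scale
pca = PCA().fit(X_std)                     # keep all components
eigenvalues = pca.explained_variance_      # one eigenvalue per component, largest first

component_numbers = np.arange(1, len(eigenvalues) + 1)
fig, ax = plt.subplots()
ax.plot(component_numbers, eigenvalues, marker="o")
ax.set_xlabel("Component number")
ax.set_ylabel("Eigenvalue")
ax.set_title("Scree plot")
fig.savefig("scree_plot.png")              # hypothetical output filename
```

In an interactive session you would typically drop the matplotlib.use("Agg") line and call plt.show() instead of saving to a file.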

Some implementations also overlay a line showing the cumulative proportion of variance explained, so you can see both the individual contribution of each component and the running total. This makes it easier to apply the 80% variance threshold alongside the visual elbow test.
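A combined plot of this kind can be sketched with matplotlib’s twin-axis feature. The eigenvalues, colors, and filename below are assumptions chosen for illustration. The second axis carries the cumulative proportion, and a horizontal line marks the 80% threshold.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical eigenvalues for an 8-variable dataset, sorted largest first.
eigenvalues = np.array([3.0, 2.2, 1.6, 0.4, 0.3, 0.2, 0.2, 0.1])
cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
component_numbers = np.arange(1, len(eigenvalues) + 1)

fig, ax_eig = plt.subplots()
ax_eig.plot(component_numbers, eigenvalues, marker="o")
ax_eig.set_xlabel("Component number")
ax_eig.set_ylabel("Eigenvalue")

ax_cum = ax_eig.twinx()                               # second y-axis sharing the x-axis
ax_cum.plot(component_numbers, cumulative, marker="s",
            linestyle="--", color="tab:orange")
ax_cum.axhline(0.80, color="gray", linestyle=":")     # the 80% guideline
ax_cum.set_ylabel("Cumulative proportion of variance")
ax_cum.set_ylim(0, 1.05)

fig.savefig("scree_cumulative.png")                   # hypothetical output filename
```

Reading the two curves together shows where the elbow sits and where the running total crosses the threshold line.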

Known Limitations

The biggest criticism of scree plots is subjectivity. The elbow isn’t always obvious. Sometimes the curve declines gradually with no sharp bend. Other times there are two or more plausible elbows, and different analysts looking at the same plot will choose different cutoff points. Research on inter-rater reliability has consistently found that people disagree on where the elbow falls, particularly when no strong factors dominate the data.

This ambiguity means scree plots work best when the data has a few clearly dominant components and a lot of weak ones, creating a dramatic cliff-to-rubble shape. When the eigenvalues decline more evenly, the plot becomes harder to interpret. For this reason, statisticians generally recommend using the scree plot as one piece of evidence rather than the sole decision-maker. Its primary utility, as several methodologists have noted, is narrowing the choice down to two or three reasonable options to investigate further.

Alternatives and Complements

Because of the subjectivity problem, other methods are often used alongside scree plots. Parallel analysis generates many random datasets with the same number of rows and variables as your real data and compares their eigenvalues to yours, position by position. A component is worth keeping if its eigenvalue is higher than what random noise produces at the same position, usually measured as the mean or the 95th percentile of the random eigenvalues. This approach removes much of the guesswork and is widely considered more reliable than the scree test alone.
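Here is a minimal sketch of parallel analysis, assuming NumPy. It is a simplified version of the idea rather than a reference implementation. The function names, the choice of normally distributed random data, the 95th-percentile cutoff, and the synthetic example are all assumptions.

```python
import numpy as np

def correlation_eigenvalues(X):
    """Eigenvalues of the correlation matrix of X, largest first."""
    corr = np.corrcoef(X, rowvar=False)
    return np.linalg.eigvalsh(corr)[::-1]

def parallel_analysis(X, n_sims=200, percentile=95, seed=0):
    """Count leading components whose eigenvalues beat those of random data."""
    rng = np.random.default_rng(seed)
    n_rows, n_vars = X.shape
    observed = correlation_eigenvalues(X)
    simulated = np.empty((n_sims, n_vars))
    for i in range(n_sims):
        noise = rng.standard_normal((n_rows, n_vars))   # same shape, no real structure
        simulated[i] = correlation_eigenvalues(noise)
    threshold = np.percentile(simulated, percentile, axis=0)
    exceeds = observed > threshold
    # Keep components up to, but not including, the first one that fails the test.
    n_keep = n_vars if exceeds.all() else int(np.argmin(exceeds))
    return n_keep, observed, threshold

# Synthetic data (an assumption for illustration): 6 variables built from 2 factors.
rng = np.random.default_rng(1)
factors = rng.standard_normal((300, 2))
loadings = np.array([[0.9, 0.9, 0.9, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 0.9, 0.9, 0.9]])
X = factors @ loadings + 0.5 * rng.standard_normal((300, 6))

n_keep, observed, threshold = parallel_analysis(X)
print(n_keep)  # 2 for this strongly two-factor example
```

Plotting the threshold values as a second line on the scree plot is a common way to present the result: components are retained where the observed line sits above the random one.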

The Kaiser criterion is another common rule: keep any component with an eigenvalue above 1. The logic is that a component should explain at least as much variance as a single original variable to be worth retaining, which assumes the eigenvalues come from standardized variables (a correlation matrix). This rule is simple, but it tends to overestimate the number of components, especially when there are many variables, and it can also underestimate it in some datasets. Using it together with a scree plot and parallel analysis gives you a more complete picture than any single method on its own.
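The Kaiser criterion reduces to a single comparison. The short sketch below, assuming NumPy and hypothetical eigenvalues from a correlation matrix of 8 variables, shows how borderline values can inflate the count.

```python
import numpy as np

def kaiser_count(eigenvalues):
    """Number of components with an eigenvalue strictly above 1."""
    return int(np.sum(np.asarray(eigenvalues, dtype=float) > 1.0))

# Hypothetical eigenvalues (they sum to 8, as for a correlation matrix of 8 variables).
borderline_eigenvalues = [2.9, 1.3, 1.05, 1.02, 0.6, 0.5, 0.38, 0.25]
n_kaiser = kaiser_count(borderline_eigenvalues)
print(n_kaiser)  # 4, even though two of those components barely clear the cutoff
```

In a case like this a scree plot would show the third and fourth components sitting in the flat tail, which is one reason the rule is best read alongside the other methods.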