How to Interpret PCA Results: Loadings, Scores & More

Interpreting PCA results comes down to understanding a few key outputs: how much variance each component captures, which original variables drive each component, and what the visual plots are telling you about your data’s structure. Once you know how to read these pieces, PCA stops being a black box and becomes a practical tool for spotting patterns and reducing complexity.

What Each Component Actually Represents

Principal components are new variables created from weighted combinations of your original variables. The first component captures the single direction of greatest variation in your data, the second captures the next greatest direction, and so on. Each component is uncorrelated with every other one, meaning they each describe an independent pattern in the data.

This is worth pausing on because it shapes how you interpret everything else. PCA finds linear combinations of your original variables that maximize variance. If the real structure in your data is nonlinear (think a spiral or a curve), PCA can miss it entirely or spread the pattern across multiple components in confusing ways. When the relationships between your variables are roughly linear, though, PCA tends to give you clean, interpretable results.
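Both properties described above, components as weighted combinations and mutual uncorrelatedness, can be checked directly. Here is a minimal sketch using scikit-learn and the built-in Iris dataset (chosen purely for illustration):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize first so all four measurements are on the same scale
X = StandardScaler().fit_transform(load_iris().data)
pca = PCA().fit(X)

# Each row of components_ holds one component's weights over the
# original variables: 4 components x 4 variables here.
print(pca.components_.shape)  # -> (4, 4)

# The scores (observations projected onto the components) are
# uncorrelated: their correlation matrix is the identity.
scores = pca.transform(X)
print(np.allclose(np.corrcoef(scores, rowvar=False), np.eye(4)))
```

Each component is literally a weight vector; the transformed data is just the original observations re-expressed along those uncorrelated directions.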

Explained Variance: How Much Each Component Matters

The explained variance ratio tells you what percentage of total variability in your dataset each component accounts for. If your first component explains 45% of the variance and the second explains 20%, together they capture 65% of the total variance in your original data. The cumulative explained variance is usually what you care about most: it tells you how much of the full picture you retain when you keep only a certain number of components.

There’s no universal cutoff, but retaining enough components to explain 70% to 90% of total variance is a common target depending on your field and goals. In some applications, capturing 80% is plenty. In others, especially when you’re feeding components into a predictive model, you might push for 90% or higher. The right threshold depends on how much information loss you can tolerate.
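In scikit-learn, the per-component shares live in `explained_variance_ratio_`, and a cumulative sum answers the "how many components for X%?" question. A sketch, again on the Iris data for illustration:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)
pca = PCA().fit(X)

ratios = pca.explained_variance_ratio_
cumulative = np.cumsum(ratios)
print(ratios.round(3))      # per-component share of total variance
print(cumulative.round(3))  # running total

# Smallest number of components whose cumulative share reaches 90%
k = int(np.searchsorted(cumulative, 0.90)) + 1
print(k)
```

`searchsorted` finds the first position where the cumulative curve crosses the threshold; swap in 0.70 or 0.95 depending on how much information loss you can tolerate.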

Using the Scree Plot to Choose Components

A scree plot graphs each component’s eigenvalue (a number representing how much variance it captures) against its component number. The first few components typically have large eigenvalues, and the rest taper off into a relatively flat tail. The “elbow” is the point where the curve shifts from steep to flat. Components to the left of the elbow carry meaningful signal. Components to the right mostly capture noise.

Raymond Cattell introduced this approach in 1966, and it remains one of the most widely used methods for deciding how many components to keep. The idea is straightforward: look for the bend in the curve where adding another component stops giving you a meaningful gain. In practice, the elbow isn’t always obvious. When two or three components have similar eigenvalues in the transition zone, you may need to combine the scree plot with other criteria.
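Drawing a scree plot takes only a few lines with matplotlib. This sketch renders off-screen via the Agg backend (an assumption for headless environments; drop that line if you are working interactively):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove if running interactively
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = StandardScaler().fit_transform(load_iris().data)
pca = PCA().fit(X)
eigenvalues = pca.explained_variance_  # variance captured per component

plt.plot(np.arange(1, len(eigenvalues) + 1), eigenvalues, "o-")
plt.xlabel("Component number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.savefig("scree_plot.png")  # look for the elbow where the curve flattens
```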

One such criterion is the eigenvalue-greater-than-one rule, often called the Kaiser criterion. The logic is that a component should explain at least as much variance as a single original variable would on its own (which, after standardization, equals an eigenvalue of 1). It’s simple and built into many software packages as the default. However, simulation studies have consistently shown it tends to overestimate the number of meaningful components, because the first few sample eigenvalues get inflated by random sampling variation. Use it as a rough sanity check rather than a definitive answer.
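The Kaiser criterion is a one-liner once you have the eigenvalues. Note that it only makes sense on standardized data, where each original variable contributes one unit of variance:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize so each variable contributes one unit of variance,
# which is what makes the eigenvalue-greater-than-one cutoff meaningful.
X = StandardScaler().fit_transform(load_iris().data)
pca = PCA().fit(X)

kaiser_k = int(np.sum(pca.explained_variance_ > 1.0))
print(pca.explained_variance_.round(3))
print(kaiser_k)  # number of components the Kaiser rule would keep
```

On this dataset the rule keeps only one component even though the second still explains a meaningful share, one more reason to treat it as a sanity check rather than a decision.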

Reading the Loadings

Loadings are the weights connecting each original variable to each principal component. They tell you which variables contribute most to a given component and in what direction. A loading of 0.8 on a variable means that variable is a strong contributor. A loading near zero means it plays almost no role in that component. Negative loadings indicate the variable moves in the opposite direction of the component’s overall trend.

Penn State’s applied statistics curriculum uses 0.5 as a practical cutoff: correlations (or loadings) above 0.5 in absolute value are considered important contributors to a component. Variables below that threshold are minor players for that particular component. This isn’t a hard statistical rule, but it gives you a defensible way to focus your interpretation on the variables that actually matter.
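One way to get loadings from scikit-learn, which exposes unit-length eigenvectors in `components_` rather than loadings, is to scale each eigenvector by the square root of its eigenvalue. A sketch, applying the 0.5 cutoff:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_iris()
X = StandardScaler().fit_transform(data.data)
pca = PCA().fit(X)

# Loadings: unit eigenvectors scaled by sqrt(eigenvalue). On
# standardized data these approximate variable-component correlations.
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

# Flag the important contributors per component with the 0.5 cutoff
for name, row in zip(data.feature_names, loadings):
    flags = ["*" if abs(v) > 0.5 else " " for v in row[:2]]
    print(f"{name:20s} PC1 {row[0]:+.2f}{flags[0]}  PC2 {row[1]:+.2f}{flags[1]}")
```

On the Iris data, three of the four measurements load heavily on PC1; a reasonable label for that component would be something like overall flower size.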

When you look at the loadings for a component and see that income, education level, and property value all load heavily, you might label that component “socioeconomic status.” This labeling step is subjective but important. PCA gives you the math. You supply the meaning based on which variables cluster together on each component.

Interpreting Score Plots

A score plot shows where each observation (each row in your data) falls in the space defined by two principal components, typically the first two. Each dot represents one data point, repositioned from the original high-dimensional space into this reduced view. Points that are close together in the score plot are similar on the variables that drive those two components; points far apart differ substantially on them.

Clusters of observations suggest natural groupings in your data. Outliers that sit far from the main cloud may represent unusual cases worth investigating. The axes themselves don’t have intuitive units. The x-axis is PC1 (the direction of greatest variance) and the y-axis is PC2 (the next greatest). What those directions mean depends entirely on the loadings, which is why you need to read the score plot alongside the loading information.
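A basic score plot is a scatter of the first two score columns. The sketch below colors points by the known Iris species purely to make the clusters visible; in a genuinely exploratory setting you would not have labels:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove if running interactively
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_iris()
X = StandardScaler().fit_transform(data.data)
pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)  # one (PC1, PC2) pair per observation

# Color by species only to highlight the grouping structure
plt.scatter(scores[:, 0], scores[:, 1], c=data.target, s=15)
plt.xlabel(f"PC1 ({pca.explained_variance_ratio_[0]:.0%} of variance)")
plt.ylabel(f"PC2 ({pca.explained_variance_ratio_[1]:.0%} of variance)")
plt.savefig("score_plot.png")
```

Labeling the axes with each component's variance share, as above, is a small habit that keeps the plot honest about how much of the data the view actually represents.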

What Biplots Tell You

A biplot overlays the score plot with arrows (vectors) representing the original variables, giving you both observations and variables in a single picture. This is one of the most informative PCA visualizations, but the arrows need careful reading.

In a covariance biplot, the length of each arrow approximates the standard deviation of that variable, and the cosine of the angle between any two arrows approximates their correlation. Two arrows pointing in the same direction (small angle) indicate positively correlated variables. Arrows at 90 degrees are roughly uncorrelated. Arrows pointing in opposite directions (close to 180 degrees) are negatively correlated. Short arrows indicate variables that aren’t well represented in the two-component view you’re looking at, so avoid drawing strong conclusions about them.

In a form biplot, the emphasis shifts to the observations: distances between points approximate actual Euclidean distances in the original data, and vector length reflects how well each variable is represented rather than its standard deviation. Which type of biplot you’re looking at matters for interpretation, so check your software’s documentation.
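Many libraries draw biplots for you, but a hand-rolled version makes the construction explicit: scores as points, loadings as arrows. The arrow scale factor below is an arbitrary choice for visibility, not part of the method:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove if running interactively
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_iris()
X = StandardScaler().fit_transform(data.data)
pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)

plt.scatter(scores[:, 0], scores[:, 1], s=10, alpha=0.5)
scale = 2.5  # arbitrary stretch so arrows are visible against the scores
for name, (x, y) in zip(data.feature_names, loadings):
    plt.arrow(0, 0, scale * x, scale * y, color="red", head_width=0.08)
    plt.text(scale * x * 1.1, scale * y * 1.1, name, color="red")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.savefig("biplot.png")
```

On this data the petal length and petal width arrows point in nearly the same direction, matching the strong positive correlation between those two measurements.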

Why Scaling Your Data Changes Everything

PCA is sensitive to the scale of your variables. Variables with larger variances will dominate the first few components simply because they have bigger numbers, not because they carry more meaningful information. If one variable is measured in thousands (like income in dollars) and another in single digits (like a 1-to-5 satisfaction score), the income variable will swamp the analysis.

Standardizing your variables (subtracting the mean and dividing by the standard deviation) puts everything on equal footing. This is equivalent to running PCA on the correlation matrix instead of the covariance matrix. Most of the time, standardization is the right choice. The exception is when you deliberately want high-variance variables to carry more weight. An ecology study looking at species counts across sites, for instance, might intentionally skip standardization so that more abundant (and more variable) species receive greater emphasis. If you standardized in that case, rare species observed only a handful of times could end up driving a component, which may not be useful.

When PCA runs on the covariance matrix without standardization, higher-variance variables are prioritized in the data reduction. When it runs on the correlation matrix (standardized data), all variables contribute equally regardless of their original measurement scale. Choosing between the two is a decision you should make before running PCA, and it will meaningfully change your results.

Rotation and Its Trade-Offs

Sometimes raw PCA components are hard to interpret because multiple variables load moderately on the same component, making it unclear what the component “means.” Rotation is a technique that redistributes the explained variance more evenly across components to make the loadings cleaner, with each variable loading heavily on one component and weakly on the others.

The most common approach, varimax rotation, keeps the components uncorrelated (orthogonal) while maximizing the contrast in loadings. No total variance is lost in the rotation: the same amount of information is captured, just distributed differently. The trade-off is that you lose the neat ordering where PC1 always explains the most variance, PC2 the next most, and so on. After rotation, the variance is spread more evenly. Rotated components can also look quite different if you change the number of components you’re rotating, so the choice of how many to retain becomes more consequential.
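scikit-learn's PCA has no built-in rotation, but varimax is short enough to sketch by hand. The function below is one standard SVD-based formulation; treat it as illustrative rather than production code:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a (variables x components) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T
            @ (rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p)
        )
        rotation = u @ vt  # orthogonal, so components stay uncorrelated
        if s.sum() < objective * (1 + tol):
            break
        objective = s.sum()
    return loadings @ rotation

X = StandardScaler().fit_transform(load_iris().data)
pca = PCA(n_components=2).fit(X)
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
rotated = varimax(loadings)

# Rotation redistributes variance but loses none: the total sum of
# squared loadings is unchanged.
print(np.isclose((loadings**2).sum(), (rotated**2).sum()))
```

Because the rotation matrix is orthogonal, the total captured variance is exactly preserved; only its distribution across components changes.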

Practical Steps for a Clean Interpretation

  • Check your scaling first. If your variables are on different scales and you haven’t standardized, your results are likely dominated by whichever variable has the largest raw variance.
  • Start with explained variance. Look at the cumulative percentage to decide how many components are worth interpreting. Use the scree plot’s elbow as a visual guide and the eigenvalue-greater-than-one rule as a loose check, not a final answer.
  • Read the loadings for each retained component. Focus on variables with absolute loadings above 0.5. Try to name each component based on the variables that drive it.
  • Use the score plot to explore observations. Look for clusters, gradients, and outliers. Cross-reference what you see with the loadings to understand why certain observations group together.
  • Use the biplot to see variables and observations together. Pay attention to arrow angles for correlation, arrow length for representation quality, and the relationship between arrows and nearby observation clusters.
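The steps above can be strung together into one short workflow. This sketch uses the Iris data for illustration and a 90% variance target as an example threshold:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_iris()

# 1. Scale, so no variable dominates by measurement units alone
X = StandardScaler().fit_transform(data.data)

# 2. Fit and pick the component count from cumulative explained variance
pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)
k = int(np.searchsorted(cumulative, 0.90)) + 1
print(f"keeping {k} components ({cumulative[k - 1]:.0%} of variance)")

# 3. Inspect loadings for the retained components (|loading| > 0.5)
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
for j in range(k):
    strong = [n for n, v in zip(data.feature_names, loadings[:, j])
              if abs(v) > 0.5]
    print(f"PC{j + 1} driven by: {strong}")

# 4. Scores for plotting and downstream use
scores = PCA(n_components=k).fit_transform(X)
print(scores.shape)
```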

PCA results are most useful when you move through these outputs in sequence rather than jumping to a single plot. The explained variance tells you how much of the data's structure you've retained, the loadings tell you what each component captures, and the plots show you how that structure looks in the reduced space.