What Is Partial Least Squares and How Does It Work?

Partial least squares (PLS) is a statistical method for building predictive models when your predictor variables are numerous, highly correlated with each other, or even outnumber your observations. Where traditional regression breaks down under these conditions, PLS works by creating a small set of new summary variables (called components or latent variables) that capture the strongest relationships between your predictors and the outcome you’re trying to predict. It’s widely used in chemistry, genomics, marketing research, and any field where datasets are wide and messy.

How PLS Works

The core idea behind PLS is surprisingly intuitive. Imagine you have dozens or hundreds of measurements (your predictors) and you want to predict some outcome. Rather than feeding all those measurements into a regression equation, which would be unstable or impossible with correlated data, PLS compresses them into a handful of new variables. These new variables are linear combinations of the originals, chosen specifically to be maximally relevant to the outcome.

Mathematically, PLS finds pairs of scores, one built from the predictor data and one from the response data, chosen so that the covariance between each pair is as large as possible. It does this iteratively: the first component captures the single strongest predictive direction in your data, the second captures the next strongest (while being uncorrelated with the first), and so on. The method is rooted in the singular value decomposition of the cross-product matrix between predictors and responses, but the practical effect is straightforward. You end up with a small number of components that summarize what matters most for prediction, discarding noise and redundancy.

PLS vs. Principal Component Analysis

PLS is often compared to a related technique called principal component regression (PCR), which first applies principal component analysis (PCA) to reduce dimensions and then fits a regression model. The difference is critical. PCA is entirely unsupervised: it finds directions in your data where variance is highest, with no regard for the outcome variable. If the directions with the most variance in your predictors happen to be irrelevant to the outcome, PCA will prioritize them anyway, and the resulting regression can perform poorly.

PLS avoids this trap because its transformation is supervised. It uses information about the outcome during the dimension-reduction step, not just afterward. This means PLS can identify and prioritize a low-variance direction in your predictors if that direction turns out to be the most predictive one. In practice, PLS often needs fewer components than PCR to achieve the same or better prediction accuracy, precisely because every component it creates is tuned to the target variable from the start.

Why Multicollinearity Matters

Multicollinearity is the situation where your predictor variables are highly correlated with one another. In ordinary regression, this causes parameter estimates to become wildly unstable: small changes in the data can flip coefficients from positive to negative. PLS sidesteps this problem entirely by working with components rather than the original variables. Simulation studies have shown PLS is more robust to multicollinearity than principal component regression, producing more stable parameter estimates even in heavily correlated datasets.

PLS also handles the “wide data” problem, where you have more predictor columns than you have rows of data. This is common in genomics (thousands of genes measured on a few dozen patients) and spectroscopy (hundreds of wavelengths measured on a small number of samples). Ordinary least squares has no unique solution in this scenario: with more unknowns than observations, infinitely many coefficient vectors fit the training data exactly. PLS remains usable because it reduces the predictor space to a manageable number of components before fitting the model.

Choosing the Right Number of Components

The most important decision in building a PLS model is how many components to retain. Too few and you underfit, missing real predictive signal. Too many and you overfit, modeling noise that won’t generalize to new data. Like many tunable methods, PLS has a strong tendency toward overfitting when too many components are included.

The standard approach is cross-validation: you repeatedly hold out a portion of your data, build PLS models with different numbers of components on the remainder, and measure prediction error on the held-out portion. The number of components that minimizes cross-validation error is your best choice. In some datasets this might be as few as two components, while in others it could be ten or more. When cross-validation error doesn’t show a clear minimum, choosing the fewest components that achieve near-minimum error is a safe strategy. This keeps the model simple and less prone to capturing noise.

PLS for Classification

PLS isn’t limited to predicting continuous outcomes. A variant called PLS-DA (partial least squares discriminant analysis) adapts the method for classification tasks, where the goal is to assign observations to categories. PLS-DA has been popular in chemometrics for over two decades and is gaining traction in metabolomics and other large-scale biological analyses. These fields generate datasets with hundreds or thousands of features, substantial noise, and missing data, which are exactly the conditions where PLS shines. Researchers have used PLS-DA to differentiate microbial community types based on the abundance of nearly 250 bacterial taxa across hundreds of samples, for example.

PLS in Structural Equation Modeling

A separate but related use of the PLS algorithm appears in structural equation modeling (SEM), a technique for testing theoretical models about how variables influence one another. PLS-SEM takes a variance-based approach, estimating relationships by maximizing explained variance, while the alternative (covariance-based SEM) works by fitting the model to the covariance structure of the data.

The practical difference comes down to flexibility. Covariance-based SEM demands larger sample sizes and normally distributed data, and works best for models built around latent factors. PLS-SEM is more lenient with data requirements and is better suited for composite-based models, where constructs are formed as weighted combinations of their indicators rather than being treated as underlying causes. Social sciences and business research frequently use PLS-SEM when sample sizes are limited or the research is more exploratory than confirmatory.

Common Applications

PLS appears across a wide range of fields, often wherever traditional regression hits a wall:

  • Spectroscopy and chemistry: Predicting chemical composition from spectral data, where each wavelength is a predictor and wavelengths are heavily correlated with their neighbors.
  • Genomics and metabolomics: Identifying which genes or metabolites are associated with disease states, often with thousands of features measured on relatively few subjects.
  • Sensory science and food quality: Relating dozens of chemical measurements to taste or texture ratings from panels of tasters.
  • Marketing and social science: Modeling consumer behavior from large survey instruments with many overlapping questions.

The common thread is data that is high-dimensional, correlated, and potentially noisy. PLS thrives in these environments because it was designed for them.

Software for PLS

PLS is well-supported in both R and Python. In R, the pls package on CRAN provides functions for PLS regression, principal component regression, and related methods. You can specify cross-validation directly within the model call, and built-in plotting tools let you visualize prediction error across different numbers of components. In Python, scikit-learn’s cross-decomposition module includes PLS regression and PLS canonical correlation, with the same fit-transform-predict workflow used across the library. Both ecosystems make it straightforward to standardize predictors, select components via cross-validation, and evaluate model performance on held-out data.