Dimensionality reduction is a set of techniques that compress complex, high-dimensional data into a simpler form with fewer variables while preserving as much meaningful structure as possible. If you have a dataset where each item is described by hundreds or thousands of features (think pixels in an image, genes in a cell, or words in a document), dimensionality reduction finds a way to represent that same data with far fewer numbers. The result is data that’s easier to visualize, faster to process, and often performs better in machine learning models.
Why High-Dimensional Data Is a Problem
Every feature you add to a dataset is another “dimension.” A spreadsheet with 10 columns lives in 10-dimensional space. A dataset of gene expression levels across 20,000 genes lives in 20,000-dimensional space. As dimensions increase, something counterintuitive happens: the data becomes increasingly sparse, and patterns that hold in lower dimensions start to break down. This is known as the curse of dimensionality.
One of the most concrete effects is that distance measurements lose their meaning. In everyday life, distance tells you which things are close together and which are far apart. But as dimensions climb, the gap between the nearest and farthest points in a dataset shrinks to almost nothing relative to the distances themselves. Research analyzing three common distance metrics (including the familiar straight-line Euclidean distance) found that nearest-neighbor search becomes essentially meaningless at high dimensionality, because all points appear roughly equidistant from one another. When your algorithm can’t tell which data points are similar, classification, regression, and clustering all degrade.
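This concentration effect is easy to demonstrate with random data. The sketch below (plain NumPy, uniform random points; the function name and sizes are illustrative) measures the relative gap between the nearest and farthest point from a query as the dimension grows:

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_contrast(dim, n_points=500):
    """Ratio of (farthest - nearest) to nearest Euclidean distance
    from a random query point to a random point cloud."""
    points = rng.random((n_points, dim))
    query = rng.random(dim)
    dists = np.linalg.norm(points - query, axis=1)
    return (dists.max() - dists.min()) / dists.min()

for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(dim), 2))
```

In two dimensions the farthest point is typically many times farther away than the nearest; by a thousand dimensions the two are nearly indistinguishable, which is exactly why nearest-neighbor search degrades.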
High-dimensional data also tends to carry a lot of redundancy. Many features are correlated or contribute almost no useful information. Analyses using PCA (a common reduction method) consistently show that the meaningful variation in high-dimensional datasets concentrates in just a handful of dimensions, while the rest is noise or repetition.
The Core Idea
At its heart, dimensionality reduction is a mapping problem. You start with data points described by D features, and you want to represent them using only L features, where L is much smaller than D. The goal is to find a transformation that moves your data into this smaller space while keeping the important relationships intact, so that points which were similar in the original space remain similar after reduction. A good reduction also allows you to reconstruct a reasonable approximation of the original data from the compressed version, meaning you haven’t thrown away too much signal.
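To make the D-to-L mapping concrete, here is a minimal NumPy sketch (synthetic data; all sizes are illustrative). It embeds a 3-dimensional signal in 50 dimensions, compresses it back down to 3 using the SVD (the machinery behind PCA), and checks that the reconstruction stays close to the original:

```python
import numpy as np

rng = np.random.default_rng(1)
D, L = 50, 3
latent = rng.normal(size=(200, L))        # hidden low-dimensional signal
X = latent @ rng.normal(size=(L, D))      # embed it in D dimensions
X += 0.01 * rng.normal(size=X.shape)      # small measurement noise

Xc = X - X.mean(axis=0)                   # center the data first
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:L].T                         # compressed version: 200 x L
X_hat = Z @ Vt[:L] + X.mean(axis=0)       # reconstruction: 200 x D

rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(rel_error)
```

Because this data really does live near a 3-dimensional subspace, the relative reconstruction error stays well under a few percent; squeezing into fewer dimensions than the signal occupies would push that error up sharply.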
The tradeoff is always between compression and fidelity. Squeeze the data into too few dimensions and you lose meaningful patterns. Keep too many dimensions and you haven’t gained much. Finding that sweet spot is a practical challenge in every project that uses these methods.
Linear Methods: PCA and LDA
The most widely used dimensionality reduction technique is Principal Component Analysis, or PCA. It works by finding the directions in your data along which there’s the most variation, then projecting the data onto those directions. The first principal component captures the single axis of greatest variance, the second captures the next greatest (perpendicular to the first), and so on. You then keep only the top few components and discard the rest.
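In practice this is a few lines with scikit-learn, sketched here on the bundled iris dataset (assuming scikit-learn is available):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                  # 150 flowers, 4 measurements each
pca = PCA(n_components=2)             # keep only the top 2 components
X2 = pca.fit_transform(X)             # project 4-D data down to 2-D

print(X2.shape)
print(pca.explained_variance_ratio_)  # variance captured per component
```

On iris, the first two components capture well over 90% of the total variance, so the 2-D projection loses little of the structure.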
PCA is unsupervised, meaning it doesn’t know or care about any labels in your data. It simply looks at the overall spread. This makes it a good general-purpose tool for compression and visualization, but it’s not optimized for tasks where you need to tell groups apart.
Linear Discriminant Analysis (LDA) fills that gap. Instead of maximizing overall variance, LDA looks for the directions that best separate known classes. It maximizes the distance between class averages while minimizing the spread within each class. If you’re trying to distinguish between categories (say, different types of cells or different customer segments), LDA can produce a lower-dimensional space that’s more useful for classification than PCA would be. Research comparing the two on face recognition found that PCA performs better when learning the general appearance of a face matters more, while LDA excels when the goal is distinguishing one face from another.
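The API difference is telling: LDA's fit takes the class labels, while PCA's does not. A minimal sketch, again assuming scikit-learn:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)                 # 3 labeled classes
lda = LinearDiscriminantAnalysis(n_components=2)  # at most (classes - 1) dims
X2 = lda.fit_transform(X, y)                      # supervised: uses y

print(X2.shape)
```

Note that LDA can produce at most one fewer dimension than the number of classes, which is why `n_components` is capped at 2 for the three iris species.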
Both PCA and LDA are “linear” methods, meaning they can only find flat, straight-line relationships in the data. When the meaningful structure in your data lies along curves or complex shapes, you need nonlinear approaches.
Nonlinear Methods: t-SNE and UMAP
For visualization of complex datasets, two nonlinear methods dominate: t-SNE (t-distributed stochastic neighbor embedding) and UMAP (uniform manifold approximation and projection). Both are especially popular in biology and genomics, where datasets can have tens of thousands of dimensions.
t-SNE works by measuring how close each data point is to every other point in the original high-dimensional space, then iteratively arranging points in two or three dimensions so that nearby points stay nearby. It places heavy emphasis on preserving local neighborhoods: points that are close in the original space should cluster together in the reduced version. Medium- and long-range distances are largely ignored, which is why t-SNE plots often show tight, well-separated clusters but don’t reliably reflect how far apart those clusters actually are.
UMAP takes a similar neighbor-based approach but uses a different mathematical foundation rooted in topology. In practice, UMAP produces visualizations broadly consistent with t-SNE but runs significantly faster, which matters when you’re working with millions of data points. UMAP has a reputation for better preserving global structure (the relative positioning of clusters), though recent analysis suggests this advantage comes more from its default settings than from the algorithm itself. Both methods rely on nearest-neighbor graphs to maintain local relationships, and neither fully preserves the actual distances from the original space.
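A t-SNE sketch with scikit-learn follows (UMAP lives in the separate third-party umap-learn package but exposes a very similar `fit_transform` interface); the subsample size and perplexity are illustrative choices:

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)       # handwritten digits, 64 pixels each
X, y = X[:500], y[:500]                   # subsample to keep the demo fast

emb = TSNE(n_components=2, perplexity=30, init="pca",
           random_state=0).fit_transform(X)   # 64-D pixels -> 2-D layout
print(emb.shape)
```

Perplexity roughly sets how many neighbors each point tries to stay close to; it is the parameter most worth tuning, and different values can produce quite different-looking maps of the same data.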
Choosing the Right Number of Dimensions
With PCA, the standard approach is to look at how much of the total variance each component explains, then keep enough components to cross a threshold. A common, if somewhat arbitrary, cutoff is 70% of total variance explained. In practice, many researchers aim higher. Analyses of real datasets frequently find that just two or three principal components capture 90% to 95% of total variation, meaning the vast majority of the data’s structure can be represented with very few dimensions.
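In scikit-learn this threshold search is a few lines; the breast-cancer dataset and the 90% target below are purely illustrative:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_breast_cancer().data                 # 569 samples, 30 features
X = StandardScaler().fit_transform(X)         # put features on equal footing
pca = PCA().fit(X)                            # fit all 30 components

cumulative = np.cumsum(pca.explained_variance_ratio_)
k = int(np.argmax(cumulative >= 0.90)) + 1    # smallest k reaching 90%
print(k, cumulative[k - 1])
```

scikit-learn also accepts a fractional `n_components`, e.g. `PCA(n_components=0.90)`, which performs this selection automatically.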
For visualization, two or three dimensions are standard simply because humans can’t see more than that. For machine learning pipelines, where the reduced data feeds into a downstream model, you typically experiment with different numbers of components and evaluate how model performance changes.
Scaling Your Data First
One practical step that’s easy to overlook: your features need to be on comparable scales before you apply most reduction methods. If one variable is measured in millimeters and another in kilograms, the algorithm will treat the variable with bigger numbers as more important, purely because of its units. Standard practice is to standardize each variable: subtract its mean and divide by its standard deviation, so every feature has zero mean, unit variance, and an equal chance to contribute. This is especially critical when your dataset contains mixed types of measurements, like combining lab values, demographic data, and survey scores.
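A small demonstration of why this matters, using synthetic heights and weights (the numbers are illustrative): without scaling, PCA's first component is dominated by whichever feature happens to have the larger raw values.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two uncorrelated features on wildly different numeric scales:
height_mm = rng.normal(1700, 90, size=300)   # millimeters
weight_kg = rng.normal(70, 12, size=300)     # kilograms
X = np.column_stack([height_mm, weight_kg])

raw = PCA().fit(X).explained_variance_ratio_[0]
scaled = PCA().fit(StandardScaler().fit_transform(X)).explained_variance_ratio_[0]
print(raw, scaled)   # raw PC1 is dominated by the millimeter-scale feature
```

After standardization, the two uncorrelated features split the variance roughly evenly, as they should; before it, the first component is little more than "height in millimeters."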
Real-World Applications
Dimensionality reduction isn’t just an academic exercise. It’s a core step in some of the most data-intensive fields.
In single-cell genomics, researchers measure the activity of thousands of genes in individual cells, producing datasets with tens of thousands of dimensions. It’s standard practice to apply PCA before clustering cells into types, and then use UMAP or t-SNE to visualize those clusters in two dimensions. Recent work in this area has focused on quantifying how confident you can be that clusters represent genuinely distinct biological groups rather than artifacts of the reduction process. One approach generates a “cluster cohesion index” that measures how tightly grouped and well-separated clusters really are, going beyond what a simple scatterplot can show.
In image processing, each pixel is a feature, so even a small 100-by-100 grayscale image lives in 10,000-dimensional space. PCA compresses these images into a manageable number of components for tasks like facial recognition or object detection. In natural language processing, word embeddings reduce the enormous vocabulary space into dense vectors of a few hundred dimensions, capturing meaning and relationships between words. In finance, dimensionality reduction helps identify the handful of underlying factors driving the movement of hundreds of correlated assets.
The common thread across all these cases is the same: you have more features than you can practically work with, and most of the meaningful information can be captured with far fewer. Dimensionality reduction finds that compact representation, letting you visualize what you couldn’t see before and build models that train faster and generalize better.

