What Does SVD Do? Matrix Decomposition Explained

Singular value decomposition (SVD) breaks any matrix of data into three simpler matrices, revealing the most important patterns hidden inside. It’s one of the most widely used tools in applied mathematics, powering everything from image compression to Netflix recommendations. At its core, SVD answers a surprisingly useful question: what are the most meaningful components of this data, ranked by importance?

How SVD Breaks Down a Matrix

Any matrix, regardless of its shape or contents, can be split into three pieces:

  • U: a matrix whose columns describe patterns across the rows of your original data
  • Σ (Sigma): a diagonal matrix of “singular values,” which are essentially weights telling you how important each pattern is
  • V: a matrix whose columns describe patterns across the columns of your original data

The original matrix equals U × Σ × V transposed. That’s the whole decomposition. The singular values in Σ are always arranged from largest to smallest, so the first component captures the dominant pattern in the data, the second captures the next most important, and so on. You can think of it like sorting the ingredients of a recipe by how much they contribute to the final flavor.

A useful way to visualize this: your original data is the weighted sum of a series of simpler layers. Each layer is built from one column of U, one singular value, and one column of V. Because the singular values tell you the relative size of each layer, the first few layers often capture the vast majority of what matters. The rest is fine detail or noise.
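This layered-sum picture is easy to check directly. Here is a minimal sketch using NumPy's `np.linalg.svd` on a small made-up matrix (the numbers are arbitrary):

```python
import numpy as np

# A small example matrix: 4 rows x 3 columns of arbitrary data
A = np.array([[2.0, 4.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 2.0],
              [3.0, 5.0, 1.0]])

# full_matrices=False gives the compact ("economy") form of the decomposition
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The singular values arrive already sorted from largest to smallest
print(s)

# Rebuild A as a weighted sum of rank-1 "layers":
# each layer is (one column of U) x (one singular value) x (one row of Vt)
layers = [s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s))]
reconstruction = sum(layers)

print(np.allclose(reconstruction, A))  # True: the layers add back up to A
```

Note that NumPy returns V already transposed (hence the conventional name `Vt`), so each layer uses a row of `Vt` rather than a column of V.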

Reducing Data Without Losing Meaning

The most common use of SVD is dimensionality reduction. If you have a massive dataset with hundreds of variables, SVD lets you keep only the top components (the ones with the largest singular values) and throw away the rest. You lose a small amount of detail but retain the structure that actually matters.

How much each component contributes is easy to quantify. You square each singular value and divide by the sum of all squared singular values. This gives you the “percent variance explained” by each component. In many real datasets, the first 10 or 20 components out of hundreds explain 90% or more of the total variation. That means you can represent the data with a fraction of the original size and barely lose any information.
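The calculation above takes two lines in practice. The sketch below builds synthetic data with three real underlying factors buried in 50 variables, then shows that the first three components account for nearly all of the variation (the data-generating setup is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 200 samples, 50 variables, but only 3 true underlying factors
factors = rng.normal(size=(200, 3))
loadings = rng.normal(size=(3, 50))
X = factors @ loadings + 0.1 * rng.normal(size=(200, 50))

# Singular values of the mean-centered data (compute_uv=False skips U and V)
s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)

# Square each singular value and divide by the sum of all squared values
variance_explained = s**2 / np.sum(s**2)
print(variance_explained[:5])             # share contributed per component
print(np.cumsum(variance_explained)[:5])  # running total
```

Because only three factors generated the data, the cumulative total crosses 90% by the third component, exactly the pattern described above for real datasets.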

This is closely related to principal component analysis (PCA), which many people encounter in statistics courses. The connection is direct: running PCA amounts to taking the SVD of the mean-centered data. The right singular vectors are the principal axes, and the squared singular values (divided by the number of samples minus one) are the variances along those axes. When you run PCA on a dataset, SVD is typically what’s happening under the hood; for practical purposes, SVD is the engine that makes PCA work.
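The equivalence can be demonstrated numerically. This sketch computes PCA the textbook way (eigendecomposition of the covariance matrix) and via SVD of the centered data, on random illustrative data, and confirms the two agree:

```python
import numpy as np

rng = np.random.default_rng(1)
# Random correlated data: 100 samples, 4 variables
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))

# Textbook PCA: eigendecompose the covariance matrix of centered data
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc / (len(X) - 1))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # eigh sorts ascending; flip

# The same answer via SVD of the centered data itself
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Variances along the principal axes match exactly,
# and the axes themselves match up to an arbitrary sign flip
print(np.allclose(eigvals, s**2 / (len(X) - 1)))
print(np.allclose(np.abs(eigvecs), np.abs(Vt.T), atol=1e-6))
```

The SVD route is also numerically preferable in practice, since it avoids forming the covariance matrix explicitly.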

Image Compression

A digital image is just a matrix of pixel values, which makes it a natural fit for SVD. When you decompose an image matrix, the first few singular values and their associated components capture the broad shapes, edges, and gradients. The later components hold progressively finer detail.

To compress an image, you keep only the top k components and discard the rest. The image is then stored as three smaller matrices instead of the full original, significantly reducing storage requirements. The goal is to find a k value where the compressed image still looks good to the human eye while using far less space. Performance is typically measured by the compression ratio and the mean square error between the original and compressed versions.
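The keep-top-k recipe looks like this in code. A synthetic smooth pattern stands in for a real grayscale image (loading an actual image file is omitted to keep the sketch self-contained):

```python
import numpy as np

rng = np.random.default_rng(2)
# Stand-in for a 128x128 grayscale image: a smooth pattern plus fine detail
img = np.outer(np.sin(np.linspace(0, 3, 128)), np.cos(np.linspace(0, 2, 128)))
img += 0.01 * rng.normal(size=img.shape)

U, s, Vt = np.linalg.svd(img, full_matrices=False)

k = 10  # number of components to keep; chosen by eye in practice
# U[:, :k] * s[:k] scales each kept column of U by its singular value
compressed = U[:, :k] * s[:k] @ Vt[:k, :]

# Storage drops from m*n values to k*(m + n + 1):
# k columns of U, k rows of Vt, and k singular values
original_size = img.size
compressed_size = k * (img.shape[0] + img.shape[1] + 1)
mse = np.mean((img - compressed) ** 2)
print(compressed_size / original_size, mse)
```

Here the compression ratio is about 0.15, and because the underlying pattern is low-rank, the mean square error stays tiny, exactly the trade-off described above.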

Cleaning Noise From Signals

SVD is widely used to separate real signals from random noise. The idea is straightforward: meaningful patterns in data tend to concentrate in the largest singular values, while noise spreads across the smaller ones. By reconstructing the data using only the top singular values and discarding the rest, you get a cleaner version of the original signal.

This technique shows up in sensor data processing, audio engineering, and vibration analysis. The tricky part is deciding where to draw the line between “signal” and “noise.” Various methods exist for choosing that cutoff, often by looking for a sharp drop in the sequence of singular values. Everything above the drop is kept; everything below is treated as noise and removed.
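A minimal denoising sketch, on made-up data where the true signal is known to be rank 2 (so the cutoff can simply be hard-coded; real data would need the drop-detection heuristics mentioned above):

```python
import numpy as np

rng = np.random.default_rng(3)
# Clean signal with rank-2 structure, then noise added everywhere
clean = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 80))
noisy = clean + 0.3 * rng.normal(size=clean.shape)

U, s, Vt = np.linalg.svd(noisy, full_matrices=False)

# In practice you would look for a sharp drop in s to pick the cutoff;
# here we know the true rank, so k = 2
k = 2
denoised = U[:, :k] * s[:k] @ Vt[:k, :]

err_before = np.linalg.norm(noisy - clean)
err_after = np.linalg.norm(denoised - clean)
print(err_after < err_before)  # True: truncation moved us closer to the truth
```

The improvement comes from discarding the 78 small-singular-value components that contained almost nothing but noise.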

Recommendation Systems

When Netflix or Amazon suggests something you might like, SVD-based collaborative filtering is one of the techniques that may be at work. The setup looks like this: imagine a giant matrix where each row is a user, each column is a movie, and the cells contain ratings. Most cells are empty because no one has rated every movie. SVD fills in those gaps.

By decomposing the ratings matrix, SVD identifies latent factors: hidden dimensions that describe both users and items. One factor might loosely correspond to “preference for action movies,” another to “tolerance for long films.” These factors aren’t labeled explicitly, but they emerge from the patterns in the data. The system then predicts how you’d rate an unseen movie by combining your factor profile with the movie’s factor profile. Research using the MovieLens dataset found that SVD outperformed both user-based and item-based collaborative filtering, and that tuning the number of retained components (the k value) improved accuracy up to a point.
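A toy version of the idea, with a made-up 4-user, 4-movie ratings matrix. Note this is a simplified sketch: real recommenders fit factors only to the observed entries rather than mean-filling the gaps, but the mechanics of predicting through a low-rank reconstruction are the same:

```python
import numpy as np

# Toy ratings: rows = users, columns = movies, 0 means "not rated"
# (values invented for illustration; users 0-1 and 2-3 have opposite tastes)
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)

# Simplest possible gap-filling: replace missing entries with the user's mean
filled = R.copy()
for i in range(R.shape[0]):
    rated = R[i] > 0
    filled[i, ~rated] = R[i, rated].mean()

# Keep k latent factors and rebuild the full matrix of predicted ratings
U, s, Vt = np.linalg.svd(filled, full_matrices=False)
k = 2
predicted = U[:, :k] * s[:k] @ Vt[:k, :]

# User 0 never rated movie 2; the reconstruction supplies an estimate,
# and it comes out well below their predicted score for movie 0
print(predicted[0])
```

With k = 2, the factors recover the two taste groups, so user 0's predicted rating for movie 2 lands low while movies 0 and 1 stay high.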

Understanding Language and Text

In natural language processing, SVD powers a technique called latent semantic analysis (LSA). You start with a matrix where rows represent words and columns represent documents (or paragraphs). Each cell records how often a word appears in a given document. This matrix is typically enormous and sparse.

SVD compresses it into a lower-dimensional “semantic space” where words with similar meanings end up near each other, even if they never appeared in the same document. The words “car” and “automobile” might never co-occur, but because they appear in similar contexts, SVD places them close together. Similarity between any two words or documents is measured by the angle between their vectors in this space.
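A tiny illustration of the “car”/“automobile” effect, with an invented four-word, four-document count matrix. The two words never share a document, yet both co-occur with “engine,” so the semantic space pulls them together:

```python
import numpy as np

# Toy term-document counts: rows = words, columns = documents
words = ["car", "automobile", "engine", "banana"]
X = np.array([[2, 0, 1, 0],   # "car" appears in docs 0 and 2
              [0, 2, 1, 0],   # "automobile" appears in docs 1 and 2
              [1, 1, 2, 0],   # "engine" co-occurs with both
              [0, 0, 0, 3]],  # "banana" lives in its own document
             dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Project each word into a 2-dimensional semantic space
k = 2
word_vecs = U[:, :k] * s[:k]

def cosine(a, b):
    """Cosine of the angle between two word vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(word_vecs[0], word_vecs[1]))  # car vs automobile: near 1
print(cosine(word_vecs[0], word_vecs[3]))  # car vs banana: near 0
```

Despite zero direct co-occurrence, “car” and “automobile” end up almost perfectly aligned, while “banana” sits in an orthogonal direction.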

This approach has been applied to collections of up to half a billion documents with 750,000 unique word types. It’s a direct precursor of modern word embeddings and large language models, and its core insight, that meaning can be extracted from patterns of co-occurrence, remains influential.

Genomics and Medical Imaging

In bioinformatics, SVD helps researchers make sense of gene expression data. Rather than analyzing thousands of individual genes, SVD can condense expression profiles into “pathway activity levels,” summarizing the behavior of whole biological pathways in a single number. This has been applied to studies of type 2 diabetes and the effects of cigarette smoke on airway tissue, where it simplified comparisons between healthy and diseased states.

In medical imaging, SVD plays a role in reconstructing MRI scans. Dynamic MRI sequences generate large amounts of data, and SVD-based methods can separate the slowly changing background from the dynamic elements (like blood flow or organ movement). Compressed SVD approaches reduce reconstruction time while maintaining image quality, which matters when patients are waiting inside the scanner.

Computational Cost

SVD is powerful but not free. For a matrix with m rows and n columns (with m ≥ n), the standard algorithm runs in O(mn²) time, meaning the cost grows quickly as your data gets larger. For a 1,000 × 1,000 matrix, that’s manageable on a laptop. For millions of rows and columns, it becomes a serious bottleneck. This is why much of the practical work with SVD involves approximations and faster algorithms that compute only the top k singular values rather than the full decomposition. In most applications, you don’t need all the components anyway, just the ones that carry the most weight.
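One popular family of such approximations is randomized SVD. The sketch below follows the general Halko-Martinsson-Tropp recipe (random projection, QR, small SVD) in plain NumPy; libraries like SciPy's `scipy.sparse.linalg.svds` or scikit-learn's `TruncatedSVD` offer production-ready versions:

```python
import numpy as np

def truncated_svd(A, k, oversample=10, seed=0):
    """Approximate the top-k singular triplets of A without computing
    the full decomposition (a sketch of the randomized SVD recipe)."""
    rng = np.random.default_rng(seed)
    # Probe the column space of A with random projections
    Y = A @ rng.normal(size=(A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(Y)        # orthonormal basis for A's range
    B = Q.T @ A                   # small (k + oversample) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

# A tall matrix with only 5 real underlying dimensions,
# where computing the full SVD would be wasteful
rng = np.random.default_rng(4)
A = rng.normal(size=(2000, 5)) @ rng.normal(size=(5, 300))

U, s, Vt = truncated_svd(A, k=5)
s_full = np.linalg.svd(A, compute_uv=False)
print(np.allclose(s, s_full[:5]))  # True: top singular values match
```

The expensive full decomposition is replaced by an SVD of a 15 × 300 matrix, and because the data is genuinely low-rank, the top singular values come out essentially exact. On noisier matrices, a power iteration or two improves the approximation.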