What Does the Field of Proteomics Study?

Proteomics is the large-scale study of proteins: their structures, functions, interactions, and modifications within a cell, tissue, or entire organism. While genomics maps the roughly 20,000 protein-coding genes in human DNA, proteomics tackles the far more complex question of what those genes actually produce and how those products behave in living systems. The human body contains not 20,000 proteins but potentially millions of distinct protein forms, and understanding that complexity is the central challenge of the field.

From 20,000 Genes to Millions of Proteins

The gap between the genome and the proteome is enormous. Humans have approximately 20,300 protein-coding genes, but the number of unique protein forms the body produces is orders of magnitude larger. When cells read genes, many of the resulting transcripts are spliced into alternative versions, raising the count to roughly 70,000 distinct protein sequences. On top of that, proteins are chemically modified after they’re built, a process that generates hundreds of thousands of additional variants. And for specialized immune proteins like antibodies and T-cell receptors, the body shuffles gene segments so aggressively that the number of possible variants reaches into the billions over a lifetime.

A 2016 database analysis estimated roughly 6 million distinct protein forms across human tissues. Even within a single cell type, researchers estimate there may be around 1 million. This is why reading the genome alone doesn’t tell you what’s actually happening in a cell at any given moment. Proteins are the molecules doing the work, and proteomics exists to catalog and understand them.
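
To put those estimates on a per-gene footing, the short Python sketch below divides the quoted counts by the number of protein-coding genes. The three input figures come from the text above; the per-gene averages are derived here for illustration only, and real genes vary enormously around them.

```python
# Figures quoted in the text above; the per-gene averages are derived here.
PROTEIN_CODING_GENES = 20_300
SPLICED_SEQUENCES = 70_000
ESTIMATED_PROTEIN_FORMS = 6_000_000


def per_gene(count: int, genes: int = PROTEIN_CODING_GENES) -> float:
    """Average number of products per protein-coding gene."""
    return count / genes


sequences_per_gene = per_gene(SPLICED_SEQUENCES)
forms_per_gene = per_gene(ESTIMATED_PROTEIN_FORMS)

print(f"{sequences_per_gene:.1f} spliced sequences per gene on average")
print(f"{forms_per_gene:.0f} protein forms per gene on average")
```

On these numbers, splicing alone accounts for only a three- to fourfold expansion; most of the jump to millions of forms comes from the later layers of modification.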

Why Proteins Are Harder to Study Than DNA

DNA can be copied billions of times using standard lab techniques, making even tiny samples easy to analyze. Proteins cannot be amplified. Whatever amount exists in a sample is all you have to work with, which makes detecting rare proteins a serious technical hurdle. Proteins also vary wildly in abundance: a single cell can contain anywhere from one copy to ten million copies of a given protein, spanning seven orders of magnitude. Most current instruments capture only a portion of that range, often missing low-abundance proteins in what researchers call the “dark proteome.”
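
The "seven orders of magnitude" figure is a base-10 logarithm of the ratio between the most and least abundant proteins. The Python sketch below checks that arithmetic and then shows what a limited dynamic range implies. The four-order instrument range is a hypothetical value chosen for illustration, not a specification of any real instrument.

```python
import math


def orders_of_magnitude(lowest: float, highest: float) -> float:
    """Base-10 span between the lowest and highest copy numbers."""
    return math.log10(highest / lowest)


def detection_floor(highest: float, instrument_orders: float) -> float:
    """Lowest copy number still visible if the instrument covers
    `instrument_orders` orders of magnitude below the most abundant protein."""
    return highest / 10 ** instrument_orders


# Copy numbers from the text: one copy up to ten million copies per cell.
span = orders_of_magnitude(1, 10_000_000)

# Hypothetical instrument covering four orders of magnitude (assumption).
floor = detection_floor(10_000_000, 4)

print(f"Abundance spans about {span:.0f} orders of magnitude")
print(f"Proteins below roughly {floor:.0f} copies per cell would be missed")
```

Under that assumed range, any protein present at fewer than about a thousand copies per cell would fall below the detection floor.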

There’s also the issue of chemical diversity. DNA is built from four building blocks. Proteins use 20 different amino acids, plus more than 650 known types of chemical modifications that can be added after the protein is assembled. These modifications change a protein’s shape, stability, location, electrical charge, and ability to interact with other molecules. Distinguishing all of these variants unambiguously remains one of the field’s core technical challenges.

How Proteins Get Modified and Why It Matters

After a protein is built, the cell can chemically tag it in ways that alter its behavior. These post-translational modifications are not random decorations. They are essential regulators of nearly every cellular process, from metabolism to immune signaling to cell division. Phosphorylation, for instance, acts as an on/off switch in cell signaling pathways and the cell cycle. Acetylation and methylation help control which genes get read and how cells process energy. Glycosylation (adding sugar molecules) influences how proteins fold and how cells stick to one another. Ubiquitination tags proteins for destruction or redirects them within the cell.

When these modifications go wrong, disease often follows. Excessive phosphorylation of a protein called tau is a hallmark of Alzheimer’s disease, driving the formation of toxic tangles in the brain. Reduced modification of the huntingtin protein increases its tendency to clump together, contributing to Huntington’s disease. Abnormal acetylation patterns can disrupt insulin sensitivity and metabolism. Proteomics provides the tools to detect these modification patterns across thousands of proteins simultaneously, making it possible to spot disease signatures that genetic testing alone would miss.

The Tools Behind Proteomics

Mass spectrometry is the workhorse technology of modern proteomics. It works by breaking proteins into smaller fragments, measuring the mass-to-charge ratio of each fragment, and then computationally inferring which proteins were present in the original sample. The analytical pipeline involves multiple stages: quality control, data cleaning, normalization, statistical analysis, functional interpretation, and visualization. Each step requires specialized software, and the field relies heavily on bioinformatics tools, including databases that map known protein interactions and pathway analysis platforms that help researchers understand what the detected proteins are doing in biological context.
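
The fragment-and-weigh idea can be sketched in a few lines of Python. This is a simplified illustration, not a real search engine: it assumes the protein is cut with the enzyme trypsin (a common choice, though the text above does not name one), uses a made-up sequence, ignores modifications and missed cleavages, and computes the theoretical mass-to-charge ratio (m/z) that software would compare against measured values.

```python
import re

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one water is added per intact peptide
PROTON = 1.00728   # mass of each charge-carrying proton


def digest(sequence: str) -> list[str]:
    """Simplified trypsin rule: cut after K or R unless P follows."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(peptide: str, charge: int) -> float:
    """Theoretical m/z of the peptide carrying `charge` protons."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


# Hypothetical sequence, chosen only to exercise the cleavage rule.
peptides = digest("ACDKGHRPLMK")
for pep in peptides:
    print(pep, round(peptide_mass(pep), 4), round(mz(pep, 2), 4))
```

The example sequence yields two peptides, because the arginine followed by proline is not cut. Real pipelines match thousands of such theoretical values against measured spectra and score the matches statistically.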

A newer frontier is single-cell proteomics, which aims to measure proteins in individual cells rather than bulk tissue samples. Current methods can routinely detect and quantify 1,000 to 1,500 proteins per cell, but that number pales in comparison to the total number of unique proteins and protein forms a cell contains. The field is roughly where genomic sequencing was in its early days: promising but still limited. One major obstacle is that proteins in a cell can’t be copied the way DNA can, so every molecule lost during sample preparation is gone for good. Researchers are also exploring sequencing-based and nanopore approaches that could eventually detect the low-copy proteins that mass spectrometry tends to miss, though none of these newer methods have commercially viable products yet.

Mapping the Human Proteome

The Human Proteome Project, coordinated by the Human Proteome Organization (HUPO), is the field’s flagship global effort. Its goal is to build a complete parts list of human proteins by confirming expression and function for every one. The project’s 2025 report, based on a reference proteome of 19,435 proteins, found that 93.6% of the proteome has now been detected. On the functional side, 5,562 proteins have been assigned to the highest confidence category for known function, an increase of 288 proteins in a single year. Twelve biology- and disease-focused initiatives are working to fill the remaining gaps.
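
The percentages in that report translate into protein counts with simple arithmetic, sketched below in Python. The inputs are the figures quoted above; the derived counts are approximate because 93.6% is itself a rounded value.

```python
# Figures quoted in the text above.
REFERENCE_PROTEOME = 19_435
DETECTED_FRACTION = 0.936
HIGHEST_CONFIDENCE_FUNCTION = 5_562
YEARLY_INCREASE = 288

# Derived values (approximate, since the percentage is rounded).
detected = round(REFERENCE_PROTEOME * DETECTED_FRACTION)
undetected = REFERENCE_PROTEOME - detected
function_share = HIGHEST_CONFIDENCE_FUNCTION / REFERENCE_PROTEOME
previous_year = HIGHEST_CONFIDENCE_FUNCTION - YEARLY_INCREASE

print(f"Detected: about {detected:,} proteins")
print(f"Still undetected: about {undetected:,} proteins")
print(f"Highest-confidence function: {function_share:.1%} of the proteome")
print(f"Previous year's functional count: {previous_year:,}")
```

The contrast is the useful part: detection is nearly complete, while fewer than a third of proteins sit in the top functional confidence category.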

Finding Drug Targets and Disease Markers

One of the most direct applications of proteomics is in drug discovery. Chemical proteomics, a specialized branch of the field, identifies which proteins a drug molecule physically binds to inside living cells. This is valuable in two directions: it helps researchers find new drug targets by revealing which proteins are involved in a disease process, and it helps explain side effects by uncovering unintended protein interactions (off-target effects). Because these experiments happen inside intact cells rather than in artificial lab conditions, they provide a more realistic picture of how a drug behaves in the body.

Proteomics has also produced clinical diagnostic tools. The OVA1 test for ovarian cancer, for example, combines measurements of five different proteins (including CA125, prealbumin, and transferrin) to assess whether a pelvic mass is likely to be malignant. Researchers are also evaluating phosphorylation-based protein signatures as predictors of how patients with melanoma and lung cancer will respond to specific targeted therapies.

Proteomics and Precision Medicine

Genomic data alone often misses critical changes that are visible only at the protein level. A gene mutation might predict that a protein will malfunction, but proteomics can show whether that protein is actually present, how abundant it is, and whether it carries modifications that change its activity. This is why a growing number of cancer research programs combine genomics, transcriptomics (measuring gene activity), and proteomics into a unified approach called proteogenomics.

The National Cancer Institute actively supports clinical proteogenomics research, recognizing that integrating these data types provides a more complete picture of tumor biology than any single approach. By layering protein-level data on top of genetic profiles, clinicians can better match patients to treatments and identify resistance mechanisms earlier. While cancer is the most advanced application, the approach is expected to expand into other areas of medicine where protein-level changes drive disease progression.