What Is the Human Protein Atlas and How Does It Work?

The Human Protein Atlas (HPA) is an open-access research resource dedicated to systematically mapping all proteins encoded by the human genome. Initiated in 2003 in Sweden, the project aims to provide a comprehensive understanding of the human proteome. This global effort catalogs approximately 20,000 protein-coding genes and visualizes where their corresponding protein products reside. The resource offers millions of high-resolution images and integrated data to scientists worldwide, advancing both basic human biology and disease research.

The Core Mission and Scope

The mission of the Human Protein Atlas is to create a complete spatial and quantitative map of the entire human proteome. The proteome is the full complement of proteins expressed by an organism, and mapping its composition and location is considered the next frontier following the sequencing of the human genome. The project documents precisely where each protein is found within the human body.

This scope extends to documenting protein expression across all major healthy cells, tissues, and organs. Knowing the biological context of a protein is directly tied to its function; for instance, a liver-specific protein likely plays a role in liver physiology, while a ubiquitous protein may fulfill basic cellular maintenance. By providing this comprehensive baseline, the HPA furnishes a standard against which disease states can be compared. The data is continuously updated to fuel research into human development, metabolism, and disease progression.
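The distinction between a tissue-specific and a ubiquitous protein can be made concrete with a small sketch. The function name, the four-fold cutoff, and the detection threshold below are illustrative assumptions, not the HPA's published classification rules; the input is a hypothetical mapping from tissue name to an expression value.

```python
def classify_specificity(expression, fold=4.0, detection=1.0):
    """Label a gene's expression pattern across tissues.

    expression: dict mapping tissue name -> expression value (hypothetical units).
    fold:       assumed ratio by which the top tissue must exceed the runner-up.
    detection:  assumed minimum value for a tissue to count as expressing the gene.
    Returns (label, top_tissue_or_None).
    """
    if not expression:
        raise ValueError("expression must contain at least one tissue")

    # Sort tissues from highest to lowest expression.
    ranked = sorted(expression.items(), key=lambda item: item[1], reverse=True)
    top_tissue, top_value = ranked[0]
    second_value = ranked[1][1] if len(ranked) > 1 else 0.0

    if top_value < detection:
        return ("not detected", None)
    if top_value >= fold * second_value:
        # One tissue clearly dominates, e.g. a liver-specific protein.
        return ("tissue enriched", top_tissue)
    if all(value >= detection for value in expression.values()):
        # Present everywhere, consistent with a housekeeping role.
        return ("detected in all", None)
    return ("mixed", None)
```

For example, a gene with values of 200 in liver, 10 in kidney, and 5 in heart would be labeled tissue enriched for liver under these assumed thresholds.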

Generating the Protein Map

The Human Protein Atlas methodology relies on integrating two powerful molecular techniques: antibody-based imaging and transcriptomics. This dual approach provides two independent lines of evidence for each protein-coding gene, which improves the reliability of the data. The primary visualization technique is immunohistochemistry (IHC), which uses specific antibodies to bind to target proteins within tissue samples.

When the antibody binds to its target, a chemical reaction produces a visible stain, typically brown, that microscopically reveals the protein’s presence and exact location. Researchers apply these validated antibodies to tissue microarrays containing samples from a wide range of normal human tissues. This process generates millions of high-resolution images that visually demonstrate the protein’s spatial distribution at a single-cell level.

Complementing this visual data is transcriptomics, primarily RNA sequencing (RNA-seq), which measures the amount of messenger RNA (mRNA) present for each gene. Since mRNA is the template used to produce the protein, its abundance serves as a proxy for potential protein expression. By comparing the visual evidence from antibody staining with the quantitative RNA-seq data, researchers validate the protein’s expression and increase confidence in the data. The HPA also employs rigorous validation protocols, often using two or more independent antibodies targeting different regions of the same protein.
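One way to picture the comparison between antibody staining and RNA-seq data is a rank correlation across tissues. The sketch below is an illustration under stated assumptions, not the HPA's actual validation pipeline: the four-level staining scale, the tissue names, and the function names are all hypothetical, and Spearman correlation is simply one reasonable choice of agreement measure.

```python
# Assumed ordinal scale for antibody staining intensity.
STAINING_RANK = {"not detected": 0, "low": 1, "medium": 2, "high": 3}


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / (sd_x * sd_y)


def staining_rna_concordance(staining, rna):
    """Correlate staining levels with RNA abundance over shared tissues.

    staining: dict tissue -> staining label from STAINING_RANK.
    rna:      dict tissue -> RNA abundance (any consistent unit).
    """
    tissues = sorted(set(staining) & set(rna))
    levels = [STAINING_RANK[staining[t]] for t in tissues]
    abundance = [rna[t] for t in tissues]
    return spearman(levels, abundance)
```

A value near 1 would suggest that the antibody stains most strongly where the transcript is most abundant, while a value near 0 or below would flag the antibody for closer scrutiny.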

Exploring the Specialized Atlases

The vast amount of data generated by the project is organized into several specialized, interconnected databases, or “Atlases,” each focusing on a particular context of the human proteome. This structure allows researchers to search for protein information based on tissue, cell type, or disease state.

Tissue Atlas

The Tissue Atlas details the distribution of proteins across up to 44 different normal human tissue types, such as the kidney, heart, and brain.

Cell Atlas

The Cell Atlas provides a deep dive into the subcellular localization of proteins, showing precisely where they reside inside individual cells. This resource identifies whether a protein is located in the nucleus, mitochondria, or another of the approximately 35 cellular structures. A protein's location within the cell is directly relevant to its function.

Pathology Atlas

The Pathology Atlas focuses on disease, specifically analyzing protein expression in various types of human cancer. It correlates protein levels with patient survival data, providing insights into which proteins may be over- or under-expressed in malignant tissue.
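Relating protein levels to patient survival is commonly done by splitting patients into high- and low-expression groups and estimating a survival curve for each. The sketch below uses the standard Kaplan-Meier estimator with a median split; the patient tuple layout, the median cutoff, and the function names are assumptions made for illustration and are not taken from the Pathology Atlas itself.

```python
from statistics import median


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient.
    events: 1 if the patient died at that time, 0 if censored.
    Returns a list of (time, survival_probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def curves_by_expression(patients):
    """Split patients at the median expression and estimate both curves.

    patients: list of (expression, time, event) tuples (hypothetical layout).
    """
    cutoff = median(p[0] for p in patients)
    high = [p for p in patients if p[0] > cutoff]
    low = [p for p in patients if p[0] <= cutoff]
    return {
        "high": kaplan_meier([t for _, t, _ in high], [e for _, _, e in high]),
        "low": kaplan_meier([t for _, t, _ in low], [e for _, _, e in low]),
    }
```

A real analysis would also test whether the two curves differ significantly, for example with a log-rank test, which is omitted here for brevity.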

Other specialized resources, such as the Brain Atlas, further refine the data by focusing on the complex protein landscape across different regions of the mammalian brain. These distinct, yet integrated, Atlases provide multiple dimensions of context for every mapped human protein.

Applications in Research and Medicine

The Human Protein Atlas serves as a foundational reference for translational science, driving advancements in both research and clinical medicine. One of its most significant applications is the identification of potential biomarkers for diagnostics. By comparing protein expression profiles in healthy and diseased tissues, scientists can pinpoint proteins uniquely altered in a specific condition, such as cancer, providing targets for early detection tests.

The data is also used in the pharmaceutical industry for drug discovery and target validation. Before a drug candidate is developed, its protein target must be validated to ensure it plays a meaningful role in the disease and is accessible to therapeutic intervention. The HPA’s tissue-specific expression data helps researchers assess potential side effects by revealing the target’s presence in other healthy organs, informing drug safety and design.

The resource also contributes to a deeper understanding of molecular disease mechanisms. By providing a spatial map, the HPA can reveal whether a protein is mislocalized or aberrantly expressed in a disease state, offering clues about how a condition develops. For example, researchers have used the RNA-seq data to determine tissue-specific expression of candidate genes, aiding in the identification of genetic mutations associated with diseases like severe heart failure.