Why Scientists Use 16S rRNA to Identify Bacteria

The 16S rRNA gene is used to identify bacteria because it exists in virtually every bacterium on Earth, changes slowly enough to reveal evolutionary relationships, yet contains enough variation to distinguish one species from another. This roughly 1,500 base-pair gene acts as a molecular fingerprint: parts of it are nearly identical across all bacteria (letting scientists detect it with a single set of tools), while other parts differ enough between species to tell them apart. No other gene offers this combination of universality, stability, and variability so reliably.

What the 16S rRNA Gene Actually Does

Every living cell needs ribosomes to build proteins, and in bacteria, the small subunit of the ribosome contains a strand of RNA encoded by the 16S rRNA gene. Because protein synthesis is essential to life, this gene has been under intense evolutionary pressure to stay functional. Mutations that break it are lethal, so the gene changes very slowly over millions of years. That slow rate of change is precisely what makes it useful for classification: it preserves a record of how bacteria are related to one another across deep evolutionary time. Researchers have described 16S rRNA genes as “living fossils” carrying information about the earliest divergences of cellular life.

The gene is about 1,500 nucleotides long and has a distinctive structure. Nine hypervariable regions, labeled V1 through V9, are scattered throughout the gene. These regions accumulate mutations faster than the rest of the sequence, creating species-specific signatures. Flanking each hypervariable region are stretches of highly conserved sequence that barely differ from one bacterium to another. This alternating pattern of sameness and difference is the structural reason the gene works so well as an identification tool.

How Scientists Read the Fingerprint

To identify a bacterium, researchers extract its DNA and use the polymerase chain reaction (PCR) to make millions of copies of the 16S rRNA gene. This is where the conserved regions become critical. Because those stretches are nearly identical across bacteria, scientists can design “universal” primers, short DNA sequences that bind to conserved flanking regions and amplify the variable segments in between. Commonly used primers target positions near the start and end of the gene (positions 27 and 1492 in the Escherichia coli reference numbering, which is where the widely used 27F and 1492R primers get their names), capturing most or all of the nine variable regions in a single reaction. A single primer pair can work on thousands of bacterial species without redesign.
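The way a degenerate "universal" primer pair brackets the gene can be sketched in a few lines of Python. This is a minimal illustration, not a real PCR simulator: the primer sequences are commonly cited versions of the pair named for positions 27 and 1492 (often written 27F and 1492R), the template is an invented toy sequence, and the function names are made up for this example. It also assumes exact matching, whereas real primers tolerate some mismatches.

```python
import re

# IUPAC nucleotide codes mapped to the bases each code can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

# Commonly cited primer sequences (assumption: variants exist in the literature).
PRIMER_27F = "AGAGTTTGATCMTGGCTCAG"
PRIMER_1492R = "TACGGYTACCTTGTTACGACTT"


def reverse_complement(seq):
    """Complement every base (including degenerate codes), then reverse."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a degenerate primer into a regex with one character class per base."""
    return "".join(f"[{IUPAC[base]}]" for base in primer)


def in_silico_pcr(template, forward, reverse):
    """Return the amplicon (primer sites included), or None if a primer has no site."""
    fwd = re.search(primer_regex(forward), template)
    if fwd is None:
        return None
    # The reverse primer binds the opposite strand, so on the template strand
    # its site appears as the reverse complement, downstream of the forward site.
    downstream = template[fwd.end():]
    rev = re.search(primer_regex(reverse_complement(reverse)), downstream)
    if rev is None:
        return None
    return template[fwd.start(): fwd.end() + rev.end()]


# Invented toy template: flank + forward site + short insert + reverse site + flank.
toy_template = (
    "GGCC"
    + "AGAGTTTGATCATGGCTCAG"
    + "ACGTACGTAC"
    + "AAGTCGTAACAAGGTAACCGTA"
    + "TTAA"
)

amplicon = in_silico_pcr(toy_template, PRIMER_27F, PRIMER_1492R)
print(amplicon)
```

The degenerate letters (M in the forward primer, Y in the reverse) are what let one primer pair match slightly different versions of the conserved sites in different bacteria.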

Once amplified, the gene is sequenced and compared against massive reference databases. The SILVA database, one of the largest, contains over 9.4 million small-subunit rRNA sequences, the category that includes 16S. When your unknown bacterium’s sequence closely matches one already in the database, you have a candidate identification. The closer the match, the more confident the classification.
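The matching step can be sketched as a percent-identity comparison. This is a simplified illustration under stated assumptions: the sequences are invented, the taxon names are placeholders, and everything is assumed to be already aligned to the same length. Real pipelines rely on alignment or classification tools rather than position-by-position comparison.

```python
def percent_identity(a, b):
    """Percentage of aligned positions at which two sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_match(query, references):
    """Return (name, identity) for the reference most similar to the query."""
    name = max(references, key=lambda n: percent_identity(query, references[n]))
    return name, percent_identity(query, references[name])


# Invented 20-base toy "database"; real 16S references are about 1,500 bases.
toy_references = {
    "taxon_A": "ACGTACGTACGTACGTACGT",
    "taxon_B": "ACGTACGAACGTACCTACGT",
    "taxon_C": "TTGTACGTACGAACGTACGA",
}
query = "ACGTACGTACGTACGTACGA"

print(best_match(query, toy_references))
```

Here the query differs from taxon_A at one position and from the other two references at three positions each, so taxon_A is reported as the closest match.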

The Discovery That Reshaped Biology

The idea of using ribosomal RNA to classify microorganisms came from Carl Woese, a microbiologist who began this work in the 1970s when most scientists believed a comprehensive family tree of bacteria was impossible. By comparing 16S rRNA sequences across hundreds of organisms, Woese and his colleagues made a startling discovery, first published in late 1977: a group of microbes that looked like bacteria under a microscope were, at the molecular level, as different from bacteria as bacteria are from animals. This group became the Archaea. In 1990, Woese, Otto Kandler, and Mark Wheelis formally proposed three domains of life (Bacteria, Archaea, and Eucarya), replacing the older two-way division of life into prokaryotes and eukaryotes. The 16S rRNA gene was the evidence that made this reclassification possible.

Where 16S Identification Shines in Medicine

In clinical settings, growing bacteria in culture remains the standard method for diagnosing infections. But culture has blind spots. Some bacteria are fastidious, meaning they grow poorly or not at all on standard lab media. Others may be present in very low numbers. And patients who have already received antibiotics before samples are collected often yield cultures with no growth at all, even when an active infection is present.

In these situations, 16S rRNA sequencing fills the gap. Infectious disease specialists now routinely order 16S testing when conventional cultures come back negative, particularly for invasive or difficult-to-diagnose infections where pinpointing the pathogen matters most. In children, for example, targeted PCR of the 16S gene has proven especially useful for detecting Kingella kingae, a fastidious organism that causes septic arthritis and is frequently missed by standard culture. The technology doesn’t replace culture, but it catches what culture misses.

Short Reads vs. Full-Length Sequencing

Not all 16S sequencing is created equal. For years, the dominant approach used short-read platforms that could only sequence small portions of the gene, typically the V3-V4 region (about 400 nucleotides). This is enough to classify bacteria to the genus level in most cases, but it often can’t distinguish between closely related species. When two species share nearly identical V3-V4 sequences but differ in regions V1, V7, or V9, short reads simply miss the difference.
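A toy example makes the blind spot concrete. The two sequences below are invented, and the window coordinates are arbitrary stand-ins for a short-read target such as V3-V4; nothing here reflects real species or real gene positions.

```python
def differing_positions(a, b):
    """Indices at which two equal-length sequences disagree."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Two invented 30-base "genes" that differ only near their ends.
species_x = "AACCGGTTAACCGGTTAACCGGTTAACCGG"
species_y = "AATCGGTTAACCGGTTAACCGGTTAACAGG"

# Hypothetical short-read window covering only the middle of the gene.
short_read_window = slice(10, 20)

full_length_diffs = differing_positions(species_x, species_y)
window_diffs = differing_positions(
    species_x[short_read_window], species_y[short_read_window]
)

print("full length:", full_length_diffs)
print("short window:", window_diffs)
```

Across the full length the two sequences differ at two positions, but inside the window they are identical, so a read restricted to that window cannot separate them.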

Newer long-read sequencing technologies from companies like Oxford Nanopore have changed this. These platforms can sequence the full V1-V9 region (all of the roughly 1,500 nucleotides) in a single pass, providing enough information to identify bacteria at the species level far more consistently. In one colorectal cancer study, nanopore sequencing of the full 16S gene identified specific bacterial biomarkers, including Fusobacterium nucleatum and Parvimonas micra, that short-read sequencing could not resolve. The entry cost for nanopore sequencers is also significantly lower than for most short-read instruments, making full-length 16S sequencing accessible to labs with smaller budgets.

Known Limitations

The 16S rRNA gene is powerful, but it has real weaknesses. The most significant is copy number variation. Many bacteria carry more than one copy of the 16S gene. Some carry only one or two, while others carry a dozen or more. Species in the phyla Firmicutes and Proteobacteria average roughly 6.8 and 5.1 copies per genome, respectively. These copies aren’t always identical within the same organism, and during PCR amplification, whichever variant is present in higher numbers gets preferentially amplified. In microbiome studies that try to estimate how abundant each species is in a sample, this skews the results: bacteria with more gene copies appear more common than they really are.
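The arithmetic behind this bias, and one simple way to adjust for it, can be sketched as follows. The taxa, read counts, and copy numbers are invented for illustration, and the sketch assumes that copy numbers are known exactly and that every copy amplifies equally well, which is rarely true in practice.

```python
def correct_for_copy_number(read_counts, copy_numbers):
    """Divide each taxon's reads by its 16S copy number, then renormalize to fractions."""
    adjusted = {
        taxon: read_counts[taxon] / copy_numbers[taxon] for taxon in read_counts
    }
    total = sum(adjusted.values())
    return {taxon: value / total for taxon, value in adjusted.items()}


# Invented example: two taxa present as equal numbers of cells.
read_counts = {"taxon_A": 700, "taxon_B": 300}   # 16S reads observed
copy_numbers = {"taxon_A": 7, "taxon_B": 3}      # 16S copies per genome

raw_total = sum(read_counts.values())
raw_fractions = {taxon: n / raw_total for taxon, n in read_counts.items()}
corrected = correct_for_copy_number(read_counts, copy_numbers)

print("raw:", raw_fractions)
print("corrected:", corrected)
```

In the raw counts taxon_A looks more than twice as abundant as taxon_B purely because it carries more gene copies; after dividing by copy number the two come out equal.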

Resolution is another issue. Some genuinely distinct species share 16S sequences that are nearly or completely identical. The Bacillus cereus group, for instance, includes species with very different clinical significance but almost indistinguishable 16S genes. In these cases, whole-genome sequencing or other gene targets are needed to tell them apart. Horizontal gene transfer can also complicate things. In some bacterial lineages, 16S genes get swapped between related species, creating an apparent lack of diversification at the 16S level that doesn’t reflect the true diversity of those organisms’ genomes.

Why It Remains the Standard

Despite these limitations, no single gene has displaced 16S rRNA for routine bacterial identification. The practical reasons are hard to argue with: universal primers work across nearly all bacteria, the gene is long enough to carry meaningful taxonomic information, reference databases contain millions of sequences for comparison, and decades of published literature use 16S as the common framework. When researchers discover a new bacterial species, the 16S sequence is still typically the first piece of molecular evidence they submit. It isn’t perfect for every question, but for the fundamental task of figuring out what bacterium you’re looking at, it remains the most accessible, most standardized, and most widely supported tool available.