What Is 16S rRNA Sequencing and How Does It Work?

16S rRNA sequencing is a method for identifying bacteria by reading a specific gene that all bacteria share. Instead of trying to grow bacteria in a lab (which fails for a large percentage of species), this technique extracts DNA directly from a sample, amplifies a target gene, and compares the results against known reference databases. It’s the gold standard for profiling bacterial communities in environments ranging from soil to the human gut.

Why the 16S Gene Works as a Bacterial ID

Every bacterium carries at least one copy of the 16S rRNA gene, which encodes a structural component of the ribosome (the cellular machinery that builds proteins). This gene is roughly 1,500 base pairs long and contains a useful quirk: it has nine hypervariable regions, labeled V1 through V9, sandwiched between conserved regions that are nearly identical across all bacteria.

The conserved regions act as universal landing pads. Researchers design primers that bind to these shared sequences, which means a single set of primers can grab the 16S gene from virtually any bacterium in a sample. The hypervariable regions, by contrast, differ enough between bacterial groups to serve as fingerprints. The V1 region, for example, can distinguish pathogenic Streptococcus species and separate Staphylococcus aureus from other staphylococci. The V3-V4 regions are popular for general surveys because they balance broad detection with the ability to tell closely related bacteria apart.
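
The landing-pad idea can be illustrated with a small in silico sketch in Python. It searches a sequence for a forward primer and for the reverse complement of a reverse primer, then returns the stretch between them. The primers and templates here are toy sequences invented for illustration (not real 16S primers), and the function names are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes,
# so degenerate primer positions (e.g. N, Y) match several bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Complement every base (including IUPAC codes) and reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_to_regex(primer):
    """Translate a primer with IUPAC codes into a regular expression."""
    return "".join(IUPAC[base] for base in primer)


def extract_amplicon(template, fwd_primer, rev_primer):
    """Return the sequence between the two primer sites, or None if either is missing."""
    fwd = re.search(primer_to_regex(fwd_primer), template)
    if fwd is None:
        return None
    downstream = template[fwd.end():]
    rev = re.search(primer_to_regex(reverse_complement(rev_primer)), downstream)
    if rev is None:
        return None
    return downstream[:rev.start()]


# Toy templates: identical "conserved" flanks apart from degenerate positions,
# with a different "hypervariable" stretch in the middle.
species_a = "GG" + "ACGTAG" + "CCCTTTAAA" + "GTGCAA" + "TT"
species_b = "GG" + "ACGTCG" + "CCGTTAAAA" + "ATGCAA" + "TT"

region_a = extract_amplicon(species_a, "ACGTNG", "TTGCAY")  # "CCCTTTAAA"
region_b = extract_amplicon(species_b, "ACGTNG", "TTGCAY")  # "CCGTTAAAA"
```

The same primer pair recovers a region from both templates, yet the recovered regions differ, which is the property that lets one primer set profile many bacteria at once.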

How the Process Works

The workflow moves through three main phases: wet lab preparation, sequencing, and computational analysis.

In the wet lab, DNA is extracted from whatever sample you’re working with, whether that’s a stool sample, a soil core, or a wound swab. Then a two-step PCR (polymerase chain reaction) process amplifies the target hypervariable regions. The first round of PCR copies the 16S gene segment. The second round attaches short molecular tags called adapters and barcodes, which let the sequencer read multiple samples in a single run. The result is a “library” of DNA fragments ready for sequencing.
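
What the barcodes make possible can be sketched in a few lines of Python. This example assumes, purely for illustration, that each read begins with a fixed-length inline barcode and that only exact matches count. The barcode sequences, sample names, and the demultiplex function are hypothetical, and on real Illumina runs barcodes are usually read separately and sorted by the instrument's own software.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len):
    """Group reads by sample using the barcode at the start of each read.

    Returns (reads_by_sample, unassigned). The barcode is stripped from
    every assigned read. Exact matches only; no error correction.
    """
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read)
        else:
            by_sample[sample].append(insert)
    return dict(by_sample), unassigned


# Hypothetical barcodes and pooled reads from a single run.
barcodes = {"ACGT": "stool_01", "TGCA": "soil_01"}
pooled_reads = ["ACGTGGGCCC", "TGCAGGGTTT", "ACGTGGGAAA", "NNNNGGGCCC"]

by_sample, unassigned = demultiplex(pooled_reads, barcodes, barcode_len=4)
# by_sample  -> {"stool_01": ["GGGCCC", "GGGAAA"], "soil_01": ["GGGTTT"]}
# unassigned -> ["NNNNGGGCCC"]
```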

Most studies today use Illumina platforms, which produce short reads of 200 to 500 base pairs. That’s enough to cover two or three hypervariable regions at a time. Newer long-read technologies from PacBio and Oxford Nanopore can capture the full 1,200 to 1,650 base-pair gene in a single read, giving finer resolution for distinguishing closely related species.

Turning Raw Data Into Bacterial Names

Sequencing generates millions of short DNA reads that need to be cleaned, sorted, and matched to known bacteria. Several software platforms handle this, with QIIME 2, mothur, and DADA2 (available through the Bioconductor platform) being the most widely used. The first steps involve trimming off primer sequences and filtering out low-quality reads.
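
The trimming and filtering step can be sketched roughly as follows. This Python example assumes Phred+33 quality encoding, a primer of known fixed length at the start of each read, and a filter based on the expected number of errors in the read. The function names and the max_ee threshold of 2.0 are illustrative choices, not the settings of any particular pipeline.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def expected_errors(quality_string):
    """Sum the per-base error probabilities, where P = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores(quality_string))


def trim_and_filter(seq, qual, primer_len, max_ee=2.0):
    """Drop the first primer_len bases, then keep the read only if its
    expected number of errors is at most max_ee.

    Returns (trimmed_seq, trimmed_qual), or None if the read is rejected.
    """
    seq, qual = seq[primer_len:], qual[primer_len:]
    if not seq or expected_errors(qual) > max_ee:
        return None
    return seq, qual


# "I" encodes Q40 (error probability 0.0001); "#" encodes Q2 (about 0.63).
good = trim_and_filter("ACGTACGTAC", "IIIIIIIIII", primer_len=4)
bad = trim_and_filter("ACGTACGTAC", "IIII######", primer_len=4)
# good -> ("ACGTAC", "IIIIII"); bad -> None (about 3.8 expected errors)
```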

From there, the software groups similar sequences together. Older approaches cluster sequences that are within 3% of each other into bins called operational taxonomic units, or OTUs. Newer methods like DADA2 and Deblur resolve sequences down to single-nucleotide differences, producing amplicon sequence variants (ASVs). ASVs are more precise because they can distinguish bacteria that differ by even one base pair, rather than lumping them into a 97%-similarity bin.
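
The difference between the two approaches shows up even on toy data. The Python sketch below assumes equal-length, already-aligned sequences and uses a simple greedy clustering rule, while the "ASV-like" side only counts exact sequences. Real tools are considerably more involved. In particular, denoisers such as DADA2 and Deblur model sequencing errors rather than just counting distinct reads. All names and sequences here are made up for illustration.

```python
from collections import Counter


def identity(a, b):
    """Fraction of matching positions; assumes equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Assign each sequence to the first centroid it matches at >= threshold;
    otherwise it becomes a new centroid. Returns {centroid: [members]}."""
    otus = {}
    for seq in seqs:
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid].append(seq)
                break
        else:
            otus[seq] = [seq]
    return otus


base = "ACGT" * 25                 # 100-base toy sequence
one_off = "T" + base[1:]           # differs at 1 of 100 positions (99% identical)
distant = "TTTTTTTT" + base[8:]    # differs at 6 of 100 positions (94% identical)
reads = [base, base, one_off, distant]

otus = greedy_otus(reads)          # 2 OTUs: one_off is absorbed into base's bin
variants = Counter(reads)          # 3 distinct sequences: one_off stays separate
```

The single-base variant disappears into the 97% bin but survives as its own exact sequence, which is the resolution gain the ASV approach offers.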

Once sequences are grouped, the software compares them against reference databases to assign taxonomic names. The major databases include SILVA (version 138, with over 436,000 entries), Greengenes (about 203,000 entries), and the Ribosomal Database Project (RDP). A newer combined database called GSR-DB merges all three for broader coverage. The Genome Taxonomy Database (GTDB) is another option that uses genome-based taxonomy rather than traditional naming conventions.
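
As a rough illustration of what "comparing against a reference database" means, the Python sketch below scores a query by the fraction of its short subsequences (k-mers) found in each reference and reports the best match. This is a deliberately simplified stand-in, not the algorithm used by any of the tools or databases named above. The reference sequences, labels, k value, and score cutoff are all invented for the example.

```python
def kmers(seq, k=4):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_taxonomy(query, references, k=4, min_score=0.5):
    """Return (label, score) for the reference sharing the largest fraction
    of the query's k-mers, or ("Unassigned", score) if below min_score."""
    query_kmers = kmers(query, k)
    if not query_kmers:            # query shorter than k
        return "Unassigned", 0.0
    best_label, best_score = "Unassigned", 0.0
    for label, ref_seq in references.items():
        score = len(query_kmers & kmers(ref_seq, k)) / len(query_kmers)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_score:
        return "Unassigned", best_score
    return best_label, best_score


# Hypothetical two-entry "database" of toy sequences.
toy_references = {
    "Genus_A": "ACGTACGTTTGACCA",
    "Genus_B": "GGGCCCAAATTTGGG",
}

hit = assign_taxonomy("ACGTACGTTTGACC", toy_references)   # ("Genus_A", 1.0)
miss = assign_taxonomy("CACACACACACA", toy_references)    # ("Unassigned", 0.0)
```

The cutoff matters in practice: a sequence with no close relative in the database should be reported as unassigned rather than forced onto the nearest name.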

Where 16S Sequencing Is Used

The Human Microbiome Project used 16S sequencing to characterize over 5,100 bacterial communities from 242 adults, mapping the microbes that live in the airways, skin, mouth, gut, and vaginal tract. The gut microbiome alone, dominated by two major bacterial groups called Bacteroidetes and Firmicutes, has become one of the most studied ecosystems in biology, and 16S sequencing is the primary tool that made that research possible.

In clinical settings, 16S sequencing shines when standard culture methods fail. A large percentage of the bacteria in the mouth and lungs simply won’t grow in a lab dish. In one study, 16S sequencing was used to evaluate 172 patients whose bacteria couldn’t be identified through standard lab methods, which improved identification and informed treatment decisions. Another study of mechanically ventilated patients used long-read sequencing covering roughly 95% of the 16S gene and found clinically important bacterial groups in airway samples that traditional cultures missed entirely.

Beyond human health, the technique is used in environmental science to survey microbial communities in oceans, soils, wastewater treatment plants, and food production facilities.

Known Limitations

The biggest limitation is taxonomic resolution. 16S sequencing reliably identifies bacteria to the genus level but often struggles to distinguish between closely related species. Escherichia coli and Shigella, for instance, are so genetically similar in their 16S genes that the method can’t reliably separate them. When species-level identification matters, researchers need to sequence additional genes or move to whole-genome approaches.

Copy number variation also complicates things. Different bacterial species carry different numbers of 16S gene copies in their genomes, ranging from 1 to more than 15. A bacterium with 10 copies will produce more sequences than one with a single copy, even if both are equally abundant in the sample. This can skew estimates of how much of each bacterium is actually present.
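
The arithmetic behind this bias, and one simple way to correct for it, can be sketched in Python. The example assumes the copy number of each taxon is known, which in practice has to be looked up or estimated and is often uncertain. The taxon names, counts, and function name are hypothetical.

```python
def correct_for_copy_number(read_counts, copy_numbers):
    """Divide each taxon's read count by its 16S copy number, then
    renormalize so the corrected abundances sum to 1."""
    adjusted = {taxon: read_counts[taxon] / copy_numbers[taxon]
                for taxon in read_counts}
    total = sum(adjusted.values())
    return {taxon: value / total for taxon, value in adjusted.items()}


# Two equally abundant taxa: A carries 10 copies of the gene, B carries 1,
# so A contributes ten times as many reads.
read_counts = {"taxon_A": 1000, "taxon_B": 100}
copy_numbers = {"taxon_A": 10, "taxon_B": 1}

raw_total = sum(read_counts.values())
raw = {taxon: n / raw_total for taxon, n in read_counts.items()}
corrected = correct_for_copy_number(read_counts, copy_numbers)
# raw       -> taxon_A about 0.91, taxon_B about 0.09
# corrected -> {"taxon_A": 0.5, "taxon_B": 0.5}
```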

Additionally, the choice of which hypervariable regions to sequence introduces bias. No single region captures every bacterial group equally well. The V4, V5, and V6 regions are the most conserved (meaning they vary less between species), which makes them useful for broad surveys but less helpful for telling similar bacteria apart. The V2 and V3 regions show more variation and can be better for distinguishing specific pathogens, but they may miss other groups.

16S Sequencing vs. Shotgun Metagenomics

Shotgun metagenomics takes a fundamentally different approach: instead of targeting one gene, it sequences all the DNA in a sample. This captures bacterial, viral, and fungal genomes simultaneously, along with their functional genes. In a direct comparison, shotgun metagenomics identified bacteria to the species level in 82.4% of clinical samples, compared with just 38.2% for 16S Sanger sequencing. In 39% of samples, it resolved the species where 16S could reach only the genus level.

Shotgun metagenomics can also detect antibiotic resistance genes and virulence factors, something 16S sequencing simply cannot do because it only looks at one structural gene. The tradeoff is cost: shotgun metagenomics costs about three times as much per sample, and it demands significantly more computational resources to analyze the data. For studies that need to profile bacterial community composition across hundreds or thousands of samples, 16S sequencing remains the more practical and affordable choice.