What Is Third Generation Sequencing?

DNA sequencing determines the exact order of the four nucleotide bases (adenine, guanine, cytosine, and thymine) that make up a DNA molecule. The field began with first-generation sequencing, developed by Frederick Sanger, which reads one DNA fragment at a time. It was followed by second-generation technologies, often called Next-Generation Sequencing (NGS), which introduced massive parallelization, lowering costs and increasing throughput. Third Generation Sequencing (TGS) is the latest advance and focuses on reading single DNA molecules in real time. This approach removes the amplification step that earlier methods required before sequencing and yields much longer contiguous reads of the genetic code.

The Defining Feature: Single-Molecule Real-Time Analysis

The fundamental innovation of third-generation sequencing is its ability to analyze a single strand of DNA as it moves through the detection apparatus. Second-generation methods rely on generating many identical copies of each DNA fragment, typically through Polymerase Chain Reaction (PCR), so that the combined signal is strong enough to detect. The single-molecule approach can bypass this amplification step, streamlining the workflow and avoiding the bias that PCR can introduce.

This direct, real-time observation generates “long reads,” which are contiguous stretches of sequenced DNA that can extend for tens or even hundreds of thousands of base pairs. By contrast, second-generation sequencing produces “short reads,” which are segments only a few hundred bases long. Long reads simplify the task of reconstructing the whole genome by providing continuous data across large regions.

Because reads are produced in real time, data can be analyzed while a run is still in progress. However, the single-molecule nature initially resulted in a higher raw error rate than the consensus-based accuracy of short-read technologies. Errors that occur at random can be corrected by sequencing the same molecule multiple times and taking a consensus, and specialized computational algorithms reduce those that remain. For many applications, the ability to span large, complex genomic regions in one continuous read outweighs the lower raw base-level accuracy.
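The idea of correcting random errors by reading the same molecule several times can be sketched as a per-position majority vote. This is a minimal illustration, not any vendor's algorithm: it assumes the passes are already aligned, have equal length, and contain only substitution errors, and the function name `majority_consensus` and the example reads are made up for this sketch.

```python
from collections import Counter

def majority_consensus(passes):
    """Return the most common base at each position across aligned passes.

    Assumption: all passes are pre-aligned and equal in length, so only
    substitution errors are handled. Ties go to the base seen first.
    """
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("this sketch assumes equal-length, pre-aligned passes")
    consensus = []
    for column in zip(*passes):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)

# Five hypothetical passes over the same molecule, each with at most one error.
passes = ["ACGTTGCA", "ACGTAGCA", "ACCTTGCA", "ACGTTGCA", "TCGTTGCA"]
print(majority_consensus(passes))
```

Because each error appears in only one pass at a given position, the other passes outvote it. Errors that recur at the same position in every pass would survive this kind of vote.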

Technological Implementations: Nanopore and SMRT Sequencing

Third-generation sequencing is dominated by two platforms: Single Molecule Real-Time (SMRT) sequencing from Pacific Biosciences (PacBio) and Nanopore sequencing from Oxford Nanopore Technologies. Both achieve single-molecule analysis but employ fundamentally different physical mechanisms.

Single Molecule Real-Time (SMRT) Sequencing

PacBio’s SMRT sequencing takes place in a tiny, illuminated well known as a Zero-Mode Waveguide (ZMW). A single DNA polymerase enzyme is immobilized at the base of the ZMW, where it incorporates fluorescently labeled nucleotides into a new strand complementary to the template DNA. Each incorporation produces a brief pulse of light whose color identifies the base, and a sensor records the pulses in order. The timing between pulses reflects the natural kinetics of the polymerase and is recorded as well, because it can reveal chemical modifications on the template. Consensus accuracy is generally high, especially with Circular Consensus Sequencing (CCS), in which the polymerase reads the same circularized molecule repeatedly and the passes are combined.
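As a rough sketch of the signal described above, a ZMW recording can be modeled as a list of light pulses, each with a color channel, a start time, and a duration. The `Pulse` type, the channel-to-base mapping, and the timing values below are all hypothetical; real instruments use their own dye assignments and signal processing.

```python
from dataclasses import dataclass

# Hypothetical channel-to-base mapping; real dye assignments are instrument-specific.
CHANNEL_TO_BASE = {0: "A", 1: "C", 2: "G", 3: "T"}

@dataclass
class Pulse:
    channel: int     # which fluorescence color was detected
    start: float     # seconds since the start of the recording
    duration: float  # how long the pulse lasted, in seconds

def call_bases(pulses):
    """Translate pulses, in time order, into the bases of the newly built strand."""
    ordered = sorted(pulses, key=lambda p: p.start)
    return "".join(CHANNEL_TO_BASE[p.channel] for p in ordered)

def interpulse_durations(pulses):
    """Return the gap between the end of each pulse and the start of the next."""
    ordered = sorted(pulses, key=lambda p: p.start)
    return [round(nxt.start - (cur.start + cur.duration), 6)
            for cur, nxt in zip(ordered, ordered[1:])]

# Invented example recording: four incorporations, the last after a long pause.
pulses = [Pulse(2, 0.00, 0.10), Pulse(0, 0.25, 0.10),
          Pulse(3, 0.50, 0.10), Pulse(1, 1.40, 0.10)]
print(call_bases(pulses))
print(interpulse_durations(pulses))
```

In this sketch the color alone determines the base call, while the gaps between pulses are kept as a separate kinetic trace.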

Nanopore Sequencing

Nanopore sequencing uses a protein or synthetic pore embedded within an electrically resistant membrane. A voltage is applied across this membrane, creating an ionic current that flows through the pore. As a DNA strand passes through this narrow aperture, the bases inside the pore disrupt the current in a sequence-specific manner. These changes are measured in real time and decoded by software algorithms to determine the base sequence. This method can read both DNA and RNA directly, without first converting RNA into DNA, which streamlines transcriptomics studies. Nanopore technology is recognized for achieving the longest reads and for offering portability in handheld devices.
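To give a feel for what "sequence-specific" means here, the toy model below assumes that three bases sit in the pore at once and that each contributes to the current according to its position. Every number in it (the baseline, the per-base values, the position weights, and the choice of three bases) is invented for illustration; production basecallers decode real signals with trained statistical models rather than a lookup like this.

```python
# Toy values only: none of these numbers come from a real pore.
BASE_VALUE = {"A": 1, "C": 2, "G": 3, "T": 4}
POSITION_WEIGHT = (2, 5, 3)  # the middle base is assumed to matter most
BASELINE = 60                # arbitrary current offset, in made-up units
K = len(POSITION_WEIGHT)     # assumed number of bases in the pore at once

def kmer_level(kmer):
    """Return the toy current level for the K bases currently in the pore."""
    if len(kmer) != K:
        raise ValueError(f"expected a {K}-mer")
    return BASELINE + sum(w * BASE_VALUE[b] for w, b in zip(POSITION_WEIGHT, kmer))

def expected_signal(sequence):
    """Slide the strand through the pore one base at a time."""
    return [kmer_level(sequence[i:i + K]) for i in range(len(sequence) - K + 1)]

print(expected_signal("ACGTA"))
# The same middle base gives different levels in different contexts:
print(kmer_level("ACA"), kmer_level("TCT"))
```

The point of the sketch is that each current level depends on a short window of neighboring bases rather than on one base alone, which is why decoding the signal is a job for software rather than a simple threshold.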

Why Long Reads Matter: Solving Complex Genomic Puzzles

The extended length of third-generation reads lets researchers tackle genomic problems that short-read technologies could not resolve.

Structural Variation (SV)

One significant application is the detection and characterization of Structural Variation (SV), which involves large-scale changes in DNA structure such as insertions, deletions, inversions, or translocations. Short-read technologies struggle to map these changes because the variant is often longer than the read itself, which leaves its genomic location ambiguous. Long reads can often span these large structural changes entirely, providing clear evidence of the variation’s exact size and position within the chromosome.
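When a long read spans an insertion or deletion, the event shows up directly in the read's alignment. The sketch below scans a CIGAR string (the alignment summary used in the SAM format) for large insertions and deletions. The function name, the example alignment, and the 50-base cutoff are assumptions chosen for illustration; dedicated SV callers combine evidence from many reads and handle inversions and translocations as well.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_REFERENCE = set("MDN=X")  # operations that advance the reference position

def large_indels(ref_start, cigar, min_size=50):
    """List (kind, reference_position, length) for large indels in one alignment.

    min_size=50 is an assumed cutoff for calling an event "structural".
    """
    calls = []
    ref_pos = ref_start
    for length_text, op in CIGAR_OP.findall(cigar):
        length = int(length_text)
        if op in ("I", "D") and length >= min_size:
            kind = "insertion" if op == "I" else "deletion"
            calls.append((kind, ref_pos, length))
        if op in CONSUMES_REFERENCE:
            ref_pos += length
    return calls

# Hypothetical long-read alignment starting at reference position 10000.
print(large_indels(10000, "5000M320D4000M2I3000M150I1000M"))
```

The small 2-base insertion is ignored, while the 320-base deletion and 150-base insertion are reported with their reference coordinates, because the single read covers both events along with their flanking sequence.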

De Novo Assembly

Long reads are also central to generating a high-quality de novo assembly, which is the process of piecing together a genome sequence from the reads alone, without a reference genome to guide it. Genomes contain many highly repetitive regions, such as those found in centromeres and telomeres, which are difficult for short reads to navigate, resulting in many gaps or misassemblies. By reading through these long, repetitive sequences in a single pass, long reads act as a bridge, dramatically reducing the number of gaps and producing a more complete, contiguous representation of the genome.
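The bridging argument can be made concrete with a small calculation: a read resolves a repeat only if it covers the whole repeat plus some unique sequence on both sides. The coordinates, read lengths, and the 100-base anchor requirement below are invented for illustration, and the function names are hypothetical.

```python
def spans_repeat(read_start, read_length, repeat_start, repeat_end, min_anchor=100):
    """True if the read covers the repeat plus min_anchor unique bases on each side.

    Coordinates are half-open: the repeat occupies [repeat_start, repeat_end).
    """
    read_end = read_start + read_length
    return (read_start <= repeat_start - min_anchor
            and read_end >= repeat_end + min_anchor)

def count_spanning(read_starts, read_length, repeat_start, repeat_end, min_anchor=100):
    """Count how many reads of a given length bridge the repeat."""
    return sum(spans_repeat(s, read_length, repeat_start, repeat_end, min_anchor)
               for s in read_starts)

# Hypothetical 8,000-base repeat, with reads starting every 1,000 bases.
starts = range(0, 20001, 1000)
print(count_spanning(starts, 250, 10000, 18000))    # short reads
print(count_spanning(starts, 15000, 10000, 18000))  # long reads
```

With these made-up numbers no 250-base read can bridge the repeat regardless of where it starts, while several 15,000-base reads do, which is the situation that lets an assembler connect the unique sequence on either side.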

Epigenetic and Base Modifications

TGS platforms can also detect epigenetic and other base modifications directly on the DNA molecule. Chemical modifications to the bases, such as cytosine methylation, are important mechanisms for regulating gene expression and are often altered in disease states. Because the sequencing process monitors the physical or kinetic properties of the molecule, it can register the presence of these modifications without the specialized chemical conversion steps required by older methods, simplifying the study of these important regulatory layers.
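For the kinetic case, one simple way to picture the detection is to compare how long the polymerase pauses at each position in a native sample against an unmodified control, and to flag positions where the pause is much longer. The sketch below does this with a plain ratio of mean pause times. The pause values are invented, the function names are hypothetical, and the cutoff of 2.0 is an arbitrary assumption; real analysis tools use statistical models rather than a fixed threshold.

```python
from statistics import mean

def pause_ratios(native_pauses, control_pauses):
    """Per-position ratio of mean pause time in native DNA versus a control."""
    return [mean(native) / mean(control)
            for native, control in zip(native_pauses, control_pauses)]

def flag_modified(ratios, threshold=2.0):
    """Return positions whose ratio meets an assumed, arbitrary threshold."""
    return [i for i, ratio in enumerate(ratios) if ratio >= threshold]

# Invented pause times (seconds) at three template positions, two reads each.
native = [[0.2, 0.2], [0.9, 1.1], [0.3, 0.3]]
control = [[0.2, 0.2], [0.25, 0.25], [0.3, 0.3]]

ratios = pause_ratios(native, control)
print(flag_modified(ratios))
```

Only the middle position, where the native molecule shows a pause several times longer than the control, is flagged as a candidate modification.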