What Is Resequencing and How Does It Work?

Resequencing is a method used in modern genetics to determine the DNA sequence of an individual organism after a reference genome for its species has already been assembled. It examines the individual’s genome to show how their DNA sequence differs from that standard representation of the species. By focusing on these differences, scientists gain insight into the biological variation that affects traits, disease susceptibility, and evolutionary relationships. Resequencing provides a high-resolution view of an organism’s inherited DNA and has become a standard approach for studying human variation and the genetics of many other well-studied organisms.

Resequencing Versus Initial Genome Mapping

The fundamental distinction in genome analysis is between de novo sequencing and resequencing, and it turns on whether a reference sequence is available. De novo sequencing is the initial, large-scale effort to sequence a species’ genome for the first time, as the Human Genome Project did for humans. This process involves assembling millions of DNA fragments into a cohesive sequence without any template for guidance. The resulting sequence, a comprehensive representation of the species’ genome, is known as the reference genome.

Resequencing, in contrast, involves sequencing an individual’s genome and then computationally aligning the resulting short DNA reads against that established reference genome. Because the framework already exists, the task shifts from assembly to mapping: identifying where each new fragment fits onto the known sequence. This reliance on a template simplifies the computational analysis significantly, making resequencing faster, more cost-effective, and more scalable than de novo efforts. The technique is used to study genetic variation within an already-sequenced species, for example when sequencing the genomes of thousands of people.
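To make the idea of mapping onto a known sequence concrete, here is a minimal Python sketch. It assumes exact matches only and uses a made-up toy reference; the function names and the seed length k are illustrative choices, not part of any real alignment tool. The reference is indexed once, and each read is then located by a quick lookup rather than by comparing every fragment with every other fragment.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-length substring of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def locate_read(read, reference, index, k):
    """Return reference positions where the read matches exactly.

    The first k bases of the read act as a seed; each seed hit is then
    verified against the full read.
    """
    hits = []
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


# Toy reference and reads (made-up sequences, not real genomic data).
reference = "ACGTTGCAAGGCTTACGATC"
k = 4
index = build_kmer_index(reference, k)

for read in ["TGCAAGG", "CTTACGA", "GGGGGGG"]:
    print(read, locate_read(read, reference, index, k))
```

The first two reads are found at positions 4 and 11, and the third has no match. Production aligners also tolerate mismatches and gaps, which this sketch deliberately leaves out.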

How the Resequencing Process Works

The resequencing process begins with the physical preparation of the DNA sample: extracting and purifying the genetic material. The purified DNA is then randomly fragmented into millions of small pieces, because the sequencing instruments typically used read only short segments. Fragmentation is followed by library preparation, in which synthetic sequences called adaptors are attached to the ends of the DNA fragments. These adaptors allow the fragments to bind to the sequencing platform and provide binding sites for the primers that start the subsequent DNA synthesis reactions.
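The fragmentation and adaptor steps can be imitated on a toy sequence. The sketch below is a simplification under stated assumptions: it cuts a single random "genome" into consecutive pieces, whereas a real sample contains many genome copies broken at different points, and the two adaptor sequences are hypothetical placeholders rather than any vendor's actual adaptors.

```python
import random

# Hypothetical adaptor sequences, chosen for illustration only.
ADAPTOR_5 = "AATGATACGG"
ADAPTOR_3 = "ATCTCGTATG"


def fragment(dna, min_len, max_len, rng):
    """Cut dna into consecutive pieces of random length in [min_len, max_len].

    The final piece may be shorter than min_len.
    """
    pieces = []
    start = 0
    while start < len(dna):
        size = rng.randint(min_len, max_len)
        pieces.append(dna[start:start + size])
        start += size
    return pieces


def add_adaptors(piece):
    """Attach the placeholder adaptors to both ends of a fragment."""
    return ADAPTOR_5 + piece + ADAPTOR_3


rng = random.Random(0)  # fixed seed so the toy run is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(200))
fragments = fragment(genome, 20, 40, rng)
library = [add_adaptors(f) for f in fragments]

print(len(fragments), "fragments; first library molecule:", library[0])
```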

Sequencing by Synthesis

The prepared library is loaded onto the sequencer, which reads the bases in a process called sequencing by synthesis. A DNA polymerase enzyme synthesizes a complementary strand for each fragment, incorporating fluorescently labeled nucleotides one by one. A camera captures the color signal emitted by each incorporated base, reading the sequence of A, T, C, and G for millions of fragments simultaneously. This step generates a massive amount of raw data consisting of millions of short sequence reads.
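The cycle-by-cycle logic described above can be sketched as a small simulation. Everything here is a toy model: the color assigned to each base is a hypothetical mapping (real instruments use their own dye chemistries), strand orientation is ignored, and no signal noise is modeled.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hypothetical dye colors; real instruments use their own chemistries.
DYE_COLOR = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_COLOR = {color: base for base, color in DYE_COLOR.items()}


def sequence_by_synthesis(template, cycles):
    """Simulate reading a template one incorporated base per cycle.

    In each cycle the 'polymerase' adds the nucleotide complementary to the
    next template base and the 'camera' records its color. The recorded
    colors are then decoded back into base calls.
    """
    colors = []
    for base in template[:cycles]:
        incorporated = COMPLEMENT[base]
        colors.append(DYE_COLOR[incorporated])
    read = "".join(BASE_FOR_COLOR[c] for c in colors)
    return colors, read


colors, read = sequence_by_synthesis("ATGCCGTA", cycles=6)
print(colors)
print(read)
```

With six cycles, the toy template yields the read TACGGC, the complement of its first six bases.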

Bioinformatics Analysis

The final stage of resequencing is the bioinformatics analysis, in which the raw short reads are computationally mapped to the reference genome. Specialized algorithms align the reads to the known sequence, determining the most likely location of each fragment. Because the goal is to find differences, the software then identifies positions where the individual’s reads consistently differ from the reference sequence, a step known as variant calling. Together, alignment and variant calling turn millions of short reads into a comprehensive map of how an individual’s genome differs from the reference.
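As a rough illustration of aligning a read and then reporting where it disagrees with the reference, the sketch below tries every ungapped placement and keeps the one with the fewest mismatches. This brute-force approach is an assumption made for clarity; real aligners use indexed, gap-aware algorithms, and real variant detection weighs evidence from many overlapping reads rather than a single one. The sequences and function names are made up for the example.

```python
def best_alignment(read, reference):
    """Slide the read along the reference and return (position, mismatches)
    for the ungapped placement with the fewest mismatching bases."""
    best_pos, best_mismatches = None, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos, best_mismatches


def differences(read, reference, pos):
    """List (reference_position, reference_base, read_base) per mismatch."""
    return [
        (pos + i, reference[pos + i], base)
        for i, base in enumerate(read)
        if reference[pos + i] != base
    ]


reference = "ACGTTGCAAGGCTTACGATC"
read = "TGCATGGCT"  # toy read that differs from the reference at one base

pos, n_mismatches = best_alignment(read, reference)
print(pos, n_mismatches, differences(read, reference, pos))
```

Here the read is placed at position 4 with a single mismatch: the reference has A at position 8 where the read has T.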

Cataloging Genetic Differences

Small-Scale Variations

Once the individual’s DNA fragments have been aligned to the reference genome, the objective is to catalog the specific types of genetic variations present. The most common type of variation identified is the Single Nucleotide Polymorphism (SNP), which is a change in a single base pair at a specific location. Resequencing also detects small insertions and deletions, referred to as indels, where anywhere from one to a few dozen base pairs are added to or removed from the sequence. These small-scale variants are responsible for a significant portion of individual genetic differences.
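One simple way to picture this cataloging step is to label each variant by comparing the lengths of the reference allele and the observed (alternate) allele. The sketch below does only that. The function name, the label strings, and the default 50 base pair cutoff between small indels and larger structural changes are assumptions made for illustration.

```python
def classify_variant(ref_allele, alt_allele, sv_threshold=50):
    """Label a variant by comparing reference and alternate allele lengths.

    sv_threshold is the size in base pairs at or above which a length
    change is treated as structural rather than as a small indel.
    """
    if len(ref_allele) == 1 and len(alt_allele) == 1:
        return "SNP" if ref_allele != alt_allele else "no change"
    size_change = abs(len(alt_allele) - len(ref_allele))
    if size_change == 0:
        return "multi-base substitution"
    kind = "insertion" if len(alt_allele) > len(ref_allele) else "deletion"
    if size_change >= sv_threshold:
        return "structural " + kind
    return "small " + kind + " (indel)"


# Toy alleles, not taken from any real dataset.
examples = [("A", "G"), ("A", "ATT"), ("ACGT", "A"), ("A", "A" + "T" * 60)]
for ref, alt in examples:
    print(ref[:10], "->", alt[:10], ":", classify_variant(ref, alt))
```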

Structural Variations

Researchers also use resequencing data to identify larger alterations known as structural variations, which involve changes of fifty base pairs or more. These include large deletions, inversions, and translocations, in which segments of the DNA have been lost or rearranged. A particularly important type is the Copy Number Variation (CNV), which occurs when a segment of the genome is present in a different number of copies than the two typical of a diploid genome. CNVs can involve thousands of base pairs and may affect multiple genes, leading to changes in gene dosage and function.

The computational analysis compares the coverage and alignment patterns of the reads against the reference to distinguish these variants. For example, a region with a reduced number of mapped reads might indicate a deletion or CNV loss. These identified variations are cataloged and compared across individuals or populations, helping researchers correlate specific genetic markers with biological traits or disease susceptibility.
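The coverage idea can be sketched in a few lines: count how many aligned reads cover each reference position, then flag stretches where that depth falls well below the typical value. The alignments here are invented toy data, and the rule of flagging anything below half the median depth is an arbitrary assumption for illustration, not an established calling method.

```python
from statistics import median


def per_base_depth(ref_length, alignments):
    """Count how many aligned reads cover each reference position.

    alignments is a list of (start, length) pairs in 0-based coordinates.
    """
    depth = [0] * ref_length
    for start, length in alignments:
        for pos in range(start, min(start + length, ref_length)):
            depth[pos] += 1
    return depth


def low_coverage_regions(depth, fraction=0.5):
    """Return half-open (start, end) intervals whose depth is below
    fraction * median depth."""
    cutoff = fraction * median(depth)
    regions, start = [], None
    for pos, d in enumerate(depth):
        if d < cutoff and start is None:
            start = pos
        elif d >= cutoff and start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions


# Toy alignments: four reads over each flank, only one over the middle.
alignments = [(0, 10)] * 4 + [(10, 10)] + [(20, 10)] * 4
depth = per_base_depth(30, alignments)
print(low_coverage_regions(depth))
```

In this toy case the middle stretch, positions 10 through 19, is flagged because its depth of 1 is far below the median depth of 4. In real data, a dip like this would be one piece of evidence for a possible deletion or copy number loss, to be weighed alongside alignment patterns.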

Impact on Research and Medicine

The ability to efficiently catalog genetic differences through resequencing has profound implications, particularly in personalized medicine and population genetics. In medicine, this technique moves away from a one-size-fits-all approach by revealing an individual’s unique genetic predispositions. By identifying specific genetic variants associated with disease, doctors can better estimate a patient’s risk and implement tailored prevention strategies.

Resequencing is also fundamental to pharmacogenomics, the study of how an individual’s genetic makeup affects their response to drugs. Variations in genes that code for drug-metabolizing enzymes can alter how quickly a medication is processed. Knowing which of these variants a patient carries helps physicians choose the appropriate drug and dosage for maximum efficacy and minimal side effects, translating genetic data into safer and more effective therapeutic interventions.

Beyond individual health, the technology drives large-scale population and evolutionary studies by tracking genetic changes across groups. By resequencing the genomes of many individuals, scientists can measure genetic diversity, trace migration patterns, and identify genes that have evolved under selective pressure. This provides a deep understanding of the evolutionary history of a species and the genetic basis of local adaptation.