How Does Whole Exome Sequencing Work, Step by Step?

Whole exome sequencing (WES) reads the protein-coding regions of roughly 20,000 genes while skipping the vast majority of DNA that doesn’t code for proteins. The exome makes up only about 1.5% of your total genome, yet it contains the majority of known disease-causing mutations. By focusing on this small but information-rich slice, WES delivers clinically useful genetic data faster and more affordably than sequencing an entire genome.

What the Exome Actually Is

Your genome contains about 3 billion nucleotides, the individual chemical “letters” of DNA. Most of that sequence doesn’t directly encode proteins: it includes long stretches between genes and, within genes, segments called introns that are cut out before a protein is made. Interspersed with the introns are shorter segments called exons, which contain the instructions your cells use to build proteins. String all those exons together and you get the exome: roughly 45 million nucleotides, or 1.5% of the whole genome.
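
“Stringing the exons together” hides one detail: exon annotations from alternative transcripts of the same gene often overlap, so they have to be merged before their lengths are summed, or shared bases get counted twice. The minimal Python sketch below uses made-up coordinates on a toy 10,000-base chromosome; `merge_intervals` and `total_length` are illustrative helpers, not functions from any genomics library. It then repeats the 45-million-out-of-3-billion arithmetic from the text.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or abuts the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def total_length(intervals):
    """Count each base once, even if several intervals cover it."""
    return sum(end - start for start, end in merge_intervals(intervals))


# Hypothetical exon coordinates; (100, 250) and (200, 300) overlap,
# as exons from alternative transcripts of one gene often do.
exons = [(100, 250), (200, 300), (1_000, 1_120), (5_000, 5_080)]
toy_genome_size = 10_000

exome_size = total_length(exons)
print(exome_size)                                # 400
print(f"{exome_size / toy_genome_size:.1%}")     # 4.0%

# Same calculation with the genome-wide figures from the text.
print(f"{45_000_000 / 3_000_000_000:.1%}")       # 1.5%
```

Summing the raw intervals would give 450 bases, double-counting the 50 bases shared by the first two exons.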

That 1.5% punches well above its weight. Most genetic variants known to cause inherited diseases fall within exons, which is why sequencing just the exome can be a highly efficient way to find the molecular cause of a condition without the expense and complexity of reading every base pair.

Step 1: Preparing the DNA Library

The process starts with a blood or saliva sample. DNA is extracted and then broken into small fragments, typically around 350 base pairs long. This fragmentation can be done mechanically, with high-frequency sound waves (a technique called sonication), or enzymatically, with enzymes that cut DNA under reaction conditions tuned to control fragment size. The goal is to create millions of similarly sized pieces that sequencing machines can handle.
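
A toy model shows what random breakage produces. The sketch below assumes that every position along a molecule is cut independently with probability 1/350; this is not a physical model of sonication or enzymatic digestion, and the `fragment` function is hypothetical. The resulting lengths scatter widely around the 350-base mean rather than coming out identical, which is why a later cleanup step selects fragments by size.

```python
import random


def fragment(length, mean_size, seed=0):
    """Toy random shearing of a molecule of `length` bases.

    Every internal position is cut independently with probability
    1 / mean_size, so fragment lengths are roughly geometric around
    mean_size. Returns half-open (start, end) coordinates.
    """
    rng = random.Random(seed)
    cut_probability = 1 / mean_size
    fragments, start = [], 0
    for pos in range(1, length):
        if rng.random() < cut_probability:
            fragments.append((start, pos))
            start = pos
    fragments.append((start, length))  # the final piece runs to the end
    return fragments


frags = fragment(1_000_000, mean_size=350, seed=42)
lengths = [end - start for start, end in frags]
print(len(frags), "fragments, mean length", round(sum(lengths) / len(lengths)))
print("shortest", min(lengths), "longest", max(lengths))
```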

Once the DNA is fragmented, short synthetic sequences called adapters are attached to both ends of each piece. These adapters serve two purposes: they let the fragments bind to the sequencing platform, and they carry short index sequences that act as molecular barcodes, so that samples from different patients can be pooled together and sorted out later. After adapter attachment, a cleanup step removes fragments that are too small or too large, leaving a uniform “library” of DNA ready for the next stage.
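
Size selection and barcode sorting can each be sketched in a few lines. In this simplified version the 6-base index is assumed to be the first bases of each read string; real instruments typically read the index in a separate index read. The index sequences, sample names, and the 250-to-450-base size window are all hypothetical.

```python
from collections import defaultdict

# Hypothetical 6-base index sequences assigned to two patients.
SAMPLE_INDEXES = {"ACGTAC": "patient_A", "TGCATG": "patient_B"}


def size_select(fragments, min_len=250, max_len=450):
    """Keep only fragments whose length falls inside the size window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


def demultiplex(pooled_reads, index_len=6):
    """Sort pooled reads back to their sample using the leading index.

    Reads whose index matches no known sample go to 'undetermined'.
    """
    by_sample = defaultdict(list)
    for read in pooled_reads:
        index, insert = read[:index_len], read[index_len:]
        by_sample[SAMPLE_INDEXES.get(index, "undetermined")].append(insert)
    return dict(by_sample)


pooled = ["ACGTACGATTACA", "TGCATGCCGGAA", "NNNNNNTTTT"]
print(demultiplex(pooled))
```

Exact index matching keeps the sketch short; a production demultiplexer would usually also tolerate a sequencing error in the index.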

Step 2: Capturing Only the Exome

This is the step that makes exome sequencing different from whole genome sequencing. The fragmented DNA library contains pieces from everywhere in the genome, coding and non-coding alike. To isolate just the exons, the library is mixed with specially designed probes: short synthetic RNA or DNA sequences that are complementary to known exon sequences. These probes are tagged with biotin, a small molecule that binds tightly to streptavidin, a protein coating the magnetic beads used to retrieve them.

When the probes are added to the DNA library in solution, they latch onto fragments that match exon sequences through a process called hybridization. Magnetic beads are then used to physically pull out those probe-bound fragments, while everything else is washed away. What remains is a highly enriched collection of DNA fragments representing the protein-coding regions of the genome. Different commercial platforms use slightly different probe designs (some use RNA probes, others DNA), but the underlying capture principle is the same.
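
Computationally, capture behaves like a filter: fragments overlapping a targeted exon are kept and the rest are discarded. The sketch below models that with genomic coordinates and an assumed minimum overlap of 60 bases. Real probes select by sequence complementarity, not coordinates, and some off-target fragments are always carried through. All coordinates and the `capture` helper are made up for illustration.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of overlap between two half-open intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def capture(fragments, targets, min_overlap=60):
    """Return fragments sharing at least min_overlap bases with any target.

    Stands in for hybridization plus bead pull-down: fragments that
    overlap exon targets are 'pulled out', the rest are 'washed away'.
    """
    return [
        (fs, fe)
        for fs, fe in fragments
        if any(overlap(fs, fe, ts, te) >= min_overlap for ts, te in targets)
    ]


targets = [(1_000, 1_200), (5_000, 5_150)]          # hypothetical exons
fragments = [(900, 1_250), (1_180, 1_530), (3_000, 3_350), (4_900, 5_250)]
print(capture(fragments, targets))  # [(900, 1250), (4900, 5250)]
```

The second fragment touches an exon by only 20 bases, so under the assumed threshold it is treated as not captured.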

Step 3: Sequencing the Captured DNA

The enriched exome fragments are loaded onto a glass slide called a flow cell, which is coated with short DNA sequences complementary to the adapters. Each fragment binds to the surface and is copied repeatedly through a process called bridge amplification, creating tight clusters of about 1,000 identical copies. These clusters are necessary because the fluorescent signal from a single DNA molecule is too faint to detect reliably. Millions of clusters form across the flow cell, each representing a different fragment of the exome.

The sequencer then reads each cluster one base at a time. During each cycle, fluorescently labeled nucleotides (the four chemical letters of DNA: A, T, C, G) are washed over the flow cell. Each strand in a cluster incorporates exactly one nucleotide per cycle, because every labeled nucleotide carries a reversible blocking group that temporarily halts further extension. A camera records which color of fluorescence appears at each cluster position, identifying the base. The blocking group and fluorescent label are then removed, and the next cycle begins. This process repeats for 300 or more cycles, generating reads that are typically 150 base pairs long from each end of a fragment (paired-end sequencing). A single run produces millions of these short reads, covering the exome many times over.
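
The imaging step amounts to picking, for each cycle, the brightest of four dye channels. The sketch assumes a simplified four-channel model with one dye per base and made-up intensities. Real base callers also correct for dye cross-talk and for strands within a cluster falling out of step (phasing), which this ignores. The `call_bases` name and the brightest-over-total confidence score are illustrative choices.

```python
BASES = "ACGT"  # channel order assumed for each (a, c, g, t) tuple


def call_bases(cycle_intensities):
    """Call one base per cycle from four-channel intensities for one cluster.

    The brightest channel wins; its share of the total signal serves as
    a crude confidence score.
    """
    calls, confidences = [], []
    for channels in cycle_intensities:
        best = max(range(4), key=lambda i: channels[i])
        total = sum(channels)
        calls.append(BASES[best])
        confidences.append(channels[best] / total if total else 0.0)
    return "".join(calls), confidences


# Hypothetical intensities for one cluster over four cycles.
cluster = [(950, 40, 30, 20), (35, 25, 880, 60),
           (20, 910, 45, 25), (400, 30, 20, 380)]
seq, conf = call_bases(cluster)
print(seq)                          # AGCA
print([round(c, 2) for c in conf])  # [0.91, 0.88, 0.91, 0.48]
```

The last cycle shows how an ambiguous signal, with two channels nearly equally bright, still yields a call but with low confidence.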

Step 4: Turning Raw Data Into Results

Sequencing generates enormous files of short reads, essentially millions of puzzle pieces that need to be assembled into a coherent picture. The bioinformatics pipeline that does this follows a standard series of steps.

First, the raw reads are aligned to a human reference genome, a well-characterized template of what “typical” human DNA looks like. Software maps each short read to its most likely location, much like placing puzzle pieces on a guide image. Next, duplicate reads are flagged. These are copies of the same original DNA molecule, mostly created by PCR amplification during library preparation, rather than reads from genuinely different molecules; counting them would inflate the apparent evidence for whatever sequence, including any copying error, they carry.
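
Both steps in this paragraph can be sketched together. Here alignment is exact string matching on the forward strand only (real aligners such as BWA-MEM tolerate mismatches and gaps and search a precomputed index), and a read is flagged as a duplicate when an earlier read already aligned to the same start position. Real duplicate markers also consider strand and mate position and keep the highest-quality copy rather than the first. The reference, reads, and helper names are hypothetical.

```python
def align(read, reference):
    """Exact-match 'alignment': 0-based start of read in reference, or None."""
    pos = reference.find(read)
    return pos if pos != -1 else None


def mark_duplicates(reads, reference):
    """Align each read, flagging later reads that start where an earlier one did.

    Returns (read, position, is_duplicate) tuples. This toy keeps the first
    read at each start position; unaligned reads get position None.
    """
    seen_starts = set()
    results = []
    for read in reads:
        pos = align(read, reference)
        if pos is None:
            results.append((read, None, False))
            continue
        results.append((read, pos, pos in seen_starts))
        seen_starts.add(pos)
    return results


reference = "ACGTTGCAAGGCTTACGATCGGATCCA"
reads = ["TTGCAAGG", "TTGCAAGG", "GCTTACGA", "AAAAAAAA"]
for read, pos, dup in mark_duplicates(reads, reference):
    print(read, pos, "duplicate" if dup else "")
```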

With a clean, aligned dataset in hand, variant calling software scans every position in the exome and compares the patient’s sequence to the reference. Any position where the patient’s DNA differs from the reference is flagged as a variant. A typical exome contains around 20,000 to 30,000 variants, the vast majority of which are common and harmless. The real work lies in filtering those tens of thousands of variants down to the one or two that might explain a patient’s condition.
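
Comparing the patient’s sequence to the reference position by position can be sketched as a pileup: count the bases observed at every position, then report positions where a non-reference base is well supported. The depth and allele-fraction cut-offs below are assumptions for illustration; production callers use statistical genotype models instead of fixed thresholds. Reads are given as gap-free `(start, sequence)` pairs with made-up values.

```python
from collections import Counter, defaultdict


def pileup(aligned_reads):
    """Count observed bases at each reference position (0-based)."""
    counts = defaultdict(Counter)
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            counts[start + offset][base] += 1
    return counts


def call_variants(reference, aligned_reads, min_depth=4, min_alt_fraction=0.2):
    """Report (pos, ref, alt, alt_count, depth) where an alt base is well supported."""
    variants = []
    for pos, counts in sorted(pileup(aligned_reads).items()):
        depth = sum(counts.values())
        ref_base = reference[pos]
        # Most frequent non-reference base, if any was observed.
        alt_base, alt_count = max(
            ((b, n) for b, n in counts.items() if b != ref_base),
            key=lambda bn: bn[1],
            default=(None, 0),
        )
        if depth >= min_depth and alt_count / depth >= min_alt_fraction:
            variants.append((pos, ref_base, alt_base, alt_count, depth))
    return variants


reference = "ACGTACGTAC"
reads = [(0, "ACGTACGT"), (2, "GTGCGTAC"), (1, "CGTGCG"), (3, "TACGTA")]
print(call_variants(reference, reads))  # [(4, 'A', 'G', 2, 4)]
```

Half of the reads covering position 4 show G instead of the reference A, the pattern expected for a heterozygous variant.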

Annotation tools then cross-reference each variant against databases of known disease-causing mutations, population frequency data, and predictions about how a given change might affect protein function. Variants that are common in the general population are deprioritized, while rare variants in genes associated with the patient’s symptoms rise to the top for expert review by a clinical geneticist.
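
The prioritization logic can be sketched as a filter followed by a sort. The `Variant` fields, the 1% population allele-frequency cut-off, and the gene names are hypothetical; real pipelines pull these values from annotation databases and apply many more criteria before a geneticist reviews the list.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    change: str
    population_af: float      # allele frequency in a reference population
    predicted_damaging: bool  # output of a protein-impact predictor


def prioritize(variants, phenotype_genes, max_af=0.01):
    """Drop common variants, then rank the rare ones for review.

    Sort order: genes matching the phenotype first, then predicted-damaging
    changes, then rarer variants.
    """
    rare = [v for v in variants if v.population_af <= max_af]
    return sorted(
        rare,
        key=lambda v: (v.gene not in phenotype_genes,
                       not v.predicted_damaging,
                       v.population_af),
    )


variants = [
    Variant("GENE_A", "c.100A>G", 0.25, False),
    Variant("GENE_B", "c.250C>T", 0.0001, True),
    Variant("GENE_C", "c.42del", 0.00002, True),
    Variant("GENE_B", "c.300G>A", 0.004, False),
]
ranked = prioritize(variants, phenotype_genes={"GENE_B"})
print([v.change for v in ranked])  # ['c.250C>T', 'c.300G>A', 'c.42del']
```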

How Long It Takes and What It Costs

Standard clinical turnaround times for WES range from about 11 to 21 weeks, with many labs averaging around 15 to 18 weeks from the time a sample arrives to the delivery of a report. That timeline includes not just the sequencing itself (which takes only a few days) but also library preparation, data analysis, and the careful manual review needed to interpret variants. Some labs have compressed the process to roughly 40 days, and in urgent cases, particularly for critically ill newborns, rapid protocols can deliver whole genome results in as little as 50 hours, though that speed isn’t yet routine for standard exome testing.

Cost estimates for a single WES test range from roughly $555 to $5,169 depending on whether you’re looking at direct laboratory costs or commercial pricing. Most clinical labs fall somewhere in the $1,000 to $2,600 range for the actual cost of performing the test, excluding interpretation fees and insurance adjustments. That’s considerably less than whole genome sequencing, which can run from $1,906 to over $24,000.

What WES Can and Cannot Find

For rare inherited diseases, WES identifies a definitive molecular diagnosis in roughly 37% to 51% of cases, depending on the patient population and how carefully patients are selected for testing. A large meta-analysis of pediatric patients with suspected genetic disorders found a diagnostic rate of about 38% for WES, while smaller, more targeted studies have reported yields above 50%. Those numbers make WES one of the most productive single tests in genetics, but they also mean it returns no clear answer in roughly half to two-thirds of cases.

The gaps exist for specific reasons. Because WES reads only protein-coding regions, it misses mutations in non-coding regulatory regions, such as promoters and enhancers, that control when and how genes are turned on. Deep intronic mutations, which sit far from exon boundaries, are invisible to this approach. Structural variations (large deletions, duplications, or rearrangements of DNA segments longer than about 1,000 base pairs) are also generally not detected reliably. If a patient’s condition is caused by any of these types of changes, whole genome sequencing is more likely to find it.
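
Whether a variant can be seen at all depends on whether it lies inside a captured region. The sketch below checks a position against sorted, non-overlapping target intervals using binary search from Python’s standard `bisect` module. The optional `padding` is an assumption, modelling the flanking bases that captured fragments often extend past each exon edge; the coordinates and helper names are made up.

```python
import bisect


def make_target_index(targets):
    """Sort non-overlapping half-open [start, end) targets for binary search."""
    targets = sorted(targets)
    return [start for start, _ in targets], targets


def is_targeted(pos, index, padding=0):
    """True if pos lies inside a target extended by `padding` bases on each side."""
    starts, targets = index
    # Last target whose padded start is at or before pos.
    i = bisect.bisect_right(starts, pos + padding) - 1
    if i < 0:
        return False
    _, end = targets[i]
    # Targets don't overlap, so target i has the largest end of all candidates.
    return pos < end + padding


index = make_target_index([(1_000, 1_200), (5_000, 5_150)])
print(is_targeted(1_100, index))              # True: inside an exon
print(is_targeted(1_205, index))              # False: just past the exon edge
print(is_targeted(1_205, index, padding=10))  # True: within the padded flank
print(is_targeted(3_000, index, padding=10))  # False: deep intronic
```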

Clinical Uses Beyond Rare Disease

While rare genetic conditions are the most common reason for ordering WES, the technology has become increasingly important in cancer care. Tumors accumulate mutations as they grow, and comparing the exome of a tumor to the exome of a patient’s normal tissue reveals which mutations are unique to the cancer. These somatic mutations can identify driver genes fueling tumor growth, estimate tumor mutational burden (the number of somatic mutations per megabase of sequence, which helps predict whether immunotherapy might be effective), and uncover specific targets for existing drugs.
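
At its simplest, the tumor-versus-normal comparison and the burden estimate reduce to a set difference and a division. In this sketch, variants are hypothetical `(chromosome, position, ref, alt)` tuples, and the 30 megabases of adequately covered sequence is an assumed figure. Real somatic callers model sequencing error and tumor purity rather than taking an exact set difference.

```python
def somatic_variants(tumor_calls, normal_calls):
    """Variants present in the tumor but absent from matched normal tissue."""
    return set(tumor_calls) - set(normal_calls)


def tumor_mutational_burden(somatic, covered_megabases):
    """Somatic mutations per megabase of adequately covered sequence."""
    return len(somatic) / covered_megabases


# Hypothetical (chromosome, position, ref, alt) calls.
normal = {("chr1", 1_000, "A", "G"), ("chr2", 500, "C", "T")}
tumor = normal | {
    ("chr3", 2_000, "G", "T"),
    ("chr5", 7_500, "C", "A"),
    ("chr9", 300, "C", "T"),
}

somatic = somatic_variants(tumor, normal)
print(sorted(somatic))
print(tumor_mutational_burden(somatic, covered_megabases=30))  # 0.1
```

The germline variants shared with normal tissue drop out, leaving three tumor-only mutations over the assumed 30 Mb.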

In lung cancer, for example, exome sequencing of paired primary tumors and brain metastases has identified mutations associated with worse survival outcomes, offering potential biomarkers for prognosis and guiding treatment decisions. The same approach applies across many cancer types, where WES helps match patients to targeted therapies based on the specific genetic profile of their tumor rather than just the organ where the cancer originated.