What Is Genome Sequencing and How Does It Work?

Genome sequencing is the process of reading the complete set of DNA instructions in an organism’s cells. For humans, that means determining the exact order of roughly 3 billion pairs of chemical “letters” distributed across 23 chromosomes. These letters, represented as A, T, G, and C, spell out the instructions your body uses to build proteins, regulate cell behavior, and carry out every biological function that keeps you alive. The technology to read all of those letters has gone from a multi-billion-dollar research project to something that costs a few thousand dollars or less, and it’s reshaping how doctors diagnose diseases, treat cancer, and understand human biology.

How DNA Sequencing Works

At its core, every sequencing method does the same thing: it reads the order of the four chemical bases that make up DNA. The differences between methods come down to how many pieces of DNA they can read at once, how long those pieces are, and how fast results come back.

The original method, developed by Frederick Sanger in the 1970s, reads one fragment of DNA at a time. Copies of the target DNA are built base by base, and each copy stops when a special chain-terminating base is added; in modern automated versions, that terminating base carries a fluorescent tag that identifies it, and sorting the copies by length reveals the sequence. This approach is highly accurate and still used today for confirming specific mutations in a single gene, but it’s far too slow and expensive to read an entire genome on its own.

Modern sequencing uses what’s called next-generation sequencing, or NGS. The core chemistry is similar: DNA is copied base by base, with fluorescent labels identifying each letter as it’s added. The critical difference is scale. Instead of reading one fragment at a time, NGS reads millions of fragments simultaneously in a single run. This massively parallel approach is what makes whole-genome sequencing practical. A single run can cover hundreds to thousands of genes and detect rare genetic variants that older methods would miss, picking up mutations present in as few as 1% of cells in a sample, compared with a detection threshold of 15 to 20% for Sanger sequencing.
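To make the sensitivity gap concrete, here is a minimal Python sketch. It assumes the fraction of reads carrying a variant is a reasonable stand-in for the fraction of cells carrying it, and it uses only the thresholds quoted above. The function names and the read counts are hypothetical and do not come from any real pipeline.

```python
# Detection limits taken from the text (assumed here to apply to read fractions).
NGS_LIMIT = 0.01     # about 1%
SANGER_LIMIT = 0.15  # lower end of the 15 to 20% range


def variant_fraction(variant_reads: int, total_reads: int) -> float:
    """Fraction of reads at one position that carry the variant base."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


def detectable_by(fraction: float) -> list[str]:
    """List the methods that could, in principle, see a variant at this fraction."""
    methods = []
    if fraction >= NGS_LIMIT:
        methods.append("NGS")
    if fraction >= SANGER_LIMIT:
        methods.append("Sanger")
    return methods


# Hypothetical example: 30 of 1,000 reads carry the variant.
fraction = variant_fraction(30, 1000)
print(fraction, detectable_by(fraction))
```

In this made-up case the variant sits at 3%, above the NGS limit but well below what Sanger sequencing could resolve.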

The Sequencing Process, Step by Step

A typical NGS workflow has four main stages. First, DNA is extracted from a sample, which could be blood, saliva, tumor tissue, or even a single cell. The quality and quantity of that DNA are checked before moving forward.

Next comes library preparation, where the extracted DNA is broken into smaller fragments and tagged with short molecular markers. These markers let the sequencing machine identify and track each fragment. The prepared fragments, collectively called a “library,” are then loaded onto the sequencing instrument.
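As a rough illustration of library preparation, the Python sketch below cuts a DNA string into pieces and attaches a short tag to each one. It is a toy model: real fragmentation is random rather than evenly spaced, and the tag sequence and function name used here are invented for the example.

```python
def prepare_library(dna: str, fragment_length: int, tag: str) -> list[str]:
    """Cut dna into consecutive pieces and prefix each piece with a tag."""
    if fragment_length <= 0:
        raise ValueError("fragment_length must be positive")
    fragments = [
        dna[start:start + fragment_length]
        for start in range(0, len(dna), fragment_length)
    ]
    # The tag stands in for the short molecular markers that let the
    # instrument identify and track each fragment.
    return [tag + fragment for fragment in fragments]


# Hypothetical 10-base sample, 4-base fragments, and a made-up tag "TT".
library = prepare_library("ACGTACGTAC", 4, "TT")
print(library)
```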

During sequencing itself, the machine reads each fragment by detecting which base is added at each position. Illumina instruments, the most widely used platform, use a method called sequencing by synthesis, building a complementary copy of each DNA fragment and recording the identity of each base as it’s incorporated. The result is millions of short “reads,” each typically a few hundred bases long.
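The idea behind sequencing by synthesis can be sketched in a few lines of Python. This is a simplified model, not a description of how any instrument is implemented: it ignores strand direction, chemistry, and read errors, and the function name is hypothetical.

```python
# Base-pairing rules: A pairs with T, and G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sequence_by_synthesis(template: str) -> tuple[str, str]:
    """Build the complementary copy one base per cycle and infer the template."""
    incorporated = []
    for base in template:
        # One cycle: the complementary base is added and its identity recorded.
        incorporated.append(COMPLEMENT[base])
    synthesized = "".join(incorporated)
    # The template is inferred from the recorded bases via the pairing rules.
    inferred_template = "".join(COMPLEMENT[base] for base in synthesized)
    return synthesized, inferred_template


synthesized, read = sequence_by_synthesis("ATGC")
print(synthesized, read)
```

The point of the sketch is that the machine never reads the original strand directly. It watches which base is added at each cycle and works backward from the pairing rules.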

The final stage is computational. Software aligns all those short reads against a reference copy of the human genome, like assembling a jigsaw puzzle using the picture on the box. Specialized algorithms then scan for places where the patient’s DNA differs from the reference. These differences, called variants, are cross-referenced against databases of known disease-causing mutations to determine which ones are medically meaningful and which are harmless natural variation.
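The alignment and variant-finding steps can be illustrated with a toy Python example. It assumes error-free reads, a tiny made-up reference, and a brute-force search for the best match. Real aligners and variant callers use indexed references and statistical models, so this is a sketch of the idea rather than of any actual tool. All names and sequences are hypothetical.

```python
from collections import Counter, defaultdict


def best_alignment(read: str, reference: str) -> int:
    """Return the reference offset with the fewest mismatches (leftmost on ties)."""
    best_pos, best_mismatches = 0, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos


def call_variants(reads: list[str], reference: str, min_reads: int = 2):
    """Pile up aligned reads and report positions where they disagree with the reference."""
    pileup = defaultdict(Counter)
    for read in reads:
        start = best_alignment(read, reference)
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1
    variants = []
    for pos in sorted(pileup):
        base, count = pileup[pos].most_common(1)[0]
        # Require several supporting reads before reporting a difference.
        if base != reference[pos] and count >= min_reads:
            variants.append((pos, reference[pos], base))
    return variants


reference = "ACGTTGCAAGCTTACG"
# Three overlapping reads from a sample that carries T instead of A at index 7.
reads = ["GTTGCTAG", "TGCTAGCT", "CTAGCTTA"]
print(call_variants(reads, reference))
```

Each tuple in the result gives a zero-based position, the reference base, and the base seen in the reads. Looking those differences up in databases of known mutations is a separate step that this sketch leaves out.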

Long-Read Sequencing

A newer approach called nanopore sequencing works on a fundamentally different principle. Instead of copying DNA and reading fluorescent signals, it threads a single strand of DNA through a tiny protein pore embedded in a membrane. As each base passes through, it changes the electrical current flowing through the pore in a characteristic way. A sensor records these current changes and an algorithm translates them into a sequence of bases.
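The translation from current to sequence can be pictured with a heavily simplified Python sketch. It assumes each base produces its own distinct current level, which is not how real pores behave: in practice the signal depends on several neighboring bases at once and is decoded by trained models. The current values and names below are invented for illustration.

```python
# Hypothetical current levels in picoamps, one per base (illustrative only).
LEVELS = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 125.0}


def call_bases(currents: list[float]) -> str:
    """Assign each measured current to the base with the nearest reference level."""
    return "".join(
        min(LEVELS, key=lambda base: abs(LEVELS[base] - current))
        for current in currents
    )


# Four noisy measurements, each close to one of the made-up levels.
print(call_bases([81.2, 123.9, 108.5, 96.1]))
```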

The key advantage is read length. Standard NGS produces short reads of a few hundred bases that must be computationally stitched together. Nanopore sequencing can read extremely long, continuous stretches of DNA, and it can also read RNA directly without first converting it to DNA. This makes it better at detecting large structural rearrangements in chromosomes, repetitive regions, and certain chemical modifications to DNA that short-read methods can’t easily see. Library preparation can also be simpler and faster, because it can skip the amplification steps that may introduce errors.

Whole Genome vs. Whole Exome Sequencing

Not every clinical situation requires reading all 3 billion base pairs. Whole exome sequencing focuses only on exons, the portions of DNA that code for proteins. Exons make up roughly 1% of the genome but contain the majority of known disease-causing mutations, making exome sequencing a faster, cheaper option for many diagnostic situations.
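For a sense of scale, the figures above work out as follows. This is a back-of-the-envelope Python calculation using the rounded numbers from the text, so the result is approximate.

```python
GENOME_BASE_PAIRS = 3_000_000_000  # roughly 3 billion, from the text
EXOME_FRACTION = 0.01              # roughly 1%, from the text

exome_base_pairs = round(GENOME_BASE_PAIRS * EXOME_FRACTION)
print(f"{exome_base_pairs:,} base pairs in the exome (approximate)")
```

That is on the order of 30 million base pairs to read instead of 3 billion.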

Whole genome sequencing reads everything, including the vast stretches of non-coding DNA between genes. Researchers have increasingly found that variations outside of exons can affect how genes are turned on and off, influencing protein production and contributing to genetic disorders. These variants would be completely invisible to exome sequencing. Whole genome sequencing also detects structural variants, copy-number changes, and repeat expansions that exome sequencing often misses.

Diagnosing Rare Diseases

One of the most impactful clinical uses of genome sequencing is solving diagnostic mysteries. Many people with rare genetic conditions go years without a diagnosis, cycling through specialist after specialist. A study published in the New England Journal of Medicine sequenced the genomes of 822 families and made a molecular diagnosis in about 29% of them. Among those diagnosed, roughly 8% had variants that could only have been found through whole genome sequencing, not through the exome sequencing or targeted genetic tests they had already undergone. These included mutations hidden deep within non-coding regions, structural rearrangements, and repeat expansions.
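Turning those percentages into approximate family counts is simple arithmetic, shown here in Python. The study's exact counts are not given above, so the results below are estimates derived from rounded percentages, not reported figures.

```python
FAMILIES = 822         # families sequenced, from the text
DIAGNOSED_RATE = 0.29  # about 29% received a molecular diagnosis
WGS_ONLY_RATE = 0.08   # about 8% of those needed whole genome sequencing

diagnosed = round(FAMILIES * DIAGNOSED_RATE)
wgs_only = round(diagnosed * WGS_ONLY_RATE)

print(f"About {diagnosed} families diagnosed")
print(f"About {wgs_only} of those diagnosable only by whole genome sequencing")
```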

In urgent clinical settings, such as critically ill newborns suspected of having a genetic condition, rapid genome sequencing can deliver results in as little as 3 to 5 days, though the average turnaround in hospital settings is closer to 18 days depending on the case and the lab.

Genome Sequencing in Cancer Treatment

Cancer is fundamentally a disease of the genome. Tumors grow because of mutations that cause cells to divide uncontrollably. Sequencing a tumor’s DNA reveals which specific mutations are driving its growth, and that information can determine which treatments are most likely to work.

In lung cancer, for example, sequencing can identify mutations in genes that control cell growth signals. Patients whose tumors carry those specific mutations tend to respond well to drugs designed to block that exact signaling pathway, while those same drugs would likely be ineffective in patients whose tumors are driven by different mutations. The same principle applies across cancer types. In melanoma, finding a particular mutation in a gene called BRAF can point to a specific combination therapy. In other cancers, sequencing reveals rearrangements in genes that can be targeted with their own class of drugs.

This approach, often called precision oncology, means treatment is matched to the individual biology of each patient’s cancer rather than prescribed based solely on where in the body the tumor appeared. Sequencing also helps oncologists understand why a tumor has stopped responding to treatment, since tumors can acquire new mutations that make them resistant to an initially effective drug.

What Sequencing Costs Today

The cost trajectory of genome sequencing is one of the steepest price drops in the history of technology. In 2003, generating a single human genome sequence cost an estimated $50 million. By mid-2015, that price had fallen to just above $4,000, and by late 2015 it was below $1,500. Commercial prices have generally tracked slightly below these research benchmarks. Today, consumer whole-genome sequencing services are available for a few hundred dollars, though clinical-grade sequencing with full medical interpretation costs more.
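The size of that drop is easier to appreciate as a ratio. The short Python calculation below uses the dollar figures quoted above. Because the text gives "just above $4,000" and "below $1,500", the rounded values used here are approximations.

```python
COST_2003 = 50_000_000  # estimated cost in dollars, from the text
COST_MID_2015 = 4_000   # "just above $4,000", rounded down
COST_LATE_2015 = 1_500  # "below $1,500", used as an upper bound


def fold_reduction(old_cost: float, new_cost: float) -> float:
    """How many times cheaper the new cost is than the old one."""
    return old_cost / new_cost


print(round(fold_reduction(COST_2003, COST_MID_2015)))
print(round(fold_reduction(COST_2003, COST_LATE_2015)))
```

On these numbers, the cost fell by a factor of more than ten thousand in about twelve years.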

Exome sequencing is typically cheaper than whole-genome sequencing, which is one reason many clinical labs still use it as a first-line test. The decision between the two often depends on what condition is suspected and what previous testing has already been done.

Privacy and Legal Protections

Your genome is arguably the most personal data that exists about you. It reveals information not only about your own health risks but also about your biological relatives, and unlike a password, you can’t change it if it’s compromised. These realities make privacy a central concern.

In the United States, the Genetic Information Nondiscrimination Act (GINA), passed in 2008, provides two core protections. Health insurers cannot use your genetic information to deny coverage, set premiums, or make underwriting decisions. Employers with 15 or more employees cannot use genetic information in hiring, firing, promotions, or job assignments, and they cannot require genetic testing as a condition of employment. Under GINA, genetic information includes not just your own test results but also your family medical history.

GINA has significant gaps, though. It does not cover life insurance, long-term care insurance, or disability insurance. Some states have passed their own laws to fill these holes, but protections vary widely. The law also doesn’t apply to employers with fewer than 15 employees. If you’re considering genome sequencing, it’s worth understanding what protections exist in your state beyond the federal baseline.