What Is Next-Generation Sequencing (NGS) and How Does It Work?

Next-generation sequencing (NGS) is a technology that reads millions of DNA fragments simultaneously, making it possible to sequence an entire human genome in about a day for under $1,000. For comparison, the first human genome took over a decade and cost roughly $3 billion using older methods. NGS has transformed genetics research, cancer diagnosis, and rare disease detection by making large-scale DNA analysis fast and affordable.

How NGS Differs From Traditional Sequencing

Before NGS, the standard approach was Sanger sequencing, developed in the late 1970s. Sanger sequencing reads one DNA fragment at a time using a chain-termination method and gel electrophoresis. It’s highly accurate for small targets, but scaling it up to cover a full genome requires enormous time and expense.

NGS solved this scalability problem through a concept called massively parallel sequencing. Instead of reading fragments one by one, NGS anchors millions of DNA molecules to solid surfaces or tiny beads and sequences them all at once. This parallel approach is what makes the speed and cost differences so dramatic. Where Sanger sequencing might take days to cover a small region, NGS can process an entire genome’s worth of data in a single run.

The NGS Workflow, Step by Step

Every NGS experiment follows a general sequence, whether you’re sequencing a tumor sample or screening for inherited conditions.

Library preparation comes first. The DNA sample is broken into short fragments, and small adapter sequences are attached to each end. These adapters let the fragments bind to the sequencing surface and serve as starting points for the reading process. This step is critical because the quality of the library directly affects the quality of the results.

Sequencing and imaging is the core step. The prepared fragments are loaded onto a flow cell or chip, where they’re copied and read using fluorescent signals or electrical detection, depending on the platform. Each time a base (A, T, C, or G) is added to a growing strand, the instrument records it. This happens across millions of fragments at once, generating enormous volumes of raw data.

Data analysis is where the raw signals become meaningful information. The instrument first converts its readings into text files containing short DNA sequences called “reads,” along with quality scores for each base. These reads are then aligned to a reference genome, essentially placing each short fragment back in its correct position along the full genome, like assembling millions of puzzle pieces. After alignment, specialized software scans for differences between the sample and the reference, identifying mutations, insertions, or deletions. A final filtering step separates meaningful genetic variants from background noise and sequencing errors.
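The alignment-then-variant-scan logic above can be sketched in a few lines. This is a deliberately simplified toy, not a real variant caller: production tools use statistical models, base qualities, and mapping qualities, and the function name and thresholds here are illustrative assumptions.

```python
from collections import Counter

def call_variants(reference, alignments, min_depth=10, min_alt_fraction=0.2):
    """Toy variant caller: pile up aligned reads over a reference and
    report positions where a non-reference base exceeds a frequency
    threshold. Real pipelines add statistical models and quality scores."""
    # One counter of observed bases per reference position (the "pileup").
    pileup = [Counter() for _ in reference]
    for start, read in alignments:           # each read aligned at 0-based `start`
        for offset, base in enumerate(read):
            pileup[start + offset][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:                # too few reads to trust this site
            continue
        for base, n in counts.items():
            if base != reference[pos] and n / depth >= min_alt_fraction:
                variants.append((pos, reference[pos], base, n / depth))
    return variants

# Three short reads, all supporting an A at position 3 where the reference has T:
variants = call_variants(
    "ACGTACGT",
    [(0, "ACGAACGT"), (0, "ACGAACGT"), (2, "GAACGT")],
    min_depth=3,
)
```

The final filtering step described above corresponds to the `min_depth` and `min_alt_fraction` checks: sites with too little evidence are treated as noise rather than real variants.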

Short-Read vs. Long-Read Platforms

Not all NGS platforms work the same way. The two major categories are short-read and long-read sequencing, and each has trade-offs.

Short-read platforms, like those from Illumina, produce reads typically 150 to 300 bases long. They’re extremely accurate, cost-effective, and well suited for most clinical and research applications. The downside is that short reads can struggle with repetitive regions of the genome or complex structural changes, because the fragments are too small to span those areas.

Long-read platforms, from companies like PacBio and Oxford Nanopore, generate reads thousands to tens of thousands of bases long. These are better at resolving structural variants and repetitive sequences. Studies comparing the two approaches have found that long-read sequencing produces better assembly quality overall, with longer continuous stretches of assembled sequence. However, long-read sequencing is more expensive and requires deeper sequencing to recover the same number of reconstructed genomes that short reads can capture at lower cost. Some labs use a hybrid approach, combining both read types, which tends to produce the longest assemblies and the best overall mapping accuracy.

Sequencing Depth and Why It Matters

One concept you’ll encounter frequently with NGS is “depth of coverage,” which refers to how many times, on average, each position in the genome is read. If a region has 100x coverage, that means each base in that region was independently sequenced about 100 times.
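Mean coverage follows directly from this definition: total bases sequenced divided by genome size. A quick back-of-the-envelope calculation (the function name and read counts are illustrative):

```python
def mean_coverage(num_reads, read_length, genome_size):
    """Average depth of coverage = total sequenced bases / genome size."""
    return num_reads * read_length / genome_size

# e.g. 600 million 150-base reads over a ~3.1 billion base human genome
depth = mean_coverage(600_000_000, 150, 3_100_000_000)
# roughly 29x — in the range typically quoted for a "30x genome"
```

Note this is an average: actual depth varies from position to position, which is why labs target a mean well above the minimum they need at any single site.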

Higher depth means higher confidence. When you’re looking for a mutation that’s present in most cells, like an inherited genetic variant, 30x to 50x coverage is often sufficient. But in cancer diagnostics, tumors are genetically diverse, and some mutations may exist in only a small fraction of cells. Detecting a mutation present in just 3% of the DNA in a sample requires dramatically higher coverage. Research in cancer diagnostics recommends a minimum depth of 1,650x, along with at least 30 independent reads showing the mutation, to reliably detect variants at that 3% level with 99.9% confidence. For certain genes, labs may push coverage to 2,000x or higher.
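The relationship between depth and detection can be made concrete with a simple binomial model: if a variant is present in 3% of the DNA, each read covering that site independently has a 3% chance of carrying it. Under that assumption (a simplification that ignores sequencing error), we can compute the chance of seeing at least 30 supporting reads at 1,650x. The function name is illustrative:

```python
from math import comb

def detection_probability(depth, vaf, min_alt_reads):
    """Probability of observing at least `min_alt_reads` variant-supporting
    reads at a site covered `depth` times, for a variant allele fraction
    `vaf`, modeling each read as an independent binomial draw."""
    p_fewer = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1 - p_fewer

# At 1,650x depth, a 3% variant is backed by 30+ reads
# in well over 99% of samples under this simple model.
prob = detection_probability(1650, 0.03, 30)
```

Run the same calculation at 100x (expected mutant reads: only 3) and the probability of clearing a 30-read threshold collapses to essentially zero, which is why deep sequencing is non-negotiable for low-fraction tumor variants.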

Cancer Diagnosis and Treatment Selection

One of the most impactful clinical uses of NGS is in oncology. Tumors accumulate genetic mutations as they grow, and identifying those mutations can guide treatment decisions. NGS panels can scan dozens to hundreds of cancer-related genes in a single test, revealing which mutations are driving a particular tumor and whether targeted therapies exist for them.

NGS also enables liquid biopsies, where tumor DNA circulating in the bloodstream is sequenced from a simple blood draw rather than a tissue biopsy. This approach can detect and measure the frequency of mutations in genes like TP53, a key tumor suppressor, without an invasive procedure. In thoracic oncology, plasma-based NGS results typically come back within about 9 days of the blood draw, with 95% of cases reported within two weeks. Only about 18% of cases return within a week, so patients and clinicians should expect a turnaround time closer to 10 to 14 days.

Diagnosing Rare Genetic Diseases

For patients with suspected genetic disorders, NGS has become a frontline diagnostic tool. Two main approaches are used: whole-exome sequencing, which reads only the protein-coding portions of the genome (about 1-2% of total DNA, but where most known disease-causing mutations occur), and whole-genome sequencing, which reads everything.

In clinical settings, exome sequencing leads to a definitive diagnosis in 30% to 50% of rare genetic diseases, depending on the condition and how carefully patients are selected for testing. That may sound modest, but for families who have spent years cycling through inconclusive tests, a clear answer in one out of three cases is significant. The diagnostic rate appears to plateau between 35% and 50% regardless of whether exome or genome sequencing is used, suggesting that the remaining undiagnosed cases involve variants we don’t yet know how to interpret.

Whole-genome sequencing does offer advantages in specific situations. One study of 103 patients with suspected genetic disorders found that 18 diagnoses would not have been possible with exome sequencing alone, because the causative variants were located outside protein-coding regions or involved large structural changes in the DNA that exome sequencing misses.

What the Data Actually Looks Like

If you’re working with NGS data or trying to understand a report, it helps to know the file formats involved. Raw sequencing output starts as FASTQ files, which are plain-text files containing each DNA read along with a quality score for every base. These are often considered the “raw” data of any NGS experiment. For paired-end sequencing, where both ends of each fragment are read, two FASTQ files are generated per sample.
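A FASTQ record is four lines: a header beginning with `@`, the sequence, a `+` separator, and a quality string in which each character encodes that base's Phred score as its ASCII code minus 33. A minimal parser (the function name and read ID are illustrative):

```python
def parse_fastq(lines):
    """Parse FASTQ records (4 lines each: header, sequence, '+', quality).
    Each quality character encodes a Phred score as ASCII code minus 33."""
    records = []
    for i in range(0, len(lines), 4):
        header = lines[i].lstrip("@")
        seq = lines[i + 1]
        quals = [ord(c) - 33 for c in lines[i + 3]]
        records.append((header, seq, quals))
    return records

record = parse_fastq([
    "@read_001",
    "GATTACA",
    "+",
    "IIIIIII",   # 'I' is ASCII 73, so Phred quality 40 at every base
])[0]
```

A Phred score of 40 corresponds to a 1-in-10,000 chance that the base call is wrong, which is why quality scores feed directly into the variant-filtering step described earlier.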

After reads are aligned to a reference genome, the output is stored in SAM (Sequence Alignment Map) or its compressed binary version, BAM. Before variants can be identified, these alignment files go through preprocessing: duplicate reads created during library preparation are flagged, and systematic errors in quality scores are corrected. The final variant calls are stored in VCF (Variant Call Format) files, which list every position where the sample differs from the reference genome, along with supporting evidence for each call.
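Each VCF data line is tab-separated, with eight fixed columns: CHROM, POS, ID, REF, ALT, QUAL, FILTER, and INFO. A minimal sketch of pulling those fields out of one line (the example variant and function name are hypothetical):

```python
def parse_vcf_line(line):
    """Split one VCF data line into its eight fixed columns."""
    fields = line.rstrip("\n").split("\t")
    return {
        "chrom": fields[0],
        "pos": int(fields[1]),          # 1-based position on the chromosome
        "id": fields[2],                # '.' means no identifier assigned
        "ref": fields[3],               # reference allele
        "alt": fields[4].split(","),    # alternate alleles, comma-separated
        "qual": fields[5],
        "filter": fields[6],            # e.g. 'PASS' for calls that survive filtering
        "info": fields[7],              # semicolon-separated annotations, e.g. DP=depth
    }

# A hypothetical single-nucleotide variant call:
variant = parse_vcf_line("chr17\t7674220\t.\tC\tT\t812.3\tPASS\tDP=1650")
```

In practice, bioinformaticians use established libraries rather than hand-rolled parsers, but the column layout above is what a VCF "variant call" actually is: a position, the reference base, the observed alternative, and the evidence behind it.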

For clinical reports, patients and physicians typically see a curated summary rather than raw files. But for researchers and bioinformaticians, this FASTQ-to-BAM-to-VCF pipeline is the backbone of every NGS analysis.