What Is RNA Sequencing and How Does It Work?

RNA sequencing, often called RNA-seq, is a technology that reads the complete set of RNA molecules in a sample of cells. Because RNA is the intermediary step between your DNA (the instructions) and proteins (the workers), measuring which RNA molecules are present reveals which genes are actually active at a given moment. This snapshot of gene activity, called the transcriptome, tells researchers far more than a static DNA sequence can. It captures what a cell is doing right now, not just what it’s capable of doing.

Why RNA Matters More Than DNA Alone

Every cell in your body carries essentially the same DNA. Yet a skin cell behaves nothing like a brain cell. The difference comes down to which genes each cell turns on or off, and by how much. RNA molecules are the direct readout of that activity. They’re produced when a gene is “transcribed,” and they carry the instructions that cells use to build proteins. By sequencing all the RNA in a sample, scientists can quantify the expression level of every active gene simultaneously, giving them a detailed picture of cellular behavior, identity, and health.

This is particularly powerful for understanding disease. Cancer cells, for instance, have dramatically altered gene expression compared to healthy tissue. Infections trigger specific immune genes. Developmental disorders may stem from genes that are expressed at the wrong time or in the wrong amount. RNA-seq captures all of this in a single experiment.

How the Process Works

The workflow from tissue sample to usable data involves several stages, each building on the last.

First, RNA is extracted from cells. This is a delicate step because RNA degrades quickly. Researchers use chemical kits to isolate it while removing DNA and other contaminants. Once purified, the RNA is prepared for sequencing through a process called library preparation. This involves either selecting messenger RNA (the type that codes for proteins) using a molecular “hook” that grabs it by its poly-A tail, or depleting ribosomal RNA, which makes up the vast majority of RNA in a cell but isn’t usually the focus of study.

Next, the RNA is converted into complementary DNA (cDNA), because most sequencing machines read DNA rather than RNA. The cDNA is broken into short fragments, and small molecular tags called adapters are attached to each end. These adapters let the sequencing machine grab onto each fragment and identify which sample it came from. The fragments are then amplified to create millions of copies, which are fed into the sequencer.
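
To illustrate how adapters let a sequencer tell samples apart, here is a minimal Python sketch. The barcode sequences, sample names, and the demultiplex helper are all hypothetical, and it assumes the barcode sits at the very start of each read. Real demultiplexing software is more involved (it tolerates sequencing errors in the barcode, for instance).

```python
# Toy demultiplexing: assign each read to a sample by its leading barcode.
# Barcodes, sample names, and reads are invented for illustration.
BARCODES = {"ACGT": "tumor", "TGCA": "healthy"}
BARCODE_LEN = 4


def demultiplex(reads):
    """Group reads by sample, stripping the barcode prefix from each read."""
    by_sample = {name: [] for name in BARCODES.values()}
    by_sample["unassigned"] = []
    for read in reads:
        sample = BARCODES.get(read[:BARCODE_LEN], "unassigned")
        # Keep the full read when its barcode is not recognised.
        insert = read[BARCODE_LEN:] if sample != "unassigned" else read
        by_sample[sample].append(insert)
    return by_sample


reads = ["ACGTGGGTTT", "TGCACCCAAA", "ACGTAAACCC", "NNNNGGGGGG"]
result = demultiplex(reads)
print(result)
```

Pooling several barcoded samples into one sequencing run and separating them afterward in this way is what allows many samples to share a single run.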

The sequencer reads the fragments one base at a time, producing millions of short “reads” that represent the RNA molecules originally present in the sample. A standard experiment typically costs between $250 and $500 per sample when including both library preparation and sequencing, though prices vary by institution and method.

Turning Raw Data Into Meaning

The sequencing machine produces enormous text files containing millions of short sequences. Making sense of them requires a computational pipeline with several steps. First, software checks the quality of each read, flagging any that are too short or contain too many errors. High-quality reads are then aligned to a reference genome, essentially mapping each fragment back to the gene it came from. After alignment, the software counts how many reads mapped to each gene. The more reads that map to a gene, the higher its expression is taken to be.
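
The three steps above (quality check, alignment, counting) can be sketched in a few lines of Python. This is a toy under stated assumptions: the reference sequences, thresholds, and function names are invented, and exact substring matching stands in for a real aligner, which would search a whole genome and tolerate mismatches.

```python
from collections import Counter

# Toy reference: gene name -> sequence (invented for illustration).
REFERENCE = {
    "geneA": "ATGGCGTACGTTAGC",
    "geneB": "ATGCCCTTTGGGAAA",
}

MIN_LENGTH = 5          # hypothetical length threshold
MIN_MEAN_QUALITY = 20   # hypothetical threshold on the Phred scale


def passes_qc(seq, quals):
    """Keep reads that are long enough and have adequate mean quality."""
    if len(seq) < MIN_LENGTH:
        return False
    return sum(quals) / len(quals) >= MIN_MEAN_QUALITY


def align(seq):
    """Return the first gene whose sequence contains the read, else None."""
    for gene, ref in REFERENCE.items():
        if seq in ref:
            return gene
    return None


def count_reads(reads):
    """Count QC-passing reads per gene."""
    counts = Counter()
    for seq, quals in reads:
        if not passes_qc(seq, quals):
            continue
        gene = align(seq)
        if gene is not None:
            counts[gene] += 1
    return counts


reads = [
    ("GCGTACG", [30] * 7),    # matches geneA
    ("TACGTTAGC", [35] * 9),  # matches geneA
    ("CCCTTTGG", [32] * 8),   # matches geneB
    ("ATG", [40] * 3),        # dropped: too short
    ("GGGAAA", [10] * 6),     # dropped: low quality
]
counts = count_reads(reads)
print(counts)
```

Two of the five reads are discarded at the quality step, so only three contribute to the final counts.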

These raw counts are then normalized using statistical methods that account for differences in sequencing depth between samples, so researchers can fairly compare gene expression across conditions. The final output is a table showing, for every gene in the genome, how active it was in each sample. From there, researchers use statistical tests to identify genes that are significantly more or less active between groups, such as tumor versus healthy tissue, or treated versus untreated cells.
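
A small Python example shows why normalization matters. It uses counts per million (CPM), one simple way of correcting for sequencing depth, together with a log2 fold change between two samples. The counts are invented, CPM is chosen here only for illustration, and dedicated differential-expression software uses more robust normalization and formal statistical tests.

```python
import math

# Invented raw counts; the tumor sample was sequenced twice as deeply.
raw_counts = {
    "healthy": {"geneA": 100, "geneB": 300, "geneC": 600},
    "tumor":   {"geneA": 400, "geneB": 600, "geneC": 1000},
}


def counts_per_million(counts):
    """Scale counts so each sample's total is one million."""
    total = sum(counts.values())
    return {gene: n / total * 1_000_000 for gene, n in counts.items()}


cpm = {sample: counts_per_million(c) for sample, c in raw_counts.items()}


def log2_fold_change(gene, pseudocount=1.0):
    """log2 ratio of tumor to healthy CPM; the pseudocount avoids log(0)."""
    return math.log2(
        (cpm["tumor"][gene] + pseudocount) / (cpm["healthy"][gene] + pseudocount)
    )


for gene in raw_counts["healthy"]:
    print(gene, round(log2_fold_change(gene), 2))
```

In this toy data the raw count for geneB doubles between samples, yet its normalized expression is unchanged, because the tumor sample simply produced twice as many reads overall. Only geneA goes up and geneC goes down once depth is accounted for.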

Why It Replaced Microarrays

Before RNA-seq, the standard tool for measuring gene expression was the microarray, a glass chip studded with thousands of DNA probes that each bind to a known gene. Microarrays worked, but they had significant blind spots. They could only detect genes that probes had been designed for, so novel or unexpected transcripts went unnoticed. They also struggled at the extremes: genes expressed at very low or very high levels were hard to measure accurately because of background noise and probe saturation.

RNA-seq solved both problems. It has a much higher dynamic range, meaning it can reliably detect genes expressed at very low levels alongside highly active ones. It doesn’t depend on pre-designed probes, so it can discover entirely new transcripts, including previously unknown RNA variants produced by alternative splicing (where a single gene produces multiple versions of its RNA by mixing and matching segments). And although most analyses align reads to a reference genome, transcripts can also be assembled directly from the reads when no reference exists. These advantages made RNA-seq the dominant technology within a few years of its introduction.

Bulk Versus Single-Cell RNA-Seq

Standard RNA-seq, often called bulk RNA-seq, processes RNA extracted from a tissue sample containing thousands or millions of cells. The result is an average of gene expression across all those cells. This is useful for many purposes, but it masks important differences. A tumor biopsy, for example, contains cancer cells, immune cells, blood vessel cells, and connective tissue cells, all blended into one average signal.

Single-cell RNA sequencing (scRNA-seq) solves this by isolating individual cells before sequencing each one separately. The sequencing library for each cell represents that cell alone, so researchers can see which genes are active in each cell type within the same tissue. With modern high-throughput cell separation technologies, thousands of cells per tumor can be profiled in parallel, capturing the diversity within a tissue at a resolution that bulk methods cannot achieve. This has been transformative for understanding how different cell populations interact in complex tissues, particularly in cancer and immunology research.
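
The difference between the two views can be made concrete with a toy calculation in Python. The cell types, cell numbers, and expression values below are invented. The point is only that one bulk average can hide very different per-cell-type values.

```python
# Invented expression of one gene in individual cells of a mixed biopsy.
cells = (
    [("cancer", 50.0)] * 3
    + [("immune", 0.0)] * 6
    + [("stromal", 5.0)] * 1
)

# Bulk view: a single number for the whole sample.
bulk_average = sum(expr for _, expr in cells) / len(cells)

# Single-cell view: average within each cell type.
by_type = {}
for cell_type, expr in cells:
    by_type.setdefault(cell_type, []).append(expr)
per_type_average = {t: sum(v) / len(v) for t, v in by_type.items()}

print("bulk:", bulk_average)
print("per cell type:", per_type_average)
```

Here the bulk value of 15.5 matches none of the cell types: the gene is strongly expressed in the cancer cells, silent in the immune cells, and weakly expressed in the stromal cell.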

Short-Read and Long-Read Platforms

Most RNA-seq today uses short-read technology, which produces fragments typically 100 to 300 base pairs long. These reads are highly accurate, with an average per-base accuracy of about 99.7%. The tradeoff is that short reads can struggle to resolve complex regions of the genome, particularly when a gene produces multiple RNA variants that differ only in which segments are included.
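
The ambiguity is easy to see in a toy Python example. The exon sequences, isoform names, and helper functions are hypothetical, and real isoform assignment relies on statistical models, not exact matching. A short read from a shared segment fits both variants, while a read crossing a variant-specific junction, or covering the whole transcript, fits only one.

```python
# Two hypothetical variants of one gene, written as ordered exon lists.
EXONS = {"E1": "ATGGCC", "E2": "TTTAAA", "E3": "GGGCCC"}
ISOFORMS = {
    "isoform_1": ["E1", "E2", "E3"],  # includes the middle exon
    "isoform_2": ["E1", "E3"],        # skips the middle exon
}


def transcript(isoform):
    """Join an isoform's exons into its full RNA sequence."""
    return "".join(EXONS[e] for e in ISOFORMS[isoform])


def compatible_isoforms(read):
    """List every isoform whose transcript contains the read exactly."""
    return [name for name in ISOFORMS if read in transcript(name)]


shared_read = "ATGGC"                      # lies entirely inside exon E1
junction_read = "CCTTT"                    # spans the E1-E2 junction
full_length_read = transcript("isoform_2")  # covers the whole molecule

print(compatible_isoforms(shared_read))
print(compatible_isoforms(junction_read))
print(compatible_isoforms(full_length_read))
```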

Long-read sequencing platforms generate reads spanning several thousand base pairs, sometimes capturing an entire RNA molecule in a single read. This makes it far easier to identify which version of a gene’s RNA a cell is producing. The downside is a higher error rate, with per-base accuracy around 96.8%, roughly a tenfold increase in errors compared to short reads. Many researchers now combine both approaches: long reads to map the full structure of RNA variants, and short reads to accurately quantify how much of each variant is present.
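
The “roughly tenfold” figure follows directly from the two accuracy values, as this short Python calculation shows. The read lengths of 150 and 2,000 bases are assumptions picked from within the ranges mentioned above, used only to show what the rates mean per read.

```python
short_read_accuracy = 0.997
long_read_accuracy = 0.968

# Error rate is simply one minus accuracy.
short_error = 1 - short_read_accuracy   # about 0.3% of bases
long_error = 1 - long_read_accuracy     # about 3.2% of bases
fold_increase = long_error / short_error


def expected_errors(read_length, accuracy):
    """Average number of miscalled bases in a read of the given length."""
    return read_length * (1 - accuracy)


print("fold increase in errors:", round(fold_increase, 1))
print("errors per 150-base short read:", expected_errors(150, short_read_accuracy))
print("errors per 2000-base long read:", expected_errors(2000, long_read_accuracy))
```

The ratio works out to about 10.7. Under these assumed lengths, a typical short read carries less than one error on average, while a long read carries dozens, which is why short reads remain the preferred choice for accurate counting.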

Applications in Cancer and Disease Research

RNA-seq has become a central tool in cancer research. By comparing the transcriptomes of tumor and healthy tissue, researchers identify genes that are abnormally active or silenced in cancer. These become candidates for diagnostic biomarkers or drug targets. In colorectal cancer, for instance, RNA-seq studies have identified specific circular RNA molecules that serve as biomarkers for liver metastasis. In pancreatic cancer, sequencing revealed changes in RNA transcribed from repetitive DNA sequences that may serve as tumor markers. Breast cancer researchers have used RNA-seq data mining to identify protein variants that predict overall survival and relapse risk.

Beyond biomarker discovery, RNA-seq helps characterize how tumors evolve, how they develop drug resistance, and how the immune system interacts with cancer cells. This last application has been especially important for immunotherapy, where understanding the immune microenvironment of a tumor helps predict whether a patient will respond to treatment.

Outside of cancer, RNA-seq is used to study rare genetic diseases, infectious disease responses, neurological conditions, and drug mechanisms. Any question that hinges on which genes are active, and by how much, is a question RNA-seq can help answer.

Spatial Transcriptomics: Adding Location

Both bulk and single-cell RNA-seq share a limitation: they require breaking apart tissue to extract RNA, which destroys information about where each cell was physically located. A newer set of methods called spatial transcriptomics preserves that location. These techniques quantify gene expression within intact tissue sections, so researchers can see not just which genes are active in which cell types, but exactly where those cells sit relative to each other.

This matters enormously in tissues with complex architecture. In a tumor, for example, immune cells at the edge may behave very differently from those deep inside. In the brain, neighboring regions have distinct functions driven by different gene expression patterns. Spatial transcriptomics maps these patterns directly onto tissue images, combining the molecular detail of RNA-seq with the anatomical context of traditional microscopy. It’s one of the fastest-growing areas in genomics, and it addresses a gap that researchers have been working around for over a decade.