How to Read and Interpret Electropherograms

An electropherogram is a computer-generated graphical representation of data resulting from the separation of molecules, typically DNA or RNA fragments, using electrophoresis. This chart translates complex biochemical information into a readable format, allowing scientists to visualize the results of molecular separation. The primary purpose of an electropherogram is to provide a detailed, quantitative record of separated molecules, which is fundamental for applications like DNA sequencing and genetic profiling. It transforms the physical separation into a series of colored peaks, enabling interpretation of a sample’s genetic makeup.

Decoding the Peaks and Colors

Interpreting the electropherogram begins with understanding its two axes. The horizontal X-axis represents migration time, which increases with the size of the DNA fragments; smaller fragments appear earlier on the left, and larger fragments appear later on the right. The vertical Y-axis measures the signal intensity of the detected molecules, expressed in Relative Fluorescence Units (RFU). The height of a peak reflects the amount of fluorescent signal emitted, which corresponds to the quantity of that specific DNA fragment present.

The graph’s distinctive features are the sharp, symmetrical peaks rising from the baseline, each representing a population of separated molecules of the same size. The color of each peak is significant, corresponding to a specific fluorescent dye chemically attached to the DNA fragments. In DNA sequencing, four different colors are used, each assigned to one of the four nucleotide bases: Adenine (A), Guanine (G), Cytosine (C), and Thymine (T). For example, a common scheme assigns green to A, blue to C, red to T, and black (or yellow) to G.

A flat line at the bottom of the graph, the baseline, represents the zero point of fluorescence intensity. This baseline exhibits minor fluctuations, known as background noise. To distinguish meaningful data from this noise, a minimum threshold value, often expressed in RFUs, is established; any peak falling below this threshold is disregarded. The quality of the electropherogram is assessed by examining the sharpness of the peaks and the flatness of the baseline, with sharp, well-separated peaks and low noise indicating a high-quality result.
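The thresholding step can be sketched in a few lines of Python. This is a minimal illustration rather than the algorithm used by any particular instrument software: the function name call_peaks, the sample trace, and the 50 RFU threshold are all assumptions chosen for the example, and laboratories set their own validated thresholds.

```python
def call_peaks(signal, threshold_rfu):
    """Return (index, height) for each local maximum at or above the threshold."""
    peaks = []
    for i in range(1, len(signal) - 1):
        # A local maximum rises from the left and does not rise further to the right.
        is_local_max = signal[i - 1] < signal[i] >= signal[i + 1]
        if is_local_max and signal[i] >= threshold_rfu:
            peaks.append((i, signal[i]))
    return peaks


# Hypothetical single-color trace in RFU: two real peaks plus low-level noise.
trace = [3, 5, 4, 6, 40, 180, 45, 5, 4, 30, 8, 3, 60, 240, 70, 4]
peaks = call_peaks(trace, threshold_rfu=50)
print(peaks)  # [(5, 180), (13, 240)]
```

The small bump of 30 RFU at index 9 is a local maximum but falls below the threshold, so it is treated as noise and dropped.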

The Process of Data Generation

The electropherogram is the final output of a high-resolution separation technique known as Capillary Electrophoresis (CE). This process begins by preparing the sample molecules, such as DNA fragments, with fluorescent tags. The prepared sample is then introduced into a capillary, which is filled with a polymer solution that acts as a sieving matrix.

A high-voltage electric field is applied, causing the negatively charged DNA fragments to migrate toward the positive electrode. The sieving matrix within the capillary separates the DNA fragments based on their size. Smaller fragments move through the polymer mesh more quickly than larger fragments.

As the separated, fluorescently labeled fragments approach the end of the capillary, they pass through a detection window where a laser beam excites the attached dyes. The excited dyes emit light at specific wavelengths, and a detector records these emitted signals, distinguishing between the different colors of fluorescence. The intensity and color of the light are converted into digital data points by the instrument’s software. Finally, the software plots this intensity data against the migration time to produce the electropherogram.

Electropherograms in DNA Sequencing

In DNA sequencing, the electropherogram is used to determine the exact order of nucleotide bases in a DNA strand. DNA fragments that differ in length by a single base are each labeled at their terminal base with one of four fluorescent dyes. When these fragments are separated by Capillary Electrophoresis, the resulting electropherogram displays a continuous sequence of colored peaks.

Interpreting the sequence involves reading the colors of the peaks from left to right, which corresponds to reading the sequence from the shortest fragment to the longest. Since each color represents a single base—A, T, C, or G—the order of the colors directly translates into the DNA sequence. For example, a sequence of blue, green, red, and black peaks would be interpreted as C-A-T-G.
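As a minimal sketch, the color-to-base reading can be written as a simple lookup. The dictionary below uses the common color scheme mentioned earlier; the names DYE_TO_BASE and read_sequence are illustrative, and real dye assignments depend on the chemistry and software in use.

```python
# Dye-to-base assignment following the common scheme described earlier.
# This mapping is an assumption; check the scheme used by your instrument.
DYE_TO_BASE = {"green": "A", "blue": "C", "red": "T", "black": "G", "yellow": "G"}


def read_sequence(peak_colors):
    """Translate peak colors, ordered left to right, into a base string."""
    return "".join(DYE_TO_BASE[color] for color in peak_colors)


sequence = read_sequence(["blue", "green", "red", "black"])
print(sequence)  # CATG
```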

The quality of the sequencing data is reflected in the clarity of the peaks; an ideal read shows a single, well-defined peak at each position, indicating that the software confidently identified the base. Two distinct peaks of different colors overlapping at the same position suggest heterozygosity, meaning the individual has two different bases at that location on their two homologous chromosomes. Towards the end of the trace, peaks typically become shorter and less defined, which marks the limit of the reliable sequence data and reflects the difficulty of separating very long DNA fragments.
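One way to illustrate how a double peak might be flagged is to compare the two tallest signals at a position and report an IUPAC ambiguity code when the second is a substantial fraction of the first. The sketch below is not the method of any specific base caller: the function call_base, the sample heights, and the 0.5 ratio cutoff are assumptions made for the example.

```python
# Standard IUPAC ambiguity codes for two-base mixtures.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def call_base(heights, min_secondary_ratio=0.5):
    """Call one position from a dict mapping base -> peak height in RFU.

    The 0.5 cutoff is a hypothetical value chosen for illustration.
    """
    ranked = sorted(heights, key=heights.get, reverse=True)
    top, second = ranked[0], ranked[1]
    if heights[top] > 0 and heights[second] / heights[top] >= min_secondary_ratio:
        return IUPAC[frozenset((top, second))]
    return top


clean = {"A": 900, "C": 20, "G": 15, "T": 10}   # single well-defined peak
mixed = {"A": 10, "C": 480, "G": 12, "T": 450}  # two overlapping peaks
print(call_base(clean), call_base(mixed))  # A Y
```

Here Y is the IUPAC code for "C or T", which is how a heterozygous C/T position is commonly recorded in a sequence.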

Electropherograms in Forensic Fingerprinting

In forensic analysis and paternity testing, electropherograms are used for Short Tandem Repeat (STR) analysis, which creates a highly individual genetic profile, often called a DNA fingerprint. Unlike sequencing, the peaks in an STR electropherogram do not represent individual bases but instead show the length of specific DNA fragments, known as alleles, at multiple genetic locations. These fragments are created by amplifying regions of DNA that contain short repeated units of two to six base pairs.

The position of a peak corresponds to the precise length of an STR allele, which is converted into a numerical value representing the number of short repeats. The color of the peak, together with the size range in which it falls, identifies the STR region, or locus, it belongs to, which allows many loci to be analyzed simultaneously. An individual typically shows one or two peaks for each STR locus, depending on whether they are homozygous (one peak) or heterozygous (two peaks) at that locus.
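The relationship between fragment length and repeat number can be illustrated with simple arithmetic: subtract the non-repeating flanking sequence from the fragment length and divide by the repeat unit length. The sketch below assumes a hypothetical locus with 100 bp of flanking sequence and a 4 bp repeat unit; the function names are illustrative. In practice, allele calls are made by comparing peaks against an allelic ladder rather than by this calculation alone.

```python
def allele_designation(fragment_bp, flanking_bp, repeat_unit_bp):
    """Convert a fragment length into an STR allele designation.

    Full repeats are reported as an integer; leftover bases are appended
    after a dot, following the usual "9.3" style for partial repeats.
    """
    full, extra = divmod(fragment_bp - flanking_bp, repeat_unit_bp)
    return str(full) if extra == 0 else f"{full}.{extra}"


def genotype(peak_sizes_bp, flanking_bp, repeat_unit_bp):
    """Designate each peak at one locus and report zygosity (one or two peaks)."""
    alleles = [allele_designation(s, flanking_bp, repeat_unit_bp)
               for s in sorted(peak_sizes_bp)]
    zygosity = "homozygous" if len(alleles) == 1 else "heterozygous"
    return alleles, zygosity


# Hypothetical locus: 100 bp flanking sequence, 4 bp repeat unit.
print(genotype([156, 148], flanking_bp=100, repeat_unit_bp=4))
# (['12', '14'], 'heterozygous')
print(allele_designation(139, flanking_bp=100, repeat_unit_bp=4))  # 9.3
```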

The height of these peaks reflects the quantity of DNA present for that specific allele. This pattern of colored peaks and size values creates the genetic profile, which is used for comparison, such as matching an unknown sample to a suspect’s profile. Interpretation also involves identifying artifacts such as “stutter” peaks: small peaks that appear immediately before a main peak, typically one repeat unit shorter. They are a normal by-product of the amplification process and must not be mistaken for true alleles.
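A simple stutter filter can be sketched as follows, assuming that a stutter peak sits exactly one repeat unit before its parent peak and is small relative to it. The function remove_stutter, the 4 bp repeat unit, the 15% height cutoff, and the sample peaks are all assumptions for illustration; validated stutter thresholds vary by locus and laboratory.

```python
def remove_stutter(peaks, repeat_unit_bp=4, max_stutter_ratio=0.15):
    """Drop peaks that look like stutter of a peak one repeat unit longer.

    peaks: list of (size_bp, height_rfu) tuples for a single locus.
    The 0.15 ratio is a hypothetical cutoff, not a validated value.
    """
    heights = dict(peaks)
    kept = []
    for size, height in peaks:
        # The candidate parent is the peak one repeat unit to the right.
        parent = heights.get(size + repeat_unit_bp)
        is_stutter = parent is not None and height <= max_stutter_ratio * parent
        if not is_stutter:
            kept.append((size, height))
    return kept


# Hypothetical locus with two true alleles, each preceded by a small stutter peak.
locus_peaks = [(144, 90), (148, 1000), (152, 110), (156, 950)]
print(remove_stutter(locus_peaks))  # [(148, 1000), (156, 950)]
```

A ratio-based rule like this cannot tell a stutter peak from a genuine minor contributor in a mixed sample, which is one reason stutter interpretation in casework relies on validated, locus-specific guidelines.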