How to Read and Understand Gene Mutation Nomenclature

Gene mutation nomenclature is a standardized language that scientists and clinicians worldwide use to report changes in DNA, RNA, and protein sequences precisely. It ensures that each genetic alteration is described unambiguously, preventing misinterpretation across laboratories and publications. The Human Genome Variation Society (HGVS) developed and maintains the guidelines behind this system, which provide a concise code that tells the reader exactly where a change occurred and what it is.

Understanding the Reference Sequence Prefixes

The first step in decoding a mutation description is identifying the reference sequence type, which is indicated by a single-letter prefix followed by a period. This prefix defines the molecule and the numbering system used to locate the change. In a complete description, the prefix follows a reference sequence identifier and a colon, as in `NM_004006.2:c.4375C>T`.

The most common prefix is `c.`, which denotes a coding DNA reference sequence: numbering is relative to the protein-coding region, with `c.1` being the A of the ATG translation start codon. The `g.` prefix refers to a genomic reference sequence, which is numbered continuously along an entire chromosome or large genomic region.

The `m.` prefix denotes a mitochondrial DNA reference sequence, used for changes in the circular genome of the mitochondria, the cell's energy-producing organelles. The `p.` prefix indicates a protein reference sequence, signaling that the description concerns the amino acid sequence rather than the underlying nucleotides. Other molecules have their own prefixes, such as `r.` for RNA and `n.` for non-coding DNA.
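To make the prefix system concrete, here is a minimal Python sketch that splits a description into its optional accession, reference type, and change. The function name `split_description` and the regular expression are hypothetical illustrations, not part of HGVS or any library; the pattern only accepts RefSeq-style accessions such as `NM_004006.2` and the four prefixes `c.`, `g.`, `m.`, and `p.`, so it is far from a complete parser.

```python
import re
from typing import Optional, Tuple

# Reference-type prefixes covered here (HGVS defines others, such as r. and n.).
PREFIXES = {
    "c": "coding DNA",
    "g": "genomic",
    "m": "mitochondrial DNA",
    "p": "protein",
}

# Hypothetical pattern: optional RefSeq-style accession plus colon, then prefix, period, change.
DESCRIPTION = re.compile(
    r"^(?:(?P<acc>[A-Z]{2}_\d+(?:\.\d+)?):)?(?P<prefix>[cgmp])\.(?P<change>.+)$"
)


def split_description(text: str) -> Tuple[Optional[str], str, str]:
    """Return (accession or None, reference type, change) for a simple description."""
    match = DESCRIPTION.match(text)
    if match is None:
        raise ValueError(f"unrecognized description: {text!r}")
    return match["acc"], PREFIXES[match["prefix"]], match["change"]


print(split_description("NM_004006.2:c.4375C>T"))
# ('NM_004006.2', 'coding DNA', '4375C>T')
print(split_description("p.Val123Leu"))
# (None, 'protein', 'Val123Leu')
```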

Interpreting Nucleotide Alterations at the DNA Level

The core of the nomenclature involves detailing the specific type of nucleotide change that occurred, represented by a precise combination of numbers, letters, and symbols following the reference prefix.

The simplest change is a substitution, where one nucleotide is replaced by another, shown with the greater-than sign (`>`). For example, `c.123A>G` indicates that at position 123 of the coding DNA sequence, adenine (A) was replaced by guanine (G).
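A substitution can be applied to a sequence mechanically. The sketch below assumes a toy coding sequence whose first character is `c.1` and a hypothetical helper, `apply_substitution`, that checks the reference base before replacing it; it handles only simple `c.` substitutions without intronic offsets.

```python
import re

# Simple coding-DNA substitution such as "c.123A>G" (no intronic offsets, no ranges).
SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def apply_substitution(sequence: str, change: str) -> str:
    """Apply a c. substitution to a coding sequence whose first base is c.1."""
    match = SUBSTITUTION.match(change)
    if match is None:
        raise ValueError(f"not a simple substitution: {change!r}")
    position, ref, alt = int(match[1]), match[2], match[3]
    index = position - 1  # HGVS positions are 1-based; Python indexes from 0
    if position < 1 or index >= len(sequence) or sequence[index] != ref:
        raise ValueError(f"reference base {ref} not found at c.{position}")
    return sequence[:index] + alt + sequence[index + 1:]


print(apply_substitution("ATGAAACCC", "c.5A>G"))  # ATGAGACCC
```

Checking the reference base guards against applying a description to the wrong sequence or numbering system.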

A deletion is marked with `del` after the affected position or range. `c.76del` means the nucleotide at position 76 was deleted, while `c.76_78del` means the three nucleotides from position 76 through 78 were deleted; the underscore (`_`) joins the first and last positions of a range.
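Deletions use 1-based, inclusive numbering, which maps onto Python slicing with an off-by-one adjustment. This hypothetical `apply_deletion` helper is a sketch that assumes a toy coding sequence starting at `c.1`; it treats a single position as a range of length one and removes everything from the first to the last position.

```python
import re

# "c.76del" (one nucleotide) or "c.76_78del" (first_last, inclusive).
DELETION = re.compile(r"^c\.(\d+)(?:_(\d+))?del$")


def apply_deletion(sequence: str, change: str) -> str:
    """Remove the 1-based, inclusive range named by a simple c. deletion."""
    match = DELETION.match(change)
    if match is None:
        raise ValueError(f"not a simple deletion: {change!r}")
    start = int(match[1])
    end = int(match[2]) if match[2] else start  # a single position is a range of length 1
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"range out of bounds: {change!r}")
    # Keep the bases before c.start (indices < start-1) and after c.end (indices >= end).
    return sequence[:start - 1] + sequence[end:]


print(apply_deletion("ATGGCCTTA", "c.4del"))    # ATGCCTTA
print(apply_deletion("ATGGCCTTA", "c.4_6del"))  # ATGTTA
```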

An insertion uses `ins` to show new material added between two adjacent positions. For instance, `c.76_77insT` indicates that a thymine (T) was inserted between positions 76 and 77.
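Because an insertion sits between two positions rather than on one, the new bases go immediately after the left flanking nucleotide. A minimal sketch with a hypothetical `apply_insertion` helper, assuming a toy sequence that starts at `c.1`:

```python
import re

# "c.76_77insT": the two numbers must be adjacent positions flanking the insertion.
INSERTION = re.compile(r"^c\.(\d+)_(\d+)ins([ACGT]+)$")


def apply_insertion(sequence: str, change: str) -> str:
    """Insert bases between two adjacent 1-based positions of a coding sequence."""
    match = INSERTION.match(change)
    if match is None:
        raise ValueError(f"not a simple insertion: {change!r}")
    left, right, inserted = int(match[1]), int(match[2]), match[3]
    if right != left + 1 or not 1 <= left < len(sequence):
        raise ValueError(f"positions must be adjacent and inside the sequence: {change!r}")
    # c.left is index left-1, so the new bases start at index `left`.
    return sequence[:left] + inserted + sequence[left:]


print(apply_insertion("ATGGCC", "c.3_4insT"))  # ATGTGCC
```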

A more complex event is a deletion-insertion, abbreviated `delins`, used when a sequence is deleted and replaced by a different sequence. The code `c.76_78delinsG` means the nucleotides at positions 76 to 78 were deleted and replaced by a single guanine (G).
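A deletion-insertion behaves like a deletion followed by an insertion at the same site. The hypothetical `apply_delins` helper below sketches this on a toy sequence, assuming 1-based, inclusive numbering that starts at `c.1`:

```python
import re

# "c.76_78delinsG" or, for one position, "c.76delinsGT".
DELINS = re.compile(r"^c\.(\d+)(?:_(\d+))?delins([ACGT]+)$")


def apply_delins(sequence: str, change: str) -> str:
    """Replace a 1-based, inclusive range with new bases (deletion plus insertion)."""
    match = DELINS.match(change)
    if match is None:
        raise ValueError(f"not a simple deletion-insertion: {change!r}")
    start = int(match[1])
    end = int(match[2]) if match[2] else start
    replacement = match[3]
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"range out of bounds: {change!r}")
    return sequence[:start - 1] + replacement + sequence[end:]


print(apply_delins("ATGGCCTTA", "c.4_6delinsG"))  # ATGGTTA
```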

Decoding Amino Acid Changes at the Protein Level

When a mutation is described with the `p.` prefix, the focus shifts to the resulting change in the protein's amino acid sequence. HGVS recommends three-letter amino acid abbreviations (one-letter codes are also accepted), written as the original amino acid, its position, and then the new amino acid.

A common change is a missense mutation, in which one amino acid is substituted for another. In `p.Val123Leu`, the valine (Val) at position 123 is replaced by leucine (Leu).
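Three-letter codes can be checked and converted to the one-letter form mechanically. In this sketch, `parse_missense` and `to_one_letter` are hypothetical names, and the table lists only the 20 standard amino acids, so stop codons and other residues are rejected.

```python
import re
from typing import Tuple

# Standard three-letter to one-letter amino acid codes (the 20 standard residues only).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

MISSENSE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_missense(change: str) -> Tuple[str, int, str]:
    """Return (original residue, position, new residue) for a three-letter missense code."""
    match = MISSENSE.match(change)
    if match is None:
        raise ValueError(f"not a three-letter missense description: {change!r}")
    ref, position, alt = match[1], int(match[2]), match[3]
    if ref not in THREE_TO_ONE or alt not in THREE_TO_ONE or ref == alt:
        raise ValueError(f"not a missense change between standard residues: {change!r}")
    return ref, position, alt


def to_one_letter(change: str) -> str:
    """Rewrite a three-letter missense description with one-letter codes."""
    ref, position, alt = parse_missense(change)
    return f"p.{THREE_TO_ONE[ref]}{position}{THREE_TO_ONE[alt]}"


print(parse_missense("p.Val123Leu"))  # ('Val', 123, 'Leu')
print(to_one_letter("p.Val123Leu"))   # p.V123L
```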

A nonsense mutation converts an amino acid codon into a stop codon, prematurely ending protein production. The resulting termination codon is written as an asterisk (`*`) or as `Ter`, so `p.Trp41*` and `p.Trp41Ter` both mean that the tryptophan (Trp) at position 41 was replaced by a stop codon.
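Both stop-codon spellings, `*` and `Ter`, can be recognized with one pattern. The hypothetical `parse_nonsense` below is a sketch that returns the replaced residue, its position, and how many residues precede the premature stop (assuming the protein is numbered from residue 1), or `None` for descriptions it does not recognize.

```python
import re
from typing import Optional, Tuple

# Nonsense change written with "*" or "Ter" for the new stop codon, e.g. "p.Trp41*".
NONSENSE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)(?:\*|Ter)$")


def parse_nonsense(change: str) -> Optional[Tuple[str, int, int]]:
    """Return (replaced residue, position, residues kept before the stop), or None."""
    match = NONSENSE.match(change)
    if match is None:
        return None
    residue, position = match[1], int(match[2])
    return residue, position, position - 1  # assumes numbering starts at residue 1


print(parse_nonsense("p.Trp41*"))     # ('Trp', 41, 40)
print(parse_nonsense("p.Trp41Ter"))   # ('Trp', 41, 40)
print(parse_nonsense("p.Val123Leu"))  # None
```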

A frameshift mutation, denoted by `fs`, is often the most disruptive type of change. It occurs when an insertion or deletion whose length is not a multiple of three nucleotides shifts the reading frame for all downstream codons. The code `p.Arg123ProfsTer5` (also written `p.Arg123Profs*5`) indicates that the frameshift changes the arginine (Arg) at position 123 to proline (Pro) and that the new reading frame ends in a stop codon at its fifth position, counting Pro123 as position 1; the stop therefore falls at codon 127.
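The stop-codon arithmetic for a frameshift is simple once the counting convention is fixed: the first changed residue is position 1 of the new frame, so a stop at position N lands at codon (first changed position + N - 1). A sketch with hypothetical helper names, accepting both the `Ter` and `*` spellings:

```python
import re

# Frameshift such as "p.Arg123ProfsTer5" or "p.Arg123Profs*5".
FRAMESHIFT = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})fs(?:Ter|\*)(\d+)$")


def frameshift_stop_codon(change: str) -> int:
    """Return the codon number of the new stop codon in a frameshift description."""
    match = FRAMESHIFT.match(change)
    if match is None:
        raise ValueError(f"not a frameshift with a stop position: {change!r}")
    first_changed = int(match[2])
    stop_in_new_frame = int(match[4])  # the first changed residue counts as position 1
    return first_changed + stop_in_new_frame - 1


def shifts_reading_frame(indel_length: int) -> bool:
    """An insertion or deletion shifts the frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


print(frameshift_stop_codon("p.Arg123ProfsTer5"))        # 127
print(shifts_reading_frame(1), shifts_reading_frame(3))  # True False
```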

A DNA change can also leave the protein sequence unaltered, which is known as a silent or synonymous change. It is written with an equals sign (`=`), as in `p.(=)`, or with the unchanged residue named, as in `p.(Leu54=)`.
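Whether a single-base change is synonymous depends on the genetic code. The sketch below uses a deliberately partial codon table (a handful of standard-code codons chosen for the example) and a hypothetical `predicted_protein_change` helper that emits a predicted, parenthesized description; the codon number 54 is an arbitrary illustration.

```python
# Deliberately partial codon table (standard genetic code), just enough for the example.
CODON_TABLE = {
    "CTA": "Leu", "CTC": "Leu", "CTG": "Leu", "CTT": "Leu",
    "CCG": "Pro", "CAG": "Gln",
}


def predicted_protein_change(codon_number: int, ref_codon: str, alt_codon: str) -> str:
    """Describe the predicted effect of a codon change, using '=' when the residue is unchanged."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return f"p.({ref_aa}{codon_number}=)"
    return f"p.({ref_aa}{codon_number}{alt_aa})"


print(predicted_protein_change(54, "CTG", "CTA"))  # p.(Leu54=)
print(predicted_protein_change(54, "CTG", "CCG"))  # p.(Leu54Pro)
```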

Parentheses around a protein description, as in `p.(Arg123ProfsTer5)`, indicate that the protein-level consequence was predicted from the DNA change rather than confirmed experimentally.
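Separating the prediction marker from the change itself is a small string operation. A minimal sketch, with `split_prediction` as a hypothetical helper name:

```python
from typing import Tuple


def split_prediction(description: str) -> Tuple[str, bool]:
    """Return (description without parentheses, True if it was marked as predicted)."""
    if not description.startswith("p."):
        raise ValueError(f"not a protein description: {description!r}")
    body = description[2:]
    if body.startswith("(") and body.endswith(")"):
        return "p." + body[1:-1], True
    return description, False


print(split_prediction("p.(Arg123ProfsTer5)"))  # ('p.Arg123ProfsTer5', True)
print(split_prediction("p.Trp41*"))             # ('p.Trp41*', False)
```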

Reading Complex Structural Variations and Unknowns

Beyond simple substitutions and deletions, the nomenclature provides terms for larger or more complex rearrangements. A duplication, in which a segment of DNA is copied and inserted directly after the original, is denoted by `dup`, as in `c.20_23dup`. An inversion, in which a segment is replaced by its reverse complement (the segment read backward with each base complemented), is indicated by `inv`, as in `c.76_83inv`.
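Duplications and inversions can also be applied to a toy sequence. The helpers below are hypothetical and take 1-based, inclusive integer positions rather than parsing the `dup` or `inv` strings; the complement table covers only A, C, G, and T.

```python
# Complement for the four standard DNA bases only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def apply_duplication(sequence: str, start: int, end: int) -> str:
    """c.start_enddup: insert a copy of positions start..end (1-based) right after the original."""
    segment = sequence[start - 1:end]
    return sequence[:end] + segment + sequence[end:]


def apply_inversion(sequence: str, start: int, end: int) -> str:
    """c.start_endinv: replace positions start..end (1-based) with their reverse complement."""
    segment = sequence[start - 1:end]
    return sequence[:start - 1] + segment.translate(COMPLEMENT)[::-1] + sequence[end:]


print(apply_duplication("ATGGCCTTA", 4, 6))  # ATGGCCGCCTTA
print(apply_inversion("ATGGCCTTA", 4, 6))    # ATGGGCTTA
```

Applying the same inversion twice restores the original sequence, since the reverse complement of a reverse complement is the starting segment.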

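Changes in introns, which can disrupt splicing, lie outside the coding sequence, so they are numbered relative to the nearest exon boundary using plus (`+`) and minus (`-`) signs. `c.123+1G>A` is a change at the first intronic nucleotide after an exon that ends at coding position 123 (the splice donor site), whereas `c.124-1G>A` is a change at the last intronic nucleotide before an exon that begins at position 124 (the splice acceptor site). Positions upstream of the start codon, in the 5' untranslated region, take a minus sign with no base number, as in `c.-12`, and positions after the stop codon, in the 3' untranslated region, take an asterisk, as in `c.*32`.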
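The offset notation can be decoded with a small parser. This sketch, built around a hypothetical `describe_position` function, handles plain coding positions, intronic offsets, and the 5' and 3' untranslated-region forms, but not ranges or other HGVS position types.

```python
import re

# A c. position: coding number, "-N" (5' UTR) or "*N" (3' UTR), plus an optional intronic offset.
POSITION = re.compile(r"^(?P<anchor>\*\d+|-?\d+)(?:(?P<sign>[+-])(?P<offset>\d+))?$")


def describe_position(position: str) -> str:
    """Explain where a c. position (written without the 'c.' prefix) falls."""
    match = POSITION.match(position)
    if match is None:
        raise ValueError(f"unrecognized c. position: {position!r}")
    anchor = match["anchor"]
    if not anchor.startswith("*") and int(anchor) == 0:
        raise ValueError("c. numbering has no position 0")
    if match["sign"]:
        side = "after" if match["sign"] == "+" else "before"
        return f"intron, {match['offset']} nt {side} c.{anchor}"
    if anchor.startswith("*"):
        return f"3' UTR, nucleotide {anchor[1:]} after the stop codon"
    value = int(anchor)
    if value < 0:
        return f"5' UTR, nucleotide {-value} upstream of the ATG start codon"
    return f"coding sequence, nucleotide {value}"


for pos in ("123+1", "124-1", "-12", "*32", "123"):
    print(pos, "->", describe_position(pos))
```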

In some cases, the consequence of a DNA change on the protein is unknown or cannot be determined with certainty. The nomenclature accounts for this with a question mark, as in `p.?`, which signifies that a DNA change was detected but its effect on the protein sequence is undetermined.
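The protein-level forms discussed in this guide can be sorted into rough categories by pattern. The `classify_protein_change` function below is a hypothetical sketch that recognizes only those forms (unknown, synonymous, frameshift, nonsense, and three-letter missense, each optionally in parentheses); anything else is reported as `other`.

```python
import re


def classify_protein_change(description: str) -> str:
    """Assign a rough category to a few common p. description forms."""
    if not description.startswith("p."):
        raise ValueError(f"not a protein description: {description!r}")
    body = description[2:]
    predicted = body.startswith("(") and body.endswith(")")
    if predicted:
        body = body[1:-1]  # parentheses only mark the consequence as predicted
    if body == "?":
        category = "unknown"
    elif body.endswith("="):
        category = "synonymous"
    elif "fs" in body:
        category = "frameshift"
    elif body.endswith("*") or body.endswith("Ter"):
        category = "nonsense"
    elif re.fullmatch(r"[A-Z][a-z]{2}\d+[A-Z][a-z]{2}", body):
        category = "missense"
    else:
        category = "other"
    return category + (" (predicted)" if predicted else "")


for desc in ("p.?", "p.(=)", "p.Val123Leu", "p.Trp41*", "p.(Arg123ProfsTer5)"):
    print(desc, "->", classify_protein_change(desc))
```

The frameshift check runs before the nonsense check so that stop positions inside `fs` descriptions are not misread.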