How to Name Amino Acids: IUPAC, Codes & Conventions

Amino acids carry up to four different types of names: a common (trivial) name like “glycine,” a systematic chemical name like “aminoethanoic acid,” a three-letter abbreviation like Gly, and a one-letter code like G. Which naming system you use depends on the context. Organic chemistry courses focus on systematic names and stereochemistry prefixes. Biochemistry and molecular biology rely on abbreviations. Here’s how each system works and how to apply it.

Trivial Names and Where They Come From

The 20 standard amino acids each have a common name that stuck from the time of their discovery. These names follow no consistent logic, which is why they need to be memorized rather than derived. Glycine takes its name from the Greek word for “sweet” (“glykys”). Asparagine was first isolated from asparagus. Tyrosine comes from the Greek word for cheese (“tyros”), because it was found in casein. Cysteine references the bladder (Greek “kystis”) by way of cystine, its oxidized dimer, which was first isolated from bladder stones. Other names reference their source material: glutamic acid was extracted from wheat gluten, and serine from silk protein (Latin “sericum”).

These trivial names are the ones you’ll encounter most often in textbooks and research papers. They’re the foundation for the abbreviation systems described below.

Systematic (IUPAC) Names

The International Union of Pure and Applied Chemistry (IUPAC) provides rules for naming any amino acid based on its chemical structure. The system treats each amino acid as a substituted carboxylic acid. You identify the longest carbon chain containing the carboxyl group, name it using standard organic acid naming, then indicate where the amino group is attached with a number.

The carbon of the carboxyl group is numbered 1. The next carbon (the alpha carbon) is numbered 2. So the simplest amino acid, glycine, becomes “aminoethanoic acid”: a two-carbon acid (ethanoic acid) with an amino group. Alanine, which has one extra carbon in its side chain, becomes “2-aminopropanoic acid.” Aspartic acid, which has a second carboxyl group, becomes “2-aminobutanedioic acid” (the “dioic” suffix indicating two acid groups on a four-carbon chain). Glutamic acid extends this to a five-carbon chain: “2-aminopentanedioic acid.”

IUPAC also allows older acid names as alternatives. So “ethanoic” can be called “acetic,” “propanoic” can be called “propionic,” “butanedioic” can be called “succinic,” and “pentanedioic” can be called “glutaric.” You may see these semi-systematic names in older literature.

To build a systematic name yourself, follow these steps:

Count the longest carbon chain that includes the carboxyl carbon.
Name the parent acid using standard IUPAC rules (propanoic for 3 carbons, butanoic for 4, etc.). If there are two carboxyl groups, use the “dioic” suffix.
Number from the carboxyl carbon as position 1.
Add “amino” as a prefix with the position number (almost always 2 for the standard amino acids, since the amino group sits on the alpha carbon).
Add any other substituents (hydroxyl, thiol, additional amino groups) with their position numbers.
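The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a full nomenclature engine: the function name systematic_name, the STEMS table, and the (position, prefix) input format are all assumptions made for this example, and it ignores stereodescriptors and ring-containing side chains.

```python
from collections import defaultdict

# Parent-chain stems keyed by carbon count (hypothetical lookup table).
STEMS = {2: "ethan", 3: "propan", 4: "butan", 5: "pentan", 6: "hexan"}
# Multiplying prefixes for repeated substituents.
MULTIPLIERS = {1: "", 2: "di", 3: "tri"}

def systematic_name(chain_length, substituents, diacid=False):
    """Build a name from a chain length and (position, prefix) pairs.

    Position 1 is the carboxyl carbon, so the alpha carbon is position 2.
    """
    stem = STEMS[chain_length]
    # Two carboxyl groups take the "dioic" suffix: butanedioic, pentanedioic.
    parent = f"{stem}edioic acid" if diacid else f"{stem}oic acid"

    # Collect the positions of each substituent so repeats can be merged.
    positions = defaultdict(list)
    for position, prefix in substituents:
        positions[prefix].append(position)

    # Cite prefixes alphabetically, each with its position numbers.
    parts = []
    for prefix in sorted(positions):
        locants = ",".join(str(p) for p in sorted(positions[prefix]))
        parts.append(f"{locants}-{MULTIPLIERS[len(positions[prefix])]}{prefix}")
    return "-".join(parts) + parent

print(systematic_name(3, [(2, "amino")]))               # 2-aminopropanoic acid
print(systematic_name(4, [(2, "amino")], diacid=True))  # 2-aminobutanedioic acid
print(systematic_name(3, [(2, "amino"), (3, "hydroxy")]))
```

The last call describes serine and prints “2-amino-3-hydroxypropanoic acid.” For glycine the sketch returns “2-aminoethanoic acid”; the position number is often dropped there because the amino group has only one possible position on a two-carbon acid.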

Three-Letter and One-Letter Codes

In biochemistry and molecular biology, writing out full names is impractical, especially when describing protein sequences that can run hundreds of residues long. Two shorthand systems solve this. The three-letter code usually takes the first three letters of the common name, with a few exceptions (Asn, Gln, Ile, and Trp), and the one-letter code assigns a single capital letter to each amino acid.

The 20 standard amino acids and their codes are:

Ala (A) alanine
Arg (R) arginine
Asn (N) asparagine
Asp (D) aspartic acid
Cys (C) cysteine
Gln (Q) glutamine
Glu (E) glutamic acid
Gly (G) glycine
His (H) histidine
Ile (I) isoleucine
Leu (L) leucine
Lys (K) lysine
Met (M) methionine
Phe (F) phenylalanine
Pro (P) proline
Ser (S) serine
Thr (T) threonine
Trp (W) tryptophan
Tyr (Y) tyrosine
Val (V) valine

Some one-letter codes are intuitive (A for alanine, G for glycine). Others seem arbitrary because the obvious letter was already taken. Phenylalanine gets F (a phonetic match for its opening “ph” sound), tryptophan gets W (its double ring suggests “double-u,” a mnemonic some students use), and lysine gets K, the letter next to L, because L was already assigned to leucine. Three-letter codes are used in structural biology and for shorter sequences. One-letter codes dominate in genomics and sequence databases where compact notation matters.
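Converting between the two codes is a simple table lookup. The sketch below assumes three-letter sequences are written with hyphens between residues (for example “Gly-Ala-Ser”); the function names to_one_letter and to_three_letter are hypothetical helpers for this example, not part of any standard library.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
# Reverse table, built from the first so the two cannot drift apart.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}

def to_one_letter(sequence):
    """Convert hyphen-separated three-letter codes to a compact string."""
    return "".join(THREE_TO_ONE[residue] for residue in sequence.split("-"))

def to_three_letter(sequence):
    """Convert a one-letter string to hyphen-separated three-letter codes."""
    return "-".join(ONE_TO_THREE[letter] for letter in sequence)

print(to_one_letter("Met-Lys-Trp-Phe"))  # MKWF
print(to_three_letter("GAS"))            # Gly-Ala-Ser
```

An unknown code raises a KeyError in this sketch, which is usually the safer behavior for sequence data than silently skipping a residue.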

Stereochemistry: D/L and R/S Prefixes

Every standard amino acid except glycine has a chiral alpha carbon, meaning it exists in two mirror-image forms (enantiomers). These mirror forms are named using two different systems, and understanding when to use each one matters.

The older system uses D- and L- prefixes (from Latin “dexter” and “laevus,” meaning right and left). It works by comparing the arrangement of groups around the alpha carbon to a reference molecule, glyceraldehyde. If you draw a Fischer projection with the carboxyl group at the top and the side chain at the bottom, the amino group will point either left (L-configuration) or right (D-configuration). Nearly all amino acids found in proteins are L-amino acids. The D/L label tells you nothing about which direction the molecule rotates polarized light; that’s a separate property indicated by (+) or (-).

The modern system uses R and S descriptors, assigned through the Cahn-Ingold-Prelog priority rules. You rank the four groups attached to the alpha carbon by the atomic number of the atom directly bonded to it, and break ties by comparing the atoms one bond further out. The amino group (nitrogen, atomic number 7) outranks the carboxyl group (its oxygens have a higher atomic number, but carbon is the bonding atom), and hydrogen always ranks last. The carboxyl carbon and the first side-chain carbon tie at the first step; in most amino acids the carboxyl group wins the tie-break because it carries oxygens. After ranking, you orient the molecule so the lowest-priority group (hydrogen) points away from you, then trace a path from the highest-priority group through the second to the third. If that path curves counterclockwise, the center is S. Clockwise gives R. Most L-amino acids are S-configured, with cysteine being a notable exception (it’s L but R, because the sulfur atom in its side chain lifts the side chain above the carboxyl group in the priority ranking).
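The link between the two systems can be checked with a short Python sketch that reads an R or S label off a Fischer projection. It rests on several simplifying assumptions. Each group is described by a hand-written key holding the atomic number of the bonding atom and the atomic numbers of that atom's other neighbors in descending order, with a double-bonded oxygen counted twice. Priorities are compared only to that depth. The names rs_from_fischer and l_amino_acid are hypothetical.

```python
# Clockwise order of the four Fischer-projection positions.
CLOCK = {"top": 0, "right": 1, "bottom": 2, "left": 3}

def rs_from_fischer(groups):
    """Return "R" or "S" for a Fischer projection.

    groups maps each position to a simplified priority key:
    (atomic number of bonding atom, atomic numbers of its other neighbors).
    """
    # Rank positions from highest to lowest priority by tuple comparison.
    ranked = sorted(groups, key=lambda position: groups[position], reverse=True)
    first, second, third, lowest = (CLOCK[position] for position in ranked)

    # Going first -> second -> third is clockwise if, starting from the first
    # position and moving clockwise, the second is reached before the third.
    clockwise = (second - first) % 4 < (third - first) % 4

    # Horizontal bonds point toward the viewer, so a lowest-priority group
    # on the left or right means the apparent direction must be reversed.
    if lowest in (CLOCK["left"], CLOCK["right"]):
        clockwise = not clockwise
    return "R" if clockwise else "S"

def l_amino_acid(side_chain_key):
    """L layout: carboxyl on top, side chain at bottom, amino group on the left."""
    return {
        "top": (6, (8, 8, 8)),   # COOH: carbon bearing O, O, and a duplicated O
        "bottom": side_chain_key,
        "left": (7, (1, 1)),     # NH2
        "right": (1, ()),        # H
    }

print(rs_from_fischer(l_amino_acid((6, (1, 1, 1)))))   # L-alanine, CH3 side chain: S
print(rs_from_fischer(l_amino_acid((6, (16, 1, 1)))))  # L-cysteine, CH2SH side chain: R
```

The cysteine key starts with sulfur (16), which beats the carboxyl group's oxygen (8) at the first point of difference, so the side chain and the carboxyl group swap ranks and the label flips to R while the spatial arrangement stays the same.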

In biochemistry, D/L notation is standard. In organic chemistry courses and when discussing synthetic amino acids, R/S notation is preferred because it’s unambiguous and based on a universal set of rules rather than comparison to a reference molecule.

Classifying by Side Chain Properties

Amino acids are also grouped and described by the chemical character of their side chain (R group). This isn’t a formal naming convention, but it comes up constantly when people discuss amino acid properties, and the category labels function as part of how amino acids are identified and communicated about.

The four standard categories are: nonpolar, polar but uncharged, negatively charged, and positively charged (all at neutral pH). Alanine, valine, leucine, isoleucine, proline, and methionine fall into the nonpolar group, and glycine is usually placed there as well. Serine, threonine, asparagine, and glutamine are polar but uncharged; cysteine is often listed with them, though textbooks differ. Aspartic acid and glutamic acid carry a negative charge at physiological pH. Lysine and arginine carry a positive charge; histidine is grouped with them, although its side chain is only partly protonated near neutral pH. Phenylalanine, tryptophan, and tyrosine are often pulled out separately as “aromatic” amino acids because they contain ring structures, though phenylalanine and tryptophan are also classified as nonpolar and tyrosine as polar.

The 21st and 22nd Amino Acids

Two additional amino acids are incorporated into proteins during translation, extending the standard set of 20. Selenocysteine (abbreviated Sec or U) is the 21st, and pyrrolysine (abbreviated Pyl or O) is the 22nd. Both are called “proteinogenic” because cells build them directly into proteins using special machinery, not by modifying an amino acid after the protein is made.

Selenocysteine’s systematic name is 2-amino-3-selanylpropanoic acid (semi-systematically, 3-selanylalanine). It resembles cysteine but has a selenium atom where cysteine has sulfur. It gets inserted at positions coded by what is normally a stop signal (UGA codon) in the messenger RNA. Pyrrolysine is a modified form of lysine with a pyrroline ring attached to its side chain, and it’s inserted at another stop signal (UAG codon). Their naming follows the same conventions as the standard 20: a trivial name, a three-letter code, and a one-letter code.

Naming in Different Ionic States

Amino acids change form depending on the pH of their environment, and this affects how they’re written in chemical notation. In water at neutral pH, amino acids exist as zwitterions, carrying both a positive charge on the amino group and a negative charge on the carboxyl group simultaneously. The net charge is zero, but the molecule is not truly “neutral” in the way an uncharged molecule would be.

In acidic conditions, the carboxyl group picks up a proton and loses its negative charge, leaving the molecule with a net positive charge. In basic conditions, the amino group loses its proton and its positive charge, leaving a net negative charge. IUPAC systematic names technically refer to the hypothetical uncharged form, with the amino group unprotonated and the carboxyl group undissociated. In practice, when writing structures, you should match the ionic form to the pH you’re describing. At physiological pH (around 7.4), drawing the zwitterion form is most accurate for simple amino acids.
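The shift between ionic forms can be estimated with the Henderson-Hasselbalch relationship. The sketch below is an illustration for a simple amino acid with no ionizable side chain. The default pKa values (about 2.34 for the carboxyl group and 9.60 for the amino group) are typical textbook figures for glycine and are an assumption of this example, as is the function name net_charge.

```python
def net_charge(ph, pka_carboxyl=2.34, pka_amino=9.60):
    """Estimate the average net charge of a simple amino acid at a given pH."""
    # Fraction of carboxyl groups that have lost their proton (charge -1).
    carboxylate = 1.0 / (1.0 + 10 ** (pka_carboxyl - ph))
    # Fraction of amino groups that still hold their extra proton (charge +1).
    ammonium = 1.0 / (1.0 + 10 ** (ph - pka_amino))
    return ammonium - carboxylate

for ph in (1.0, 7.4, 12.0):
    print(f"pH {ph}: net charge {net_charge(ph):+.2f}")
```

With these values the estimate is close to +1 in strong acid, close to zero at physiological pH (the zwitterion), and close to -1 in strong base. Amino acids with ionizable side chains would need an extra term for each additional group.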