How Does Information Flow From Genes to Proteins?

Information flows from genes to proteins in two major steps: first, a gene’s DNA sequence is copied into a messenger RNA molecule (transcription), and then that RNA is read by a ribosome to build a protein (translation). This one-directional flow, from DNA to RNA to protein, is so fundamental to biology that scientists call it the “central dogma” of molecular biology. Every cell in your body uses this process thousands of times a day to produce the proteins that carry out nearly every function keeping you alive.

Step One: Transcription Copies DNA Into RNA

Your DNA lives inside the nucleus of each cell, tightly wound into a double helix. But proteins are built outside the nucleus, in the cytoplasm. So the cell needs a way to carry genetic instructions from one location to the other. That carrier is messenger RNA, or mRNA.

During transcription, an enzyme called RNA polymerase latches onto a specific stretch of DNA and pries open a small section of the double helix. One of the two exposed DNA strands serves as a template. The RNA polymerase moves along this template strand one base at a time, matching each DNA base with a complementary RNA base and linking them together into a growing chain. The result is a single-stranded RNA molecule whose sequence is complementary to the template DNA, with the base uracil (U) standing in wherever DNA would use thymine (T). Think of it like copying a recipe from a master cookbook onto an index card you can carry into the kitchen.
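
The base-matching rule is simple enough to sketch in a few lines of Python. This is only an illustration: the function name transcribe and the short template sequence are made up for the example, and the template is assumed to be written 3'→5' so that the resulting RNA reads 5'→3'.

```python
# Base-pairing rules used when RNA is built from a DNA template:
# A pairs with U (RNA has no T), T with A, G with C, and C with G.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (5'->3') made from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


template = "TACGGCATT"  # hypothetical template strand, written 3'->5'
mrna = transcribe(template)
print(mrna)  # AUGCCGUAA
```

Real transcription also depends on promoters, termination signals, and proofreading, none of which this sketch tries to model.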

Not every part of a gene ends up in the final mRNA, though. The initial RNA copy, called pre-mRNA, contains stretches of non-coding sequence (introns) sandwiched between the coding segments (exons). Before the mRNA can leave the nucleus, a molecular machine called the spliceosome cuts out the introns and stitches the exons together. Much of this editing happens while the RNA is still being transcribed. In most human genes, the exons are short and the introns are long, so the splicing machinery has time to work on the front end of the molecule while the back end is still being copied. Once splicing is complete, the mature mRNA is tagged with special protein complexes and shuttled through pores in the nuclear membrane into the cytoplasm.
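
Splicing can be pictured as cutting a string at known boundaries and joining the pieces that remain. The sketch below assumes the exon positions are already known and passes them in as index pairs; the function name splice, the sequence, and the coordinates are all invented for illustration. A real spliceosome has to find those boundaries itself from signals in the RNA.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments, each given as a (start, end) half-open index pair."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical pre-mRNA. Exons are upper case and the intron is lower case
# purely for readability. The intron starts with "gu" and ends with "ag",
# the boundary pattern found in most introns.
pre_mrna = "AUGCCG" + "guaaguuuucag" + "GAUUAA"
exons = [(0, 6), (18, 24)]  # assumed exon coordinates for this toy sequence

mature_mrna = splice(pre_mrna, exons)
print(mature_mrna)  # AUGCCGGAUUAA
```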

Step Two: Translation Builds a Protein From RNA

Once mRNA reaches the cytoplasm, the cell reads its sequence and assembles a protein. This process, called translation, takes place on ribosomes, which are large molecular machines made of RNA and protein.

The ribosome reads the mRNA three letters at a time. Each three-letter group is called a codon, and each codon specifies one amino acid, one of the building blocks of proteins. With four possible RNA bases arranged in groups of three, there are 4 × 4 × 4 = 64 possible codons. Sixty-one of these code for one of the 20 standard amino acids, so most amino acids have more than one codon. The remaining three codons, UAA, UAG, and UGA, don’t code for any amino acid. They act as stop signals. One codon, AUG, serves double duty: it codes for the amino acid methionine and also acts as the usual start signal that tells the ribosome where to begin.
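
The codon arithmetic can be checked directly. The sketch below builds the standard genetic code from a compact 64-letter string of one-letter amino acid codes (with * marking a stop) and counts the entries. The variable names are arbitrary choices for this example.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
sense_codons = [c for c, aa in CODON_TABLE.items() if aa != "*"]
amino_acids = {aa for aa in CODON_TABLE.values() if aa != "*"}

print(len(CODON_TABLE))    # 64
print(stop_codons)         # ['UAA', 'UAG', 'UGA']
print(len(sense_codons))   # 61
print(len(amino_acids))    # 20
print(CODON_TABLE["AUG"])  # M (methionine)
```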

Translation unfolds in three stages. During initiation, the ribosome assembles around the mRNA at the start codon. During elongation, it moves along the mRNA codon by codon, adding one amino acid after another to a growing chain. During termination, the ribosome hits a stop codon, releases the finished chain, and disassembles from the mRNA.
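
The three stages map neatly onto a short loop. The following is a simplified sketch, not a model of a real ribosome: it treats the first AUG it finds as the start site, and it uses a deliberately tiny codon table that covers only the codons in the example. The function name translate and the mRNA sequence are made up for illustration.

```python
# Deliberately tiny subset of the genetic code, just enough for this example.
CODONS = {"AUG": "Met", "CCG": "Pro", "GAU": "Asp", "UUU": "Phe"}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(mrna: str) -> list[str]:
    """Return the amino acid chain encoded by the first reading frame found."""
    # Initiation: locate the start codon (simplified to the first AUG).
    start = mrna.find("AUG")
    if start == -1:
        return []

    chain = []
    # Elongation: step along the mRNA one codon (three bases) at a time.
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        # Termination: a stop codon ends the chain without adding anything.
        if codon in STOP_CODONS:
            break
        # "Xaa" marks a codon missing from the toy table above.
        chain.append(CODONS.get(codon, "Xaa"))
    return chain


protein = translate("GGAUGCCGGAUUUUUAAGC")  # hypothetical mRNA
print(protein)  # ['Met', 'Pro', 'Asp', 'Phe']
```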

How tRNA Delivers the Right Amino Acids

The ribosome can read the mRNA, but it can’t grab amino acids on its own. That job belongs to a smaller molecule called transfer RNA, or tRNA. Each tRNA has two critical features: on one end, it carries a specific amino acid; on the other, it has a three-letter sequence called an anticodon that pairs with a complementary codon on the mRNA.

When the ribosome exposes a new codon, the tRNA with the matching anticodon slots into place, delivering its amino acid. The ribosome then forms a chemical bond linking that amino acid to the growing chain, the used tRNA detaches, and the ribosome shifts forward to the next codon. This cycle repeats hundreds or even thousands of times for a single protein. The pairing doesn’t always have to be perfectly complementary. A flexibility called “wobble” at the first position of the anticodon, which pairs with the third base of the codon, allows some tRNAs to recognize more than one codon. That is why 61 sense codons can be handled by fewer than 61 different tRNA molecules.
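
Codon and anticodon pairing, including wobble, can be sketched as a small lookup. The example below assumes both sequences are written 5'→3', so the first base of the anticodon lines up with the third base of the codon. The wobble table follows Crick's classic pairing rules (I stands for inosine, a modified base found in some anticodons); the function name pairs is invented for this example, and real tRNAs carry further base modifications that this sketch ignores.

```python
from itertools import product

# Strict pairing used at the first two codon positions.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Wobble rules: anticodon first base -> codon third bases it can read.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}


def pairs(anticodon: str, codon: str) -> bool:
    """True if the anticodon can read the codon (both written 5'->3')."""
    a1, a2, a3 = anticodon
    c1, c2, c3 = codon
    # The strands run in opposite directions, so a3 meets c1 and a1 meets c3.
    return WATSON_CRICK[a3] == c1 and WATSON_CRICK[a2] == c2 and c3 in WOBBLE[a1]


all_codons = ["".join(p) for p in product("UCAG", repeat=3)]

# One anticodon, GAA, reads both codons for phenylalanine.
recognized = [codon for codon in all_codons if pairs("GAA", codon)]
print(recognized)  # ['UUU', 'UUC']
```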

What Happens After the Protein Is Built

A freshly assembled chain of amino acids isn’t a finished, working protein yet. It needs to fold into a precise three-dimensional shape, and it often gets chemical modifications that fine-tune its behavior. Scientists have cataloged more than 650 types of these modifications so far.

Two of the most common are phosphorylation and glycosylation. In phosphorylation, an enzyme attaches a small phosphate group to certain amino acids in the protein chain. This changes the protein’s shape or electrical charge, effectively flipping it “on” or “off” in a signaling pathway. Phosphorylation is reversible, which makes it an ideal switch. About 86% of phosphorylation events in the cell happen on the amino acid serine, with threonine and tyrosine accounting for most of the rest. Glycosylation, on the other hand, involves attaching sugar molecules to the protein. This typically happens inside specialized compartments in the cell and is especially common in proteins destined for the cell surface or for secretion outside the cell. These sugar coatings help proteins fold correctly, protect them from being broken down, and allow cells to recognize one another.

Some modifications happen immediately after a protein is made, guiding it to fold into the right shape. Others occur later, in response to signals from the cell’s environment, allowing the cell to adapt quickly without having to make entirely new proteins.

How Cells Control Which Genes Become Proteins

Nearly every cell in your body contains the same DNA, yet a muscle cell looks and acts nothing like a nerve cell. The difference comes down to which genes get transcribed and how often. Cells control this primarily at the transcription stage, using a class of molecules called transcription factors.

Transcription factors are proteins that bind to specific stretches of DNA near a gene, either encouraging or blocking RNA polymerase from starting its work. Some act as activators, recruiting the transcription machinery to a gene’s promoter region. Others act as repressors, physically blocking access. On top of that, the cell can adjust how tightly DNA is wound around its packaging proteins. Genes buried in tightly packed DNA are essentially silenced because the transcription machinery can’t reach them. Loosening that packaging exposes the gene and allows transcription to begin.
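
One way to picture these layers of control is as a simple yes-or-no check. The toy model below is a heavy simplification: real gene activity is graded rather than all-or-nothing and depends on many factors at once. The class GeneState, the function is_transcribed, and the two example cells are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class GeneState:
    """Toy description of one gene's surroundings in one cell."""
    chromatin_open: bool    # is the DNA packaging loose enough to reach?
    activator_bound: bool   # is an activator recruiting the machinery?
    repressor_bound: bool   # is a repressor blocking access?


def is_transcribed(state: GeneState) -> bool:
    """Assumed rule: reachable DNA, an activator present, and no repressor."""
    return (
        state.chromatin_open
        and state.activator_bound
        and not state.repressor_bound
    )


# The same hypothetical gene in two different cell types.
in_muscle_cell = GeneState(chromatin_open=True, activator_bound=True, repressor_bound=False)
in_nerve_cell = GeneState(chromatin_open=False, activator_bound=True, repressor_bound=False)

print(is_transcribed(in_muscle_cell))  # True
print(is_transcribed(in_nerve_cell))   # False
```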

This layered control system means that even though your genome contains roughly 20,000 protein-coding genes, any given cell expresses only a fraction of them at one time. The particular combination of active genes defines what type of cell it is and how it responds to the world around it.

The Full Journey at a Glance

The path from gene to protein crosses two cellular compartments and involves dozens of molecular players. DNA in the nucleus is transcribed into pre-mRNA by RNA polymerase. The pre-mRNA is spliced to remove non-coding introns, producing a mature mRNA. That mRNA is exported through nuclear pores into the cytoplasm, where ribosomes read it three letters at a time. Transfer RNA molecules ferry in the correct amino acids, and the ribosome links them into a chain. The chain folds and gets chemically modified into a functional protein. At every stage, from whether a gene is transcribed at all to how the final protein is modified, the cell has regulatory checkpoints that ensure the right proteins are made in the right amounts at the right time.