What Removes Introns? The Spliceosome Explained

Introns are removed from precursor messenger RNA (pre-mRNA) by a large molecular machine called the spliceosome. This complex, built from five small nuclear ribonucleoproteins (snRNPs) and dozens of associated proteins, recognizes the boundaries of each intron, cuts it out, and joins the remaining exons together to form a mature mRNA. The average human gene contains about 7.8 introns, so the spliceosome repeats this task several times for each transcript and many thousands of times across the genome.

How the Spliceosome Finds an Intron

The spliceosome doesn’t cut randomly. It reads specific sequence signals embedded in the RNA to identify where each intron begins and ends. The 5′ end of an intron (the “donor site”) almost always starts with the letters GU, while the 3′ end (the “acceptor site”) ends with AG. This is sometimes called the GU-AG rule. Between these boundaries, a stretch of pyrimidine bases and a critical adenosine residue called the branch point help the spliceosome lock onto the correct location.
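As a rough illustration of these signals, the sketch below checks a candidate intron for the GU and AG boundaries and measures how pyrimidine-rich the stretch just upstream of the AG is. The function name, the 15-base window, and the toy sequence are assumptions made for this example; real splice-site prediction relies on much richer models.

```python
def intron_signals(intron: str, tract_len: int = 15) -> dict:
    """Report the basic boundary signals of a candidate intron (toy check)."""
    seq = intron.upper().replace("T", "U")  # accept DNA-style input as well
    if len(seq) < tract_len + 4:
        raise ValueError("sequence too short to contain GU, a tract, and AG")
    # Window of bases immediately upstream of the terminal AG.
    tract = seq[-(tract_len + 2):-2]
    pyrimidines = sum(base in "CU" for base in tract)
    return {
        "donor_ok": seq.startswith("GU"),     # 5' end begins with GU
        "acceptor_ok": seq.endswith("AG"),    # 3' end finishes with AG
        "pyrimidine_fraction": pyrimidines / tract_len,
    }


# Hypothetical intron: donor, branch region, pyrimidine tract, acceptor.
toy_intron = "GUAAGU" + "ACGUACUAAC" + "UCUUUCCUUUCUCUC" + "AG"
print(intron_signals(toy_intron))
# {'donor_ok': True, 'acceptor_ok': True, 'pyrimidine_fraction': 1.0}
```

A sequence can pass all three checks and still not be a real intron, which is why the spliceosome also depends on the branch point and on the protein factors described next.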

Recognition begins when the U1 snRNP base-pairs with the 5′ splice site. Near the 3′ end of the intron, a helper protein called U2 auxiliary factor (U2AF) recognizes the pyrimidine-rich tract and the 3′ splice site, while splicing factor 1 (SF1) marks the branch point. These early interactions form what’s known as the commitment complex, essentially a molecular bookmark that says “cut here.” The U2 snRNP then displaces SF1 and base-pairs with the branch point sequence.

The Two-Step Chemical Reaction

Once assembled, the spliceosome removes the intron through two back-to-back chemical reactions called transesterifications. Neither reaction directly consumes ATP, even though the spliceosome uses ATP at other stages to rearrange its components.

In the first step, the branch point adenosine deep inside the intron acts as an attacker. Its 2′-OH group strikes the 5′ splice site, severing the connection between the upstream exon and the intron. This creates a lariat structure: the intron loops back on itself, with its 5′ end linked to the branch point adenosine by an unusual 2′-5′ bond. The upstream exon is now free, with an exposed 3′-OH end.

In the second step, that free 3′-OH on the upstream exon attacks the 3′ splice site. This simultaneously releases the lariat-shaped intron and joins the two exons together into a continuous mRNA. The discarded lariat is later broken down and its components recycled.
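The bookkeeping of these two steps can be sketched as string operations on a single-intron RNA. This is a toy model with hypothetical names (splice, SpliceResult) and a made-up sequence; it tracks only which pieces are cut and joined, not the chemistry itself.

```python
from dataclasses import dataclass


@dataclass
class SpliceResult:
    upstream_exon: str   # freed in step 1, ends in an exposed 3'-OH
    lariat_intron: str   # released in step 2
    branch_offset: int   # position of the branch adenosine within the intron
    mrna: str            # the two exons joined together


def splice(pre_mrna: str, intron_start: int, branch: int, intron_end: int) -> SpliceResult:
    """Toy model of the two transesterifications.

    intron_start: index of the first intron base (the G of GU)
    branch:       index of the branch point adenosine
    intron_end:   index just past the last intron base (the G of AG)
    """
    if not intron_start < branch < intron_end <= len(pre_mrna):
        raise ValueError("branch point must lie inside the intron")
    if pre_mrna[branch] != "A":
        raise ValueError("branch point must be an adenosine")

    # Step 1: the branch adenosine attacks the 5' splice site. The upstream
    # exon is cut free; the intron's 5' end is now tied to the branch point
    # (the lariat), while the intron is still attached to the downstream exon.
    upstream_exon = pre_mrna[:intron_start]
    lariat_intermediate = pre_mrna[intron_start:]

    # Step 2: the upstream exon's free 3' end attacks the 3' splice site,
    # releasing the lariat intron and joining the two exons.
    intron_len = intron_end - intron_start
    lariat_intron = lariat_intermediate[:intron_len]
    downstream_exon = lariat_intermediate[intron_len:]

    return SpliceResult(
        upstream_exon=upstream_exon,
        lariat_intron=lariat_intron,
        branch_offset=branch - intron_start,
        mrna=upstream_exon + downstream_exon,
    )


# Hypothetical pre-mRNA: exon 1, one intron, exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUCUAACUUUCUUUCAG", "GGAUAA"
start = len(exon1)
branch = start + intron.index("CUAAC") + 3  # second A of the CUAAC motif
result = splice(exon1 + intron + exon2, start, branch, start + len(intron))
print(result.mrna)  # AUGGCCGGAUAA
```

The branch_offset field stands in for the 2′-5′ link: it records which adenosine the intron's 5′ end is attached to in the lariat.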

The Five Core Components

The major spliceosome is built from five snRNPs: U1, U2, U4, U5, and U6. Each contains a small RNA molecule and a set of proteins, and they join the complex in a specific order rather than arriving as a preformed unit.

U1 initiates the process by recognizing the 5′ splice site. U2 then binds the branch point. The U4/U6 and U5 snRNPs arrive together as a tri-snRNP particle. U4 and U6 are base-paired to each other, but during activation U4 is released, freeing U6 to form the catalytic core alongside U2. U5 positions the two exons so they can be joined accurately. The RNA components of U2 and U6 are thought to form the catalytic heart of the spliceosome, making it fundamentally an RNA-based enzyme.

A Second Spliceosome for Rare Introns

A small fraction of introns don’t follow the GU-AG rule and can’t be processed by the major spliceosome. These U12-type introns are handled by a separate minor spliceosome containing its own specialized snRNPs: U11, U12, U4atac, and U6atac. Only U5 is shared between the two machines. The minor spliceosome has just seven unique protein components, all located in a preformed U11/U12 unit that recognizes both the 5′ splice site and branch point cooperatively, without needing the pyrimidine tract or U2AF. Despite its distinct parts, the minor spliceosome follows the same general two-step chemistry.

Proteins That Control Which Introns Are Removed

Not every intron is removed the same way every time. Regulatory proteins influence which splice sites the spliceosome selects, producing different mature mRNAs from the same gene. This process, called alternative splicing, is how roughly 20,000 human genes can encode far more than 20,000 proteins.

Two major families of proteins pull the strings. SR proteins generally bind to enhancer sequences on the RNA and promote splicing by helping the spliceosome recognize weak splice sites. They’re especially important for including alternative exons that might otherwise be skipped. Their effect depends on where they bind: SR proteins attached to an exon typically enhance its inclusion, while the same proteins bound to an intron can suppress splicing at that site. Working in opposition, hnRNP proteins tend to bind silencer sequences and block the spliceosome, repressing splicing at specific locations. The balance between these two families determines which version of the mRNA gets made.

The most common form of alternative splicing in vertebrates is exon skipping, accounting for about 30% of events. Other patterns include the use of alternative 5′ or 3′ splice sites (roughly 25% combined), mutually exclusive exons, and intron retention, where an intron is deliberately left in the mature mRNA.
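To see how exon skipping alone multiplies the number of possible transcripts, the sketch below enumerates every mRNA a toy gene could produce when some of its exons are optional. The function name, the cassette flag, and the four-exon gene are assumptions for illustration; in a real cell the regulatory proteins described above make many of these combinations rare or absent.

```python
from itertools import product


def enumerate_isoforms(exons):
    """List every mRNA obtainable by including or skipping each cassette exon.

    exons: list of (sequence, is_cassette) tuples in transcript order.
    Exons with is_cassette=False are always included.
    """
    cassette_positions = [i for i, (_, is_cassette) in enumerate(exons) if is_cassette]
    isoforms = []
    # Each cassette exon is independently included (True) or skipped (False).
    for choices in product([True, False], repeat=len(cassette_positions)):
        included = dict(zip(cassette_positions, choices))
        mrna = "".join(
            seq for i, (seq, _) in enumerate(exons) if included.get(i, True)
        )
        isoforms.append(mrna)
    return isoforms


# Hypothetical gene: two constitutive exons flanking two cassette exons.
toy_gene = [("AUG", False), ("GCC", True), ("UUU", True), ("UAA", False)]
isoforms = enumerate_isoforms(toy_gene)
print(len(isoforms))  # 4
```

With n independent cassette exons the count is 2 to the power n, so even a handful of optional exons gives one gene many possible products.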

Self-Splicing Introns

Not all introns need a spliceosome. Group I and Group II introns are ribozymes, RNA molecules that catalyze their own removal. Group I introns, first discovered in the single-celled organism Tetrahymena, splice using a free guanosine molecule and magnesium ions as their only requirements. No external energy source is needed because bond breaking and bond forming are coupled in a single reversible reaction. These introns appear in ribosomal RNA, transfer RNA, and protein-coding genes, particularly in fungal and plant mitochondria, chloroplasts, and some bacteria and bacteriophage.

Group II introns use a mechanism strikingly similar to the spliceosome’s: they form a lariat intermediate through a 2′-5′ bond at an internal adenosine. This resemblance has led many biologists to view Group II introns as the evolutionary ancestors of the spliceosome. Group II introns are found in fungal and plant mitochondria and in chloroplasts, mostly within protein-coding genes. In laboratory conditions, they self-splice only slowly and at high temperatures with elevated magnesium concentrations, suggesting that in living cells they normally rely on helper proteins.

Why Splicing Errors Matter

By some estimates, as many as 50% of disease-causing mutations in humans disrupt the splicing process, most often by altering the conserved sequences at canonical splice sites. When the spliceosome misreads a boundary, it can skip an essential exon, include part of an intron, or shift the reading frame so the resulting protein is truncated or nonfunctional. Diseases linked to splicing defects range from certain cancers to neurodegenerative disorders and immune deficiencies. The sheer frequency of splicing-related mutations reflects how precisely the system must operate across billions of transcripts, and how many sequence elements it depends on to get each cut right.
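The reading-frame consequence is simple arithmetic: skipping an exon whose length is not a multiple of three shifts every downstream codon. The sketch below uses a made-up three-exon transcript and hypothetical helper names to show how such a skip can bring a stop codon forward.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def codons_before_stop(mrna: str) -> int:
    """Count codons read from the first base until the first in-frame stop."""
    count = 0
    for i in range(0, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return count
        count += 1
    return count  # no stop codon found in frame


def skip_exon(exons, index):
    """Join all exons except the one at the given position."""
    return "".join(seq for i, seq in enumerate(exons) if i != index)


# Hypothetical transcript; the middle exon is 4 bases long (not divisible by 3).
exons = ["AUGGCU", "GAAC", "UGAUCGGUUAA"]
normal = "".join(exons)        # AUG GCU GAA CUG AUC GGU UAA
skipped = skip_exon(exons, 1)  # AUG GCU UGA ...

print(len(exons[1]) % 3 == 0)       # False: skipping this exon shifts the frame
print(codons_before_stop(normal))   # 6
print(codons_before_stop(skipped))  # 2
```

In this toy case the frameshift exposes a UGA at the start of the last exon, so translation would stop after two codons instead of six.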