How Does CRISPR Target the Gene of Interest?

CRISPR finds its target gene using a short piece of synthetic RNA that matches the DNA sequence a scientist wants to edit. This guide RNA pairs with the Cas protein to form a molecular search party that scans billions of base pairs in a genome, looking for one specific stretch of about 20 nucleotides. When the guide RNA finds its complementary sequence, it locks on, the DNA unwinds, and the Cas protein cuts. The whole system hinges on two things: the guide RNA’s ability to match the target through base pairing, and a short DNA tag next to the target that the protein itself recognizes.

The Two Components That Make Targeting Work

The CRISPR system needs just two pieces to find a gene: a Cas protein (most commonly Cas9) and a single guide RNA, or sgRNA. The Cas9 protein is essentially inert without its guide RNA. It only becomes active once it loads the sgRNA, which contains a 20-nucleotide sequence designed to match the gene of interest.

That 20-nucleotide stretch is the address label. Scientists design it to be complementary to one specific location in the genome. Because nucleic acids follow strict pairing rules (A pairs with T in DNA or with U in RNA, and C pairs with G), the guide RNA will only form a stable bond with DNA that matches its sequence. This is how CRISPR achieves its precision: the targeting is encoded in the guide RNA, and swapping in a different guide lets you redirect the whole system to a different gene.
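The pairing rule can be sketched in a few lines of Python. This is only an illustration: the 20-letter sequence is made up, and the function names are hypothetical helpers rather than part of any real design tool. It assumes the usual convention that the guide's spacer has the same letters as the PAM-side DNA strand (with U in place of T) and pairs with the opposite strand.

```python
# Watson-Crick partners when a DNA base faces an RNA base (RNA has U, not T).
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite DNA strand, written 5'->3'."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def guide_for_protospacer(protospacer: str) -> str:
    """Spacer RNA for a site given as the PAM-side strand: same letters, T -> U."""
    return protospacer.upper().replace("T", "U")


def pairs_with(guide_rna: str, target_strand: str) -> bool:
    """True if every guide base faces its partner on the target DNA strand.

    Both strings are written 5'->3', so the DNA is read backwards
    (the two strands of a duplex run antiparallel).
    """
    if len(guide_rna) != len(target_strand):
        return False
    return all(
        DNA_TO_RNA.get(d) == r
        for r, d in zip(guide_rna.upper(), reversed(target_strand.upper()))
    )


protospacer = "GACGTTACCGGATCAGTCCA"  # made-up 20-nt example site
guide = guide_for_protospacer(protospacer)
print(guide)
print(pairs_with(guide, reverse_complement(protospacer)))
```

Redirecting the system to another gene corresponds to nothing more than passing a different 20-letter string to guide_for_protospacer.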

How the Search Begins: PAM Recognition

Cas9 doesn’t start by reading the guide RNA’s match to DNA. It starts by scanning for a much shorter signal in the DNA itself, called a PAM (protospacer adjacent motif). For the most widely used version of Cas9, from the bacterium S. pyogenes, the PAM is just three letters: NGG, where N can be any nucleotide and is followed by two guanines.

The protein physically bumps along the DNA, checking for this short tag. Two specific amino acids in Cas9’s PAM-interacting domain reach into the DNA’s major groove and grab onto the GG pair. Without a PAM, Cas9 moves on. This is a speed trick: rather than trying to unwind and read every stretch of DNA against the guide RNA, the protein first filters for PAM sites, which narrows the search enormously. PAM recognition is a hard prerequisite for everything that follows.
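To get a feel for how much the PAM filter narrows the search, here is a rough sketch that counts NGG sites in a random stretch of DNA. The random sequence is an assumption for illustration only (real genomes are not uniformly random), and only one strand is scanned, whereas Cas9 can read a PAM on either strand.

```python
import random
import re

# Lookahead so that overlapping sites (for example both in "AGGG") are counted.
NGG = re.compile(r"(?=[ACGT]GG)")


def pam_positions(dna: str) -> list[int]:
    """Return the 0-based start index of every NGG on the given strand."""
    return [m.start() for m in NGG.finditer(dna.upper())]


random.seed(0)  # fixed seed so the toy "genome" is reproducible
genome = "".join(random.choice("ACGT") for _ in range(100_000))
pams = pam_positions(genome)
print(f"{len(pams)} of {len(genome)} positions pass the PAM filter")
```

In uniformly random DNA, two specific letters in a row turn up at about 1 position in 16, so the protein can skip the large majority of positions without ever unwinding them.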

R-Loop Formation: Testing the Match

Once Cas9 finds a PAM, it pries open the two DNA strands right next to it. This local melting lets the guide RNA begin pairing with the exposed strand. The pairing starts at the PAM-proximal end of the guide, in a region called the seed sequence, which spans roughly the first 10 to 12 nucleotides closest to the PAM.

This seed region is the most critical stretch for targeting accuracy. Structural studies published in Nature show that the seed sequence is pre-organized into a shape that’s ready to grab DNA, making initial pairing fast and energetically favorable. If the first 8 to 9 nucleotides match, a short hybrid of RNA and DNA begins to form. Together with the displaced DNA strand, this hybrid makes up a structure called an R-loop. As more nucleotides pair correctly, the R-loop expands, pushing the displaced DNA strand aside and pulling the target strand deeper into a channel inside the protein.

The process is directional and stepwise. The R-loop grows from the PAM side outward, and each additional base pair forces the protein’s internal domains to shift and widen the binding channel. If mismatches appear in the seed region, the complex falls apart quickly. Mismatches farther from the PAM are tolerated more easily during binding, but the protein has a second checkpoint: it won’t activate its cutting machinery unless at least 17 base pairs of the full 20-nucleotide guide are correctly matched. This two-layer quality control, one at the binding stage and one at the activation stage, helps keep the system from cutting the wrong site.
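The two checkpoints can be written as a toy decision rule. This is a deliberately simplified sketch, not a model of the real kinetics: it assumes a 10-nucleotide seed (the text gives roughly 10 to 12), treats any seed mismatch as an outright rejection, and uses the 17-base-pair threshold quoted above. The function name and return labels are invented for illustration.

```python
SEED_LENGTH = 10   # assumption: the text gives roughly 10 to 12 nucleotides
MIN_PAIRED = 17    # activation threshold quoted in the text


def evaluate_site(spacer: str, site: str) -> str:
    """Classify a PAM-adjacent site as 'released', 'bound, not cut', or 'cut'.

    Both arguments are DNA-alphabet strings written 5'->3', with the PAM
    assumed to sit just past the 3' end of `site`.
    """
    if len(spacer) != len(site):
        raise ValueError("spacer and site must be the same length")
    # Compare from the PAM-proximal end outward, the direction the R-loop grows.
    paired = [a == b for a, b in zip(reversed(spacer.upper()), reversed(site.upper()))]
    if not all(paired[:SEED_LENGTH]):
        return "released"          # checkpoint 1: seed mismatch, complex falls off
    if sum(paired) < MIN_PAIRED:
        return "bound, not cut"    # checkpoint 2: too few pairs to activate cutting
    return "cut"


spacer = "GACGTTACCGGATCAGTCCA"                       # made-up example
print(evaluate_site(spacer, "GACGTTACCGGATCAGTCCA"))  # perfect match
print(evaluate_site(spacer, "GACGTTACCGGATCAGTCCT"))  # mismatch next to the PAM
print(evaluate_site(spacer, "CTGCTTACCGGATCAGTCCA"))  # four PAM-distal mismatches
```

In this simplified picture, a site with four mismatches at the far end stays bound but is never cut, which is the behavior the second checkpoint is meant to produce.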

Where Exactly the Cut Happens

When the full R-loop forms and the protein confirms the match, Cas9 makes a double-strand break exactly 3 base pairs upstream of the PAM sequence. Both strands of the DNA are cut, producing a clean, blunt-ended break. The cell then repairs this break using its own DNA repair machinery, and it’s during this repair process that scientists can delete, disrupt, or insert new genetic material.

The predictability of the cut site is one of CRISPR’s biggest advantages. Because the break always lands 3 base pairs from the PAM, researchers know precisely where the edit will occur when they design their guide RNA.
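Because the rule is fixed, the cut position is simple arithmetic once the PAM is located. The sketch below assumes the sequence is the PAM-side strand written 5'->3' and that pam_start is the 0-based index of the N in NGG; the function names and the example sequence are hypothetical.

```python
def cut_index(pam_start: int) -> int:
    """Index of the first base on the PAM side of the break.

    The break falls between positions cut_index - 1 and cut_index,
    which is 3 base pairs upstream of the PAM.
    """
    if pam_start < 3:
        raise ValueError("PAM is too close to the start of the sequence")
    return pam_start - 3


def split_at_cut(dna: str, pam_start: int) -> tuple[str, str]:
    """Return the two blunt ends produced by the double-strand break."""
    i = cut_index(pam_start)
    return dna[:i], dna[i:]


site = "GACGTTACCGGATCAGTCCA" + "TGG"   # made-up 20-nt target followed by its PAM
left, right = split_at_cut(site, pam_start=20)
print(left, "|", right)
```

For a 20-nucleotide target, this leaves 17 target bases on one side of the break and 3 target bases plus the PAM on the other.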

What Happens When Targeting Goes Wrong

Off-target cutting is the main risk of CRISPR editing. The guide RNA occasionally binds to DNA sequences that are similar but not identical to the intended target. Research shows that as few as 8 to 9 base pairs of complementarity in the seed region can produce off-target effects, even when the rest of the guide doesn’t match well.

Mismatches at the PAM-proximal end (the seed) cause the complex to fall off the DNA quickly, so these are effectively rejected. But mismatches at the PAM-distal end are trickier: the guide can still bind stably, and off-target cutting depends on whether the protein’s internal conformational checkpoint catches the imperfect pairing. This is why guide RNA design is so important. Scientists use algorithms that score potential guide sequences based on how unique they are in the genome, prioritizing sequences with few near-matches elsewhere.
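A bare-bones version of that uniqueness check looks like the sketch below. It is only an assumption about how such a scan could work: it walks one strand of a toy sequence, requires an NGG PAM, and counts plain mismatches, whereas real scoring algorithms search both strands of a whole genome and weight mismatches by position. The function name and sequences are invented for illustration.

```python
def near_matches(spacer: str, genome: str, max_mismatches: int = 3) -> list[tuple[int, int]]:
    """Return (position, mismatch_count) for each PAM-adjacent near-match.

    Scans only the given strand and only sites followed by an NGG PAM.
    """
    spacer, genome = spacer.upper(), genome.upper()
    n = len(spacer)
    hits = []
    for i in range(len(genome) - n - 2):
        # The two letters after the "N" of the PAM must both be G.
        if genome[i + n + 1 : i + n + 3] != "GG":
            continue
        site = genome[i : i + n]
        mismatches = sum(a != b for a, b in zip(spacer, site))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


spacer = "GACGTTACCGGATCAGTCCA"           # made-up guide
toy_genome = (
    "TT" + spacer + "AGG"                 # intended target with a PAM
    + "CCCC" + "GACGTTACCGGATCAGTCGA" + "TGG"   # one-mismatch look-alike with a PAM
    + "AA" + spacer + "TTT"               # perfect match but no PAM, so ignored
)
for position, mismatches in near_matches(spacer, toy_genome):
    print(position, mismatches)
```

A guide whose only hit is the intended site would be preferred over one with several low-mismatch look-alikes.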

To reduce off-target effects, researchers have engineered high-fidelity versions of Cas9, including SpCas9-HF1, eSpCas9, and HypaCas9. These variants are tuned to be more sensitive to mismatches, so they reject imperfect targets more readily. SpCas9-HF1, for example, shows the lowest levels of unintended editing among commonly used variants, with improved specificity at both ends of the guide sequence.

How Scientists Design the Guide RNA

Designing a guide RNA starts with picking a 20-nucleotide target sequence adjacent to a PAM in the gene of interest. In practice, there are usually many possible target sites within any given gene. For the human genome alone, nearly 10 million guide sequences have been identified that target protein-coding regions.
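Listing the possible sites is a mechanical step, sketched below under simple assumptions: the gene is a plain DNA string, the PAM is NGG, and both strands are checked by also scanning the reverse complement. The helper names are hypothetical, and positions on the "-" strand are indices into the reverse-complemented sequence.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite DNA strand, written 5'->3'."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def candidate_targets(gene: str, length: int = 20) -> list[tuple[str, int, str]]:
    """Return (strand, start, target) for every `length`-nt site followed by NGG."""
    found = []
    for strand, seq in (("+", gene.upper()), ("-", reverse_complement(gene))):
        for i in range(len(seq) - length - 2):
            # PAM occupies the three letters after the target; N is unconstrained.
            if seq[i + length + 1 : i + length + 3] == "GG":
                found.append((strand, i, seq[i : i + length]))
    return found


gene = "GACGTTACCGGATCAGTCCA" + "TGG"   # made-up fragment with one usable site
for strand, start, target in candidate_targets(gene):
    print(strand, start, target)
```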

Software tools rank these candidates using several criteria. The most heavily weighted factors are where the cut falls within the gene (earlier in the coding region is better, since it’s more likely to disrupt the protein) and how many of the gene’s transcript variants the guide covers. On-target efficiency and off-target risk are also scored, along with whether the target site overlaps with common genetic variants in the population that could prevent the guide from binding.
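As a rough illustration of how such a ranking might combine these criteria, here is a weighted-score sketch. Every number in it is an assumption made up for the example: the weights, the halving penalty for overlapping a common variant, and the candidate values do not come from any published tool. It only mirrors the ordering of priorities described above, with cut position and transcript coverage weighted most heavily.

```python
from dataclasses import dataclass


@dataclass
class GuideCandidate:
    name: str
    cut_fraction: float         # 0.0 = start of coding region, 1.0 = end
    transcript_coverage: float  # fraction of the gene's transcripts targeted
    on_target: float            # predicted efficiency, 0.0 to 1.0
    near_matches: int           # look-alike sites elsewhere in the genome
    overlaps_variant: bool      # sits on a common genetic variant


# Hypothetical weights: position and coverage count most, as in the text.
WEIGHTS = {"position": 0.35, "coverage": 0.35, "on_target": 0.20, "off_target": 0.10}


def score(c: GuideCandidate) -> float:
    """Combine the criteria into one number; higher is better."""
    total = (
        WEIGHTS["position"] * (1.0 - c.cut_fraction)      # earlier cuts score higher
        + WEIGHTS["coverage"] * c.transcript_coverage
        + WEIGHTS["on_target"] * c.on_target
        + WEIGHTS["off_target"] / (1 + c.near_matches)    # fewer look-alikes is better
    )
    # Assumed penalty: halve the score if a common variant could block binding.
    return total * 0.5 if c.overlaps_variant else total


def rank(candidates: list[GuideCandidate]) -> list[GuideCandidate]:
    return sorted(candidates, key=score, reverse=True)


candidates = [
    GuideCandidate("late", 0.9, 0.5, 0.9, 2, False),
    GuideCandidate("early", 0.1, 1.0, 0.7, 0, False),
    GuideCandidate("variant", 0.1, 1.0, 0.7, 0, True),
]
for c in rank(candidates):
    print(c.name, round(score(c), 3))
```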

The optimal spacer length for the standard Cas9 is 20 nucleotides, though guides between 18 and 21 nucleotides can still work. Shortening the guide slightly can sometimes improve specificity by making the system less tolerant of mismatches, though this comes at the cost of some cutting efficiency.

Beyond Cas9: Other Targeting Systems

Cas9 isn’t the only option. Cas12a (also called Cpf1) targets DNA using a different PAM: TTTV (where V is A, C, or G). This T-rich requirement opens up regions of the genome that Cas9’s G-rich PAM can’t access. Cas12a also simplifies the system because it only needs a single short RNA and doesn’t require the second RNA component that Cas9 originally needed (the tracrRNA, which is now engineered into the single guide RNA for Cas9). Cas12a is particularly useful for editing multiple genes at once, because it can process several guide RNAs from a single array.
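The difference in PAM rules is easy to express in code. In this sketch, the PAM is written with standard nucleotide codes (N for any base, V for A, C, or G) and a flag says which side of the target it sits on: after the target's 3' end for Cas9, before its 5' end for Cas12a. The 20-nucleotide target length used for Cas12a here is an assumption for illustration, and the function names are hypothetical.

```python
import re

# Subset of the standard nucleotide codes needed for these two PAMs.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "V": "[ACG]"}


def pam_regex(pam: str) -> str:
    """Translate a PAM such as 'NGG' or 'TTTV' into a regular expression."""
    return "".join(IUPAC[ch] for ch in pam.upper())


def find_sites(dna: str, pam: str, pam_side: str, target_len: int = 20) -> list[tuple[int, str]]:
    """Return (start, target) for each PAM-adjacent site on the given strand.

    pam_side is '3prime' if the PAM follows the target (Cas9 style)
    or '5prime' if the PAM precedes it (Cas12a style).
    """
    if pam_side not in ("3prime", "5prime"):
        raise ValueError("pam_side must be '3prime' or '5prime'")
    dna = dna.upper()
    sites = []
    # Lookahead lets overlapping PAM matches all be reported.
    for m in re.finditer(f"(?={pam_regex(pam)})", dna):
        p = m.start()
        start = p - target_len if pam_side == "3prime" else p + len(pam)
        if start >= 0 and start + target_len <= len(dna):
            sites.append((start, dna[start : start + target_len]))
    return sites


dna = "TTTC" + "GACGTTACCGGATCAGTCCA" + "TGG"   # made-up site usable by both enzymes
print(find_sites(dna, "NGG", "3prime"))    # Cas9-style search
print(find_sites(dna, "TTTV", "5prime"))   # Cas12a-style search
```

A T-rich region with no GG nearby would return nothing for the first call but could still yield sites for the second.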

For targeting RNA instead of DNA, Cas13 works as an RNA-guided RNA-cutting enzyme. Unlike DNA-targeting systems, Cas13 shows no strong preference for any flanking sequence in human cells, meaning it can target essentially any RNA sequence. This makes it useful for applications like reducing gene activity without permanently altering the genome.

Each of these systems follows the same core logic: a guide RNA carries the address, and the protein provides the machinery to find that address and act on it. The differences lie in what kind of molecule they cut, what PAM or flanking sequence they need, and how they’re best deployed in different experimental contexts.