How Does ChIP-Seq Work? Steps, Controls & Peak Calling

ChIP-seq (chromatin immunoprecipitation followed by sequencing) identifies where specific proteins interact with DNA across an entire genome. It works by chemically locking proteins to DNA, breaking the DNA into small fragments, using an antibody to pull out only the fragments attached to the protein of interest, and then sequencing those fragments to map their locations. The technique is widely used to study transcription factors, histone modifications, and other protein-DNA interactions that control gene activity.

Step 1: Cross-Linking Proteins to DNA

The process begins with living cells or tissue. A chemical agent, typically formaldehyde, is added to create covalent bonds between proteins and the DNA they’re touching at that moment. Think of it as taking a snapshot: wherever a transcription factor, a modified histone, or RNA polymerase sits on the genome, formaldehyde locks it in place. This “freeze frame” preserves the natural binding landscape so it survives the harsh steps that follow.

Step 2: Fragmenting the Chromatin

After cross-linking, the cells are lysed and the chromatin (the protein-DNA complex) is chopped into small pieces. This is usually done by sonication, which uses ultrasonic pulses to shear the DNA, or sometimes by enzymatic digestion with a nuclease such as micrococcal nuclease. The goal is fragments between 150 and 300 base pairs long, roughly the size of one or two nucleosomes. Fragments in this range give high resolution when mapping binding sites and work well on modern sequencing platforms. If fragments are too large, you lose precision about where the protein was actually sitting. If they are too small, library preparation becomes unreliable.
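As a rough illustration of how the 150 to 300 base pair target might be checked, the sketch below summarizes a list of fragment lengths. The function name `summarize_fragment_sizes` and the example lengths are hypothetical, and in practice the sizes would come from an instrument trace or from paired-end alignments rather than a hand-written list.

```python
from statistics import median


def summarize_fragment_sizes(lengths, low=150, high=300):
    """Summarize fragment lengths against a target size window (in bp).

    `low` and `high` default to the 150-300 bp range described above.
    """
    if not lengths:
        raise ValueError("no fragment lengths given")
    total = len(lengths)
    too_short = sum(1 for n in lengths if n < low)
    too_long = sum(1 for n in lengths if n > high)
    return {
        "median": median(lengths),
        "fraction_in_range": (total - too_short - too_long) / total,
        "fraction_too_short": too_short / total,
        "fraction_too_long": too_long / total,
    }


# Made-up example lengths, for illustration only.
example_lengths = [120, 160, 180, 200, 220, 250, 280, 310, 450, 900]
print(summarize_fragment_sizes(example_lengths))
```

A large "too long" fraction would suggest more shearing is needed, while a large "too short" fraction would suggest the chromatin was over-fragmented.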

Step 3: Immunoprecipitation

This is the step that gives ChIP its name. An antibody specific to the protein of interest is added to the fragmented chromatin. The antibody binds its target protein, and because that protein is still cross-linked to DNA, the attached DNA fragment comes along for the ride. The antibody-protein-DNA complexes are then pulled out of solution, typically using magnetic beads coated with protein A or protein G, bacterial proteins that bind antibodies. Everything else, meaning the vast majority of DNA fragments that weren’t bound to your protein, gets washed away.

Antibody quality is critical here. If the antibody isn’t highly specific, it will pull down off-target fragments and contaminate the results. Some labs use an alternative approach: engineering cells to express the protein of interest fused to a small molecular tag, then using an antibody against the tag instead. This can sidestep problems with antibody specificity for the native protein.

Step 4: Reversing Cross-Links and Purifying DNA

Once the enriched fragments are isolated, the formaldehyde cross-links are reversed (usually with heat and protease treatment) so the protein falls off and you’re left with purified DNA. This small pool of DNA fragments represents the genomic regions that were bound by your protein of interest.

Step 5: Sequencing

The purified DNA fragments are prepared into a sequencing library by adding short adapter sequences to their ends, then run on a high-throughput sequencing platform. The sequencer reads millions of short sequences (called “reads”), each one corresponding to a fragment that was pulled down in the immunoprecipitation step. These reads are then aligned back to a reference genome, revealing where each fragment originally came from.
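Once reads are aligned, a common first summary is to count how many reads fall in fixed-size bins along a chromosome. The sketch below assumes the alignment has already been done and that each read is represented only by its start coordinate on a single chromosome. The function name `bin_read_counts` and the 200 bp bin size are illustrative choices, not part of any particular pipeline.

```python
from collections import Counter


def bin_read_counts(read_starts, bin_size=200):
    """Count reads per fixed-size bin.

    Returns a dict mapping bin index -> read count, where bin index i
    covers positions [i * bin_size, (i + 1) * bin_size).
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    return dict(Counter(pos // bin_size for pos in read_starts))


# Made-up read start positions on one chromosome.
example_starts = [10, 150, 205, 399, 400, 1000]
print(bin_read_counts(example_starts))
```

Bins with unusually high counts are the raw material for the enrichment analysis described later in the article.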

Controls That Make the Data Interpretable

Raw ChIP-seq data is hard to interpret without a proper control, because sonication doesn’t break DNA evenly across the genome. Some regions shear more easily than others, creating a background pattern of fragment abundance that has nothing to do with protein binding. To correct for this, the standard approach is to sequence “input DNA,” a portion of the fragmented chromatin that skipped the immunoprecipitation step entirely. This input captures the sonication bias, so the ChIP signal can be judged against it rather than against a uniform background.
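One simple way to use the input control is to scale it to the same sequencing depth as the ChIP sample and compare the two bin by bin. The sketch below does this with a fold-enrichment ratio. It is a minimal illustration under stated assumptions: the function name `fold_enrichment`, the total-read-count scaling, and the pseudocount of 1 are all choices made for this example, and real pipelines use more careful normalization.

```python
def fold_enrichment(chip_counts, input_counts, pseudocount=1.0):
    """Per-bin ChIP/input ratio after depth-scaling the input.

    The input counts are multiplied by (total ChIP reads / total input
    reads) so both samples are on the same scale. The pseudocount avoids
    division by zero in bins with no input reads.
    """
    if len(chip_counts) != len(input_counts):
        raise ValueError("ChIP and input must cover the same bins")
    input_total = sum(input_counts)
    if input_total == 0:
        raise ValueError("input sample has no reads")
    scale = sum(chip_counts) / input_total
    return [
        (chip + pseudocount) / (inp * scale + pseudocount)
        for chip, inp in zip(chip_counts, input_counts)
    ]


# Made-up counts: the third bin is enriched in the ChIP sample.
chip = [10, 12, 80, 11, 7]
inp = [40, 50, 60, 44, 46]
print([round(x, 2) for x in fold_enrichment(chip, inp)])
```

Bins where the input itself is high are penalized, which is how a region that merely shears easily avoids being mistaken for a binding site.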

An alternative control called a “mock IP” runs the full immunoprecipitation protocol but without a specific antibody, capturing nonspecific interactions along with sonication bias. In theory, this controls for more sources of noise. In practice, mock IPs yield very little DNA and are prone to technical artifacts, which is why the ENCODE consortium and most major projects rely on input DNA controls instead.

Peak Calling: Finding Binding Sites in the Data

After sequencing reads are mapped to the genome, you’ll see certain regions where reads pile up far above the background level. These pileups, called “peaks,” mark the locations where your protein was bound. But distinguishing a real peak from random noise requires statistical testing, and that’s the job of peak-calling software.

The most widely used tool is MACS2. It scans the genome for candidate regions where reads pile up and asks whether the read count in each region is significantly higher than expected given the background rate, which it estimates both genome-wide and from the surrounding neighborhood so that regions with locally elevated background are not miscalled. The significance test is based on the Poisson distribution, which in benchmark comparisons has proved more powerful than alternative statistical approaches for scoring candidate peaks. MACS2 consistently performs among the best peak callers on benchmark datasets for transcription factor binding.
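The core of a Poisson-based test can be sketched in a few lines. The example below computes the probability of seeing at least the observed number of reads in a window when the expected count is the larger of a genome-wide rate and one or more local rates. This is a simplified illustration in the spirit of that approach, not MACS2's actual implementation. The function names and example rates are hypothetical.

```python
import math


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam).

    The tail is summed directly (rather than as 1 - CDF) so that very
    small p-values are not lost to floating-point rounding.
    """
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # First term P(X = k), computed in log space to avoid overflow.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while term > total * 1e-16:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
    return min(1.0, total)


def window_p_value(observed, genome_rate, local_rates=()):
    """P-value for a window, using the most conservative background rate."""
    lam = max([genome_rate, *local_rates])
    return poisson_upper_tail(observed, lam)


# Hypothetical expected counts per window: 5 genome-wide, 8 locally.
print(window_p_value(30, genome_rate=5.0, local_rates=(8.0,)))
print(window_p_value(9, genome_rate=5.0, local_rates=(8.0,)))
```

Taking the maximum of the background estimates makes the test more conservative in regions where the local background is high, at the cost of some sensitivity there.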

Narrow Peaks vs. Broad Domains

Not all ChIP-seq signals look the same, and understanding the difference matters for choosing the right analysis strategy. Transcription factors bind at discrete spots on the genome, producing sharp, narrow peaks that span a few hundred base pairs. Certain histone modifications behave similarly, like H3K4me3, which marks active gene promoters.

Other histone marks spread across large stretches of chromatin. H3K27me3, associated with gene silencing, can blanket entire gene bodies in broad domains of enrichment while also forming narrow peaks at enhancers and promoter regions. H3K36me3, linked to active transcription, similarly forms broad domains over transcribed genes. Analyzing these broad marks with a tool tuned for narrow peaks will miss much of the signal. Some marks like H3K27me3 require specialized algorithms, or tools that can detect both narrow peaks and broad domains simultaneously, to capture the full picture.
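To make the narrow-versus-broad distinction concrete, the sketch below merges enriched bins into intervals and optionally bridges short gaps between them. With no gap tolerance it reports separate narrow regions. With a tolerance it links neighbors into a single broad domain. This is only an illustration of the general idea: the function name `merge_enriched_bins` and the `max_gap_bins` parameter are hypothetical and do not correspond to any specific tool's algorithm.

```python
def merge_enriched_bins(enriched, bin_size, max_gap_bins=0):
    """Merge enriched bins into (start, end) intervals in base pairs.

    `enriched` is a sequence of truthy/falsy flags, one per bin.
    Runs of enriched bins separated by at most `max_gap_bins`
    non-enriched bins are joined into one domain.
    """
    domains = []
    start = last = None
    for idx, flag in enumerate(enriched):
        if not flag:
            continue
        if start is None:
            start = last = idx
        elif idx - last - 1 <= max_gap_bins:
            last = idx  # close enough: extend the current domain
        else:
            domains.append((start * bin_size, (last + 1) * bin_size))
            start = last = idx
    if start is not None:
        domains.append((start * bin_size, (last + 1) * bin_size))
    return domains


# Made-up enrichment flags for eleven 200 bp bins.
flags = [0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0]
print(merge_enriched_bins(flags, bin_size=200))                  # narrow mode
print(merge_enriched_bins(flags, bin_size=200, max_gap_bins=1))  # broad mode
```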

How to Tell if a ChIP-seq Experiment Worked

One of the most informative quality metrics is the FRiP score: the fraction of reads in peaks. It measures what proportion of your total sequenced reads fall within called peaks versus scattered across the background. A higher FRiP means better enrichment, meaning the immunoprecipitation successfully concentrated DNA from true binding sites. For a typical transcription factor, a FRiP of 5% or higher indicates good enrichment. For RNA polymerase, which occupies a larger fraction of the genome, values of 30% or higher are expected. A very low FRiP suggests the antibody performed poorly or the experiment had excessive background noise.
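The FRiP calculation itself is straightforward. The sketch below assumes a single chromosome, reads represented by one coordinate each, and peaks given as non-overlapping half-open intervals. The function name `frip` and the example data are hypothetical. Real implementations work from alignment and peak files and handle multiple chromosomes.

```python
from bisect import bisect_right


def frip(read_positions, peaks):
    """Fraction of reads in peaks.

    `peaks` is a list of non-overlapping (start, end) intervals, with
    start inclusive and end exclusive. Each read counts at most once.
    """
    if not read_positions:
        raise ValueError("no reads given")
    peaks = sorted(peaks)
    starts = [start for start, _ in peaks]
    in_peaks = 0
    for pos in read_positions:
        # Find the last peak that starts at or before this read.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < peaks[i][1]:
            in_peaks += 1
    return in_peaks / len(read_positions)


# Made-up example: 5 of the 10 reads fall inside a peak.
example_peaks = [(100, 200), (500, 650)]
example_reads = [50, 120, 150, 199, 200, 510, 640, 650, 900, 1000]
print(frip(example_reads, example_peaks))  # 0.5
```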

CUT&RUN: A Newer Alternative

Traditional ChIP-seq works well but requires substantial amounts of starting material and deep sequencing to overcome background noise. A newer method called CUT&RUN takes a different approach. Instead of sonicating all the chromatin and then pulling down fragments with an antibody, CUT&RUN brings a cutting enzyme directly to the protein of interest. An antibody guides a fusion of protein A and micrococcal nuclease to the target, and only the DNA immediately surrounding the bound protein gets cleaved and released. Because the cutting is targeted rather than genome-wide, the background is much lower, less sequencing is needed, and resolution is higher. CUT&RUN is gaining traction, particularly for experiments where cell numbers are limited.