Transcription Start Site: What It Is and How It Works

The transcription start site (TSS) is the exact position on a DNA strand where a gene begins to be copied into RNA. It’s labeled the +1 position: the first nucleotide that RNA polymerase transcribes. Everything upstream (toward the promoter) gets negative numbers, and everything downstream gets positive numbers. By convention there is no position 0, so the nucleotide immediately upstream of the TSS is -1. This simple coordinate system anchors how scientists describe the location of every regulatory element around a gene.
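As a minimal sketch of this numbering scheme, the function below converts a genomic coordinate into a TSS-relative position. The function name and arguments are hypothetical, and it assumes the usual convention that the TSS is +1 with no position 0. The strand matters because "upstream" on a minus-strand gene means a higher genomic coordinate.

```python
def tss_relative_position(genomic_pos, tss_pos, strand):
    """Convert a genomic coordinate to a TSS-relative position (TSS = +1, no 0)."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Measure the offset in the direction of transcription.
    offset = genomic_pos - tss_pos if strand == "+" else tss_pos - genomic_pos
    # The TSS and everything downstream are shifted by one so the TSS is +1;
    # upstream offsets are already -1, -2, ... and are returned unchanged.
    return offset + 1 if offset >= 0 else offset


# Illustrative coordinates only (not taken from a real gene).
print(tss_relative_position(1000, 1000, "+"))  # 1
print(tss_relative_position(970, 1000, "+"))   # -30
print(tss_relative_position(5030, 5000, "-"))  # -30
```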

How the TSS Relates to Gene Structure

A common point of confusion is the difference between the transcription start site and the translation start site. These are not the same thing. The TSS is where RNA begins being made. The translation start site, typically marked by an ATG codon in the DNA (AUG in the RNA), is where a ribosome later begins building a protein from that RNA. Between these two points sits the 5′ untranslated region (5′ UTR), a stretch of RNA that is transcribed but not translated into the protein. The 5′ UTR can contain features that regulate how efficiently the RNA is translated, so where transcription begins actually influences how much protein a cell ultimately makes.

Reading from upstream to downstream, a protein-coding gene is organized in this order: promoter region, transcription start site, 5′ UTR, translation start codon (ATG), the protein-coding exons with the introns between them, a stop codon, the 3′ UTR, and finally a transcription termination sequence.
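To make the layout concrete, here is a small sketch that splits a spliced transcript into its 5′ UTR, coding sequence, and 3′ UTR. It assumes the sequence is written in the DNA alphabet, that index 0 is the TSS, that introns have already been removed, and that the caller already knows where the start codon is. The function name and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_transcript(mrna, start_codon_index):
    """Return (5' UTR, coding sequence, 3' UTR) for a spliced transcript."""
    mrna = mrna.upper()
    if mrna[start_codon_index:start_codon_index + 3] != "ATG":
        raise ValueError("no ATG at the given start codon index")
    # Walk codon by codon from the ATG until the first in-frame stop codon.
    for i in range(start_codon_index, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            cds_end = i + 3  # the stop codon is kept as part of the coding sequence
            return (mrna[:start_codon_index],
                    mrna[start_codon_index:cds_end],
                    mrna[cds_end:])
    raise ValueError("no in-frame stop codon found")


# Toy transcript: 6-nt 5' UTR, a 3-codon coding sequence, 4-nt 3' UTR.
utr5, cds, utr3 = split_transcript("GGCACCATGGCCTAAGTTT", 6)
print(len(utr5), cds, utr3)
```

The length of the first piece is the distance between the TSS and the start codon, which is why moving the TSS changes the 5′ UTR without necessarily changing the protein.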

What Happens at the TSS During Transcription

Before RNA polymerase II can start copying a gene, a large assembly of proteins called the pre-initiation complex (PIC) must form at the promoter. The process begins when a multi-protein complex called TFIID recognizes specific DNA sequences near the TSS and, through one of its subunits, TBP (TATA-binding protein), binds the promoter DNA. Another factor, TFIIA, helps stabilize this interaction.

From there, additional factors arrive in a specific order. TFIIB binds next and allows RNA polymerase II, paired with yet another factor called TFIIF, to dock onto the still-closed DNA. A large scaffolding complex called Mediator also attaches to RNA polymerase II, helping relay signals from distant regulatory regions. Finally, two more factors, TFIIE and TFIIH, trigger the DNA double helix to physically unwind and separate at the TSS. This “melting” of the DNA strands creates an open complex where RNA polymerase II can begin reading the template strand and synthesizing a new RNA molecule starting at the +1 position.

Promoter Elements That Position the TSS

The TSS doesn’t exist in isolation. Several short DNA sequence motifs in the surrounding promoter region help position RNA polymerase II so it starts at the right spot. The best known is the TATA box, typically found about 30 base pairs upstream of the TSS (around position -30). TBP binds directly to this sequence, providing a precise anchor point.

Right at the +1 position itself, a motif called the Initiator (Inr) element can define the start site independently or work together with the TATA box. Flanking the TATA box, BRE elements serve as binding sites for TFIIB. Downstream of the TSS, additional elements like the DPE and MTE boost promoter activity, often in partnership with either the TATA box or the Inr. Not every promoter contains all of these elements. Many genes lack a TATA box entirely, and the combination of motifs present helps determine exactly how transcription initiates.
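The idea of motifs at fixed distances from the TSS can be sketched as a simple pattern scan. The two regular expressions below are simplified consensus patterns chosen as assumptions for illustration; real promoters vary, and dedicated tools use position weight matrices instead. The function name, the -40 to -20 search window, and the toy sequence are all hypothetical.

```python
import re

# Simplified consensus patterns (assumptions for illustration only).
TATA_RE = re.compile(r"TATA[AT]A[AT][AG]")           # TATA box
INR_RE = re.compile(r"[CT][CT]A[ACGT][AT][CT][CT]")  # Initiator; its A sits at +1


def find_core_elements(seq, tss_index):
    """Look for a TATA box upstream of the TSS and an Inr spanning it.

    seq is the promoter sequence; tss_index is the 0-based index of the
    +1 nucleotide within seq.
    """
    seq = seq.upper()
    tata_positions = []
    for match in TATA_RE.finditer(seq):
        # An index i before the TSS maps to relative position i - tss_index.
        rel = match.start() - tss_index
        if -40 <= rel <= -20:
            tata_positions.append(rel)
    # The Inr pattern starts two bases before its A, so anchor it at +1 - 2.
    has_inr = tss_index >= 2 and INR_RE.match(seq, tss_index - 2) is not None
    return {"tata_positions": tata_positions, "has_inr": has_inr}


# Toy promoter: a TATA box starting at -32 and an Inr whose A is the +1 base.
toy = "G" * 10 + "TATAAAAG" + "G" * 22 + "TCAGTCC" + "G" * 10
print(find_core_elements(toy, 42))
```

A promoter with neither motif simply returns an empty list and False, which mirrors the point that many genes lack these elements.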

Sharp vs. Broad Promoters

Not all genes have a single, precisely defined TSS. Researchers have identified two main classes of promoter architecture based on how transcription start sites are distributed. Sharp (or focused) promoters concentrate their transcription initiation within just a few base pairs, producing RNA molecules that all begin at essentially the same position. These promoters typically contain a clear TATA box and well-defined downstream elements that lock RNA polymerase II into a fixed starting point.

Broad (or dispersed) promoters, by contrast, spread their initiation activity across a window of roughly 100 base pairs, generating RNA molecules with many slightly different 5′ ends. These promoters tend to lack a TATA box and instead overlap with CG-rich regions called CpG islands. Without strong anchoring sequences, the protein machinery that positions RNA polymerase II sits less precisely on the DNA, allowing polymerase to select from several nearby start positions. The result is a broad pattern of transcription initiation rather than a single defined point. Most housekeeping genes, the ones active in nearly every cell type, use broad promoters.
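One way to express the sharp versus broad distinction numerically is to measure how wide a window holds most of the initiation signal. The sketch below computes the span between the 10th and 90th percentiles of cumulative signal across positions. The quantiles, the 10 bp cutoff, the function names, and the counts are all assumptions for illustration, not values from the studies described here.

```python
def interquantile_width(tag_counts, lower=0.1, upper=0.9):
    """Width (bp) of the region between two quantiles of the TSS signal.

    tag_counts maps a position to the number of reads starting there.
    """
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("no signal")
    cumulative = 0
    lo_pos = hi_pos = None
    for pos in sorted(tag_counts):
        cumulative += tag_counts[pos]
        frac = cumulative / total
        if lo_pos is None and frac >= lower:
            lo_pos = pos
        if frac >= upper:
            hi_pos = pos
            break
    return hi_pos - lo_pos + 1


def classify_promoter(tag_counts, max_sharp_width=10):
    """Label a promoter 'sharp' or 'broad' using an assumed width cutoff."""
    width = interquantile_width(tag_counts)
    return "sharp" if width <= max_sharp_width else "broad"


# Made-up counts: one dominant start position vs. signal spread over 100 bp.
focused = {100: 2, 101: 95, 102: 3}
dispersed = {pos: 1 for pos in range(100)}
print(classify_promoter(focused), classify_promoter(dispersed))
```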

Why Genes Have Multiple Start Sites

Many genes have more than one TSS, and different tissues or cell types preferentially use different ones. This matters because choosing an alternative TSS changes the 5′ end of the RNA, which can alter the 5′ UTR or even include or exclude entire protein-coding exons. The result is distinct transcript isoforms, and potentially distinct proteins, from the same gene.

A large-scale study of human tissues found that alternative transcription start and termination sites, rather than alternative splicing, accounted for the majority of tissue-dependent differences in transcript structure. In other words, the choice of where to start transcription is one of the principal ways the body generates the protein diversity needed for different cell types and organs to function. A liver cell and a neuron may both express the same gene but initiate transcription from different positions, producing isoforms tailored to each cell’s needs.
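Tissue-specific start site choice can be summarized by asking which TSS carries most of a gene's reads in each tissue. The sketch below does that for a single gene. The function name, the TSS labels, and the read counts are invented for illustration and do not come from the study mentioned above.

```python
def dominant_tss(usage):
    """Return the most-used TSS and its share of reads for each tissue.

    usage maps tissue -> {tss_name: read_count}.
    """
    result = {}
    for tissue, counts in usage.items():
        total = sum(counts.values())
        if total == 0:
            continue  # gene not detectably expressed in this tissue
        top = max(counts, key=counts.get)
        result[tissue] = (top, counts[top] / total)
    return result


# Made-up counts for one gene with two alternative start sites.
usage = {
    "liver": {"TSS_A": 80, "TSS_B": 20},
    "brain": {"TSS_A": 10, "TSS_B": 90},
}
for tissue, (tss, share) in dominant_tss(usage).items():
    print(tissue, tss, round(share, 2))
```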

How Scientists Map Transcription Start Sites

Identifying exactly where transcription begins across the genome requires specialized laboratory techniques. Researchers have developed two broad categories of assays for this purpose. TSS-focused assays, such as CAGE, RAMPAGE, STRIPE-seq, and GRO-cap, specifically enrich for the 5′ ends of newly made RNA molecules, pinpointing the exact nucleotides where transcription initiated. A second category, nascent transcript assays like GRO-seq and PRO-seq, captures RNA that is still being made by actively transcribing polymerases, revealing not just where transcription started but where polymerase is paused or moving along the gene.

TSS-focused assays are generally more sensitive at detecting initiation events. In one comparison, TSS assays devoted only about 13% of their sequencing reads to gene body regions, while nascent transcript assays allocated roughly 66% of reads there, since they capture polymerases at all stages of elongation. Among TSS assays, GRO-cap detected the highest proportion of known regulatory elements, covering about 87% of experimentally validated enhancers in a benchmark comparison. Databases like the UCSC Genome Browser host tracks of mapped TSS positions across the human genome, allowing researchers to look up the coordinates and confidence scores for start sites of individual genes.
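The read-allocation comparison boils down to a simple fraction: of all reads assigned to a gene, how many fall in the gene body rather than near the start site. Here is a rough sketch of that calculation. The 500 bp boundary between "near the TSS" and "gene body" is an assumption, as are the function name and the example positions; the comparison cited above may have defined these regions differently.

```python
def gene_body_fraction(read_positions, tss_window=500):
    """Fraction of reads whose 5' end lies beyond an assumed TSS-proximal window.

    read_positions holds TSS-relative positions of read 5' ends for one gene.
    """
    if not read_positions:
        raise ValueError("no reads")
    in_body = sum(1 for pos in read_positions if pos > tss_window)
    return in_body / len(read_positions)


# Made-up reads: most start near +1, two start well inside the gene.
reads = [1, 5, -20, 800, 1500, 30, 2, 10]
print(gene_body_fraction(reads))  # 0.25
```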