DNA carries the genetic instructions for the development, functioning, growth, and reproduction of all known organisms. Its primary structure is the linear sequence of nucleotide bases, adenine (A), guanine (G), cytosine (C), and thymine (T), which encodes genetic information. Its secondary structure is the pattern of base pairing and the local three-dimensional shape that results, conventionally represented by the right-handed B-form double helix. Certain nucleotide sequences, however, allow DNA to fold into alternative, non-canonical geometries. Understanding these alternative structures is important because they act as local architectural switches that can influence how the surrounding region of the genome is transcribed, replicated, and repaired.
Beyond the Double Helix
DNA sequences can fold into shapes that diverge from the standard double-stranded form. The simplest alternative structure is the hairpin, or stem-loop, which forms when a single strand contains two self-complementary segments, allowing it to fold back and base-pair with itself. Such a sequence is called an inverted repeat. In double-stranded DNA, an inverted repeat can give rise to a cruciform, in which both strands extrude from the helix as opposing hairpins to form a four-way junction; negative supercoiling often stabilizes this arrangement.
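A simple scan for inverted repeats illustrates how hairpin-prone or cruciform-prone sequences can be flagged. The Python sketch below is illustrative only: the function names and the default stem and loop lengths are assumptions chosen for the example, and it requires perfectly complementary stems, which real hairpins do not.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, stem=4, min_loop=3, max_loop=8):
    """Find perfect inverted repeats that could form a hairpin.

    Returns a list of (left_start, right_start, stem_sequence, loop_length)
    tuples using 0-based positions. The stem length and loop limits are
    illustrative defaults, not biologically calibrated values.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for i in range(n - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        target = reverse_complement(left)  # what the right arm must read
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop  # start of the candidate right arm
            if j + stem > n:
                break  # longer loops would run past the end of the sequence
            if seq[j:j + stem] == target:
                hits.append((i, j, left, loop))
    return hits
```

For instance, `find_inverted_repeats("GCGCTTTTGCGC")` reports a single candidate: a four-base stem starting at position 0 that pairs with the arm starting at position 8, separated by a four-base loop.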
A more elaborate structure is the triplex, in which three strands of DNA interact within a single helix; the intramolecular form is known as H-DNA. Triplex formation requires a long homopurine-homopyrimidine tract, which allows a third strand to bind in the major groove of the existing duplex through Hoogsteen hydrogen bonds. Among the most studied non-canonical forms is the G-quadruplex (G4), a four-stranded structure that forms in sequences rich in guanine bases. G4s are built from stacked planes of four guanines, called G-tetrads, which are held together by Hoogsteen hydrogen bonds and further stabilized by monovalent cations, particularly potassium ions, in the central channel. The complementary cytosine-rich sequences can form the i-motif, which is held together by hemiprotonated cytosine-cytosine base pairs and therefore forms most readily under acidic conditions.
Biological Roles of Structure
These varied DNA structures serve as functional elements that regulate cellular processes. G4-forming sequences are abundant in telomeres, the protective caps at the ends of chromosomes, where G4 formation is thought to contribute to genome stability. G4 folding in the telomeric repeats can also inhibit telomerase, the enzyme that extends telomeres, and thereby influence the cell’s replicative lifespan.
Non-canonical structures frequently occur in the promoter regions of genes, where they act as physical switches that modulate gene expression. For example, a G4-forming sequence in the promoter of the c-MYC proto-oncogene can repress transcription by physically blocking the binding site for transcription factors. Triplex and i-motif structures similarly influence transcription and DNA replication by creating structural roadblocks or recognition sites for specialized DNA-binding proteins. The presence of these structures near origins of replication and at sites of DNA repair suggests that they are involved in maintaining the integrity and accurate duplication of genetic material.
Computational Prediction Approaches
Predicting which DNA sequences are likely to form these structures relies on theoretical models and bioinformatics algorithms. The most common approach is thermodynamic stability modeling, based on the principle that a nucleic acid sequence tends to adopt the structure with the lowest free energy. Tools like Mfold and its successor UNAFold use empirically derived “nearest neighbor” parameters, which assign energy values to each stack of adjacent base pairs and to each type of loop, rather than to isolated base pairs.
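The idea of searching for the lowest-energy fold can be sketched with a deliberately simplified dynamic program. The Python example below is not the Mfold or UNAFold algorithm and does not use nearest-neighbor parameters: it assumes a hypothetical toy energy of -1 unit per Watson-Crick pair and a minimum hairpin loop of three bases, purely to show how the best structure for a sequence is assembled from the best structures of its subsequences.

```python
from functools import lru_cache

# Watson-Crick pairs only; real models also score stacking and loop types.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
MIN_LOOP = 3        # assumed minimum number of unpaired bases in a hairpin loop
PAIR_ENERGY = -1.0  # hypothetical toy value per base pair, arbitrary units


def min_energy(seq):
    """Lowest toy energy over all nested pairings of a short sequence.

    Intended for short oligonucleotides; the recursion depth grows with
    sequence length.
    """
    seq = seq.upper()

    @lru_cache(maxsize=None)
    def best(i, j):
        # Lowest energy for the subsequence seq[i..j], inclusive.
        if j - i <= MIN_LOOP:
            return 0.0  # too short to close a hairpin
        # Case 1: base i stays unpaired.
        energy = best(i + 1, j)
        # Case 2: base i pairs with some base k, splitting the problem in two.
        for k in range(i + MIN_LOOP + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                candidate = PAIR_ENERGY + best(i + 1, k - 1) + best(k + 1, j)
                energy = min(energy, candidate)
        return energy

    return best(0, len(seq) - 1)
```

Under these toy assumptions, `min_energy("GGGAAAACCC")` finds the three-pair hairpin, while a sequence with no complementary bases, such as `"AAAA"`, scores zero. A realistic predictor replaces the single per-pair constant with measured stacking and loop energies but keeps the same recursive structure.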
A folding algorithm of this kind sums the nearest-neighbor energy contributions to calculate the total change in Gibbs free energy ($\Delta G^{\circ}$) of candidate folded structures, using dynamic programming rather than enumerating every possible fold one by one. The structure with the most negative $\Delta G^{\circ}$ is predicted to be the most stable and therefore the most likely conformation.
For structures like G-quadruplexes, which have a distinct sequence signature, dedicated tools such as Quadparser perform sequence motif searching. They scan the genome for the characteristic pattern of four guanine tracts separated by short loops, such as $G_{\ge 3}N_{1-7}G_{\ge 3}N_{1-7}G_{\ge 3}N_{1-7}G_{\ge 3}$. G4Hunter takes a different approach: it assigns a propensity score based on the sequence’s G-richness and G-skewness, providing a more nuanced prediction than simple pattern matching.
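Both G4 strategies are easy to sketch in Python. The example below is a minimal illustration, not a reimplementation of either tool: the function names are hypothetical, the regular expression encodes only the pattern given above, and the scoring function follows the published G4Hunter scheme as commonly described (each G in a run of n guanines scores min(n, 4), each C scores the negative equivalent, and A and T score zero). It averages over the whole input, whereas the actual tool reports the mean over a sliding window.

```python
import re

# Four G-tracts of at least three guanines, separated by loops of 1 to 7 bases.
G4_PATTERN = re.compile(
    r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}"
)


def find_g4_motifs(seq):
    """Return (start, end, match) for each non-overlapping motif hit.

    Positions are 0-based and the end is exclusive. Only the given strand is
    scanned; search the reverse complement separately for the other strand.
    """
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


def g4hunter_base_scores(seq):
    """Per-base scores: +min(n, 4) in a G-run of length n, negative for C-runs."""
    scores = []
    for run in re.finditer(r"G+|C+|[^GC]+", seq.upper()):
        text = run.group()
        if text[0] == "G":
            value = min(len(text), 4)
        elif text[0] == "C":
            value = -min(len(text), 4)
        else:
            value = 0
        scores.extend([value] * len(text))
    return scores


def g4hunter_mean(seq):
    """Mean per-base score over the whole sequence (0.0 for an empty input)."""
    scores = g4hunter_base_scores(seq)
    return sum(scores) / len(scores) if scores else 0.0
```

Applied to four human telomeric repeats, `TTAGGGTTAGGGTTAGGGTTAGGG`, the pattern search returns one hit spanning positions 3 to 24 and the mean score is 1.5; the complementary C-rich strand scores -1.5, which is how the sign of the score indicates the strand carrying the G-tracts.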
Confirming Predicted Structures
Because computational models are theoretical, predicted structures must be verified using laboratory techniques that provide physical evidence of their formation. Nuclear Magnetic Resonance (NMR) spectroscopy is useful for small oligonucleotides in solution, offering atomic-level detail on the structure and dynamics of the folded molecule. NMR measures the magnetic properties of atomic nuclei, which change depending on the molecule’s three-dimensional arrangement, allowing confirmation of Hoogsteen base pairing in G4s or hemiprotonated pairs in i-motifs.
For high-resolution, static images, scientists use X-ray crystallography, which involves crystallizing the DNA and exposing it to an X-ray beam. The resulting diffraction pattern is analyzed to reconstruct a precise three-dimensional map of the structure, although crystal packing can favor a conformation that is not the dominant one in solution. Circular Dichroism (CD) spectroscopy is a faster, lower-resolution method used to confirm the overall folding topology. CD measures the differential absorption of left and right circularly polarized light, producing a characteristic spectral signature, or “fingerprint”, for each general type of secondary structure, such as the distinct spectrum of a B-form helix versus a G-quadruplex.

