What Conclusions Can Be Made From a DNA Microarray?

A DNA microarray can tell you which genes are turned on or off across thousands of genes simultaneously, and those patterns lead to surprisingly powerful conclusions. Depending on the type of microarray and the experiment, researchers can classify cancers into subtypes, predict how a patient will respond to treatment, identify chromosomal abnormalities, pinpoint toxic mechanisms of chemicals, and map out which biological pathways are active in a disease. Here’s a closer look at each type of conclusion.

Which Genes Are Active or Inactive

The most fundamental conclusion from a microarray is which genes are more active (upregulated) or less active (downregulated) in one sample compared to another. A microarray measures messenger RNA levels for thousands of genes at once, giving a snapshot of gene activity across an entire genome. When researchers compare, say, a tumor sample to healthy tissue, they generate a list of genes whose activity differs between the two.

To decide whether a difference is meaningful, researchers typically require a gene’s activity to change by at least 1.5- to 2-fold and to pass a statistical cutoff (usually a p-value below 0.05, often adjusted to account for the thousands of genes tested at once). The earliest microarray studies used a simple 2-fold change as the bar for significance, but modern analyses layer in statistical testing to reduce false positives. Some studies use stricter thresholds of 4-fold change, while others go as low as 1.3-fold depending on the biological context. Any gene that clears both the fold-change and statistical thresholds is considered differentially expressed, and that list becomes the starting point for deeper conclusions.
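The two-part filter described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular study's pipeline: the gene names, expression values, and p-values are invented, the p-values are assumed to have been computed beforehand, and the function name call_differential is hypothetical.

```python
import math

def call_differential(genes, min_fold=2.0, max_p=0.05):
    """Return genes that pass both a fold-change and a p-value cutoff.

    genes: dict mapping gene name -> (mean_case, mean_control, p_value),
    with expression on a linear (not log) scale. The p-values are assumed
    to come from a separate statistical test.
    """
    min_log2fc = math.log2(min_fold)
    calls = {}
    for name, (case, control, p_value) in genes.items():
        log2fc = math.log2(case / control)
        # A gene must clear BOTH thresholds to be called.
        if abs(log2fc) >= min_log2fc and p_value < max_p:
            calls[name] = "up" if log2fc > 0 else "down"
    return calls

# Invented values for illustration only.
example = {
    "GENE_A": (800.0, 200.0, 0.001),  # 4-fold up, significant
    "GENE_B": (150.0, 450.0, 0.020),  # 3-fold down, significant
    "GENE_C": (500.0, 200.0, 0.300),  # 2.5-fold up, but p-value too high
    "GENE_D": (220.0, 200.0, 0.001),  # significant, but only 1.1-fold
}
print(call_differential(example))
```

Working on the log2 scale treats a 2-fold increase and a 2-fold decrease symmetrically (+1 and -1), which is why the absolute value is compared against the cutoff.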

Cancer Classification and Prognosis

One of the most impactful uses of microarrays has been sorting cancers into subtypes that look identical under a microscope but behave very differently. A landmark study of diffuse large B-cell lymphoma (DLBCL) used microarray data from 40 patients to discover two previously unknown subtypes. The distinction mattered enormously: five years after chemotherapy, 76% of patients with one subtype (germinal center B-like) survived, compared to just 16% of those with the other subtype (activated B-like). Without microarray data, these patients would have received the same diagnosis and the same treatment, with no explanation for the dramatically different outcomes.

Similar discoveries followed in other cancers. Microarray analysis of acute myeloid leukemia patients who had normal-looking chromosomes split them into two groups with different survival rates, something no previous method had achieved. A 133-gene predictor was built to separate good-prognosis from poor-prognosis patients within that group. In breast cancer, microarray profiling of 65 tumors using 496 genes revealed four distinct molecular subtypes: estrogen receptor positive, basal-like, Erb-B2 positive, and normal breast tissue-like.

Beyond classification, microarrays can predict whether a cancer is likely to spread. Researchers compared gene activity in metastatic tumors to primary tumors and identified 128 genes that differed between them. Primary tumors that already carried this “metastasis-like” gene signature were associated with worse outcomes. In head and neck cancers, a 102-gene predictor built from microarray data detected lymph node spread with 86% accuracy, compared with 68% for the standard clinical method.
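To give a sense of how a multi-gene predictor can work in principle, here is a minimal nearest-centroid classifier. It is a stand-in chosen for simplicity, not the method used in the studies above (which this article does not describe). The 3-gene signature, the expression values, and the class labels are all invented.

```python
from statistics import mean

def centroid(samples):
    """Per-gene mean across a list of equal-length expression vectors."""
    return [mean(values) for values in zip(*samples)]

def sq_distance(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train(labelled):
    """labelled: dict mapping class label -> list of expression vectors."""
    return {label: centroid(vectors) for label, vectors in labelled.items()}

def predict(centroids, sample):
    """Assign the sample to the class whose centroid is nearest."""
    return min(centroids, key=lambda label: sq_distance(centroids[label], sample))

def accuracy(centroids, test_set):
    """test_set: list of (expression_vector, true_label) pairs."""
    hits = sum(predict(centroids, vec) == label for vec, label in test_set)
    return hits / len(test_set)

# Invented log-expression values for a hypothetical 3-gene signature.
training = {
    "node_positive": [[2.1, 0.3, 1.8], [1.9, 0.5, 2.2]],
    "node_negative": [[0.2, 1.7, 0.4], [0.4, 2.1, 0.1]],
}
model = train(training)
held_out = [([2.0, 0.2, 1.9], "node_positive"), ([0.1, 1.8, 0.5], "node_negative")]
print(predict(model, [1.8, 0.6, 2.0]))
print(accuracy(model, held_out))
```

Accuracy figures like those quoted above are only meaningful when measured on samples that were not used to build the predictor, which is why the sketch scores a separate held-out set.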

Biological Pathway Activity

Looking at individual genes only tells part of the story. A cellular process often involves dozens or hundreds of genes working together, and each one might change only modestly. A technique called Gene Set Enrichment Analysis (GSEA) addresses this by asking whether genes belonging to a known biological pathway tend to cluster at the top or bottom of the ranked gene list. This approach can reveal pathway-level changes that single-gene analysis completely misses.
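The core idea can be sketched as a running sum that walks down the ranked list, stepping up at each pathway gene and down at every other gene. The code below is a simplified, unweighted version for illustration. It leaves out the permutation testing that a real analysis uses to judge significance, and the gene names and sets are invented.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score (Kolmogorov-Smirnov style).

    ranked_genes: gene names ordered from most up- to most down-regulated.
    gene_set: names of the genes in the pathway of interest; names that are
    absent from the ranked list are ignored.
    Returns the running-sum value with the largest absolute deviation from 0.
    """
    members = set(gene_set) & set(ranked_genes)
    n_hit = len(members)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        # Step up on a pathway gene, step down otherwise.
        running += 1.0 / n_hit if gene in members else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Invented ranked list: g1 is most upregulated, g10 most downregulated.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9", "g10"]
es_top = enrichment_score(ranked, {"g1", "g2", "g3"})       # clustered at the top
es_bottom = enrichment_score(ranked, {"g8", "g9", "g10"})   # clustered at the bottom
es_spread = enrichment_score(ranked, {"g2", "g6", "g9"})    # scattered through the list
print(es_top, es_bottom, es_spread)
```

A set clustered at the top gives a score near +1, a set clustered at the bottom gives a score near -1, and a scattered set stays closer to 0. No single gene needs a large change for the set as a whole to score strongly.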

A striking example comes from diabetes research. When researchers analyzed muscle biopsies from diabetic patients versus healthy controls, no individual gene stood out dramatically. But GSEA revealed that genes involved in the cell’s energy-production machinery showed reduced expression as a group, even though the average decrease per gene was only about 20%. That modest, coordinated drop across an entire pathway was biologically meaningful, pointing to a specific metabolic defect in diabetic muscle tissue. Similarly, two independent studies of lung cancer survival showed little overlap when compared gene by gene, but GSEA revealed many shared biological pathways between them.

Chromosomal Deletions and Duplications

A different type of microarray, called a chromosomal microarray (CMA), draws conclusions about the structure of chromosomes rather than gene activity. These arrays detect copy number variants: small deletions or duplications of DNA segments that are too small to see with traditional chromosome analysis under a microscope. Two main platforms exist for this purpose. One is based on comparative genomic hybridization, which identifies gains and losses of DNA at various resolutions. The other uses single nucleotide polymorphism (SNP) arrays, which can detect the same structural changes but also reveal regions where both copies of a chromosome come from the same parent, a finding relevant to certain genetic conditions.
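For the comparative hybridization approach, gains and losses are usually read from the log2 ratio of test signal to reference signal. As a rough sketch: with two copies expected, a single-copy loss gives log2(1/2) = -1 and a single-copy gain gives log2(3/2), about 0.58. The cutoffs of -0.3 and 0.3 below are illustrative assumptions, not a clinical standard, and the segment names and ratios are invented.

```python
import math

def expected_log2_ratio(copies, normal_copies=2):
    """Idealized log2(test/reference) for a given copy number."""
    return math.log2(copies / normal_copies)

def call_segment(log2_ratio, loss_cutoff=-0.3, gain_cutoff=0.3):
    """Label a segment as loss, gain, or normal.

    The default cutoffs are assumptions for illustration; real laboratories
    set their own based on platform noise and validation data.
    """
    if log2_ratio <= loss_cutoff:
        return "loss"
    if log2_ratio >= gain_cutoff:
        return "gain"
    return "normal"

# Hypothetical segments with made-up averaged log2 ratios.
segments = [("seg_A", 0.02), ("seg_B", -0.95), ("seg_C", 0.55)]
calls = [(name, call_segment(ratio)) for name, ratio in segments]

print(expected_log2_ratio(1), round(expected_log2_ratio(3), 3))
print(calls)
```

Measured ratios rarely hit the idealized values exactly because of noise and mixed cell populations, which is one reason cutoffs are set well inside -1 and 0.58.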

In prenatal and pediatric medicine, chromosomal microarrays are now a recommended first-line test for evaluating developmental delays, intellectual disability, and congenital anomalies. They pick up clinically significant genetic changes that conventional chromosome testing would miss entirely.

Genetic Risk for Complex Diseases

SNP arrays are the workhorse behind genome-wide association studies, which scan hundreds of thousands of genetic variants across large populations to find links between specific DNA changes and disease risk. These arrays are designed to genotype common genetic variants in a cost-effective, high-throughput way. Through these studies, researchers have identified genetic variations associated with conditions ranging from diabetes to heart disease to psychiatric disorders.

The conclusions here are probabilistic rather than deterministic. A single variant might increase your risk of a disease by a small percentage, and it’s typically the combined effect of many variants that produces meaningful risk predictions. These arrays do have notable limitations: they only test predefined genetic positions, they’re best suited for common variants rather than rare mutations, and they are not designed to capture many of the larger structural changes in DNA. They also explain only a fraction of the heritable risk for most complex diseases, a problem known as “missing heritability.”
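One common way to combine many small effects is an additive score: count the risk alleles a person carries at each variant (0, 1, or 2) and weight each count by the logarithm of that variant's odds ratio. The sketch below assumes the variants act independently and multiplicatively, which is a simplification. The variant names and odds ratios are invented.

```python
import math

def polygenic_score(genotype, weights):
    """Sum of risk-allele counts (0, 1, or 2) times per-allele log odds ratios.

    Variants missing from the genotype are treated as zero risk alleles.
    """
    return sum(weight * genotype.get(variant, 0) for variant, weight in weights.items())

# Invented per-allele odds ratios for three hypothetical variants.
odds_ratios = {"variant_1": 1.10, "variant_2": 1.05, "variant_3": 1.20}
weights = {variant: math.log(ratio) for variant, ratio in odds_ratios.items()}

# Risk-allele counts for one hypothetical person.
person = {"variant_1": 2, "variant_2": 0, "variant_3": 1}

score = polygenic_score(person, weights)
# Odds relative to someone carrying no risk alleles at these variants.
relative_odds = math.exp(score)
print(round(relative_odds, 3))
```

Here the combined odds work out to 1.10 × 1.10 × 1.20, so three individually modest effects add up to a noticeably larger one. The result is still a shift in probability, not a diagnosis.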

Drug Response Prediction

Microarrays contribute to pharmacogenomics, the study of how genetic variation affects drug response. By profiling gene expression in tumors or genotyping patients’ DNA, researchers can draw conclusions about who is likely to benefit from a specific drug and who might experience severe side effects. Microarray-based genotyping platforms have enabled genome-wide searches for genes associated with drug responses that weren’t obvious candidates based on prior knowledge. This is especially useful for predicting severe adverse drug reactions, where the genetic effect tends to be large and binary: a patient either carries the risk variant or doesn’t.

Toxicology and Chemical Exposure

Microarrays let researchers infer how a chemical damages specific organs and through what mechanism. In one example, rats were treated with an experimental drug, and the resulting gene expression profile in liver tissue was compared against a database of profiles from known liver toxins. The match revealed the specific molecular receptor through which the new chemical caused its toxic effects, a conclusion that would have taken far longer to reach through traditional methods.
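Matching a new profile against a reference database can be sketched as a similarity search. The example below uses Pearson correlation as the similarity measure, which is an assumption made for illustration. The article does not say which measure the study used, and the profiles, class names, and values are invented.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def best_match(query, reference_profiles):
    """Return the name of the most similar reference profile and all scores."""
    scores = {name: pearson(query, profile)
              for name, profile in reference_profiles.items()}
    top = max(scores, key=scores.get)
    return top, scores

# Invented log2 fold changes for the same five genes in each profile.
reference = {
    "toxin_class_A": [2.0, 1.5, -0.2, 0.1, -1.0],
    "toxin_class_B": [-0.5, 0.2, 1.8, 2.2, 0.3],
}
new_compound = [1.8, 1.2, 0.0, 0.3, -0.8]

top, scores = best_match(new_compound, reference)
print(top)
```

A close match suggests a shared mechanism with the reference class but does not prove one, so the inference is usually followed up experimentally.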

This approach also reveals that chemicals in the same class produce similar gene expression fingerprints. Rats treated with different peroxisome proliferators (a class of compounds) showed overlapping gene profiles, while a different class of chemicals produced a distinct pattern. Researchers have identified gene markers of kidney toxicity, tracked dose-dependent changes in gene expression, and even mapped which organs are most affected by a given chemical. In one study, rats exposed to hexachlorobenzene showed the greatest gene expression changes in the spleen, followed by liver, kidney, and lymph nodes, pinpointing the primary target organs.

What Microarray Data Cannot Tell You

A critical limitation is that microarrays measure messenger RNA, not protein. Protein is what actually carries out most functions in cells, and mRNA levels are only a rough guide to protein levels, with reported correlations of only about 20 to 40%. Multiple factors drive this gap: proteins are modified after they’re made, different proteins last for different lengths of time in the cell, and some mRNAs are never efficiently translated into protein. A gene that appears highly active on a microarray may produce very little functional protein, and vice versa.

Technical artifacts also complicate interpretation. Some mRNA molecules can accidentally bind to the wrong spot on the array, producing false signals. Probe design on the array may rely on incomplete genome information, leading to misidentification of which gene is actually being measured. And genes that produce multiple mRNA variants, like CD44, may not be accurately distinguished by a single probe set. These limitations mean that microarray findings are typically validated with independent techniques before firm biological conclusions are drawn.