What Is Experimental Design in Biology?

Experimental design in biology is the structured plan a researcher creates before running an experiment, mapping out exactly what will be tested, how subjects will be grouped, what will be measured, and how bias will be minimized. It’s the blueprint that determines whether an experiment’s results are meaningful or useless. Whether you’re designing a high school lab or reading a published study, understanding these principles helps you distinguish solid evidence from sloppy science.

The Five Core Steps

Every well-designed biology experiment follows a consistent sequence. First, you define your variables: what you’ll change, what you’ll measure, and what you’ll hold constant. Second, you write a specific, testable hypothesis that predicts a relationship between those variables. Third, you design experimental treatments, the specific levels or conditions of the variable you’re changing. Fourth, you assign subjects to groups. Fifth, you plan exactly how you’ll measure the outcome.

These steps apply whether you’re studying how salt concentration affects grass seed germination, whether a drug shrinks tumors in mice, or how temperature influences enzyme activity. The scale changes, but the logic doesn’t.
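To make the sequence concrete, here is a minimal sketch of such a plan written as a Python record. The field names and example values are illustrative, not part of any standard protocol.

```python
from dataclasses import dataclass

# Illustrative checklist-style record of the five planning steps.
@dataclass
class ExperimentPlan:
    independent_variable: str        # what you change
    dependent_variable: str          # what you measure
    controlled_variables: list       # what you hold constant
    hypothesis: str                  # specific, testable prediction
    treatments: list                 # levels of the independent variable
    group_assignment: str            # how subjects are assigned to groups
    measurement_plan: str            # how and when the outcome is recorded

plan = ExperimentPlan(
    independent_variable="soil salt concentration",
    dependent_variable="seeds germinated after 14 days",
    controlled_variables=["soil volume", "water", "light", "seed count"],
    hypothesis="Higher salt concentration reduces germination.",
    treatments=["no additive", "low salt", "high salt"],
    group_assignment="randomized",
    measurement_plan="count germinated seeds on days 7 and 14",
)
```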

Variables: The Building Blocks

Three types of variables form the backbone of any biological experiment. The independent variable is what you deliberately change. The dependent variable is the outcome you measure. Controlled variables (sometimes called constants) are everything else you keep the same so they don’t muddy your results.

Consider a study asking whether vehicle exhaust increases asthma rates in children. The exhaust concentration is the independent variable. Asthma incidence is the dependent variable. But what about cigarette smoke in the home, or pollution from nearby factories? These are confounding variables: factors linked to both the exposure and the outcome that can distort your conclusions if you don’t account for them. Good experimental design either eliminates confounders or statistically adjusts for them.

A classic plant experiment illustrates this clearly. A USDA protocol tests how salt and alkalinity affect grass growth by planting 30 seeds in cups with varying amounts of salt or baking soda mixed into the soil. The independent variable is the soil additive (and its dose). The dependent variables are the number of plants that germinate and the height of the tallest blades after one and two weeks. Controlled variables include the amount of soil, water, seed count, light exposure, and container type. Every cup gets the same treatment except for the one thing being tested.
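A rough sketch of that design in code makes the separation visible: one field varies across cups while everything else is fixed. The specific amounts below are placeholders, not the protocol’s actual values.

```python
# Hypothetical encoding of the grass-cup design; quantities are illustrative.
CONTROLLED = {"soil_g": 200, "water_ml_per_day": 30, "seeds": 30,
              "light_hours": 12, "container": "16 oz cup"}

additives = ["none", "salt_low", "salt_high", "soda_low", "soda_high"]

# Every cup copies the same controlled conditions; only the additive differs.
cups = [{**CONTROLLED, "additive": a} for a in additives]

# Dependent variables recorded later for each cup:
# number of seeds germinated and height of the tallest blade at days 7 and 14.
```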

Why Controls Matter

Controls are the reference points that give your results meaning. Without them, you can’t tell whether the change you observed was caused by your treatment or by something else entirely.

A negative control receives no treatment (or a placebo) and is expected to show no effect. In the plant experiment above, the cup with plain soil and no added salt or baking soda is the negative control. It establishes what normal grass growth looks like under baseline conditions. If the negative control also shows poor growth, something other than your treatment is the problem.

A positive control receives a treatment already known to produce a specific effect. Its job is to confirm that your experimental setup is actually working. If you’re testing whether a new antibiotic kills bacteria on a petri dish, your positive control might be a well-established antibiotic that you know works. If the positive control fails to kill bacteria, your method has a problem, and your other results can’t be trusted.
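A minimal sketch of that logic, with placeholder labels: the treatment result is only interpretable when both controls behave as expected.

```python
# Placeholder plate layout for an antibiotic test.
plate = {
    "negative_control": "no antibiotic (expect full bacterial growth)",
    "positive_control": "established antibiotic (expect clearing)",
    "treatment": "candidate antibiotic (effect unknown)",
}

def setup_is_valid(results):
    """Trust the treatment result only if both controls behave as expected."""
    return (results["negative_control"] == "growth"
            and results["positive_control"] == "cleared")
```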

Randomization and Blinding

Bias can creep into experiments in subtle ways. A researcher might unconsciously assign healthier-looking animals to the treatment group, or score outcomes more favorably when they know which subjects received the drug. Randomization and blinding are the two main defenses against this.

Randomization means assigning subjects to groups by chance rather than by choice. It prevents selection bias, produces comparable groups, and ensures that any differences between groups at the start of the experiment are due to chance rather than a researcher’s judgment. The simplest method is a coin flip or a random number generator, but more sophisticated approaches exist. Block randomization assigns subjects in small, balanced blocks to keep group sizes equal throughout the study. Stratified randomization first sorts subjects by a key characteristic (age, sex, or disease severity, for example) and then randomizes within each category, ensuring those traits are evenly distributed across groups.
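Here is a short sketch, assuming a hypothetical cohort of 20 mice tagged by sex, comparing simple randomization with stratified randomization.

```python
import random

# Illustrative subjects tagged with a stratification trait (sex).
subjects = [(f"mouse_{i}", "female" if i % 2 else "male") for i in range(1, 21)]

# Simple randomization: shuffle everyone, then split into two equal groups.
pool = subjects[:]
random.shuffle(pool)
simple = {"treatment": pool[:10], "control": pool[10:]}

# Stratified randomization: shuffle and split within each sex, so both
# groups end up with the same sex ratio.
stratified = {"treatment": [], "control": []}
for sex in ("female", "male"):
    stratum = [s for s in subjects if s[1] == sex]
    random.shuffle(stratum)
    half = len(stratum) // 2
    stratified["treatment"] += stratum[:half]
    stratified["control"] += stratum[half:]
```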

Blinding hides group assignments from the people involved. In a single-blind experiment, subjects don’t know which treatment they’re receiving. In a double-blind experiment, neither the subjects nor the researchers measuring outcomes know. This prevents expectations from influencing either behavior or measurement. Double blinding is the gold standard in clinical trials, though it’s not always possible in every type of biological study.
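A small sketch of one common way blinding is implemented in practice, with hypothetical sample names: each sample gets a neutral code, and the key linking codes to groups is held back until measurement is complete.

```python
import random

samples = [f"mouse_{i}" for i in range(1, 11)]
groups = ["drug"] * 5 + ["placebo"] * 5
random.shuffle(groups)                      # random group assignment

# Neutral codes replace any hint of group identity on the samples themselves.
codes = [f"S{n:03d}" for n in range(1, len(samples) + 1)]
random.shuffle(codes)

blinded_labels = dict(zip(samples, codes))  # what the person scoring outcomes sees
unblinding_key = {blinded_labels[s]: g for s, g in zip(samples, groups)}
# The unblinding key is kept sealed (ideally by a third party) until all
# measurements are recorded.
```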

Replication: Biological vs. Technical

Running an experiment once proves nothing. Replication is what separates a fluke from a finding, and biology distinguishes between two types.

Technical replicates are repeated measurements of the same sample. If you measure the protein concentration of a blood sample three times, those are technical replicates. They capture noise from your equipment and procedures, helping you gauge measurement precision. Biological replicates are measurements from distinct biological samples, like blood drawn from three different mice. These capture the natural variation between living organisms, which is usually the variation you actually care about.

Both matter, but they answer different questions. Technical replicates tell you whether your pipette and spectrophotometer are consistent. Biological replicates tell you whether your finding holds true across different individuals. A study with 50 technical replicates of a single mouse is far less convincing than a study with 10 biological replicates from 10 different mice.
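A brief sketch with made-up protein readings shows the usual way to handle both levels: average the technical replicates within each animal, then summarize across animals.

```python
from statistics import mean, stdev

# Hypothetical protein measurements (mg/mL): three technical replicates
# from each of three mice (three biological replicates).
readings = {
    "mouse_A": [4.9, 5.1, 5.0],
    "mouse_B": [6.2, 6.0, 6.1],
    "mouse_C": [5.5, 5.6, 5.4],
}

# Each mouse contributes one value (the mean of its technical replicates);
# the spread between mice reflects biological variation.
per_mouse = {m: mean(v) for m, v in readings.items()}
biological_mean = mean(per_mouse.values())
biological_sd = stdev(per_mouse.values())
print(per_mouse, round(biological_mean, 2), round(biological_sd, 2))
```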

Common Experimental Layouts

Two designs dominate biological research. In a completely randomized design, every subject is independently assigned to a treatment group with no additional structure. This is the simplest approach and works well when subjects are relatively similar to begin with.

A randomized block design adds a layer of organization. Subjects are first sorted into “blocks” based on a characteristic that might influence the outcome (body weight, litter, or growth chamber, for example). Within each block, subjects are then randomly assigned to treatments. This design is more sensitive because it accounts for known sources of variability. When there are only two treatments, this becomes a matched-pairs design, where each block contains two subjects paired by similarity.

In a field ecology study, for instance, you might block by location if soil quality varies across your study site. Each block contains one plot per treatment, and the plots within a block are randomized. This way, soil differences between locations don’t mask the treatment effect you’re trying to detect.
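A quick sketch of that layout, with hypothetical treatment and location names: each block receives every treatment once, in a freshly randomized order.

```python
import random

# Randomized block field layout: each location (block) gets one plot per
# treatment, and treatment order is randomized within the block.
treatments = ["fertilizer_A", "fertilizer_B", "no_fertilizer"]
blocks = ["north_field", "mid_field", "south_field"]

layout = {}
for block in blocks:
    order = treatments[:]
    random.shuffle(order)                  # randomize within the block
    layout[block] = {f"plot_{i + 1}": t for i, t in enumerate(order)}

print(layout)
```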

Sample Size and Statistical Power

An experiment can have a perfect design and still fail if it doesn’t include enough subjects. Statistical power is the probability that your experiment will detect a real effect if one exists. The standard target is 80%, meaning you accept a 20% chance of missing a true effect.

Power depends on the size of the effect you’re looking for, how much natural variation exists in your data, how many subjects you include, and the significance threshold you set. Smaller effects require larger sample sizes to detect. A genetics study looking for a difference in allele frequency between 15% and 7% in two groups would need roughly 239 subjects per group (about 478 total) to reach 80% power at standard significance levels.
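As a rough check, the classic normal-approximation formula for comparing two proportions reproduces a figure of about 239 per group; this is one common way to run such a calculation, not the only one.

```python
from math import sqrt
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for comparing two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance
    z_beta = NormalDist().inv_cdf(power)            # desired power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return numerator / (p1 - p2) ** 2

print(round(n_per_group(0.15, 0.07)))   # about 239 subjects per group
```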

This is why researchers perform power analyses before starting an experiment. Running a study that’s too small wastes time, resources, and potentially animal lives without producing interpretable results.

The P-Value and What It Actually Means

Most biology experiments use a significance threshold (alpha level) of 0.05, meaning a result is called statistically significant when the probability of obtaining data at least that extreme, if there were truly no effect, falls below 5%. But this cutoff is a convention, not a law of nature, and it has real limitations.

Rigid reliance on a single threshold increases the risk of both false positives (concluding a treatment works when it doesn’t) and false negatives (concluding it doesn’t work when it does). Lowering the threshold to 0.005, as some researchers have proposed, reduces false positives but increases false negatives. The growing consensus is that significance thresholds should be context-dependent, influenced by sample size, study design, prior evidence, and the size of the effect being measured. A p-value is one piece of information, not a verdict.

Ethical Oversight

Biology experiments involving living subjects face ethical review before they can begin. Research involving human participants must be approved by an institutional review board (IRB) or its equivalent. More than 130 countries have established independent committees for this purpose. The IRB’s primary job is to protect participants’ rights, safety, and welfare, with special attention to vulnerable groups such as children, prisoners, and people with disabilities. The board reviews whether risks are minimized and reasonable relative to anticipated benefits, and it ensures participants give informed consent. An IRB can approve, modify, or reject a study protocol.

Research involving animals goes through a separate committee, typically called an Institutional Animal Care and Use Committee. These bodies evaluate whether the number of animals used is justified, whether pain and distress are minimized, and whether alternatives to animal use were considered. Both types of review shape experimental design directly, because a study that can’t pass ethical review doesn’t happen.