What Does the Marshmallow Test Really Measure?

The marshmallow test was designed to measure a child’s ability to delay gratification, the capacity to resist an immediate reward in favor of a larger one later. But decades of follow-up research have complicated that simple answer. What the test actually captures may have less to do with willpower as a personal trait and more to do with a child’s broader environment, cognitive development, and whether they trust the world to deliver on its promises.

How the Original Test Worked

Psychologist Walter Mischel developed the test in the late 1960s and early 1970s at Stanford University. The setup was simple: a preschool-age child sat alone in a room with a treat, typically a marshmallow, cookie, or pretzel. A researcher told the child they could eat the treat now, or wait until the researcher returned and get two treats instead. Then the researcher left.

The children were asked to wait between 15 and 20 minutes, depending on the study. Mischel’s original work included over 600 preschoolers. Some ate the marshmallow immediately. Others squirmed, covered their eyes, sang to themselves, or looked away, doing whatever they could to hold out. The key measurement was straightforward: how many seconds or minutes could the child wait?

What Mischel Thought It Predicted

The test became famous because of what happened when Mischel followed up with those same children years later. Kids who waited longer for the second marshmallow appeared to have better outcomes as teenagers and adults. They scored higher on the SAT, were reported to handle stress better, and showed stronger social skills. Later research even linked longer wait times to lower body mass index 30 years down the line. The implication seemed clear: the ability to delay gratification at age four predicted success across a lifetime.

This narrative made the marshmallow test one of the most cited experiments in psychology. It suggested that self-control was a stable, measurable trait, something baked into a child early on, with consequences that rippled through decades.

What It Actually Seems to Capture

More recent and more rigorous studies have significantly weakened those original claims. A major 2024 analysis published in Child Development found little evidence that marshmallow test performance reliably predicts adult functioning. The few connections that did show up between wait time and later outcomes were almost entirely explained by confounding factors: the child’s existing cognitive ability, their family’s socioeconomic status, and other early-life advantages.

In other words, the test may not measure some unique capacity for self-control at all. Instead, it may function more like a screener for broader developmental advantages in early childhood. A child from a stable, resource-rich home with strong cognitive skills is more likely to wait for the second marshmallow. That same child is also more likely to do well in school and adulthood, but not necessarily because they waited for the marshmallow.
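
To see how a shared background factor can create a correlation that disappears once it is accounted for, here is a minimal Python sketch. Everything in it is hypothetical: the data are synthetic, and the names (advantage, wait, outcome) and effect sizes are illustrative assumptions, not figures from any of the studies described here. In this toy model, waiting has no direct effect on the later outcome at all.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing its best straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


def simulate(n=5000, seed=0):
    """Return (raw, adjusted) correlations between wait time and outcome."""
    rng = random.Random(seed)
    # Hypothetical "early advantage" score (home stability, cognitive skills).
    advantage = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: advantage drives BOTH wait time and the later outcome,
    # and wait time has no direct effect on the outcome.
    wait = [a + rng.gauss(0, 1) for a in advantage]
    outcome = [a + rng.gauss(0, 1) for a in advantage]
    raw = pearson(wait, outcome)
    # Correlation that remains after removing what advantage explains.
    adjusted = pearson(residuals(wait, advantage),
                       residuals(outcome, advantage))
    return raw, adjusted


raw, adjusted = simulate()
print(f"raw correlation: {raw:.2f}, after adjusting: {adjusted:.2f}")
```

Under these assumptions the raw correlation is clearly positive, while the adjusted one sits close to zero. That is the pattern the confounding explanation describes: wait time and later success move together only because both track the same early advantages.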

This distinction matters. Other measures of childhood self-control, particularly those based on behavioral observations across years rather than a single lab task, do predict adult outcomes even after accounting for IQ and family income. The marshmallow test, as a brief snapshot of one moment in a preschooler’s life, doesn’t appear to carry the same predictive weight.

Trust Changes Everything

One of the most striking challenges to the original interpretation came from a 2013 study that added a twist: before the marshmallow task, researchers interacted with children in either a reliable or unreliable way. In the reliable condition, an adult promised the child better art supplies and then delivered. In the unreliable condition, the adult made the same promise but came back empty-handed.

The results were dramatic. Children who had just experienced a reliable adult waited an average of 12 minutes and 2 seconds. Children who had experienced an unreliable adult waited just 3 minutes and 2 seconds, roughly a quarter as long. Same test, same marshmallow, but a completely different outcome based on whether the child believed the promise would be kept.

This reframed the test entirely. A child who eats the marshmallow right away isn’t necessarily impulsive or lacking self-control. They may be making a perfectly rational decision based on their life experience. If adults in your world don’t follow through, taking the sure thing now is the smart move.
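
The logic of that "rational decision" can be sketched as a simple expected-value comparison. This is an illustrative toy model, not something taken from the 2013 study: the function names, the probability values, and the assumption that a broken promise leaves the child with nothing (fallback=0.0) are all hypothetical.

```python
def expected_treats_if_waiting(p_delivery, reward=2.0, fallback=0.0):
    """Average number of treats from waiting.

    p_delivery: the child's belief (0 to 1) that the adult will follow through.
    reward:     treats received if the promise is kept.
    fallback:   treats the child ends up with if it is not (assumed 0 here).
    """
    return p_delivery * reward + (1 - p_delivery) * fallback


def should_wait(p_delivery, immediate=1.0, reward=2.0, fallback=0.0):
    """True if waiting beats eating the one treat now, on average."""
    return expected_treats_if_waiting(p_delivery, reward, fallback) > immediate


# A child who mostly trusts adults versus one who mostly does not.
for belief in (0.9, 0.3):
    print(belief, should_wait(belief))
```

With these assumed numbers, waiting only pays off when the child thinks the promise is more likely than not to be kept. A child whose experience points the other way is better off eating the marshmallow, with no lack of self-control required.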

The Role of Mental Strategies

Mischel himself identified something interesting about how children who waited longer managed to do it. The key was taking attention off the treat, either by distraction or by mentally reframing it. Children who looked at the marshmallow, thought about how it tasted, or focused on wanting it tended to give in quickly. Children who turned away, sang songs, counted, or imagined the marshmallow as something abstract (a puffy cloud, a cotton ball) lasted far longer.

Mischel described this as the difference between “hot” and “cool” thinking. Hot thinking focuses on the appealing, sensory qualities of the reward. Cool thinking reframes the situation or shifts attention elsewhere. Children who naturally used cool strategies waited longer, but these strategies could also be taught. When researchers told children to pretend the marshmallow was just a picture, wait times increased substantially.

This finding suggests the test partly measures a child’s access to cognitive strategies for managing impulses, not just raw willpower. Whether a four-year-old has learned those strategies depends heavily on their developmental stage and the environment they’ve grown up in.

Why the Test Still Matters

Despite the weakened predictive claims, the marshmallow test remains valuable as a window into how self-regulation works in early childhood. It shows that delaying gratification isn’t a fixed personality trait. It’s a behavior shaped by trust, cognitive development, learned strategies, and context. Two children with identical “willpower” can produce wildly different wait times depending on whether they trust the person making the promise.

The test also illustrates a broader lesson about interpreting psychological research. Early findings from a small, homogeneous sample (Mischel’s original participants attended a preschool on Stanford’s campus) don’t always hold up when tested in larger, more diverse populations. One follow-up study noted that its sample reported an average net worth of $1.8 million, raising obvious questions about how generalizable the results were.

So what does the marshmallow test measure? At its core, it measures how long a young child can resist a treat in a specific situation. What drives that behavior is a tangle of cognitive ability, environmental trust, family background, and mental strategies. It is not, as it was once presented, a reliable window into a child’s future.