How Is Sleep Score Calculated and What Moves It?

Sleep scores are calculated by combining data from multiple sensors on your wearable device, primarily movement tracking, heart rate, and heart rate variability (HRV). These inputs feed into proprietary algorithms that estimate how long you slept, how much time you spent in each sleep stage, and how restful your night was overall. The result is typically a single number out of 100, but what goes into that number varies by brand.

The Raw Data Your Device Collects

Every wearable sleep tracker starts with the same basic toolkit. An accelerometer detects body movement, which helps determine when you fell asleep, when you woke up, and how restless you were in between. An optical heart rate sensor (a photoplethysmography, or PPG, sensor) continuously reads your pulse, and from that data the device derives your heart rate variability, the tiny fluctuations in timing between successive heartbeats.
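To make "fluctuations in timing between heartbeats" concrete, here is a minimal Python sketch of RMSSD, one widely used HRV statistic. Manufacturers do not say exactly which HRV measure their devices compute, so the choice of RMSSD, the function name, and the sample intervals are all illustrative assumptions.

```python
import math


def rmssd(ibi_ms):
    """Root mean square of successive differences between inter-beat intervals (ms)."""
    if len(ibi_ms) < 2:
        raise ValueError("need at least two inter-beat intervals")
    # Difference between each interval and the one that follows it.
    diffs = [later - earlier for earlier, later in zip(ibi_ms, ibi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Made-up inter-beat intervals in milliseconds, for illustration only.
steady = [1000, 1002, 998, 1001, 999]
variable = [1000, 1060, 950, 1040, 970]

print(rmssd(steady) < rmssd(variable))  # True
```

A steadier rhythm gives a smaller value, and a more variable rhythm gives a larger one.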

These two data streams, movement and heart signals, are the foundation of sleep staging. Your body behaves differently in each phase of sleep. During deep sleep, your heart rate drops to its lowest point and your heart beats very steadily, while your body is nearly motionless. During REM sleep, your heart rate becomes more variable and irregular, closer to waking patterns, but your body stays still. Light sleep falls somewhere in between, with moderate heart rate and occasional shifts in position. The algorithm looks for these patterns to classify each chunk of the night into a sleep stage.
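The patterns described above can be sketched as a toy rule-based classifier. This is not how any commercial device works: real staging algorithms are proprietary and usually far more sophisticated. The `Epoch` fields, the `classify` function, and every threshold below are hypothetical values chosen only to mirror the description.

```python
from dataclasses import dataclass


@dataclass
class Epoch:
    movement: float    # hypothetical activity count for this chunk of the night
    heart_rate: float  # mean beats per minute
    hr_spread: float   # hypothetical measure of beat-to-beat irregularity


def classify(epoch, night_min_hr):
    """Assign a stage using made-up thresholds that follow the text's description."""
    if epoch.movement > 5.0:
        return "wake"   # a lot of movement
    still = epoch.movement < 1.0
    if still and epoch.heart_rate <= night_min_hr + 3 and epoch.hr_spread < 2.0:
        return "deep"   # motionless, heart rate near its lowest and very steady
    if still and epoch.hr_spread > 6.0:
        return "rem"    # motionless, but heart rate variable and irregular
    return "light"      # everything in between


night = [
    Epoch(movement=8.0, heart_rate=70, hr_spread=5.0),
    Epoch(movement=0.2, heart_rate=50, hr_spread=1.0),
    Epoch(movement=0.3, heart_rate=62, hr_spread=8.0),
    Epoch(movement=2.0, heart_rate=58, hr_spread=3.5),
]
lowest = min(e.heart_rate for e in night)
print([classify(e, lowest) for e in night])  # ['wake', 'deep', 'rem', 'light']
```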

Some devices collect additional signals. Certain Garmin watches track respiration rate and blood oxygen saturation. These extra data points can flag disruptions like breathing irregularities during sleep, though not all brands incorporate them into the final score.

How the Score Gets Assembled

Once the algorithm has mapped your night into sleep stages and wake periods, the device weighs several components to produce a single number. While the exact formulas are proprietary, the general architecture is similar across brands: duration, depth, continuity, and timing all matter.

Oura breaks this down most transparently. Its sleep score draws from seven contributors: total sleep time, sleep efficiency (the percentage of time in bed actually spent sleeping), restfulness (how often you woke up or moved excessively), time in REM sleep, time in deep sleep, sleep latency (how long it took you to fall asleep), and timing (whether your sleep aligned with your circadian rhythm). Each contributor is scored independently, and they combine into the final number.
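As a rough illustration of how independently scored contributors might roll up into one number, the Python sketch below computes sleep efficiency and then takes a weighted average of seven contributor scores. Oura does not publish its weights, so the equal weighting, the function names, and the sample scores are assumptions for illustration, not the company's formula.

```python
CONTRIBUTORS = (
    "total_sleep", "efficiency", "restfulness",
    "rem_sleep", "deep_sleep", "latency", "timing",
)


def sleep_efficiency(minutes_asleep, minutes_in_bed):
    """Percentage of time in bed actually spent sleeping."""
    if minutes_in_bed <= 0:
        raise ValueError("minutes_in_bed must be positive")
    return 100.0 * minutes_asleep / minutes_in_bed


def combine(scores, weights=None):
    """Weighted average of 0-100 contributor scores (equal weights assumed by default)."""
    if weights is None:
        weights = {name: 1.0 for name in CONTRIBUTORS}
    missing = [name for name in CONTRIBUTORS if name not in scores]
    if missing:
        raise ValueError(f"missing contributors: {missing}")
    total_weight = sum(weights[name] for name in CONTRIBUTORS)
    return sum(scores[name] * weights[name] for name in CONTRIBUTORS) / total_weight


# Made-up contributor scores: a decent night with slow sleep onset.
scores = {name: 80 for name in CONTRIBUTORS}
scores["latency"] = 66

print(sleep_efficiency(420, 480))  # 87.5
print(round(combine(scores)))      # 78
```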

Garmin takes a similar approach but adds stress data. Your average stress score during sleep, derived from HRV, factors into the overall calculation. A night where your nervous system stayed calm and recovered well will score higher than one where your body showed signs of physiological stress, even if the total hours were identical. Garmin also adjusts for your age and personal baseline readings, which helps the score reflect what’s normal for you rather than applying a universal standard.

Whoop focuses heavily on the concept of sleep need versus sleep achieved. Its “sleep performance” metric compares how much sleep you actually got against how much your body needed, based on your recent strain, sleep debt, and recovery patterns. Sleep efficiency plays a central role: the general benchmark in sleep research is that spending more than 90% of your time in bed actually asleep counts as efficient sleep, while dropping below 85% signals a problem.
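The "need versus achieved" idea and the efficiency benchmarks can be sketched in a few lines of Python. Whoop's actual formula is proprietary. The simple ratio, the cap at 100, and the "borderline" label for the 85% to 90% range are assumptions made for this example.

```python
def sleep_performance(hours_slept, hours_needed):
    """Sleep achieved as a percentage of sleep needed, capped at 100 (assumed cap)."""
    if hours_needed <= 0:
        raise ValueError("hours_needed must be positive")
    return min(100.0, 100.0 * hours_slept / hours_needed)


def efficiency_band(efficiency_pct):
    """Bucket sleep efficiency using the 90% and 85% benchmarks from the text."""
    if efficiency_pct > 90:
        return "efficient"
    if efficiency_pct < 85:
        return "problem"
    return "borderline"  # label assumed here; the text does not name this range


print(sleep_performance(6.0, 8.0))  # 75.0
print(efficiency_band(87.5))        # borderline
```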

Why HRV Matters So Much

Heart rate variability is arguably the most influential input in modern sleep scoring. HRV reflects how well your autonomic nervous system is functioning. When you’re well-rested and recovered, the intervals between heartbeats vary more, a sign that your body can flexibly respond to demands. When you’re stressed, sleep-deprived, or fighting off illness, those intervals become more rigid.

Research has confirmed the connection between HRV and sleep quality. People with poorer sleep quality and greater daytime dysfunction consistently show lower HRV during stress. Higher HRV during the night signals that your parasympathetic nervous system (the “rest and digest” branch) is dominant, which is exactly what should happen during restorative sleep. Devices use this relationship in both directions: low nighttime HRV pulls your score down, while high, stable HRV pushes it up.

How Accurate Are These Scores?

The gold standard for measuring sleep is polysomnography, a clinical test that monitors brain waves directly with electrodes on the scalp. Wearables don’t have access to brain activity, so they’re estimating sleep stages from indirect signals. A 2023 validation study tested 11 consumer sleep trackers against polysomnography and found meaningful differences in accuracy across both devices and sleep stages.

Light sleep was the easiest stage for wearables to detect. Top performers like the Google Pixel Watch, Galaxy Watch 5, and Fitbit Sense 2 achieved agreement scores (measured on a 0-to-1 scale) between 0.71 and 0.73. Deep sleep was the hardest to get right. The best wearable in the study, the Pixel Watch, scored only 0.59 for deep sleep detection, while the Apple Watch 8 dropped to 0.31. REM sleep fell in the middle, with the Fitbit Sense 2 leading at 0.66. The Oura Ring scored 0.60 for light sleep, 0.43 for deep, and 0.60 for REM.
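For a sense of how a 0-to-1 agreement figure can be computed for a single stage, the sketch below calculates a per-stage F1 score from epoch-by-epoch labels. The metric is not named above, so F1 is an assumed stand-in, and the label sequences are invented for illustration.

```python
def stage_f1(reference, predicted, stage):
    """F1 score (0 to 1) for one sleep stage, comparing device labels to reference labels."""
    if len(reference) != len(predicted):
        raise ValueError("label sequences must be the same length")
    pairs = list(zip(reference, predicted))
    true_pos = sum(r == stage and p == stage for r, p in pairs)
    false_pos = sum(r != stage and p == stage for r, p in pairs)
    false_neg = sum(r == stage and p != stage for r, p in pairs)
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Invented labels: the "device" mistakes one deep-sleep epoch for light sleep.
lab = ["light", "light", "deep", "deep", "rem", "light"]
device = ["light", "light", "light", "deep", "rem", "light"]

print(round(stage_f1(lab, device, "deep"), 2))   # 0.67
print(round(stage_f1(lab, device, "light"), 2))  # 0.86
```

A single misclassified epoch lowers the deep sleep figure much more than the light sleep figure, because deep sleep has fewer epochs to begin with.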

What this means practically is that your device is reasonably good at telling you whether you slept well or poorly on a given night, and it’s useful for tracking trends over weeks and months. But the specific minutes it assigns to deep versus REM sleep can be off by a meaningful margin. If your tracker says you got 45 minutes of deep sleep, the true number could be noticeably higher or lower.

Age, Gender, and Personal Baselines

Sleep architecture changes significantly with age. A 25-year-old typically gets more deep sleep than a 60-year-old, and what counts as a “good” night shifts accordingly. Some devices account for this by adjusting their scoring benchmarks based on your age, so that a 55-year-old isn’t penalized for getting less deep sleep than a college student.

However, the underlying algorithms themselves can carry biases. Research examining two widely used sleep-scoring algorithms found that age introduced significant, nonlinear errors in how sleep stages were classified. The magnitude and variability of those errors changed systematically across different age groups. Gender also introduced bias: one algorithm performed measurably worse for male subjects, and both showed gender-related differences in how accurately they scored specific sleep markers.

This is why personal baselines matter. Devices that learn your patterns over time, comparing tonight’s data against your own 30-day or 90-day averages, tend to give more useful scores than those applying a fixed universal formula. If your tracker asks for your age and sex during setup, it’s using that information to calibrate what “normal” looks like for you.
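One simple way to compare tonight's reading against your own history is a z-score over a trailing window. The sketch below is only an illustration of that idea: how each brand actually models baselines is not public, and the function name and sample HRV values are hypothetical.

```python
from statistics import mean, stdev


def baseline_deviation(history, tonight):
    """How many standard deviations tonight's value sits from the personal average."""
    if len(history) < 2:
        raise ValueError("need at least two prior nights")
    spread = stdev(history)
    if spread == 0:
        return 0.0  # no variation in the history, so no meaningful deviation
    return (tonight - mean(history)) / spread


# Made-up nightly HRV values in milliseconds.
recent_nights = [40, 50, 60]
print(baseline_deviation(recent_nights, 60))  # 1.0
```

A value near zero means a typical night for you, while a large negative value flags a night well below your own norm, whatever the population average happens to be.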

What Actually Moves Your Score

Understanding the inputs helps you interpret what your score is telling you. A low score usually traces back to one or more of these factors:

Not enough total sleep. This is the most straightforward driver. If you need 7.5 hours and got 5.5, no amount of deep sleep will rescue your score.
Frequent wake-ups. Every time you wake up during the night, your sleep efficiency drops. Tossing, turning, and getting out of bed all count against restfulness metrics.
Low deep or REM sleep. These stages are scored separately because they serve different functions. Alcohol, for example, tends to increase deep sleep early in the night but suppress REM sleep later, which shows up as an imbalanced score.
High stress or low HRV. If your body didn’t shift into recovery mode overnight, devices that track HRV will reflect that. Late meals, alcohol, and illness are common culprits.
Poor timing. Going to bed much later or earlier than your usual pattern can lower your score even if the total hours are the same, because your sleep is misaligned with your circadian rhythm.

The relative weight of each factor differs by brand. Oura treats all seven contributors as part of a holistic picture. Garmin leans heavily on HRV-derived stress recovery. Whoop centers its scoring on whether you met your personalized sleep need. No single number captures everything about sleep quality, but tracking these components over time gives you a practical sense of what helps and what hurts your rest.