Sleep studies are highly accurate for detecting breathing disorders like sleep apnea, but their reliability depends on what’s being measured, where the test is done, and even which technician scores the results. For sleep apnea, in-lab polysomnography (PSG) is considered the gold standard, with accuracy rates above 95% for moderate to severe cases. For other conditions like insomnia, sleep studies perform surprisingly poorly and may be less useful than a simple sleep diary.
How Accurate Lab Sleep Studies Are
A full in-lab sleep study monitors brain waves, eye movements, muscle activity, heart rhythm, breathing effort, airflow, and blood oxygen levels. This combination makes it extremely reliable for identifying sleep apnea and its severity. The core measurement, the apnea-hypopnea index (AHI), captures how many times per hour your breathing partially or fully stops during sleep. For moderate and severe sleep apnea, lab studies consistently achieve near-perfect diagnostic accuracy.
The main limitation is human interpretation. Every sleep study produces hours of raw data that a trained technician must manually score, classifying each 30-second segment into a sleep stage and flagging breathing events. A meta-analysis of scoring agreement found that two technicians reviewing the same recording agree on sleep stages about 82.6% of the time, based on data from thousands of scorers in a standardized reliability program. Individual studies show agreement ranging from 61% to 92%, depending on the complexity of the patient’s sleep patterns. People with sleep-disordered breathing tend to produce harder-to-score data, with agreement dropping to around 71% in some studies, while healthy sleepers’ recordings are more straightforward, with agreement closer to 88%.
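To make the agreement figures concrete, here is a minimal sketch of how epoch-by-epoch agreement between two scorers could be computed. The function name, the stage labels, and the ten sample epochs are hypothetical and chosen only for illustration; published reliability programs may use other statistics as well.

```python
def percent_agreement(scorer_a, scorer_b):
    """Fraction of 30-second epochs that two scorers labeled identically."""
    if len(scorer_a) != len(scorer_b):
        raise ValueError("both scorers must label the same epochs")
    if not scorer_a:
        raise ValueError("need at least one epoch")
    matches = sum(a == b for a, b in zip(scorer_a, scorer_b))
    return matches / len(scorer_a)


# Hypothetical labels for ten consecutive epochs (five minutes of recording).
tech_1 = ["W", "N1", "N2", "N2", "N3", "N3", "N2", "REM", "REM", "N2"]
tech_2 = ["W", "N1", "N2", "N2", "N2", "N3", "N2", "REM", "N1", "N2"]

agreement = percent_agreement(tech_1, tech_2)
print(f"Agreement: {agreement:.0%}")  # the scorers differ on 2 of 10 epochs
```

In this made-up example the two technicians disagree on two epochs, so agreement is 80%, close to the pooled figure reported above.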
This means your sleep stage breakdown (how much deep sleep, light sleep, and REM sleep you got) could shift modestly depending on who reads your study. The good news: clinically important findings like significant sleep apnea are harder to miss, since breathing events are more objectively defined than the subtle brain wave shifts that distinguish one sleep stage from another.
Night-to-Night Variability
A single-night sleep study is a snapshot, and your sleep can vary considerably from one night to the next. Research confirms there is substantial night-to-night variability in AHI for some individuals. Someone who scores a 12 (mild apnea) on one night might score a 7 (still mild) or an 18 (moderate) on another night, depending on factors like sleep position, alcohol consumption, nasal congestion, and how deeply they sleep.
This variability matters most for people near the diagnostic cutoff points (5, 15, and 30 events per hour), where a single night’s result could place them in a different severity category. For people with clearly severe apnea, one night is usually enough to confirm the diagnosis. At the group level, most measurements don’t differ significantly between a first and second night, but individual results can swing widely.
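The effect of the cutoffs can be sketched in a few lines. The thresholds of 5, 15, and 30 come from the text above; the function name, the "normal" label for values below 5, the choice to place a value exactly on a cutoff in the higher category, and the three sample nights are assumptions made for illustration.

```python
def ahi_severity(ahi):
    """Map an AHI value (events per hour) to a severity category.

    Cutoffs of 5, 15, and 30 follow the article. Treating a value that
    sits exactly on a cutoff as the higher category is an assumption.
    """
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 5:
        return "normal"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"


# Hypothetical AHI readings for the same person on three different nights.
nights = [13.5, 16.2, 14.1]
categories = [ahi_severity(ahi) for ahi in nights]
print(categories)
```

Here a swing of fewer than three events per hour is enough to move the same person between the mild and moderate categories, which is why results near a cutoff deserve extra caution.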
The First-Night Effect
Sleeping in an unfamiliar lab, wired to sensors, with someone monitoring you from another room is not a normal night. The “first-night effect” typically shows up as lower sleep efficiency, more time spent awake, longer time to fall asleep, and a shift toward lighter sleep stages. Some studies find reduced REM sleep on the first night, though this isn’t always statistically significant. The proportion of the lightest sleep stage (N1) tends to increase.
Sleep physicians are aware of this and interpret results with that context in mind. For sleep apnea diagnosis, the first-night effect is less of a concern because breathing events occur regardless of how deeply you sleep. It’s more relevant for studies evaluating sleep architecture, where a disrupted first night could make your deep sleep and REM sleep look worse than they actually are at home.
Home Sleep Tests vs. Lab Studies
Home sleep apnea tests (HSATs) use a simplified setup, typically monitoring airflow, breathing effort, and blood oxygen, but often skipping brain wave measurement. Without brain wave data, these devices can’t distinguish sleep from quiet wakefulness, which means they estimate rather than measure your actual sleep time. Breathing events are then divided by total recording time rather than time actually spent asleep, so any time you spend lying awake dilutes the AHI calculation and can undercount the severity of apnea, particularly in mild cases.
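The dilution is simple arithmetic. The sketch below uses made-up numbers (60 breathing events, eight hours of recording, six hours of actual sleep) and a hypothetical helper name to show how the same night yields a lower index when recording time replaces sleep time in the denominator.

```python
def events_per_hour(event_count, minutes):
    """Breathing events divided by a duration expressed in hours."""
    if minutes <= 0:
        raise ValueError("duration must be positive")
    return event_count / (minutes / 60)


# Hypothetical night: all values are illustrative, not from any study.
events = 60              # apneas plus hypopneas counted overnight
recording_minutes = 480  # device was on for 8 hours
sleep_minutes = 360      # only 6 of those hours were actual sleep

index_by_sleep_time = events_per_hour(events, sleep_minutes)
index_by_recording_time = events_per_hour(events, recording_minutes)

print(index_by_sleep_time)      # 10.0 events per hour of sleep
print(index_by_recording_time)  # 7.5 events per hour of recording
```

The two hours spent awake pull the index from 10.0 down to 7.5. Both values fall in the same category here, but for someone closer to a cutoff the same effect could change the label.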
When brain wave monitoring is added to home devices, accuracy improves significantly. One study comparing home tests with EEG to full lab polysomnography found the home setup correctly identified sleep apnea status in 93% of participants. For moderate to severe thresholds, the area under the curve (a measure of diagnostic accuracy where 1.0 is perfect) reached 0.95 to 1.0, comparable to lab testing. At milder thresholds, accuracy dropped somewhat, to around 0.79.
Technical quality is another factor with home tests. In a study of self-applied home sleep recordings, about 12% of recordings didn’t meet minimum quality criteria for diagnosis, and 17% were judged to be of insufficient quality by expert review. Signals from brain wave and eye movement sensors were most prone to problems, with roughly 12% of recordings capturing less than three hours of usable data from those channels. In a lab, a technician can fix a loose sensor in real time. At home, a bad connection might not be discovered until morning.
Where Sleep Studies Fall Short: Insomnia
If you’re being evaluated for insomnia rather than sleep apnea, a sleep study is a much blunter tool. Research published in the Journal of Clinical Sleep Medicine found that standard polysomnographic measurements failed to accurately distinguish people with diagnosed insomnia from normal sleepers. None of the individual PSG metrics, including time to fall asleep, total sleep time, and wake time after falling asleep, reliably separated the two groups.
The problem runs deeper than measurement error. Many people who meet clinical diagnostic criteria for insomnia actually show normal-looking sleep on a PSG recording. The selection criteria commonly used in insomnia research trials tended to identify 50% or fewer of true insomnia sufferers, and normal sleepers met those same criteria at roughly the same rate. In other words, the test couldn’t tell who had insomnia and who didn’t.
Sleep diaries, where you record your own bedtime, wake time, and sleep quality over one to two weeks, were consistently more accurate than PSG at discriminating insomnia from normal sleep. A cutoff of more than 30 minutes to fall asleep or more than 30 minutes of middle-of-the-night wakefulness, based on diary entries, showed good sensitivity and specificity. This is one of the reasons sleep specialists diagnose insomnia primarily through clinical interview and sleep logs rather than ordering a PSG.
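As a rough illustration of the diary cutoff described above, the sketch below flags a diary when either measure exceeds 30 minutes. The 30-minute thresholds come from the text; averaging across all diary nights, the function name, and the sample entries are assumptions, and this is not a diagnostic tool.

```python
from statistics import mean

CUTOFF_MINUTES = 30  # threshold cited in the article


def diary_exceeds_cutoff(nights):
    """Check a sleep diary against the 30-minute cutoffs.

    nights: list of (minutes_to_fall_asleep, minutes_awake_during_night).
    Averaging over all recorded nights is an assumption for this sketch.
    """
    if not nights:
        raise ValueError("diary must contain at least one night")
    avg_latency = mean(night[0] for night in nights)
    avg_wake = mean(night[1] for night in nights)
    return avg_latency > CUTOFF_MINUTES or avg_wake > CUTOFF_MINUTES


# Two hypothetical one-week diaries.
week_a = [(45, 10), (35, 20), (50, 15), (20, 5), (40, 25), (30, 10), (55, 20)]
week_b = [(10, 5), (15, 20), (20, 10), (5, 0), (25, 15), (10, 10), (15, 5)]

print(diary_exceeds_cutoff(week_a))  # average time to fall asleep is about 39 minutes
print(diary_exceeds_cutoff(week_b))  # both averages are well under 30 minutes
```

A real evaluation also weighs daytime symptoms and how long the problem has lasted, which is why the diary supports a clinical interview rather than replacing it.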
Consumer Wearables vs. Medical Sleep Studies
Smartwatches and sleep rings now offer nightly sleep stage tracking, but their accuracy is far below medical-grade equipment. A validation study tested 11 consumer devices against polysomnography using nearly 4,000 hours of sleep data. The best-performing device for sleep stage classification achieved a macro F1 score (a combined measure of precision and recall) of 0.69, while the worst scored just 0.26. For context, a perfect score is 1.0, and random guessing across four sleep stages would produce roughly 0.25.
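For readers curious about how a macro F1 score is built, here is a minimal sketch: an F1 score is computed for each sleep stage separately, then the per-stage scores are averaged with equal weight. The stage names and the eight sample epochs are hypothetical, and the validation study's exact procedure may differ.

```python
def macro_f1(true_labels, predicted_labels, classes):
    """Unweighted mean of per-class F1 scores."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(true_labels, predicted_labels))
    scores = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in pairs)  # correct hits
        fp = sum(t != cls and p == cls for t, p in pairs)  # false alarms
        fn = sum(t == cls and p != cls for t, p in pairs)  # misses
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical epochs: lab-scored stages versus a wearable's guesses.
stages = ["wake", "light", "deep", "REM"]
lab = ["wake", "light", "light", "deep", "deep", "REM", "REM", "light"]
device = ["wake", "light", "deep", "deep", "light", "REM", "light", "light"]

score = macro_f1(lab, device, stages)
print(round(score, 2))
```

Because every stage counts equally, a device that handles wake well but confuses deep and light sleep is pulled down sharply, even if most of its epoch labels are correct.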
Among popular wearables, the Fitbit Sense 2 scored 0.58, the Samsung Galaxy Watch 5 scored 0.58, the Google Pixel Watch scored 0.57, the Oura Ring 3 scored 0.52, and the Apple Watch 8 scored 0.49. These devices are reasonably good at detecting when you’re asleep versus awake, but they struggle to correctly identify specific sleep stages, particularly distinguishing deep sleep from light sleep. Consumer trackers are useful for spotting broad trends in your sleep patterns over weeks and months, but a single night’s stage breakdown should be interpreted loosely.
What Affects Your Results
Several practical factors can shift your sleep study results beyond the inherent accuracy of the equipment. Alcohol and sedatives suppress REM sleep and relax airway muscles, potentially making apnea worse on the study night than on a typical night. Sleeping on your back increases apnea severity for many people, and you may end up in that position more often in a lab bed. Caffeine consumed too late in the day can delay sleep onset and reduce deep sleep, making the study look worse than your normal baseline.
Medications also matter. Antidepressants, beta-blockers, and certain other drugs can alter sleep architecture in ways that show up on a PSG but don’t reflect an underlying sleep disorder. If you’re taking any medications, your sleep physician should factor those into the interpretation. The most accurate study is one where you follow your normal routine as closely as possible in the days leading up to it, including your usual bedtime, caffeine habits, and medications.

