Mixed reality works by using sensors to build a real-time 3D map of your physical environment, then layering digital content into that map so virtual objects appear to coexist with real ones. Unlike virtual reality, which blocks out the real world entirely, mixed reality keeps your surroundings visible and places holograms, interfaces, or 3D models directly into the space around you. The result is digital content that sits on your actual desk, hides behind your actual couch, and stays put when you walk around it.
How the Headset Sees Your Room
The foundation of mixed reality is depth sensing. Headsets use a combination of cameras and specialized sensors to measure the distance between you and every surface in the room. The two most common technologies are LiDAR and time-of-flight (ToF) sensors. Both work on the same basic principle: they emit pulses of light (typically infrared, invisible to you) and measure how long each pulse takes to bounce back. That return time translates directly into distance.
LiDAR fires laser pulses and generates detailed 3D point clouds, essentially a dense scatter plot of thousands of measured points that together form a comprehensive spatial map. ToF sensors produce depth maps that are less granular but fast enough for real-time interaction. Many headsets combine both approaches with standard RGB cameras to capture color and texture information alongside depth. Together, these sensors scan your environment dozens of times per second, identifying walls, floors, furniture, and other surfaces so the system knows exactly where to place digital content.
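The round-trip-time principle that both sensor types share reduces to one line of arithmetic: the pulse travels out and back, so the one-way distance is half the total path. A minimal sketch (illustrative values, not a real sensor API):

```python
# Sketch of the time-of-flight principle: a light pulse's round-trip
# time converts directly into distance. Purely illustrative.

SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def distance_from_round_trip(round_trip_seconds: float) -> float:
    """Distance to a surface given the pulse's round-trip time.

    The pulse travels out and back, so the one-way distance is
    half the total path: d = c * t / 2.
    """
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

# A surface 2 m away returns the pulse after roughly 13.3 nanoseconds.
t = 2 * 2.0 / SPEED_OF_LIGHT
print(round(distance_from_round_trip(t), 6))  # 2.0
```

The tiny time scales involved (nanoseconds per meter) are why these sensors need specialized hardware rather than ordinary camera electronics.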
Tracking Your Hands and Eyes
Once the headset understands the room, it needs to understand you. Mixed reality devices track your head position, hand movements, and in many cases your eye gaze to let you interact with virtual objects naturally.
Hand tracking relies on computer vision algorithms that analyze camera feeds frame by frame. The system locates your wrist and fingertips in 3D space, then interprets gestures from the spatial relationships between them. To detect a fist, for example, the algorithm checks whether all fingertips have moved within a threshold distance of the wrist center. To distinguish an intentional gesture from a casual hand movement, the system compares consecutive hand positions along three axes (left/right, up/down, and forward/backward) and filters out shifts smaller than about 5 millimeters, which are likely just tracking noise or minor fidgeting.
This processing happens every frame. The system calculates Euclidean distances between joint positions, determines what gesture you’re making, and translates that into a command: pinch to select, push to press a virtual button, grab to move a hologram. Eye tracking works similarly, using infrared cameras pointed at your eyes to determine exactly where you’re looking, which lets the interface respond to your gaze before you even move your hands.
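The distance checks described above can be sketched in a few lines. This is a toy model, not a production tracker: the 7 cm fist threshold is an assumed value for illustration, while the ~5 mm noise floor comes from the description above.

```python
import math

# Assumed thresholds for illustration; real trackers tune these empirically.
FIST_THRESHOLD_M = 0.07    # fingertips within 7 cm of the wrist => fist
NOISE_THRESHOLD_M = 0.005  # shifts under ~5 mm are treated as tracking noise

def euclidean(a, b):
    """Straight-line distance between two 3D points (x, y, z)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def is_fist(fingertips, wrist):
    """True when every fingertip has closed to within the threshold of the wrist."""
    return all(euclidean(tip, wrist) <= FIST_THRESHOLD_M for tip in fingertips)

def is_intentional_move(prev_pos, curr_pos):
    """Filter out sub-5 mm shifts between consecutive frames."""
    return euclidean(prev_pos, curr_pos) > NOISE_THRESHOLD_M
```

A real pipeline runs logic like this on every joint of both hands, every frame, alongside classifiers for more complex gestures.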
Placing Digital Objects in Physical Space
The core challenge of mixed reality is making virtual content feel physically present. This requires solving several problems simultaneously.
First, the headset has to render objects at the correct size, angle, and distance relative to your position. As you move your head, the system recalculates the perspective of every virtual object in real time. If a holographic coffee mug is sitting on your table, it needs to look smaller as you walk away and larger as you lean in, just like a real mug would.
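The size relationship at the heart of this is simple geometry: the visual angle an object subtends shrinks as distance grows. A renderer uses full perspective projection, but the core effect can be sketched in one function (the function name and values are illustrative):

```python
import math

def apparent_angular_size(object_height_m, distance_m):
    """Visual angle (in radians) subtended by an object at a given distance.

    This is the standard angular-size formula: theta = 2 * atan(h / 2d).
    Farther away => smaller angle => smaller on screen.
    """
    return 2.0 * math.atan(object_height_m / (2.0 * distance_m))

# A 10 cm holographic mug, seen from half a meter vs. two meters away.
near = apparent_angular_size(0.1, 0.5)
far = apparent_angular_size(0.1, 2.0)
```

Recomputing this for every virtual object, every frame, as your head moves is what keeps the mug looking glued to the table.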
Second, the system has to handle occlusion, the visual effect where closer objects block your view of farther ones. Without occlusion handling, virtual objects always appear to float in front of everything, which immediately breaks the illusion. Mixed reality headsets solve this by comparing the depth of virtual objects against the depth map of the real scene. If a real chair is closer to you than a virtual object behind it, the system redraws the pixels of the chair over the virtual content, creating a seamless border where the real object naturally blocks the digital one. This simple-sounding process is what makes a hologram genuinely appear to be inside your room rather than pasted on top of a camera feed.
Why Latency Matters So Much
The time between moving your head and seeing the virtual world update is called motion-to-photon latency, and it’s the single most important performance metric in mixed reality. If the delay is too long, digital objects appear to swim or lag behind your movements, and your brain quickly registers the mismatch between what your eyes see and what your inner ear feels. The result is cybersickness: nausea, dizziness, and disorientation.
Research consistently shows that higher latency produces more severe symptoms. Studies using head-mounted displays found that latency as low as 26 milliseconds can disturb standing balance, while delays above 50 milliseconds produce measurable increases in cybersickness scores that scale with the added lag. At delays of 150 milliseconds or more, symptoms become pronounced for most users. Notably, cybersickness from latency requires head movement to trigger. If you hold perfectly still, even significant delay won’t make you feel sick, because there’s no mismatch for your brain to detect.
Modern mixed reality headsets target latency under 20 milliseconds. Achieving this requires tight coordination between the sensors scanning the room, the processor calculating new object positions, and the display rendering the updated frame. Dedicated chips for spatial computing, predictive algorithms that estimate where your head will be a few milliseconds from now, and high-refresh-rate displays (90 Hz or above) all contribute to keeping that number low enough to feel comfortable.
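The predictive piece can be sketched as simple extrapolation: render the frame for where your head will be when the photons arrive, not where it was when the sensors last reported. Real systems predict orientation as well and use filtered estimates rather than raw velocity; this is only the core idea, with an assumed 15 ms lookahead:

```python
def predict_head_position(position, velocity, lookahead_s=0.015):
    """Linearly extrapolate head position ~15 ms into the future,
    roughly a frame's motion-to-photon budget, so the rendered view
    matches where the head will be rather than where it was.

    position: (x, y, z) in meters; velocity: (vx, vy, vz) in m/s.
    """
    return tuple(p + v * lookahead_s for p, v in zip(position, velocity))
```

If the prediction is good, the effective latency your visual system experiences is close to zero even though the pipeline itself takes many milliseconds.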
How Content Stays in Place
One of the more impressive tricks in mixed reality is spatial persistence: the ability for a virtual object to stay exactly where you put it, even after you close an app or take off the headset. This works through spatial anchors, which are essentially digital pins tied to specific coordinates in the headset’s map of your environment.
When you place a hologram on your kitchen counter, the system saves a spatial anchor that records the precise location using the surrounding sensor data as reference points. That anchor gets stored on the device. When you put the headset back on days later, the sensors rescan your kitchen, recognize the same surfaces and geometry, and reload the anchor at its original position. The sensor data behind each anchor is kept permanently so the system can relocate it in future sessions.
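The save-and-relocate cycle can be sketched as a small data structure: an anchor stores a position plus a snapshot of the surrounding geometry, and relocation succeeds when enough of that geometry is recognized again. All names and the 50% match ratio are assumptions for illustration, not any vendor's actual API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SpatialAnchor:
    anchor_id: str
    position: tuple      # coordinates in the headset's spatial map
    features: frozenset  # identifiers of surrounding geometry used as references

class AnchorStore:
    """Toy persistence layer: anchors survive between sessions and are
    relocated by re-recognizing their surrounding geometry."""

    MATCH_RATIO = 0.5  # assumed: half the reference features must be re-seen

    def __init__(self):
        self._anchors = {}

    def save(self, anchor):
        self._anchors[anchor.anchor_id] = anchor

    def relocate(self, anchor_id, rescanned_features):
        """Return the stored position if enough reference geometry matches."""
        anchor = self._anchors.get(anchor_id)
        if anchor is None or not anchor.features:
            return None
        overlap = len(anchor.features & rescanned_features)
        if overlap / len(anchor.features) >= self.MATCH_RATIO:
            return anchor.position
        return None
```

The match-ratio threshold is the interesting design choice: too strict and anchors are lost when you move a chair; too loose and a hologram can reappear in the wrong room.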
There’s a practical tradeoff: every persisted anchor consumes storage and processing resources, which reduces the headset’s capacity to track new anchors. If you anchor dozens of objects around your home, the system has to maintain all that reference data, which can eventually limit performance. Spatial anchors can also be shared between devices, which is how two people wearing headsets in the same room can see and interact with the same virtual objects in the same locations.
Software That Ties It All Together
The hardware creates the raw spatial data. Software interprets that data, renders the visuals, and manages input. For years, a major obstacle was fragmentation: every headset manufacturer used its own proprietary software interface, forcing developers to essentially rebuild their apps for each device.
The industry’s answer is OpenXR, a royalty-free open standard maintained by the Khronos Group. OpenXR provides a single set of programming interfaces that work across a wide range of mixed reality and virtual reality devices. Before OpenXR, developers had to write separate code paths for every headset they wanted to support. Now they can build once and port to new platforms with minimal rework, or in some cases no modifications at all. The standard covers head-mounted displays, hand controllers, body and eye trackers, haptic feedback devices, and more.
This standardization has practical consequences for users too. It means the app ecosystem grows faster because developers can reach more customers without multiplying their workload. It also means that when a new headset launches with OpenXR support, existing apps are more likely to work on day one rather than requiring months of adaptation.
The Full Loop in Real Time
Every moment you spend in a mixed reality headset, a cycle repeats dozens of times per second. Sensors scan the room and build a fresh depth map. Cameras track your hands, head, and eyes. The processor compares this new data against its existing spatial map, updates the position of every virtual object relative to your viewpoint, calculates which real objects should occlude which virtual ones, and renders a stereo image (one slightly different view for each eye to create depth perception). That image reaches the display within roughly 10 to 20 milliseconds of your last head movement.
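The cycle above can be condensed into one function. The stage names are illustrative stand-ins for real sensor and rendering subsystems, passed in as callables so the control flow is visible:

```python
def run_frame(scan_room, track_user, update_scene, render_stereo, present):
    """One iteration of the mixed reality update loop, repeated dozens
    of times per second. Each argument is a stand-in for a subsystem."""
    depth_map = scan_room()                  # fresh depth map of the room
    head_pose, hands = track_user()          # where you are, what your hands are doing
    scene = update_scene(depth_map, head_pose, hands)  # reposition objects, resolve occlusion
    left_eye, right_eye = render_stereo(scene, head_pose)  # one view per eye
    present(left_eye, right_eye)             # must reach the display in ~10-20 ms
```

Everything discussed earlier (depth sensing, gesture detection, occlusion, prediction, anchors) lives inside one of these five stages.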
This entire loop, from photons bouncing off your walls to photons hitting your retinas, is what makes a holographic sticky note look like it’s actually stuck to your refrigerator. The technology isn’t simulating a separate world. It’s reading the real one fast enough, and accurately enough, to weave digital content into it before your brain notices the seams.