Facial recognition works by converting a photograph or video frame of a face into a mathematical code, then comparing that code against stored codes to find a match. The process happens in a series of steps: detecting a face in an image, mapping its key features, translating those features into a numerical representation, and checking that representation against a database. Modern systems can perform this entire sequence in under a second.
The Four Core Steps
Every facial recognition system, whether it unlocks your phone or scans a crowd at an airport, follows the same basic pipeline.
Step 1: Face detection. The system first identifies that a face exists within an image or video frame. This is the same technology that draws a rectangle around faces in your phone’s camera app. It separates the face from the background, other objects, and other people in the scene.
Step 2: Landmark mapping. Once a face is isolated, the system locates specific facial landmarks: the centers of the eyes, the tip of the nose, the corners of the mouth, the edges of the jawline. These anchor points create a geometric skeleton of the face. Some systems map dozens of landmarks, others map hundreds. This step also corrects for angle and tilt, so a face turned slightly to the side can still be compared to a straight-on photo.
Step 3: Feature extraction. This is where the real intelligence lives. A neural network analyzes the aligned face and encodes its identity-specific features into a string of numbers called an “embedding,” a vector in high-dimensional space. Think of it as a unique numerical fingerprint for your face. This embedding captures the structural relationships between your features (the distance between your eyes relative to the width of your nose, the depth of your eye sockets, the shape of your cheekbones) while filtering out things like lighting, expression, and whether you’re wearing glasses. Two photos of the same person taken years apart should produce similar embeddings. Two photos of different people should not.
Step 4: Matching. The system compares the new embedding against stored embeddings using a distance calculation. If two embeddings are close enough in mathematical space, the system declares a match. If the gap exceeds a set threshold, it declares no match. The strictness of that threshold determines the tradeoff between security and convenience: a tight threshold means fewer false matches but more frequent failures to recognize the real person.
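The four steps above can be sketched as a pipeline. The skeleton below is illustrative only, not any vendor's real API: the detector, landmark mapper, and encoder are stand-in stub functions, and the "embedding" is a short hand-built vector rather than a neural network's output.

```python
import math

# Stand-in stages: a real system uses trained models for each of these.
def detect_face(image):
    # Step 1: pretend we found one face region in the image.
    return {"box": (40, 40, 160, 160), "pixels": image}

def map_landmarks(face):
    # Step 2: pretend landmark coordinates; real systems locate
    # dozens to hundreds of points and correct for angle and tilt.
    return {"left_eye": (70, 80), "right_eye": (130, 80), "nose": (100, 120)}

def extract_embedding(face, landmarks):
    # Step 3: toy "embedding" built from landmark-distance ratios.
    # A real encoder is a neural network producing a 128+ dimensional vector.
    lx, ly = landmarks["left_eye"]
    rx, ry = landmarks["right_eye"]
    nx, ny = landmarks["nose"]
    eye_dist = math.dist((lx, ly), (rx, ry))
    nose_drop = math.dist(((lx + rx) / 2, (ly + ry) / 2), (nx, ny))
    return [eye_dist / nose_drop, eye_dist, nose_drop]

def match(embedding, stored, threshold=0.6):
    # Step 4: Euclidean distance against a stored template;
    # at or below the threshold counts as a match.
    distance = math.dist(embedding, stored)
    return distance <= threshold, distance

def recognize(image, stored_embedding):
    face = detect_face(image)
    landmarks = map_landmarks(face)
    embedding = extract_embedding(face, landmarks)
    return match(embedding, stored_embedding)
```

Calling `recognize(image, enrolled_embedding)` runs all four stages in order and returns whether the face matched and how far apart the embeddings were.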
2D Photos vs. 3D Depth Sensing
Most facial recognition systems work with standard 2D photographs or video frames. Security cameras, social media tagging, and law enforcement databases all rely on flat images. These systems are effective, but some can be fooled by a high-quality printed photo or a replayed video.
Newer hardware-based systems add a third dimension. Apple’s Face ID, for example, uses a “TrueDepth” camera that projects and analyzes more than 30,000 invisible infrared dots on your face, creating a detailed depth map. This map captures the three-dimensional contour of your features, not just their arrangement on a flat plane. Because it relies on depth information, it can’t be tricked by a printed photograph or a 2D digital image. The infrared approach also works in complete darkness, since it doesn’t depend on visible light.
How the System Measures a “Match”
The mathematical core of facial recognition comes down to distance. Each face embedding is a point in a space with 128 or more dimensions (far beyond anything you can visualize, but the math works the same as measuring the distance between two points on a map). When two embeddings sit close together in that space, the faces almost certainly belong to the same person. When they’re far apart, they belong to different people.
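The distance idea can be shown directly. This sketch uses plain Python and Euclidean distance on short made-up vectors; real embeddings have 128 or more dimensions, and some systems use cosine distance instead, but the principle is identical.

```python
import math

def euclidean(a, b):
    # Same formula as distance on a map, just in more dimensions.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 4-dimensional "embeddings" (real ones have 128+ dimensions).
alice_photo_1 = [0.12, -0.40, 0.77, 0.31]
alice_photo_2 = [0.10, -0.38, 0.75, 0.33]   # same person, different photo
bob_photo     = [-0.52, 0.61, -0.05, 0.90]  # different person

print(euclidean(alice_photo_1, alice_photo_2))  # small: likely same identity
print(euclidean(alice_photo_1, bob_photo))      # large: different identities
```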
System operators choose a distance threshold that balances two types of errors. A “false match” means the system incorrectly says two different people are the same person. A “false non-match” means it fails to recognize someone it should. In high-security settings like border control, the threshold is set very tight, accepting more false non-matches (which just means rescanning or manual review) to minimize false matches. On a personal phone, the threshold can be slightly more forgiving because convenience matters more and the risk of a false match is lower with a database of one.
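The tradeoff can be demonstrated on toy scores. The distances below are invented for illustration: genuine pairs (same person) tend to produce small distances and impostor pairs large ones, but the two distributions overlap, which is why no threshold eliminates both error types at once.

```python
# Invented distance scores with deliberate overlap.
genuine_distances  = [0.21, 0.30, 0.35, 0.48, 0.62]  # same-person pairs
impostor_distances = [0.55, 0.70, 0.81, 0.88, 0.95]  # different-person pairs

def error_rates(threshold):
    # False non-match: a genuine pair rejected (distance above threshold).
    fnm = sum(d > threshold for d in genuine_distances) / len(genuine_distances)
    # False match: an impostor pair accepted (distance at/below threshold).
    fm = sum(d <= threshold for d in impostor_distances) / len(impostor_distances)
    return fm, fnm

for t in (0.40, 0.50, 0.60, 0.70):
    fm, fnm = error_rates(t)
    print(f"threshold={t:.2f}  false-match={fm:.0%}  false-non-match={fnm:.0%}")
```

Loosening the threshold drives the false-match rate up and the false-non-match rate down, which is exactly the border-control-versus-phone tradeoff described above.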
The National Institute of Standards and Technology (NIST) regularly benchmarks facial recognition algorithms. Top-performing systems now achieve false match rates as low as one in a million while still correctly identifying the right person the vast majority of the time. Performance varies significantly depending on image quality: controlled, well-lit photos (like passport images) produce far better results than grainy surveillance footage or faces captured at steep angles.
Why Accuracy Varies by Demographic
Facial recognition does not perform equally well on all faces. A landmark 2019 NIST study evaluated 189 algorithms from 99 developers and found consistent demographic disparities. For one-to-one matching (comparing a live photo to a stored one), Asian and African American faces produced false positive rates 10 to 100 times higher than those for Caucasian faces, depending on the algorithm. American Indian demographics had the highest false positive rates overall.
The source of the bias matters. Algorithms developed in the United States showed large accuracy gaps between Asian and Caucasian faces, but algorithms developed in Asia did not show the same disparity. This strongly suggests that training data is a primary driver: systems learn to recognize the types of faces they see most during training. When training datasets overrepresent one demographic, the system becomes less reliable for everyone else.
For one-to-many matching, the kind used when scanning a face against a large database of suspects, African American women had the highest false positive rates. This type of error is especially consequential because a false positive in a law enforcement database can lead to wrongful investigation or detention. The uneven accuracy across demographics remains one of the most significant criticisms of deploying facial recognition at scale.
Where It’s Used Today
Facial recognition has become embedded in daily life in ways many people don’t notice. Phone unlocking is the most personal application. Airport security increasingly uses it for boarding verification, matching your face to your passport photo without needing to hand over documents. Social media platforms use it to suggest photo tags. Retailers use it to identify known shoplifters. Banks use it for identity verification when you open an account remotely.
Law enforcement use is the most controversial application. Police departments use one-to-many matching to compare surveillance footage against databases of mugshots or driver’s license photos. This type of search introduces the highest error risk because the system is comparing one face against millions, and even a tiny false match rate produces real false accusations at that scale.
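The scale effect is just multiplication. Using a hypothetical false match rate of one in a million per comparison (the numbers below are for illustration only, not any measured system), a single one-to-many search of a large database still produces false hits almost every time:

```python
# Hypothetical figures for illustration only.
false_match_rate = 1e-6      # 1-in-a-million chance per single comparison
gallery_size = 10_000_000    # faces in the database

# Expected number of false matches in one search of the whole gallery.
expected_false_matches = false_match_rate * gallery_size
print(expected_false_matches)  # 10.0 expected false hits per search

# Probability that at least one false match occurs,
# treating comparisons as independent.
p_at_least_one = 1 - (1 - false_match_rate) ** gallery_size
print(p_at_least_one)
```

Even a rate that sounds negligible per comparison becomes a near-certainty of spurious candidates when multiplied across millions of comparisons.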
The Regulatory Landscape
Governments are beginning to draw legal boundaries around facial recognition. The European Union’s AI Act, one of the most comprehensive AI regulations in the world, includes a prohibition on real-time remote biometric identification in publicly accessible spaces for law enforcement. In practice, however, the restriction is narrower than it sounds: it includes broad exemptions for serious crime, missing persons searches, and terrorism prevention, meaning law enforcement can still use the technology in many scenarios.
In the United States, regulation is patchwork. Several cities, including San Francisco and Boston, have banned government use of facial recognition. Illinois requires companies to get consent before collecting biometric data, including face scans. But there is no federal law governing facial recognition, so rules vary dramatically depending on where you live and who is operating the system.
What Limits the Technology
Despite rapid improvement, facial recognition still struggles in specific conditions. Low lighting, extreme angles (faces turned 45 degrees or more from the camera), masks or heavy occlusion, aging over long time periods, and low-resolution images all degrade accuracy. NIST benchmarks show that matching a recent visa photo to another recent visa photo performs far better than matching a border surveillance image to a photo taken 12 or more years earlier.
Speed is another practical constraint. Matching one face against one stored template is nearly instantaneous. But searching one face against a database of millions requires enormous computational power. Modern systems use GPU-based processing to achieve near real-time results, compressing what would take days on a standard processor into minutes or seconds. The larger the database, the more hardware is needed to keep searches fast enough for real-world use.
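A naive one-to-many search is a linear scan over every stored template, which is why database size dominates cost. This sketch uses plain Python over made-up 3-dimensional vectors; production systems vectorize the same computation on GPUs or use approximate nearest-neighbor indexes to avoid touching every entry.

```python
import math

def search(probe, gallery, threshold=0.5):
    # Linear scan: cost grows directly with gallery size.
    best_id, best_dist = None, float("inf")
    for identity, template in gallery.items():
        d = math.dist(probe, template)
        if d < best_dist:
            best_id, best_dist = identity, d
    # Only report a hit if the best candidate clears the threshold.
    return (best_id, best_dist) if best_dist <= threshold else (None, best_dist)

gallery = {
    "alice": [0.1, 0.9, 0.3],
    "bob":   [0.8, 0.2, 0.5],
    "carol": [0.4, 0.4, 0.7],
}
probe = [0.12, 0.88, 0.31]  # close to alice's stored template
print(search(probe, gallery))
```

Doubling the gallery doubles the work per search, which is why large-scale deployments lean on specialized hardware and indexing rather than this brute-force loop.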