What Is an NPU? Neural Processing Units Explained

An NPU, or neural processing unit, is a specialized computer chip designed to run artificial intelligence tasks quickly and efficiently. Unlike a general-purpose processor, an NPU is built from the ground up to handle the specific math that powers AI features like live captions, photo enhancement, voice recognition, and on-device chatbots. You’ll find NPUs in the latest smartphones, laptops, and tablets, where they work alongside the main processor and graphics chip to keep AI features fast without draining your battery.

How an NPU Works

AI models, at their core, are massive collections of numbers that get multiplied and added together in specific patterns. Running a model that recognizes a face or translates a sentence comes down to billions of these multiply-and-add operations performed on grids of numbers called matrices. An NPU contains dedicated hardware modules built specifically for matrix multiplication, convolution (the math behind image recognition), and activation functions that loosely mimic how neurons fire in the brain.
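The multiply-and-add pattern is simple enough to sketch in a few lines. This toy example (with made-up weights, not any real model) computes one layer of a neural network the way the text describes: a matrix multiplication, a bias addition, then an activation function:

```python
import numpy as np

# A toy neural-network "layer": 3 inputs feeding 2 outputs.
# All numbers here are invented for illustration.
weights = np.array([[0.2, -0.5],
                    [0.8,  0.1],
                    [-0.3, 0.7]])
bias = np.array([0.1, -0.2])
inputs = np.array([1.0, 0.5, -1.0])

# The multiply-and-add step: every output is a weighted sum of inputs.
pre_activation = inputs @ weights + bias

# Activation (ReLU): the "neuron" only fires if the sum is positive.
output = np.maximum(pre_activation, 0.0)
print(output)
```

A real model stacks thousands of these layers, which is why a chip that accelerates exactly this pattern pays off so dramatically.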

What makes this different from running the same math on a regular processor is the architecture. NPUs use a design called SIMD, which stands for single instruction, multiple data. Instead of crunching one number at a time, the chip applies the same operation to a large batch of numbers simultaneously. It also keeps data close by using high-speed on-chip memory, so the processor spends less time waiting for numbers to arrive from slower main memory. The result is a chip that can tear through the repetitive, parallel math of AI workloads at a fraction of the energy cost of doing the same work on a CPU or GPU.
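The SIMD idea is easy to feel from software. NumPy's vectorized operations are a rough software analogy for what SIMD hardware does in silicon: one instruction applied across a whole batch, rather than an explicit loop over individual numbers. A minimal sketch:

```python
import numpy as np

data = np.arange(100_000, dtype=np.float32)

# Scalar style: one number at a time, like a simple CPU loop.
def scale_one_at_a_time(xs, factor):
    out = np.empty_like(xs)
    for i in range(len(xs)):
        out[i] = xs[i] * factor
    return out

# Batch style: one operation over the whole array at once --
# the same idea SIMD hardware implements directly in silicon.
def scale_in_batch(xs, factor):
    return xs * factor

# Both produce identical results; the batch version is far faster.
assert np.array_equal(scale_one_at_a_time(data, 2.0),
                      scale_in_batch(data, 2.0))
```

The analogy is loose (NumPy runs on the CPU and dispatches to compiled loops), but the principle is the same one an NPU exploits: AI math is the same operation repeated over enormous batches, so doing it one element at a time wastes the hardware.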

NPU vs. CPU vs. GPU

A CPU is a generalist. It handles your operating system, runs applications, and manages resources. It processes instructions mostly in sequence and has relatively few cores, which makes it excellent at complex, varied tasks but slow at the kind of massively parallel math AI requires.

A GPU has thousands of small cores designed to break demanding tasks into tiny operations and run them all at once. This parallel approach made GPUs the original workhorses of AI, and they still dominate in data centers for training large models. The tradeoff is power consumption. GPUs are hungry chips that generate significant heat, which is why gaming laptops need beefy cooling systems and burn through battery life.

An NPU sits in a different niche. It delivers parallelism comparable to a GPU, sometimes better for short, repetitive calculations, but consumes dramatically less energy. Server benchmarks show NPUs consuming 35 to 70% less power than GPUs while matching or exceeding their speed on inference tasks. In one comparison, an NPU setup used just 94 watts at peak load while a GPU doing the same work consumed over 314 watts. That efficiency gap widens with larger models: when running big object-detection networks, a GPU surged to 106 to 118 watts while the NPU stayed below 55 watts. For a laptop or phone running on battery, that difference translates directly into hours of extra use.
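As a quick sanity check, the top end of that quoted savings range follows directly from the peak-load wattages in the comparison:

```python
# Peak-load figures from the comparison cited above.
npu_watts = 94
gpu_watts = 314

# Fractional power savings of the NPU relative to the GPU.
savings = (gpu_watts - npu_watts) / gpu_watts
print(f"NPU uses {savings:.0%} less power at peak")  # 70%
```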

The three chips aren’t competitors so much as teammates. The CPU handles general computing, the GPU tackles graphics-heavy and large-scale parallel work, and the NPU takes over the AI-specific tasks. Offloading those tasks to the NPU frees up the CPU and GPU for everything else.

Where You’ll Find NPUs Today

Apple was one of the first to put an NPU in a consumer device. The original Neural Engine, built into the A11 chip in the iPhone X in 2017, delivered 0.6 trillion operations per second and handled Face ID and Memoji. By 2021, the 16-core Neural Engine in the A15 chip hit 15.8 trillion operations per second, a 26-fold increase in just four years. Apple has since expanded the Neural Engine beyond the iPhone to iPads (starting with the A12 chip) and Macs (starting with the M1 chip).

On the Windows and Android side, Qualcomm’s Snapdragon X Elite chips include the Hexagon NPU, rated at up to 45 trillion operations per second (TOPS). Intel and AMD have added NPUs to their latest laptop processors as well. The push toward on-device AI has turned NPU specs into a headline feature for chip announcements, much like camera megapixels became a selling point for phones a decade ago.

What TOPS Means

TOPS, or trillions of operations per second, is the standard metric for NPU performance. It counts how many basic math operations (additions and multiplications) the chip can execute in one second. The industry standard measures TOPS using INT8 precision, a lower-precision number format that’s sufficient for most AI inference tasks and allows the chip to do more work per clock cycle than higher-precision formats.
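Chipmakers rarely publish the arithmetic behind a TOPS rating, but peak TOPS is commonly estimated from the number of multiply-accumulate (MAC) units and the clock speed, with each MAC counted as two operations (one multiply plus one add). The unit count and clock below are made-up numbers for illustration, not any real chip's spec:

```python
def estimated_tops(mac_units: int, clock_hz: float) -> float:
    """Rough peak-TOPS estimate: each multiply-accumulate (MAC)
    counts as two operations, a multiply and an add."""
    return 2 * mac_units * clock_hz / 1e12

# Hypothetical NPU: 4,096 MAC units running at 1.5 GHz.
print(round(estimated_tops(4096, 1.5e9), 2))  # 12.29
```

Note this is a theoretical peak; real workloads rarely keep every MAC unit busy every cycle, which is why TOPS ratings are best treated as a ceiling rather than a guarantee.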

TOPS matters most as a threshold for software compatibility. Microsoft’s Copilot+ PC certification, for example, requires an NPU capable of at least 40 TOPS. Devices that meet this bar can run features like real-time translation, AI image generation, and advanced search tools locally on the machine. If your laptop’s NPU falls below that mark, those features either won’t be available or will fall back to the CPU, running slower and using more battery.
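The fallback behavior described above amounts to a simple threshold check. This sketch is purely conceptual (the function and constant names are ours, not any real Windows API):

```python
# Microsoft's published Copilot+ PC certification threshold.
COPILOT_PLUS_MIN_TOPS = 40

def pick_device(npu_tops: float) -> str:
    """Conceptual sketch: route an AI workload to the NPU if it
    meets the certification bar, otherwise fall back to the CPU."""
    return "npu" if npu_tops >= COPILOT_PLUS_MIN_TOPS else "cpu"

print(pick_device(45))  # npu  (e.g. a Snapdragon X Elite-class chip)
print(pick_device(11))  # cpu  (an older, below-threshold NPU)
```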

Everyday Tasks That Use the NPU

You’re probably already using an NPU without realizing it. On a modern smartphone, the NPU powers computational photography: separating you from the background in portrait mode, sharpening night photos, and identifying objects in the frame so the camera knows how to expose the shot. When you unlock your phone with your face, the NPU is running the recognition model.

On laptops, NPUs are increasingly handling video call features like background blur, eye contact correction, and noise cancellation. Live captions and real-time translation in apps like Microsoft Teams run on the NPU. Some devices can now run small language models entirely on-chip, letting you use an AI assistant without sending any data to the cloud. Even creative tools are shifting AI workloads to the NPU: image generation benchmarks show NPUs producing images 14% faster than GPUs while using 70% less power.

Privacy and Local Processing

One of the most practical benefits of an NPU is that it keeps your data on your device. Without an NPU, many AI features would need to send your photos, voice recordings, or documents to a remote server for processing. That introduces latency (the round trip to the cloud and back), uses more energy from the cellular or Wi-Fi radio, and means your personal data passes through someone else’s infrastructure.

With a capable NPU, the AI model runs locally. Your voice never leaves the phone during speech recognition. Your documents stay on your laptop during an AI-powered search. This on-device approach follows privacy-by-design principles, giving you control over your data rather than handing it to a remote server. The tradeoff is capability: local NPUs can run smaller, optimized models well, but the largest and most powerful AI models still require cloud-based GPUs with far more memory and compute power.