What Is an ALU? Arithmetic Logic Unit Explained

An ALU, or arithmetic logic unit, is the part of a computer processor that does the actual math and decision-making. Every time your computer adds two numbers, compares values, or performs a logical operation, an ALU handles it. It’s one of the core building blocks inside a CPU, and it also appears in graphics processors and other specialized chips.

What an ALU Actually Does

An ALU performs two broad categories of work: arithmetic and logic. On the arithmetic side, it handles addition and subtraction (and, in many designs, multiplication and division) on whole numbers (integers) in binary format. On the logic side, it performs bitwise operations like AND, OR, NOT, and XOR, which combine individual bits of data and produce results based on simple true/false rules. These operations sound basic, but every complex task your computer performs, from rendering a webpage to running a video game, ultimately breaks down into billions of these tiny calculations per second.
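To make the logic side concrete, here is a small sketch of the four bitwise operations applied to two example values. The 8-bit width and the values of `a` and `b` are arbitrary choices for illustration; a real ALU typically works on 32 or 64 bits at once.

```python
# Illustrative 8-bit word; Python integers are unbounded, so NOT needs a mask.
MASK = 0xFF

a = 0b11001010
b = 0b10100110

and_result = a & b       # 1 only where both bits are 1
or_result = a | b        # 1 where at least one bit is 1
xor_result = a ^ b       # 1 where the bits differ
not_result = ~a & MASK   # every bit of a flipped, trimmed to 8 bits

for name, value in [("AND", and_result), ("OR", or_result),
                    ("XOR", xor_result), ("NOT", not_result)]:
    print(f"{name:>3}: {value:08b}")
```

Each output bit depends only on the input bits in the same position, which is why these operations are cheap to build in hardware.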

A standard ALU works only with integers. When a processor needs to handle numbers with fractional parts (floating-point numbers), it passes that work to a separate component called a floating-point unit, or FPU. The FPU follows a specific international standard, IEEE 754, for how these numbers are represented and rounded. Rounding errors still occur, because most fractional values cannot be stored exactly in a fixed number of bits, but the standard makes them small, predictable, and identical from one machine to the next, which is what makes reliable scientific calculation possible.

How It Fits Inside a Processor

The ALU works inside the CPU alongside two other major components: the control unit and the registers. Registers are tiny, ultra-fast storage slots that hold the numbers the ALU is about to work with and the results it produces; together with the ALU they form the datapath. The control unit acts as a traffic director, reading each instruction from a program and telling the ALU which operation to perform.

This design traces back to 1945, when mathematician John von Neumann published a paper describing a computer architecture where the processor is physically separated from memory and internally divided into a control unit and an arithmetic/logic unit. That basic blueprint, known as the von Neumann architecture, still defines how virtually all modern processors are organized.

Here’s how data flows through this system during a typical calculation. The control unit fetches an instruction from memory, decodes it, and loads the two input values (called operands) from the register file into the ALU. The ALU computes the result based on control signals that specify the operation, then writes the result back to a destination register. This fetch-decode-execute cycle repeats billions of times per second in a modern chip.
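The cycle described above can be sketched as a toy simulator. Everything here is a simplifying assumption made for illustration: the instruction format `(op, dest, src1, src2)`, the four-operation `alu` function, and the `run` loop are hypothetical and do not model any real instruction set.

```python
def alu(op, a, b):
    """Hypothetical ALU: the op name plays the role of the control signals."""
    if op == "ADD":
        return a + b
    if op == "SUB":
        return a - b
    if op == "AND":
        return a & b
    if op == "OR":
        return a | b
    raise ValueError(f"unknown operation: {op}")


def run(program, registers):
    """Execute each instruction in order, mutating and returning the registers."""
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        instruction = program[pc]                 # fetch
        op, dest, src1, src2 = instruction        # decode
        a, b = registers[src1], registers[src2]   # read operands from registers
        registers[dest] = alu(op, a, b)           # execute and write back
        pc += 1
    return registers


program = [
    ("ADD", 2, 0, 1),  # r2 = r0 + r1
    ("SUB", 3, 2, 0),  # r3 = r2 - r0
    ("AND", 4, 2, 3),  # r4 = r2 & r3
]
final_registers = run(program, [5, 7, 0, 0, 0])
print(final_registers)
```

A real processor overlaps these steps across many instructions at once, but the division of labor is the same: control logic decides what to do, registers supply and receive the values, and the ALU does the computing.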

Inside the ALU: Gates and Adders

At the hardware level, an ALU is built from logic gates, the simplest electronic circuits, each of which takes one or two binary inputs and produces a single output. Four gate types do most of the work: AND, OR, NOT, and XOR. By wiring these gates together in specific patterns, engineers create more complex components. A full adder, for example, can be built from just five logic gates and adds two single-bit numbers while accounting for a carry digit from a previous addition. Chain 32 full adders together, feeding each carry output into the next adder’s carry input, and you get a unit that can add two 32-bit numbers.
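As a sketch of that idea, the code below models one common five-gate arrangement of a full adder (two XOR, two AND, one OR) and chains copies of it into a multi-bit adder. The function names `full_adder` and `ripple_add` are made up for this example, and other gate arrangements are possible.

```python
def full_adder(a, b, carry_in):
    """Add three single bits using five gates; return (sum_bit, carry_out)."""
    partial = a ^ b                  # XOR gate 1
    sum_bit = partial ^ carry_in     # XOR gate 2
    carry_a = a & b                  # AND gate 1
    carry_b = partial & carry_in     # AND gate 2
    carry_out = carry_a | carry_b    # OR gate
    return sum_bit, carry_out


def ripple_add(x, y, width=32):
    """Add two width-bit numbers by chaining full adders, lowest bit first."""
    result = 0
    carry = 0
    for i in range(width):
        sum_bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= sum_bit << i
    return result, carry  # carry is the carry out of the highest bit


print(ripple_add(5, 7))
print(ripple_add(0xFFFFFFFF, 1))
```

In this chained design each adder has to wait for the carry from the one before it, which is why faster adders in real chips use extra logic to predict carries ahead of time.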

A typical 32-bit ALU contains parallel units for different operations: a bitwise AND unit, a bitwise OR unit, and an add/subtract unit. All three receive the same inputs and compute their results simultaneously. A component called a multiplexor then selects which result to output based on control signals. Two control lines can encode four possible operations:

00: AND
01: OR
10: Addition
11: Subtraction

This parallel-compute-then-select approach keeps the hardware simple and fast. Each unit is a fixed circuit that produces its result as soon as the inputs arrive, so the control signals only have to steer the multiplexor; nothing has to be started, stopped, or rerouted before the math can begin.
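The compute-everything-then-select structure can be sketched in a few lines. This is a software model under stated assumptions, not a hardware description: the `alu` function and its `control` argument are hypothetical names, and subtraction is shown as the add unit fed with the inverted second operand plus one, which is one common way to share an adder between the two operations.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1  # keeps results within 32 bits


def alu(a, b, control):
    """Compute every candidate result, then let the control code pick one."""
    and_out = a & b
    or_out = a | b
    add_out = (a + b) & MASK
    # a - b in two's complement: a + (NOT b) + 1, reusing the adder
    sub_out = (a + (~b & MASK) + 1) & MASK

    # The tuple index plays the role of the multiplexor's select lines.
    candidates = (and_out, or_out, add_out, sub_out)
    return candidates[control]


print(alu(12, 10, 0b00))  # AND
print(alu(12, 10, 0b01))  # OR
print(alu(12, 10, 0b10))  # addition
print(alu(12, 10, 0b11))  # subtraction
```

In software this looks wasteful because three of the four results are thrown away, but in hardware the unused units cost area and power rather than time.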

Status Flags

In many processor designs, the ALU updates a set of status flags after an operation, and other parts of the processor can check them. These flags are single bits stored in a special register, and they capture important properties of the result:

Zero flag: Set to 1 if the result is exactly zero. This is how a processor checks whether two values are equal (subtract one from the other and see if the result is zero).
Carry flag: Set to 1 when an arithmetic operation produces a carry or borrow beyond the highest bit, indicating the result exceeded the register’s capacity for unsigned numbers.
Sign flag: Reflects whether the result is positive or negative by copying the leftmost bit of the result.
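Overflow flag: Set when the result of a signed operation falls outside the range the register can represent in two’s complement, which leaves the sign bit of the result wrong. Adding two large positive numbers and getting a negative result is the classic case.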

These flags are the foundation of every “if/then” decision a program makes. When your code checks whether a number is greater than another, the processor subtracts one from the other in the ALU and reads the resulting flags to determine the answer.
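Here is a minimal sketch of how a comparison can fall out of a subtraction and its flags. It assumes an 8-bit register for readability and treats the carry flag as a borrow indicator on subtraction (the convention on x86; some architectures invert it). The function names are hypothetical.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1      # 0xFF
SIGN_BIT = 1 << (WIDTH - 1)  # 0x80


def subtract_with_flags(a, b):
    """Subtract two 8-bit patterns (0..255); return (result, flags)."""
    raw = a - b
    result = raw & MASK
    flags = {
        "zero": int(result == 0),
        "carry": int(raw < 0),  # a borrow was needed (unsigned a < b)
        "sign": int((result & SIGN_BIT) != 0),
        # Signed overflow on a - b: operands have different signs and the
        # result's sign differs from a's sign.
        "overflow": int(((a ^ b) & (a ^ result) & SIGN_BIT) != 0),
    }
    return result, flags


def signed_less_than(a, b):
    """Compare two signed values (-128..127) using only the flags."""
    _, flags = subtract_with_flags(a & MASK, b & MASK)
    return flags["sign"] != flags["overflow"]


print(subtract_with_flags(7, 7))
print(signed_less_than(3, 5))
print(signed_less_than(-128, 1))
```

The last call is the interesting one: -128 minus 1 overflows and the raw sign bit reads as positive, so checking the sign flag alone would give the wrong answer. Comparing sign against overflow corrects for that.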

Multiple ALUs in Modern Chips

Early processors contained a single ALU, but modern CPUs pack several into each core. This design, called superscalar architecture, allows a processor to execute more than one instruction per clock cycle. A two-way superscalar processor, for instance, contains two ALUs and can fetch, decode, and execute two instructions simultaneously. Its register file has six ports (four for reading input values, two for writing results) so both ALUs can access data at the same time. Performance in these designs is measured in instructions per cycle, or IPC, with a two-way design reaching a peak IPC of 2 when both pipelines are fully utilized. In practice the average is lower, because an instruction that needs the result of the one before it cannot run alongside it.
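To see why IPC depends on the program as well as the hardware, here is a deliberately simplified model of two-way issue. It is a sketch under strong assumptions: instructions are hypothetical `(dest, src1, src2)` register-name tuples, every instruction takes one cycle, issue is strictly in order, and the second instruction of a pair is held back if it reads or writes the first one's destination. Real processors are far more elaborate.

```python
def count_cycles(program):
    """Count cycles for an in-order, two-wide issue of single-cycle instructions."""
    cycles = 0
    i = 0
    while i < len(program):
        cycles += 1
        first_dest = program[i][0]
        i += 1  # the first ALU always takes the next instruction
        if i < len(program):
            dest, src1, src2 = program[i]
            # The second ALU can take the following instruction only if it
            # does not depend on the result being computed this cycle.
            if first_dest not in (dest, src1, src2):
                i += 1
    return cycles


def ipc(program):
    """Instructions per cycle for the given program under this model."""
    return len(program) / count_cycles(program)


independent = [("r1", "r2", "r3"), ("r4", "r5", "r6"),
               ("r7", "r8", "r9"), ("r10", "r11", "r12")]
chained = [("r1", "r2", "r3"), ("r4", "r1", "r5"),
           ("r6", "r4", "r7"), ("r8", "r6", "r9")]

print(ipc(independent))
print(ipc(chained))
```

The independent sequence keeps both ALUs busy, while the chained one leaves the second ALU idle every cycle because each instruction waits on the previous result.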

Graphics processors take this concept to an extreme. Where a CPU might have a handful of powerful, complex ALUs per core, a GPU trades away memory cache to pack in hundreds or thousands of simpler ALUs. Each individual ALU is less capable than a CPU’s, but the sheer number of them working in parallel makes GPUs ideal for tasks like 3D rendering or machine learning, where the same basic math needs to be applied to millions of data points at once.

ALU vs. FPU

One common point of confusion is the difference between an ALU and a floating-point unit. A standard ALU works exclusively with integers in binary, handling whole numbers in both unsigned and two’s complement (signed) formats. An FPU is a separate circuit designed specifically for numbers with fractional parts, performing addition, subtraction, multiplication, and division on floating-point numbers according to the IEEE 754 standard. Most early personal computers didn’t include an FPU at all, and users who needed fast floating-point math had to buy a separate coprocessor chip. Today, virtually every general-purpose CPU includes both an ALU and an FPU on the same die, but they remain functionally distinct units handling different number formats.
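The difference in number formats is easiest to see by reading one bit pattern three ways. The sketch below uses Python’s standard `struct` module to reinterpret the same 32 bits as an unsigned integer, a two’s complement integer, and an IEEE 754 single-precision float. The `interpretations` helper and the example pattern are illustrative choices.

```python
import struct


def interpretations(bits):
    """Read one 32-bit pattern as unsigned, signed, and single-precision float."""
    raw = struct.pack(">I", bits)              # the 4 bytes, big-endian
    unsigned = bits
    signed = struct.unpack(">i", raw)[0]       # two's complement integer
    as_float = struct.unpack(">f", raw)[0]     # IEEE 754 binary32
    return unsigned, signed, as_float


unsigned, signed, as_float = interpretations(0xC0400000)
print(f"unsigned: {unsigned}")
print(f"signed:   {signed}")
print(f"float:    {as_float}")
```

The bits never change; only the rules for reading them do. That is the practical reason the two units are separate: the circuitry that adds integers cannot simply be reused to add values that carry a sign, an exponent, and a fraction in different bit fields.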