What Is Demand Paging in Operating Systems?

Demand paging is a memory management strategy where the operating system loads a page of data into physical memory only when a program actually tries to use it, not before. Instead of pulling an entire program into RAM at launch, the OS starts the process with zero pages loaded and fetches each one from disk the first time it’s referenced. This “lazy loading” approach is how virtually all modern operating systems handle virtual memory.

Why Demand Paging Exists

The core goal is to make physical memory appear larger than it actually is. A computer might have 8 or 16 GB of RAM but run programs whose combined memory needs far exceed that. Demand paging solves this by keeping only the actively used portions of each program in RAM while the rest stays on disk in a designated area called swap space (also known as the backing store or paging file).

This matters for two practical reasons. First, programs rarely use all of their code and data at once. A word processor loads hundreds of features, but at any given moment you’re only using a few. Loading everything upfront wastes RAM on pages that may never be touched. Second, by keeping memory footprints small, the system can run more programs simultaneously. Demand paging skips loading pages that are never accessed, freeing that memory for other processes and reducing the initial disk overhead when a program starts.

How a Page Fault Works

Every page in a program’s virtual address space has a status marker in its page table entry: a “valid” bit. When this bit is set to 1, the page is already in physical memory and the hardware translates the virtual address to a physical one normally. When the bit is 0, the page exists only on disk. If the CPU tries to access a page marked invalid, whether reading data, writing data, or fetching an instruction, the memory management unit (the hardware component inside the CPU that handles address translation) raises an exception called a page fault trap.
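The valid-bit check can be pictured with a toy page table entry. Here a PTE is packed into an integer with the valid flag in bit 0 and the frame number in the upper bits; the layout is purely illustrative, not any real architecture’s format:

```python
VALID_BIT = 0x1  # bit 0: page is resident in physical memory

def make_pte(frame_number: int, valid: bool) -> int:
    """Pack a toy page table entry: frame number in the upper bits, valid flag in bit 0."""
    return (frame_number << 1) | (VALID_BIT if valid else 0)

def translate(pte: int, offset: int, page_size: int = 4096) -> int:
    """Translate to a physical address, or raise to mimic the MMU's page fault trap."""
    if not (pte & VALID_BIT):
        raise RuntimeError("page fault: valid bit is 0, page is on disk")
    frame = pte >> 1
    return frame * page_size + offset

resident = make_pte(frame_number=3, valid=True)
print(translate(resident, offset=0x2A))  # 12330 (frame 3 * 4096 + 0x2A)
```

A real MMU does this check in hardware on every access; the exception path here stands in for the trap that hands control to the OS.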

Once the trap fires, the operating system takes over and follows a sequence that looks roughly like this:

  • Save the current state. The OS saves the running program’s context (register values, instruction pointer) so it can resume later.
  • Locate the page on disk. The page table entry contains information about where the missing page lives in swap space.
  • Read the page into a free frame. The OS issues a disk read to copy the page from swap space into an available slot in physical RAM.
  • Update the page table. Once the read completes, the OS marks the page’s valid bit as 1 and records which physical frame now holds it.
  • Restart the interrupted instruction. The program resumes exactly where it left off, and this time the memory access succeeds.
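The sequence above can be sketched as a toy fault handler. All names here (swap_space, free_frames, and so on) are illustrative stand-ins, not real kernel structures:

```python
# Toy model of the page-fault sequence: swap_space stands in for the disk
# copy of each page, physical_ram for the machine's frames.
swap_space = {0: b"code...", 1: b"data...", 2: b"heap..."}
page_table = {p: {"valid": False, "frame": None} for p in swap_space}
physical_ram = {}          # frame number -> page contents
free_frames = [0, 1, 2]    # available slots in RAM

def handle_page_fault(page: int) -> None:
    # 1. (Context save is implicit here; a real OS stores registers first.)
    # 2. Locate the page on disk.
    contents = swap_space[page]
    # 3. Read it into a free frame.
    frame = free_frames.pop()
    physical_ram[frame] = contents
    # 4. Update the page table: set the valid bit, record the frame.
    page_table[page]["valid"] = True
    page_table[page]["frame"] = frame
    # 5. The interrupted instruction would now restart and succeed.

def access(page: int) -> bytes:
    if not page_table[page]["valid"]:
        handle_page_fault(page)  # MMU trap hands control to the OS
    return physical_ram[page_table[page]["frame"]]

print(access(1))  # first touch faults, loads from "disk", then succeeds
```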

Between the disk read and the restart, the CPU isn’t sitting idle. The OS typically switches to another process from the ready queue, so other work gets done while the slow disk operation finishes.

Hardware Requirements

Demand paging can’t work with software alone. It depends on several hardware features working together.

The memory management unit must be able to detect invalid page accesses and generate a trap. Without that, the OS would never know a page was missing. The CPU also needs to save the exact virtual address that caused the fault (on x86 processors, this goes into a dedicated register called CR2) so the OS knows which page to fetch.

Perhaps the trickiest requirement is the ability to restart an instruction after the fault is resolved. A page fault can occur partway through an instruction that has already modified some data. The hardware needs to either undo those partial changes or save enough information about them to resume the instruction midstream. Without this restart capability, it would be impossible to safely continue a process after a page fault.

What Happens When Memory Is Full

Eventually, every frame in physical RAM will be occupied. When a page fault occurs and there’s no free frame available, the OS must choose an existing page to evict, writing it back to disk first if it has been modified (a condition the hardware tracks with a “dirty” bit in the page table entry). This decision is handled by a page replacement algorithm, and the choice of algorithm directly affects how often page faults happen.

The simplest approach is First In, First Out (FIFO): the page that has been in memory the longest gets evicted. It’s easy to implement but often performs poorly because the oldest page might still be heavily used. Least Recently Used (LRU) is smarter. It evicts the page that hasn’t been accessed for the longest time, based on the idea that pages used recently will likely be used again soon. There’s also the Optimal algorithm, which evicts the page that won’t be needed for the longest time in the future. This produces the fewest possible page faults but requires knowing the future access pattern, so it’s used mainly as a theoretical benchmark to measure how well other algorithms perform.
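A small simulation makes the difference concrete. Running the classic 20-reference string from OS textbooks through both algorithms with 3 frames, FIFO incurs 15 faults while LRU incurs 12:

```python
from collections import OrderedDict, deque

def fifo_faults(refs, num_frames):
    """Count page faults under First In, First Out replacement."""
    frames, queue, faults = set(), deque(), 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == num_frames:
                frames.discard(queue.popleft())  # evict the oldest arrival
            frames.add(page)
            queue.append(page)
    return faults

def lru_faults(refs, num_frames):
    """Count page faults under Least Recently Used replacement."""
    recency, faults = OrderedDict(), 0  # keys ordered least -> most recent
    for page in refs:
        if page in recency:
            recency.move_to_end(page)   # a hit refreshes the page's recency
        else:
            faults += 1
            if len(recency) == num_frames:
                recency.popitem(last=False)  # evict least recently used
            recency[page] = True
    return faults

refs = [7,0,1,2,0,3,0,4,2,3,0,3,2,1,2,0,1,7,0,1]
print(fifo_faults(refs, 3), lru_faults(refs, 3))  # 15 12
```

Note that FIFO’s queue ignores how recently a page was used, which is exactly why it evicts still-hot pages that LRU keeps.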

Performance and Effective Access Time

The speed of demand paging comes down to how often page faults occur. A normal memory access takes around 100 nanoseconds. A page fault, by contrast, requires a disk read that can take millions of nanoseconds. Even a small increase in the page fault rate can drag down overall performance significantly.

Operating systems use a translation lookaside buffer (TLB), a small, fast hardware cache that stores recent virtual-to-physical address mappings, to speed things up. When the CPU generates a virtual address, the TLB is checked first (a lookup takes roughly 5 nanoseconds). If the mapping is found there, the total access time is about 105 nanoseconds: the TLB lookup plus the memory access. If the mapping isn’t in the TLB, the system has to consult the full page table in memory first, adding another 100 nanoseconds.

The hit ratio of the TLB matters enormously. With a hit ratio of 80%, the effective access time works out to about 125 nanoseconds. Push that ratio to 98%, which is typical with a reasonably sized TLB, and the effective access time drops to about 107 nanoseconds, only slightly above the raw memory speed. This is why TLB design is so critical to demand paging performance.
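Those figures follow directly from a weighted average of the hit and miss paths, using the illustrative latencies above (5 ns per TLB lookup, 100 ns per memory access):

```python
TLB_LOOKUP = 5    # ns, per the figures used above
MEM_ACCESS = 100  # ns, one trip to main memory

def effective_access_time(hit_ratio: float) -> float:
    """Weighted average of the TLB-hit and TLB-miss paths, in nanoseconds."""
    hit_time = TLB_LOOKUP + MEM_ACCESS                 # 105 ns: lookup + access
    miss_time = TLB_LOOKUP + MEM_ACCESS + MEM_ACCESS   # 205 ns: + page table walk
    return hit_ratio * hit_time + (1 - hit_ratio) * miss_time

print(round(effective_access_time(0.80), 1))  # 125.0
print(round(effective_access_time(0.98), 1))  # 107.0
```

This model ignores page faults entirely; folding in even a tiny fault probability multiplied by a millions-of-nanoseconds disk read dominates both terms, which restates the point of the previous paragraph.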

Thrashing: When Demand Paging Breaks Down

Demand paging works well when each process’s actively used pages (its “working set”) fit comfortably in the available physical frames. Problems start when too many processes compete for too little memory. The OS notices CPU utilization dropping (because processes are waiting on disk reads), so it may try to load even more processes, which makes the memory pressure worse. Each process starts faulting constantly, and the system spends more time swapping pages between RAM and disk than actually running programs.

This cycle of low CPU utilization leading to more processes leading to more page faults is called thrashing. During thrashing, the computer can slow to a crawl even though the CPU itself isn’t doing much useful work. The fundamental cause is that the combined active memory needs of all running processes exceed the available physical frames. Operating systems combat this by monitoring the page fault rate per process. If a process faults too frequently, the OS can allocate it more frames or, if none are available, suspend other processes to free up memory.
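The per-process monitoring described above is often called page-fault-frequency control. A minimal sketch follows; the threshold values and the suspend convention are hypothetical, chosen only to illustrate the feedback loop:

```python
# Page-fault-frequency control, sketched. UPPER and LOWER are hypothetical
# thresholds in faults per second, not values from any real OS.
UPPER = 10.0  # above this, the process needs more frames
LOWER = 2.0   # below this, the process can give a frame back

def adjust_frames(fault_rate: float, frames: int, free_frames: int) -> int:
    """Return the new frame allocation for one process; 0 means suspend it."""
    if fault_rate > UPPER:
        if free_frames > 0:
            return frames + 1   # relieve the thrashing process
        return 0                # no memory to give: suspend to free frames
    if fault_rate < LOWER and frames > 1:
        return frames - 1       # reclaim an underused frame for others
    return frames               # fault rate is in the healthy band

print(adjust_frames(fault_rate=15.0, frames=8, free_frames=3))  # 9
print(adjust_frames(fault_rate=15.0, frames=8, free_frames=0))  # 0 (suspend)
print(adjust_frames(fault_rate=1.0, frames=8, free_frames=3))   # 7
```

The key property is the dead band between the two thresholds: a process whose fault rate sits between them is left alone, which keeps the system from oscillating.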