What Is Multiprogramming and How Does It Work?

Multiprogramming is a technique where an operating system loads multiple programs into memory at the same time and switches the CPU between them so the processor is rarely sitting idle. Instead of waiting for one program to finish before starting another, the system keeps several programs ready to go and hands the CPU to a different one whenever the current program pauses, typically to read from a disk or wait for network data. This single idea transformed computers from machines that wasted most of their time waiting into systems that could keep busy almost continuously.

How Multiprogramming Works

At its core, multiprogramming solves a simple problem: CPUs are fast, but input/output operations (reading a file, printing a document, fetching data from a network) are slow by comparison. In early computing, when a program needed to read data from a disk, the CPU would sit completely idle until that data arrived. Those idle gaps could eat up a huge portion of processing time.

With multiprogramming, the operating system’s kernel steps in during those idle moments. Whenever a running program initiates an I/O operation, the kernel selects a different program from those already loaded in memory and lets it use the CPU instead. When that program also pauses for I/O, the kernel picks yet another one. The result is a rotation that keeps the processor doing useful work almost all the time.

A concrete example makes the difference clear. Imagine three jobs that collectively need 15 seconds of actual CPU time but also require various waits for I/O. Run one at a time without multiprogramming, and those 15 seconds of real work stretch across 25 seconds of wall-clock time, giving you only 60% CPU utilization. Let the operating system interleave those same three jobs, filling each program’s I/O waits with another program’s CPU work, and utilization jumps to roughly 94%. Every program also finishes sooner than it would have running alone, because none of them is forced to wait behind another program’s idle time.
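The arithmetic behind those figures can be checked in a few lines of Python. The 15 seconds of CPU work and 25-second sequential run come from the example above; the assumption that only about 1 second of waiting cannot be hidden when the jobs are interleaved is what yields the ~94% figure.

```python
# Back-of-envelope check of the utilization figures above.
cpu_work = 15.0          # seconds of actual CPU time across all three jobs
sequential_wall = 25.0   # wall-clock time when jobs run one at a time

print(f"sequential: {cpu_work / sequential_wall:.0%}")      # 60%

# Interleaved, almost every I/O wait overlaps another job's CPU burst.
# Assume only ~1 s of waiting cannot be hidden, so the run takes 16 s.
multiprog_wall = 16.0
print(f"multiprogrammed: {cpu_work / multiprog_wall:.0%}")  # 94%
```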

Where Multiprogramming Came From

The concept became commercially significant with IBM’s System/360, announced on April 7, 1964. That system shipped with a disk-based operating system that explicitly supported multiprogramming, making it one of the first widely available platforms to offer the feature. Before that, most computers ran in batch mode: one job loaded, one job executed, one job finished, then the next job started. The System/360 announcement reshaped the entire computer industry, and multiprogramming was a key part of why.

Keeping Multiple Programs in Memory

For multiprogramming to work, multiple programs need to coexist in RAM at the same time. The operating system has to carve up available memory so each program gets its own space without interfering with the others. Two broad strategies evolved to handle this.

The simpler approach is fixed partitioning: when the system boots, memory is divided into blocks of predetermined sizes. A small program gets assigned to a small partition, a large one to a large partition. This is easy to manage but inflexible. If no program currently needs a large partition, that memory goes to waste. If a program is slightly too big for one partition, it gets bumped to the next size up, wasting the leftover space inside that partition.
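A minimal sketch of fixed partitioning makes the waste concrete. The partition sizes and job names below are invented for illustration; the placement rule (smallest free partition that fits) is one common policy.

```python
# Fixed partitioning: partition sizes are set at boot and never change.
# Sizes here (in KB) are made-up examples.
partitions = [{"size": 100, "job": None},
              {"size": 300, "job": None},
              {"size": 700, "job": None}]

def load(job_name, job_size):
    """Place a job in the smallest free partition that fits, or fail."""
    for p in sorted(partitions, key=lambda p: p["size"]):
        if p["job"] is None and p["size"] >= job_size:
            p["job"] = job_name
            return p["size"] - job_size  # KB wasted inside the partition
    return None  # no free partition is large enough

waste = load("editor", 120)  # too big for 100 KB, lands in the 300 KB slot
print(waste)                 # 180 KB of that partition is now unusable
```

The 180 KB locked up inside the 300 KB partition is exactly the internal waste described above: no other program can use it while the editor is loaded.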

The more common approach is variable partitioning, where the operating system allocates exactly as much memory as each program needs (rounded up to some minimum unit). When a new program arrives, the system finds a free block, splits off the right amount, and returns the remainder to the pool of available memory. This is more efficient but introduces a problem called fragmentation: over time, as programs load and exit, memory becomes riddled with small gaps that are individually too tiny to be useful, even though their combined size would be plenty.
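The splitting behavior, and the fragmentation it eventually causes, can be sketched with a simple first-fit free list. Addresses and sizes are arbitrary units invented for the example, and the sketch deliberately omits coalescing of adjacent free blocks, which is what lets unusable gaps accumulate.

```python
# Variable partitioning with a first-fit free list.
# Each free block is (start_address, size) in arbitrary units.
free_list = [(0, 1000)]

def allocate(size):
    """Split the first free block big enough; return the start address."""
    for i, (start, block) in enumerate(free_list):
        if block >= size:
            if block == size:
                free_list.pop(i)
            else:
                free_list[i] = (start + size, block - size)  # keep remainder
            return start
    return None  # no single block is big enough

def release(start, size):
    """Return a block to the pool (no merging with neighbors here)."""
    free_list.append((start, size))

a = allocate(200)     # block at address 0
b = allocate(300)     # block at address 200
release(a, 200)       # leaves a 200-unit gap at the front
print(allocate(600))  # None: 700 units are free in total, but in two pieces
```

That final failed allocation is fragmentation in miniature: the combined free space would be plenty, yet no single gap is large enough.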

How the CPU Decides What Runs Next

When the CPU becomes available, the operating system needs a rule for choosing which waiting program gets it. This is called CPU scheduling, and several strategies exist, each with trade-offs.

  • First-come, first-served (FCFS): Programs run in the order they arrived, like a line at a bank. Simple, but if the first program in line needs a long stretch of CPU time, every program behind it waits. This “convoy effect” can make the system feel sluggish even when most jobs are short.
  • Shortest job first (SJF): The operating system estimates which program needs the least CPU time and runs that one next. This minimizes average wait times but requires the system to predict how long each job will take, which isn’t always possible.
  • Round robin (RR): Each program gets a small, fixed time slice (called a quantum). When the slice expires, the program goes to the back of the line and the next one gets a turn. This gives the appearance of all programs running simultaneously. If the time slice is very small, every program progresses at roughly the same rate, though the system pays for more frequent switches between programs. If it’s very large, round robin behaves just like first-come, first-served.

Most modern operating systems use variations or combinations of these strategies, often with priority levels that let time-sensitive programs jump ahead of background tasks.
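The convoy effect, and what shortest-job-first buys, can be seen with a small calculation. The burst times are invented, and for simplicity all jobs are assumed to arrive at the same moment.

```python
# Average waiting time under FCFS vs. SJF for jobs arriving at time 0.
# Burst times (seconds) are made-up; the long job arrived first,
# which is the convoy-effect scenario.
bursts = [10, 1, 2]

def avg_wait(order):
    wait, elapsed = 0, 0
    for b in order:
        wait += elapsed   # each job waits for everything ahead of it
        elapsed += b
    return wait / len(order)

print(avg_wait(bursts))          # FCFS: 7.0 s average wait
print(avg_wait(sorted(bursts)))  # SJF:  ~1.33 s average wait
```

Running the two short jobs first cuts the average wait by more than a factor of five here, which is exactly why SJF minimizes average waiting time when burst lengths are known.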

Multiprogramming vs. Multiprocessing

These terms sound similar but describe different things. Multiprogramming is about keeping one CPU busy by switching between programs. Multiprocessing means a computer has multiple CPUs (or multiple cores on a single chip) that can literally run different programs at the same instant. Multiprogramming was designed for single-processor machines, though the same switching logic still operates inside multiprocessor systems to manage more programs than there are available cores.

You’ll also see the term “multitasking,” which is essentially the modern name for the same idea, although modern multitasking systems typically also preempt programs on a timer rather than switching only when a program waits for I/O. Early literature called it multiprogramming; today’s operating system documentation usually calls it multitasking. The underlying principle is the same: share processor time among multiple programs to reduce wasted cycles.

Trade-Offs and Overhead

Multiprogramming isn’t free. Every time the operating system switches from one program to another, it has to save everything about the current program’s state (what it was computing, where it was in memory, the contents of CPU registers) and load the saved state of the next program. This “context switch” takes a small but real amount of time. With too many programs loaded, the system can spend more time switching than doing actual work.
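A rough model shows how the balance tips: if each context switch costs c and each program runs for a slice of length q between switches, the fraction of CPU time lost to switching is about c / (q + c). The numbers below are illustrative assumptions, not measurements of any real system.

```python
# Fraction of CPU time lost to context switching, as a function of
# how long each program runs between switches. All numbers invented.
switch_cost_us = 5  # assumed cost of one context switch, microseconds

for quantum_us in (10_000, 1_000, 100, 10):
    overhead = switch_cost_us / (quantum_us + switch_cost_us)
    print(f"run {quantum_us:>6} µs between switches "
          f"-> {overhead:.2%} lost to switching")
```

With long stretches between switches the cost is negligible, but once the run length approaches the switch cost itself, a large share of the processor goes to bookkeeping rather than useful work.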

Memory fragmentation is the other significant cost. As programs of different sizes load and exit, the gaps they leave behind get progressively harder to reuse. Some allocation strategies try to reduce this by always picking the smallest free block that fits (best fit), but searching for that perfect block slows down as the number of free blocks grows. Other strategies, like the buddy system, round every allocation up to the nearest power of two for simpler bookkeeping, but this can waste around 25% of allocated memory on padding. There’s no perfect solution, just different balances between speed, simplicity, and wasted space.
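The padding cost of power-of-two rounding is easy to compute. The request sizes below are invented; the actual waste depends on how request sizes are distributed, and this particular mix happens to pad away a bit more than a quarter of the memory.

```python
import math

# Buddy-system rounding: every request is padded up to the next
# power of two. Request sizes (in bytes) are made-up examples.
def buddy_size(n):
    return 1 << math.ceil(math.log2(n))

requests = [33, 120, 200, 700]
padded = [buddy_size(n) for n in requests]
waste = 1 - sum(requests) / sum(padded)

print(padded)                                    # [64, 128, 256, 1024]
print(f"{waste:.0%} of allocated memory is padding")  # 28%
```

A request of 33 bytes is the pathological case: it lands in a 64-byte block, wasting nearly half of it, while a request just under a power of two wastes almost nothing.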

Despite these costs, the gains far outweigh the overhead for virtually any workload where programs spend time waiting on I/O. Going from 60% CPU utilization to over 90% means the same hardware handles significantly more work in the same amount of time, which is why multiprogramming became a foundational concept in every operating system built since the 1960s.