Memory allocation is the process by which a computer assigns blocks of physical or virtual memory to programs and their data. Every variable you create, every object you instantiate, and every function you call needs space in memory, and the operating system and your programming language work together to find, reserve, and eventually reclaim that space. How this happens depends on whether memory is managed automatically or manually, and whether data lives on the stack or the heap.
Stack vs. Heap: Two Regions, Two Rules
Programs use two main regions of memory, and they behave very differently.
The stack is where local variables and function call data live. When you call a function, the system pushes a new “frame” onto the stack containing that function’s variables. When the function returns, that frame is popped off, and all its local variables become invalid. This allocation and deallocation happens automatically with no effort from the programmer. Because it follows a strict last-in, first-out order, stack operations are extremely fast.
The heap is a larger, more flexible pool of memory. Unlike the stack, heap memory is allocated explicitly by programmers (or by a language’s runtime), and it won’t be deallocated until it’s explicitly freed or collected by a garbage collector. This flexibility comes at a cost: the system has to search for a suitable block of free space each time you request memory, and over time the heap can become fragmented with scattered gaps. The heap is where you store data that needs to outlive the function that created it, or data whose size you don’t know at compile time.
How Dynamic Allocation Works
Dynamic memory allocation means requesting memory at runtime rather than having it fixed at compile time. In C, this is done through a set of core functions that give you direct control over heap memory.
- malloc allocates a single block of contiguous memory. The memory it returns is uninitialized, meaning it contains whatever leftover data was previously in those bytes. If allocation fails, it returns a null pointer.
- calloc works similarly but initializes every byte to zero, which is useful when you need a clean slate.
- realloc resizes a previously allocated block, letting you grow or shrink an existing allocation. It may move the block to a new location, but if it does, it copies the old contents for you. If it fails, the original block remains untouched.
- free releases memory back to the system. Forgetting to call it causes memory leaks, where your program holds onto memory it no longer uses, gradually consuming more and more resources.
Higher-level languages like Python, Java, and JavaScript hide these details from you. You create objects and the language handles the rest, but underneath, the same fundamental operations are happening.
Automatic Memory Management and Garbage Collection
Languages that manage memory for you rely on garbage collection, where the runtime automatically identifies objects that are no longer being used and reclaims their space. There are several approaches to this.
Reference counting attaches a counter to each object in memory. Every time a new variable points to that object, the counter goes up. When a variable stops pointing to it, the counter goes down. Once it hits zero, the object is garbage and can be collected. This is simple but struggles with circular references, where two objects point to each other and neither counter ever reaches zero.
Mark-and-sweep takes a different approach. It starts from all known “root” pointers (global variables and anything on the stack), traces every object reachable from those roots, and marks them as alive. Then it sweeps through the entire heap and collects anything that wasn’t marked. This handles circular references naturally, since unreachable cycles won’t be marked regardless of their reference counts. Java uses a variant of this strategy.
Copying collectors divide memory into two halves. Objects are allocated in one half, and when it fills up, the collector copies all live objects to the other half, compacting them in the process. The old half is then wiped clean. This eliminates fragmentation but requires twice the memory space.
How the OS Finds Free Blocks
When a program requests memory, the allocator needs to find a free block that’s large enough. The strategy it uses affects both speed and memory efficiency.
First-fit scans through available blocks and picks the first one that’s big enough. It’s fast because the search stops as soon as a match is found. The downside is that it can consume large free blocks for small requests, forcing the heap to grow more often than necessary.
Best-fit searches every free block and picks the smallest one that satisfies the request. This preserves large blocks for when they’re needed, but it’s slower because it must check every option. It also tends to create many tiny leftover gaps. In one illustrative example, best-fit left holes of 12 and 28 bytes that were smaller than any request the program ever made, leaving that space effectively unusable.
Neither strategy is universally better. The right choice depends on the application’s allocation patterns.
Fragmentation: The Silent Problem
Over time, repeated allocation and deallocation can leave memory in a messy state. This is fragmentation, and it comes in two forms.
Internal fragmentation happens when the system gives you slightly more memory than you asked for. If memory is divided into fixed-size blocks and you request a block slightly smaller than the fixed size, the leftover space inside that block is wasted. Using variable-sized partitions or choosing the best-fit block can reduce this problem.
External fragmentation happens when there’s enough total free memory to satisfy a request, but it’s scattered in non-contiguous chunks. Imagine needing 100 bytes when there are three free blocks of 50, 30, and 40 bytes. There are 120 bytes free in total, but no single block is large enough. Solutions include compaction (shifting allocated blocks to consolidate free space), paging (breaking memory into small fixed-size pages that don’t need to be contiguous), and segmentation.
Paging is particularly elegant. Instead of requiring large contiguous blocks, the operating system maps virtual addresses to physical pages scattered anywhere in memory. This lets the OS avoid external fragmentation almost entirely, since any free page can be used for any purpose.
How Operating System Kernels Handle Allocation
At the kernel level, memory management uses specialized systems tuned for performance. The Linux kernel, for instance, uses two layered approaches.
The buddy system divides memory into blocks that are powers of two. When you free a block, the system checks if its “buddy” (the adjacent block of the same size) is also free. If so, they merge into a larger block. This makes coalescing fast but causes internal fragmentation, since a request for 33 bytes gets rounded up to a 64-byte block.
To counteract this, Linux layers a slab allocator on top. The slab allocator maintains caches of commonly used object sizes, ranging from 32 bytes to 131,072 bytes. When a new slab is created, objects are pre-initialized so that allocation is fast. When an object is freed, it’s left in its initialized state so the next allocation of the same type can skip setup work entirely. This eliminates much of the internal fragmentation the buddy system would otherwise create and speeds up repeated allocations of same-sized objects.
Performance Costs of Allocation
Memory allocation is not free. Every call to allocate or free memory takes time, and in some situations that time dominates everything else your program does. Research funded by the National Science Foundation found that in latency-critical services, memory allocation can consume up to 97.5% of overall query processing time.
Several things make allocation expensive. When your program requests memory and the system is running low, the OS may trigger swapping, moving data from RAM to disk to make room. This takes tens of milliseconds to seconds and can cause thrashing, where the system spends more time swapping than doing useful work. Even without swapping, a burst of allocation requests can get blocked while the memory manager expands its available pool. One request might return instantly while the next one stalls, waiting for the system to set up new virtual-to-physical address mappings.
Cache locality matters too. Stack-allocated data tends to be packed tightly together in memory, which is friendly to your CPU’s cache. Heap allocations can be scattered, causing more cache misses as the processor fetches data from slower levels of memory. For performance-sensitive code, minimizing heap allocations, reusing objects, and allocating memory in bulk rather than one piece at a time can make a meaningful difference.