HPC storage is a specialized class of data storage designed to keep up with high-performance computing workloads, where thousands of processors need to read and write massive amounts of data simultaneously. Unlike standard enterprise storage, which prioritizes reliability and cost efficiency for everyday business operations, HPC storage is built around speed, parallelism, and the ability to handle enormous data volumes without becoming a bottleneck for computation.
The core challenge is straightforward: when thousands of compute nodes are crunching numbers at the same time, they all need data fed to them without delay. If the storage can’t keep pace, expensive processors sit idle waiting for files. HPC storage solves this by splitting data across many storage devices and letting compute nodes access all those pieces at once.
How Parallel File Systems Work
The foundation of most HPC storage is a parallel file system. In a regular file system, one server holds your data, and every request goes through that single point. A parallel file system takes a different approach: it splits each file into stripes and scatters those stripes across dozens or hundreds of storage targets. When a compute node needs the file, it pulls all the stripes simultaneously from different storage servers, dramatically increasing read and write speeds.
Three parallel file systems dominate HPC environments. Lustre is an open-source system widely used in national labs and research centers. It separates file metadata (names, permissions, directory structure) from the actual file contents, storing them on different servers so both can be accessed at the same time. BeeGFS, another open-source option, was designed for easy installation and management, and includes a built-in mirroring feature called “buddy mirroring” that duplicates data across two targets for resilience, with automatic self-healing when a failed server comes back online. IBM Spectrum Scale (formerly GPFS) is a commercial parallel file system that handles the same striping and concurrent access but is tightly integrated with IBM’s broader computing ecosystem.
All three systems share the same core idea: break data into chunks, spread those chunks across many storage nodes, and let compute nodes access them in parallel. The tunable parameters, like how large each stripe is or how many storage targets a file gets spread across, let administrators match the storage behavior to specific workload patterns.
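The core striping idea can be sketched as a round-robin mapping from file offsets to storage targets. This is a toy model for illustration, not any file system's actual layout code; real systems add layout metadata, redundancy, and allocation policies on top. The `stripe_size` and `stripe_count` parameters correspond to the tunables mentioned above.

```python
def stripe_layout(file_size, stripe_size, stripe_count):
    """Map each stripe of a file to a storage target, round-robin.

    Returns a list of (byte_offset, target_index) pairs showing
    which storage target holds each stripe of the file.
    """
    layout = []
    offset = 0
    while offset < file_size:
        stripe_index = offset // stripe_size
        layout.append((offset, stripe_index % stripe_count))
        offset += stripe_size
    return layout

# A 4 MiB file with 1 MiB stripes over 4 targets touches every target once,
# so a read can hit all four servers in parallel:
print(stripe_layout(4 * 2**20, 2**20, 4))
# The same file over 2 targets alternates stripes between them:
print(stripe_layout(4 * 2**20, 2**20, 2))
```

Widening `stripe_count` increases parallelism for large sequential reads, while a larger `stripe_size` reduces per-stripe overhead; matching these to the workload is exactly the tuning the text describes.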
Why Metadata Is Often the Bottleneck
Every time a program lists files in a directory, checks file permissions, or searches for a specific file, it’s hitting the metadata server rather than reading actual file contents. Metadata lives on fast storage (typically NVMe drives), but the servers handling these requests can become overwhelmed quickly. A common performance killer is reading or writing huge numbers of very small files, or traversing a directory containing thousands of entries. Each of those tiny operations requires a metadata lookup, and even the fastest metadata servers can choke under that volume.
This is why HPC best practices often emphasize writing fewer, larger files rather than many small ones. It’s also why parallel file systems use dedicated metadata servers separate from data servers, so metadata traffic doesn’t compete with actual data transfers for the same resources.
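The "fewer, larger files" advice can be illustrated with a toy container: pack many small records into one buffer plus an offset index, so thousands of records share a single file's metadata instead of costing one lookup each. The function and record names here are invented for illustration; real workflows use formats like HDF5 or tar for the same effect.

```python
import io

def pack_records(records):
    """Pack many small records into one buffer plus an offset index.

    Opening one packed file costs a single metadata lookup; opening
    each record as its own file would cost one lookup apiece.
    """
    buf = io.BytesIO()
    index = {}
    for name, data in records.items():
        index[name] = (buf.tell(), len(data))  # (offset, length)
        buf.write(data)
    return buf.getvalue(), index

def read_record(packed, index, name):
    """Retrieve one record by slicing the packed buffer."""
    offset, length = index[name]
    return packed[offset:offset + length]

packed, index = pack_records({"a.dat": b"alpha", "b.dat": b"bravo"})
print(read_record(packed, index, "b.dat"))  # b'bravo'
```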
The Storage Hierarchy: Scratch, Project, and Archive
HPC systems don’t treat all data equally. Instead, they organize storage into tiers based on how actively the data is being used.
- Scratch storage is the fastest and most expensive tier, built on high-performance parallel file systems. It’s reserved for data tied to jobs that are currently running or about to run: input files, outputs, and checkpoint files that let a simulation resume if it’s interrupted. Because of its cost, scratch storage is aggressively managed. At many facilities, files older than 90 days are automatically deleted without notice.
- Project storage is a medium-term tier for data you’ll need in the future but aren’t actively computing against right now. It’s not designed for high-speed access and is typically not even mounted on compute nodes, since a few heavy users doing intensive reads could overwhelm the system. Think of it as a staging area between active work and long-term preservation.
- Archival storage is for long-term preservation, especially data you need to retain for grant compliance or publication requirements. A proper archive includes automatic backups and guarantees data integrity for a defined number of years. Many HPC clusters don’t include this tier directly, requiring users to move data to a separate archival system.
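The scratch-purge policy described above amounts to a simple age scan. The sketch below is an illustration of the idea, not any facility's actual purge tool; the 90-day threshold follows the example in the text, and real tools add exemption lists and reporting on top of the core age test.

```python
import os
import time

PURGE_AGE_DAYS = 90  # threshold from the example above; varies by facility

def find_purge_candidates(scratch_root, now=None):
    """Walk a scratch tree and list files unmodified for PURGE_AGE_DAYS."""
    now = now if now is not None else time.time()
    cutoff = now - PURGE_AGE_DAYS * 86400
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(scratch_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                candidates.append(path)
    return candidates
```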
Burst Buffers: Absorbing I/O Spikes
Large-scale simulations don’t produce data at a steady rate. They compute intensively for a while, then dump a massive burst of output, then go back to computing. Without special handling, the simulation would freeze during that dump, waiting for the parallel file system to absorb all the data before it could resume calculations.
Burst buffers solve this problem. They’re high-speed, relatively small storage devices (usually solid-state) positioned between compute nodes and the main parallel file system. They act like a write-behind cache: the simulation pushes its data into the burst buffer, which signals the application that the write is complete. The simulation immediately resumes computing while the burst buffer quietly drains its contents to the parallel file system in the background. Research from Argonne National Laboratory describes them as enabling applications to “overlap computations that follow I/O bursts while bleeding the data from the burst buffer to external storage.” Without them, applications would block until every write request completed, wasting compute time.
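The write-behind behavior can be sketched with a small queue-and-drainer model. This is a toy, not a real burst-buffer API: the caller's write returns as soon as the data is buffered, and a background thread drains it to the slower backing store while computation continues.

```python
import queue
import threading

class BurstBuffer:
    """Toy write-behind buffer: writes return immediately while a
    background thread drains data to a (slow) backing store."""

    def __init__(self, backing_write):
        self._q = queue.Queue()
        self._backing_write = backing_write
        self._drainer = threading.Thread(target=self._drain, daemon=True)
        self._drainer.start()

    def write(self, data):
        # Returns as soon as data is buffered; the compute loop resumes.
        self._q.put(data)

    def _drain(self):
        while True:
            data = self._q.get()
            if data is None:  # sentinel: shut down
                break
            self._backing_write(data)  # stands in for the slow PFS write

    def flush_and_close(self):
        self._q.put(None)
        self._drainer.join()

# Usage: the "simulation" keeps issuing checkpoints without blocking.
stored = []
bb = BurstBuffer(stored.append)
for step in range(3):
    bb.write(f"checkpoint-{step}")
bb.flush_and_close()
print(stored)  # ['checkpoint-0', 'checkpoint-1', 'checkpoint-2']
```

The key property is visible in `write`: it only enqueues, so the application never waits on the backing store, which is exactly the overlap of computation and I/O drainage the Argonne description refers to.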
High-Speed Networking That Connects It All
Even the fastest storage devices are useless if the network connecting them to compute nodes is slow. HPC storage systems rely on specialized networking technologies that bypass normal software overhead to move data as directly as possible.
The key technology is RDMA (Remote Direct Memory Access), which lets one computer write data directly into another computer’s memory without involving either machine’s processor. This delivers high throughput, low latency, and minimal CPU overhead. InfiniBand is the most common RDMA-capable network in HPC, with current systems running at 100 gigabits per second or higher. A biomedical HPC facility at Mount Sinai, for instance, uses InfiniBand HDR100 (100 Gb/s) to connect compute nodes to storage.
NVMe-over-Fabrics extends the speed of local solid-state drives across the network. It tunnels storage commands through an RDMA fabric so that accessing a remote drive feels nearly as fast as accessing a local one. Recent hardware can even offload routine storage traffic to the network card itself, leaving host-CPU utilization at effectively zero during normal data transfers.
Real-World Scale: How Much Data HPC Generates
The data volumes in HPC environments are staggering, and they grow fast. A biomedical research computing system at Mount Sinai School of Medicine illustrates the trajectory. In 2013, the system stored 0.7 petabytes across 54 million files. By 2019, that had grown to 8.1 petabytes across 2.3 billion files. Online storage requirements grew by more than a petabyte per year, and archival storage jumped from 1 petabyte to 18 petabytes in the same period. The raw storage capacity of the underlying hardware expanded from 1.5 petabytes to 29 petabytes.
Genomics is a major driver. A single run of an Illumina gene sequencer produces roughly half a terabyte of raw data. One institution with 10 such sequencers running twice weekly generates 10 to 12 terabytes of raw data per week from sequencing alone. The secondary analysis step, where those sequences are aligned to a reference genome, nearly doubles the total storage requirement. And the ability to quickly traverse millions of files and recall them for processing is critical, making parallel metadata servers essential rather than optional.
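The weekly data-volume arithmetic above is easy to verify. The figures below come straight from the text; the doubling factor treats "nearly doubles" as exactly 2 for illustration.

```python
TB_PER_RUN = 0.5     # raw output of one sequencer run (~half a terabyte)
SEQUENCERS = 10      # sequencers at the institution in the example
RUNS_PER_WEEK = 2    # each sequencer runs twice weekly

raw_tb_per_week = TB_PER_RUN * SEQUENCERS * RUNS_PER_WEEK
total_tb_per_week = raw_tb_per_week * 2  # secondary analysis roughly doubles it

print(raw_tb_per_week)    # 10.0 TB of raw sequence data per week
print(total_tb_per_week)  # ~20.0 TB per week including aligned output
```

At that rate, raw sequencing alone adds roughly half a petabyte of data per year, which is consistent with the petabyte-per-year growth the Mount Sinai system reported.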
Top-end HPC storage systems today deliver throughput measured in tens or hundreds of gigabytes per second. Production systems from various vendors offer configurations ranging from 6.4 GB/s with hundreds of thousands of IOPS up to 150 GB/s, with maximum capacities reaching into the petabytes for a single file system and exabytes when fully scaled out.
HPC Storage in the Cloud
Cloud providers now offer managed HPC storage services that bring parallel file system performance to cloud-based compute clusters. AWS provides FSx for Lustre, a fully managed Lustre file system that integrates with S3 for longer-term data storage. Azure offers Azure Managed Lustre for similar workloads. Google Cloud provides Filestore and Parallelstore options for high-throughput needs. These services let research teams spin up HPC-grade storage on demand without purchasing and maintaining physical hardware, though sustained large-scale workloads can still be more cost-effective on dedicated infrastructure.

