Why Do We Compress Data: How It Actually Works

We compress data to make files smaller so they take up less storage space and move faster across networks. That single idea drives nearly every digital experience you rely on: streaming a movie, loading a web page, fitting thousands of songs on your phone, or sending an email attachment that would otherwise be too large. Without compression, the internet as you know it wouldn’t function, and your devices would run out of storage almost immediately.

Saving Storage Space

The most intuitive reason for compression is that smaller files mean you can store more of them. The savings vary dramatically depending on the type of data. Lossless compression of complex data such as video and audio typically achieves around a 2:1 ratio, cutting file sizes roughly in half while keeping every bit of the original intact. But when you’re willing to lose some data that humans can’t easily perceive, the ratios get far more dramatic.

An uncompressed song in CD format streams at about 1.4 megabits per second. Compress that same song into AAC format (the standard for most music players) at 128 kilobits per second, and you get a compression ratio of nearly 11:1. That’s a 91% reduction in size. It’s the reason a phone with 64 GB of storage can hold thousands of songs instead of a few hundred.
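The ratio quoted above is simple arithmetic, using the two bitrates from the paragraph (CD-quality audio is 1,411 kbps):

```python
# CD audio vs. 128 kbps AAC: compression ratio and percent size reduction.
cd_bitrate_kbps = 1411   # CD-quality stereo PCM: 44.1 kHz * 16 bits * 2 channels
aac_bitrate_kbps = 128   # typical AAC bitrate for music players

ratio = cd_bitrate_kbps / aac_bitrate_kbps
reduction_pct = (1 - aac_bitrate_kbps / cd_bitrate_kbps) * 100

print(f"compression ratio: {ratio:.1f}:1")      # about 11:1
print(f"size reduction: {reduction_pct:.0f}%")  # about 91%
```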

Video is even more striking. Getting 1080i high-definition video into a standard broadcast stream requires at least a 50:1 compression ratio. Without that level of compression, a single hour of HD video would consume hundreds of gigabytes, and 4K content would be completely impractical to store or distribute.

Making the Internet Usable

Compression doesn’t just save space on a hard drive. It directly determines how fast data travels from a server to your screen. Every file that crosses the internet consumes bandwidth, and smaller files arrive faster. This matters for everything from video calls to loading a news article on your phone.

Consider a 1080p security camera streaming at 30 frames per second. Using the older H.264 video compression standard, that camera needs 6 to 8 megabits per second of bandwidth. Switch to the newer H.265 standard, which delivers roughly the same quality at about half the bitrate, and the same video only requires 3 to 4 Mbps. That difference is enormous when you’re running multiple cameras on a home network or when millions of people are streaming from the same service simultaneously.
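The savings compound across devices. A rough tally for a small camera setup, using the midpoints of the per-camera ranges above:

```python
# Aggregate bandwidth for four 1080p30 cameras at the bitrates quoted above.
cameras = 4
h264_mbps = 7.0   # midpoint of the 6-8 Mbps range for H.264
h265_mbps = 3.5   # midpoint of the 3-4 Mbps range for H.265

print(f"H.264 total: {cameras * h264_mbps:.0f} Mbps")  # 28 Mbps
print(f"H.265 total: {cameras * h265_mbps:.0f} Mbps")  # 14 Mbps
```

Halving the per-stream bitrate halves the load on the whole network, which is why streaming services adopt newer codecs as soon as playback devices support them.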

For 4K streaming, Netflix recommends 15 to 25 Mbps of bandwidth per stream. Without modern compression codecs, that same 4K video would require several hundred Mbps, which would overwhelm most home internet connections. Compression is what makes 4K streaming possible on a typical broadband plan.

Web pages benefit from the same principle. Text-based files like HTML, CSS, and JavaScript are compressed before being sent to your browser. A newer algorithm called Brotli produces files that are 14% smaller than the older gzip standard for JavaScript, 21% smaller for HTML, and 17% smaller for CSS. In practice, a JavaScript file of 225 KB can be compressed down to about 53 KB. Multiply that savings across every file on every page load, and compression shaves seconds off your browsing experience.
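Brotli itself requires a third-party library, but gzip’s underlying DEFLATE algorithm ships with Python’s standard library, so the effect on repetitive text is easy to see firsthand (the HTML snippet here is an invented example):

```python
import zlib

# A repetitive HTML-like snippet, similar to what a web server compresses.
html = ("<div class='item'><span>hello</span></div>\n" * 200).encode()

compressed = zlib.compress(html, level=9)  # gzip uses this same DEFLATE core
print(len(html), "->", len(compressed), "bytes")
```

Because markup repeats the same tags over and over, the compressed output is a small fraction of the original size.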

Lossy vs. Lossless Compression

Not all compression works the same way, and the distinction matters for understanding when and why different approaches are used.

Lossless compression shrinks a file without discarding any data. When you decompress it, you get back exactly what you started with, bit for bit. This is essential for text documents, software, spreadsheets, and any situation where even a tiny change in the data would cause problems. ZIP files, PNG images, FLAC audio, and RAW photos all use lossless compression. The tradeoff is that the size reductions are modest, typically around 2:1 for complex data like images or audio.

Lossy compression achieves much more dramatic size reductions by permanently removing information that’s considered less important or undetectable to human senses. A JPEG image discards visual detail your eyes are unlikely to notice. An MP3 or AAC file drops audio frequencies that fall outside normal human hearing or are masked by louder sounds. Video formats like MPEG and HEVC do the same with motion and color information. The original data can never be perfectly reconstructed, but the perceptual quality often remains high enough that most people can’t tell the difference.

Choosing between the two comes down to what you’re doing with the file. If you’re putting photos on a website where fast load times matter more than pixel-perfect quality, lossy compression (like JPEG) makes sense. If you’re building an online portfolio where image quality is the whole point, lossless formats (like PNG) are worth the larger file sizes. For everyday purposes like sending a batch of event photos to friends, lossy compression lets you share more files more quickly. For archival purposes where you never want to lose quality, lossless is the only option.

How Compression Actually Works

At its core, compression exploits patterns and redundancy in data. If a file contains the same sequence of characters repeated hundreds of times, a compression algorithm can replace each occurrence with a short reference to the first one. The result is a much smaller file that contains the same information.
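Run-length encoding is the simplest version of this idea: replace a run of repeated symbols with one copy plus a count. A minimal sketch:

```python
def rle_encode(text: str) -> list[tuple[str, int]]:
    """Collapse runs of repeated characters into (char, count) pairs."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))              # start a new run
    return runs

def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (char, count) pairs back into the original text."""
    return "".join(ch * count for ch, count in runs)

runs = rle_encode("aaaaaabbbcc")
print(runs)  # [('a', 6), ('b', 3), ('c', 2)]
assert rle_decode(runs) == "aaaaaabbbcc"
```

Real algorithms are far more sophisticated, but the principle is the same: describe the patterns instead of repeating the data.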

One of the most widely used compression methods is the DEFLATE algorithm, which powers gzip (used across the web) and the ZIP file format. DEFLATE combines two techniques. First, it scans the data for repeated sequences, in the style of the LZ77 algorithm, and replaces duplicates with pointers that say “this is the same as the string that appeared 500 bytes ago, for a length of 12 bytes.” It can look back up to 32 kilobytes to find matches. Second, it uses a technique called Huffman coding, which assigns shorter codes to the most frequently appearing symbols and longer codes to rare ones. If the letter “e” appears thousands of times in a text file but “z” appears twice, “e” gets a very compact code. The combination of these two strategies is remarkably effective for general-purpose compression.
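Python’s zlib module exposes DEFLATE directly, and a quick experiment shows how completely the algorithm depends on redundancy:

```python
import os
import zlib

repetitive = b"the quick brown fox " * 500   # 10,000 bytes of repeated text
random_data = os.urandom(10_000)             # 10,000 bytes with no patterns

# The repeated phrase collapses to a handful of back-references.
print(len(zlib.compress(repetitive)))   # a tiny fraction of 10,000

# Random bytes have nothing to point back to; output is slightly *larger*.
print(len(zlib.compress(random_data)))
```

The repetitive input shrinks dramatically, while the random input actually grows by a few bytes of overhead, a preview of the entropy limit discussed below.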

For lossy formats, the process is different. Audio and video codecs typically transform the data into a mathematical representation, then selectively discard the parts that contribute least to perceived quality. A video codec might notice that a large area of a frame is nearly the same shade of blue sky and store it as a single color value instead of encoding each pixel independently. It might also notice that two consecutive frames are almost identical and only store the differences between them.
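The inter-frame trick can be sketched in a few lines. This is a toy model, with frames as flat lists of pixel values rather than real video data:

```python
def frame_delta(prev: list[int], curr: list[int]) -> dict[int, int]:
    """Record only the pixel positions that changed between two frames."""
    return {i: v for i, (p, v) in enumerate(zip(prev, curr)) if p != v}

def apply_delta(prev: list[int], delta: dict[int, int]) -> list[int]:
    """Reconstruct the next frame from the previous one plus the changes."""
    frame = list(prev)
    for i, v in delta.items():
        frame[i] = v
    return frame

frame1 = [200] * 1000          # a frame that is mostly uniform blue sky
frame2 = list(frame1)
frame2[500] = 90               # a single pixel changed in the next frame

delta = frame_delta(frame1, frame2)
print(len(delta))              # 1 -- store one change instead of 1,000 pixels
assert apply_delta(frame1, delta) == frame2
```

Real codecs predict motion between frames rather than comparing raw pixels, but the payoff is the same: consecutive frames share most of their content, so only the differences need to be stored.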

The Theoretical Limit

There’s a hard floor on how much you can compress any given data, and it’s defined by a concept called entropy. In 1948, Claude Shannon published “A Mathematical Theory of Communication,” showing that every source of information has a measurable amount of randomness, or unpredictability. The more predictable the data, the more it can be compressed. The more random, the less compressible it is.

Shannon defined entropy as a mathematical formula based on the probability of each symbol appearing in the data. If you have a file where 99% of the characters are the letter “a” and 1% are “b,” the entropy is very low, and you can compress that file to a small fraction of its original size. If every possible character appears with equal frequency and in no predictable order, the entropy is at its maximum, and no algorithm can make the file meaningfully smaller.
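Shannon’s formula is H = −Σ p(x) log₂ p(x), measured in bits per symbol. Applying it to the two files described above:

```python
import math

def entropy(probs: list[float]) -> float:
    """Shannon entropy in bits per symbol: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 99% 'a', 1% 'b': highly predictable, almost no information per symbol.
print(f"{entropy([0.99, 0.01]):.3f} bits/symbol")     # about 0.081

# 256 equally likely byte values: maximally unpredictable.
print(f"{entropy([1 / 256] * 256):.3f} bits/symbol")  # 8.000
```

The skewed file carries about 0.08 bits of information per 8-bit character, so it can in principle be compressed roughly 100:1; the uniform file needs all 8 bits per byte and cannot be compressed at all.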

This means compression isn’t magic. It can’t create something from nothing. It works because real-world data, from English text to photographs to genome sequences, is full of patterns and redundancy. A perfectly random file can’t be compressed at all. And no matter how clever the algorithm, it can never beat the entropy limit that Shannon identified. Every compression tool you use is essentially trying to get as close to that theoretical floor as possible.

Why It Keeps Getting Better

Compression algorithms continue to improve because engineers find smarter ways to identify and exploit redundancy. The jump from H.264 to H.265 video compression cut bandwidth requirements roughly in half for the same visual quality. Newer codecs like AV1 push even further. On the web, Brotli outperforms gzip by double-digit percentages across all text-based file types.

These improvements have real consequences. Better video compression means 4K and eventually 8K streaming become viable on existing internet infrastructure. Better web compression means pages load faster on slow mobile connections. Better image compression means cloud storage lasts longer before you need to upgrade your plan. The fundamental reason we compress data hasn’t changed since Shannon’s era: there’s always more data than there is space or bandwidth to handle it. Compression closes that gap.