Lossless compression is a way of shrinking a file so that every single bit of the original data can be perfectly reconstructed when you decompress it. Nothing is thrown away, nothing is approximated. The file you get back is identical to the file you started with. This separates it from lossy compression (used in MP3s and JPEGs), which permanently discards some data to achieve smaller file sizes.
How Lossless Compression Works
At its core, lossless compression finds patterns and redundancy in data, then represents those patterns more efficiently. Imagine a text document that contains the phrase “ the ” thousands of times. Each time that phrase appears, the file stores five characters (space, t, h, e, space). A lossless compressor can assign that phrase a single short code and replace every occurrence with that code. When you decompress, the process reverses: each code gets swapped back for the original phrase, and you get the exact same text.
This is essentially how dictionary-based compression works. The algorithm builds a dictionary of phrases it encounters, assigns each one an index number, and then outputs that index whenever the phrase appears again. Instead of storing the full sequence of characters for “ the ” over and over, the compressor stores a single reference number. Across an entire document or dataset, those savings add up fast.
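The dictionary idea can be sketched as a toy LZ78-style compressor. This is a simplified illustration of the index-for-phrase substitution described above, not what ZIP actually uses (real DEFLATE combines LZ77 matching with Huffman coding):

```python
def lz78_compress(text: str) -> list[tuple[int, str]]:
    """Emit (dictionary index, next character) pairs; index 0 means 'no prefix'."""
    dictionary: dict[str, int] = {}
    pairs = []
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate            # keep extending a known phrase
        else:
            pairs.append((dictionary.get(current, 0), ch))
            dictionary[candidate] = len(dictionary) + 1
            current = ""
    if current:                            # flush a leftover known phrase
        pairs.append((dictionary[current], ""))
    return pairs


def lz78_decompress(pairs: list[tuple[int, str]]) -> str:
    phrases = [""]                         # index 0 is the empty phrase
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        out.append(phrase)
        phrases.append(phrase)             # rebuild the dictionary in lockstep
    return "".join(out)


text = "the cat sat on the mat while the dog watched the cat"
pairs = lz78_compress(text)
assert lz78_decompress(pairs) == text      # round trip is exact
```

Note that the decompressor never receives the dictionary itself: it rebuilds the same dictionary from the pairs as it goes, which is what makes the scheme practical.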
Other approaches look at individual symbols rather than phrases. Huffman coding, for instance, assigns shorter codes to symbols that appear frequently and longer codes to rare ones. In English text, the letter “e” would get a very short code, while “z” would get a longer one. The result is a file that uses fewer total bits to represent the same information.
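A minimal Huffman code builder, using Python's `heapq` as the priority queue, shows the frequent-symbols-get-short-codes property directly (a sketch of the classic algorithm, not a production encoder):

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman code table: frequent symbols get shorter bit strings."""
    freq = Counter(text)
    # Heap entries are (frequency, tiebreaker, leaf-or-subtree); the unique
    # tiebreaker keeps heapq from ever comparing two subtrees directly.
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                      # degenerate single-symbol input
        return {heap[0][2]: "0"}
    tiebreak = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # take the two rarest nodes...
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, tiebreak, (left, right)))  # ...and merge
        tiebreak += 1
    codes: dict[str, str] = {}

    def walk(node, prefix):
        if isinstance(node, tuple):         # internal node: recurse both ways
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes


codes = huffman_codes("eeeeeeeeeeeeeeeetttaaz")
assert len(codes["e"]) < len(codes["z"])    # common symbol, shorter code
```

Because no code is a prefix of another, the decoder can read the bit stream left to right and always knows where one symbol ends and the next begins.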
The Mathematical Limit
There’s a hard floor on how small lossless compression can make a file. Claude Shannon, the founder of information theory, proved that the minimum average number of bits needed to represent data from a source equals the source’s entropy. Entropy, in this context, measures how unpredictable or random the data is. If you have a file full of the same repeated character, the entropy is essentially zero, and compression can shrink it dramatically. If the file is perfectly random with no repeating patterns, the entropy is at its maximum, and no lossless algorithm can compress it at all.
This means lossless compression doesn’t work equally well on all data. Highly structured, repetitive data compresses beautifully. Random or already-compressed data barely shrinks, and in some cases the “compressed” version is actually slightly larger because of the overhead the algorithm adds.
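You can measure this floor directly. The sketch below computes Shannon entropy in bits per byte and compares it with what `zlib` (a stock DEFLATE implementation that ships with Python) actually achieves on repetitive versus random data; `os.urandom` stands in here for patternless data:

```python
import math
import os
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Average bits of information per byte: the lossless compression floor."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


repetitive = b"a" * 10_000        # one symbol repeated: zero surprise
random_ish = os.urandom(10_000)   # stand-in for maximally unpredictable data

for label, blob in (("repetitive", repetitive), ("random", random_ish)):
    print(f"{label}: entropy {shannon_entropy(blob):.2f} bits/byte, "
          f"zlib output {len(zlib.compress(blob))} bytes")
```

The repetitive buffer has entropy of exactly zero and collapses to a few dozen bytes; the random buffer sits near the 8 bits/byte maximum, and zlib's output for it is slightly *larger* than the input because of container overhead.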
Common Lossless Formats
You encounter lossless compression constantly, even if you don’t realize it. ZIP files use dictionary-based compression to shrink documents, spreadsheets, and other files for storage or email. PNG images use lossless compression, which is why they’re preferred for screenshots, diagrams, and graphics with sharp edges and text, where every pixel matters.
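The round-trip guarantee is easy to verify with Python's standard `zipfile` module: whatever goes in comes out bit-for-bit identical (the filename and sample payload here are made up for the demonstration):

```python
import io
import zipfile

original = b"Quarterly report\n" * 500           # repetitive, so it shrinks well

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("report.txt", original)          # compressed on write

with zipfile.ZipFile(buffer) as zf:
    restored = zf.read("report.txt")             # decompressed on read

assert restored == original                      # bit-for-bit identical
assert len(buffer.getvalue()) < len(original)    # and smaller on disk
```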
Audio Formats
For music and audio, FLAC and Apple Lossless (ALAC) are the two dominant lossless formats. Both reduce file sizes to roughly 40% to 60% of the uncompressed original, depending on the type of music. Apple describes its format as using “about half the storage space” of uncompressed audio. Dense, complex music with lots of instruments compresses less efficiently than sparse acoustic recordings, because there’s less redundancy for the algorithm to exploit.
ALAC supports up to 8 channels of audio at bit depths of 16, 20, 24, and 32 bits, with sample rates up to 384 kHz. Apple Music streams lossless audio at up to 24-bit, 192 kHz. FLAC offers similar quality but tends to be more efficient on the processing side. ALAC requires roughly four times as much CPU power to decode as FLAC, which can affect battery life on phones and portable devices. Both formats, though, deliver bit-perfect audio: the decoded stream is identical to the original recording, sample for sample.
General-Purpose Formats
Beyond audio and images, formats like GZIP and ZSTD handle everything from web page delivery to database backups. When your browser loads a webpage, the server often sends the HTML, CSS, and JavaScript in compressed form, and your browser decompresses it on the fly. You never notice this happening because modern processors handle it in milliseconds.
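A rough sketch of the two halves of that exchange, using Python's `gzip` module. Real HTTP compression is negotiated through `Accept-Encoding` and `Content-Encoding` headers, which this toy example skips:

```python
import gzip

html = ("<!doctype html><html><body>"
        + "<p>Hello, world.</p>" * 200          # markup is highly repetitive
        + "</body></html>")

payload = gzip.compress(html.encode("utf-8"))         # roughly the server side
restored = gzip.decompress(payload).decode("utf-8")   # roughly the browser side

assert restored == html                               # nothing was lost
print(f"{len(html)} bytes of HTML -> {len(payload)} bytes on the wire")
```

HTML, CSS, and JavaScript compress especially well because markup and code are full of repeated tags, keywords, and identifiers.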
Lossless vs. Lossy Compression
The tradeoff between lossless and lossy compression comes down to fidelity versus size. A lossless audio file at roughly 50% of the original is much larger than an MP3, which can shrink music to about 10% of the original size. The MP3 achieves this by permanently removing audio information that psychoacoustic models predict you won’t notice, like quiet sounds masked by louder ones. You can never recover that discarded data.
For casual listening, the difference between a 320 kbps MP3 and a FLAC file is difficult for most people to hear. But for audio professionals editing recordings, or for archival purposes where you want to preserve the master, lossless is the only option that guarantees nothing is lost. The same logic applies to images: JPEG is fine for sharing photos on social media, but if you’re editing in Photoshop and saving repeatedly, each save degrades the image further. PNG or TIFF preserves the original quality through unlimited edits.
Where Lossless Compression Is Required
Some fields don’t get to choose between lossless and lossy. Medical imaging is one of the clearest examples. The DICOM standard, which governs how medical images like CT scans and MRIs are stored and transmitted, includes specific lossless compression formats (such as JPEG-LS Lossless) precisely because diagnostic accuracy depends on preserving every detail. A radiologist examining a scan for a tiny tumor can’t afford to have compression artifacts blurring fine details. The question of when lossy compression might be acceptable in clinical contexts is explicitly left outside the scope of the standard, which signals how cautious the medical field is about data loss.
Software distribution is another area where lossless compression is non-negotiable. If even a single bit changes in a compiled program, it may crash or behave unpredictably. Every installer, update package, and app download uses lossless compression to ensure the software arrives intact. Financial records, legal documents, scientific datasets, and genomic sequences all demand the same guarantee.
Performance and Processing Cost
Lossless compression isn’t free. Both compressing and decompressing data require CPU time and memory. On a modern desktop or laptop, this cost is negligible for most tasks. But on embedded systems and low-power devices, decompression can become a real bottleneck. Research into hardware accelerators for embedded processors has achieved throughput of about 20.7 megabytes per second at modest clock speeds, which helps offload the decompression work and frees the main processor for other tasks.
Compression algorithms also present a speed-versus-ratio tradeoff. Faster algorithms produce slightly larger files; slower algorithms squeeze out every possible byte. Tools like ZSTD let you dial this balance with compression level settings. Level 1 is fast but less compact. Level 19 is slow but achieves near-maximum compression. For archival storage, you compress once and decompress rarely, so maximum compression makes sense. For real-time applications like streaming or web delivery, speed matters more than squeezing out the last few percent.
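The same dial exists in `zlib`, which ships with Python and makes a convenient stand-in for demonstrating the tradeoff (zlib's levels run 1 through 9 rather than ZSTD's wider range, but the shape is the same: higher level, smaller output, more time):

```python
import time
import zlib

data = b"The quick brown fox jumps over the lazy dog. " * 20_000

for level in (1, 6, 9):
    start = time.perf_counter()
    out = zlib.compress(data, level)
    elapsed = (time.perf_counter() - start) * 1000
    print(f"level {level}: {len(out):>7} bytes in {elapsed:.1f} ms")
```

On this artificially repetitive input every level does well; on messier real-world data the gap between levels is larger, which is exactly when choosing a level deliberately starts to matter.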
Why Some Files Barely Compress
If you’ve ever tried to ZIP a folder of JPEG photos or MP3 files and noticed the ZIP was barely smaller, this is why. Those files are already compressed using lossy algorithms that have stripped out redundancy. There’s very little left for a lossless algorithm to find. Compressing an already-compressed file is like wringing out a towel that’s already dry.
Encrypted files resist compression for a different reason. Good encryption makes data appear random, and random data has maximum entropy. Since lossless compression relies on finding patterns, and encryption deliberately eliminates patterns, the two work against each other. If you need to both compress and encrypt data, always compress first, then encrypt.
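A quick demonstration of why the order matters. The “encryption” here is a toy XOR keystream, standing in for a real cipher only because its output looks random the way genuine ciphertext does; it is not a secure construction:

```python
import os
import zlib


def xor_stream(data: bytes, keystream: bytes) -> bytes:
    # Toy stand-in for a cipher: XOR against a random keystream of equal
    # length. It makes the output look random, which is the property this
    # demo needs; do NOT use this for real security.
    return bytes(b ^ k for b, k in zip(data, keystream))


plaintext = b"confidential report: " * 2_000

# Right order: compress first, then encrypt the (small) result.
compressed = zlib.compress(plaintext)
good = xor_stream(compressed, os.urandom(len(compressed)))

# Wrong order: encrypt first; the ciphertext has no patterns left to find.
encrypted = xor_stream(plaintext, os.urandom(len(plaintext)))
bad = zlib.compress(encrypted)

print(f"compress-then-encrypt: {len(good)} bytes")
print(f"encrypt-then-compress: {len(bad)} bytes")
```

The compress-then-encrypt output is a tiny fraction of the original; the encrypt-then-compress output ends up slightly larger than the plaintext itself.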