What Is the Latency of a System and How to Reduce It?

Latency is the time it takes from the moment a request is made to the moment a response arrives. In a computer system, that covers everything: the nanoseconds your processor spends fetching data from memory, the milliseconds a data packet spends crossing a network, and the time a server takes to process your request and send something back. It’s typically measured as a round-trip delay, capturing the full journey from source to destination and back again.

What Latency Actually Measures

Think of latency as a stopwatch that starts when you click a link and stops when the page begins to appear. That single number, usually expressed in milliseconds, hides a chain of smaller delays that stack on top of each other. Every layer of a system, from the physical hardware inside your computer to the cables stretching across continents, contributes its own slice of delay.

This is different from throughput and bandwidth, two terms that often get confused with latency. Bandwidth is the width of the pipe: how much data it could carry at once. Throughput is how much data actually flows through that pipe in practice. Latency is how long a single drop of water takes to travel from one end to the other. You can have enormous bandwidth and still suffer high latency if the data has to travel a long physical distance or pass through congested equipment.
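The pipe analogy can be made concrete with a toy model: total fetch time is roughly the round-trip delay plus the time to push the bytes through the link. The link speeds and delays below are invented for illustration, not measurements.

```python
# A toy model of why bandwidth alone can't fix latency: total fetch time
# is (roughly) the round-trip delay plus the time the bytes spend on the wire.

def fetch_time_ms(size_bytes, bandwidth_bps, rtt_ms):
    """Approximate time to fetch `size_bytes` with one round trip."""
    transfer_ms = size_bytes * 8 / bandwidth_bps * 1000
    return rtt_ms + transfer_ms

# A 100 KB page over a 1 Gbit/s link with 80 ms of latency...
fat_pipe = fetch_time_ms(100_000, 1_000_000_000, rtt_ms=80)
# ...versus a 10 Mbit/s link with 5 ms of latency.
thin_pipe = fetch_time_ms(100_000, 10_000_000, rtt_ms=5)

print(round(fat_pipe, 1))   # 80.8 ms: dominated by latency
print(round(thin_pipe, 1))  # 85.0 ms: dominated by transfer time
```

Note that the gigabit link barely helps here: once the payload is small, the round trip dominates, which is exactly the "enormous bandwidth, high latency" case.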

Where Delay Comes From Inside Hardware

Before a request ever hits the network, your own computer introduces latency. When a processor needs a piece of data, it checks a series of memory layers, each one smaller and faster than the one below it. Fetching data from the fastest on-chip memory (called L1 cache) takes roughly 1 to 3 processor clock cycles. The next level of cache takes 8 to 13 cycles. If the data isn’t cached at all and must come from your computer’s main memory, the wait jumps to 100 to 400 cycles. At modern processor speeds, those cycles pass in nanoseconds, but they add up across millions of operations.
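The cycle counts above can be converted into wall-clock time. This back-of-the-envelope sketch assumes a hypothetical 4 GHz clock (0.25 ns per cycle); real chips vary.

```python
# Convert cycle counts into nanoseconds, assuming a 4 GHz processor.
CLOCK_HZ = 4_000_000_000  # assumed clock speed: 4 GHz

def cycles_to_ns(cycles):
    return cycles / CLOCK_HZ * 1e9

print(cycles_to_ns(3))    # L1 cache hit: 0.75 ns
print(cycles_to_ns(13))   # next-level cache hit: 3.25 ns
print(cycles_to_ns(400))  # main-memory access: 100 ns
```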

Storage introduces even more delay. Reading from a hard drive is orders of magnitude slower than reading from memory. Solid-state drives have narrowed that gap dramatically, but disk latency, the time between receiving a request and returning data from storage, remains one of the larger contributors to overall system response time on servers handling database queries or file requests.

The Stages of Network Latency

Once a request leaves your machine and enters a network, it passes through several distinct stages, each adding its own delay:

  • Queuing: The request waits in line for an available network connection. On a busy network, packets can sit in router buffers before being forwarded.
  • DNS resolution: Your computer looks up the domain name to find the server’s actual address. This lookup can add anywhere from a few milliseconds to over 100 ms if the result isn’t cached locally.
  • Connection setup: Your computer and the server perform a handshake to establish a reliable connection. If the connection is encrypted (which most are today), a second handshake negotiates security, adding another round trip.
  • Propagation: The data physically travels through cables or airwaves. Light in a fiber optic cable crosses the Atlantic in roughly 30 ms one way, and nothing can make it go faster. This is a hard physical limit.
  • Processing: The server receives the request, runs whatever logic is needed, queries databases, assembles a response. This is where application code and server hardware directly affect latency.
  • Transfer: The response data travels back to you. Larger responses take longer to transmit, especially on slower connections.

Each of these stages can be optimized independently, but none can be eliminated entirely. Total network latency is the sum of all of them.
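The additive nature of these stages can be sketched as a simple sum. The per-stage numbers below are illustrative placeholders for one hypothetical uncached request, not measurements of any real connection.

```python
# Total network latency is the sum of its stages. All numbers are
# made-up placeholders for a single uncached, encrypted request.
stages_ms = {
    "queuing": 2,
    "dns_resolution": 20,    # uncached lookup
    "connection_setup": 60,  # TCP handshake plus TLS handshake
    "propagation": 30,       # e.g. one transatlantic crossing
    "processing": 45,        # server-side work
    "transfer": 15,          # response bytes on the wire
}

total = sum(stages_ms.values())
print(f"total network latency: {total} ms")  # 172 ms
```

Optimizations attack individual terms (cache the DNS result, reuse the connection, move the server closer), but no term ever reaches zero.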

How Humans Perceive Latency

People don’t notice delay in a smooth, linear way. Research on user perception shows that latency stays essentially invisible within a certain range, then satisfaction drops sharply once a threshold is crossed. For direct interactions like tapping a button or scrolling a page, latency below about 100 to 300 ms feels responsive and maintains your sense of control over the system. Push beyond that range, and the interface starts to feel sluggish.

The acceptable range depends heavily on the task. Dragging an object across a screen or zooming into an image feels smooth at 300 ms or less, but users tolerate much higher delays for tasks they expect to take time. Transferring a file between devices, for instance, doesn’t generate widespread dissatisfaction until latency exceeds about 3,500 ms. For something like turning a page, though, anything over 500 to 700 ms feels broken to most people. The pattern is consistent: the more a task feels like a direct physical manipulation, the less delay users will accept.

Latency vs. Jitter

Latency measures the total travel time. Jitter measures how much that travel time varies from one packet to the next. Both cause problems, but they cause different kinds of problems.

High latency on a video stream means it takes a while to start playing, but once it’s going, playback can be perfectly smooth. High jitter means some packets arrive on time and others arrive late, causing stuttering audio, pixelated video, buffering pauses, and choppy animations in games. For voice and video calls, experts generally recommend keeping jitter under 30 ms. Gaming can tolerate jitter up to about 30 to 50 ms before it becomes noticeable. High latency can contribute to jitter, but the two don’t always travel together. A network can have consistently high latency with very low jitter, which is actually easier to work around than the reverse.
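One simple way to quantify jitter is the average absolute difference between consecutive packet latencies (a simplification of the smoothed estimator real-time protocols use). The sample latencies below are made up to show the contrast described above.

```python
# Jitter as the mean absolute difference between consecutive latencies.
def mean_jitter_ms(latencies):
    diffs = [abs(b - a) for a, b in zip(latencies, latencies[1:])]
    return sum(diffs) / len(diffs)

steady_but_slow = [120, 121, 120, 122, 121]  # high latency, low jitter
fast_but_erratic = [20, 70, 15, 90, 25]      # low latency, high jitter

print(mean_jitter_ms(steady_but_slow))   # 1.25 ms: easy to work around
print(mean_jitter_ms(fast_but_erratic))  # 61.25 ms: stutter territory
```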

Real-World Latency Targets

What counts as “good” latency depends entirely on the application. A typical website load that completes in under 200 ms feels fast. Online gaming becomes frustrating above roughly 100 ms. Video conferencing degrades noticeably when latency causes audio and video to fall out of sync or participants start talking over each other.

At the extreme end, high-frequency trading systems operate on a completely different scale. In that world, the difference between 10 ms and 1 ms of latency determines who wins a trade. Firms synchronize their servers using atomic clocks and the Precision Time Protocol to achieve sub-microsecond accuracy. Every nanosecond of advantage translates directly into money.

On the infrastructure side, 5G networks were designed with a latency target of 1 ms round trip for their most demanding tier, called Ultra-Reliable Low-Latency Communication. That target aims to support applications like remote surgery and autonomous vehicles, where even small delays have serious consequences. Early discussions around 6G push even further, targeting 0.2 ms round trip.

How Latency Gets Reduced

The most effective way to cut latency is to shorten the physical distance data has to travel. This is the core idea behind edge computing, which moves processing out of centralized data centers and closer to the user. Instead of sending data to a server hundreds of miles away, edge nodes handle it locally or regionally. Content delivery networks work on the same principle, caching copies of websites and media on servers distributed around the world so your request reaches the nearest one rather than the origin server.

Caching in general is one of the most powerful latency-reduction tools at every level of a system. The reason your processor checks L1 cache before main memory is the same reason a web browser checks its local cache before making a network request: avoiding a slow trip when the data is already close by.
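The same check-the-cache-first pattern appears at the application level. This minimal sketch simulates a slow fetch with a 100 ms sleep; `slow_fetch` and the cache key are invented names for illustration.

```python
# Check a fast local cache before making a slow trip.
import time

_cache = {}

def slow_fetch(key):
    time.sleep(0.1)  # stand-in for a ~100 ms network or disk round trip
    return f"value-for-{key}"

def cached_fetch(key):
    if key not in _cache:       # cache miss: pay the full latency once
        _cache[key] = slow_fetch(key)
    return _cache[key]          # cache hit: answered locally

start = time.perf_counter()
cached_fetch("user:42")         # first call: miss, ~100 ms
first = time.perf_counter() - start

start = time.perf_counter()
cached_fetch("user:42")         # second call: hit, microseconds
second = time.perf_counter() - start

print(f"miss: {first*1000:.0f} ms, hit: {second*1000:.3f} ms")
```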

At the network level, protocol optimizations reduce the number of round trips needed to establish connections. Connection reuse, where your browser keeps a connection to a server open for multiple requests rather than setting up a new one each time, eliminates repeated handshake delays. Compression reduces the amount of data that needs to travel, cutting transfer time. And simply choosing faster hardware (solid-state drives over spinning disks, faster network interfaces, more powerful processors) lowers the processing and disk latency components directly.
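The compression trade-off is easy to demonstrate: fewer bytes on the wire means less transfer time, at the cost of some CPU. The payload and the 10 Mbit/s link speed below are invented; real savings depend on how compressible the data is.

```python
# Compression trades CPU time for transfer time.
import gzip

# Repetitive markup, like most HTML, compresses very well.
payload = b"<div class='row'>repetitive markup compresses well</div>\n" * 500
compressed = gzip.compress(payload)

LINK_BPS = 10_000_000  # assumed 10 Mbit/s link

def transfer_ms(n_bytes):
    return n_bytes * 8 / LINK_BPS * 1000

print(len(payload), "->", len(compressed), "bytes")
print(round(transfer_ms(len(payload)), 1), "ms uncompressed")
print(round(transfer_ms(len(compressed)), 1), "ms compressed")
```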

How to Measure Latency

The simplest measurement tool is ping, available on every major operating system. It sends a small packet to a destination and reports the round-trip time in milliseconds. Traceroute goes a step further, showing the latency at each hop between you and the destination, which helps identify where delays are occurring.
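Ping itself uses ICMP packets, but the round-trip idea can be sketched in plain Python with a TCP echo over the loopback interface. This measures only the local network stack, so expect well under a millisecond; it is a toy, not a replacement for ping.

```python
# A minimal round-trip timer: echo a payload over loopback and time it.
import socket
import threading
import time

def echo_once(server):
    conn, _ = server.accept()
    conn.sendall(conn.recv(16))  # echo the payload straight back
    conn.close()

server = socket.create_server(("127.0.0.1", 0))  # OS picks a free port
threading.Thread(target=echo_once, args=(server,), daemon=True).start()

client = socket.create_connection(server.getsockname())
start = time.perf_counter()
client.sendall(b"ping")
reply = client.recv(16)
rtt_ms = (time.perf_counter() - start) * 1000
client.close()

print(reply, f"{rtt_ms:.3f} ms")  # round trip over loopback
```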

For web applications, browser developer tools break down every request into its component stages: queuing, DNS lookup, connection, TLS handshake, server wait time, and download. This granular view lets you see exactly which stage is contributing the most delay. On Linux systems, specialized tools like the hardware latency detector built into the kernel can identify delays caused by the underlying hardware or firmware, independent of the operating system itself. It works by spinning a thread on a CPU core and measuring any unexpected pauses, catching issues like system management interrupts that silently steal processing time.

For applications where latency is critical, measuring averages alone can be misleading. A system might average 50 ms but spike to 500 ms for one out of every hundred requests. Percentile measurements (the 95th or 99th percentile response time) give a much more honest picture of what your slowest users actually experience.
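The mean-versus-percentile gap is easy to see with synthetic data: a couple of slow outliers barely move the average but completely determine the 99th percentile. The nearest-rank percentile below is one of several standard definitions.

```python
# Why averages mislead: outliers vanish in the mean, not in the tail.
def percentile(values, p):
    """Nearest-rank percentile: the value at the p% position when sorted."""
    ranked = sorted(values)
    k = max(0, round(p / 100 * len(ranked)) - 1)
    return ranked[k]

# 98 fast requests and 2 slow outliers, in milliseconds.
samples = [50] * 98 + [500] * 2

print(sum(samples) / len(samples))  # mean: 59.0 ms, looks healthy
print(percentile(samples, 95))      # p95: 50 ms
print(percentile(samples, 99))      # p99: 500 ms, what the slowest users see
```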