Latency is the elapsed time between a system input (request/event) and the system output (response/result). On the internet, latency is commonly measured as the time it takes for data to travel between a client and a server (often as round-trip time, RTT).
Latency manifests in various forms across different technological domains:
- Network latency: The delay in data transmission across a network.
- System latency: The time taken for a computer system to process and respond to input.
- Application latency: The delay between a user’s action and the application’s response.
Measuring Latency
Latency is typically measured in milliseconds (ms), with lower values indicating better performance. Common tools for measuring latency include:
- Ping: A simple command-line tool that measures round-trip time to a specific destination.
- Traceroute: Shows the path data takes to reach its destination, revealing latency at each hop.
- Wireshark: A more advanced tool for detailed network analysis, including latency measurements.
When interpreting latency measurements, it’s crucial to consider the context. For instance, while 30ms of latency is excellent for loading a web page, it can be noticeable and even disruptive in fast-paced online gaming.
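For a rough programmatic measurement, the sketch below times a TCP connection setup, which costs roughly one network round trip (ICMP ping usually needs elevated privileges). The target host and port are placeholders.

```python
import socket
import time

def tcp_rtt_ms(host: str, port: int = 443, samples: int = 5) -> list[float]:
    """Time several TCP connection setups to approximate round-trip latency."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        # Establishing a TCP connection costs roughly one network round trip.
        with socket.create_connection((host, port), timeout=5):
            pass
        results.append((time.perf_counter() - start) * 1000)
    return results

if __name__ == "__main__":
    rtts = tcp_rtt_ms("example.com")  # placeholder host
    print(f"min={min(rtts):.1f} ms  avg={sum(rtts) / len(rtts):.1f} ms  max={max(rtts):.1f} ms")
```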
Components of Network Latency
Network latency comprises four main components:
- Propagation delay: The time it takes for a signal to travel from source to destination, limited by the speed of light.
- Transmission delay: The time required to push all the packet’s bits onto the link.
- Processing delay: The time routers and switches take to process the packet header.
- Queuing delay: The time a packet waits in a queue before being processed.
Understanding these components is crucial for identifying bottlenecks and optimizing network performance.
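As a back-of-the-envelope illustration of how the four components add up, the sketch below uses assumed values for packet size, link speed, path length, and per-hop delays; none of these numbers come from a real measurement.

```python
# All numbers below are assumptions for illustration:
# a 1,500-byte packet sent 3,000 km over a 100 Mbps link.
PACKET_BITS = 1500 * 8        # packet size in bits
BANDWIDTH_BPS = 100e6         # link speed: 100 Mbps
DISTANCE_M = 3_000_000        # path length: 3,000 km
SIGNAL_SPEED = 2e8            # ~2/3 the speed of light in fiber, in m/s

propagation_ms = DISTANCE_M / SIGNAL_SPEED * 1000      # distance / signal speed
transmission_ms = PACKET_BITS / BANDWIDTH_BPS * 1000   # bits / bandwidth
processing_ms = 0.05   # assumed per-hop header processing
queuing_ms = 2.0       # assumed wait in buffers under light load

total_ms = propagation_ms + transmission_ms + processing_ms + queuing_ms
print(f"propagation={propagation_ms:.2f} ms  transmission={transmission_ms:.3f} ms  "
      f"total≈{total_ms:.2f} ms")
```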
Factors Affecting Latency
Several factors influence latency:
- Distance: The physical distance between source and destination significantly impacts propagation delay.
- Network congestion: High traffic can increase queuing and processing delays.
- Hardware limitations: Outdated or underpowered devices can introduce processing delays.
- Software inefficiencies: Poorly optimized code or inefficient algorithms can contribute to application latency.
When to use latency optimization
Use latency optimization when you need faster response times or more consistent performance, especially for:
- User-facing web and mobile experiences (page loads, API calls, checkout)
- Real-time or interactive systems (gaming, VoIP, collaboration tools)
- High-volume APIs where small delays compound at scale
- Globally distributed users connecting to centralized infrastructure
- Event-driven architectures where chaining services increases end-to-end delay
When not to use latency optimization (yet)
Don’t start with latency tuning if the bigger issue is elsewhere:
- Your bottleneck is throughput (bandwidth/requests per second), not delay
- You’re failing on availability (errors, timeouts, crashes) more than speed
- Your problem is CPU-bound work (slow backend logic) and not network delay
- You lack baseline measurement (no tracing/metrics), so you can’t verify improvement
- Your main KPI is cost reduction and latency changes won’t move outcomes
Signals you need this (symptoms)
Common signs latency is hurting performance:
- Users report “slowness” even when uptime is high
- High Time to First Byte (TTFB) or slow API responses in certain regions
- Large gap between server processing time and user-perceived time
- Spiky performance during peak traffic (queueing and congestion effects)
- Retries, timeouts, or long-tail response times (p95/p99) getting worse
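If you already collect response-time samples, a few lines of Python surface the tail percentiles those symptoms refer to; the sample values below are invented for illustration.

```python
import statistics

# Made-up response times in milliseconds; note the two slow outliers.
samples_ms = [32, 35, 31, 40, 38, 36, 33, 210, 34, 37, 420, 35, 39, 33, 36]

cuts = statistics.quantiles(samples_ms, n=100)  # 99 percentile cut points
p50, p95, p99 = cuts[49], cuts[94], cuts[98]
print(f"p50={p50:.0f} ms  p95={p95:.0f} ms  p99={p99:.0f} ms")
# The median looks healthy, but the tail percentiles expose the outliers
# that users actually experience as "slowness".
```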
How latency works (in practical terms)
End-to-end latency usually comes from multiple layers:
- Client time: DNS lookup, TLS handshake, connection setup, device constraints
- Network time: distance, routing, peering, congestion
- Edge/CDN time (if used): cache lookup, edge compute, WAF checks
- Origin time: backend processing, database calls, upstream dependencies
- Response transfer time: payload size, compression, protocol efficiency
A helpful mental model: total latency = fixed costs (distance + handshakes) + variable costs (queueing + processing).
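To see where those layers show up for a single HTTPS request, the sketch below times the DNS lookup, TCP connect, TLS handshake, and time to first byte separately; the host is a placeholder and error handling is omitted for brevity.

```python
import socket
import ssl
import time

HOST, PORT = "example.com", 443   # placeholder target

t0 = time.perf_counter()
ip = socket.getaddrinfo(HOST, PORT, proto=socket.IPPROTO_TCP)[0][4][0]     # DNS lookup
t_dns = time.perf_counter()

raw = socket.create_connection((ip, PORT), timeout=5)                      # TCP handshake
t_tcp = time.perf_counter()

tls = ssl.create_default_context().wrap_socket(raw, server_hostname=HOST)  # TLS handshake
t_tls = time.perf_counter()

tls.sendall(f"GET / HTTP/1.1\r\nHost: {HOST}\r\nConnection: close\r\n\r\n".encode())
tls.recv(1)                                                                 # first response byte
t_ttfb = time.perf_counter()
tls.close()

def ms(a, b):
    return (b - a) * 1000

print(f"DNS {ms(t0, t_dns):.1f} ms | TCP {ms(t_dns, t_tcp):.1f} ms | "
      f"TLS {ms(t_tcp, t_tls):.1f} ms | TTFB {ms(t_tls, t_ttfb):.1f} ms")
```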
Components of network latency (quotable)
Network latency is typically the sum of:
- Propagation delay: signal travel time over distance (bounded by physics)
- Transmission delay: time to put packet bits onto the wire (depends on bandwidth)
- Processing delay: time for routers/hosts to inspect and process packets
- Queueing delay: time waiting in buffers under congestion (often the biggest swing)
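To illustrate why queueing delay swings so much, the sketch below applies the standard M/M/1 approximation (mean wait ≈ service time × utilization / (1 − utilization)); the 2ms service time is an assumption, not a measured value.

```python
# Illustrative only: mean waiting time in an M/M/1 queue grows sharply
# as utilization approaches 100%.
service_ms = 2.0  # assumed average time to serve one packet/request

for utilization in (0.5, 0.7, 0.9, 0.95, 0.99):
    wait_ms = service_ms * utilization / (1 - utilization)
    print(f"utilization {utilization:>4.0%}: average queueing delay ≈ {wait_ms:7.1f} ms")
```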
Latency in Different Contexts
Network Latency
Network latency varies across different types of connections:
- Internet latency: Typically ranges from 20-100ms for broadband connections, depending on distance and network conditions.
- Local network latency: Usually under 1ms for wired connections and 1-10ms for Wi-Fi.
- Mobile network latency: Can range from 20-100ms for 4G networks, with 5G promising sub-10ms latencies.
System and Application Latency
Beyond network considerations, latency also manifests at the system and application levels:
- Server latency: The time a server takes to process requests and generate responses.
- Database latency: The delay in retrieving or writing data to a database.
- Application response time: The overall delay users experience when interacting with an application.
Latency in Specific Technologies
Emerging technologies are pushing the boundaries of low-latency performance:
- Cloud computing latency: While cloud services offer scalability, they can introduce latency due to geographical distance.
- Edge computing: By processing data closer to its source, whether that source is a real-time application or an end user, edge computing significantly reduces latency in a variety of use cases and creates room for new ones.
- 5G networks: Promise ultra-low latencies of 1ms or less, enabling new use cases in augmented reality, autonomous vehicles, and more. Achieving this relies on highly distributed technologies such as edge computing.
The Impact of Latency on User Experience
Latency-Sensitive Applications
Some applications are particularly sensitive to latency:
- Online gaming: High latency can lead to “lag,” severely impacting gameplay and user satisfaction.
- Video streaming: Latency can cause buffering issues and affect the quality of live streams.
- Virtual and augmented reality: Low latency is crucial for maintaining immersion and preventing motion sickness.
- Voice over IP (VoIP): High latency can lead to echoes, talk-overs, and poor call quality.
Latency in Business-Critical Operations
In the business world, latency can have significant financial implications:
- Financial services and High-frequency trading: Even microseconds of latency can make the difference between profit and loss.
- E-commerce transactions: Slow page load times due to latency can lead to abandoned carts and lost sales.
- Real-time analytics: Low latency is essential for making timely decisions based on streaming data.
- Industrial IoT: In manufacturing and process control, low latency is crucial for safety and efficiency.
Strategies for Reducing Latency
Network-Level Optimizations
Several strategies can be employed to reduce network latency:
- Content delivery at the edge: Going a step beyond legacy Content Delivery Networks (CDNs), edge computing platforms process data closer to its source and cache content closer to users, drastically reducing latency and the need for long-distance data transmission.
- Load balancing: Distributing traffic across multiple servers helps prevent congestion and reduce latency.
- Caching mechanisms: Storing frequently accessed data in memory reduces the need for time-consuming database queries (see the sketch after this list).
- Protocol optimizations: Technologies like HTTP/2 and QUIC improve efficiency in data transmission.
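As a minimal sketch of the caching idea above, the example below memoizes a slow lookup in memory; fetch_product and its 50ms delay are hypothetical stand-ins for a database call.

```python
import time
from functools import lru_cache

@lru_cache(maxsize=1024)
def fetch_product(product_id: int) -> dict:
    time.sleep(0.05)  # simulates a 50 ms database round trip
    return {"id": product_id, "name": f"Product {product_id}"}

start = time.perf_counter()
fetch_product(42)                      # cache miss: pays the full lookup cost
miss_ms = (time.perf_counter() - start) * 1000

start = time.perf_counter()
fetch_product(42)                      # cache hit: served from memory
hit_ms = (time.perf_counter() - start) * 1000

print(f"miss: {miss_ms:.1f} ms, hit: {hit_ms:.3f} ms")
```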
Hardware and Infrastructure Improvements
Investing in infrastructure can yield substantial latency reductions:
- Fiber-optic networks: Offer lower latency and higher bandwidth compared to traditional copper cables.
- 5G and Wi-Fi 6 adoption: These new wireless standards promise significantly lower latencies.
- Edge computing deployment: Processing data closer to its source reduces the need for long-distance data transmission.
- Low-latency hardware components: Specialized network interface cards and switches can shave off crucial milliseconds.
Software and Application Optimizations
Developers play a crucial role in minimizing latency:
- Efficient coding practices: Writing optimized code and using appropriate data structures can reduce processing time.
- Microservices architecture: Breaking applications into smaller, independently deployable services can improve responsiveness.
- Database query optimization: Well-designed indexes and efficient queries can significantly reduce database latency.
- Asynchronous processing: Handling time-consuming tasks asynchronously prevents them from blocking the main application thread (see the sketch after this list).
- Edge native applications: Building applications at the edge, running serverless functions, and persisting data in decentralized edge databases transforms businesses and the way users experience applications.
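As a minimal sketch of the asynchronous-processing idea above, the example below schedules a slow task in the background so the request handler can respond immediately; the function names are illustrative rather than a real framework API.

```python
import asyncio

background_tasks: set[asyncio.Task] = set()

async def send_confirmation_email(order_id: int) -> None:
    await asyncio.sleep(2)  # stands in for a slow external call (e.g., an email API)
    print(f"confirmation email for order {order_id} sent")

async def handle_checkout(order_id: int) -> str:
    # Schedule the slow work in the background instead of awaiting it,
    # so the caller gets a response immediately.
    task = asyncio.create_task(send_confirmation_email(order_id))
    background_tasks.add(task)                       # keep a reference so the task isn't dropped
    task.add_done_callback(background_tasks.discard)
    return f"order {order_id} accepted"

async def main() -> None:
    print(await handle_checkout(1001))   # responds right away
    await asyncio.sleep(3)               # keep the loop alive so the demo task finishes

asyncio.run(main())
```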
Mini FAQ
**Is latency the same as bandwidth?** No. Bandwidth is how much data can be transferred per second; latency is the delay before data starts to arrive.
**What’s the difference between latency and response time?** Latency is the delay between request and response; response time usually includes server processing plus network delays (often used as an end-to-end metric).
**What’s more important: p50 or p95 latency?** For user experience and reliability, p95/p99 often matter more because slow outliers drive frustration and timeouts.
**Does caching always reduce latency?** It reduces latency when cache hit rates are high and cache keys are correct. Poor caching can increase complexity and cause stale or incorrect responses.
**Why is latency worse in some countries/regions?** Usually due to distance to origin, routing/peering differences, or local network congestion. Multi-region and edge delivery are common mitigations.
Glossary (quick)
- RTT: round-trip time between client and server
- TTFB: time to first byte of the response
- p95/p99: 95th/99th percentile latency (tail performance)
- Queueing delay: wait time when systems are saturated
As new use cases push the boundaries of low-latency communication, several challenges and opportunities emerge in our hyper-connected economy. Edge computing is the future, available to everyone today. Visit the Azion Learning Center to learn more.