What is Load Balancing?
Load balancing distributes incoming network traffic across multiple servers to ensure no single server bears too much demand. Load balancers act as traffic managers, optimizing resource utilization, maximizing throughput, and ensuring high availability.
Last updated: 2026-03-25
How Load Balancing Works
Load balancers sit between clients and backend servers, intercepting requests and routing them to available servers based on algorithms and health checks.
Basic Workflow:
- Client request: User sends HTTP request to website
- Load balancer intercepts: Request arrives at load balancer
- Health check verification: Balancer confirms server availability
- Algorithm selection: Load balancer chooses optimal server based on configured algorithm
- Request routing: Traffic forwarded to selected server
- Response: Server processes request and returns response through load balancer
- Health monitoring: Load balancer continues monitoring server health
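The routing steps above can be sketched in a few lines of Python. This is a minimal illustration, not any product's API; names like `LoadBalancer` and `mark_unhealthy` are invented for the example, and the algorithm step here is a simple rotation that skips servers that failed health checks:

```python
import itertools

class LoadBalancer:
    """Minimal sketch of the workflow: health checks gate which servers
    the rotation is allowed to pick."""

    def __init__(self, servers):
        self.servers = servers                    # backend pool
        self.healthy = set(servers)               # updated by health monitoring
        self._rotation = itertools.cycle(servers) # algorithm: simple rotation

    def mark_unhealthy(self, server):
        # Health monitoring step: remove a failing server from rotation
        self.healthy.discard(server)

    def route(self, request):
        # Try each server at most once per request
        for _ in range(len(self.servers)):
            server = next(self._rotation)
            if server in self.healthy:            # health check verification
                return f"{server} handled {request}"
        raise RuntimeError("no healthy backends available")

lb = LoadBalancer(["app1", "app2"])
lb.mark_unhealthy("app2")
print(lb.route("GET /"))  # only app1 is eligible
```

A real balancer would run the health checks asynchronously and forward the request over the network; the gating logic, however, looks much like this.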
Load Balancer Types:
Layer 4 (Transport Layer):
- Routes based on IP address and port
- Faster processing (doesn’t inspect content)
- Example: TCP load balancing for databases
Layer 7 (Application Layer):
- Routes based on HTTP headers, URLs, cookies
- Content-aware routing
- Example: HTTP load balancing for web applications
When to Use Load Balancing
Use load balancing when you:
- Run applications requiring high availability (99.9%+ uptime)
- Experience traffic spikes that overwhelm single servers
- Need horizontal scaling across multiple servers
- Require zero-downtime deployments
- Serve global audiences from multiple data centers
- Need SSL termination at the edge
- Want automatic failover when servers fail
Do not use load balancing when:
- Running simple applications with predictable, low traffic
- Single server handles all traffic with headroom
- Cost constraints outweigh availability requirements
- Application architecture doesn’t support distributed state
- Testing or development environments with single instances
Signals You Need Load Balancing
- Server CPU consistently exceeds 80% utilization
- Single points of failure causing downtime
- Slow response times during traffic peaks
- Application requires 99.9%+ availability SLA
- Users distributed globally experiencing latency
- Need to perform maintenance without downtime
- Traffic spikes during promotions or events
- Database connection pool exhausted on a single server
Load Balancing Algorithms
Round Robin
How it works: Requests distributed sequentially to each server in rotation.
Best for: Servers with similar specifications and capabilities.
Advantages:
- Simple implementation
- Fair distribution
- No configuration complexity
Limitations:
- Doesn’t account for server capacity differences
- Doesn’t consider current load
- Less effective with heterogeneous servers
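Round Robin is simple enough to show in full. This sketch (class and server names are illustrative) keeps a counter and wraps around the pool:

```python
class RoundRobin:
    """Sketch of round robin: servers are picked in a fixed rotation."""

    def __init__(self, servers):
        self.servers = servers
        self.i = 0  # position in the rotation

    def pick(self):
        server = self.servers[self.i % len(self.servers)]
        self.i += 1
        return server

rr = RoundRobin(["web1", "web2", "web3"])
print([rr.pick() for _ in range(6)])  # each server twice, in order
```

Note what the counter does not consult: server capacity or current load, which is exactly the limitation listed above.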
Weighted Round Robin
How it works: Assigns weights to servers based on capacity. Higher-weight servers receive more requests.
Best for: Heterogeneous server environments with varying capacities.
Example:
- Server A (weight 5): Handles 50% of traffic
- Server B (weight 3): Handles 30% of traffic
- Server C (weight 2): Handles 20% of traffic
Advantages:
- Accounts for server capacity differences
- Granular control over distribution
- No need for real-time monitoring
Limitations:
- Static weights require manual adjustment
- Doesn’t respond to real-time server load
- Requires capacity planning
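The 5/3/2 example above can be sketched by expanding each server into the rotation in proportion to its weight (a deliberately naive scheme; production balancers such as nginx use a "smooth" weighted rotation that interleaves picks instead of emitting them in runs):

```python
from collections import Counter

class WeightedRoundRobin:
    """Sketch: each server appears in the rotation once per unit of weight."""

    def __init__(self, weights):
        # weights like {"A": 5, "B": 3, "C": 2} -> A receives 50% of picks
        self.schedule = [s for s, w in weights.items() for _ in range(w)]
        self.i = 0

    def pick(self):
        server = self.schedule[self.i % len(self.schedule)]
        self.i += 1
        return server

wrr = WeightedRoundRobin({"A": 5, "B": 3, "C": 2})
print(Counter(wrr.pick() for _ in range(100)))  # A: 50, B: 30, C: 20
```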
Least Connections
How it works: Routes requests to server with fewest active connections.
Best for: Long-lived connections (WebSocket, persistent HTTP), varying request durations.
Advantages:
- Responds to real-time load
- Better for varying request times
- Prevents server overload
Limitations:
- Requires connection tracking overhead
- Less effective for short-lived requests
- Complex implementation
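The connection-tracking overhead mentioned above is visible in even a minimal sketch: the balancer must be told both when a connection opens and when it closes (method names here are illustrative):

```python
class LeastConnections:
    """Sketch: route each new request to the server with fewest open connections."""

    def __init__(self, servers):
        self.active = {s: 0 for s in servers}  # open-connection count per server

    def acquire(self):
        # Pick the least-loaded server and count the new connection against it
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1  # must be called when the connection closes

lc = LeastConnections(["ws1", "ws2"])
a = lc.acquire()     # ws1 (tie broken by pool order)
b = lc.acquire()     # ws2
lc.release(a)        # ws1's long-lived connection ends
print(lc.acquire())  # ws1 again: it now has the fewest connections
```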
IP Hash
How it works: Uses client IP address to determine which server receives request. Same IP always routes to same server (session persistence).
Best for: Applications requiring session affinity without external session storage.
Advantages:
- Maintains session persistence
- Stateless load balancer
- No session storage needed
Limitations:
- Uneven distribution when many clients share a few IPs (e.g., behind NAT or corporate proxies)
- Server failure breaks sessions for affected IPs
- Not ideal for distributed session architectures
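A hash of the client IP modulo the pool size is the simplest form of this, sketched below. Note the caveat in the comment: with a plain modulo, changing the pool size remaps most clients, which is why production systems often use consistent hashing instead:

```python
import hashlib

def pick_server(client_ip, servers):
    """Sketch of IP hash: the same client IP always maps to the same server.
    A plain modulo remaps most clients when the pool size changes;
    consistent hashing is commonly used to limit that churn."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]

servers = ["s1", "s2", "s3"]
# Repeated lookups for one IP are deterministic (session persistence)
print(pick_server("203.0.113.7", servers) == pick_server("203.0.113.7", servers))  # True
```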
Least Response Time
How it works: Routes to server with fastest response times and fewest active connections.
Best for: Performance-critical applications requiring optimal user experience.
Advantages:
- Optimizes for performance
- Responds to server degradation
- Improves user experience
Limitations:
- Requires active monitoring
- Higher overhead
- Complex configuration
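One common way to combine the two signals (this scoring formula and the smoothing factor are illustrative choices, not a standard) is to weight a moving average of latency by the number of active connections:

```python
class LeastResponseTime:
    """Sketch: score each server as avg response time x (active connections + 1)
    and pick the lowest score. Alpha controls how fast the average adapts."""

    def __init__(self, servers):
        self.avg_rt = {s: 0.0 for s in servers}  # smoothed latency per server
        self.active = {s: 0 for s in servers}    # open connections per server

    def pick(self):
        return min(self.avg_rt, key=lambda s: self.avg_rt[s] * (self.active[s] + 1))

    def record(self, server, seconds, alpha=0.2):
        # Exponentially weighted moving average of observed response times
        self.avg_rt[server] = (1 - alpha) * self.avg_rt[server] + alpha * seconds

lrt = LeastResponseTime(["fast", "slow"])
lrt.record("fast", 0.05)
lrt.record("slow", 0.80)
print(lrt.pick())  # "fast"
```

The `record` calls are the "active monitoring" overhead listed above: the balancer must observe every response to keep the averages current.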
Health Checks
Active Health Checks:
- Load balancer periodically sends requests to servers
- Configurable intervals (typically 5-30 seconds)
- HTTP health check: GET /health endpoint, expect 200 OK
- TCP health check: Attempt connection on specified port
Passive Health Checks:
- Monitor real traffic responses
- Detect failures based on actual requests
- Immediate response to failures
- No additional traffic generated
Health Check Configuration:
- Interval: Time between checks (5-30 seconds typical)
- Timeout: Max time to wait for response (2-10 seconds)
- Unhealthy threshold: Consecutive failures before marking unhealthy (2-5 attempts)
- Healthy threshold: Consecutive successes before marking healthy (2-3 attempts)
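The threshold logic above is a small state machine: a server flips to unhealthy only after N consecutive failures, and back to healthy only after M consecutive successes. A sketch (class and parameter names are illustrative):

```python
class HealthTracker:
    """Sketch of threshold-based health state for one backend server."""

    def __init__(self, unhealthy_after=3, healthy_after=2):
        self.unhealthy_after = unhealthy_after  # consecutive failures to flip down
        self.healthy_after = healthy_after      # consecutive successes to flip up
        self.healthy = True
        self.fail_streak = 0
        self.ok_streak = 0

    def observe(self, check_passed):
        if check_passed:
            self.ok_streak += 1
            self.fail_streak = 0
            if not self.healthy and self.ok_streak >= self.healthy_after:
                self.healthy = True
        else:
            self.fail_streak += 1
            self.ok_streak = 0
            if self.healthy and self.fail_streak >= self.unhealthy_after:
                self.healthy = False
        return self.healthy

tracker = HealthTracker(unhealthy_after=3, healthy_after=2)
for result in (False, False, False):
    tracker.observe(result)
print(tracker.healthy)  # False after three consecutive failures
```

Requiring consecutive results in both directions is what prevents a single dropped packet from ejecting a server, and a single lucky response from readmitting a broken one.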
Session Persistence (Sticky Sessions)
What it is: Ensures client requests route to same backend server throughout session.
Use cases:
- Shopping carts stored locally on server
- User sessions not distributed across servers
- WebSocket connections requiring same server
- Legacy applications without distributed sessions
Implementation Methods:
Cookie-based persistence:
- Load balancer inserts cookie identifying server
- Subsequent requests include cookie
- Server selection based on cookie value
IP-based persistence:
- Client IP determines server (IP Hash algorithm)
- Stateless implementation
- Works without cookies
Session ID persistence:
- Extract session ID from URL parameter or header
- Route based on session ID
- Requires session ID in every request
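Cookie-based persistence, the first method above, reduces to a small decision: honor a valid pin if the cookie carries one, otherwise run the normal algorithm and emit a cookie. In this sketch the cookie name `lb_server` is hypothetical (real balancers use their own, usually configurable, names):

```python
def route_with_sticky_cookie(cookies, servers, fallback_pick):
    """Sketch of cookie-based persistence. Returns (server, cookie_to_set)."""
    pinned = cookies.get("lb_server")
    if pinned in servers:
        return pinned, None                  # valid pin: reuse the same backend
    server = fallback_pick(servers)          # no valid cookie: normal algorithm
    return server, ("lb_server", server)     # instruct client to remember it

servers = ["app1", "app2"]
first, set_cookie = route_with_sticky_cookie({}, servers, lambda s: s[0])
again, _ = route_with_sticky_cookie({"lb_server": first}, servers, lambda s: s[1])
print(first == again)  # True: the cookie pins the client to one backend
```

The `pinned in servers` guard also shows the failure mode listed under IP Hash: if the pinned server leaves the pool, the pin is silently dropped and the client is reassigned, losing any server-local session state.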
Best Practice: Use external session storage (Redis, database) instead of relying on sticky sessions for modern applications.
Metrics and Measurement
Load Balancer Performance:
- Throughput: Requests per second handled
  - Enterprise load balancers: 100K-1M+ RPS
  - Cloud load balancers: Scale automatically
- Latency: Additional delay introduced by load balancer
  - Target: <1ms for Layer 4, <5ms for Layer 7
- Connection capacity: Concurrent connections supported
  - Typical: 100K-1M concurrent connections
Server Performance:
- Server utilization: CPU, memory, network per server
  - Target: 60-80% utilization with headroom
- Request distribution: Variance across servers
  - Target: <10% variance in ideal distribution
- Response time: Average and p95, p99 latencies
  - Target: Consistent across all servers
Availability:
- Uptime: Percentage of time service available
  - 99.9% (three nines): 8.76 hours downtime/year
  - 99.99% (four nines): 52.6 minutes downtime/year
  - 99.999% (five nines): 5.26 minutes downtime/year
- Failover time: Duration to reroute traffic from failed server
  - Target: <30 seconds with active health checks
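The downtime budgets for each availability tier follow directly from the length of a (non-leap) year, which a few lines make explicit:

```python
def downtime_minutes_per_year(availability_pct):
    """Allowed downtime per year at a given availability percentage."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes in a non-leap year
    return minutes_per_year * (1 - availability_pct / 100)

for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_minutes_per_year(pct):.2f} min/year")
# 99.9%   -> 525.60 min/year (8.76 hours)
# 99.99%  ->  52.56 min/year
# 99.999% ->   5.26 min/year
```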
Business Impact:
- Cost per request: Total infrastructure cost / requests handled
- Revenue at risk: Potential loss during outages
- Capacity headroom: Percentage of unused capacity
  - Target: 20-40% headroom for traffic spikes
According to Gartner, the average cost of IT downtime is $5,600 per minute. Load balancers with automatic failover can reduce downtime by 90%+.
Common Mistakes and Fixes
Mistake: Using Round Robin with heterogeneous servers.
Fix: Use Weighted Round Robin or Least Connections. Assign weights based on server capacity.
Mistake: Health check interval too short.
Fix: Set intervals to 5-10 seconds minimum. Too-frequent checks cause unnecessary overhead and false positives.
Mistake: Not configuring fallback servers.
Fix: Always have backup servers. Configure failover to a backup data center or cloud region.
Mistake: Relying on session persistence instead of distributed sessions.
Fix: Use external session storage (Redis, Memcached). Design stateless applications. Sticky sessions complicate scaling.
Mistake: Single load balancer becomes a bottleneck.
Fix: Use multiple load balancers with an anycast IP or DNS round robin. Consider cloud load balancers that scale automatically.
Mistake: Ignoring SSL termination overhead.
Fix: Terminate SSL at the load balancer to reduce backend server CPU. Use hardware SSL acceleration for high throughput.
Mistake: Not testing failover scenarios.
Fix: Regularly test server failure scenarios. Simulate load balancer failure. Practice disaster recovery procedures.
Mistake: Load balancer and application health checks mismatched.
Fix: Ensure load balancer health checks match application health endpoints. A health check should verify the application is truly functional, not just responding.
Frequently Asked Questions
What is the difference between Layer 4 and Layer 7 load balancing? Layer 4 (transport layer) routes based on IP and port without inspecting content. Layer 7 (application layer) routes based on HTTP content like URLs, headers, and cookies. Layer 4 is faster; Layer 7 enables content-based routing.
How many servers do I need for load balancing? Minimum 2 servers for redundancy. Practical minimum depends on traffic volume and desired capacity headroom. Most production deployments use 3+ servers to handle failures while maintaining capacity.
Does load balancing add latency? Yes, minimal latency. Layer 4 load balancers add <1ms. Layer 7 load balancers add 1-5ms. This overhead is negligible compared to benefits of high availability and performance.
What happens if the load balancer fails? Single load balancer is a single point of failure. Use redundant load balancers in active-passive or active-active configuration. Cloud load balancers typically have built-in redundancy.
Can I use multiple load balancing algorithms simultaneously? Yes. Many load balancers support different algorithms per listener or virtual server. For example, use Least Connections for API endpoints and Round Robin for static content.
How does load balancing affect SSL/TLS? Load balancers can terminate SSL, reducing backend server CPU load. Alternatively, pass-through SSL sends encrypted traffic directly to backend servers. Choose based on security requirements and performance needs.
What is the difference between load balancing and clustering? Load balancing distributes traffic across independent servers. Clustering groups servers to work as a single system with shared state. Load balancing is simpler; clustering provides tighter integration but more complexity.
How do I choose the right load balancing algorithm? Use Round Robin for homogeneous servers. Use Weighted Round Robin for heterogeneous capacity. Use Least Connections for varying request durations. Use IP Hash when session persistence required. Test algorithms under realistic traffic patterns.
What is DNS load balancing and how is it different? DNS load balancing distributes traffic at DNS resolution level, returning different IPs for same hostname. It’s coarse-grained, doesn’t account for real-time server load, and caching affects distribution. Use DNS load balancing for geographic distribution, application load balancers for fine-grained control.
How does load balancing work with microservices? Each microservice can have its own load balancer. Service mesh (Istio, Linkerd) provides load balancing between services. API gateways load balance external traffic to microservices.
How This Applies in Practice
Load balancing transforms application architecture from single-server to distributed systems:
High Availability Architecture:
- Multiple servers across availability zones
- Automatic failover within seconds
- Zero-downtime deployments through rolling updates
- Graceful degradation during partial outages
Performance Optimization:
- Horizontal scaling as traffic grows
- Geographic distribution reduces latency
- SSL termination optimizes backend resources
- Caching at load balancer reduces backend load
Operational Benefits:
- Blue-green deployments for zero downtime
- Canary releases for testing new versions
- A/B testing through content-based routing
- Circuit breaker patterns prevent cascade failures
Cost Efficiency:
- Scale horizontally with commodity servers
- Pay-per-use cloud load balancers
- Reduce over-provisioning through dynamic scaling
- Minimize downtime costs
Load Balancing on Azion
Azion provides comprehensive load balancing through your Application:
- Configure Load Balancer in your Application
- Add origin servers with health checks
- Select algorithm: Round Robin, Weighted Round Robin, Least Connections
- Configure health checks: HTTP endpoints, intervals, thresholds
- Enable session persistence via cookies or IP hash
- Set up geographic load balancing across edge locations
- Monitor performance through Real-Time Metrics
Azion’s distributed network provides load balancing at 100+ global locations with automatic failover and intelligent routing.
Learn more about Application Acceleration and Application.
Related Resources
Sources:
- Gartner. “The Cost of Downtime.” https://www.gartner.com/newsroom/press-releases
- RFC 7230. “Hypertext Transfer Protocol (HTTP/1.1).” https://tools.ietf.org/html/rfc7230