What is Request Coalescing?

Request Coalescing protects your origin server by grouping identical concurrent requests into a single upstream call — preventing Thundering Herd attacks and resource exhaustion.

Request Coalescing — also known as request collapsing or request deduplication — is a technique that groups identical concurrent requests into a single upstream call.

When thousands of users access the same resource simultaneously, Request Coalescing ensures that only one request for that resource reaches your origin server at a time. All other identical requests wait for that single response, which is then shared among all waiting clients.

This technique serves a critical security function: origin protection against resource exhaustion attacks.

The security problem: Thundering Herd

Thundering Herd, also called cache stampede or the dogpile effect, is a failure mode in which many requests for the same uncached resource reach the origin at once. Attackers can trigger it deliberately to exhaust origin server resources.

How Thundering Herd works

A heavily accessed resource expires from cache
Multiple simultaneous requests detect the cache miss
All requests independently attempt to regenerate the data
The origin server receives an explosion of identical requests
Backend resources (CPU, memory, connections) become saturated
Legitimate requests fail or time out
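The sequence above can be reproduced in a few lines. This is a minimal sketch, not a model of any particular cache: `simulate_stampede`, `fetch_from_origin`, and the key `product:42` are hypothetical names, and a barrier is used only to force every request to observe the cache miss before any of them repopulates the cache.

```python
import threading
import time

def simulate_stampede(n_requests=50):
    """Return how many origin calls n_requests concurrent cache misses cause."""
    cache = {}                      # the hot key has just expired
    origin_calls = 0
    lock = threading.Lock()
    all_missed = threading.Barrier(n_requests)

    def fetch_from_origin(key):
        # Hypothetical stand-in for an expensive backend call.
        nonlocal origin_calls
        with lock:
            origin_calls += 1
        time.sleep(0.01)            # simulated regeneration cost
        return f"value-for-{key}"

    def handle_request(key):
        miss = key not in cache     # every request detects the miss
        all_missed.wait()           # hold until all requests have checked
        if miss:                    # no coordination: each one regenerates
            cache[key] = fetch_from_origin(key)

    threads = [threading.Thread(target=handle_request, args=("product:42",))
               for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return origin_calls
```

Without any coordination, the number of origin calls equals the number of concurrent requests.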

Attack vector

Malicious actors can exploit this vulnerability by:

Coordinated cache invalidation: triggering mass cache misses
Timed requests: sending bursts of requests just as a cached entry is about to expire
Resource exhaustion: using Thundering Herd to amplify a DDoS attack

The result is predictable: backend saturation, increased latency, timeouts, and service degradation — exactly what attackers aim to achieve.

How Request Coalescing protects your origin

Request Coalescing acts as a security control mechanism in your distributed infrastructure.

The protection flow

1. First request becomes the leader: The initial request identifies a cache miss and proceeds to the origin.
2. Subsequent requests are grouped: Instead of triggering new calls to the backend, identical requests wait for the leader’s response.
3. Single response feeds all requests: When the origin responds, the result is reused for all waiting users, and the cache is repopulated.
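The leader and follower flow can be sketched as follows. This is an illustrative, thread-based version under stated assumptions: the `Coalescer` class and its `fetch` method are hypothetical names, the cache is a plain dictionary with no TTL or eviction, and `load` stands for whatever function calls your origin.

```python
import threading
from concurrent.futures import Future

class Coalescer:
    """One upstream call per key at a time; followers share the leader's result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cache = {}        # simplified cache: no TTL or eviction
        self._in_flight = {}    # key -> Future shared by leader and followers

    def fetch(self, key, load):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            future = self._in_flight.get(key)
            leader = future is None
            if leader:          # first request for this key becomes the leader
                future = self._in_flight[key] = Future()
        if not leader:
            return future.result()      # follower: wait for the leader
        try:
            value = load(key)           # the single call to the origin
        except BaseException as exc:
            with self._lock:
                del self._in_flight[key]
            future.set_exception(exc)   # followers see the same error
            raise
        with self._lock:
            self._cache[key] = value    # repopulate the cache
            del self._in_flight[key]
        future.set_result(value)        # wake every waiting follower
        return value
```

Checking the cache and the in-flight table under the same lock is what keeps a late request from starting a second origin call: it either finds the cached value or joins the existing flight.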

Security benefits

1. Attack surface reduction: By collapsing requests, you reduce thousands of potential origin calls for the same resource to a single call.

2. Resource exhaustion prevention: The backend stops receiving waves of identical calls, which prevents CPU, memory, and connection saturation.

3. DDoS mitigation: Request Coalescing acts as a first-line defense against certain DDoS patterns that rely on request amplification.

4. Cascading failure prevention: When the backend collapses under redundant load, the problem often spreads to other services. Coalescing helps prevent this cascade.

Request Coalescing vs. other security controls

Control	Protection Type	Layer
WAF	Application-layer attacks	Layer 7
DDoS Protection	Volumetric attacks	Layer 3/4/7
Rate Limiting	Request frequency	Layer 7
Request Coalescing	Request amplification	Layer 7

Request Coalescing complements other security controls; it does not replace them. It specifically addresses the request amplification vector that other controls may miss.

When to use Request Coalescing for security

Use Request Coalescing when:

Your origin has limited resources and can’t handle burst traffic
You’re protecting high-concurrency endpoints
Cache expiration could trigger mass origin requests
You need to defend against resource exhaustion attacks
Your infrastructure serves frequently requested, read-heavy data

Common protected endpoints

Product catalog and pricing APIs
Shipping calculation services
Promotion and discount validation
Stock availability checks
Heavily accessed static content

When NOT to use Request Coalescing

Avoid applying Request Coalescing when:

The operation is highly personalized per user
Data must be processed individually per request
The response cannot be safely shared between users
The endpoint represents a non-idempotent transactional action

Request Coalescing is designed for read operations and idempotent requests. It should not be applied to write operations or transactional endpoints without careful consideration.
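One way to encode these rules is to decide, per request, whether a shared key exists at all. The sketch below is an assumption-laden example rather than a complete policy: `coalescing_key` is a hypothetical helper, and treating any request that carries an `Authorization` or `Cookie` header as personalized is a deliberately conservative choice.

```python
SAFE_METHODS = {"GET", "HEAD"}                 # read-only methods
PER_USER_HEADERS = {"authorization", "cookie"} # assumed markers of personalization

def coalescing_key(method, url, headers):
    """Return a shared key for coalescable requests, or None to bypass."""
    names = {name.lower() for name in headers}
    if method.upper() not in SAFE_METHODS:
        return None          # writes and transactions are never coalesced
    if names & PER_USER_HEADERS:
        return None          # response may differ per user
    return f"{method.upper()} {url}"
```

For example, `coalescing_key("GET", "/catalog/42", {})` yields a key, while a `POST` to a checkout endpoint or a `GET` that carries a session cookie yields `None` and goes to the origin individually.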

Request Coalescing in a security architecture

Request Coalescing works best as part of a defense-in-depth strategy:

Layer 1: Distributed Infrastructure Protection

WAF rules filter malicious requests
Rate limiting controls request frequency
Request Coalescing prevents amplification

Layer 2: Cache Layer

Cached content reduces origin load
Cache invalidation is coordinated
TTL policies balance freshness and protection

Layer 3: Origin Shield

Request Coalescing protects during cache misses
Origin has additional rate limiting
Health checks detect origin stress

Real example: protecting against flash sale attacks

During flash sales, attackers may attempt to overwhelm your origin by:

Sending thousands of simultaneous requests for the same product
Timing requests to hit when cache expires
Exploiting the Thundering Herd effect to exhaust resources

Without Request Coalescing:

10,000 simultaneous requests → 10,000 origin calls
Origin server overwhelmed
Legitimate users experience timeouts
Potential revenue loss

With Request Coalescing:

10,000 simultaneous requests → 1 origin call
9,999 requests wait for the single response
Origin remains stable
All users receive the response
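The arithmetic above can be checked with a small simulation. This sketch assumes an in-process coalescing table built on `asyncio`; `flash_sale`, `origin`, and the SKU value are hypothetical names, and the 50 ms sleep stands in for real origin latency.

```python
import asyncio

async def flash_sale(n_requests=10_000):
    """Return (origin_calls, responses_served) for n identical requests."""
    origin_calls = 0
    in_flight = {}                       # key -> Task for the pending origin call

    async def origin(key):
        nonlocal origin_calls
        origin_calls += 1
        await asyncio.sleep(0.05)        # simulated origin latency
        return {"sku": key, "in_stock": True}

    async def handle(key):
        task = in_flight.get(key)
        if task is None:                 # leader: start the only origin call
            task = asyncio.create_task(origin(key))
            in_flight[key] = task
            task.add_done_callback(lambda _t: in_flight.pop(key, None))
        return await task                # followers await the same task

    responses = await asyncio.gather(
        *(handle("sku-123") for _ in range(n_requests)))
    return origin_calls, len(responses)
```

Running `asyncio.run(flash_sale())` returns `(1, 10000)`: one origin call, and every request receives the shared response.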

Best practices for security implementation

1. Identify high-risk endpoints

Map which endpoints are most vulnerable to Thundering Herd attacks:

Heavily accessed read endpoints
Endpoints with cacheable responses
Endpoints that trigger expensive backend operations

2. Configure appropriate timeouts

Set reasonable wait timeouts for coalesced requests to prevent:

Users waiting indefinitely
Attackers exploiting long timeouts
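A bounded wait on the follower side might look like the sketch below. It assumes followers wait on a shared `concurrent.futures.Future`, as in a thread-based coalescer; `wait_for_leader`, `max_wait_s`, and `fallback` are hypothetical names, and the right timeout value depends on your origin's normal latency.

```python
from concurrent.futures import Future, TimeoutError as WaitTimeout

def wait_for_leader(shared, max_wait_s, fallback=None):
    """Follower-side wait that gives up after max_wait_s seconds."""
    try:
        # A leader error stored on the future is re-raised by result()
        # (a leader TimeoutError would be treated as a timeout here).
        return shared.result(timeout=max_wait_s)
    except WaitTimeout:
        # The caller decides what fallback means: an error response,
        # a stale copy, or a direct retry against the origin.
        return fallback
```

Keeping the wait short limits how long a slow or stalled leader can hold client connections open.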

3. Monitor for attack patterns

Track metrics that indicate Thundering Herd attacks:

Sudden spikes in cache miss ratios
Unusual patterns of identical requests
Origin server resource utilization
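As an example of the first metric, a spike in cache miss ratio can be flagged by comparing each interval with a rolling baseline. This is a rough sketch: `MissRatioMonitor`, the window of five intervals, the spike factor of 3, and the 1% floor are all illustrative assumptions to tune against your own traffic.

```python
from collections import deque

class MissRatioMonitor:
    """Flags an interval whose miss ratio far exceeds the recent average."""

    def __init__(self, window=5, spike_factor=3.0, floor=0.01):
        self.history = deque(maxlen=window)   # recent per-interval miss ratios
        self.spike_factor = spike_factor
        self.floor = floor                    # avoids alerts on a near-zero baseline

    def observe(self, hits, misses):
        total = hits + misses
        ratio = misses / total if total else 0.0
        if self.history:
            baseline = sum(self.history) / len(self.history)
            spike = ratio > max(baseline, self.floor) * self.spike_factor
        else:
            spike = False                     # no baseline yet
        self.history.append(ratio)
        return ratio, spike
```

Feeding it per-minute hit and miss counters, for instance, would flag a jump from a 5% miss ratio to 60% in the next interval.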

4. Combine with rate limiting

Use Request Coalescing alongside rate limiting:

Rate limiting controls overall request volume
Coalescing controls request amplification
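The two controls can be applied in sequence: the rate limiter decides whether a client's request is admitted, and the coalescing table decides whether an admitted request goes to the origin. The sketch below is single-threaded and illustrative only; `TokenBucket` and `admit` are hypothetical names, and the caller is assumed to remove the key from `in_flight` once the leader completes.

```python
import time

class TokenBucket:
    """Per-client limiter: refills rate_per_s tokens per second up to burst."""

    def __init__(self, rate_per_s, burst, now=time.monotonic):
        self.rate, self.capacity, self.now = rate_per_s, burst, now
        self.tokens, self.updated = float(burst), now()

    def allow(self):
        t = self.now()
        elapsed = t - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

def admit(bucket, key, in_flight):
    """Return 'rejected', 'leader', or 'follower' for one request."""
    if not bucket.allow():
        return "rejected"      # rate limiting: too many requests from this client
    if key in in_flight:
        return "follower"      # coalescing: wait for the pending origin call
    in_flight.add(key)
    return "leader"            # only this request reaches the origin
```

Rate limiting bounds what any single client can send, while coalescing bounds what the origin sees even when many well-behaved clients ask for the same thing at once.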

5. Plan for graceful degradation

Define behavior when:

The leader request fails
The wait timeout is reached
Origin is unavailable
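One possible policy for these cases is to serve the last known good response when the origin call fails. This is only a sketch of that single option, which the article does not prescribe: `fetch_with_fallback` and `stale_cache` are hypothetical names, and whether stale data is acceptable depends on the endpoint.

```python
def fetch_with_fallback(key, load, stale_cache):
    """Return (value, source); fall back to a stale copy if the origin fails."""
    try:
        value = load(key)                 # the leader's call to the origin
    except Exception:
        if key in stale_cache:
            return stale_cache[key], "stale"
        raise                             # nothing to fall back to
    stale_cache[key] = value              # remember the last good response
    return value, "origin"
```

Because the leader's outcome is shared, whatever it returns here, fresh or stale, is what every coalesced follower receives.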

FAQ

What is Request Coalescing?

It’s a technique that groups identical concurrent requests into a single upstream call, protecting the origin from resource exhaustion.

How does Request Coalescing improve security?

It prevents Thundering Herd attacks and cache stampede, which can be exploited to exhaust origin server resources.

Is Request Coalescing a form of DDoS protection?

It’s one component of DDoS protection, specifically addressing request amplification attacks. It should be combined with other security controls.

What’s the difference between Request Coalescing and rate limiting?

Rate limiting controls the total number of requests. Request Coalescing controls how many of those requests reach the origin by deduplicating identical concurrent requests.

When should I use Request Coalescing?

Use it for read-heavy endpoints that could be targeted for resource exhaustion, especially during high-traffic events.

Can Request Coalescing replace a WAF?

No. Request Coalescing is a complementary control. A WAF filters malicious requests; Request Coalescing prevents request amplification.

Conclusion

Request Coalescing is a powerful security control that protects your origin infrastructure from Thundering Herd attacks and resource exhaustion. By grouping identical concurrent requests, it prevents attackers from amplifying their impact through cache stampede techniques.

As part of a defense-in-depth strategy alongside WAF, DDoS protection, and rate limiting, Request Coalescing ensures your origin servers remain available even under coordinated attack conditions.

Next steps

Learn how Azion’s security solutions can protect your infrastructure with Request Coalescing and other origin protection techniques.
