What is Request Coalescing? | Origin Protection Against Thundering Herd

Request Coalescing protects your origin server by grouping identical concurrent requests into a single upstream call — preventing Thundering Herd attacks and resource exhaustion.


What is Request Coalescing?

Request Coalescing — also known as request collapsing or request deduplication — is a technique that groups identical concurrent requests into a single upstream call.

When thousands of users access the same resource simultaneously, Request Coalescing ensures only one request reaches your origin server. All other identical requests wait for that single response, which is then shared among all waiting clients.

This technique serves a critical security function: origin protection against resource exhaustion attacks.


The security problem: Thundering Herd

Thundering Herd, also called cache stampede or the dogpile effect, is a failure mode in which many concurrent requests try to regenerate the same expired resource at once. Attackers can deliberately trigger it to exhaust origin server resources.

How Thundering Herd works

  1. A heavily accessed resource expires from cache
  2. Multiple simultaneous requests detect the cache miss
  3. All requests independently attempt to regenerate the data
  4. The origin server receives an explosion of identical requests
  5. Backend resources (CPU, memory, connections) become saturated
  6. Legitimate requests fail or time out
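
The steps above can be reproduced in a small simulation. This is a minimal sketch, not a benchmark: the `simulate_stampede` function, the `product:42` key, and the 50 ms origin latency are all illustrative assumptions, and threads stand in for concurrent client requests.

```python
import threading
import time


def simulate_stampede(n_requests=50, origin_latency=0.05):
    """Return how many origin calls n_requests concurrent cache misses produce."""
    cache = {}  # empty: the hot entry has just expired (step 1)
    origin_calls = 0
    lock = threading.Lock()
    barrier = threading.Barrier(n_requests)

    def fetch_from_origin(key):
        # Hypothetical stand-in for an expensive backend call.
        nonlocal origin_calls
        with lock:
            origin_calls += 1
        time.sleep(origin_latency)
        return f"value-for-{key}"

    def handle_request(key):
        barrier.wait()  # release every request at the same moment
        if key not in cache:  # step 2: each request detects the miss
            cache[key] = fetch_from_origin(key)  # step 3: each one regenerates

    threads = [
        threading.Thread(target=handle_request, args=("product:42",))
        for _ in range(n_requests)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return origin_calls  # step 4: typically close to n_requests
```

Because nothing coordinates the misses, the count returned usually approaches the number of concurrent requests; the exact value depends on thread scheduling.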

Attack vector

Malicious actors can exploit this vulnerability by:

  • Coordinated cache invalidation — triggering mass cache misses
  • Timing attacks — sending requests when cache is about to expire
  • Resource exhaustion — using Thundering Herd as a DDoS amplification technique

The result is predictable: backend saturation, increased latency, timeouts, and service degradation — exactly what attackers aim to achieve.


How Request Coalescing protects your origin

Request Coalescing acts as a security control mechanism in your distributed infrastructure.

The protection flow

  1. First request becomes the leader: the initial request identifies a cache miss and proceeds to the origin.

  2. Subsequent requests are grouped: instead of triggering new calls to the backend, identical requests wait for the leader’s response.

  3. Single response feeds all requests: when the origin responds, the result is reused for all waiting users, and the cache is repopulated.
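
The leader and follower flow can be sketched in a few lines. This is an illustrative, single-process sketch under stated assumptions, not a description of any particular product's implementation: the `Coalescer` class and its `get` method are hypothetical names, threads stand in for concurrent requests, and cache repopulation is left out to keep the focus on the grouping itself.

```python
import threading
from concurrent.futures import Future


class Coalescer:
    """Collapse concurrent calls for the same key into one upstream call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # key -> Future shared by leader and followers

    def get(self, key, fetch):
        with self._lock:
            future = self._in_flight.get(key)
            is_leader = future is None
            if is_leader:
                # Step 1: the first request for this key becomes the leader.
                future = Future()
                self._in_flight[key] = future

        if not is_leader:
            # Step 2: followers wait for the leader instead of calling the origin.
            return future.result()

        try:
            value = fetch(key)  # the only origin call for this burst
            future.set_result(value)  # Step 3: one response feeds everyone
        except BaseException as exc:
            future.set_exception(exc)  # followers see the same failure
        finally:
            with self._lock:
                del self._in_flight[key]
        return future.result()
```

A caller would wrap its origin fetch, for example `coalescer.get("product:42", fetch_from_origin)`, where `fetch_from_origin` is whatever function talks to the backend. Requests that arrive after the leader has finished start a new round, which is why coalescing is normally paired with a cache.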

Security benefits

1. Attack surface reduction: by collapsing requests, you reduce the load an attacker can place on the origin from thousands of potential calls to a single call per resource.

2. Resource exhaustion prevention: the backend stops receiving waves of identical calls, preventing CPU, memory, and connection saturation.

3. DDoS mitigation: Request Coalescing acts as a first-line defense against certain DDoS patterns that rely on request amplification.

4. Cascading failure prevention: when the backend collapses from redundant load, the problem usually spreads to other services. Coalescing helps prevent this cascade.


Request Coalescing vs. other security controls

Control              Protection Type             Layer
WAF                  Application-layer attacks   Layer 7
DDoS Protection      Volumetric attacks          Layer 3/4/7
Rate Limiting        Request frequency           Layer 7
Request Coalescing   Request amplification       Layer 7

Request Coalescing complements other security controls rather than replacing them. It specifically addresses the request amplification vector that other controls may miss.


When to use Request Coalescing for security

Use Request Coalescing when:

  • Your origin has limited resources and can’t handle burst traffic
  • You’re protecting high-concurrency endpoints
  • Cache expiration could trigger mass origin requests
  • You need to defend against resource exhaustion attacks
  • Your infrastructure serves heavily requested, read-mostly data

Common protected endpoints

  • Product catalog and pricing APIs
  • Shipping calculation services
  • Promotion and discount validation
  • Stock availability checks
  • Heavily accessed static content

When NOT to use Request Coalescing

Avoid applying Request Coalescing when:

  • The operation is highly personalized per user
  • Data must be processed individually per request
  • The response cannot be safely shared between users
  • The endpoint represents a non-idempotent transactional action

Request Coalescing is designed for read operations and idempotent requests. It should not be applied to write operations or transactional endpoints without careful consideration.
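
One way to enforce these rules is to decide, per request, whether it gets a coalescing key at all. The sketch below is a simplified heuristic, not a complete policy: the `coalescing_key` function is a hypothetical name, and treating the presence of `Authorization` or `Cookie` headers as a sign of personalization is an assumption that will not fit every application.

```python
from typing import Mapping, Optional

SAFE_METHODS = {"GET", "HEAD"}  # read-only HTTP methods
PER_USER_HEADERS = {"authorization", "cookie"}  # assumed personalization signals


def coalescing_key(method: str, url: str, headers: Mapping[str, str]) -> Optional[str]:
    """Return a key for shareable requests, or None to bypass coalescing."""
    if method.upper() not in SAFE_METHODS:
        return None  # writes and transactional actions go straight to the origin
    if any(name.lower() in PER_USER_HEADERS for name in headers):
        return None  # the response may be personalized, so do not share it
    return f"{method.upper()} {url}"
```

Requests that return `None` skip the grouping step entirely; only requests with the same key are ever allowed to share a response.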


Request Coalescing in a security architecture

Request Coalescing works best as part of a defense-in-depth strategy:

Layer 1: Distributed Infrastructure Protection

  • WAF rules filter malicious requests
  • Rate limiting controls request frequency
  • Request Coalescing prevents amplification

Layer 2: Cache Layer

  • Cached content reduces origin load
  • Cache invalidation is coordinated
  • TTL policies balance freshness and protection

Layer 3: Origin Shield

  • Request Coalescing protects during cache misses
  • Origin has additional rate limiting
  • Health checks detect origin stress

Example: protecting against flash sale attacks

During flash sales, attackers may attempt to overwhelm your origin by:

  1. Sending thousands of simultaneous requests for the same product
  2. Timing requests to hit when cache expires
  3. Exploiting the Thundering Herd effect to exhaust resources

Without Request Coalescing:

  • 10,000 simultaneous requests → 10,000 origin calls
  • Origin server overwhelmed
  • Legitimate users experience timeouts
  • Potential revenue loss

With Request Coalescing:

  • 10,000 simultaneous requests → 1 origin call
  • 9,999 requests wait for the single response
  • Origin remains stable
  • All users receive the response
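
The arithmetic behind this comparison is simple: without coalescing, every concurrent miss becomes an origin call; with it, the origin sees one call per distinct resource. The sketch below uses a hypothetical `origin_calls` helper and a made-up `product:42` key to reproduce the numbers above.

```python
def origin_calls(request_keys, coalesced):
    """Origin calls produced by one burst of concurrent cache misses."""
    if coalesced:
        return len(set(request_keys))  # one call per distinct resource
    return len(request_keys)  # one call per request


burst = ["product:42"] * 10_000
without = origin_calls(burst, coalesced=False)  # 10000
with_coalescing = origin_calls(burst, coalesced=True)  # 1
```

If the same burst were spread across several products, the coalesced count would equal the number of distinct products rather than 1.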

Best practices for security implementation

1. Identify high-risk endpoints

Map which endpoints are most vulnerable to Thundering Herd attacks:

  • Heavily accessed read endpoints
  • Endpoints with cacheable responses
  • Endpoints that trigger expensive backend operations

2. Configure appropriate timeouts

Set reasonable wait timeouts for coalesced requests to prevent:

  • Users waiting indefinitely
  • Attackers exploiting long timeouts
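
A bounded wait can be sketched with the standard library's `Future`. This assumes waiting requests hold a `Future` that the leader completes; the `wait_for_leader` function, its parameters, and the fallback behavior are illustrative choices, not a prescribed design.

```python
from concurrent.futures import Future, TimeoutError as FutureTimeout


def wait_for_leader(future: Future, timeout_s: float, fallback):
    """Wait for the leader's result, but give up after timeout_s seconds."""
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # The leader is too slow: stop waiting and use the fallback,
        # for example stale content or an error response.
        return fallback()
```

The right timeout depends on the origin's normal response time; a value far above it lets a slow or stalled leader hold many connections open at once.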

3. Monitor for attack patterns

Track metrics that indicate Thundering Herd attacks:

  • Sudden spikes in cache miss ratios
  • Unusual patterns of identical requests
  • Origin server resource utilization
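
As one illustration of the first metric, a sliding window over recent cache lookups is enough to flag a sudden rise in the miss ratio. The `MissRatioMonitor` class, the window size, and the 0.5 threshold below are assumptions chosen for the example; real thresholds depend on your normal traffic.

```python
from collections import deque


class MissRatioMonitor:
    """Track the cache miss ratio over the most recent lookups."""

    def __init__(self, window=1000, threshold=0.5):
        self._events = deque(maxlen=window)  # True = miss, False = hit
        self._threshold = threshold

    def record(self, miss: bool) -> None:
        self._events.append(miss)  # oldest entry drops out when the window is full

    def miss_ratio(self) -> float:
        if not self._events:
            return 0.0
        return sum(self._events) / len(self._events)

    def is_spiking(self) -> bool:
        return self.miss_ratio() > self._threshold
```

In practice this signal would be correlated with the other two metrics, since a high miss ratio alone can also follow a routine deployment or cache purge.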

4. Combine with rate limiting

Use Request Coalescing alongside rate limiting:

  • Rate limiting controls overall request volume
  • Coalescing controls request amplification

5. Plan for graceful degradation

Define behavior when:

  • The leader request fails
  • Timeout is reached
  • Origin is unavailable
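
One possible degradation policy is to serve a stale copy when the origin call fails. The sketch below assumes a separate store of previously fetched values is available; `fetch_with_fallback` and `stale_cache` are hypothetical names, and whether stale data is acceptable depends on the endpoint.

```python
def fetch_with_fallback(key, fetch, stale_cache):
    """Try the origin; on failure, fall back to a stale copy if one exists."""
    try:
        value = fetch(key)
    except Exception:
        if key in stale_cache:
            return stale_cache[key], "stale"
        raise  # nothing to serve: let the caller return an error response
    stale_cache[key] = value  # remember the latest good response
    return value, "fresh"
```

The returned label lets the caller signal staleness to clients or to monitoring, rather than silently presenting old data as current.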

FAQ

What is Request Coalescing?

It’s a technique that groups identical concurrent requests into a single upstream call, protecting the origin from resource exhaustion.

How does Request Coalescing improve security?

It prevents Thundering Herd attacks and cache stampede, which can be exploited to exhaust origin server resources.

Is Request Coalescing a form of DDoS protection?

It’s one component of DDoS protection, specifically addressing request amplification attacks. It should be combined with other security controls.

What’s the difference between Request Coalescing and rate limiting?

Rate limiting caps how many requests are accepted within a given time window. Request Coalescing controls how many of the accepted requests reach the origin by deduplicating identical concurrent requests.

When should I use Request Coalescing?

Use it for read-heavy endpoints that could be targeted for resource exhaustion, especially during high-traffic events.

Can Request Coalescing replace a WAF?

No. Request Coalescing is a complementary control. A WAF filters malicious requests; Request Coalescing prevents request amplification.


Conclusion

Request Coalescing is a powerful security control that protects your origin infrastructure from Thundering Herd attacks and resource exhaustion. By grouping identical concurrent requests, it prevents attackers from amplifying their impact through cache stampede techniques.

As part of a defense-in-depth strategy alongside WAF, DDoS protection, and rate limiting, Request Coalescing ensures your origin servers remain available even under coordinated attack conditions.


Next steps

Learn how Azion’s security solutions can protect your infrastructure with Request Coalescing and other origin protection techniques.
