When thousands of users access the same resource at the same time, the risk isn’t just increased traffic. The most dangerous problem is load synchronization.
If a popular piece of data expires from cache during a flash sale, all requests may try to regenerate it simultaneously. This behavior creates the effect known as Thundering Herd, also called cache stampede or dogpile effect. The result is predictable: backend saturation, increased latency, timeouts, and checkout degradation.
In checkout performance, every millisecond of latency and every redundant request represents conversion risk. Request Coalescing solves exactly this scenario. Instead of allowing dozens, hundreds, or thousands of identical requests to hit the origin simultaneously, the technique groups these calls and reuses a single response for all waiting users.
What is the Thundering Herd problem?
Thundering Herd happens when a heavily accessed resource expires from cache exactly when traffic rises.
Imagine a shipping calculation, a promotion, or a frequently consulted price. When this data leaves cache, the first request queries the backend to regenerate it. The problem is that while this response is still being processed, other requests arrive, find the cache empty, and do exactly the same thing.
This generates an explosion of redundant calls to the database or origin. Instead of one query, you now have hundreds or thousands. The backend becomes overloaded and checkout starts responding more slowly, or simply fails.
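The stampede described above is easy to reproduce. The sketch below (a minimal illustration; the key name and timings are made up) has 50 threads hit a naive cache right after an entry expired. Because there is no coordination, nearly every thread finds the cache empty and queries the origin itself:

```python
import threading
import time

backend_calls = 0   # counts how many times the origin is hit
cache = {}          # naive cache with no coordination between requests

def query_backend(key):
    """Simulates an expensive origin query (e.g., a shipping calculation)."""
    global backend_calls
    backend_calls += 1
    time.sleep(0.1)  # slow origin response
    return f"value-for-{key}"

def get_naive(key):
    # Every thread that finds the cache empty goes to the origin itself.
    if key not in cache:
        cache[key] = query_backend(key)
    return cache[key]

# 50 concurrent requests arrive right after the cache entry expired.
threads = [threading.Thread(target=get_naive, args=("shipping:BR-SP",))
           for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(backend_calls)  # far more than 1: most threads hit the origin
```

One logical read turned into dozens of origin calls, which is exactly the load multiplication Request Coalescing eliminates.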
This scenario is especially critical in e-commerce because it normally happens at moments of highest purchase intent: seasonal campaigns, launches, promotional actions, and Black Friday.
Why does this happen in e-commerce?
In transactional flows, the same pieces of data are read simultaneously by many users:
- product price
- shipping calculation
- promotion eligibility
- stock availability
- cart summary
- auxiliary checkout validations
This data shares a common pattern: it is read heavily, but it doesn't change with every request.
When cache expires without coordination between requests, each user tries to rebuild the same data at the same time. This is exactly where Request Coalescing becomes important.
How Request Coalescing works in practice
Request Coalescing acts as a coordination mechanism in distributed infrastructure.
In practice, the flow is:
1. The first request arrives and becomes the leader request. It identifies that the data isn't in cache and proceeds to the origin.
2. Subsequent identical requests are grouped. Instead of triggering new calls to the backend, they wait for the leader request's return.
3. A single response feeds all requests. When the origin responds, the result is reused for every user who was waiting, and the cache is repopulated.
This model drastically reduces pressure on the backend and prevents artificial load multiplication.
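The leader/follower flow above can be sketched in a few dozen lines. This is a simplified single-process illustration, not Azion's implementation; the key name and origin function are hypothetical:

```python
import threading
import time

class Coalescer:
    """Groups concurrent calls for the same key: the first caller becomes
    the leader and queries the origin; identical callers wait for, and
    reuse, the leader's single response."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> {"done": Event, "result": value}

    def do(self, key, fetch):
        with self._lock:
            entry = self._inflight.get(key)
            if entry is None:
                # No in-flight call for this key: this request leads.
                entry = {"done": threading.Event(), "result": None}
                self._inflight[key] = entry
                is_leader = True
            else:
                is_leader = False
        if is_leader:
            try:
                entry["result"] = fetch()  # the single call to the origin
            finally:
                with self._lock:
                    del self._inflight[key]
                entry["done"].set()        # release all waiting followers
        else:
            entry["done"].wait()           # follower: wait for the leader
        return entry["result"]

# Demo: 50 concurrent requests, one origin call.
origin_calls = 0

def fetch_price():
    global origin_calls
    origin_calls += 1
    time.sleep(0.2)                        # simulate a slow origin
    return 42

coalescer = Coalescer()
barrier = threading.Barrier(50)            # make all requests truly concurrent
results = []

def request():
    barrier.wait()
    results.append(coalescer.do("price:sku-123", fetch_price))

threads = [threading.Thread(target=request) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(origin_calls, len(results))
```

The same pattern exists as a production library in other ecosystems (Go's `singleflight` package is a well-known example): many callers, one in-flight fetch per key.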
Benefits for checkout performance
Request Coalescing brings direct gains to checkout stability.
1. Origin protection
The backend stops receiving waves of identical calls and only processes what’s necessary. This reduces CPU, memory, and connection saturation.
2. Cost reduction
Fewer redundant calls mean lower infrastructure resource usage, lower egress, and less operational waste.
3. P99 stability
The biggest benefit doesn’t appear just in average latency, but in the distribution tail. By avoiding bursts of simultaneous calls, Request Coalescing reduces latency spikes and improves the experience of the most affected users.
4. Fewer cascading failures
When the backend collapses from redundant load, the problem usually spreads to other services. Coalescing helps prevent this cascade.
Programmable resilience: beyond static configuration
Request Coalescing becomes even more powerful when part of a programmable resilience strategy.
This means you can:
- apply coalescing only on high-concurrency endpoints;
- combine the technique with stale-while-revalidate policies;
- dynamically adjust rules during campaigns;
- protect checkout flows without redesigning the application.
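As a concrete reference point for the stale-while-revalidate policy mentioned above, the standard HTTP directive defined in RFC 5861 expresses it like this (the values are illustrative):

```
Cache-Control: max-age=60, stale-while-revalidate=120
```

The response is fresh for 60 seconds; for a further 120 seconds a stale copy may be served immediately while a revalidation fetches a new one in the background. Combined with coalescing, that background revalidation collapses to a single origin request no matter how many users arrive.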
For example, it makes sense to apply the technique on routes like:
- shipping calculation
- price queries
- promotion validation
- heavily accessed catalog responses
Critical transactional operations, like final payment authorization, follow different treatment and shouldn’t be grouped the same way.
Request Coalescing and Azion’s architecture
Azion allows applying this type of protection within distributed infrastructure, closer to the user and with fine control over traffic behavior. This makes it possible to absorb demand before it overloads the backend.
In practice, the architecture helps e-commerce teams combine:
- acceleration
- protection
- control
- observability
This is especially useful when the goal is to maintain stable conversion during traffic spikes without relying on reactive scaling at the origin.
Real example: Magalu
Magalu is an example of a company using Azion in high-scale, performance-demanding scenarios. In environments like this, protecting the origin and avoiding bottlenecks at critical moments is fundamental to sustaining the shopping experience.
You can check the complete case here:
https://www.azion.com/en/success-case/magalu/
This type of application reinforces how distribution, protection, and request control strategies help maintain transactional flow stability at scale.
When to use Request Coalescing
Use Request Coalescing when:
- many users access the same resource at the same time;
- data has high concurrency and low momentary variation;
- expired cache can generate multiple identical queries;
- you want to protect the origin from coordinated spikes;
- checkout depends on heavily requested endpoints.
When not to use
Avoid applying Request Coalescing indiscriminately when:
- the operation is highly personalized per user;
- data needs to be processed individually per request;
- the response cannot be safely reused;
- the endpoint represents an unrepeatable transactional action.
The technique is excellent for deduplicating concurrent reads, but it doesn't replace business logic, nor should it be applied to write flows without careful consideration.
FAQ
What is cache stampede?
It’s the scenario where many requests try to regenerate the same expired data from cache simultaneously, overloading the backend.
Does Request Coalescing replace cache?
No. It complements cache. Coalescing’s function is to prevent multiple identical requests from hitting the origin simultaneously.
What’s the difference between Request Coalescing and cache?
Cache delivers already stored content. Request Coalescing coordinates concurrent requests when cache isn’t yet available or has just expired.
When should I use Request Coalescing in checkout?
Mainly on heavily accessed read endpoints, like shipping, price, promotion, and catalog.
Does this help with flash sales?
Yes. Flash sales concentrate traffic on a few heavily requested resources, which increases Thundering Herd risk.
Conclusion
Request Coalescing is a technique that is simple in concept but very powerful in practice. It prevents the backend from being hammered by redundant requests at the worst possible moment: when traffic is high and conversion is at stake.
In checkout performance, the logic is clear: it’s not enough to respond fast on average. You need to prevent load spikes from taking down the most important business flows.
If you want to protect checkout against cache stampede and maintain stability at scale, Request Coalescing should be part of your resilience strategy.
Next steps
Stop losing sales because of a slow checkout.
See how Azion protects your most important revenue moments:
https://www.azion.com/en/contact/
Also read:
Checkout Performance: Definitive Optimization Guide for E-commerce at Scale