HTTP status codes are three-digit response codes that servers return to communicate request outcomes. In distributed systems, interpreting them correctly requires understanding the full request path: client, CDN, load balancer, application server, and upstream dependencies.
How HTTP Status Codes Behave in Distributed Systems
A single request traverses multiple components before reaching its origin. Each component contributes to or modifies the status code returned to the client.
Client ──► CDN ──► Load Balancer ──► Application ──► Database
  │         │            │                │             │
  │     Intercepts  Routes to        Processes     May timeout
  │     4xx/5xx     healthy node     request       or fail
  │     from origin                  and returns
  │                                  200/4xx/5xx

In this architecture, the status code the client sees may come from any layer. A 502 Bad Gateway from the CDN means the origin returned an invalid response. A 504 Gateway Timeout means the origin did not respond within the configured timeout window.
Troubleshooting Workflow
     ┌─────────────────────────┐
     │  Identify status code   │
     └──────────┬──────────────┘
                ▼
     ┌─────────────────────────┐
     │  Class: 4xx or 5xx?     │
     └──────────┬──────────────┘
                ▼
     ┌─────────────────────────┐
┌────┤  Who generated it?      │
│    │  - CDN edge logs        │
│    │  - Origin access logs   │
│    └──────────┬──────────────┘
│               ▼
│    ┌─────────────────────────┐
│    │  Reproduce the request  │
│    │  Isolate variables:     │
│    │  - Headers              │
│    │  - Method               │
│    │  - Body                 │
│    │  - Auth                 │
│    └──────────┬──────────────┘
│               ▼
│    ┌─────────────────────────┐
│    │  Check dependencies:    │
│    │  - Upstream API         │
│    │  - Database             │
│    │  - Cache                │
│    └──────────┬──────────────┘
│               ▼
│    ┌─────────────────────────┐
└────┤  Apply fix & verify     │
     └─────────────────────────┘

Status Code Patterns by Layer
| Layer | Common Codes | Meaning |
|---|---|---|
| CDN | 502, 504, 403 | Origin unreachable, origin timeout, WAF block |
| Load Balancer | 503, 502, 504 | No healthy upstream, bad upstream response, timeout |
| Application | 200, 201, 400, 401, 403, 404, 409, 422, 429, 500 | Request processing result |
| Authentication | 401, 403 | Missing credentials, insufficient permissions |
| Database | 500, 503 (propagated) | Query failure, connection pool exhaustion |
| Upstream API | 502 (propagated as-is or mapped to another code) | Dependency failure |
Diagnostics for Every Status Code
4xx patterns:
- 400: Inspect request body format, required headers, parameter types
- 401: Check Authorization header, token expiry, token format
- 403: Verify permissions for the specific resource and method
- 404: Confirm URL path matches route definition exactly (trailing slashes, case)
- 405: Verify HTTP method against allowed methods for the endpoint
- 409: Check for concurrent modifications, idempotency keys
- 422: Validate against business rules, not just schema constraints
- 429: Check Retry-After header, evaluate rate limit configuration
5xx patterns:
- 500: Check server logs for unhandled exceptions. Verify no recent deployment introduced issues.
- 502: Check upstream server logs. Ensure upstream returns valid HTTP responses.
- 503: Check for deployment in progress, traffic spikes, resource exhaustion
- 504: Check upstream response time. Adjust timeout configuration if upstream is slow but reliable.
Metrics and Measurement
- Time to first byte (TTFB): Time until response headers arrive (target: <500ms p95 for dynamic content)
- Error budget burn rate: Percentage of allowed 5xx errors consumed over time (target: <10% of error budget per day)
- Status code distribution: Breakdown of all response codes as percentage of total requests
Industry benchmarks:
- 5xx rates above 0.1% require investigation (Google SRE guidelines)
- Average TTFB for cached edge responses: 20-50ms (CDN provider benchmarks, 2025)
- Average TTFB for uncached origin responses: 200-800ms (HTTP Archive, 2025)
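The status code distribution and error budget burn metrics above can be computed from a list of observed status codes. The following is a minimal sketch, not a reference implementation: the function names are hypothetical, and the budget definition (allowed failures = (1 - SLO target) x requests expected over the SLO window) is an assumption that may differ from how your team defines its budget.

```python
from collections import Counter


def status_distribution(status_codes):
    """Share of each status class (2xx, 4xx, ...) as a percentage of all requests."""
    total = len(status_codes)
    if total == 0:
        return {}
    classes = Counter(f"{code // 100}xx" for code in status_codes)
    return {cls: 100.0 * count / total for cls, count in classes.items()}


def error_budget_burn(status_codes, slo_target, budget_window_requests):
    """Percentage of the 5xx error budget consumed by the observed requests.

    Assumption: the budget equals (1 - slo_target) * budget_window_requests
    failed requests, where budget_window_requests is the traffic expected
    over the whole SLO window.
    """
    allowed_failures = (1.0 - slo_target) * budget_window_requests
    if allowed_failures <= 0:
        raise ValueError("SLO target leaves no error budget")
    failures = sum(1 for code in status_codes if 500 <= code <= 599)
    return 100.0 * failures / allowed_failures
```

Feeding one day of status codes into `error_budget_burn` and comparing the result with the 10% daily target gives a quick signal of whether the budget is burning too fast.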
Common Mistakes and Fixes
Mistake: Treating all 5xx errors as application bugs
Fix: Classify 5xx errors by layer (CDN, load balancer, application, upstream). A 504 is typically an infrastructure issue, not an app bug.
Mistake: Not logging sufficient context with status codes
Fix: Log request ID, user ID, endpoint, method, response time, and upstream status code alongside every response.
Mistake: Mixing 4xx and 5xx error budgets
Fix: 4xx errors indicate client misbehavior and should not consume server error budgets. Track 4xx and 5xx independently.
Mistake: Ignoring 429 responses in client code
Fix: Implement exponential backoff and respect Retry-After headers in all clients.
Mistake: Treating 502 and 503 identically
Fix: 502 means bad upstream response. 503 means the service itself is unavailable. Each requires a different diagnostic path.
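As one way to apply the logging fix above, the sketch below emits a single JSON line per response containing the fields listed there. The function name and field names are hypothetical, and a real service would normally route this through its own logging framework.

```python
import json
import time


def access_log_line(request_id, user_id, endpoint, method, status,
                    started_at, upstream_status=None, now=None):
    """Build one JSON log line carrying the context needed to debug a status code.

    started_at and now are time.monotonic() readings; now defaults to the
    current reading and is only a parameter to make the function testable.
    """
    finished = time.monotonic() if now is None else now
    record = {
        "request_id": request_id,
        "user_id": user_id,
        "endpoint": endpoint,
        "method": method,
        "status": status,
        "response_time_ms": round((finished - started_at) * 1000, 1),
        # Status received from the dependency, if one was called.
        "upstream_status": upstream_status,
    }
    return json.dumps(record, sort_keys=True)
```

Keeping `status` and `upstream_status` as separate fields makes it possible to tell a 502 that the service produced from one it merely passed along.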
Troubleshooting Use Cases
Traffic Spike Response
When a marketing campaign drives unexpected traffic, 503 and 429 responses increase. Check auto-scaling configuration, rate limit thresholds, and CDN cache hit ratio. Increase cache TTL for static assets as a short-term mitigation.
Deployment Rollback
A new release introduces 500 errors. Compare error rates before and after deployment using request logs. If 5xx rate increases by 2x or more, roll back and investigate the diff.
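The before/after comparison can be reduced to a small check. This is a sketch under stated assumptions: the function names are hypothetical, the inputs are lists of status codes sampled from request logs on either side of the deployment, and treating any 5xx after a clean baseline as a regression is a choice made here, not something the rule above prescribes.

```python
def five_xx_rate(status_codes):
    """Fraction of responses in the 5xx class (0.0 for an empty sample)."""
    if not status_codes:
        return 0.0
    return sum(1 for code in status_codes if 500 <= code <= 599) / len(status_codes)


def should_roll_back(before, after, ratio_threshold=2.0):
    """Return True when the post-deploy 5xx rate is at least ratio_threshold
    times the pre-deploy rate."""
    before_rate = five_xx_rate(before)
    after_rate = five_xx_rate(after)
    if before_rate == 0.0:
        # Assumption: with a clean baseline, any 5xx counts as a regression.
        return after_rate > 0.0
    return after_rate >= ratio_threshold * before_rate
```

With small samples a single error can trip the threshold, so it is worth comparing windows of similar size and traffic mix.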
Third-Party API Dependency
An upstream partner API returns 503. Your service may return 502 or 503 to clients. Implement circuit breakers and fallback responses to prevent cascading failures.
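A circuit breaker with a fallback can be sketched in a few lines. The class below is illustrative only: its name, the failure threshold, and the reset timeout are assumptions, it is not thread-safe, and production systems usually rely on an established resilience library instead.

```python
import time


class CircuitBreaker:
    """Stop calling a failing dependency for a cool-down period."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: skip the dependency and answer from the fallback.
                return fallback()
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func()
        except Exception:
            self.failures += 1
            # Open on reaching the threshold, or re-open after a failed trial.
            if self.failures >= self.failure_threshold or self.opened_at is not None:
                self.opened_at = self.clock()
            return fallback()
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Here `func` would wrap the call to the partner API and raise on a 503, while `fallback` returns a cached or degraded response so that the failure does not cascade to your own clients.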
Mobile App Compatibility
An iOS update sends a new header that your server does not expect, causing 400 errors for a subset of users. Log request headers and body for every 4xx response to detect API contract drift.
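Once headers are logged, contract drift of this kind can be surfaced by comparing failing requests against successful ones. The sketch below is a hypothetical helper that assumes each logged request is available as a dict mapping header names to values.

```python
from collections import Counter


def headers_only_in_failures(failed_requests, successful_requests):
    """List headers that appear in failing requests but never in successful ones.

    Header names are compared case-insensitively, as HTTP requires.
    Returns (header_name, count) pairs, most frequent first.
    """
    known = {name.lower() for headers in successful_requests for name in headers}
    unexpected = Counter(
        name.lower()
        for headers in failed_requests
        for name in headers
        if name.lower() not in known
    )
    return unexpected.most_common()
```

A header that shows up only in 400 responses after a client release is a strong candidate for the change that broke the contract.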
Frequently Asked Questions
How do I distinguish between a CDN-generated 502 and an origin-generated 502? Check CDN access logs and origin access logs simultaneously. If the origin logs show the request with a non-5xx response, the CDN generated the 502. If the origin logs show a 5xx or are missing (timeout), the origin caused it.
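The comparison described in this answer can be automated once both logs are keyed by a shared request ID. The sketch below assumes such an ID exists and that each log has been loaded into a dict of request ID to status code; the function name and labels are hypothetical.

```python
def attribute_502(cdn_log, origin_log):
    """Attribute each CDN-side 502 to the CDN or to the origin.

    cdn_log and origin_log map request ID -> status code. A request missing
    from origin_log is treated as an origin failure, following the rule above.
    """
    result = {}
    for request_id, cdn_status in cdn_log.items():
        if cdn_status != 502:
            continue
        origin_status = origin_log.get(request_id)
        if origin_status is None or origin_status >= 500:
            result[request_id] = "origin"
        else:
            # The origin answered without a 5xx, so the 502 was produced at the edge.
            result[request_id] = "cdn"
    return result
```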
What does a 503 during deployment mean? It means the load balancer marked the instance as unhealthy while it was starting. Ensure health checks return 200 only after the application is fully initialized, not immediately after the process starts.
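A health check that reports readiness only after initialization might look like the sketch below, built on Python's standard `http.server`. The `/healthz` path, the port, and the `ready` flag are assumptions for illustration; the application would call `ready.set()` once startup work such as opening connection pools has finished.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

ready = threading.Event()  # set by the application once startup has finished


def health_status(is_ready):
    """200 only when the application is fully initialized, 503 otherwise."""
    return 200 if is_ready else 503


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":  # assumed health check path
            self.send_response(404)
            self.end_headers()
            return
        status = health_status(ready.is_set())
        body = b"ok" if status == 200 else b"starting"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep health probes out of stderr


def serve_health(port=8080):
    """Blocking helper that serves the health endpoint on the given port."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Running `serve_health` in a background thread lets the probe answer 503 during startup and switch to 200 as soon as `ready.set()` is called.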
How do I trace a status code across multiple services? Use a correlation ID propagated through headers. Each service logs the status code it returns along with the correlation ID. Aggregate logs by correlation ID to see the full chain.
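The two halves of this answer, propagating the ID and aggregating logs by it, can be sketched as follows. The header name `X-Correlation-ID` is a common convention rather than a standard, and the function names and log record fields are hypothetical.

```python
import uuid
from collections import defaultdict

CORRELATION_HEADER = "X-Correlation-ID"  # assumed header name


def ensure_correlation_id(headers):
    """Return a copy of the headers that carries a correlation ID.

    An incoming ID is reused so the whole chain shares one value; otherwise
    a new one is generated. The lookup is case-sensitive for simplicity,
    although HTTP header names are not.
    """
    outgoing = dict(headers)
    outgoing.setdefault(CORRELATION_HEADER, str(uuid.uuid4()))
    return outgoing


def status_chains(log_records):
    """Group (service, status) pairs by correlation ID, in log order."""
    chains = defaultdict(list)
    for record in log_records:
        chains[record["correlation_id"]].append((record["service"], record["status"]))
    return dict(chains)
```

Reading a chain such as edge 502, api 502, billing 503 from the grouped output shows at a glance which service first produced the failure.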
Should I retry a 429 response? Yes, but only after the delay specified in the Retry-After header. Implement jitter and exponential backoff to avoid retry storms.
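A delay calculation that combines Retry-After with jittered exponential backoff could look like the sketch below. The function names, base delay, and cap are assumptions. Only the delay-in-seconds form of Retry-After is parsed; the HTTP-date form that the header also allows is deliberately left unhandled here.

```python
import random


def parse_retry_after(value):
    """Return the Retry-After delay in seconds, or None if absent or not numeric."""
    if value is None:
        return None
    value = value.strip()
    return float(value) if value.isdigit() else None


def retry_delay(attempt, retry_after=None, base=0.5, cap=30.0, rng=random.random):
    """Seconds to wait before retry number `attempt` (0-based).

    Backoff doubles per attempt up to `cap`, then full jitter picks a random
    point in [0, backoff). When the server sent Retry-After, the jitter is
    added on top so clients never retry earlier than requested and do not
    all retry at the same instant.
    """
    backoff = min(cap, base * (2 ** attempt))
    jitter = backoff * rng()
    if retry_after is not None:
        return float(retry_after) + jitter
    return jitter
```

A client loop would call `retry_delay(attempt, parse_retry_after(response_header))`, sleep for the result, and give up after a bounded number of attempts.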
Why does my load balancer return 503 when the application is healthy? Check the health check configuration. The load balancer may use a different port, path, or protocol than the application. Ensure the health check endpoint returns 200 within the timeout.
What is the difference between a 502 and a 504 in a CDN context? 502 means the CDN received a malformed response from the origin. 504 means the origin did not respond within the CDN’s configured timeout. Both indicate origin problems, but the fixes differ: a 502 calls for validating what the origin sends back, while a 504 calls for tuning the timeout or speeding up the origin.
How do I debug intermittent 5xx errors? Run the same request multiple times and compare responses. If 5xx occurs randomly, check for resource exhaustion (connection pools, threads, database connections) or garbage collection pauses using application profiling tools.
What status code should I return when a dependency is down? Return 502 Bad Gateway. Do not mask it as a 500 or 503. 502 correctly signals that the error originated from a dependency, not the service itself.
How do I configure monitoring alerts for status codes? Alert on 5xx rate exceeding 0.1% over 5 minutes. Alert on 4xx rate exceeding 10% of total requests. Alert on sudden drops in 2xx rate. Set separate alerts for each critical endpoint.
What is the relationship between status codes and SLIs? Status codes are a direct input to availability SLIs. Count 5xx responses as failures, along with 4xx responses that represent timeouts (such as 408 Request Timeout) where appropriate. Divide successful responses by total requests to compute availability. Use this to track SLO compliance.
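The availability calculation can be written down directly. In this sketch the function name is hypothetical, and counting 408 as a failure by default is an assumption; adjust `failing_4xx` to whatever your SLI definition treats as a failure.

```python
def availability(status_codes, failing_4xx=(408,)):
    """Successful responses divided by total requests.

    A response counts as a failure when it is a 5xx or one of the 4xx codes
    listed in failing_4xx. Returns None when there are no requests.
    """
    total = len(status_codes)
    if total == 0:
        return None
    failures = sum(
        1 for code in status_codes
        if 500 <= code <= 599 or code in failing_4xx
    )
    return (total - failures) / total
```

Comparing the result with the SLO target, for example 0.999, shows whether the service is within its objective for the measured window.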
How This Applies in Practice
In production systems, a single status code rarely tells the full story. The same 502 can mean an origin timeout, an invalid upstream response, or a CDN misconfiguration. Tracing a status code through the system matters more than recognizing it.
Teams that handle incident response effectively maintain runbooks for each status code pattern. These runbooks specify which logs to check first, which metrics to compare, and which actions to take based on the pattern. This can reduce mean time to resolution from hours to minutes.
How to Implement on Azion
Azion provides tools to trace status codes through the entire request path:
- Edge Logs: Export real-time request logs via Azion Data Streaming to see status codes at every layer
- Application Analytics: Use Azion Metrics to filter by status code class and identify anomaly patterns
- Error Response Configuration: Customize 4xx and 5xx responses returned by Azion’s edge, including retry hints
- Alert Rules: Configure threshold-based alerts for 5xx rates with per-application granularity
Learn more in the Azion Documentation.
Sources:
- IETF. “HTTP Semantics.” RFC 9110. 2022.
- Google SRE. “Service Level Objectives.” 2023.
- HTTP Archive. “Web Almanac.” 2025.
- AWS. “Troubleshooting 5xx Errors.” 2025.