A major Brazilian retailer processes more than 730 TB of data in 6 months through its distributed infrastructure. Without structured metrics, understanding where the performance bottleneck lies among millions of requests would be impossible. Metrics are the foundation that transforms raw data into quantitative answers: “what is the P95 latency?”, “how many errors per minute?”, “what is the current throughput?”.
Prometheus is one of the most widely adopted tools for metrics and a graduated CNCF project. Organizations that build mature observability practices typically start with metrics, since they enable real-time anomaly detection and faster incident response. For more details, see the official Prometheus documentation.
What are Metrics?
Metrics are numerical values observed and collected over time, usually organized as time series. They represent the state, behavior, or performance of systems and applications. Each metric consists of:
- Name: Unique identifier (e.g., http_requests_total)
- Labels/Dimensions: Metadata for filtering (e.g., {method="GET", status="200"})
- Numerical value: The observed or collected value
- Timestamp: Moment of collection
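To make the four components concrete, the following minimal Python sketch keeps them in one record and renders the record as a line of the Prometheus text exposition format. `Sample` and `render` are illustrative names, not part of any Prometheus client library, and the sketch skips details such as escaping label values or printing +Inf/NaN the way Prometheus does.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Sample:
    """One observation: metric name, labels, numerical value, timestamp (ms)."""
    name: str
    labels: Dict[str, str] = field(default_factory=dict)
    value: float = 0.0
    timestamp_ms: Optional[int] = None


def render(sample: Sample) -> str:
    """Render a sample as one line of the text exposition format.

    Simplified: label values are not escaped and labels keep insertion order.
    """
    line = sample.name
    if sample.labels:
        pairs = ", ".join(f'{k}="{v}"' for k, v in sample.labels.items())
        line += "{" + pairs + "}"
    # Print whole numbers without a trailing ".0" for readability.
    value = float(sample.value)
    line += f" {int(value) if value.is_integer() else value}"
    if sample.timestamp_ms is not None:
        line += f" {sample.timestamp_ms}"
    return line


sample = Sample("http_requests_total", {"method": "GET", "status": "200"},
                12345, 1622745600000)
print(render(sample))
# http_requests_total{method="GET", status="200"} 12345 1622745600000
```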
Prometheus format (exposition):
```text
http_requests_total{method="GET", status="200"} 12345 1622745600000
http_requests_total{method="POST", status="500"} 67 1622745600000
```
Difference: Metrics vs Logs vs Traces
| Dimension | Metrics | Logs | Traces |
|---|---|---|---|
| Observation unit | Numerical time series | Individual event | Request/span |
| Detail level | Most summarized | Most detailed per event | Flow detail between services |
| Common question | "How much?" | "What happened?" | "Where did latency occur?" |
| Primary use | Monitoring, alerts, trends | Debugging, auditing | Distributed latency analysis |
Beyond knowing what to measure, it’s important to understand how to model that measurement. The Prometheus instrumentation model defines four main metric types, each with specific usage characteristics.
The 4 Metric Types in the Prometheus Model
Counter, Gauge, Histogram, and Summary are metric types in the Prometheus instrumentation model, widely used to represent different measurement patterns. Other tools may have different classifications, but these concepts are applicable across various contexts.
| Type | Behavior | Examples | Primary Use |
|---|---|---|---|
| Counter | Value that only increases. Resets to zero on process restart. | Total requests, total errors, bytes transmitted | Calculate rates with rate(), measure throughput |
| Gauge | Value that can go up or down. Represents current state. | CPU %, memory in use, temperature, active connections | Monitor instantaneous state, trends, capacity |
| Histogram | Distributes values into cumulative buckets. Allows estimating quantiles at query time. | Latency (P50, P95, P99), request size | Estimated quantiles, cross-instance aggregation |
| Summary | Calculates quantiles on the client at observation time. | Client-calculated latency, response time | Pre-calculated quantiles, when aggregation is not needed |
Counter
A value that only increases, resetting to zero when the process is restarted.
Characteristics:
- Monotonically increasing
- Resets to zero on process restart
- Used to calculate rates (e.g., requests per second via rate())
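Because of the reset behavior, rates are built from the increases between consecutive samples rather than from raw values. The following Python sketch approximates that idea by treating any drop in value as a restart from zero. It is a simplified illustration, not Prometheus's implementation: the real rate() also extrapolates to the edges of the range window, so its results differ slightly. The function names and sample data are hypothetical.

```python
from typing import List, Tuple

Samples = List[Tuple[float, float]]  # (timestamp in seconds, counter value), time-ordered


def increase(samples: Samples) -> float:
    """Sum of increases between consecutive samples, compensating for resets."""
    total = 0.0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # The value dropped, so the process restarted and counted up from
            # zero: the whole current value is new increase.
            total += current
    return total


def simple_rate(samples: Samples) -> float:
    """Average per-second increase over the time span covered by the samples."""
    if len(samples) < 2:
        raise ValueError("at least two samples are needed to compute a rate")
    span_s = samples[-1][0] - samples[0][0]
    return increase(samples) / span_s


# http_requests_total scraped every 15 s; the process restarts between t=30 and t=45.
scrapes = [(0, 100), (15, 130), (30, 160), (45, 20), (60, 50)]
print(increase(scrapes))               # 110.0 (30 + 30 + 20 + 30)
print(round(simple_rate(scrapes), 2))  # 1.83 requests per second (110 / 60)
```

Without the reset check, the drop from 160 to 20 would be read as a negative increase and the computed rate would collapse.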
Examples:
- http_requests_total → Total HTTP requests
- errors_total → Total errors
- bytes_transmitted_total → Bytes transmitted
Typical usage:
```promql
# Request rate per second over the last 5 minutes
rate(http_requests_total[5m])

# Error rate per minute
rate(errors_total[1m]) * 60
```
Gauge
A value that can go up or down, representing the current state.
Characteristics:
- Instantaneous snapshot
- Can freely go up or down
- rate() does not make sense for gauges, because they are not monotonically increasing (delta() and deriv() are the PromQL functions intended for gauges)
- Used to show trends and current state
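Since a gauge is a snapshot, it is usually summarized over a window instead of being turned into a rate. The Python sketch below models a gauge as a value with set/inc/dec operations and shows, in simplified form, what avg_over_time() and max_over_time() compute once the samples in the range are known. The `Gauge` class and `over_time` helper are hypothetical, and the left-open window (now - window, now] is an assumption of this sketch.

```python
from statistics import mean
from typing import Callable, List, Tuple

Samples = List[Tuple[float, float]]  # (timestamp in seconds, gauge value)


class Gauge:
    """Holds only the current value, which can move in either direction."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value

    def inc(self, amount: float = 1.0) -> None:
        self.value += amount

    def dec(self, amount: float = 1.0) -> None:
        self.value -= amount


def over_time(samples: Samples, now: float, window_s: float,
              agg: Callable[[List[float]], float]) -> float:
    """Apply agg to the values whose timestamps fall in (now - window_s, now]."""
    values = [v for t, v in samples if now - window_s < t <= now]
    if not values:
        raise ValueError("no samples in the window")
    return agg(values)


active_connections = Gauge()
active_connections.inc()  # connection opened
active_connections.inc()  # connection opened
active_connections.dec()  # connection closed
print(active_connections.value)  # 1.0

# cpu_usage_percent scraped every 60 s
cpu = [(0, 40.0), (60, 55.0), (120, 70.0), (180, 35.0), (240, 50.0), (300, 45.0)]
print(over_time(cpu, now=300, window_s=300, agg=mean))  # 51.0, like avg_over_time(...[5m])
print(over_time(cpu, now=300, window_s=300, agg=max))   # 70.0, like max_over_time(...[5m])
```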
Examples:
- cpu_usage_percent → Current CPU usage
- memory_bytes → Memory in use
- active_connections → Active connections
- temperature_celsius → Temperature
Typical usage:
```promql
# Current value (instantaneous)
cpu_usage_percent

# Average over the last 5 minutes
avg_over_time(cpu_usage_percent[5m])

# Maximum over the last 1 hour
max_over_time(memory_bytes[1h])
```
Histogram
Distributes observed values into predefined cumulative buckets, so quantiles can be estimated from the bucket counts at query time.
Characteristics:
- Bucket counters are cumulative (each bucket includes values from smaller buckets)
- Allows estimating quantiles via histogram_quantile() at query time
- Allows aggregating metrics from multiple instances
- More flexible than summary for distributed systems
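The cumulative layout is easier to see in code. This minimal Python sketch (a hypothetical `Histogram` class, not the API of any client library) uses the same bucket bounds as the exposition example in this section: each observation increments every bucket whose upper bound le is greater than or equal to the value, plus the running _sum and _count.

```python
import bisect
import math
from typing import Dict, List


class Histogram:
    """Minimal cumulative-bucket histogram (illustrative sketch)."""

    def __init__(self, bounds: List[float]) -> None:
        # Upper bounds are kept sorted; +Inf is always the last bucket.
        self.bounds = sorted(bounds) + [math.inf]
        self.counts = [0] * len(self.bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        # Index of the first bucket with le >= value; it and every larger
        # bucket include this observation (buckets are cumulative).
        first = bisect.bisect_left(self.bounds, value)
        for i in range(first, len(self.bounds)):
            self.counts[i] += 1
        self.sum += value
        self.count += 1

    def buckets(self) -> Dict[str, int]:
        return {("+Inf" if math.isinf(b) else str(b)): c
                for b, c in zip(self.bounds, self.counts)}


h = Histogram([0.1, 0.5, 1.0])
for latency in [0.05, 0.2, 0.3, 0.7, 2.5]:
    h.observe(latency)
print(h.buckets())                # {'0.1': 1, '0.5': 3, '1.0': 4, '+Inf': 5}
print(round(h.sum, 2), h.count)   # 3.75 5
```

The +Inf bucket always equals _count. Client libraries generally store a count per bucket and accumulate them when exposing, which produces the same cumulative output; the sketch updates the cumulative counts directly for clarity.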
Exposition (one _bucket series per bucket, plus _sum and _count):
```text
# Bucket counters
http_request_duration_seconds_bucket{le="0.1"} 100
http_request_duration_seconds_bucket{le="0.5"} 150
http_request_duration_seconds_bucket{le="1.0"} 180
http_request_duration_seconds_bucket{le="+Inf"} 200
# Sum of all values
http_request_duration_seconds_sum 123.4
# Count of observations
http_request_duration_seconds_count 200
```
Typical usage:
```promql
# P95 latency over the last 5 minutes (aggregating multiple instances)
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))

# P99 latency
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
```
Summary
Calculates quantiles on the client side (in the instrumented application) at observation time, over a sliding time window.
Characteristics:
- Quantiles calculated on the client side
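- Summary quantiles cannot be meaningfully aggregated across instances, unlike histograms (averaging per-instance quantiles does not yield the overall quantile)
- Which quantiles are reported is fixed in the client configuration and cannot be changed at query time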
- Less flexible, but avoids bucket storage cost
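To show what "calculated on the client" means, here is a deliberately naive Python sketch: it keeps the observations from a sliding time window in memory and computes the configured quantiles on demand with the nearest-rank method. Real client libraries use streaming approximation algorithms with bounded memory instead, so the `NaiveSummary` class, its window size, and the quantile method are illustrative assumptions. The sample latencies were chosen to reproduce the quantile values in the exposition example.

```python
import math
from collections import deque
from typing import Deque, Dict, Sequence, Tuple


class NaiveSummary:
    """Client-side quantiles over a sliding time window (illustration only)."""

    def __init__(self, quantiles: Sequence[float], window_s: float) -> None:
        self.quantiles = list(quantiles)  # fixed when the metric is configured
        self.window_s = window_s
        # (time, value) pairs; assumes observe() is called with increasing times.
        self.observations: Deque[Tuple[float, float]] = deque()
        self.sum = 0.0  # _sum and _count accumulate over the process lifetime
        self.count = 0

    def observe(self, value: float, now: float) -> None:
        self.observations.append((now, value))
        self.sum += value
        self.count += 1

    def snapshot(self, now: float) -> Dict[float, float]:
        # Evict observations that have left the sliding window.
        while self.observations and self.observations[0][0] <= now - self.window_s:
            self.observations.popleft()
        values = sorted(v for _, v in self.observations)
        if not values:
            return {q: math.nan for q in self.quantiles}
        result = {}
        for q in self.quantiles:
            # Nearest rank: smallest value covering at least a fraction q of observations.
            rank = max(1, math.ceil(q * len(values)))
            result[q] = values[rank - 1]
        return result


summary = NaiveSummary([0.5, 0.9, 0.99], window_s=600)
latencies = [0.12, 0.08, 0.35, 0.2, 0.89, 0.1, 0.15, 0.3, 0.11, 0.09]
for second, latency in enumerate(latencies):
    summary.observe(latency, now=float(second))
print(summary.snapshot(now=10.0))  # {0.5: 0.12, 0.9: 0.35, 0.99: 0.89}
```

The per-observation work (storing and later sorting values) happens inside the application, which is why the comparison table below lists a higher client cost for summaries.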
Exposition:
```text
http_request_duration_seconds{quantile="0.5"} 0.12
http_request_duration_seconds{quantile="0.9"} 0.35
http_request_duration_seconds{quantile="0.99"} 0.89
http_request_duration_seconds_sum 123.4
http_request_duration_seconds_count 200
```
Typical usage:
```promql
# Direct value (already calculated)
http_request_duration_seconds{quantile="0.99"}
```
Histogram vs Summary: When to use each?
| Criteria | Histogram | Summary |
|---|---|---|
| Aggregation | ✅ Yes (multiple instances) | ❌ No |
| Quantiles | ⚠️ Estimated via buckets | ✅ Calculated on client |
| Flexibility | ✅ Flexible for query-time quantile estimation | ❌ Predefined on client |
| Client cost | Low | High (calculation) |
| Storage cost | Medium (multiple buckets) | Low |
| Recommendation | Use by default | Specific cases |
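The aggregation row is the main practical difference. The Python sketch below imitates, in simplified form, what histogram_quantile(0.9, sum by (le) (...)) does: it adds the cumulative bucket counts of two instances per le, then interpolates linearly inside the bucket where the target rank falls. It assumes positive bucket bounds (the first bucket starts at 0) and returns the highest finite bound when the rank lands in the +Inf bucket; the instance data is made up. For comparison, it also averages the per-instance quantiles, which is the tempting shortcut when only per-instance summaries are available.

```python
import math
from typing import Dict, List, Tuple

Buckets = List[Tuple[float, float]]  # (upper bound le, cumulative count), sorted by le


def sum_by_le(*instances: Buckets) -> Buckets:
    """Add cumulative counts with the same le across instances (like sum by (le))."""
    totals: Dict[float, float] = {}
    for buckets in instances:
        for le, count in buckets:
            totals[le] = totals.get(le, 0.0) + count
    return sorted(totals.items())


def estimate_quantile(q: float, buckets: Buckets) -> float:
    """Linear interpolation inside the bucket that contains rank q * total."""
    total = buckets[-1][1]  # the +Inf bucket holds every observation
    if total == 0:
        return math.nan
    rank = q * total
    for i, (le, count) in enumerate(buckets[:-1]):
        if count >= rank:
            lower, below = (0.0, 0.0) if i == 0 else buckets[i - 1]
            return lower + (le - lower) * (rank - below) / (count - below)
    # Rank falls in the +Inf bucket: report the highest finite bound.
    return buckets[-2][0]


instance_a = [(0.1, 80), (0.5, 95), (1.0, 99), (math.inf, 100)]  # fast instance
instance_b = [(0.1, 20), (0.5, 50), (1.0, 95), (math.inf, 100)]  # slow instance

merged = sum_by_le(instance_a, instance_b)
merged_p90 = estimate_quantile(0.9, merged)
naive_avg = (estimate_quantile(0.9, instance_a) + estimate_quantile(0.9, instance_b)) / 2

print(round(merged_p90, 3))  # 0.857: P90 estimated over all 200 requests
print(round(naive_avg, 3))   # 0.656: average of the two per-instance P90 values
```

The two numbers disagree (here the average understates latency), which is why bucket counts are summed before estimating the quantile rather than combining quantiles afterwards.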
After understanding metric types, it’s worth knowing a set of signals that has become a reference for service monitoring.
Golden Signals: The Essential Metrics
Golden signals are the four fundamental metrics described in the Google SRE Book. They provide an essential view of service health and are an important starting point for observing distributed systems.
Note: Golden signals are a useful starting point. Mature teams also track business metrics (conversion, revenue) and application-specific metrics (critical journeys, funnels).
| Golden Signal | Key Question | What to Measure |
|---|---|---|
| Latency | How long? | P50, P95, P99 (response time percentiles) |
| Traffic | How much demand? | Requests/second, bytes transmitted, active users, simultaneous connections |
| Errors | How often does it fail? | HTTP 4xx, HTTP 5xx, timeouts |
| Saturation | How full? | CPU %, memory %, disk %, connections/limit, request queue |
Percentiles P95 and P99 are quantiles frequently used to measure latency. P95 means 95% of requests had latency equal to or below that value; P99 represents the threshold for 99% of requests.
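As a cross-check for dashboards, the golden signals for a window can be computed directly from request records. The Python sketch below uses the nearest-rank percentile (one of several percentile definitions; Prometheus histograms estimate percentiles from buckets instead), counts only 5xx responses as errors, and leaves out saturation, which comes from resource metrics such as CPU or queue length rather than from request records. The request data and function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Request = Tuple[float, int]  # (latency in ms, HTTP status)


def percentile(values: List[float], p: int) -> float:
    """Nearest-rank percentile: at least p% of values are <= the result."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def golden_signals(requests: List[Request], window_s: float) -> Dict[str, float]:
    latencies = [latency for latency, _ in requests]
    errors = sum(1 for _, status in requests if status >= 500)
    return {
        "latency_p50_ms": percentile(latencies, 50),
        "latency_p95_ms": percentile(latencies, 95),
        "latency_p99_ms": percentile(latencies, 99),
        "traffic_rps": len(requests) / window_s,
        "error_ratio": errors / len(requests),
    }


# 100 requests in a 10-second window with latencies 1..100 ms; every 25th fails.
requests = [(float(i), 500 if i % 25 == 0 else 200) for i in range(1, 101)]
print(golden_signals(requests, window_s=10))
# {'latency_p50_ms': 50.0, 'latency_p95_ms': 95.0, 'latency_p99_ms': 99.0,
#  'traffic_rps': 10.0, 'error_ratio': 0.04}
```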
SLI/SLO Metrics
SLI (Service Level Indicator): Metric measuring an aspect of the service (e.g., P95 latency).
SLO (Service Level Objective): Target value for the SLI over a period (e.g., P95 latency < 200ms, or 99.9% of requests successful).
Example:
| SLI | SLO | Metric (PromQL query) |
|---|---|---|
| Availability | 99.9% successful requests | 1 - (sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))) |
| Latency | P95 < 200ms | histogram_quantile(0.95, sum by (le) (rate(latency_bucket[5m]))) |
| Throughput | > 1000 req/s | sum(rate(requests_total[5m])) |
Note: Simplified examples for educational purposes. In practice, the availability SLI should reflect the service-specific success definition.
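The availability row reduces to arithmetic on request counts. Here is a minimal Python sketch, assuming the total and 5xx request counts for the evaluation window are already known (for example, from increase() over that window); the function name and the numbers are made up for illustration.

```python
def availability(total_requests: float, server_errors: float) -> float:
    """Availability SLI: share of requests that did not fail with a 5xx status."""
    if total_requests <= 0:
        raise ValueError("no traffic in the window; the SLI is undefined")
    return 1 - server_errors / total_requests


SLO_TARGET = 0.999  # 99.9% successful requests

# Hypothetical counts for one evaluation window
sli = availability(total_requests=2_000_000, server_errors=1_500)
print(f"SLI = {sli:.5f}")                                  # SLI = 0.99925
print("SLO met" if sli >= SLO_TARGET else "SLO violated")  # SLO met
```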
After understanding metric types and essential signals, the next step is to understand how these metrics are represented as time series.
Prometheus Data Model
Time Series
Format: <metric_name>{<label_name>=<label_value>, ...} <value> <timestamp>
Example:
```text
http_requests_total{method="GET", status="200", endpoint="/api/users"} 12345 1622745600000
```
Components:
| Component | Description | Example |
|---|---|---|
| Metric name | Unique metric name | http_requests_total |
| Labels | Filtering dimensions | {method="GET", status="200"} |
| Value | Numerical value | 12345 |
| Timestamp | Collection time | 1622745600000 (ms) |
Labels increase the analytical power of metrics, allowing filtering and grouping. However, each unique label combination creates a new time series — and this can scale quickly.
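One way to picture this: a series is identified by the metric name together with its full label set, so a store can key samples on that pair. The following minimal in-memory Python sketch illustrates the idea only; Prometheus's TSDB is far more elaborate, and `series_key`, `append`, and `store` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def series_key(name: str, labels: Dict[str, str]) -> SeriesKey:
    """Identity of a time series: metric name plus its set of label pairs."""
    return (name, frozenset(labels.items()))


# Each distinct key owns its own list of (timestamp ms, value) samples.
store: Dict[SeriesKey, List[Tuple[int, float]]] = defaultdict(list)


def append(name: str, labels: Dict[str, str], ts_ms: int, value: float) -> None:
    store[series_key(name, labels)].append((ts_ms, value))


append("http_requests_total", {"method": "GET", "status": "200"}, 1622745600000, 12345)
append("http_requests_total", {"status": "200", "method": "GET"}, 1622745615000, 12360)
append("http_requests_total", {"method": "POST", "status": "500"}, 1622745600000, 67)
print(len(store))  # 2: label order does not matter, label values do
```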
Cardinality
Definition: Number of unique time series generated by a metric.
Mathematical formula (an upper bound, since only label combinations that actually occur create series):
S_total = |C₁| × |C₂| × ... × |Cₙ|
Where:
- S_total = Total number of series
- Cᵢ = Set of possible values for label i
Practical example:
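Metric http_request_duration_seconds with labels:
- method: 4 values (GET, POST, PUT, DELETE)
- status: 3 values (2xx, 4xx, 5xx)
- endpoint: 100 values
S_total = 4 × 3 × 100 = 1,200 label combinations
If this is the histogram shown earlier, with four buckets (0.1, 0.5, 1.0, +Inf), each combination is exposed as 4 _bucket series plus _sum and _count, so the actual total is 1,200 × 6 = 7,200 series.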
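The product is a worst case; the series that actually exist are the distinct label combinations that are observed. The Python sketch below computes both for the label values in the example, under a hypothetical traffic pattern in which PUT and DELETE only ever reach 10 of the 100 endpoints (the endpoint names are placeholders).

```python
import math
from itertools import product
from typing import Dict, List, Set, Tuple

label_values: Dict[str, List[str]] = {
    "method": ["GET", "POST", "PUT", "DELETE"],
    "status": ["2xx", "4xx", "5xx"],
    "endpoint": [f"/api/endpoint/{i}" for i in range(100)],  # placeholder names
}

# Upper bound: |C1| x |C2| x ... x |Cn|
upper_bound = math.prod(len(values) for values in label_values.values())
print(upper_bound)  # 1200

# Combinations that actually receive traffic (hypothetical pattern):
# GET and POST hit every endpoint, PUT and DELETE only the first 10.
observed: Set[Tuple[str, str, str]] = {
    (method, status, endpoint)
    for method, status, endpoint in product(*label_values.values())
    if method in ("GET", "POST") or int(endpoint.rsplit("/", 1)[1]) < 10
}
print(len(observed))  # 660
```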
Cardinality Explosion
Problem: Labels with high cardinality (e.g., user_id, request_id) generate millions of series.
Bad example:
```text
http_requests_total{user_id="12345", request_id="abc-123-def"}
# Result: millions of series → degraded performance
```
Mitigations:
- Avoid unique IDs as labels (user_id, request_id, session_id)
- Limit possible values for each label
- Prefer aggregatable models and avoid multiplying high-cardinality labels
- Monitor active series volume and validate modeling before production
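A common way to apply the first two mitigations is to normalize values before they become labels: collapse IDs in URL paths into a placeholder and map anything outside an allow-list to a fixed value. The Python sketch below assumes that purely numeric and UUID path segments are IDs; the regex, the `normalize_endpoint` and `bounded_label` helpers, and the allow-list are hypothetical and would need to match your own routes.

```python
import re
from typing import FrozenSet

# Path segments treated as IDs: digits only, or a UUID (assumption).
ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)

KNOWN_METHODS: FrozenSet[str] = frozenset({"GET", "POST", "PUT", "DELETE"})


def normalize_endpoint(path: str) -> str:
    """Replace ID-like path segments with ':id' so the label stays bounded."""
    return "/".join(":id" if ID_SEGMENT.match(segment) else segment
                    for segment in path.split("/"))


def bounded_label(value: str, allowed: FrozenSet[str], fallback: str = "other") -> str:
    """Keep known values; collapse everything else into one fallback value."""
    return value if value in allowed else fallback


print(normalize_endpoint("/api/users/12345/orders/987"))  # /api/users/:id/orders/:id
print(bounded_label("PATCH", KNOWN_METHODS))              # other
```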
With metric theory, types, and data modeling established, it’s worth seeing how these concepts apply in real scenarios.
Real-World Use Cases with Metrics
Marisa: E-commerce Performance Metrics
Marisa is one of the largest fashion retailers in Brazil, with over 11 million app downloads and 70% of digital sales concentrated on mobile.
Challenge:
- Monitor e-commerce performance with millions of requests
- Understand latency bottlenecks in real time
- Correlate infrastructure metrics with user experience
Metrics implemented:
- P95 Latency: Page load time
- Throughput: Requests per second during peaks
- Saturation: CPU/memory usage at origin vs edge
- Error rate: HTTP error rate per endpoint
Verified results:
- 85% of traffic served by distributed infrastructure
- 730 TB transferred without origin impact
- 4.3 TB/day of images processed and optimized
- Improvement in First Contentful Paint, Speed Index, and Time to Interactive
Learning: Well-structured metrics allow correlating infrastructure performance with digital experience at scale, transforming operational data into business decisions.
B2W: Security Metrics at Scale
B2W Digital brings together some of the largest e-commerce platforms in Latin America, with 2 billion visits per year and 17 million active customers.
Challenge:
- Monitor security across millions of daily connections
- Detect attacks in real time
- Measure mitigation effectiveness
Metrics implemented:
- Block rate: Blocked requests per second
- Attack types: DDoS, SQL injection, XSS by category
- Mitigation latency: Time between detection and blocking
- Error rate per rule: Effectiveness of each Firewall rule
Verified results:
- Millions of attacks automatically blocked
- Transformation of events into real-time insights
- Integration of metrics with SIEM via Data Streaming
- Complete environment visibility in dashboards
Learning: Metrics are not just for performance — they are also fundamental for operational security, allowing measurement of defense effectiveness and incident response time.
Collecting metrics is only part of the work. Extracting useful signals depends on knowing how to aggregate data correctly.
Metric Aggregation
Aggregation Types
| Type | Function | Use |
|---|---|---|
| Sum | sum() | Total values |
| Average | avg() | Average across instances |
| Min/Max | min() / max() | Extremes |
| Rate | rate() | Rate per second (counters) |
| Increase | increase() | Increment over period |
| Percentile | histogram_quantile() | Percentiles (histograms) |
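The aggregation operators in the table combine values across series; with a by clause, they first group series by the listed labels and drop every other label. The Python sketch below reproduces that grouping over instant values; `aggregate_by` and the sample rates are hypothetical. As in PromQL, a series that lacks a grouping label is treated as having an empty value for it.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple

Labels = Dict[str, str]
Series = Tuple[Labels, float]  # (label set, current value)


def aggregate_by(series: Iterable[Series], by: List[str],
                 agg: Callable[[List[float]], float]) -> Dict[Tuple[str, ...], float]:
    """Group series by the given labels and combine each group's values."""
    groups: Dict[Tuple[str, ...], List[float]] = defaultdict(list)
    for labels, value in series:
        # A missing label counts as the empty string.
        key = tuple(labels.get(name, "") for name in by)
        groups[key].append(value)
    return {key: agg(values) for key, values in groups.items()}


# Per-second request rates, as rate(http_requests_total[5m]) might return them.
rates = [
    ({"method": "GET", "endpoint": "/api/users"}, 12.0),
    ({"method": "GET", "endpoint": "/api/orders"}, 8.0),
    ({"method": "POST", "endpoint": "/api/orders"}, 3.0),
]

print(aggregate_by(rates, ["method"], sum))     # {('GET',): 20.0, ('POST',): 3.0}
print(aggregate_by(rates, ["method"], max))     # {('GET',): 12.0, ('POST',): 3.0}
print(aggregate_by(rates, ["endpoint"], mean))  # {('/api/users',): 12.0, ('/api/orders',): 5.5}
```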
Aggregation by Labels
```promql
# Sum of requests by method (aggregates all endpoints)
sum by (method) (rate(http_requests_total[5m]))

# Average latency by service
avg by (service) (latency_seconds)

# Maximum CPU by region
max by (region) (cpu_usage_percent)
```
Temporal Aggregation
```promql
# Average of a metric over the last 5 minutes
avg_over_time(cpu_usage_percent[5m])

# Maximum over 1 hour
max_over_time(memory_bytes[1h])

# Minimum over 1 day
min_over_time(active_connections[1d])
```
With the concepts of metrics, types, modeling, and aggregation presented, the following frequently asked questions help consolidate learning.
Frequently Asked Questions
What are metrics?
Metrics are numerical values observed and collected over time that represent the state, behavior, or performance of systems. They are organized as time series with names, labels, and timestamps. They answer questions like “what is the current error rate?”, “is latency within SLO?”. They differ from logs (events) and traces (journeys).
What are the 4 metric types?
The four types in the Prometheus model are: Counter (values that only increase, e.g., total requests), Gauge (values that go up and down, e.g., CPU %), Histogram (distribution into cumulative buckets, allows quantile estimation at query time), and Summary (quantiles calculated on the client). Use histogram by default for distributed systems.
What is the difference between counter and gauge?
Counter is a value that only increases and resets to zero on process restart; it is used to calculate rates (e.g., with rate()). Gauge is a value that can go up or down, representing the current state (e.g., CPU, memory). Use counter for accumulated totals, gauge for instantaneous values.
What are golden signals?
Golden signals are the four essential metrics described by Google SRE: Latency (response time), Traffic (demand), Errors (failure rate), and Saturation (resource usage). They provide an essential view of service health and are the foundation for SLIs/SLOs.
What is cardinality in metrics?
Cardinality is the number of unique time series generated by a metric; its upper bound is the product of the number of possible values of each label. “Cardinality explosion” occurs when labels with many values (user_id, request_id) generate millions of series, degrading performance.
Histogram or Summary: when to use each?
Use Histogram by default (aggregates multiple instances and allows quantile estimation at query time from defined buckets). Use Summary only when you need quantiles calculated on the client and don’t need to aggregate them across instances. Histogram is more flexible and suitable for distributed systems.
How to avoid cardinality explosion?
Avoid labels with unique values (user_id, request_id), limit the possible values of each label, prefer aggregatable models, reduce or aggregate high-cardinality dimensions before exposition when possible, and monitor the number of active series.
Conclusion
Metrics are the foundation of observability. They transform system behavior into numerical data that can be queried, alerted on, and correlated. The four types in the Prometheus model — Counter, Gauge, Histogram, and Summary — cover most monitoring scenarios.
Key concepts:
- Metrics = Numerical values collected over time to represent state, behavior, or performance
- 4 types (Prometheus): Counter (only up), Gauge (up/down), Histogram (cumulative buckets), Summary (client-side quantiles)
- Golden signals: Latency, Traffic, Errors, Saturation
- Cardinality: Beware of high-cardinality labels
- Prometheus: Graduated CNCF project, widely adopted
Next steps:
For beginners:
- Understand the 4 metric types
- Implement golden signals in your application
- Use Prometheus for exposition
For operations teams:
- Configure SLOs based on SLIs
- Monitor your metrics cardinality
- Integrate with Real-Time Metrics for dashboards
For mature companies:
- Optimize PromQL queries
- Implement SLO-based alerts
- Use histograms for latency SLIs
Want to visualize metrics in real time, with latency of just seconds? Discover Real-Time Metrics and Data Stream for metric collection and analysis at scale. Get started free.