What are Metrics? Definition, Types, and How to Use Them in Observability

What are metrics? Understand the 4 main types (counter, gauge, histogram, summary), how to use them with Prometheus, and how to avoid cardinality explosion.

A major Brazilian retailer processed more than 730 TB of data over six months through its distributed infrastructure. Without structured metrics, pinpointing the performance bottleneck among millions of requests would be impossible. Metrics are the foundation that turns raw data into quantitative answers: “what is the P95 latency?”, “how many errors per minute?”, “what is the current throughput?”.

Prometheus is one of the most widely adopted metrics tools and a graduated CNCF project. Organizations with mature observability practices typically start with metrics, since they enable real-time anomaly detection and faster incident response. For more details, see the official Prometheus documentation.

What are Metrics?

Metrics are numerical values observed and collected over time, usually organized as time series. They represent the state, behavior, or performance of systems and applications. Each metric consists of:

  1. Name: Unique identifier (e.g., http_requests_total)
  2. Labels/Dimensions: Metadata for filtering (e.g., {method="GET", status="200"})
  3. Numerical value: The observed or collected value
  4. Timestamp: Moment of collection

Prometheus text exposition format:

http_requests_total{method="GET", status="200"} 12345 1622745600000
http_requests_total{method="POST", status="500"} 67 1622745600000
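To make the line structure concrete, here is a minimal Python sketch that renders one sample in the text exposition format. The helper names (format_sample, format_value, escape_label_value) are hypothetical; real applications normally rely on an official client library rather than formatting lines by hand. The sketch escapes label values as the format requires (backslash, double quote, newline), sorts labels only for stable output, and does not validate metric or label names.

```python
import math
from typing import Dict, Optional


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_value(value: float) -> str:
    """Render a sample value; integral values are printed without a decimal point."""
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "+Inf" if value > 0 else "-Inf"
    return str(int(value)) if float(value).is_integer() else repr(float(value))


def format_sample(name: str, labels: Dict[str, str], value: float,
                  timestamp_ms: Optional[int] = None) -> str:
    """Build one exposition line: name{labels} value [timestamp]."""
    line = name
    if labels:
        pairs = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in sorted(labels.items()))
        line += "{" + pairs + "}"
    line += " " + format_value(value)
    if timestamp_ms is not None:  # the timestamp (in milliseconds) is optional
        line += f" {timestamp_ms}"
    return line


print(format_sample("http_requests_total", {"method": "GET", "status": "200"}, 12345, 1622745600000))
# http_requests_total{method="GET",status="200"} 12345 1622745600000
```

Label order does not affect which series a line belongs to; sorting here only keeps the output deterministic.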

Difference: Metrics vs Logs vs Traces

| Dimension | Metrics | Logs | Traces |
| --- | --- | --- | --- |
| Observation unit | Numerical time series | Individual event | Request/span |
| Detail level | Most summarized | Most detailed per event | Flow detail between services |
| Common question | "How much?" | "What happened?" | "Where did latency occur?" |
| Primary use | Monitoring, alerts, trends | Debugging, auditing | Distributed latency analysis |

Beyond knowing what to measure, it’s important to understand how to model that measurement. The Prometheus instrumentation model defines four main metric types, each with specific usage characteristics.

The 4 Metric Types in the Prometheus Model

Counter, Gauge, Histogram, and Summary are metric types in the Prometheus instrumentation model, widely used to represent different measurement patterns. Other tools may have different classifications, but these concepts are applicable across various contexts.

| Type | Behavior | Examples | Primary Use |
| --- | --- | --- | --- |
| Counter | Value that only increases. Resets to zero on process restart. | Total requests, total errors, bytes transmitted | Calculate rates with rate(), measure throughput |
| Gauge | Value that can go up or down. Represents current state. | CPU %, memory in use, temperature, active connections | Monitor instantaneous state, trends, capacity |
| Histogram | Distributes values into cumulative buckets. Allows estimating quantiles at query time. | Latency (P50, P95, P99), request size | Estimated quantiles, cross-instance aggregation |
| Summary | Calculates quantiles on the client at observation time. | Client-calculated latency, response time | Pre-calculated quantiles, when aggregation is not needed |

Counter

A value that only increases, resetting to zero when the process is restarted.

Characteristics:

  • Monotonically increasing
  • Resets to zero on process restart
  • Used to calculate rates (e.g., requests per second via rate())

Examples:

  • http_requests_total → Total HTTP requests
  • errors_total → Total errors
  • bytes_transmitted_total → Bytes transmitted

Typical usage:

# Request rate per second over the last 5 minutes
rate(http_requests_total[5m])
# Error rate per minute
rate(errors_total[1m]) * 60
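Counters are paired with rate() largely because of reset handling. The Python sketch below is a simplified model, not Prometheus's implementation: it treats any decrease between consecutive samples as a process restart and, unlike rate(), does not extrapolate to the edges of the time window. The helper names and sample data (scrapes every 15 seconds) are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def counter_increase(samples: List[Sample]) -> float:
    """Sum the increases between consecutive samples, treating any drop as a reset."""
    total = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        if curr >= prev:
            total += curr - prev
        else:
            # The process restarted and the counter began again from zero,
            # so the whole current value counts as new increments.
            total += curr
    return total


def simple_rate(samples: List[Sample]) -> float:
    """Per-second rate between the first and last sample (no extrapolation)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    elapsed = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / elapsed


# Hypothetical scrapes every 15 s; the process restarts between t=30 and t=45.
samples = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
print(counter_increase(samples))  # 100.0
print(simple_rate(samples))       # about 1.67 per second
```

A naive last-minus-first calculation on the same samples would give 40 - 100 = -60, which is why counters should be read through rate() or increase() rather than by subtracting raw values.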

Gauge

A value that can go up or down, representing the current state.

Characteristics:

  • Instantaneous snapshot
  • Can freely go up or down
  • rate() does not make sense for gauges (they are not monotonically increasing); use delta() or deriv() to measure how a gauge changes
  • Used to show trends and current state

Examples:

  • cpu_usage_percent → Current CPU usage
  • memory_bytes → Memory in use
  • active_connections → Active connections
  • temperature_celsius → Temperature

Typical usage:

# Current value (instantaneous)
cpu_usage_percent
# Average over the last 5 minutes
avg_over_time(cpu_usage_percent[5m])
# Maximum over the last 1 hour
max_over_time(memory_bytes[1h])
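The difference between the two types also shows up at instrumentation time. The sketch below uses two hypothetical minimal Python classes, not a real client library (official clients such as the Python prometheus_client expose similarly named inc(), dec(), and set() methods), to show that a counter rejects decreases while a gauge moves freely.

```python
class Counter:
    """Hypothetical minimal counter: it can only go up (until the process restarts)."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """Hypothetical minimal gauge: it reports the current state and can move freely."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        self.value += amount

    def dec(self, amount: float = 1.0) -> None:
        self.value -= amount

    def set(self, value: float) -> None:
        self.value = value


requests_total = Counter()
active_connections = Gauge()
for _ in range(3):            # three connections open, each serving one request
    active_connections.inc()
    requests_total.inc()
active_connections.dec()      # one connection closes
print(requests_total.value, active_connections.value)  # 3.0 2.0
```

The counter keeps the accumulated total (suitable for rate()), while the gauge reflects only how many connections are open right now.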

Histogram

Counts observed values into predefined cumulative buckets, so quantiles can be estimated from the bucket counts at query time.

Characteristics:

  • Bucket counters are cumulative (each bucket includes values from smaller buckets)
  • Allows estimating quantiles via histogram_quantile() at query time
  • Allows aggregating metrics from multiple instances
  • More flexible than summary for distributed systems

Exposition (three kinds of series: one _bucket series per bucket, plus _sum and _count):

# Bucket counters
http_request_duration_seconds_bucket{le="0.1"} 100
http_request_duration_seconds_bucket{le="0.5"} 150
http_request_duration_seconds_bucket{le="1.0"} 180
http_request_duration_seconds_bucket{le="+Inf"} 200
# Sum of all values
http_request_duration_seconds_sum 123.4
# Count of observations
http_request_duration_seconds_count 200
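The cumulative behavior comes from how each observation is recorded. In this hypothetical Python sketch (not a real client library), observe() increments every bucket whose upper bound is greater than or equal to the value and also updates _sum and _count. With bucket bounds 0.1, 0.5, and 1.0 it produces the same three kinds of series as the exposition above; the observed durations are made up.

```python
import math
from typing import List


class Histogram:
    """Hypothetical minimal histogram: cumulative buckets plus _sum and _count."""

    def __init__(self, upper_bounds: List[float]) -> None:
        # A +Inf bucket is always appended; its count equals the total count.
        self.bounds = sorted(upper_bounds) + [math.inf]
        self.buckets = [0] * len(self.bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value: float) -> None:
        # Cumulative buckets: increment every bucket whose upper bound (le) >= value.
        for i, bound in enumerate(self.bounds):
            if value <= bound:
                self.buckets[i] += 1
        self.sum += value
        self.count += 1

    def exposition(self, name: str) -> List[str]:
        lines = []
        for bound, count in zip(self.bounds, self.buckets):
            le = "+Inf" if math.isinf(bound) else str(bound)
            lines.append(f'{name}_bucket{{le="{le}"}} {count}')
        lines.append(f"{name}_sum {self.sum}")
        lines.append(f"{name}_count {self.count}")
        return lines


h = Histogram([0.1, 0.5, 1.0])
for latency in [0.05, 0.3, 0.7, 2.0]:  # hypothetical request durations in seconds
    h.observe(latency)
print("\n".join(h.exposition("http_request_duration_seconds")))  # bucket counts 1, 2, 3, 4
```

Because buckets are cumulative, the +Inf bucket always equals _count, and each bucket can be compared directly with the next one to find how many observations fell between two bounds.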

Typical usage:

# P95 latency over the last 5 minutes (aggregating multiple instances)
histogram_quantile(0.95,
sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
)
# P99 latency
histogram_quantile(0.99,
sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
)
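To see what histogram_quantile() does with the buckets, here is a simplified Python sketch of its linear interpolation, applied to the bucket counts from the exposition example above. It assumes non-negative observations (so the lowest bucket starts at 0), uses raw counts instead of rates (the ratios, and therefore the estimate, are the same), and rejects q outside (0, 1] instead of mimicking PromQL's edge-case behavior.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Simplified sketch of PromQL's linear interpolation: buckets are sorted by
    upper bound, the last one is +Inf, and the lowest bucket starts at 0.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only supports 0 < q <= 1")
    total = buckets[-1][1]  # the +Inf bucket counts every observation
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: return the highest finite bound.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound  # not reached: the +Inf bucket always satisfies count >= rank


buckets = [(0.1, 100), (0.5, 150), (1.0, 180), (math.inf, 200)]
for q in (0.5, 0.6, 0.95, 0.99):
    print(q, round(histogram_quantile(q, buckets), 3))
```

With these particular buckets, both P95 and P99 come out as 1.0 s: their ranks (190 and 198 of 200 observations) fall into the +Inf bucket, and PromQL likewise returns the upper bound of the highest finite bucket in that case. Bucket boundaries should therefore bracket the latency values you care about, such as an SLO threshold.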

Summary

Calculates quantiles on the client side (in the instrumented application's client library) at observation time.

Characteristics:

  • Quantiles calculated on the client side
  • Summary quantiles are not correctly aggregatable across instances (unlike histograms)
  • Quantiles depend on client configuration and may vary
  • Less flexible, but avoids bucket storage cost

Exposition:

http_request_duration_seconds{quantile="0.5"} 0.12
http_request_duration_seconds{quantile="0.9"} 0.35
http_request_duration_seconds{quantile="0.99"} 0.89
http_request_duration_seconds_sum 123.4
http_request_duration_seconds_count 200

Typical usage:

# Direct value (already calculated)
http_request_duration_seconds{quantile="0.99"}
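The aggregation caveat is easy to demonstrate. The Python sketch below uses exact nearest-rank quantiles over hypothetical observations from two instances (real client libraries typically use streaming approximations over a sliding time window instead) and compares the average of each instance's P50 with the true P50 of all observations combined.

```python
import math
from typing import List


def nearest_rank_quantile(values: List[float], q: float) -> float:
    """Exact quantile: the smallest value with at least a fraction q of values <= it."""
    ordered = sorted(values)
    k = max(1, math.ceil(q * len(ordered)))
    return ordered[k - 1]


# Hypothetical observations: instance A is fast, instance B is slow.
instance_a = [0.1] * 100
instance_b = [1.0] * 100

# What you get by averaging the per-instance quantiles each summary exposes.
avg_of_p50 = (nearest_rank_quantile(instance_a, 0.5) + nearest_rank_quantile(instance_b, 0.5)) / 2
# The actual median of all 200 observations.
true_p50 = nearest_rank_quantile(instance_a + instance_b, 0.5)
print(avg_of_p50, true_p50)  # 0.55 0.1
```

Averaging the two medians gives 0.55 s, a value no request actually had, while the true median is 0.1 s. The _sum and _count series, in contrast, do aggregate correctly: sum(rate(http_request_duration_seconds_sum[5m])) / sum(rate(http_request_duration_seconds_count[5m])) yields the mean latency across all instances.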

Histogram vs Summary: When to use each?

| Criteria | Histogram | Summary |
| --- | --- | --- |
| Aggregation | ✅ Yes (multiple instances) | ❌ No |
| Quantiles | ⚠️ Estimated via buckets | ✅ Calculated on client |
| Flexibility | ✅ Flexible for query-time quantile estimation | ❌ Predefined on client |
| Client cost | Low | High (calculation) |
| Storage cost | Medium (multiple buckets) | Low |
| Recommendation | Use by default | Specific cases |

After understanding metric types, it’s worth knowing a set of signals that has become a reference for service monitoring.

Golden Signals: The Essential Metrics

Golden signals are the four fundamental metrics described in the Google SRE Book. They provide an essential view of service health and are an important starting point for observing distributed systems.

Note: Golden signals are a useful starting point. Mature teams also track business metrics (conversion, revenue) and application-specific metrics (critical journeys, funnels).

| Golden Signal | Key Question | What to Measure |
| --- | --- | --- |
| Latency | How long? | P50, P95, P99 (response time percentiles) |
| Traffic | How much demand? | Requests/second, bytes transmitted, active users, simultaneous connections |
| Errors | Failure rate? | HTTP 4xx, HTTP 5xx, timeouts |
| Saturation | How full? | CPU %, memory %, disk %, connections/limit, request queue |

Percentiles P95 and P99 are quantiles frequently used to measure latency. P95 means 95% of requests had latency equal to or below that value; P99 represents the threshold for 99% of requests.
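A quick way to check that definition is the nearest-rank method, sketched below in Python with hypothetical latencies of 1 to 100 ms. Nearest-rank is only one of several percentile definitions; a Prometheus histogram instead estimates percentiles by interpolating within buckets.

```python
import math
from typing import List


def percentile(latencies_ms: List[float], p: float) -> float:
    """Nearest-rank percentile: at least p% of requests are <= the returned value."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


latencies_ms = list(range(1, 101))  # hypothetical: 100 requests taking 1 ms ... 100 ms
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
share_at_or_below_p95 = sum(1 for x in latencies_ms if x <= p95) / len(latencies_ms)
print(p95, p99, share_at_or_below_p95)  # 95 99 0.95
```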

SLI/SLO Metrics

SLI (Service Level Indicator): Metric measuring an aspect of the service (e.g., P95 latency).

SLO (Service Level Objective): Target value for the SLI over a defined period (e.g., P95 latency < 200 ms, or 99.9% of requests successful over 30 days).

Example:

| SLI | SLO | Metric (PromQL) |
| --- | --- | --- |
| Availability | 99.9% successful requests | 1 - (sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))) |
| Latency | P95 < 200ms | histogram_quantile(0.95, sum by (le) (rate(latency_bucket[5m]))) |
| Throughput | > 1000 req/s | sum(rate(requests_total[5m])) |

Note: Simplified examples for educational purposes. In practice, the availability SLI should reflect the service-specific success definition.
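The availability row boils down to simple arithmetic on two counts. Here is a Python sketch with hypothetical totals over a measurement window, counting only 5xx responses as failures as the PromQL above does. It also derives how many failures the 99.9% target tolerates, commonly called the error budget; the function name is illustrative.

```python
def availability(total_requests: int, failed_requests: int) -> float:
    """Availability SLI: the fraction of requests that did not fail."""
    if total_requests <= 0:
        raise ValueError("no requests in the window")
    return 1 - failed_requests / total_requests


slo_target = 0.999                   # 99.9% successful requests
total, failed_5xx = 1_000_000, 800   # hypothetical counts over the SLO window

sli = availability(total, failed_5xx)
allowed_failures = (1 - slo_target) * total  # failures the target tolerates

print(f"SLI = {sli:.4%}, meets SLO: {sli >= slo_target}")
print(f"allowed failures: {allowed_failures:.0f}, used: {failed_5xx / allowed_failures:.0%}")
```

Tracking the share of allowed failures already used is often more actionable than the raw SLI, because it shows how much room remains before the target is missed.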

After understanding metric types and essential signals, the next step is to understand how these metrics are represented as time series.

Prometheus Data Model

Time Series

Format: <metric_name>{<label_name>=<label_value>, ...} <value> [<timestamp>] (the timestamp is optional; if it is omitted, Prometheus records the scrape time)

Example:

http_requests_total{method="GET", status="200", endpoint="/api/users"} 12345 1622745600000

Components:

| Component | Description | Example |
| --- | --- | --- |
| Metric name | Unique metric name | http_requests_total |
| Labels | Filtering dimensions | {method="GET", status="200"} |
| Value | Numerical value | 12345 |
| Timestamp | Collection time | 1622745600000 (ms) |

Labels increase the analytical power of metrics, allowing filtering and grouping. However, each unique label combination creates a new time series — and this can scale quickly.
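One way to picture this is to identify each series by its metric name plus its set of label pairs. In the Python sketch below (hypothetical data and helper name), label order is irrelevant, but any change in a label value yields a separate series.

```python
from typing import Dict, FrozenSet, Tuple

SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def series_key(name: str, labels: Dict[str, str]) -> SeriesKey:
    """Identify a series by metric name plus its set of label pairs (order-insensitive)."""
    return (name, frozenset(labels.items()))


# Hypothetical samples collected from one target.
observed = [
    ("http_requests_total", {"method": "GET", "status": "200"}),
    ("http_requests_total", {"status": "200", "method": "GET"}),   # same series, labels reordered
    ("http_requests_total", {"method": "GET", "status": "500"}),   # new status value -> new series
    ("http_requests_total", {"method": "POST", "status": "200"}),  # new method value -> new series
]
unique_series = {series_key(name, labels) for name, labels in observed}
print(len(unique_series))  # 3
```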

Cardinality

Definition: Number of unique time series generated by a metric.

Mathematical formula (an upper bound, reached only when every label combination actually occurs):

S_total = |C₁| × |C₂| × ... × |Cₙ|

Where:

  • S_total = Total number of series
  • Cᵢ = Set of possible values for each label i

Practical example:

Metric http_requests_total with labels:

  • method: 4 values (GET, POST, PUT, DELETE)
  • status: 3 values (2xx, 4xx, 5xx)
  • endpoint: 100 values

S_total = 4 × 3 × 100 = 1,200 series
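The same arithmetic in a short Python sketch (math.prod needs Python 3.8 or later), which also shows two effects that are easy to miss: a histogram multiplies every label combination by its number of buckets plus the _sum and _count series, and a single unbounded label multiplies everything. The bucket count (4, matching the histogram exposition example) and the 10,000 user IDs are illustrative assumptions.

```python
import math
from typing import Dict


def max_series(values_per_label: Dict[str, int]) -> int:
    """Worst-case series count: product of the number of values of each label."""
    return math.prod(values_per_label.values())


labels = {"method": 4, "status": 3, "endpoint": 100}
counter_series = max_series(labels)              # 4 * 3 * 100
# A histogram adds one series per bucket (4 here, including +Inf) plus _sum and _count.
histogram_series = counter_series * (4 + 2)
# Adding a hypothetical user_id label with 10,000 distinct users.
with_user_id = max_series({**labels, "user_id": 10_000})
print(counter_series, histogram_series, with_user_id)  # 1200 7200 12000000
```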

Cardinality Explosion

Problem: Labels with high cardinality (e.g., user_id, request_id) generate millions of series.

Bad example:

http_requests_total{user_id="12345", request_id="abc-123-def"}
# Result: millions of series → degraded performance

Mitigations:

  1. Avoid unique IDs as labels (user_id, request_id, session_id)
  2. Limit possible values for each label
  3. Prefer aggregatable models and avoid multiplying high-cardinality labels
  4. Monitor active series volume and validate modeling before production
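Mitigations 1 and 2 are often applied at instrumentation time by normalizing values before they become labels. The Python sketch below is one hypothetical way to do it: it strips query strings, replaces numeric and UUID-shaped path segments with ":id", and collapses status codes into classes. The regular expression and the ":id" placeholder are assumptions; frameworks that expose the matched route template are a more robust source for an endpoint label.

```python
import re

# Hypothetical rule: a path segment that is all digits or UUID-shaped becomes ":id".
_ID_SEGMENT = re.compile(r"/(?:\d+|[0-9a-fA-F]{8}-[0-9a-fA-F-]{27})(?=/|$)")


def normalize_path(path: str) -> str:
    """Collapse IDs in a URL path so the endpoint label has a bounded set of values."""
    without_query = path.split("?", 1)[0]
    return _ID_SEGMENT.sub("/:id", without_query)


def status_class(code: int) -> str:
    """Map an HTTP status code to its class (e.g., 404 -> "4xx")."""
    return f"{code // 100}xx"


print(normalize_path("/api/users/12345/orders/987?page=2"))  # /api/users/:id/orders/:id
print(status_class(503))                                     # 5xx
```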

With metric theory, types, and data modeling established, it’s worth seeing how these concepts apply in real scenarios.

Real-World Use Cases with Metrics

Marisa: E-commerce Performance Metrics

Marisa is one of the largest fashion retailers in Brazil, with over 11 million app downloads and 70% of digital sales concentrated on mobile.

Challenge:

  • Monitor e-commerce performance with millions of requests
  • Understand latency bottlenecks in real time
  • Correlate infrastructure metrics with user experience

Metrics implemented:

  • P95 Latency: Page load time
  • Throughput: Requests per second during peaks
  • Saturation: CPU/memory usage at origin vs edge
  • Error rate: HTTP error rate per endpoint


Learning: Well-structured metrics allow correlating infrastructure performance with digital experience at scale, transforming operational data into business decisions.

B2W: Security Metrics at Scale

B2W Digital brings together some of the largest e-commerce platforms in Latin America, with 2 billion visits per year and 17 million active customers.

Challenge:

  • Monitor security across millions of daily connections
  • Detect attacks in real time
  • Measure mitigation effectiveness

Metrics implemented:

  • Block rate: Blocked requests per second
  • Attack types: DDoS, SQL injection, XSS by category
  • Mitigation latency: Time between detection and blocking
  • Error rate per rule: Effectiveness of each Firewall rule


Learning: Metrics are not just for performance — they are also fundamental for operational security, allowing measurement of defense effectiveness and incident response time.

Collecting metrics is only part of the work. Extracting useful signals depends on knowing how to aggregate data correctly.

Metric Aggregation

Aggregation Types

| Type | Function | Use |
| --- | --- | --- |
| Sum | sum() | Total values |
| Average | avg() | Average across instances |
| Min/Max | min() / max() | Extremes |
| Rate | rate() | Rate per second (counters) |
| Increase | increase() | Increment over period |
| Percentile | histogram_quantile() | Percentiles (histograms) |

Aggregation by Labels

# Sum of requests by method (aggregates all endpoints)
sum by (method) (rate(http_requests_total[5m]))
# Average latency by service
avg by (service) (latency_seconds)
# Maximum CPU by region
max by (region) (cpu_usage_percent)
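Conceptually, sum by (method) groups series by the listed labels, drops every other label, and adds the values in each group. The Python sketch below mimics only that grouping step on already-computed per-second rates (the input numbers and the sum_by helper are hypothetical); it is not how Prometheus evaluates queries internally.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Series = Tuple[Dict[str, str], float]  # (labels, current value, e.g. a per-second rate)


def sum_by(series: List[Series], by: List[str]) -> Dict[Tuple[str, ...], float]:
    """Mimic PromQL `sum by (...)`: keep only the listed labels and add values."""
    groups: Dict[Tuple[str, ...], float] = defaultdict(float)
    for labels, value in series:
        # A missing label is treated as an empty value, so such series group together.
        key = tuple(labels.get(name, "") for name in by)
        groups[key] += value
    return dict(groups)


rates = [
    ({"method": "GET", "endpoint": "/api/users"}, 12.0),
    ({"method": "GET", "endpoint": "/api/orders"}, 8.0),
    ({"method": "POST", "endpoint": "/api/orders"}, 3.5),
]
print(sum_by(rates, ["method"]))  # {('GET',): 20.0, ('POST',): 3.5}
```

Grouping by endpoint instead would yield one result per endpoint, which is why the choice of by() labels also controls how many series a query returns.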

Temporal Aggregation

# Average of a metric over the last 5 minutes
avg_over_time(cpu_usage_percent[5m])
# Maximum over 1 hour
max_over_time(memory_bytes[1h])
# Minimum over 1 day
min_over_time(active_connections[1d])
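The *_over_time functions look back over a range of samples for each series. The Python sketch below applies a function to the samples that fall inside a window ending at a given time; the one-minute sample spacing and CPU values are hypothetical, and the window is treated as left-open (the start instant excluded) as a simplifying choice.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def over_time(samples: List[Sample], at: float, window: float,
              fn: Callable[[List[float]], float]) -> float:
    """Apply fn to the values of samples with timestamps in (at - window, at]."""
    values = [value for ts, value in samples if at - window < ts <= at]
    if not values:
        raise ValueError("no samples in the window")
    return fn(values)


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Hypothetical CPU usage (%) sampled once per minute.
cpu_usage_percent = [(0, 40.0), (60, 55.0), (120, 70.0), (180, 65.0), (240, 50.0), (300, 45.0)]

avg_cpu = over_time(cpu_usage_percent, at=300, window=300, fn=mean)  # like avg_over_time(...[5m])
max_cpu = over_time(cpu_usage_percent, at=300, window=300, fn=max)   # like max_over_time(...[5m])
print(avg_cpu, max_cpu)  # 57.0 70.0
```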

With the concepts of metrics, types, modeling, and aggregation presented, the following frequently asked questions help consolidate learning.

Frequently Asked Questions

What are metrics?

Metrics are numerical values observed and collected over time that represent the state, behavior, or performance of systems. They are organized as time series with names, labels, and timestamps. They answer questions like “what is the current error rate?”, “is latency within SLO?”. They differ from logs (events) and traces (journeys).

What are the 4 metric types?

The four types in the Prometheus model are: Counter (values that only increase, e.g., total requests), Gauge (values that go up and down, e.g., CPU %), Histogram (distribution into cumulative buckets, allows quantile estimation at query time), and Summary (quantiles calculated on the client). Use histogram by default for distributed systems.

What is the difference between counter and gauge?

Counter is a value that only increases, resetting to zero on process restart, used to calculate rates (e.g., rate()). Gauge is a value that can go up or down, representing the current state (e.g., CPU, memory). Use counter for accumulated totals, gauge for instantaneous values.

What are golden signals?

Golden signals are the four essential metrics described by Google SRE: Latency (response time), Traffic (demand), Errors (failure rate), and Saturation (resource usage). They provide an essential view of service health and are the foundation for SLIs/SLOs.

What is cardinality in metrics?

Cardinality is the number of unique time series generated by a metric; its upper bound is the product of the number of possible values of each label. “Cardinality explosion” occurs when labels with many values (user_id, request_id) generate millions of series, degrading performance.

Histogram or Summary: when to use each?

Use Histogram by default (aggregates multiple instances and allows quantile estimation at query time from defined buckets). Use Summary only when you need quantiles calculated on the client and don’t need to aggregate them across instances. Histogram is more flexible and suitable for distributed systems.

How to avoid cardinality explosion?

Avoid labels with unique values (user_id, request_id), limit the possible values of each label, prefer aggregatable models, reduce or aggregate high-cardinality dimensions before exposition when possible, and monitor the number of active series.

Conclusion

Metrics are the foundation of observability. They transform system behavior into numerical data that can be queried, alerted on, and correlated. The four types in the Prometheus model — Counter, Gauge, Histogram, and Summary — cover most monitoring scenarios.

Key concepts:

  • Metrics = Numerical values collected over time to represent state, behavior, or performance
  • 4 types (Prometheus): Counter (only up), Gauge (up/down), Histogram (cumulative buckets), Summary (client-side quantiles)
  • Golden signals: Latency, Traffic, Errors, Saturation
  • Cardinality: Beware of high-cardinality labels
  • Prometheus: Graduated CNCF project, widely adopted

Next steps:

For beginners:

  1. Understand the 4 metric types
  2. Implement golden signals in your application
  3. Use Prometheus for exposition

For operations teams:

  1. Configure SLOs based on SLIs
  2. Monitor your metrics cardinality
  3. Integrate with Real-Time Metrics for dashboards

For mature companies:

  1. Optimize PromQL queries
  2. Implement SLO-based alerts
  3. Use histograms for latency SLIs

Want to visualize metrics in real time, with latency measured in seconds? Discover Real-Time Metrics and Data Stream for metric collection and analysis at scale. Get started free.
