Building a high-volume metrics pipeline with OpenTelemetry and vmagent

2026-04-07

11 min read

by Eugene Ma

Tags: engineering, technology, infrastructure, observability, site-reliability-engineer

Endigest AI Core Summary

This post details a production migration of a large-scale metrics pipeline from StatsD to OpenTelemetry (OTLP) with Prometheus-based storage and vmagent for streaming aggregation.

• 40% of services used a shared metrics library to dual-emit StatsD and OTLP simultaneously, enabling broad migration with low friction
• Switching to OTLP reduced CPU time spent on metrics processing from 10% to under 1% and improved reliability over UDP-based StatsD
• High-cardinality services emitting 10K+ samples/sec required delta temporality to reduce in-process memory pressure and GC overhead
• vmagent was chosen for streaming aggregation due to Prometheus support, horizontal sharding capability, and a small (~10K LOC) codebase
• The final architecture uses two vmagent layers: stateless routers for consistent label-based sharding and stateful aggregators feeding into Grafana Mimir
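
The two-layer idea in the last bullet can be illustrated with a small Python sketch. This is not vmagent's implementation or configuration, only a toy model under stated assumptions: the names shard_for, Aggregator, and route are hypothetical, the metric and label names are made up, and the router uses a stable hash modulo the shard count (a real deployment might prefer a consistent-hashing ring so that fewer series move when the shard count changes). The router keeps no state; it only guarantees that every sample sharing the same sharding labels reaches the same aggregator, which then sums delta samples for that output series.

```python
import hashlib
from collections import defaultdict


def shard_for(labels, shard_by, num_shards):
    """Pick a shard index from a stable hash of the chosen labels.

    Hypothetical helper: only the labels named in shard_by influence the
    result, so samples that differ in other labels (e.g. pod) land together.
    """
    key = "\x00".join(f"{name}={labels.get(name, '')}" for name in sorted(shard_by))
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class Aggregator:
    """Stateful layer: sums delta samples per output series."""

    def __init__(self, keep_labels):
        self.keep_labels = tuple(sorted(keep_labels))
        self.totals = defaultdict(float)

    def add(self, labels, delta):
        # Labels not listed in keep_labels are dropped (aggregated away).
        key = tuple((name, labels.get(name, "")) for name in self.keep_labels)
        self.totals[key] += delta


def route(samples, aggregators, shard_by):
    """Stateless layer: forward each (labels, delta) pair to its shard."""
    for labels, delta in samples:
        index = shard_for(labels, shard_by, len(aggregators))
        aggregators[index].add(labels, delta)


if __name__ == "__main__":
    # Assumed example data: three pods of one service reporting deltas.
    demo_samples = [
        ({"__name__": "requests_total", "service": "api", "pod": "a"}, 1.0),
        ({"__name__": "requests_total", "service": "api", "pod": "b"}, 2.0),
        ({"__name__": "requests_total", "service": "api", "pod": "c"}, 3.0),
    ]
    group_by = ("__name__", "service")
    demo_aggregators = [Aggregator(group_by) for _ in range(4)]
    route(demo_samples, demo_aggregators, group_by)
    for i, agg in enumerate(demo_aggregators):
        print(i, dict(agg.totals))
```

Because each sample carries a delta rather than a running total, the aggregator only needs one accumulator per output series and does not have to remember the last value seen from every pod. Sharding on the same labels used for grouping is what lets each aggregator produce a complete sum on its own.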
