Endigest

© 2026 Endigest. All rights reserved.


observability Articles

6 articles

Related Tags

infrastructure (6)
engineering (5)
technology (4)
Uncategorized (2)
site-reliability-engineer (2)
software-engineering (1)
automation (1)
ci-cd (1)
deployment (1)
incident-response (1)
golang (1)
networking (1)
open-source (1)
Airbnb
11 min read
DevOps•2026-05-05

Monitoring reliably at scale

This post explains how Airbnb eliminated circular dependencies in its observability stack to ensure reliable monitoring at scale.

engineering
infrastructure
technology
observability
site-reliability-engineer
Airbnb
101 min read
DevOps•2026-04-07

Building a high-volume metrics pipeline with OpenTelemetry and vmagent

This post details a production migration of a large-scale metrics pipeline from StatsD to OpenTelemetry (OTLP) with Prometheus-based storage and vmagent for streaming aggregation.

engineering
technology
infrastructure
observability
site-reliability-engineer
Slack
31 min read
DevOps•2026-03-31

From Custom to Open: Scalable Network Probing and HTTP/3 Readiness with Prometheus

Slack addresses HTTP/3 observability challenges through QUIC support in Prometheus Blackbox Exporter.

Uncategorized
golang
infrastructure
networking
observability
open-source
Airbnb
111 min read
DevOps•2026-03-17

From vendors to vanguard: Airbnb’s hard-won lessons in observability ownership

Airbnb shares hard-won lessons from migrating its observability platform from third-party vendors to a custom in-house solution built on Prometheus across 1,000 services.

engineering
technology
observability
infrastructure
Airbnb
91 min read
DevOps•2026-03-04

It Wasn’t a Culture Problem: Upleveling Alert Development at Airbnb

Airbnb explains how it rebuilt its Observability as Code (OaC) alert development workflow to eliminate weeks-long validation cycles.

infrastructure
software-engineering
observability
technology
engineering
Slack
81 min read
DevOps•2025-10-07

Deploy Safety: Reducing customer impact from change

Slack's Deploy Safety Program reduced customer impact hours by 90% over 18 months by overhauling deployment practices and safety culture.

Uncategorized
automation
ci-cd
deployment
engineering
incident-response
infrastructure
observability