How Workers powers our internal maintenance scheduling pipeline

2025-12-22

14 min read

by Kevin Deems

Tags: Cloudflare Workers, Reliability, Prometheus, Infrastructure


Endigest AI Core Summary

This post explains how Cloudflare built an internal maintenance scheduling system on Cloudflare Workers to safely coordinate data center operations across 330+ cities globally.

•The scheduler enforces maintenance constraints to prevent simultaneous downtime of redundant edge routers or customer-specific Aegis egress IP pools
•The initial approach of loading all data into a single Worker caused out-of-memory errors, which required a more targeted data-loading strategy
•Cloudflare adopted a graph-based data model inspired by Facebook's TAO paper, using typed object/association interfaces to fetch only relevant regional data
•Response payload sizes dropped 100x after switching from a few large requests to many small, targeted requests, though the higher request count introduced subrequest limit issues
•A middleware fetch pipeline was built with request deduplication (singleflight pattern), LRU caching, CDN caching via caches.default.match, and retry logic with backoff to stay within Workers platform limits
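
The fetch pipeline in the last bullet can be sketched as a chain of small wrappers around a fetch function. This is a minimal illustration, not Cloudflare's implementation: the names Fetcher, withSingleflight, withLruCache, withRetry, and buildPipeline are hypothetical, the default capacity, attempt count, and delay are assumed values, and the CDN caching layer (caches.default.match) is left out because it only exists inside the Workers runtime.

```typescript
// Hypothetical sketch: every name and default below is an assumption for illustration.
type Fetcher = (key: string) => Promise<string>;

// Singleflight: concurrent calls for the same key share one in-flight promise.
function withSingleflight(next: Fetcher): Fetcher {
  const inflight = new Map<string, Promise<string>>();
  return (key) => {
    const pending = inflight.get(key);
    if (pending) return pending;
    // Forget the entry once the request settles, whether it succeeded or failed.
    const p = next(key).finally(() => inflight.delete(key));
    inflight.set(key, p);
    return p;
  };
}

// LRU cache: a Map iterates in insertion order, so its first key is the least recently used.
function withLruCache(next: Fetcher, capacity: number): Fetcher {
  const cache = new Map<string, string>();
  return async (key) => {
    const hit = cache.get(key);
    if (hit !== undefined) {
      // Re-insert to mark the entry as most recently used.
      cache.delete(key);
      cache.set(key, hit);
      return hit;
    }
    const value = await next(key);
    cache.set(key, value);
    if (cache.size > capacity) {
      const oldest = cache.keys().next().value;
      if (oldest !== undefined) cache.delete(oldest);
    }
    return value;
  };
}

// Retry with exponential backoff: wait baseDelayMs, then 2x, 4x, ... between attempts.
function withRetry(next: Fetcher, attempts: number, baseDelayMs: number): Fetcher {
  return async (key) => {
    let lastError: unknown;
    for (let i = 0; i < attempts; i++) {
      try {
        return await next(key);
      } catch (err) {
        lastError = err;
        if (i < attempts - 1) {
          await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
        }
      }
    }
    throw lastError;
  };
}

// Cache outermost, then deduplication, then retries closest to the origin.
function buildPipeline(
  origin: Fetcher,
  capacity = 100,
  attempts = 3,
  baseDelayMs = 50,
): Fetcher {
  return withLruCache(withSingleflight(withRetry(origin, attempts, baseDelayMs)), capacity);
}
```

With this ordering, a cache hit never reaches the origin, simultaneous misses for the same key collapse into one upstream request, and only that single request is retried. Each of these reduces the number of subrequests a Worker invocation spends.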
