The Cloudflare Blog | DevOps

How Workers powers our internal maintenance scheduling pipeline

2025-12-22
14 min read
by Kevin Deems

Endigest AI Core Summary

This post explains how Cloudflare built an internal maintenance scheduling system on Cloudflare Workers to safely coordinate data center operations across 330+ cities globally.

  • The scheduler enforces maintenance constraints to prevent simultaneous downtime of redundant edge routers or customer-specific Aegis egress IP pools
  • The initial approach of loading all data into a single Worker caused out-of-memory errors, which forced a more targeted data-loading strategy
  • Cloudflare adopted a graph-based data model inspired by Facebook's TAO paper, using typed object/association interfaces to fetch only relevant regional data
  • Response payload sizes dropped 100x after switching from a few large requests to many small, targeted requests, though this introduced subrequest limit issues
  • A middleware fetch pipeline was built with request deduplication (the singleflight pattern), LRU caching, CDN caching via caches.default.match, and retries with backoff to stay within Workers platform limits
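
The middleware pipeline in the last point can be illustrated with a minimal sketch. This is not Cloudflare's implementation: `Fetcher`, `withSingleflight`, `withLru`, and `withRetry` are hypothetical names, the cache key is assumed to be a plain string, and the CDN layer (`caches.default.match`) is left out because it only exists inside the Workers runtime. Each layer wraps the next one, so a lookup only reaches the origin when every layer above it has missed.

```typescript
// Hypothetical sketch of a layered fetch pipeline; all names are illustrative.
type Fetcher<T> = (key: string) => Promise<T>;

// Singleflight: concurrent callers asking for the same key share one in-flight promise.
function withSingleflight<T>(next: Fetcher<T>): Fetcher<T> {
  const inFlight = new Map<string, Promise<T>>();
  return (key) => {
    const existing = inFlight.get(key);
    if (existing) return existing;
    const pending = next(key).then(
      (value) => { inFlight.delete(key); return value; },
      (err) => { inFlight.delete(key); throw err; },
    );
    inFlight.set(key, pending);
    return pending;
  };
}

// LRU: a Map iterates in insertion order, so re-inserting on read keeps the
// least recently used key at the front, ready for eviction.
function withLru<T>(next: Fetcher<T>, capacity: number): Fetcher<T> {
  const cache = new Map<string, T>();
  return async (key) => {
    if (cache.has(key)) {
      const hit = cache.get(key) as T;
      cache.delete(key);
      cache.set(key, hit); // mark as most recently used
      return hit;
    }
    const value = await next(key);
    cache.set(key, value);
    if (cache.size > capacity) {
      const oldest = cache.keys().next().value as string;
      cache.delete(oldest);
    }
    return value;
  };
}

// Retry with exponential backoff: wait baseDelayMs, then 2x, 4x, ... between attempts.
function withRetry<T>(next: Fetcher<T>, attempts: number, baseDelayMs: number): Fetcher<T> {
  return async (key) => {
    let lastError: unknown;
    for (let i = 0; i < attempts; i++) {
      try {
        return await next(key);
      } catch (err) {
        lastError = err;
        if (i < attempts - 1) {
          const delay = baseDelayMs * Math.pow(2, i);
          await new Promise<void>((resolve) => setTimeout(resolve, delay));
        }
      }
    }
    throw lastError;
  };
}
```

Composed as `withLru(withSingleflight(withRetry(origin, 3, 100)), 500)`, repeated lookups are served from memory, concurrent misses for the same key collapse into one upstream call, and only that single call is retried. Under these assumptions, the ordering is what keeps the number of subrequests low.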
Tags:
#Cloudflare Workers
#Reliability
#Prometheus
#Infrastructure