The Cloudflare Blog | DevOps

Code Orange: Fail Small — our resilience plan following recent incidents

2025-12-19
10 min read
by Dane Knecht

Endigest AI Core Summary

Cloudflare outlines its 'Code Orange: Fail Small' resilience plan following two major network outages in November and December 2025.

  • Both outages were caused by instant global propagation of configuration changes via Quicksilver, bypassing the staged rollout process used for software releases.
  • The plan introduces controlled, health-mediated deployments (HMD) for all configuration changes, mirroring the multi-gate process already used for binary releases.
  • Service interface contracts will be reviewed to ensure failures in one component (e.g., Bot Management) do not cascade across unrelated systems.
  • Default fallback behaviors will be defined at each service boundary so traffic continues to flow even when a module fails.
  • Internal 'break glass' procedures and circular dependencies in tooling access will be redesigned to speed up incident resolution.
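
As a rough illustration of two ideas from the list above, the sketch below shows a configuration change advancing through stages only while a health check passes, and a service boundary that falls back to a default when a module fails. It is a minimal sketch under stated assumptions, not Cloudflare's implementation: the stage names, fleet fractions, and the functions staged_rollout and score_with_fallback are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Sequence, Tuple

# Hypothetical stages and fleet fractions, chosen only for illustration.
STAGES: Sequence[Tuple[str, float]] = (
    ("canary", 0.01),
    ("regional", 0.10),
    ("global", 1.00),
)


@dataclass
class Rollout:
    applied: List[str] = field(default_factory=list)  # stages that received the change
    rolled_back: bool = False


def staged_rollout(
    apply_stage: Callable[[str, float], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[List[str]], None],
    stages: Sequence[Tuple[str, float]] = STAGES,
) -> Rollout:
    """Apply a change stage by stage, stopping and rolling back on bad health."""
    result = Rollout()
    for name, fraction in stages:
        apply_stage(name, fraction)
        result.applied.append(name)
        # Health gate: the change does not advance past an unhealthy stage.
        if not is_healthy(name):
            rollback(list(result.applied))
            result.rolled_back = True
            break
    return result


def score_with_fallback(
    classify: Callable[[Any], Any],
    request: Any,
    default_score: Optional[Any] = None,
) -> Any:
    """Return the module's result, or a predefined default if the module fails.

    Assumption: the caller treats default_score as "let traffic continue".
    """
    try:
        return classify(request)
    except Exception:
        # The failure stays inside this boundary instead of failing the request.
        return default_score
```

With these definitions, a health check that fails at the "regional" stage stops the rollout before "global" and triggers the rollback callback, so the change never reaches the whole fleet. Likewise, a classifier that raises on a malformed input yields the default value instead of an error for the request.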
Tags:
#Outage
#Post Mortem
#Code Orange