Deploy Safety: Reducing customer impact from change
2025-10-07
11 min read
by Sam Bailey
Endigest AI Core Summary
Slack's Deploy Safety Program reduced customer impact hours by 90% over 18 months by overhauling deployment practices and safety culture.
- 73% of customer-facing incidents were triggered by Slack-induced change, primarily code deploys, which motivated the program's North Star goals.
- Goals targeted automated detection and remediation within 10 minutes, manual remediation within 20 minutes, and limiting problematic deploys to under 10% fleet exposure.
- The team invested broadly at first, then doubled down on successful patterns: starting with Webapp backend metric monitoring, then expanding to automatic rollbacks across frontend and infra.
- Automatic rollbacks were the key turning point: manual remediation alone was insufficient, and automation drove dramatic improvement in results.
- A 3–6 month lag in trailing incident metrics required patience and executive alignment every 4–6 weeks to maintain program confidence.
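The metric-monitoring and automatic-rollback pattern summarized above can be illustrated with a minimal sketch. It assumes a hypothetical staged rollout in which a deploy advances through increasing fleet percentages and is rolled back automatically once a monitored error rate exceeds a multiple of its baseline. The stage list, the `observe_error_rate` callback, and the 2x threshold are illustrative assumptions, not Slack's actual tooling. Small early stages are what keep a bad deploy's exposure below a target such as 10% of the fleet.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class StageResult:
    percent: int        # share of the fleet running the new build at this stage
    error_rate: float   # error rate observed while at this stage
    rolled_back: bool   # True if this stage triggered an automatic rollback


def staged_deploy(
    stages: Sequence[int],
    observe_error_rate: Callable[[int], float],  # hypothetical metrics hook
    baseline_error_rate: float,
    max_ratio: float = 2.0,  # assumed threshold: roll back above 2x baseline
) -> List[StageResult]:
    """Advance a deploy stage by stage, rolling back automatically when the
    observed error rate exceeds max_ratio * baseline_error_rate."""
    threshold = baseline_error_rate * max_ratio
    history: List[StageResult] = []
    for percent in stages:
        rate = observe_error_rate(percent)
        if rate > threshold:
            # Automatic remediation: stop advancing and revert immediately,
            # so exposure never grows past the current stage.
            history.append(StageResult(percent, rate, rolled_back=True))
            return history
        history.append(StageResult(percent, rate, rolled_back=False))
    return history
```

For example, with stages `[1, 5, 10, 25, 50, 100]` and a regression that first shows up at 5%, the sketch rolls back at the 5% stage, so peak exposure stays under 10%. A real system would also bound how long each stage waits for metrics, which is where a detection-time goal like 10 minutes would apply.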
Tags:
#Uncategorized
#automation
#ci-cd
#deployment
#engineering
#incident-response
#infrastructure
#observability
