Engineering at Slack | DevOps

Deploy Safety: Reducing customer impact from change

2025-10-07
11 min read
by Sam Bailey

Endigest AI Core Summary

Slack's Deploy Safety Program reduced customer impact hours by 90% over 18 months by overhauling deployment practices and safety culture.

  • 73% of customer-facing incidents were triggered by Slack-induced change, primarily code deploys, which motivated the program's North Star goals.
  • Goals targeted automated detection and remediation within 10 minutes, manual remediation within 20 minutes, and limiting problematic deploys to under 10% fleet exposure.
  • The team invested broadly at first, then doubled down on the patterns that worked: it started with metric monitoring for the Webapp backend, then expanded automatic rollbacks to frontend and infrastructure deploys.
  • Automatic rollbacks were the key turning point: manual remediation alone was insufficient, and automation drove dramatic improvement in results.
  • Trailing incident metrics lagged by 3–6 months, so maintaining confidence in the program required patience and an executive alignment review every 4–6 weeks.
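
The North Star targets in the bullets above can be read as simple checks. The following is a minimal Python sketch under stated assumptions, not Slack's implementation: the class names, the error-rate signal, and the rollback threshold are hypothetical, and only the 10-minute, 20-minute, and 10% figures come from the summary.

```python
from dataclasses import dataclass

# Targets quoted in the summary above.
AUTO_REMEDIATION_MINUTES = 10    # automated detection and remediation
MANUAL_REMEDIATION_MINUTES = 20  # manual remediation
MAX_FLEET_EXPOSURE = 0.10        # problematic deploys should stay under 10% of the fleet

# Hypothetical threshold: roll back if the error rate rises by more than
# 2 percentage points over the previous build. Not taken from the article.
MAX_ERROR_RATE_INCREASE = 0.02


@dataclass
class DeployStatus:
    """Hypothetical snapshot of an in-progress deploy."""
    fleet_exposure: float       # fraction of the fleet on the new build, 0.0 to 1.0
    error_rate: float           # error rate observed on the new build
    baseline_error_rate: float  # error rate observed on the previous build


@dataclass
class Remediation:
    """Hypothetical record of how one bad deploy was handled."""
    automated: bool             # True if the rollback was triggered automatically
    minutes_to_remediate: float
    exposure_at_detection: float


def next_action(status: DeployStatus) -> str:
    """Return "rollback" when the new build looks unhealthy, else "continue"."""
    increase = status.error_rate - status.baseline_error_rate
    if increase > MAX_ERROR_RATE_INCREASE:
        return "rollback"
    return "continue"


def met_targets(r: Remediation) -> bool:
    """Check one remediation against the time and exposure targets."""
    limit = AUTO_REMEDIATION_MINUTES if r.automated else MANUAL_REMEDIATION_MINUTES
    return r.minutes_to_remediate <= limit and r.exposure_at_detection < MAX_FLEET_EXPOSURE


if __name__ == "__main__":
    print(next_action(DeployStatus(0.05, 0.08, 0.01)))
    print(met_targets(Remediation(True, 8, 0.04)))
```

Keeping the decision (`next_action`) separate from the scorecard (`met_targets`) mirrors the summary's distinction between remediating a bad deploy and measuring, after the fact, whether the program's goals were met.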
Tags:
#automation
#ci-cd
#deployment
#engineering
#incident-response
#infrastructure
#observability