A one-line Kubernetes fix that saved 600 hours a year
Cloudflare | DevOps
Tags: Kubernetes, Terraform, Platform Engineering, Infrastructure, SRE
This post explains how a single Kubernetes configuration change eliminated 30-minute restart delays for Atlantis, a Terraform automation tool.
- Atlantis ran as a StatefulSet backed by a PersistentVolume containing millions of files; it restarted roughly 100 times a month, and each restart took about 30 minutes
- The root cause was the Kubernetes default fsGroupChangePolicy: Always, which makes the kubelet recursively change group ownership (a recursive chgrp) across the entire PV every time it is mounted
- The securityContext setting fsGroup: 1 was needed so the non-root process could access the volume, but on a volume this large the ownership walk became the bottleneck
- The fix was setting fsGroupChangePolicy: OnRootMismatch, which skips the recursive change when the volume's root directory already has the expected ownership and permissions
- Restart time dropped from 30 minutes to about 30 seconds, recovering roughly 50 hours a month (100 restarts × 30 minutes), or 600 hours a year, of blocked engineering time
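
As a rough illustration of where this setting lives, the fragment below sketches a StatefulSet with the pod-level securityContext described above. Only fsGroup: 1 and fsGroupChangePolicy: OnRootMismatch come from the summary. The resource names, image, mount path, and storage size are hypothetical placeholders, not Cloudflare's actual manifest.

```yaml
# Hypothetical manifest: names, image, mount path, and storage size are
# illustrative assumptions, not taken from the original article.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: atlantis
spec:
  serviceName: atlantis
  replicas: 1
  selector:
    matchLabels:
      app: atlantis
  template:
    metadata:
      labels:
        app: atlantis
    spec:
      securityContext:
        # Group applied to the volume so the non-root process can access it.
        fsGroup: 1
        # Skip the recursive ownership change when the volume root already
        # has the expected group and permissions. The default is "Always".
        fsGroupChangePolicy: OnRootMismatch
      containers:
        - name: atlantis
          image: example.invalid/atlantis:placeholder
          volumeMounts:
            - name: atlantis-data
              mountPath: /atlantis-data
  volumeClaimTemplates:
    - metadata:
        name: atlantis-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

One trade-off to keep in mind: with OnRootMismatch, Kubernetes inspects only the volume's root directory, so files deeper in the tree that somehow ended up with the wrong ownership are not corrected on mount.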
This summary was automatically generated by AI based on the original article and may not be fully accurate.