A one-line Kubernetes fix that saved 600 hours a year

2026-03-26

7 min read

by Braxton Schafer

Tags:

Kubernetes

Terraform

Platform Engineering

Infrastructure

SRE


This post explains how a single Kubernetes configuration change eliminated 30-minute restart delays for Atlantis, a Terraform automation tool.

•Atlantis ran as a StatefulSet backed by a PersistentVolume holding millions of files, so each of its ~100 restarts per month took about 30 minutes
•The root cause was Kubernetes' default fsGroupChangePolicy: Always, under which the kubelet recursively changes the group ownership (a chgrp) of every file on the PV each time the volume is mounted
•The fsGroup: 1 securityContext setting was required so the non-root Atlantis process could access the volume, but on a volume with millions of files it turned every mount into a massive bottleneck
•The fix was setting fsGroupChangePolicy: OnRootMismatch, which skips the recursive change when the volume's root directory already has the expected group ownership and permissions
•Restart time dropped from 30 minutes to ~30 seconds, recovering ~50 hours/month (600 hours/year) of blocked engineering time
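The post does not include the actual manifest. This is a minimal sketch of the one-line fix described above, expressed as a strategic-merge patch for the StatefulSet's pod template. It assumes the StatefulSet is named `atlantis`, which is hypothetical. Only `fsGroup` and `fsGroupChangePolicy` come from the post.

```python
import json

# Hypothetical StatefulSet name; the post does not show the real manifest.
STATEFULSET_NAME = "atlantis"


def fs_group_patch(fs_group: int = 1, policy: str = "OnRootMismatch") -> dict:
    """Build a strategic-merge patch that sets the pod-level securityContext."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "securityContext": {
                        # Group the kubelet applies to files on mounted volumes.
                        "fsGroup": fs_group,
                        # Skip the recursive ownership change when the volume root
                        # already matches; when unset, the behavior is "Always".
                        "fsGroupChangePolicy": policy,
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    patch = json.dumps(fs_group_patch())
    # Print a kubectl command that could apply the patch (strategic merge is
    # kubectl patch's default type, so other securityContext fields are kept).
    print(f"kubectl patch statefulset {STATEFULSET_NAME} -p '{patch}'")
```

Changing the pod template rolls the StatefulSet's pods, so the new policy takes effect on the next mount of the volume.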
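Why OnRootMismatch is cheap: the kubelet inspects only the volume's root directory before deciding whether to walk the whole tree. The function below is a simplified Python model of that decision, not the kubelet's code. The permission masks and the setgid requirement reflect a reading of the Kubernetes Linux volume-ownership logic and should be treated as assumptions. The function names are hypothetical.

```python
import os
import stat

# Assumed masks, modeled on the kubelet's Linux volume-ownership logic:
# group/owner read-write, read-only, and execute bits added for directories.
RW_MASK = 0o660
RO_MASK = 0o440
EXEC_MASK = 0o110


def needs_recursive_change(root_gid: int, root_mode: int, fs_group: int,
                           readonly: bool = False) -> bool:
    """Simplified model of the OnRootMismatch decision for a volume root."""
    if root_gid != fs_group:
        return True  # wrong group on the root: walk the whole volume
    if not stat.S_ISDIR(root_mode):
        return True  # not a directory: always apply the change
    required = (RO_MASK if readonly else RW_MASK) | EXEC_MASK
    if (stat.S_IMODE(root_mode) & required) != required:
        return True  # root lacks the required rwx (or r-x) bits
    if not root_mode & stat.S_ISGID:
        return True  # setgid missing, so new files would not inherit the group
    return False  # root already matches: skip the recursive walk


def check_path(path: str, fs_group: int) -> bool:
    """Apply the modeled check to a real directory on this machine."""
    st = os.stat(path)
    return needs_recursive_change(st.st_gid, st.st_mode, fs_group)
```

Under `Always`, the walk happens on every mount regardless of this check. With millions of files, that unconditional walk is what made each restart take 30 minutes.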
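The savings figure follows directly from the restart count and duration given in the summary:

```latex
\[
100\ \tfrac{\text{restarts}}{\text{month}} \times 30\ \tfrac{\text{min}}{\text{restart}}
  = 3000\ \tfrac{\text{min}}{\text{month}} = 50\ \tfrac{\text{h}}{\text{month}},
\qquad
50\ \tfrac{\text{h}}{\text{month}} \times 12\ \tfrac{\text{months}}{\text{year}} = 600\ \tfrac{\text{h}}{\text{year}}
\]
```

The remaining ~30-second restarts still cost about 100 × 0.5 min ≈ 50 min per month. That leaves a net saving of roughly 49 hours per month, consistent with the "~50 hours" figure.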
