Grab's data platform team implemented Docker image lazy loading using eStargz and SOCI technologies to reduce container startup times for Airflow and Spark Connect.
- Traditional container startup requires every image layer to be fully downloaded and unpacked before the container can launch, which makes cold starts slow for large images
- eStargz enables lazy loading by compressing each file individually and providing a TOC (stargz.index.json), so files can be read on demand through a FUSE-backed virtual filesystem
- SOCI stores its index separately from the image layers and matched the standard image's application startup time (5.0s for Airflow), while eStargz added 25.0s of startup overhead
- The production deployment achieved 30-40% faster P95 startup times for both Airflow and Spark Connect, improving auto-scaling responsiveness
- Tuning the SOCI config (max_concurrent_downloads set to 10, chunk size set to 16MB) cut fresh-node image download time from 60s to 24s, a 60% reduction
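
The random-access idea behind the eStargz bullet can be illustrated with a small sketch. This is a simplified toy, not the actual eStargz layout: the helper names `build_archive` and `read_file`, and the JSON index shape, are hypothetical. It only shows why compressing each file on its own and recording its byte range lets a reader fetch a single file without unpacking everything else.

```python
from __future__ import annotations

import gzip
import io
import json


def build_archive(files: dict[str, bytes]) -> tuple[bytes, str]:
    """Compress each file as its own gzip member and record its byte range."""
    blob = io.BytesIO()
    toc = {}
    for name, data in files.items():
        member = gzip.compress(data)
        # The index entry says where this file's compressed bytes live.
        toc[name] = {"offset": blob.tell(), "size": len(member)}
        blob.write(member)
    return blob.getvalue(), json.dumps(toc)


def read_file(blob: bytes, toc_json: str, name: str) -> bytes:
    """Decompress only the byte range that belongs to one file."""
    entry = json.loads(toc_json)[name]
    start = entry["offset"]
    end = start + entry["size"]
    # In a lazy-loading setup this slice would be a ranged fetch from the
    # registry; here it is an in-memory slice (assumption for illustration).
    return gzip.decompress(blob[start:end])


blob, toc = build_archive({
    "bin/app": b"binary-data" * 1000,
    "etc/app.conf": b"key=value\n",
})
print(read_file(blob, toc, "etc/app.conf"))
```

Only the small config file's bytes are touched in the last call; the larger `bin/app` member is never decompressed, which is the property a lazy-loading filesystem relies on.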
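
As a quick check of the download-time figures in the last bullet, the arithmetic can be worked through as follows. The function names are hypothetical helpers; the 60s and 24s values are the ones quoted above.

```python
def percent_reduction(before: float, after: float) -> float:
    """Share of the original time that was eliminated, in percent."""
    return (before - after) * 100 / before


def speedup(before: float, after: float) -> float:
    """How many times faster the new time is than the old one."""
    return before / after


print(percent_reduction(60, 24))  # 60.0
print(speedup(60, 24))            # 2.5
```

A 60% reduction in download time is the same result as a 2.5x speedup.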