How Google Does It: Fleet-wide, large-scale A/B experimentation

2026-05-18

11 min read

by Nilay Vaish

Tags: Infrastructure, Systems


Google runs fleet-wide A/B experiments on infrastructure components to measure and validate optimizations at scale.

• Infrastructure experiments target core systems like TCMalloc, compilers, kernel subsystems, and cluster management to achieve performance and efficiency gains.
• Machine-level experimentation is preferred over application-level testing to capture fleet-wide impacts and avoid selection bias.
• Balanced experiment and control groups are maintained by selecting 1% fleet subsets with proportional representation of different machine types.
• Binary hermeticity ensures safe rollouts by separating machine deployment from binary compilation, enabling reliable rollbacks.
• Key metrics include application productivity (work completed), machine-level performance (IPC, cache misses), and reliability (crashes, timeouts).
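The balanced group selection described above can be illustrated with stratified sampling. This is a minimal sketch and not Google's actual tooling: the function name `pick_balanced_groups`, the `machines` mapping, and the fixed seed are all hypothetical, and it assumes the experiment and control groups are each drawn as the same fraction of every machine type.

```python
import random
from collections import defaultdict


def pick_balanced_groups(machines, fraction=0.01, seed=0):
    """Split off disjoint experiment and control groups from a fleet.

    machines: dict mapping machine id -> machine type (hypothetical input shape).
    Each group receives roughly `fraction` of every machine type, so both
    groups mirror the fleet's hardware mix.
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5] so the groups stay disjoint")

    rng = random.Random(seed)  # fixed seed keeps the selection reproducible

    # Bucket machine ids by type, in sorted order for determinism.
    by_type = defaultdict(list)
    for machine_id, machine_type in sorted(machines.items()):
        by_type[machine_type].append(machine_id)

    experiment, control = [], []
    for machine_type in sorted(by_type):
        ids = by_type[machine_type]
        k = round(len(ids) * fraction)   # per-type group size
        picked = rng.sample(ids, 2 * k)  # sample without replacement
        experiment.extend(picked[:k])    # first half gets the change
        control.extend(picked[k:])       # second half stays on the baseline
    return experiment, control
```

Sampling within each machine type, rather than across the whole fleet at once, is what keeps the two groups comparable: a difference in a metric such as IPC is then less likely to come from one group simply having more machines of a faster type.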
