From Static Rate Limiting to Adaptive Traffic Management in Airbnb’s Key-Value Store
2025-10-09
11 min read
by Shravan Gaonkar
Endigest AI Core Summary
Airbnb evolved Mussel, its multi-tenant key-value store, from simple QPS rate limiting to an adaptive traffic management system to maximize goodput during traffic spikes.
- Resource-aware rate control (RARC) charges each request in 'request units' (RU) based on rows processed, bytes, and latency rather than raw request counts.
- Load shedding uses a latency ratio (long-term p95 / short-term p95) to detect system stress and automatically apply backpressure to lower-priority traffic classes.
- A CoDel-inspired thread pool monitors queue wait times and drops requests early when the dispatcher is saturated, freeing resources for high-priority traffic.
- Hot-key detection identifies skewed access patterns in real time to shield the storage backend from both legitimate surges and DDoS attacks via caching or request coalescing.
- Criticality tiers ensure high-priority traffic (e.g., customer support, trust and safety) remains responsive even when capacity is exhausted.
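
As a rough illustration of how the first two ideas could fit together, here is a minimal Python sketch of request-unit charging and latency-ratio load shedding by criticality tier. It is not Mussel's implementation: the RU weights, the ratio thresholds, the three-tier priority scheme, and every name in the code (`Request`, `request_units`, `latency_ratio`, `max_admitted_priority`, `admit`) are assumptions chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical cost weights; the summary does not give real values.
RU_BASE = 1.0      # flat cost charged to every request
RU_PER_ROW = 1.0   # cost per row processed
RU_PER_KB = 0.5    # cost per KiB of payload
RU_PER_MS = 0.1    # cost per millisecond of latency


@dataclass
class Request:
    rows: int
    payload_bytes: int
    latency_ms: float
    priority: int  # assumed tiers: 0 = most critical, 2 = least critical


def request_units(req: Request) -> float:
    """Charge a request by the resources it used, not by a flat count of 1."""
    return (
        RU_BASE
        + RU_PER_ROW * req.rows
        + RU_PER_KB * req.payload_bytes / 1024
        + RU_PER_MS * req.latency_ms
    )


def latency_ratio(long_term_p95_ms: float, short_term_p95_ms: float) -> float:
    """Long-term p95 divided by short-term p95.

    Around 1.0 means recent latency matches the baseline; values well
    below 1.0 mean recent latency is higher than usual.
    """
    if short_term_p95_ms <= 0:
        return 1.0  # no recent samples: treat the system as healthy
    return long_term_p95_ms / short_term_p95_ms


def max_admitted_priority(ratio: float) -> int:
    """Map the ratio to the least critical tier still admitted.

    Thresholds are illustrative assumptions.
    """
    if ratio >= 0.8:
        return 2  # healthy: admit every tier
    if ratio >= 0.5:
        return 1  # stressed: shed the least critical tier
    return 0      # overloaded: admit only the most critical tier


def admit(req: Request, long_term_p95_ms: float, short_term_p95_ms: float) -> bool:
    """Return True if the request's tier is still admitted under current stress."""
    ratio = latency_ratio(long_term_p95_ms, short_term_p95_ms)
    return req.priority <= max_admitted_priority(ratio)
```

Under these assumed weights, a low-priority scan that touches 10 rows, returns 2 KiB, and takes 20 ms costs 14 RU, and it would be rejected once short-term p95 latency climbs to 2.5 times the long-term baseline, while a priority-0 request would still be admitted.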
Tags:
#engineering
#cloud-storage
#key-value-store
#cloud-services
#infrastructure
