Receive daily AI-curated summaries of engineering articles from top tech companies worldwide.
Endigest AI Core Summary
Databricks redesigned its Vector Search from scratch to support billion-scale workloads by decoupling storage from compute.
• Storage Optimized endpoints store vector indexes in cloud object storage and load them into memory only at query time, reducing serving costs by up to 7x compared to fully in-memory approaches.
• IVF (Inverted File Index) was chosen over HNSW because IVF partitions vectors into independently fetchable clusters, enabling partial loads from object storage without keeping the full graph in memory.
• Distributed K-means clustering is implemented as native PySpark jobs using JAX for hardware-accelerated matrix operations, scaling linearly by adding Spark executors.
• Ingestion runs on ephemeral serverless Spark clusters completely isolated from the query path, preventing write-heavy workloads from degrading query latency.
• The query layer uses a Rust engine with a dual-runtime architecture separating async I/O and CPU-bound vector computation into distinct thread pools.
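To make the IVF-over-k-means design above concrete, here is a minimal single-node sketch: k-means centroids partition the vectors, and a query probes only the few nearest clusters instead of scanning the whole index. All names are illustrative, not Databricks' actual API, and this plain-NumPy k-means stands in for the distributed PySpark + JAX version the article describes.

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.standard_normal((1000, 8)).astype(np.float32)

def kmeans(data, k, iters=10):
    # Single-node k-means stand-in for the distributed PySpark + JAX job.
    centroids = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        dists = np.linalg.norm(data[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its cluster.
        for c in range(k):
            if (labels == c).any():
                centroids[c] = data[labels == c].mean(axis=0)
    return centroids, labels

k = 16
centroids, labels = kmeans(vectors, k)
# Per-cluster id lists: in the Storage Optimized design these would
# live in object storage and be fetched on demand, not held in memory.
clusters = {c: np.where(labels == c)[0] for c in range(k)}

def ivf_search(query, nprobe=4, topk=5):
    # Probe only the nprobe nearest clusters -- a partial load,
    # rather than keeping the full index resident like HNSW requires.
    nearest = np.linalg.norm(centroids - query, axis=1).argsort()[:nprobe]
    ids = np.concatenate([clusters[c] for c in nearest])
    cand = vectors[ids]
    order = np.linalg.norm(cand - query, axis=1).argsort()[:topk]
    return ids[order]

hits = ivf_search(vectors[42])  # the query vector itself ranks first
```

Because a vector's own cluster is always its nearest centroid, an indexed vector is guaranteed to be found when used as its own query, which is a quick sanity check for this kind of index.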
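The dual-runtime split in the last bullet, separating async I/O from CPU-bound scoring, can be sketched as follows. The real engine is Rust; this Python/asyncio version with a hypothetical `fetch_cluster` object-storage read is only meant to show the pattern: I/O stays on the event loop, scoring is handed to a dedicated worker pool so it never blocks the loop.

```python
import asyncio
import math
from concurrent.futures import ThreadPoolExecutor

CPU_POOL = ThreadPoolExecutor(max_workers=4)  # CPU-bound "runtime"

async def fetch_cluster(cluster_id):
    # Stand-in for an object-storage read on the async I/O runtime.
    await asyncio.sleep(0.01)
    return [(cluster_id * 10 + i, [float(i)] * 4) for i in range(3)]

def score(query, vec):
    # CPU-bound distance computation, kept off the event loop.
    return math.dist(query, vec)

async def search(query, cluster_ids, topk=3):
    loop = asyncio.get_running_loop()
    # Fetch clusters concurrently on the async I/O runtime...
    batches = await asyncio.gather(*(fetch_cluster(c) for c in cluster_ids))
    candidates = [item for batch in batches for item in batch]
    # ...then hand scoring to the CPU pool.
    scores = await asyncio.gather(*(
        loop.run_in_executor(CPU_POOL, score, query, vec)
        for _, vec in candidates))
    ranked = sorted(zip(scores, (vid for vid, _ in candidates)))
    return [vid for _, vid in ranked[:topk]]

result = asyncio.run(search([0.0] * 4, [1, 2]))
```

In Rust this separation is typically two distinct thread pools (e.g. an async executor plus a blocking/compute pool), which is what keeps slow vector math from stalling in-flight object-storage reads.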
This summary was automatically generated by AI based on the original article and may not be fully accurate.