Receive daily AI-curated summaries of engineering articles from top tech companies worldwide.
Endigest AI Core Summary
Google Cloud announces llm-d as an official CNCF Sandbox project, positioning Kubernetes as the foundation for large-scale AI inference infrastructure.
• llm-d is co-founded by Google Cloud, Red Hat, IBM Research, CoreWeave, and NVIDIA with the vision of supporting any model, any accelerator, any cloud
• GKE Inference Gateway uses llm-d's Endpoint Picker (EPP) to route requests based on KV-cache hit rates, inflight requests, and queue depth
• Model-aware routing reduced time-to-first-token (TTFT) latency by 35% for Qwen Coder and improved P95 tail latency by 52% for DeepSeek workloads
• Prefix cache hit rate on Vertex AI doubled from 35% to 70%, lowering recomputation overhead and cost per token
• The LeaderWorkerSet (LWS) API enables disaggregated prefill and decode phases, managing large fleets of TPUs and GPUs at global scale
• vLLM, extended natively for Cloud TPUs, delivers up to 5x throughput gains with Ragged Paged Attention v3
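The routing bullet above can be illustrated with a minimal sketch: an endpoint picker that scores each model-server replica by KV-cache affinity minus a load penalty. The `Endpoint` fields mirror the signals named in the summary (KV-cache hit rate, inflight requests, queue depth); the scoring weights and function shape are illustrative assumptions, not llm-d's actual EPP implementation.

```python
from dataclasses import dataclass

@dataclass
class Endpoint:
    name: str
    kv_cache_hit_rate: float  # fraction of the prompt prefix already cached (0..1)
    inflight_requests: int    # requests currently being served
    queue_depth: int          # requests waiting in the local queue

def score(ep: Endpoint, w_cache: float = 1.0, w_load: float = 0.1) -> float:
    # Prefer replicas that already hold the request's KV cache, and
    # penalize replicas with in-flight or queued work. Weights are
    # illustrative; a real picker would tune or learn them.
    return w_cache * ep.kv_cache_hit_rate - w_load * (ep.inflight_requests + ep.queue_depth)

def pick_endpoint(endpoints: list[Endpoint]) -> Endpoint:
    # Route the request to the highest-scoring replica.
    return max(endpoints, key=score)

replicas = [
    Endpoint("pod-a", kv_cache_hit_rate=0.70, inflight_requests=4, queue_depth=2),
    Endpoint("pod-b", kv_cache_hit_rate=0.10, inflight_requests=1, queue_depth=0),
]
print(pick_endpoint(replicas).name)  # pod-a wins on cache locality despite higher load
```

Cache-aware scoring like this is what lets routing cut TTFT: a replica with a warm prefix cache can skip most of the prefill work for a matching request.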
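The prefix-cache bullet can be made concrete with back-of-envelope arithmetic, under the simplifying assumption that prefill compute scales linearly with the fraction of tokens not covered by a cache hit:

```python
def prefill_recompute_fraction(hit_rate: float) -> float:
    # Tokens covered by a prefix-cache hit skip recomputation during
    # prefill; only the uncached remainder must be recomputed.
    return 1.0 - hit_rate

before = prefill_recompute_fraction(0.35)  # 65% of prefill tokens recomputed
after = prefill_recompute_fraction(0.70)   # 30% recomputed
print(f"prefill recompute cut by {(before - after) / before:.0%}")
```

Under this linear model, doubling the hit rate from 35% to 70% cuts recomputed prefill tokens by roughly half, which is the mechanism behind the lower cost-per-token claim.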
This summary was automatically generated by AI based on the original article and may not be fully accurate.