
Google Cloud
DevOps • 2026-04-01
Run real-time and async inference on the same infrastructure with GKE Inference Gateway
This post introduces GKE Inference Gateway, a unified platform on Google Kubernetes Engine for running both real-time and async AI inference workloads on shared GPU/TPU infrastructure.