Introducing Gemma 4 on Google Cloud: Our most capable open models yet

2026-04-02

7 min read

by Richard Seroter

Tags:

AI & Machine Learning

Receive daily AI-curated summaries of engineering articles from top tech companies worldwide.

Google Cloud announces the release of Gemma 4, its most capable open model family built from Gemini 3 research under Apache 2.0 license.

•Supports context windows up to 256K tokens with native vision and audio processing, fluency in 140+ languages
•Available on Vertex AI for self-deployment and fine-tuning using SFT recipes with NVIDIA NeMo Megatron, supporting models from 2B (edge) to 31B (enterprise)
•Cloud Run support with NVIDIA RTX PRO 6000 Blackwell GPUs (96GB vGPU memory) for serverless, scale-to-zero inference
•GKE deployment via vLLM with GKE Agent Sandbox enabling sub-second cold starts and up to 300 sandboxes/sec; GKE Inference Gateway cuts TTFT latency by up to 70%
•Available across Sovereign Cloud offerings including air-gapped and on-premises deployments via Google Distributed Cloud

Related Articles