Endigest AI Core Summary
This post provides a technical guide for developers on optimizing AI model training using Google's seventh-generation Ironwood TPU within the JAX and MaxText ecosystems.
• Ironwood scales to pods of up to 9,216 chips, linked by ICI and OCS within a pod and by DCN across pods, with large HBM capacity; its MXUs natively support FP8, which can theoretically double throughput over BF16
• The Qwix library provides FP8 training recipes, configurable through MaxText flags, without compromising model quality
• Tokamax kernels include Splash Attention for I/O-efficient long-context attention, Megablox GMM for the ragged tensors of MoE layers, and kernel-tuning utilities for tile-size optimization
• Fourth-generation SparseCores can offload All-Gather and Reduce-Scatter collectives via XLA flags, freeing TensorCores for the primary computation
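The FP8 idea from the first two bullets can be illustrated at the plain-JAX level, without Qwix: `jax.numpy` exposes FP8 dtypes (backed by `ml_dtypes`), and a matmul can take FP8 inputs while accumulating in a wider type. This is a minimal numerics sketch, not the Qwix recipe; the per-tensor scales are hypothetical values that a real recipe would calibrate.

```python
import jax
import jax.numpy as jnp

def quantize_fp8(x, scale):
    # Scale into the FP8 e4m3 dynamic range, then cast down.
    return (x / scale).astype(jnp.float8_e4m3fn)

def fp8_matmul(a, b, a_scale, b_scale):
    # Hypothetical per-tensor scales; a real recipe (e.g. Qwix) calibrates these.
    a8 = quantize_fp8(a, a_scale)
    b8 = quantize_fp8(b, b_scale)
    # Accumulate in float32; on Ironwood the MXU can consume FP8 natively.
    out = jnp.dot(a8, b8, preferred_element_type=jnp.float32)
    # Undo the scales to return to the original range.
    return out * (a_scale * b_scale)

a = jax.random.normal(jax.random.PRNGKey(0), (128, 256), dtype=jnp.float32)
b = jax.random.normal(jax.random.PRNGKey(1), (256, 64), dtype=jnp.float32)
y = fp8_matmul(a, b, a_scale=1.0, b_scale=1.0)
print(y.dtype, y.shape)
```

The cast to `float8_e4m3fn` is where the 2x theoretical throughput comes from: the MXU processes twice as many FP8 operands as BF16 ones per cycle, while the float32 accumulator preserves the reduction precision.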
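What Megablox GMM computes can be pictured with a plain-JAX reference: a grouped matmul where each contiguous row group of the token matrix is multiplied by its own expert's weight matrix. The loop below is only a semantic reference, not the kernel (the real kernel tiles the ragged groups without materializing padding); the group sizes and expert count are made-up example values.

```python
import jax.numpy as jnp

def gmm_reference(tokens, expert_weights, group_sizes):
    """Reference semantics for a grouped matmul over ragged groups.

    tokens:         (num_tokens, d_model), rows pre-sorted by expert
    expert_weights: (num_experts, d_model, d_out)
    group_sizes:    per-expert row counts; may differ per expert (ragged)
    """
    outputs = []
    start = 0
    for e, size in enumerate(group_sizes):
        # Each expert multiplies only its own slice of the token matrix.
        outputs.append(tokens[start:start + size] @ expert_weights[e])
        start += size
    return jnp.concatenate(outputs, axis=0)

tokens = jnp.ones((10, 8))
weights = jnp.ones((3, 8, 4))
out = gmm_reference(tokens, weights, [4, 5, 1])  # ragged group sizes
print(out.shape)
```

The ragged `group_sizes` are the crux: token-to-expert routing produces uneven groups each step, so a dense batched matmul would waste FLOPs on padding that a grouped kernel avoids.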
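For the SparseCore bullet, XLA/libtpu flags typically reach the TPU runtime through the `LIBTPU_INIT_ARGS` environment variable set before launching training. The flag names below are illustrative placeholders, not verified spellings; consult the MaxText and XLA documentation for the actual SparseCore collective-offload flags.

```shell
# Sketch only: XLA/libtpu flags are passed via LIBTPU_INIT_ARGS.
# The two flag names are ILLUSTRATIVE PLACEHOLDERS for the real
# SparseCore offload options; check the MaxText/XLA docs for exact names.
export LIBTPU_INIT_ARGS="--xla_example_sc_offload_all_gather=true \
  --xla_example_sc_offload_reduce_scatter=true"

# Launch training as usual; the runtime picks up the flags at init.
python3 -m MaxText.train MaxText/configs/base.yml run_name=sc_offload_demo
```

With the offload enabled, the communication collectives run on the SparseCores concurrently with TensorCore matmuls, which is where the "freeing TensorCores" benefit comes from.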