Endigest AI Core Summary
MaxText introduces Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) capabilities, now available on single-host TPU configurations, for post-training large language models.
•SFT enables seamless fine-tuning of pre-trained models on labeled datasets with native Hugging Face dataset and checkpoint support.
•Tunix, a JAX-based library, optimizes post-training efficiency and execution performance on TPUs.
•GRPO (Group Relative Policy Optimization) reduces hardware requirements by eliminating the need for a separate value function model and computing relative advantages within response groups.
•GSPO (Group Sequence Policy Optimization) improves training stability through sequence-level importance ratios and clipping, enhancing performance on reasoning benchmarks like GSM8K.
•Both RL algorithms utilize vLLM for high-throughput inference during training loops.
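The SFT objective mentioned above is standard supervised fine-tuning: token-level cross-entropy computed only over the labeled completion, with prompt tokens masked out. A minimal NumPy sketch (`sft_loss` is a hypothetical helper for illustration, not MaxText's API):

```python
import numpy as np

def sft_loss(logits, target_ids, loss_mask):
    """Cross-entropy over completion tokens only. `loss_mask` is 1 for
    labeled completion tokens and 0 for prompt tokens, so the model is
    trained to reproduce the target response.
    Hypothetical sketch, not MaxText's API."""
    # Numerically stable log-softmax over the vocabulary axis.
    logits = logits - logits.max(axis=-1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    # Pick the log-prob of each target token.
    tok_logp = np.take_along_axis(logp, target_ids[:, None], axis=-1)[:, 0]
    # Mean negative log-likelihood over unmasked (completion) tokens.
    return -(tok_logp * loss_mask).sum() / loss_mask.sum()
```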
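GRPO's key trick, per the summary, is replacing a learned value model with advantages computed relative to a group of sampled responses. A minimal sketch of that normalization (`grpo_advantages` is a hypothetical helper, not MaxText's API):

```python
import numpy as np

def grpo_advantages(rewards):
    """Group-relative advantages: each response's reward is normalized
    against the mean and std of its own group of samples, so no separate
    value-function (critic) model is needed.
    Hypothetical sketch, not MaxText's API."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)

# Four sampled responses to one prompt, scored by a reward function:
adv = grpo_advantages([1.0, 0.0, 1.0, 0.0])
```

Responses scoring above the group mean get positive advantages, those below get negative ones, and the advantages sum to zero within each group.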
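GSPO's stability improvement comes from forming one importance ratio per sequence (the per-token log-prob differences averaged over the sequence length, then exponentiated) and clipping at that level, rather than per token. A minimal sketch under those assumptions (`gspo_ratio` and `gspo_objective` are hypothetical helpers, not MaxText's API):

```python
import numpy as np

def gspo_ratio(logp_new, logp_old):
    """Sequence-level importance ratio: average the per-token log-prob
    differences over the sequence length before exponentiating, instead
    of forming one ratio per token.
    Hypothetical sketch, not MaxText's API."""
    logp_new = np.asarray(logp_new, dtype=np.float64)
    logp_old = np.asarray(logp_old, dtype=np.float64)
    return np.exp((logp_new - logp_old).mean())

def gspo_objective(ratio, advantage, eps=0.2):
    """PPO-style clipped surrogate, applied to the sequence-level ratio."""
    return min(ratio * advantage, np.clip(ratio, 1 - eps, 1 + eps) * advantage)
```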
This summary was automatically generated by AI based on the original article and may not be fully accurate.