Google Developers Blog | AI

LiteRT: The Universal Framework for On-Device AI

2026-01-28
1 min read

Endigest AI Core Summary

LiteRT, the evolution of TensorFlow Lite, is Google's universal on-device AI inference framework, now with full GPU and NPU acceleration across platforms.

  • Delivers 1.4x faster GPU performance than TFLite via the ML Drift engine, supporting OpenCL, OpenGL, Metal, and WebGPU across Android, iOS, macOS, Windows, Linux, and Web
  • A unified NPU deployment workflow abstracts vendor-specific SDKs; MediaTek and Qualcomm integrations deliver up to 100x speedups over CPU and 10x over GPU
  • Async execution and zero-copy buffer interoperability reduce CPU overhead, enabling up to 2x gains for real-time tasks like segmentation and ASR
  • The LiteRT-LM orchestration layer powers Gemini Nano in Google products; LiteRT outperforms llama.cpp on Gemma 3 1B, with an additional 3x NPU gain over GPU for prefill
  • Supports the Gemma family, Qwen, Phi, and FastVLM via the LiteRT Hugging Face Community and the AI Edge Gallery app
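The unified acceleration workflow described in the bullets above can be sketched with LiteRT's `CompiledModel` API. The Kotlin snippet below is a hedged sketch, assuming an Android `context`, a bundled `model.tflite` asset, and the API shape shown in Google's LiteRT announcement; exact names and signatures may differ across LiteRT releases, and `inputTensor` is a hypothetical `FloatArray` of model input data.

```kotlin
// Hedged sketch of LiteRT's CompiledModel API (signatures may vary by release).
// Choosing an accelerator is a single option; vendor NPU SDKs stay abstracted.
val model = CompiledModel.create(
    context.assets,
    "model.tflite",                         // assumed bundled asset name
    CompiledModel.Options(Accelerator.NPU)  // swap in Accelerator.GPU or CPU to fall back
)

// Allocate I/O buffers once; zero-copy interop avoids redundant CPU copies.
val inputBuffers = model.createInputBuffers()
val outputBuffers = model.createOutputBuffers()

inputBuffers[0].writeFloat(inputTensor)     // inputTensor: FloatArray (assumed)
model.run(inputBuffers, outputBuffers)
val result: FloatArray = outputBuffers[0].readFloat()
```

Because the vendor-specific SDK details sit behind the `Accelerator` option, the same code path would target MediaTek and Qualcomm NPUs, with GPU or CPU fallback as a one-line change.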