LiteRT: The Universal Framework for On-Device AI
2026-01-28
1 min read
Endigest AI Core Summary
LiteRT is Google's universal on-device AI inference framework evolved from TensorFlow Lite, now with full GPU and NPU acceleration across platforms.
- Delivers 1.4x faster GPU performance than TFLite via the ML Drift engine, supporting OpenCL, OpenGL, Metal, and WebGPU across Android, iOS, macOS, Windows, Linux, and Web
- Unified NPU deployment workflow abstracts vendor-specific SDKs; MediaTek and Qualcomm integrations reach up to 100x faster than CPU and 10x faster than GPU
- Async execution and zero-copy buffer interoperability reduce CPU overhead, enabling up to 2x gains for real-time tasks like segmentation and ASR
- LiteRT-LM orchestration layer powers Gemma Nano in Google products; LiteRT outperforms llama.cpp on Gemma 3 1B, with an additional 3x NPU gain over GPU for prefill
- Supports the Gemma family, Qwen, Phi, and FastVLM via the LiteRT Hugging Face Community and the AI Edge Gallery app
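The unified accelerator workflow described above amounts to selecting a backend at model-load time instead of integrating vendor-specific SDKs. A rough Kotlin sketch of what GPU selection looks like with LiteRT's `CompiledModel` API on Android (method names follow current LiteRT documentation and may change between releases; the model filename is a hypothetical placeholder):

```kotlin
// Sketch only: assumes the com.google.ai.edge.litert dependency and an
// Android Context; "model.tflite" is a placeholder asset name.
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

fun runOnGpu(context: android.content.Context): FloatArray {
    // Compile the model for the GPU backend; swapping Accelerator.GPU for
    // an NPU option is the vendor-abstraction point the article describes.
    val model = CompiledModel.create(
        context.assets,
        "model.tflite",
        CompiledModel.Options(Accelerator.GPU)
    )

    // Buffers are allocated by the runtime, enabling the zero-copy
    // interoperability mentioned above.
    val inputBuffers = model.createInputBuffers()
    val outputBuffers = model.createOutputBuffers()

    inputBuffers[0].writeFloat(FloatArray(256) { 0f })  // placeholder input
    model.run(inputBuffers, outputBuffers)
    return outputBuffers[0].readFloat()
}
```

The design point is that the same call sites work across accelerators; only the `Accelerator` option changes, with LiteRT handling dispatch to the vendor runtime underneath.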
