LiteRT is Google's universal on-device AI inference framework, the evolution of TensorFlow Lite, now with full GPU and NPU acceleration across platforms.
- Delivers 1.4x faster GPU performance than TFLite via the ML Drift engine, supporting OpenCL, OpenGL, Metal, and WebGPU across Android, iOS, macOS, Windows, Linux, and Web
- A unified NPU deployment workflow abstracts away vendor-specific SDKs; MediaTek and Qualcomm integrations run up to 100x faster than CPU and 10x faster than GPU (accelerator selection is sketched after this list)
- Async execution and zero-copy buffer interoperability reduce CPU overhead, enabling up to 2x gains for real-time tasks like segmentation and ASR (see the async sketch below)
- The LiteRT-LM orchestration layer powers Gemini Nano in Google products; LiteRT outperforms llama.cpp on Gemma 3 1B, with an additional 3x NPU gain over GPU for prefill
- Supports the Gemma family, Qwen, Phi, and FastVLM via the LiteRT Hugging Face Community and the AI Edge Gallery app
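For the accelerator bullets above, here is a minimal Kotlin sketch of the `CompiledModel` flow shown in the LiteRT announcement. Treat it as illustrative: the package path, the model filename, and the exact method names are assumptions and may differ from the shipped SDK.

```kotlin
import android.content.Context
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

// Load a model and run one inference on the requested accelerator.
// Switching GPU -> NPU -> CPU is just a different Accelerator option,
// which is the point of the unified deployment workflow.
fun runOnGpu(context: Context, input: FloatArray): FloatArray {
    val model = CompiledModel.create(
        context.assets,
        "model.tflite",                         // hypothetical asset name
        CompiledModel.Options(Accelerator.GPU), // or Accelerator.NPU / CPU
    )

    // Runtime-owned tensor buffers for input and output.
    val inputBuffers = model.createInputBuffers()
    val outputBuffers = model.createOutputBuffers()

    inputBuffers[0].writeFloat(input)      // copy input into buffer 0
    model.run(inputBuffers, outputBuffers) // synchronous inference
    return outputBuffers[0].readFloat()    // read result from buffer 0
}
```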
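The async and zero-copy claims follow the same API shape: hand the runtime a buffer it can read in place (for example, one wrapping a GPU-resident frame), start inference without blocking, and overlap CPU work until the output is needed. This is only a pattern sketch; `runAsync` and the buffer semantics below are assumed names and behavior, not verified API.

```kotlin
import com.google.ai.edge.litert.CompiledModel
import com.google.ai.edge.litert.TensorBuffer

// Pattern sketch of async + zero-copy inference (method names assumed).
fun segmentFrameAsync(
    model: CompiledModel,
    frameBuffer: TensorBuffer,         // wraps a GPU-resident frame, no host copy
    outputBuffers: List<TensorBuffer>,
): FloatArray {
    // Start inference without blocking the calling thread.
    model.runAsync(listOf(frameBuffer), outputBuffers)

    // ... overlap other CPU work here while the accelerator runs ...

    // Reading the output synchronizes with completion (assumed semantics).
    return outputBuffers[0].readFloat()
}
```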
This summary was automatically generated by AI based on the original article and may not be fully accurate.