LiteRT is Google's universal on-device AI inference framework, the evolution of TensorFlow Lite, now with full GPU and NPU acceleration across platforms.
- Delivers 1.4x faster GPU performance than TFLite via the ML Drift engine, supporting OpenCL, OpenGL, Metal, and WebGPU across Android, iOS, macOS, Windows, Linux, and Web
- A unified NPU deployment workflow abstracts away vendor-specific SDKs; MediaTek and Qualcomm integrations run up to 100x faster than CPU and 10x faster than GPU (accelerator selection is sketched after this list)
- Async execution and zero-copy buffer interoperability reduce CPU overhead, enabling up to 2x gains for real-time tasks like segmentation and ASR (see the async sketch below)
- The LiteRT-LM orchestration layer powers Gemini Nano in Google products; LiteRT outperforms llama.cpp on Gemma 3 1B, with an additional 3x NPU gain over GPU for prefill
- Supports the Gemma family, Qwen, Phi, and FastVLM via the LiteRT Hugging Face Community and the AI Edge Gallery app
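For the accelerator bullets above, here is a minimal Kotlin sketch of the `CompiledModel` flow shown in the LiteRT announcement. Treat it as illustrative: the package path, the model filename, and the exact method names are assumptions and may differ from the shipped SDK.

```kotlin
import android.content.Context
import com.google.ai.edge.litert.Accelerator
import com.google.ai.edge.litert.CompiledModel

// Load a model and run one inference on the requested accelerator.
// Switching GPU -> NPU -> CPU is just a different Accelerator option,
// which is the point of the unified deployment workflow.
fun runOnGpu(context: Context, input: FloatArray): FloatArray {
    val model = CompiledModel.create(
        context.assets,
        "model.tflite",                         // hypothetical asset name
        CompiledModel.Options(Accelerator.GPU), // or Accelerator.NPU / CPU
    )

    // Runtime-owned tensor buffers for input and output.
    val inputBuffers = model.createInputBuffers()
    val outputBuffers = model.createOutputBuffers()

    inputBuffers[0].writeFloat(input)      // copy input into buffer 0
    model.run(inputBuffers, outputBuffers) // synchronous inference
    return outputBuffers[0].readFloat()    // read result from buffer 0
}
```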
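The async and zero-copy claims follow the same API shape: hand the runtime a buffer it can read in place (for example, one wrapping a GPU-resident frame), start inference without blocking, and overlap CPU work until the output is needed. This is only a pattern sketch; `runAsync` and the buffer semantics below are assumed names and behavior, not verified API.

```kotlin
import com.google.ai.edge.litert.CompiledModel
import com.google.ai.edge.litert.TensorBuffer

// Pattern sketch of async + zero-copy inference (method names assumed).
fun segmentFrameAsync(
    model: CompiledModel,
    frameBuffer: TensorBuffer,         // wraps a GPU-resident frame, no host copy
    outputBuffers: List<TensorBuffer>,
): FloatArray {
    // Start inference without blocking the calling thread.
    model.runAsync(listOf(frameBuffer), outputBuffers)

    // ... overlap other CPU work here while the accelerator runs ...

    // Reading the output synchronizes with completion (assumed semantics).
    return outputBuffers[0].readFloat()
}
```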
This summary was automatically generated by AI based on the original article and may not be fully accurate.