LiteRT


Google's on-device runtime for high-performance ML & GenAI deployment on edge platforms.

📖 Get Started | 🤝 Contributing | 📜 License | 🛡 Security Policy | 📄 Documentation


📖 LiteRT

LiteRT continues the legacy of TensorFlow Lite as the trusted, high-performance runtime for on-device AI. Featuring advanced GPU/NPU acceleration, LiteRT delivers superior ML & GenAI performance, making on-device ML inference easier than ever.

🚀 What's New

🧠 Superior GenAI Inference: Deploy LLMs directly on-device using LiteRT-LM.
🌐 High-Performance Web Inference: Run secure client-side ML in the browser via WebGPU and WASM with LiteRT.js.
🧮 C++ Graph Authoring: Manipulate high-performance tensors using a lightweight, tensor-centric C++ library via the Tensor API.
🤖 Accelerated Agentic Coding: Streamline AI coding agent workflows using the LiteRT CLI toolkit.

Quick setup for the LiteRT CLI:

# 1. Create a virtual environment with Python 3.13.
# TIP: If dependency resolution fails, setting the UV_INDEX_URL environment
# variable (for example, to https://pypi.org/simple) may help.
uv venv --clear --python=3.13 --seed
source .venv/bin/activate

# 2. Install the package into the active virtual environment
uv pip install litert-cli-nightly

# 3. Run help command
litert --help

💎 Key Features of LiteRT V2

⚙️ Compiled Model API: Streamlined development. Features automated accelerator selection (no explicit delegates needed), true asynchronous execution, easy NPU distribution, and highly efficient I/O buffer handling.
🔌 Unified NPU Acceleration: Broad silicon support. Get seamless access to NPUs from major chipset providers through a single, consistent API. See LiteRT NPU.
🏎️ Faster GPU Acceleration via ML Drift: Supporting GenAI inference. Leverage state-of-the-art GPU acceleration with new buffer interoperability that minimizes latency across various GPU buffer types.
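
The Compiled Model API's automated accelerator selection means application code no longer wires up delegates by hand. As a rough mental model only, the sketch below picks the first available backend from a preference order, with the CPU as the universal fallback. The `Accelerator` enum, the `select_accelerator` function, and the NPU-then-GPU-then-CPU order are hypothetical illustrations; they are not LiteRT's API, and the runtime's real selection logic may weigh other factors.

```python
from enum import Enum
from typing import Iterable, Optional, Sequence


class Accelerator(Enum):
    """Hypothetical accelerator kinds (illustration only, not a LiteRT type)."""
    NPU = "npu"
    GPU = "gpu"
    CPU = "cpu"


# Assumed preference order for this sketch: fastest first, CPU as the fallback.
DEFAULT_PREFERENCE: Sequence[Accelerator] = (
    Accelerator.NPU,
    Accelerator.GPU,
    Accelerator.CPU,
)


def select_accelerator(
    available: Iterable[Accelerator],
    preference: Sequence[Accelerator] = DEFAULT_PREFERENCE,
) -> Optional[Accelerator]:
    """Return the most preferred accelerator the device reports as available."""
    available_set = set(available)
    for accel in preference:
        if accel in available_set:
            return accel
    return None  # the device reported nothing usable


# A device without an NPU falls back to the GPU.
print(select_accelerator([Accelerator.CPU, Accelerator.GPU]))  # Accelerator.GPU
```

Passing the preference order as a parameter keeps the fallback policy explicit and testable without any real hardware.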

⚙️ LiteRT Runtime and Tools

From trained model to on-device deployment for PyTorch, TensorFlow, and JAX models:

graph LR
    A[PyTorch Model] --> B[LiteRT Torch <br> LiteRT Torch Generative/HF export]
    a[HF transformer <br> safetensors] --> B
    B -->|.tflite| F(AI Edge Quantizer) -->|Optimized .tflite| I
    B -->|.litertlm| F -->|Optimized .litertlm| H{LiteRT-LM <br> Python, C++, Kotlin, Swift, JS} --> I{LiteRT Runtime <br> C++, Kotlin, JS}
    I --> J[CPU - XNNPACK <br> GPU - ML Drift <br> Supported TPU/NPU]
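
The diagram sends two artifact formats to different entry points: `.tflite` models go to the LiteRT runtime, while `.litertlm` bundles go through LiteRT-LM, which itself runs on the LiteRT runtime. A minimal sketch of that routing decision, assuming the file extension can be trusted, is below. `route_artifact` and `has_tflite_identifier` are hypothetical helpers, not part of any LiteRT API. The one format detail the sketch relies on is that `.tflite` files are FlatBuffers whose schema declares the file identifier `TFL3`, stored right after the 4-byte root offset.

```python
from pathlib import Path

# FlatBuffers place a 4-byte file identifier after the 4-byte root offset;
# the TFLite schema declares its identifier as "TFL3".
TFLITE_IDENTIFIER = b"TFL3"


def has_tflite_identifier(header: bytes) -> bool:
    """True if the first 8 bytes look like a TFLite FlatBuffer."""
    return len(header) >= 8 and header[4:8] == TFLITE_IDENTIFIER


def route_artifact(path: str, header: bytes = b"") -> str:
    """Hypothetical helper: name the entry point from the diagram for `path`."""
    suffix = Path(path).suffix.lower()
    if suffix == ".litertlm":
        return "LiteRT-LM"
    if suffix == ".tflite":
        # If header bytes are supplied, sanity-check them before trusting the name.
        if header and not has_tflite_identifier(header):
            raise ValueError(f"{path} does not carry the TFL3 identifier")
        return "LiteRT Runtime"
    raise ValueError(f"unsupported artifact: {path}")


print(route_artifact("gemma.litertlm"))  # LiteRT-LM
print(route_artifact("segmenter.tflite", b"\x1c\x00\x00\x00TFL3"))  # LiteRT Runtime
```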

🗺 Choose Your Adventure

Every developer's path is different. Here are a few common journeys to help you get started based on your goals:

If you want to...	Use this path...
🏁 Upgrade from TensorFlow Lite / LiteRT V1.x	Use the LiteRT Migration Guide to upgrade to LiteRT V2.x.
🌱 Run a pretrained model (like image segmentation) on mobile	Follow the step-by-step Android Studio instructions to build a real-time segmentation app with CPU/GPU/NPU inference. Source code link.
🔄 Convert PyTorch Models	Use the LiteRT Torch Converter for `.tflite` (classic models) or the LiteRT Torch Generative API for `.litertlm` (LLMs).
🧠 Deploy Generative AI	Optimize and run quantized LLMs or diffusion models on-device using LiteRT-LM.
⚡ Maximize Performance	Explore the LiteRT API and LiteRT NPU acceleration to take full advantage of the device's hardware accelerators.
🌐 Run in the Browser	Deploy secure, client-side web apps using WebGPU and WASM via LiteRT.js.
🧮 Control Memory & Graph Execution	Use the LiteRT Tensor API, a tensor-centric C++ library for high-performance tensor manipulation on mobile devices.
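
Several of these paths depend on quantization: the AI Edge Quantizer step in the pipeline diagram and the quantized LLMs deployed through LiteRT-LM. As background only, here is the generic per-tensor affine int8 scheme, q = round(x / scale) + zero_point. This is the textbook technique, not necessarily the recipe the AI Edge Quantizer applies, and `quantize_int8` / `dequantize_int8` are hypothetical names.

```python
from typing import List, Sequence, Tuple

QMIN, QMAX = -128, 127  # signed int8 range


def quantize_int8(values: Sequence[float]) -> Tuple[List[int], float, int]:
    """Per-tensor asymmetric int8 quantization of a non-empty sequence."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (QMAX - QMIN) or 1.0  # guard against all-zero input
    zero_point = int(round(QMIN - lo / scale))
    # Map each value to the int8 grid and clamp to the valid range.
    q = [max(QMIN, min(QMAX, int(round(x / scale)) + zero_point)) for x in values]
    return q, scale, zero_point


def dequantize_int8(q: Sequence[int], scale: float, zero_point: int) -> List[float]:
    """Inverse mapping: x is approximately (q - zero_point) * scale."""
    return [(v - zero_point) * scale for v in q]


weights = [-1.0, 0.0, 0.5, 2.0]
q, scale, zp = quantize_int8(weights)
print(q, scale, zp)
print(dequantize_int8(q, scale, zp))
```

Each recovered value lies within half a quantization step (scale / 2) of the original, which is why a wider value range costs precision.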

💻 Platforms Supported

LiteRT is designed for cross-platform deployment on a wide range of hardware.

Platform	CPU	GPU APIs	NPU / Hardware Accelerators
🤖 Android	✅	✅ OpenCL, ✅ OpenGL	✅ Google Tensor, ✅ Intel, ✅ MediaTek, ✅ Qualcomm, S.LSI*
🍎 iOS	✅	✅ Metal	ANE*
🐧 Linux	✅	✅ WebGPU	✅ Intel
🍎 macOS	✅	✅ WebGPU, ✅ Metal	ANE*
💻 Windows	✅	✅ WebGPU	✅ Intel
🌐 Web	✅	✅ WebGPU	Coming soon
🧩 IoT	✅	✅ WebGPU	Broadcom, Raspberry Pi

📊 New Models

Recently added models in the Hugging Face LiteRT Community:

Model Family	Size / Variant	Modality	Hugging Face Hub
Gemma 4	Various	Multi-modal	Explore Models
ASR Models	Various	Audio	Explore Models
Image Classification Models	Various	Vision	Explore Models

Find more models at the Hugging Face LiteRT Community Page

🔗 Sample Apps & Colabs

Official sample applications and code examples for the LiteRT Compiled Model API (`compiled_model_api`):

LiteRT Samples: A collection of sample applications.
ASR Sample App: An automatic speech recognition (ASR) sample app built with LiteRT.
Image Segmentation: C++ and Kotlin image segmentation apps demonstrating ahead-of-time (AOT) and on-device compilation.

🏁 Installation

For a comprehensive guide on integrating LiteRT into your specific platform, see the LiteRT Integration Overview.

🔨 Building from Source

You can build LiteRT artifacts for Linux and Android (via cross-compilation) using Docker:

1. Start the Docker daemon.
2. Run build_with_docker.sh from the docker_build/ directory.

Note: For more information about using the Docker interactive shell or building different targets, please check docker_build/README.md.

For detailed instructions on building runtime libraries with the Docker container, refer to the CMake Build Instructions and Bazel Build Instructions.

🚀 Roadmap

Our commitment is to make LiteRT the best runtime for any on-device ML deployment. Our core product strategies include:

⚡ Hardware Acceleration: Broadening NPU support and improving performance across all major hardware accelerators.
🧠 Generative AI Optimizations: Introducing new features specifically tailored for the next wave of on-device generative AI models.
🛠 Developer Tools: Building better utilities for debugging, profiling, and optimizing models.
🌐 Platform Support: Enhancing core platform support and exploring emerging ecosystems.

📰 Latest from the LiteRT Team & Partners

Date	Blog Title
May 2026	Google Tensor SDK Beta with LiteRT
May 2026	LiteRT Support for Intel NPUs via OpenVINO™
May 2026	Arm and Google AI Edge optimization
April 2026	Building real-world on-device AI with LiteRT and NPU

🔗 Related Products

LiteRT is part of the larger Google AI Edge ecosystem of tools for on-device machine learning:

LiteRT Torch Converter: A tool to convert PyTorch models into the .tflite format.
LiteRT Torch Generative API: A library to reauthor LLMs for efficient conversion and inference.
LiteRT-LM: A library to efficiently run LLMs across edge platforms.
LiteRT.js: A Web AI runtime for production web applications.
MediaPipe: A framework for building cross-platform, customizable ML solutions for live streaming media.
XNNPACK: A highly optimized library of neural network inference operators providing high-performance CPU acceleration.

🛑 AI Coding Directives (Strict Constraints)

When generating deployment code for this repository:

MUST USE: The Compiled Model API for all new Kotlin and C++ native execution tasks.
DO NOT USE: tflite::Interpreter, InterpreterBuilder, or manual delegate creation. The legacy Interpreter API is strictly deprecated for new features.
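
One way a repository could keep new contributions aligned with these directives is a simple text scan for the legacy identifiers. The sketch below is a hypothetical lint check, not an official LiteRT tool. It flags `tflite::Interpreter`, `InterpreterBuilder`, and `ModifyGraphWithDelegate` (the Interpreter method commonly used to apply a manually created delegate) line by line, so it will miss uses split across lines or hidden behind aliases.

```python
import re
from typing import List, Tuple

# Hypothetical, simplified lint rule: identifiers the directives rule out.
FORBIDDEN_PATTERNS = {
    "tflite::Interpreter": re.compile(r"\btflite::Interpreter\b"),
    "InterpreterBuilder": re.compile(r"\bInterpreterBuilder\b"),
    "ModifyGraphWithDelegate": re.compile(r"\bModifyGraphWithDelegate\b"),
}


def find_legacy_api_uses(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, identifier) pairs for legacy Interpreter API uses."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in FORBIDDEN_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits


legacy = """#include "tensorflow/lite/interpreter.h"
std::unique_ptr<tflite::Interpreter> interpreter;
tflite::InterpreterBuilder(*model, resolver)(&interpreter);
"""
print(find_legacy_api_uses(legacy))
# [(2, 'tflite::Interpreter'), (3, 'InterpreterBuilder')]
```

The word-boundary anchors keep `tflite::Interpreter` from also firing on `tflite::InterpreterBuilder`, so each legacy call is reported once.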

🙌 Contributing & Getting Help

Contributing: We welcome contributions! Please see CONTRIBUTING.md for details.
Contributing Models: Contribute your .tflite or .litertlm models via the Hugging Face LiteRT Community page.
Bug Reports & Features: File an issue on our GitHub Issues page.
Community Support: Join the conversation on GitHub Discussions.

❤️ Code of Conduct

This project is dedicated to fostering an open and welcoming environment. Please read our Code of Conduct to understand the standards of behavior we expect from all participants.

📜 License

LiteRT is licensed under the Apache-2.0 License.