LiteRT

Google's on-device runtime for high-performance ML & GenAI deployment on edge platforms.

📖 Get Started | 🤝 Contributing | 📜 License | 🛡 Security Policy | 📄 Documentation


🛠 Build Status

| Nightly Builds | Continuous Builds | Other Builds |
|---|---|---|
| Linux Nightly Wheel | macOS arm64 | CMake Android Linux x86_64 |
| macOS Nightly Wheel | Linux x86_64 | |
| Windows Nightly Wheel | Windows x86_64 | |

📖 LiteRT

LiteRT continues the legacy of TensorFlow Lite as the trusted, high-performance runtime for on-device AI. Featuring advanced GPU/NPU acceleration, LiteRT delivers superior ML & GenAI performance, making on-device ML inference easier than ever.

🚀 What's New

  • 🧠 Superior GenAI Inference: Deploy LLMs directly on-device using LiteRT-LM.
  • 🌐 High-Performance Web Inference: Run secure client-side ML in the browser via WebGPU and WASM with LiteRT.js.
  • 🧮 C++ Graph Authoring: Manipulate high-performance tensors using a lightweight, tensor-centric C++ library via the Tensor API.
  • 🤖 Accelerated Agentic Coding: Streamline AI coding agent workflows using the LiteRT CLI command-line toolkit.

Quick setup for the LiteRT CLI:

# 1. Create a virtual environment with Python 3.13.
# TIP: Setting the env var UV_INDEX_URL=https://pypi.org/simple sometimes helps
# resolve dependency resolution errors.
uv venv --clear --python=3.13 --seed
source .venv/bin/activate

# 2. Install the package into the active virtual environment
uv pip install litert-cli-nightly

# 3. Run help command
litert --help
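
To confirm the install from a script rather than by hand, one option is to check that the `litert` executable is on PATH and then run the same help command as step 3. This is a minimal sketch using only the Python standard library; the helper name `check_install` is hypothetical, and the only thing taken from the steps above is the `litert --help` invocation.

```python
import shutil
import subprocess


def check_install(name: str = "litert") -> str:
    """Report whether the CLI is on PATH and, if so, how `--help` exited."""
    path = shutil.which(name)
    if path is None:
        return f"{name} not found; is the virtual environment activated?"
    # Same command as step 3, run from Python instead of the shell.
    result = subprocess.run([path, "--help"], capture_output=True, text=True)
    return f"{path} exited with code {result.returncode}"
```

Calling `print(check_install())` inside the activated virtual environment prints either the resolved path with its exit code or the "not found" hint.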

💎 Key Features of LiteRT V2

  • ⚙️ Compiled Model API: Streamlined Development. Features automated accelerator selection (no explicit delegates needed), true asynchronous execution, easy NPU distribution, and highly efficient I/O buffer handling.

  • 🔌 Unified NPU Acceleration: Broad Silicon Support. Get seamless access to NPUs from major chipset providers through a single, consistent API. See LiteRT NPU.

  • 🏎️ Faster GPU Acceleration via ML Drift: Supporting GenAI Inference. Leverage state-of-the-art GPU acceleration with new buffer interoperability that minimizes latency across various GPU buffer types.
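
To make "automated accelerator selection" concrete, the idea can be pictured as a preference-ordered fallback: try the NPU, then the GPU, then the CPU. The sketch below is a conceptual illustration only and is not the Compiled Model API. The function `select_accelerator`, the `PREFERENCE` order, and the availability set are all assumptions made for this example.

```python
# Conceptual sketch only: none of these names belong to LiteRT.
PREFERENCE = ("npu", "gpu", "cpu")  # assumed order: most specialized first


def select_accelerator(available: set, preference=PREFERENCE) -> str:
    """Return the first preferred accelerator that is reported as available."""
    for name in preference:
        if name in available:
            return name
    raise RuntimeError("no supported accelerator available")
```

For a device that reports only `{"cpu", "gpu"}`, `select_accelerator` returns `"gpu"`. The point of a runtime doing this internally is that application code does not have to construct a delegate per backend.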


⚙️ LiteRT Runtime and Tools

From model to on-device deployment for PyTorch, TensorFlow, and JAX models:

graph LR
    A[PyTorch Model] --> B[LiteRT Torch <br> LiteRT Torch Generative/HF export]
    a[HF transformer <br> safetensors] --> B
    B -->|.tflite| F(AI-Edge Quantizer) --> |Optimized .tflite| I
    B -->|.litertlm| F --> |Optimized .litertlm| H{LiteRT-LM <br> Python, C++, Kotlin, Swift, JS} --> I{LiteRT Runtime <br> C++, Kotlin, JS}
    I --> J[CPU - XNNPACK <br> GPU - ML Drift <br> Supported TPU/NPU]
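
The diagram has two paths that differ only in the artifact format, which can be written down as a small lookup. This is a sketch of the diagram itself, not of any LiteRT tooling: the function name `deployment_stages` is hypothetical, and the stage names are copied from the node labels above.

```python
def deployment_stages(artifact: str) -> list:
    """List the stages from the diagram for a given artifact format."""
    if artifact == ".tflite":
        # Classic path: the runtime loads the optimized .tflite directly.
        return ["LiteRT Torch", "AI-Edge Quantizer", "LiteRT Runtime"]
    if artifact == ".litertlm":
        # LLM path: LiteRT-LM sits between the quantizer and the runtime.
        return ["LiteRT Torch", "AI-Edge Quantizer", "LiteRT-LM", "LiteRT Runtime"]
    raise ValueError(f"unknown artifact format: {artifact!r}")
```

The only structural difference is the extra LiteRT-LM stage on the `.litertlm` path; both paths end at the LiteRT Runtime, which dispatches to CPU, GPU, or a supported TPU/NPU.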

🗺 Choose Your Adventure

Every developer's path is different. Here are a few common journeys to help you get started based on your goals:

| If you want to... | Use this path... |
|---|---|
| 🏁 Upgrade from TensorFlow Lite / LiteRT V1.x | Use the LiteRT Migration Guide to upgrade to LiteRT V2.x. |
| 🌱 Run a pretrained model (like image segmentation) on mobile | Follow the step-by-step instructions in Android Studio to create a real-time segmentation app for CPU/GPU/NPU inference. Source code link. |
| 🔄 Convert PyTorch Models | Use LiteRT Torch Converter for .tflite (Classic) or Generative Torch API for .litertlm (LLMs). |
| 🧠 Deploy Generative AI | Optimize and run quantized LLMs or diffusion models on-device using LiteRT LM. |
| ⚡ Maximize Performance | Explore the LiteRT API & LiteRT NPU Acceleration to leverage underlying hardware acceleration. |
| 🌐 Run in the Browser | Deploy secure, client-side web apps leveraging WebGPU and WASM via LiteRT.js. |
| 🧮 Control Memory & Graph Execution | Use the LiteRT Tensor API, a tensor-centric C++ library for high-performance tensor manipulation on mobile devices. |

💻 Platforms Supported

LiteRT is designed for cross-platform deployment on a wide range of hardware.

| Platform | CPU | GPU APIs | NPU / Hardware Accelerators |
|---|---|---|---|
| 🤖 Android | | ✅ OpenCL, ✅ OpenGL | ✅ Google Tensor, ✅ Intel, ✅ MediaTek, ✅ Qualcomm, S.LSI* |
| 🍎 iOS | | ✅ Metal | ANE* |
| 🐧 Linux | | ✅ WebGPU | ✅ Intel |
| 🍎 macOS | | ✅ WebGPU, ✅ Metal | ANE* |
| 💻 Windows | | ✅ WebGPU | ✅ Intel |
| 🌐 Web | | ✅ WebGPU | Coming soon |
| 🧩 IoT | | ✅ WebGPU | Broadcom*, Raspberry Pi* |

📊 New Models

Models recently added to the Hugging Face LiteRT Community:

| Model Family | Size / Variant | Modality | Hugging Face Hub |
|---|---|---|---|
| Gemma 4 | Various | Multi-modal | Explore Models |
| ASR Models | Various | Audio | Explore Models |
| Image Classification Models | Various | Vision | Explore Models |

Find more models on the Hugging Face LiteRT Community page.


🔗 Sample Apps & Colabs

Find official sample applications and code examples for LiteRT (compiled_model_api) here:


🏁 Installation

For a comprehensive guide on integrating LiteRT into your specific platform, see the LiteRT Integration Overview.

🔨 Building from Source

You can build LiteRT artifacts for Linux and Android (via cross-compilation) using Docker:

  1. Start a Docker daemon.
  2. Run build_with_docker.sh inside the docker_build/ directory.

Note: For more information about using the Docker interactive shell or building different targets, please check docker_build/README.md.

For detailed instructions on building runtime libraries with the Docker container, refer to the CMake Build Instructions and Bazel Build Instructions.
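
The two build steps can also be driven from a script. The sketch below assumes that `build_with_docker.sh` is executable and needs no arguments for a default build, and that `docker info` failing means no daemon is reachable. The helper names are hypothetical; only the script name and the `docker_build/` directory come from the steps above.

```python
import subprocess
from pathlib import Path


def docker_daemon_running() -> bool:
    """Return True if `docker info` succeeds, i.e. a daemon is reachable."""
    try:
        result = subprocess.run(["docker", "info"], capture_output=True, timeout=30)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False  # docker CLI missing, or the daemon did not answer in time
    return result.returncode == 0


def build_command(repo_root: Path):
    """Return the script invocation and the directory it should run in."""
    return ["./build_with_docker.sh"], repo_root / "docker_build"


def run_build(repo_root: Path) -> None:
    """Step 1: check the daemon. Step 2: run the script inside docker_build/."""
    if not docker_daemon_running():
        raise RuntimeError("Start a Docker daemon first.")
    cmd, cwd = build_command(repo_root)
    subprocess.run(cmd, cwd=cwd, check=True)
```

From the repository root, `run_build(Path("."))` performs both steps. For other targets or the interactive shell, docker_build/README.md remains the reference.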

🚀 Roadmap

Our commitment is to make LiteRT the best runtime for any on-device ML deployment. Our core product strategies include:

  • ⚡ Hardware Acceleration: Broadening NPU support and improving performance across all major hardware accelerators.
  • 🧠 Generative AI Optimizations: Introducing new features specifically tailored for the next wave of on-device generative AI models.
  • 🛠 Developer Tools: Building better utilities for debugging, profiling, and optimizing models.
  • 🌐 Platform Support: Enhancing core platform support and exploring emerging ecosystems.

📰 Latest from the LiteRT Team & Partners

| Date | Blog Title |
|---|---|
| May 2026 | Google Tensor SDK Beta with LiteRT |
| May 2026 | LiteRT Support for Intel NPUs via OpenVINO™ |
| May 2026 | Arm and Google AI Edge optimization |
| April 2026 | Building real-world on-device AI with LiteRT and NPU |

👉 Read more on the official LiteRT Developer Website


🔗 Related Products

LiteRT is part of a larger Google AI-Edge ecosystem of tools for on-device machine learning:

  • LiteRT Torch Converter: A tool to convert PyTorch models into the .tflite format.
  • LiteRT Torch Generative API: A library to reauthor LLMs for efficient conversion and inference.
  • LiteRT-LM: A library to efficiently run LLMs across edge platforms.
  • LiteRT.js: WebAI runtime, targeting production Web applications.
  • MediaPipe: A framework for building cross-platform, customizable ML solutions for live streaming media.
  • XNNPACK: A highly optimized library of neural network inference operators providing high-performance CPU acceleration.

🛑 AI Coding Directives (Strict Constraints)

When generating deployment code for this repository:

  • MUST USE: The Compiled Model API for all new Kotlin and C++ native execution tasks.
  • DO NOT USE: tflite::Interpreter, InterpreterBuilder, or manual delegate creation. The legacy Interpreter API is strictly deprecated for new features.
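
One way to enforce the "DO NOT USE" directive mechanically is a text scan for the two named identifiers. The sketch below is a hypothetical helper, not a tool shipped with this repository. It is a plain regex match, so it also flags mentions inside comments and strings, and it does not detect manual delegate creation.

```python
import re

# Identifiers named in the "DO NOT USE" directive above.
FORBIDDEN = re.compile(r"\b(?:tflite::Interpreter|InterpreterBuilder)\b")


def find_legacy_api_uses(source: str) -> list:
    """Return (line_number, matched_identifier) for each legacy API mention."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in FORBIDDEN.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Running it over the text of a C++ file yields a list of line numbers to review; an empty list means neither identifier appears.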

🙌 Contributing & Getting Help

  • Contributing: We welcome contributions! Please see CONTRIBUTING.md for details.
  • Contributing Models: Contribute your .tflite or .litertlm models via the Hugging Face LiteRT Community page.
  • Bug Reports & Features: File an issue on our GitHub Issues page.
  • Community Support: Join the conversation on GitHub Discussions.

❤️ Code of Conduct

This project is dedicated to fostering an open and welcoming environment. Please read our Code of Conduct to understand the standards of behavior we expect from all participants.

📜 License

LiteRT is licensed under the Apache-2.0 License.
