LiteRT-LM

LiteRT-LM is Google's production-ready orchestration layer to run LLMs with LiteRT, engineered for high-performance, cross-platform execution.

🔗 Product Website

🔥 What's New: `v0.13`

Gemma4 12B Support: LiteRT-LM now supports Gemma4 12B (blog).
LiteRT LM CLI update: Support OpenAI API Compatible server (doc)
Swift package for MacOS: Swift package supports both macOS and iOS now.
🚀 Agent skill support: Support creating a standalone LiteRT-LM Android demo app with backend selection and multi-modality support. See README.md for the example prompt.

👉 Try Gemma4-E4B with MTP on Linux, macOS, Windows or Raspberry Pi with the LiteRT-LM CLI:

litert-lm run  \
   --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
   gemma-4-E4B-it.litertlm \
   --backend=gpu \
   --enable-speculative-decoding=true \
   --prompt="What is the capital of France?"

🌟 Key Features

📱 Cross-Platform Support: Android, iOS, Web, Desktop, and IoT (e.g. Raspberry Pi).
🚀 Hardware Acceleration: Peak performance via GPU and NPU accelerators.
👁️ Multi-Modality: Support for vision and audio inputs.
🔧 Tool Use: Function calling support for agentic workflows.
📚 Broad Model Support: Gemma, Llama, Phi-4, Qwen, and more.

🚀 Production-Ready for Google's Products

LiteRT-LM powers on-device GenAI experiences in Chrome, Chromebook Plus, Pixel Watch, and more.

You can also try the Google AI Edge Gallery app to run models immediately on your device.

Install the app today from Google Play	Install the app today from App Store

📰 Blogs & Announcements

Link	Description
Bringing Gemma 4 12B to your Laptop: Unlocking Local, Agentic Workflows with Google AI Edge	Bring agentic, multimodal AI capabilities to everyday laptops, enabling local data processing and visual insight generation.
Blazing-fast on-device GenAI with LiteRT-LM	Unlock Gemma 4's full potential with blazing speed and incredible efficiency using newly added Swift, JavaScript, and Flutter APIs.
Accelerating Gemma 4: faster inference with multi-token prediction drafters	An overview of how Multi-Token Prediction (MTP) drafters are making Gemma 4 models up to 3x faster at inference.
Bring state-of-the-art agentic skills to the edge with Gemma 4	Deploy Gemma 4 in-app and across a broader range of devices with stellar performance and broad reach using LiteRT-LM.
On-device GenAI in Chrome, Chromebook Plus and Pixel Watch	Deploy language models on wearables and browser-based platforms using LiteRT-LM at scale.
On-device Function Calling in Google AI Edge Gallery	Explore how to fine-tune FunctionGemma and enable function calling capabilities powered by LiteRT-LM Tool Use APIs.
Google AI Edge small language models, multimodality, and function calling	Latest insights on RAG, multimodality, and function calling for edge language models.

🏃 Quick Start

🔗 Key Links

👉 Technical Overview including performance benchmarks, model support, and more.
👉 LiteRT-LM CLI Guide including installation, getting started, and advanced usage.

⚡ Quick Try (No Code)

Try LiteRT-LM immediately from your terminal without writing a single line of code using uv:

uv tool install litert-lm

litert-lm run \
  --from-huggingface-repo=google/gemma-3n-E2B-it-litert-lm \
  gemma-3n-E2B-it-int4 \
  --prompt="What is the capital of France?"

📚 Supported Language APIs

Ready to get started? Explore our language-specific guides and setup instructions.

Language	Status	Best For...	Documentation
Python	✅ Stable	Prototyping & Scripting	Python Guide
Kotlin	✅ Stable	Android apps & JVM	Kotlin Guide
Swift	🚀 Early Preview	Native iOS & macOS	Swift Guide
JavaScript (web)	🚀 Early Preview	Browser environments	JavaScript Guide
Flutter	🚀 Community	Cross-platform mobile	Flutter Guide
C++	✅ Stable	High-performance native	C++ Guide

🏗️ Build From Source

This guide shows how you can compile LiteRT-LM from source. If you want to build the program from source, you should checkout the stable tag.

📦 Releases

v0.13: Support Gemma4 12B. Added Agent skill support for Android demo app, OpenAI API compatible server in CLI, and MacOS support in Swift package.
v0.12: Added early preview of Swift and Web JavaScript APIs, and community Flutter support. Updated LiteRT-LM CLI to have full CPU and GPU backend support across Linux, macOS, and Windows.
v0.11: Support Single Position Multi-token Prediction (MTP) for Gemma 4. Expand LiteRT-LM CLI to run natively on Windows with CPU and GPU backends.
v0.10: Deploy Gemma 4 with stellar performance (blog) and introduce LiteRT-LM CLI.
v0.9: Improvements to function calling capabilities, better app performance stability.
v0.8: Desktop GPU support and Multi-Modality.
v0.7: NPU acceleration for Gemma models.

For a full list of releases, see GitHub Releases.

LiteRT-LM

🔥 What's New: `v0.13`

🌟 Key Features

🚀 Production-Ready for Google's Products

📰 Blogs & Announcements

🏃 Quick Start

🔗 Key Links

⚡ Quick Try (No Code)

📚 Supported Language APIs

🏗️ Build From Source

📦 Releases

关于 About

语言 Languages

提交活跃度 Commit Activity

核心贡献者 Contributors

LiteRT-LM

🔥 What's New: v0.13

🌟 Key Features

🚀 Production-Ready for Google's Products

📰 Blogs & Announcements

🏃 Quick Start

🔗 Key Links

⚡ Quick Try (No Code)

📚 Supported Language APIs

🏗️ Build From Source

📦 Releases

关于 About

语言 Languages

提交活跃度 Commit Activity

核心贡献者 Contributors

🔥 What's New: `v0.13`