Star 历史趋势
数据来源: GitHub API · 生成自 Stargazers.cn
README.md

NexaSDK

NexaSDK lets you build the smartest and fastest on-device AI with minimum energy. It is a highly performant local inference framework that runs the latest multimodal AI models locally on NPU, GPU, and CPU - across Android, Windows, and Linux devices with a few lines of code.

NexaSDK supported latest models weeks or months before anyone else — Qwen3-VL, DeepSeek-OCR, Gemma3n (Vision), and more.

Star this repo to keep up with exciting updates and new releases about latest on-device AI capabilities.

🏆 Recognized Milestones

🚀 Quick Start

PlatformLinks
🖥️ CLIQuick StartDocs
🐍 PythonQuick StartDocs
🤖 AndroidQuick StartDocs
🐳 Linux DockerQuick StartDocs

🖥️ CLI

Download:

WindowsLinux
arm64 (Qualcomm NPU)arm64
x64x64

NPU Access Token (required for NPU models):

Note: Our previous token validation service has been deprecated. For any NPU usage, simply set the access token below — no additional registration or validation is needed.

For Windows:

$env:NEXA_TOKEN="key/eyJhY2NvdW50Ijp7ImlkIjoiNDI1Y2JiNWQtNjk1NC00NDYxLWJiOWMtYzhlZjBiY2JlYzA2In0sInByb2R1Y3QiOnsiaWQiOiJkYjI4ZTNmYy1mMjU4LTQ4ZTctYmNkYi0wZmE4YjRkYTJhNWYifSwicG9saWN5Ijp7ImlkIjoiMmYyOWQyMjctNDVkZS00MzQ3LTg0YTItMjUwNTYwMmEzYzMyIiwiZHVyYXRpb24iOjMxMTA0MDAwMH0sInVzZXIiOnsiaWQiOiI3MGE2YzA4NS1jYjc3LTQ3YmEtOWUxNC1lNjFjYTA2ZThmZjUiLCJlbWFpbCI6ImFsYW40QG5leGE0YWkuY29tIn0sImxpY2Vuc2UiOnsiaWQiOiI4OTlhZGQ2NS1lOTI2LTQ2M2ItODllNi0xMjc0NzM3ZjA1MzYiLCJjcmVhdGVkIjoiMjAyNS0wOS0wNlQwMDo1MzozNi4yMDNaIiwiZXhwaXJ5IjoiMjAzNS0xMi0zMVQyMzo1OTo1OS4wMDBaIn19.BXoUHIEzFMuuZbBT7RvsKO9nTi5950C6kHO64blF7XBnfKvZ6ClA8a55tmszI1ZWdngzpNFTzMM5PV5euuzMCA=="

For Linux / Android adb shell:

export NEXA_TOKEN="key/eyJhY2NvdW50Ijp7ImlkIjoiNDI1Y2JiNWQtNjk1NC00NDYxLWJiOWMtYzhlZjBiY2JlYzA2In0sInByb2R1Y3QiOnsiaWQiOiJkYjI4ZTNmYy1mMjU4LTQ4ZTctYmNkYi0wZmE4YjRkYTJhNWYifSwicG9saWN5Ijp7ImlkIjoiMmYyOWQyMjctNDVkZS00MzQ3LTg0YTItMjUwNTYwMmEzYzMyIiwiZHVyYXRpb24iOjMxMTA0MDAwMH0sInVzZXIiOnsiaWQiOiI3MGE2YzA4NS1jYjc3LTQ3YmEtOWUxNC1lNjFjYTA2ZThmZjUiLCJlbWFpbCI6ImFsYW40QG5leGE4YWkuY29tIn0sImxpY2Vuc2UiOnsiaWQiOiI4OTlhZGQ2NS1lOTI2LTQ2M2ItODllNi0xMjc0NzM3ZjA1MzYiLCJjcmVhdGVkIjoiMjAyNS0wOS0wNlQwMDo1MzozNi4yMDNaIiwiZXhwaXJ5IjoiMjAzNS0xMi0zMVQyMzo1OTo1OS4wMDBaIn19.BXoUHIEzFMuuZbBT7RvsKO9nTi5950C6kHO64blF7XBnfKvZ6ClA8a55tmszI1ZWdngzpNFTzMM5PV5euuzMCA=="

Run your first model:

# Chat with Qwen3 nexa infer ggml-org/Qwen3-1.7B-GGUF # Multimodal: drag images into the CLI nexa infer NexaAI/Qwen3-VL-4B-Instruct-GGUF # NPU (Windows arm64 with Snapdragon X Elite) nexa infer NexaAI/OmniNeural-4B
  • Models: LLM, Multimodal, ASR, OCR, Rerank, Object Detection, Image Generation, Embedding
  • Formats: GGUF, NEXA
  • 📖 CLI Reference Docs

🐍 Python SDK

pip install nexaai
from nexaai import LLM, GenerationConfig, ModelConfig, LlmChatMessage llm = LLM.from_(model="NexaAI/Qwen3-0.6B-GGUF", config=ModelConfig()) conversation = [ LlmChatMessage(role="user", content="Hello, tell me a joke") ] prompt = llm.apply_chat_template(conversation) for token in llm.generate_stream(prompt, GenerationConfig(max_tokens=100)): print(token, end="", flush=True)
  • Models: LLM, Multimodal, ASR, OCR, Rerank, Object Detection, Image Generation, Embedding
  • Formats: GGUF, NEXA
  • 📖 Python SDK Docs

🤖 Android SDK

Add to your app/AndroidManifest.xml

<application android:extractNativeLibs="true">

Add to your build.gradle.kts:

dependencies { implementation("ai.nexa:core:0.0.19") }
// Initialize SDK NexaSdk.getInstance().init(this) // Load and run model VlmWrapper.builder() .vlmCreateInput(VlmCreateInput( model_name = "omni-neural", model_path = "/data/data/your.app/files/models/OmniNeural-4B/files-1-1.nexa", plugin_id = "npu", config = ModelConfig() )) .build() .onSuccess { vlm -> vlm.generateStreamFlow("Hello!", GenerationConfig()).collect { print(it) } }
  • Requirements: Android minSdk 27, Qualcomm Snapdragon 8 Gen 4 Chip
  • Models: LLM, Multimodal, ASR, OCR, Rerank, Embedding
  • NPU Models: Supported Models
  • 📖 Android SDK Docs

🐳 Linux Docker

docker pull nexa4ai/nexasdk:latest export NEXA_TOKEN="your_token_here" docker run --rm -it --privileged \ -e NEXA_TOKEN \ nexa4ai/nexasdk:latest infer NexaAI/Granite-4.0-h-350M-NPU

⚙️ Features & Comparisons

FeaturesNexaSDKOllamallama.cppLM Studio
NPU support✅ NPU-first
Android SDK support✅ NPU/GPU/CPU support⚠️⚠️
Linux support (Docker image)
Day-0 model support⚠️
Full multimodality support✅ Image, Audio, Text, Embedding, Rerank, ASR, TTS⚠️⚠️⚠️
Cross-platform support✅ Desktop, Mobile (Android), Automotive, IoT (Linux)⚠️⚠️⚠️
One line of code to run⚠️
OpenAI-compatible API + Function calling

Legend: ✅ Supported   |   ⚠️ Partial or limited support   |   ❌ No

🙏 Acknowledgements

We would like to thank the following projects:

📄 License

NexaSDK uses a dual licensing model:

CPU/GPU Components

Licensed under Apache License 2.0.

NPU Components

  • Personal Use: Free license key available from Nexa AI Model Hub. Each key activates 1 device for NPU usage.
  • Commercial Use: Contact hello@nexa.ai for licensing.

🤝 Contact & Community Support

Want more model support, backend support, device support or other features? We'd love to hear from you!

Feel free to submit an issue on our GitHub repository with your requests, suggestions, or feedback. Your input helps us prioritize what to build next.

关于 About

Run frontier LLMs and VLMs with day-0 model support across GPU, NPU, and CPU, with comprehensive runtime coverage for PC (Python/C++), mobile (Android & iOS), and Linux/IoT (Arm64 & x86 Docker). Supporting OpenAI GPT-OSS, IBM Granite-4, Qwen-3-VL, Gemma-3n, Ministral-3, and more.
gemma3gogpt-ossgranite4llamallama3llmon-device-aiphi3qwen3qwen3vlsdkstable-diffusionvlm

语言 Languages

Kotlin32.3%
Go29.7%
Python24.7%
Jupyter Notebook6.0%
HTML4.2%
Shell1.4%
PowerShell0.5%
Inno Setup0.4%
Java0.3%
Makefile0.2%
Dockerfile0.1%
JavaScript0.0%
CSS0.0%

提交活跃度 Commit Activity

代码提交热力图
过去 52 周的开发活跃度
1493
Total Commits
峰值: 127次/周
Less
More

核心贡献者 Contributors