GitHub

Repositories

Agent skills for vLLM
Shell
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
Python
vLLM Quantization plugin for GGUF
Python
Community maintained hardware plugin for vLLM on Apple Silicon
Python
Cost-efficient and pluggable Infrastructure components for GenAI inference
Go
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Python
compression, quantization
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
Python
System Level Intelligent Router for Mixture-of-Models at Cloud, Data Center and Edge
Go
ai-gateway, bert-classification, fine-tuning, golang, huggingface-candle, huggingface-transformers, kubernetes, llm, llmrouter, mcp, mixture-of-models, openclaw, pii-detection, prompt-engineering, prompt-guard, rust, semantic-router, vllm
Common recipes to run vLLM
JavaScript
A framework for efficient model inference with omni-modality models
Python
audio-generation, diffusion, image-generation, inference, model-serving, multimodal, pytorch, transformer, video-generation, world-model
A high-throughput and memory-efficient inference and serving engine for LLMs
Python
amd, blackwell, cuda, deepseek, deepseek-v3, gpt, gpt-oss, inference, kimi, llama, llm, llm-serving, model-serving, moe, openai, pytorch, qwen, qwen3, tpu, transformer
Community maintained hardware plugin for vLLM on Ascend
C++
ascend, inference, llm, llm-serving, llmops, mlops, model-serving, transformer, vllm