Public
Star 历史趋势
数据来源: GitHub API · 生成自 Stargazers.cn
README.md
vLLM GGUF Quantization Plugin
This plugin provides out-of-tree GGUF quantization support for vLLM after in-tree support deprecation (vllm-project/vllm#39583).
Installation
Prerequisites
- CUDA toolkit or ROCm toolkit
We recommend uv for package management. If you don't have it installed:
curl -LsSf https://astral.sh/uv/install.sh | shFrom Source
-
Clone this repository:
git clone https://github.com/vllm-project/vllm-gguf-plugin cd vllm-gguf-plugin -
Install the plugin in development mode:
uv pip install -e . --torch-backend=auto
Or install directly:
uv pip install . --torch-backend=autoDevelopment
uv pip install -e .[dev] --torch-backend=auto
pre-commit install
pre-commit run --all-filesThe same hooks also run in GitHub Actions on every push and pull request.
Usage
vllm serve Qwen/Qwen3-0.6B-GGUF:Q8_0 --tokenizer Qwen/Qwen3-0.6B