vLLM GGUF Quantization Plugin

This plugin provides out-of-tree GGUF quantization support for vLLM after in-tree support deprecation (vllm-project/vllm#39583).

Installation

We recommend uv for package management. If you don't have it installed:

curl -LsSf https://astral.sh/uv/install.sh | sh

Clone this repository:

git clone https://github.com/vllm-project/vllm-gguf-plugin
cd vllm-gguf-plugin

Install the plugin in development mode:

uv pip install -e . --torch-backend=auto

Or install directly:

uv pip install . --torch-backend=auto

uv pip install -e .[dev] --torch-backend=auto
pre-commit install
pre-commit run --all-files

The same hooks also run in GitHub Actions on every push and pull request.

vllm serve Qwen/Qwen3-0.6B-GGUF:Q8_0 --tokenizer Qwen/Qwen3-0.6B