Star 历史趋势
数据来源: GitHub API · 生成自 Stargazers.cn
README.md

vLLM GGUF Quantization Plugin

This plugin provides out-of-tree GGUF quantization support for vLLM after in-tree support deprecation (vllm-project/vllm#39583).

Installation

Prerequisites

  • CUDA toolkit or ROCm toolkit

We recommend uv for package management. If you don't have it installed:

curl -LsSf https://astral.sh/uv/install.sh | sh

From Source

  1. Clone this repository:

    git clone https://github.com/vllm-project/vllm-gguf-plugin
    cd vllm-gguf-plugin
  2. Install the plugin in development mode:

    uv pip install -e . --torch-backend=auto

Or install directly:

uv pip install . --torch-backend=auto

Development

uv pip install -e .[dev] --torch-backend=auto
pre-commit install
pre-commit run --all-files

The same hooks also run in GitHub Actions on every push and pull request.

Usage

vllm serve Qwen/Qwen3-0.6B-GGUF:Q8_0 --tokenizer Qwen/Qwen3-0.6B

关于 About

vLLM Quantization plugin for GGUF

语言 Languages

Python55.7%
Cuda30.3%
C13.5%
C++0.4%
Shell0.2%

提交活跃度 Commit Activity

代码提交热力图
过去 52 周的开发活跃度
19
Total Commits
峰值: 11次/周
Less
More

核心贡献者 Contributors