{ "cells": [ { "metadata": {}, "cell_type": "markdown", "source": [ "# vLLM GPU Memory Visualization\n", "\n", "Visualize vLLM Memory with PyTorch\n", "\n", "Introduction: use the PyTorch memory snapshot to capture vLLM's runtime memory state, then load the data into the web viewer for visualization. This gives a clear, intuitive view of the main device-memory allocations and usage inside vLLM.\n", "\n", "**This example must be run on a GPU/NPU machine.**\n", "\n", "\n", "Related article: [vLLM显存管理详解 (vLLM GPU Memory Management Explained)](https://zhuanlan.zhihu.com/p/1916529253169734444)\n", "\n", "Author: kaiyuan\n", "\n", "Email: kyxie@zju.edu.cn" ], "id": "343a0e72dfe0a2fb" }, { "metadata": {}, "cell_type": "markdown", "source": [ "## 1 Data Collection\n", "\n", "\n", "The snapshot is saved as a pickle file. Open it in the web viewer: [Link](https://docs.pytorch.org/memory_viz)\n", "\n", "**Note:** with newer vLLM versions the engine runs in a separate process, so the main process cannot capture the data. Disable V1 multiprocessing by setting:\n", "\n", "os.environ[\"VLLM_ENABLE_V1_MULTIPROCESSING\"] = \"0\"\n", "\n", "\n", "\n" ], "id": "b325bf01212134ad" }, { "metadata": {}, "cell_type": "markdown", "source": [ "### 1.1 V100 Machine Example\n", "\n", "Test machine information:\n", "- Tesla V100-SXM2-32GB\n", "- NVIDIA-SMI 545.23.08\n", "- Driver Version: 545.23.08\n", "- CUDA Version: 12.3\n", "\n", "Image: nvcr.io/nvidia/pytorch:25.04-py3\n", "\n", "Note: this image does not include vLLM. Fetch the source and check out v0.9.0:\n", "\n", "git clone https://github.com/vllm-project/vllm.git\n", "\n", "git checkout v0.9.0\n", "\n", "Model: Qwen2.5-7B-Instruct\n", "\n", "Code:" ], "id": "355c93838caa5052" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [ "# coding=UTF-8\n", "import torch\n", "from vllm import LLM, SamplingParams\n", "\n", "\n", "if __name__ == \"__main__\":\n", " # Start recording allocator events (private PyTorch API)\n", " torch.cuda.memory._record_memory_history(max_entries=100000)\n", " model_name = \"/home/kaiyuan/models/Qwen2.5-7B-Instruct\" # replace with the path to your downloaded model\n", " llm = LLM(model=model_name, dtype='float16')\n", " n = 16\n", " # Prepare the input prompts\n", " prompts = [\n", " \"Hello, I'm kaiyuan\",\n", " \"Do you subscribe InfraTech?\",\n", " ]\n", "\n", " # Set the sampling parameters\n", " sampling_params = SamplingParams(\n", " temperature=0.8, # controls randomness; higher values are more random\n", " top_p=0.95, # nucleus sampling threshold; higher values give more diverse text\n", " max_tokens=50, # maximum number of tokens to generate\n", " n=n\n", " )\n", " outputs = llm.generate(prompts, sampling_params)\n", " 
torch.cuda.memory._dump_snapshot(\"vllm_snapshot.pickle\") # write the snapshot to disk\n", " torch.cuda.memory._record_memory_history(enabled=None) # stop recording\n" ], "id": "51a3f97b78e3c86d" }, { "metadata": {}, "cell_type": "markdown", "source": [ "### 1.2 NPU 910B Machine Example\n", "\n", "Test machine information:\n", "- 910B1\n", "- npu-smi 23.0.6\n", "- Version: 23.0.6\n", "\n", "HDK version: cat /usr/local/Ascend/driver/version.info\n", "\n", "```\n", "Version=23.0.6\n", "ascendhal_version=7.35.19\n", "aicpu_version=1.0\n", "tdt_version=1.0\n", "log_version=1.0\n", "prof_version=2.0\n", "dvppkernels_version=1.1\n", "tsfw_version=1.0\n", "Innerversion=V100R001C15SPC009B220\n", "compatible_version=[V100R001C30],[V100R001C13],[V100R001C15]\n", "compatible_version_fw=[7.0.0,7.1.99]\n", "package_version=23.0.6\n", "```\n", "\n", "Image: swr.cn-southwest-2.myhuaweicloud.com/ei_ascendcloud_devops/vllm-ascend:v0.8.5rc1-openeuler\n", "\n", "\n", "Model: Qwen2.5-7B-Instruct\n", "\n", "Code:" ], "id": "b293a38b759ee024" }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [ "# coding=UTF-8\n", "import torch_npu\n", "from torch_npu.contrib import transfer_to_npu\n", "from vllm import LLM, SamplingParams\n", "\n", "\n", "if __name__ == \"__main__\":\n", " # Start recording allocator events on the NPU\n", " torch_npu.npu.memory._record_memory_history(max_entries=100000)\n", " model_name = \"/home/kaiyuan/models/Qwen2.5-7B-Instruct\" # replace with the path to your downloaded model\n", " llm = LLM(model=model_name, dtype='float16')\n", " n = 16\n", " # Prepare the input prompts\n", " prompts = [\n", " \"Hello, I'm kaiyuan\",\n", " \"Do you subscribe InfraTech?\",\n", " ]\n", "\n", " # Set the sampling parameters\n", " sampling_params = SamplingParams(\n", " temperature=0.8, # controls randomness; higher values are more random\n", " top_p=0.95, # nucleus sampling threshold; higher values give more diverse text\n", " max_tokens=50, # maximum number of tokens to generate\n", " n=n\n", " )\n", " outputs = llm.generate(prompts, sampling_params)\n", " torch_npu.npu.memory._dump_snapshot(\"vllm_snapshot.pickle\") # write the snapshot to disk\n", " torch_npu.npu.memory._record_memory_history(enabled=None) # stop recording" ], "id": "28a037d5de19e36a" }, { "metadata": {}, "cell_type": 
"markdown", "source": [ "\n", "## 2 导入web端可视化\n", "\n", "运行结束将生成的vllm_snapshot.pickle文件导入到[https://docs.pytorch.org/memory_viz](https://docs.pytorch.org/memory_viz)。" ], "id": "e48a7618acd9252e" } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 5 }