{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [], "authorship_tag": "ABX9TyNdLyKezZrtEwnFExtqd4C4" }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "# SGLang Profiling: Data Collection and Analysis\n", "\n", "Overview: this exercise walks through SGLang's profiling data collection workflow and shows how to analyze the resulting traces. Collection covers model download, image pull, container creation, running inference, and loading the trace data. The analysis is based on profiling data from the Qwen2.5-7B-Instruct model, including a detailed look at the Python-level and GPU-level timelines.\n", "\n", "Related article: [Getting Started with SGLang Profiling Data Collection and Analysis](https://zhuanlan.zhihu.com/p/2004605638760763526)\n", "\n", "Author: kaiyuan\n", "\n", "Email: kyxie@zju.edu.cn" ], "metadata": { "id": "RuofLhSfiTAt" } }, { "cell_type": "markdown", "source": [ "## 1 Setup\n", "\n", "### 1.1 Model download\n", "\n", "Download the model from Hugging Face to the server's local disk. Model page:\n", "https://huggingface.co/Qwen/Qwen2.5-7B-Instruct\n", "\n", "Set the download mirror:\n", "```\n", "export HF_ENDPOINT=https://hf-mirror.com\n", "```\n", "\n", "Download script (Python):\n", "\n", "```\n", "from huggingface_hub import snapshot_download\n", "\n", "repo_id = \"Qwen/Qwen2.5-7B-Instruct\"\n", "local_dir = \"./models/Qwen2.5-7B-Instruct\"\n", "\n", "# Download with retry and resume support\n", "local_dir = snapshot_download(\n", "    repo_id=repo_id,\n", "    local_dir=local_dir,\n", "    local_dir_use_symlinks=False,  # avoid symlinks; sometimes more reliable\n", "    resume_download=True,          # key parameter: resume an interrupted download\n", "    force_download=False,          # do not re-download files that already exist\n", ")\n", "\n", "print(f\"Model fully downloaded to: {local_dir}\")\n", "```\n", "Note: tested with huggingface_hub version 1.4.1.\n", "\n", "\n", "### 1.2 Environment setup\n", "\n", "NVIDIA's prebuilt image is recommended; it avoids errors caused by environment mismatches.\n", "\n", "Pull the image:\n", "```\n", "docker pull nvcr.io/nvidia/sglang:26.01-py3\n", "```\n", "Release notes: [Link](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/sglang?version=26.01-py3)\n", "\n", "Example container creation:\n", "```\n", "docker run -itd --rm --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \\\n", "-v /data/nfs/kaiyuan:/data/nfs/kaiyuan \\\n", "--name sglang-dev nvcr.io/nvidia/sglang:26.01-py3 bash\n", "```\n", "\n", "\n", "Attach to the container:\n", "\n", "```\n",
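"# Optional sanity check before attaching (assumes the container name sglang-dev used above):\n", "docker ps --filter name=sglang-dev\n", "\n",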
"docker exec -it sglang-dev bash\n", "```\n", "\n", "Check that the environment works:\n", "\n", "```\n", "python -c \"import torch; import sglang; print(torch.cuda.is_available())\"\n", "```\n", "\n", "Note: the image requires NVIDIA driver version >= 570.\n", "\n", "\n", "Test machine used in this exercise:\n", "- NVIDIA A100-SXM4-80GB\n", "- NVIDIA-SMI 570.172.08\n", "- Driver Version: 570.172.08\n", "- CUDA Version: 13.1\n" ], "metadata": { "id": "Qda9-o9mJ2j2" } }, { "cell_type": "markdown", "source": [ "## 2 Data Collection\n", "\n", "The official SGLang documentation describes several collection methods: [Benchmark and Profiling](https://docs.sglang.io/developer_guide/benchmark_and_profiling.html#benchmark-and-profiling).\n", "\n", "This exercise uses the \"HTTP API endpoints\" method.\n", "\n", "Two terminal (console) windows are needed:\n", "\n", "- Terminal 1: start the server;\n", "- Terminal 2: run the client script.\n", "\n", "### 2.1 Server startup\n", "\n", "Single node, single GPU:\n", "```\n", "# Set the profiling output directory (required):\n", "export SGLANG_TORCH_PROFILER_DIR=/data/kaiyuan/llm_infer/profiles\n", "\n", "# Launch the server\n", "python -m sglang.launch_server --model-path /data/kaiyuan/models/Qwen2.5-7B-Instruct\n", "\n", "```\n", "\n", "Single node, multiple GPUs:\n", "\n", "```\n", "SGLANG_TORCH_PROFILER_DIR=\"/data/kaiyuan/llm_infer/profiles\" \\\n", "python -m sglang.launch_server \\\n", "  --model-path /data/kaiyuan/models/Qwen2.5-7B-Instruct \\\n", "  --host 127.0.0.1 \\\n", "  --tp-size 4 \\\n", "  --port 30000\n", "\n", "```\n", "\n", "Single node, multiple GPUs (CUDA graphs disabled):\n", "\n", "```\n", "SGLANG_CUDA_GRAPH_MODE=0 \\\n", "SGLANG_CACHE_GRAPH=0 \\\n", "CUDA_LAUNCH_BLOCKING=1 \\\n", "SGLANG_TORCH_PROFILER_DIR=\"/data/kaiyuan/llm_infer/profiles\" \\\n", "python -m sglang.launch_server \\\n", "  --model-path /data/kaiyuan/models/Qwen2.5-7B-Instruct \\\n", "  --host 127.0.0.1 \\\n", "  --tp-size 4 \\\n", "  --port 30000\n", "\n", "```\n", "Notes:\n", "* The SGLANG_CUDA_GRAPH_MODE environment variable controls CUDA graph mode; 0 disables it.\n", "* When tp-size is greater than 1, multiple GPUs are used.\n", "* --model-path points to the directory of the downloaded model.\n", "\n", "### 2.2 Client\n", "\n", "Steps:\n", "- start profiling;\n", "- send 5 consecutive requests to the server;\n", "- stop profiling.\n", "\n", "Basic usage of the profiling endpoints:\n", "\n", "- Start: http://127.0.0.1:30000/start_profile\n", "- Stop: http://127.0.0.1:30000/stop_profile\n", "\n", "Collection parameters:\n", "\n", "```\n", "# Start profiling immediately for 10 steps\n", "curl -X POST http://127.0.0.1:30000/start_profile \\\n", "  -H \"Content-Type: application/json\" \\\n", "  -d '{\n", "    \"num_steps\": 10\n", "  }'\n", "```\n", "\n", "- output_dir (optional): directory where the profiling traces are saved. If not specified, the SGLANG_TORCH_PROFILER_DIR environment variable is used, falling back to /tmp.\n", "\n", "- num_steps (optional): number of steps to profile. If not specified, profiling continues until stopped manually via /stop_profile.\n", "\n", "- start_step (optional): step at which profiling begins (inclusive). Useful for skipping warm-up iterations.\n", "\n", "- activities (optional): which timelines to collect; for example, [\"CPU\"] collects only CPU-side data. Defaults to [\"CPU\", \"GPU\"].\n", "\n", "- merge_profiles (optional): whether to merge distributed traces. Defaults to false.\n", "\n", "\n", "A simplified Python version of the collection flow:" ], "metadata": { "id": "2uh4UT4kNR0v" } }, { "cell_type": "code", "source": [ "import requests\n", "import time\n", "\n", "base_url = \"http://127.0.0.1:30000\"\n", "\n", "# Start profiling\n", "requests.post(f\"{base_url}/start_profile\", timeout=5)\n", "\n", "# Send 5 inference requests\n", "for i in range(5):\n", "    requests.post(\n", "        f\"{base_url}/v1/completions\",\n", "        json={\n", "            \"model\": \"default\",\n", "            \"prompt\": f\"Test request {i+1}: explain the basic concepts of artificial intelligence\",\n", "            \"max_tokens\": 30,\n", "            \"temperature\": 0.1\n", "        },\n", "        timeout=15\n", "    )\n", "    time.sleep(0.5)\n", "\n", "# Wait for generation to finish, then stop profiling\n", "time.sleep(5)\n", "requests.post(f\"{base_url}/stop_profile\")\n", "\n" ], "metadata": { "id": "EWhtmwqGhb37" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Enhanced version (with failure handling):" ], "metadata": { "id": "WwSMkTR5hoGn" } }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "gYuvJNGQh1VL" }, "outputs": [], "source": [ "import requests\n", "import time\n", "\n", "base_url = \"http://127.0.0.1:30000\"\n", "\n", "def send_stop_profile_with_timeout(timeout=30):\n", "    \"\"\"Send the stop_profile request with a timeout.\"\"\"\n", "    try:\n", "        print(f\"Sending /stop_profile with a {timeout}s timeout...\")\n", "        response = requests.post(f\"{base_url}/stop_profile\", timeout=timeout)\n", "        print(f\"/stop_profile finished: status {response.status_code}\")\n", "        return True\n", "    except requests.exceptions.Timeout:\n", "        print(f\"stop_profile timed out ({timeout}s), but it may still be processing in the background\")\n", "        print(\"This is usually normal; the trace files may still be being written\")\n", "        return True  # treat a timeout as success\n", "    except Exception as e:\n", "        print(f\"stop_profile error: {e}\")\n", "        return False\n", "\n", "def check_server_status():\n", "    \"\"\"Check whether the server is still running.\"\"\"\n", "    try:\n", "        resp = requests.get(f\"{base_url}/health\", timeout=2)\n", "        return resp.status_code == 200\n", "    except Exception:\n", "        return False\n", "\n", "if not check_server_status():\n", "    print(\"Warning: /health check failed; is the server running?\")\n", "\n", "print(\">>> Starting profiling session...\")\n", "try:\n", "    resp = requests.post(f\"{base_url}/start_profile\", timeout=5)\n", "    print(f\"/start_profile: status {resp.status_code}\")\n", "except Exception as e:\n", "    print(f\"/start_profile error: {e}\")\n", "\n", "# Run the inference test\n", "print(\"\\n>>> Running inference test (5 requests)...\")\n", "for i in range(5):\n", "    print(f\"  Request {i+1}/5\")\n", "    try:\n", "        resp = requests.post(\n", "            f\"{base_url}/v1/completions\",\n", "            json={\n", "                \"model\": \"default\",\n", "                \"prompt\": f\"Test request {i+1}: explain the basic concepts of artificial intelligence\",\n", "                \"max_tokens\": 30,\n", "                \"temperature\": 0.1\n", "            },\n", "            timeout=15\n", "        )\n", "        print(f\"  status: {resp.status_code}\")\n", "    except Exception as e:\n", "        print(f\"  error: {e}\")\n", "    time.sleep(0.5)\n", "\n", "print(\"\\n>>> Waiting for all inference requests to finish (5s)...\")\n", "time.sleep(5)\n", "print(\"\\n>>> Stopping profiling...\")\n", "# Send stop_profile with a 30-second timeout\n", "success = send_stop_profile_with_timeout(timeout=30)" ] }, { "cell_type": "markdown", "source": [ "## 3 Example Output\n", "\n", "### 3.1 Server-side log\n", "\n", "TP size = 1\n", "\n", "```\n", "[2026-01-01 08:50:18] INFO: 127.0.0.1:59572 - \"POST /start_profile HTTP/1.1\" 200 OK\n", "[2026-01-01 08:50:18] Prefill batch, #new-seq: 1, #new-token: 11, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0,\n", "[2026-01-01 08:50:18] INFO: 127.0.0.1:59574 - \"POST /v1/completions HTTP/1.1\" 200 
OK\n", "[2026-01-01 08:50:19] Prefill batch, #new-seq: 1, #new-token: 8, #cached-token: 3, token usage: 0.00, #running-req: 0, #queue-req: 0,\n", "[2026-01-01 08:50:19] Decode batch, #running-req: 1, #token: 14, token usage: 0.00, cuda graph: True, gen throughput (token/s): 1.99, #queue-req: 0,\n", "[2026-01-01 08:50:19] INFO: 127.0.0.1:59580 - \"POST /v1/completions HTTP/1.1\" 200 OK\n", "[2026-01-01 08:50:20] Prefill batch, #new-seq: 1, #new-token: 8, #cached-token: 3, token usage: 0.00, #running-req: 0, #queue-req: 0,\n", "[2026-01-01 08:50:20] Decode batch, #running-req: 1, #token: 24, token usage: 0.00, cuda graph: True, gen throughput (token/s): 35.99, #queue-req: 0,\n", "[2026-01-01 08:50:20] INFO: 127.0.0.1:59582 - \"POST /v1/completions HTTP/1.1\" 200 OK\n", "[2026-01-01 08:50:21] Prefill batch, #new-seq: 1, #new-token: 8, #cached-token: 3, token usage: 0.00, #running-req: 0, #queue-req: 0,\n", "[2026-01-01 08:50:21] Decode batch, #running-req: 1, #token: 34, token usage: 0.00, cuda graph: True, gen throughput (token/s): 36.09, #queue-req: 0,\n", "[2026-01-01 08:50:21] INFO: 127.0.0.1:59588 - \"POST /v1/completions HTTP/1.1\" 200 OK\n", "[2026-01-01 08:50:22] Prefill batch, #new-seq: 1, #new-token: 8, #cached-token: 3, token usage: 0.00, #running-req: 0, #queue-req: 0,\n", "[2026-01-01 08:50:22] INFO: 127.0.0.1:59594 - \"POST /v1/completions HTTP/1.1\" 200 OK\n", "[2026-01-01 08:50:28] Stop profiling...\n", "[2026-01-01 08:55:53] Profiling done. 
Traces are saved to: /data/kaiyuan/llm_infer/profiles\n", "\n", "```\n", "\n", "TP size = 4\n", "\n", "```\n", "[2026-01-01 09:19:37] INFO: 127.0.0.1:36936 - \"POST /start_profile HTTP/1.1\" 200 OK\n", "[2026-01-01 09:19:37 TP0] Prefill batch, #new-seq: 1, #new-token: 11, #cached-token: 0, token usage: 0.00, #running-req: 0, #queue-req: 0,\n", "[2026-01-01 09:19:37] INFO: 127.0.0.1:36940 - \"POST /v1/completions HTTP/1.1\" 200 OK\n", "[2026-01-01 09:19:38 TP0] Prefill batch, #new-seq: 1, #new-token: 8, #cached-token: 3, token usage: 0.00, #running-req: 0, #queue-req: 0,\n", "[2026-01-01 09:19:38 TP0] Decode batch, #running-req: 1, #token: 14, token usage: 0.00, cuda graph: True, gen throughput (token/s): 0.54, #queue-req: 0,\n", "[2026-01-01 09:19:38] INFO: 127.0.0.1:36946 - \"POST /v1/completions HTTP/1.1\" 200 OK\n", "[2026-01-01 09:19:39 TP0] Prefill batch, #new-seq: 1, #new-token: 8, #cached-token: 3, token usage: 0.00, #running-req: 0, #queue-req: 0,\n", "[2026-01-01 09:19:39 TP0] Decode batch, #running-req: 1, #token: 24, token usage: 0.00, cuda graph: True, gen throughput (token/s): 38.68, #queue-req: 0,\n", "[2026-01-01 09:19:39] INFO: 127.0.0.1:36950 - \"POST /v1/completions HTTP/1.1\" 200 OK\n", "[2026-01-01 09:19:40 TP0] Prefill batch, #new-seq: 1, #new-token: 8, #cached-token: 3, token usage: 0.00, #running-req: 0, #queue-req: 0,\n", "[2026-01-01 09:19:40 TP0] Decode batch, #running-req: 1, #token: 34, token usage: 0.00, cuda graph: True, gen throughput (token/s): 38.90, #queue-req: 0,\n", "[2026-01-01 09:19:40] INFO: 127.0.0.1:36956 - \"POST /v1/completions HTTP/1.1\" 200 OK\n", "[2026-01-01 09:19:41 TP0] Prefill batch, #new-seq: 1, #new-token: 8, #cached-token: 3, token usage: 0.00, #running-req: 0, #queue-req: 0,\n", "[2026-01-01 09:19:41] INFO: 127.0.0.1:36958 - \"POST /v1/completions HTTP/1.1\" 200 OK\n", "[2026-01-01 09:19:47 TP1] Stop profiling...\n", "[2026-01-01 09:19:47 TP3] Stop profiling...\n", "[2026-01-01 09:19:47 TP0] Stop 
profiling...\n", "[2026-01-01 09:19:47 TP2] Stop profiling...\n", "[2026-01-01 09:21:07 TP0] Profiling done. Traces are saved to: /data/kaiyuan/llm_infer/profiles\n", "[2026-01-01 09:21:07 TP2] Profiling done. Traces are saved to: /data/kaiyuan/llm_infer/profiles\n", "[2026-01-01 09:21:07 TP3] Profiling done. Traces are saved to: /data/kaiyuan/llm_infer/profiles\n", "[2026-01-01 09:21:07 TP1] Profiling done. Traces are saved to: /data/kaiyuan/llm_infer/profiles\n", "\n", "```\n", "\n", "### 3.2 Client output\n", "\n", "```\n", ">>> Starting profiling session...\n", "/start_profile: status 200\n", "\n", ">>> Running inference test (5 requests)...\n", "  Request 1/5\n", "  status: 200\n", "  Request 2/5\n", "  status: 200\n", "  Request 3/5\n", "  status: 200\n", "  Request 4/5\n", "  status: 200\n", "  Request 5/5\n", "  status: 200\n", "\n", ">>> Waiting for all inference requests to finish (5s)...\n", "\n", ">>> Stopping profiling...\n", "Sending /stop_profile with a 30s timeout...\n", "stop_profile timed out (30s), but it may still be processing in the background\n", "This is usually normal; the trace files may still be being written\n", "\n", "```\n", "\n", "Note: writing out the profiling traces usually takes more than 30s, especially when the output directory is on an NFS mount.\n" ], "metadata": { "id": "l541upRTWu82" } }, { "cell_type": "markdown", "source": [ "## 4 Reading the Profiles\n", "\n", "Open [https://ui.perfetto.dev/](https://ui.perfetto.dev/) and import a profiling trace file, e.g. xxxx_trace.json" ], "metadata": { "id": "DhzRzNgGYQQe" } }, { "cell_type": "markdown", "source": [ "## 5 Merged Multi-TP Collection\n", "\n", "Merge the profiling traces from multiple TP ranks into a single file.\n" ], "metadata": { "id": "xYTpv8G1n7NY" } }, { "cell_type": "markdown", "source": [ "### Server startup\n", "\n", "```\n", "SGLANG_TORCH_PROFILER_DIR=\"/data/kaiyuan/llm_infer/profiles\" \\\n", "python -m sglang.launch_server \\\n", "  --model-path /data/kaiyuan/models/Qwen2.5-7B-Instruct \\\n", "  --host 127.0.0.1 \\\n", "  --tp-size 4 \\\n", "  --port 30000\n", "```\n", "\n", "### Client code" ], "metadata": { "id": "5jVk0rUTeXum" } }, { "cell_type": "code", "source": [ "import requests\n", "import time\n", "\n", "base_url = \"http://127.0.0.1:30000\"\n", "\n", "# Start profiling with multi-TP trace merging enabled\n", "url = f\"{base_url}/start_profile\"\n", "headers = 
{\"Content-Type\": \"application/json\"}\n", "data = {\"merge_profiles\": True}  # merge the traces from all TP ranks\n", "\n", "response = requests.post(url, headers=headers, json=data)\n", "\n", "# Send 5 inference requests\n", "for i in range(5):\n", "    requests.post(\n", "        f\"{base_url}/v1/completions\",\n", "        json={\n", "            \"model\": \"default\",\n", "            \"prompt\": f\"Test request {i+1}: explain the basic concepts of artificial intelligence\",\n", "            \"max_tokens\": 30,\n", "            \"temperature\": 0.1\n", "        },\n", "        timeout=15\n", "    )\n", "    time.sleep(0.5)\n", "\n", "# Wait for generation to finish, then stop profiling\n", "time.sleep(5)\n", "requests.post(f\"{base_url}/stop_profile\")" ], "metadata": { "id": "hhSEWTR-oDif" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Example log:\n", "\n", "```\n", "[2026-01-01 12:53:47 TP0] Profiling starts. Traces will be saved to: /data/kaiyuan/llm_infer/profiles (with profile id: 1770728027.861476)\n", "[2026-01-01 12:53:47 TP3] Profiling starts. Traces will be saved to: /data/kaiyuan/llm_infer/profiles (with profile id: 1770728027.861476)\n", "[2026-01-01 12:53:47 TP1] Profiling starts. Traces will be saved to: /data/kaiyuan/llm_infer/profiles (with profile id: 1770728027.861476)\n", "[2026-01-01 12:53:47 TP2] Profiling starts. 
Traces will be saved to: /data/kaiyuan/llm_infer/profiles (with profile id: 1770728027.861476)\n", "[2026-01-01 12:53:47] INFO: 127.0.0.1:47480 - \"POST /start_profile HTTP/1.1\" 200 OK\n", "[2026-01-01 12:53:47 TP0] Prefill batch, #new-seq: 1, #new-token: 1, #cached-token: 10, token usage: 0.00, #running-req: 0, #queue-req: 0,\n", "[2026-01-01 12:53:53 TP0] Decode batch, #running-req: 1, #token: 14, token usage: 0.00, cuda graph: True, gen throughput (token/s): 0.03, #queue-req: 0,\n", "[2026-01-01 12:53:53] INFO: 127.0.0.1:47482 - \"POST /v1/completions HTTP/1.1\" 200 OK\n", "[2026-01-01 12:53:54 TP0] Prefill batch, #new-seq: 1, #new-token: 1, #cached-token: 10, token usage: 0.00, #running-req: 0, #queue-req: 0,\n", "[2026-01-01 12:53:54 TP0] Decode batch, #running-req: 1, #token: 24, token usage: 0.00, cuda graph: True, gen throughput (token/s): 38.91, #queue-req: 0,\n", "[2026-01-01 12:53:54] INFO: 127.0.0.1:47504 - \"POST /v1/completions HTTP/1.1\" 200 OK\n", "[2026-01-01 12:53:55 TP0] Prefill batch, #new-seq: 1, #new-token: 1, #cached-token: 10, token usage: 0.00, #running-req: 0, #queue-req: 0,\n", "[2026-01-01 12:53:55 TP0] Decode batch, #running-req: 1, #token: 34, token usage: 0.00, cuda graph: True, gen throughput (token/s): 39.15, #queue-req: 0,\n", "[2026-01-01 12:53:55] INFO: 127.0.0.1:47510 - \"POST /v1/completions HTTP/1.1\" 200 OK\n", "[2026-01-01 12:53:56 TP0] Prefill batch, #new-seq: 1, #new-token: 1, #cached-token: 10, token usage: 0.00, #running-req: 0, #queue-req: 0,\n", "[2026-01-01 12:53:56] INFO: 127.0.0.1:47512 - \"POST /v1/completions HTTP/1.1\" 200 OK\n", "[2026-01-01 12:53:57 TP0] Prefill batch, #new-seq: 1, #new-token: 1, #cached-token: 10, token usage: 0.00, #running-req: 0, #queue-req: 0,\n", "[2026-01-01 12:53:57 TP0] Decode batch, #running-req: 1, #token: 14, token usage: 0.00, cuda graph: True, gen throughput (token/s): 24.92, #queue-req: 0,\n", "[2026-01-01 12:53:57] INFO: 127.0.0.1:47518 - \"POST /v1/completions HTTP/1.1\" 200 
OK\n", "[2026-01-01 12:54:02 TP0] Stop profiling...\n", "[2026-01-01 12:54:02 TP1] Stop profiling...\n", "[2026-01-01 12:54:02 TP2] Stop profiling...\n", "[2026-01-01 12:54:02 TP3] Stop profiling...\n", "[2026-01-01 12:55:25 TP0] Starting profile merge...\n", "[2026-01-01 12:55:25 TP2] Profiling done. Traces are saved to: /data/kaiyuan/llm_infer/profiles\n", "[2026-01-01 12:55:25 TP3] Profiling done. Traces are saved to: /data/kaiyuan/llm_infer/profiles\n", "[2026-01-01 12:55:25 TP1] Profiling done. Traces are saved to: /data/kaiyuan/llm_infer/profiles\n", "[2026-01-01 12:55:25 TP0] Found 4 trace files to merge\n", "[2026-01-01 12:55:25 TP0] Processing /data/kaiyuan/llm_infer/profiles/1770728027.861476-TP-0.trace.json.gz with rank info: {'tp_rank': 0}\n", "[2026-01-01 12:55:25 TP0] Processing file: /data/kaiyuan/llm_infer/profiles/1770728027.861476-TP-0.trace.json.gz\n", "[2026-01-01 12:55:37 TP0] Processing /data/kaiyuan/llm_infer/profiles/1770728027.861476-TP-1.trace.json.gz with rank info: {'tp_rank': 1}\n", "[2026-01-01 12:55:37 TP0] Processing file: /data/kaiyuan/llm_infer/profiles/1770728027.861476-TP-1.trace.json.gz\n", "[2026-01-01 12:55:46 TP0] Processing /data/kaiyuan/llm_infer/profiles/1770728027.861476-TP-2.trace.json.gz with rank info: {'tp_rank': 2}\n", "[2026-01-01 12:55:46 TP0] Processing file: /data/kaiyuan/llm_infer/profiles/1770728027.861476-TP-2.trace.json.gz\n", "[2026-01-01 12:55:56 TP0] Processing /data/kaiyuan/llm_infer/profiles/1770728027.861476-TP-3.trace.json.gz with rank info: {'tp_rank': 3}\n", "[2026-01-01 12:55:56 TP0] Processing file: /data/kaiyuan/llm_infer/profiles/1770728027.861476-TP-3.trace.json.gz\n", "\n", "[2026-01-01 12:58:58 TP0] Merged profile saved to: /data/kaiyuan/llm_infer/profiles/merged-1770728027.861476.trace.json.gz\n", "[2026-01-01 12:58:58 TP0] Total events merged: 11548182\n", "[2026-01-01 12:59:42 TP0] Profile merge completed: /data/kaiyuan/llm_infer/profiles/merged-1770728027.861476.trace.json.gz\n", 
"[2026-01-01 12:59:42 TP0] Profiling done. Traces are saved to: /data/kaiyuan/llm_infer/profiles Merged trace: /data/kaiyuan/llm_infer/profiles/merged-1770728027.861476.trace.json.gz (Events: 11548182, Files: 4)\n", "\n", "```" ], "metadata": { "id": "IpKkMABnesWO" } } ] }