{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Best Practices for the Full LLM Training Pipeline\n", "\n", "With the rapid development of artificial intelligence, large language models (LLMs) have become a core driving force in natural language processing. This document outlines best practices for the full LLM training pipeline in the modelscope ecosystem, covering data download, data preprocessing, model training, and model evaluation.\n", "\n", "Main content\n", "\n", "Using a Zhihu comment dataset as the example, this tutorial fine-tunes a model with LoRA so that AI-generated text sounds less \"AI-like\".\n", "\n", "This tutorial covers installing and using the following frameworks:\n", "1. modelscope: downloads models and datasets\n", "2. data-juicer: processes datasets\n", "3. ms-swift: trains models and runs inference\n", "4. evalscope: evaluates models\n", "\n", "## Environment setup\n", "\n", "Install modelscope, data-juicer, swift, and evalscope" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecutionIndicator": { "show": false }, "execution": { "iopub.execute_input": "2024-12-23T11:49:57.724413Z", "iopub.status.busy": "2024-12-23T11:49:57.723990Z", "iopub.status.idle": "2024-12-23T11:49:59.154300Z", "shell.execute_reply": "2024-12-23T11:49:59.153737Z", "shell.execute_reply.started": "2024-12-23T11:49:57.724383Z" }, "scrolled": true, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Found existing installation: tensorflow 2.18.0\n", "Uninstalling tensorflow-2.18.0:\n", " Successfully uninstalled tensorflow-2.18.0\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. 
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0mNote: you may need to restart the kernel to use updated packages.\n" ] } ], "source": [ "# %pip install modelscope[framework] # model library, preinstalled in the notebook\n", "%pip install ms-swift[llm] -U # training library\n", "%pip install evalscope -U # evaluation library\n", "%pip install py-data-juicer[sci] # data processing library\n", "%pip install datasets==3.0.1 pydantic==2.0 tf-keras\n", "%pip uninstall tensorflow -y # not needed; conflicts with the environment" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# !! Restart the notebook environment !!\n", "------" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataset preparation\n", "\n", "Download the dataset with modelscope, do some initial processing to extract the required fields, and convert it to the format data-juicer expects" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2024-12-23T11:50:09.836788Z", "iopub.status.busy": "2024-12-23T11:50:09.836608Z", "iopub.status.idle": "2024-12-23T11:50:59.750489Z", "shell.execute_reply": "2024-12-23T11:50:59.749933Z", "shell.execute_reply.started": "2024-12-23T11:50:09.836767Z" }, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/usr/local/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. 
See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Dataset({\n", " features: ['INSTRUCTION', 'RESPONSE', 'SOURCE', 'METADATA'],\n", " num_rows: 1006218\n", "})\n", "{'INSTRUCTION': '怎么说服男朋友买烤箱？',\n", " 'METADATA': '{\"question_id\": 357137111.0, \"answer_id\": 914332816.0, \"url\": '\n", " '\"https://www.zhihu.com/question/357137111/answer/914332816\", '\n", " '\"upvotes\": \"赞同 15\", \"answer_creation_time\": '\n", " '\"2019-11-28T12:01:22.000Z\"}',\n", " 'RESPONSE': 'emmmmm，首先想说的是，我买厨房用品一般是不用「说服」的，只是在厨房堆的满满当当的情况下会象征性的问一下我老公，他就会回答我说：你看看你还有地方放吗。然后我会思考一下，如果是特别想买的，就不会问他了。自己决定就好。 '\n", " '比如，前几天我又买了两个盘子~~~~他还不知道。可以给题主看看我有多少的锅具：自家炒菜用什么锅好？各有什么优缺点？ '\n", " '说回烤箱的问题，买的时候处于热恋期，我告诉他我有一个买烤箱的计划。虽然他基本不吃点心，也不喜欢烘焙，但那个时期的他欣然同意并热情洋溢的给我选烤箱。可能是他有憧憬我会给他做什么好吃的吧。又因为我是一个不怎么吃甜食的湖南人，烤箱在我家烘焙的使用率很低。 '\n", " '但是！！你还是可以告诉他烤箱的作用是可以烤制各种肉类！！！我不相信有不喜欢吃肉的男生！！烤箱真的是可以烤一切的肉类，熟悉之后会觉得非常简单。 '\n", " '我很久以前用烤箱做的最多的就是烤羊排和烤鸡翅，我老公不怎么吃羊肉和鸡翅。这个烤箱因为厨房放不下，被放在了餐厅，也就闲置了下来…… '\n", " '要说的事是，烤箱真的能给你做出很多不一样的美食，尤其是来了客人，在你两个灶台忙不过来的时候，烤箱特别适合准备一个荤素搭配的豪华大菜。在烹饪其他需要爆炒的菜肴的空档去处理一下就可以了。 '\n", " '总结来说理由如下： 1、如果你家是你做饭多，那么为什么有这么多话说，也不是他用，等着吃就好了。 '\n", " '2、工欲善其事，必先利其器。没有好的工具怎么能吃到更好的美食。 3、我要我喜欢，不要你喜欢。我还不能有个爱好吗？',\n", " 'SOURCE': 'Zhihu'}\n" ] } ], "source": [ "from modelscope import MsDataset\n", "from pprint import pprint\n", "\n", "ds = MsDataset.load('OmniData/Zhihu-KOL', cache_dir=\"data\", split='train')\n", "print(ds)\n", "pprint(ds[0])" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-12-23T11:54:05.942611Z", "iopub.status.busy": "2024-12-23T11:54:05.942272Z", "iopub.status.idle": "2024-12-23T11:54:10.671123Z", "shell.execute_reply": "2024-12-23T11:54:10.670576Z", "shell.execute_reply.started": "2024-12-23T11:54:05.942592Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done\n" ] } ], "source": [ 
"# Process the metadata\n", "import json\n", "\n", "# Each METADATA entry is a JSON string; parse it into a dict\n", "metadata = [json.loads(x) for x in ds['METADATA']]\n", "\n", "# Convert the upvotes text (e.g. \"赞同 15\") into an integer\n", "vote_list = []\n", "for item in metadata:\n", "    try:\n", "        upvotes = item['upvotes'][3:]  # drop the leading \"赞同 \" prefix\n", "        if not upvotes:\n", "            votes = 0\n", "        elif '万' in upvotes:\n", "            # the count is given in units of 万 (10,000)\n", "            votes = int(float(upvotes[:-2]) * 10000)\n", "        else:\n", "            votes = int(upvotes)\n", "    except Exception:\n", "        # print the raw value that failed to parse and fall back to 0\n", "        print(item.get('upvotes'))\n", "        votes = 0\n", "    vote_list.append(votes)\n", "print(\"Done\")" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-11-11T07:00:57.127113Z", "iopub.status.busy": "2024-11-11T07:00:57.126851Z", "iopub.status.idle": "2024-11-11T07:01:33.374894Z", "shell.execute_reply": "2024-11-11T07:01:33.374455Z", "shell.execute_reply.started": "2024-11-11T07:00:57.127073Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1006218\n" ] }, { "data": { "text/html": [ "

<table>\n", "<thead>\n", "<tr><th></th><th>query</th><th>response</th><th>upvotes</th></tr>\n", "</thead>\n", "<tbody>\n", "<tr><th>0</th><td>怎么说服男朋友买烤箱？</td><td>emmmmm，首先想说的是，我买厨房用品一般是不用「说服」的，只是在厨房堆的满满当当的情况下...</td><td>15</td></tr>\n", "<tr><th>1</th><td>航天从业者是如何看待电视剧《你是我的荣耀》的？</td><td>难得有个关于航天的剧，职场情节悬不悬浮，航天设定和细节走不走心？带着放大镜看了前18集，...</td><td>4432</td></tr>\n", "<tr><th>2</th><td>如何看待PayPal正式进入中国？</td><td>PayPal不仅是美国支付巨头，也是国际支付巨头，目前已开拓全球200多个市场，美国以外的市...</td><td>127</td></tr>\n", "<tr><th>3</th><td>中金公司交易员月薪八万五是如何做到的？</td><td>1、首先，考虑到这位交易员的工作经验，月薪八万五的表述是不正确的：其实是一年的全部薪酬除以1...</td><td>450</td></tr>\n", "<tr><th>4</th><td>摇滚乐（金属）给你们带来了什么？</td><td>ㄟ( ▔, ▔ )ㄏ哪里带来了什么东西啊，除了找到热爱的东西，也失去了很多。听重型现场像疯子...</td><td>5</td></tr>\n", "</tbody>\n", "</table>\n" ], "text/plain": [ " query response \n", "0 怎么说服男朋友买烤箱？ emmmmm，首先想说的是，我买厨房用品一般是不用「说服」的，只是在厨房堆的满满当当的情况下... \\\n", "1 航天从业者是如何看待电视剧《你是我的荣耀》的？难得有个关于航天的剧，职场情节悬不悬浮，航天设定和细节走不走心？带着放大镜看了前18集，... \n", "2 如何看待PayPal正式进入中国？ PayPal不仅是美国支付巨头，也是国际支付巨头，目前已开拓全球200多个市场，美国以外的市... \n", "3 中金公司交易员月薪八万五是如何做到的？ 1、首先，考虑到这位交易员的工作经验，月薪八万五的表述是不正确的：其实是一年的全部薪酬除以1... \n", "4 摇滚乐（金属）给你们带来了什么？ㄟ( ▔, ▔ )ㄏ哪里带来了什么东西啊，除了找到热爱的东西，也失去了很多。听重型现场像疯子... \n", "\n", " upvotes \n", "0 15 \n", "1 4432 \n", "2 127 \n", "3 450 \n", "4 5 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Write the data to a jsonl file\n", "import pandas as pd\n", "\n", "df = pd.DataFrame.from_dict({\n", "    'query': ds['INSTRUCTION'],\n", "    'response': ds['RESPONSE'],\n", "    'upvotes': vote_list\n", "})\n", "\n", "print(len(df))\n", "\n", "df.to_json(\"data/zhihu.jsonl\", orient=\"records\", lines=True, force_ascii=False)\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cleaning the data with data-juicer\n", "\n", "\n", "> Data-Juicer is a one-stop multimodal data processing system that aims to provide higher-quality, richer, and more \"digestible\" data for large language models (LLMs). It is designed to be simple to use, with comprehensive documentation, quick-start guides, and demo configurations, and operators can easily be added to or removed from an existing configuration.\n", "\n", "Details: https://github.com/modelscope/data-juicer/blob/main/README_ZH.md\n", "\n", "\n", "### 1. 
Write the YAML configuration file\n", "\n", "Supported operators: https://github.com/modelscope/data-juicer/blob/main/docs/Operators_ZH.md\n", "\n", "| Type | Count | Description |\n", "|------------------------------------|:--:|---------------|\n", "| [ Formatter ]( #formatter ) | 7 | Discover, load, and normalize raw data |\n", "| [ Mapper ]( #mapper ) | 43 | Edit and transform data samples |\n", "| [ Filter ]( #filter ) | 41 | Filter out low-quality samples |\n", "| [ Deduplicator ]( #deduplicator ) | 5 | Identify and remove duplicate samples |\n", "| [ Selector ]( #selector ) | 4 | Select high-quality samples based on ranking |\n", "\n", "Starting from the [configuration file listing all operators](https://github.com/modelscope/data-juicer/blob/main/configs/config_all.yaml), modify it to produce the following configuration file:\n", "\n", "**Create `zhihu-bot.yaml` manually and place it in the current directory**\n", "\n", "```yaml\n", "\n", "# global parameters\n", "project_name: 'zhihu-process'\n", "dataset_path: 'data/zhihu.jsonl' # path to your dataset directory or file\n", "np: 16 # number of subprocesses used to process your dataset\n", "\n", "text_keys: 'response' # the key of text in your dataset file\n", "\n", "export_path: 'data/zhihu_refine.jsonl' # path to save processed dataset\n", "\n", "# process schedule\n", "# a list of several process operators with their arguments\n", "process:\n", "  - specified_numeric_field_filter: # filter text with the specified numeric field info out of specific range\n", "      field_key: 'upvotes' # the target key corresponding to multi-level field information need to be separated by '.'\n", "      min_value: 500 # the min filter value in SpecifiedNumericField op\n", "  - text_length_filter: # filter text with the length out of specific range\n", "      min_len: 100\n", "      max_len: 2000\n", "\n", "  - clean_email_mapper: # remove emails from text.\n", "  - clean_html_mapper: # remove html formats from text.\n", "  - clean_ip_mapper: # remove ip addresses from text.\n", "  - clean_links_mapper: # remove web links from text.\n", "  - clean_copyright_mapper: # remove copyright comments. 
# fix unicode errors in text.\n", "\n", "  - language_id_score_filter: # filter text in specific language with language scores larger than a specific max value\n", "      lang: zh\n", "      min_score: 0.9\n", "  - alphanumeric_filter: # filter text with alphabet/numeric ratio out of specific range.\n", "      tokenization: false\n", "      min_ratio: 0.72\n", "  - flagged_words_filter: # filter text with the flagged-word ratio larger than a specific max value\n", "      lang: zh\n", "      tokenization: false\n", "      max_ratio: 0.0005\n", "  - perplexity_filter: # filter text with perplexity score out of specific range\n", "      lang: zh\n", "      max_ppl: 4000\n", "  - special_characters_filter: # filter text with special-char ratio out of specific range\n", "      max_ratio: 0.4\n", "  - document_simhash_deduplicator: # deduplicate texts with simhash\n", "      tokenization: character\n", "      window_size: 5\n", "      lowercase: false\n", "      ignore_pattern: '\\p{P}'\n", "      num_blocks: 10\n", "      hamming_distance: 6 # larger hamming distance threshold for short texts\n", "  - topk_specified_field_selector: # selector to select top samples based on the sorted specified field\n", "      field_key: 'upvotes' # the target keys corresponding to multi-level field information need to be separated by '.'\n", "      topk: 50000 # number of selected top samples\n", "      reverse: True # determine the sorting rule, if reverse=True, then sort in descending order\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. 
Analyze the data according to the configuration file" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "!dj-analyze --config zhihu-bot.yaml " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Dataset analysis results\n", "\n", "- Box plots\n", "- Histograms\n", "- Summary statistics\n", "\n", "These are written to the `data/analysis` directory.\n", "\n", "| | alnum_ratio | flagged_words_ratio | lang | lang_score | perplexity | special_char_ratio | text_len |\n", "|:-------|--------------:|----------------------:|:----------|--------------:|---------------:|---------------------:|-----------------:|\n", "| count | 1.00622e+06 | 1.00622e+06 | 1006218.0 | 1.00622e+06 | 1.00622e+06 | 1.00622e+06 | 1.00622e+06 |\n", "| mean | 0.871938 | 1.28188e-05 | nan | 0.963631 | 2390 | 0.159879 | 717.802 |\n", "| std | 0.0793817 | 0.00120551 | nan | 0.0976119 | 4733.66 | 0.0878637 | 1666.89 |\n", "| min | 0 | 0 | nan | 0.0593122 | 0 | 0 | 1 |\n", "| 25% | 0.854922 | 0 | nan | 0.976512 | 1500.4 | 0.118577 | 61 |\n", "| 50% | 0.883008 | 0 | nan | 0.989479 | 2017.7 | 0.147059 | 236 |\n", "| 75% | 0.905219 | 0 | nan | 0.994992 | 2695.5 | 0.183099 | 764 |\n", "| max | 1 | 0.6 | nan | 1.00007 | 1.70447e+06 | 1 | 139406 |\n", "| unique | nan | nan | 99.0 | nan | nan | nan | nan |\n", "| top | nan | nan | zh | nan | nan | nan | nan |\n", "| freq | nan | nan | 990697.0 | nan | nan | nan | nan |\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 3. Adjust the configuration file and process the data\n", "\n", "Data processing in this step consists of selection, filtering, and deduplication.\n", "\n", "Adjust the configuration file according to the dataset characteristics found in the analysis, then run the processing:\n", "\n", "- The 3σ rule: a data point that falls outside the range mean ± 3σ is usually treated as an outlier\n", "- Selecting first and filtering afterwards reduces data processing time" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "!dj-process --config zhihu-bot.yaml " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 4. 
Split into training and test sets" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2024-11-11T07:23:59.197224Z", "iopub.status.busy": "2024-11-11T07:23:59.196863Z", "iopub.status.idle": "2024-11-11T07:24:01.884007Z", "shell.execute_reply": "2024-11-11T07:24:01.883385Z", "shell.execute_reply.started": "2024-11-11T07:23:59.197203Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "45000\n", "5000\n" ] } ], "source": [ "import pandas as pd\n", "\n", "data = pd.read_json(\"data/zhihu_refine.jsonl\", lines=True)\n", "\n", "def split_data(data, save=False, suffix=''):\n", "    # split data into train and test, 9:1\n", "    train_data = data.sample(frac=0.9, random_state=42)\n", "    test_data = data.drop(train_data.index)\n", "\n", "    if suffix:\n", "        suffix = '_' + suffix\n", "    if save:\n", "        train_data.to_json(f\"data/zhihu_train{suffix}.jsonl\", orient='records', lines=True, force_ascii=False)\n", "        test_data.to_json(f\"data/zhihu_test{suffix}.jsonl\", orient='records', lines=True, force_ascii=False)\n", "    return train_data, test_data\n", "\n", "train_data, test_data = split_data(data, save=True)\n", "\n", "print(len(train_data))\n", "print(len(test_data))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training the model with ms-swift\n", "\n", "\n", "> SWIFT supports training (pre-training, fine-tuning, alignment), inference, evaluation, and deployment for 300+ LLMs and 50+ MLLMs (multimodal large models). Developers can apply our framework directly to their own research and production environments, covering the complete path from model training and evaluation to application. In addition to the lightweight training solutions provided by PEFT, we also offer a complete Adapters library that supports the latest training techniques such as NEFTune, LoRA+, and LLaMA-PRO; this adapter library can be used in your own custom workflows independently of the training scripts.\n", "\n", "Details: https://github.com/modelscope/ms-swift/blob/main/README_CN.md\n", "\n", "(Optional) Install flash-attention to speed up inference:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "tags": [] }, "outputs": [], "source": [ "%pip install flash-attn --no-build-isolation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Write the training script\n", 
"Command-line parameters: https://swift.readthedocs.io/zh-cn/latest/Instruction/%E5%91%BD%E4%BB%A4%E8%A1%8C%E5%8F%82%E6%95%B0.html\n", "\n", "Parameters that deserve attention:\n", "\n", "1. `dataset`: general-purpose datasets can be mixed in to prevent catastrophic forgetting and loss of general capabilities\n", "2. `system`: set a system prompt that fits the task to improve the model's performance\n", "3. `target_modules` (written as `lora_target_modules` in earlier ms-swift releases): adjust the number of trainable parameters according to how difficult the training task is" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecutionIndicator": { "show": true }, "execution": { "iopub.execute_input": "2024-12-23T12:05:18.125022Z", "iopub.status.busy": "2024-12-23T12:05:18.124654Z", "iopub.status.idle": "2024-12-23T12:18:25.096372Z", "shell.execute_reply": "2024-12-23T12:18:25.095810Z", "shell.execute_reply.started": "2024-12-23T12:05:18.124995Z" }, "scrolled": true, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "run sh: `/usr/local/bin/python /usr/local/lib/python3.10/site-packages/swift/cli/sft.py --train_type lora --model Qwen/Qwen2.5-0.5B-Instruct --dataset data/zhihu_train.jsonl#3000 AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese#3000 --system 你是一个真实的人类。回答用户的问题，并在聊天回复中展现你的情绪，包括开心，幸福，愤怒，伤心，阴阳怪气等 --output_dir output --target_modules all-linear --lora_rank 4 --torch_dtype bfloat16 --seed 42 --learning_rate 1e-4 --warmup_ratio 0.05 --max_length 1024 --per_device_train_batch_size 4 --per_device_eval_batch_size 4 --num_train_epochs 1 --gradient_accumulation_steps 4 --save_total_limit 10 --eval_steps 100 --save_steps 100`\n", "[INFO:swift] Successfully registered `/usr/local/lib/python3.10/site-packages/swift/llm/dataset/data/dataset_info.json`\n", "[INFO:swift] Successfully registered `[]`\n", "[INFO:swift] rank: -1, local_rank: -1, world_size: 1, local_world_size: 1\n", "[INFO:swift.hub.hub] Downloading the model from ModelScope Hub, model_id: Qwen/Qwen2.5-0.5B-Instruct\n", "Downloading Model to directory: /mnt/workspace/.cache/modelscope/hub/Qwen/Qwen2.5-0.5B-Instruct\n", "[WARNING:modelscope] Using branch: master as version is unstable, use with caution\n", "[INFO:modelscope] Target directory already exists, 
skipping creation.\n", "[INFO:swift] Loading the model using model_dir: /mnt/workspace/.cache/modelscope/hub/Qwen/Qwen2___5-0___5B-Instruct\n", "[INFO:swift] Setting args.lazy_tokenize: False\n", "/usr/local/lib/python3.10/site-packages/transformers/training_args.py:1575: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead\n", " warnings.warn(\n", "[INFO:swift] output_dir: output/v2-20241223-200525\n", "[INFO:swift] args: TrainArguments(\n", "_n_gpu=-1,\n", "acc_steps=1,\n", "acc_strategy=token,\n", "accelerator_config={'dispatch_batches': False},\n", "adafactor=False,\n", "adalora_beta1=0.85,\n", "adalora_beta2=0.85,\n", "adalora_deltaT=1,\n", "adalora_init_r=12,\n", "adalora_orth_reg_weight=0.5,\n", "adalora_target_r=8,\n", "adalora_tfinal=0,\n", "adalora_tinit=0,\n", "adam_beta1=0.9,\n", "adam_beta2=0.999,\n", "adam_epsilon=1e-08,\n", "adapter_act=gelu,\n", "adapter_length=128,\n", "adapters=[],\n", "add_version=True,\n", "attn_impl=None,\n", "auto_find_batch_size=False,\n", "average_tokens_across_devices=False,\n", "batch_eval_metrics=False,\n", "bf16=True,\n", "bf16_full_eval=False,\n", "bnb_4bit_compute_dtype=torch.bfloat16,\n", "bnb_4bit_quant_storage=None,\n", "bnb_4bit_quant_type=nf4,\n", "bnb_4bit_use_double_quant=True,\n", "boft_block_num=0,\n", "boft_block_size=4,\n", "boft_dropout=0.0,\n", "boft_n_butterfly_factor=1,\n", "check_model=True,\n", "ckpt_dir=None,\n", "custom_dataset_info=[],\n", "custom_register_path=[],\n", "data_seed=42,\n", "dataloader_drop_last=False,\n", "dataloader_num_workers=0,\n", "dataloader_persistent_workers=False,\n", "dataloader_pin_memory=True,\n", "dataloader_prefetch_factor=None,\n", "dataset=['data/zhihu_train.jsonl#3000', 'AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese#3000'],\n", "dataset_num_proc=1,\n", "ddp_backend=None,\n", "ddp_broadcast_buffers=None,\n", "ddp_bucket_cap_mb=None,\n", "ddp_find_unused_parameters=None,\n", 
"ddp_timeout=1800,\n", "debug=None,\n", "deepspeed=None,\n", "device_map=None,\n", "disable_tqdm=None,\n", "dispatch_batches=None,\n", "do_eval=False,\n", "do_predict=False,\n", "do_train=False,\n", "download_mode=reuse_dataset_if_exists,\n", "eval_accumulation_steps=None,\n", "eval_delay=0,\n", "eval_do_concat_batches=True,\n", "eval_on_start=False,\n", "eval_steps=100.0,\n", "eval_strategy=steps,\n", "eval_use_gather_object=False,\n", "evaluation_strategy=steps,\n", "fourier_n_frequency=2000,\n", "fourier_scaling=300.0,\n", "fp16=False,\n", "fp16_backend=auto,\n", "fp16_full_eval=False,\n", "fp16_opt_level=O1,\n", "freeze_aligner=True,\n", "freeze_llm=False,\n", "freeze_parameters=[],\n", "freeze_parameters_ratio=0.0,\n", "freeze_vit=True,\n", "fsdp=,\n", "fsdp_config=None,\n", "fsdp_min_num_params=0,\n", "fsdp_num=1,\n", "fsdp_transformer_layer_cls_to_wrap=None,\n", "full_determinism=False,\n", "galore_cos_threshold=0.4,\n", "galore_gamma_proj=2,\n", "galore_optim_per_parameter=False,\n", "galore_proj_bits=4,\n", "galore_proj_group_size=256,\n", "galore_proj_quant=False,\n", "galore_proj_type=std,\n", "galore_quantization=False,\n", "galore_queue_size=5,\n", "galore_rank=128,\n", "galore_scale=1.0,\n", "galore_target_modules=None,\n", "galore_update_proj_gap=50,\n", "galore_with_embedding=False,\n", "generation_config=None,\n", "generation_max_length=None,\n", "generation_num_beams=None,\n", "gradient_accumulation_steps=4,\n", "gradient_checkpointing=True,\n", "gradient_checkpointing_kwargs=None,\n", "greater_is_better=False,\n", "group_by_length=False,\n", "half_precision_backend=auto,\n", "hqq_axis=None,\n", "hub_always_push=False,\n", "hub_model_id=None,\n", "hub_private_repo=None,\n", "hub_strategy=every_save,\n", "hub_token=,\n", "ignore_args_error=False,\n", "ignore_data_skip=False,\n", "include_for_metrics=[],\n", "include_inputs_for_metrics=False,\n", "include_num_input_tokens_seen=False,\n", "include_tokens_per_second=False,\n", "init_weights=True,\n", 
"jit_mode_eval=False,\n", "label_names=None,\n", "label_smoothing_factor=0.0,\n", "lazy_tokenize=False,\n", "learning_rate=0.0001,\n", "length_column_name=length,\n", "lisa_activated_layers=0,\n", "lisa_step_interval=20,\n", "llamapro_num_groups=None,\n", "llamapro_num_new_blocks=4,\n", "load_args=True,\n", "load_best_model_at_end=False,\n", "load_data_args=False,\n", "load_dataset_config=None,\n", "load_from_cache_file=False,\n", "local_rank=-1,\n", "local_repo_path=None,\n", "log_level=passive,\n", "log_level_replica=warning,\n", "log_on_each_node=True,\n", "logging_dir=/mnt/workspace/output/v2-20241223-200525/runs,\n", "logging_first_step=True,\n", "logging_nan_inf_filter=True,\n", "logging_steps=5,\n", "logging_strategy=steps,\n", "lora_alpha=32,\n", "lora_bias=none,\n", "lora_dropout=0.05,\n", "lora_dtype=None,\n", "lora_ga_batch_size=2,\n", "lora_ga_direction=ArB2r,\n", "lora_ga_iters=2,\n", "lora_ga_max_length=1024,\n", "lora_ga_scale=stable,\n", "lora_ga_stable_gamma=16,\n", "lora_modules=[],\n", "lora_rank=4,\n", "lorap_lr_ratio=None,\n", "loss_scale=default,\n", "loss_type=None,\n", "lr_scheduler_kwargs=None,\n", "lr_scheduler_type=cosine,\n", "max_grad_norm=1.0,\n", "max_length=1024,\n", "max_new_tokens=64,\n", "max_pixels=None,\n", "max_steps=-1,\n", "metric=None,\n", "metric_for_best_model=loss,\n", "metric_warmup_step=0,\n", "model=Qwen/Qwen2.5-0.5B-Instruct,\n", "model_author=[None, None],\n", "model_kwargs={},\n", "model_layer_cls_name=None,\n", "model_name=[None, None],\n", "model_revision=None,\n", "model_type=qwen2_5,\n", "modules_to_save=[],\n", "mp_parameters=,\n", "neftune_noise_alpha=None,\n", "no_cuda=False,\n", "num_beams=1,\n", "num_labels=None,\n", "num_train_epochs=1.0,\n", "optim=adamw_torch,\n", "optim_args=None,\n", "optim_target_modules=None,\n", "optimizer=None,\n", "output_dir=/mnt/workspace/output/v2-20241223-200525,\n", "overwrite_output_dir=False,\n", "packing=False,\n", "past_index=-1,\n", "per_device_eval_batch_size=4,\n", 
"per_device_train_batch_size=4,\n", "predict_with_generate=False,\n", "prediction_loss_only=False,\n", "push_to_hub=False,\n", "push_to_hub_model_id=None,\n", "push_to_hub_organization=None,\n", "push_to_hub_token=,\n", "quant_bits=None,\n", "quant_method=None,\n", "ray_scope=last,\n", "reft_args=None,\n", "reft_intervention_type=LoreftIntervention,\n", "reft_layer_key=None,\n", "reft_layers=None,\n", "reft_rank=4,\n", "remove_unused_columns=False,\n", "repetition_penalty=None,\n", "report_to=['tensorboard'],\n", "restore_callback_states_from_checkpoint=False,\n", "resume_from_checkpoint=None,\n", "resume_only_model=False,\n", "rope_scaling=None,\n", "run_name=None,\n", "save_on_each_node=False,\n", "save_only_model=False,\n", "save_safetensors=True,\n", "save_steps=100.0,\n", "save_strategy=steps,\n", "save_total_limit=10,\n", "seed=42,\n", "sequence_parallel_size=1,\n", "skip_memory_metrics=True,\n", "sortish_sampler=False,\n", "split_batches=None,\n", "split_dataset_ratio=0.01,\n", "stop_words=[],\n", "stream=False,\n", "streaming=False,\n", "strict=False,\n", "system=你是一个真实的人类。回答用户的问题，并在聊天回复中展现你的情绪，包括开心，幸福，愤怒，伤心，阴阳怪气等,\n", "target_modules=['all-linear'],\n", "target_regex=None,\n", "temperature=0.0,\n", "template=qwen2_5,\n", "template_backend=swift,\n", "tf32=None,\n", "tools_prompt=react_en,\n", "top_k=None,\n", "top_p=None,\n", "torch_compile=False,\n", "torch_compile_backend=None,\n", "torch_compile_mode=None,\n", "torch_dtype=torch.bfloat16,\n", "torch_empty_cache_steps=None,\n", "torchdynamo=None,\n", "tpu_metrics_debug=False,\n", "tpu_num_cores=None,\n", "train_type=lora,\n", "trainable_parameters=[],\n", "truncation_strategy=delete,\n", "tuner_backend=peft,\n", "use_chat_template=True,\n", "use_cpu=False,\n", "use_dora=False,\n", "use_galore=False,\n", "use_hf=False,\n", "use_ipex=False,\n", "use_legacy_prediction_loop=False,\n", "use_liger=False,\n", "use_liger_kernel=False,\n", "use_mps_device=False,\n", "use_rslora=False,\n", 
"use_swift_lora=False,\n", "val_dataset=[],\n", "vera_d_initial=0.1,\n", "vera_dropout=0.0,\n", "vera_projection_prng_key=0,\n", "vera_rank=256,\n", "warmup_ratio=0.05,\n", "warmup_steps=0,\n", "weight_decay=0.1,\n", ")\n", "[INFO:swift] The TrainArguments will be saved in: /mnt/workspace/output/v2-20241223-200525/args.json\n", "[INFO:swift.hub.hub] Downloading the model from ModelScope Hub, model_id: Qwen/Qwen2.5-0.5B-Instruct\n", "Downloading Model to directory: /mnt/workspace/.cache/modelscope/hub/Qwen/Qwen2.5-0.5B-Instruct\n", "[WARNING:modelscope] Using branch: master as version is unstable, use with caution\n", "[INFO:modelscope] Target directory already exists, skipping creation.\n", "[INFO:swift] Loading the model using model_dir: /mnt/workspace/.cache/modelscope/hub/Qwen/Qwen2___5-0___5B-Instruct\n", "[INFO:swift] model_kwargs: {'device_map': 'cuda:0'}\n", "[INFO:swift] model.hf_device_map: {'': device(type='cuda', index=0)}\n", "[INFO:swift] model_info: ModelInfo(model_type='qwen2_5', model_dir='/mnt/workspace/.cache/modelscope/hub/Qwen/Qwen2___5-0___5B-Instruct', torch_dtype=torch.bfloat16, max_model_len=32768, quant_method=None, quant_bits=None, config=Qwen2Config {\n", " \"_name_or_path\": \"/mnt/workspace/.cache/modelscope/hub/Qwen/Qwen2___5-0___5B-Instruct\",\n", " \"architectures\": [\n", " \"Qwen2ForCausalLM\"\n", " ],\n", " \"attention_dropout\": 0.0,\n", " \"bos_token_id\": 151643,\n", " \"eos_token_id\": 151645,\n", " \"hidden_act\": \"silu\",\n", " \"hidden_size\": 896,\n", " \"initializer_range\": 0.02,\n", " \"intermediate_size\": 4864,\n", " \"max_position_embeddings\": 32768,\n", " \"max_window_layers\": 21,\n", " \"model_type\": \"qwen2\",\n", " \"num_attention_heads\": 14,\n", " \"num_hidden_layers\": 24,\n", " \"num_key_value_heads\": 2,\n", " \"rms_norm_eps\": 1e-06,\n", " \"rope_scaling\": null,\n", " \"rope_theta\": 1000000.0,\n", " \"sliding_window\": null,\n", " \"tie_word_embeddings\": true,\n", " \"torch_dtype\": \"bfloat16\",\n", 
" \"transformers_version\": \"4.47.0\",\n", " \"use_cache\": true,\n", " \"use_sliding_window\": false,\n", " \"vocab_size\": 151936\n", "}\n", ")\n", "[INFO:swift] model.generation_config: GenerationConfig {\n", " \"bos_token_id\": 151643,\n", " \"eos_token_id\": [\n", " 151645,\n", " 151643\n", " ],\n", " \"max_new_tokens\": 64,\n", " \"pad_token_id\": 151643,\n", " \"repetition_penalty\": 1.1\n", "}\n", "\n", "[INFO:swift] default_system: 你是一个真实的人类。回答用户的问题，并在聊天回复中展现你的情绪，包括开心，幸福，愤怒，伤心，阴阳怪气等\n", "[INFO:swift] Start time of running main: 2024-12-23 20:05:26.748661\n", "[INFO:swift] Global seed set to 42\n", "Map: 100%|██████████████████████| 45000/45000 [00:00<00:00, 57332.53 examples/s]\n", "[INFO:swift] Downloading the dataset from ModelScope, dataset_id: AI-ModelScope/Magpie-Qwen2-Pro-200K-Chinese\n", "Map: 100%|████████████████████| 200000/200000 [00:18<00:00, 10793.79 examples/s]\n", "[INFO:swift] train_dataset: Dataset({\n", " features: ['messages'],\n", " num_rows: 5940\n", "})\n", "[INFO:swift] val_dataset: Dataset({\n", " features: ['messages'],\n", " num_rows: 60\n", "})\n", "Map: 0%| | 0/5940 [00:00system\n", "你是一个真实的人类。回答用户的问题，并在聊天回复中展现你的情绪，包括开心，幸福，愤怒，伤心，阴阳怪气等<|im_end|>\n", "<|im_start|>user\n", "抗美援朝时期，中国人民志愿军的作战实力有多强？<|im_end|>\n", "<|im_start|>assistant\n", "可以看看美国高级将领对志愿军的评价：美国国防部长马歇尔：中国共军是一个幽灵，连个影子都没有。麦克阿瑟：中国军队常常避开大路，利用山岭，丘陵为接近路，他们总是插入我纵深发起攻击。美第八军军长泰勒：敌人是非常的狡猾，他们很会利用战术，以来减低我们的火力优势，其方法是在黑暗中接近我们的阵地，然后和我们紧缠在一起，是我们无法要求炮兵射击和空中攻击，否则就有同归于尽的危险。范佛里特：以个人而论，中国士兵是一个顽强的敌人，他们在基层三人小组中经常单独作战。但是，他们永远是向前作战的，奋不顾身，有时渗透到我们防线后方，令我们束手无策。李奇微：中国人在夜晚进攻特别神秘莫测，不可思议。中国人是勇士，他们常常不顾伤亡地发起进攻。顺便提一下李奇微对韩国军队的评价：南朝鲜军队缺乏得力的领导，他们在中国军队的打击下损失惨重，往往对中国军队有非常的畏惧心理，几乎把这些人看成了天兵天将。脚踏胶底鞋的中共士兵如果突然出现在南朝鲜军队的阵地上，总是把许多南朝鲜士兵吓得头也不回的飞快逃命，他们没有秩序，丢掉武器，没有领导，完全是在全面败退。再从另外一个角度，看看我们的对手有多强，当时的美军拥有世界上最先进，数量最多的战机。美军的陆军配备的火力为世界之最，一个步兵师便装备了坦克140辆，70毫米火炮330门，拥有大量的汽车和装甲车辆，单兵武器十分先进，可以说是武装到了牙齿，是当时的世界第一重步兵。可志愿军竟然能在初期几乎完全没有制空权的情况下将美军赶回三八线，真的是不愧世界第一轻步兵之名啊！<|im_end|>\n", "[INFO:swift] [LABELS_IDS] [-100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 73670, 101997, 100625, 104112, 116536, 32664, 101411, 99292, 9370, 103964, 5122, 10236, 122, 236, 28404, 103140, 101662, 99313, 106408, 99079, 5122, 58695, 54899, 99292, 101909, 102517, 99677, 3837, 54926, 18947, 57222, 44729, 104338, 1773, 18137, 118, 99, 99316, 99727, 107318, 5122, 58695, 105591, 104495, 109595, 26288, 45995, 3837, 100152, 57811, 102288, 3837, 105697, 102221, 17714, 104469, 45995, 3837, 99650, 104014, 114731, 35946, 117530, 105342, 102109, 1773, 10236, 122, 236, 107695, 99292, 99292, 45861, 100396, 104240, 5122, 105076, 104771, 9370, 119360, 119440, 3837, 99650, 99165, 36993, 100152, 106814, 3837, 99919, 99536, 99285, 103952, 114763, 100661, 3837, 41146, 39907, 101219, 105913, 15946, 104469, 103952, 108910, 3837, 101889, 33108, 97639, 99378, 103073, 104080, 3837, 105075, 101068, 101882, 104444, 99807, 111724, 33108, 105438, 102109, 3837, 104420, 104435, 40916, 100040, 34204, 99739, 9370, 104264, 1773, 8908, 234, 225, 99820, 69249, 65278, 5122, 23031, 99605, 68536, 67831, 3837, 58695, 106598, 101909, 110981, 9370, 105076, 3837, 107469, 104062, 106569, 101306, 15946, 101942, 105818, 105685, 1773, 100131, 3837, 99650, 102099, 20412, 106315, 105685, 9370, 3837, 99870, 108583, 95256, 3837, 104685, 106694, 26939, 97639, 112270, 33447, 23384, 3837, 99738, 97639, 62963, 44934, 42192, 99560, 1773, 60596, 236, 99309, 48934, 5122, 105165, 18493, 108036, 105695, 100654, 105190, 100707, 99507, 3837, 113325, 1773, 105165, 20412, 107745, 3837, 99650, 104495, 108583, 112344, 29490, 105342, 105695, 1773, 220, 111969, 28072, 100158, 100339, 99309, 48934, 32664, 102083, 105591, 9370, 103964, 5122, 58463, 106449, 105591, 104250, 49828, 47534, 
9370, 99728, 3837, 99650, 104377, 105591, 9370, 102262, 16872, 102170, 105003, 29258, 3837, 101207, 107052, 105591, 18830, 106299, 117272, 100438, 3837, 100740, 99360, 107080, 50930, 100215, 35727, 99807, 35727, 44063, 1773, 8908, 226, 248, 100875, 100773, 99413, 102097, 9370, 102771, 106598, 62244, 103961, 105360, 58463, 106449, 105591, 9370, 99854, 102605, 3837, 104014, 99360, 100694, 58463, 106449, 106598, 114446, 64355, 99744, 18397, 9370, 99723, 99234, 100248, 50509, 3837, 99650, 80443, 104951, 3837, 101527, 100373, 104729, 3837, 80443, 99728, 3837, 100372, 101219, 100011, 21125, 55806, 1773, 68739, 235, 45181, 109754, 100884, 3837, 101997, 103952, 102304, 101534, 99193, 3837, 107222, 109558, 103926, 111012, 100455, 3837, 81800, 106654, 112124, 1773, 10236, 122, 236, 99292, 9370, 115089, 102578, 9370, 114763, 17714, 99489, 53930, 31235, 3837, 46944, 64682, 99807, 99235, 99364, 101076, 34187, 108961, 16, 19, 15, 100408, 3837, 22, 15, 106874, 79599, 104444, 18, 18, 15, 64689, 3837, 103926, 101514, 100343, 33108, 117031, 103978, 3837, 23990, 99807, 104729, 101918, 100455, 3837, 106284, 105031, 99495, 107453, 3837, 20412, 101075, 105484, 99363, 29258, 64682, 99807, 1773, 26853, 107, 101411, 99292, 104230, 106792, 105225, 100740, 113544, 43316, 34794, 40981, 104248, 44063, 109558, 99970, 18397, 44991, 99568, 43268, 3837, 105093, 16530, 103521, 116335, 99578, 64682, 99807, 53930, 13072, 103924, 6313, 151645]\n", "[INFO:swift] [LABELS] [-100 * 
59]可以看看美国高级将领对志愿军的评价：美国国防部长马歇尔：中国共军是一个幽灵，连个影子都没有。麦克阿瑟：中国军队常常避开大路，利用山岭，丘陵为接近路，他们总是插入我纵深发起攻击。美第八军军长泰勒：敌人是非常的狡猾，他们很会利用战术，以来减低我们的火力优势，其方法是在黑暗中接近我们的阵地，然后和我们紧缠在一起，是我们无法要求炮兵射击和空中攻击，否则就有同归于尽的危险。范佛里特：以个人而论，中国士兵是一个顽强的敌人，他们在基层三人小组中经常单独作战。但是，他们永远是向前作战的，奋不顾身，有时渗透到我们防线后方，令我们束手无策。李奇微：中国人在夜晚进攻特别神秘莫测，不可思议。中国人是勇士，他们常常不顾伤亡地发起进攻。顺便提一下李奇微对韩国军队的评价：南朝鲜军队缺乏得力的领导，他们在中国军队的打击下损失惨重，往往对中国军队有非常的畏惧心理，几乎把这些人看成了天兵天将。脚踏胶底鞋的中共士兵如果突然出现在南朝鲜军队的阵地上，总是把许多南朝鲜士兵吓得头也不回的飞快逃命，他们没有秩序，丢掉武器，没有领导，完全是在全面败退。再从另外一个角度，看看我们的对手有多强，当时的美军拥有世界上最先进，数量最多的战机。美军的陆军配备的火力为世界之最，一个步兵师便装备了坦克140辆，70毫米火炮330门，拥有大量的汽车和装甲车辆，单兵武器十分先进，可以说是武装到了牙齿，是当时的世界第一重步兵。可志愿军竟然能在初期几乎完全没有制空权的情况下将美军赶回三八线，真的是不愧世界第一轻步兵之名啊！<|im_end|>\n", "Map: 100%|█████████████████████████| 5452/5452 [00:02<00:00, 2710.78 examples/s]\n", "[INFO:swift] Dataset Token Length: 467.851064±204.358151, min=86.000000, max=1022.000000, size=5452\n", "Map: 100%|█████████████████████████████| 56/56 [00:00<00:00, 2713.69 examples/s]\n", "[INFO:swift] Dataset Token Length: 431.821429±189.684993, min=141.000000, max=820.000000, size=56\n", "[INFO:swift] lora_config: LoraConfig(peft_type=, auto_mapping=None, base_model_name_or_path='/mnt/workspace/.cache/modelscope/hub/Qwen/Qwen2___5-0___5B-Instruct', revision=None, task_type='CAUSAL_LM', inference_mode=False, r=4, target_modules={'down_proj', 'up_proj', 'q_proj', 'o_proj', 'v_proj', 'gate_proj', 'k_proj'}, lora_alpha=32, lora_dropout=0.05, fan_in_fan_out=False, bias='none', use_rslora=False, modules_to_save=[], init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', loftq_config={}, use_dora=False, layer_replication=None, runtime_config=LoraRuntimeConfig(ephemeral_gpu_offload=False), lora_dtype=None, lorap_lr_ratio=None, lorap_emb_lr=1e-06)\n", "[INFO:swift] model: PeftModelForCausalLM(\n", " (base_model): LoraModel(\n", " (model): Qwen2ForCausalLM(\n", " (model): Qwen2Model(\n", " (embed_tokens): Embedding(151936, 896)\n", " 
(layers): ModuleList(\n", " (0-23): 24 x Qwen2DecoderLayer(\n", " (self_attn): Qwen2SdpaAttention(\n", " (q_proj): lora.Linear(\n", " (base_layer): Linear(in_features=896, out_features=896, bias=True)\n", " (lora_dropout): ModuleDict(\n", " (default): Dropout(p=0.05, inplace=False)\n", " )\n", " (lora_A): ModuleDict(\n", " (default): Linear(in_features=896, out_features=4, bias=False)\n", " )\n", " (lora_B): ModuleDict(\n", " (default): Linear(in_features=4, out_features=896, bias=False)\n", " )\n", " (lora_embedding_A): ParameterDict()\n", " (lora_embedding_B): ParameterDict()\n", " (lora_magnitude_vector): ModuleDict()\n", " )\n", " (k_proj): lora.Linear(\n", " (base_layer): Linear(in_features=896, out_features=128, bias=True)\n", " (lora_dropout): ModuleDict(\n", " (default): Dropout(p=0.05, inplace=False)\n", " )\n", " (lora_A): ModuleDict(\n", " (default): Linear(in_features=896, out_features=4, bias=False)\n", " )\n", " (lora_B): ModuleDict(\n", " (default): Linear(in_features=4, out_features=128, bias=False)\n", " )\n", " (lora_embedding_A): ParameterDict()\n", " (lora_embedding_B): ParameterDict()\n", " (lora_magnitude_vector): ModuleDict()\n", " )\n", " (v_proj): lora.Linear(\n", " (base_layer): Linear(in_features=896, out_features=128, bias=True)\n", " (lora_dropout): ModuleDict(\n", " (default): Dropout(p=0.05, inplace=False)\n", " )\n", " (lora_A): ModuleDict(\n", " (default): Linear(in_features=896, out_features=4, bias=False)\n", " )\n", " (lora_B): ModuleDict(\n", " (default): Linear(in_features=4, out_features=128, bias=False)\n", " )\n", " (lora_embedding_A): ParameterDict()\n", " (lora_embedding_B): ParameterDict()\n", " (lora_magnitude_vector): ModuleDict()\n", " )\n", " (o_proj): lora.Linear(\n", " (base_layer): Linear(in_features=896, out_features=896, bias=False)\n", " (lora_dropout): ModuleDict(\n", " (default): Dropout(p=0.05, inplace=False)\n", " )\n", " (lora_A): ModuleDict(\n", " (default): Linear(in_features=896, out_features=4, 
bias=False)\n", " )\n", " (lora_B): ModuleDict(\n", " (default): Linear(in_features=4, out_features=896, bias=False)\n", " )\n", " (lora_embedding_A): ParameterDict()\n", " (lora_embedding_B): ParameterDict()\n", " (lora_magnitude_vector): ModuleDict()\n", " )\n", " (rotary_emb): Qwen2RotaryEmbedding()\n", " )\n", " (mlp): Qwen2MLP(\n", " (gate_proj): lora.Linear(\n", " (base_layer): Linear(in_features=896, out_features=4864, bias=False)\n", " (lora_dropout): ModuleDict(\n", " (default): Dropout(p=0.05, inplace=False)\n", " )\n", " (lora_A): ModuleDict(\n", " (default): Linear(in_features=896, out_features=4, bias=False)\n", " )\n", " (lora_B): ModuleDict(\n", " (default): Linear(in_features=4, out_features=4864, bias=False)\n", " )\n", " (lora_embedding_A): ParameterDict()\n", " (lora_embedding_B): ParameterDict()\n", " (lora_magnitude_vector): ModuleDict()\n", " )\n", " (up_proj): lora.Linear(\n", " (base_layer): Linear(in_features=896, out_features=4864, bias=False)\n", " (lora_dropout): ModuleDict(\n", " (default): Dropout(p=0.05, inplace=False)\n", " )\n", " (lora_A): ModuleDict(\n", " (default): Linear(in_features=896, out_features=4, bias=False)\n", " )\n", " (lora_B): ModuleDict(\n", " (default): Linear(in_features=4, out_features=4864, bias=False)\n", " )\n", " (lora_embedding_A): ParameterDict()\n", " (lora_embedding_B): ParameterDict()\n", " (lora_magnitude_vector): ModuleDict()\n", " )\n", " (down_proj): lora.Linear(\n", " (base_layer): Linear(in_features=4864, out_features=896, bias=False)\n", " (lora_dropout): ModuleDict(\n", " (default): Dropout(p=0.05, inplace=False)\n", " )\n", " (lora_A): ModuleDict(\n", " (default): Linear(in_features=4864, out_features=4, bias=False)\n", " )\n", " (lora_B): ModuleDict(\n", " (default): Linear(in_features=4, out_features=896, bias=False)\n", " )\n", " (lora_embedding_A): ParameterDict()\n", " (lora_embedding_B): ParameterDict()\n", " (lora_magnitude_vector): ModuleDict()\n", " )\n", " (act_fn): SiLU()\n", " )\n", 
" (input_layernorm): Qwen2RMSNorm((896,), eps=1e-06)\n", " (post_attention_layernorm): Qwen2RMSNorm((896,), eps=1e-06)\n", " )\n", " )\n", " (norm): Qwen2RMSNorm((896,), eps=1e-06)\n", " (rotary_emb): Qwen2RotaryEmbedding()\n", " )\n", " (lm_head): Linear(in_features=896, out_features=151936, bias=False)\n", " )\n", " )\n", ")\n", "[INFO:swift] model_parameter_info: PeftModelForCausalLM: 496.2323M Params (2.1996M Trainable [0.4433%]), 0.0008M Buffers.\n", "[ERROR:modelscope] The request model: Qwen/Qwen2___5-0___5B-Instruct does not exist!\n", "[ERROR:modelscope] The request model: Qwen/Qwen2___5-0___5B-Instruct does not exist!\n", "/usr/local/lib/python3.10/site-packages/swift/trainers/mixin.py:77: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Seq2SeqTrainer.__init__`. Use `processing_class` instead.\n", " super().__init__(\n", "Detected kernel version 4.19.91, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.\n", "[2024-12-23 20:06:07,523] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)\n", "df: /root/.triton/autotune: 没有那个文件或目录\n", "[INFO:swift] The logging file will be saved in: /mnt/workspace/output/v2-20241223-200525/logging.jsonl\n", "Train: 0%| | 0/340 [00:00 EvalScope是魔搭社区官方推出的模型评测与性能基准测试框架，专为多样化的模型评估需求而设计。它支持广泛的模型类型，包括但不限于大语言模型、多模态模型、Embedding 模型、Reranker 模型和 CLIP 模型。\n", "\n", "详细介绍：https://github.com/modelscope/evalscope/blob/main/README_zh.md\n", "\n", "\n", " \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1. 
Evaluating the model's general capabilities\n", "\n", "EvalScope integrates multiple benchmark datasets for evaluating a model's general capabilities, such as math and reasoning. Below we use the ARC dataset to test the reasoning ability of the fine-tuned model." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "execution": { "iopub.execute_input": "2024-12-23T12:22:02.879035Z", "iopub.status.busy": "2024-12-23T12:22:02.878675Z", "iopub.status.idle": "2024-12-23T12:23:34.038884Z", "shell.execute_reply": "2024-12-23T12:23:34.038295Z", "shell.execute_reply.started": "2024-12-23T12:22:02.879015Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2024-12-23 20:22:04,539 - datasets - INFO - PyTorch version 2.4.0 available.\n", "2024-12-23 20:22:04,540 - datasets - INFO - Polars version 1.16.0 available.\n", "2024-12-23 20:22:06,854 - evalscope - INFO - Args: Task config is provided with CommandLine type.\n", "2024-12-23 20:22:06,890 - evalscope - INFO - Dump task config to ./outputs/20241223_202206/configs/task_config_16e22f.yaml\n", "2024-12-23 20:22:06,903 - evalscope - INFO - {\n", " \"model\": \"output/v2-20241223-200525/checkpoint-340-merged\",\n", " \"model_id\": \"checkpoint-340-merged\",\n", " \"model_args\": {\n", " \"revision\": \"master\",\n", " \"precision\": \"torch.float16\",\n", " \"device\": \"auto\"\n", " },\n", " \"template_type\": null,\n", " \"chat_template\": null,\n", " \"datasets\": [\n", " \"arc\"\n", " ],\n", " \"dataset_args\": {},\n", " \"dataset_dir\": \"/mnt/workspace/.cache/modelscope/datasets\",\n", " \"dataset_hub\": \"modelscope\",\n", " \"generation_config\": {\n", " \"max_length\": 2048,\n", " \"max_new_tokens\": 512,\n", " \"do_sample\": false,\n", " \"top_k\": 50,\n", " \"top_p\": 1.0,\n", " \"temperature\": 1.0\n", " },\n", " \"eval_type\": \"checkpoint\",\n", " \"eval_backend\": \"Native\",\n", " \"eval_config\": null,\n", " \"stage\": \"all\",\n", " \"limit\": null,\n", " \"mem_cache\": false,\n", " \"use_cache\": null,\n", " \"work_dir\": \"./outputs/20241223_202206\",\n", " \"outputs\": null,\n", " \"debug\": false,\n", " \"dry_run\": false,\n", " 
\"seed\": 42\n", "}\n", "2024-12-23 20:22:07,209 - evalscope - WARNING - Device: cuda\n", "2024-12-23 20:22:09,622 - accelerate.utils.modeling - INFO - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).\n", "2024-12-23 20:22:10,703 - evalscope - INFO - Set 0-shot examples by system for ARC.\n", "2024-12-23 20:22:10,703 - evalscope - INFO - Evaluating on subsets for arc: ['ARC-Challenge']\n", "\n", "2024-12-23 20:22:10,707 - evalscope - INFO - Loading dataset from modelscope hub: >dataset_name: modelscope/ai2_arc\n", "Downloading [5692ab6c356f623d6ca97c54781cf941e21fdee656a692cff82a7890b07c612b]: \n", "2024-12-23 20:22:13,040 - modelscope - INFO - storing https://www.modelscope.cn/api/v1/datasets/modelscope/ai2_arc/repo?Source=SDK&Revision=master&FilePath=ai2_arc.py&View=False in cache at /mnt/workspace/.cache/modelscope/hub/datasets/5692ab6c356f623d6ca97c54781cf941e21fdee656a692cff82a7890b07c612b\n", "2024-12-23 20:22:13,052 - modelscope - INFO - creating metadata file for /mnt/workspace/.cache/modelscope/hub/datasets/5692ab6c356f623d6ca97c54781cf941e21fdee656a692cff82a7890b07c612b\n", "Downloading [a2ea2a26c2081c38a57f38f5834e36e9e497572e00ed541e522ff07a796f5869]: \n", "2024-12-23 20:22:13,795 - modelscope - INFO - storing https://www.modelscope.cn/api/v1/datasets/modelscope/ai2_arc/repo?Source=SDK&Revision=master&FilePath=README.md&View=False in cache at /mnt/workspace/.cache/modelscope/hub/datasets/a2ea2a26c2081c38a57f38f5834e36e9e497572e00ed541e522ff07a796f5869\n", "2024-12-23 20:22:13,809 - modelscope - INFO - creating metadata file for /mnt/workspace/.cache/modelscope/hub/datasets/a2ea2a26c2081c38a57f38f5834e36e9e497572e00ed541e522ff07a796f5869\n", "Downloading data: 100%|██████████████████████| 681M/681M [00:18<00:00, 37.3MB/s]\n", "2024-12-23 20:22:33,080 - modelscope - INFO - storing 
https://modelscope.oss-cn-beijing.aliyuncs.com/open_data/arc/ARC-V1-Feb2018.zip in cache at /mnt/workspace/.cache/modelscope/datasets/downloads/63804bb34989778ae8da039c38ae30f80f93c8afbe865ae0d83b80afd46f2b6b\n", "2024-12-23 20:22:33,091 - modelscope - INFO - creating metadata file for /mnt/workspace/.cache/modelscope/datasets/downloads/63804bb34989778ae8da039c38ae30f80f93c8afbe865ae0d83b80afd46f2b6b\n", "Generating train split: 1119 examples [00:00, 11927.54 examples/s]\n", "Generating test split: 1172 examples [00:00, 14851.53 examples/s]\n", "Generating validation split: 299 examples [00:00, 9552.11 examples/s]\n", "2024-12-23 20:22:49,122 - evalscope - INFO - Use default settings: > few_shot_num: 0, > few_shot_split: train, > target_eval_split: test\n", "2024-12-23 20:22:49,185 - evalscope - INFO - **** Start evaluating on dataset modelscope/ai2_arc ****\n", "Predicting(ARC-Challenge): 100%|████████████| 1172/1172 [00:34<00:00, 34.46it/s]\n", "2024-12-23 20:23:23,215 - evalscope - INFO - Dump predictions to ./outputs/20241223_202206/predictions/checkpoint-340-merged/ai2_arc_ARC-Challenge.jsonl.\n", "Reviewing(ARC-Challenge): 100%|████████████| 1172/1172 [00:09<00:00, 118.48it/s]\n", "2024-12-23 20:23:33,155 - evalscope - INFO - Dump report: ./outputs/20241223_202206/reports/checkpoint-340-merged/ai2_arc.json \n", "\n", "2024-12-23 20:23:33,166 - evalscope - INFO - Report table: \n", "+-----------------------+------------------------------------------+\n", "| Model | ai2_arc |\n", "+=======================+==========================================+\n", "| checkpoint-340-merged | (ai2_arc/WeightedAverageAccuracy) 0.4983 |\n", "+-----------------------+------------------------------------------+ \n", "\n", "2024-12-23 20:23:33,166 - evalscope - INFO - **** Evaluation finished on modelscope/ai2_arc ****\n", "\n" ] } ], "source": [ "!evalscope eval \\\n", " --model output/v2-20241223-200525/checkpoint-340-merged \\\n", " --datasets arc" ] }, { "cell_type": 
"markdown", "metadata": {}, "source": [ "下面看一下原始模型的推理能力" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "execution": { "iopub.execute_input": "2024-12-23T12:24:31.758983Z", "iopub.status.busy": "2024-12-23T12:24:31.758628Z", "iopub.status.idle": "2024-12-23T12:25:28.434419Z", "shell.execute_reply": "2024-12-23T12:25:28.433709Z", "shell.execute_reply.started": "2024-12-23T12:24:31.758963Z" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2024-12-23 20:24:33,485 - datasets - INFO - PyTorch version 2.4.0 available.\n", "2024-12-23 20:24:33,485 - datasets - INFO - Polars version 1.16.0 available.\n", "2024-12-23 20:24:36,022 - evalscope - INFO - Args: Task config is provided with CommandLine type.\n", "2024-12-23 20:24:36,049 - evalscope - INFO - Dump task config to ./outputs/20241223_202436/configs/task_config_a88c00.yaml\n", "2024-12-23 20:24:36,062 - evalscope - INFO - {\n", " \"model\": \"Qwen/Qwen2.5-0.5B-Instruct\",\n", " \"model_id\": \"Qwen2.5-0.5B-Instruct\",\n", " \"model_args\": {\n", " \"revision\": \"master\",\n", " \"precision\": \"torch.float16\",\n", " \"device\": \"auto\"\n", " },\n", " \"template_type\": null,\n", " \"chat_template\": null,\n", " \"datasets\": [\n", " \"arc\"\n", " ],\n", " \"dataset_args\": {},\n", " \"dataset_dir\": \"/mnt/workspace/.cache/modelscope/datasets\",\n", " \"dataset_hub\": \"modelscope\",\n", " \"generation_config\": {\n", " \"max_length\": 2048,\n", " \"max_new_tokens\": 512,\n", " \"do_sample\": false,\n", " \"top_k\": 50,\n", " \"top_p\": 1.0,\n", " \"temperature\": 1.0\n", " },\n", " \"eval_type\": \"checkpoint\",\n", " \"eval_backend\": \"Native\",\n", " \"eval_config\": null,\n", " \"stage\": \"all\",\n", " \"limit\": null,\n", " \"mem_cache\": false,\n", " \"use_cache\": null,\n", " \"work_dir\": \"./outputs/20241223_202436\",\n", " \"outputs\": null,\n", " \"debug\": false,\n", " \"dry_run\": false,\n", " \"seed\": 42\n", "}\n", "2024-12-23 20:24:36,275 - 
evalscope - WARNING - Device: cuda\n", "Downloading Model to directory: /mnt/workspace/.cache/modelscope/hub/Qwen/Qwen2.5-0.5B-Instruct\n", "2024-12-23 20:24:36,638 - modelscope - WARNING - Using branch: master as version is unstable, use with caution\n", "2024-12-23 20:24:37,002 - modelscope - INFO - Target directory already exists, skipping creation.\n", "Downloading Model to directory: /mnt/workspace/.cache/modelscope/hub/Qwen/Qwen2.5-0.5B-Instruct\n", "2024-12-23 20:24:37,618 - modelscope - WARNING - Using branch: master as version is unstable, use with caution\n", "2024-12-23 20:24:37,953 - modelscope - INFO - Target directory already exists, skipping creation.\n", "2024-12-23 20:24:38,083 - accelerate.utils.modeling - INFO - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).\n", "2024-12-23 20:24:38,621 - evalscope - INFO - Set 0-shot examples by system for ARC.\n", "2024-12-23 20:24:38,621 - evalscope - INFO - Evaluating on subsets for arc: ['ARC-Challenge']\n", "\n", "2024-12-23 20:24:38,623 - evalscope - INFO - Loading dataset from modelscope hub: >dataset_name: modelscope/ai2_arc\n", "2024-12-23 20:24:43,950 - evalscope - INFO - Use default settings: > few_shot_num: 0, > few_shot_split: train, > target_eval_split: test\n", "2024-12-23 20:24:44,012 - evalscope - INFO - **** Start evaluating on dataset modelscope/ai2_arc ****\n", "Predicting(ARC-Challenge): 100%|████████████| 1172/1172 [00:32<00:00, 35.53it/s]\n", "2024-12-23 20:25:17,011 - evalscope - INFO - Dump predictions to ./outputs/20241223_202436/predictions/Qwen2.5-0.5B-Instruct/ai2_arc_ARC-Challenge.jsonl.\n", "Reviewing(ARC-Challenge): 100%|████████████| 1172/1172 [00:10<00:00, 111.46it/s]\n", "2024-12-23 20:25:27,562 - evalscope - INFO - Dump report: ./outputs/20241223_202436/reports/Qwen2.5-0.5B-Instruct/ai2_arc.json \n", "\n", "2024-12-23 20:25:27,574 - 
evalscope - INFO - Report table: \n", "+-----------------------+------------------------------------------+\n", "| Model | ai2_arc |\n", "+=======================+==========================================+\n", "| Qwen2.5-0.5B-Instruct | (ai2_arc/WeightedAverageAccuracy) 0.5137 |\n", "+-----------------------+------------------------------------------+ \n", "\n", "2024-12-23 20:25:27,574 - evalscope - INFO - **** Evaluation finished on modelscope/ai2_arc ****\n", "\n" ] } ], "source": [ "!evalscope eval \\\n", " --model Qwen/Qwen2.5-0.5B-Instruct \\\n", " --datasets arc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The fine-tuned model scores slightly lower on ARC-Challenge than the original model (0.4983 vs. 0.5137), which is an acceptable drop." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 2. Evaluating on a custom dataset\n", "\n", "Use the general qa template to define a custom evaluation dataset.\n", "\n", "**Metrics:**\n", "- bleu: compares the n-grams (sequences of n consecutive words) in the generated text against those in the reference text. Common values of n are 1 (unigram), 2 (bigram), and 3 (trigram).\n", "- rouge: focuses on recall, i.e. how much of the reference text is covered by the generated text.\n", "\n", "**Data format:**\n", "\n", "Each record needs two fields, `query` and `response`, for example:\n", "```json\n", "{\n", " \"query\": \"What is machine learning?\",\n", " \"response\": \"Machine learning is a branch of computer science that studies how computers learn from existing examples in order to predict and classify unseen data.\"\n", "}\n", "``` " ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "ExecutionIndicator": { "show": true }, "execution": { "iopub.execute_input": "2024-12-23T12:36:07.539433Z", "iopub.status.busy": "2024-12-23T12:36:07.539075Z", "iopub.status.idle": "2024-12-23T12:37:28.685250Z", "shell.execute_reply": "2024-12-23T12:37:28.684620Z", "shell.execute_reply.started": "2024-12-23T12:36:07.539408Z" }, "scrolled": true, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2024-12-23 20:36:09,212 - datasets - INFO - PyTorch version 2.4.0 available.\n", "2024-12-23 20:36:09,213 - datasets - INFO - Polars version 1.16.0 available.\n", "2024-12-23 20:36:11,516 - evalscope - INFO - Args: Task config is provided with CommandLine type.\n", "2024-12-23 20:36:11,544 - evalscope - INFO - Dump task config to 
./outputs/20241223_203611/configs/task_config_ba7e37.yaml\n", "2024-12-23 20:36:11,558 - evalscope - INFO - {\n", " \"model\": \"output/v2-20241223-200525/checkpoint-340-merged\",\n", " \"model_id\": \"checkpoint-340-merged\",\n", " \"model_args\": {\n", " \"revision\": \"master\",\n", " \"precision\": \"torch.float16\",\n", " \"device\": \"auto\"\n", " },\n", " \"template_type\": null,\n", " \"chat_template\": null,\n", " \"datasets\": [\n", " \"general_qa\"\n", " ],\n", " \"dataset_args\": {\n", " \"general_qa\": {\n", " \"local_path\": \"data\",\n", " \"subset_list\": [\n", " \"zhihu_test.jsonl\"\n", " ]\n", " }\n", " },\n", " \"dataset_dir\": \"/mnt/workspace/.cache/modelscope/datasets\",\n", " \"dataset_hub\": \"modelscope\",\n", " \"generation_config\": {\n", " \"max_length\": 2048,\n", " \"max_new_tokens\": 512,\n", " \"do_sample\": false,\n", " \"top_k\": 50,\n", " \"top_p\": 1.0,\n", " \"temperature\": 1.0\n", " },\n", " \"eval_type\": \"checkpoint\",\n", " \"eval_backend\": \"Native\",\n", " \"eval_config\": null,\n", " \"stage\": \"all\",\n", " \"limit\": 10,\n", " \"mem_cache\": false,\n", " \"use_cache\": null,\n", " \"work_dir\": \"./outputs/20241223_203611\",\n", " \"outputs\": null,\n", " \"debug\": false,\n", " \"dry_run\": false,\n", " \"seed\": 42\n", "}\n", "2024-12-23 20:36:11,770 - evalscope - INFO - /root/nltk_data/tokenizers/punkt_tab.zip already exists, skipping download\n", "2024-12-23 20:36:11,770 - evalscope - WARNING - Device: cuda\n", "2024-12-23 20:36:12,269 - accelerate.utils.modeling - INFO - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. 
You can set `max_memory` in to a higher value to use more memory (at your own risk).\n", "2024-12-23 20:36:12,885 - evalscope - WARNING - Got local model dir: output/v2-20241223-200525/checkpoint-340-merged\n", "2024-12-23 20:36:12,885 - evalscope - INFO - Updating generation config ...\n", "2024-12-23 20:36:12,885 - evalscope - INFO - Evaluating on subsets for general_qa: ['zhihu_test.jsonl']\n", "\n", "2024-12-23 20:36:25,514 - evalscope - INFO - Use default settings: > few_shot_num: None, > few_shot_split: None, > target_eval_split: test\n", "2024-12-23 20:36:28,584 - evalscope - INFO - **** Start evaluating on dataset data ****\n", "Predicting(default): 100%|██████████████████████| 10/10 [00:49<00:00, 4.94s/it]\n", "2024-12-23 20:37:17,997 - evalscope - INFO - Dump predictions to ./outputs/20241223_203611/predictions/checkpoint-340-merged/data_default.jsonl.\n", "Reviewing(default): 0%| | 0/10 [00:00访问令牌获取'\n", "\n", "api = HubApi()\n", "api.login(YOUR_ACCESS_TOKEN)\n", "api.push_model(\n", " model_id=\"YOUR_NAME/zhihu_bot_lora\", \n", " model_dir=\"output/qwen2-7b-instruct/v1-20240819-150005/checkpoint-371\" # local model directory; it must contain configuration.json\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" } }, "nbformat": 4, "nbformat_minor": 4 }