{ "cells": [ { "cell_type": "markdown", "id": "545d855c-e318-4645-9650-cdff4f48b8e7", "metadata": {}, "source": [ "# 04.从训练到推理" ] }, { "cell_type": "markdown", "id": "4c70a876-3b21-4547-bf26-da972d87764d", "metadata": {}, "source": [ "## 一、预训练阶段,模型到底在做什么?\n", "\n", "在整个大模型生命周期中,**预训练(Pre-training)是最基础、最核心、也是最容易被误解的阶段**。\n", "\n", "在这个阶段,大语言模型并不是在学习事实、规则、世界观或推理方法,而是在完成一个**极其单一、却被重复了数万亿次的任务**,即\n", "\n", "> **给定一段上下文,预测下一个 Token 的概率分布。**\n" ] }, { "cell_type": "markdown", "id": "e786aff2-cbab-4284-9874-615133c1db30", "metadata": {}, "source": [ "### 1. 预训练任务的唯一目标\n", "\n", "所有输入文本——无论是小说、法律条文、论文、代码、网页、聊天记录——在进入模型之前,都会被统一处理为 **Token 序列**:\n", "\n", "```text\n", "Token₁, Token₂, Token₃, … → 预测 Token₄\n", "```\n", "\n", "模型不会被告知:\n", "\n", "* 这是不是事实\n", "* 这是不是规则\n", "* 这是不是一个“正确答案”\n", "* 这是不是在“推理”\n", "\n", "它只是在反复学习一件事:在这种上下文形态下,最可能接下来写什么。" ] }, { "cell_type": "markdown", "id": "397c3f8a-2204-46d2-8a3f-4893001a6ebe", "metadata": {}, "source": [ "### 2. 什么叫「预训练」?\n", "\n", "**预训练的定义可以精确表述为:**\n", "\n", "> 在不区分任务、不区分领域、不引入人类偏好约束的情况下,\n", "> 对大规模通用语料进行无监督或弱监督的下一 Token 预测训练。\n", "\n", "几个关键词非常重要:\n", "\n", "* **不区分任务**:\n", " 问答、对话、代码、说明书在模型眼里没有本质区别\n", "* **不区分领域**:\n", " 数学、法律、文学只是不同的统计分布\n", "* **弱监督**:\n", " 监督信号只来自“真实下一个 Token 是什么”\n", "\n", "### 3. 模型在预训练阶段“真正学到”的是什么?\n", "\n", "当这种预测在:\n", "\n", "* 海量文本数据\n", "* 巨大参数规模\n", "* 深层 Transformer 结构\n", "* 多轮梯度下降\n", "\n", "中被不断重复时,模型**并没有显式学会规则**,但却隐式形成了大量结构性能力。\n", "\n", "它逐渐学会了:\n", "\n", "* **语言的统计结构**\n", "\n", " * 词序\n", " * 语法\n", " * 常见搭配\n", "* **文本模式的展开形态**\n", "\n", " * 提问 → 分析 → 回答\n", " * 定义 → 解释 → 示例\n", "* **问题到解法的高频路径**\n", "\n", " * 数学题的解题模板\n", " * 代码的常见写法\n", " * 论文与报告的组织结构\n", "\n", "这些能力并不是被“教会”的,\n", "而是**在概率空间中自然涌现(emerge)出来的**。" ] }, { "cell_type": "markdown", "id": "fcb1a386-771f-45e4-a956-78e804468e1a", "metadata": {}, "source": [ "### 4. 预训练之后\n", "\n", "为了让预训练后的模型变成更加好用,通常还会有以下两个阶段:\n", "\n", "1. 
**SFT (Supervised Fine-Tuning) 有监督微调**:\n", "给模型看几万组高质量的 `[指令] -> [回答]` 范本,教它学会:当人类提问时,应当给出回答,而不是继续接龙。\n", "2. **RLHF (Reinforcement Learning from Human Feedback) 人类反馈强化学习**:\n", "让人类给模型的多个回答打分,告诉它哪些回答更安全、更准确、更有礼貌。这是给模型注入“价值观”的过程。\n" ] }, { "cell_type": "markdown", "id": "63ce4944-8825-4c46-91b9-a53b69f42077", "metadata": {}, "source": [ "## 二、训练过程是怎样的" ] }, { "cell_type": "markdown", "id": "477ae207-8bca-4ce1-9f5d-353f125276e0", "metadata": {}, "source": [ "上文讲到,模型在预训练阶段不断“预测下一个 Token”,并通过这种反复的训练逐渐形成统计规律和模式能力。那么,这个“不断预测、不断优化”的过程,背后到底是怎么发生的呢?这就涉及 **Loss、梯度和参数更新** 这几个概念。\n", "\n", "#### **1. Loss ——模型“犯错的程度”**\n", "\n", "每次模型预测下一个 Token,它都可能猜对,也可能猜错。Loss 就是用来量化这个猜测有多糟糕的指标:\n", "\n", "* Loss 高 → 预测离真实 Token 很远,模型“犯大错”\n", "* Loss 低 → 预测接近真实 Token,模型“比较聪明”\n", "\n", "直觉上,Loss 就像模型的**自我反馈表**:它告诉模型“这一步做得好不好”。\n", "\n", "#### **2. 梯度 ——模型“改进的方向”**\n", "\n", "Loss 告诉模型哪里做错了,但不会直接告诉它该怎么改。这时,梯度就起作用了:\n", "\n", "* 梯度告诉模型:**如果把参数往这个方向微调,Loss 会下降得更快**\n", "* 梯度的大小表示“改进有多急迫”\n", "* 梯度的方向表示“改哪条路能最快变聪明”\n", "\n", "梯度就像一枚指南针,为模型指明下一步微调的方向。\n", "\n", "#### **3. 参数更新 ——模型“试错修正”的动作**\n", "\n", "有了梯度信息,模型就可以**调整自己的参数**:\n", "\n", "* 小幅度更新 → 细微改善\n", "* 大幅度更新 → 风险大,但可能学得快\n", "\n", "所以,参数更新就像模型在“沿着 Loss 形成的山谷不断下坡”:\n", "\n", "* Loss 是高度\n", "* 梯度是坡度\n", "* 参数更新就是每一步迈下去,让模型慢慢靠近“预测最准确”的谷底\n", "\n", "\n", "#### **4. 训练的整体过程**\n", "\n", "把三者串起来,你可以把预训练阶段想象成这样一个循环:\n", "\n", "1. **预测** → 模型根据当前参数猜下一个 Token\n", "2. **计算 Loss** → 模型知道自己猜得好不好\n", "3. **计算梯度** → 模型知道沿哪个方向调整参数能改进\n", "4. **参数更新** → 模型沿梯度方向微调自己\n", "5. 
**重复无数次** → 模型逐渐掌握语言模式、问题到解法的路径\n", "\n", "这个循环发生在**海量文本 + 巨大参数 + 深层 Transformer**中,被重复数十亿次,最终模型形成了涌现(emergent)能力:它能理解语境、模仿逻辑、解决问题,但这些能力本质上是**概率统计的自然涌现**。" ] }, { "cell_type": "markdown", "id": "f2b7927d-dce8-4b5c-bb55-43eb3075b3a6", "metadata": {}, "source": [ "### 实战:训练过程模拟" ] }, { "cell_type": "code", "execution_count": 1, "id": "09584136-89c5-4bfa-a8ea-85e89846dd6f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 040: Loss = 0.1391\n", "Epoch 080: Loss = 0.0884\n", "Epoch 120: Loss = 0.0562\n", "Epoch 160: Loss = 0.0357\n", "Epoch 200: Loss = 0.0227\n", "------------------------------\n", "预测结果: 输入 5.0 → 预测 5.94 (目标值: 6.0)\n" ] } ], "source": [ "import torch\n", "import torch.nn as nn\n", "import torch.optim as optim\n", "\n", "# 1. 数据准备\n", "data = torch.arange(0, 10).float().unsqueeze(1)  # shape (10,1)\n", "# 注意:若目标取 (data + 1) % 10,在 9 之后会“断裂”回 0,线性模型很难完美拟合\n", "# 这里为了演示 Loss 下降,目标直接取 data + 1\n", "target = data + 1\n", "\n", "# 2. 模型定义:单层线性模型 y = wx + b\n", "model = nn.Linear(1, 1)\n", "\n", "# 3. Loss 函数和优化器\n", "criterion = nn.MSELoss()\n", "# 学习率取较小的 0.01:步长过大(如 0.1)会使训练震荡甚至发散\n", "optimizer = optim.SGD(model.parameters(), lr=0.01)\n", "\n", "# 4. 训练循环\n", "epochs = 200  # 学习率较小,相应增加训练轮数\n", "for epoch in range(epochs):\n", "    optimizer.zero_grad()\n", "\n", "    output = model(data)\n", "    loss = criterion(output, target)\n", "    loss.backward()\n", "    optimizer.step()\n", "\n", "    if (epoch+1) % 40 == 0:\n", "        print(f\"Epoch {epoch+1:03d}: Loss = {loss.item():.4f}\")\n", "\n", "# 5. 测试预测\n", "test_data = torch.tensor([[5.0]])\n", "pred = model(test_data).item()\n", "print(\"-\" * 30)\n", "print(f\"预测结果: 输入 5.0 → 预测 {pred:.2f} (目标值: 6.0)\")" ] }, { "cell_type": "markdown", "id": "441832ec-1f98-4bfd-a3f7-44a5c332e88c", "metadata": {}, "source": [ "过程说明如下:\n", "1. `loss = criterion(output, target)` 告诉模型预测有多差,也即模型犯错的程度。\n", "\n", "2. `loss.backward()` 计算 Loss 对模型参数的偏导数,告诉模型“沿哪个方向调整参数才能让 Loss 降低”。\n", "\n", "3. 
`optimizer.step()` 根据梯度调整权重,就像模型沿 Loss 山谷下坡,离最佳预测更近。\n", "\n", "4. `for epoch in range(epochs)` 模拟模型不断重复“预测 → 计算 Loss → 更新参数”的循环,正如大模型在海量文本上所做的那样,最终学会了规律。" ] }, { "cell_type": "markdown", "id": "78703754-50ce-471a-ab06-465c15564bd3", "metadata": {}, "source": [ "## 三、训练阶段 vs 推理阶段" ] }, { "cell_type": "markdown", "id": "84c764f7-02f9-477a-9663-a4a5cdb9c462", "metadata": {}, "source": [ "在训练阶段,模型通过**不断试错、自我修正**来学习规律。每次预测都会计算 Loss(模型“犯错的程度”),梯度指明调整方向,参数更新沿梯度方向微调权重。这个循环反复进行,模型能力逐渐形成。\n", "而在推理阶段,模型已经完成训练,参数固定,不再更新。数据从输入单向流向输出,模型使用已经学到的能力来完成任务或生成文本,不再进行自我修正。\n" ] }, { "cell_type": "markdown", "id": "12c8c4a2-3060-432b-a267-4d8075d651d0", "metadata": {}, "source": [ "对比如下:\n", "| 对比维度 | 训练阶段(Training) | 推理阶段(Inference) |\n", "| --------- | -------------------------------- | ------------------------- |\n", "| **定义** | 模型通过 Loss → 梯度 → 参数更新循环学习规律和能力 | 模型使用训练中学到的能力完成任务或生成文本 |\n", "| **参数状态** | 可更新,梯度指导参数调整 | 固定,不再更新,使用已有参数 |\n", "| **数据流向** | 双向循环:预测 → Loss → 梯度 → 参数更新 → 再预测 | 单向流:输入 → 模型 → 输出,Loss 不回传 |\n", "| **速度与资源** | 慢,资源消耗大,需要显存存储梯度、优化器状态 | 快,资源消耗低,只进行前向计算 |\n", "| **核心目标** | 提升模型能力,学会语言规律、问题解决模式 | 使用模型能力,完成具体任务或生成文本 |\n" ] }, { "cell_type": "markdown", "id": "2c1b738e-5ab4-4725-9aad-20c6aaccc83b", "metadata": {}, "source": [ "我们平时使用 ChatGPT、Gemini、Qwen、DeepSeek 等模型时,都处于推理阶段。" ] }, { "cell_type": "code", "execution_count": null, "id": "c3583952-c264-4ea5-b614-853335b3bd59", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.10" } }, "nbformat": 4, "nbformat_minor": 5 }