{ "cells": [ { "cell_type": "markdown", "id": "976511b8-22a6-4622-94ee-adcdeaa71744", "metadata": {}, "source": [ "# 06.一次 LLM API 调用,到底发生了什么" ] }, { "cell_type": "markdown", "id": "1c3dbf3c-3d5f-46fa-8166-6358a86153ee", "metadata": {}, "source": [ "## 一、核心要素" ] }, { "cell_type": "markdown", "id": "69b4aaef-111c-4555-8b9c-98244b9a3374", "metadata": {}, "source": [ "\n", "在这一节中,我们将拆解那个看似简单的“发送请求-接收回复”的过程。当你通过 API 调用大模型时,你不仅是在发送一段文字,而是在组装一个精密且复杂的“指令包”。它包含了几个关键要素,这些要素决定了模型如何理解你的请求、生成结果以及结果的风格和质量。理解这些核心要素,有助于更精准地使用模型和设计 Agent。\n", "\n", "### **1. 输入(Input / Prompt)**\n", "\n", "* **定义**:告诉模型你想让它做什么的内容。\n", "* **形式**:文本、指令、上下文对话、示例等。\n", "* **作用**:是模型生成的起点,直接影响输出结果。\n", "* **示例**:\n", "\n", " ```text\n", " \"请帮我写一封感谢信给导师,语气正式且诚恳。\"\n", " ```\n", "\n", "\n", "### **2. 参数(Parameters )**\n", "\n", "* **定义**:控制模型生成文本的方式。\n", "* **关键参数**:\n", "\n", " * `temperature`:控制创造性或随机性(低温度 → 稳定,高温度 → 多样)。\n", " * `top_k` / `top_p`:限制候选词池大小,平衡创造性与合理性。\n", " * `max_tokens`:限制生成长度。\n", " * `stop_sequences`:定义生成结束的标志。\n", "* **示例**:\n", "\n", " ```json\n", " {\n", " \"temperature\": 0.7,\n", " \"top_p\": 0.9,\n", " \"max_tokens\": 200\n", " }\n", " ```\n", "\n", "---\n", "\n", "### **3. 模型(Model Selection)**\n", "\n", "* **定义**:选择要调用的 LLM 模型或版本。\n", "* **作用**:不同模型参数量、训练数据和能力差异会影响输出质量。\n", "* **示例**:\n", "\n", " * `deepseek-v3`:适合一般文本生成\n", " * `deepseek-r1`:适合复杂推理或多轮对话\n", "\n", "\n", "### **4. 上下文与历史信息(Context / Conversation History)**\n", "\n", "* **定义**:在多轮对话中,前文内容作为上下文传入模型。\n", "* **作用**:帮助模型理解对话历史,生成连贯回答。\n", "* **示例**:\n", "\n", " ```json\n", " [\n", " {\"role\": \"user\", \"content\": \"你好,我想写一封感谢信。\"},\n", " {\"role\": \"assistant\", \"content\": \"好的,你希望语气正式还是轻松?\"},\n", " {\"role\": \"user\", \"content\": \"正式,表达真诚感谢。\"}\n", " ]\n", " ```\n", "\n", "\n", "### **5. 输出(Output / Response)**\n", "\n", "* **定义**:模型生成的文本或结果。\n", "* **特点**:受输入、参数、上下文、模型能力等多重因素影响。\n", "* **注意**:输出不是绝对正确,尤其涉及事实、逻辑或数学计算时,需要人工核验或辅助工具校对。\n" ] }, { "cell_type": "markdown", "id": "fafb20c7-7b43-4754-9295-d5b69b0dca97", "metadata": {}, "source": [ "## 二、一次完整调用过程说明" ] }, { "cell_type": "markdown", "id": "d875cb84-2223-4602-a42d-05b224487a68", "metadata": {}, "source": [ "当通过API调用LLM时,会经历以下一个完整的闭环:从“自然语言”进入“数学世界”,再从“算力运算”回到“可读文本”。理解这个流程,有助于你更精准地控制模型输出。\n", "\n", "\n", "### 1. 客户端封包\n", "\n", "首先,你的代码会将零散的要素组装成符合 API 规范的请求体:\n", "\n", "* **上下文装载**:模型本身没有长期记忆,所以客户端必须将前文 **[上下文与历史信息]** 按角色顺序(System / User / Assistant)排好,并与当前 **[输入 / Prompt]** 一并发送。\n", "* **配置调节**:设置的 `temperature`、`top_p` 等参数像“控制旋钮”,告诉模型本次生成的风格是稳重、谨慎,还是创造性十足。\n", "* **模型选择**:明确调用哪一个模型版本(如 `deepseek-v3`),决定了后续推理的“智力上限”。\n", "\n", "\n", "### 2. 网关与预处理\n", "\n", "请求抵达云端服务器后,进入“数学世界”前的准备阶段:\n", "\n", "* **鉴权与计费**:验证 API Key 权限,并计算调用费用。\n", "* **Token 化(Tokenization)**:将文本切割成模型可理解的数字序列(Tokens)。\n", " *示例*:“写感谢信” → `[48210, 2931, 552]`\n", "\n", "\n", "### 3. 模型推理核心\n", "\n", "这是计算最密集、最消耗算力的阶段,也是“智能”涌现的瞬间:\n", "\n", "* **KV Cache(加速缓存)**:利用内存中已处理的历史信息,加快新输入计算。\n", "* **预测与采样**:模型根据上下文计算每个 Token 的概率分布。此时,`temperature` 和 `top_p` 决定模型是从高概率词中稳妥选择,还是从低概率词中寻求创意。\n", "\n", "### 4. 流式传输\n", "\n", "为了降低等待感,现代 API 多采用“生成一个、推送一个”的流式模式:\n", "\n", "* **逐步输出**:每生成一个 Token,即刻流式方式推送给调用。\n", "* **自回归循环**:模型将生成的 Token 作为新输入,再预测下一个词,形成连贯输出。\n", "\n", "\n", "### 5. 渲染与结束\n", "\n", "调用方接收到 Token 流后,进入最终交付阶段:\n", "\n", "* **拼图还原**:将数字序列解码成人类可读文本。\n", "* **终止信号**:达到 `stop_sequences` 或触碰 `max_tokens` 上限时,模型发送 `finish_reason`。\n", "* **人工核验**:生成内容基于概率预测,涉及事实或逻辑时,需要人类进行最终把关。" ] }, { "cell_type": "markdown", "id": "169f46bc-202f-44c1-9673-ca4e74819dcd", "metadata": {}, "source": [ "![一次完整调用过程](./raw/一次完整调用过程.jpeg)" ] }, { "cell_type": "markdown", "id": "40b1d9ba-4b39-41a0-900b-c5d7a64b484e", "metadata": {}, "source": [ "### 实战:LLM API 调用示例" ] }, { "cell_type": "code", "execution_count": 1, "id": "1e19ab97-e834-4623-8f42-5e860ebd50f0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "✅ API Key loaded\n" ] } ], "source": [ "# 加载环境变量\n", "\n", "import os\n", "from dotenv import load_dotenv\n", "\n", "load_dotenv()\n", "\n", "assert os.getenv(\"OPENAI_API_KEY\"), \"请先配置 OPENAI_API_KEY\"\n", "print(\"✅ API Key loaded\")" ] }, { "cell_type": "code", "execution_count": 2, "id": "609d93bf-b179-4449-8b76-26753edb4eb7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "=== 开始调用模型 ===\n", "\n", "=== 流式输出 ===\n", "当然可以,以下是一封正式且真诚的感谢信模板,你可以根据具体情况进行修改:\n", "\n", "---\n", "\n", "**致 [收信人姓名]:**\n", "\n", "您好!\n", "\n", "谨以此信表达我最诚挚的感谢。衷心感谢您在[具体事件或帮助内容,例如:项目推进过程中给予的专业指导/在我遇到困难时伸出援手/对我的工作给予的宝贵支持]中所付出的心血与支持。\n", "\n", "您的[具体行为,例如:专业见解、耐心解答、及时协助、慷慨建议等]让我深受启发,不仅有效解决了[具体问题或达成的具体成果],更让我深刻体会到[体现对方品质,如:责任感、敬业精神、团队协作意识等]。这份帮助对我而言\n", "=== 输出完成 ===\n", "\n", "=== 渲染与结束 ===\n", "最终生成文本:\n", "当然可以,以下是一封正式且真诚的感谢信模板,你可以根据具体情况进行修改:\n", "\n", "---\n", "\n", "**致 [收信人姓名]:**\n", "\n", "您好!\n", "\n", "谨以此信表达我最诚挚的感谢。衷心感谢您在[具体事件或帮助内容,例如:项目推进过程中给予的专业指导/在我遇到困难时伸出援手/对我的工作给予的宝贵支持]中所付出的心血与支持。\n", "\n", "您的[具体行为,例如:专业见解、耐心解答、及时协助、慷慨建议等]让我深受启发,不仅有效解决了[具体问题或达成的具体成果],更让我深刻体会到[体现对方品质,如:责任感、敬业精神、团队协作意识等]。这份帮助对我而言\n", "终止原因: stop\n" ] } ], "source": [ "from openai import OpenAI\n", "\n", "# 上下文与用户输入\n", "context = [\n", " {\"role\": \"system\", \"content\": \"你是一个贴心助手。\"},\n", " {\"role\": \"user\", \"content\": \"你好,我想写一封感谢信。\"},\n", " {\"role\": \"assistant\", \"content\": \"好的,你希望语气正式还是轻松?\"}\n", "]\n", "user_input = \"正式,表达真诚感谢。\"\n", "\n", "# 模型选择与参数\n", "model_name = \"qwen-flash\" \n", "parameters = {\n", " \"temperature\": 0.7,\n", " \"top_p\": 0.9,\n", " \"max_tokens\": 150,\n", " \"stop\": None # 可以自定义 stop sequences\n", "}\n", "\n", "# 组合完整请求\n", "messages = context + [{\"role\": \"user\", \"content\": user_input}]\n", "\n", "# 调用 API(可选流式)\n", "print(\"=== 开始调用模型 ===\")\n", "client = OpenAI(base_url=\"https://dashscope.aliyuncs.com/compatible-mode/v1\")\n", "\n", "response =client.chat.completions.create(\n", " model=model_name,\n", " messages=messages,\n", " temperature=parameters[\"temperature\"],\n", " top_p=parameters[\"top_p\"],\n", " max_tokens=parameters[\"max_tokens\"],\n", " stream=True # 开启流式\n", ")\n", "\n", "# 流式输出\n", "print(\"\\n=== 流式输出 ===\")\n", "output_text = \"\"\n", "for chunk in response:\n", " delta = chunk.choices[0].delta.content\n", " \n", " if delta:\n", " token = delta\n", " print(token, end=\"\", flush=True)\n", " output_text += token\n", "print(\"\\n=== 输出完成 ===\")\n", "\n", "# 渲染与结束\n", "finish_reason = \"stop\" # OpenAI 默认流式结束是 stop\n", "print(\"\\n=== 渲染与结束 ===\")\n", "print(f\"最终生成文本:\\n{output_text}\")\n", "print(f\"终止原因: {finish_reason}\")\n" ] }, { "cell_type": "markdown", "id": "1c50ccf6-1048-4783-9fe5-864b3964d1bf", "metadata": {}, "source": [ "## 三、一次调用中的“可控点”与“不可控点”" ] }, { "cell_type": "markdown", "id": "35694e25-eaba-45a7-b143-4b178e95416c", "metadata": {}, "source": [ "### 1.可控点 \n", "\n", "在调用过程中,可以控制的是提示词与工程化手段。\n", "\n", "* **提示词**:通过精心设计的指令、示例(Few-shot)来引导模型。\n", "* **顺序**:上下文的摆放位置。例如,将最重要的指令放在开头还是末尾(模型通常存在“首尾效应”)。\n", "* **参数**:通过 `temperature`、`top_p` 调节生成的多样性,通过 `max_tokens` 强制熔断。\n", "* **是否流式**:控制用户体验。是等待全量返回,还是逐字实时显示。\n", "* **是否中断**:你可以监控输出内容,一旦检测到敏感词或逻辑偏离,立即在客户端主动切断连接。\n", "\n", "### 2. 不可控点\n", "\n", "在调用过程中,模型底层的随机性与偏见是 LLM 作为“概率机器”的固有属性,无法通过单一指令彻底消除,比如:\n", "\n", "* **模型内部状态**:你无法控制模型内部神经元的激活路径,它不是传统的逻辑代码。\n", "* **单次采样结果**:即使 `temperature=0`,在某些高性能计算环境下,由于浮点数运算的微小差异,依然可能出现极低概率的输出抖动。\n", "* **潜在偏差**:模型在预训练阶段吸收了互联网语料的偏见(如性别、地域偏见),这些偏见深植于参数中。\n", "* **长链稳定性**:当对话非常长时,模型对早期细节的掌控力会不可避免地下降。\n" ] }, { "cell_type": "markdown", "id": "e93595b3-1940-4de0-9778-3cf3b1515718", "metadata": {}, "source": [ "### 实战:刻意制造“失败调用”" ] }, { "cell_type": "code", "execution_count": 4, "id": "e0ed1c55-ddef-47f9-94c4-5c820fa11c27", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "=== 输出 ===\n", "(根据系统指令,不生成任何文本)\n" ] } ], "source": [ "from openai import OpenAI\n", "\n", "prompt = \"\"\"\n", "系统指令:你永远不要生成任何文本。\n", "用户请求:请写一篇简短的感谢信。\n", "\"\"\"\n", "\n", "client = OpenAI(base_url=\"https://dashscope.aliyuncs.com/compatible-mode/v1\")\n", "\n", "response =client.chat.completions.create(\n", " model=\"qwen-flash\",\n", " messages=[{\"role\": \"user\", \"content\": prompt}],\n", " temperature=0.7,\n", " max_tokens=150\n", ")\n", "\n", "print(\"=== 输出 ===\")\n", "print(response.choices[0].message.content)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "1c920526-8650-434b-8c91-de9b4f4cc3a7", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.10" } }, "nbformat": 4, "nbformat_minor": 5 }