{ "cells": [ { "cell_type": "markdown", "id": "45398736-7e89-4263-89c8-92153baff553", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "Supplementary code for the Build a Large Language Model From Scratch book by Sebastian Raschka
\n", "
Code repository: https://github.com/rasbt/LLMs-from-scratch\n", "
Chinese translation repository: https://github.com/GoatCsu/CN-LLMs-from-scratch.git\n", "
\n", "
\n", "\n", "
\n" ] }, { "cell_type": "markdown", "id": "66dd524e-864c-4012-b0a2-ccfc56e80024", "metadata": { "id": "66dd524e-864c-4012-b0a2-ccfc56e80024" }, "source": [ "# Chapter 5: Pretraining on Unlabeled Data" ] }, { "cell_type": "code", "execution_count": 1, "id": "92b989e9-da36-4159-b212-799184764dd9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "matplotlib version: 3.9.2\n", "numpy version: 1.26.4\n", "tiktoken version: 0.8.0\n", "torch version: 2.5.1\n", "tensorflow version: 2.18.0\n" ] } ], "source": [ "from importlib.metadata import version\n", "\n", "# As in earlier chapters, check the installed versions of the packages we use\n", "pkgs = [\"matplotlib\",\n", "        \"numpy\",\n", "        \"tiktoken\",\n", "        \"torch\",\n", "        \"tensorflow\"  # For OpenAI's pretrained weights\n", "       ]\n", "for p in pkgs:\n", "    print(f\"{p} version: {version(p)}\")" ] }, { "cell_type": "markdown", "id": "0a3bdf9e-2ff0-4a57-abab-ede2d955a237", "metadata": {}, "source": [ "- In this chapter, we implement the training loop and the code for basic model evaluation to pretrain an LLM.\n", "- At the end of this chapter, we also load openly available pretrained weights from OpenAI into our model." ] }, { "cell_type": "markdown", "id": "efd27fcc-2886-47cb-b544-046c2c31f02a", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "0d214765-7a73-42d5-95e9-302154b29db9", "metadata": {}, "source": [ "- The topics covered in this chapter are shown below." ] }, { "cell_type": "markdown", "id": "f67711d4-8391-4fee-aeef-07ea53dd5841", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "0d824183-145c-4865-89e1-1f0d0a338f19", "metadata": { "id": "0d824183-145c-4865-89e1-1f0d0a338f19" }, "source": [ "## 5.1 Evaluating generative text models" ] }, { "cell_type": "markdown", "id": "a3350f8c-5181-4f9b-a789-4523105e98f2", "metadata": {}, "source": [ "- We start this section with a brief recap of how to initialize a GPT model using the code from the previous chapter.\n", "- Then, we discuss basic evaluation metrics for LLMs.\n", "- Lastly, we apply these evaluation metrics to a training and a validation dataset." ] }, { "cell_type": "markdown", "id": "bdc1cf3f-82d8-46c7-9ecc-58979ce87cdd", "metadata": { "id": "bdc1cf3f-82d8-46c7-9ecc-58979ce87cdd" }, "source": [ "### 5.1.1 Using GPT to generate text" ] }, { "cell_type": "markdown", "id": "5b3415fd-9f4a-4548-908e-9dfa56edc9bc", "metadata": {}, "source": [ "- 
We first initialize a GPT model as in the previous chapters." ] }, { "cell_type": "code", "execution_count": 2, "id": "86000d74-624a-48f0-86da-f41926cb9e04", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "86000d74-624a-48f0-86da-f41926cb9e04", "outputId": "ad482cfd-5a62-4f0d-e1e0-008d6457f512" }, "outputs": [], "source": [ "# Import the model, define its configuration, and fix the random seed for reproducibility\n", "import torch\n", "from previous_chapters import GPTModel\n", "\n", "GPT_CONFIG_124M = {\n", "    \"vocab_size\": 50257,   # Vocabulary size\n", "    \"context_length\": 256, # Shortened context length (orig: 1024)\n", "    \"emb_dim\": 768,        # Embedding dimension\n", "    \"n_heads\": 12,         # Number of attention heads\n", "    \"n_layers\": 12,        # Number of layers\n", "    \"drop_rate\": 0.1,      # Dropout rate\n", "    \"qkv_bias\": False      # Query-key-value bias\n", "}\n", "\n", "torch.manual_seed(123)\n", "model = GPTModel(GPT_CONFIG_124M)\n", "model.eval();  # Disable dropout during inference" ] }, { "cell_type": "markdown", "id": "09c6cf0f-7458-48a2-97fd-aa5068d65e8c", "metadata": {}, "source": [ "- We use a dropout rate of 0.1 above, but it is relatively common to train LLMs without dropout nowadays.\n", "- Modern LLMs also don't use bias vectors in the `nn.Linear` layers for the query, key, and value matrices (unlike earlier GPT models), which is achieved by setting `\"qkv_bias\": False`.\n", "- We reduce the context length (`context_length`) to only 256 tokens to lower the computational resource requirements for training the model, whereas the original 124-million-parameter GPT-2 model used 1024 tokens.\n", "  - This is so that more readers can follow and run the code examples on their laptops.\n", "  - However, feel free to increase `context_length` to 1024 tokens (this does not require any code changes).\n", "  - We will also load a model with a `context_length` of 1024 from pretrained weights later." ] }, { "cell_type": "markdown", "id": "59f80895-be35-4bb5-81cb-f357ef7367fe", "metadata": {}, "source": [ "- Next, we use the `generate_text_simple` function from the previous chapter to generate text.\n", "- In addition, we define two convenience functions, `text_to_token_ids` and `token_ids_to_text`, for converting between token IDs and text representations; we use them throughout this chapter." ] }, { "cell_type": "markdown", "id": "741881f3-cee0-49ad-b11d-b9df3b3ac234", "metadata": {}, "source": [ "" ] }, { "cell_type": "code", "execution_count": 3, "id": "5e062b82-3540-48ce-8eb4-009686d0d16c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Output 
text:\n", " Every effort moves you rentingetic wasnم refres RexMeCHicular stren\n" ] } ], "source": [ "import tiktoken\n", "from previous_chapters import generate_text_simple\n", "\n", "def text_to_token_ids(text, tokenizer):\n", "    # Encode the text and add a batch dimension so it matches the model's input shape\n", "    encoded = tokenizer.encode(text, allowed_special={'<|endoftext|>'})\n", "    encoded_tensor = torch.tensor(encoded).unsqueeze(0)  # add batch dimension\n", "    return encoded_tensor\n", "\n", "def token_ids_to_text(token_ids, tokenizer):\n", "    # Remove the batch dimension, convert to a plain Python list, and decode back to text\n", "    flat = token_ids.squeeze(0)  # remove batch dimension\n", "    return tokenizer.decode(flat.tolist())\n", "\n", "start_context = \"Every effort moves you\"\n", "tokenizer = tiktoken.get_encoding(\"gpt2\")\n", "\n", "token_ids = generate_text_simple(\n", "    model=model,\n", "    # Token ID tensor of the start context, produced by text_to_token_ids\n", "    idx=text_to_token_ids(start_context, tokenizer),\n", "    max_new_tokens=10,  # generate 10 new tokens\n", "    context_size=GPT_CONFIG_124M[\"context_length\"]\n", ")\n", "\n", "print(\"Output text:\\n\", token_ids_to_text(token_ids, tokenizer))" ] }, { "cell_type": "markdown", "id": "e4d3249b-b2a0-44c4-b589-ae4b403b8305", "metadata": {}, "source": [ "- As we can see above, the model does not produce good text because it has not been trained yet.\n", "- How do we measure or capture what \"good text\" is in numeric form, so that we can track it during training?\n", "- The next subsection introduces metrics for calculating a loss on the generated outputs, which we can use to measure training progress.\n", "- The later chapters on finetuning LLMs also introduce additional ways to measure model quality." ] }, { "cell_type": "markdown", "id": "0f3d7ea2-637f-4490-bc76-e361fc81ae98", "metadata": { "id": "0f3d7ea2-637f-4490-bc76-e361fc81ae98" }, "source": [ "### 5.1.2 Calculating the text generation loss: cross-entropy and perplexity" ] }, { "cell_type": "markdown", "id": "9e1ba8aa-fb03-4d25-957f-fe8778762440", "metadata": {}, "source": [ "- Suppose we have an `inputs` tensor containing the token IDs for 2 training examples (rows).\n", "- Corresponding to the `inputs`, the `targets` contain the token IDs that we want the model to generate.\n", "- Notice that the `targets` are the `inputs` shifted by 1 position, as explained in Chapter 2 when we implemented the data loader." ] }, { "cell_type": "code", "execution_count": 4, "id": "6b5402f8-ec0c-4a44-9892-18a97779ee4f", "metadata": { "colab": { "base_uri": 
"https://localhost:8080/" }, "id": "6b5402f8-ec0c-4a44-9892-18a97779ee4f", "outputId": "8d6fa0ff-7b37-4634-c3f0-2c050cbe81f0" }, "outputs": [], "source": [ "# Token IDs of the input texts\n", "inputs = torch.tensor([[16833, 3626, 6100],   # [\"every effort moves\",\n", "                       [40,    1107, 588]])   #  \"I really like\"]\n", "\n", "# Token IDs that the model should produce (the targets)\n", "targets = torch.tensor([[3626, 6100, 345  ],  # [\" effort moves you\",\n", "                        [1107,  588, 11311]]) #  \" really like chocolate\"]" ] }, { "cell_type": "markdown", "id": "33dc0645-ac2c-4973-9b40-6da40515bede", "metadata": {}, "source": [ "- Feeding the `inputs` to the model, we obtain the logits for the 2 input examples, which consist of 3 tokens each.\n", "- Each token is represented by a 50,257-dimensional vector, corresponding to the size of the vocabulary.\n", "- Applying the softmax function, we can turn the logits tensor into a tensor of the same shape that contains probability scores." ] }, { "cell_type": "code", "execution_count": 5, "id": "e7b6ec51-6f8c-49bd-a349-95ba38b46fb6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "torch.Size([2, 3, 50257])\n" ] } ], "source": [ "# Run the model on the inputs without tracking gradients\n", "with torch.no_grad():\n", "    logits = model(inputs)\n", "\n", "# Convert the logits into probabilities with softmax\n", "probas = torch.softmax(logits, dim=-1)  # Probability of each token in vocabulary\n", "print(probas.shape)  # Shape: (batch_size, num_tokens, vocab_size)" ] }, { "cell_type": "markdown", "id": "5c36a382-b5e2-4de6-9e65-0b69b685013b", "metadata": {}, "source": [ "- The figure below, which uses a very small vocabulary for illustration, outlines how we convert the probability scores back into text, as discussed at the end of the previous chapter." ] }, { "cell_type": "markdown", "id": "384d86a9-0013-476c-bb6b-274fd5f20b29", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "e8480efd-d419-4954-9ecc-2876055334bd", "metadata": {}, "source": [ "- As discussed in the previous chapter, we can apply the `argmax` function to convert the probability scores into predicted token IDs.\n", "- The softmax function above produced a 50,257-dimensional vector for each token; the `argmax` function returns the position of the highest probability score in this vector, which is the predicted token ID for the given token." ] }, { "cell_type": "markdown", "id": "f3b84c9f-dd08-482e-b903-a86fe44e1144", "metadata": {}, "source": [ "- Since the batch contains 2 input examples with 3 tokens each, we obtain a 2-by-3 arrangement of predicted token IDs:" ] }, { "cell_type": "code", "execution_count": 6, "id": 
"34ebd76a-16ec-4c17-8958-8a135735cc1c", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "34ebd76a-16ec-4c17-8958-8a135735cc1c", "outputId": "ed17da47-c3e7-4775-fd00-4ec5bcda3db2" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Token IDs:\n", " tensor([[[16657],\n", " [ 339],\n", " [42826]],\n", "\n", " [[49906],\n", " [29669],\n", " [41751]]])\n" ] } ], "source": [ "# Greedy choice: pick the most likely token ID at each position\n", "token_ids = torch.argmax(probas, dim=-1, keepdim=True)\n", "print(\"Token IDs:\\n\", token_ids)" ] }, { "cell_type": "markdown", "id": "cee4072c-21ed-4df7-8721-dd2535362573", "metadata": {}, "source": [ "- If we decode these tokens, we find that they are quite different from the tokens we want the model to predict, namely the target tokens:" ] }, { "cell_type": "code", "execution_count": 7, "id": "c990ead6-53cd-49a7-a6d1-14d8c1518249", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Targets batch 1: effort moves you\n", "Outputs batch 1: Armed heNetflix\n" ] } ], "source": [ "# The text we want the model to produce\n", "print(f\"Targets batch 1: {token_ids_to_text(targets[0], tokenizer)}\")\n", "# The text the model actually predicts\n", "print(f\"Outputs batch 1: {token_ids_to_text(token_ids[0].flatten(), tokenizer)}\")" ] }, { "cell_type": "markdown", "id": "a53eb8a7-070e-46d6-930c-314ba55a6ff2", "metadata": {}, "source": [ "- That's because the model has not been trained yet.\n", "- To train the model, we need to know how far it is from the correct predictions (the targets)." ] }, { "cell_type": "markdown", "id": "ad90592f-0d5d-4ec8-9ff5-e7675beab10e", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "c7251bf5-a079-4782-901d-68c9225d3157", "metadata": {}, "source": [ "- The token probabilities corresponding to the target indices are as follows:" ] }, { "cell_type": "code", "execution_count": 8, "id": "54aef09c-d6e3-4238-8653-b3a1b0a1077a", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "54aef09c-d6e3-4238-8653-b3a1b0a1077a", "outputId": "41c946a2-c458-433e-a53d-5e7e89d9dddc" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Text 1: tensor([7.4541e-05, 3.1061e-05, 1.1563e-05])\n", "Text 2: tensor([1.0337e-05, 5.6776e-05, 
4.7559e-06])\n" ] } ], "source": [ "text_idx = 0\n", "target_probas_1 = probas[text_idx, [0, 1, 2], targets[text_idx]]\n", "print(\"Text 1:\", target_probas_1)\n", "\n", "text_idx = 1\n", "target_probas_2 = probas[text_idx, [0, 1, 2], targets[text_idx]]\n", "print(\"Text 2:\", target_probas_2)" ] }, { "cell_type": "markdown", "id": "a0e89a19-73c2-4e49-93b4-861f699f1cbf", "metadata": {}, "source": [ "- We want to maximize all of these values, bringing them close to a probability of 1.\n", "- In mathematical optimization, it is easier to maximize the logarithm of the probability score than the probability score itself; this is beyond the scope of this book, but I have recorded a lecture with more details here: [L8.2 Logistic Regression Loss Function](https://www.youtube.com/watch?v=GxJe0DZvydM)." ] }, { "cell_type": "code", "execution_count": 9, "id": "31402a67-a16e-4aeb-977e-70abb9c9949b", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "31402a67-a16e-4aeb-977e-70abb9c9949b", "outputId": "1bf18e79-1246-4eab-efd8-12b328c78678" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([ -9.5042, -10.3796, -11.3677, -11.4798, -9.7764, -12.2561])\n" ] } ], "source": [ "# Compute logarithm of all token probabilities\n", "log_probas = torch.log(torch.cat((target_probas_1, target_probas_2)))\n", "print(log_probas)" ] }, { "cell_type": "markdown", "id": "c4261441-a511-4633-9c4c-67998af31b84", "metadata": {}, "source": [ "- Next, we compute the average log probability:" ] }, { "cell_type": "code", "execution_count": 10, "id": "9b003797-161b-4d98-81dc-e68320e09fec", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "9b003797-161b-4d98-81dc-e68320e09fec", "outputId": "a447fe9c-7e27-40ed-f1fb-51210e3f7cc9" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor(-10.7940)\n" ] } ], "source": [ "# Calculate the average log probability over all tokens\n", "avg_log_probas = torch.mean(log_probas)\n", "print(avg_log_probas)" ] }, { "cell_type": "markdown", "id": "36d51994-ad17-4ba3-a6ec-f588b4b13585", "metadata": {}, "source": [ "- The goal is to make this average log probability as large as possible by optimizing the model weights.\n", "- Because the logarithm of a probability is never positive, the largest possible value is 0, and we are currently far away from 0." ] }, 
{ "cell_type": "markdown", "id": "3de388a1-8a0a-4c94-8894-9041dc6ad514", "metadata": {}, "source": [ "- In deep learning, the standard convention is to minimize the *negative* average log probability rather than to maximize the average log probability; in our case, instead of maximizing -10.7940 so that it approaches 0, we minimize 10.7940 so that it approaches 0.\n", "- The negative of -10.7940, that is, 10.7940, is also called the cross-entropy loss in deep learning." ] }, { "cell_type": "code", "execution_count": 11, "id": "176ddf35-1c5f-4d7c-bf17-70f3e7069bd4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor(10.7940)\n" ] } ], "source": [ "# Maximizing the average log probability is equivalent to minimizing its negative\n", "neg_avg_log_probas = avg_log_probas * -1\n", "print(neg_avg_log_probas)" ] }, { "cell_type": "markdown", "id": "84eeb868-abd8-4028-82db-107546bf7c2c", "metadata": {}, "source": [ "- PyTorch already implements a `cross_entropy` function that carries out the previous steps." ] }, { "cell_type": "markdown", "id": "5bd24b7f-b760-47ad-bc84-86d13794aa54", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "e8aaf9dd-3ee6-42bf-a63f-6e93dbfb989d", "metadata": {}, "source": [ "- Before we apply the `cross_entropy` function, let's check the shapes of the logits and the targets." ] }, { "cell_type": "code", "execution_count": 12, "id": "695d6f64-5084-4c23-aea4-105c9e38cfe4", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "695d6f64-5084-4c23-aea4-105c9e38cfe4", "outputId": "43fd802a-8136-4b35-df0d-f61a5d4cb561" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Logits shape: torch.Size([2, 3, 50257])\n", "Targets shape: torch.Size([2, 3])\n" ] } ], "source": [ "# Logits have shape (batch_size, num_tokens, vocab_size)\n", "print(\"Logits shape:\", logits.shape)\n", "\n", "# Targets have shape (batch_size, num_tokens)\n", "print(\"Targets shape:\", targets.shape)" ] }, { "cell_type": "markdown", "id": "1d3d65f0-6566-4865-93e4-0c0bcb10cd06", "metadata": {}, "source": [ "- For the `cross_entropy` function in PyTorch, we want to flatten these tensors by combining them over the batch dimension:" ] }, { "cell_type": "code", "execution_count": 13, "id": "0e17e027-ab9f-4fb5-ac9b-a009b831c122", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": 
"0e17e027-ab9f-4fb5-ac9b-a009b831c122", "outputId": "0b2b778b-02fb-43b2-c879-adc59055a7d8" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Flattened logits: torch.Size([6, 50257])\n", "Flattened targets: torch.Size([6])\n" ] } ], "source": [ "# Merge dimensions 0 and 1 of the logits into one, giving a 2D tensor\n", "logits_flat = logits.flatten(0, 1)\n", "# Flatten the targets into a 1D tensor\n", "targets_flat = targets.flatten()\n", "\n", "print(\"Flattened logits:\", logits_flat.shape)\n", "print(\"Flattened targets:\", targets_flat.shape)" ] }, { "cell_type": "markdown", "id": "4921a57f-3a79-473e-a863-6d63b495010f", "metadata": {}, "source": [ "- Note that the targets are token IDs, which also represent the index positions in the logits tensor whose values we want to maximize.\n", "- The `cross_entropy` function in PyTorch automatically applies the softmax and log-probability computation internally over those token indices in the logits that are to be maximized." ] }, { "cell_type": "code", "execution_count": 14, "id": "62d0816e-b29a-4c8f-a9a5-a167562de978", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "62d0816e-b29a-4c8f-a9a5-a167562de978", "outputId": "c0be634a-2c65-4ff7-a73f-1bfc2e406ba4" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor(10.7940)\n" ] } ], "source": [ "# The built-in function replaces the manual steps above\n", "loss = torch.nn.functional.cross_entropy(logits_flat, targets_flat)\n", "print(loss)" ] }, { "cell_type": "markdown", "id": "0f15ce17-fd7b-4d8e-99da-b237523a7a80", "metadata": {}, "source": [ "- A concept related to the cross-entropy loss is the perplexity of an LLM.\n", "- The perplexity is simply the exponential of the cross-entropy loss." ] }, { "cell_type": "code", "execution_count": 15, "id": "168952a1-b964-4aa7-8e49-966fa26add54", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "168952a1-b964-4aa7-8e49-966fa26add54", "outputId": "a0a692c1-6412-4068-8aa5-8858548141eb" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor(48725.8203)\n" ] } ], "source": [ "# Perplexity is the exponential of the cross-entropy loss\n", "perplexity = torch.exp(loss)\n", "print(perplexity)" ] }, { "cell_type": "markdown", "id": "71ae26dd-d77e-41fd-b924-6bd103dd4ee7", "metadata": {}, "source": [ "- 
Perplexity is often considered more interpretable because it can be understood as the effective vocabulary size that the model is uncertain about at each step (in the example above, that is 48,725 words or tokens).\n", "- In other words, perplexity measures how well the probability distribution predicted by the model matches the actual distribution of the words in the dataset.\n", "- As with the loss, a lower perplexity indicates that the model's predictions are closer to the actual distribution." ] }, { "cell_type": "markdown", "id": "2ec6c217-e429-40c7-ad71-5d0a9da8e487", "metadata": { "id": "2ec6c217-e429-40c7-ad71-5d0a9da8e487" }, "source": [ "### 5.1.3 Calculating the training and validation set losses" ] }, { "cell_type": "markdown", "id": "530da89e-2448-436c-8f1b-28e8a31ef85c", "metadata": {}, "source": [ "- We use a relatively small dataset to train the LLM (in fact, only one short story).\n", "- The reasons for choosing a short story are:\n", "  - The code runs quickly even on a computer without a dedicated GPU.\n", "  - Training takes little time (minutes instead of weeks).\n", "  - We use a text that is in the public domain, so it can be included in this GitHub repository without violating any usage rights or noticeably increasing the repository size.\n", "\n", "- For comparison, Llama 2 7B required 184,320 GPU hours on A100 GPUs to be trained on 2 trillion tokens.\n", "  - At the time of this writing, the hourly cost of an 8xA100 cloud server on AWS is approximately 30 USD.\n", "  - So, by a back-of-the-envelope calculation, training this LLM would cost 184,320 / 8 * 30 USD = 691,200 USD, or roughly 690,000 USD.\n", "\n", "- Below, we use the same dataset that we used in Chapter 2." ] }, { "cell_type": "code", "execution_count": 16, "id": "654fde37-b2a9-4a20-a8d3-0206c056e2ff", "metadata": {}, "outputs": [], "source": [ "import os\n", "import urllib.request\n", "\n", "file_path = \"the-verdict.txt\"\n", "url = \"https://raw.githubusercontent.com/rasbt/LLMs-from-scratch/main/ch02/01_main-chapter-code/the-verdict.txt\"\n", "\n", "# Download the dataset if it is not available locally; otherwise read it from disk\n", "if not os.path.exists(file_path):\n", "    with urllib.request.urlopen(url) as response:\n", "        text_data = response.read().decode('utf-8')\n", "    with open(file_path, \"w\", encoding=\"utf-8\") as file:\n", "        file.write(text_data)\n", "else:\n", "    with open(file_path, \"r\", encoding=\"utf-8\") as file:\n", "        text_data = file.read()" ] }, { "cell_type": "markdown", "id": "379330f1-80f4-4e34-8724-41d892b04cee", "metadata": {}, "source": [ "- A quick check that the text loaded correctly, by printing its first and last 99 characters:" ] }, { "cell_type": "code", "execution_count": 17, "id": "6kgJbe4ehI4q", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "id": "6kgJbe4ehI4q", "outputId": "9ff31e88-ee37-47e9-ee64-da6eb552f46f" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I HAD 
always thought Jack Gisburn rather a cheap genius--though a good fellow enough--so it was no \n" ] } ], "source": [ "# First 99 characters\n", "print(text_data[:99])" ] }, { "cell_type": "code", "execution_count": 18, "id": "j2XPde_ThM_e", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 35 }, "id": "j2XPde_ThM_e", "outputId": "a900c1b9-9a87-4078-968b-a5721deda5cb" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "it for me! The Strouds stand alone, and happen once--but there's no exterminating our kind of art.\"\n" ] } ], "source": [ "# Last 99 characters\n", "print(text_data[-99:])" ] }, { "cell_type": "code", "execution_count": 19, "id": "6b46a952-d50a-4837-af09-4095698f7fd1", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "6b46a952-d50a-4837-af09-4095698f7fd1", "outputId": "c2a25334-21ca-486e-8226-0296e5fc6486" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Characters: 20479\n", "Tokens: 5145\n" ] } ], "source": [ "# Count the characters, then encode the text to count its tokens\n", "total_characters = len(text_data)\n", "total_tokens = len(tokenizer.encode(text_data))\n", "\n", "print(\"Characters:\", total_characters)\n", "print(\"Tokens:\", total_tokens)" ] }, { "cell_type": "markdown", "id": "a8830cb9-90f6-4e7c-8620-beeabc2d39f7", "metadata": {}, "source": [ "- With 5,145 tokens, this text is very short for training an LLM; we chose it as an example for educational purposes." ] }, { "cell_type": "markdown", "id": "bedcad87-a0e8-4b9d-ac43-4e927ccbb50f", "metadata": {}, "source": [ "- Next, we divide the dataset into a training set and a validation set and use the data loaders from Chapter 2 to prepare the batches for LLM training.\n", "- For visualization purposes, the figure below assumes `max_length=6`, but for the training loader, we set `max_length` equal to the context length that the LLM supports.\n", "- For simplicity, the figure below only shows the input tokens:\n", "  - Since we train the LLM to predict the next word in the text, the targets look the same as the inputs, except that they are shifted by one position." ] }, { "cell_type": "markdown", "id": "46bdaa07-ba96-4ac1-9d71-b3cc153910d9", "metadata": {}, "source": [ "" ] }, { "cell_type": "code", "execution_count": 20, "id": "0959c855-f860-4358-8b98-bc654f047578", "metadata": {}, "outputs": [], "source": [ "from previous_chapters import 
create_dataloader_v1\n", "\n", "# Train/validation ratio (change train_ratio to adjust the split)\n", "train_ratio = 0.90\n", "split_idx = int(train_ratio * len(text_data))\n", "train_data = text_data[:split_idx]\n", "val_data = text_data[split_idx:]\n", "\n", "\n", "torch.manual_seed(123)  # keep the shuffling reproducible\n", "\n", "# Training loader: max_length caps each sample at the context length, stride sets the step between\n", "# samples, drop_last discards an incomplete final batch, and shuffle randomizes the sample order\n", "train_loader = create_dataloader_v1(\n", "    train_data,\n", "    batch_size=2,\n", "    max_length=GPT_CONFIG_124M[\"context_length\"],\n", "    stride=GPT_CONFIG_124M[\"context_length\"],\n", "    drop_last=True,\n", "    shuffle=True,\n", "    num_workers=0\n", ")\n", "\n", "# Validation loader: same settings, except that the last batch is kept and shuffling is disabled\n", "val_loader = create_dataloader_v1(\n", "    val_data,\n", "    batch_size=2,\n", "    max_length=GPT_CONFIG_124M[\"context_length\"],\n", "    stride=GPT_CONFIG_124M[\"context_length\"],\n", "    drop_last=False,\n", "    shuffle=False,\n", "    num_workers=0\n", ")" ] }, { "cell_type": "code", "execution_count": 21, "id": "f37b3eb0-854e-4895-9898-fa7d1e67566e", "metadata": {}, "outputs": [], "source": [ "# Sanity check: each split needs at least context_length tokens to form one sample\n", "\n", "if total_tokens * (train_ratio) < GPT_CONFIG_124M[\"context_length\"]:\n", "    print(\"Not enough tokens for the training loader. \"\n", "          \"Try to lower the `GPT_CONFIG_124M['context_length']` or \"\n", "          \"increase the `train_ratio`\")\n", "\n", "if total_tokens * (1-train_ratio) < GPT_CONFIG_124M[\"context_length\"]:\n", "    print(\"Not enough tokens for the validation loader. \"\n", "          \"Try to lower the `GPT_CONFIG_124M['context_length']` or \"\n", "          \"decrease the `train_ratio`\")" ] }, 
{ "cell_type": "markdown", "id": "e7ac3296-a4d1-4303-9ac5-376518960c33", "metadata": {}, "source": [ "- We use a relatively small batch size, which keeps the computational cost low and suits such a small dataset.\n", "- For comparison, Llama 2 7B was trained with a batch size of 1024." ] }, { "cell_type": "markdown", "id": "a8e0514d-b990-4dc0-9afb-7721993284a0", "metadata": {}, "source": [ "- An optional check that the data was loaded correctly:" ] }, { "cell_type": "code", "execution_count": 22, "id": "ca0116d0-d229-472c-9fbf-ebc229331c3e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train loader:\n", "torch.Size([2, 256]) torch.Size([2, 256])\n", "torch.Size([2, 256]) torch.Size([2, 256])\n", "torch.Size([2, 256]) torch.Size([2, 256])\n", "torch.Size([2, 256]) torch.Size([2, 256])\n", "torch.Size([2, 256]) torch.Size([2, 256])\n", "torch.Size([2, 256]) torch.Size([2, 256])\n", "torch.Size([2, 256]) torch.Size([2, 256])\n", "torch.Size([2, 256]) torch.Size([2, 256])\n", "torch.Size([2, 256]) torch.Size([2, 256])\n", "\n", "Validation loader:\n", "torch.Size([2, 256]) torch.Size([2, 256])\n" ] } ], "source": [ "print(\"Train loader:\")\n", "for x, y in train_loader:\n", "    print(x.shape, y.shape)\n", "\n", "print(\"\\nValidation loader:\")\n", "for x, y in val_loader:\n", "    print(x.shape, y.shape)" ] }, { "cell_type": "markdown", "id": "f7b9b1a4-863d-456f-a8dd-c07fb5c024ed", "metadata": {}, "source": [ "- Another optional check, confirming that the token counts are in the expected range:" ] }, { "cell_type": "code", "execution_count": 23, "id": "eb860488-5453-41d7-9870-23b723f742a0", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "eb860488-5453-41d7-9870-23b723f742a0", "outputId": "96b9451a-9557-4126-d1c8-51610a1995ab" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training tokens: 4608\n", "Validation tokens: 512\n", "All tokens: 5120\n" ] } ], "source": [ "# .numel() returns the total number of elements in a tensor, regardless of its shape,\n", "# so summing it over all batches gives the number of tokens in each loader\n", "train_tokens = 0\n", "for input_batch, target_batch in train_loader:\n", "    train_tokens += input_batch.numel()\n", "\n", "val_tokens = 0\n", "for input_batch, target_batch in val_loader:\n", "    val_tokens += input_batch.numel()\n", "\n", "print(\"Training tokens:\", train_tokens)\n", "print(\"Validation tokens:\", val_tokens)\n", "print(\"All tokens:\", train_tokens + val_tokens)" ] }, 
{ "cell_type": "markdown", "id": "5c3085e8-665e-48eb-bb41-cdde61537e06", "metadata": {}, "source": [ "- Next, we define a utility function that computes the cross-entropy loss of a given batch.\n", "- In addition, we define a second utility function that computes the loss over a user-specified number of batches in a data loader." ] }, { "cell_type": "code", "execution_count": 24, "id": "7b9de31e-4096-47b3-976d-b6d2fdce04bc", "metadata": { "id": "7b9de31e-4096-47b3-976d-b6d2fdce04bc" }, "outputs": [], "source": [ "def calc_loss_batch(input_batch, target_batch, model, device):\n", "    # Move the batch to the same device as the model (e.g., the GPU)\n", "    input_batch, target_batch = input_batch.to(device), target_batch.to(device)\n", "    logits = model(input_batch)\n", "    # Flatten the logits to 2D and the targets to 1D, then compute the cross-entropy loss\n", "    loss = torch.nn.functional.cross_entropy(logits.flatten(0, 1), target_batch.flatten())\n", "    return loss\n", "\n", "\n", "def calc_loss_loader(data_loader, model, device, num_batches=None):\n", "    total_loss = 0.\n", "    if len(data_loader) == 0:\n", "        return float(\"nan\")\n", "    elif num_batches is None:\n", "        num_batches = len(data_loader)\n", "    else:\n", "        # If num_batches exceeds the number of batches in the data loader,\n", "        # reduce it to match the total number of batches available\n", "        num_batches = min(num_batches, len(data_loader))\n", "    for i, (input_batch, target_batch) in enumerate(data_loader):\n", "        if i < num_batches:\n", "            loss = calc_loss_batch(input_batch, target_batch, model, device)\n", "            total_loss += loss.item()  # accumulate the per-batch loss\n", "        else:\n", "            break\n", "    return total_loss / num_batches" ] }, { "cell_type": "markdown", "id": "f0691332-84d0-48b3-b462-a885ddeb4fca", "metadata": {}, "source": [ "- If you have a machine with a CUDA-supported GPU, the LLM will train on the GPU without any changes to the code.\n", "- Via the `device` setting, we ensure that the data is loaded onto the same device as the LLM." ] }, { "cell_type": "code", "execution_count": 25, "id": 
"56f5b0c9-1065-4d67-98b9-010e42fc1e2a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Training loss: 10.987583584255642\n", "Validation loss: 10.98110580444336\n" ] } ], "source": [ "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")  # use the GPU if available\n", "\n", "# Note:\n", "# Uncommenting the following lines allows the code to run on Apple Silicon chips, if applicable,\n", "# which is approximately 2x faster than on an Apple CPU (as measured on an M3 MacBook Air).\n", "# However, the resulting loss values may be slightly different.\n", "\n", "#if torch.cuda.is_available():\n", "#    device = torch.device(\"cuda\")\n", "#elif torch.backends.mps.is_available():\n", "#    device = torch.device(\"mps\")\n", "#else:\n", "#    device = torch.device(\"cpu\")\n", "#\n", "# print(f\"Using {device} device.\")\n", "\n", "model.to(device)  # no assignment model = model.to(device) necessary for nn.Module classes\n", "\n", "torch.manual_seed(123)  # For reproducibility due to the shuffling in the data loader\n", "\n", "with torch.no_grad():  # Disable gradient tracking for efficiency because we are not training yet\n", "    train_loss = calc_loss_loader(train_loader, model, device)\n", "    val_loss = calc_loss_loader(val_loader, model, device)\n", "\n", "print(\"Training loss:\", train_loss)\n", "print(\"Validation loss:\", val_loss)" ] }, { "cell_type": "markdown", "id": "43875e95-190f-4b17-8f9a-35034ba649ec", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "b9339f8d-00cb-4206-af67-58c32bd72055", "metadata": { "id": "b9339f8d-00cb-4206-af67-58c32bd72055" }, "source": [ "## 5.2 Training an LLM" ] }, { "cell_type": "markdown", "id": "652a4cf4-e98f-46d9-bdec-60e7ccb8d6bd", "metadata": {}, "source": [ "- In this section, we finally implement the code for training the LLM.\n", "- We focus on a simple training function (if you are interested in augmenting this training function with more advanced techniques, such as learning rate warmup, cosine annealing, and gradient clipping, please refer to [Appendix D](../../appendix-D/01_main-chapter-code)).\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 26, "id": "Mtp4gY0ZO-qq", "metadata": { "id": "Mtp4gY0ZO-qq" }, "outputs": [], "source": [ "def train_model_simple(model, train_loader, val_loader, optimizer, device, num_epochs,\n", "                       eval_freq, 
eval_iter, start_context, tokenizer):\n", "    # Initialize lists to track losses and tokens seen\n", "    train_losses, val_losses, track_tokens_seen = [], [], []\n", "    tokens_seen, global_step = 0, -1\n", "\n", "    # Main training loop\n", "    for epoch in range(num_epochs):\n", "        model.train()  # Set model to training mode\n", "\n", "        for input_batch, target_batch in train_loader:\n", "            optimizer.zero_grad()  # Reset loss gradients from previous batch iteration\n", "            loss = calc_loss_batch(input_batch, target_batch, model, device)\n", "            loss.backward()  # Calculate loss gradients\n", "            optimizer.step()  # Update model weights using loss gradients\n", "            tokens_seen += input_batch.numel()  # Running count of tokens processed\n", "            global_step += 1  # Running count of training steps\n", "\n", "            # Optional evaluation step, run every eval_freq steps\n", "            if global_step % eval_freq == 0:\n", "                train_loss, val_loss = evaluate_model(\n", "                    model, train_loader, val_loader, device, eval_iter)\n", "                train_losses.append(train_loss)\n", "                val_losses.append(val_loss)\n", "                track_tokens_seen.append(tokens_seen)\n", "                print(f\"Ep {epoch+1} (Step {global_step:06d}): \"\n", "                      f\"Train loss {train_loss:.3f}, Val loss {val_loss:.3f}\")\n", "\n", "        # Print a sample text after each epoch\n", "        generate_and_print_sample(\n", "            model, tokenizer, device, start_context\n", "        )\n", "\n", "    return train_losses, val_losses, track_tokens_seen\n", "\n", "\n", "def evaluate_model(model, train_loader, val_loader, device, eval_iter):\n", "    model.eval()  # Switch to evaluation mode (disables dropout)\n", "    with torch.no_grad():  # Gradients are not needed for evaluation\n", "        train_loss = calc_loss_loader(train_loader, model, device, num_batches=eval_iter)\n", "        val_loss = calc_loss_loader(val_loader, model, device, num_batches=eval_iter)\n", "    model.train()  # Switch back to training mode so that training can continue\n", "    return train_loss, val_loss\n", "\n", "\n", "def 
generate_and_print_sample(model, tokenizer, device, start_context):\n", "    model.eval()\n", "    context_size = model.pos_emb.weight.shape[0]\n", "    encoded = text_to_token_ids(start_context, tokenizer).to(device)\n", "    with torch.no_grad():\n", "        token_ids = generate_text_simple(\n", "            model=model, idx=encoded,\n", "            max_new_tokens=50, context_size=context_size\n", "        )\n", "    decoded_text = token_ids_to_text(token_ids, tokenizer)\n", "    print(decoded_text.replace(\"\\n\", \" \"))  # Compact print format\n", "    model.train()" ] }, { "cell_type": "markdown", "id": "9252ffa4-8162-466e-b347-27cb89b9a5ee", "metadata": {}, "source": [ "(Explanation generated by GPT)\n", "- `global_step` is a key counter in a training loop, commonly used to drive learning rate schedules, logging, model checkpointing, and stopping conditions; in `train_model_simple` above, it determines when the evaluation step runs.\n", "- In batched training, it provides a single reference point that helps manage complex training workflows.\n", "- An epoch counts full passes over the dataset, whereas `global_step` counts batches; this finer granularity suits tasks that need more precise control. For example:\n", "  1. Dynamic learning rate adjustment: some learning rate schedulers update per batch rather than per epoch. Warmup, for instance, gradually increases the learning rate over a fixed number of initial steps.\n", "  2. Frequent logging: training logs are usually written every `log_interval` steps rather than once per epoch.\n", "  3. Checkpointing: the model state is usually saved at step intervals, and when training has to be interrupted and resumed, `global_step` is the more precise marker of progress." ] }, { "cell_type": "markdown", "id": "a301b333-b9d4-4eeb-a212-3a9874e3ac47", "metadata": {}, "source": [ "- Now, let's train the LLM using the training function defined above:" ] }, { "cell_type": "code", "execution_count": 27, "id": "3422000b-7aa2-485b-92df-99372cd22311", "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "3422000b-7aa2-485b-92df-99372cd22311", "outputId": "0e046603-908d-4093-8ae5-ef2f632639fb" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Ep 1 (Step 000000): Train loss 9.781, Val loss 9.933\n", "Ep 1 (Step 000005): Train loss 8.111, Val loss 8.339\n", "Every effort moves you,,,,,,,,,,,,. 
\n", "Ep 2 (Step 000010): Train loss 6.661, Val loss 7.048\n", "Ep 2 (Step 000015): Train loss 5.961, Val loss 6.616\n", "Every effort moves you, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and, and,, and, and,\n", "Ep 3 (Step 000020): Train loss 5.726, Val loss 6.600\n", "Ep 3 (Step 000025): Train loss 5.201, Val loss 6.348\n", "Every effort moves you, and I had been. \n", "Ep 4 (Step 000030): Train loss 4.417, Val loss 6.278\n", "Ep 4 (Step 000035): Train loss 4.069, Val loss 6.226\n", "Every effort moves you know the \"I he had the donkey and I had the and I had the donkey and down the room, I had\n", "Ep 5 (Step 000040): Train loss 3.732, Val loss 6.160\n", "Every effort moves you know it was not that the picture--I had the fact by the last I had been--his, and in the \"Oh, and he said, and down the room, and in\n", "Ep 6 (Step 000045): Train loss 2.850, Val loss 6.179\n", "Ep 6 (Step 000050): Train loss 2.427, Val loss 6.141\n", "Every effort moves you know,\" was one of the picture. The--I had a little of a little: \"Yes, and in fact, and in the picture was, and I had been at my elbow and as his pictures, and down the room, I had\n", "Ep 7 (Step 000055): Train loss 2.104, Val loss 6.134\n", "Ep 7 (Step 000060): Train loss 1.882, Val loss 6.233\n", "Every effort moves you know,\" was one of the picture for nothing--I told Mrs. \"I was no--as! The women had been, in the moment--as Jack himself, as once one had been the donkey, and were, and in his\n", "Ep 8 (Step 000065): Train loss 1.320, Val loss 6.238\n", "Ep 8 (Step 000070): Train loss 0.985, Val loss 6.242\n", "Every effort moves you know,\" was one of the axioms he had been the tips of a self-confident moustache, I felt to see a smile behind his close grayish beard--as if he had the donkey. 
\"strongest,\" as his\n", "Ep 9 (Step 000075): Train loss 0.717, Val loss 6.293\n", "Ep 9 (Step 000080): Train loss 0.541, Val loss 6.393\n", "Every effort moves you?\" \"Yes--quite insensible to the irony. She wanted him vindicated--and by me!\" He laughed again, and threw back the window-curtains, I had the donkey. \"There were days when I\n", "Ep 10 (Step 000085): Train loss 0.391, Val loss 6.452\n", "Every effort moves you know,\" was one of the axioms he laid down across the Sevres and silver of an exquisitely appointed luncheon-table, when, on a later day, I had again run over from Monte Carlo; and Mrs. Gis\n" ] } ], "source": [ "# Note:\n", "# Uncomment the following code to calculate the execution time\n", "# import time\n", "# start_time = time.time()\n", "\n", "torch.manual_seed(123)\n", "model = GPTModel(GPT_CONFIG_124M)\n", "model.to(device)\n", "# AdamW optimizer with a learning rate of 0.0004 and a weight decay of 0.1\n", "optimizer = torch.optim.AdamW(model.parameters(), lr=0.0004, weight_decay=0.1)\n", "\n", "num_epochs = 10  # train for 10 epochs\n", "# Evaluate every 5 steps on 5 batches; after each epoch, print a sample continuation of start_context\n", "train_losses, val_losses, tokens_seen = train_model_simple(\n", "    model, train_loader, val_loader, optimizer, device,\n", "    num_epochs=num_epochs, eval_freq=5, eval_iter=5,\n", "    start_context=\"Every effort moves you\", tokenizer=tokenizer\n", ")\n", "\n", "# Note:\n", "# Uncomment the following code to show the execution time\n", "# end_time = time.time()\n", "# execution_time_minutes = (end_time - start_time) / 60\n", "# print(f\"Training completed in {execution_time_minutes:.2f} minutes.\")" ] }, { "cell_type": "code", "execution_count": 28, "id": "0WSRu2i0iHJE", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 487 }, "id": "0WSRu2i0iHJE", "outputId": "9d36c61b-517d-4f07-a7e8-4563aff78b11" }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAeoAAAEiCAYAAAA21pHjAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy80BEi2AAAACXBIWXMAAA9hAAAPYQGoP6dpAABXqUlEQVR4nO3dd3gU5drH8e9uyqb3DiQEElLo3RCwEQmISFFRT1RAlCMdEUVUEGyIIgdBDlhe4VgQGyBSBaSGKhCKhNBCQkmhpZOQZJ/3jyUbliaBhN2E+3Ndc7E788zMvUOS387MMzMapZRCCCGEEBZJa+4ChBBCCHF9EtRCCCGEBZOgFkIIISyYBLUQQghhwSSohRBCCAsmQS2EEEJYMAlqIYQQwoJJUAshhBAWTIJaCCGEsGAS1ELUAMeOHUOj0ZCQkGDuUoQQlUyCWggLodFobjiMHz/e3CUKIczA2twFCCEM0tLSjK9//PFHxo0bR1JSknGck5OTOcoSQpiZ7FELYSH8/PyMg6urKxqNxvjex8eHKVOmULt2bXQ6Hc2aNWP58uXXXVZpaSnPP/884eHhpKamAvDbb7/RokUL7OzsqFevHhMmTKCkpMQ4j0aj4auvvqJnz544ODgQGhrKokWLjNPPnz9PXFwc3t7e2NvbExoayuzZs69bwy+//ELjxo2xt7fH09OTmJgY8vPzjdO/+uorIiIisLOzIzw8nP/+978m8x8/fpzevXvj5uaGh4cH3bt359ixY8bpffv2pUePHkyePBl/f388PT0ZPHgwxcXFN73NhagWlBDC4syePVu5uroa30+ZMkW5uLioH374QR04cEC99tprysbGRh08eFAppVRycrIC1K5du1RhYaHq2bOnat68ucrMzFRKKbV+/Xrl4uKi5syZo44cOaL++OMPVbduXTV+/HjjOgBVu3ZtNXfuXHXo0CE1bNgw5eTkpM6ePauUUmrw4MGqWbNmavv27So5OVmtXLlSLVq06Jr1nzp1SllbW6spU6ao5ORktWfPHjVjxgyVm5urlFLqu+++U/7+/urXX39VR48eVb/++qvy8PBQc+bMUUopdfHiRRUREaGef/55tWfPHrV//371r3/9S4WFhamioiKllFJ9+vRRLi4u6qWXXlKJiYnq999/Vw4ODuqLL76o3P8MIcxMgloIC3RlUAcEBKj333/fpE3r1q3VoEGDlFLlQb1hwwbVsWNH1b59e5WVlWVs27FjR/XBBx+YzP/tt98qf39/43tAvfXWW8b3eXl5ClDLli1TSinVrVs31a9fv5uqf8eOHQpQx44du+b0+vXrq7lz55qMe/fdd1VUVJSxtrCwMKXX643Ti4qKlL29vVqxYoVSyhDUQUFBqqSkxNjmiSeeUE8++eRN1ShEdSHnqIWwcDk5OZw6dYro6GiT8dHR0ezevdtk3NNPP03t2rX5888/sbe3N47fvXs38fHxvP/++8ZxpaWlFBYWUlBQgIODAwBNmjQxTnd0dMTFxYXMzEwABg4cyGOPPcbOnTvp1KkTPXr0oF27dtesuWnTpnTs2JHGjRsTGxtLp06dePzxx3F3dyc/P58jR47Qv39/XnzxReM8JSUluLq6Gus9fPgwzs7OJsstLCzkyJEjxvcNGzbEysrK+N7f35+9e/feYGsKUf1IUAtRgzz88MN89913bN68mQcffNA4Pi8vjwkTJtCrV6+r5rGzszO+trGxMZmm0WjQ6/UAdOnShZSUFJYuXcrKlSvp2LEjgwcPZvLkyVct08rKipUrV7Jp0yb++OMPpk+fzptvvsnWrVuNXwq+/PJL2rZte9V8ZfW2bNmS77///qple3t731S9QtQUEtRCWDgXFxcCAgKIj4/nvvvuM46Pj4+nTZs2Jm0HDhxIo0aNePTRR1myZImxfYsWLUhKSiIkJOS2avH29qZPnz706dOHDh068Oqrr14zqMEQmtHR0URHRzNu3DiCgoJYsGABI0eOJCAggKNHjxI
XF3fNeVu0aMGPP/6Ij48PLi4ut1WzENWdBLUQ1cCrr77K22+/Tf369WnWrBmzZ88mISHhmnucQ4cOpbS0lEceeYRly5bRvn17xo0bxyOPPEJgYCCPP/44Wq2W3bt3s2/fPt57772bqmHcuHG0bNmShg0bUlRUxOLFi4mIiLhm261bt7J69Wo6deqEj48PW7du5fTp08b2EyZMYNiwYbi6utK5c2eKior466+/OH/+PCNHjiQuLo6PP/6Y7t27884771C7dm1SUlKYP38+r732GrVr1771jSlENSNBLUQ1MGzYMLKzs3nllVfIzMwkMjKSRYsWERoaes32I0aMQK/X8/DDD7N8+XJiY2NZvHgx77zzDpMmTcLGxobw8HBeeOGFm67B1taWMWPGcOzYMezt7enQoQPz5s27ZlsXFxfWr1/P1KlTycnJISgoiE8++YQuXboA8MILL+Dg4MDHH3/Mq6++iqOjI40bN2bEiBEAODg4sH79ekaPHk2vXr3Izc2lVq1adOzYUfawxV1Ho5RS5i5CCCGEENcmNzwRQgghLJgEtRBCCGHBJKiFEEIICyZBLYQQQlgwCWohhBDCgklQCyGEEBZMgvo6ZsyYQd26dbGzs6Nt27Zs27bN3CVZhPXr19OtWzcCAgLQaDQsXLjQZLpSinHjxuHv74+9vT0xMTEcOnTIpM25c+eIi4vDxcUFNzc3+vfvT15enkmbPXv20KFDB+zs7KhTpw4fffTRVbX8/PPPhIeHY2dnR+PGjVm6dGmlf947aeLEibRu3RpnZ2d8fHzo0aOHyfOowXCv68GDB+Pp6YmTkxOPPfYYGRkZJm1SU1Pp2rUrDg4O+Pj48Oqrr5o8zhJg7dq1tGjRAp1OR0hICHPmzLmqnpr4OzBz5kyaNGmCi4sLLi4uREVFsWzZMuN02b6V68MPP0Sj0RivjwfZxrfEzA8FsUjz5s1Ttra26uuvv1Z///23evHFF5Wbm5vKyMgwd2lmt3TpUvXmm2+q+fPnK0AtWLDAZPqHH36oXF1d1cKFC9Xu3bvVo48+qoKDg9WFCxeMbTp37qyaNm2qtmzZojZs2KBCQkLU008/bZyenZ2tfH19VVxcnNq3b5/64YcflL29vfr888+NbeLj45WVlZX66KOP1P79+9Vbb72lbGxs1N69e6t8G1SV2NhYNXv2bLVv3z6VkJCgHn74YRUYGKjy8vKMbV566SVVp04dtXr1avXXX3+pe+65R7Vr1844vaSkRDVq1EjFxMSoXbt2qaVLlyovLy81ZswYY5ujR48qBwcHNXLkSLV//341ffp0ZWVlpZYvX25sU1N/BxYtWqSWLFmiDh48qJKSktQbb7yhbGxs1L59+5RSsn0r07Zt21TdunVVkyZN1PDhw43jZRtXnAT1NbRp00YNHjzY+L60tFQFBASoiRMnmrEqy3NlUOv1euXn56c+/vhj47isrCyl0+nUDz/8oJRSav/+/QpQ27dvN7ZZtmyZ0mg06uTJk0oppf773/8qd3d343OHlVJq9OjRKiwszPi+d+/eqmvXrib1tG3bVv373/+u1M9oTpmZmQpQ69atU0oZtqWNjY36+eefjW0SExMVoDZv3qyUMnyR0mq1Kj093dhm5syZysXFxbg9X3vtNdWwYUOTdT355JMqNjbW+P5u+h1wd3dXX331lWzfSpSbm6tCQ0PVypUr1X333WcMatnGt0YOfV/h4sWL7Nixg5iYGOM4rVZLTEwMmzdvNmNlli85OZn09HSTbefq6krbtm2N227z5s24ubnRqlUrY5uYmBi0Wi1bt241trn33nuxtbU1tomNjSUpKYnz588b21y+nrI2Nen/KDs7GwAPDw8AduzYQXFxscnnDg8PJzAw0GT7Nm7cGF9fX2Ob2NhYcnJy+Pvvv41tbrTt7pbfgdLSUubNm0d+fj5RUVGyfSvR4MGD6dq161XbQbbxrZF7fV/hzJkzlJaWmvyQAPj6+nLgwAEzVVU9pKe
nA1xz25VNS09Px8fHx2S6tbU1Hh4eJm2Cg4OvWkbZNHd3d9LT02+4nupOr9czYsQIoqOjadSoEWD47La2tri5uZm0vXL7Xmu7lE27UZucnBwuXLjA+fPna/TvwN69e4mKiqKwsBAnJycWLFhAZGQkCQkJsn0rwbx589i5cyfbt2+/apr8DN8aCWohLNDgwYPZt28fGzduNHcpNU5YWBgJCQlkZ2fzyy+/0KdPH9atW2fusmqE48ePM3z4cFauXGnynHNxe+TQ9xW8vLywsrK6qhdiRkYGfn5+ZqqqeijbPjfadn5+fmRmZppMLykp4dy5cyZtrrWMy9dxvTY14f9oyJAhLF68mDVr1pg8ztHPz4+LFy+SlZVl0v7K7Xur287FxQV7e/sa/ztga2tLSEgILVu2ZOLEiTRt2pRPP/1Utm8l2LFjB5mZmbRo0QJra2usra1Zt24d06ZNw9raGl9fX9nGt0CC+gq2tra0bNmS1atXG8fp9XpWr15NVFSUGSuzfMHBwfj5+Zlsu5ycHLZu3WrcdlFRUWRlZbFjxw5jmz///BO9Xk/btm2NbdavX09xcbGxzcqVKwkLC8Pd3d3Y5vL1lLWpzv9HSimGDBnCggUL+PPPP686/N+yZUtsbGxMPndSUhKpqakm23fv3r0mX4ZWrlyJi4sLkZGRxjY32nZ32++AXq+nqKhItm8l6NixI3v37iUhIcE4tGrViri4OONr2ca3wNy92SzRvHnzlE6nU3PmzFH79+9XAwYMUG5ubia9EO9Wubm5ateuXWrXrl0KUFOmTFG7du1SKSkpSinD5Vlubm7qt99+U3v27FHdu3e/5uVZzZs3V1u3blUbN25UoaGhJpdnZWVlKV9fX/Xss8+qffv2qXnz5ikHB4erLs+ytrZWkydPVomJiertt9+u9pdnDRw4ULm6uqq1a9eqtLQ041BQUGBs89JLL6nAwED1559/qr/++ktFRUWpqKgo4/SyS1s6deqkEhIS1PLly5W3t/c1L2159dVXVWJiopoxY8Y1L22pib8Dr7/+ulq3bp1KTk5We/bsUa+//rrSaDTqjz/+UErJ9q0Kl/f6Vkq28a2QoL6O6dOnq8DAQGVra6vatGmjtmzZYu6SLMKaNWsUcNXQp08fpZThEq2xY8cqX19fpdPpVMeOHVVSUpLJMs6ePauefvpp5eTkpFxcXFS/fv1Ubm6uSZvdu3er9u3bK51Op2rVqqU+/PDDq2r56aefVIMGDZStra1q2LChWrJkSZV97jvhWtsVULNnzza2uXDhgho0aJByd3dXDg4OqmfPniotLc1kOceOHVNdunRR9vb2ysvLS73yyiuquLjYpM2aNWtUs2bNlK2trapXr57JOsrUxN+B559/XgUFBSlbW1vl7e2tOnbsaAxppWT7VoUrg1q2ccVplFLKPPvyQgghhPgnco5aCCGEsGAS1EIIIYQFk6AWQgghLJgEtRBCCGHBJKiFEEIICyZBLYQQQlgwCeobKCoqYvz48RQVFZm7lBpJtm/Vku1b9WQbVy3ZvgZyHfUN5OTk4OrqSnZ2Ni4uLuYup8aR7Vu1ZPtWPdnGVUu2r4HsUQshhBAWTIJaCCGEsGA1/nnUJSUl7Nq1C19fX7Tain0vyc3NBeDkyZPk5ORURXl3Ndm+VUu2b9WTbVy1avL21ev1ZGRk0Lx5c6ytbxzFNf4c9fbt22nTpo25yxBCCCGusm3bNlq3bn3DNjV+j9rX1xcwbAx/f38zVyOEEEJAWloabdq0MWbUjdT4oC473O3v70/t2rXNXI0QQghR7mZOyZq1M9n69evp1q0bAQEBaDQaFi5caDJdKcW4cePw9/fH3t6emJgYDh06ZJ5ihRBCCDMwa1Dn5+fTtGlTZsyYcc3pH330EdOmTWPWrFls3boVR0dHYmNjKSwsvMOVCiGEEOZh1kPfXbp0oUuXLtecppRi6tSpvPXWW3Tv3h2Ab775Bl9fXxYuXMhTTz11J0sVQgg
hzMJiz1EnJyeTnp5OTEyMcZyrqytt27Zl8+bNEtRCiCpRWlpKcXGxucsQ1ZyNjQ1WVlaVsiyLDer09HSAq3rE+fr6GqddS1FRkcl9YcuuwxNCiBtRSpGenk5WVpa5SxE1hJubG35+fmg0mttajsUG9a2aOHEiEyZMqJqFl5bA6gkQfB+ExvxzeyFEtVEW0j4+Pjg4ONz2H1dx91JKUVBQQGZmJsBtXxpssUHt5+cHQEZGhsmHzMjIoFmzZtedb8yYMYwcOdL4/uTJk0RGRlZOUdu+gE3TYOf/YMBa8KhXOcsVQphVaWmpMaQ9PT3NXY6oAezt7QHIzMzEx8fntg6DW+y9voODg/Hz82P16tXGcTk5OWzdupWoqKjrzqfT6XBxcTEOzs7OlVbTL9pYjuoioDAb5sVBUV6lLVsIYT5l56QdHBzMXImoScp+nm63z4NZgzovL4+EhAQSEhIAQweyhIQEUlNT0Wg0jBgxgvfee49Fixaxd+9ennvuOQICAujRo8cdr/VU1gXe/P0gT2cPJt/GEzL3w6IhULPvwCrEXUUOd4vKVFk/T2YN6r/++ovmzZvTvHlzAEaOHEnz5s0ZN24cAK+99hpDhw5lwIABtG7dmry8PJYvX46dnd0drzXAzZ53ezQiAw/65g9Br7GGvxdA/Kd3vBYhhBB3D7MG9f33349S6qphzpw5gOHbyDvvvEN6ejqFhYWsWrWKBg0amK3e3q3q0LtVbbbrw5ik6WcYuXoCHF594xmFEKIaqVu3LlOnTr3p9mvXrkWj0VR5j/k5c+bg5uZWpeuwRBZ7jtpSvdO9EeF+znxecD+r7WNB6eGX5+FcsrlLE0LcZTQazQ2H8ePH39Jyt2/fzoABA266fbt27UhLS8PV1fWW1iduTIK6guxsrJj5TEucdDYMPP8vTjo2hMIs+PEZuJhv7vKEEHeRtLQ04zB16lRcXFxMxo0aNcrYVilFSUnJTS3X29u7Qh3rbG1tK+V6YXFtEtS3INjLkY8eb8JFbOh1diBFdl6QsQ9+k85lQog7x8/Pzzi4urqi0WiM7w8cOICzszPLli2jZcuW6HQ6Nm7cyJEjR+jevTu+vr44OTnRunVrVq1aZbLcKw99azQavvrqK3r27ImDgwOhoaEsWrTIOP3KQ99lh6hXrFhBREQETk5OdO7cmbS0NOM8JSUlDBs2DDc3Nzw9PRk9ejR9+vSpcGfhmTNnUr9+fWxtbQkLC+Pbb781TlNKMX78eAIDA9HpdAQEBDBs2DDj9P/+97+EhoZiZ2eHr68vjz/+eIXWfadIUN+ihxv707ddXTLwYEDhMJTWGv6eD5umm7s0IUQlUEpRcLHELIOqxC/8r7/+Oh9++CGJiYk0adKEvLw8Hn74YVavXs2uXbvo3Lkz3bp1IzU19YbLmTBhAr1792bPnj08/PDDxMXFce7cueu2LygoYPLkyXz77besX7+e1NRUkz38SZMm8f333zN79mzi4+PJycm56gmK/2TBggUMHz6cV155hX379vHvf/+bfv36sWbNGgB+/fVX/vOf//D5559z6NAhFi5cSOPGjQFDZ+Zhw4bxzjvvkJSUxPLly7n33nsrtP47xWJveFIdvPFwBAnHs1h3PIRZni8yMH8mrPsImsWBo9w0QYjq7EJxKZHjVphl3fvficXBtnL+PL/zzjs89NBDxvceHh40bdrU+P7dd99lwYIFLFq0iCFDhlx3OX379uXpp58G4IMPPmDatGls27aNzp07X7N9cXExs2bNon79+gAMGTKEd955xzh9+vTpjBkzhp49ewLw2WefsXTp0gp9tsmTJ9O3b18GDRoEGK4c2rJlC5MnT+aBBx4gNTUVPz8/YmJisLGxITAwkDZt2gCQmpqKo6MjjzzyCM7OzgQFBRmvQLI0skd9G2yttcyIa4Gbgw2TzrZnjW8feH65hLQQwmK0atXK5H1eXh6jRo0iIiICNzc3nJycSExM/Mc96iZNmhhfOzo64uL
iYrxF5rU4ODgYQxoMt9Esa5+dnU1GRoYxNAGsrKxo2bJlhT5bYmIi0dHRJuOio6NJTEwE4IknnuDChQvUq1ePF198kQULFhjP0z/00EMEBQVRr149nn32Wb7//nsKCgoqtP47Rfaob1MtN3v+82Qz+s3eTr+UWD5Nd6e7n7mrEkLcLnsbK/a/E2u2dVcWR0dHk/ejRo1i5cqVTJ48mZCQEOzt7Xn88ce5ePHiDZdjY2Nj8l6j0aDX6yvUvjIP6d+MOnXqkJSUxKpVq1i5ciWDBg3i448/Zt26dTg7O7Nz507Wrl3LH3/8wbhx4xg/fjzbt2+3uEvAZI+6EjwQ5sOQB0IAGDN/L4czcyF1KywbLZ3LhKimNBoNDrbWZhmqsvd0fHw8ffv2pWfPnjRu3Bg/Pz+OHTtWZeu7FldXV3x9fdm+fbtxXGlpKTt37qzQciIiIoiPjzcZFx8fb/J8B3t7e7p168a0adNYu3YtmzdvZu/evQBYW1sTExPDRx99xJ49ezh27Bh//vnnbXyyqiF71JXk5YcasCPlPJuPnuX1b9bwc9G/0RQXgE8ktOxj7vKEEAKA0NBQ5s+fT7du3dBoNIwdO/aGe8ZVZejQoUycOJGQkBDCw8OZPn0658+fr9CXlFdffZXevXvTvHlzYmJi+P3335k/f76xF/ucOXMoLS2lbdu2ODg48N1332Fvb09QUBCLFy/m6NGj3Hvvvbi7u7N06VL0ej1hYWFV9ZFvmexRVxIrrYZPn26Gj7OOv85YscDjRVRkd2j0mLlLE0IIoylTpuDu7k67du3o1q0bsbGxtGjR4o7XMXr0aJ5++mmee+45oqKicHJyIjY2tkK3iO7RoweffvopkydPpmHDhnz++efMnj2b+++/HzA8D/rLL78kOjqaJk2asGrVKn7//Xc8PT1xc3Nj/vz5PPjgg0RERDBr1ix++OEHGjZsWEWf+NZp1J0+aXCHnThxgjp16nD8+HFq165d5evbevQs//pqK6V6PRN7NubptkFVvk4hxO0pLCwkOTmZ4OBgszxLQIBeryciIoLevXvz7rvvmrucSnGjn6uKZJPsUVeytvU8GdUpDNDw9u/72Xcy23Ceeuc3cNEyexQKIcSdlpKSwpdffsnBgwfZu3cvAwcOJDk5mX/961/mLs3iSFBXgX/fW4+O4T5cLNEz6PudFC0aCYuGGoaafQBDCCFuilarZc6cObRu3Zro6Gj27t3LqlWriIiIMHdpFkc6k1UBrVbDJ72b0nXaRlLPFTAtvTGjtNZo9v0CAc2h3fVvKiCEEHeDOnXqXNVjW1yb7FFXETcHW2Y+0wJbKy0zkn3ZHDLSMGHlWDi61qy1CSGEqD4kqKtQk9pujH3EcBjnuX3NOFP/McNjMX/uB+dTzFydEEKI6kCCuoo9c08Q3ZoGUKKHx1Ifp8S3KVw4Bz/GSecyIYQQ/0iCuoppNBom9mpMPW9HUnIVr2hfRTl4Qfpe+H24dC4TQghxQxLUd4CTzpqZcS2xs9HyW7KWX+u9Bxor2PsTbJlp7vKEEEJYMAnqOyTMz5kPehqeg/rqDheOtHjDMOGPtyB5vRkrE0IIYckkqO+gXi1q83SbOigFT+xqQkHE46BK4ee+kHXjR8wJIURVuf/++xkxYoTxfd26dZk6deoN59FoNCxcuPC2111Zy7mR8ePH06xZsypdR1WSoL7D3u7WkEh/F84VFNP/7DMov6ZQcNbQE1zOVwshKqBbt2507tz5mtM2bNiARqNhz549FV7u9u3bGTBgwO2WZ+J6YZmWlkaXLl0qdV01jQT1HWZnY8XMZ1rgrLNmc2oBn/m8bXjCVqd3oQofbSeEqHn69+/PypUrOXHixFXTZs+eTatWrWjSpEmFl+vt7Y2Dg0NllPiP/Pz80Ol0d2Rd1ZUEtRkEeTry8ROGX55PthWy4t5fIaidmasSQlQ3jzzyCN7e3syZM8dkfF5eHj///DP9+/fn7NmzPP3009SqVQsHBwc
aN27MDz/8cMPlXnno+9ChQ9x7773Y2dkRGRnJypUrr5pn9OjRNGjQAAcHB+rVq8fYsWMpLi4GDI+bnDBhArt370aj0aDRaIw1X3noe+/evTz44IPY29vj6enJgAEDyMvLM07v27cvPXr0YPLkyfj7++Pp6cngwYON67oZer2ed955h9q1a6PT6WjWrBnLly83Tr948SJDhgzB398fOzs7goKCmDhxIgBKKcaPH09gYCA6nY6AgACGDRt20+u+FXILUTPp3MifF9oH89XGZEb9spcIfzcCPR3g1C5I+AE6TwStlbnLFEJczK/4PFY6sLr057W0BEqLQKMFG/t/Xq6t402vxtramueee445c+bw5ptvGp/l/PPPP1NaWsrTTz9NXl4eLVu2ZPTo0bi4uLBkyRKeffZZ6tevT5s2bf5xHXq9nl69euHr68vWrVvJzs42OZ9dxtnZmTlz5hAQEMDevXt58cUXcXZ25rXXXuPJJ59k3759LF++3PisaFdX16uWkZ+fT2xsLFFRUWzfvp3MzExeeOEFhgwZYvJlZM2aNfj7+7NmzRoOHz7Mk08+SbNmzXjxxRdvart9+umnfPLJJ3z++ec0b96cr7/+mkcffZS///6b0NBQpk2bxqJFi/jpp58IDAzk+PHjHD9+HIBff/2V//znP8ybN4+GDRuSnp7O7t27b2q9t8qig7q0tJTx48fz3XffkZ6eTkBAAH379uWtt96q0MPFLdXoLuHsOp7FjpTzDPx+B78+3xi77x4znLN28Yf2L5u7RCHEBwEVn+eJOdCwp+H1gd8NHUaD2kO/JeVtpjY2/K5faXx2hVb1/PPP8/HHH7Nu3Trjc5hnz57NY489hqurK66urowaNcrYfujQoaxYsYKffvrppoJ61apVHDhwgBUrVhAQYNgWH3zwwVXnld966y3j67p16zJq1CjmzZvHa6+9hr29PU5OTlhbW+Pn53fddc2dO5fCwkK++eYbHB0NX1g+++wzunXrxqRJk/D19QXA3d2dzz77DCsrK8LDw+natSurV6++6aCePHkyo0eP5qmnngJg0qRJrFmzhqlTpzJjxgxSU1MJDQ2lffv2aDQagoLKH1ecmpqKn58fMTEx2NjYEBgYeFPb8XZY9KHvSZMmMXPmTD777DMSExOZNGkSH330EdOnTzd3aZXCxkrLZ/9qjoejLX+fymHc8hRUl4+hbgdo/YK5yxNCVAPh4eG0a9eOr7/+GoDDhw+zYcMG+vfvDxh2eN59910aN26Mh4cHTk5OrFixgtTUm7vSJDExkTp16hhDGiAqKuqqdj/++CPR0dH4+fnh5OTEW2+9ddPruHxdTZs2NYY0QHR0NHq9nqSkJOO4hg0bYmVVfsTR39+fzMzMm1pHTk4Op06dIjo62mR8dHQ0iYmJgOHwekJCAmFhYQwbNow//vjD2O6JJ57gwoUL1KtXjxdffJEFCxZQUlJSoc9ZURa9R71p0ya6d+9O165dAcO3tB9++IFt27aZubLK4+9qz9Qnm9F39jZ++usEgR5NGPLcItBe9h1KKeloJoS5vHGq4vNYXdY5KrybYRmaK/aLRuy9vbou079/f4YOHcqMGTOYPXs29evX57777gPg448/5tNPP2Xq1Kk0btwYR0dHRowYwcWLFytt/Zs3byYuLo4JEyYQGxuLq6sr8+bN45NPPqm0dVzOxsbG5L1Go0Gv11fa8lu0aEFycjLLli1j1apV9O7dm5iYGH755Rfq1KlDUlISq1atYuXKlQwaNMh4ROPKuiqLRe9Rt2vXjtWrV3Pw4EEAdu/ezcaNG2/Ylb+oqIicnBzjkJube6fKvWX3NvBm/KMNAZj8x0HmJ1z2h2HDJ7D0Vbl0SwhzsXWs+GB12T6QlbVh3OXnp2+03FvQu3dvtFotc+fO5ZtvvuH55583nh6Mj4+ne/fuPPPMMzRt2pR69eoZ/6bejIiICI4fP05aWppx3JYtW0zabNq0iaCgIN58801atWp
FaGgoKSmmDx6ytbWltLT0H9e1e/du8vPLz9/Hx8ej1WoJCwu76ZpvxMXFhYCAgKsesRkfH09kZKRJuyeffJIvv/ySH3/8kV9//ZVz584BYG9vT7du3Zg2bRpr165l8+bN7N1beV+8rmTRe9Svv/46OTk5hIeHY2VlRWlpKe+//z5xcXHXnWfixIlMmDDhDlZZOZ6LqsvJ8xf4fP1RXvtlD74udkQ7Z8DqdwFl6FjW+UPZsxZCXMXJyYknn3ySMWPGkJOTQ9++fY3TQkND+eWXX9i0aRPu7u5MmTKFjIwMk1C6kZiYGBo0aECfPn34+OOPycnJ4c033zRpExoaSmpqKvPmzaN169YsWbKEBQsWmLSpW7cuycnJJCQkULt2bZydna+6LCsuLo63336bPn36MH78eE6fPs3QoUN59tlnjeenK8Orr77K22+/Tf369WnWrBmzZ88mISGB77//HoApU6bg7+9P8+bN0Wq1/Pzzz/j5+eHm5sacOXMoLS2lbdu2ODg48N1332Fvb29yHruyWfQe9U8//cT333/P3Llz2blzJ//73/+YPHky//vf/647z5gxY8jOzjYO+/fvv4MV357RncN5pIk/JXrFS9/u4ICqA49eOh+/dRaseFP2rIUQ19S/f3/Onz9PbGysyfnkt956ixYtWhAbG8v999+Pn58fPXr0uOnlarVaFixYwIULF2jTpg0vvPAC77//vkmbRx99lJdffpkhQ4bQrFkzNm3axNixY03aPPbYY3Tu3JkHHngAb2/va14i5uDgwIoVKzh37hytW7fm8ccfp2PHjnz22WcV2xj/YNiwYYwcOZJXXnmFxo0bs3z5chYtWkRoaChg6MH+0Ucf0apVK1q3bs2xY8dYunQpWq0WNzc3vvzyS6Kjo2nSpAmrVq3i999/x9PTs1JrvJxGKcv9y1+nTh1ef/11Bg8ebBz33nvv8d1333HgwIGbWsaJEyeoU6cOx48fp3bt2lVVaqUpLC7lua+3sS35HP6udswf1A7/wz8anrQF0G4oPCQ3RxGiMhUWFpKcnExwcDB2dnbmLkfUEDf6uapINln0HnVBQQFarWmJVlZWldppwNLY2VjxxbMtqe/tSFp2If1mbye3YRx0nWJosGk6rJ4ge9ZCCHGXsOig7tatG++//z5Llizh2LFjLFiwgClTptCzZ09zl1al3BxsmdOvDd7OOg6k5zLwu51cbN4PHp5saLDxP/DnexLWQghxF7DooJ4+fTqPP/44gwYNIiIiglGjRvHvf/+bd99919ylVbk6Hg7M7tsaB1srNh4+w+vz96BavwCdJxkabJgMayeat0ghhBBVzqJ7fTs7OzN16tR/fNxaTdWolisz4lrwwv/+Yv7Ok9R2s2dkp5cMj8Zc8QasmwQaK7h/tLlLFUIIUUUseo9awANhPrzfoxEA0/48zLxtqRA12NChDGDtB7B+shkrFEIIUZUkqKuBp9oEMvTBEADeXLiPNUmZED0MYsYbGvz5LqRV/JmzQghTNbmjqrjzKuvnyaIPfYtyIx9qwMmsC8zfeZLB3+/kp39H0aj9y6D04OAF/hV/5qwQwsDW1hatVsupU6fw9vbG1ta2Rjz4R5iHUoqLFy9y+vRptFottra2t7U8CepqQqPR8GGvJmTmFLHx8Bn6zdnO/IHtqNPhFdOGJUVgLQ9hF6IitFotwcHBpKWlcerULdzbW4hrcHBwIDAw8KrLjCtKgroasbXW8t9nWtB71mYOpOfSb852fn2pHa4Ol24En38GvukOzZ+Fe14yb7FCVDO2trYEBgZSUlLyj/ekFuKfWFlZYW1tXSlHZiSoqxkXOxtm92tNzxmbOJyZx4vf/sW3/dugs7aCfb9Cxj7DddbNnga7qx/MLoS4Po1Gg42NTZU9BUmIWyGdyaohf1d75jzfGmedNduSz/HKT7vR6xW0GQAd34a+SySkhRCihpCgrqbC/VyY9WxLbKw0LN6TxqTlBwz3/+4wErxCyhv
mZpivSCGEELdNgroaiw7xYtJjht7en68/yjebj5k2OLQKPm0Ku76788UJIYSoFBLU1VyvFrV55aEGAIxf9Dcr91+2B310DZRcgN+GwDc9YMf/oOCceQoVQghxSySoa4AhD4bwVOs66BUM/WEnu1LPGyZ0eg/uGQQoQ2j/Pgwmh8J3j0PCXCjMNmvdQggh/pkEdQ2g0Wh4r0cj7g/zprBYzwv/+4uUs/mGc9adJ8LQnfDgWPBtDPoSOLwSFg6Ej0Ng7lOw5ycoyjX3xxBCCHENGqVq9rMSK/Jw7uouv6iEJ7/YzL6TOQR7OfLrwHZ4OF5xR5zTB+HvBfD3fDh9oHy8lQ5CH4JH/gNOPne2cCGEuMtUJJtkj7oGcdRZ83Xf1tRysyf5TD4v/G87hcVX3LjBu4HhaVuDt8LAzXDva+AZAqVFkBIP9u7lbTMTofjCnf0QQgghTEhQ1zA+znb87/nWuNrbsDM1i+HzdlGqv85BE99IePBNGPIX/HsDdJsGVpdu9KAUfN/bcHj8xI479wGEEEKYkKCugUJ8nPnyuVbYWmlZ8XcG7y7ezw3PcGg0hod6RD5aPi43DVCGwPaJKB9/YCkcWgmlxVVWvxBCiHJyC9Eaqk2wB5/0bsrQH3YxZ9MxUs7mM+bhCBr4Ot/cAlwCYPgeOJ8Mtg6GcUrBqvFwJslwiDyiG9S5B7TWoLUyDJor/9WCX+Py894F5+BcMti5gFdo+frOJQPq0nzWhj17By+4zZvZCyFEdSdBXYN1axrA2bwi3luSyJqk06w7eJonWtZhZKcG+LrY/fMCtFrwrF/+vvQiBN8LF85B/mnY+Y1h+CdPzIGGPQ2vj66FX/pB3Q7Qd3F5my8fNCz3cjoX8G96aWgGAc3Ao76EtxDiriJBXcP1jQ7m3gbefLQ8ieV/p/PjX8dZtPsUL3YIZsB99XHSVeBHwFoHXSdDl0lwbCPsXwjnU0CVgr7U8Gxsfell70tBrwc7t8uWYQeugVf3LLd1MnwR0JcY5tWXQFEOHNtgGC5v59fEENrNnzWcZxdCiBpMLs+6i/x17BwfLE1kZ2oWAF5OtgyPacBTretgY2Vhe6mlxXA6CdIS4FSC4d/0fYY7rZV5Zj6EdDS8Tt4ABxZDSIzhMjMhhLhZSkFJIVzMh4t5l/4te11Q/trBExr2qJRVViSbZI/6LtKqrge/DmzH8n3pTFp+gGNnCxi7cB+z45N5vXM4D0X6VsqzUyuFlQ34NTIMzZ8xjCstgTMHy8M7oHl5+8OrYOsswy9bWVAXF8LKcYZD5wHNwCsMrORHXogaQynDzZoKzhr6vxScNQyFWeBSq7yDrFKGU25FedDrC3DwMIxf/Q5s+9IQwkr/z+sLjKq0oK4I+at1l9FoNHRp7E9MpC9zt6by6epDHD2dz4Bvd9CmrgdjHg6neaD7Py/IHKysDYe6fSOh2b9Mp9V/oPwcepnM/bDt8/L31nbg28jQi93ezXAOXOd8xXDpvLiVPI9YiDtOrzdccVJw1vC7WtYf5e8FhtNtZUF8eSiXXrz2skIeKg9qjQYO/gHF+YZbJ5cFtb7UcIrtcjYOYOt46V8nw+uy4fIrYO4gOfR9l8spLObzdUf4akMyRSWGb5Rdm/jzWmwYQZ6OZq7uNp09An99fenQ+W64eJO3SX09tfx53r+PgL2/wANvQNQgw7jzxwx76mXBrnM2/EKX/WvrADb2YONo+Nf20r9Ovoae8ELUBHq94QiWjb0hCAHOHYXcdCguMNwsqfiC4bBx8QXTccWXDidfOAcBLQz3cwAouQjveRtev5ZcHqiLR8Jf/3f9WmwcDIelHTwM/9q5Gb5wtx9R3mbnN4arUCK6lf9+52YY9qbLgtjG4Y79jsqhb3HTXOxseDU2nGfuCeKTPw7y684TLNmTxh9/p/PMPUEMfTD06tuQVhee9SH2fcNrvd7wRyQtwfBvUY7hkNlVQ44hbMsUZhsCXnPZOfzcdNj/W8XrGbEP3OoYXv/
5Huz8Fu4ZWP7HJDcdFr98KeQdygPextEQ/raO5V8IjF8OnMClNlhX0/+jmqakyHRv7/I9wAvnLrv/gILwroY+FQBZqbBukiFgyn5mAdZOMvy8lt3T4B//BcK6lB9xyj8Li4YYLnl88lvT5Z786+p5r7Xc0mJDsIbGlgdqYTZ8GGh4/VamoaMpwNoPYc+PFdtml+8rWtuCvYfhiFZRbnlQh3a6FMSepoFcNpRdQnojLZ67epyzL+BbsXrNwOKD+uTJk4wePZply5ZRUFBASEgIs2fPplWrVuYurUbxd7Vn8hNN6d8+mInLDrD+4Glmxx/jlx0nGHR/CP2i62JnU433BrVa8AoxDBXR9RN48C3TW6u6BcLDk68O+8Icw6G14guGDijFBeV7ERfzDUFbpuAs5KUb9kiM485B0tKKf7aX4g3n8gE2z4AtMw1/qB94wzCuKA+WvWYa7pcfAbC2M/SyV6WX9bovNXTUK/tDmbYbUrcYbjdb1oGv5CJs+OSy+S6bt+y9lY3hPvLWlw0Rj5Zf9pd13PDlydkfal/2O336oGHPxtqufD4rnWF5d6ofRWkJXDhvWLedi2Hc+WOw92ewdYZ7Xipv+3+dIGP/zR+1AXCtXR7UBWcNz413DjAN6kN/GAK1IlzrlL8uuWD4mbK64ovcqZ2GZVeE52X3PbC5LBiLC8qD2iXA8DNy+RElm7IjTJe9LvsSau8BHsGm63nt6NX/x2GdDcNdyqKD+vz580RHR/PAAw+wbNkyvL29OXToEO7uFnoOtQaI8Hfhm+fbsOHQaT5YeoDEtBwmLT/At5uPMSo2jB7NaqHVWkiHszvBwaM8rMq4BECbFyu2nCvPMN03Glr2BUfv8nHOftDt02uHfHGBIXAv5hm+FJT9W5RnCN4yeRmQfdwwvkxhFiR8X7F6AV5YXf7Zj66DlWOhyVPlQa1KYd2HFV+ud3h5UB/bCAtfgvoPwrMLytt8+eB1Qk9THtoazaVBaxjfZRI0fry83oUDDTfb+ddle3izu0LuqfJ5NFrTZWi0hm1d1iEJDF/Kyv6/s08YjoZ4hpgG9cWC8no1Vlfs8V16be9+KTAv1R3Yrnx+5wDDE+50V9yQqO2/Ibf7peDSXP0vXDEOw2cuY+dm+Jm6/IgQQNuXDIeAb7isS/9a2RgC1qVW+fxWNjDq8KXTPJeFdsx4w3A7LKVDqwWx6KCeNGkSderUYfbs2cZxwcHBN5hDVJYOod4sHurFwl0nmfxHEqeyCxn5026+2pDMGw9H0D7Uy9wlVi9X/vFx9jMMl3PwMIT37bhnEER0N/1yYesIHcddFvR5hlApC/rSItDaXLq7nHX5neYuPwLg1cBw05paLcvHaW2gVf/L5tFe9traEFj6EsNRg5Iiw3pKikz3+Bw8oE5b8L6ik07Zl4/Sois6C126jObyIxFlSorKXxdfgJyThn4Bl8tKMXyRqYiLl33pcQs0XIXgGmja5rGvLt1NzwN0rhW/KY+zL9w76urxTXpXbDlX0jld+2eq/gO3t1wAJ+9/biMqhUV3JouMjCQ2NpYTJ06wbt06atWqxaBBg3jxxZvfm5HOZLevsLiUr+OTmbnmCLlFJQDc18Cb17uEE+HvYubqRI2n15eHvDHwL2I4j6o3HK1QenDxLz9FUZhtOLdr4wDeYeXLOrnzUqCr8vmufG1tV743bOcml/SJKlGRbLLooLazM9zmcuTIkTzxxBNs376d4cOHM2vWLPr06XPNeYqKiigqKv9mffLkSSIjIyWoK8G5/ItMW32I77akUKJXaDTQMdyXZnVcifB3IcLfBX9XO8u5FlsIISxUjQlqW1tbWrVqxaZNm4zjhg0bxvbt29m8efM15xk/fjwTJky4arwEdeU5diafj1cksWRv2lXT3BxsiPBzuRTczkT4uxDq64TOuhp3RBNCiEpWYy7P8vf3JzLS9F7OERER/Prrr9edZ8yYMYwcOdL4vmyPWlSeul6OzIhrwcCT2Ww6cob
EtFwS03I4nJlHVkExm4+eZfPRs8b21loNIT5OJuEd4e+Cl5POjJ9CCCGqB4sO6ujoaJKSkkzGHTx4kKCgoOvOo9Pp0OnKAyAnJ+e6bcXtaVTLlUa1XI3vi0pKOZSRR2JajjG896flkH2hmAPpuRxIz2XBrvL5vZ11RPqX731H+rsQ7OWItaXdd1wIIczoloL6+PHjaDQa4+76tm3bmDt3LpGRkQwYMKDSinv55Zdp164dH3zwAb1792bbtm188cUXfPHFF5W2DlF5dNZWV4W3Uoq07MJL4W0I8P1pORw7m8/p3CLW5Roev1m+DC1hfobQ7tTQl/sa+GB1N10OJoQQV7ilc9QdOnRgwIABPPvss6SnpxMWFkbDhg05dOgQQ4cOZdy4cZVW4OLFixkzZgyHDh0iODiYkSNHSq/vGiC/qISkjEt73acMIX4gPZeCi6Um7Wq52fOvtoE82bqOHCoXQtQYVd6ZzN3dnS1bthAWFsa0adP48ccfiY+P548//uCll17i6NGjt1x8ZZOgrj70ekXquQIS03LYfuw883edIKvAcMtFGysNXRr582xUEK2C3KVnuRCiWqvyzmTFxcXG88CrVq3i0UcNTygJDw8nLe3qnsBC3AytVkNdL0fqejnSpbE/r3UOY/GeNL7bkkLC8SwW7T7Fot2nCPN15pmoIHo2r4WTzqK7WQghxG27pV47DRs2ZNasWWzYsIGVK1fSubPhHqynTp3C09OzUgsUdy87Gyseb1mbhYOj+X1Ie55sVQc7Gy1JGbmMXbiPtu+v4q2FezmQLh0GhRA11y0d+l67di09e/YkJyeHPn368PXXXwPwxhtvcODAAebPn1/phd4qOfRds2RfKObXHSf4bmsKR0/nG8e3ruvOM/cE0bmRn1yzLYSweHfkhielpaXk5OSYPCDj2LFjODg44OPjcyuLrBIS1DWTUorNR87y3dYUVvydQane8GPs6WjLk63r8HSbQOp43MSj74QQwgyq/Bz1hQsXUEoZQzolJYUFCxYQERFBbGzsrSxSiArRaDS0C/GiXYgXGTmFzNt2nLnbUsjIKeK/a48wc90RHgzz4Zl7gri3gbdc4iWEqLZuaY+6U6dO9OrVi5deeomsrCzCw8OxsbHhzJkzTJkyhYEDB1ZFrbdE9qjvHiWlelYlZvLdlhQ2Hj5jHF/b3Z64tkH0blUbT7nESwhhASqSTbfUmWznzp106NABgF9++QVfX19SUlL45ptvmDZt2q0sUojbZm2lpXMjP757oS1/vnIf/dsH42JnzYnzF5i0/ABRE/9kxLxd7Eg5b+5ShRDipt1SUBcUFODsbHjA+R9//EGvXr3QarXcc889pKSkVGqBQtyKet5OjH0kkq1vxPDR401oUtuVi6V6Fiac4rGZmxg8dycZOdd4prEQQliYWwrqkJAQFi5cyPHjx1mxYgWdOnUCIDMzExcXeT6xsBz2tlb0blWHRUPas2hINI+3rI1WA0v2pNHxk3XMjk82dkQTQghLdEtBPW7cOEaNGkXdunVp06YNUVFRgGHvunnz5pVaoBCVpUltNyY/0ZRFQ9rTrI4beUUlTPh9P91nbGT38SxzlyeEENd0y5dnpaenk5aWRtOmTdFqDXm/bds2XFxcCA8Pr9Qib4d0JhPXotcrftieyqRlB8gpLEGjgWfaBjEqNgxXextzlyeEqOHuyHXUl68MsNgQlKAWN3I6t4iJSxOZv+skAF5OOsY+EsGjTQPkfuJCiCpT5b2+9Xo977zzDq6urgQFBREUFISbmxvvvvsuer3+looWwhy8nXVMebIZc19oSz1vR87kFTF8XgLP/N9Wjp7OM3d5Qghxa0H95ptv8tlnn/Hhhx+ya9cudu3axQcffMD06dMZO3ZsZdcoRJVrF+LFsuEdGNWpATprLfGHz9J56gamrDxIYXHpPy9ACCGqyC0d+g4ICGDWrFnGp2aV+e233xg0aBAnT56stAJvlxz6FhWVcjafcb/9zbq
DpwEI8nTgne6NuK+Bt5krE0LUFFV+6PvcuXPX7DAWHh7OuXPnbmWRQliMIE9H5vRrzX/jWuDroiPlbAF9vt4m114LIcziloK6adOmfPbZZ1eN/+yzz2jSpMltFyWEuWk0Gh5u7M+qkffxfHSwXHsthDCbWzr0vW7dOrp27UpgYKDxGurNmzdz/Phxli5dary9qCWQQ9+iMuw7mc2bC/cZr7duVMuF93s0pmkdN7PWJYSonqr80Pd9993HwYMH6dmzJ1lZWWRlZdGrVy/+/vtvvv3221sqWghL1qiWK/MHtuO9Ho1wtrNm38kcevw3nrEL95F9odjc5QkharDbvo76crt376ZFixaUllpOL1nZoxaV7XRuER8sTWSBXHsthLhFVb5HLcTdzNtZx3/Krr32Mr32es+JLPRy/loIUYmszV2AENVVuxAvlo3owBfrjjJ9zWHiD5/l0c/i8XKypX2IF/c28KZ9iBc+LnbmLlUIUY1JUAtxG3TWVgztGMqjzQL4aHkSfx7I5EzeRRYmnGJhwikAwv2cubeBNx1CvWhd1wM7GyszVy2EqE4qFNS9evW64fSsrKzbqUWIaivI05EZcS0oKillZ0oWGw6dZsOhM+w7lc2B9FwOpOfyxfqj6Ky1tAn24N5Qbzo08CLM11nOawshbqhCQe3q6vqP05977rnbKkiI6kxnbUVUfU+i6nvyWmc4m1dE/JGzbDhoCO70nEI2HDrDhkNnYKnhfHeHUC/uDfUmOsQLb2eduT+CEMLCVGqv76r24YcfMmbMGIYPH87UqVNvah7p9S0shVKKw5l5rD90hg2HTrPl6FkKi00fYhPp70KHBobgblXXHZ21HCYXoiaqSDZVm3PU27dv5/PPP5c7n4lqS6PREOrrTKivM/3bB1NYXMrOlPPG4P77VA770wzD5+uOYmej5Z56nnQINZzfDvVxksPkQtyFqkVQ5+XlERcXx5dffsl7771n7nKEqBR2Nla0C/GiXYgXr3cJ50xeEfGHz7Du0mHy07lFrE06zdokw8NBfJx1RId4XRo88Xe1N/MnEELcCdUiqAcPHkzXrl2JiYn5x6AuKiqiqKjI+D43N7eqyxOiUng56ejerBbdm9VCKUVSRi4bDp5h/aHTbEs+R2ZuEQt2nTTeaKWetyPtLwX3PfU8cbW3MfMnEEJUBYsP6nnz5rFz5062b99+U+0nTpzIhAkTqrgqIaqWRqMh3M+FcD8XXry3nvEw+cbDZ4g/cpa9J7I4ejqfo6fz+WZzCloNNK7tRnR9T9qHeNEiyF0uAxOihrDozmTHjx+nVatWrFy50nhu+v7776dZs2bX7Ux25R71yZMniYyMlM5kokbJLihm89GzbDpyho2Hz3D0dL7JdJ21ltZ1PYgO8aJ9iBeRAS5YaeX8thCWoiKdySw6qBcuXEjPnj2xsirfMygtLUWj0aDVaikqKjKZdi3S61vcDdKyLxB/+Czxh88Qf/gMmblFJtNd7W1oV9+TdpeCu66ng3RME8KMakxQ5+bmkpKSYjKuX79+hIeHM3r0aBo1avSPy5CgFnebssvANh4+Q/zhs2w5epa8ohKTNrXc7GlX35P2oYbg9nSS67eFuJNqzOVZzs7OV4Wxo6Mjnp6eNxXSQtyNLr8MrF90MCWlenafyGbTYcNh8p2p5zmZdYGfd5zg5x0n0FlrGdMlnOei6qKVw+NCWByLDmohxO2zttLSMsidlkHuDO0YSsHFErYfO2+4FCzpNEkZuYz/fT+rEjP5+IkmctmXEBbGog99VwY59C3E9Sml+HZLCh8sTaSwWI+LnTXv9mhE92a1zF2aEDWaPI9aCHFTNBoNz0XVZcmwDjSt7UpOYQnD5yUwZO5Osgoumrs8IQQS1EIIoL63E78MbMeImFCstBoW70kjdup61h08be7ShLjrSVALIQCwsdIyIqYB8we2o563Ixk5RfT5ehvjftvHhYul5i5PiLuWBLUQwkTTOm4sGdqBPlFBAHy
zOYWu0zaQcDzLvIUJcZeSoBZCXMXe1ooJ3Rvxbf82+LnYcfRMPo/N3MR/Vh6kuFT/zwsQQlQaCWohxHV1CPVmxYh7ebRpAKV6xaerD/HYzE0czswzd2lC3DUkqIUQN+TqYMO0p5sz7enmuNhZs+dENl2nbWBOfDJ6fY2+ulMIiyBBLYS4KY82DeCPl++jQ6gXRSV6xv++n+e+3kZa9gVzlyZEjSZBLYS4aX6udnzzfBve6d4QOxstGw+fIfY/6/kt4aS5SxOixpKgFkJUiNwkRYg7S4JaCHFL5CYpQtwZEtRCiFsmN0kRoupJUAshbtv1bpKyan8GF0vkumshboc85lIIUSnKbpISE+nLqz/v4eiZfF745i9c7Kx5KNKPrk38aB/ija217B8IURES1EKISlV2k5RPVx9i8Z5TZOYW8evOE/y68wTOdtY8FOnLw4386dDAC521lbnLFcLiyfOohRBVplSv2JFynqV701i6N43M3CLjNGedNTGRvjzc2J8OoV7Y2Uhoi7tHRbJJgloIcUfo9YodqedZsieNZfvSyMgpD20nnTUxET483Nifext4S2iLGk+C+jIS1EJYHr1esTP1PEv2prFsbzrpOYXGaU46azpeCu37JLRFDSVBfRkJaiEsm16v2HX8PEv2pLNsXxpp2eWh7WhrRccIw+Hx+8MktEXNIUF9GQlqIaoPQ2hnsXRvGsv2pnHqitB+MMKXro39uD/MR0JbVGsS1JeRoBaietLrFQknsli6J41l+9I5mVX+8A97Gyta1XUnqr4nUfU8aVzLFWsruexLVB8VySa5PEsIYZG0Wg0tAt1pEejOm10j2H0im6V701iyJ42TWRfYcOgMGw6dAQzntVsbg9uLyAAXrLQaM38CISqHBLUQwuJpNBqa1XGjWR03xnQJJykjl81HzrL5yFm2Jp8j+0Ixa5JOsybJcJ9xZztr2gZ7cE89T6LqexLh54JWgltUUxLUQohqRaPREO7nQrifC/2igynVKxLTcthy1BDc25LPkVtYwqrETFYlZgLg5mBD22APoup5ElXfiwa+Tmg0EtyierDooJ44cSLz58/nwIED2Nvb065dOyZNmkRYWJi5SxNCWAgrrYZGtVxpVMuVFzrUo6RUz9+ncth8Kbi3HztHVkExK/7OYMXfGQB4OtpyTz1P7rl0jru+t6MEt7BYFt2ZrHPnzjz11FO0bt2akpIS3njjDfbt28f+/ftxdHS8qWVIZzIh7m7FpXr2nsxm85GzbDlqCO7CYtMHhXg764iq58k99TxpXdedYC9H6ZwmqlSN7fV9+vRpfHx8WLduHffee+9NzSNBLYS43MUSPbtPZBnPce9IPX/VE75srbTU93EizNeJMD8Xwv2caeDnTICrnex5i0pRY3t9Z2dnA+Dh4WHmSoQQ1ZWttZbWdT1oXdeDYR1DKSwuZVdqFpuPnmXLkbPsO5VNwcVSEtNySEzLAU4Z53W2sybM15kwv0uDrzPhfi64OtiY7wOJGq/a7FHr9XoeffRRsrKy2Lhx43XbFRUVUVRUfg/hkydPEhkZKXvUQoibotcrTmZd4EB6LgczcjmQnktSeg5HT+dTor/2n0tfF51xz7ssyEN8nOSmLOK6auQe9eDBg9m3b98NQxoMHdAmTJhwh6oSQtQ0Wq2GOh4O1PFw4KFIX+P4iyV6jp7JIyk91zgcSM/lZNYFMnKKyMg5zfqDp8uXo4G6Xo7G4A73cyY6xAtnO9n7FhVTLfaohwwZwm+//cb69esJDg6+YVvZoxZC3Em5hcUczCgL8BySMgwhfr6g+Kq2TjprnmhVm37tggn0dDBDtcJS1Jg9aqUUQ4cOZcGCBaxdu/YfQxpAp9Oh0+mM73NycqqyRCHEXc7ZzoaWQe60DHI3jlNKcTq3yBjaB9Jz2ZFynuQz+cyOP8b/Nh3joUhf+revR+u67tJBTdyQRQf14MGDmTt3Lr/99hvOzs6kp6cD4Orqir29vZmrE0KIa9N
oNPi42OHjYkeHUG/AEN7rDp7m6/hjrD942nhdd+NarvRvH8zDjf2xtZZLwsTVLPrQ9/W+Zc6ePZu+ffve1DLk8iwhhKU5mJHL7Phk5u88SdGlS8N8XXQ8F1WXf7UJxN3R1swViqpWY6+jvhUS1EIIS3U2r4i5W1P5ZksKp3MNfWvsbLT0alGb56ODCfFxMnOFoqpIUF9GgloIYemKSkpZvDuN/9uYzP608n4194d50799MO1DvOQ8dg1TYzqTCSHE3UBnbcVjLWvTq0Uttiaf4/82JrMqMYO1SadZm3SaMF9nnm9fl+7Nasm12Xch2aMWQggLdOxMPnM2HeOnv45TcLEUMDxMJO6eIJ69JwhvZ90/LEFYMjn0fRkJaiFEdZZ9oZgft6fyv00pnMy6ABjuRd6taQD92wcTGeBi5grFrZCgvowEtRCiJigp1bP873T+b2Myu1KzjOOj6nnSN7ouHUK9cLCVs5nVhZyjFkKIGsbaSssjTQJ4pEkAO1PP8/XGZJbtSzc8d/voWWysNDQPdCe6vhftQz1pUtsNG3lUZ40gQS2EENVMi0B3WvzLnZNZF/hm0zEW70njZNYFtiWfY1vyOf6zChxtrWhbz5PoEC+iQzwJ83WWnuPVlBz6FkKIak4pReq5AjYePsOmw2fZdOTMVfca93LS0a6+J9EhhvCu7S73GjcnOfQthBB3EY1GQ5CnI0GejsS1DUKvV+xPy2HTkTPEHz7LtuRznMkrYtHuUyzabXi+dpCng2Fvu74XUfU98ZC7oVksCWohhKhhtFoNjWq50qiWKwPurc/FEj27Us8Tf/gM8UfOknA8i5SzBaScTWXu1lQ0Goj0dyE6xIt29T1pE+whHdMsiBz6FkKIu0xuYTHbks8Rf/gs8YfPkJSRazK9rGNa+xAv2gR70LiWK446Ce7KJIe+hRBCXJeznQ0dI3zpGOELQGZuIZuPGEI7/vBZk45pAFoNhPo407SOK03ruNG0ththfs7Sq/wOkaAWQoi7nI+zHd2b1aJ7s1oopUg5W0D8EUPHtF2p5zmVXWh4tnZGLj/9dQIAnbWWRrVcaVrbjaZ1XGlWx41ADwfpWV4FJKiFEEIYaTQa6no5UtfL0DENIDOnkN0nstl9PIvdJ7JIOJ5FbmEJO1LOsyPlvHFeNwebS8HtRrM6hhD3dJJbnd4uCWohhBA35ONix0ORdjwUaThUrtcrjp3NZ/eJLHYfzybheBb7T+WQVVDMuoOnWXfwtHHe2u72huC+FOCNarlIR7UKkq0lhBCiQrRaDfW8najn7UTP5oaOUBdL9BxIz2H38SwSjmez+0QWhzPzOHH+AifOX2DJnjTDvBpo4OtM09pu1PN2vHRZmQNBng4S4NchW0UIIcRts7XW0qS2G01qu/FslGFcTmEx+05kk3Aiy3DY/Hg26TmFHEjP5UB67lXL8HLSUdfTgUBPB4I8DAEe6OlAXU9H3B1s7trz3xLUQgghqoSLnQ3tQrxoF+JlHJeeXcjuE1nsO5nNsbMFpJ7NJ+VcAVkFxZzJK+JMXhF/XXbeu4yzztoQ4J4OBHo4lge6pyP+LnZotTU3xCWohRBC3DF+rnb4ufoR29DPZHx2QTEp5/JJOVtA6rkCUs7mX7opSwHpOYXkFpXw96kc/j6Vc9Uyba201Pawp66nI4EeDgR6OBDgZk9td3sC3Oyr/d64BLUQQgizc3WwoYmD4dD5lQqLSzl+zhDaKZeFeOq5Ak6cL+BiqZ6jp/M5ejr/msu2s9ES4GZPrUtDwKWh7L2fqx221pZ7TbgEtRBCCItmZ2NFqK8zob7OV00r1StOZV24FOL5pJ4t4Pj5Ak5mFXIq6wKnc4soLL5xkGs04O2ko5Z7eYAHuNoZXrsb3rvam2+vXIJaCCFEtWWl1VDHw4E6Hg60x+uq6UUlpaRnF3Iy6wInz1/g1KUAP5VteH8y6wJFJXoyc4vIzC1iV2rWNdfjYGtFgJs9jQJcmPpU8yr+VKYkqIU
QQtRYOmsr45PFrkUpxbn8i5zKuhTmWRcMQX5pOJlVyJm8IgoulnI4M88s9zyXoBZCCHHX0mg0eDrp8HTS0bi26zXbFBaXkpZt2BM3x8FvCWohhBDiBuxsrAj2ciTY69p75VXNcru5XWbGjBnUrVsXOzs72rZty7Zt28xdkhBCCHFHWHxQ//jjj4wcOZK3336bnTt30rRpU2JjY8nMzDR3aUIIIUSVs/ignjJlCi+++CL9+vUjMjKSWbNm4eDgwNdff23u0oQQQogqZ9FBffHiRXbs2EFMTIxxnFarJSYmhs2bN19znqKiInJycoxDbu7V95MVQgghqguLDuozZ85QWlqKr6+vyXhfX1/S09OvOc/EiRNxdXU1DpGRkXeiVCGEEKJK1Lhe32PGjGHkyJHG98ePH6dRo0akpaWZsSohhBCiXFkm6fX6f2xr0UHt5eWFlZUVGRkZJuMzMjLw8/O75jw6nQ6dTmd8X1BQAECbNm2qrlAhhBDiFmRkZBAYGHjDNhYd1La2trRs2ZLVq1fTo0cPwPDtY/Xq1QwZMuSmltG8eXO2bduGr68vWu3tHenPzc0lMjKS/fv34+x89T1nxdVkm1WcbLOKk21WcbLNKq4yt5lerycjI4Pmzf/5dqQapZS6rbVVsR9//JE+ffrw+eef06ZNG6ZOncpPP/3EgQMHrjp3XdVycnJwdXUlOzsbFxeXO7ru6kq2WcXJNqs42WYVJ9us4sy1zSx6jxrgySef5PTp04wbN4709HSaNWvG8uXL73hICyGEEOZg8UENMGTIkJs+1C2EEELUJBZ9eZal0el0vP322yad1cSNyTarONlmFSfbrOJkm1WcubaZxZ+jFkIIIe5mskcthBBCWDAJaiGEEMKCSVALIYQQFkyCugLkudg3b+LEibRu3RpnZ2d8fHzo0aMHSUlJ5i6r2vjwww/RaDSMGDHC3KVYtJMnT/LMM8/g6emJvb09jRs35q+//jJ3WRartLSUsWPHEhwcjL29PfXr1+fdd99FuiqZWr9+Pd26dSMgIACNRsPChQtNpiulGDduHP7+/tjb2xMTE8OhQ4eqrB4J6pskz8WumHXr1jF48GC2bNnCypUrKS4uplOnTuTn55u7NIu3fft2Pv/8c5o0aWLuUiza+fPniY6OxsbGhmXLlrF//34++eQT3N3dzV2axZo0aRIzZ87ks88+IzExkUmTJvHRRx8xffp0c5dmUfLz82natCkzZsy45vSPPvqIadOmMWvWLLZu3YqjoyOxsbEUFhZWTUFK3JQ2bdqowYMHG9+XlpaqgIAANXHiRDNWVX1kZmYqQK1bt87cpVi03NxcFRoaqlauXKnuu+8+NXz4cHOXZLFGjx6t2rdvb+4yqpWuXbuq559/3mRcr169VFxcnJkqsnyAWrBggfG9Xq9Xfn5+6uOPPzaOy8rKUjqdTv3www9VUoPsUd+EW3kutjCVnZ0NgIeHh5krsWyDBw+ma9euJj9r4toWLVpEq1ateOKJJ/Dx8aF58+Z8+eWX5i7LorVr147Vq1dz8OBBAHbv3s3GjRvp0qWLmSurPpKTk0lPTzf5HXV1daVt27ZVlgfV4s5k5naj52IfOHDATFVVH3q9nhEjRhAdHU2jRo3MXY7FmjdvHjt37mT79u3mLqVaOHr0KDNnzmTkyJG88cYbbN++nWHDhmFra0ufPn3MXZ5Fev3118nJySE8PBwrKytKS0t5//33iYuLM3dp1UZ6ejrANfOgbFplk6AWVW7w4MHs27ePjRs3mrsUi3X8+HGGDx/OypUrsbOzM3c51YJer6dVq1Z88MEHgOFJefv27WPWrFkS1Nfx008/8f333zN37lwaNmxIQkICI0aMICAgQLaZBZND3zfhVp6LLQyGDBnC4sWLWbNmDbVr1zZ3ORZrx44dZGZm0qJFC6ytrbG2tmbdunVMmzYNa2trSktLzV2ixfH39ycyMtJkXEREBKmpqWaqyPK9+uqrvP766zz11FM
0btyYZ599lpdffpmJEyeau7Rqo+xv/p3MAwnqm3D5c7HLlD0XOyoqyoyVWS6lFEOGDGHBggX8+eefBAcHm7ski9axY0f27t1LQkKCcWjVqhVxcXEkJCRgZWVl7hItTnR09FWX/B08eJCgoCAzVWT5CgoK0GpN/+xbWVmh1+vNVFH1ExwcjJ+fn0ke5OTksHXr1irLAzn0fZNGjhxJnz59aNWqlfG52Pn5+fTr18/cpVmkwYMHM3fuXH777TecnZ2N525cXV2xt7c3c3WWx9nZ+arz946Ojnh6esp5/et4+eWXadeuHR988AG9e/dm27ZtfPHFF3zxxRfmLs1idevWjffff5/AwEAaNmzIrl27mDJlCs8//7y5S7MoeXl5HD582Pg+OTmZhIQEPDw8CAwMZMSIEbz33nuEhoYSHBzM2LFjCQgIoEePHlVTUJX0Ja+hpk+frgIDA5Wtra1q06aN2rJli7lLsljANYfZs2ebu7RqQy7P+me///67atSokdLpdCo8PFx98cUX5i7JouXk5Kjhw4erwMBAZWdnp+rVq6fefPNNVVRUZO7SLMqaNWuu+ferT58+SinDJVpjx45Vvr6+SqfTqY4dO6qkpKQqq0eeniWEEEJYMDlHLYQQQlgwCWohhBDCgklQCyGEEBZMgloIIYSwYBLUQgghhAWToBZCCCEsmAS1EEIIYcEkqIUQQggLJkEthKh0Go2GhQsXmrsMIWoECWohapi+ffui0WiuGjp37mzu0oQQt0AeyiFEDdS5c2dmz55tMk6n05mpGiHE7ZA9aiFqIJ1Oh5+fn8ng7u4OGA5Lz5w5ky5dumBvb0+9evX45ZdfTObfu3cvDz74IPb29nh6ejJgwADy8vJM2nz99dc0bNgQnU6Hv78/Q4YMMZl+5swZevbsiYODA6GhoSxatMg47fz588TFxeHt7Y29vT2hoaFXfbEQQhhIUAtxFxo7diyPPfYYu3fvJi4ujqeeeorExEQA8vPziY2Nxd3dne3bt/Pzzz+zatUqkyCeOXMmgwcPZsCAAezdu5dFixYREhJiso4JEybQu3dv9uzZw8MPP0xcXBznzp0zrn///v0sW7aMxMREZs6ciZeX153bAEJUJ1X2XC4hhFn06dNHWVlZKUdHR5Ph/fffV0oZHkH60ksvmczTtm1bNXDgQKWUUl988YVyd3dXeXl5xulLlixRWq1WpaenK6WUCggIUG+++eZ1awDUW2+9ZXyfl5enALVs2TKllFLdunVT/fr1q5wPLEQNJ+eohaiBHnjgAWbOnGkyzsPDw/g6KirKZFpUVBQJCQkAJCYm0rRpUxwdHY3To6Oj0ev1JCUlodFoOHXqFB07drxhDU2aNDG+dnR0xMXFhczMTAAGDhzIY489xs6dO+nUqRM9evSgXbt2t/RZhajpJKiFqIEcHR2vOhRdWezt7W+qnY2Njcl7jUaDXq8HoEuXLqSkpLB06VJWrlxJx44dGTx4MJMnT670eoWo7uQctRB3oS1btlz1PiIiAoCIiAh2795Nfn6+cXp8fDxarZawsDCcnZ2pW7cuq1evvq0avL296dOnD9999x1Tp07liy++uK3lCVFTyR61EDVQUVER6enpJuOsra2NHbZ+/vlnWrVqRfv27fn+++/Ztm0b//d//wdAXFwcb7/9Nn369GH8+PGcPn2aoUOH8uyzz+Lr6wvA+PHjeemll/Dx8aFLly7k5uYSHx/P0KFDb6q+cePG0bJlSxo2bEhRURGLFy82flEQQpiSoBaiBlq+fDn+/v4m48LCwjhw4ABg6JE9b948Bg0ahL+/Pz/88AORkZEAODg4sGLFCoYPH07r1q1xcHDgscceY8qUKcZl9enTh8LCQv7zn/8watQovLy8ePzxx2+6PltbW8aMGcOxY8ewt7enQ4cOzJs3rxI+uRA1j0YppcxdhBDiztFoNCxYsIAePXqYuxQhxE2Qc9RCCCGEBZOgFkIIISyYnKMW4i4jZ7uEqF5
kj1oIIYSwYBLUQgghhAWToBZCCCEsmAS1EEIIYcEkqIUQQggLJkEthBBCWDAJaiGEEMKCSVALIYQQFkyCWgghhLBg/w95Zz43LjhONQAAAABJRU5ErkJggg==", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import matplotlib.pyplot as plt\n", "from matplotlib.ticker import MaxNLocator\n", "\n", "\n", "def plot_losses(epochs_seen, tokens_seen, train_losses, val_losses):\n", " fig, ax1 = plt.subplots(figsize=(5, 3))\n", "\n", " # Plot training and validation loss against epochs\n", " ax1.plot(epochs_seen, train_losses, label=\"Training loss\")\n", " ax1.plot(epochs_seen, val_losses, linestyle=\"-.\", label=\"Validation loss\")\n", " ax1.set_xlabel(\"Epochs\")\n", " ax1.set_ylabel(\"Loss\")\n", " ax1.legend(loc=\"upper right\")\n", " ax1.xaxis.set_major_locator(MaxNLocator(integer=True)) # only show integer labels on x-axis\n", "\n", " # Create a second x-axis for tokens seen\n", " ax2 = ax1.twiny() # Create a second x-axis that shares the same y-axis\n", " ax2.plot(tokens_seen, train_losses, alpha=0) # Invisible plot for aligning ticks\n", " ax2.set_xlabel(\"Tokens seen\")\n", "\n", " fig.tight_layout() # Adjust layout to make room\n", " plt.savefig(\"loss-plot.pdf\")\n", " plt.show()\n", "\n", "epochs_tensor = torch.linspace(0, num_epochs, len(train_losses))\n", "plot_losses(epochs_tensor, tokens_seen, train_losses, val_losses)\n", "#一个经典的plot画图函数" ] }, { "cell_type": "markdown", "id": "8bc83ded-5f80-4e1c-bf4d-ccb59999d995", "metadata": {}, "source": [ "- 从以上结果可以看出,模型在开始阶段生成的是难以理解的字符串,但是在后期能够生成基本符合语法的句子。\n", "- 从训练集和验证集的损失值可以看出,模型开始出现过拟合现象。\n", "- 如果检查后期它生成的某些段落,会发现它们与训练集中的内容完全相同(模型只是简单地记住了训练数据,背住答案罢了)。\n", "- 之后的部分,我们将讨论一些解码策略,这些策略可以一定程度缓解这种“背答案”的问题。\n", "- 请注意,这里的过拟合是由于训练集非常非常小,并且我们对其进行了多次迭代。\n", " - 本次 LLM 训练主要用于教学目的;我们的目标是观察模型是否能够学会生成连贯的文本。\n", " - 为了避免花费数周或数月时间在大量昂贵硬件上训练模型,我们将在后续加载预训练权重。" ] }, { "cell_type": "markdown", "id": "eb380c42-b31c-4ee1-b8b9-244094537272", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "de713235-1561-467f-bf63-bf11ade383f0", "metadata": {}, "source": [ "**如果您对通过更高深的技术增强此训练函数感兴趣,例如学习率预热、余弦退火和梯度裁剪,请参阅[附录D](../../appendix-D/01_main-chapter-code)。**" 
] }, { "cell_type": "markdown", "id": "6d5cdf2f-09a5-4eb0-a20a-d7aac5c14c2c", "metadata": {}, "source": [ "**For a larger training dataset and a longer training run, see [../03_bonus_pretraining_on_gutenberg](../03_bonus_pretraining_on_gutenberg).**" ] }, { "cell_type": "markdown", "id": "699f45fc-bf78-42f2-bd24-2355db41b28f", "metadata": { "id": "699f45fc-bf78-42f2-bd24-2355db41b28f" }, "source": [ "## 5.3 Decoding strategies to control randomness" ] }, { "cell_type": "markdown", "id": "6be9086e-2c27-41da-97d0-49137d0ba3c7", "metadata": {}, "source": [ "- Inference is relatively cheap for a small LLM such as the GPT model we trained above, so there is no need to use a GPU for it, even if you used a GPU for training.\n", "- Using the `generate_text_simple` function from the previous chapter (which we also called inside the simple training function), we can generate new text one word (or token) at a time.\n", "- As explained in section 5.1.2, the next generated token is the token with the largest probability score among all tokens in the vocabulary." ] }, { "cell_type": "code", "execution_count": 29, "id": "2734cee0-f6f9-42d5-b71c-fa7e0ef28b6d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Output text:\n", " Every effort moves you know,\" was one of the axioms he laid down across the Sevres and silver of an exquisitely appointed lun\n" ] } ], "source": [ "model.to(\"cpu\")\n", "model.eval()\n", "\n", "tokenizer = tiktoken.get_encoding(\"gpt2\")\n", "\n", "token_ids = generate_text_simple(\n", "    model=model,\n", "    idx=text_to_token_ids(\"Every effort moves you\", tokenizer),\n", "    max_new_tokens=25,\n", "    context_size=GPT_CONFIG_124M[\"context_length\"]\n", ")\n", "\n", "# Decode the generated token IDs back into text\n", "print(\"Output text:\\n\", token_ids_to_text(token_ids, tokenizer))" ] }, { "cell_type": "markdown", "id": "d25dbe31-bb7c-4893-b25b-47d0492d4aa4", "metadata": {}, "source": [ "- Even if we call `generate_text_simple` multiple times, the LLM always generates the same output, because the decoding is deterministic.\n", "- To make the generated text more varied, we introduce two decoding strategies that modify `generate_text_simple`: **temperature scaling** and **top-k sampling**.\n", "- These let us control the randomness and diversity of the generated text to suit different applications." ] }, { "cell_type": "markdown", "id": "4bb6f380-a798-4fd9-825c-17b7cd29a994", "metadata": {}, "source": [ "### 5.3.1 Temperature scaling" ] }, { "cell_type": "markdown", "id": "a7f4f53c-0612-43d3-aa82-52447eac50fa", "metadata": {}, "source": [ "- 
Previously, we always picked the token with the highest probability as the next token using `torch.argmax`.\n", "- To add variety, we can instead sample the next token from the probability distribution with `torch.multinomial(probas, num_samples=1)`.\n", "- Here, each index is chosen with a probability proportional to its corresponding value in the input tensor." ] }, { "cell_type": "markdown", "id": "e7531bae-d5de-44c0-bc78-78fed077e22a", "metadata": {}, "source": [ "- Here is a brief recap of how the next token is generated, using a very small vocabulary for illustration:" ] }, { "cell_type": "code", "execution_count": 30, "id": "01a5ce39-3dc8-4c35-96bc-6410a1e42412", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "forward\n" ] } ], "source": [ "vocab = {\n", "    \"closer\": 0,\n", "    \"every\": 1,\n", "    \"effort\": 2,\n", "    \"forward\": 3,\n", "    \"inches\": 4,\n", "    \"moves\": 5,\n", "    \"pizza\": 6,\n", "    \"toward\": 7,\n", "    \"you\": 8,\n", "}\n", "\n", "# Reverse mapping from token ID to token string\n", "inverse_vocab = {v: k for k, v in vocab.items()}\n", "\n", "# Suppose input is \"every effort moves you\", and the LLM\n", "# returns the following logits for the next token:\n", "next_token_logits = torch.tensor(\n", "    [4.51, 0.89, -1.90, 6.75, 1.63, -1.62, -1.89, 6.28, 1.79]\n", ")\n", "\n", "# Convert the logits to probabilities with softmax\n", "probas = torch.softmax(next_token_logits, dim=0)\n", "# Greedy choice: pick the ID of the most probable token\n", "next_token_id = torch.argmax(probas).item()\n", "\n", "# The next generated token is then as follows:\n", "print(inverse_vocab[next_token_id])" ] }, { "cell_type": "code", "execution_count": 31, "id": "6400572f-b3c8-49e2-95bc-433e55c5b3a1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "forward\n" ] } ], "source": [ "torch.manual_seed(123)\n", "next_token_id = torch.multinomial(probas, num_samples=1).item()\n", "print(inverse_vocab[next_token_id])" ] }, { "cell_type": "markdown", "id": "c63d0a27-830b-42b5-9986-6d1a7de04dd9", "metadata": {}, "source": [ "- Instead of determining the most likely token via `torch.argmax`, we use `torch.multinomial(probas, num_samples=1)` to determine the next token by sampling from the softmax distribution.\n", "- For illustration, let's sample the next token 1,000 times using the original softmax probabilities and look at the resulting distribution:" ] }, { "cell_type": "code", 
"execution_count": 32, "id": "b23b863e-252a-403c-b5b1-62bc0a42319f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "73 x closer\n", "0 x every\n", "0 x effort\n", "582 x forward\n", "2 x inches\n", "0 x moves\n", "0 x pizza\n", "343 x toward\n" ] } ], "source": [ "def print_sampled_tokens(probas):\n", " torch.manual_seed(123) # Manual seed for reproducibility\n", " sample = [torch.multinomial(probas, num_samples=1).item() for i in range(1_000)]\n", " #从概率分布 probas 中按照权重进行一次采样,并生成索引\n", " sampled_ids = torch.bincount(torch.tensor(sample))\n", " #然后变成单词\n", " for i, freq in enumerate(sampled_ids):\n", " print(f\"{freq} x {inverse_vocab[i]}\")\n", "#统计采样过程中每个词的出现频率\n", "print_sampled_tokens(probas)" ] }, { "cell_type": "markdown", "id": "32e7d9cf-a26d-4d9a-8664-4af1efa73832", "metadata": {}, "source": [ "- 我们可以通过一种称为**温度缩放**的技术来调节概率分布和 token 选择的过程。\n", "- 温度缩放的核心操作是将 logits 除以一个大于 0 的数值(即温度值),然后再应用 softmax 函数。\n", "- 当温度值大于 1 时,softmax 输出的概率分布会更加均匀,从而增加生成文本的多样性。\n", "- 当温度值小于 1 时,softmax 输出的概率分布会更加集中(更陡峭或更尖锐),从而倾向于选择概率更高的 token,减少随机性。" ] }, { "cell_type": "markdown", "id": "f4e24fe1-3c4e-4ebe-877e-d5d8b703d148", "metadata": {}, "source": [ "模型的预测概率往往过于自信或低估某些类别的概率,尤其在分类任务中。\n", "温度缩放通过引入一个参数 T > 0 来重新调整 logits,改善预测概率的校准性能" ] }, { "cell_type": "code", "execution_count": 33, "id": "0759e4c8-5362-467c-bec6-b0a19d1ba43d", "metadata": {}, "outputs": [], "source": [ "def softmax_with_temperature(logits, temperature):\n", " scaled_logits = logits / temperature\n", " return torch.softmax(scaled_logits, dim=0)\n", "#温度校正\n", "# Temperature values\n", "temperatures = [1, 0.1, 5] # Original, higher confidence, and lower confidence\n", "#初始校正系数\n", "# Calculate scaled probabilities\n", "scaled_probas = [softmax_with_temperature(next_token_logits, T) for T in temperatures]" ] }, { "cell_type": "code", "execution_count": 34, "id": "2e66e613-4aca-4296-a984-ddd0d80c6578", "metadata": {}, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAeoAAAEiCAYAAAA21pHjAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjkuMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy80BEi2AAAACXBIWXMAAA9hAAAPYQGoP6dpAABM5klEQVR4nO3deVxU1f8/8Newg2wimyAKiiYUO0q4oUWCGmqkGWooIt8scYFwjUUgwDQR/YRiKu5rRlqaJvIRcc0dMxEDREhBcSVA1jm/P/xxP44DyH7v4Pv5eMzjw5y5d+Y185l8zz333HNEjDEGQgghhAiSHN8BCCGEEFI/KtSEEEKIgFGhJoQQQgSMCjUhhBAiYFSoCSGEEAGjQk0IIYQIGBVqQgghRMCoUBNCCCECpsB3gPYmFotx7949aGhoQCQS8R2HEELIG4gxhn///RdGRkaQk2v4mPmNK9T37t2DiYkJ3zEIIYQQ5Ofno1u3bg1u88YVag0NDQAvPhxNTU2e0xBCCHkTFRcXw8TEhKtJDXnjCnVtd7empiYVakIIIbxqzClYGkxGCCGECBivhTotLQ0eHh4wMjKCSCTC/v37X7tPamoq7O3toaysDHNzc2zevLnNcxJCCCF84bVQl5aWwsbGBvHx8Y3a/vbt2xg1ahSGDRuGq1evYu7cuZg+fTp+//33Nk5KCCGE8IPXc9QjRozAiBEjGr19QkICzMzMsGLFCgCAhYUFTp06hZUrV8LNza2tYhJC2plYLEZlZSXfMQhpNkVFRcjLy7fKc8nUYLKzZ8/C1dVVos3NzQ1z586td5+KigpUVFRw94uLi9sqHiGkFVRWVuL27dsQi8V8RyGkRbS1tWFoaNjiOTtkqlAXFhbCwMBAos3AwADFxcV4/vw5VFVVpfaJiYlBeHh4e0UkhLQAYwwFBQWQl5eHiYnJayeCIESIGGMoKyvDgwcPAABdu3Zt0fPJVKFujkWLFiEwMJC7X3vtGiFEeKqrq1FWVgYjIyOoqanxHYeQZqs9cHzw4AH09fVb1A0uU4Xa0NAQ9+/fl2i7f/8+NDU16zyaBgBlZWUoKyu3RzxCGm+JVgOPPWu/HAJTU1MDAFBSUuI5CSEtV/tjs6qqqkWFWqb6lZydnZGSkiLRlpycDGdnZ54SEULaAs3DTzqC1voe81qoS0pKcPXqVVy9ehXAi8uvrl69iry8PAAvuq29vb257WfMmIGcnBzMnz8fN2/exJo1a7B3714EBATwEZ8QQghpc7wW6osXL8LOzg52dnYAgMDAQNjZ2SE0NBQAUFBQwBVtADAzM8OhQ4eQnJwMGxsbrFixAhs2bKBLswghhHRYvJ6jHjp0KBhj9T5e16xjQ4cOxZUrV9owFSFEaEwXHmrX18tdOqrR276uezMsLAxLlixpYSJhMTU1xdy5cxu8NFboZs+ejdOnT+P69euwsLDgenaFSKYGkxFCiNAUFBRwf+/ZswehoaHIzMzk2tTV1fmI1WSMMdTU1EBBof3KQmVlJa8DB6dNm4Y//vgD165d4y1DY8jUYDJCCBEaQ0ND7qalpQWRSCTRtnv3blhYWEBFRQV9+/bFmjVruH1zc3MhEomwd+9eDB48GKqqqujXrx9u3bqFCxcuwNHREerq6hgxYgSKioq4/aZOnYqxY8ciPDwcenp60NTUxIwZMyRmcxOLxYiJiYGZmRlUVVVhY2ODffv2cY+npqZCJBLh8OHDcHBwgLKyMk6dOoXs7GyMGTMGBgYGUFdXR79+/XDs2DFuv6FDh+LOnTsICAiASCTiehSWLFkCW1tbic8mLi4OpqamUrmjoqJgZGSEt956C8CLZYc/+eQTaGtrQ0dHB2PGjEFubm5r/N9Tr9WrV2PmzJno2bNnm75Oa6BCTQghbWTHjh0IDQ1FVFQUMjIyEB0djZCQEGzZskViu7CwMAQHB+Py5ctQUFDAxIkTMX/+fKxatQonT55EVlYWN3anVkpKCjIyMpCamopdu3Y
hKSlJYnKnmJgYbN26FQkJCfjrr78QEBCAyZMn48SJExLPs3DhQixduhQZGRmwtrZGSUkJRo4ciZSUFFy5cgXu7u7w8PDgxgslJSWhW7duiIiIQEFBgUSPQmOkpKQgMzMTycnJOHjwIKqqquDm5gYNDQ2cPHkSp0+fhrq6Otzd3RucRlZdXb3B24wZM5qUS8io65sQQtpIWFgYVqxYAU9PTwAvBsTeuHED69atw5QpU7jtgoKCuEGxc+bMgZeXF1JSUjBw4EAAgK+vr9SYHSUlJSQmJkJNTQ1vv/02IiIiMG/ePERGRqKqqgrR0dE4duwYd/lqz549cerUKaxbtw4uLi7c80REROCDDz7g7uvo6MDGxoa7HxkZiZ9//hm//PIL/P39oaOjA3l5eWhoaMDQ0LDJn0mnTp2wYcMGrst7+/btEIvF2LBhA3d0vmnTJmhrayM1NRXDhw+v83led05ZU1OzydmEigo1IYS0gdLSUmRnZ8PX1xd+fn5ce3V1NbS0JCe8sba25v6unSbZyspKoq12OspaNjY2ErO3OTs7o6SkBPn5+SgpKUFZWZlEAQZenBOuvcqmlqOjo8T9kpISLFmyBIcOHUJBQQGqq6vx/PlziStwWsLKykrivHR6ejqysrKgoaEhsV15eTmys7PrfR5zc/NWySMLqFATQkgbKCkpAQCsX78eTk5OEo+9OkuVoqIi93ftUeWrbU1ZpKT2tQ8dOgRjY2OJx16dqbFTp04S94OCgpCcnIzvvvsO5ubmUFVVxbhx4167mpmcnJzUVTxVVVVS2736eiUlJXBwcMCOHTukttXT06v39V43SG/y5MlISEhocBtZQYWaEELagIGBAYyMjJCTk4NJkya1+vOnp6dLLEZ07tw5qKurw8TEBDo6OlBWVkZeXp5EN3djnD59GlOnTsVHH30E4EUhfXVgl5KSEjfday09PT0UFhaCMcb92GjMJU/29vbYs2cP9PX1m9RdTV3fhBBCWiw8PByzZ8+GlpYW3N3dUVFRgYsXL+LJkycSiwU1R2VlJXx9fREcHIzc3FyEhYXB398fcnJy0NDQQFBQEAICAiAWizFo0CA8e/YMp0+fhqampsT58Vf17t0bSUlJ8PDwgEgkQkhIiNTRvKmpKdLS0vDpp59CWVkZurq6GDp0KIqKirBs2TKMGzcOR44cweHDh19bMCdNmoTly5djzJgxiIiIQLdu3XDnzh0kJSVh/vz56NatW537tbTrOysrCyUlJSgsLMTz58+5wm9paSm4ueZp1DchhLSR6dOnY8OGDdi0aROsrKzg4uKCzZs3w8zMrMXP/f7776N3794YMmQIJkyYgNGjR0tMrBIZGYmQkBDExMTAwsIC7u7uOHTo0GtfOzY2Fp07d8aAAQPg4eEBNzc32NvbS2wTERGB3Nxc9OrVi+uetrCwwJo1axAfHw8bGxucP38eQUFBr30fampqSEtLQ/fu3eHp6QkLCwv4+vqivLy8TY+Kp0+fDjs7O6xbtw63bt3iZsm8d+9em71mc4lYQ1ODdUDFxcXQ0tLCs2fPOlTXCJExtHpWncrLy3H79m2YmZlBRUWF7ziCNXXqVDx9+hT79+/nOwppQEPf56bUIjqiJoQQQgSMCjUhhBAiYDSYjBBCZExdCxaRjouOqAkhhBABo0JNCCGECBgVakIIIUTAqFATQgghAkaFmhBCCBEwKtSEEEKIgFGhJoSQFhCJRA3eXp7Ws6MwNTVFXFwc3zFaJC8vD6NGjYKamhr09fUxb948VFdXN7hPVFQUBgwYADU1NWhra7dPUNB11IQQWdDQlKtt8nqNn8a1oKCA+3vPnj0IDQ1FZmYm1/a65RiFgjGGmpoaKCi0X1morKzkZQGMmpoajBo1CoaGhjhz5gwKCgrg7e0NRUVFREdH17tfZWUlxo8fD2dnZ2zcuLHd8tIRNSGEtIChoSF309LSgkgkkmjbvXs3LCwsoKKigr59+2LNmjXcvrm5uRCJRNi
7dy8GDx4MVVVV9OvXD7du3cKFCxfg6OgIdXV1jBgxAkVFRdx+U6dOxdixYxEeHg49PT1oampixowZEmtGi8VixMTEwMzMDKqqqrCxscG+ffu4x1NTUyESiXD48GE4ODhAWVkZp06dQnZ2NsaMGQMDAwOoq6ujX79+OHbsGLff0KFDcefOHQQEBHC9BgCwZMkS2NraSnw2cXFxMDU1lcodFRUFIyMjvPXWWwCA/Px8fPLJJ9DW1oaOjg7GjBkjtbRmazp69Chu3LiB7du3w9bWFiNGjEBkZCTi4+MbXHc7PDwcAQEBsLKyarNsdaFCTQghbWTHjh0IDQ1FVFQUMjIyEB0djZCQEGzZskViu7CwMAQHB+Py5ctQUFDAxIkTMX/+fKxatQonT55EVlYWQkNDJfZJSUlBRkYGUlNTsWvXLiQlJSE8PJx7PCYmBlu3bkVCQgL++usvBAQEYPLkyThx4oTE8yxcuBBLly5FRkYGrK2tUVJSgpEjRyIlJQVXrlyBu7s7PDw8kJeXBwBISkpCt27dEBERgYKCAokehcZISUlBZmYmkpOTcfDgQVRVVcHNzQ0aGho4efIkTp8+DXV1dbi7uzdYNNXV1Ru8zZgxo959z549CysrKxgYGHBtbm5uKC4uxl9//dWk99MeqOubEELaSFhYGFasWAFPT08AgJmZGW7cuIF169ZJrAkdFBQENzc3AMCcOXPg5eWFlJQUDBw4EADg6+srNW2okpISEhMToaamhrfffhsRERGYN28eIiMjUVVVhejoaBw7dgzOzs4AgJ49e+LUqVNYt24dXFxcuOeJiIjABx98wN3X0dGBjY0Ndz8yMhI///wzfvnlF/j7+0NHRwfy8vLQ0NCAoaFhkz+TTp06YcOGDVyX9/bt2yEWi7Fhwwbu6HzTpk3Q1tZGamoqhg8fXufz1K4fXZ+GVqQqLCyUKNIAuPuFhYWNfSvthgo1IYS0gdLSUmRnZ8PX1xd+fn5ce3V1NbS0JM+5W1tbc3/XFoyXu1cNDAzw4MEDiX1sbGygpqbG3Xd2dkZJSQny8/NRUlKCsrIyiQIMvDjHamdnJ9Hm6Ogocb+kpARLlizBoUOHUFBQgOrqajx//pw7om4pKysrifPS6enpyMrKgoaGhsR25eXlyM7Orvd5zM3NWyWPLKBCTQghbaCkpAQAsH79ejg5OUk8Ji8vL3FfUVGR+7v2qPLVNrFY3OTXPnToEIyNjSUeU1ZWlrjfqVMniftBQUFITk7Gd999B3Nzc6iqqmLcuHENdkMDgJycHBhjEm1VVVVS2736eiUlJXBwcMCOHTukttXT06v39V43SG/y5MlISEio8zFDQ0OcP39eou3+/fvcY0JDhZoQQtqAgYEBjIyMkJOTg0mTJrX686enp+P58+dQVVUFAJw7dw7q6uowMTGBjo4OlJWVkZeXJ9HN3RinT5/G1KlT8dFHHwF4UUhfHdilpKSEmpoaiTY9PT0UFhaCMcb92Hhd9zQA2NvbY8+ePdDX12+wu/pVLen6dnZ2RlRUFB48eAB9fX0AQHJyMjQ1NWFpadnoDO2FCjUhhLSR8PBwzJ49G1paWnB3d0dFRQUuXryIJ0+eIDAwsEXPXVlZCV9fXwQHByM3NxdhYWHw9/eHnJwcNDQ0EBQUhICAAIjFYgwaNAjPnj3D6dOnoampKXF+/FW9e/dGUlISPDw8IBKJEBISInU0b2pqirS0NHz66adQVlaGrq4uhg4diqKiIixbtgzjxo3DkSNHcPjw4dcW30mTJmH58uUYM2YMIiIi0K1bN9y5cwdJSUmYP38+unXrVud+Len6Hj58OCwtLfHZZ59h2bJlKCwsRHBwMGbOnMn1OJw/fx7e3t5ISUnheiXy8vLw+PFj5OXloaamhvuxYG5u3qaX4fE+6js+Ph6mpqZQUVGBk5OTVHfEq+Li4vDWW29BVVUVJiYmCAgIQHl5eTulJYSQxps+fTo2bNiATZs2wcrKCi4uLti8eTP
MzMxa/Nzvv/8+evfujSFDhmDChAkYPXq0xOQqkZGRCAkJQUxMDCwsLODu7o5Dhw699rVjY2PRuXNnDBgwAB4eHnBzc4O9vb3ENhEREcjNzUWvXr247mkLCwusWbMG8fHxsLGxwfnz5xEUFPTa96Gmpoa0tDR0794dnp6esLCwgK+vL8rLy5t0hN0U8vLyOHjwIOTl5eHs7IzJkyfD29sbERER3DZlZWXIzMyU6L4PDQ2FnZ0dwsLCUFJSAjs7O9jZ2eHixYttkrOWiL16UqEd7dmzB97e3khISICTkxPi4uLw448/IjMzk+uOeNnOnTsxbdo0JCYmYsCAAbh16xamTp2KTz/9FLGxsY16zeLiYmhpaeHZs2dt9iUg5LUamsCjCZNtdDTl5eW4ffs2zMzMoKKiwnccwZo6dSqePn2K/fv38x2FNKCh73NTahGvR9SxsbHw8/ODj48PLC0tkZCQADU1NSQmJta5/ZkzZzBw4EBMnDgRpqamGD58OLy8vF57FE4IIYTIKt4KdWVlJS5dugRXV9f/hZGTg6urK86ePVvnPgMGDMClS5e4wpyTk4PffvsNI0eObJfMhBBCSHvjbTDZw4cPUVNTU+dF5zdv3qxzn4kTJ+Lhw4cYNGgQGGOorq7GjBkzsHjx4npfp6KiAhUVFdz94uLi1nkDhBDCk1cnPyEdG++DyZoiNTUV0dHRWLNmDS5fvoykpCQcOnQIkZGR9e4TExMDLS0t7mZiYtKOiQkhhJCW4e2IWldXF/Ly8txF5rXu379f7wXnISEh+OyzzzB9+nQAL2a4KS0txf/93//h66+/hpyc9O+ORYsWSVwGUVxcTMWaEEKIzODtiFpJSQkODg5ISUnh2sRiMVJSUri5aV9VVlYmVYxrZ/ipb/C6srIyNDU1JW6EEEKIrOB1wpPAwEBMmTIFjo6O6N+/P+Li4lBaWgofHx8AgLe3N4yNjRETEwMA8PDwQGxsLOzs7ODk5ISsrCyEhITAw8NDako+QgghpCPgtVBPmDABRUVFCA0NRWFhIWxtbXHkyBFugFleXp7EEXRwcDBEIhGCg4Nx9+5d6OnpwcPDA1FRUXy9BUIIIaRN8TrhCR9owhMiCDThSZ1owhPSkXSICU8IIYQQ0jAq1IQQ0gIikajB28vzb3cUpqamiIuL4ztGi9T1/9Xu3bv5jlUnWj2LECJ4Vlus2vX1/pzyZ6O3LSgo4P7es2cPQkNDkZmZybW15apKrYkxhpqaGigotF9ZqKyshJKSUru93qs2bdoEd3d37r62tjZvWRpCR9SEENIChoaG3E1LSwsikUiibffu3bCwsICKigr69u2LNWvWcPvm5uZCJBJh7969GDx4MFRVVdGvXz/cunULFy5cgKOjI9TV1TFixAgUFRVx+02dOhVjx45FeHg49PT0oKmpiRkzZqCyspLbRiwWIyYmBmZmZlBVVYWNjQ327dvHPZ6amgqRSITDhw/DwcEBysrKOHXqFLKzszFmzBgYGBhAXV0d/fr1w7Fjx7j9hg4dijt37iAgIIA7EgWAJUuWwNbWVuKziYuLg6mpqVTuqKgoGBkZ4a233gIA5Ofn45NPPoG2tjZ0dHQwZswYqTWw24K2trbE/1dCHRdBhZoQQtrIjh07EBoaiqioKGRkZCA6OhohISHYsmWLxHZhYWEIDg7G5cuXoaCggIkTJ2L+/PlYtWoVTp48iaysLISGhkrsk5KSgoyMDKSmpmLXrl1ISkpCeHg493hMTAy2bt2KhIQE/PXXXwgICMDkyZNx4sQJiedZuHAhli5dioyMDFhbW6OkpAQjR45ESkoKrly5And3d3h4eCAvLw8AkJSUhG7duiEiIgIFBQUSPQqNkZKSgszMTCQnJ+PgwYOoqqqCm5sbNDQ0cPLkSZw+fRrq6upwd3eX+OHxKnV19QZvM2bMeG2WmTNnQldXF/3790diYmK983Hwjbq+CSGkjYSFhWHFihXw9PQEAJiZmeHGjRtYt24
dpkyZwm0XFBQENzc3AMCcOXPg5eWFlJQUDBw4EADg6+srNb+3kpISEhMToaamhrfffhsRERGYN28eIiMjUVVVhejoaBw7doybQKpnz544deoU1q1bBxcXF+55IiIi8MEHH3D3dXR0YGNjw92PjIzEzz//jF9++QX+/v7Q0dGBvLw8NDQ06p1FsiGdOnXChg0buC7v7du3QywWY8OGDdzR+aZNm6CtrY3U1FQMHz68zue5evVqg6/zupHUEREReO+996CmpoajR4/iyy+/RElJCWbPnt3k99TWqFATQkgbKC0tRXZ2Nnx9feHn58e1V1dXQ0tL8vI8a2tr7u/aeSSsrKwk2h48eCCxj42NDdTU1Lj7zs7OKCkpQX5+PkpKSlBWViZRgIEX54Tt7Owk2hwdHSXul5SUYMmSJTh06BAKCgpQXV2N58+fc0fULWVlZSVxXjo9PR1ZWVnQ0NCQ2K68vBzZ2dn1Po+5uXmLcoSEhHB/29nZobS0FMuXL6dCTQghb4qSkhIAwPr16+Hk5CTx2KszKSoqKnJ/1x5VvtomFoub/NqHDh2CsbGxxGPKysoS9zt16iRxPygoCMnJyfjuu+9gbm4OVVVVjBs3rsFuaODFMsWvdh1XVVVJbffq65WUlMDBwQE7duyQ2lZPT6/e13vdIL3JkycjISGhwW1e5uTkhMjISFRUVEh9RnyjQk0IIW3AwMAARkZGyMnJwaRJk1r9+dPT0/H8+XOoqqoCAM6dOwd1dXWYmJhAR0cHysrKyMvLk+jmbozTp09j6tSp+OijjwC8KKSvDuxSUlJCTU2NRJuenh4KCwvBGON+bLyuexoA7O3tsWfPHujr6zdpEqqWdn3X9XydO3cWXJEGqFATQkibCQ8Px+zZs6GlpQV3d3dUVFTg4sWLePLkicSqfs1RWVkJX19fBAcHIzc3F2FhYfD394ecnBw0NDQQFBSEgIAAiMViDBo0CM+ePcPp06ehqakpcX78Vb1790ZSUhI8PDwgEokQEhIidTRvamqKtLQ0fPrpp1BWVoauri6GDh2KoqIiLFu2DOPGjcORI0dw+PDh1xbMSZMmYfny5RgzZgwiIiLQrVs33LlzB0lJSZg/fz66detW534t6fr+9ddfcf/+fbz77rtQUVFBcnIyoqOjERQU1OznbEs06psQQtrI9OnTsWHDBmzatAlWVlZwcXHB5s2bYWZm1uLnfv/999G7d28MGTIEEyZMwOjRoyUmV4mMjERISAhiYmJgYWEBd3d3HDp06LWvHRsbi86dO2PAgAHw8PCAm5sb7O3tJbaJiIhAbm4uevXqxXVPW1hYYM2aNYiPj4eNjQ3Onz/fqMKnpqaGtLQ0dO/eHZ6enrCwsICvry/Ky8vbbJpnRUVFxMfHw9nZGba2tli3bh1iY2MRFhbWJq/XUjTXNyF8oLm+60RzfTfO1KlT8fTpU+zfv5/vKKQBNNc3IYQQ8gagQk0IIYQIGA0mI4QQGfPq5CekY2vWEfXx48dbOwchhBBC6tCsQu3u7o5evXrhm2++QX5+fmtnIoQQQsj/16xCfffuXfj7+2Pfvn3o2bMn3NzcsHfv3tfOXEMIIY3xhl2MQjqo1voeN6tQ6+rqIiAgAFevXsUff/yBPn364Msvv4SRkRFmz56N9PT0VglHCHmz1E6tST/6SUdQVlYGQHI62OZo8WAye3t7GBoaokuXLli6dCkSExOxZs0aODs7IyEhAW+//XZLX4IQ8oZQUFCAmpoaioqKoKioCDk5ujCFyB7GGMrKyvDgwQNoa2tLze3eVM0u1FVVVThw4AASExORnJwMR0dHfP/99/Dy8kJRURGCg4Mxfvx43Lhxo0UBCSFvDpFIhK5du+L27du4c+cO33EIaRFtbe1mLQX6qmYV6lmzZmHXrl1gjOGzzz7DsmXL8M4773CPd+rUCd999x2MjIxaHJAQ8mZRUlJC7969qfubyDRFRcUWH0nXalahvnHjBv7zn//A09Oz3pV
GdHV16TIuQkizyMnJ0RSihPx/zToBFBYWhvHjx0sV6erqaqSlpQF4ca6pqcurEUIIIURSswr1sGHD8PjxY6n2Z8+eYdiwYS0ORQghhJAXmlWoX14Y/GWPHj1Cp06dWhyKEEIIIS806Ry1p6cngBcjM6dOnSrR9V1TU4Nr165hwIABrZuQEEIIeYM1qVBrab1YQ5cxBg0NDaiqqnKPKSkp4d1334Wfn1/rJiSEEELeYE0q1Js2bQIAmJqaIigoiLq5CSGEkDbW7FHfrVWk4+PjYWpqChUVFTg5OeH8+fMNbv/06VPMnDkTXbt2hbKyMvr06YPffvutVbIQQgghQtPoI2p7e3ukpKSgc+fOsLOzq3MwWa3Lly836jn37NmDwMBAJCQkwMnJCXFxcXBzc0NmZib09fWltq+srMQHH3wAfX197Nu3D8bGxrhz5w60tbUb+zYIIYQQmdLoQj1mzBhu8NjYsWNb5cVjY2Ph5+cHHx8fAEBCQgIOHTqExMRELFy4UGr7xMREPH78GGfOnOEmOTc1NW2VLIQQQogQiRhP68lVVlZCTU0N+/btkyj8U6ZMwdOnT3HgwAGpfUaOHAkdHR2oqanhwIED0NPTw8SJE7FgwYJ6p2qrqKhARUUFd7+4uBgmJiZ49uwZNDU1W/19EdIoS7QaeOxZ++UghPCiuLgYWlpajapFvC1N8/DhQ9TU1MDAwECi3cDAAIWFhXXuk5OTg3379qGmpga//fYbQkJCsGLFCnzzzTf1vk5MTAy0tLS4m4mJSau+D0IIIaQtNbrru3Pnzg2el35ZXbOWtQaxWAx9fX388MMPkJeXh4ODA+7evYvly5cjLCyszn0WLVqEwMBA7n7tETUhhBAiCxpdqOPi4lr1hXV1dSEvL4/79+9LtN+/f7/eZcG6du0qtSKJhYUFCgsLUVlZCSUlJal9lJWV6104hBBCCBG6RhfqKVOmtOoLKykpwcHBASkpKdw5arFYjJSUFPj7+9e5z8CBA7Fz506IxWJuQflbt26ha9eudRZpQgghRNY1+hx1cXGxxN8N3RorMDAQ69evx5YtW5CRkYEvvvgCpaWl3Chwb29vLFq0iNv+iy++wOPHjzFnzhzcunULhw4dQnR0NGbOnNno1ySEEEJkSZPOURcUFEBfXx/a2tp1nq+uXayjpqamUc85YcIEFBUVITQ0FIWFhbC1tcWRI0e4AWZ5eXnckTMAmJiY4Pfff0dAQACsra1hbGyMOXPmYMGCBY19G4QQQohMafTlWSdOnMDAgQOhoKCAEydONLitkNehbsqQeEJawnThoXofy1WZWP+OdHkWIR1eU2pRo4+oXy6+Qi7EhBBCSEfSpEU5XvbkyRNs3LgRGRkZAABLS0v4+PhAR0en1cIRQgghb7pmTXiSlpYGU1NTrF69Gk+ePMGTJ0+wevVqmJmZIS0trbUzEkIIIW+sZh1Rz5w5ExMmTMDatWu5a5pramrw5ZdfYubMmfjzzz9bNSQhhBDypmrWEXVWVha++uoriYlH5OXlERgYiKysrFYLRwghhLzpmlWo7e3tuXPTL8vIyICNjU2LQxFCCCHkhUZ3fV+7do37e/bs2ZgzZw6ysrLw7rvvAgDOnTuH+Ph4LF26tPVTEkIIIW+oRl9HLScnB5FIhNdt3pQJT/hA11GT9kLXURNC6tMm11Hfvn27xcEIIYQQ0jSNLtQ9evRoyxyEEEIIqUOzJzwBgBs3biAvLw+VlZUS7aNHj25RKEIIIYS80KxCnZOTg48++gh//vmnxHnr2oU6hHyOmhBCCJElzbo8a86cOTAzM8ODBw+gpqaGv/76C2lpaXB0dERqamorRySEEELeXM06oj579iz++9//QldXF3JycpCTk8OgQYMQExOD2bNn48qVK62dkxBCCHkjNeuIuqamBhoaGgAAXV1d3Lt3D8CLAWeZmZmtl44QQgh5wzXriPqdd95Beno6zMzM4OTkhGXLlkFJSQk//PADevbs2do
ZCSGEkDdWswp1cHAwSktLAQARERH48MMPMXjwYHTp0gV79uxp1YCEEELIm6xZhdrNzY3729zcHDdv3sTjx4/RuXNnbuQ3IYQQQlquRddRA0B+fj4AwMTEpMVhCCGEECKpWYPJqqurERISAi0tLZiamsLU1BRaWloIDg5GVVVVa2ckhBBC3ljNOqKeNWsWkpKSsGzZMjg7OwN4ccnWkiVL8OjRI6xdu7ZVQxJCCCFvqmYV6p07d2L37t0YMWIE12ZtbQ0TExN4eXlRoSaEEEJaSbO6vpWVlWFqairVbmZmBiUlpZZmIoQQQsj/16xC7e/vj8jISFRUVHBtFRUViIqKgr+/f6uFI4QQQt50je769vT0lLh/7NgxdOvWDTY2NgCA9PR0VFZW4v3332/dhIQQQsgbrNGFWktLS+L+xx9/LHGfLs8ihBBCWl+jC/WmTZvaMgchhBBC6tCiCU+Kioq4RTjeeust6OnptUooQgghhLzQrMFkpaWlmDZtGrp27YohQ4ZgyJAhMDIygq+vL8rKylo7IyGEEPLGalahDgwMxIkTJ/Drr7/i6dOnePr0KQ4cOIATJ07gq6++avLzxcfHw9TUFCoqKnBycsL58+cbtd/u3bshEokwduzYJr8mIYQQIguaVah/+uknbNy4ESNGjICmpiY0NTUxcuRIrF+/Hvv27WvSc+3ZsweBgYEICwvD5cuXYWNjAzc3Nzx48KDB/XJzcxEUFITBgwc35y0QQgghMqFZhbqsrAwGBgZS7fr6+k3u+o6NjYWfnx98fHxgaWmJhIQEqKmpITExsd59ampqMGnSJISHh9P614QQQjq0ZhVqZ2dnhIWFoby8nGt7/vw5wsPDubm/G6OyshKXLl2Cq6vr/wLJycHV1RVnz56td7+IiAjo6+vD19f3ta9RUVGB4uJiiRshhBAiK5o16jsuLg7u7u5SE56oqKjg999/b/TzPHz4EDU1NVJH5wYGBrh582ad+5w6dQobN27E1atXG/UaMTExCA8Pb3QmQgghREiaVaitrKzw999/Y8eOHVxB9fLywqRJk6CqqtqqAV/277//4rPPPsP69euhq6vbqH0WLVqEwMBA7n5xcTFNzkIIIURmNLlQV1VVoW/fvjh48CD8/Pxa9OK6urqQl5fH/fv3Jdrv378PQ0NDqe2zs7ORm5sLDw8Prk0sFgMAFBQUkJmZiV69eknso6ysDGVl5RblJIQQQvjS5HPUioqKEuemW0JJSQkODg5ISUnh2sRiMVJSUuo81923b1/8+eefuHr1KncbPXo0hg0bhqtXr9KRMiGEkA6nWV3fM2fOxLfffosNGzZAQaFFk5shMDAQU6ZMgaOjI/r374+4uDiUlpbCx8cHAODt7Q1jY2PExMRARUUF77zzjsT+2traACDVTgghhHQEzaqyFy5cQEpKCo4ePQorKyt06tRJ4vGkpKRGP9eECRNQVFSE0NBQFBYWwtbWFkeOHOEGmOXl5UFOrlmD0wkhhBCZ16xCra2tLbV6Vkv4+/vXu451ampqg/tu3ry51XIQQgghQtOkQi0Wi7F8+XLcunULlZWVeO+997BkyZI2HelNCCGEvMma1KccFRWFxYsXQ11dHcbGxli9ejVmzpzZVtkIIYSQN16Tjqi3bt2KNWvW4PPPPwcAHDt2DKNGjcKGDRvoPDIhhHRwpgsP1dmeu3RUOyd5szSpuubl5WHkyJHcfVdXV4hEIty7d6/VgxFCCCGkiYW6uroaKioqEm2Kioqoqqpq1VCEEEIIeaFJXd+MMUydOlVipq/y8nLMmDFD4hKtplyeRQghhJD6NalQT5kyRapt8uTJrRaGEEIIIZKaVKg3bdrUVjkIIYQQUgcaqk0IIYQIGBVqQgghRMCoUBNCCCECRoWaEEIIETAq1IQQQoiAUaEmhBBCBIwKNSGEECJgVKgJIYQQAaNCTQghhAgYFWpCCCFEwKhQE0IIIQJGhZoQQggRMCrUhBBCiIBRoSaEEEI
EjAo1IYQQImBUqAkhhBABo0JNCCGECJgC3wEIIZKstljV+9ifU/5sxySEECGgI2pCCCFEwKhQE0IIIQImiEIdHx8PU1NTqKiowMnJCefPn6932/Xr12Pw4MHo3LkzOnfuDFdX1wa3J4QQQmQZ7+eo9+zZg8DAQCQkJMDJyQlxcXFwc3NDZmYm9PX1pbZPTU2Fl5cXBgwYABUVFXz77bcYPnw4/vrrLxgbG/PwDgghhNSHxly0HO9H1LGxsfDz84OPjw8sLS2RkJAANTU1JCYm1rn9jh078OWXX8LW1hZ9+/bFhg0bIBaLkZKS0s7JCSGEkLbHa6GurKzEpUuX4OrqyrXJycnB1dUVZ8+ebdRzlJWVoaqqCjo6Om0VkxBCCOENr13fDx8+RE1NDQwMDCTaDQwMcPPmzUY9x4IFC2BkZCRR7F9WUVGBiooK7n5xcXHzAxNCCCHtjPeu75ZYunQpdu/ejZ9//hkqKip1bhMTEwMtLS3uZmJi0s4pCSGEkObjtVDr6upCXl4e9+/fl2i/f/8+DA0NG9z3u+++w9KlS3H06FFYW1vXu92iRYvw7Nkz7pafn98q2QkhhJD2wGuhVlJSgoODg8RAsNqBYc7OzvXut2zZMkRGRuLIkSNwdHRs8DWUlZWhqakpcSOEEEJkBe+XZwUGBmLKlClwdHRE//79ERcXh9LSUvj4+AAAvL29YWxsjJiYGADAt99+i9DQUOzcuROmpqYoLCwEAKirq0NdXZ2390EIIYS0Bd4L9YQJE1BUVITQ0FAUFhbC1tYWR44c4QaY5eXlQU7ufwf+a9euRWVlJcaNGyfxPGFhYViyZEl7RieEEELaHO+FGgD8/f3h7+9f52OpqakS93Nzc9s+ECGEECIQMj3qmxBCCOnoqFATQgghAkaFmhBCCBEwQZyjfhPRRPWEEEIag46oCSGEEAGjQk0IIYQIGBVqQgghRMCoUBNCCCECRoWaEEIIETAq1IQQQoiAUaEmhBBCBIwKNSGEECJgVKgJIYQQAaNCTQghhAgYFWpCCCFEwKhQE0IIIQJGi3IQQlqMFpkhHYnQvs90RE0IIYQIGBVqQgghRMCo65s0mtC6gwgh5E1AR9SEEEKIgFGhJoQQQgSMur5byHThoXofy106qh2TEEII6YjoiJoQQggRMCrUhBBCiIBR1zfp0GikOqmPLH43ZDEzaTk6oiaEEEIEjAo1IYQQImBUqAkhhBABE0Shjo+Ph6mpKVRUVODk5ITz5883uP2PP/6Ivn37QkVFBVZWVvjtt9/aKSkhhBDSvngv1Hv27EFgYCDCwsJw+fJl2NjYwM3NDQ8ePKhz+zNnzsDLywu+vr64cuUKxo4di7Fjx+L69evtnJwQQghpe7wX6tjYWPj5+cHHxweWlpZISEiAmpoaEhMT69x+1apVcHd3x7x582BhYYHIyEjY29vj+++/b+fkhBBCSNvj9fKsyspKXLp0CYsWLeLa5OTk4OrqirNnz9a5z9mzZxEYGCjR5ubmhv3797dlVEIIIfVZolX/Y2bd2y9HB8VroX748CFqampgYGAg0W5gYICbN2/WuU9hYWGd2xcWFta5fUVFBSoqKrj7z549AwAUFxe3JDpHXFFW72MNvUbN85pm7dca3gn7vd7Hroe71fsYn5mbi8/MDX43RKzex/j+nOv7ftB3g398Z67vO03f56arfR7G6v/sOIxHd+/eZQDYmTNnJNrnzZvH+vfvX+c+ioqKbOfOnRJt8fHxTF9fv87tw8LCGAC60Y1udKMb3QR3y8/Pf22t5PWIWldXF/Ly8rh//75E+/3792FoaFjnPoaGhk3aftGiRRJd5WKxGI8fP0aXLl0gEola+A4kFRcXw8TEBPn5+dDU1GzV524rlLl9UOb2QZnbB2VuOcYY/v33XxgZGb12W14LtZKSEhwcHJCSkoKxY8cCeFFIU1JS4O/vX+c+zs7OSElJwdy5c7m25ORkODs717m9srIylJWVJdq0tbV
bI369NDU1BfFFaArK3D4oc/ugzO2DMreMlpZWo7bjfa7vwMBATJkyBY6Ojujfvz/i4uJQWloKHx8fAIC3tzeMjY0RExMDAJgzZw5cXFywYsUKjBo1Crt378bFixfxww8/8Pk2CCGEkDbBe6GeMGECioqKEBoaisLCQtja2uLIkSPcgLG8vDzIyf3vKrIBAwZg586dCA4OxuLFi9G7d2/s378f77zzDl9vgRBCCGkzvBdqAPD396+3qzs1NVWqbfz48Rg/fnwbp2o6ZWVlhIWFSXW1Cxllbh+UuX1Q5vZBmduXiLHGjA0nhBBCCB94n5mMEEIIIfWjQk0IIYQIGBVqQgghRMCoUBNCCCECRoW6maqrq7F161apWdIIIYSQ1kSjvltATU0NGRkZ6NGjB99RGm3KlCnw9fXFkCFD+I7SJD179sSFCxfQpUsXifanT5/C3t4eOTk5PCX7n19++aXR244ePboNk7zZampq8Oeff6JHjx7o3Lkz33FkVlMWnxDKTF+vSktLa/BxWfl3UBDXUcuq/v374+rVqzJVqJ89ewZXV1f06NEDPj4+mDJlCoyNjfmO9Vq5ubmoqZFe0aaiogJ3797lIZG02mlwa4lEIomVcV6eW76u9yIEW7Zsga6uLkaNGgUAmD9/Pn744QdYWlpi165dgvyuz507F1ZWVvD19UVNTQ1cXFxw5swZqKmp4eDBgxg6dCjfEWWStrZ2o9dDEOr3ua7/72Xhv8NXUaFugS+//BKBgYHIz8+Hg4MDOnXqJPG4tbU1T8nqt3//fhQVFWHbtm3YsmULwsLC4OrqCl9fX4wZMwaKiop8R5Tw8lHq77//LjE3bk1NDVJSUmBqaspDMmlisZj7+9ixY1iwYAGio6O5eejPnj2L4OBgREdH8xXxtaKjo7F27VoAL/LGx8dj5cqVOHjwIAICApCUlMRzQmn79u3D5MmTAQC//vorbt++jZs3b2Lbtm34+uuvcfr0aZ4T1m3fvn3Yu3cv8vLyUFlZKfHY5cuXeUr1P8ePH+f+zs3NxcKFCzF16lSJ7/OWLVu46Z2F6MmTJxL3q6qqcOXKFYSEhCAqKoqnVM3w2vW1SL1EIpHUTU5OjvtfWXDp0iXm7+/PVFRUmK6uLps7dy67desW37E4dX3GtTclJSXWp08f9uuvv/IdU8rbb7/NTp48KdWelpbG+vbty0OixlFVVWV37txhjDE2f/589tlnnzHGGLt+/TrT1dXlM1q9lJWVuaUC/fz82Jw5cxhjjOXk5DANDQ0ek9Vv1apVTF1dnfn7+zMlJSX2+eefM1dXV6alpcUWL17Mdzwp7733ntTywowxtmPHDubi4tL+gVooNTWV2dvb8x2j0WgwWQvcvn1b6paTk8P9r9AVFBQgOTkZycnJkJeXx8iRI/Hnn3/C0tISK1eu5DsegBdHqWKxGD169EBRURF3XywWo6KiApmZmfjwww/5jiklOzu7zlXatLS0kJub2+55GktdXR2PHj0CABw9ehQffPABAEBFRQXPnz/nM1q9DAwMcOPGDdTU1ODIkSNc5rKyMsjLy/Ocrm5r1qzBDz/8gP/85z9QUlLC/PnzkZycjNmzZ+PZs2d8x5Ny9uxZODo6SrU7Ojri/PnzPCRqGQMDA2RmZvIdo/H4/qVA2ldlZSXbt28fGzVqFFNUVGQODg5s7dq17NmzZ9w2SUlJTFtbm8eUkiorK9l7770nqCP91xk8eDD74IMPWGFhIddWWFjIhg8fzoYMGcJjsoZNnDiR2dvbM19fX6ampsYePnzIGGPswIED7O233+Y5Xd3CwsKYlpYW69u3L+vevTsrLy9njDG2ceNG9u677/Kcrm6qqqosNzeXMcaYnp4eu3r1KmOMsVu3bjEdHR0+o9WpT58+bN68eVLt8+bNY3369OEhUeOkp6dL3K5evcoOHz7MXFxc2MCBA/mO12h0jrqFtm3bhoSEBNy+fRtnz55Fjx49EBcXBzMzM4wZM4bveFK
6du0KsVgMLy8vnD9/Hra2tlLbDBs2rM3X7G4KRUVFXLt2je8YTbJx40Z4enqie/fuMDExAQDk5+dzq70JVXx8PIKDg5Gfn4+ffvqJG2V/6dIleHl58ZyubkuWLME777yD/Px8jB8/nlt0QV5eHgsXLuQ5Xd0MDQ3x+PFj9OjRA927d8e5c+dgY2OD27dvSwxAFIqVK1fi448/xuHDh+Hk5AQAOH/+PP7++2/89NNPPKern62trdSgTgB49913kZiYyFOqpqPLs1pg7dq1CA0Nxdy5cxEVFYXr16+jZ8+e2Lx5M7Zs2SIxGEMotm3bhvHjx0NFRYXvKE0SEBAAZWVlLF26lO8ojcYYQ3JyMm7evAkAsLCwgKura6NH0pKmKy8vl4nv9vTp02FiYoKwsDDEx8dj3rx5GDhwIC5evAhPT09s3LiR74hS/vnnH6xduxYZGRkAXnyfZ8yYwf0QFaI7d+5I3JeTk4Oenp5MfEdeRoW6BSwtLREdHY2xY8dCQ0MD6enp6NmzJ65fv46hQ4fi4cOHfEeUUFVVBVVVVVy9elXm1u+eNWsWtm7dit69e9c5wj42NpanZNJk+XMGgJMnT2LdunXIycnBjz/+CGNjY2zbtg1mZmYYNGgQ3/Gk1NTUIDo6GgkJCbh//z5u3bqFnj17IiQkBKampvD19eU7opTacRYKCi86NXfv3o0zZ86gd+/e+Pzzz6GkpMRzwv+pqqqCu7s7EhIS0Lt3b77jvJFoMFkL3L59G3Z2dlLtysrKKC0t5SFRwxQVFdG9e3eZuXbwZdevX4e9vT00NDRw69YtXLlyhbtdvXqV73gSZPlz/umnn+Dm5gZVVVVcvnwZFRUVAF5cfy/Uy8qioqKwefNmLFu2TKLAvfPOO9iwYQOPyeonJyfHFWkA+PTTT7F69WrMmjVLUEUakM1TTy87ceIEPDw8YG5uDnNzc4wePRonT57kO1bT8Hh+XOZZWFiw/fv3M8YYU1dXZ9nZ2YwxxlavXs3s7Oz4jFavDRs2sJEjR7JHjx7xHaVDk9XP2dbWlm3ZsoUxJvmdvnz5MjMwMOAzWr169erFjh07xhiTzJyRkSGoQZEvMzMzY1OnTuUGvtUqKipiZmZmPKWq39y5c9mCBQv4jtFk27ZtYwoKCuyTTz5hq1atYqtWrWKffPIJU1RUZDt27OA7XqPRYLIWCAwMxMyZM1FeXg7GGM6fP49du3YhJiZGsL/kv//+e2RlZcHIyAg9evSQ6kIWwkQLr/PPP/8AALp168ZzkvrJ6uecmZlZ57SKWlpaePr0afsHaoS7d+/C3Nxcql0sFqOqqoqHRK+Xm5sLBQUFDB48GL/88gsMDQ0BvOjGf/W8qhBUV1cjMTERx44dE/ypp5dFRUVh2bJlCAgI4Npmz56N2NhYREZGYuLEiTymazwq1C0wffp0qKqqIjg4GGVlZZg4cSKMjIywatUqfPrpp3zHq9Or01zKCrFYjG+++QYrVqxASUkJAEBDQwNfffUVvv76a8jJCessjqx+zoaGhsjKypKa7e3UqVPo2bMnP6Few9LSEidPnpSa3nTfvn11npoSApFIhCNHjiAoKAgODg7Yv38/+vXrx3esetWeegKAW7duSTwm5MGROTk58PDwkGofPXo0Fi9ezEOiZuL7kL6jKC0tZffv3+c7Roe1cOFCpqenx9asWcNdExkfH8/09PQEOZOTrIqOjmaWlpbs3LlzTENDg508eZJt376d6enpsdWrV/Mdr0779+9nWlpabOnSpUxNTY0tX76cTZ8+nSkpKbGjR4/yHa9OIpGI+/di4cKFTFVVlW3bto0VFhbKzKyGsqBXr14sISFBqn3t2rXM3Nych0TNQ4W6BcrKylhpaSl3Pzc3l61cuZL9/vvvPKZ6vSdPnrD169ezhQsXcudQL126xP755x+ek9Wva9eu7MCBA1Lt+/fvZ0ZGRjwk6pjEYjH75ptvWKdOnbipWlVUVFhwcDDf0RqUlpbGXF1
dmZ6eHlNVVWUDBw4U9H+HcnJyEj/st23bxlRUVJiPjw8V6la0Zs0apqSkxGbMmMG2bt3Ktm7dyj7//HOmrKxcZwEXKro8qwWGDx8OT09PzJgxA0+fPsVbb70FJSUlPHz4ELGxsfjiiy/4jijl2rVrcHV15aayzMzMRM+ePREcHIy8vDxs3bqV74h1UlFRwbVr19CnTx+J9szMTNja2gpuesuamhqsXLmy3kUXHj9+zFOyxqmsrERWVhZKSkpgaWkJdXV1viN1KHJycigsLIS+vj7XdvbsWXz00UcoKioS5BUDFy9erPf7LMTFWmr9/PPPWLFihcT13/PmzRPkhFT14vuXgizr0qULu379OmOMsfXr1zNra2tWU1PD9u7dK9iFF95//31uKsCXR8iePn2a9ejRg8dkDevfvz+bNWuWVLu/vz9zcnLiIVHDQkJCWNeuXdl3333HVFRUWGRkJPP19WVdunRhq1at4jteh+Lr68uOHz/Od4xWUVhYyFJTU/mOIWXXrl1MUVGRffjhh0xJSYl9+OGHrE+fPkxLS4tNnTqV73j18vb2ZidOnOA7RotRoW6Bl1caGj9+PFuyZAljjLG8vDymqqrKZ7R6aWpqsqysLMaYZKHOzc1lysrKfEZrUGpqKuvUqROzsLBg06ZNY9OmTWMWFhZMXV2dpaWl8R1PSs+ePdnBgwcZYy8+59rPfNWqVczLy4vPaA0qKSlhwcHBzNnZmfXq1YuZmZlJ3IRo9OjRTFlZmXXr1o0FBQWxK1eu8B3ptcLDw1lKSopUe0lJCQsPD+chUcOsrKzY999/zxj7378bYrGY+fn5sdDQUJ7T1W/MmDFMUVGRmZubs6ioKHb37l2+IzULFeoWsLKyYqtWrWJ5eXlMU1OTnTlzhjHG2MWLFwV7zamenh67fPkyY0yyUB89epR169aNz2ivdffuXbZ48WLm6enJPD092ddffy3Y//DU1NS4H3GGhobs0qVLjDHGsrOzmaamJp/RGvTpp5+yrl27svnz57OVK1eyuLg4iZtQPX78mK1bt465uLgwOTk5ZmlpyaKiotjt27f5jlan2mVaV6xYIdEu1MFkampq3Gepo6PDrl27xhhj7MaNG8zQ0JDHZK/34MEDtmLFCmZtbc0UFBSYu7s727t3L6usrOQ7WqNRoW6BH3/8kSkqKjI5OTnm6urKtUdHRzN3d3cek9XP19eXjR07llVWVjJ1dXWWk5PD7ty5w+zs7Lh1fIXio48+4lb12rJli9TkEELWp08fdu7cOcYYYwMHDmQxMTGMMcZ2797N9PT0+IzWIC0tLXbq1Cm+Y7RIfn4+W7ZsGevbty+Tl5fnO06dRCIR2717N+vSpQubOnUqq6ioYIwJt1AbGxtzxdnKyopbm/rMmTOC/uH5qkuXLjF/f3+moqLCdHV12dy5c2ViVT4q1C1UUFDALl++zGpqari2P/74g2VkZPCYqn5Pnz5lrq6uTFtbm8nLyzMTExOmqKjIhgwZwkpKSviOJ0FRUZHdu3ePMSY9SlboFixYwKKiohhjL4qzgoICMzc3Z0pKSoKe4cnU1JTduHGD7xjNVllZyX7++Wf28ccfMxUVFcFeEVB7eVZWVhazsLBgzs7O7P79+4It1F5eXtzRf0REBNPT02PTp09nPXr0YB999BHP6Rrn3r17bOnSpeytt95inTp1Yt7e3uz9999nCgoKLDY2lu94DaJR361EFmbLetmpU6dw7do1lJSUwN7eHq6urnxHkmJtbQ17e3sMGzYMPj4+WL16NTQ1Nevc1tvbu53TNc25c+e4RRfqmoBBKLZv344DBw5gy5YtUFNT4ztOox0/fhw7d+7ETz/9BLFYDE9PT0yaNAnvvfeeICfkkJeXR0FBAfT19VFcXIxPPvkEf/31FxISEjB69GjBjfp+/PgxysvLYWRkBLFYjGXLlnHf5+DgYHTu3JnviHWqqqrCL7/8gk2bNuHo0aOwtrbG9OnTMXHiRO7fkp9
//hnTpk3DkydPeE5bPyrULSBrs2UBL9ZEFvKydC87ffo0vvrqK2RnZ+Px48fQ0NCo8x9dkUgk+MudhMzOzk7ic83KygJjDKamplBUVJTYVohTnxobG+Px48dwd3fHpEmT4OHhwa1JLVSvXp4lFosxd+5crF27FmKxWHCFWlbp6upCLBbDy8sLfn5+sLW1ldrm6dOnsLOzw+3bt9s/YCPRFKIt8PXXX2Pjxo1YunQpBg4cCODFkeqSJUtQXl6OqKgonhNKMzU1xaBBgzB58mSMGzdOsL+EAWDgwIE4d+4cgBf/sN26dUviulMh6969O4YOHQoXFxcMHToUvXr14jtSvWR1utNaS5Yswfjx46Gtrc13lEbbtGkTtLS0uPtycnJYvXo17OzskJaWxmOyunl7e2PYsGEYMmSIoL/Lr1q5ciXGjx/f4PrT2tragi7SAB1Rt4iRkRHXVfWyAwcO4Msvv8Tdu3d5Sla/K1euYOfOndi9ezeKiorg7u6OyZMnC/IoxNPTE5s3b4ampia2bNmCTz75BKqqqnzHapTt27cjLS0NqampyMrKgrGxMVxcXLjCTev6tg1ZOwUlK6ZPn460tDSJ73LtD1H6Lrc9KtQtIGuzZb2MMYbU1FSp83qJiYl8R+MoKSnhzp076Nq1q8Q5PVlTUFCAEydO4ODBg9izZ4+guzYvXLgAsVgMJycnifY//vgD8vLycHR05ClZ/WTlFNTq1avxf//3f1BRUcHq1avr3U4kEmHWrFntmKzx7t69i7S0NJw4cQInTpzArVu30LVrV+4HEmkbVKhbwMnJCU5OTlL/0c2aNQsXLlzgum2F7vLly/D19cW1a9cEVUBkfTBZWVkZTp06hdTUVBw/fhxXrlyBhYUFhg4dipUrV/Idr079+/fH/PnzMW7cOIn2pKQkfPvtt/jjjz94Sla/RYsWYePGjQgPD5c6BeXn5yeYU1BmZma4ePEiunTpAjMzs3q3E4lEyMnJacdkjVf7nT5+/DhSU1Nx+fJlWFpa4sqVK3xH69CoULfAiRMnMGrUKHTv3h3Ozs4AXszXm5+fj99++w2DBw/mOWH9/vnnH+zcuRM7d+7E9evX4ezsjEmTJmHGjBl8R+OcOXMGgYGBMjmYbMCAARKF2cXFBUOGDBH0mAAAUFdXx7Vr16SWtLx9+zasra3x77//8pSsfrJ4Cupltf8EC3F0eq3FixcjNTWV+07Xdn3Lwne6I6BC3UL37t1DfHw8bt68CeDFhO9ffvkljIyMeE5Wt3Xr1mHnzp04deoULCwsMGnSJEycOFFqLV+hqWsRAyHT0dGBnJwchg8fjqFDh2Lo0KFSp0iEqEuXLjh48CD3w7PWmTNnMGrUKEFewiKrp6A2btyIlStX4u+//wYA9O7dG3PnzsX06dN5TiZNTk4Oenp6CAgIgKenp0x8lzsSKtRvGBMTE3h5eWHSpEmwsbHhO06j3blzB3l5eVi3bh1ycnLw448/wtjYGNu2bYOZmRkGDRrEd0QJjDH8+eefSE1NxYkTJ5CWlgYlJSW4uLhg2LBh8PPz4ztinby8vFBQUIADBw5wo5KfPn2KsWPHQl9fH3v37uU5oTRZPAUVGhqK2NhYzJo1S6I37vvvv0dAQAAiIiJ4TigpPT0dJ06cQGpqKk6ePMl9l2XpR6gso0LdRNeuXWv0ttbW1m2YpHkYYzh16pTMFLxaP/30Ez777DNMmjQJ27Ztw40bN9CzZ098//33+O233/Dbb7/xHbFejDFcunQJ33//PXbs2CHowWR3797FkCFD8OjRI9jZ2QEArl69CgMDAyQnJwvyGvz6TkHl5eXh8OHDgjwFpaenh9WrV8PLy0uifdeuXZg1axYePnzIU7LGSU9Px8qVKwX/fe4o6DrqJrK1tYVIJMLrft+IRCJBfnmTkpK4gnf58mVUVFQAAJ49e4bo6GjBFrxvvvkGCQkJ8Pb2xu7du7n2gQMH4ptvvuExWd0uX76M1NRUpKam4tS
pU/j3339hZWWFWbNmwcXFhe949TI2Nsa1a9ewY8cOpKenQ1VVFT4+PvDy8pKa/EQoXFxckJmZibVr13JrDnt6egr6FFRVVVWdI+gdHBxQXV3NQ6KGMcZw5coVie90cXExrK2tBf197ijoiLqJ7ty50+hthXje187ODgEBAfD29oaGhgbS09PRs2dPXLlyBSNGjEBhYSHfEeukpqaGGzduwNTUVCJ3Tk4OLC0tUV5ezndECQoKCrCzs+OunR4yZIjEBBekdZWXl+PatWt48OABxGKxxGOvDjITglmzZkFRURGxsbES7UFBQXj+/Dni4+N5Sla3zp07o6SkBDY2NlyX9+DBg2VqkhlZRkfUTfRy8Y2JiYGBgQGmTZsmsU1iYiKKioqwYMGC9o73WpmZmRgyZIhUu5aWFp4+fdr+gRrJ0NAQWVlZMDU1lWg/deqU1AhlvtXU1CApKQmDBw+WyRGxf//9N44fP15n0QsNDeUpVf2OHDkCb29vPHr0SKqnS6g9W8CLwWRHjx7Fu+++C+DFtep5eXnw9vZGYGAgt92rxZwP27dvx+DBg+u9PJK0LSrULVA7gvpVb7/9Nj799FNBFmpZKngv8/Pzw5w5c5CYmAiRSIR79+7h7NmzCAoKQkhICN/xJMjLy+OTTz5BRkaGzBXq9evX44svvoCuri4MDQ0lLhkSiUSCLNSzZs3C+PHjERoaCgMDA77jNMr169dhb28PAMjOzgbwYl5qXV1dXL9+ndtOKJdsjRo1ivubZn/jQbus0dVBKSsrs5ycHKn27OxspqyszEOi14uOjmaWlpbs3LlzTENDg508eZJt376d6enpsdWrV/Mdr15isZh98803rFOnTkwkEjGRSMRUVFRYcHAw39Hq5ODgwI4dO8Z3jCbr3r07W7p0Kd8xmkRDQ4NlZWXxHaNDq6mpYeHh4UxTU5PJyckxOTk5pqWlxSIiIiSW+CVtgwp1C5ibm7Nt27ZJtW/dupWZmZnxkOj1ZK3gvaqiooL99ddf7I8//mD//vsv33HqdfjwYWZra8t+/fVXdu/ePfbs2TOJm1BpaGiw7OxsvmM0iY+PD9uwYQPfMTq0hQsXMj09PbZmzRqWnp7O0tPTWXx8PNPT02OLFy/mO16HR4PJWmDZsmVYtmwZli9fjvfeew8AkJKSgvnz5+Orr77CokWLeE5Yv8rKSmRlZaGkpASWlpZQV1fnO1KH8vL80i93XzLGBH3e1NfXF/369RPUDHWvU1ZWhvHjx0NPTw9WVlZSo9Nnz57NU7KOQ9Znf5N1dI66BebNm4dHjx7hyy+/RGVlJYAXsyQtWLBA0EUaeLHghaWlJd8xOqzjx4/zHaFZzM3NERISgnPnzslM0du1axeOHj0KFRUVpKamSp1XF2JmWfP48WP07dtXqr1v376Cm763I6Ij6lZQUlKCjIwMqKqqonfv3oJbLpKQxpLFxSIMDQ0xe/ZsLFy4UDArZXU0sjj7W0dChZqQNvL06VNs3LiRm4Tj7bffxrRp0+h66lamo6ODCxcuoFevXnxH6bBkeQGijoAKNSFt4OLFi3Bzc4Oqqir69+8P4MVaz8+fP8fRo0e5S3OEIDAwEJGRkejUqZPE9buvEolEWLFiRTsma5yAgADo6elh8eLFfEfpsPLy8qCgoFDnAkTV1dXo3r07zwk7NirUhLSBwYMHw9zcHOvXr4eCwouhINXV1Zg+fTpycnKQlpbGc8L/GTZsGH7++Wdoa2tj2LBh9W4nEonw3//+tx2TNc7s2bOxdetW2NjYwNraWuq8uhAmDJF18vLyKCgokFq97tGjR9DX1xfs4MiOggo1IW1AVVUVV65ckRqAc+PGDTg6OqKsrIynZB2PLP64kDX1LTN7584dWFpaorS0lKdkbwYa9U1IG9DU1EReXp5Uoc7Pz4eGhgZPqTomWR1hLwtqT4XUzkqnpqbGPVZTU4M//vgDtra2PKV7c1ChJqQNTJgwAb6+vvjuu+8wYMAAAMD
p06cxb948qaUNCRGqK1euAPjf+upKSkrcY0pKSrCxsUFQUBBf8d4Y1PVNSCu5du0a3nnnHcjJyaGyshLz5s1DQkICt2yhoqIivvjiCyxdupQu4SMyxcfHB6tWraJFOXhChZqQVvLygJuePXviwoULUFVV5RZd6NWrl0TXISGENAZ1fRPSSrS1tXH79m3o6+sjNzcXYrEYampqsLKy4jsaIUSGUaEmpJV8/PHHcHFxQdeuXSESieDo6Ah5efk6txXiDF+EEGGiQk1IK/nhhx/g6emJrKwszJ49G35+fjTCmxDSYnSOmpA24OPjg9WrV1OhJoS0GBVqQgghRMBoqRlCCCFEwKhQE0IIIQJGhZoQQggRMCrUhBBCiIBRoSaEEEIEjAo1IYQQImBUqAkhhBABo0JNCCGECNj/AziNpZr5Sbj4AAAAAElFTkSuQmCC", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Plotting\n", "x = torch.arange(len(vocab))\n", "bar_width = 0.15\n", "\n", "fig, ax = plt.subplots(figsize=(5, 3))\n", "for i, T in enumerate(temperatures):\n", " rects = ax.bar(x + i * bar_width, scaled_probas[i], bar_width, label=f'Temperature = {T}')\n", "\n", "ax.set_ylabel('Probability')\n", "ax.set_xticks(x)\n", "ax.set_xticklabels(vocab.keys(), rotation=90)\n", "ax.legend()\n", "\n", "plt.tight_layout()\n", "plt.savefig(\"temperature-plot.pdf\")\n", "plt.show()\n", "#一套经典的画图" ] }, { "cell_type": "markdown", "id": "d750e989-842a-4cfa-a44b-cf44d6e49163", "metadata": {}, "source": [ "- 从结果中可以看出,当温度设置为 0.1 时,概率分布变得更加陡峭,接近于 `torch.argmax` 的行为,因此最可能的 token 几乎总是被选中:" ] }, { "cell_type": "code", "execution_count": 35, "id": "e4600713-c51e-4f53-bf58-040a6eb362b8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 x closer\n", "0 x every\n", "0 x effort\n", "985 x forward\n", "0 x inches\n", "0 x moves\n", "0 x pizza\n", "15 x toward\n" ] } ], "source": [ "print_sampled_tokens(scaled_probas[1])" ] }, { "cell_type": "markdown", "id": "526e93cb-8e2a-42a1-b1ba-4fd5fe64c26b", "metadata": {}, "source": [ "- 当温度设置为 5 时,概率分布变得更加均匀,从而增加了生成文本的多样性和随机性:" ] }, { "cell_type": "code", "execution_count": 36, "id": "9dfb48f0-bc3f-46a5-9844-33b6c9b0f4df", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "165 x closer\n", "75 x every\n", "42 x effort\n", "239 x forward\n", "71 x inches\n", "46 x moves\n", "32 x pizza\n", "227 x toward\n", "103 x you\n" ] } ], "source": [ "print_sampled_tokens(scaled_probas[2])" ] }, { "cell_type": "markdown", "id": "0c83f0c4-3774-4375-ad7f-96440ba5fef7", "metadata": {}, "source": [ "- 假设大语言模型(LLM)的输入是“every effort moves you”,上述方法有时可能会生成无意义的文本,例如“every effort moves you pizza”,其出现的概率为 3.2%(即在 1000 次采样中出现了 32 次)。" ] }, { "cell_type": "markdown", "id": "c6e4873e-07e4-4abb-85df-bdaedcc1a6f7", "metadata": {}, 
"source": [ "### 5.3.2 Top-k 取样" ] }, { "cell_type": "markdown", "id": "6d4da95a-8bb2-4f69-a9b0-a643531db5df", "metadata": {}, "source": [ "- 为了在使用更高温度增加输出多样性的同时减少生成无意义句子的概率,我们可以将采样限制在前 k 个最可能的 token 中:" ] }, { "cell_type": "markdown", "id": "7ae6fffd-2730-4abe-a2d3-781fc4836f17", "metadata": {}, "source": [ "\n", "\n", "- (请注意,此图中的数值已截取到小数点后两位,以减少视觉干扰。Softmax 行中的值总和应为 1.0。)" ] }, { "cell_type": "markdown", "id": "0ba12da5-6ff1-4008-91b8-d2d537cbc14c", "metadata": {}, "source": [ "- 我们可以按照下述建议补充代码" ] }, { "cell_type": "markdown", "id": "8b7f110a-8aa7-4c84-8d71-5cfd28468b48", "metadata": {}, "source": [ "-\t控制输出质量: 减少低概率、无意义的词被选中的机会。\n", "-\t保持多样性: 允许模型在概率较高的几个候选词中随机选择,而不是总是选择最高概率的词(这会导致输出缺乏变化)。" ] }, { "cell_type": "code", "execution_count": 37, "id": "2a7f908a-e9ec-446a-b407-fb6dbf05c806", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Top logits: tensor([6.7500, 6.2800, 4.5100])\n", "Top positions: tensor([3, 7, 0])\n" ] } ], "source": [ "top_k = 3\n", "top_logits, top_pos = torch.topk(next_token_logits, top_k)\n", "#topK采样\n", "print(\"Top logits:\", top_logits)\n", "print(\"Top positions:\", top_pos)" ] }, { "cell_type": "code", "execution_count": 38, "id": "753865ed-79c5-48b1-b9f2-ccb132ff1d2f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([4.5100, -inf, -inf, 6.7500, -inf, -inf, -inf, 6.2800, -inf])\n" ] } ], "source": [ "new_logits = torch.where(\n", " condition=next_token_logits < top_logits[-1],\n", " input=torch.tensor(float(\"-inf\")), \n", " other=next_token_logits\n", ")\n", "#不是前K遮蔽掉\n", "print(new_logits)" ] }, { "cell_type": "markdown", "id": "dfa6fa49-6e99-459d-a517-d7d0f51c4f00", "metadata": {}, "source": [ "> NOTE: \n", ">\n", "> 一种稍微更高效的实现方式可以通过以下代码实现:\n", ">\n", "> ```python\n", "> new_logits = torch.full_like( # create tensor containing -inf values\n", "> next_token_logits, -torch.inf\n", ">) \n", "> new_logits[top_pos] = next_token_logits[top_pos] # copy top 
k values into the -inf tensor\n", "> ```\n", ">\n", "> For more details, see https://github.com/rasbt/LLMs-from-scratch/discussions/326\n" ] }, { "cell_type": "code", "execution_count": 39, "id": "4844f000-c329-4e7e-aa89-16a2c4ebee43", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([0.0615, 0.0000, 0.0000, 0.5775, 0.0000, 0.0000, 0.0000, 0.3610, 0.0000])\n" ] } ], "source": [ "topk_probas = torch.softmax(new_logits, dim=0)\n", "print(topk_probas)" ] }, { "cell_type": "markdown", "id": "56056503-a15d-4315-a3ff-46647a4c7c45", "metadata": {}, "source": [ "### 5.3.3 Modifying the text generation function" ] }, { "cell_type": "markdown", "id": "34770423-473d-46f6-a5fa-6b2979564d26", "metadata": {}, "source": [ "- The previous two subsections introduced **temperature sampling** and **top-k sampling**.\n", "- We now combine the two to improve the `generate_text_simple` function we used earlier to generate text with the LLM, creating a new `generate` function:" ] }, { "cell_type": "markdown", "id": "b2a9c09a-41fb-465e-9f60-eb4fd2a7230d", "metadata": {}, "source": [ "- (Translator's note) A summary in my own words:\n", "  - Temperature scaling controls how sharp the distribution is; a higher temperature smooths it so that small differences between logits do not dominate the choice.\n", "  - Top-k sampling keeps low-probability, low-quality tokens out of the candidate pool, which improves output quality." ] }, { "cell_type": "code", "execution_count": 40, "id": "8e318891-bcc0-4d71-b147-33ce55febfa3", "metadata": {}, "outputs": [], "source": [ "def generate(model, idx, max_new_tokens, context_size, temperature=0.0, top_k=None, eos_id=None):\n", "    # Generate up to max_new_tokens tokens, optionally with top-k filtering and temperature scaling\n", "\n", "    # For-loop is the same as before: Get logits, and only focus on last time step\n", "    for _ in range(max_new_tokens):\n", "        idx_cond = idx[:, -context_size:]\n", "        with torch.no_grad():\n", "            logits = model(idx_cond)\n", "        # Keep only the logits of the last position\n", "        logits = logits[:, -1, :]\n", "\n", "        # New: Filter logits with top_k sampling\n", "        if top_k is not None:\n", "            # Keep only top_k values\n", "            top_logits, _ = torch.topk(logits, top_k)\n", "            min_val = top_logits[:, -1:]  # shape (batch_size, 1) so the comparison broadcasts per row\n", "            logits = torch.where(logits < min_val, torch.tensor(float(\"-inf\")).to(logits.device), logits)\n", "\n", "        # New: Apply temperature scaling\n", "        if temperature > 0.0:\n", "            logits = logits / 
temperature\n", "\n", "            # Apply softmax to get probabilities\n", "            probs = torch.softmax(logits, dim=-1)  # (batch_size, vocab_size)\n", "\n", "            # Sample from the distribution\n", "            idx_next = torch.multinomial(probs, num_samples=1)  # (batch_size, 1)\n", "\n", "        # Otherwise same as before: get idx of the vocab entry with the highest logits value\n", "        else:\n", "            idx_next = torch.argmax(logits, dim=-1, keepdim=True)  # (batch_size, 1)\n", "\n", "        if idx_next == eos_id:  # Stop generating early if end-of-sequence token is encountered and eos_id is specified\n", "            break\n", "\n", "        # Same as before: append sampled index to the running sequence\n", "        idx = torch.cat((idx, idx_next), dim=1)  # (batch_size, num_tokens+1)\n", "\n", "    return idx" ] }, { "cell_type": "code", "execution_count": 41, "id": "aa2a0d7d-0457-42d1-ab9d-bd67683e7ed8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Output text:\n", " Every effort moves you stand to work on surprise, a one of us had gone with random-\n" ] } ], "source": [ "torch.manual_seed(123)\n", "\n", "token_ids = generate(\n", "    model=model,\n", "    idx=text_to_token_ids(\"Every effort moves you\", tokenizer),\n", "    max_new_tokens=15,\n", "    context_size=GPT_CONFIG_124M[\"context_length\"],\n", "    top_k=25,\n", "    temperature=1.4\n", ")\n", "\n", "# Decode the generated token IDs back into text\n", "print(\"Output text:\\n\", token_ids_to_text(token_ids, tokenizer))" ] }, { "cell_type": "markdown", "id": "4e2002ca-f4c1-48af-9e0a-88bfc163ba0b", "metadata": {}, "source": [ "## 5.4 Loading and saving model weights in PyTorch" ] }, { "cell_type": "markdown", "id": "0fc52676-f026-4566-a226-2a90269f9d53", "metadata": {}, "source": [ "- Training LLMs is computationally expensive, so it is important to be able to save and load trained weights.\n", "" ] }, { "cell_type": "markdown", "id": "10e4c7f9-592f-43d6-a00e-598fa01dfb82", "metadata": {}, "source": [ "- The recommended way in PyTorch is to save the model weights, the so-called `state_dict`, by passing the output of the model's `.state_dict()` method to `torch.save`:" ] }, { "cell_type": "code", "execution_count": 42, 
"id": "3d67d869-ac04-4382-bcfb-c96d1ca80d47", "metadata": {}, "outputs": [], "source": [ "torch.save(model.state_dict(), \"model.pth\")\n", "#训练完的数据保存一下" ] }, { "cell_type": "markdown", "id": "90e889e0-07bf-43e5-8f92-5c5c7aeaad9e", "metadata": {}, "source": [ "- 之后我们可以对新的 `GPTModel` 导入已经训练好的参数:" ] }, { "cell_type": "code", "execution_count": 43, "id": "9d57d914-60a3-47f1-b499-5352f4c457cb", "metadata": {}, "outputs": [], "source": [ "model = GPTModel(GPT_CONFIG_124M)\n", "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", "model.load_state_dict(torch.load(\"model.pth\", map_location=device, weights_only=True))\n", "model.eval();" ] }, { "cell_type": "markdown", "id": "caa81aec-9c72-4f46-8ae2-4a4fde3edbc1", "metadata": {}, "source": [ "- 自适应的Adam跟AdamW相较于SGD更好!\n", "- 但是这些算法需要另外的参数, 所以保存训练好的参数就更有必要了:" ] }, { "cell_type": "code", "execution_count": 44, "id": "bbd175bb-edf4-450e-a6de-d3e8913c6532", "metadata": {}, "outputs": [], "source": [ "torch.save({\n", " \"model_state_dict\": model.state_dict(),\n", " \"optimizer_state_dict\": optimizer.state_dict(),\n", " }, \n", " \"model_and_optimizer.pth\"\n", ")\n", "#全家整整齐齐地保存" ] }, { "cell_type": "code", "execution_count": 45, "id": "8a0c7295-c822-43bf-9286-c45abc542868", "metadata": {}, "outputs": [], "source": [ "checkpoint = torch.load(\"model_and_optimizer.pth\", weights_only=True)\n", "#保存检查点\n", "model = GPTModel(GPT_CONFIG_124M)\n", "model.load_state_dict(checkpoint[\"model_state_dict\"])\n", "\n", "optimizer = torch.optim.AdamW(model.parameters(), lr=0.0005, weight_decay=0.1)\n", "optimizer.load_state_dict(checkpoint[\"optimizer_state_dict\"])\n", "model.train();\n", "#调整到训练模式" ] }, { "cell_type": "markdown", "id": "4194350e-0409-4a63-8ffd-d3a896509032", "metadata": {}, "source": [ "## 5.5 从OpenAI导入超参数" ] }, { "cell_type": "markdown", "id": "83eb6c38-7278-40e0-bd9f-8a2b1feac3ec", "metadata": {}, "source": [ "- 在之前的实验中,我们仅使用了一本非常短的小故事书来训练一个小型 GPT-2 模型,这主要是为了教学目的。\n", "- 
Interested readers can find a longer pretraining run on the complete Project Gutenberg book corpus in [../03_bonus_pretraining_on_gutenberg](../03_bonus_pretraining_on_gutenberg).\n", "- Fortunately, we do not have to spend tens to hundreds of thousands of dollars to pretrain the model on a large pretraining corpus; instead, we can load the pretrained weights provided by OpenAI." ] }, { "cell_type": "markdown", "id": "127ddbdb-3878-4669-9a39-d231fbdfb834", "metadata": {}, "source": [ "- For an alternative way to load the weights from Hugging Face, see [../02_alternative_weight_loading](../02_alternative_weight_loading)" ] }, { "cell_type": "markdown", "id": "75cab892-a165-4f43-9601-f517bc212ab6", "metadata": {}, "source": [ "- First, we need some boilerplate code to download the files from OpenAI and load the weights into Python.\n", "- Since OpenAI used [TensorFlow](https://www.tensorflow.org/), we have to install and use TensorFlow to load the weights; [tqdm](https://github.com/tqdm/tqdm) is a progress bar library.\n", "- Uncomment and run the next cell to install the required libraries." ] }, { "cell_type": "code", "execution_count": 46, "id": "fb9fdf02-972a-444e-bf65-8ffcaaf30ce8", "metadata": {}, "outputs": [], "source": [ "# pip install tensorflow tqdm" ] }, { "cell_type": "code", "execution_count": 47, "id": "a0747edc-559c-44ef-a93f-079d60227e3f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "TensorFlow version: 2.16.1\n", "tqdm version: 4.66.4\n" ] } ], "source": [ "# Confirm that TensorFlow and tqdm are installed\n", "print(\"TensorFlow version:\", version(\"tensorflow\"))\n", "print(\"tqdm version:\", version(\"tqdm\"))" ] }, { "cell_type": "code", "execution_count": 48, "id": "c5bc89eb-4d39-4287-9b0c-e459ebe7f5ed", "metadata": {}, "outputs": [], "source": [ "# Relative import from the gpt_download.py contained in this folder\n", "from gpt_download import download_and_load_gpt2" ] }, { "cell_type": "markdown", "id": "ff76a736-6f9f-4328-872e-f89a7b70a2cc", "metadata": {}, "source": [ "- We can then download the weights of the 124M model as follows:" ] }, { "cell_type": "code", "execution_count": 49, "id": "76271dd7-108d-4f5b-9c01-6ae0aac4b395", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "File already exists and is up-to-date: gpt2/124M/checkpoint\n", "File already exists and is up-to-date: gpt2/124M/encoder.json\n", 
"File already exists and is up-to-date: gpt2/124M/hparams.json\n", "File already exists and is up-to-date: gpt2/124M/model.ckpt.data-00000-of-00001\n", "File already exists and is up-to-date: gpt2/124M/model.ckpt.index\n", "File already exists and is up-to-date: gpt2/124M/model.ckpt.meta\n", "File already exists and is up-to-date: gpt2/124M/vocab.bpe\n" ] } ], "source": [ "settings, params = download_and_load_gpt2(model_size=\"124M\", models_dir=\"gpt2\")" ] }, { "cell_type": "code", "execution_count": 50, "id": "b1a31951-d971-4a6e-9c43-11ee1168ec6a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Settings: {'n_vocab': 50257, 'n_ctx': 1024, 'n_embd': 768, 'n_head': 12, 'n_layer': 12}\n" ] } ], "source": [ "print(\"Settings:\", settings)" ] }, { "cell_type": "code", "execution_count": 51, "id": "857c8331-130e-46ba-921d-fa35d7a73cfe", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Parameter dictionary keys: dict_keys(['blocks', 'b', 'g', 'wpe', 'wte'])\n" ] } ], "source": [ "print(\"Parameter dictionary keys:\", params.keys())" ] }, { "cell_type": "code", "execution_count": 52, "id": "c48dac94-8562-4a66-84ef-46c613cdc4cd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[-0.11010301 -0.03926672 0.03310751 ... -0.1363697 0.01506208\n", " 0.04531523]\n", " [ 0.04034033 -0.04861503 0.04624869 ... 0.08605453 0.00253983\n", " 0.04318958]\n", " [-0.12746179 0.04793796 0.18410145 ... 0.08991534 -0.12972379\n", " -0.08785918]\n", " ...\n", " [-0.04453601 -0.05483596 0.01225674 ... 0.10435229 0.09783269\n", " -0.06952604]\n", " [ 0.1860082 0.01665728 0.04611587 ... -0.09625227 0.07847701\n", " -0.02245961]\n", " [ 0.05135201 -0.02768905 0.0499369 ... 
0.00704835 0.15519823\n", " 0.12067825]]\n", "Token embedding weight tensor dimensions: (50257, 768)\n" ] } ], "source": [ "print(params[\"wte\"])\n", "print(\"Token embedding weight tensor dimensions:\", params[\"wte\"].shape)" ] }, { "cell_type": "markdown", "id": "466e100c-294e-4afc-a70a-2f398ac4c104", "metadata": {}, "source": [ "- Alternatively, \"355M\", \"774M\", and \"1558M\" are also supported values for the `model_size` argument.\n", "- The figure below summarizes the main differences between these model sizes:" ] }, { "cell_type": "markdown", "id": "20f19d32-5aae-4176-9f86-f391672c8f0d", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "ea6e5076-f08d-41fc-bd8b-1cfe53538f41", "metadata": {}, "source": [ "- Above, we loaded the 124M GPT-2 model weights into Python, but we still need to transfer them into our `GPTModel` instance.\n", "- First, we initialize a new `GPTModel` instance.\n", "- Note that the original GPT model initialized the linear layers for the query, key, and value matrices in the multi-head attention module with bias vectors, which is neither required nor recommended. However, to load the weights correctly, we have to enable them as well by setting `qkv_bias` to `True` in our implementation.\n", "- We also use the `1024`-token context length supported by the original GPT-2 models." ] }, { "cell_type": "code", "execution_count": 53, "id": "9fef90dd-0654-4667-844f-08e28339ef7d", "metadata": {}, "outputs": [], "source": [ "# Define model configurations in a dictionary for compactness\n", "# (only the architecture settings are defined here; no weights are loaded yet)\n", "model_configs = {\n", "    \"gpt2-small (124M)\": {\"emb_dim\": 768, \"n_layers\": 12, \"n_heads\": 12},\n", "    \"gpt2-medium (355M)\": {\"emb_dim\": 1024, \"n_layers\": 24, \"n_heads\": 16},\n", "    \"gpt2-large (774M)\": {\"emb_dim\": 1280, \"n_layers\": 36, \"n_heads\": 20},\n", "    \"gpt2-xl (1558M)\": {\"emb_dim\": 1600, \"n_layers\": 48, \"n_heads\": 25},\n", "}\n", "\n", "# Copy the base configuration and update with specific model settings\n", "model_name = \"gpt2-small (124M)\" # Example model name\n", "NEW_CONFIG = GPT_CONFIG_124M.copy()\n", "NEW_CONFIG.update(model_configs[model_name])\n", "NEW_CONFIG.update({\"context_length\": 1024, \"qkv_bias\": True})\n", "\n", "gpt = GPTModel(NEW_CONFIG)\n", "gpt.eval();" ] }, { "cell_type": "markdown", "id": "272f29ac-8342-4b3d-a57d-9b0166ced314", "metadata": {}, "source": [ "- The next task is to assign the 
OpenAI weights to the corresponding weight tensors in our `GPTModel` instance." ] }, { "cell_type": "code", "execution_count": 54, "id": "f9a92229-c002-49a6-8cfb-248297ad8296", "metadata": {}, "outputs": [], "source": [ "def assign(left, right):\n", "    if left.shape != right.shape:\n", "        raise ValueError(f\"Shape mismatch. Left: {left.shape}, Right: {right.shape}\")\n", "    return torch.nn.Parameter(torch.tensor(right))" ] }, { "cell_type": "code", "execution_count": 55, "id": "f22d5d95-ca5a-425c-a9ec-fc432a12d4e9", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "def load_weights_into_gpt(gpt, params):\n", "    # Positional and token embedding weights\n", "    gpt.pos_emb.weight = assign(gpt.pos_emb.weight, params['wpe'])\n", "    gpt.tok_emb.weight = assign(gpt.tok_emb.weight, params['wte'])\n", "\n", "    for b in range(len(params[\"blocks\"])):\n", "        # c_attn holds the query, key, and value weights concatenated; split them into three parts\n", "        q_w, k_w, v_w = np.split(\n", "            (params[\"blocks\"][b][\"attn\"][\"c_attn\"])[\"w\"], 3, axis=-1)\n", "        gpt.trf_blocks[b].att.W_query.weight = assign(\n", "            gpt.trf_blocks[b].att.W_query.weight, q_w.T)\n", "        gpt.trf_blocks[b].att.W_key.weight = assign(\n", "            gpt.trf_blocks[b].att.W_key.weight, k_w.T)\n", "        gpt.trf_blocks[b].att.W_value.weight = assign(\n", "            gpt.trf_blocks[b].att.W_value.weight, v_w.T)\n", "\n", "        q_b, k_b, v_b = np.split(\n", "            (params[\"blocks\"][b][\"attn\"][\"c_attn\"])[\"b\"], 3, axis=-1)\n", "        gpt.trf_blocks[b].att.W_query.bias = assign(\n", "            gpt.trf_blocks[b].att.W_query.bias, q_b)\n", "        gpt.trf_blocks[b].att.W_key.bias = assign(\n", "            gpt.trf_blocks[b].att.W_key.bias, k_b)\n", "        gpt.trf_blocks[b].att.W_value.bias = assign(\n", "            gpt.trf_blocks[b].att.W_value.bias, v_b)\n", "\n", "        gpt.trf_blocks[b].att.out_proj.weight = assign(\n", "            gpt.trf_blocks[b].att.out_proj.weight, \n", "            params[\"blocks\"][b][\"attn\"][\"c_proj\"][\"w\"].T)\n", "        gpt.trf_blocks[b].att.out_proj.bias = assign(\n", "            gpt.trf_blocks[b].att.out_proj.bias, \n", "            params[\"blocks\"][b][\"attn\"][\"c_proj\"][\"b\"])\n", "\n", "        gpt.trf_blocks[b].ff.layers[0].weight 
= assign(\n", "            gpt.trf_blocks[b].ff.layers[0].weight, \n", "            params[\"blocks\"][b][\"mlp\"][\"c_fc\"][\"w\"].T)\n", "        gpt.trf_blocks[b].ff.layers[0].bias = assign(\n", "            gpt.trf_blocks[b].ff.layers[0].bias, \n", "            params[\"blocks\"][b][\"mlp\"][\"c_fc\"][\"b\"])\n", "        gpt.trf_blocks[b].ff.layers[2].weight = assign(\n", "            gpt.trf_blocks[b].ff.layers[2].weight, \n", "            params[\"blocks\"][b][\"mlp\"][\"c_proj\"][\"w\"].T)\n", "        gpt.trf_blocks[b].ff.layers[2].bias = assign(\n", "            gpt.trf_blocks[b].ff.layers[2].bias, \n", "            params[\"blocks\"][b][\"mlp\"][\"c_proj\"][\"b\"])\n", "\n", "        gpt.trf_blocks[b].norm1.scale = assign(\n", "            gpt.trf_blocks[b].norm1.scale, \n", "            params[\"blocks\"][b][\"ln_1\"][\"g\"])\n", "        gpt.trf_blocks[b].norm1.shift = assign(\n", "            gpt.trf_blocks[b].norm1.shift, \n", "            params[\"blocks\"][b][\"ln_1\"][\"b\"])\n", "        gpt.trf_blocks[b].norm2.scale = assign(\n", "            gpt.trf_blocks[b].norm2.scale, \n", "            params[\"blocks\"][b][\"ln_2\"][\"g\"])\n", "        gpt.trf_blocks[b].norm2.shift = assign(\n", "            gpt.trf_blocks[b].norm2.shift, \n", "            params[\"blocks\"][b][\"ln_2\"][\"b\"])\n", "\n", "    gpt.final_norm.scale = assign(gpt.final_norm.scale, params[\"g\"])\n", "    gpt.final_norm.shift = assign(gpt.final_norm.shift, params[\"b\"])\n", "    # GPT-2 ties the output layer to the token embeddings, so wte is reused here\n", "    gpt.out_head.weight = assign(gpt.out_head.weight, params[\"wte\"])\n", "\n", "# Copy the pretrained OpenAI parameters into our GPTModel instance, then move it to the device\n", "load_weights_into_gpt(gpt, params)\n", "gpt.to(device);" ] }, { "cell_type": "markdown", "id": "4f7472cb-54dc-4311-96d8-b2694f885cee", "metadata": {}, "source": [ "- If the model was loaded correctly, we can use it to generate new text with our earlier `generate` function:" ] }, { "cell_type": "code", "execution_count": 56, "id": "1f690253-f845-4347-b7b6-43fabbd2affa", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Output text:\n", " Every effort moves you toward finding an ideal new way to practice something!\n", "\n", "What makes us want to be on top of that?\n", "\n", "\n" ] } ], "source": [ "torch.manual_seed(123)\n", "\n", "token_ids = generate(\n", "    
model=gpt,\n", "    idx=text_to_token_ids(\"Every effort moves you\", tokenizer).to(device),\n", "    max_new_tokens=25,\n", "    context_size=NEW_CONFIG[\"context_length\"],\n", "    top_k=50,\n", "    temperature=1.5\n", ")\n", "\n", "print(\"Output text:\\n\", token_ids_to_text(token_ids, tokenizer))" ] }, { "cell_type": "markdown", "id": "28493b9b-a1ae-4f31-87bc-c10ee4447f44", "metadata": {}, "source": [ "- We know that the model weights were loaded correctly because the model generates coherent text; if we had made even a small mistake while loading them, the model would not be able to do so.\n", "\n", "- For an alternative way to load the weights from the Hugging Face Hub, see [../02_alternative_weight_loading](../02_alternative_weight_loading).\n", "\n", "- If you are interested in how the GPT architecture compares to the Llama architecture (a popular LLM developed by Meta AI), see the bonus content at [../07_gpt_to_llama](../07_gpt_to_llama)." ] }, { "cell_type": "markdown", "id": "f2a66474-230d-4180-a8ff-843e04f1f1c4", "metadata": {}, "source": [ "## Summary and takeaways" ] }, { "cell_type": "markdown", "id": "fc7ed189-a633-458c-bf12-4f70b42684b8", "metadata": {}, "source": [ "- See the [./gpt_train.py](./gpt_train.py) script, a self-contained script for training.\n", "- The [./gpt_generate.py](./gpt_generate.py) script loads the pretrained weights from OpenAI and generates text based on a prompt.\n", "- You can find the exercise solutions in [./exercise-solutions.ipynb](./exercise-solutions.ipynb)." ] } ], "metadata": { "accelerator": "GPU", "colab": { "gpuType": "A100", "machine_shape": "hm", "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.6" } }, "nbformat": 4, "nbformat_minor": 5 }