{ "cells": [ { "cell_type": "markdown", "id": "cbbc1fe3-bff1-4631-bf35-342e19c54cc0", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "Supplementary code for the Build a Large Language Model From Scratch book by Sebastian Raschka
\n", "
Code repository: https://github.com/rasbt/LLMs-from-scratch\n", "
\n", "
\n", "\n", "
" ] }, { "cell_type": "markdown", "id": "2b022374-e3f6-4437-b86f-e6f8f94cbebc", "metadata": {}, "source": [ "# **扩展 Tiktoken BPE 分词器,添加新 Token** " ] }, { "cell_type": "markdown", "id": "bcd624b1-2060-49af-bbf6-40517a58c128", "metadata": {}, "source": [ "- 本笔记本介绍 **如何扩展现有的 BPE 分词器**,并重点讲解 **如何在 OpenAI 的 [Tiktoken](https://github.com/openai/tiktoken) 实现中添加新 Token**。 \n", "- 如果需要 **分词的基础知识**,请参考 [第 2 章](https://github.com/rasbt/LLMs-from-scratch/blob/main/ch02/01_main-chapter-code/ch02.ipynb) 和 **BPE from Scratch** [教程](link)。 \n", "- 例如,假设我们有一个 **GPT-2 分词器**,并希望对以下文本进行编码:" ] }, { "cell_type": "code", "execution_count": 1, "id": "798d4355-a146-48a8-a1a5-c5cec91edf2c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[15496, 11, 2011, 3791, 30642, 62, 16, 318, 257, 649, 11241, 13, 220, 50256]\n" ] } ], "source": [ "import tiktoken\n", "\n", "base_tokenizer = tiktoken.get_encoding(\"gpt2\")\n", "sample_text = \"Hello, MyNewToken_1 is a new token. <|endoftext|>\"\n", "\n", "token_ids = base_tokenizer.encode(sample_text, allowed_special={\"<|endoftext|>\"})\n", "print(token_ids)" ] }, { "cell_type": "markdown", "id": "5b09b19b-772d-4449-971b-8ab052ee726d", "metadata": {}, "source": [ "- **遍历每个 Token ID**,可以帮助我们更好地理解 **如何通过词汇表解码 Token ID**: " ] }, { "cell_type": "code", "execution_count": 2, "id": "21fd634b-bb4c-4ba3-8b69-9322b727bf58", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "15496 -> Hello\n", "11 -> ,\n", "2011 -> My\n", "3791 -> New\n", "30642 -> Token\n", "62 -> _\n", "16 -> 1\n", "318 -> is\n", "257 -> a\n", "649 -> new\n", "11241 -> token\n", "13 -> .\n", "220 -> \n", "50256 -> <|endoftext|>\n" ] } ], "source": [ "for token_id in token_ids:\n", " print(f\"{token_id} -> {base_tokenizer.decode([token_id])}\")" ] }, { "cell_type": "markdown", "id": "fd5b1b9b-b1a9-489e-9711-c15a8e081813", "metadata": {}, "source": [ "- 如上所示,**\"MyNewToken_1\" 被拆分为 5 个子词 Token**,这对于 BPE 处理 **未知词汇** 时是正常行为。 \n", "- 但如果 **\"MyNewToken_1\" 是一个特殊 Token**,我们希望它像 **`\"<|endoftext|>\"`** 一样 **作为单个 Token 进行编码**,本笔记本将讲解如何实现该功能。 " ] }, { "cell_type": "markdown", "id": "65f62ab6-df96-4f88-ab9a-37702cd30f5f", "metadata": {}, "source": [ " \n", "## 1. 添加特殊的token" ] }, { "cell_type": "markdown", "id": "c4379fdb-57ba-4a75-9183-0aee0836c391", "metadata": {}, "source": [ "- 需要注意,我们必须 **将新 Token 作为特殊 Token 添加**。原因在于: \n", " - **新 Token 在原始分词器训练过程中并未出现**,因此 **没有对应的“合并规则”(merges)**。 \n", " - 即使我们手动创建这些合并规则,**也很难在不破坏现有分词体系的情况下,将其正确整合**(详情请参考 **BPE from Scratch** 笔记本 [链接] 了解“合并规则”)。 \n", "\n", "- 例如,假设我们希望 **添加 2 个新 Token**:" ] }, { "cell_type": "code", "execution_count": 3, "id": "265f1bba-c478-497d-b7fc-f4bd191b7d55", "metadata": {}, "outputs": [], "source": [ "# Define custom tokens and their token IDs\n", "custom_tokens = [\"MyNewToken_1\", \"MyNewToken_2\"]\n", "custom_token_ids = {\n", " token: base_tokenizer.n_vocab + i for i, token in enumerate(custom_tokens)\n", "}" ] }, { "cell_type": "markdown", "id": "1c6f3d98-1ab6-43cf-9ae2-2bf53860f99e", "metadata": {}, "source": [ "- 接下来,我们创建一个自定义的 **`Encoding`** 对象,用于存储 **特殊 Token**,具体如下: " ] }, { "cell_type": "code", "execution_count": 4, "id": "1f519852-59ea-4069-a8c7-0f647bfaea09", "metadata": {}, "outputs": [], "source": [ "# Create a new Encoding object with extended tokens\n", "extended_tokenizer = tiktoken.Encoding(\n", " name=\"gpt2_custom\",\n", " pat_str=base_tokenizer._pat_str,\n", " mergeable_ranks=base_tokenizer._mergeable_ranks,\n", " special_tokens={**base_tokenizer._special_tokens, **custom_token_ids},\n", ")" ] }, { "cell_type": "markdown", "id": "90af6cfa-e0cc-4c80-89dc-3a824e7bdeb2", "metadata": {}, "source": [ "- 就这样!现在我们可以验证 **分词器是否能够正确编码示例文本**: " ] }, { "cell_type": "markdown", "id": "153e8e1d-c4cb-41ff-9c55-1701e9bcae1c", "metadata": {}, "source": [ "- 如我们所见,**新添加的 Token**(`50257` 和 `50258`)**已成功编码到输出中**: " ] }, { "cell_type": "code", "execution_count": 5, "id": "eccc78a4-1fd4-47ba-a114-83ee0a3aec31", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[36674, 2420, 351, 220, 50257, 290, 220, 50258, 13, 220, 50256]\n" ] } ], "source": [ "special_tokens_set = set(custom_tokens) | {\"<|endoftext|>\"}\n", "\n", "token_ids = extended_tokenizer.encode(\n", " \"Sample text with MyNewToken_1 and MyNewToken_2. <|endoftext|>\",\n", " allowed_special=special_tokens_set\n", ")\n", "print(token_ids)" ] }, { "cell_type": "markdown", "id": "dc0547c1-bbb5-4915-8cf4-caaebcf922eb", "metadata": {}, "source": [ "- 同样,我们还可以 **逐个 Token 检查编码结果**: " ] }, { "cell_type": "code", "execution_count": 6, "id": "7583eff9-b10d-4e3d-802c-f0464e1ef030", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "36674 -> Sample\n", "2420 -> text\n", "351 -> with\n", "220 -> \n", "50257 -> MyNewToken_1\n", "290 -> and\n", "220 -> \n", "50258 -> MyNewToken_2\n", "13 -> .\n", "220 -> \n", "50256 -> <|endoftext|>\n" ] } ], "source": [ "for token_id in token_ids:\n", " print(f\"{token_id} -> {extended_tokenizer.decode([token_id])}\")" ] }, { "cell_type": "markdown", "id": "17f0764e-e5a9-4226-a384-18c11bd5fec3", "metadata": {}, "source": [ "- 如上所示,我们已成功 **更新分词器**。 \n", "- 但如果要将其用于 **预训练的 LLM**,还需要 **更新 LLM 的嵌入层(embedding layer)和输出层(output layer)**,具体方法将在 **下一节** 进行讲解。 " ] }, { "cell_type": "markdown", "id": "8ec7f98d-8f09-4386-83f0-9bec68ef7f66", "metadata": {}, "source": [ " \n", "## 2. 更新预训练的LLM" ] }, { "cell_type": "markdown", "id": "b8a4f68b-04e9-4524-8df4-8718c7b566f2", "metadata": {}, "source": [ "- 本节将讲解 **如何在更新分词器后,对现有的预训练 LLM 进行相应调整**。 \n", "- 我们将使用 **书中主章节所采用的原始预训练 GPT-2 模型** 进行演示。 " ] }, { "cell_type": "markdown", "id": "1a9b252e-1d1d-4ddf-b9f3-95bd6ba505a9", "metadata": {}, "source": [ " \n", "### 2.1 加载预训练的GPT" ] }, { "cell_type": "code", "execution_count": 7, "id": "ded29b4e-9b39-4191-b61c-29d6b2360bae", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "checkpoint: 100%|███████████████████████████| 77.0/77.0 [00:00<00:00, 34.4kiB/s]\n", "encoder.json: 100%|███████████████████████| 1.04M/1.04M [00:00<00:00, 4.78MiB/s]\n", "hparams.json: 100%|█████████████████████████| 90.0/90.0 [00:00<00:00, 24.7kiB/s]\n", "model.ckpt.data-00000-of-00001: 100%|███████| 498M/498M [00:33<00:00, 14.7MiB/s]\n", "model.ckpt.index: 100%|███████████████████| 5.21k/5.21k [00:00<00:00, 1.05MiB/s]\n", "model.ckpt.meta: 100%|██████████████████████| 471k/471k [00:00<00:00, 2.33MiB/s]\n", "vocab.bpe: 100%|████████████████████████████| 456k/456k [00:00<00:00, 2.45MiB/s]\n" ] } ], "source": [ "# Relative import from the gpt_download.py contained in this folder\n", "from gpt_download import download_and_load_gpt2\n", "\n", "settings, params = download_and_load_gpt2(model_size=\"124M\", models_dir=\"gpt2\")" ] }, { "cell_type": "code", "execution_count": 8, "id": "93dc0d8e-b549-415b-840e-a00023bddcf9", "metadata": {}, "outputs": [], "source": [ "# Relative import from the gpt_download.py contained in this folder\n", "from previous_chapters import GPTModel\n", "\n", "GPT_CONFIG_124M = {\n", " \"vocab_size\": 50257, # Vocabulary size\n", " \"context_length\": 256, # Shortened context length (orig: 1024)\n", " \"emb_dim\": 768, # Embedding dimension\n", " \"n_heads\": 12, # Number of attention heads\n", " \"n_layers\": 12, # Number of layers\n", " \"drop_rate\": 0.1, # Dropout rate\n", " \"qkv_bias\": False # Query-key-value bias\n", "}\n", "\n", "# Define model configurations in a dictionary for compactness\n", "model_configs = {\n", " \"gpt2-small (124M)\": {\"emb_dim\": 768, \"n_layers\": 12, \"n_heads\": 12},\n", " \"gpt2-medium (355M)\": {\"emb_dim\": 1024, \"n_layers\": 24, \"n_heads\": 16},\n", " \"gpt2-large (774M)\": {\"emb_dim\": 1280, \"n_layers\": 36, \"n_heads\": 20},\n", " \"gpt2-xl (1558M)\": {\"emb_dim\": 1600, \"n_layers\": 48, \"n_heads\": 25},\n", "}\n", "\n", "# Copy the base configuration and update with specific model settings\n", "model_name = \"gpt2-small (124M)\" # Example model name\n", "NEW_CONFIG = GPT_CONFIG_124M.copy()\n", "NEW_CONFIG.update(model_configs[model_name])\n", "NEW_CONFIG.update({\"context_length\": 1024, \"qkv_bias\": True})\n", "\n", "gpt = GPTModel(NEW_CONFIG)\n", "gpt.eval();" ] }, { "cell_type": "markdown", "id": "83f898c0-18f4-49ce-9b1f-3203a277b29e", "metadata": {}, "source": [ " \n", "### 2.2 使用预训练过的GPT" ] }, { "cell_type": "markdown", "id": "a5a1f5e1-e806-4c60-abaa-42ae8564908c", "metadata": {}, "source": [ "- 接下来,我们使用 **原始分词器** 和 **更新后的分词器** 对以下示例文本进行编码,并进行对比: " ] }, { "cell_type": "code", "execution_count": 9, "id": "9a88017d-cc8f-4ba1-bba9-38161a30f673", "metadata": { "tags": [] }, "outputs": [], "source": [ "sample_text = \"Sample text with MyNewToken_1 and MyNewToken_2. <|endoftext|>\"\n", "\n", "original_token_ids = base_tokenizer.encode(\n", " sample_text, allowed_special={\"<|endoftext|>\"}\n", ")" ] }, { "cell_type": "code", "execution_count": 10, "id": "1ee01bc3-ca24-497b-b540-3d13c52c29ed", "metadata": {}, "outputs": [], "source": [ "new_token_ids = extended_tokenizer.encode(\n", " \"Sample text with MyNewToken_1 and MyNewToken_2. <|endoftext|>\",\n", " allowed_special=special_tokens_set\n", ")" ] }, { "cell_type": "markdown", "id": "1143106b-68fe-4234-98ad-eaff420a4d08", "metadata": {}, "source": [ "- 现在,我们将 **原始 Token ID 输入到 GPT 模型中**: " ] }, { "cell_type": "code", "execution_count": 11, "id": "6b06827f-b411-42cc-b978-5c1d568a3200", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[[ 0.2204, 0.8901, 1.0138, ..., 0.2585, -0.9192, -0.2298],\n", " [ 0.6745, -0.0726, 0.8218, ..., -0.1768, -0.4217, 0.0703],\n", " [-0.2009, 0.0814, 0.2417, ..., 0.3166, 0.3629, 1.3400],\n", " ...,\n", " [ 0.1137, -0.1258, 2.0193, ..., -0.0314, -0.4288, -0.1487],\n", " [-1.1983, -0.2050, -0.1337, ..., -0.0849, -0.4863, -0.1076],\n", " [-1.0675, -0.5905, 0.2873, ..., -0.0979, -0.8713, 0.8415]]])\n" ] } ], "source": [ "import torch\n", "\n", "with torch.no_grad():\n", " out = gpt(torch.tensor([original_token_ids]))\n", "\n", "print(out)" ] }, { "cell_type": "markdown", "id": "082c7a78-35a8-473e-a08d-b099a6348a74", "metadata": {}, "source": [ "- 如上所示,模型能够正常运行 **(为简洁起见,代码仅显示原始输出,未将其转换回文本)**。 \n", "- 若需了解 **如何将模型输出转换回文本**,请参考 **第 5 章 [链接] 的 `generate` 函数(5.3.3 节)**。 " ] }, { "cell_type": "markdown", "id": "628265b5-3dde-44e7-bde2-8fc594a2547d", "metadata": {}, "source": [ "- 如果我们 **使用更新后的分词器生成的 Token ID** 再次输入模型,会发生什么情况? " ] }, { "cell_type": "markdown", "id": "9796ad09-787c-4c25-a7f5-6d1dfe048ac3", "metadata": {}, "source": [ "```python\n", "with torch.no_grad():\n", " gpt(torch.tensor([new_token_ids]))\n", "\n", "print(out)\n", "\n", "...\n", "# IndexError: index out of range in self\n", "```" ] }, { "cell_type": "markdown", "id": "77d00244-7e40-4de0-942e-e15cdd8e3b18", "metadata": {}, "source": [ "- 如我们所见,这会导致 **索引错误(Index Error)**。 \n", "- 这是因为 **GPT 模型的输入嵌入层(Embedding Layer)和输出层(Output Layer)** 预设了固定的 **词汇表大小(Vocabulary Size)**,而更新后的分词器可能已超出该范围:\n", "\n", "" ] }, { "cell_type": "markdown", "id": "dec38b24-c845-4090-96a4-0d3c4ec241d6", "metadata": {}, "source": [ " \n", "### **2.3 更新嵌入层(Updating the Embedding Layer)** " ] }, { "cell_type": "markdown", "id": "b1328726-8297-4162-878b-a5daff7de742", "metadata": {}, "source": [ "- 我们首先 **更新模型的嵌入层(Embedding Layer)**。 \n", "- 首先,需要注意 **嵌入层包含 50,257 个条目**,这正好对应于 **原始词汇表的大小**: " ] }, { "cell_type": "code", "execution_count": 12, "id": "23ecab6e-1232-47c7-a318-042f90e1dff3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Embedding(50257, 768)" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gpt.tok_emb" ] }, { "cell_type": "markdown", "id": "d760c683-d082-470a-bff8-5a08b30d3b61", "metadata": {}, "source": [ "- 我们希望 **扩展嵌入层**,**增加 2 个新 Token**。 \n", "- 简而言之,我们 **创建一个更大的嵌入层**,然后 **将原始嵌入层的权重复制到新嵌入层中**。 " ] }, { "cell_type": "code", "execution_count": 13, "id": "4ec5c48e-c6fe-4e84-b290-04bd4da9483f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Embedding(50259, 768)\n" ] } ], "source": [ "num_tokens, emb_size = gpt.tok_emb.weight.shape\n", "new_num_tokens = num_tokens + 2\n", "\n", "# Create a new embedding layer\n", "new_embedding = torch.nn.Embedding(new_num_tokens, emb_size)\n", "\n", "# Copy weights from the old embedding layer\n", "new_embedding.weight.data[:num_tokens] = gpt.tok_emb.weight.data\n", "\n", "# Replace the old embedding layer with the new one in the model\n", "gpt.tok_emb = new_embedding\n", "\n", "print(gpt.tok_emb)" ] }, { "cell_type": "markdown", "id": "63954928-31a5-4e7e-9688-2e0c156b7302", "metadata": {}, "source": [ "- 如上所示,我们的 **嵌入层(Embedding Layer)已成功扩展**。 " ] }, { "cell_type": "markdown", "id": "6e68bea5-255b-47bb-b352-09ea9539bc25", "metadata": {}, "source": [ " \n", "### **2.4 更新输出层(Updating the Output Layer)** " ] }, { "cell_type": "markdown", "id": "90a4a519-bf0f-4502-912d-ef0ac7a9deab", "metadata": {}, "source": [ "- 接下来,我们需要 **扩展输出层(Output Layer)**,该层当前包含 **50,257 个输出特征**,其大小与嵌入层的词汇表大小相同。 \n", "- **(顺带一提,你可能会对额外的学习资料感兴趣,其中探讨了 PyTorch 中 `Linear` 层与 `Embedding` 层的相似性。)** " ] }, { "cell_type": "code", "execution_count": 14, "id": "6105922f-d889-423e-bbcc-bc49156d78df", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Linear(in_features=768, out_features=50257, bias=False)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "gpt.out_head" ] }, { "cell_type": "markdown", "id": "29f1ff24-9c00-40f6-a94f-82d03aaf0890", "metadata": {}, "source": [ "- **扩展输出层(Output Layer)的过程** 与 **扩展嵌入层(Embedding Layer)** 类似: " ] }, { "cell_type": "code", "execution_count": 15, "id": "354589db-b148-4dae-8068-62132e3fb38e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Linear(in_features=768, out_features=50259, bias=True)\n" ] } ], "source": [ "original_out_features, original_in_features = gpt.out_head.weight.shape\n", "\n", "# Define the new number of output features (e.g., adding 2 new tokens)\n", "new_out_features = original_out_features + 2\n", "\n", "# Create a new linear layer with the extended output size\n", "new_linear = torch.nn.Linear(original_in_features, new_out_features)\n", "\n", "# Copy the weights and biases from the original linear layer\n", "with torch.no_grad():\n", " new_linear.weight[:original_out_features] = gpt.out_head.weight\n", " if gpt.out_head.bias is not None:\n", " new_linear.bias[:original_out_features] = gpt.out_head.bias\n", "\n", "# Replace the original linear layer with the new one\n", "gpt.out_head = new_linear\n", "\n", "print(gpt.out_head)" ] }, { "cell_type": "markdown", "id": "df5d2205-1fae-4a4f-a7bd-fa8fc37eeec2", "metadata": {}, "source": [ "- 首先,我们先 **使用原始 Token ID 测试模型**,观察其是否仍能正常运行: " ] }, { "cell_type": "code", "execution_count": 16, "id": "df604bbc-6c13-4792-8ba8-ecb692117c25", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[[ 0.2267, 0.9132, 1.0494, ..., -0.2330, -0.3008, -1.1458],\n", " [ 0.6808, -0.0495, 0.8574, ..., 0.0671, 0.5572, -0.7873],\n", " [-0.1947, 0.1045, 0.2773, ..., 1.3368, 0.8479, -0.9660],\n", " ...,\n", " [ 0.1200, -0.1027, 2.0549, ..., -0.1519, -0.2096, 0.5651],\n", " [-1.1920, -0.1819, -0.0981, ..., -0.1108, 0.8435, -0.3771],\n", " [-1.0612, -0.5674, 0.3229, ..., 0.8383, -0.7121, -0.4850]]])\n" ] } ], "source": [ "with torch.no_grad():\n", " output = gpt(torch.tensor([original_token_ids]))\n", "print(output)" ] }, { "cell_type": "markdown", "id": "3d80717e-50e6-4927-8129-0aadfa2628f5", "metadata": {}, "source": [ "- 接下来,让我们 **测试更新后的模型在新增 Token 上的表现**。 " ] }, { "cell_type": "code", "execution_count": 17, "id": "75f11ec9-bdd2-440f-b8c8-6646b75891c6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[[ 0.2267, 0.9132, 1.0494, ..., -0.2330, -0.3008, -1.1458],\n", " [ 0.6808, -0.0495, 0.8574, ..., 0.0671, 0.5572, -0.7873],\n", " [-0.1947, 0.1045, 0.2773, ..., 1.3368, 0.8479, -0.9660],\n", " ...,\n", " [-0.0656, -1.2451, 0.7957, ..., -1.2124, 0.1044, 0.5088],\n", " [-1.1561, -0.7380, -0.0645, ..., -0.4373, 1.1401, -0.3903],\n", " [-0.8961, -0.6437, -0.1667, ..., 0.5663, -0.5862, -0.4020]]])\n" ] } ], "source": [ "with torch.no_grad():\n", " output = gpt(torch.tensor([new_token_ids]))\n", "print(output)" ] }, { "cell_type": "markdown", "id": "d88a1bba-db01-4090-97e4-25dfc23ed54c", "metadata": {}, "source": [ "- 如我们所见,**模型已成功支持扩展后的 Token 集**。 \n", "- 实际应用中,我们通常需要 **对模型进行微调(Fine-tuning)或持续预训练(Continual Pretraining)**,特别是 **新扩展的嵌入层(Embedding Layer)和输出层(Output Layer)**,以确保模型能够有效学习新 Token 的表示。 " ] }, { "cell_type": "markdown", "id": "6de573ad-0338-40d9-9dad-de60ae349c4f", "metadata": {}, "source": [ "### **关于权重共享(Weight Tying)** \n", "\n", "- **如果模型使用了权重共享(Weight Tying)**,即 **嵌入层(Embedding Layer)与输出层(Output Layer)共享相同的权重**(类似于 **Llama 3** [链接]),那么 **扩展输出层的过程将更为简单**。 \n", "- 在这种情况下,我们 **只需直接将嵌入层的权重复制到输出层**: " ] }, { "cell_type": "code", "execution_count": 18, "id": "4cbc5f51-c7a8-49d0-b87f-d3d87510953b", "metadata": {}, "outputs": [], "source": [ "gpt.out_head.weight = gpt.tok_emb.weight" ] }, { "cell_type": "code", "execution_count": 19, "id": "d0d553a8-edff-40f0-bdc4-dff900e16caf", "metadata": {}, "outputs": [], "source": [ "with torch.no_grad():\n", " output = gpt(torch.tensor([new_token_ids]))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 5 }