{ "cells": [ { "cell_type": "markdown", "id": "ba450fb1-8a26-4894-ab7a-5d7bfefe90ce", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "Supplementary code for the Build a Large Language Model From Scratch book by Sebastian Raschka
\n", "
Code repository: https://github.com/rasbt/LLMs-from-scratch\n", "
Chinese translation repository: https://github.com/GoatCsu/CN-LLMs-from-scratch.git\n", "
\n", "
\n", "\n", "
\n" ] }, { "cell_type": "markdown", "id": "51c9672d-8d0c-470d-ac2d-1271f8ec3f14", "metadata": {}, "source": [ "# Chapter 6 Exercises" ] }, { "cell_type": "markdown", "id": "5fea8be3-30a1-4623-a6d7-b095c6c1092e", "metadata": {}, "source": [ "## Exercise 6.1: Increasing the context length" ] }, { "cell_type": "markdown", "id": "5860ba9f-2db3-4480-b96b-4be1c68981eb", "metadata": {}, "source": [ "We can pad the inputs to the maximum number of tokens the model supports by setting the maximum length to 1024:\n", "\n", "```python\n", "max_length = 1024\n", "\n", "train_dataset = SpamDataset(base_path / \"train.csv\", max_length=max_length, tokenizer=tokenizer)\n", "val_dataset = SpamDataset(base_path / \"validation.csv\", max_length=max_length, tokenizer=tokenizer)\n", "test_dataset = SpamDataset(base_path / \"test.csv\", max_length=max_length, tokenizer=tokenizer)\n", "```\n", "\n", "Alternatively, we can define `max_length` via\n", "\n", "```python\n", "max_length = model.pos_emb.weight.shape[0]\n", "```\n", "\n", "or\n", "\n", "```python\n", "max_length = BASE_CONFIG[\"context_length\"]\n", "```" ] }, { "cell_type": "markdown", "id": "2b0f4d5d-17fd-4265-93d8-ea08a22fdaf8", "metadata": {}, "source": [ "For convenience, you can run this experiment with the following command:\n", "```bash\n", "python additional-experiments.py --context_length \"model_context_length\"\n", "```\n", "This uses the code in the ../02_bonus_additional-experiments folder and results in a substantially lower test accuracy of 78.33% (versus 95.67% in the main chapter)." ] }, { "cell_type": "markdown", "id": "5a780455-f52a-48d1-ab82-6afd40bcad8b", "metadata": {}, "source": [ "## Exercise 6.2: Finetuning the whole model" ] }, { "cell_type": "markdown", "id": "56aa5208-aa29-4165-a0ec-7480754e2a18", "metadata": {}, "source": [ "Instead of finetuning only the final transformer block, we can finetune the entire model by removing the following lines from the code:\n", "```python\n", "for param in model.parameters():\n", "    param.requires_grad = False\n", "```\n", "\n", "For convenience, you can run this experiment with the following command:\n", "\n", "```bash\n", "python additional-experiments.py --trainable_layers all\n", "```\n", "\n", "This uses the code in the [../02_bonus_additional-experiments](../02_bonus_additional-experiments) folder and improves the test accuracy by one percentage point." 
] }, { "cell_type": "markdown", "id": "2269bce3-f2b5-4a76-a692-5977c75a57b6", "metadata": {}, "source": [ "## Exercise 6.3: Finetuning the first versus last token" ] }, { "cell_type": "markdown", "id": "7418a629-51b6-4aa2-83b7-bc0261bc370f", "metadata": {}, "source": [ "Rather than finetuning the last output token, we can finetune the first output token by changing\n", "\n", "```python\n", "model(input_batch)[:, -1, :]\n", "```\n", "\n", "to\n", "\n", "```python\n", "model(input_batch)[:, 0, :]\n", "```\n", "\n", "everywhere it appears in the code.\n", "\n", "For convenience, you can run this experiment with the following command:\n", "\n", "```bash\n", "python additional-experiments.py --trainable_token first\n", "```\n", "\n", "This uses the code in the [../02_bonus_additional-experiments](../02_bonus_additional-experiments) folder and results in a substantially lower test accuracy of 75.00% (versus 95.67% in the main chapter)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.6" } }, "nbformat": 4, "nbformat_minor": 5 }