{ "cells": [ { "cell_type": "markdown", "id": "c109c0e7-1aad-42ab-88d8-0990559b59e5", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "Supplementary code for the Build a Reasoning Model (From Scratch) book by Sebastian Raschka
\n", "
Code repository: https://github.com/rasbt/reasoning-from-scratch\n", "
\n", "
\n", "\n", "
\n" ] }, { "cell_type": "markdown", "id": "88c613ef-f4e5-49c3-b19d-3cf36dce0bf1", "metadata": {}, "source": [ "# Appendix D: Using larger LLMs" ] }, { "cell_type": "markdown", "id": "9c1cd731-7e23-4430-8ec6-c4a86a177f81", "metadata": {}, "source": [ "Packages that are being used in this notebook:" ] }, { "cell_type": "code", "execution_count": 1, "id": "b6882804-a2c4-4c98-ad42-1b108cbffa5b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "reasoning_from_scratch version: 0.1.17\n", "torch version: 2.10.0\n", "tokenizers version: 0.21.4\n" ] } ], "source": [ "from importlib.metadata import version\n", "\n", "used_libraries = [\n", " \"reasoning_from_scratch\", # for download functions\n", " \"torch\",\n", " \"tokenizers\"\n", "]\n", "\n", "for lib in used_libraries:\n", " print(f\"{lib} version: {version(lib)}\")" ] }, { "cell_type": "markdown", "id": "9d13703b-f75b-43fe-9c8a-e7459a884f36", "metadata": {}, "source": [ "- The main chapters use the Qwen3 0.6B base model because it is the smallest model in the\n", "Qwen3 family and therefore the easiest to run on consumer hardware\n", "- However, the same `Qwen3Model` implementation from appendix C can also be used to load larger dense Qwen3 checkpoints with the same from-scratch PyTorch model code" ] }, { "cell_type": "markdown", "id": "1e0cfd0a-f08a-4196-adde-619a23ccc24b", "metadata": {}, "source": [ " \n", "## D.1 Larger dense Qwen3 configurations" ] }, { "cell_type": "markdown", "id": "97a9fcc6-f3f0-4447-a74a-715a800b1a76", "metadata": {}, "source": [ "The repository includes configuration dictionaries for several larger dense Qwen3 models (beyond the 0.6B model) in\n", "`reasoning_from_scratch.appendix_c` ([reasoning_from_scratch/appendix_c.py](https://github.com/rasbt/reasoning-from-scratch/blob/main/reasoning_from_scratch/appendix_c.py)):\n", "\n", "| Model size | Configuration dictionary |\n", "| --- | --- |\n", "| 1.7B | `QWEN3_CONFIG_1_7B` |\n", "| 4B | `QWEN3_CONFIG_4B` 
|\n", "| 8B | `QWEN3_CONFIG_8B` |\n", "| 14B | `QWEN3_CONFIG_14B` |\n", "| 32B | `QWEN3_CONFIG_32B` |" ] }, { "cell_type": "markdown", "id": "3b82f50d-c04b-4a79-b3e1-73756781bb7d", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "5791527a-2c80-400e-aec2-aa7a271eb69d", "metadata": {}, "source": [ "- As mentioned in the figure above, these are the \"dense\" Qwen3 variants, which can run on single GPUs\n", "- There are also \"sparse\" Mixture-of-Experts variants of Qwen3, but they are not supported via this books' code; however, if you are interested in a from-scratch implementation, you can find one here: https://github.com/rasbt/LLMs-from-scratch/tree/main/ch05/11_qwen3\n", "- All of these use the same overall architecture pattern as the 0.6B model from appendix C\n", "- What changes are the embedding size, number of layers, number of attention heads, and\n", "feed-forward hidden dimension" ] }, { "cell_type": "markdown", "id": "aa4ae6b2-3ab4-4127-b925-16314cf2c90a", "metadata": {}, "source": [ "- As a rough lower bound, storing weights in bfloat16 requires about 2 bytes per parameter\n", "- This means that the checkpoint weights alone are on the order of:\n", "\n", "| Model size | Rough weight memory in bfloat16 |\n", "| --- | --- |\n", "| 1.7B | about 3.4 GB |\n", "| 4B | about 8 GB |\n", "| 8B | about 16 GB |\n", "| 14B | about 28 GB |\n", "| 32B | about 64 GB |\n" ] }, { "cell_type": "markdown", "id": "43c711a3-ea2d-4ab1-9bdc-9f6154c53af5", "metadata": {}, "source": [ "- In practice, the real runtime memory usage is higher because we also need memory for\n", "activations, temporary buffers, and often the KV cache" ] }, { "cell_type": "markdown", "id": "123c9c2c-da52-458e-a557-26f4f342e358", "metadata": {}, "source": [ " \n", "## D.2 Downloading larger checkpoints overview" ] }, { "cell_type": "markdown", "id": "e8812f03-17d9-4026-922a-3c33f27713d4", "metadata": {}, "source": [ "- Unlike the 0.6B checkpoints used in the main chapters, larger 
official Qwen3 models are\n", "typically distributed as `safetensors` files, sometimes split across multiple shards\n", "- The helper function `download_from_huggingface_from_snapshots` used to load these checkpoints requires some additional packages:" ] }, { "cell_type": "markdown", "id": "2364d29d-88d3-45a0-bd69-a69060164915", "metadata": {}, "source": [ "```bash\n", "!uv add huggingface_hub safetensors\n", "```\n", "\n", "or\n", "\n", "```bash\n", "!pip install huggingface_hub safetensors\n", "```" ] }, { "cell_type": "markdown", "id": "e5b7c66e-3afe-4f91-bab0-f127351fece8", "metadata": {}, "source": [ " \n", "## D.3 Loading a larger base model" ] }, { "cell_type": "markdown", "id": "4de72b45-2d0e-455c-b274-3ff2f7b681e6", "metadata": {}, "source": [ "- Download weights:" ] }, { "cell_type": "code", "execution_count": 2, "id": "a694d56a-d9ce-467a-b0a4-bf256547afec", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using Apple Silicon GPU (MPS)\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/Users/sebastian/Developer/reasoning-from-scratch/.venv/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. 
See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n", "Fetching 13 files: 100%|██████████████████████| 13/13 [00:00<00:00, 2616.79it/s]\n" ] } ], "source": [ "from pathlib import Path\n", "from reasoning_from_scratch.ch02 import get_device\n", "from reasoning_from_scratch.appendix_c import (\n", " download_from_huggingface_from_snapshots\n", ")\n", "\n", "\n", "device = get_device()\n", "local_dir = Path(\"qwen3-4b-base\")\n", "\n", "weights = download_from_huggingface_from_snapshots(\n", " repo_id=\"Qwen/Qwen3-4B-Base\",\n", " local_dir=local_dir,\n", ")" ] }, { "cell_type": "markdown", "id": "c6fd12a3-0b26-4052-8054-c8e3e61bce96", "metadata": {}, "source": [ "- Initialize model:" ] }, { "cell_type": "code", "execution_count": 3, "id": "56697fe2-bd0d-47c2-b53c-395d5b4da597", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Model uses weight tying.\n" ] }, { "data": { "text/plain": [ "Qwen3Model(\n", " (tok_emb): Embedding(151936, 2560)\n", " (trf_blocks): ModuleList(\n", " (0-35): 36 x TransformerBlock(\n", " (att): GroupedQueryAttention(\n", " (W_query): Linear(in_features=2560, out_features=4096, bias=False)\n", " (W_key): Linear(in_features=2560, out_features=1024, bias=False)\n", " (W_value): Linear(in_features=2560, out_features=1024, bias=False)\n", " (out_proj): Linear(in_features=4096, out_features=2560, bias=False)\n", " (q_norm): RMSNorm()\n", " (k_norm): RMSNorm()\n", " )\n", " (ff): FeedForward(\n", " (fc1): Linear(in_features=2560, out_features=9728, bias=False)\n", " (fc2): Linear(in_features=2560, out_features=9728, bias=False)\n", " (fc3): Linear(in_features=9728, out_features=2560, bias=False)\n", " )\n", " (norm1): RMSNorm()\n", " (norm2): RMSNorm()\n", " )\n", " )\n", " (final_norm): RMSNorm()\n", " (out_head): Linear(in_features=2560, out_features=151936, bias=False)\n", ")" ] }, "execution_count": 3, "metadata": {}, "output_type": 
"execute_result" } ], "source": [ "from reasoning_from_scratch.qwen3 import (\n", " Qwen3Model, load_hf_weights_into_qwen\n", ")\n", "from reasoning_from_scratch.appendix_c import QWEN3_CONFIG_4B\n", "\n", "\n", "model = Qwen3Model(QWEN3_CONFIG_4B)\n", "load_hf_weights_into_qwen(\n", " model,\n", " param_config={\n", " \"n_layers\": QWEN3_CONFIG_4B[\"n_layers\"],\n", " \"hidden_dim\": QWEN3_CONFIG_4B[\"hidden_dim\"],\n", " },\n", " params=weights,\n", ")\n", "model.to(device)\n", "model.eval()" ] }, { "cell_type": "markdown", "id": "fb99f826-49eb-49d4-b649-3395c97d5266", "metadata": {}, "source": [ "- Load tokenizer:" ] }, { "cell_type": "code", "execution_count": 4, "id": "1f80ad58-2995-4952-8f90-f76a9aaba3ca", "metadata": {}, "outputs": [], "source": [ "from reasoning_from_scratch.qwen3 import Qwen3Tokenizer\n", "import shutil\n", "\n", "# Note that the original base tokenizer is called \"tokenizer.json\"\n", "# We rename it to distinguish from the reasoning tokenizer (next section)\n", "tokenizer_src = local_dir / \"tokenizer.json\"\n", "tokenizer_path = local_dir / \"tokenizer-base.json\"\n", "\n", "if not tokenizer_path.exists():\n", " shutil.copyfile(tokenizer_src, tokenizer_path)\n", "\n", "tokenizer = Qwen3Tokenizer(tokenizer_file_path=tokenizer_path)" ] }, { "cell_type": "markdown", "id": "2586d0c1-034e-4cd3-8aa0-e3c8e7ad2325", "metadata": {}, "source": [ "- Use model:" ] }, { "cell_type": "code", "execution_count": 5, "id": "943d671a-132f-49a3-932c-9b0eabdf3009", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Large language models are artificial intelligence systems that use deep learning techniques to understand and generate human-like text. They are trained on vast amounts of data and can perform a wide range of natural language processing tasks, such as translation, summarization, and question answering." 
] } ], "source": [ "import torch\n", "from reasoning_from_scratch.ch02 import (\n", " generate_text_basic_stream_cache,\n", ")\n", "\n", "prompt = \"Explain large language models in two sentences.\"\n", "input_ids = torch.tensor(\n", " tokenizer.encode(prompt),\n", " device=device,\n", ").unsqueeze(0)\n", "\n", "for token in generate_text_basic_stream_cache(\n", " model=model,\n", " token_ids=input_ids,\n", " max_new_tokens=64,\n", " eos_token_id=tokenizer.eos_token_id,\n", "):\n", " print(tokenizer.decode(token.squeeze(0).tolist()), end=\"\", flush=True)" ] }, { "cell_type": "markdown", "id": "acb9de1d-dc40-4e47-bd92-a3efd1f64fa0", "metadata": {}, "source": [ " \n", "## D.4 Loading a larger reasoning variant" ] }, { "cell_type": "markdown", "id": "e4e45fea-eaab-4067-aa8e-d103b790465a", "metadata": {}, "source": [ "- The same idea also works for larger reasoning-style Qwen3 models\n", "- The architecture for a given model size stays the same; only the checkpoint and tokenizer settings change" ] }, { "cell_type": "markdown", "id": "c4cdd902-7402-4142-b1bd-cbde3a453834", "metadata": {}, "source": [ "For example, to load the 4B reasoning variant instead of the 4B base variant, we would:\n", "\n", "- switch the repository ID from `Qwen/Qwen3-4B-Base` to `Qwen/Qwen3-4B`;\n", "- copy the `tokenizer.json` file to `tokenizer-reasoning.json`;\n", "- initialize the tokenizer as follows:" ] }, { "cell_type": "markdown", "id": "87c71dd6-ac91-4ca8-836e-d21a67469986", "metadata": {}, "source": [ "```python\n", "tokenizer = Qwen3Tokenizer(\n", " tokenizer_file_path=tokenizer_path,\n", " apply_chat_template=True,\n", " add_generation_prompt=True,\n", " add_thinking=True,\n", ")\n", "```" ] }, { "cell_type": "markdown", "id": "0ee741b9-fff2-4ebc-a6b5-d59fc11e7e63", "metadata": {}, "source": [ "- The rest of the model-loading and -usage code stays the same" ] }, { "cell_type": "markdown", "id": "23c3543a-66dd-4ff3-b837-131485f862ae", "metadata": {}, "source": [ " \n", "## D.5 
Practical recommendations" ] }, { "cell_type": "markdown", "id": "b4ab6b1e-72e7-4fa7-876d-e3aade38b9b1", "metadata": {}, "source": [ "- No code in this section" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.16" } }, "nbformat": 4, "nbformat_minor": 5 }