{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "afdad2e1-f810-4fb0-8a06-060e670d84f0",
   "metadata": {},
   "source": [
    "<table style=\"width:100%\">\n",
    "<tr>\n",
    "<td style=\"vertical-align:middle; text-align:left;\">\n",
    "<font size=\"2\">\n",
    "Supplementary code for the <a href=\"https://mng.bz/lZ5B\">Build a Reasoning Model (From Scratch)</a> book by <a href=\"https://sebastianraschka.com\">Sebastian Raschka</a><br>\n",
    "<br>Code repository: <a href=\"https://github.com/rasbt/reasoning-from-scratch\">https://github.com/rasbt/reasoning-from-scratch</a>\n",
    "</font>\n",
    "</td>\n",
    "<td style=\"vertical-align:middle; text-align:left;\">\n",
    "<a href=\"https://mng.bz/lZ5B\"><img src=\"https://sebastianraschka.com/images/reasoning-from-scratch-images/cover-small.webp\" width=\"100px\"></a>\n",
    "</td>\n",
    "</tr>\n",
    "</table>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a14ea71d-cc3c-44e5-af10-4ba9b43badc4",
   "metadata": {},
   "source": [
    "# Chapter 2: Exercise Solutions"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69ba1b97-d56b-4e56-ae4d-367e179cbd40",
   "metadata": {},
   "source": [
    "Packages that are being used in this notebook:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "ecb22369-1c9a-4931-b189-e26c0b281bf0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "reasoning_from_scratch version: 0.1.2\n",
      "torch version: 2.7.1\n",
      "tokenizers version: 0.21.4\n"
     ]
    }
   ],
   "source": [
    "from importlib.metadata import version\n",
    "\n",
    "used_libraries = [\n",
    "    \"reasoning_from_scratch\",\n",
    "    \"torch\",\n",
    "    \"tokenizers\"  # Used by reasoning_from_scratch\n",
    "]\n",
    "\n",
    "for lib in used_libraries:\n",
    "    print(f\"{lib} version: {version(lib)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "36701b0d-8069-4002-beb2-1d1b0701160e",
   "metadata": {},
   "source": [
    "&nbsp;\n",
    "## Exercise 2.1: Encoding unknown words"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "ddca30e0-e252-43ff-b8eb-a405dad28592",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "✓ qwen3/tokenizer-base.json already up-to-date\n",
      "[9707] --> Hello\n",
      "[11] --> ,\n",
      "[1644] -->  Ar\n",
      "[29406] --> dw\n",
      "[838] --> ark\n",
      "[273] --> le\n",
      "[339] --> th\n",
      "[10920] --> yr\n",
      "[87] --> x\n",
      "[13] --> .\n",
      "[47375] -->  Haus\n",
      "[2030] -->  und\n",
      "[93912] -->  Garten\n",
      "[13] --> .\n"
     ]
    }
   ],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "from reasoning_from_scratch.qwen3 import (\n",
    "    download_qwen3_small,\n",
    "    Qwen3Tokenizer,\n",
    ")\n",
    "\n",
    "download_qwen3_small(kind=\"base\", tokenizer_only=True, out_dir=\"qwen3\")\n",
    "\n",
    "tokenizer_path = Path(\"qwen3\") / \"tokenizer-base.json\"\n",
    "tokenizer = Qwen3Tokenizer(tokenizer_file_path=tokenizer_path)\n",
    "\n",
    "prompt = \"Hello, Ardwarklethyrx. Haus und Garten.\"\n",
    "input_token_ids_list = tokenizer.encode(prompt)\n",
    "\n",
    "for i in input_token_ids_list:\n",
    "    print(f\"{[i]} --> {tokenizer.decode([i])}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1bef14d2-40da-42b3-872c-60afee3f015f",
   "metadata": {},
   "source": [
    "- Unknown words are broken into smaller pieces of subwords or even single tokens; this allows the tokenizer and LLM to handle any input\n",
    "- German words (Haus und Garten) are not broken down here, suggesting that the tokenizer has seen German texts during training, and the LLM was likely trained on German texts as well"
   ]
  },
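  {
   "cell_type": "markdown",
   "id": "5e0c2a7b-3f41-4d8e-a9b6-7c1d2e4f6a80",
   "metadata": {},
   "source": [
    "- To illustrate the idea of falling back to smaller pieces, below is a minimal sketch of greedy longest-match subword splitting over a small, hypothetical vocabulary\n",
    "- This is a simplified stand-in, not how the Qwen3 tokenizer works internally: Qwen3 uses a byte-level BPE tokenizer that applies learned merge rules and has a far larger vocabulary; the toy vocabulary here was built from the pieces printed above, so it reproduces that split by construction\n",
    "- The single-character fallback is what lets the function encode any string; a byte-level tokenizer gets the same guarantee by falling back to individual bytes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8b3f9d14-2c6e-4a57-b0e1-9d4a6c2f7e35",
   "metadata": {},
   "outputs": [],
   "source": [
    "def greedy_subword_split(word, vocab):\n",
    "    # Hypothetical helper: split `word` left to right into the longest pieces found in `vocab`\n",
    "    pieces = []\n",
    "    i = 0\n",
    "    while i < len(word):\n",
    "        # Try the longest candidate first and shrink it one character at a time\n",
    "        for j in range(len(word), i, -1):\n",
    "            piece = word[i:j]\n",
    "            # Accept a known piece, or fall back to a single character\n",
    "            if piece in vocab or j == i + 1:\n",
    "                pieces.append(piece)\n",
    "                i = j\n",
    "                break\n",
    "    return pieces\n",
    "\n",
    "\n",
    "# Toy vocabulary (hypothetical), built from the pieces printed in the cell above\n",
    "toy_vocab = {\"Hello\", \"Ar\", \"dw\", \"ark\", \"le\", \"th\", \"yr\", \"Haus\", \"und\", \"Garten\"}\n",
    "\n",
    "print(greedy_subword_split(\"Ardwarklethyrx\", toy_vocab))  # Unknown word: several pieces\n",
    "print(greedy_subword_split(\"Garten\", toy_vocab))          # Known word: a single piece"
   ]
  },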
  {
   "cell_type": "markdown",
   "id": "2feec8a6-2f4b-4176-98ee-73417a2cafcd",
   "metadata": {},
   "source": [
    "&nbsp;\n",
    "## Exercise 2.2: Run code on GPU devices"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b6b94813-7136-48be-9b49-5d7774108a1e",
   "metadata": {},
   "source": [
    "- Simply delete the line `device = torch.device(\"cpu\")` in section 2.5, and then rerun the code\n",
    "- For convenience, a minimal, self-contained example using the relevant code from chapter 2 is included below"
   ]
  },
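  {
   "cell_type": "markdown",
   "id": "a41d7c6e-0b93-4f25-8e17-3c5b9f2d4a61",
   "metadata": {},
   "source": [
    "- `get_device()` already picks a GPU when one is available; if you'd rather select the device explicitly, below is a minimal sketch of one common selection order (CUDA first, then Apple Silicon via MPS, then CPU)\n",
    "- `pick_device` is a hypothetical helper and not the actual implementation of `get_device()`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d27e5b08-6f1c-4a39-9c84-1e6a3b7d5f92",
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "\n",
    "def pick_device():\n",
    "    # Hypothetical helper: prefer a CUDA GPU, then an Apple Silicon GPU (MPS), else the CPU\n",
    "    if torch.cuda.is_available():\n",
    "        return torch.device(\"cuda\")\n",
    "    if torch.backends.mps.is_available():\n",
    "        return torch.device(\"mps\")\n",
    "    return torch.device(\"cpu\")\n",
    "\n",
    "\n",
    "example_device = pick_device()\n",
    "print(f\"Selected device: {example_device}\")"
   ]
  },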
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "76979f69-c206-4a72-af15-5f6a7ba602e7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Using Apple Silicon GPU (MPS)\n",
      "✓ qwen3/qwen3-0.6B-base.pth already up-to-date\n",
      "✓ qwen3/tokenizer-base.json already up-to-date\n"
     ]
    }
   ],
   "source": [
    "from pathlib import Path\n",
    "import torch\n",
    "\n",
    "from reasoning_from_scratch.ch02 import (\n",
    "    get_device,\n",
    "    generate_text_basic_stream,\n",
    "    generate_text_basic_stream_cache,\n",
    "    generate_stats\n",
    ")\n",
    "from reasoning_from_scratch.qwen3 import (\n",
    "    download_qwen3_small,\n",
    "    Qwen3Tokenizer,\n",
    "    Qwen3Model,\n",
    "    QWEN_CONFIG_06_B\n",
    ")\n",
    "\n",
    "device = get_device()\n",
    "device = torch.device(\"cpu\")\n",
    "\n",
    "download_qwen3_small(kind=\"base\", tokenizer_only=False, out_dir=\"qwen3\")\n",
    "\n",
    "tokenizer_path = Path(\"qwen3\") / \"tokenizer-base.json\"\n",
    "model_path = Path(\"qwen3\") / \"qwen3-0.6B-base.pth\"\n",
    "\n",
    "tokenizer = Qwen3Tokenizer(tokenizer_file_path=tokenizer_path)\n",
    "model = Qwen3Model(QWEN_CONFIG_06_B)\n",
    "model.load_state_dict(torch.load(model_path))\n",
    "\n",
    "model.to(device);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "0d59c901-899f-4ad5-bdf0-7b5f8ca5c9ed",
   "metadata": {},
   "outputs": [],
   "source": [
    "prompt = \"Explain large language models in 1 sentence.\"\n",
    "input_token_ids_tensor = torch.tensor(\n",
    "    tokenizer.encode(prompt),\n",
    "    device=device\n",
    "    ).unsqueeze(0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "3b0ee697-6bdf-48c6-b99e-6e7b9765e7fb",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Output length: 41\n",
      "Time: 9.63 sec\n",
      "4 tokens/sec\n",
      "\n",
      " Large language models are artificial intelligence systems that can understand, generate, and process human language, enabling them to perform a wide range of tasks, from answering questions to writing articles, and even creating creative content.\n"
     ]
    }
   ],
   "source": [
    "import time\n",
    "\n",
    "\n",
    "max_new_tokens = 100\n",
    "start_time = time.time()\n",
    "generated_ids = []\n",
    "\n",
    "for token in generate_text_basic_stream(\n",
    "    model=model,\n",
    "    token_ids=input_token_ids_tensor,\n",
    "    max_new_tokens=max_new_tokens,\n",
    "    eos_token_id=tokenizer.eos_token_id\n",
    "):\n",
    "    token_id = token.squeeze(0).tolist()\n",
    "    print(\n",
    "        tokenizer.decode(token_id),\n",
    "        end=\"\",\n",
    "        flush=True\n",
    "    )\n",
    "\n",
    "    next_token_id = token.squeeze(0)\n",
    "    generated_ids.append(next_token_id)  # Collect generated tokens\n",
    "\n",
    "end_time = time.time()\n",
    "\n",
    "output_token_ids_tensor = torch.cat(generated_ids, dim=0)\n",
    "generate_stats(output_token_ids_tensor, tokenizer, start_time, end_time)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "087a8bd6-e35d-4e04-94c1-9500f821fb3d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Time: 1.51 sec\n",
      "27 tokens/sec\n",
      "\n",
      " Large language models are artificial intelligence systems that can understand, generate, and process human language, enabling them to perform a wide range of tasks, from answering questions to writing essays, and even creating creative content.\n"
     ]
    }
   ],
   "source": [
    "start_time = time.time()\n",
    "\n",
    "for token in generate_text_basic_stream_cache(\n",
    "    model=model,\n",
    "    token_ids=input_token_ids_tensor,\n",
    "    max_new_tokens=max_new_tokens,\n",
    "    eos_token_id=tokenizer.eos_token_id\n",
    "):\n",
    "    token_id = token.squeeze(0).tolist()\n",
    "    print(\n",
    "        tokenizer.decode(token_id),\n",
    "        end=\"\",\n",
    "        flush=True\n",
    "    )\n",
    "\n",
    "    next_token_id = token.squeeze(0)\n",
    "    generated_ids.append(next_token_id)  # Collect generated tokens\n",
    "\n",
    "end_time = time.time()\n",
    "\n",
    "output_token_ids_tensor = torch.cat(generated_ids, dim=0)\n",
    "generate_stats(output_token_ids_tensor, tokenizer, start_time, end_time)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "7ebfdb70-7f18-4162-b2cc-019ba41867f5",
   "metadata": {},
   "outputs": [],
   "source": [
    "if device.type == \"mps\":\n",
    "    print(f\"`torch.compile` is not supported for the {model.__class__.__name__} model on MPS (Apple Silicon) as of this writing.\")\n",
    "    model_compiled = model\n",
    "    # Assignment so that notebook doesn't stop here if someone uses \"Run All Cells\"\n",
    "else:\n",
    "    major, minor = map(int, torch.__version__.split(\".\")[:2])\n",
    "    if (major, minor) >= (2, 8):\n",
    "        # This avoids retriggering model recompilations \n",
    "        # in PyTorch 2.8 and newer\n",
    "        # if the model contains code like self.pos = self.pos + 1\n",
    "        torch._dynamo.config.allow_unspec_int_on_nn_module = True\n",
    "        \n",
    "    model_compiled = torch.compile(model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "ac60af3f-24e6-404b-ae6f-51577d7e4092",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Warm-up run\n",
      "Time: 11.78 sec\n",
      "3 tokens/sec\n",
      "\n",
      " Large language models are artificial intelligence systems that use vast amounts of text data to understand, generate, and process human language, enabling them to perform tasks such as translation, summarization, and question answering.\n",
      "\n",
      "------------------------------\n",
      "\n",
      "Timed run 1:\n",
      "Time: 6.68 sec\n",
      "5 tokens/sec\n",
      "\n",
      " Large language models are artificial intelligence systems that use vast amounts of text data to understand, generate, and process human language, enabling them to perform tasks such as translation, summarization, and question answering.\n",
      "\n",
      "------------------------------\n",
      "\n",
      "Timed run 2:\n",
      "Time: 6.60 sec\n",
      "6 tokens/sec\n",
      "\n",
      " Large language models are artificial intelligence systems that use vast amounts of text data to understand, generate, and process human language, enabling them to perform tasks such as translation, summarization, and question answering.\n",
      "\n",
      "------------------------------\n",
      "\n"
     ]
    }
   ],
   "source": [
    "for i in range(3):\n",
    "\n",
    "    start_time = time.time()\n",
    "    generated_ids = []\n",
    "    \n",
    "    for token in generate_text_basic_stream(\n",
    "        model=model_compiled,\n",
    "        token_ids=input_token_ids_tensor,\n",
    "        max_new_tokens=max_new_tokens,\n",
    "        eos_token_id=tokenizer.eos_token_id\n",
    "    ):\n",
    "        token_id = token.squeeze(0).tolist()\n",
    "        print(\n",
    "            tokenizer.decode(token_id),\n",
    "            end=\"\",\n",
    "            flush=True\n",
    "        )\n",
    "    \n",
    "        next_token_id = token.squeeze(0)\n",
    "        generated_ids.append(next_token_id)  # Collect generated tokens\n",
    "    \n",
    "    end_time = time.time()\n",
    "    \n",
    "\n",
    "    if i == 0:\n",
    "        print(\"Warm-up run\")\n",
    "    else:\n",
    "        print(f\"Timed run {i}:\")\n",
    "\n",
    "    output_token_ids_tensor = torch.cat(generated_ids, dim=0)\n",
    "    generate_stats(output_token_ids_tensor, tokenizer, start_time, end_time)\n",
    "\n",
    "    print(f\"\\n{30*'-'}\\n\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "b604a2af-ca23-491a-90c8-3c4dda61ccd1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Warm-up run\n",
      "Time: 8.05 sec\n",
      "5 tokens/sec\n",
      "\n",
      " Large language models are artificial intelligence systems that can understand, generate, and process human language, enabling them to perform a wide range of tasks, from answering questions to writing articles, and even creating creative content.\n",
      "\n",
      "------------------------------\n",
      "\n",
      "Timed run 1:\n",
      "Time: 0.64 sec\n",
      "64 tokens/sec\n",
      "\n",
      " Large language models are artificial intelligence systems that can understand, generate, and process human language, enabling them to perform a wide range of tasks, from answering questions to writing articles, and even creating creative content.\n",
      "\n",
      "------------------------------\n",
      "\n",
      "Timed run 2:\n",
      "Time: 0.63 sec\n",
      "64 tokens/sec\n",
      "\n",
      " Large language models are artificial intelligence systems that can understand, generate, and process human language, enabling them to perform a wide range of tasks, from answering questions to writing articles, and even creating creative content.\n",
      "\n",
      "------------------------------\n",
      "\n"
     ]
    }
   ],
   "source": [
    "for i in range(3):\n",
    "\n",
    "    start_time = time.time()\n",
    "    generated_ids = []\n",
    "    \n",
    "    for token in generate_text_basic_stream_cache(\n",
    "        model=model_compiled,\n",
    "        token_ids=input_token_ids_tensor,\n",
    "        max_new_tokens=max_new_tokens,\n",
    "        eos_token_id=tokenizer.eos_token_id\n",
    "    ):\n",
    "        token_id = token.squeeze(0).tolist()\n",
    "        print(\n",
    "            tokenizer.decode(token_id),\n",
    "            end=\"\",\n",
    "            flush=True\n",
    "        )\n",
    "    \n",
    "        next_token_id = token.squeeze(0)\n",
    "        generated_ids.append(next_token_id)  # Collect generated tokens\n",
    "    \n",
    "    end_time = time.time()\n",
    "    \n",
    "\n",
    "    if i == 0:\n",
    "        print(\"Warm-up run\")\n",
    "    else:\n",
    "        print(f\"Timed run {i}:\")\n",
    "\n",
    "    output_token_ids_tensor = torch.cat(generated_ids, dim=0)\n",
    "    generate_stats(output_token_ids_tensor, tokenizer, start_time, end_time)\n",
    "\n",
    "    print(f\"\\n{30*'-'}\\n\")"
   ]
  },
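  {
   "cell_type": "markdown",
   "id": "f63a1e2d-8c47-4b05-a7d9-5b2e0c9f3148",
   "metadata": {},
   "source": [
    "- The two benchmarking cells above repeat the same pattern: one warm-up run, which absorbs one-time costs such as `torch.compile` compilation, followed by timed runs\n",
    "- Below is a minimal sketch of a reusable helper for this pattern; `benchmark_tokens_per_sec` and `fake_generate` are hypothetical names, and the helper assumes that the generation function returns the number of tokens it produced (for example, `len(generated_ids)` when wrapping one of the loops above)\n",
    "- The sketch uses `time.perf_counter()`, a high-resolution clock intended for measuring short durations; when timing GPU code in general, CUDA kernels run asynchronously, so one would typically call `torch.cuda.synchronize()` before reading the clock (the loops above call `.tolist()` on every generated token, which already waits for each result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0c9b4f57-e21a-4d86-b3f0-7a8d6e1c2b49",
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "\n",
    "def benchmark_tokens_per_sec(generate_fn, num_timed_runs=2):\n",
    "    # Hypothetical helper: one untimed warm-up call, then `num_timed_runs` timed calls.\n",
    "    # `generate_fn` is assumed to return the number of tokens it generated.\n",
    "    generate_fn()  # Warm-up run (e.g., triggers torch.compile compilation)\n",
    "    rates = []\n",
    "    for _ in range(num_timed_runs):\n",
    "        start = time.perf_counter()\n",
    "        num_tokens = generate_fn()\n",
    "        elapsed = time.perf_counter() - start\n",
    "        rates.append(num_tokens / elapsed)\n",
    "    return rates\n",
    "\n",
    "\n",
    "# Hypothetical stand-in for a generation call: sleeps briefly and reports 41 tokens\n",
    "def fake_generate():\n",
    "    time.sleep(0.01)\n",
    "    return 41\n",
    "\n",
    "\n",
    "for run, rate in enumerate(benchmark_tokens_per_sec(fake_generate), start=1):\n",
    "    print(f\"Timed run {run}: {rate:.0f} tokens/sec\")"
   ]
  },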
  {
   "cell_type": "markdown",
   "id": "913c502c-b158-4902-9cce-b02480586c5e",
   "metadata": {},
   "source": [
    "| Tokens Generated  | Mode              | Hardware        | Tokens/sec    | GPU Memory (VRAM) |\n",
    "|-------------------|-------------------|-----------------|---------------|-------------------|\n",
    "| 41                | Regular           | Mac Mini M4 CPU | 6             | -                 |\n",
    "| 41                | Regular compiled  | Mac Mini M4 CPU | 6             | -                 |\n",
    "| 41                | KV cache          | Mac Mini M4 CPU | 28            | -                 |\n",
    "| 41                | KV cache compiled | Mac Mini M4 CPU | 68            | -                 |\n",
    "|                   |                   |                 |               |                   |\n",
    "| 41                | Regular           | Mac Mini M4 GPU | 17            | -                 |\n",
    "| 41                | Regular compiled  | Mac Mini M4 GPU | InductorError | -                 |\n",
    "| 41                | KV cache          | Mac Mini M4 GPU | 18            | -                 |\n",
    "| 41                | KV cache compiled | Mac Mini M4 GPU | InductorError | -                 |\n",
    "|                   |                   |                 |               |                   |\n",
    "| 41                | Regular           | NVIDIA H100 GPU | 51            | 1.55 GB           |\n",
    "| 41                | Regular compiled  | NVIDIA H100 GPU | 164           | 1.81 GB           |\n",
    "| 41                | KV cache          | NVIDIA H100 GPU | 48            | 1.52 GB           |\n",
    "| 41                | KV cache compiled | NVIDIA H100 GPU | 141           | 1.81 GB           |"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}