{ "cells": [ { "cell_type": "markdown", "id": "c109c0e7-1aad-42ab-88d8-0990559b59e5", "metadata": {}, "source": [ "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "Supplementary code for the Build a Reasoning Model (From Scratch) book by Sebastian Raschka
\n", "
Code repository: https://github.com/rasbt/reasoning-from-scratch\n", "
\n", "
\n", "\n", "
\n" ] }, { "cell_type": "markdown", "id": "88c613ef-f4e5-49c3-b19d-3cf36dce0bf1", "metadata": {}, "source": [ "# Appendix E: Batching and throughput-oriented execution" ] }, { "cell_type": "markdown", "id": "9c1cd731-7e23-4430-8ec6-c4a86a177f81", "metadata": {}, "source": [ "Packages that are being used in this notebook:" ] }, { "cell_type": "code", "execution_count": 1, "id": "b6882804-a2c4-4c98-ad42-1b108cbffa5b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "reasoning_from_scratch version: 0.1.17\n", "torch version: 2.10.0\n", "tokenizers version: 0.21.4\n" ] } ], "source": [ "from importlib.metadata import version\n", "\n", "used_libraries = [\n", " \"reasoning_from_scratch\", # for download functions\n", " \"torch\",\n", " \"tokenizers\"\n", "]\n", "\n", "for lib in used_libraries:\n", " print(f\"{lib} version: {version(lib)}\")" ] }, { "cell_type": "markdown", "id": "9d13703b-f75b-43fe-9c8a-e7459a884f36", "metadata": {}, "source": [ "- Throughout the main chapters, we usually process one example at a time\n", "- This keeps the code compact and easier to understand\n", "- But also, the code is already very expensive to run, so adding batching support would add little benefit due to hardware and resource limitations\n", "- However, in certain contexts, having the ability to run the code in batched mode is still useful\n", "- This appendix explains the broad idea behind batched execution and shows how to use it for the different chapters using code from the supplementary materials" ] }, { "cell_type": "markdown", "id": "11887fc6-cccd-4dd1-90a7-109c9e846135", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "8700acaf-2b57-453f-8f65-1eed2e67147d", "metadata": {}, "source": [ " \n", "## E.1 Why batching helps" ] }, { "cell_type": "markdown", "id": "7e2c1e91-d9bb-46fe-912c-1486bcf333cc", "metadata": {}, "source": [ "- There are two different performance goals:\n", " - latency: how quickly we get the answer for a single prompt;\n", " - throughput: how many prompts we can process in a given amount of time.\n", "- Single-example generation is often best for minimizing latency and for debugging code\n", "- Batching targets throughput primarily\n", "- If we want to evaluate hundreds of problems on MATH-500, generate many self-consistency samples, or train on many supervised examples, batching can reduce the total runtime substantially on suitable hardware\n", " - That said, batching is not guaranteed to be faster on every device\n", " - Small models on CPUs or some less optimized GPUs may not benefit from batching; we may even get slowdowns, because the additional padding and batching overhead can offset the gains from parallelism" ] }, { "cell_type": "markdown", "id": "9cd42b54-dcfa-43f0-a9a3-b7c78745d6a2", "metadata": {}, "source": [ " \n", "## E.2 Running batched generation" ] }, { "cell_type": "markdown", "id": "c8da03f3-e953-4a7b-b30a-5e9541e3af22", "metadata": {}, "source": [ "- The main technical obstacle in batching is that prompts usually have different lengths\n", "- For example, one math problem may tokenize to 40 tokens while another may tokenize to 120 tokens\n", "- Since tensors in PyTorch must have rectangular shapes, we pad the shorter sequences so they all fit into a single batch tensor" ] }, { "cell_type": "markdown", "id": "2755ee99-a7a5-4bd5-9fb5-e12a0a13a732", "metadata": {}, "source": [ "- Conceptually, this makes batched generation much more difficult to implement than single-prompt generation\n", "- In the main 
chapter, we used the `Qwen3Model` class from `reasoning_from_scratch.qwen3` (which uses the Qwen3 implementation explained in appendix C)\n", "- For batched generation, since we have to keep track of padding tokens, etc., there is a separate `Qwen3Model` class in `reasoning_from_scratch.qwen3_batched` (the source code can be viewed in the supplementary materials at https://github.com/rasbt/reasoning-from-scratch/blob/main/reasoning_from_scratch/qwen3_batched.py)" ] }, { "cell_type": "markdown", "id": "13de4a75-87de-4cbe-8299-dc1371c32975", "metadata": {}, "source": [ "- To illustrate the usage of the batched generation utilities, let's take a look at a concrete example\n", "- We start with a single-sequence text generation example similar to what we have used in the main chapters\n", "- Here, we apply it to two prompts (`[\"2+2?\", \"3+3=6?\"]`) sequentially:" ] }, { "cell_type": "code", "execution_count": 2, "id": "852cccaa-0f00-4846-a034-ff125ccbca05", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using Apple Silicon GPU (MPS)\n", "✓ qwen3/qwen3-0.6B-base.pth already up-to-date\n", " \\boxed{4}\n", " \\boxed{6}\n" ] } ], "source": [ "import torch\n", "\n", "from reasoning_from_scratch.ch02 import (\n", " get_device,\n", " generate_text_basic_stream_cache,\n", ")\n", "from reasoning_from_scratch.ch03 import (\n", " load_model_and_tokenizer,\n", " render_prompt,\n", ")\n", "\n", "device = get_device()\n", "model, tokenizer = load_model_and_tokenizer(\n", " which_model=\"base\",\n", " device=device,\n", " use_compile=False,\n", ")\n", "\n", "for problem in [\"2+2?\", \"3+3=6?\"]:\n", " prompt = render_prompt(problem)\n", " input_ids = torch.tensor(\n", " tokenizer.encode(prompt),\n", " dtype=torch.long,\n", " device=device,\n", " ).unsqueeze(0)\n", "\n", " for token in generate_text_basic_stream_cache(\n", " model=model,\n", " token_ids=input_ids,\n", " max_new_tokens=32,\n", " eos_token_id=tokenizer.eos_token_id,\n", " ):\n", " next_token_id = token.squeeze(0)\n", " print(tokenizer.decode(next_token_id.tolist()), end=\"\", flush=True)\n", "\n", " print()" ] }, { "cell_type": "markdown", "id": "97e45dcc-b6e0-45a3-b952-ce77d5cdcad3", "metadata": {}, "source": [ "- Below, we will use similar code from `reasoning_from_scratch.qwen3_batched` that supports batching\n", "- Note that the batched version does not support streaming, meaning we have to wait until all results are generated before they are decoded and printed\n", "- Here, the batched generation uses left padding, which will be explained in the next section\n", "- For now, let's start with a usage example (before we get into how it works internally)" ] }, { "cell_type": "code", "execution_count": 3, "id": "ac94b7bf-50c7-41a0-ad16-1cd829eb2220", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "✓ qwen3/qwen3-0.6B-base.pth already up-to-date\n", " \\boxed{4}\n", " \\boxed{6}\n" ] } ], "source": [ "from reasoning_from_scratch.qwen3_batched import (\n", " generate_text_basic_batched_cache,\n", " load_model_and_tokenizer,\n", ")\n", "\n", "model, tokenizer = load_model_and_tokenizer(\n", " which_model=\"base\",\n", " device=device,\n", " use_compile=False,\n", ")\n", "\n", "problems = [\"2+2?\", \"3+3=6?\"]\n", "prompts = [render_prompt(problem) for problem in problems]\n", "tokenized = [tokenizer.encode(p) for p in prompts]\n", "pad_id = tokenizer.pad_token_id\n", "max_len = max(len(t) for t in tokenized)\n", "\n", "left_padded = 
[\n", " [pad_id] * (max_len - len(t)) + t\n", " for t in tokenized\n", "]\n", "input_ids = torch.tensor(left_padded, dtype=torch.long, device=device)\n", "\n", "generated = generate_text_basic_batched_cache(\n", " model=model,\n", " token_ids=input_ids,\n", " max_new_tokens=32,\n", " eos_token_id=tokenizer.eos_token_id,\n", " pad_id=pad_id,\n", ")\n", "\n", "for row in generated:\n", " eos_pos = (row == tokenizer.eos_token_id).nonzero(as_tuple=True)[0]\n", " if len(eos_pos) > 0:\n", " row = row[:eos_pos[0]]\n", " print(tokenizer.decode(row.tolist()))" ] }, { "cell_type": "markdown", "id": "a31ae932-3f9b-4072-8e3a-03edea0146ef", "metadata": {}, "source": [ "- As we can see, the results are exactly the same as before\n", "- The difference is that these results were generated in parallel via `generate_text_basic_batched_cache`\n", "- The next section briefly explains how this works under the hood" ] }, { "cell_type": "markdown", "id": "d1ec9fda-54cd-472f-96e2-06242ebaf794", "metadata": {}, "source": [ "- An even more optimized code implementation replaces `generate_text_basic_batched_cache` with `generate_text_basic_batched_cache_stop`\n", "- `generate_text_basic_batched_cache` keeps every row in the active batch for every decode step \n", "- `generate_text_basic_batched_cache_stop`removes finished rows from the active compute batch (it's more complicated to implement internally, but can optimize performance\n", "- This is illustrated in the figure below" ] }, { "cell_type": "markdown", "id": "37729e70-96c4-4643-9ed8-0623653fadad", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "c1454be8-8e00-49ed-b7c3-12453f71e618", "metadata": {}, "source": [ "- Side note: in Qwen3, the The `` tokens are `<|endoftext|>`, but the figure uses `` for visual compactness" ] }, { "cell_type": "markdown", "id": "40d5ba01-99a7-48fe-b180-b1c22df33f5b", "metadata": {}, "source": [ " \n", "## E.3 Padding and attention masks" ] }, { "cell_type": "markdown", "id": "f39de403-5383-41e8-82fc-5dbc19966334", "metadata": {}, "source": [ "- In single-example mode, if we tokenize a short prompt such as `\"2+2?\"`, we can pass it to the model as a simple tensor of shape `(1, 4)`:\n", " - `input_ids = torch.tensor([[17, 10, 17, 30]])`" ] }, { "cell_type": "markdown", "id": "56057d0f-e3b3-461e-8b42-43b45acb85cc", "metadata": {}, "source": [ "- Internally, the model builds a standard causal attention mask internally so that each position can only attend to itself and earlier tokens\n", "- If you are unfamiliar with self-attention, I have an article that provides more background information: https://magazine.sebastianraschka.com/p/understanding-and-coding-self-attention\n", "- Conceptually, that mask looks like this:" ] }, { "cell_type": "markdown", "id": "fa512a3c-d039-4a3f-af4a-9a0c594222a5", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "fa7f6b8a-8b6d-4b60-a7a2-0e8dab93eb75", "metadata": {}, "source": [ "- `1` means \"masked out\" and `0` means \"allowed\"\n", "- So the first token cannot look ahead to later positions, the second token can only look at the first two positions, and so on\n", "- This is the standard autoregressive masking pattern" ] }, { "cell_type": "markdown", "id": "91f0e8a8-7635-4ec7-b378-15d1cd04e464", "metadata": {}, "source": [ "- Batching changes the situation because different prompts usually have different lengths\n", "- Suppose we process `\"2+2?\"` together with the slightly longer prompt `\"3+3=6?\"`\n", "- Since PyTorch tensors must be rectangular, the 
shorter row has to be padded to match the longer one\n", "- Here, this is done with left padding:" ] }, { "cell_type": "markdown", "id": "791aab1e-c066-4420-9c30-370c5f6e5283", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "e044e27f-fea0-4f58-83c1-5bec57b137c7", "metadata": {}, "source": [ "- Note that we keep an additional `attn_mask` internally; this is just to keep track of the padded positions\n", "- In this `attn_mask`, `True` means padded and `False` means not padded\n", "- We use this additional `attn_mask` to identify the positions in the causal mask that correspond to the pad token IDs\n", "- Masking padded keys and zeroing padded queries are important steps to make batching behave similarly to the single-example execution" ] }, { "cell_type": "markdown", "id": "64316817-70bf-43d0-8238-d74aa5dbad37", "metadata": {}, "source": [ "- By the way, we use the `<|endoftext|>` token as the padding token, but the choice does not really matter because the corresponding token positions are ignored anyway" ] }, { "cell_type": "code", "execution_count": 4, "id": "80a0dbad-4f43-4d15-a2bb-471e2a713754", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "151643\n" ] } ], "source": [ "print(tokenizer.pad_token_id)" ] }, { "cell_type": "code", "execution_count": 5, "id": "26872d96-1b98-4969-ab05-3038c32fd9a7", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<|endoftext|>\n" ] } ], "source": [ "print(tokenizer.decode([151643]))" ] },
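{ "cell_type": "markdown", "id": "3a7d5b21-1111-4c8e-9f0a-1b2c3d4e5f6a", "metadata": {}, "source": [ "- The actual padding-aware masking lives in `reasoning_from_scratch/qwen3_batched.py`; the cell below is only a rough, self-contained sketch (added here for illustration, with made-up toy token IDs) of how a padding mask can be combined with the causal mask" ] }, { "cell_type": "code", "execution_count": null, "id": "3a7d5b21-2222-4c8e-9f0a-1b2c3d4e5f6a", "metadata": {}, "outputs": [], "source": [ "import torch\n", "\n", "pad_id = 151643  # <|endoftext|>, used as the padding token\n", "\n", "# Two left-padded rows with toy token IDs (the shorter row is padded on the left)\n", "input_ids = torch.tensor([\n", "    [pad_id, pad_id, 17, 10, 17, 30],\n", "    [18, 10, 18, 28, 21, 30],\n", "])\n", "\n", "# True marks padded positions, False marks real tokens\n", "attn_mask = input_ids == pad_id\n", "\n", "# Standard causal mask: True marks positions that may not be attended to\n", "num_tokens = input_ids.shape[1]\n", "causal = torch.triu(\n", "    torch.ones(num_tokens, num_tokens, dtype=torch.bool), diagonal=1\n", ")\n", "\n", "# Per-row combined mask: a position is blocked if it lies in the future\n", "# or if the attended-to (key) position is padded; zeroing the padded query\n", "# rows is the remaining step handled in the library code\n", "combined = causal.unsqueeze(0) | attn_mask[:, None, :]\n", "print(combined.int())" ] }, { "cell_type": "markdown", "id": "86edccc8-944f-4c9e-acd3-cf35e9ebbffa", "metadata": {}, "source": [ " \n", "## E.4 Chapter 3: batched MATH-500 evaluation" ] }, { "cell_type": "markdown", "id": "e4ca16c9-a00d-493f-8ebe-48e3022544f3", "metadata": {}, "source": [ "- The supplementary materials include a script for the evaluation method implemented in chapter 3 that we can download and use similarly to how we did it in chapter 6:" ] }, { "cell_type": "code", "execution_count": 7, "id": "57db75e8-3019-4b72-b765-81e0c437d6b9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "evaluate_math500.py: 3.5 KB\n", "math500_test.json: 462.1 KB\n" ] } ], "source": [ "from reasoning_from_scratch.ch07 import download_from_github\n", "\n", "download_from_github(\n", " \"ch03/02_math500-verifier-scripts/evaluate_math500.py\"\n", ")\n", "download_from_github(\n", " \"ch03/01_main-chapter-code/math500_test.json\",\n", " out=\"math500_test.json\",\n", ")" ] }, { "cell_type": "markdown", "id": "d784cdb0-b391-44b7-9d4c-eb4cb86bb904", "metadata": {}, "source": [ "- Then, to run it, we can execute the following command in a terminal (replace `uv run` with `python` if you are not a uv user):" ] }, { "cell_type": "markdown", "id": "31cc2a1a-c8ab-4f78-b9f4-43d53451d28e", "metadata": {}, "source": [ "```bash\n", "uv run evaluate_math500.py \\\n", " --dataset_size 500 \\\n", " --which_model \"reasoning\"\n", "```" ] }, { "cell_type": "markdown", "id": "5d712291-5186-408e-8511-aef4fe3d2aa8", "metadata": {}, "source": [ "- The bonus material also includes a version of this for batched generation that applies the batching method we discussed previously\n", "- The download is similar to before, except that we replace `evaluate_math500.py` with `evaluate_math500_batched.py`" ] }, { "cell_type": "code", "execution_count": 8, "id": "38c530b5-0958-41f8-93cc-7edc959a249d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "evaluate_math500_batched.py: 8.3 KB\n" ] } ], 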
"source": [ "download_from_github(\n", " \"ch03/02_math500-verifier-scripts/evaluate_math500_batched.py\"\n", ")" ] }, { "cell_type": "markdown", "id": "224f801d-49c9-4115-8ab1-740a85ad2b07", "metadata": {}, "source": [ "- The usage is also similar to the non-batched version, except that we now provide an additional `--batch_size` argument to specify how many prompts and answers the LLM should process in parallel" ] }, { "cell_type": "markdown", "id": "a0efd481-42da-45d4-8c0c-e17509da33e7", "metadata": {}, "source": [ "```bash\n", "uv run evaluate_math500_batched.py \\\n", " --dataset_size 500 \\\n", " --which_model \"reasoning\" \\\n", " --batch_size 64\n", "```" ] }, { "cell_type": "markdown", "id": "7a207780-630d-40b8-b48b-6efbc046c817", "metadata": {}, "source": [ "- The ideal batch size depends on what your hardware can handle; a batch size of 64 uses approximately 23.39 GB RAM (the non-batched script uses approximately 1.84 GB RAM)\n", "- We will compare and discuss the performance difference towards the end of the appendix" ] }, { "cell_type": "markdown", "id": "d444b175-d5a0-4666-b31d-95ac8edf79c5", "metadata": {}, "source": [ " \n", "## E.5 Chapter 4: batched self-consistency sampling" ] }, { "cell_type": "markdown", "id": "16fd29af-7200-4560-b5e3-2b055bf9f02c", "metadata": {}, "source": [ "- The optional `self_consistency_math500_batched.py` script that implements self-consistency sampling in chapter 4 does not mix different prompts into one padded tensor\n", "- Instead, it repeats the same prompt `num_samples` times and samples several continuations in parallel for self-consistency voting\n", "- Because every row starts from the same prompt length, this script uses the regular `Qwen3Model` from reasoning_from_scratch.qwen3 instead of reasoning_from_scratch.qwen3_batched, since padding is not needed for equal prompt lengths" ] }, { "cell_type": "markdown", "id": "21f66b82-5107-4e7c-b9b2-0b4662001247", "metadata": {}, "source": [ "- We can download the script as follows:" ] }, { "cell_type": "code", "execution_count": null, "id": "0f94f180-5047-40b1-9180-1eafbaed9af2", "metadata": {}, "outputs": [], "source": [ "download_from_github(\n", " \"ch04/02_math500-inference-scaling-scripts/self_consistency_math500_batched.py\"\n", ")" ] }, { "cell_type": "markdown", "id": "01ea01a9-918c-449c-b693-b1bef4524347", "metadata": {}, "source": [ "- To download the non-batched version, simply drop the `\"_batched\"` in the file name above\n", "- We can run the script as follows (the syntax for the non-batched script is identical)" ] }, { "cell_type": "markdown", "id": "65fb0072-ef63-4b01-a881-1b318f1bf668", "metadata": {}, "source": [ "```bash\n", "uv run self_consistency_math500_batched.py \\\n", " --which_model base \\\n", " --temperature 0.9 \\\n", " --top_p 0.9 \\\n", " --num_samples 3 \\\n", " --dataset_size 500 \\\n", " --prompt_suffix \"\\n\\nExplain step by step.\"\n", "```" ] }, { "cell_type": "markdown", "id": "c9353377-e426-4a89-893d-c7203ee9491b", "metadata": {}, "source": [ "- More about the performance at the end of this appendix" ] }, { "cell_type": "markdown", "id": "f1dfcfbe-33c5-440b-9d14-a089fd64d493", "metadata": {}, "source": [ " \n", "## E.6 Chapter 6: batched GRPO rollouts" ] }, { "cell_type": "markdown", "id": "9a129086-1f2c-4183-be7d-4d964eaaacd9", "metadata": {}, "source": [ "- Self-refinement in chapter 5 is a sequential technique that itself does not benefit from batching\n", "- One could run self-refinement loops for multiple inputs in parallel, but this is 
non-trivial to implement and thus not part of the supplementary material\n", "- Instead, we continue with a batched version of RLVR in chapter 6\n", "- In chapter 6, we use the same prompt for the different rollouts, so no padding is required here; similar to section E.5, the code therefore uses the regular `Qwen3Model` class from `reasoning_from_scratch.qwen3`\n", "- The relevant scripts can be fetched with:" ] }, { "cell_type": "code", "execution_count": null, "id": "6446395e-3879-4bb9-a209-04aa84c95ae1", "metadata": {}, "outputs": [], "source": [ "# Non-batched version\n", "download_from_github(\n", " \"ch06/02_rlvr_grpo_scripts_intro/rlvr_grpo_original_no_kl.py\"\n", ")\n", "\n", "# Batched version\n", "download_from_github(\n", " \"ch06/02_rlvr_grpo_scripts_intro/rlvr_grpo_original_no_kl_batched.py\"\n", ")\n", "\n", "# Batched version with multi-GPU (FSDP) support\n", "download_from_github(\n", " \"ch06/02_rlvr_grpo_scripts_intro/rlvr_grpo_original_no_kl_batched_fsdp.py\"\n", ")" ] }, { "cell_type": "markdown", "id": "472ed887-7888-4738-b435-6e3247d75e52", "metadata": {}, "source": [ "```bash\n", "uv run rlvr_grpo_original_no_kl_batched.py \\\n", " --num_rollouts 8 \\\n", " --steps 100 \\\n", " --batch_size 4 \\\n", " --max_new_tokens 1024\n", "```" ] }, { "cell_type": "markdown", "id": "5286486f-96b3-4b39-80e5-3fff2675aa67", "metadata": {}, "source": [ "- In the current script, `--batch_size` controls how many rollouts are generated in parallel within a step (see the sketch below)\n", "- This increases throughput, but it also increases memory pressure, so in practice you may need to reduce `--num_rollouts` or `--max_new_tokens`\n", "- If you have multiple GPUs, the FSDP variant follows the same pattern and adds `--num_gpus`\n", "- Again, we will return to the performance discussion at the end of this appendix\n", "- As of this writing, batched versions of the chapter 7 scripts are not available in the supplementary materials yet, but will be added over time; conceptually, they will work similarly to the chapter 6 scripts" ] },
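{ "cell_type": "markdown", "id": "5c9f7d43-1111-4e0a-b1c2-3d4e5f6a7b8c", "metadata": {}, "source": [ "- The following sketch (illustrative only, not the script's actual code) shows the structural idea: the `--num_rollouts` rollouts of one step are generated in chunks of `--batch_size`, and every chunk repeats the same prompt, so no padding is needed" ] }, { "cell_type": "code", "execution_count": null, "id": "5c9f7d43-2222-4e0a-b1c2-3d4e5f6a7b8c", "metadata": {}, "outputs": [], "source": [ "import torch\n", "\n", "num_rollouts, batch_size = 8, 4\n", "prompt_ids = torch.tensor([[17, 10, 17, 30]])  # toy prompt token IDs\n", "\n", "for start in range(0, num_rollouts, batch_size):\n", "    n = min(batch_size, num_rollouts - start)\n", "    chunk_prompt_ids = prompt_ids.repeat(n, 1)  # shape: (n, prompt_length)\n", "    # ... sample n continuations in parallel from chunk_prompt_ids ...\n", "    print(f\"rollouts {start}-{start + n - 1}: input shape {tuple(chunk_prompt_ids.shape)}\")" ] }, { "cell_type": "markdown", "id": "e656bca7-ab7e-4c7a-a1a0-96fa4b5b4e2f", "metadata": {}, "source": [ " \n", "## E.7 Chapter 8: batched distillation" ] }, { "cell_type": "markdown", "id": "65aff052-175f-4f09-9bd8-7e8191d1c431", "metadata": {}, "source": [ "- Chapter 8 returns to the padding-aware style from chapter 3, since distillation examples have different prompt and answer lengths\n", "- You can download the script and sample training dataset as follows:" ] }, { "cell_type": "code", "execution_count": 9, "id": "b2b13023-584a-4826-a3da-fc3b71722f2d", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "distill_batched.py: 17.9 KB\n", "deepseek-r1-math-train.json: 107538.0 KB\n" ] } ], "source": [ "from reasoning_from_scratch.ch08 import load_distill_data\n", "\n", "download_from_github(\n", " \"ch08/04_train_with_distillation/distill_batched.py\"\n", ")\n", "_ = load_distill_data(\n", " partition=\"deepseek-r1-math-train\",\n", " local_path=\"deepseek-r1-math-train.json\",\n", ")" ] }, { "cell_type": "markdown", "id": "20639239-16e6-4b4f-94b3-09920a306562", "metadata": {}, "source": [ "- For the non-batched version, drop the `\"_batched\"` in the file name\n", "- We can run the script as follows:" ] }, { "cell_type": "markdown", "id": "3f04e5b6-ecc2-4724-ad17-625a396847dd", "metadata": {}, "source": [ "```bash\n", "uv run distill_batched.py \\\n", " --data_path deepseek-r1-math-train.json \\\n", " --dataset_size 12000 \\\n", " --validation_size 10 \\\n", " 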
--epochs 2 \\\n", " --use_think_tokens \\\n", " --max_seq_len 1024 \\\n", " --batch_size 4\n", "```" ] }, { "cell_type": "markdown", "id": "1b79895b-40fa-4a34-8a83-30cb8f185b26", "metadata": {}, "source": [ " \n", "## E.8 Single-sequence versus batch generation" ] }, { "cell_type": "markdown", "id": "3c9994ca-d128-4c0a-a014-3e2f51410d23", "metadata": {}, "source": [ "- The table below summarizes the runtime and RAM usage numbers for the scripts above" ] }, { "cell_type": "markdown", "id": "d36c900d-0d57-4697-9d8a-32b665aa2212", "metadata": {}, "source": [ "| Row | Script | Batch size | RAM | H100 Total time (min) | DGX Spark Total time (min) |\n", "|-----|------------------------------------------|------------|----------|------------------------|-----------------------------|\n", "| 1 | evaluate_math500.py | - | 1.8 GB | 90.0 | 174.7 |\n", "| 2 | evaluate_math500_batched.py | 64 | 23.39 GB | 16.0 | 108.4 |\n", "| | | | | | |\n", "| 3 | self_consistency_math500.py | - | 1.79 GB | 252.0 | 340.8 |\n", "| 4 | self_consistency_math500_batched.py | 3 | 2.45 GB | 129.0 | 243.3 |\n", "| | | | | | |\n", "| 5 | rlvr_grpo_original_no_kl.py | - | 43.35 GB | 68.0 | 63.7 |\n", "| 6 | rlvr_grpo_original_no_kl_batched.py | 4 | 44.91 GB | 19.0 | 23.1 |\n", "| | | | | | |\n", "| 7 | distill.py | - | 8.29 GB | 10.9 | 32.8 |\n", "| 8 | distill_batched.py | 4 | 8.34 GB | 9.1 | 28.2 |" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.16" } }, "nbformat": 4, "nbformat_minor": 5 }