{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "e337fddc",
   "metadata": {
    "papermill": {
     "duration": 0.0,
     "end_time": "2026-05-27T10:41:31.187362+00:00",
     "exception": false,
     "start_time": "2026-05-27T10:41:31.187362+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# 09 · Tree of Thoughts (ToT) — beam search over a tree of reasoning steps\n",
    "\n",
    "> **TL;DR.** Instead of one greedy reasoning step at a time (Chain-of-Thought), the agent generates **K candidate next-thoughts** at every step, **scores each one**, keeps the **top `beam_width`** via beam search, and expands those for another layer. The result is a tree of reasoning paths; the **highest-scoring complete leaf** wins.\n",
    ">\n",
    "> **Reach for it when** the task has multiple plausible approaches and the *right* approach is non-obvious upfront (creative writing, multi-constraint planning, logic puzzles, math).\n",
    "> **Avoid when** there's a single obvious approach — CoT is cheaper.\n",
    "\n",
    "| Property | Value |\n",
    "|---|---|\n",
    "| Origin | Yao et al., *Tree of Thoughts*, NeurIPS 2023 ([arXiv:2305.10601](https://arxiv.org/abs/2305.10601)) |\n",
    "| Reasoning style | **Search** over a tree of partial solutions |\n",
    "| External tools needed? | No |\n",
    "| Memory across episodes? | No |\n",
    "| Cost vs. CoT | ≈ `branching × beam_width × max_depth × 2` LLM calls (generate + evaluate per thought) |\n",
    "| Default LLM | **Llama 3.3** (fast) with strict rubric; Qwen3-Thinking shown in § 10 as comparison |\n",
    "\n",
    "Important nuance: **ToT only shows its value if the evaluator can discriminate**. For subjective tasks (creative writing) even a reasoning model often scores every candidate 5/5, and beam search degenerates to brute force. This notebook therefore uses **Game of 24** — an arithmetic puzzle where wrong branches are *objectively* wrong — and Llama 3.3 as a fast default. § 10 demonstrates the Qwen3-Thinking variant for the same task."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f78d5415",
   "metadata": {
    "papermill": {
     "duration": 0.008126,
     "end_time": "2026-05-27T10:41:31.203593+00:00",
     "exception": false,
     "start_time": "2026-05-27T10:41:31.195467+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## 2 · Architecture at a glance\n",
    "\n",
    "```mermaid\n",
    "flowchart TB\n",
    "    R([root: task]) --> E1[Expand<br/><sub>K candidates per frontier node</sub>]\n",
    "    E1 --> S1[Score each]\n",
    "    S1 --> P1[Prune to beam_width]\n",
    "    P1 --> Q{depth &lt; max_depth?}\n",
    "    Q -->|yes| E1\n",
    "    Q -->|no| F[Finalize<br/><sub>synthesise best path</sub>]\n",
    "    F --> Z([final answer])\n",
    "\n",
    "    style E1 fill:#e3f2fd,stroke:#1976d2\n",
    "    style S1 fill:#fff3e0,stroke:#f57c00\n",
    "    style P1 fill:#fce4ec,stroke:#c2185b\n",
    "    style F fill:#e8f5e9,stroke:#388e3c\n",
    "```\n",
    "\n",
    "**One node per phase of beam search.** Each loop iteration deepens the tree by one layer. At every layer: generate K children per surviving parent, score each with `LLMJudge`, keep the top `beam_width` overall."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f1234f6",
   "metadata": {
    "papermill": {
     "duration": 0.008501,
     "end_time": "2026-05-27T10:41:31.212094+00:00",
     "exception": false,
     "start_time": "2026-05-27T10:41:31.203593+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## 3 · Theory\n",
    "\n",
    "### 3.1 · CoT vs. ToT in one picture\n",
    "\n",
    "```\n",
    "Chain-of-Thought:  task → t1 → t2 → t3 → answer       (1 path, no backtracking)\n",
    "\n",
    "Tree-of-Thoughts:        ┌── t1a ── t2a ── t3a ┐\n",
    "                  task ──┼── t1b ── t2b ── t3b ┼──> pick best leaf\n",
    "                         └── t1c ── t2c ── t3c ┘\n",
    "                         (K branches × max_depth, beam-pruned each layer)\n",
    "```\n",
    "\n",
    "CoT *commits* to its first reasoning step. If `t1` is suboptimal, every subsequent step inherits that mistake and CoT has no mechanism to recover. ToT keeps multiple alternatives alive long enough to compare them under the evaluator's scoring rubric.\n",
    "\n",
    "### 3.2 · The three loop phases\n",
    "\n",
    "**(a) Generate.** Given a partial reasoning path (root → … → current node), produce K *substantively different* next-thoughts via `with_structured_output(_ThoughtCandidates)`. The `Field` description explicitly demands different angles, not paraphrases — without this, the model produces near-duplicates and ToT collapses to expensive CoT.\n",
    "\n",
    "**(b) Score.** Each new thought gets evaluated by an `LLMJudge[_ThoughtScore]` (the same Judge class from notebook 01) with rubric: *\"Score this thought as a step toward solving the task: 5 = strong, 1 = off-track\"*. The Judge sees the task + the full reasoning path leading to this thought, then commits to a 1-5 integer.\n",
    "\n",
    "**(c) Prune.** Collect every thought at the current depth and keep the top `beam_width` by score. The frontier shrinks back to a manageable size before the next expansion.\n",
    "\n",
    "### 3.3 · The branching and beam parameters\n",
    "\n",
    "| Parameter | Effect | Typical range |\n",
    "|---|---|---|\n",
    "| `branching` (K) | How many candidates generated per parent | 2-5 |\n",
    "| `beam_width` (N) | How many candidates survive each layer | 1-3 |\n",
    "| `max_depth` (D) | How many reasoning steps deep the tree goes | 2-5 |\n",
    "\n",
    "Total LLM calls ≈ `2 × N × K × D` (generate + score per node). With defaults (K=3, N=2, D=3) → 36 LLM calls. This is **expensive** compared to CoT (3 calls) — ToT is for *quality-over-cost* tasks.\n",
    "\n",
    "### 3.4 · Where ToT sits\n",
    "\n",
    "| Pattern | Search? | Depth | Cost vs. CoT | When |\n",
    "|---|---|---|---|---|\n",
    "| CoT | no | linear | 1× | obvious approach |\n",
    "| Self-Consistency (nb 21) | parallel N samples | linear | N× | majority-vote helps |\n",
    "| Reflection (nb 01) | iterate on 1 draft | linear | 2-4× | quality > speed |\n",
    "| **Tree of Thoughts** *(this notebook)* | **beam over tree** | **K×D** | **~2NKD×** | many plausible approaches |\n",
    "| LATS (nb 22) | MCTS + reward | bandit-balanced | much higher | very large search space |\n",
    "| Multi-Agent Debate (nb 28) | adversarial branches | depends | medium | adversarial consensus helps |\n",
    "\n",
    "### 3.5 · Why reasoning models help here\n",
    "\n",
    "Each \"thought\" in ToT is itself a chunk of reasoning. A non-reasoning model (Llama 3.3-Instruct) produces a one-liner per thought; a reasoning model (Qwen3-Thinking) uses its internal `<think>` budget to plan the thought, leading to:\n",
    "\n",
    "- **Less hollow thoughts.** Reasoning models think *about* the thought before committing.\n",
    "- **Better candidate diversity.** Reasoning models more reliably produce different angles when asked.\n",
    "- **Better scoring discrimination.** The evaluator is also reasoning — distinguishes 4 from 5 more reliably.\n",
    "\n",
    "The tradeoff: reasoning models are slower and pricier. For ToT at our default `K=3, N=2, D=3` (36 LLM calls), each call having reasoning makes the overall run ~3-5× slower than CoT on a thinking model.\n",
    "\n",
    "### 3.6 · What goes wrong (you'll see live in § 9)\n",
    "\n",
    "1. **Lenient evaluator.** Every thought scores 5/5 → beam search degenerates (no signal to prune on). Mitigation: tighten rubric — *\"reserve 5/5 for genuinely excellent thoughts\"*.\n",
    "2. **Mode collapse on candidates.** K candidates are near-paraphrases. Mitigation: tighten `_ThoughtCandidates` description.\n",
    "3. **Premature commitment.** Depth-1 thoughts are scored before they've had a chance to \"show their work\" via continuation. Common ToT bug.\n",
    "4. **Cost explosion.** `2 × N × K × D` calls × reasoning model = expensive. Cap depth aggressively.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01444370",
   "metadata": {
    "papermill": {
     "duration": 0.005912,
     "end_time": "2026-05-27T10:41:31.220193+00:00",
     "exception": false,
     "start_time": "2026-05-27T10:41:31.214281+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## 4 · Setup\n",
    "\n",
    "Llama 3.3 70B as the primary LLM. Llama is fast and (with the strict rubric in § 3) can discriminate well on **objective tasks** where wrong arithmetic is clearly wrong. We compare against Qwen3-Thinking in § 10.\n",
    "\n",
    "Why not the reasoning model as default? On *creative* tasks every thought \"sounds fine\" and even Qwen3-Thinking scores 5/5 across the board (see § 11.1's \"lenient evaluator\" pathology). For ToT to demonstrate beam pruning, the **task** must have objectively-bad branches. That's why this notebook uses Game-of-24."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "5422d3c6",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-27T10:41:31.228346Z",
     "iopub.status.busy": "2026-05-27T10:41:31.228346Z",
     "iopub.status.idle": "2026-05-27T10:41:34.325934Z",
     "shell.execute_reply": "2026-05-27T10:41:34.325934Z"
    },
    "papermill": {
     "duration": 3.105741,
     "end_time": "2026-05-27T10:41:34.325934+00:00",
     "exception": false,
     "start_time": "2026-05-27T10:41:31.220193+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">Primary LLM: meta-llama/Llama-</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">3.3</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">-70B-Instruct</span> <span style=\"color: #00ff00; text-decoration-color: #00ff00\">────────────────────────────────────────────────────────────────────</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;36mPrimary LLM: meta-llama/Llama-\u001b[0m\u001b[1;36m3.3\u001b[0m\u001b[1;36m-70B-Instruct\u001b[0m \u001b[92m────────────────────────────────────────────────────────────────────\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from agentic_architectures import get_llm, enable_langsmith, settings\n",
    "from agentic_architectures.architectures import TreeOfThoughts\n",
    "from agentic_architectures.ui import print_md, print_header, print_step\n",
    "\n",
    "enable_langsmith()\n",
    "primary_llm = get_llm(provider=\"nebius\", model=\"meta-llama/Llama-3.3-70B-Instruct\", temperature=0.5)\n",
    "print_header(f\"Primary LLM: {primary_llm.model}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "825b711d",
   "metadata": {
    "papermill": {
     "duration": 0.002008,
     "end_time": "2026-05-27T10:41:34.333462+00:00",
     "exception": false,
     "start_time": "2026-05-27T10:41:34.331454+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## 5 · Library walkthrough\n",
    "\n",
    "Source: [`src/agentic_architectures/architectures/tree_of_thoughts.py`](../src/agentic_architectures/architectures/tree_of_thoughts.py).\n",
    "\n",
    "Key pieces:\n",
    "\n",
    "1. **`_ThoughtCandidates`** schema — forces the generator to emit *substantively different* candidates, not paraphrases.\n",
    "2. **`_ThoughtScore`** schema — forces the evaluator to commit to a 1-5 score + a one-sentence rationale.\n",
    "3. **`_expand_and_score`** — for each frontier node, walks the path from root, asks the LLM for K alternatives, scores each via `LLMJudge`.\n",
    "4. **`_prune`** — keeps top `beam_width` thoughts at the current depth as the new frontier.\n",
    "5. **`_finalize`** — walks the best leaf back to root, synthesises the final answer along that path.\n",
    "6. **`_path_from_root(thoughts, id)`** — flat-tree helper that walks parent pointers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "a472d06d",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-27T10:41:34.341485Z",
     "iopub.status.busy": "2026-05-27T10:41:34.341485Z",
     "iopub.status.idle": "2026-05-27T10:41:34.359177Z",
     "shell.execute_reply": "2026-05-27T10:41:34.357348Z"
    },
    "papermill": {
     "duration": 0.023717,
     "end_time": "2026-05-27T10:41:34.361191+00:00",
     "exception": false,
     "start_time": "2026-05-27T10:41:34.337474+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--- ThoughtCandidates schema ---\n",
      "{\n",
      "  \"description\": \"K candidate next-thoughts at one tree node.\",\n",
      "  \"properties\": {\n",
      "    \"candidates\": {\n",
      "      \"description\": \"Substantively DIFFERENT next reasoning steps or partial solutions. Each must explore a different angle / approach / framing. Avoid producing variants that are paraphrases of each other.\",\n",
      "      \"items\": {\n",
      "        \"type\": \"string\"\n",
      "      },\n",
      "      \"minItems\": 2,\n",
      "      \"title\":...\n",
      "\n",
      "--- ThoughtScore schema ---\n",
      "{\n",
      "  \"description\": \"Score for one candidate thought \\u2014 STRICT rubric to force discrimination.\",\n",
      "  \"properties\": {\n",
      "    \"score\": {\n",
      "      \"description\": \"STRICT 1-5 scoring. Be discriminating \\u2014 if you score everything 5, beam search has no signal to prune on.\\n  1 = clearly off-track, contradicts the task, or contains a factual error.\\n  2 = on-topic but weak: overlapping with a sibling, vag...\n"
     ]
    }
   ],
   "source": [
    "from agentic_architectures.architectures.tree_of_thoughts import _ThoughtCandidates, _ThoughtScore\n",
    "import json\n",
    "print('--- ThoughtCandidates schema ---')\n",
    "print(json.dumps(_ThoughtCandidates.model_json_schema(), indent=2)[:400] + '...')\n",
    "print()\n",
    "print('--- ThoughtScore schema ---')\n",
    "print(json.dumps(_ThoughtScore.model_json_schema(), indent=2)[:400] + '...')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d8be05a9",
   "metadata": {
    "papermill": {
     "duration": 0.005955,
     "end_time": "2026-05-27T10:41:34.369154+00:00",
     "exception": false,
     "start_time": "2026-05-27T10:41:34.363199+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## 6 · State\n",
    "\n",
    "The tree is stored as a **flat list of nodes** with explicit `parent_id` pointers. Cleaner than a nested dict for `LangGraph` state (which prefers value types).\n",
    "\n",
    "| Field | Type | Purpose |\n",
    "|---|---|---|\n",
    "| `task` | `str` | root task |\n",
    "| `thoughts` | `list[{id, content, score, depth, parent_id, rationale}]` | full tree, **appended** to each round |\n",
    "| `frontier` | `list[int]` (ids) | which thoughts to expand next |\n",
    "| `depth` | `int` | current tree depth |\n",
    "| `final_answer` | `str` | set by `_finalize` |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "319ad10d",
   "metadata": {
    "papermill": {
     "duration": 0.0,
     "end_time": "2026-05-27T10:41:34.377213+00:00",
     "exception": false,
     "start_time": "2026-05-27T10:41:34.377213+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## 7 · Build the graph\n",
    "\n",
    "Four nodes: `root → expand → prune → (expand again | finalize) → END`. The `expand` node *does* both generate and score (combined to keep state mutations local)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "3cd113e2",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-27T10:41:34.385285Z",
     "iopub.status.busy": "2026-05-27T10:41:34.385285Z",
     "iopub.status.idle": "2026-05-27T10:41:35.394305Z",
     "shell.execute_reply": "2026-05-27T10:41:35.394305Z"
    },
    "papermill": {
     "duration": 1.00902,
     "end_time": "2026-05-27T10:41:35.394305+00:00",
     "exception": false,
     "start_time": "2026-05-27T10:41:34.385285+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAGoAAAITCAIAAABg8R7gAAAQAElEQVR4nOydB2ATZf/Hn7ukabp3SwdtKWVVppQhW1qGDBmCliWi/AVeUJChMgRZoiKIA0QRUZChIMsXQVEEpPICyt7Q0kEHpXQ3STPu/r/k2jQtGXe5S/u0vc/LGy/PPTfy7e/Z4yelaRqJ2IsUifBAlI8Xony8EOXjhSgfL0T5eMFXvpRrqtv/Fhbla1SlOp0WIRoRJKKp8k+a0P8PojFfEQHn9SGkFFFa/eWEBOlrThShP9AZ7kgYLifKz+ovoAzhcKwrf2hlZP0Xmrlv+SMMd4CviETI8LUyvALSiZC7kq6e0iat3GOeckc8IOyr950/VnT17/ziAi28vERKyFxJiYTQv6gOtCFoiiYkBBzrZTDcv+Kr4cfRtMSJ0GkqwuE/uor4hvN6mENS/wXupv/NFYoz4UygafxqMhklrrxzBaSEhF+tVlEaDU1pKbmbtMkTbk8/H4C4w1m+C8cKz/3+iKJQYKhLbLxveCtnVJcpyaNPHniQcVep01BN2ngMmBDI6XJu8n23LEWlpFp19u41whfVL26cKfn70EOw38krmrC/ioN8X8xL8g91Hj0rDNVfTuzJvXq6sPtQ//Z9vNjEZyvf57Pv9h0dFPOUB2oAbJibNH5+pKefxGZMVvKtn3P3/1ZEy1xQw+HLt5Nj4/w69rNhgySyxca3kuJeaNSgtAOmvB915rfcooc669FsyLd1RWpgY5eWnXlVjuooXQb67ViTaj2ONfn++aNQUUyNnBGCGiQd47zlbuRPn2VYiWNVvqOPYrp4ogbM6JnhWfeUViJYlO/Cn0W0lu410g81YNw8STcv6V7LBmhRvksn8wPCa7q86NevX0ZGBterkpKShgwZghxDux4+D9JVls5alK+0SNO5f42aXlZWVn5+PuLO9evXkcN4Ms6L0tGpN80nYfM9LnculEKzPLylQ9qzUNPcuXPnf//739TU1CZNmnTt2nXatGkXLlyYOnUqnB02bFjv3r3XrFkDNrVnz55z585lZmZGRUUNHz581KhRzB3i4uImT5587NgxuGrChAnbtm2DwNjY2DfeeGPcuHFIaFzcpdf+LopoaSYtmpfv3rVSJ2cCOYZdu3Z98803s2bN6t69+/Hjx9evX+/m5jZp0qR169ZB4IEDB0JDQyEaKAjCLVy4kCCIlJSUDz74IDg4GC6BU05OTvv27evcuTOI2LFjR4jw22+/wd8DOQYPb2l+TpnZU+blK3qkkbvabrLYx/nz52NiYpjcasSIEZ06dVIoFI9HW7VqVWlpaUiIvtoElnXw4MG///6bkQ/08vLymjt3LqoRPP1kGUkKs6fMy6cu0znJbDdI7KNdu3afffbZsmXLOnTo0KtXr7Aw830QkMbBThMTEyGNMyGMVTLAHwDVFNC3qlFTZk+Zl4/WUUjiKOsbO3YspNYTJ04sXbpUKpVCafv6668HBFTpraQoaubMmWq1esaMGWB6Hh4er7zyimkEmUyGagroeSUs5GTm5XN2dVIpzOvNH5IkRxhITk4+e/bsV199VVJS8vHHH5vGuXnz5rVr1zZs2AAZHBNSXFwcGMitL1MoVCWUhDSvn3n53L2cCh+ZT+38gTy+VatWTZs2jTIAukA5UC1OQUEBfBr1SjYAl6DaAEoCqdy8UOYzuNDmLqpSLXIMR44cmTdv3smTJwsLC0+dOgX1D8gNITwyMhI+jx49evXqVZAV0jXUSIqKiqDYXb16NdRvoGJo9obh4eG5ublQiBtzSWGBgTBvPy7ytenmAd2Aj7I0yAEsWrQI1Jk9ezZU35YvXw61PKidQDiUIUOHDt24cSMULI0aNVqxYsWVK1f69u0Ltbnp06dDpQ9kNVb9TOnRo0f79u2hIP7111+RA1CW6mJizfcTW+wu3bTwXmCYfNi0YNSwuXmu5Ped2TPWRps9a7F20qKjR+Y9
R2V/dYjTh3K9/J0snbU4TN5rpP+VxIILxws7WBg0yc7OTkhIMHvK3d0dClOzpyDZQpMDOYZvDZg9BTVtS+kM6kZm8wSGkkLNqyujLZ21NtZx7IfcOxeLpqyKMntWq9Xm5OSYPaVSqeRyudlTUCA4rv5RbMDsKSiCPD3N911COPy9zZ76fmUqDF1OeCcCWcDGUNGX85MjWroOnNgINTzS76gOfnl/+kfRVuLYaJmB6d27WqouRQ2QQ19n9hweZD2O7YZtXELQ5qVJqIGx5d3Uxs1c2/awMa7Napw3L1u9c3X69DW1U+mveb54M6n3c0ExXWyPL7KdZZByTfnz1xnte/v2HF7fZreYknZDeejbzIgWboNeZpXdc5sitGlBskRGDhgfFBpdD4fNd3yYXpir7jksoHV3tuOLnCeo/bI5O+VmqYubJLqde88R/qjuc+FE0bXEgsI8jV8j54S53CZA2Tk98pdvH9y/U6pWUjI5KXeTuHtJpTKSlCLTiYj6Ph4Ceu4qnsRMmzSZe1oZU4Io48zR8gmVEEhQFZMtSZKgmEmSFQfVfwZpmIfKXFgRRyIlddrKx5Bk+ctIpBJtma6kUAeN2TKVjiQI/1DZqGlhyAlxxU75GErydGd/y8tJV+nfQ6kjSJIykY/pYjTentHF9LMy5mPzZ1HFr6Vo+HmSSk3Jyr9HlZ9BGOaj0pUPAqDDV6erGqc8nJDKCLmLxCfIqW13n9Dm9o+I8ZKvBhgwYMCOHTv8/DAdrcd9Zj00DaGdh3BFlI8Xony8wF0+jUYDg+IIV7CWjzKUsjAyh3AFa/kwT7lIlI8nWL8c5hkfEq2PJ6J8vBDl44UoHy9wl08sOuxHtD5eiPLxQpSPF1BtFuWzH9H6eCHKxwtRPl6I8vFC7HHhhWh9vJBIJB4eWG99gvtQUWFhIcIYvJOGVArpF2GMKB8vRPl4IcrHC1E+XuBecRHlsx/R+nghyscLUT5eiPLxQpSPF6J8vBDl44UoHy9E+XiBv3w4ripaunTpwYMHmReDT8IASZLnzp1DmIHjpPVp06ZFRkaSBqDZC58gn6WN1moXHOULDAzs16+faQjIN2zYMIQfmC6ZGDduXERE5fYfoaGhw4cPR/iBqXwwwPbss88aF8T079/f29sb4Qe+C3bGjBnD5HchISEjR45EWOKQkvfUTwXFpSptGcV4yGEWizMrmZk14kyI1InQavTrywlkOFWx1Nm4uDwj8/6dO3dDQ0JbtGhO0+VvWsXrjsGRDrPivMraccPiclTxy6RSibObtH13X99QgfezFVi+3Z9m5qYrnWQSGn6jutxhkOkC/PJPQzgppSktAeIRRJUV+kY3QyColtIRSEKYyGHq54k5pgmKoElTWU0dHhnuQ0hkSFtGu3lLJywIR8IhpHxHvnuQeVc1+o0I5KhtT/nyy+YsZbH6pSURSCAEk+/AF1mFj7QjXmuM8Ob37VlFuWUTF0ciIRCs6MhMUXYbEoSwJ35csFJBpd1QIyEQRr67F5WQ2wQ1qbnNgPngLCevnbFnc/zHEabLQFGkpnRY78hhik5Hq0qF2ZZVoB4X/XY3dUY++EvrdEgQGqKLT5pGQlU3hJKPRo5ysCA8BGFxE3WuCCUfgepM2sXR+hooghUddSnxCpdYBJLPpH2OP7RwWbVgRQddd/TDsegg6k7qFYsOXsCfmsTM+uoSwhmfcHmfRFJ3il7hECzv09WdLgPonhbK/PAdKmLJiOf6ZWZxc2sp5n3lZGdnFRRw7rkD06Pqesm75N03JRJJUFDwrh+2Ln33w149+yoUirXr3rt48Z/i4qLIiKhnnhk2fNhoJnJaWsq6T96/feeGRCKNjIx6aeKUDu1jL1z8Z/YcvVvLceOHde/ee8WyNeyfjl3i5eoVz8nJKfneXfi3cvnatm06QMjbC17PzLy/fNmaH3f90qtX3CeffnDj5jUIz8/Pm/HapMDARl99uWP9Z1t8vH2Xr1gAWoOCq1augwjbvz/A
STsYt5MI9LsFk49rByRBENnZmUuXfNitWy9vb5//nUm8cuXivDnvtGr5hJeX97ixk9q0af/d1q8g5u4922XOznPnLAoJDg0LC583d7FSqThwcDeyF5oSLPHWZtEREd7E6JPn3r27cNykSaVPi+bNWt26pXe8CxbarFlL47JyNze3xmERt2/fQPYCw/BCeZCsTfnApozHjx7lyuVVvDC4urqClcFBHpxyruL5SO7iolDa78WL0iGhOutxqbiATalUVTwwlypK/f30fj9d4VRZFffWSoXCzxcLZw3CyEcT+skriActmseoVKo7d28ZQ27cuBppSMtwCo41mvKxsaLiotS0e6bJnCskqZ8Vg4RAGPngZXgOVHbu3C0kJGzt2pU3b13Py3u0+ZsNINkLoyfAqaFDnystLVmzduWDB9kpKcmr3l8MaXnQM/rpfo3DI+Hz+PGj129cZf8sGBSkaGHqzbgkXigZoPLh6en1n+kTx45/9t/zZ5cv+wgKXzgVFtp4yeL3oWxJGDtk1uxXIeSTdV9DYoeD0JCwgQOGbvl246ZNn6HaQJg5Lpf/Kjy57+HEJdGoLrDjg2TfINnomQJMlhZurKMOIVznhnBjHXUH/YRMrLoMoOSoQ/ann6eKlXxQb6lL9gfiCeR4XRwm50WDLDoIwSpsDbHoEHBGjlBFB6pD47wCjukLVXSgOjTLQEAaZNEh3ISmhlryipM07Ae7RltDRRj5SFLvtRXVEWRyiVwuTMVPmLs0beVF0wK1gxyPVkP5Bgnj3VoY+Vx8kdxVknjgIcKevAy1Tk13H+6DhECw3uZhr0bcu1akFmapmAM5sjWjZaxgC9OFXJAKo39fzU/2DnCObOnh7G5mKJqoWugR5c8nTE+ZxmGOCRhJoasO7cAf3ZBV0CYVuKo3Z1xQV5wiEa0h0m6W5mQoBk0KDm8hmF964VeT71pzvyhXo9PS2oq13abSoKrqVMpX4fmaIB7rjau43ngfkiAoQ2wmRCKpMj2O8eltuFF5BBhac3IiXN2lvUYGhMcIph1C2DvXHjhw4Pbt20Xn2nYiujfmhSgfLzD39iRaHy+wlg+KNYqiJBJ8m4OitxheiPLxQnT1xAvR+nghyscLUT5eiHkfL0Tr44UoHy9E+XghyscLUT5eiPLxQpSPF6J8vBCrzbwQrY8Xony8wN1bTEBAAMIYrOXT6XQ5OTkIY0RfRbwQ5eOFKB8vRPl4IcrHC1E+XuAun06oDWscg2h9vBDl4wXu8hk3D8IT0fp4IcrHC1E+Xojy8UKUjxeifLzAcVXRa6+9durUKaJiTRpJkhRFwdd///0XYQaOu4bPnDkzLCyMrAAZFAwPF9I1p1DgKF90dHSPHj1MkwWYXu/evRF+YLpn/fjx4xs3rvQWCsejRo1C+IGpfKGhoXFxccwxZHyxsbGMp2jcwNdjQkJCAuPdHT5feOEFhCWcKy63/1Wq1VWb8XSVTXmMC5vLv9KIJh9bIc4K5/5dX/lTdaJNi9bKnICrOUWGVeWVN6nyWNryxkBmn2huB9nFmAAAEABJREFUHTpJEhFPeLi5I/ZwqLhsfy+9MF8Nz9Coq2yaUbkQvPw1qslpUNDCPR9fnl9lrblhR+PKu1VdVm76dzIcEmbfwfSeJFG+RYAxvmkEvU9wipK7SUa/EenuhdjAVr6v37nn5SvvNzEY4/V5wnBq/8OUa8WvLI6SsTBDVvJtWngvopXXU0N9UcNAp0Y7VydN+9D2xuS2i44Te3IJkmg42gESGfIOkP/w0X2bMW3Ll35H6R3gjBoYka08ih7Z7ui2LV+ZQifFeoqiQ3D3ITVa29tK2a64aNWUTl1n9qcSCq2OpjS2SwVxAzoL0KycqIryWYBgtRurbfkIsh44w+OM3hUeaftn25ZPvy9fg8v6IO0SNCXmffaib06waI6J8pmHaVPbRMz7zEPDzyaFSLwNM+8j9K4EbZufaH0WYLezuGh9FqBZdeuKRYcF2Fmf7WRZDxJv
QUH+03Gxfx4/yv4SkmblQFpMvObRERRFi9VmeyFoVg5cHJIs8/IerVi5MGHskOEj41eueic9PZUJ/2jNihfGDFapyt2dbt+x5ZnBPbKyM2/fuQmJ6+Rfx175vwQ4GPX8wPUb1hrvdvr0XyvfWwQXQuTZc6ZeuPgPE75v/48jR/VPS0uZ9MrzcBVce+TXn41X/XHs1/EThj87vO/7H76bn5+HuEIQbLoMhJdPp9O9MWfKxUv/vjFrwTdf/+Dj7fuf6RMzMvUd31OmzNRoNFu3bYLj3NyH32/fPP0/c4IbhUgl+kTw/febVyxf++vhvyHwwMHdh37ZD4Gg9cpVi8rKyt5+a+l7K9eFh0cuXPQG/HmQwT13SUnxp599OG/OO8d+P9e7V/yHq5c9eJANp5KT74Li/fsP+X7b/gH9h3z2+WrEEULf6BDE+khWZZCRK1cugkUsmL+8S+duvr5+06bO8vTy/umnHXDKw93jtRnzdu/ZDmqu37CmVcvWQwaPMF7Ys2dfkFImkz3dp1+nTk/98ccRCJTL5V9/tWvO7IUd2sfCv6lTZimVyitXLzKXwB9j4ouvxsS0IQgCZIJm6l2Di1pQPyiw0YsTJnt6eMJVg02ewhKWHgxY5H0UN/8g8NvALp7s0In5Cj+sfbuOly6fZ76CNL8dPbRg4azc3JzvtvxkemGz6BbG49CQxr//cZg5VihKv978OZjzo0e5TAiUpMaYLVs+wRx4eHjCJ9gjfGZkpEeaOPA1xuEAK+Nj2erg4tIRfgAYBWRGpoHe3pUeCsaNmfTazFdAU3//KivFTZ1rg9GVlpbAASTGmW9MfrJD53cWvsdYWb8BXau8nrl3KyoqDAurnNDmIue+0To7b1rsKi5cplD6+fm7uLisXPGxaaCErBxd3/Ltxh7d+/zvzCmoiIExGsMZw2GALI9R8/iJo2q1GjI+uCeqandW8PT0MvXHDfaLHIPwFZemTZtD9hQY2Cg0pHxOVGZWhrdXufX999C+pOQ727cd+HH3NsjRY2O7QobInILk2aNHH+YYsrCoJtHIYEeQKhntgBMn/2DzDkFBwX+fPklRFNNjfPp/fyGukATBrlywAddWR8cnO3fu3O2jj5ZDuissLNh/YPfUaROOHDkIpx4+zIESY9qUWW5ubuPGvgxpaoNJBeXcP6fPnP0bDk4lHofaSXz8M3AcFdUMsryDP/+k1Wrh7PnzZ728vHNysq2/Q58+/cBO4c8DhQncav/+HxFXKFaFh0NaHatWroMfvGzF/OvXrzRuHAFCjByZoA9/fzHY5oABQ+AYStg5cxbNmTsNSkxIaxAyNuGlzZvXvz3/dTAZiD94kN77eFzfAampyVDX+Xjdqk6xXd96891dP2zdsfPb4uKi5s1bWXoBiDl1ysyDB/f0je8UFNRo4fwVr8+a7IhZ3LbnuHz5VrJ/iHP/l0KRw4BqGlR6P/l4U9u2HRAe3L1YlLg/Z8bH0dajiY02XojymYdlbo9Fb3NUVPSff/yDcIISrLeZRg7Ic7FHsN5mGhENUD52iHmfBconSttAlM8CNEJI7G22G5IkSEGsj+DW31dPgKEOYaYI0dz6++oHhFDz+xomNLuhIlE+Xojy8cK2fFI56eTS4OYIkTKp1EmIybkurhJlSYObZlCYXSZxEmKgMrqdZ+EjrDeScgSpN0p9g2yvpbItX+eBXjIZcfibTNRgSLqkVBSpn3s9xGZMtgtSd66+ry6jW3fzad6Ry3LhukZmkvrCnw8LH6qnvN+ETXwOy6H3bsh+mK7UaSmd1twlFtdzW17oTdvZnjGsKrfnntYfCKNyEinp6es09q3GiB32bIOjVle9BeP/22RAqcoCbnMDTdUuMY0/cOCA/fsPuMrlj59SlJYmjBkzdcqUwYMH0+acwDOPoy0s/Td6RicsRJCV/58D9tT7ZByfwZ47d+4EBfl6esrNnj137qJGU7r5m41PdmyDya44eFXoLl261K5dO0tnz5w5U1hY
mJmZuWDBAoQHeMl35cqVNm3aWDp79uxZZjOrW7duLVu2DGEAXvJdvnzZkvVBui4uLmYmXUB+/eeff+7btw/VNhjJl5eXV1paamm3IDBMU+cTIOWWLVuSk5NRrYKRfFZMDzh+/DikXNOQ+/fvz5s3D9UqGPW4gHxWMj7G0EBBmUzm6enp5OR06NAhVNvgJd+MGTMsnZXL5cz2hw8ePMjPz2/ZsiXCAIwSr/Vay549e5iDoqKipUuXIjzAxfqgZGjdujXBYhZws2bNoqKiEB7gIp9106vGypUrER7gknitV5irAVqnp6cjDMBFPuu1lmpcv379xx+5z7d1AFjIB81YqIj4+/uzjN+tW7fg4GCEAVjkfWB6bdu2ZR8/wgDCACysj1PGx7B3715o4aHaBgv5OGV8DImJiThsIl77iVetViclJXFtRSQkJBBcloo5iNqXzw7TAzp16oQwoPYTL1TiOJUbDNDtvGPHDlTb1L58dpQbgJeX1xdffKFUKlGtgoX12ZF4gUWLFpWUlKBapZbzvnv37kFt2cPDA3FnwIABqLapZeuDQZ++ffsiu7h69ervv/+OapVali86OvrEiRPILn799deHDx+iWqWWEy/IB0MWhrXjcq7XPvXUU9D3h2qV2i86oJcUkiHiDnQcBAQEoFql9uWDWgvUXRBHYKBy9WrOu7MITl21vmvXrqWmpqLapq7KFxISMmvWLFTb1H6b18/Pz9nZGXpMQRH2V4kzrCqxwwAh4wPFUW1TV+XbvXt3o0aNUG1TJ+WDnoLNmzez2RTd0WDhoxLeoXPnzufOnUN1DSysD/qNY2JioC7CMj4MdBw+fBhhAC7jvJzS78mTJ+3rpBEcXOTj1PaYMWNGly5dEAbUSeuDjgYYVkcYgIt8YWFh0HVcUFBgMyZ0ES5ZsgThAUbz+0wNcMiQIZaiwcicq6srwgOMnGv37NkTKnS0gdDQ0J9//tlstPz8fJlM5ubmhjCg9tu8gwYNys7OJgwgQyUG5LMyhcXHxwdhQ+0n3vfeew+aX6ZTBuA4NjbWUvxRo0bV+vikkdqXr3379hMmTHB3r1zn6uvrC/mg2cjQTaBWq41bmdY6WBQdY8aMiY+Pl0rLcxJQBxohZmNC7/zOnTsRNuBS8sKYN9ScmXIDqnWWylao7mFSaDBgVHFZu3ZtVFSURCKxMv0Hanx//vknwgZeFZdda+4XPtTo15frmA3HaBY777BaXG7xVqZX21g4buNlSAlBkoSbh/S518Pd2LnSfhz75du0IMXNW9qup39otFzH3KtiZbbp2u5qS8mJil9NPLaUvJosZleBo8fWqT/+UNP7IJPI1fbikiD0KEt95VR+ZnLJ5OVNZXaVRnbK9+X8e09082vXyxPVC7a/d+/5WU18uc82tyfv27c+29VDWm+0Axo3dzu4yZ5hT3vky81UhjWrV9uRdB0YqCjWIu7YI59OQ/sE1itv5TJ3/Sb1edmcFbRHPq1Wp622GUndR0fBP85bdYkb0FXAyjtMdUT5KqG5C2iPfPo+JdKOPxXWQAWuhqxP/yTKzso2tti3yEZMvBXQrLbKrYYoXyV22J998tlVSmEOYY9jCDutr/6ppy93iRopeeFJVP0TkLBnb3S75COgsVLfSl5Df1YNFR34DA4LB2FP2SEWHRXYZRB2Wl/9S7sGg6ihxEvQ9dL6aqriQovOnxhqYqDSiuvxn/buem70gFOJx+P6df5s/UcQ8szgHrt+2Gq89sPVy6ZMHY/0K3+T4NobN6+9s3guHDyfMOiLjet0OmaQyqI3dA7YlZ7skU9fcHDpcbHielwmkykUpQcP7pn/9rIRw563chNmPuSatSvi4gb+duT0wvkrftz9/Z/HjyKr3tC5/Ch7CkR75NMXHNx7XMy6HoeKvkqlSkiYGB830NSfsyV694rv0zsepGzX7smQ4NDbt28gq97Qufwoe4pDu6wP4N7fV831eEpq5a6tLVuw9R1u6lTW3d2DcSht3Rs6
W2hUQyWvfh4Kd+sz63qcQcZ6H3KzS2FsekN3HDXXYWXW9bhNdJTOZhyb3tBZQdhTetjZWY+4J16zrscfRyZzVioVxq9sylDr3tDZQtdY0UHbU3SYdT3+ODExbU6c/IPZoGXb95tzc3Ns3tmKN3RO2NGSt6/eZ0+rg3E9DjnUknffNLoef5wZ0+f6+vgNHdan34CuZWWquL4D2dx81cp1vXvHL1sxH+p9e/ftMnpD5wA7n57VL7JD8s9n3+n6TECLzt4s42Poevxxvn33bsK88IAQbs407O1xETFgb49LvQMKaqLG+vs4VZsxdD3+OFBBomumt9mubu36iV31vnraY1VjU4Rooj56XaypkbZ6OdbBallAdexs89bHrI+osTku9XCow5Ciamygsv5RgzOs6uNApV2IA5W8sEc+iYQkJfXN3TYJ7SiCYw+rffKRUkKtQPUMgkQBwZzls8eIvPydUq8Xo3rE2cN5Mrk9UthzTcJrYQW5KlSPFsYkXS7s0NsPccfOFZUlj3Tff5Aa3c6ryxB7nooP188WXzj6sP+EoKg29ixSt389b94D3f716WUqiiQIjaZqG9hkbS1NG6bNVXPHTDy2CheZrtc1XFMlnDa0CYiqq3bp8vp7lcAK6KqPQ9WfKJHqW2kSqaRND6+ug+wc1eS7DU5+ljbtTqlGU2040UQV/e8mTb7qFWU2azHU9GlTmZhvBunKZ2rv23dg4MD+clcXknnVyhsb7kEZrjGGMi0H/bqTioneFY+rrBVXnCIlhH+wPLwFr005MNpFyCzx8fF79uzx9mY7rlLD4C7f3bt3o6KicNhn0yy4y4c5uDceJk6caJzEhyG4L8q6fv26RMK5MVBj4J5479y5U+teJawg5n28wDrvU6lUkydPRhiDdd5XVlaWkpKCMAbrxAtlLsjXtGlThCti3scLrPO+7OzsOXPmIIzBOu9TKpVpaWkIY7BOvFB0gAFi4gjaLGLexwus8z5ocrz77rsIY7DO+4qKirKysnapV70AABAASURBVBDGYJ14FQpFfn5+aGgowhUx7+MF1nnf+fPn16xZgzAG67yvoKAgJ8f2qqJaBOvEW1xcXFJSEhzMfUfbmkLM+3iBdd6XmJiI1Qb/j4N13ge1llu3biGMwT3vKy0txcEXpSXEvI8XWOd9586dW7duHcIYrPM+GCrCwYO2FbBOvJDxQc1ZbPPWW7DO+27evLl8+XKEMVjnfRqNJjk5GWGMONbBCzHv4wXWeV9GRoY4zms/FEWJeZ/9aLVaMEAx76u3YJ33FRYWTpkyBWEM1nkfSZK3b99GGINj4gWLY0oMKDqg6qffq5KioAr9zz/YbUWEY+JNSEiAQgO6miHxQqcLKAja4TlghKN8Tz/9dPPmzU2TBRzjOb8e06Jj0qRJfn6VS10DAgLGjBmD8ANT+bp27RoTE8MYIGR8TZs2teLzuBbBt+JiNEAfH5/Ro0cjLMFXvnbt2rVv3x7KEGh19OnTB2EJt4qLugTt/SK9OF+n0UCSsnTh4yu7K88QpOW9/8oXnlcJMSyBNoSa29qRWaxu4W6oYqU0K5xkpKubJLafb6suHHzncqg2lxbqtq5M9QmSteriLZWROopxx2rOu3X5qvDqZ5iV4uZ2ICLKV9aXO8w2iUEYJWJWgVe5vvqTiEpvYfrF5mZ3O9LHIR5z+o2cCOn9pNKT+x/qKLr1Ux6IHWyt79Y5xbHd2eMXRqH6zs4PUkKbugx+JYhNZLZ534m92R36+KMGwJi3IlNvlLBcQ8xKvtv/KMFGn+hef5yRW8fF3enIt6zmFbLK+9KTSiXSBrTpl9yNLMhTsYnJSj6tWqMua0DdgiqllmWRLXpI5YUoHy9YFR167zoNab9DgvX2jqysj6n8NxzYuylnJR+BGpTxGVIbu91Z2VabG9Rmr/rNxQQseWm7PKnUXSjWW8qzK3nFsWALiBUXM1joFjIDu6KjoVVcENu8iqX1NazEy96LGst6X730Ri4A
LFsdnN0gKRSK995fPHhorzffmpGcfPfpuNjLly8gexk+Mn7rtq+RwZ9vXL/OCBvYWx/ixJWrF48e/WX6f2a3bxfr7e3z4oTJgYECrK2KadV6wniHbwpGMNvyssBRJa9CUQqf8XHPMJ5KJ700FQlBq1at4R9yMAZbEa7arPfHy2VE88DBPes+eR8ORjzXr1Ns16lTZhm9BC5d9jbcDWR9/8N3lUpFTEybqa/OZBS5dy/p4M97zl84l52dGRkRNWjQ8GHPjqp2Z0i8G75Y+8fRs4mJJxYtrj5vd9t3e8PCwmFsc/M3G/535lROTnbr1u1HDHu+a9ceiAsEa8cnbBMvxcWnJ/xsT0+vZcvn7/vpKFgf5H2Vz5NKL1+5AAXRxi+2BQYELVg4a9UHS7Z++xOcWr9hDQg3e/ZC0DctLeWTTz8ICgru2qW72Ue0bt1u7ZqNxq9wbWlJiZ9fABx/+tmHh48cfG3GvN694xMTjy9Z+uaC+ct794pDrKFZN1FZdhkgARttSoVi3tzFrq6ucBzXdyCYIZQz8PWdd1ZBkg9uFALhHdrHHjly8Oy5vy3J5+XlDXGYYzD2jIz0zz/d4uLiUlZW9utv/x075qVnhz4HpwY9M+zq1Utbt23iJB8MRgtqfYLW+xqHRzLaIYODbKRft1ukD6HpvXt3nTmbaPTJGxxsezXb3bu3P1//0cIFK5o21U/Bun37hlqt7hT7lDFC+3YdwRgLiwq9PL0QewRsdQiL2T2sKYp6e8FMjUb9f5NntG8f6+Hu8drMV2zdCRUVFy1aPHvYs6P79I5nQhgf1I9fm5/3iL18NCVo4pVICEdv/nv7zs2bN699tHpDxyfLq3UgRIB/oPWrVqxYAPnjtKmzjCF+/vrsb87shaGhjU1jcqo26T1wUsJVXCgd0ukc2+ooLCyAT6NeKSnJ8K9JpLVtN3fs/Db53t3Nm3aZbuwcFhru7OyMDLknE5KfnwcllTG7YIPBcT2r3I9d4iVoR/cZQE0FCuUfftw2ZcrMgvy8zz5fDTWe7AcW9/+6dOn8pq8/h2o5KGgMDA1pHBgY9NLEKVBWRIQ3adEi5vT//oJjCF+2dDVyAI5qdXAlKKgRZP/fbf1q2PC+kO4Wzl/+KC/3ncVzJ04a9d2WPY/Hh+IV6esra00DZ0yf+9zIhIQXXmzatPmOXd+eP3/Wzc39iZi2c+YsQo6B1RShX7dlJV1WTlhU/+cHMfz0aQp01k96J8JmTLG/zyxsh4rYFR36lRUNqMOq3JMNC9gOVJINyfzYN/DZ9jY3qM5SgUfaGl7exxaW9T7UoIY7oJXFMr2xq/dRNN2QfKIamlgCFh1i4rWAOEzOC/byNagpQvrWGJuY7Of3NaDUSzP+MFkg5n28YCUfSRiWODUYpFIS6YRLvK4eThJpfXNGbgUYlpW5sEpurETpNtBXq2bdkKn7KIq1kS1YratkZ1My5OXvfGhTBmoAXDxWSNKo8yBW40oc1vPu+DAd6cih/8F3K0z+nD746N7VwikfsO0Y5rYc+vv37xfnqWXOJKKQRlslOVf1x1z+SUr0w0yoYi2zMVwCWTNVGY6QybWosooJl+vHDCu/ElRFjk4aVuXSlMkTGRfU5S6kESlFtK48HKLBEylmyKHcX3TlyzBI5aRORTs5Ey8vi0Ss4bwNTuED+twfuYpiDaWtciHzlsYD5pOUEkw0wpBJ6MOZ95YStEm4fiq7QZfq8kmJ5KTksLBwqWEszXgVo5/+To/JZ1xMDdfqm+oVb0JAGWlQE5HlEZhj47JoVzeniNZuLWK5OXjHfQuw/v3779q1y9fXF2EJ7m1ejUbj5OSEcAV3+bRaLYz/IlwR5eOFKB8vsJZPp9NJoOMc4+4KrOXD3PSQKB9PRPl4IcrHC9y3/Bflsx/R+nghyscLUT5eYN5fgETr44koHy9E+XghyscLsejghWh9vBDl44UoHy+wfjkYRPXy4rKGucbB3dFYSUkJwhi8k4ZUCukXYYwo
Hy9E+XghyscLUT5eiPLxQpSPF6J8vBDl44UoHy9E+XghyscLUT5eiPLxQpSPF6J8vMBxWcz06dNzcnJIkoSRtrS0tNDQUHhJOD58+DDCDBxX6cbHx2dkZCQlJYF28BWOMzMz1Wo1wg8c5RsxYkTjxlW2zoRe++joaIQfmK4Rf/HFF023y3Rzc3v++ecRfmAq3+DBgyMjIynDXlzw2aRJk7g4Dhsv1xj47lAwceJEb29vOJDL5aJzbc6AuTVv3hzK3JCQkKFDhyIsEabicu5oQdrNkpICvcdorZbSacuXaxucN+tXH5PMYmnD10pfziRNU4bN2Qia8cvHLL+iKvaggWSrVpc5OTlJJFXqp8zKZ9PF68i4HpswXU1NU9rKBV0kSVA07Swn3b2kjSLlPQf5I2fEE17ynTmcfyWxUFmqJSVIIpVIZVKpsxR+D6XVlf8Movx36jeSrnhQ5e5kRJWtnSq8a9NVfZQTFSdNQivCqv6U6rFoEhGmGwZISFpHUTpap4E/M6XTUc4ukvAWbgMm2Nhc2wp2ynf+98Izv+bCpa7eriGt/GSuDt5U3DFkXH1UlFsKmka1cR84kZU37WrYI9+WpSmKYp1/Y++g5t6o7lOco8y48ZAg6CmrOO+szFm+DfOSnF1lTbuGoPpFxtXcvMziYdNCw5u7sL+Km3wb5iYFRvr6N62fTsp1OnTzWMr4+RFeAWx7UjjIt35uUkTbRu4BclSvuX4sdcCERk3bsnIRwLbe99WCZN8Qz3qvHdC8W/iRrVksI7OS74e1GVCJCm6F6WYqwiKVE96Bbl8vuscmsm358rO1OenK5t3r89ZV1QhtE1Cmoo7temgzpm359m647+Zd/9NsNRo18735T5HNaDbkKyulFcWaqM7BCEtKSvPnvtPl4pXfkdD4hXtCC+/E3lzr0WzId3hLlpMc64UpjsPNy+X2vzYM0IZ8OfdVHr7cNhWrNwQ39ytT2tgz00b9UK2mmjR11NKAouJHPx9el5J+Wa1WtWjWNb73y4EBERCe9SBpzedjX5/yzbGT3129ccLLM7B9m36D+k1nXDpduPzbkT++VCqLYlr27N19HHIYci99sruaWNy6u4elONasL+mSgiAIJxeHdAfodLqN3/wnKeX8c0PfnjNjh7ub76dfvZz76D6ckkr07737wKoObQe8v+TU2FFLTyRuv3RNn8FlPbi7Y8/i2A6D3p71U2z7wQcOrUGORCqTQEeclQjW5HuQpiIljtoA6V7axZzclDGjlrZs/pSnh9/Qga+7uXr/dXqXMUK7J/q2ax0nlTo1bfKkn0/o/YybEPj3mZ+8vRr16/OKq6tndFTHLrHDkSMhpWRxgc5KBGuJV1Giddz+USmplyQSp2ZR5d7U4EEgU3JKpQvksJBWxmO53EOp0rv/y81LbxRU2S/SODQGORJ4K63GWvYntXoxdPw6ar9hpapEp9NAtcM00N3Nx+TpZlKGQlHk71c5himTcegdsQMSXsLqft/W5HP3cCIdNhji4e4HP/7lcVUyL9KWlyBIsxqNyvi1rKwUORLonJY6WXsla/I1inLRHc9HjiE0uLlarfT2DvL3DWNCHuVlmFqfWXy8g6/f/AvGQBihr986hRwJ9Ol7+FlrcVmTNqKVHDqyy4ocMkmnWdNOLZs9tXv/yvyC7JLSgsQzez7Z+NLZ8z9bv6rdE/HQ0th/aA30s91N/vfvM3uQI9FpqCZPWOu5slHvk8klOSn5jdsGIAfw8vi1p8/t/f7HRanpVwL8I55sN7DnUy9Yv6RFsy5DBrx2+uzeeYu7QhE8bvTS9V9PcZAvEUWeRv/EWA8rcWx0lx78KjMzuaxl73DU8Eg+kwUDna9Y3QXbRlb97KshmjIdapCoSsva9rCRF9vu1Pf0dbp7JjO6i/mxIcjFF6/qZ/aUVquGmp3ZmmOjgKgZr25CwrF52+x7aZfMntJoYKDdzHi4zEm++M1DyAJZdwphlLhTfxsNVhZjHTr0+by7
rfs1sXQ+Lz/TbLhKVSKXm3fbQJJSby/7B6cfp6goV6szPwGwVFHk5mp2bIvw9bHYEXfjj9QOfX26DrJhfayGig58kZmdVtaiV0PJAVMv5OjK1C8vjbAZk1WteNi0ECcpkX4lFzUAinOVpfkKNtoh9iNtLy+PLMktzbyeh+o7aRcfTFzSlGVkbsPkXy245+Ll1ritH6qPFD1Qpl3OnrY6WsK6i47zJI2NbydDg6l5z8aofpHyT3ZJgXLqqmipjMNV9kwR2vnR/UeZKo8At4j2QpaetcWDu4V5aQUu7uRLSyIRR+ycoHb/lurw1kwYCnB2kwU08fYOrnvjIcpCTfadPFWxCjrG2nb36TbUx46b8JoeeeeiIvFgTkmBvk8BOmZJCSlxkhBwS6MbI723R6KKt2WiYsJp+dfy9mqVXjW4hCTKvRmZBhKGmac2AwEpibTVeypp0jCPFcYfddAY0ztSdHGXPNHVu8sz9k+zE2ZybvoSJRS/AAAAeUlEQVRN5a3zxblZaq2a1ukonbr8nvpJpRUehcpDDFNAK73fMo6XKmbpGq8iCIrSVWmu6KfWgsZVG5D6mCRlOgO3PLKMoNTVfxdEJiW0zEXi5S+LauXWsgsrT2zWwd1XEeaI/nl5IcrHC1E+Xojy8UKUjxeifLz4fwAAAP//CW3FMQAAAAZJREFUAwBF9rnnoIiuKQAAAABJRU5ErkJggg==",
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from IPython.display import Image, display\n",
    "\n",
    "arch = TreeOfThoughts(branching=3, beam_width=2, max_depth=3, llm=primary_llm)\n",
    "graph = arch.build()\n",
    "display(Image(graph.get_graph().draw_mermaid_png()))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "44f93353",
   "metadata": {
    "papermill": {
     "duration": 0.0,
     "end_time": "2026-05-27T10:41:35.402458+00:00",
     "exception": false,
     "start_time": "2026-05-27T10:41:35.402458+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## 8 · Live run — Game of 24\n",
    "\n",
    "Concrete task: **Game of 24** — given 4 numbers, find an arithmetic expression that equals 24 using each number exactly once with `+`, `-`, `*`, `/`. This is the canonical ToT benchmark task because **branches are objectively wrong**: an arithmetic step that doesn't produce a number capable of reaching 24 with the remaining numbers is provably bad.\n",
    "\n",
    "We use `[3, 4, 6, 8]` — solvable (one solution: `(8 - 6) × 4 × 3 = 24`)."
   ]
  },
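  {
   "cell_type": "markdown",
   "id": "a1f0c2d4",
   "metadata": {},
   "source": [
    "To confirm that a number set is solvable before spending LLM calls on it, a brute-force check is enough. The sketch below is not part of the notebook's `TreeOfThoughts` class: `solve_24` and `_search` are hypothetical helpers that repeatedly combine two numbers, mirroring the one-operation-per-step format of the task, and use exact fractions to avoid floating-point error.\n",
    "\n",
    "```python\n",
    "from fractions import Fraction\n",
    "from itertools import combinations\n",
    "\n",
    "def solve_24(numbers, target=24):\n",
    "    '''Return one expression string that equals target, or None.'''\n",
    "    # Each item pairs an exact value with the expression that produced it.\n",
    "    items = [(Fraction(n), str(n)) for n in numbers]\n",
    "    return _search(items, Fraction(target))\n",
    "\n",
    "def _search(items, target):\n",
    "    if len(items) == 1:\n",
    "        value, expr = items[0]\n",
    "        return expr if value == target else None\n",
    "    # Pick two numbers, combine them, and recurse on the smaller set.\n",
    "    for i, j in combinations(range(len(items)), 2):\n",
    "        (a, ea), (b, eb) = items[i], items[j]\n",
    "        rest = [items[k] for k in range(len(items)) if k not in (i, j)]\n",
    "        candidates = [\n",
    "            (a + b, f'({ea} + {eb})'),\n",
    "            (a * b, f'({ea} * {eb})'),\n",
    "            (a - b, f'({ea} - {eb})'),\n",
    "            (b - a, f'({eb} - {ea})'),\n",
    "        ]\n",
    "        if b != 0:\n",
    "            candidates.append((a / b, f'({ea} / {eb})'))\n",
    "        if a != 0:\n",
    "            candidates.append((b / a, f'({eb} / {ea})'))\n",
    "        for value, expr in candidates:\n",
    "            found = _search(rest + [(value, expr)], target)\n",
    "            if found is not None:\n",
    "                return found\n",
    "    return None\n",
    "\n",
    "print(solve_24([3, 4, 6, 8]))   # some expression that evaluates to 24\n",
    "print(solve_24([1, 1, 1, 1]))   # None: no combination reaches 24\n",
    "```\n",
    "\n",
    "Because the search is exhaustive, a `None` result means the number set has no solution, which is a useful sanity check before blaming the LLM for a failed run."
   ]
  },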
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "4871658d",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-27T10:41:35.428234Z",
     "iopub.status.busy": "2026-05-27T10:41:35.428234Z",
     "iopub.status.idle": "2026-05-27T10:42:53.365844Z",
     "shell.execute_reply": "2026-05-27T10:42:53.362033Z"
    },
    "papermill": {
     "duration": 77.947611,
     "end_time": "2026-05-27T10:42:53.365844+00:00",
     "exception": false,
     "start_time": "2026-05-27T10:41:35.418233+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">Final answer</span> <span style=\"color: #00ff00; text-decoration-color: #00ff00\">──────────────────────────────────────────────────────────────────────────────────────────────────────</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;36mFinal answer\u001b[0m \u001b[92m──────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #800080; text-decoration-color: #800080; text-decoration: underline\">Step 1: Division</span>                                                                                                   \n",
       "\n",
       "Begin with division: 8 / 4 = 2, remaining numbers: [3, 6, 2]                                                       \n",
       "\n",
       "<span style=\"color: #800080; text-decoration-color: #800080; text-decoration: underline\">Step 2: Multiplication</span>                                                                                             \n",
       "\n",
       "Try multiplication next: 6 * 3 = 18, remaining numbers: [2, 18]                                                    \n",
       "\n",
       "<span style=\"color: #800080; text-decoration-color: #800080; text-decoration: underline\">Step 3: Addition</span>                                                                                                   \n",
       "\n",
       "Add 2 and 18 to get 20. However, the goal is to reach 24, and the provided steps do not directly achieve this. The \n",
       "closest expression from the given steps is (8 / 4) * (6 * 3) + 2, but this does not equal 24 as per the initial    \n",
       "task.                                                                                                              \n",
       "\n",
       "The final answer is: $\\boxed{8 / 4 * 6 * 3 + 2 = 20}$                                                              \n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[4;35mStep 1: Division\u001b[0m                                                                                                   \n",
       "\n",
       "Begin with division: 8 / 4 = 2, remaining numbers: [3, 6, 2]                                                       \n",
       "\n",
       "\u001b[4;35mStep 2: Multiplication\u001b[0m                                                                                             \n",
       "\n",
       "Try multiplication next: 6 * 3 = 18, remaining numbers: [2, 18]                                                    \n",
       "\n",
       "\u001b[4;35mStep 3: Addition\u001b[0m                                                                                                   \n",
       "\n",
       "Add 2 and 18 to get 20. However, the goal is to reach 24, and the provided steps do not directly achieve this. The \n",
       "closest expression from the given steps is (8 / 4) * (6 * 3) + 2, but this does not equal 24 as per the initial    \n",
       "task.                                                                                                              \n",
       "\n",
       "The final answer is: $\\boxed{8 / 4 * 6 * 3 + 2 = 20}$                                                              \n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">tree size: </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">16</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\"> thoughts  ·  max depth reached: </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">3</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">  ·  best leaf score: </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">/</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">5</span> <span style=\"color: #00ff00; text-decoration-color: #00ff00\">──────────────────────────────────────────</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;36mtree size: \u001b[0m\u001b[1;36m16\u001b[0m\u001b[1;36m thoughts  ·  max depth reached: \u001b[0m\u001b[1;36m3\u001b[0m\u001b[1;36m  ·  best leaf score: \u001b[0m\u001b[1;36m2\u001b[0m\u001b[1;36m/\u001b[0m\u001b[1;36m5\u001b[0m \u001b[92m──────────────────────────────────────────\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "TASK = (\n",
    "    \"Game of 24. Numbers: [3, 4, 6, 8]. Find an arithmetic expression using \"\n",
    "    \"each number EXACTLY ONCE, with operators +, -, *, /, that equals 24.\\n\\n\"\n",
    "    \"At each reasoning step, perform ONE arithmetic operation combining two \"\n",
    "    \"numbers from the current set, then report the result + the updated set \"\n",
    "    \"of remaining numbers. Continue until one number remains. The final answer \"\n",
    "    \"is the full expression and whether it equals 24.\"\n",
    ")\n",
    "\n",
    "result = arch.run(TASK)\n",
    "\n",
    "print_header(\"Final answer\")\n",
    "print_md(result.output)\n",
    "print()\n",
    "print_header(\n",
    "    f\"tree size: {result.metadata['total_thoughts']} thoughts  ·  \"\n",
    "    f\"max depth reached: {result.metadata['max_depth_reached']}  ·  \"\n",
    "    f\"best leaf score: {result.metadata['best_leaf_score']}/5\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c0c61d7e",
   "metadata": {
    "papermill": {
     "duration": 0.016815,
     "end_time": "2026-05-27T10:42:53.395648+00:00",
     "exception": false,
     "start_time": "2026-05-27T10:42:53.378833+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "### 8.0 · What just happened, briefly\n",
    "\n",
    "Three counts to inspect above:\n",
    "\n",
    "- **`tree size`** — should be ≈ 1 (root) + branching × max_depth × beam_width = 1 + 3·3·2 = 19 if every branch survives pruning. Smaller = beam pruned aggressively (a *good* sign on this task: bad arithmetic branches should get low scores and die).\n",
    "- **`best leaf score`** — Game-of-24 forces objective scoring. A healthy tree has scores ranging 1-5: wrong arithmetic gets 1-2; arithmetic that's right but heads away from 24 gets 2-3; arithmetic that's right and looks promising gets 4-5.\n",
    "- **`max depth reached`** — should equal `max_depth=3` (3 operations reduces 4 numbers to 1)."
   ]
  },
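  {
   "cell_type": "markdown",
   "id": "b2e1d3f5",
   "metadata": {},
   "source": [
    "The node count can be checked with a few lines of arithmetic. This sketch assumes, consistent with the 16 thoughts reported by the run, that the root expands into `branching` thoughts, that only the top `beam_width` thoughts at each level are expanded further, and that pruned thoughts remain in the trace. `expected_tree_size` is an illustrative helper, not part of the notebook's API.\n",
    "\n",
    "```python\n",
    "def expected_tree_size(branching, beam_width, max_depth):\n",
    "    '''Count nodes in a beam-searched tree, pruned nodes included.'''\n",
    "    total = 1          # root\n",
    "    frontier = 1       # nodes expanded at the current level\n",
    "    for _ in range(max_depth):\n",
    "        generated = frontier * branching\n",
    "        total += generated\n",
    "        # Only the best beam_width thoughts are expanded at the next level.\n",
    "        frontier = min(beam_width, generated)\n",
    "    return total\n",
    "\n",
    "print(expected_tree_size(branching=3, beam_width=2, max_depth=3))  # 16\n",
    "print(expected_tree_size(branching=2, beam_width=1, max_depth=2))  # 5\n",
    "```\n",
    "\n",
    "The second call uses the § 10 configuration; its 5 nodes include the root, which agrees with the 4 non-root thoughts reported there."
   ]
  },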
  {
   "cell_type": "markdown",
   "id": "38069d88",
   "metadata": {
    "papermill": {
     "duration": 0.013763,
     "end_time": "2026-05-27T10:42:53.425676+00:00",
     "exception": false,
     "start_time": "2026-05-27T10:42:53.411913+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "### 8.1 · Tree visualisation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "7e8f887a",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-27T10:42:53.494015Z",
     "iopub.status.busy": "2026-05-27T10:42:53.489457Z",
     "iopub.status.idle": "2026-05-27T10:42:53.531267Z",
     "shell.execute_reply": "2026-05-27T10:42:53.526138Z"
    },
    "papermill": {
     "duration": 0.098077,
     "end_time": "2026-05-27T10:42:53.534401+00:00",
     "exception": false,
     "start_time": "2026-05-27T10:42:53.436324+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "   [d=0 s=0/5 id=0] [ROOT] Game of 24. Numbers: [3, 4, 6, 8]. Find an arithmetic expression using each number ...\n",
      "       [d=1 s=4/5 id=1] Start with multiplication: 3 * 4 = 12, remaining numbers: [12, 6, 8]\n",
      "           [d=2 s=4/5 id=4] 12 + 6 = 18, remaining numbers: [18, 8]\n",
      "               [d=3 s=1/5 id=10] 18 * 8 = 144, remaining numbers: [144]\n",
      "               [d=3 s=1/5 id=11] 18 + 8 = 26, remaining numbers: [26]\n",
      "               [d=3 s=1/5 id=12] 8 - 18 = -10, remaining numbers: [-10]\n",
      "           [d=2 s=2/5 id=5] 12 * 6 = 72, remaining numbers: [72, 8]\n",
      "           [d=2 s=2/5 id=6] 8 / 6 = 1.33, remaining numbers: [12, 1.33]\n",
      "       [d=1 s=4/5 id=2] Begin with division: 8 / 4 = 2, remaining numbers: [3, 6, 2]\n",
      "           [d=2 s=4/5 id=7] Try multiplication next: 6 * 3 = 18, remaining numbers: [2, 18]\n",
      "⭐             [d=3 s=2/5 id=13] Add 2 and 18 to get 20, then find a way to reach 24 from 20\n",
      "               [d=3 s=2/5 id=14] Subtract 2 from 18 to get 16, then find a way to reach 24 from 16\n",
      "               [d=3 s=2/5 id=15] Divide 18 by 2 to get 9, then find a way to reach 24 from 9\n",
      "           [d=2 s=2/5 id=8] Attempt addition: 3 + 6 = 9, remaining numbers: [2, 9]\n",
      "           [d=2 s=2/5 id=9] Use subtraction: 6 - 3 = 3, remaining numbers: [2, 3]\n",
      "       [d=1 s=4/5 id=3] Unconventional approach: 8 - 6 = 2, remaining numbers: [3, 4, 2]\n",
      "\n",
      "⭐ = best-scoring leaf used to synthesise the final answer\n"
     ]
    }
   ],
   "source": [
    "from collections import defaultdict\n",
    "\n",
    "children = defaultdict(list)\n",
    "by_id = {}\n",
    "for t in result.trace:\n",
    "    by_id[t['id']] = t\n",
    "    children[t.get('parent_id', -1)].append(t['id'])\n",
    "\n",
    "def render_tree(node_id, indent=0):\n",
    "    if node_id not in by_id:\n",
    "        return\n",
    "    t = by_id[node_id]\n",
    "    marker = '⭐' if t['id'] == result.metadata['best_leaf_id'] else '  '\n",
    "    content = t['content'][:90].replace('\\n', ' ')\n",
    "    print(f\"{marker} {' ' * indent}[d={t['depth']} s={t['score']}/5 id={t['id']}] {content}{'...' if len(t['content']) > 90 else ''}\")\n",
    "    for child_id in children.get(node_id, []):\n",
    "        render_tree(child_id, indent + 4)\n",
    "\n",
    "# Root has parent_id=-1; print its children\n",
    "for root_child in children.get(-1, []):\n",
    "    render_tree(root_child)\n",
    "print()\n",
    "print('⭐ = best-scoring leaf used to synthesise the final answer')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c5f52762",
   "metadata": {
    "papermill": {
     "duration": 0.016595,
     "end_time": "2026-05-27T10:42:53.567841+00:00",
     "exception": false,
     "start_time": "2026-05-27T10:42:53.551246+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## 9 · What we just observed\n",
    "\n",
    "The cells above ran a 3-deep, 3-wide beam search with `beam_width=2` against **Llama 3.3** on the **Game of 24** puzzle (objective scoring forces real discrimination).\n",
    "\n",
    "### 9.1 · Quantitative summary\n",
    "\n",
    "| Metric | Value |\n",
    "|---|---|\n",
    "| Tree size | **16** thoughts |\n",
    "| Max depth reached | **3** / 3 |\n",
    "| Best leaf score | **2**/5 |\n",
    "| Score distribution (non-root) | [4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1] |\n",
    "| Distinct score values | 3 |\n",
    "\n",
    "### 9.2 · Score distribution table\n",
    "\n",
    "| Score | Count |\n",
    "|---|---|\n",
    "| 4/5 | 5 |\n",
    "| 2/5 | 7 |\n",
    "| 1/5 | 3 |\n",
    "\n",
    "### 9.3 · A sample of captured thoughts\n",
    "\n",
    "| Depth | Score | id | Content snippet |\n",
    "|---|---|---|---|\n",
    "| 0 | 0/5 | 0 | [ROOT] Game of 24. Numbers: [3, 4, 6, 8]. Find an arithmetic expression using each number ... |\n",
    "| 1 | 4/5 | 1 | Start with multiplication: 3 * 4 = 12, remaining numbers: [12, 6, 8] |\n",
    "| 2 | 4/5 | 4 | 12 + 6 = 18, remaining numbers: [18, 8] |\n",
    "| 3 | 1/5 | 10 | 18 * 8 = 144, remaining numbers: [144] |\n",
    "| 3 | 1/5 | 11 | 18 + 8 = 26, remaining numbers: [26] |\n",
    "| 3 | 1/5 | 12 | 8 - 18 = -10, remaining numbers: [-10] |\n",
    "| 2 | 2/5 | 5 | 12 * 6 = 72, remaining numbers: [72, 8] |\n",
    "| 2 | 2/5 | 6 | 8 / 6 = 1.33, remaining numbers: [12, 1.33] |\n",
    "| 1 | 4/5 | 2 | Begin with division: 8 / 4 = 2, remaining numbers: [3, 6, 2] |\n",
    "| 2 | 4/5 | 7 | Try multiplication next: 6 * 3 = 18, remaining numbers: [2, 18] |\n",
    "| 3 | 2/5 | 13 | Add 2 and 18 to get 20, then find a way to reach 24 from 20 |\n",
    "| 3 | 2/5 | 14 | Subtract 2 from 18 to get 16, then find a way to reach 24 from 16 |\n",
    "\n",
    "### 9.4 · Patterns surfaced in this run\n",
    "\n",
    "- **Healthy score spread** (1-4/5). The evaluator is genuinely discriminating between branches, which means beam search is doing real work.\n",
    "\n",
    "### 9.5 · Final answer (verbatim)\n",
    "\n",
 Step 1: Division">
    "> Step 1: Division\n",
    ">\n",
    "> Begin with division: 8 / 4 = 2, remaining numbers: [3, 6, 2]\n",
    ">\n",
    "> Step 2: Multiplication\n",
    ">\n",
    "> Try multiplication next: 6 * 3 = 18, remaining numbers: [2, 18]\n",
    ">\n",
    "> Step 3: Addition\n",
    ">\n",
    "> Add 2 and 18 to…\n",
    "\n",
    "### 9.6 · The takeaway\n",
    "\n",
    "A *healthy* ToT run has:\n",
    "\n",
    "1. **A spread of scores** across thoughts (2-5 range, not all 5/5).\n",
    "2. **The tree actually pruned** — at least one low-scoring branch killed off, not just exhaustive expansion.\n",
    "3. **The best-leaf score visibly higher** than the average score.\n",
    "4. **A final answer that obviously synthesizes the winning path**, not just paraphrases the task.\n",
    "\n",
    "When the evaluator is lenient (everything 5/5), the search reduces to brute-force expansion at high cost — see § 11.1 for the mitigation. The reasoning-model default helps but doesn't solve this entirely."
   ]
  },
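  {
   "cell_type": "markdown",
   "id": "c3d2e4a6",
   "metadata": {},
   "source": [
    "The first and third criteria in § 9.6 are easy to check mechanically. The sketch below hard-codes the non-root scores from § 9.1 rather than reading `result.trace`; `health_report` and its keys are illustrative names, not part of the notebook's API.\n",
    "\n",
    "```python\n",
    "from statistics import mean\n",
    "\n",
    "def health_report(scores, best_leaf_score):\n",
    "    '''Summarise evaluator scores for a finished ToT run.'''\n",
    "    return {\n",
    "        'distinct_scores': len(set(scores)),\n",
    "        'mean_score': round(mean(scores), 2),\n",
    "        'has_spread': len(set(scores)) > 1,\n",
    "        'best_leaf_above_mean': best_leaf_score > mean(scores),\n",
    "    }\n",
    "\n",
    "# Non-root scores and best-leaf score from the Llama 3.3 run above.\n",
    "scores = [4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1]\n",
    "report = health_report(scores, best_leaf_score=2)\n",
    "print(report)\n",
    "```\n",
    "\n",
    "For this run the scores are spread out, but the best leaf (2/5) sits below the mean (about 2.47), so criterion 3 fails. That agrees with the final answer of 20 rather than 24."
   ]
  },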
  {
   "cell_type": "markdown",
   "id": "f536eb5b",
   "metadata": {
    "papermill": {
     "duration": 0.021909,
     "end_time": "2026-05-27T10:42:53.606282+00:00",
     "exception": false,
     "start_time": "2026-05-27T10:42:53.584373+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## 10 · Compare with the Qwen3-Thinking reasoning model\n",
    "\n",
    "Same task, smaller tree, but using **Qwen3-Thinking** instead of Llama. We use a smaller `(branching=2, max_depth=2)` because reasoning models are slower per call. The expected difference is **score-distribution quality**: Qwen3-Thinking can usually distinguish a winning arithmetic step from a dead-end one even more sharply than Llama, because each evaluation gets an internal `<think>` budget."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "2f0f31ae",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-27T10:42:53.641644Z",
     "iopub.status.busy": "2026-05-27T10:42:53.641644Z",
     "iopub.status.idle": "2026-05-27T10:44:45.656881Z",
     "shell.execute_reply": "2026-05-27T10:44:45.654098Z"
    },
    "papermill": {
     "duration": 112.042714,
     "end_time": "2026-05-27T10:44:45.664153+00:00",
     "exception": false,
     "start_time": "2026-05-27T10:42:53.621439+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">Re-running ToT on Qwen3-Thinking (smaller tree, slower per call)</span> <span style=\"color: #00ff00; text-decoration-color: #00ff00\">──────────────────────────────────────────────────</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;36mRe-running ToT on Qwen3-Thinking \u001b[0m\u001b[1;36m(\u001b[0m\u001b[1;36msmaller tree, slower per call\u001b[0m\u001b[1;36m)\u001b[0m \u001b[92m──────────────────────────────────────────────────\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Using the numbers [4, 5, 6, 7], the solution follows the identified factor pair strategy:  \n",
      "**Step 1:** Compute $5 + 7 = 12$ and $6 - 4 = 2$.  \n",
      "**Step 2:** Multiply the results: $12 \\times 2 = 24$.  \n",
      "\n",
      "**Final Answer:** $(5 + 7) \\times (6 - 4) = 24$\n",
      "\n",
      "  Score distribution (Qwen3-Thinking, 4 non-root): [4, 4, 4, 2]\n",
      "  Note: a healthy ToT run has a SPREAD of scores (1-5). All-5s = lenient evaluator.\n"
     ]
    }
   ],
   "source": [
    "print_header(\"Re-running ToT on Qwen3-Thinking (smaller tree, slower per call)\")\n",
    "thinking_llm = get_llm(\n",
    "    provider=\"nebius\",\n",
    "    model=\"Qwen/Qwen3-235B-A22B-Thinking-2507-fast\",\n",
    "    temperature=0.4,\n",
    ")\n",
    "thinking_arch = TreeOfThoughts(branching=2, beam_width=1, max_depth=2, llm=thinking_llm)\n",
    "thinking_result = thinking_arch.run(\n",
    "    \"Game of 24. Numbers: [4, 5, 6, 7]. Find arithmetic to equal 24, step by step.\"\n",
    ")\n",
    "print(thinking_result.output[:400])\n",
    "print()\n",
    "score_dist = sorted([t['score'] for t in thinking_result.trace if t['depth'] > 0], reverse=True)\n",
    "print(f\"  Score distribution (Qwen3-Thinking, {len(score_dist)} non-root): {score_dist}\")\n",
    "print(f\"  Note: a healthy ToT run has a SPREAD of scores (1-5). All-5s = lenient evaluator.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b6849c17",
   "metadata": {
    "papermill": {
     "duration": 0.014536,
     "end_time": "2026-05-27T10:44:45.695342+00:00",
     "exception": false,
     "start_time": "2026-05-27T10:44:45.680806+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## 11 · Failure modes, safety, extensions\n",
    "\n",
    "### 11.1 · Where this breaks\n",
    "\n",
    "| Failure | Mechanism | Mitigation |\n",
    "|---|---|---|\n",
    "| **Lenient evaluator** | Every thought scores 5/5 → no pruning signal | Stricter rubric (\"reserve 5 for excellence\"); different model in evaluator seat |\n",
    "| **Candidate mode collapse** | K candidates are near-paraphrases | Tighten `_ThoughtCandidates` description; raise temperature on generator |\n",
    "| **Premature commitment** | Depth-1 thoughts scored before they've shown their work | Defer evaluation to depth-2 or use look-ahead scoring |\n",
    "| **Cost explosion** | `2 × N × K × D` calls × reasoning model | Cap depth; use a smaller evaluator model |\n",
    "| **Best-leaf isn't best-path** | Highest-scoring leaf may have a weak ancestor | Score *paths*, not nodes (extension) |\n",
    "\n",
    "### 11.2 · Production safety\n",
    "\n",
    "- **Bound depth + branching hard** — runaway ToT can rack up huge bills. Always set `max_depth ≤ 4`.\n",
    "- **Tracing matters.** With 19+ LLM calls per task, LangSmith trace is essential for debugging.\n",
    "- **Evaluator is a single point of failure.** If it's biased toward a framing, ToT will find paths that match the bias even if they're wrong. Use diverse rubrics or multiple judges.\n",
    "\n",
    "### 11.3 · Three extensions\n",
    "\n",
    "1. **LATS (notebook 22)** — replace beam search with Monte Carlo Tree Search + a reward model. The natural successor.\n",
    "2. **Path-level scoring** — score whole root-to-leaf paths instead of individual nodes; eliminates the \"best leaf isn't best path\" failure.\n",
    "3. **Process Reward Model (PRM)** — train a small reward model on intermediate steps and use it as the evaluator.\n",
    "\n",
    "### 11.4 · What to read next\n",
    "\n",
    "- [**21 · Self-Consistency**](./21_self_consistency.ipynb) — simpler N-sample-and-vote alternative.\n",
    "- [**22 · LATS**](./22_lats.ipynb) — ToT + reward → MCTS-style tree search.\n",
    "- [**01 · Reflection**](./01_reflection.ipynb) — single-path refinement vs ToT's multi-path search.\n",
    "\n",
    "### 11.5 · References\n",
    "\n",
    "1. Yao, S. et al. *Tree of Thoughts: Deliberate Problem Solving with Large Language Models.* NeurIPS 2023. [arXiv:2305.10601](https://arxiv.org/abs/2305.10601)\n",
    "2. Long, J. *Large Language Model Guided Tree-of-Thought.* 2023. [arXiv:2305.08291](https://arxiv.org/abs/2305.08291)\n",
    "3. Zhou et al. *Language Agent Tree Search.* 2024. [arXiv:2310.04406](https://arxiv.org/abs/2310.04406) — the LATS paper that extends ToT.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.0"
  },
  "papermill": {
   "default_parameters": {},
   "duration": 198.427667,
   "end_time": "2026-05-27T10:44:47.046067+00:00",
   "environment_variables": {},
   "exception": null,
   "input_path": "all-agentic-architectures/notebooks/09_tree_of_thoughts.ipynb",
   "output_path": "all-agentic-architectures/notebooks/09_tree_of_thoughts.ipynb",
   "parameters": {},
   "start_time": "2026-05-27T10:41:28.618400+00:00",
   "version": "2.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}