` tokens reduce the rate of trivial expansions and ungrounded evaluations.\n",
"\n",
"### 3.4 · Where this sits\n",
"\n",
"| Pattern | Search shape | Reward |\n",
"|---|---|---|\n",
"| Plain CoT | linear (1 path) | none |\n",
"| [Tree of Thoughts (nb 09)](./09_tree_of_thoughts.ipynb) | flat beam of K | LLM-as-Judge per beam item (greedy) |\n",
"| **LATS (this nb)** | **real tree with backup** | **Deterministic-picker per leaf, UCB1 selection** |\n",
"| [Self-Consistency (nb 21)](./21_self_consistency.ipynb) | flat N independent paths | majority-vote (no per-path reward) |\n",
"| [Planning (nb 04)](./04_planning.ipynb) | linear plan over actions | replan on failure |\n",
"\n",
"### 3.5 · Failure modes preview\n",
"\n",
"1. **Flat reward (architecture immune by deterministic-picker).** If all leaves get value 4 because the LLM flat-scores, UCB1 degenerates. The fix above prevents this.\n",
"2. **Greedy depth.** With small `max_iterations`, LATS may not have time to expand much beyond root. Symptom: tree size = `1 + branching` and the answer is just root-child.\n",
"3. **Loop trap.** If the LLM keeps proposing the same move, the tree grows wide but shallow. The `avoids_loops` boolean is meant to penalise this.\n",
"4. **Cost.** Each iteration costs `branching + branching` LLM calls (expand + evaluate). 5 iterations × branching=3 = ~30 calls — careful with the budget."
]
},
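{
"cell_type": "markdown",
"id": "lats-sketch-cost",
"metadata": {
"tags": []
},
"source": [
"A minimal sketch of the call-budget arithmetic in failure mode 4 above, assuming (as that item states) `branching` expand calls plus `branching` evaluate calls per iteration. `estimate_llm_calls` is a hypothetical helper for illustration, not part of `agentic_architectures`, and it ignores any calls made outside the iteration loop.\n",
"\n",
"```python\n",
"def estimate_llm_calls(max_iterations: int, branching: int) -> int:\n",
"    # Assumption: every iteration makes `branching` expand calls\n",
"    # plus `branching` evaluate calls, and nothing else.\n",
"    calls_per_iteration = branching + branching\n",
"    return max_iterations * calls_per_iteration\n",
"\n",
"\n",
"print(estimate_llm_calls(max_iterations=5, branching=3))  # 30\n",
"print(estimate_llm_calls(max_iterations=4, branching=2))  # 16\n",
"```\n",
"\n",
"The budget grows linearly in both parameters, so doubling `branching` doubles the cost of every iteration."
]
},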
{
"cell_type": "markdown",
"id": "5574de4f",
"metadata": {
"papermill": {
"duration": 0.000542,
"end_time": "2026-05-28T02:23:38.176998+00:00",
"exception": false,
"start_time": "2026-05-28T02:23:38.176456+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"## 4 · Setup"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "446898b6",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-28T02:23:38.176998Z",
"iopub.status.busy": "2026-05-28T02:23:38.176998Z",
"iopub.status.idle": "2026-05-28T02:23:40.013059Z",
"shell.execute_reply": "2026-05-28T02:23:40.013059Z"
},
"papermill": {
"duration": 1.836061,
"end_time": "2026-05-28T02:23:40.013059+00:00",
"exception": false,
"start_time": "2026-05-28T02:23:38.176998+00:00",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"Reasoning LLM: Qwen/Qwen3-235B-A22B-Thinking-2507-fast ────────────────────────────────────────────────────────────\n",
"\n"
],
"text/plain": [
"\u001b[1;36mReasoning LLM: Qwen/Qwen3-235B-A22B-Thinking-\u001b[0m\u001b[1;36m2507\u001b[0m\u001b[1;36m-fast\u001b[0m \u001b[92m────────────────────────────────────────────────────────────\u001b[0m\n"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from agentic_architectures import get_llm, enable_langsmith, settings\n",
"from agentic_architectures.architectures import LATS\n",
"from agentic_architectures.ui import print_md, print_header, print_step\n",
"\n",
"enable_langsmith()\n",
"\n",
"# Per handoff §10, nb 22 defaults to Qwen3-Thinking.\n",
"reasoning_llm = get_llm(\n",
" provider=\"nebius\",\n",
" model=\"Qwen/Qwen3-235B-A22B-Thinking-2507-fast\",\n",
" temperature=0.4,\n",
")\n",
"print_header(f\"Reasoning LLM: {reasoning_llm.model}\")"
]
},
{
"cell_type": "markdown",
"id": "ba53c8c6",
"metadata": {
"papermill": {
"duration": 0.0,
"end_time": "2026-05-28T02:23:40.016230+00:00",
"exception": false,
"start_time": "2026-05-28T02:23:40.016230+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"## 5 · Library walkthrough\n",
"\n",
"Source: [`src/agentic_architectures/architectures/lats.py`](../src/agentic_architectures/architectures/lats.py).\n",
"\n",
"Three load-bearing pieces:\n",
"\n",
"- **`_Node`** dataclass — `id`, `thought`, `parent_id`, `children_ids`, `value`, `visits`, `is_terminal`.\n",
"- **`_LeafEvaluation`** schema — deterministic-picker reward (4 booleans/categorical → Python value).\n",
"- **`_iterate` node** — runs SELECT → EXPAND → EVALUATE → BACKUP in a single LangGraph step.\n",
"\n",
"The `_composite_value` function is the only place reward numbers are computed."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "cfb4915d",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-28T02:23:40.016230Z",
"iopub.status.busy": "2026-05-28T02:23:40.016230Z",
"iopub.status.idle": "2026-05-28T02:23:40.045033Z",
"shell.execute_reply": "2026-05-28T02:23:40.045033Z"
},
"papermill": {
"duration": 0.028803,
"end_time": "2026-05-28T02:23:40.045033+00:00",
"exception": false,
"start_time": "2026-05-28T02:23:40.016230+00:00",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"--- _LeafEvaluation schema ---\n",
"{\n",
" \"description\": \"Deterministic-picker reward \\u2014 LLM commits to objective features only.\",\n",
" \"properties\": {\n",
" \"makes_progress\": {\n",
" \"description\": \"True iff this leaf advances toward the goal vs its parent.\",\n",
" \"title\": \"Makes Progress\",\n",
" \"type\": \"boolean\"\n",
" },\n",
" \"is_complete\": {\n",
" \"description\": \"True iff this leaf represents a COMPLETE solution to the original task.\",\n",
" \"title\": \"Is Complete\",\n",
" \"type\": \"boolean\"\n",
" },\n",
" \"avoids_loops\": {\n",
" \"descrip...\n",
"\n",
"--- _composite_value source ---\n",
" @staticmethod\n",
" def _composite_value(features: dict[str, Any]) -> float:\n",
" \"\"\"Python-composed reward. Same deterministic-picker pattern as RLHF nb 15.\"\"\"\n",
" v = 0.0\n",
" if features.get(\"is_complete\", False):\n",
" v += 5.0\n",
" if features.get(\"makes_progress\", False):\n",
" v += 2.0\n",
" if features.get(\"avoids_loops\", False):\n",
" v += 1.0\n",
" conf = features.get(\"confidence\", \"low\")\n",
" v += {\"high\": 2.0, \"medium\": 1.0, \"low\": 0.0}.get(conf, 0.0)\n",
" return min(v, 10.0)\n",
"\n"
]
}
],
"source": [
"from agentic_architectures.architectures.lats import _LeafEvaluation, LATS\n",
"import json\n",
"print('--- _LeafEvaluation schema ---')\n",
"print(json.dumps(_LeafEvaluation.model_json_schema(), indent=2)[:500] + '...')\n",
"print()\n",
"print('--- _composite_value source ---')\n",
"import inspect\n",
"print(inspect.getsource(LATS._composite_value))"
]
},
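{
"cell_type": "markdown",
"id": "lats-sketch-ucb1",
"metadata": {
"tags": []
},
"source": [
"The SELECT step in the walkthrough above relies on UCB1. The following is a hedged sketch of the textbook UCB1 rule, not a copy of `lats.py`: `ucb1_score` and `select_child` are hypothetical names, and the sketch assumes `value` holds the accumulated reward of a child (the library may store a mean instead).\n",
"\n",
"```python\n",
"import math\n",
"\n",
"\n",
"def ucb1_score(total_value: float, visits: int, parent_visits: int, c: float = 1.4) -> float:\n",
"    # Unvisited children get infinite priority so each one is tried once.\n",
"    if visits == 0:\n",
"        return math.inf\n",
"    exploit = total_value / visits  # mean reward observed so far\n",
"    explore = c * math.sqrt(math.log(parent_visits) / visits)  # bonus for rarely visited nodes\n",
"    return exploit + explore\n",
"\n",
"\n",
"def select_child(children: list, parent_visits: int, c: float = 1.4) -> dict:\n",
"    # Pick the child with the highest UCB1 score.\n",
"    return max(children, key=lambda ch: ucb1_score(ch['value'], ch['visits'], parent_visits, c))\n",
"\n",
"\n",
"children = [\n",
"    {'id': 1, 'value': 10.0, 'visits': 2},  # mean 5.0, visited twice\n",
"    {'id': 2, 'value': 5.0, 'visits': 1},   # mean 5.0, visited once\n",
"]\n",
"print(select_child(children, parent_visits=3)['id'])  # 2\n",
"```\n",
"\n",
"Both children have the same mean reward, so the exploration bonus decides: the less-visited child wins."
]
},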
{
"cell_type": "markdown",
"id": "f9010641",
"metadata": {
"papermill": {
"duration": 0.0,
"end_time": "2026-05-28T02:23:40.045033+00:00",
"exception": false,
"start_time": "2026-05-28T02:23:40.045033+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"## 6 · State"
]
},
{
"cell_type": "markdown",
"id": "81979bd7",
"metadata": {
"papermill": {
"duration": 0.0,
"end_time": "2026-05-28T02:23:40.054972+00:00",
"exception": false,
"start_time": "2026-05-28T02:23:40.054972+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"| Field | Set by |\n",
"|---|---|\n",
"| `task`, `max_iterations`, `branching` | caller (via constructor) |\n",
"| `nodes` (dict[int, `_Node`]) | `_init`, mutated by every `_iterate` |\n",
"| `next_id`, `root_id` | `_init` / `_iterate` |\n",
"| `iteration` | `_iterate` |\n",
"| `best_leaf_id` | `_iterate` (recomputed each iteration as the leaf with max value) |\n",
"| `final_answer` | `_finalize` |\n",
"| `history` | `_init` + every `_iterate` (`Annotated[..., operator.add]`) |"
]
},
{
"cell_type": "markdown",
"id": "eca3e2a1",
"metadata": {
"papermill": {
"duration": 0.006454,
"end_time": "2026-05-28T02:23:40.061426+00:00",
"exception": false,
"start_time": "2026-05-28T02:23:40.054972+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"## 7 · Build the graph"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "86debf99",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-28T02:23:40.061426Z",
"iopub.status.busy": "2026-05-28T02:23:40.061426Z",
"iopub.status.idle": "2026-05-28T02:23:42.648964Z",
"shell.execute_reply": "2026-05-28T02:23:42.647820Z"
},
"papermill": {
"duration": 2.588602,
"end_time": "2026-05-28T02:23:42.650028+00:00",
"exception": false,
"start_time": "2026-05-28T02:23:40.061426+00:00",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAMMAAAGwCAIAAACb3EYmAAAQAElEQVR4nOydCXwTVdfG70zSdF/ovkIpO7IU2VFboAVfWWQVECjI8oIgm4IIIqKIqAjKq4iggAiKiIBsH4LIIrLJXkA2KW0p3YDuS9okM/OdJG0aQroknWlnmvOHX36TmTuTNHly73PPvXOPnOM4giDVRk4QhA9QSQg/oJIQfkAlIfyASkL4AZWE8AMqiTcyUlRXTuZkpBQzGqJSMazq8cM0ISwhFEc4yrCPognHajd0e3X7Ke0To7M4wlKGYsYnas9iTV5CdxlzUZ2yK+guaNhv50TbySl7J1lAQ8eOUR5ERqyGwnhSNUlLUB/empb9sJhlOZmctnekHV1kNEWpihjjYhRNcSxHySiO4Ux26rZAAY9v6JFRhOHMKAm+cq70XOPCUMz49FJdGl6o7BV1KJxkjIZTFbHFSpZRsXIF7d/AYcCUQGI5qCTryctgt32eWKRk3erJn+rm8XRPdyJxjm1/dDc2X1mo8Q50GD472KJzUUlW8utXyclxyoAwpyHTrPkFi5nCTGbH18n52Zpu/bzaRlb154FKsoYNixLgc5uwuCGpu8RfKzy4OdWvgeOgqVX6qaCSLGbz0nse3nb9JwUQG2D9uwmtu7l3+k+9Skuikizj2wXx/vWd+k/2IzbDhkWJrh7yl14PqrgYTZAqs/H9RN8QB5uSETD+/QY5GapDWx5UXAyVVFUObnqgUbMDXrWJRs2EiUsa3r6Qq8ypqAwqqarcic0d9VYDYqs0Dnf7YVl8BQVQSVXix0/ue/jYO7ra7sf1fIwvVMkX/8gurwAqqUpkPyh6cWJdixtZSoMWzhePZZV3FJVUOQc3pds7yFy9qzEoZTnz5s3bvXs3sZxevXolJycTAegzzr9YyRRkmu/so5Iq5/4dpX9DB1KzXL9+nVhOampqVlYWEQwHZ9nRnWlmD2E8qXJWz4nrOyG4QQt7IgAnT57ctGnTP//84+3t3bZt2+nTp8NGhw4d9EddXFyOHTuWn5//ww8/nD59Oi4uDo5GRkZOmTLFwUEr7rlz58pksoCAALjI5MmT165dqz8RyqxYsYLwzZ41qVkPVGPfNdPzwDqpEtITVZzWIggio5s3b86cObNjx47bt28HTdy+ffu9994jOnnB48KFC0FGsLF169aNGzfGxMSsXLkSyh86dOibb77RX8HOzu6Ojs8++2zo0KFQAHZCsyiEjIDARo7FStbsIZyfVAlp8UqZnCLCcPnyZahaxo8fT9O0v79/y5YtQRNPFhs9enRUVFTDhiXDfLGxsadOnZoxYwbRTkKhUlJSNm/erK+ihKZ+E6dzv2eYPYRKqoTcLA0tWMUdHh5eVFQ0a9aszp07R0REhISEGNo1Y6DigaZt0aJFUGlpNBrY4+npaTgKCqsZGQGuPgoNi47bKhiGFc5JNm/e/IsvvvDx8fnyyy8HDRo0depUqG+eLAZHoTmDArt27Tp//vy4ceOMj9rbC9LymoeGf1Q5R5AKcfNQUEL2Sbp16wZ+aO/eveCQcnJyoH7S1zoGoEu0Y8eO4cOHg5KgBYQ9eXl5pJYozFCXIyRUUmX4N3BgWaGkdOHCBXA8sAHVUr9+/WbPng0qgZ68cRm1Wq1UKn19ffVPVSrV8ePHSS1x719leZ4RlVQJgU3sNRouI1lFBADaMuiy7dy5E4JA165dgz4aSAq69NBggXTOnDkDbRmY8dDQ0D179ty/fz87O3vx4sXgrnJzcwsKCp68IJSER+jcwdWIACTdKbRzMK8ZVFLl2DvKLhwVJNwHnTJos5YvXw6B6UmTJjk7O4Mfksu13SDo0J07dw5qKaiQli5dCp4aOvkDBw7s1KnTtGnT4Gl0dDT02kwuGBwc3L9//zVr1oC1IgKQkVzk42/elmFksnJ2fZ3y8H7xfz+sy3Ntq8iq1+8Mei04qLGZriLWSZXTf0JgUaGG2DxHfn
kos6PMyohgPKkqyBTE2U2+7fOkYa+HlFemR48eZmt3hmHA6ED80OxZ0Kv38PAgAgAxT+gGmj0Enh0CVGbfUlhY2IYNG0g5/Hsht0VHt/KOYutWJQoymQ0f3J3+eZPyCjxpWapCYKCAM1XKe0swigfDeWYPgUUzdBJNOLE38+pfWVOWNSLlgEqqKts+v6/MZ8YutNFpk1/Piev5ckCz9k7lFUCfVFWGvR5cVMgc++UhsT02LUnwCXGoQEYElWQRkz8Ku34uLy62kNgSv6xMZhlq6MxK7lLC1s1iVr8Z1/l5n/bRbsQG+PHjJCdX2aDXKvdzqCRrWD03zi/Eccj0Oj6z+7v3EuR2VMyCKllDVJKVfP9BYn62ukO0V+cXKr/TWXLsW5eWeLMgtIVz3wn+VTwFlWQ9Zw9knfsjUy6HYJ1j75cDFM5E6iTfKT6x52FGSrGji2zojPqunhbYaFRSdTm5J/P63zmqYoZwxMXDzslJ5lxPJpPRKtVjK3HRNGGfmLZKyylWY7rQlslaXCUlZYRliGnJJ1boAmRywmieuIrJUnGGwgoZp+EK89nCHHVhgQbejKun3TP9fRq1qaibZhZUEm/8tetRenxRbo4Gvl2W5TSqxz7YivRh7pAJIBpGw0KwnBjFps2eKJNRDMNpv1eq8knDcjtOZidzsJe7espCWzq3fs76bgQqSTK8/fbb3bt37927NxElOO4mGTQajX7CiThBJUkGVBLCD6gkhB/UarWdnR0RK6gkyYB1EsIPqCSEH1BJCD+gT0L4AeskhB9QSQg/oJIQfkAlIfwASkLHjfAA1kkIP6CSEH5AJSE8wHEcwzAyWY2uLm8RqCRpIPIKiaCSpAIqCeEHVBLCD6gkhB9EPhGAoJKkAtZJCD9AFMDPT9SZnFFJ0gAqJJMV38UGKkkagJJMslCIDVSSNEAlIfyASkL4AZWE8AMqCeEHmUzGMAwRMbiKsmQAMYm5WkIlSQaRN3DYukkGVBLCD6gkhB9QSQg/oJIQfkAlIfyASkL4AZWE8IPIlYQ5AsROeHi4Piez/pvSb0RERKxcuZKICYxxi53OnTtTOmgdsOHj4zN27FgiMlBJYmfMmDFeXl7Ge5o3b96uXTsiMlBJYueZZ55p1aqV4ambm9uIESOI+EAlSYCYmBhPT0/9dlhYWNeuXYn4QCVJAGjL2rRpAxvOzs6jRo0iogT7bjVEThp38XhWUb5aUzphjaIpji358CkZ4UpTBhrvNxzKy8u7+s9VB4V9+47tOeMZb0bZAkvTWprJL6lwoF08FM/09ySCgUqqCTYvvVeQrZbb04yasEzpBw7tgSH3KMURjtJrgKNYijNqK/SHtEdYSp9GkqOePEoMyUyN9hiwU8AXTWk0TEhj536Tqpok2SJQSYKz6cN7Di6KF14R5PuzCGUu2fftvWYdXYSonFBJwrJ5SZLcQd7vvwFENPz0SUKjNs5RI3wIr6DjFpCHiar8XLWoZAS0fa7e3Sv5hG9QSQJy+US2wkF0K0O27OauVnP5mYRfUEkCUpjDsKI0D+D6C/JUhFdwLoCAaDgNoxajkjgOtMTz3XOoJIQfUEkIP6CSbBFtXJwi/IJKskW0QzJ8SwmVJCC6+WlEnHCE564AKklAOMODDYBKEhIYimL59iNiBZVkm1C6iQd8gkoSEO38fZH6JP7bXFSSgLAwVsIScYKOW0pQurvTiG2AI7hCY5kdWfTe3NlzplRcZsfOrVG9OpFqAHHJkumX/IFKEhCOWDyPMCIiqlevPhWXadmiVczoifrt+Pi4ESP7EQvR3ceLrVudJqrn85WWadGiFfzXb9+6fZ2IA6yTxIWhdYPKpkdUhxs3/1n47hzYGDaiz9drVuoXUja0bt9tXPPJsvfT09OgQH6+ZdMgsXWTEjRNrHbc+ryAKz5bEhX1n98PnF4wf8m2X344euyQcZlxr7w6YvgYPz//o4fPu7i4EEvgvXVDJQkIy1Z3zD0yIrp7ZDSoqm3bpw
MDgm7fvkF4QbfqCeEV9ElCoo0kV+un37RpC8O2i4trfn4e4QXeI9yoJEGhq21HaFqoRgMjk1KC5cQb4yY4001CUPzH//iD79g7Om4B4TjB73AODq6fkfHoxIljFi1BSQkwFwCVJCDab0vgcbcunZ9t3Sp84aI5hcrCqp+lDb7zXSnhugACsuOrpIdJmlHzGxKRsfG9f4fMDA4MdST8gT5JUMTrk3Cmm7QQb42PUQApoeu7iVJKGOOWFlpby4myedOvEccrqCRBEW9vBn2StBCv40afJCV0OsI7J5FqA6Nu4vRJ2rWaKWzdkGqjW8QbWzfpQGvvnRRr64ZzASQEq713UryWm19QSQg/oJIQfsBZJQJip5DZO4qxdZPLZQDhFVSSgPgEOmiKidjIz4UHzi9EQXgFlSQgXft6MgyTfldcajq9J9XZ3Y7wDSpJWLr28Tn8czIRDZn3mfQE5Zh36hO+wTmTgnPtVO6pvRm+wc4hzRzl9jTLlizOz5lMzTWMrJRulCUB1E3j1aV+KxkuKylilBXOOFucdkdJ1kHdIYqiZST3kebejYLcrOIpSxsRAZKpoJJqgoTryr92PCws0GjUnHE+SQOlitBvPxE1NIis9Ji+fIkWqdL9XOn+x68AO+V2NPx397Yb9nowEQZUkmRYsGBBZGRk7969iSjBeJJk0Gg0crl4vy9UkmRAJSH8oFarUUkID2CdhPADKgnhB1QSwg+oJIQfwHHrF58UJ6gkyYB1EsIPqCSEH1BJCD+gT0L4AeskhB9QSQg/oJIQfkAlIfzAMAwqCakuUCHxfocav6CSpIHImzaCSpIKqCSEH0QeliSoJKmAdRLCD6gkhB9Ylm3WrBkRMagkaQAhgJs3bxIRg0qSBtC0WZTBreZBJUkDVBLCD6gkhB9QSQg/oJIQfgAlMQxDRAyuDigZIBAgZjGhkiSDyBs4VJJkELmS0CdJBlQSwg+oJIQfUEkIP6CSEH4AJanVaiJWUEmSAeskhB9EriTMESB22rVrR+mAbwoeWZaFjfDw8I0bNxIxgZFJsdOiRQuiTWJD0TQNjzBm4unpOX78eCIyUEliZ/To0S4uLsZ7GjVqFBERQUQGKkns9OnTJzQ01PDU2dl5+PDhRHygkiTAhAkT3N3d9dshISHR0dFEfKCSJEBkZGTTpk1hA5o5cVZIBKMAfJF0S6nMZ/X5JHW5+rjSDYCjKZrlSjJCanP46TpiRrkldf2yknx/pWklS7NI6o/2iZhU9NDTxdm5iX/krQu5cDGq5CKGxJSle8gTh0ozT5LSVynbaQQtp3wDXDz8rUwmjlGA6rJnTVrK3UKiXd/oiXySZQlJy75v06Mm36jxU7NnmZYp2zZOXPkYZrJYmlOSnTZBqsKebtfDq32UG7EQrJOqxW/r0x+mFPd4KTiwKc9J02uL2GPZ5w5m+DVQBDd2sOhErJOsZ9vnKcp8ZvCMEFLn+OmT+C69fdp0d6n6Kei4rUSlJI9SlHVSRkCTdu5n/3ho0SmoJCs5sTvD3qnOeoMOURHINQAAEABJREFUvT1VxSxRWXAKKslK8nOLudLuWN2Eo5LvKateHB23lajVjMaSn6zkYBiWs+SeKFQSwg+oJIQfUEkIP6CSEPNoY+CUBSMnqCTEPJwubF318qgkhB9QSQg/oJKsRE5TtKgz0lQb3bSUqoNKshKGowhbpwe/y5nSUh6oJCuBoRKWs3JSmCTAvhvCD5b23XAEt+ZY9N7c2XOmkDoKKqnmiIiI6tWrj377/cXz9v+2m/BHfHzciJH9SO2BrVvNEdXzecP2rVvXO3bsSvjj1u3rpFZBJdUc0Lrl5+etWP51j6gO8PTT5R98vebzvbuPwfaBg3v37N0RH3+nYcPGPXv0HjL4ZUrndgcMihozeuLxE0euXLm0e9cRiDz8sv2Hs+dOJyTEeXl6d+sWOX7cFAcHh+82rtm0eR2UhytPnfL6S0NHZWZmrP76s2v/xBYVFYFk4SIhIQ0seb
P6OwYscNzYutUCB/afhMc35yzUy+iPwwc+WfZ+0ybNt/ywZ+KE17bv2LJq9Qp9STs7u337f23cuNmny75ycnTa+evWLT9tHD4sZumHKydPnnnsz0Pfb/oGio175dURw8f4+fkfPXweZMQwzOuzJ1+OvfD6rLc3rPu5nofn1NfGJqfcJ5aguyEFHbek2L9/V5s27WbNnFevnufT7TqOG/vqrl3bsrIyie7ONTc39+mvzenQvrNcLh/20uh13/zUPTK6XXiH557t0aN777PnTj15watXL9+7l/D2/A86d+rm6ek15dVZbu4eO3ZsIUKCrZv1UHyEk1iWhTZoTMx/DXvatesIO69cvRQZEQVPmzVtaTgEVdS586c//mTRnbjb+rWUQHxPXvPqtctQEkRZ+j6p8LbtY69cJEKCSrISmiq7WbY6qFQqtVq9fsNq+G+8X18nAQpF2Z1033z7JVRg0K517NAV2rJ1678y2wEENwbX1LsxAx4e9YiQoJKshOV0t7xWG/DLTk5OvXv1jdDVQAYCA4JNSnIct3ffjqFDRvbrO0i/BxRj9ppeXt6Ojo4fLvnceKfMwmHCJ+7RrQRUUu3TqFHTvPw8sD76p1CdpKYm+/r6mRSD/Uql0tvbV/8UKrNTp4+Xd0Eo6evrHxRYIseU1GQPd8vqJLO3gFcAOu5awN7e3sfH9/z5M5cunwe7898J006ePAbtFNgjMMuLP5j/xpxXQSgmZ0EzV79+6G8H9kAvLCcne9nyxa1bhefl5RYUFMDR4OD6GRmPTpw4lpSU2P7pTp06dVu+/IP09DQouWv3L69OiTlwYA8RElRS7TBq5PiLl84tfHe2skjZunX4N2t+hIjRoCG95sydWlCQv+SDz0BtT561cMFSB3uHV8YNHT1mIMhl4sRp8HTQkOjUtJQunZ8FYS1cNOfwkYNQ8qMPV0ZGRi9eMn/g4GiIHURHvzB48AgiJLgugJXs+CrpYZJm1PyGpI6y8b07g14NCm7mWMXy6JMQ86DjRvjBUseNSkL4AZVkJTSMptbp7gpFLKuUUElWwjK64GTdhSOWGSVUkrVQ/MS4RYtuHrcF5VFJiHl087gtKI9KQvgBlYTwAyoJMQ9GJhF+wMgkUjugkhB+wFklVqKwkysUdTmeJJdr81tWvTwqyUqU6ky1pi6vx80RKqChBclYUEkWo1ars7Ozj9/4mlFxqjq6JPepvRn2jjSxZOY3KskCMjIy5syZk5OT4+TktGrVqqDGzru+SCR1kcRrOdEv+1l0Cs6ZrBJ5eXmurq5ffPFFmzZtunfvbth/al/GP2fyWnbybBNpcUI0EaLKJ2cOPEi8mTdmQaiLh4X3oqCSKoZl2eXLl0MlNG3aNLMFjm57dOdKvrqIYZiqf5LlBGvK2V1BkLD87ICcpXd20jLtWgROLrJeI/2DmliW3I2gkioGLFFiYuLFixeHDRtWaWFljrk0H2UpRHXfq3E6UaP0pUSfzLR0RT6uNO9pySFdHtNPly/r1LFzZPfIEvkYZMcZXaFsu+TlXhr2UtMmTWPGxDRv1rzspR97LapkxS2ZzNGCfG6mYDzJPIcPH54/f/6pU6ca66jKKY7uwq5gWqDKVrgQRzcLXqWoqEjmwBw79fu12xe7des2YcKEkBChEtJhnWTK9evXW7ZsuXv37v79+9Nimhap0WhoHRadNX78+NjYWKh3oJkOCgqKjo4eN26cmxv/rg77bmVkZmaCeh4+1OZaHDBggKhkRLShQrkVb8nb21u/AeempqZu3rwZlLRx40bCN6gkLdCKwSN077/55pvIyEgiSl5//fVz584RCwkONl1fID4+fv369YRv0CeRJUuWQG0ENqJhQ1HfBllYWGhFnRQWFmZvb2+4NxyaOehAEAGwXZ/077//Qm0fERFx+/btpk2bEtEDHUlo4CgL+/YXLlx46623ICgP2+7u7tCTIMJgo63btWvX3n33XX2nTBIyIrpluCjLF/9q3769g4MD1B
cBAQG//fbb0KFDiTDYlpIKCgq+/PJL2PDx8fnpp58CAwOJdBg7dmxcXByxCqiZ9u7dq1AoILQBXpAIgG0pafLkyY0aNYINPz/LBpXEgFKppKxakHDfvn2GbaiiJk2aRATAJnwSdFVAOv361ebC59XHOp9kls8++2zKlCmOjlVdh6Qq1P06adeuXdBzkbqMiLU+ySxRUVHlDSNaTZ2tkw4ePLhz5861a9fCyKpFc/9ECwRLN2zY4OXlRfgAPhaiHWrj7ZOpg3VSRkYGPF66dGnZsmWE1w+rdoHuArRuhCfgY4E4p35lQV6oU0pKT0+HoQB4hO158+ZB+ITUIfbv38/vXwSjuaNGjSI8UUdat4SEhNDQ0AMHDsAgZevWrQlSNZKTk6FLWMXJDhVTF5T05ptvgnuASojUaWBA8MiRI7w31qAkiDNV/7ISbt1gsAx+UuAc+/TpU+dlRHTjbkJ4PvBezz77LKk2Uq2Tjh8/DiOvP//8c716wiZREA8QyzDOPMEjMPJ4/fr1gQMHkmogPSVB9/7555+Hrlm7du0IIhqk1LoVFxd36dJFX8Pbmoygu967d28iJBCrrE5QQBpKgjg1WCKWZU+cOBEdHU1sDxgqgT+fCMn06dMXL15MrEUCrRuM3ufk5Lz99ttimw5bwwjnk3hBvEo6ffo02MAJEyZkZWXZjq2udbZv396zZ09PT09LTxTprzwtLW3z5s19+/Yl2rSKKCNtCLFmIh3du3eHTjGxHJHO4/b29l69ejVBSoGm49atW0R4YEAmJiaGWI5IW7erV686OTnpZ6UhRKckjUZjZ2dHxIpIW7ejR4+ePHmSIKVQFFUzMsrLy5s/fz6xHJEqqU2bNlghGQM+qSprE1Qf6CFadxuTSH2S8coyCNG1bhBSIsLj5ub28ccfE8sRqU8CdwmBuBYtWhBEB/okK/n777//+OMPgpSCPslKmjdv3qxZM4KUgj7JSjp16kQQI9AnWcndu3dhrA3njRhAn2Ql165dM75zFEGfZCUQTAoPDydIKeiTrOQpHQQpBX2SlcBP8P79+507dyaIDvRJVhIXF7dt2zaClII+yUqCg4OxQjIGfZKVhOkgSCnokyyjf//+8OPTr+0Cb0y/AQNwly5dIrYN+iTLGD16tIuLC6WDpmn9RpMmTYjNgz7JMoYPH96gQQPjPaCnOrCIVvURv08SneOGz8vZ2dnwFIQl3HKtEgJ9kjWMGTPm+vXrRFelT548eeLEicTmQZ9kDTExMfolpyAWMHjwYIKgT7KOXr16hYaGwmcXFRVlxS18dRLx+6Sqtm5Hf3kUF5unKmYZjS69XFmqw5K8c4ZOu/mXMZ810SjHokmOxHKyL5Z/9IldFV/B8MYoSiYn9o7y9lFebSOqkShPYGDsaPr06b/++isRGHBj165ds2I+T5WUdPTnjDtX8hq3c2/e3kMm061zUCoNo+SH1OMJFclj24bUi8b7zZcsff5k+bKnTxx98pQnC1PmCiiIMpu5eSYn8XZB1z5erZ91JaJE/D6pciX9/HmyMpcZMqs+qev8uDS+XXePzi/Y9M3j4JOWLl360UcfEQupxCdlpnGZKcW2ICOg5/Cgy0eziCiRfDzpxJ50R1dbyQEX0EhBycilIzlEfIg/nlSJSpQ5jJ2CnwwHkoCWkYcpxUR8BAUF1cw0G7Bi1k2fr6ROKipSq4oZYjOoVZxGrSHiA+NJCD/g/CSEHyTvk+QKirWhxk2PGH2h5H2SRusbbCnlMkUoUXYw0CdJDY6I9mYb9EmSQ4yVkuR9EkUTmrOheJIOMdZJkvdJ2tpelJ+sQFBanyTGX47kfRLHERvI3V0Ghz5JIJ8kk1NEjCFf4aDQJwnikxgNJ3DeFbHBoU8SxCdRYo2v2BrSjydRFtf2d+/eeWve9F7Pd/lxy3c7dm6N6mX9On9wqR5RHa5evQzbi96bO3vOFCI0lCjbtjrgkzjWYgN6+MiBK1cvvb9oWVhYk6ysjJjR/NxjFB
ERpVariNCItacqeZ9kBQUF+f7+gd26RcC2v39AixatCB9E9XyeCI/+/nEiPsTvkypREi2z7HOdMWuivjGCVmnihNccHBxXf/3Z4UNnYc/AwdHjXnk1Jyf7+03fODo6duzQddprc7y8vOFQfHzcnr3bL146l5aWEtogrE+fgQNeNL3vFlq3/Py8Fcu/Pnnyz3fenW1ydPP3O4OD62s0mvUbVp/5+8SDB2mtWoUPGjCsSxfLsk5zOoj4qEmfJMg8bvhYWUs+2S9WrgMRhIaGHT18ftTIccaH4IP4+edNNE3v+vXw99/tuHrt8sbv1+oPfbV6xblzp2fOeOvjj74AGf3vi0/O/F1u+ptWrdp+tmKN4X+jRk38/QK8vHy0r/7lsu07tgwaOHzLj3sjI6IWvT/3z+OHiUWgTxLMJ/EZmQwKChk9arx2y8UV6qTbt2/o9y9c+FFhYUGAfyBstwvvcODAnrPnTnXp/IzZi7i7e0AZ/fbuPduTk5NWffEdVHLFxcUHf9838uVXXuw/BA71eWHAtWuxmzZ/C5IiVQd9kkA+CVo3Hmv7pk3L8pC4urqBoyp5wnE7d279++zJpKRE/Y6AgKBKr3bnzu1VXy1f8PYSqJaINun9Dfg9gUANBcLbtv/twJ6c3Bx3N3dSNThxxiXhe5LLQ0JCiPAI5ZNYhk/bYNbMsiw77+2Z0C/778Rp4eEdXF1cp8+cUNmVSG5e7jvvvjHgxZe6R5ak6gYXBY9PnpuVmVF1JVH6W0DFB1jApKQkIjxW+6TK5gJQgod8b/978+bNf5Z/urr90yWRJ9CEj7dvxWctWfK2n1/AlFdnGfZ4eWut0uw3FkAbalzS19efVB2xjlj7+fn973//I8Ij4PwkSuDJcNCbg0eDdBIS7sL/hqEVpQnc8tPGu/F31n+7VSaTGXYGB9W3t7cnOqel35OVlQk1qpOTE6k6YnXc0OjUr18T969a7ZNqfy4AdPvBBPy8bTM0WPfuJXy56tOOHbqkpYaAYEoAAA4pSURBVKeWVz429uK361aNGD4GxHTp8nn9/wcP0kExr4ydDBYbwhDww4Je25y5U1f+z8IPRayOOz09febMmUR4hPJJlIxQAs908/PzB9cMQaYBA3tCw7Rg/gcZmY8Wvjtn7Lihixaa0QF00Ig2cPCZ8U4ITQ0ZPALk1ahR0y1bN168eNbZ2eWplm1mz36H1Amg43bv3j0iPFb7pEq6Zt9/kMCyZOisUGIb/LA0LrSl0wtjA4jIACWlpqbWQAOXkZExcuTIgwcPEgupLAogJ5xN3aXEaYcaifiQvE9iNTamJLE6bvH7pEqUJJMTmralCUpiddw16ZOsm59U6ZxJiBza0ERu0f6pko8nQYVkSzcEiLRpI3XBJ7GcOB2oYHBElK255H2SLpZkY7WSKH85kvdJchnFiLfKFwCx9t0k75Ns7y4lkXolyfskWk7JZDYWBRBlYy55n8RqOIaxKZ8kUsTvk+hqHkdqBsn7JLkc4km2tIqynJbLZUR8SN4n2TvaCz7VTUzQFOVST0HEh+R9UvN2LgW5wt/5Kg4YFVGr2G79xJi3RPI+qdVzLgpH2eEfHxAbYMeX9/yCHYgoEb9PqtJNSJsW37NzkPebGEjEaCF4IPsBc2Rbipefot9EP2LbCJvfDdjycVJ2hkompzXFppFK07RpNEdYqmynyd0pUAkaX8DoKEXr3gxn5lBZUsLHD+leSTtOxpWXLU53RwPHlr0xkwK0DP7TLMv61XccPC2QiBXwSUuXLq2Zask6qrrCxMh52lt/Lh/JLix4bI03/YIMxnd86785I3k9JiWtXIwGtoxVSMlo7Veu+9qLi4qP/3W8V69eJcdALRwxEqxBCNr9lG5FP3gjhl+F8WW1QmG016RpyjBDxlAYOmtOrvI2z7oRcSP5edy1RVpa2sSJE/ft20cQHZKfx11baDQauRzXCi9D8vGk2gJ+gmLO+VrzSD6eVFtgnWSC5Ocn1RaoJBPqwr
oAtQIqyQT0SVaCPskE9ElWgnWSCeiTrASVZAL6JCtBJZmAPslKUEkmoE+yElASOm5j0CdZCdZJJqBPshJUkgnok6wElWQC+iQrAVuASjIGfZKVYJ1kAvokK0ElmYA+yUpQSSagT7ISVJIJ0l8XoJbAuQAmoE+yEqyTTECfZCWoJBPQJ1mJk5OTi4sLQUpRKpU0XRNfVlFR0bJly4jliFRJ+fn5hYWFBCklNDR03rx5NZDAdM+ePcQqRNqCQNMGDRxBjPD19a2B21wjIiIGDx5MLEekdRIq6UkoiurSpQsr8AqyPj4+1jlUVJKUmDp16u+//04E46233jp27BixCmzdpMQrr7xCBCMnJyc7OzsqKopYBdZJEuPSpUuZmZlEANzd3deuXUusBZUkMXJzcz/88EMiAMePH6/OZ45KkhiRkZFt27YtKCggvAL268CBA9WJBqNPkh5jxowhfAMBvNdee41UA6yTpAfEu60LQ1cAxJCCgoJINUAlSQ9HR0foZPEYDti/f/+5c+dI9cDWTZIsWLAAxnQJTyxatAiVZKM4OzuHhYURPoCYwu7du0m1wdZNqsBQ6+eff06qjaenZ2AgD8tHo5Kkyosvvnjo0KFqDsPFxsbOmjWL8AG2bhIGnDKpHvv27RsyZAjhA1SShCkuLr5z585TTz1FrAWcO+EJca3sPnz48Ly8PJqmi4qKIFamn5EDn5egA+CSZurUqTCs26lTJ2I5KSkpFEUFBAQQPhCXT3ruuecePXqUlpYG8RKok+BPTU1NxWm4FTBz5sz79+8Tq4BYOYSmCE+IS0kxMTGhoaHGe8BR9ujRgyDl0KxZM+umON64cWPKlCkeHh6EJ8SlJHd39759+8pkZcm/IIQ/dOhQgpTP33//febMGWIhLVq04Mtr6xFdFGDYsGENGjQwPI2IiOCrIa+rQLX0zjvvWHRKVlbWpk2bCK+ITknQcr/00kv29vaw7e/vD9sEqRBooVatWgX+suqnrFu3zsGB5+yaYoxMgnr0Udf27dub2CbELM2bN/f29q56+a5du0LdT3ilWlGA66fzb13MyUhXa4pZRqNLD2gUcS1L9lj6UiW5I8tesHSPSRlt4kl4X5wuCyE80sT0PXK6ZICmCS2NkwUaQ8vhB8PRNFE40r5B9m2eqxfSTKT5bq1mxowZb775ZkhICKklrFESqyLbV91/lKrN2U3LZQp7ub2LncwOvm/4YhlDsdJv2/BSutyQum+7tIBut9GVOZ00KFKWNVKfS9JESWWKeTxzKpxJmf1rZFCMVhdqipUqTbFGo2JAnEGNnAZOEW+6UkvZuXMnRCnnzp1baUnQHBQLDg4mvGJxjHvrivuPUortneyCWvi4BzgRaZIel5OakL3qjTv1mzu9OKku6KmKsYALFy5AoI53GRGL6qT7t4r2rkuxc5Q37lqtyXXiQZmlSohNldHUpI8aEukDptvOzg4iKaQ2qKrjvng4Z9fa+wHNfeqMjADHeooW3Ru4eDt/NftObgZDJA6ML02YMKGCAgzDJCQkEGGokpJunCs4/dujVr0aegRKtTmrgMCWXs2ebbBpaUJeprTFBP3c3r17x8XFlVdg/fr1wo1gVt66nd6XdfmvLPjtkrrOtT/iX3m3sUvtNA41wQcffPDGG284OzsTAaikTsrN5C4ezbAFGQHBrfw2Lr5DJM6WLVuKiorMHlq4cKFAMiKVKunHj+96hfA2yCdyPPydnFzsNy5OJFIG3JLZkRBo11JTU4lgVKSkfetSIWDj36wesRnCugQW5Khvns0jkmXs2LFPDgzEx8d/++23go5gVqSkxBsFwc19iI3h7uN6/NeHRLLAkCX4bpOdBQUFy5cvJ0JSrpIO//QIBhlc/UXaWbt89Y85CzvnF2QRvglu661Rc0m3i4hkuXXrlsn6ta1atTKeYSEE5SrpzpVc53q8TaiTFnYO8pN7JVwtNWvW7PLly4ZwwJEjR3ifQ/Ik5SpJVcQENLNgeLku4e
LtlKEbVZQumzdvNgyJrF279plnniECY37c7fKf2RRN2zkINeck4d6V34+uS7p/3cW5Xotmz/buMdHBQds7PXnml0N/bpgy/utNW+enP7gb4Nc4otvLHZ/upz9r34Evz8fut1c4tWvzvK+3gMucBzTxfJSYTaQMDJtkZGSAZ4K49nfffefkJLhLMa8VcAkyOUWE4VFG0tqN09Xq4mmT1o0d+Ulq+r9fb5jCMNp7kmRyO6Uyb9f/LR828O1PF59p06rntl1LsrLT4NCpsztOnd0+uO+bMyd/51Uv8NDR9UQ4aOiz0tfP5hMpAxHtbdu2KZVK49nMwmFeSYW5GlqwJfovxh6Qy+xeefkTP59Qf9+wlwYsSE69de3Gn/qjDKPu1WNig5DWFEV1CO8LIfjk1Nuw/8TpbW2eigJtOTm5QS3VOKwDERKZHfXofjGRMjExMbGxsQMHDtRPQBUa80rSMAxNCXUfHDRtIcEtnZ1LAp6e9QK8PIPjEy8bCtQPKrkV0MnRDR6VRXmgp0eZSX6+ZSP2wYHNicAU5gu+jLqgQPQI7NGMGTNIjWC+4qEJzRKhlKQsyk9Kvg59eOOduXkZhm3dVMnHKCouYFnG3r6ssVcohO1XwluQy4Rq32uMPn36kJrCvJIUDjSVI9TN1K6uXg0bhD/fc5LxTmfnigZOHeydaVqmVpfFeIpVwuai4Fji5I5pwSzAvJI8fBUPBesGB/o1uRC7Pyy0nSGlS9qDuz5eFfXFoJaq5xGQcO9qZGln9satk0RIWJYLbmSj4TTrMO+TGrd2ZdVCZTWAjj3Lsnt++1ylKnrwMHHfwVUrVo1MTa9kEL5tq+ir149CaBu2j/y1KfH+NSIYymxtfVy/BSrJAswrqcFTDlBh5KYL0oJA52vOtC0KO8eVa8Yu+2LY3YSLLw1cUKmDjo4c17n9gF37V4DBggrpxRe0q/4ItDwGBJPsHUW6spRoKXem2+al91TFdKMutnj/681j90JbOv1nrB9Bqky5v7wu//EqKpB2QMU6NIUQJWVRRpZSbvixydPOf+6kk64+CmltfvQNQs8rvhpl9pCjvYuy2HyA2N8nbNqkbwl/vPNhuRlbQBEymZk/MLR+m4kx5a7QmHglzctfQRALqWge9+3zBX9sTWsZFWr2KHxPObkPzB4CK61QmL/JlablHu6+hD8ys1LKO6RSFyvszIR35TKFm5v5nwdUSDdPJU1b0YggFlLJHQFbV9wvzOPCOtede1Ur5ubxe03aukSNsLn5fdWnkh7KiNnB6mJ12m1pD4xXkbvn0pycaZSRdVTe1538UVhGUvaDu9IeGK+UuDMpmqLiMe/YxF00QlDVu7lXvxlXL9A1oLkXqYvcPZ8mp5kxCwSc81TnsWBdgDVv3aXtZE2f4X9xglqEUbO3TyQ5uMjGvYu1UbWwbNWbrSuSMlJVzvWcQp/ms/9VW9w5mVxUqGrU1u2FsXXhz6ldLF4/KS2+eP/GVGUBo3CQu/u5+jaW2M3PjIpJu5Od97BQXaxx91SMWYgtGj9Yuabbg0TVsR0PMtKKGYbI5TSMyNJymuMoji27mm65rJIpPqVrr5UsxGVYik1XxHQaEByiS1fY0pc0LlZ2LqU/Yng53SnaZbuoshN1OymayGSURs2yGhauZKeQ+TdwHDglgEh+ApKIqHaOAA25dCI3PbGwqIBVaRjWaJohhJeZ0jlONE1YFh61Xy7Lli3FBvs5jpiszwYbcK5GXXYipV3dr/Ro6aKDcjuKZTjtUb34dCVBMQzDGb2i9hFK2jvIHV1kfvUVrZ+tuytI1CriyjaBSBeRZsBBJAcqCeEHVBLCD6gkhB9QSQg/oJIQfvh/AAAA///n5WZvAAAABklEQVQDAHwLZlzKpEpVAAAAAElFTkSuQmCC",
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"from IPython.display import Image, display\n",
"arch = LATS(llm=reasoning_llm, max_iterations=4, branching=2, ucb_c=1.4, max_depth=3)\n",
"graph = arch.build()\n",
"try:\n",
" display(Image(graph.get_graph().draw_mermaid_png()))\n",
"except Exception as e:\n",
" print(f\"(mermaid PNG render unavailable: {e}; see § 2 for the architecture diagram)\")\n",
" print(graph.get_graph().draw_mermaid())"
]
},
{
"cell_type": "markdown",
"id": "36d2a860",
"metadata": {
"papermill": {
"duration": 0.006763,
"end_time": "2026-05-28T02:23:42.662114+00:00",
"exception": false,
"start_time": "2026-05-28T02:23:42.655351+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"## 8 · Live run — Game of 24\n",
"\n",
"Game of 24: combine four numbers with +, -, *, / (using each exactly once) to make 24. We use the same task as ToT nb 09 so the contrast is clean — ToT's flat beam vs LATS's UCB tree with backup.\n",
"\n",
"Numbers `[4, 6, 8, 12]` admit the solution `(12 - 6) * (8 - 4) = 6 * 4 = 24`. Other valid arrangements may exist."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "ad56d578",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-28T02:23:42.674750Z",
"iopub.status.busy": "2026-05-28T02:23:42.672744Z",
"iopub.status.idle": "2026-05-28T02:25:41.150514Z",
"shell.execute_reply": "2026-05-28T02:25:41.149470Z"
},
"papermill": {
"duration": 118.483939,
"end_time": "2026-05-28T02:25:41.150514+00:00",
"exception": false,
"start_time": "2026-05-28T02:23:42.666575+00:00",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"TREE_SIZE: 9\n",
"LEAF_COUNT: 5\n",
"ITERATIONS_USED: 4/4\n",
"BEST_LEAF_VALUE: 10.00/10\n",
"LEAF_VALUES (sorted desc): [10.0, 5.0, 5.0, 5.0, 5.0]\n",
"LEAF_VALUES_SPREAD: 5.00 (spread > 0 means deterministic-picker reward is non-flat)\n",
"BEST_PATH_LENGTH: 4\n",
"\n",
"=== BEST PATH (root → best leaf) ===\n",
" [0] START. Task: Game of 24. Numbers: [4, 6, 8, 12]. Combine all four with +, -, *, / (each number used EXACTLY once, parentheses allowed) so the result is 24. At each step, perform ONE arithmetic operat\n",
" [1] apply 12 / 6 = 2; remaining [8, 4, 2]\n",
" [2] apply 8 + 4 = 12; remaining [12, 2]\n",
" [3] apply 12 * 2 = 24; remaining [24]\n",
"\n",
"BEST_LEAF_FEATURES: {'makes_progress': True, 'is_complete': True, 'avoids_loops': True, 'confidence': 'high', 'rationale': 'The sequence correctly reduces the numbers through valid operations to reach 24 without repeating states.'}\n",
"\n",
"=== FINAL ANSWER ===\n",
"apply 12 * 2 = 24; remaining [24]\n"
]
}
],
"source": [
"TASK = (\n",
" \"Game of 24. Numbers: [4, 6, 8, 12]. Combine all four with +, -, *, / \"\n",
" \"(each number used EXACTLY once, parentheses allowed) so the result is 24. \"\n",
" \"At each step, perform ONE arithmetic operation combining two numbers from \"\n",
" \"the current set and report the new set. Continue until one number remains. \"\n",
" \"The final answer is the explicit expression that evaluates to 24.\"\n",
")\n",
"\n",
"r = arch.run(TASK)\n",
"\n",
"print(f\"TREE_SIZE: {r.metadata['tree_size']}\")\n",
"print(f\"LEAF_COUNT: {r.metadata['leaf_count']}\")\n",
"print(f\"ITERATIONS_USED: {r.metadata['iterations_used']}/{r.metadata['max_iterations']}\")\n",
"print(f\"BEST_LEAF_VALUE: {r.metadata['best_leaf_value']:.2f}/10\")\n",
"print(f\"LEAF_VALUES (sorted desc): {[round(v, 2) for v in r.metadata['leaf_values']]}\")\n",
"print(f\"LEAF_VALUES_SPREAD: {r.metadata['leaf_values_spread']:.2f} \"\n",
" \"(spread > 0 means deterministic-picker reward is non-flat)\")\n",
"print(f\"BEST_PATH_LENGTH: {len(r.metadata['best_path_thoughts'])}\")\n",
"print()\n",
"print('=== BEST PATH (root → best leaf) ===')\n",
"for i, thought in enumerate(r.metadata['best_path_thoughts']):\n",
" print(f' [{i}] {thought[:200]}')\n",
"print()\n",
"print(f'BEST_LEAF_FEATURES: {r.metadata[\"best_path_features\"]}')\n",
"print()\n",
"print('=== FINAL ANSWER ===')\n",
"print(r.output)"
]
},
{
"cell_type": "markdown",
"id": "f68a725c",
"metadata": {
"papermill": {
"duration": 0.002998,
"end_time": "2026-05-28T02:25:41.156779+00:00",
"exception": false,
"start_time": "2026-05-28T02:25:41.153781+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"### 8.0 · What just happened, briefly\n",
"\n",
"Three signals to read:\n",
"\n",
"1. **`LEAF_VALUES_SPREAD > 0`.** This is the deterministic-picker working. If spread is 0 — every leaf scored identically — the reward signal collapsed and search is random.\n",
"2. **`TREE_SIZE` grew beyond `1 + branching`.** If the tree never expands past root's immediate children, the search budget was too small or every leaf hit the depth cap.\n",
"3. **`BEST_PATH_LENGTH > 2`.** Means search actually descended at least one node before declaring a winner."
]
},
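{
"cell_type": "markdown",
"id": "lats-sketch-signals",
"metadata": {
"tags": []
},
"source": [
"The three signals above can be checked mechanically. This is a small sketch under stated assumptions: `check_signals` is a hypothetical helper, and it expects a dict with the same `tree_size`, `leaf_values_spread`, and `best_path_thoughts` keys that § 8 reads from `r.metadata`. The sample values are copied from the printed run, with the thoughts abbreviated.\n",
"\n",
"```python\n",
"def check_signals(metadata: dict, branching: int) -> dict:\n",
"    # One boolean per signal; True means the signal looks healthy.\n",
"    return {\n",
"        'reward_has_spread': metadata['leaf_values_spread'] > 0,\n",
"        'tree_grew_past_root_children': metadata['tree_size'] > 1 + branching,\n",
"        'path_is_multi_step': len(metadata['best_path_thoughts']) > 2,\n",
"    }\n",
"\n",
"\n",
"sample = {\n",
"    'tree_size': 9,\n",
"    'leaf_values_spread': 5.0,\n",
"    'best_path_thoughts': ['START', '12 / 6 = 2', '8 + 4 = 12', '12 * 2 = 24'],\n",
"}\n",
"print(check_signals(sample, branching=2))\n",
"```\n",
"\n",
"In a live session, passing `r.metadata` instead of `sample` should work the same way, provided those keys are present."
]
},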
{
"cell_type": "markdown",
"id": "67b2127e",
"metadata": {
"papermill": {
"duration": 0.003002,
"end_time": "2026-05-28T02:25:41.162774+00:00",
"exception": false,
"start_time": "2026-05-28T02:25:41.159772+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"## 9 · What we just observed\n",
"\n",
"The cells above ran LATS on Game of 24 with `branching=2, max_iterations=4`. We measure tree growth, reward spread (the deterministic-picker signal), and the discovered best path.\n",
"\n",
"### 9.1 · Search statistics\n",
"\n",
"- **Tree size**: 9 nodes (5 leaves)\n",
"- **Iterations used**: 4/4 (budget exhausted)\n",
"- **Best leaf value**: 10.00/10\n",
"- **Leaf values (sorted desc)**: `[10.0, 5.0, 5.0, 5.0, 5.0]`\n",
"- **Spread (max − min)**: **5.00**\n",
"\n",
"### 9.2 · The best path from root to terminal leaf\n",
"\n",
"| # | thought |\n",
"|---|---|\n",
"| 0 | START. Task: Game of 24. Numbers: [4, 6, 8, 12]. Combine all four with +, -, *, / (each number used EXACTLY once, parentheses allowed) so the result is 24. At … |\n",
"| 1 | apply 12 / 6 = 2; remaining [8, 4, 2] |\n",
"| 2 | apply 8 + 4 = 12; remaining [12, 2] |\n",
"| 3 | apply 12 * 2 = 24; remaining [24] |\n",
"\n",
"### 9.3 · Best leaf's `_LeafEvaluation` features (deterministic-picker source)\n",
"\n",
"| feature | value |\n",
"|---|---|\n",
"| `makes_progress` | `True` |\n",
"| `is_complete` | `True` |\n",
"| `avoids_loops` | `True` |\n",
"| `confidence` | `high` |\n",
"| `rationale` | `The sequence correctly reduces the numbers through valid operations to reach 24 without repeating states.` |\n",
"\n",
"These independent booleans/categorical fed into `_composite_value(...)` which produced the leaf's reward — **no numeric judgement was made by the LLM**. The reward came from Python composing the LLM's structured feature commitments.\n",
"\n",
"### 9.4 · Final answer\n",
"\n",
"```\n",
"apply 12 * 2 = 24; remaining [24]\n",
"```\n",
"\n",
"### 9.5 · Patterns surfaced in this run\n",
"\n",
"- **✅ Deterministic-picker reward is working**: leaf values have spread of **5.00** points. UCB1 had real discriminating power across leaves. If the spread were 0, the flat-scoring pathology would have collapsed the search.\n",
"\n",
"- **✅ Found a high-value terminal leaf** (value 10.00/10). Likely satisfies `is_complete=True` AND `confidence=high` — a strong solution candidate.\n",
"\n",
"- **Path length 4**: search descended 3 step(s) below root — the tree found a multi-step trajectory, not a one-shot answer.\n",
"\n",
"### 9.6 · The takeaway\n",
"\n",
"LATS only earns its complexity over Tree of Thoughts (nb 09) when **all four** properties hold:\n",
"1. **Reward has spread** — § 9.1's `Spread` value must be > 0 (deterministic-picker prevents flatness).\n",
"2. **UCB1 explores** — under-visited siblings get attention via the exploration bonus.\n",
"3. **Backup amplifies** — high-value leaves boost their ancestors, redirecting future descents.\n",
"4. **Best path is multi-step** — § 9.2 should have ≥ 3 entries, otherwise plain CoT suffices.\n",
"\n",
"When any property fails, fall back to ToT (cheaper, simpler) or Self-Consistency (even simpler). On this Game-of-24 run with branching=2 and only 4 iterations, the tree stays small but the reward spread + path depth show all four properties holding."
]
},
{
"cell_type": "markdown",
"id": "26c9fa85",
"metadata": {
"papermill": {
"duration": 0.002636,
"end_time": "2026-05-28T02:25:41.168429+00:00",
"exception": false,
"start_time": "2026-05-28T02:25:41.165793+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"## 10 · Contrast — what if the reward were flat?\n",
"\n",
"To make the deterministic-picker's contribution concrete, let's simulate a \"flat reward\" scenario: same architecture but with `confidence` always set to high and `makes_progress/is_complete/avoids_loops` always True. Every leaf would score 10. UCB1 would have nothing to discriminate on.\n",
"\n",
"We can't easily monkey-patch the LLM, but we *can* inspect the actual captured leaf values and verify spread is nonzero. If LEAF_VALUES_SPREAD > 0 in §8, the deterministic-picker fix is delivering real reward signal."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "17a98bc7",
"metadata": {
"execution": {
"iopub.execute_input": "2026-05-28T02:25:41.176594Z",
"iopub.status.busy": "2026-05-28T02:25:41.176594Z",
"iopub.status.idle": "2026-05-28T02:25:41.183713Z",
"shell.execute_reply": "2026-05-28T02:25:41.182696Z"
},
"papermill": {
"duration": 0.0126,
"end_time": "2026-05-28T02:25:41.185103+00:00",
"exception": false,
"start_time": "2026-05-28T02:25:41.172503+00:00",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Captured leaf values: [10.0, 5.0, 5.0, 5.0, 5.0]\n",
"Distinct values: 2\n",
"Spread (max-min): 5.00\n",
"✅ Reward signal has non-zero spread — deterministic-picker IS doing work; UCB1 had real discriminating power across leaves.\n"
]
}
],
"source": [
"vals = r.metadata['leaf_values']\n",
"print(f\"Captured leaf values: {[round(v, 2) for v in vals]}\")\n",
"print(f\"Distinct values: {len(set(round(v, 1) for v in vals))}\")\n",
"print(f\"Spread (max-min): {(max(vals)-min(vals)) if vals else 0:.2f}\")\n",
"if vals:\n",
" if max(vals) > min(vals):\n",
" print(\"✅ Reward signal has non-zero spread — deterministic-picker IS doing work; \"\n",
" \"UCB1 had real discriminating power across leaves.\")\n",
" else:\n",
" print(\"❌ All leaves scored identically — reward signal collapsed despite the \"\n",
" \"deterministic-picker fix. Inspect the trace to see why every feature came back the same.\")"
]
},
{
"cell_type": "markdown",
"id": "ea6fffdb",
"metadata": {
"papermill": {
"duration": 0.00351,
"end_time": "2026-05-28T02:25:41.190836+00:00",
"exception": false,
"start_time": "2026-05-28T02:25:41.187326+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"## 11 · Failure modes, safety, extensions\n",
"\n",
"### 11.1 · Where this breaks\n",
"\n",
"| Failure | Mechanism | Mitigation |\n",
"|---|---|---|\n",
"| **Flat reward** | Even deterministic-picker can collapse if LLM rates every feature identically | Add a stricter `rationale_must_reference_specific_state` validation; inspect features per leaf |\n",
"| **Shallow tree** | `max_iterations` too small or `branching` too high relative to budget | Tune `max_iterations` × `branching` to budget; consider lowering `branching` to 2 with more iterations |\n",
"| **Loop trap** | LLM keeps proposing same move; `avoids_loops` doesn't catch semantic duplicates | Add a Python-side de-dup check on the trajectory before scoring |\n",
"| **Cost** | branching=3 × iter=10 = 30 expand + 30 evaluate = 60 LLM calls per task | Cap; cache per-trajectory evaluations; consider cheaper LLM for EVALUATE |\n",
"| **Root explosion** | If root has no `parent_id`, BACKUP propagates value into root which has visits=0 initially | Guard: skip backup at root, or initialise root.visits=1 |\n",
"\n",
"### 11.2 · Production safety\n",
"\n",
"- **Bounded budget.** Never let LATS run unbounded — set strict `max_iterations` AND a wall-clock timeout.\n",
"- **Reward audit.** Persist `(leaf_id, features, value)` per iteration. If features are stuck (e.g., `confidence` always `low`), the LLM is hedging — switch model or strengthen the prompt.\n",
"- **Best-path explainability.** The full path is interpretable text; surface it to the user as the answer's justification.\n",
"\n",
"### 11.3 · Three extensions\n",
"\n",
"1. **Stronger terminal detection.** Instead of relying solely on the LLM's `is_complete` boolean, add a task-specific Python checker (for Game of 24: parse + evaluate the expression; for code: run unit tests).\n",
"2. **Per-branch budget.** Give popular branches more expansions automatically — track per-subtree spend.\n",
"3. **Memory across tasks.** Cache value estimates by `(parent_thought, child_thought)` so similar sub-trajectories aren't re-evaluated from scratch. Composes with Reflexion (nb 18)'s episodic memory.\n",
"\n",
"### 11.4 · What to read next\n",
"\n",
"- [**09 · Tree of Thoughts**](./09_tree_of_thoughts.ipynb) — the flat-beam predecessor; same Game-of-24 task for direct comparison.\n",
"- [**10 · Mental Loop**](./10_mental_loop.ipynb) — origin of the deterministic-picker pattern.\n",
"- [**15 · RLHF Self-Improvement**](./15_rlhf_self_improvement.ipynb) — multi-dimensional deterministic-picker pattern that LATS reward uses.\n",
"- [**21 · Self-Consistency**](./21_self_consistency.ipynb) — orthogonal exploration strategy (sample-and-vote vs tree search).\n",
"\n",
"### 11.5 · References\n",
"\n",
"1. Zhou, A. et al. *Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models.* 2024. [arXiv:2310.04406](https://arxiv.org/abs/2310.04406)\n",
"2. Yao, S. et al. *Tree of Thoughts.* NeurIPS 2023. [arXiv:2305.10601](https://arxiv.org/abs/2305.10601)\n",
"3. Browne, C. et al. *A Survey of Monte Carlo Tree Search Methods.* IEEE TCIAIG 2012. — original MCTS / UCB1 algorithm."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.0"
},
"papermill": {
"default_parameters": {},
"duration": 125.540431,
"end_time": "2026-05-28T02:25:41.986503+00:00",
"environment_variables": {},
"exception": null,
"input_path": "all-agentic-architectures/notebooks/22_lats.ipynb",
"output_path": "all-agentic-architectures/notebooks/22_lats.ipynb",
"parameters": {},
"start_time": "2026-05-28T02:23:36.446072+00:00",
"version": "2.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}