{ "cells": [ { "cell_type": "markdown", "id": "3ec309cf", "metadata": { "papermill": { "duration": 0.002938, "end_time": "2026-05-27T12:20:39.534793+00:00", "exception": false, "start_time": "2026-05-27T12:20:39.531855+00:00", "status": "completed" }, "tags": [] }, "source": [ "# 10 · Mental Loop (Simulator) — imagine outcomes before committing\n", "\n", "> **TL;DR.** Before acting, the agent **mentally simulates** the predicted outcome of each candidate action, scores those simulations, and only then commits to the highest-scoring one. The pattern: *generate K candidates → simulate each → score → pick best → explain*.\n", ">\n", "> **Reach for it when** the action is hard to undo (financial, medical, deployments, robotics), when \"imagining first\" is cheaper than \"acting and recovering\", or when stakeholders need to see the reasoning *before* execution.\n", "> **Avoid when** the action is trivially reversible — Tool Use (notebook 02) is cheaper.\n", "\n", "| Property | Value |\n", "|---|---|\n", "| Origin | Robotics \"world models\" + classical AI deliberation |\n", "| Reasoning style | Flat (1 layer of simulation per action) — vs. ToT's tree |\n", "| External tools needed? | No (the simulator IS the LLM) |\n", "| Cost | `1 generate + K simulate + 1 explain` LLM calls (K + 2 in total) |\n", "| Structured output? | Yes — Pydantic candidate-list + per-action outcome schemas |\n", "\n", "The pattern is structurally similar to ToT (notebook 09) but **flat instead of deep**: ToT builds a tree of reasoning steps; Mental Loop builds a single layer of action-outcome pairs and picks the best. Mental Loop is the natural fit when *candidates are actions in the world*, not reasoning steps in a chain."
] }, { "cell_type": "markdown", "id": "d7daac7d", "metadata": { "papermill": { "duration": 0.001999, "end_time": "2026-05-27T12:20:39.540798+00:00", "exception": false, "start_time": "2026-05-27T12:20:39.538799+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 2 · Architecture at a glance\n", "\n", "```mermaid\n", "flowchart LR\n", "    A([task]) --> G[Generate<br/>K candidate actions]\n", "    G --> S[Simulate<br/>predicted outcome + risks + benefits + score<br/>per candidate]\n", "    S --> D[Decide<br/>argmax over scores]\n", "    D --> E[Explain<br/>state choice + tradeoff vs runner-up]\n", "    E --> Z([recommendation])\n", "\n", "    style G fill:#e3f2fd,stroke:#1976d2\n", "    style S fill:#fff3e0,stroke:#f57c00\n", "    style D fill:#fce4ec,stroke:#c2185b\n", "    style E fill:#e8f5e9,stroke:#388e3c\n", "```\n", "\n", "**Linear, four-step pipeline.** No loops, no retries. The work is in the *simulation* step — that's where the LLM does most of its thinking." ] }, { "cell_type": "markdown", "id": "16abeac1", "metadata": { "papermill": { "duration": 0.003005, "end_time": "2026-05-27T12:20:39.547806+00:00", "exception": false, "start_time": "2026-05-27T12:20:39.544801+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 3 · Theory\n", "\n", "### 3.1 · Why simulate at all?\n", "\n", "Most agentic patterns commit to the next action greedily — ReAct picks the best tool call given the moment, Planning picks steps in order. That's optimal when actions are *cheap* and *reversible*: try one, see what happens, retry if it doesn't work.\n", "\n", "For *expensive* or *irreversible* actions — sending an email to a customer, executing a financial trade, deploying a code change, prescribing a medication — the \"try and recover\" loop is wrong. You need to deliberate **before** the action is taken, weighing what would happen for each option.\n", "\n", "That's exactly what Mental Loop does: it converts the \"act now, learn later\" loop into \"imagine first, act second\".\n", "\n", "### 3.2 · The simulator is just an LLM\n", "\n", "In robotics, \"world model\" usually means a learned dynamics model that predicts physical state evolution. In an LLM agent, the simulator is just **the LLM itself with a different prompt**: instead of \"what should we do?\" the prompt is \"*imagine* doing X and describe what would happen\".\n", "\n", "This works surprisingly well for high-level decisions (strategy, planning, weighing options) because LLMs have absorbed enormous prior knowledge about cause-and-effect from their training data. 
It works *less* well for fine-grained physical predictions (\"if the robot lifts its arm 3 cm to the left, what happens to the cup?\"), where a learned dynamics model still wins.\n", "\n", "### 3.3 · The two-schema design\n", "\n", "```python\n", "from pydantic import BaseModel, Field\n", "\n", "class _CandidateActions(BaseModel):\n", "    actions: list[str] = Field(min_length=2, description=\"K distinct candidate actions...\")\n", "\n", "# Simplified listing: the library's _SimulatedOutcome also carries `predicted_metric` (see § 4).\n", "class _SimulatedOutcome(BaseModel):\n", "    predicted_outcome: str  # 2-3 sentence prediction\n", "    benefits: list[str]\n", "    risks: list[str]\n", "    overall_score: int = Field(ge=1, le=5)  # STRICT scale\n", "    rationale: str\n", "```\n", "\n", "The separation matters:\n", "- `_CandidateActions` is requested *once* and returns K different actions in one structured response.\n", "- `_SimulatedOutcome` is requested *K times*, once per candidate, so each candidate gets its own focused prediction.\n", "\n", "Calling the simulator once per action (rather than batched) lets the LLM use its full reasoning budget on a single action — better outcome quality than asking it to weigh K options simultaneously.\n", "\n", "### 3.4 · The strict 1-5 scoring rubric\n", "\n", "Just like ToT (notebook 09), Mental Loop's value depends on the scorer **discriminating** between candidates. If every action scores 4/5, the `argmax` is meaningless. The `_SimulatedOutcome.overall_score` field's description explicitly says: *\"Be discriminating. 1 = clearly bad / high risk. 5 = clearly excellent. 
Most actions 2-4.\"* Without this calibration anchor, models tend to compress scores into a narrow band.\n", "\n", "### 3.5 · Where Mental Loop sits\n", "\n", "| Pattern | Layers | Candidate type | Use when |\n", "|---|---|---|---|\n", "| ToT (nb 09) | tree (K^D) | reasoning steps | hard reasoning task |\n", "| **Mental Loop** *(this notebook)* | **flat (K)** | **real-world actions** | irreversible / high-stakes choice |\n", "| Best-of-N (sample + pick) | flat | full solutions | parallel sampling helps |\n", "| Ensemble (nb 13) | flat (parallel agents) | full answers | diverse perspectives |\n", "| Self-Consistency (nb 21) | flat (parallel samples) | reasoning chains | majority-vote helps |\n", "| Dry-Run (nb 14) | linear | one proposed action | execute-after-check |\n", "\n", "Compared to **Dry-Run** (notebook 14): Dry-Run *simulates one specific proposed action* and asks for approval. Mental Loop *generates and simulates K alternatives* before picking. Use Dry-Run when you have a specific candidate in mind; use Mental Loop when you need to discover the best option.\n", "\n", "### 3.6 · What goes wrong (you'll see in § 9)\n", "\n", "1. **Flat scores** — the simulator gives every candidate a similar score, so argmax is arbitrary. See ToT § 11.1 for the standard mitigation.\n", "2. **Optimistic predictions** — the LLM imagines best-case outcomes for every candidate. The `risks` field forces it to think about downsides, but a same-model simulator still has blind spots.\n", "3. **Confabulated specifics** — the \"predicted outcome\" includes invented numbers / details (\"you'll save $1,247 per year\"). Production code should mark predictions as estimates.\n", "4. **Limited candidate diversity** — K=3 candidates often span only a narrow range. 
Raise temperature or explicitly prompt for unconventional options.\n" ] }, { "cell_type": "markdown", "id": "6c686e08", "metadata": { "papermill": { "duration": 0.002001, "end_time": "2026-05-27T12:20:39.551845+00:00", "exception": false, "start_time": "2026-05-27T12:20:39.549844+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 4 · Setup\n", "\n", "We pass `scoring_fn=commute_score_from_minutes` to `MentalLoop`. This is the **deterministic-scoring fix** — the LLM predicts `predicted_metric` (minutes) but Python computes the final 1-5 score from that number. This bypasses the LLM-as-Scorer flatness pathology entirely." ] }, { "cell_type": "code", "execution_count": 1, "id": "15dc4843", "metadata": { "execution": { "iopub.execute_input": "2026-05-27T12:20:39.559489Z", "iopub.status.busy": "2026-05-27T12:20:39.558358Z", "iopub.status.idle": "2026-05-27T12:20:40.577490Z", "shell.execute_reply": "2026-05-27T12:20:40.577490Z" }, "papermill": { "duration": 1.024368, "end_time": "2026-05-27T12:20:40.579767+00:00", "exception": false, "start_time": "2026-05-27T12:20:39.555399+00:00", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
Provider: nebius  ·  Model: meta-llama/Llama-3.3-70B-Instruct ─────────────────────────────────────────────────────\n",
       "
\n" ], "text/plain": [ "\u001b[1;36mProvider: nebius · Model: meta-llama/Llama-\u001b[0m\u001b[1;36m3.3\u001b[0m\u001b[1;36m-70B-Instruct\u001b[0m \u001b[92m─────────────────────────────────────────────────────\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Scoring function (Python, not LLM):                                                                                \n",
       "
\n" ], "text/plain": [ "\u001b[1mScoring function (Python, not LLM):\u001b[0m \n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ " 25 min → score 5/5\n", " 35 min → score 5/5\n", " 45 min → score 4/5\n", " 60 min → score 3/5\n", " 80 min → score 2/5\n", " 100 min → score 1/5\n" ] } ], "source": [ "from agentic_architectures import get_llm, enable_langsmith, settings\n", "from agentic_architectures.architectures import MentalLoop\n", "from agentic_architectures.ui import print_md, print_header, print_step\n", "\n", "enable_langsmith()\n", "print_header(f\"Provider: {settings.llm_provider} · Model: {settings.llm_model}\")\n", "\n", "# Deterministic scoring function — Python decides the score from the LLM's predicted minutes.\n", "def commute_score_from_minutes(minutes: float) -> int:\n", "    \"\"\"Map predicted door-to-door minutes → 1-5 score.\"\"\"\n", "    if minutes < 40: return 5\n", "    if minutes < 55: return 4\n", "    if minutes < 75: return 3\n", "    if minutes < 90: return 2\n", "    return 1\n", "\n", "print_md(\"**Scoring function (Python, not LLM):**\")\n", "for m in [25, 35, 45, 60, 80, 100]:\n", "    print(f\" {m:>3} min → score {commute_score_from_minutes(m)}/5\")" ] }, { "cell_type": "markdown", "id": "a0195e96", "metadata": { "papermill": { "duration": 0.002983, "end_time": "2026-05-27T12:20:40.585808+00:00", "exception": false, "start_time": "2026-05-27T12:20:40.582825+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 5 · Library walkthrough\n", "\n", "Source: [`src/agentic_architectures/architectures/mental_loop.py`](../src/agentic_architectures/architectures/mental_loop.py).\n", "\n", "Four nodes (each does one job):\n", "\n", "1. `_generate` — `with_structured_output(_CandidateActions)` produces K distinct candidates.\n", "2. `_simulate` — calls `with_structured_output(_SimulatedOutcome)` **K times**, one per candidate.\n", "3. 
`_decide` — pure Python `argmax` over simulated scores.\n", "4. `_explain` — final LLM call to write the recommendation + tradeoff explanation.\n", "\n", "Note that `_simulate` runs sequentially across candidates. For latency-critical applications you could fan the K simulations out concurrently, for example with LangGraph's `Send` API for map-reduce style branches; this is an extension idea in § 11.3." ] }, { "cell_type": "code", "execution_count": 2, "id": "349f6624", "metadata": { "execution": { "iopub.execute_input": "2026-05-27T12:20:40.592574Z", "iopub.status.busy": "2026-05-27T12:20:40.592574Z", "iopub.status.idle": "2026-05-27T12:20:40.609439Z", "shell.execute_reply": "2026-05-27T12:20:40.608629Z" }, "papermill": { "duration": 0.02173, "end_time": "2026-05-27T12:20:40.610498+00:00", "exception": false, "start_time": "2026-05-27T12:20:40.588768+00:00", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--- CandidateActions schema ---\n", "{\n", " \"description\": \"K distinct candidate actions the agent could take.\",\n", " \"properties\": {\n", " \"actions\": {\n", " \"description\": \"K distinct candidate actions. Each must be a SPECIFIC, actionable choice (not a vague principle). Each should represent a different strategy, not different wordings of the same one.\",\n", " \"items\": {\n", " \"type\": \"string\"\n", " },\n", " \"minItems\": 2,\n", " \"title\":...\n", "\n", "--- SimulatedOutcome schema ---\n", "{\n", " \"description\": \"Mental simulation of one candidate action's outcome.\\n\\nThe `predicted_metric` field is the key to the deterministic-scoring fix:\\nif the caller passes a `scoring_fn` to `MentalLoop`, the LLM-supplied\\n`overall_score` is **overridden** by `scoring_fn(predicted_metric)`. 
This\\nsidesteps the LLM-as-Scorer flatness problem (Llama-style models compress\\nscores into a narrow band regardless of the rubric).\",\n", " \"properties\": {\n", " \"predicted_outcome\": {\n", " \"description\": \"A conc...\n" ] } ], "source": [ "from agentic_architectures.architectures.mental_loop import _CandidateActions, _SimulatedOutcome\n", "import json\n", "print('--- CandidateActions schema ---')\n", "print(json.dumps(_CandidateActions.model_json_schema(), indent=2)[:400] + '...')\n", "print()\n", "print('--- SimulatedOutcome schema ---')\n", "print(json.dumps(_SimulatedOutcome.model_json_schema(), indent=2)[:500] + '...')" ] }, { "cell_type": "markdown", "id": "e69a0682", "metadata": { "papermill": { "duration": 0.002999, "end_time": "2026-05-27T12:20:40.615537+00:00", "exception": false, "start_time": "2026-05-27T12:20:40.612538+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 6 · State\n", "\n", "| Field | Type | Set by |\n", "|---|---|---|\n", "| `task` | `str` | caller |\n", "| `candidate_actions` | `list[str]` | `_generate` |\n", "| `simulations` | `list[dict]` (action + outcome + score + …) | `_simulate` |\n", "| `chosen_action` | `str` | `_decide` |\n", "| `chosen_score` | `int` | `_decide` |\n", "| `explanation` | `str` | `_explain` |" ] }, { "cell_type": "markdown", "id": "75624187", "metadata": { "papermill": { "duration": 0.002999, "end_time": "2026-05-27T12:20:40.622497+00:00", "exception": false, "start_time": "2026-05-27T12:20:40.619498+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 7 · Build the graph" ] }, { "cell_type": "code", "execution_count": 3, "id": "da347d94", "metadata": { "execution": { "iopub.execute_input": "2026-05-27T12:20:40.629067Z", "iopub.status.busy": "2026-05-27T12:20:40.629067Z", "iopub.status.idle": "2026-05-27T12:20:43.085666Z", "shell.execute_reply": "2026-05-27T12:20:43.084697Z" }, "papermill": { "duration": 2.463165, "end_time": "2026-05-27T12:20:43.087718+00:00", "exception": 
false, "start_time": "2026-05-27T12:20:40.624553+00:00", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAG0AAAITCAIAAACCLQWZAAAQAElEQVR4nOydB3wURRvGZ68ll95ISC9AINQAAQWV3gSRphTpRbp0kN6VjghIVelI7wIi8oGKgPQaakhCCCmQXi5Xdr93b8Plcrm2yRxekvmT37G3Ozu7+9w7Zae9IoZhEKHEiBABB0RHPBAd8UB0xAPREQ9ERzxg0DE3K+/m+YzE2Dx5rkqlQoo8tiIlECCaRhSFEIUYGgmFlErF7uc2KAB2q2tcsAl1L4EQgiKaCyMSqJS0eoNSKfOrZZoYKAGcijTVtfzTBRRNMyIhpVS93S9grwvRvo2zICr2sW3gDEZsI6jgL6nZ2MndS4pKBlWS+uPB1S+S4vJUSvYube0EYht4RkopVx8TgCps9BR7CUSJEKNU7xaCWKy4sBvRlPoWEOgCj80GVLF7BCJEqwMXenhBfnhWIKSOPP8J8k8H1TRXUQdjGJri9qujQiplwZ2LbCBmWp6nkssYlYKNxKWC+JMh3i4VJKhYFFPHXYuiU5OVdg6CyuEOTbp6olLOlVNv7l/KyMlU2TkKBs0PQfzhrePfx5LvXEh3chd1n+ArsRWjssXe72KTY+WB1aUdv/TldSI/Hfcsj0lLVnQc7u0bbI/KLpunPxNJBAPnBpt/Cg8dz+x6Ff9MNmA2j9hLL3tXxkBG33tqoJnhzdURMsQ8GT1oXnHyjlLKnhUxWamqIQvNemSBOYGOrIuT5zHlSkSg58RAKAZ2fBttTmDTOkZHZsZHyXhlFmWG7uMDcjJU5/cnmgxpWsfTWxNrNHJE5ZV2Ayo8uJJpMpgJHc/tT4T8s2k3L1ReCazmJHUU7l/1wngwEzo+vpYZ1qD8GiNHs888kmLzjIcxpmP0g0ylAjX7vPwaI0dwDUeBiDp/wFguaUzH62fTHJyF6N2yb9++OXPmIP5MnTr16NGjyDJAc0b0g2wjAYzpmJIg9wws5nt7sXnw4AEqFsU+0RxC69rLcmgjAYzVw9dNftr8M4+w91yQBYiOjt6wYcP169fhBmrXrt2vX7/w8PChQ4feuHGDC7Bz585q1art3bv3r7/+unfvno2NTb169UaNGuXn5wdHp0yZIhQKvb29t2/fvnTpUvjKneXg4HD+/HlkAdZOeDpqRSW2wU8fxuwRGriCatghCyCXy0EyEGLNmjXr168XiUTjx4+XyWSbNm2qWbNmhw4drl27BiLeunVr2bJlderUWb58+bx581JSUmbOnMnFIBaLn6pZuXJl3bp1L168CDtnzZplIRGRusXv6R2DFSCD7bgZKUqQXupgkXQdExMDovTq1QvEgq+LFy8GM1QqlTrBatWqBdllQEAACA1fFQoFyJ2enu7s7Ax2ER8fv2PHDltbWziUl5eHLIxAKEhLUKI6+o8a1JFW0chiIwRAGldX17lz57Zv375+/fpgcREREUWDgcHGxcWtWLEC0nV2dn42Dz8A6AgbwcHBnIjvCGi+ZyhDBw2ma65lWKVQIQsAmd3mzZs//PDD3bt3Dx48uHPnzidPniwa7MKFCxMmTKhevToEvnr16tq1a3UiQe8Q6LdwdOevI2LNAT2PNFbYl4SgoKBx48adOHECMrjKlSvPnj374cOHOmEOHz4MhQ+ULaGhoZCQMzNNv59ZDigtAqsZ7MYx+j5DUVF3LKIjFNbHjh2DDUiYTZo0WbJkCeSAkZGROsEgK/T0LOi0OHfuHPqPeHQtjRIiqaPBFGBMR2dPcUK0DFkAEGj+/PmrVq168eIFlDlbtmyBQgZySTjk7+8PuSGkYsgHwQwvX74MZTcc3bVrF3fuq1evikYIaRwU1wRGuLl/OUNstMQ1pmOtxk4ZKRbJH
0Gy6dOnnzp1qkuXLt26dbt58ybUJUNC2PbNrl27QhKGtPzkyZORI0c2btwYsshGjRolJCRA1QfyyjFjxpw+fbponIMGDQL1J06cmJubi3CT9ELuW8VYmWaiPRwqnw3buDZs547KMWnJ8p3fxo7+rrKRMCbaewLDpLf+TEPlm+ObXjm6mRgwYeIwdD+um/T0zsWU2h+46Q0wevRoyM70HoJ8iqs/FwVqjs2aNUOWwVDMKpUKEp+hWzp79qzeQ1DzS3+tMG6MyJx+rku/vr51Pm3EMv0R5eTksINR9GFER6lUauhQyTFSPTJyS46O+ptZf571zM1H0nmEPzKKWf2FOxdFC0VUr8nmdkKWGU5ueRX3OHvoosomQ5rVX9hnWlBOpurQDy9QeeLSyaSYB2aJiHiNA9i1OFpsQ3UfXy6s8vyBV4+u5QxbXMnM8PzGpfw0O0ogoMp8H+wvS2PSUxTDF5tliRy8x0kdWB2bEC0PriHtMJjfSKJSwfmDCff/yXJyE/WdEcTnvGKN24uPyj6x+ZVcxvZafNTFzSfYAZVyMtPkZ3cnvXwigybXRh1d6zXn/d5R/HGkdy+lXTudkp1JQ7OQrVRg7yaycxBKbARKlf7GJaEAqQr3cLBjchn1OFD1WFAdRAJKSTPaIfVHUvhUdUhGPUi3YKeAoujCjwnxyOUqWTad8UYhz6VVSiSxpWo0dvygYzHHcpZoPC7Htd/fxDzKyUxV0nJ2qDI3rrko+SNxta+tHpKM9ImoDp8/JJkLw/5TD2pmaKZwJEj7CfLj1IlUJxD8SBIQFwnFAujjDwiVNvqkAioZGHS0NNDWC2080ACBrJhSMF/ByEuI9UB0xAPREQ9mvRf+t0B3K/RWI+uG2CMeiI54IDrioRToSPJHPBB7xAPREQ9ERzwQHfFAyhk8EHvEA9ERD0RHPBAd8UB0xAPREQ9ERzwQHfFA6uF4IPaIh4oVKwoE1t6PVAp0TEpKssRUDryUAh0hURMdMUB0xAPREQ9ERzwQHfFAdMQD0REPREc8EB3xQHTEA9ERD0RHPBAd8UB0xAPREQ+lQkfrnc/Vtm3b5ORkbrIgtIfTNA3bQUFBhw8fRtaH9bbXt27dGqmXiuM6FeBTIpH06tULWSXWq2Pfvn39/QutrhEQENCpUydklVivjl5eXh9//LHmK6Ru+PqO19gzH6vuh4NUrDFJPz+/bt26IWvFqnV0dnZu3749ZJFInV1yy2daJ6bL69jH2U9uZOZprbunPStcs82tv1t0P2Kn46tdTKGCkIy+OeZFJpuz0Ax9+dJlhqHr169vays1dK5mu2gk3GIARk40dGkOkUTlH+oQFmHiJzSh40+zn+blILGNQHuWf6G7f+sAS+3SrGCyvmY/t81Ozn97lPNJVhCbJgbOx1bRJ4Qd7H5K5xJI99dS61VkwQDuctr3g/joKBQztIp17/XF1wEOzgbXLjSm48apTz18RW36BaFyz79nEh//m9l3pkEpDeq4ecZTvyq2H3bxQwQ1L56mX9iTbGidN/3lzKUTSWDMRERt/Cs7S6TU0Y36FyfTr2PsE5mtI3G1qYurl01qgv43ff06KnJoZMybQDlFIhHJZfp10W907BrstMG1ssstKiWt7dxUG5J48UB0xAPRkQdsS6gBvylERz5QBl136C+v4V2KIsVMERi2SZ6UM5ZEvz2q+0IQQQfq7fKJRSH2yA9D2R3RkQeQRg2lUqIjP/iVM1BeMwwpsHmgX0e2vZq0UxTFcDljsJ+rbJjjvPlTT57C56yUQYZyyFKwjmZJePTIgs5KtcFWzqSmpixaPPv+gzsB/kGdOn0eFxf719//27blAFJP/P3p53WXr/ydlJRQs2Z4l07d33//Q9j//PmzQUN6rPth2+7dW/6+eL5CBc/mzdoM/fIrrqM1JeXNuvUr792/LZPJGjRo1K/PEH//QNh/8NCe3b9sGT9u2py5Uzp37v7VqEkQz7HjB27cvJqQEB8UGNK+fedOn
34GIZu3ZH0iLlu+YP2G744fPQ/bp387fuz4wefPnwYHV27RvE23rr34vrfx80sKvWt83wuXLp8f+yJ62dJ1CxesvHLlIvxpJvuuXrP0wMHdXTr32L3reNMmLefMm3Lhzz+Q2rcofK5YubBly3ZnTl+aMW3hvv07/3f+d6T2uTN+4rBbt6+PHzf95x/3urq4jRzV/2V8HGIbUyU5OdnHjh2YNnU+/CSw54d1K65evTR2zNeLF60GEb9fveTyFdZN6emT7OfkSbM4Ec/+cXrJ0nmhVart3nlsyOBRcEtr161AfKAMLHWODOqI+PnSTE9Pu3z57+6f960eVtPd3WPihJlgGtyhvLy8386c+KLXgE87dnN2cm7/caeWLdpt37FZc27TJq2aNW0FmtapU8/H2/fxY9b73t27t2Jjo6dPW/Bew8Zubu4jho9zcnY5eHC3+mEosNCePfu3atnOzy8AsW5dFy1btq5e3QZ1wyPAEquGhv179Z+iN3ny5JHateuOGzvV1dUNAg/sP/zIkX2QjJDZMPzzR4pXxvks6gl81qyZ7/zUwcGhXr2G3DboIpfLG0Q00gQOr1M/KuppekY69zU0NExzyMHBMSuL9WZ0994tUBaeNv9uKArOun3nhiZktao1kNbzHTq0p9+AbpCQ4e/howdpRdShaRqyCO3bqFu3Aey8c/cmMhvKcLo2WO/hVX/MzMyAT3v7AocVTk75AxA4Xb4aO1jnlNSUN9xqCXrn+sNZCoWCy+A0uLi4arYhdb+9VXrq9LEKhfzLIaPDwyMcHRyLXgupPRxDhJBNw1+h2+Blj3zr4XyxsWF9+ynkcs2e1LT8+3P3YF0XTJwww9e30CA8T8+KKSmvDUUImYNUKv1m4XfaO4UCPc7hHz95+PDh/eXL1tV/mwLgN6jgoetuwtbW1s7Ork3rDk2atNTe7+PNo29ZXWzwsUd2LAcfe+RK0ufRz4KCWF+OWVlZN2786+XlDdt+vgHcYDvIvLjAYALwq8JTpRg2hUqVQnNzc0FrX5/854x/9dLF2bVoSMia4VMjXHR0FPwFB1XSG2dmVqbmNsA8X7166enphcyGQRS/dlx1cyWPFxp42sDA4G3bN0GRCiKu+n6Rt3e+lx/Qa0D/YVCwQNEBiQtK6klTRq76frHxCMG4GjZsvHz5gsTEBFDqyNH9w0f0PX36WNGQUNGB/GHvvh0ZmRlQNK1Zu6xBxPsJiazvUvj9oC517drlm7dYZ6VfDh598eJ5qJZDVgA3M3/BtAmThsu10pBpWNfNvNK14RcgQ0yZNHv5yoV9+3WpFFKldev2kFdGRub7NezZox/Ywu49W8FIYX+N6rUnTpxpMsJF36yCut78hdMePLgL9t6q1cddu/YsGszLq+KM6QvhJ+zUuQVkHTOmLXiT8nrW7En9B34GtdfeXwzasnUDFN+/7D5Rq1b4pg27du3esnHTapksF24Dqmi4BqbqH9+zbUE09F93GxeIzAasBqoj8FTc12kzxomEogXzl6MyxLnd8fFROXqH+GB7L4Q32fEThsI7DAi6Y+dP169f+VT9UlGmoHiWM9C9iHgyZ86SZcvnb/5xbXJyYmBA8JxZiyGfQmULI3mdAR0R4ts/A+8qC+fze80qdTCMwf5C0s+FB9KvJhGRqAAAEABJREFUwA9++aNQSFnED3vph997IRuY9M/wwUg7BSLoQPGt9xD0wrY/8krXQjHFKEm61oXKn8SjBwPjmhUMQ/pdi2Ck/kjSNR6IjnjQr6NEKmSUpAapCxQbEqmBkbd690rtkUxGdNQlPTVPYsPn/bp5d4/cLFKB1CXztTKsoYveQ/p1dHaXVgyW7Fr0FBHesm/lUztHQURr/S7vjc0bvnw6+ea5dO8QO98qUqmdwanHxjA+oICrihUOwCDdRj5u6RlDAeD11WDnk4Gra2LQd31d5HI6/lnWq6hs70DpJ1/6GgpmYh47SBl5OSsvR6VUoP8KY0oZxcjsdPMRitlp/
AFVpW36eBsJVgr82v/yyy8vX76cNGkSsmKInwo8EB3xQHTEA/Frj4dSoCNJ13ggOuKB6IgH4ucMD8Qe8UB0xAPREQ9ERzyQcgYPxB7xQHTEA9ERDyR/xAOxRzwQHfFAdMQD0REPREc8EB3xUKVKFaIjBp48eUL8c2GA+DnDA9ERD0RHPBAd8UB0xAPREQ9ERzwQHfFAdMQD0REPREc8EB3xQHTEA9ERD0RHPBC/9iWiRYsWGRkZKpVKM7kQbtXX1/fEiRPI+rDe+QqNGjWiaZrza88B223btkVWifXq2L9/f2/vQlMj/fz8evTogawS69UxNDQ0IqLQes0ffPCBp6cnskqseh7SoEGDNH7tvby8unfvjqwVq9YxMDCwcePG3HbDhg3hK7JWeNR7ou+mq5ii4fNn1zMUI2B055sXnruvNSEd/i9Yh64gFKWOR3uJuhbvffHwRppKqWr+Xq9nd7LhAI2KrgqaH0PB/P/CkSAjO9lqAOvKveh+GjGB1SSaBd+NY1a9Z9uC55mpKqEIqRTchfV5sWD0PJ/2zkLHtfdrxcbO4C+sk95YTVL0LIiZZvRHJRAgWt/iWezzKpHUUfDFRG+psxQZxbSOG6c9dfWUNOtVUSot1hIVpZzz++NjHuQM/SZYIhUaCWZCx41fP60a4VC/TUVUjsnNle9bGjt6ZWUjYYyVM7/+9FJsIyjnIgKQEF29JLuXxBoJY0zHhFiZm4+1D8x+NwSGSTNSjHkQMKYjrUQSOzzL5pd2XDylKqMLYhqr9yjyGEZOVjlTA9Ux2lhBQtYtxAPR0SwYZMLxmzEdhQKKuBPnMOkowZiOKsPux8sbJlUg6dosBKY86BEdzYJd39/oAtbGdFS3vpD8UQ1VgnKGUb9+IwJSexoudjlD0ECVpN4jgHqPoIw7djYbpvj2yHr9oIl7AA4TNel3YW5RUU+bt4y4c4eHR1ojHDy0p1Wb99C7hTFVDzemI7zPCIQYymsXF9d+fYd4er7TdszDR/YtWjIH4aIk5TW8z9AqDOW1m5v7wAHD0bsFs0f7d1xeX75yce/e7Q8f3Xdz86hZs87QIV+5u3tAuh78Zc/vv9tcu3bdefOnQk7T6P2Plq1YIBQKq1WtMXfOkiNH92/bvsnJybltm0+GDxsLASIf3h85qv+6H7aFVcv3c92nb+fGjZuOHDFe+3JZWVn7D+z89+ql6Ohn7m4eEGDQwBG2trbjJgy9fZt1ln3mzK8bN+wMrVLt/v07cImHD+87u7jC1fv3G2pvb4/Mx5Q9Gk/XAiGfdP34ycNp08fWrdtg688Hxnw15dmzx0uWztUJIxKJ7t2/DX/7957asG4HbIwd/yVNq04cuzBn9uJ9+3deUbukN5NDh/fs/mVrj+59v/1m1bBhY89f+B3Egv2rVm4KC6vZpk2H//1xDUSMe/li0pSRsjzZ2jVbFsxbHhX1ZPyEofzGsDFMSdopaBWfdH3v7i2whT69BwkEAi+vitWqVo96rsezgFwuHz1qklgsdnZ2CQmurFQpuVRfNzwCctJnUU/ef/9DM6/Y/fM+TZu0DAwMzr+Be7f/vfrPsKFjdIKdPXtKLBKDgnBF+Dpp4qxevTv+ffF8s6atkLmYsCec6bpmrXCZTDZtxriI+u81atTEz9df42xaG19ff816PFI7O0iPmkP2dvac/3YzgXiuXru0eMmcp88ec/bl6upWNNj9+7erVavBiQhUrOjt4+N35+5NPjqWpB7ONkDyqBhBClq8aPWff/6xafOadeu/q1+v4YD+wyCX1I22cN1eUIKqPlzo5MkjkKIbRDSCFPDjTz+cPHW0aDD4bR4+egB1L+2dqSlvEB9KUA9nGyD51cPfa9gY/iCdXr9+5eChX6bPGHfo4O8IB5D8dfZAjnX8xMHPun3xSYcu3B5Dtuzm7lGrVrhOncHZSb8HlOJhtD2cHcLJo5y5det6njwPdPTwqNC27ScVK/pAucm5RueLjYTtp8zNzeG+Qrn8+
nWyThiFQpGbm+vx1qM9ZLv/XPpTb2yVQqqc+f3XOrXraWw/OjrKzy8AmQ2FGOOOMY2lKRUUNDSfcub+7bnzphw/cSgtLfVB5D0oTEHQil7eiD/+/oGODo6QSMHoIONbvHSOo6OTThiJRBIQEHTq9LGX8XHp6WlLl8+vVTM8MzMjOzsbqXPhyMh7N25eTU1N+eyz3vAka9etgOz7xYuYjZtWDxrSQ28ZaAh2aJBRpyMC0+ebDZSeHdp3WfvD8i7dWkPFws7O/ruVm4q31AkUILNmLYLqXotWDaBsbda0tbe3b9Gqx6wZ39ra2A4Y+Fmffp0hOx4yZDR87dKt1auE+I4dukI9dPKUUVABcHJ0+unHvVJb6bARffoN6Hbr9vXJk2ZBbo7wYWx8z/pJT/3DHJp+Vt7HpQAxD7LP73s1+juDQ3xIOy4eTLQ/Moj0K7BQ7BhUYwFMtD9SiNgjC8OOJTYWgLSH44G0h+PBqD1CqqaIPbJQJet3pSiSPXIwJRgHoK5aknTNwppT8cebQSMuKWfMw2g7LjTiknJGTQnHP7I1H0RQt/eUaByAirzP5FOCfgVGPbocEczAmI4iGwFFps9wmGppMKajWEzJsok9sqQmykRGZ1caq9ZUDLZ58yoXERB6fj/DxcPou5+RYx/392FU1Ll9cah8E/MgLTOF7jkpyEgY0/OGf5wVJbZlItp6BFRxRuWM1wm5106/Tn6RN3J5ZeMhzZrHvnNxVOYbtsuLLjJLTt2NRunGWLjWSRWdN1F4orna/7yxU3TiLDT1vfAh7a8GFh3QPVQoBq1gAiH7aPbOwv6zgpEpeKyDlJ4slyuKnP/26gUw2l1r3EoJlM6KCxSNGCGl6VqnKN1ednghZdSv9hDX77+fSUxK6t27d+GLvm2hZgr35L39SqnPpTULLej8MCj/11NHpHV7WrEJBCr3iiaWAdDAozPPucJ/sx6ASpjKiNIq+Fj1agTE3wceiI54IP7i8ED82uOBpGs8EB3xQHTEA9ERD6S8xgOxRzwQHfFAdMQDyR/xQOwRD0RHPBAd8UDyRzwQe8QD0REPREc8lA4dSf6IAWKPeAgNDSU6YuDRo0fEPxcGiJ8zPBAd8UB0xAPREQ9ERzwQHfFAdMQD6KhSWbu7jFKgo1AoJPaIAZKu8UB0xAPREQ9ERzwQHfEAjeHQZYisG2KPeLBev/affPKJUk1WVhZSL/8ql8tdXFzOnj2LrA/rna/g7++fnJyclpbGqQki0jTdsmVLZJVYr46DBg3y8PDQ3uPj40P82vOmQYMG1atX195Tr169kJAQZJVYu1/7ihXzV0OtUKGC1RojsnIda9WqFR4ezm2HhYXVqFEDWSvWPi+uX79+Xl5ekFF+8cUXyIopZr3nj72vou7kKPIYIw2DlOElt4suAGDyFENorw1Q5Jje5Yv074VIxBLkHWLz6VB/xJ/i6Hh2T8KTm1nBNR2r1LUXSvRML6fe3qx21OrnZai3CzEwhcLq+6KzRx2jkQAFG29/DL2/CmXAOSZNo9gH6U9vpju6SHpM5LFEe360fHXcuzImPUXRa3JlVEY58kOUSoEGzOFXMeCXPya9ynodX5ZFBDqPCpHl0pdOJPE6i5+O/xxNtXMUorKOi4fk6Z0cXqfw0zE3QyWSlP0V+KTOYkUuv+yOX3uPPA8xdNnXUSVHchm/hd1KQbtZqYDoiAd+OgrFAsbaW1T/G/jpqFLQ5SF/LAYkXRuAZ8MD0dEAPNdh5Sc7xa6DTVwF6IGfPTI0Ux48+1CI91OSdK0HBvFOdURHPPDMHymEyoErFcbS6VooElDloB5O8U/X/OxRqeDnybkoq75fPHBwd1QsoqKeNm8ZcefOzaKH/nf+dziUlpaK/iNKU/7o4uLar+8QT09r9EtZmnR0c3PX8RlsPfDTUUAhFc9yJicn55tFM2/evBocXLlTx8+0DymVyp9+Xnf5yt9JSQk1a4Z36dRd44k9IzNj48bvT5466uzsElH/v
S+HfOXlVRHS9eAve37/3ebatetCmA0bvz/z+692UruWLdv5+QVqx3z6t+PHjh98/vwpXLRF8zbduvaieDnSESC+/t34BRcIKb5uv5evWBAXF7t82foF85Y/j34GqmkOrV6z9MDB3V0699i963jTJi3nzJty4c8/kFrfqdPGvH6TvHLFhq9GT05KTpw6fYzO0L2jxw4cPbZ/7Jiv163b7u3tu33HZs2hs3+cXrJ0XmiVart3HhsyeBRcYu26FYgXNGIs+l6oVDI0n3Lm9etkKAF69exfPawmpMphQ8fY2Nhyh/Ly8n47c+KLXgM+7djN2cm5/cedWrZox8kBWkdG3hs1YkLd8IiWLdqOHjWpUqXQlMLu0w8d3tO0SStQ38nRqV3bjvXqNtAcOnnyCBjsuLFTXV3dYP/A/sOPHNmXmpqCzEag9oGJ+GDZ8RSvXr2Ez8DAgj7MqlXzhz49fhwpl8sbRDTSHAqvUx9SbnpG+rNnT+zs7AICgrj9YFkzpy/09PTShIS+4pcvXwQFFUQbGhrGbdA0fe/+be1o69ZtADvv3NVTyhuC4f/qa9lyJj09DT4hC9Pskdrmr7TP+aD/auxgnVNSU95kZ2dpzFYv2dnZKpVKqhWt7dto4bdRKBSQ7cJfoWj52CPD+nezZD8X4lnNh1ICPmV5Ms2enJxsbsPdowJ8Tpwww9e30DgQqNbY2dnn5uaAERnKi+3t7YVCYZ5WtBCe27C1tQVbbtO6Q5MmhUac+nj7IR4wfF/b+L7PIFrFQ8qKFX3g896921XV6Q4s5dr1K1ANhG0/3wAbGxvYgEyQCwwmAwkWVKhWtbpMJnv0ODKsGjvALDY2euWqb78aNVlT5sKGl5f3/ft30Of5F9IuviAzzczK1EQLF4XsRTtbMAOKb9rmlz+qeJYzFSp41qxZZ+vWDS9exEDBsvCbGRotQK8B/YdBwXL37i1IjFBST5oyEt524FBExPtgpJs2rf7r7/9dvXYZdiYnJQYGFnJe0rxZ6z//OgeFGGz/smfbgwd3NYe+HDz64sXzUGcCi4bI5y+YNmHScLgEsiQWH7c3ber8sLCaQ4f37tCxiaOjE5TLmhFFPXv0mzxp9u49Wzt2avb96iWQ9CZOnCIRJF4AABAASURBVInUExSWL10HPUGz50ye8vVoW6l00bff6yw906f34A7tO69ZuwxeBy9d/mvkiAko35EqqlUrfNOGXfD62KVba/htILdduGAlZ/uWg984qW0LoqGfq9u4QFSm+X3nq6TonOHLKpl/Cml/1APrZpmnI1Ge5YyAsvYJ5ThgvdjzHM7Is5xhPbSjMg+l7tHjdQpJ13pg1D16vE4hOuKB6KgHi5cz5QSLlzPwVktZvOZeKuFZXkMvF3FArA+SrvXA5o2k3lNy2LyR1Hv+E/jlj2KJQCAq++NSBEJGwHOhbZ462kA9v+wXNAqFUsSzmY2fjpXq2Msyy749ZiQrvfzN9djMwU/Hes09xGJonotBZZd7V5MUeXSHQT68zirOvOEfZz2ztUOdRvJo5iwtnNsbF/9ENmIZ74moxZzHvmNhVGYaLRQipVK3nqWZo64101l36ri2O/SiAbSnSKtdtecf1d7WPout7jFvB1xzc7QZnWgLNqCDSO8TiySIVjASO2rw/OLYR/HXQZLnym9cyJBnFz296IR73cnjenRlDB6Ni4vLzc2tUqUK4gk8GmVqrQBudj1iO4XoyhFOnr78skUNxa8/SqSS99t5IMuza9eZ9MTEJt0aIyuG+PvAA9ERD6VAR+jCl+hbTcSqIH7t8UDSNR6IjnggfiDxQOwRD0RHPBAd8UB0xAPREQ9ERzwQHfFAdMQDqYfjgdgjHoiOeCA64oHoiAdSzuCB2CMeSoGOfn5+1t8/Uwp0jI2NJX7OMED8xeGB6IgHoiMeiI54IDrigeiIB6IjHoiOeCA64oHoiAeiIx6IjnggOuKB6IgHoiMeiF/7EtGqVStQkKKorKws2LCzs4NbF
QqFx48fR9aH9dpjhQoVHj9+rJnYlpmZSdN08+bNkVVivfM++vfvb29vr73HxcWlT58+yCqxXh3btWsXGhqqvadatWp169ZFVolVz0MCk3RycuK2nZ2drdYYkZXr+NFHH1WtWpXbDgkJadzYeqe8Wvu8uAEDBri5uUFG2atXL2TFFKfes39VbNprhSqPUal9QgoErDNzxM3L10weZwpcsqtnirNe2PN90DNvHaervStxjiW5Ypm7l/wiWj3FH7ZpFQ1nCASsv+iCCJn8T1T4q9pFGcUtDlB4UYHCywaggtM1MDRjY0fZOwvaDajo5slvQjs/HWW5qp9nPXdwFVbwl4pEwrd1Eq0J/Bpf8mrd8j/Zo7QZa7gXenb1kzN61hIoWFNQc12dMBRCOidS+rzZF9lJgw0wibE5GW9UnYb5+Fa2Q2bDo/74Kib30OqXXScEODhY+yiRkrNz4dM6TZ0af+JpZnge+ePxjfFh7zmWBxGBln29b13IMD+8uTo+up2qVDAN2vJaXb8U4x1oLxSjc3temRne3HSdGCUXicvXEpo2tqKUJHPbR8zVkZYLFLLytYSmQs7IZeYWwmTdQoNQFA+nXkRHwzA86oRER8MQe8RC/hpt5sFDR54rk5d6GMSY7xWJh47W2gFhOfS+TeqHpGuDMKScwQZ2HdVtXOUrg7RIOcM2gJWzcgbxsRyz0zXN27VNGYDkjzjgUw8vNU04nbu22r7jR+NhDh7a07J1Q4SJ8lteVw+r2bfPEPRfYLaO/F2Uv3vCwmrCH/ov4FXOIL7o9S//Mj5u4KDPhw8d27VrT6R2wty7b6cWLdqOGT15xqwJYpE4MDB4z97tNE2HBFeePGl25cqhOtEeOrz38uW/IiPvSWxs6tSuN3jwKF8f1p0wpOt161f+8fu/SJ0PDBwwPD09bdv2TVKptEFEo9GjJrm781hgWiikRCJzbceCNmbIvzw8c/9+Q3/asi4tLRW+woaDvcOwL8fAtkgounnrGmycPnlx29aDbu4eM2dPUKkK+Z68e/fWmrXLatSoM3/+8qlfz0tNTfnm25lFry4Wi/fu3S4QCI4c/mPbloN3793aum0j4oNKxSiV5tqOuToyPOqk+RjxL9+zRz9Pz4rrN66KiXl+7NiB6dMXavzYyuV5kMeB2fp4+4JBJSYmgHDa0VavXmvLT/t6fzGwbnhEg4j3u3/eBwwzPSO96A34+vr36T3I0cERzBDs8fHjSMQHClmg3YzNHPm4WuL8y/fr+6Vmj8a/fNMmLYVC4ddT5o4Y2Q8k+Pyz3tW1MjXIATSrKPj5BsBnTOzz8PD6mgBwbnx83A/rVkQ+vAd5ArczLTXF2clZ5x40zu4BR0en7OwsxAcGWUF5rVBjxL98tarVwZquXrvcuFET7QC2Wh7tbW3ZbZ3nv3jxwszZE8Eehw0dW6lSlWvXr0z5erTee6De4RuYpd5nIJ0a9y8PqRVss3HjJqtWL960YRdYGbdfWzWZTKaOylY7hhMnD9eqFQ4ZLvc1KysTWQaxmBJLrKCc0fiX5/5q1qjj7ubB+ZfPy8tbsnQu5IOQupMSE37Zs01z1rOoJ1DIcttcjhYSUlk72oyM9AoeBcMc/vrrHLIMCgWjkOMuZ4qBEf/ym35cIxAKe3Tv6+ToNHToGKiaxL96yZ3l5OS8es3SjMwM+Nu+Y7OXV8XatQqNHa1cKRRyAyjWlUrl/gO7uJ0JieZ22FsIC+poyL/8g8h7hw7tmTxxFleedPyka6WQKmCe3FlQZwwKqtS9x8edOrdISIhfOH+lJslzDBo08r2GjWfOmtCmXSMozaHqA1nt1GljoJqFsCLg835t7nizC/uS719J7zu7MrIkc+ZOgfxuxfL1yArYs+y5vZPwiykB5gQ2u/2RKXf9MwIBJTA7uZJ2M4MwjPndClbWHj5v7lJkNbDNZmY3KZj9PlPu2sL5QfqvDUMxFLLAOIDyBsjIkHEAJYeCCiQpr0sOTVugnCkV/Qp44
VU9sWy/QqmHjH8sOQyywPgegnHMf5+hKUF5G+BDI+z1R4kdJbT2NbwxA52udo5CMwObWwbX+chOKS9fLzSybFVgDVszA5uro4Ozg6Or8NfNMah8cPFovEhC1W1Swczw/OYNb1vwXCBhOg8PQWWaPw/Gv3iUM3wJj0Zr3vPYt86Nys2mbe2FiBEoVcaqlFoTpfMRCQUqumAUl9rFfMHVdSaWaw9yFwoplcrQfRYMUdBsCdRtDPmRa0Wk2Sw6y53bgC6MvFyVUER9+Q0/7/bFWQ/g0a30R1eyZFm0svCz6ehSdAwG5NwqVsi34QWUdl9uER3zmwlyc3MZWmln76g/Xu1vb7e1o4JmbZrWFbKQjm9vA86ydaQCq9nVa+aOeGK960lp2LVrV2Ji4oQJE5AVQ/x94IHoiAfixwcPxK89Hki6xgPREQ8kf8QDsUc8EB3xQHTEA9ERD6ScwQOxRzwQHfFQCnRUqVRERwxA/kh0xABJ13ggOuKB6IgHUn/EA7FHPBAd8SCRSEi6xkBOTo71d7IT/1x4IDrigeiIB6IjHoiOeCA64oHoiAeiIx5KgY7wMgNNFci6IfaIB6IjHoiOeCA64oHoiAeiIx6Ijniw6nlInTt3pmk6LS0NqpC2trYqlQraxo8dO4asD+u1x27dusXFxWm+pqeng6YNGjRAVon1zvvo0qWLoPB6d25ubj179kRWifXq2KdPHz8/P81XMEZfX9/mzZsjq8Sq5yGB9UGGyG07Ojr26NEDWStWrWP37t3BBtnlA2kabLN9+/bIWrH2eXEDBw60t7e3sbGBshtZMdjqPVnp8pt/pCa+yJPl0CoFwzV0CYQUrWKEIvYTrsNNKReKBCq13wJulr96MV+G823PTTgXCBGtYqeaczPMMzIyaYZ2dXWmaXaqOcSpUubfc34M6knp3LXy94vgRMTFJhSyk+c19ymRUAIRI7ERVvC1qdHYydOfn/96Q2DQ8cj6uIRomVLBPolALID7Zqfjc3fOTcwXUuxSOOw6C+wTU5QgfwE22K/STM+nqMIrITDq9QV0lz/gVC+Y7A8xv3Vhr70IgBB+E01shdYZoEBiJeQTjEpBg+5QI3DzlnQa4SW1s0EloEQ67vvuRXJcnlAicPSw861u7tIiVkXCk5T0hExFLu3kIeo3IwgVl2LqGHk17X/73ohtBH51PaV25i5yY808uxSXm6mo9YFj08+K43O+ODqe+OllTGSud1V3Nz8nVIbIzcp9/m+ii4f4i6/NWjNcG97l9a0LaSBijZbBZUxEQOogrd4iKCNNeWobby8D/OwRipT457LqzYNRmebRX7F29lRfPtklD3u8fCrp5bOyLyJQ9aOA7Ez66IY480/hoeO1MxlBEcXJg0sj1ZoGvngse/E4x8zw5ur485woqbPY3tkOlRtcfOx//dHcjNIsHeOeZOVk0JXe80PlCb8ankol8+fhJHMCm6Xj2d1JEgdzV5R89xw8vnTZml7IAjh72UX+a5bfKrN0zEqjfaqWyteVEuJf20uZh14+M51Lmtbxn6NJ0L7g4I7nfb7UIRBTV06nmAxmun8m+lGuyMaCzWtXb5y4dPXwq8Sn3l6Vw2u1+qhRT86d046906F6W69Ou72H5ufl5QT61+rQdnSgP+vpEL7uOjD7adQ1OKVRg67Iktg6SN7Ey00GMy1QdppKbC9BluHG7d/2Hl7g51N1+oTDH7ce8ec/e46e/C7/zgSimBd3r986NXb41m9nXxCJJXsOzecO7Tvyzes3L4YNWNu/15KEpKiHjy8ii2Hvbqs0w2ucaR2VCkbqaKlpQP9ePxoSWLdrxymODm5VQiLathx68cr+zKz8dAR216PLTHc3X6FQVK922+TXMbAnPSP59r2zzT/sC7bp5Oj+SdvRYpEFG0oc3GxpMxbyN60jNAEKJRbpnoVWwOexd0KrvKfZA1JC0+Tz6HyHrp4Vgmxs8mustrbseqQ5uRkpqayHPi/Pgtcqf98wZDFsp
BLaDAeYZggErdK0RVZgVyrlKpXi9NkN8Ke9PzM73x71+i3JzmFd4tpICt4IJBLLloHmePMxrSO00MuVpjPaYiCR2IIc9cPb167RQns/JGQjZ9nbsf5w5QqZZo8sLxtZDHme3JxF7U3rKBIL5FmWGl7j4x2aK8usHJLvTlipVLxJfenibOwt3tXFBz6jY+9wyRlOefLsX3t7V2QZMt/IzHHwZjp/dHAVyrMtNTy7fesR9yIvXLl+jM0rY27t3Ddj45ZRSqPm7+LsGRRQ57dzm5KSYxSKvF37Z1nUk132G5lYYjp+0zr6VZEqZJayx+DA8PEjtkPBMndJu41bv8qVZQ3svUwsNtHl1KvbnAC/GqvW95uxsLmd1KlhvU8t54QtN13mWtF0dcWsdty145+GNPK1c7RULdKauXfmeeeRFf2qOBgPZtaLip2T8OU9s5o9yhixdxKgkDEpIjJz3F7z7u6//mhMxyvXjh7/bbXeQ5CFGUqnPbvOrhnWFGECstefdk7UewgyXKFQrNd78GefTg2v1RoZICMxt/r7pkVE5vfP/DznOSMQVWroo/eoTJadk5uu91B2Toa9nf4eMQd7N6j6IHykpMZhcAfJAAABvklEQVTr3S+TZdna6pfD3s5FU9XXIS4yKeNV9shllZEZ8OjnglyyWjN/kaQUDIXGAuSMbQd6VantaE5gHg05tZs4Pb7Io+unVPPwQox3sMRMEREvHZt08fT0lcAFUFnn0d/RtlKq2xgeowF4j6f498ybq2dSa7Qss72vD/+K8fKz6TLSl9dZvBtoG7Zx9wmxfXAuOi3Zgm+1/wlgUpHno+3sBXxFRMUeJ3X9jzeXT6aKpaLQD/xRmeDZlZe5GfLQ+vZtensj/pRo3N7ORc/TklQiW6Gbr4NnJTdUCnkTl/EmOl2eq3RwEg6YW/zMCsM40j0rYlMS5YyKHUcqtBEIBEKBSMdl5Fs3Tm+HfBZqidK7U7O/kAuoQl8N+93KH2pK6ZyuPgK1cXZWvAqplLRKwbZ0O7qJ2/b39PIrUSMmtnHNr6Jz7v+TkRwvU8gYpYKSywoa47W8lDFqr1tal387VpZRZ9XcofxxzQJ2UK8mALcThIBgXPu0UJQ/wFkoRCqVOox6dK623zKkFlK7PVskoURi2sZW6OolCa3nGFLL3JqNcUqBf65SQXl5ObE0REc8EB3xQHTEA9ERD0RHPPwfAAD//0XWa4EAAAAGSURBVAMA32bqF6NN65IAAAAASUVORK5CYII=", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import Image, display\n", "\n", "arch = MentalLoop(n_candidates=3, scoring_fn=commute_score_from_minutes)\n", "graph = arch.build()\n", "display(Image(graph.get_graph().draw_mermaid_png()))" ] }, { "cell_type": "markdown", "id": "3c50e145", "metadata": { "papermill": { "duration": 0.003044, "end_time": "2026-05-27T12:20:43.100113+00:00", "exception": false, "start_time": "2026-05-27T12:20:43.097069+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 8 · Live run — commute decision with **objective scoring**\n", "\n", "Concrete decision: pick the fastest 8 AM Monday commute from Brooklyn to Midtown 
Manhattan.\n", "\n", "Why this task: Mental Loop only adds value if the simulator can **discriminate** between candidates. Subjective tasks (which exercise plan? which Python framework?) produce flat 4/5 scores because *all options sound fine*. Objective tasks where the simulator must commit to a **specific predicted number** (travel time in minutes) force the scoring to spread — a route predicted at 80 minutes must score lower than a route predicted at 35 minutes, *whether or not the model wants to be nice*.\n", "\n", "The task explicitly anchors the score to predicted travel time so the simulator can't dodge." ] }, { "cell_type": "code", "execution_count": 4, "id": "875ee173", "metadata": { "execution": { "iopub.execute_input": "2026-05-27T12:20:43.110897Z", "iopub.status.busy": "2026-05-27T12:20:43.110897Z", "iopub.status.idle": "2026-05-27T12:21:05.145792Z", "shell.execute_reply": "2026-05-27T12:21:05.145792Z" }, "papermill": { "duration": 22.040238, "end_time": "2026-05-27T12:21:05.145792+00:00", "exception": false, "start_time": "2026-05-27T12:20:43.105554+00:00", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
Recommendation ────────────────────────────────────────────────────────────────────────────────────────────────────\n",
       "
\n" ], "text/plain": [ "\u001b[1;36mRecommendation\u001b[0m \u001b[92m────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
The chosen action is to take the B subway line from 7th Avenue in Park Slope to 42nd Street Bryant Park in Midtown \n",
       "Manhattan. This option is justified by its predicted total door-to-door travel time of approximately 35 minutes,   \n",
       "which is the fastest among the considered options. By choosing the subway, we accept the tradeoff of potentially   \n",
       "less flexibility and comfort compared to driving or biking, but gain the benefit of a more predictable and         \n",
       "efficient commute, outpacing the runner-up driving option which is susceptible to traffic conditions and parking   \n",
       "delays. Overall, the subway option offers the best balance of speed and reliability for the Monday morning commute.\n",
       "
\n" ], "text/plain": [ "The chosen action is to take the B subway line from 7th Avenue in Park Slope to 42nd Street Bryant Park in Midtown \n", "Manhattan. This option is justified by its predicted total door-to-door travel time of approximately 35 minutes, \n", "which is the fastest among the considered options. By choosing the subway, we accept the tradeoff of potentially \n", "less flexibility and comfort compared to driving or biking, but gain the benefit of a more predictable and \n", "efficient commute, outpacing the runner-up driving option which is susceptible to traffic conditions and parking \n", "delays. Overall, the subway option offers the best balance of speed and reliability for the Monday morning commute.\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "CHOSEN_SCORE: 5/5\n", "CHOSEN_ACTION: Take the B subway line from 7th Avenue in Park Slope to 42nd Street Bryant Park in Midtown Manhattan\n", "ALL_SCORES: [5, 4, 4]\n" ] } ], "source": [ "TASK = (\n", " \"DECISION: Choose the fastest 8 AM Monday commute from Park Slope, Brooklyn \"\n", " \"to Midtown Manhattan (about 8 miles / 13 km).\\n\\n\"\n", " \"Generate 3 DISTINCT travel options (different modes / routes — e.g., subway, \"\n", " \"driving + parking, bike + bridge, mix). For each candidate:\\n\"\n", " \" (1) state the option concretely (which subway line / which route),\\n\"\n", " \" (2) PREDICT TOTAL DOOR-TO-DOOR TRAVEL TIME IN MINUTES — populate the \"\n", " \" `predicted_metric` field with this number (e.g. 
35.0, 60.0).\\n\"\n", " \" (3) list realistic risks (delays, weather, cost).\\n\\n\"\n", " \"The Python wrapper will compute the 1-5 score deterministically from your \"\n", " \"predicted_metric — you don't need to think about it.\"\n", ")\n", "\n", "result = arch.run(TASK)\n", "\n", "print_header(\"Recommendation\")\n", "print_md(result.output)\n", "print()\n", "print(f\"CHOSEN_SCORE: {result.state['chosen_score']}/5\")\n", "print(f\"CHOSEN_ACTION: {result.state['chosen_action']}\")\n", "print(f\"ALL_SCORES: {result.metadata['scores']}\")" ] }, { "cell_type": "markdown", "id": "b2e1d1ac", "metadata": { "papermill": { "duration": 0.002999, "end_time": "2026-05-27T12:21:05.153600+00:00", "exception": false, "start_time": "2026-05-27T12:21:05.150601+00:00", "status": "completed" }, "tags": [] }, "source": [ "### 8.0 · What just happened, briefly\n", "\n", "Three things to look at:\n", "\n", "- **Score distribution.** Healthy: 2-5 range with clear winner. Pathology: all 4/5 (flat scores → arbitrary argmax).\n", "- **The runner-up.** Mental Loop's value is the *tradeoff explanation* — what was given up by picking the winner over the runner-up. § 9 will highlight this.\n", "- **Risks listed.** Each simulation should list ≥1 risk. If all risks are vague or missing, the simulator is being too optimistic." 
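, "\n",
"\n",
"The score-distribution check above can be sketched in a few lines. This is a minimal illustration, not part of `MentalLoop`: the helper names `score_spread` and `diagnose_scores`, the three labels, and the `min_spread` threshold are all assumptions made for this example.\n",
"\n",
"```python\n",
"def score_spread(scores):\n",
"    '''Return the gap between the highest and lowest score.'''\n",
"    return max(scores) - min(scores)\n",
"\n",
"\n",
"def diagnose_scores(scores, min_spread=1):\n",
"    '''Label a list of 1-5 scores as 'flat', 'tied', or 'clear' (hypothetical labels).'''\n",
"    if score_spread(scores) < min_spread:\n",
"        return 'flat'   # all candidates scored alike, so argmax is arbitrary\n",
"    if scores.count(max(scores)) > 1:\n",
"        return 'tied'   # several candidates share the top score\n",
"    return 'clear'      # exactly one candidate holds the top score\n",
"\n",
"\n",
"print(diagnose_scores([4, 4, 4]))  # the all-4/5 pathology\n",
"print(diagnose_scores([5, 4, 4]))  # ALL_SCORES from the run above\n",
"```\n",
"\n",
"A `'flat'` or `'tied'` label is a hint to inspect the simulations by hand before trusting the chosen action."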
] }, { "cell_type": "markdown", "id": "a36706ec", "metadata": { "papermill": { "duration": 0.002994, "end_time": "2026-05-27T12:21:05.159587+00:00", "exception": false, "start_time": "2026-05-27T12:21:05.156593+00:00", "status": "completed" }, "tags": [] }, "source": [ "### 8.1 · Per-candidate simulations — LLM-score vs Python-score side-by-side" ] }, { "cell_type": "code", "execution_count": 5, "id": "ab09a26d", "metadata": { "execution": { "iopub.execute_input": "2026-05-27T12:21:05.167685Z", "iopub.status.busy": "2026-05-27T12:21:05.166644Z", "iopub.status.idle": "2026-05-27T12:21:05.204618Z", "shell.execute_reply": "2026-05-27T12:21:05.203667Z" }, "papermill": { "duration": 0.041973, "end_time": "2026-05-27T12:21:05.204618+00:00", "exception": false, "start_time": "2026-05-27T12:21:05.162645+00:00", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
 [1] Action  ·  FINAL score 5/5  ·  (LLM said 4/5, source=deterministic, predicted_metric=35.0)\n",
       "
\n" ], "text/plain": [ "\u001b[1;35m›\u001b[0m \u001b[1m[\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1m]\u001b[0m\u001b[1m Action · FINAL score \u001b[0m\u001b[1;36m5\u001b[0m\u001b[1m/\u001b[0m\u001b[1;36m5\u001b[0m\u001b[1m · \u001b[0m\u001b[1m(\u001b[0m\u001b[1mLLM said \u001b[0m\u001b[1;36m4\u001b[0m\u001b[1m/\u001b[0m\u001b[1;36m5\u001b[0m\u001b[1m, \u001b[0m\u001b[1;33msource\u001b[0m\u001b[1m=\u001b[0m\u001b[1;35mdeterministic\u001b[0m\u001b[1m, \u001b[0m\u001b[1;33mpredicted_metric\u001b[0m\u001b[1m=\u001b[0m\u001b[1;36m35\u001b[0m\u001b[1;36m.0\u001b[0m\u001b[1m)\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Take the B subway line from 7th Avenue in Park Slope to 42nd Street Bryant Park in Midtown Manhattan\n",
       "
\n" ], "text/plain": [ "Take the B subway line from 7th Avenue in Park Slope to 42nd Street Bryant Park in Midtown Manhattan\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
     Predicted outcome\n",
       "
\n" ], "text/plain": [ "\u001b[1;35m›\u001b[0m \u001b[1m Predicted outcome\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
The commute will take approximately 35 minutes, with a 5-minute walk to the subway station, a 20-minute ride on the\n",
       "B train, and a 10-minute walk to the final destination.\n",
       "
\n" ], "text/plain": [ "The commute will take approximately \u001b[1;36m35\u001b[0m minutes, with a \u001b[1;36m5\u001b[0m-minute walk to the subway station, a \u001b[1;36m20\u001b[0m-minute ride on the\n", "B train, and a \u001b[1;36m10\u001b[0m-minute walk to the final destination.\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
     Risks\n",
       "
\n" ], "text/plain": [ "\u001b[1;35m›\u001b[0m \u001b[1m Risks\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Delays due to subway maintenance or construction; Crowded trains during rush hour; Inclement weather affecting \n",
       "walking time\n",
       "
\n" ], "text/plain": [ "Delays due to subway maintenance or construction; Crowded trains during rush hour; Inclement weather affecting \n", "walking time\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "data": { "text/html": [ "
 [2] Action  ·  FINAL score 4/5  ·  (LLM said 4/5, source=deterministic, predicted_metric=50.0)\n",
       "
\n" ], "text/plain": [ "\u001b[1;35m›\u001b[0m \u001b[1m[\u001b[0m\u001b[1;36m2\u001b[0m\u001b[1m]\u001b[0m\u001b[1m Action · FINAL score \u001b[0m\u001b[1;36m4\u001b[0m\u001b[1m/\u001b[0m\u001b[1;36m5\u001b[0m\u001b[1m · \u001b[0m\u001b[1m(\u001b[0m\u001b[1mLLM said \u001b[0m\u001b[1;36m4\u001b[0m\u001b[1m/\u001b[0m\u001b[1;36m5\u001b[0m\u001b[1m, \u001b[0m\u001b[1;33msource\u001b[0m\u001b[1m=\u001b[0m\u001b[1;35mdeterministic\u001b[0m\u001b[1m, \u001b[0m\u001b[1;33mpredicted_metric\u001b[0m\u001b[1m=\u001b[0m\u001b[1;36m50\u001b[0m\u001b[1;36m.0\u001b[0m\u001b[1m)\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Drive from Park Slope to Midtown Manhattan via the Brooklyn Bridge and park at a garage near the destination\n",
       "
\n" ], "text/plain": [ "Drive from Park Slope to Midtown Manhattan via the Brooklyn Bridge and park at a garage near the destination\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
     Predicted outcome\n",
       "
\n" ], "text/plain": [ "\u001b[1;35m›\u001b[0m \u001b[1m Predicted outcome\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
The drive from Park Slope to Midtown Manhattan via the Brooklyn Bridge will take around 30-40 minutes, depending on\n",
       "traffic conditions, and then an additional 10-15 minutes to park at a garage near the destination.\n",
       "
\n" ], "text/plain": [ "The drive from Park Slope to Midtown Manhattan via the Brooklyn Bridge will take around \u001b[1;36m30\u001b[0m-\u001b[1;36m40\u001b[0m minutes, depending on\n", "traffic conditions, and then an additional \u001b[1;36m10\u001b[0m-\u001b[1;36m15\u001b[0m minutes to park at a garage near the destination.\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
     Risks\n",
       "
\n" ], "text/plain": [ "\u001b[1;35m›\u001b[0m \u001b[1m Risks\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
traffic congestion on the Brooklyn Bridge; difficulty finding parking near the destination; higher cost due to \n",
       "parking fees\n",
       "
\n" ], "text/plain": [ "traffic congestion on the Brooklyn Bridge; difficulty finding parking near the destination; higher cost due to \n", "parking fees\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "data": { "text/html": [ "
 [3] Action  ·  FINAL score 4/5  ·  (LLM said 4/5, source=deterministic, predicted_metric=50.0)\n",
       "
\n" ], "text/plain": [ "\u001b[1;35m›\u001b[0m \u001b[1m[\u001b[0m\u001b[1;36m3\u001b[0m\u001b[1m]\u001b[0m\u001b[1m Action · FINAL score \u001b[0m\u001b[1;36m4\u001b[0m\u001b[1m/\u001b[0m\u001b[1;36m5\u001b[0m\u001b[1m · \u001b[0m\u001b[1m(\u001b[0m\u001b[1mLLM said \u001b[0m\u001b[1;36m4\u001b[0m\u001b[1m/\u001b[0m\u001b[1;36m5\u001b[0m\u001b[1m, \u001b[0m\u001b[1;33msource\u001b[0m\u001b[1m=\u001b[0m\u001b[1;35mdeterministic\u001b[0m\u001b[1m, \u001b[0m\u001b[1;33mpredicted_metric\u001b[0m\u001b[1m=\u001b[0m\u001b[1;36m50\u001b[0m\u001b[1;36m.0\u001b[0m\u001b[1m)\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Ride a bike from Park Slope to Midtown Manhattan via the Brooklyn Bridge and park at a bike rack near the \n",
       "destination\n",
       "
\n" ], "text/plain": [ "Ride a bike from Park Slope to Midtown Manhattan via the Brooklyn Bridge and park at a bike rack near the \n", "destination\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
     Predicted outcome\n",
       "
\n" ], "text/plain": [ "\u001b[1;35m›\u001b[0m \u001b[1m Predicted outcome\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
The bike ride from Park Slope to Midtown Manhattan via the Brooklyn Bridge will take approximately 45 minutes, \n",
       "considering the distance and potential traffic on the bridge. After arriving, it will take around 5 minutes to find\n",
       "a bike rack near the destination and park the bike. The total door-to-doo\n",
       "
\n" ], "text/plain": [ "The bike ride from Park Slope to Midtown Manhattan via the Brooklyn Bridge will take approximately \u001b[1;36m45\u001b[0m minutes, \n", "considering the distance and potential traffic on the bridge. After arriving, it will take around \u001b[1;36m5\u001b[0m minutes to find\n", "a bike rack near the destination and park the bike. The total door-to-doo\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
     Risks\n",
       "
\n" ], "text/plain": [ "\u001b[1;35m›\u001b[0m \u001b[1m Risks\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
Inclement weather; Bike breakdown or accident; Difficulty finding a bike rack near the destination\n",
       "
\n" ], "text/plain": [ "Inclement weather; Bike breakdown or accident; Difficulty finding a bike rack near the destination\n" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "LLM_SCORES_RAW: [4, 4, 4]\n", "PYTHON_SCORES_FINAL: [5, 4, 4]\n", "LLM_SPREAD: 0 · PYTHON_SPREAD: 1\n" ] } ], "source": [ "for i, t in enumerate(result.trace, 1):\n", " final_score = t['overall_score']\n", " llm_score = t.get('llm_score', '?')\n", " source = t.get('score_source', '?')\n", " metric = t.get('predicted_metric', None)\n", " print_step(\n", " f\"[{i}] Action · FINAL score {final_score}/5 · (LLM said {llm_score}/5, \"\n", " f\"source={source}, predicted_metric={metric})\",\n", " t['action'][:200]\n", " )\n", " print_step(\" Predicted outcome\", t['predicted_outcome'][:300])\n", " if t.get('risks'):\n", " print_step(\" Risks\", \"; \".join(t['risks'][:3]))\n", " print()\n", "\n", "# Also print a one-line summary of the comparison\n", "llm_scores = [t.get('llm_score', t['overall_score']) for t in result.trace]\n", "final_scores = [t['overall_score'] for t in result.trace]\n", "print(f\"LLM_SCORES_RAW: {llm_scores}\")\n", "print(f\"PYTHON_SCORES_FINAL: {final_scores}\")\n", "print(f\"LLM_SPREAD: {max(llm_scores) - min(llm_scores)} · PYTHON_SPREAD: {max(final_scores) - min(final_scores)}\")" ] }, { "cell_type": "markdown", "id": "444825ed", "metadata": { "papermill": { "duration": 0.00499, "end_time": "2026-05-27T12:21:05.215278+00:00", "exception": false, "start_time": "2026-05-27T12:21:05.210288+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 9 · What we just observed\n", "\n", "The cells above ran Mental Loop on the **NYC commute decision** with `scoring_fn=commute_score_from_minutes` — a deterministic Python function that converts predicted minutes → 1-5 score.\n", "\n", "### 9.1 · Quantitative summary\n", "\n", "| Metric | Value |\n", "|---|---|\n", "| Candidates generated | **3** |\n", "| Chosen 
action | Take the B subway line from 7th Avenue in Park Slope to 42nd Street Bryant Park in Midtown Manhattan |\n", "| Chosen FINAL score | **5**/5 |\n", "| LLM-raw score spread | 0 (often flat — the pathology) |\n", "| Python FINAL score spread | **1** (the fix) |\n", "| LLM-raw scores | [4, 4, 4] |\n", "| Python FINAL scores | [5, 4, 4] |\n", "\n", "### 9.2 · Per-candidate breakdown\n", "\n", "| # | predicted_metric (min) | LLM score | **Python score (final)** | Action |\n", "|---|---|---|---|---|\n", "| 1 | 35.0 | 4/5 | **5/5** | Take the B subway line from 7th Avenue in Park Slope to 42nd… |\n", "| 2 | 50.0 | 4/5 | **4/5** | Drive from Park Slope to Midtown Manhattan via the Brooklyn … |\n", "| 3 | 50.0 | 4/5 | **4/5** | Ride a bike from Park Slope to Midtown Manhattan via the Bro… |\n", "\n", "### 9.3 · Patterns surfaced in this run\n", "\n", "- **The deterministic-scoring fix is doing its job.** The LLM's own `overall_score` field on these 3 candidates was `[4, 4, 4]` (spread = 0, completely flat: the familiar LLM-as-Scorer flatness pathology). The **Python scoring function** computed `[5, 4, 4]` from the LLM's `predicted_metric` field (spread = 1), which gives the argmax a real signal to act on. Candidates 2 and 3 still tie at 4/5 because both predicted 50.0 minutes, but the winner is now separated from the rest. This is the central lesson of Mental Loop: **let the LLM predict the underlying number, let Python compute the score.**\n", "\n", "- **Score spread comparison**: LLM=0, Python=1. Python gained 1 point of dynamic range, which is exactly the improvement `scoring_fn` was built for.\n", "\n", "### 9.4 · Final recommendation (verbatim)\n", "\n", "> The chosen action is to take the B subway line from 7th Avenue in Park Slope to 42nd Street Bryant Park in Midtown Manhattan. This option is justified by its predicted total door-to-door travel time of approximately 35 minutes, which is the fastest among the considered options. By choosing the subway, we accept the tradeoff of potentially less flexibility and comfort compared to driving or biking, but gain the benefit of a more predictable and efficient commute, outpacing the runner-up driving option which is susceptible to traffic conditions and parking delays. Overall, the subway option offers the best balance of speed and reliability for the Monday morning commute.\n", "\n", "### 9.5 · The takeaway\n", "\n", "The deterministic-scoring pattern is the canonical fix for LLM-as-Scorer flatness:\n", "\n", "1. **LLM predicts the underlying NUMBER** (`predicted_metric: float`) — concrete, harder to fudge.\n", "2. **Python computes the SCORE** from that number via a deterministic function, so the same prediction always maps to the same score.\n", "3. 
**The argmax now has signal** even when the LLM compresses its own `overall_score`.\n", "\n", "For tasks with a measurable outcome (time, cost, error rate, throughput, accuracy), this pattern removes the \"everything is 4/5\" failure from the scoring step; the ranking is then only as good as the LLM's predicted number. Use it whenever you can express the scoring criterion as Python." ] }, { "cell_type": "markdown", "id": "35830d10", "metadata": { "papermill": { "duration": 0.008031, "end_time": "2026-05-27T12:21:05.229269+00:00", "exception": false, "start_time": "2026-05-27T12:21:05.221238+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 10 · Try other providers" ] }, { "cell_type": "code", "execution_count": 6, "id": "a964e67c", "metadata": { "execution": { "iopub.execute_input": "2026-05-27T12:21:05.237406Z", "iopub.status.busy": "2026-05-27T12:21:05.237406Z", "iopub.status.idle": "2026-05-27T12:21:05.264559Z", "shell.execute_reply": "2026-05-27T12:21:05.263612Z" }, "papermill": { "duration": 0.028193, "end_time": "2026-05-27T12:21:05.265599+00:00", "exception": false, "start_time": "2026-05-27T12:21:05.237406+00:00", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[skip] openai: no API key\n", "[skip] anthropic: no API key\n" ] } ], "source": [ "from agentic_architectures.llm.factory import provider_supports_structured_output\n", "\n", "for p in [\"openai\", \"anthropic\"]:\n", "    key = settings.api_key_for(p)\n", "    if key is None or not key.get_secret_value():\n", "        print(f\"[skip] {p}: no API key\")\n", "        continue\n", "    if not provider_supports_structured_output(p):\n", "        print(f\"[skip] {p}: no structured output\")\n", "        continue\n", "    print_header(f\"Re-running Mental Loop on {p}\")\n", "    r = MentalLoop(llm=get_llm(provider=p), n_candidates=3).run(\n", "        \"Choose a programming language to learn in 30 days for a backend job switch.\"\n", "    )\n", "    print(r.output[:300])\n", "    print(f\" scores: {r.metadata['scores']}\")\n", "    print()" ] }, { "cell_type": 
"markdown", "id": "bb94b915", "metadata": { "papermill": { "duration": 0.004625, "end_time": "2026-05-27T12:21:05.275549+00:00", "exception": false, "start_time": "2026-05-27T12:21:05.270924+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 11 · Failure modes, safety, extensions\n", "\n", "### 11.1 · Where this breaks\n", "\n", "| Failure | Mechanism | Mitigation |\n", "|---|---|---|\n", "| **Flat scores** | All candidates scored similarly → argmax is arbitrary | Compute the score in Python from `predicted_metric` when a metric exists (§ 9); otherwise tighten the rubric (\"most should be 2-4\") or use a different model for the simulator |\n", "| **Optimistic predictions** | Simulator imagines best case for every candidate | Force `risks` field to be non-empty; use a pessimistic-bias prompt |\n", "| **Confabulated specifics** | Predictions include invented numbers (\"you'll save $1,247\") | Disallow specific numbers in `predicted_outcome` unless asked |\n", "| **Narrow candidate diversity** | K=3 candidates are minor variants of each other | Force one \"unconventional\" candidate in the generator prompt |\n", "| **Single-model blind spots** | Same model generates and simulates → shared biases | Use a different model in the simulator seat |\n", "\n", "### 11.2 · Production safety\n", "\n", "- **Mark predictions as predictions.** Never let the simulated outcome (\"you will lose 10 lbs\") get presented to a user as a *guarantee*.\n", "- **Log all K simulations**, not just the chosen one. Auditability — if the choice turns out wrong, you want to see what was rejected.\n", "- **The simulator can be a real model.** For physics / finance / medicine, plug in a *real* simulator (a discrete-event sim, a Monte Carlo engine, an ODE solver) and have the LLM only orchestrate.\n", "\n", "### 11.3 · Four extensions\n", "\n", "1. **Parallel simulation.** `_simulate` runs K sub-LLM calls sequentially; replace it with a parallel fan-out (for example, LangGraph's `Send` API) for up to a K× latency win on slow models.\n", "2. 
**Two-stage Mental Loop.** After picking the best candidate, run a *second* Mental Loop on its first sub-step. This bridges to a **Planning + Mental Loop** hybrid.\n", "3. **External simulator.** Replace the LLM simulator with a calibrated real-world simulator. The LLM only handles candidate generation + final synthesis.\n", "4. **Deterministic scoring (THE most important fix).** When the outcome has a measurable metric (time, cost, error rate), DON'T let the LLM assign the score. Instead: extract just the underlying number via structured output (`predicted_metric: float`), then compute the score in Python via a deterministic function. This sidesteps the LLM-as-Scorer flatness problem entirely. The captured run in § 9 is the clearest demonstration of why this matters.\n", "\n", "### 11.4 · What to read next\n", "\n", "- [**09 · Tree of Thoughts**](./09_tree_of_thoughts.ipynb) — tree-deep version of Mental Loop.\n", "- [**14 · Dry-Run**](./14_dry_run.ipynb) — simulate one proposed action before live execution.\n", "- [**13 · Ensemble**](./13_ensemble.ipynb) — parallel diverse-perspective version.\n", "- [**06 · PEV**](./06_pev.ipynb) — verify each step's *actual* outcome (vs Mental Loop's *predicted* outcome).\n", "\n", "### 11.5 · References\n", "\n", "1. Ha, D. & Schmidhuber, J. *World Models.* 2018. [arXiv:1803.10122](https://arxiv.org/abs/1803.10122)\n", "2. Hafner, D. et al. *Dream to Control* (Dreamer). 2019. [arXiv:1912.01603](https://arxiv.org/abs/1912.01603)\n", "3. Yao, S. et al. *Tree of Thoughts.* NeurIPS 2023. 
[arXiv:2305.10601](https://arxiv.org/abs/2305.10601) — closely related: tree-deep simulation.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.0" }, "papermill": { "default_parameters": {}, "duration": 28.487221, "end_time": "2026-05-27T12:21:06.037758+00:00", "environment_variables": {}, "exception": null, "input_path": "all-agentic-architectures/notebooks/10_mental_loop.ipynb", "output_path": "all-agentic-architectures/notebooks/10_mental_loop.ipynb", "parameters": {}, "start_time": "2026-05-27T12:20:37.550537+00:00", "version": "2.7.0" } }, "nbformat": 4, "nbformat_minor": 5 }