{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "1c6b8e85",
   "metadata": {
    "papermill": {
     "duration": 0.009414,
     "end_time": "2026-05-27T07:36:27.010960+00:00",
     "exception": false,
     "start_time": "2026-05-27T07:36:27.001546+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# 06 · PEV — Plan-Execute-**Verify** with per-step retry\n",
    "\n",
    "> **TL;DR.** Planning (nb 04) trusts every step's result. PEV adds a **Verifier** between Execute and Accept: every step is judged against its own intent, and a failed step triggers a *retry with critique* on the same step. After `max_retries_per_step` failures the step is force-accepted with a `fail-accepted` verdict so the plan still finishes.\n",
    ">\n",
    "> **Reach for it when** tools are flaky, stakes are high, or you want per-step success rate as a quality signal.\n",
    "> **Avoid when** every step is trivially verifiable (use plain Planning, cheaper) or no step has a checkable outcome (use ReAct).\n",
    "\n",
    "| Property | Value |\n",
    "|---|---|\n",
    "| Origin | Plan-and-Solve + verifier composition; canonical example: Hu et al., *Tree-Planner* (2023) ([arXiv:2305.10142](https://arxiv.org/abs/2305.10142)) |\n",
    "| Loop style | per-step verification + bounded retry |\n",
    "| External tools needed? | Optional |\n",
    "| Memory across episodes? | No |\n",
    "| Quality signal exposed? | **Yes** — `steps_passed` / `steps_total` ratio, `total_attempts`, `confidence` per step |\n",
    "| Composability | Executor uses `ToolUse`; Verifier uses `LLMJudge` from the library |\n",
    "\n",
    "This notebook builds directly on Planning (notebook 04). The graph adds **one node** — the Verifier — between Execute and the next step."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4f679f38",
   "metadata": {
    "papermill": {
     "duration": 0.007895,
     "end_time": "2026-05-27T07:36:27.026603+00:00",
     "exception": false,
     "start_time": "2026-05-27T07:36:27.018708+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## 2 · Architecture at a glance\n",
    "\n",
    "```mermaid\n",
    "flowchart LR\n",
    "    A([task]) --> P[Plan]\n",
    "    P --> E[Execute<br/><sub>ToolUse sub-agent</sub>]\n",
    "    E --> V[Verify<br/><sub>LLMJudge w/ rubric</sub>]\n",
    "    V -->|pass + more steps| E\n",
    "    V -->|fail + retries left| E\n",
    "    V -->|pass + done<br/>or budget exhausted| F[Finalize]\n",
    "    F --> Z([final answer])\n",
    "\n",
    "    style V fill:#fff3e0,stroke:#f57c00\n",
    "    style P fill:#e3f2fd,stroke:#1976d2\n",
    "    style F fill:#e8f5e9,stroke:#388e3c\n",
    "```\n",
    "\n",
    "**Three nodes + finalize.** The Verifier is the new addition — it gates each step. The same `Execute` node handles both first-tries and retries; the difference is that a retry sees the Verifier's previous critique in its prompt."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b799b397",
   "metadata": {
    "papermill": {
     "duration": 0.002539,
     "end_time": "2026-05-27T07:36:27.044220+00:00",
     "exception": false,
     "start_time": "2026-05-27T07:36:27.041681+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## 3 · Theory\n",
    "\n",
    "### 3.1 · Why Planning isn't enough\n",
    "\n",
    "A plain Plan-Execute (notebook 04) commits every step's result to history regardless of quality. If step 2's result is *\"I couldn't find the data\"*, plain Planning accepts that and the final synthesis is forced to invent answers from nothing. PEV's Verifier catches this: it judges step 2's result against step 2's intent, rejects it, and lets the executor try again with the failure mode in mind.\n",
    "\n",
    "### 3.2 · The Verifier rubric\n",
    "\n",
    "```python\n",
    "class _StepVerification(BaseModel):\n",
    "    is_satisfactory: bool      # full pass / fail\n",
    "    issues: str | None          # critique text if failed\n",
    "    confidence: int             # 1-5 confidence in this verdict\n",
    "```\n",
    "\n",
    "Two crucial design choices:\n",
    "\n",
    "1. **`is_satisfactory` is binary.** Not a 1-10 score. PEV needs a clear go/no-go signal at every step, not a vague rating. The rubric forces the Verifier to commit.\n",
    "2. **`issues` is mandatory on fail.** The retry can't use a verdict without a critique. The schema enforces this by `default=None` + a description that says \"required iff is_satisfactory is False\".\n",
    "\n",
    "The `LLMJudge` wrapper (from `agentic_architectures.evaluators`) standardises the prompt format — same Verifier code is reused in Reflection (nb 01), CoVe (nb 20), Corrective RAG (nb 24), etc.\n",
    "\n",
    "### 3.3 · The retry-with-critique loop\n",
    "\n",
    "When the Verifier rejects a step's result, the Execute node sees:\n",
    "\n",
    "```\n",
    "You are RETRYING this step (attempt N):\n",
    "  → <step text>\n",
    "\n",
    "Your previous attempt was rejected by the Verifier. Critique:\n",
    "  > <issues text from the Verifier>\n",
    "\n",
    "Re-execute the step addressing the critique. Be concrete and grounded.\n",
    "```\n",
    "\n",
    "This is structurally similar to **Reflection** (notebook 01), except the critique here is *per-step* rather than over a whole draft, and the retry is bounded by `max_retries_per_step`. If you squint, PEV is Reflection wrapped around each Planning step.\n",
    "\n",
    "### 3.4 · The fail-accepted verdict\n",
    "\n",
    "Sometimes a step genuinely can't be completed (the data doesn't exist on the public web; the tool is down; the question is wrong). PEV doesn't loop forever — after `max_retries_per_step + 1` attempts, the step is **force-accepted** with verdict `fail-accepted`, the last failure result is committed, and the plan moves on. The final synthesis prompt is told which steps were `fail-accepted` so it can hedge appropriately.\n",
    "\n",
    "This is the right behaviour for production: an honest \"the search returned nothing useful, here's the rest of the answer with a known gap\" beats either (a) an infinite loop or (b) a hallucinated number.\n",
    "\n",
    "### 3.5 · Where PEV sits\n",
    "\n",
    "| Pattern | Verification step? | Retry on failure? | Replan? | Use when |\n",
    "|---|---|---|---|---|\n",
    "| Planning (nb 04) | no | no | yes (whole-plan replan only) | reliable tools, low stakes |\n",
    "| **PEV** *(this notebook)* | **yes per step** | **yes per step** | optional (extension) | flaky tools, high stakes, need per-step quality signal |\n",
    "| Reflection (nb 01) | yes (over whole output) | yes (refine = retry) | no | single-shot generation tasks |\n",
    "| Multi-Agent (nb 05) | no | no | no | broad domain task |\n",
    "| Self-Discover (nb 19) | no | no | reasoning structure | hard reasoning, no tools |\n",
    "\n",
    "### 3.6 · What goes wrong (you'll see live in § 9)\n",
    "\n",
    "1. **Sycophantic verifier.** Verifier rubber-stamps `is_satisfactory=True` even on weak results. Mitigation: use a *different* model in the verifier seat.\n",
    "2. **Verifier infinite loop.** Verifier keeps rejecting plausible results because they don't match an impossible bar. `max_retries_per_step` cap is what stops this.\n",
    "3. **Critique not addressed.** Executor retries but produces the *same* result. Real failure mode. Sometimes means the step is impossible — accept the `fail-accepted` and move on.\n",
    "4. **Final synthesis ignores fail-accepted.** Synthesis writes a confident answer including the failed step. The library's `_finalize` prompt explicitly asks for honesty about failures, but it sometimes still happens — check § 9 for the per-step verdicts."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c01d5c99",
   "metadata": {
    "papermill": {
     "duration": 0.005686,
     "end_time": "2026-05-27T07:36:27.058870+00:00",
     "exception": false,
     "start_time": "2026-05-27T07:36:27.053184+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## 4 · Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "1a34b5db",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-27T07:36:27.075156Z",
     "iopub.status.busy": "2026-05-27T07:36:27.075156Z",
     "iopub.status.idle": "2026-05-27T07:36:28.317667Z",
     "shell.execute_reply": "2026-05-27T07:36:28.316360Z"
    },
    "papermill": {
     "duration": 1.249058,
     "end_time": "2026-05-27T07:36:28.318915+00:00",
     "exception": false,
     "start_time": "2026-05-27T07:36:27.069857+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">Provider: nebius  ·  Model: meta-llama/Llama-</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">3.3</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">-70B-Instruct</span> <span style=\"color: #00ff00; text-decoration-color: #00ff00\">─────────────────────────────────────────────────────</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;36mProvider: nebius  ·  Model: meta-llama/Llama-\u001b[0m\u001b[1;36m3.3\u001b[0m\u001b[1;36m-70B-Instruct\u001b[0m \u001b[92m─────────────────────────────────────────────────────\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from agentic_architectures import get_llm, enable_langsmith, settings\n",
    "from agentic_architectures.architectures import PEV\n",
    "from agentic_architectures.ui import print_md, print_header, print_step\n",
    "\n",
    "enable_langsmith()\n",
    "print_header(f\"Provider: {settings.llm_provider}  ·  Model: {settings.llm_model}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9ca6d05",
   "metadata": {
    "papermill": {
     "duration": 0.0,
     "end_time": "2026-05-27T07:36:28.318915+00:00",
     "exception": false,
     "start_time": "2026-05-27T07:36:28.318915+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## 5 · Library walkthrough\n",
    "\n",
    "Source: [`src/agentic_architectures/architectures/pev.py`](../src/agentic_architectures/architectures/pev.py).\n",
    "\n",
    "Five key pieces:\n",
    "\n",
    "1. **`_plan`** — same as Planning: `with_structured_output(Plan)` produces 3-6 atomic steps. The prompt explicitly demands \"verifiable steps\" — each step must produce a concrete fact / value / artifact.\n",
    "2. **`_execute`** — pops the next step from the plan (or reuses pending_step on retry). On retry, includes the previous critique in the executor prompt.\n",
    "3. **`_verify`** — uses `LLMJudge[_StepVerification]` with a rubric: *\"contains the specific fact/value the step asks for AND is grounded (URL or computation shown)\"*. The verdict drives the router.\n",
    "4. **`_finalize`** — synthesises the final answer using the verified history; explicitly told to hedge on `fail-accepted` steps.\n",
    "5. **`_route_after_verify`** — pass + more steps → execute, fail + retries left → execute (retry), pass + done OR budget gone → finalize.\n",
    "\n",
    "The Verifier rubric & schema:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "9fa408e6",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-27T07:36:28.335687Z",
     "iopub.status.busy": "2026-05-27T07:36:28.335687Z",
     "iopub.status.idle": "2026-05-27T07:36:28.348279Z",
     "shell.execute_reply": "2026-05-27T07:36:28.347547Z"
    },
    "papermill": {
     "duration": 0.018475,
     "end_time": "2026-05-27T07:36:28.348279+00:00",
     "exception": false,
     "start_time": "2026-05-27T07:36:28.329804+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{\n",
      "  \"description\": \"Verifier's verdict on a single executed plan step.\",\n",
      "  \"properties\": {\n",
      "    \"is_satisfactory\": {\n",
      "      \"description\": \"True iff the step's result fully addresses the step's intent \\u2014 concretely, contains the requested fact / computation / artifact and is grounded in evidence (cited URL or computation shown).\",\n",
      "      \"title\": \"Is Satisfactory\",\n",
      "      \"type\": \"boolean\"\n",
      "    },\n",
      "    \"issues\": {\n",
      "      \"anyOf\": [\n",
      "        {\n",
      "          \"type\": \"string\"\n",
      "        },\n",
      "        {\n",
      "          \"type\": \"null\"\n",
      "        }\n",
      "      ],\n",
      "      \"default\": null,\n",
      "      \"description\": \"If is_satisfactory i...\n"
     ]
    }
   ],
   "source": [
    "from agentic_architectures.architectures.pev import _StepVerification\n",
    "import json\n",
    "print(json.dumps(_StepVerification.model_json_schema(), indent=2)[:600] + '...')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "94336b43",
   "metadata": {
    "papermill": {
     "duration": 0.003937,
     "end_time": "2026-05-27T07:36:28.360874+00:00",
     "exception": false,
     "start_time": "2026-05-27T07:36:28.356937+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## 6 · State\n",
    "\n",
    "The state has a per-step *scratchpad* (the `pending_*` fields) that the Verifier writes to and `_execute` reads from on retry. When a step is accepted (pass or fail-accepted), the scratchpad is cleared and the result is appended to `past_steps`.\n",
    "\n",
    "| Field | Purpose | Reducer |\n",
    "|---|---|---|\n",
    "| `input` | original task | replace |\n",
    "| `plan` | remaining steps (popped on first try of each step) | replace |\n",
    "| `past_steps` | committed step records `{step, result, verdict, attempts, confidence}` | **append** |\n",
    "| `pending_step` / `pending_result` / `pending_critique` / `attempts` | per-step scratchpad | replace |\n",
    "| `response` | final synthesised answer | replace |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e98a4b37",
   "metadata": {
    "papermill": {
     "duration": 0.003267,
     "end_time": "2026-05-27T07:36:28.368153+00:00",
     "exception": false,
     "start_time": "2026-05-27T07:36:28.364886+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## 7 · Build the graph\n",
    "\n",
    "PEV adds one node (`verify`) and one new conditional path (verify → execute on retry) compared to Planning. The compiled-PNG render should show 4 nodes (plan, execute, verify, finalize) with the cycle `execute → verify → execute`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "79364dc2",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-27T07:36:28.384540Z",
     "iopub.status.busy": "2026-05-27T07:36:28.384540Z",
     "iopub.status.idle": "2026-05-27T07:36:31.393921Z",
     "shell.execute_reply": "2026-05-27T07:36:31.393258Z"
    },
    "papermill": {
     "duration": 3.020783,
     "end_time": "2026-05-27T07:36:31.396157+00:00",
     "exception": false,
     "start_time": "2026-05-27T07:36:28.375374+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAGoAAAITCAIAAABg8R7gAAAQAElEQVR4nOydCVwU5f/Hn5k9uO9DEVAEvPAAFQ01JQXSvDUrzaxMy/v4e5RXFmqXpl1qZqmlpf7UNK3UMs2LMvG+D0AQOUTkZhf2mPl/dweWBZfZ2Z2FfWDn/fKFs888c+xnn+f73M9XTNM0EjAXMRLggSAfLwT5eCHIxwtBPl4I8vGCr3ypN8rvXigqfKxQKalyGYVq1IJIGlEEQSKaqhbMhFSFkwjpRyCQ5j7aayuOIUyEaHVlYPX7VFxeearqtpXXVh1okTqJRCTt4ilt3saxfQ8XxAPCvHrfhaOFV/8pKC1SwWtJ7EmJmJA6iNRqilZXv5sIITUyIJ+IgJgESdAUrYumd1rzbZk4iCCQ9g0rPooIpPeIijswl+t+g6oDAjH3ry6fnb24vFytUtEKuZpS0w6OkqD2Tn1f8kamY7J8F48Wnjv2WK1CvgH2kbFezdvZoYZMSR59an/Og2SZWkG17Ojcf1wTky43Tb6ty1NlJVT7KPfeIzxR4+Lm2dJ/fsuBxPjmimBEcL3KBPnWz0v2DbQfNcsfNV6O7869fqaw12DviL5uXOJzlW/tnKR+LzQN6+GMbID185LGLmzp5iUyGpOTfOvmJr35QajUHtkO3yxI6RrrGRnrzh6NRMbY8HZKv5f8bEo7YNLHwYl/5hU8NJK2jMj3w7I0n0C7dt2dkO3Rvb/Xzs/uscdhkw8qd/JS9fMzGnNZwULXGDd7R3L3Fw9Y4rDJd/bI4/ZPcSqAGisvzm6Rc7+MJUKt8l04VkSr6N4jvZAN4+hKOLmJf1mXWVuEWuW7mlDg28IB1S9xcXEZGRnIRJKTkwcPHozqhk69PLLS5LWdrVW+kkIltMlQPZKVlZWfn49M58aNG6jO6BLjBg329NuGFTTc43L3YilBEC3qpj0LNc0dO3b89ttvaWlpLVu2jIqKmjJlysWLFydPngxnhw0bFh0dvXr1akhTe/bsSUxMzMzMDA4OHj58+KhRo5g7xMTETJw48dixY3DVuHHjtm3bBoGRkZH/93//N3bsWGRp7J1EV04XBbYxkBcNy5d6vVRqx7nhZyI7d+7cvHnz7Nmze/Xqdfz48XXr1jk5OY0fP/7zzz+HwP379/v7a8p6UBCEW7x4MfyQqampn3zyiZ+fH1wCpyQSyb59+7p37w4idu3aFSL8+eef8HugusHZTZyfU27wlGH5Ch8r7RyN16jN48KFC2FhYYy1GjFiRLdu3WQy2ZPRPvroo9LS0mbNmiFtyjpw4MA///zDyAd6ubm5zZs3D9ULrt6SjCRTMq+iXC2R1pV84eHhX3311bJlyzp37tynT5+AgACD0SCPQzpNSEiAPM6EMKmSAX4AVF84uIhUSrXBU4blo2iapOsq87788suQW0+cOBEfHy8Wi6G0nTlzpo+PT7UXoKhZs2YpFIrp06dD0nNxcZkwYYJ+BKlUiuoLUpveDZ4yLJ9UIiqXG9bbAm9DkiO0pKSknD17duPGjSUlJZ999pl+nFu3bl2/fn39+vVg4JiQ4uJiX19fZA1kxRRZi3yGcyiMAyjKKVQ3gI2HUhUOoDwdPXr0mDFjbt++XSNOQUEB/NXplaIFWYmix0qxg2GhDIcGtHYoK62r1Hf48OH58+efPHmysLDw9OnTUP8AawjhQUFB8PfIkSPXrl0DZSFfQ42kqKgIit1Vq1ZB/QYqhgZv2Lx589zcXCjEdVbSshTmKTw8DNsKw/J17OkC3YD52UpUByxZsgTUmTNnDlTfli9fDrU8qJ1AOJQhQ4YM2bBhAxQsTZs2XbFixdWrV/v16we1uWnTpkGlD2TVVf30efrppyMiIqAg/uOPP1AdUC5Tt+1ueECu1u7STUvv+fjbDZ3UDNk2N88WH9v1cNqnoQbP1lo7CY1weXC31rae7XDuSJ6Hb62tr1qHyaNHel9LKLj4d2HnWgZNsrOzwfAbPOXs7AyFqcFTkG2hyYHqhu+1GDwFNY/a8hnUjQzaBIaCXMWkD0NrO8s21nF056Pky8VvfRRs8KxKpcrJyTF4qqyszN7ecO8+FAh1V/8o1mLwFBRBrq6uBk9BOPzeBk9t//i+Wo3GLW6OasHIUNHGxSkt2jqZOnjcOHhwu+zAtxlTPw1hiWOkZfbWB8FJl0vKi2xxAu9vm7N6DfFhj2O8YRs3psmWD6xWZbUWW95PDWztGB7tyh6N0zhvXrZy+8q06WtCkW2wYUFKn+E+YVHGJ19xnWVw77rs902Z4X08eg9vzKMf92/Kf/8+s2WY04DXmnKJb9oUoY2L70mlZNwrTf1DGvbEKoNsX5lemKvoPcynQy9XjpeYPEHt4ObstJul9o6i0Ajn3iPMmROHG5dPFF9NyIeGrbef/UtzA0y61szpkSDigySZSkmLxYSzu8TOQTM9UjsBtOpuIjGhVlV81MyQRJWTFaH2JyVUispjCQH3qTwmVUqqMg6pUmiOSVIzx5TS3lkXQSQh1NqrJBJSqQ2RSAilkta/ISmCqxApRpRKc0PoAFZqbyiWiJTlVGmhSl6ihn45UkR4+dmNmuKPTO9CNFM+Bnke+vdI7uPM8qI8hUqlmR9L6U/9ZKbTMseEtvu4cuKcWAy17opTIjFS134M/abw9UhEVExDrYwgEtNqFaF/K/ghVdpfSyRBam1fh4hEaqpCRKSRj1YqNJeA0KSYsHcQeTSVdOrp4d/afEPES756oH///tu3b/fywrS8wn1mPTQNoZ2HcEWQjxeCfLzAXT6lUgmD4ghXsJYPil2kHZlDuIK1fJjnXCTIxxOsXw5zw4eE1McTQT5eCPLxQpCPF7jLJxQd5iOkPl4I8vFCkI8XUG0W5DMfIfXxQpCPF4J8vBDk44XQ48ILIfXxQiQSubjw2mOqrsF9qKiwsBBhDN5ZQyxW6SZzYIkgHy8E+XghyMcLQT5e4F5xEeQzHyH18UKQjxeCfLwQ5OOFIB8vBPl4IcjHC0E+Xgjy8QJ/+XBcVRQfH3/gwAHmxeAvoYUkycTERIQZOE5anzJlSlBQEKkFmr3wF+SrbaM164KjfL6+vnFxcfohIN+wYcMQfmC6ZGLs2LEtWrTQffT39x8+fDjCD0zlgwG2oUOH6hbEPPvss+7u7gg/8F2wM2bMGMbeNWvWbOTIkQhLLFzypt8qv3O+SCartvWaWEJSamaBVQWkiKDUNKQtqrpTIZ1bHsaLTkZGRlJykp+fX+tWrRGqWtgMZQmlXR5NiklKswy7mj8fJrDq5pXY24ubhDh26mlJnxmWlG/L+2nlckosJZRl1V6cFGk389QLY9TRfWeagPdgTtR0zkTRlCYLM4GVLo30fEQ94d5JE41CavJJB0l2DqRCQYtEaPgUf58Ay+zdaTH5Ni681yzEKfoF6+yPyZ3r/xZfOvZo1MwAb0soaBn5vluSGtzRrdsAD9QQUJShXZ+mTFkVjHhjgaLj3OECMEQNRTtAao9cPKQ/f5WFeGMB+VJvlzq6GHfrgxXeAXYFj8oQbyzQZSAvUWkcyjUoCDFtka2BLSAfpXEVWFe7FNcRNFSkVBYw+oKLT15YQj7SFjdHZLCEfFR1D5q2hI1mXm0PLOKPJeQjUAMrd7Wd2BZpbVlAPoKgG1i5azksIB8t2D4B87BE5tU0/Bqa9QOLg0vR0RBzLox/YlJ0gO2jbdX2YTfWMXxk7NZt36EGglB08EKQjxcW6TKgTG13DB4a/fKY8bdv3zh56piTk1PHjp0XLVzu4lxz8eTeff87c+bUzZvXpHZ24Z26TJgwzb+ZZugyftkCaHPFxjz38cr35XJZWFjHyW/NateuA+fnW6zRZgnbR5GmNjtEIvHuPT8NHjzy2F+JKz9ee/9+6ldrV9WIc/XqJQhs3z582bJPF7wTn5+f98GHS5hTYrH4+o0rR/46uOHrbYd+P20ntfvok/eQKViq0Wa1oiM0pHW3yChIA5B2hg0ddfz4EaWy2ugwhG/ZtGvsy+M7R0RCzBdfeAWSYWFRxfpUuUw2f97SZn7+IGVMvwHp6WkGHV3WNVazfaGhbXTH/s0CQbvMzActWrTUBYpEIghZt371zVvXSktLmcCC/Dw3V43vpMDmQY6OjkygszbXFxcX6ULqDUukPpI2o9FhZ1flzMjeQeP6trS0mneohIQTi9+d06ZN2OdrvtXk8U/WVnsmHvvBWqLRZlaPlb5YZXKNRzN7+2r+g387uK9jx4iJE6YxH0tKipHl0BQcFkk5iD80RZne6rh8+bzu+G7SbTBh/v6B+hGKigp9vKvmLJw6dQxZDk3BYYleNgvIR9MkYfqrPMrNgcJXrVZDsfvb73v79n3Wzq6a5wIoWxLPnbl46ZxKpYKYTGD2QwuMbVsQqxUdgweNuH79yvqvNX5lu3TuNmP6/BoR3nhjqkxWuuTdOXK5fOSI0VB3ycrKWLBw5uJFKxA2WGCOyw/xadBl8PzsIO6XDBsR8/zIMa+Om4isRMKB7ORLJdNWhyJ+2G5/H7JEs8MSHVYNscMP8hyNySwDTblh2qvs33cUNQosM1DZ8DKvhbDIQCVqcNlX29+CR+alqYZn+7T1DTyKDlvGQrbPRk2fReRrkDUXy2CpkreBQSDL9NZbKPU1NGhkmd56oejghSAfLywgn9SRbHD5VyKV2DlYYC2KBbpLXd0l5RZYYVKvFOUqpQ4W+O4WuMWAMU1lJQrUoMjNlId24upBmwULyCdyRgHBTjtXpqIGwr616VJ7sucQCyzCs9iC1IvHCxP/yGvSwqF5a2d19WEY/dY5UVnLJqqfojW/ZM2JbprPRM2bsLf1icqbPxkHKnq598sykks9m9kNn+yHLIEll0NfPlF88URemYxSlqurPYOo2TVZpUXlKabTSxeN0ItX4/1q6EhUxmFuVTNQT02plJDYi1q0c44ZbTGP9Lg71x4wYMBPP/0kONc2E8G9MS8E+XiBubcnIfXxAmv5oFijKEokwnelv+AthheCfLwQXD3xQkh9vBDk44UgHy8E28cLIfXxQpCPF4J8vBDk44UgHy8E+XghyMcLQT5eCNVmXgipjxeCfLzA3VuMj48Pwhis5VOr1Tk5OQhjBF9FvBDk44UgHy8E+XghyMcLQT5e4C4f1F0QxgipjxeCfLzAXb4a26rhhpD6eCHIxwtBPl4I8vFCkI8Xgny8wHFV0YwZM06fPq3ba4AkSYqi4OP58+cRZuDoYHbWrFkBAQFkJUirYPPmzRF+4ChfaGjo008/rZ8tIOlFR0cj/MDUvfErr7wSGFi1lSkcjxo1CuEHpvL5+/vHxMQwx2D4IiMjGU/RuIGvc+3Ro0cz3t3h70svvYSwxISKy70rcrlMSTGrl/WWMVeYKKacrLl2WbseXLeEnES1Lm5mPum7dCbs4npMPF52vGObDvIcn2s5RYYfUYMnd0arsbS88iNJIMrQraRSSeuuDogbnCou+9ZlZafJQTIVqFexlFu3Tp5ikjBJEJXnDL9wte9eeTmh9TFI0OjJ9847AAAAEABJREFUCyuP6SfWluvLU1MtQhtS4+nIwG1rXdUvsSMpNXL1lL6y0Li5MC7fvvVZhTmKvqP8PAMt488bf6CH+8j3WcV55W+sCGKPaUS+7SsfwL2GT8XRbNc1Z37Nu3+7aMLyIJY4bEVHXhYqfFRum9oBUUM8IW2d/DmPJQ6bfOf+fGTnaNObXLl529+/U8ISgU2+kuIGtjeQxSEldLmcbaiPLXGpFBT8QzaMWqlWsY612HTe5I8gHy8E+XghyMcLNvlIMUHiuwdIvWBsd102+Wg1hfkGYXWOsa/PKh9NaB1nC9SKYPvYIQjWBCjIxw7kQLb8x1p0iBAevvisBkEa2ZqdTT7oNaRsus0GXd9Gyk42+aDrnLDt1GcUNnk0bmAaWuqLX7bg4KH9qL5obKnr9u0bqB5hk0+TeU2s9qlUqm82fjl+wouDhvR5Z+HMM2dOM+FHjhyMieuelHSH+Xjj5rW+MZEntV47a7sEaZcE7vzf1ucGPQ3/5s6bcvXqJSYcPkK4LtrKVcsmTX4FDuCeWdmZqz5dPmTYM8ypw3/8OnX66xAf/u75ebuprQCjRQdr5jW9xfHlVyvhLUcMf2n7T79G94l5L/7tEyc1Ltni4gZ27dJ99ZoV2tvScBAbM6BP734slwAbv/1q//7dy+I/XbLoAx+fJu8snHH/firL0w8fTIC/8+e9++v+43Dw19HDn6yMb92q7fYfD0ycMA2esnb9amQKRgVgTX0m/lgKheKPP397eczrQ4c87+bqNvC5YTH9Bmzd9i1zdu6cJfdSk8Ew/bJ/d17e41kzF0BgeXl5bZcUFhXu2v3j6NGvdYuM6tUret7cJZFdox7n5XJ/n4MHf+nUqfPsWQs8PDy7dO42/rXJv/yyKz8/z4SvZKzkZU19zKgpZ5JT7oKC3SJ76EIiwrumpCQxDsWbNGn6xvgpkKA2b17/ztvvOzs7Q+CdOzdruyT1XjJ8bNu2PRMuFouXxa/qHBHJ8WUoirp2/bL+nTt37gaBV65eRJzReuSpr1YH48B5xqwJNcLz8x4zDsVHjhj9/Q/fiEXiTh07G72EOWWv54PbJOBXUSqVmzavh3/V7mxK6tN65DG31QFFh0mtDi9PzV7wc+csruHn2de3KXMA9t7Pzx++1cZvv4Q8pbnE26e2SwoK8uFAJis1+lw1ZWA0x97e3tHR8dm4QX36xOiHN/MzYdxVk/oIc1MfVPpManWANIyHZ10Wg58afj/GYX1qasoPWzd++cUmlVI5c/ZE+GJhYR0D/JvXdkloaBvIsJevXGjXrgPSJoSFi2f3jY7r33+wVGonl8t0z01PTzP4PiEhrYtLinV3hp8tKyvD17cJ4o4x95usqYvUer3njIODw+uvTQLDDzUMyDtQgM57e+rnX3yMtJZoxYeLY2Oea9e2fceOETH9+n/48VKosoBMtV0CxjEudiCUvIcOH7h46dxXa1edP/8fIyXoDjFLSjQjsNt+3JSbW7HfAfwSPj6+5ypdcr85YXpCwnEorODpcP9lyxfOmTcZnsL9GxlttLGVrrvWpBc8Uo1Z0BKZQuK5M3v37bxw4ayTk3P7sE7z5r0Lhg++5K5d23766YCri8ZHC2TMseOGjXr+5fGvT67tEqQtl0HKI38dhApgaEhrKHl69OgN4RmZD1avXgEaQfJ86cVxcBau/WbDj3Bq/4E9W77foFIpd2z/zcXZBRLmT9u3/HvmVFmZHO781lsz27YJ4/5dDm1Jz3+omvRRrQpYXr7GhFH5WDusJMjWxzqMtbtYO6yUiMJ6MXI9wGOoyJZdn1bAp7/Pll2fMhh1fG2su9S2U5826Znb6tB0l9p26uM1TC7YPkLjx9fsLgObt33aVoe5mVfAKOzjvCRp6/LysH2UmqKwXoxcD/CYIiRgFEE+XrDJJ7YjpPY2Pc1Aaiexk7L1GLOp4+opxXv/sjqnXKaWOrFJxHYu9nlfhcymy46ix4pW4Ww+uFnzphQ1CXLcvToN2SQHN2dJ7EXd+ruxxDE+FH5yX+6tcyXturl17OOBsbdIS3LnfPG1hAKpHTHmbSPDcpxmEvy9Ozf5crGynFarKEM+vw32LNCG6py08VY0+3rn2m5QLVz/Q9Wx3pJsZoW7IR/pmsYCIZYQvoGOI6Y2RcYwbSKGWoGeLEsgRdYIZLoaqOohzFr8GsWYLkS34nzAgAH7fvkFRmn1L2GOZaWlY8aMmTR58qCBA6tOacXQjwaDgxSBalzOQFd/N92b638FqSmLvk2r94mkyGD2tVSevnv3rm8TD1dXwzMLEhMvKZSlmzZ93aVLB0vtiiN64sAk8KrWXb58OTw8vLaz//33X2FhYWZm5qJFixAe4CXf1atXO3bsWNvZs2fPMptZ3b59e9myZQgD8JLvypUrtaU+yNfFxcXMllZgr//+++99+/Yha4ORfHl5eaWlpbXtFgQJU9/5BEi5ZcuWlJQUZFUwko8l6QHHjx+nqk9YevDgwfz585FVwajHBeRjMXxMQgMFpVKpq6urRCL5/fffkbXBS77p06fXdhZqgsz2hw8fPszPz2/bti3CAIwyL3utZc+ePcxBUVFRfHw8wgNcUh+UDB06dCA4DMu3atUqODgY4QEu8rEnvRp88MEHCA9wybzsFeYagNbp6ekIA3CRj73WUoMbN27s2rULYQAW8kEzFioi3t7eHOP37NnTz88PYQAWtg+SXqdOnbjHb6EFYQAWqc8kw8ewd+9eaOEha4OFfCYZPoaEhAQcNhG3fuZVKBTJycmmtiJGjx5NYDB30/rymZH0gG7duiEMsH7mhUqcSeUGA3Q7b9++HVkb68tnRrkBuLm5ff3113K5HFkVLFKfGZkXWLJkCbOszYpY2fbdu3cPassuLi7IdPr374+sjZVTHwz69OvXD5nFtWvX/vrrL2RVrCxfaGjoiRMnkFn88ccfjx49QlbFypkX5IMhi7KyMuhMRibSo0cP6PtDVsX6RQf0kkI2RKYDHQc+Pj7IqlhfPqi1QN0FmQgMVK5atQpZm4aa+q5fv56WZv2Jhw1VvmbNms2ePRtZG+u3eb28vOzs7KDHFBThfhUmfsew6LAyIwGC4QPFkbVpqPLt3r27aVPjsz/rmgYpH/QUbNq0icRgY1UsfFTCO3Tv3j0xMRE1NLBIfdBvHBYWBnURjvFhoOPQoUMIA3AZ5zUp/548edK8ThqLg4t8JrU9pk+f/tRTTyEMaJCpDzoaYFgdYQAu8gUEBEDXcUFBgdGY0EX43nvvITzAaH6ffgIcPHhwbdFgZI7ZEBAHMHKu3bt3b6jQ0Vr8/f1//fVXg9Hy8/OlUqmTkxPCAOu3eQcOHJidna3d5lK7V6PGxQPNMoXFw8MDYYP1M++HH34IzS/9KQNwHBlZ6w65o0aNsvr4pA7ryxcRETFu3DhmH2IGT09PsIMGI0M3gUKhcHDg6j+3rsGi6BgzZkxsbKxYXGFJQB1ohBiMCb3zO3bsQNiAS8kLY95Qc2bKDajW1Va2QnUPk0KDAaOKy5o1a4KDg0UiEcv0H6jx/f333wgb+FZcjvyUk3qjVKmg1Kra7sNpAbj+G6End93isAqdLeYTgSIRIZKQvn52I2aa0MX9JLzk+3Pbo/Q7pa27uIVFeTBVoIpF4XSlk/DKmHoL4Sui6QJ1IcyB/sJuXSBZfTc3/XAK1fRojp7YOolZX64PPCXlSsn1hAKaVL+6xPx5vubL9781GWUl1MhZgaghc/SnnMdZMnYP2iyYafse3lPkZ5c3dO2AmLG+YCqO7XyMzMJM+f77I8/RpZHsf+Xt5/DgrpmzzM2UT1aiJCWNZGNOezeyvEyJzMLMFFQuVzcaF260glKWI/MQNqDjtb2tIB+zPS4yDzPlI8jGs6kun+2pzZRP4z2wschnhcxLkpq2FWoc8PgeZspHUXSDc19ZF5hr+zTmtpHU+/jsLW+u7aNR4/H6zmNveaHioq211HPqa0zQ9Z/6tCVvY4GHFTezy0Bj+BqNfjysuNnyWaHoeO/9t+fOm8Icn044/uZbL/eNibx+/QriBx+fQg3J9vXpE6NUVrhI3LHzB/gF16ze0KIF3w2ZrNBoswox/apWoMpkpeGdunB318sGierb9kHRYdLE7BmzJrz9TrXN5RYunj11+uuI1Tf5sBExP/+8Y9b/vQmZtKi4iMm8EB8+pqam7D+wBw4+WRk/cHBvCNRdBZfE9Y9C3KGQFWyfSd2lfaPjzl84q9t5pays7Ny5M7H9BiBW3+QwKP7bwX2hoW1WrVzn6FAxcC4Wi/8+ei4oKHjY0FFwMOGNqXK5/NTpqsHfE6eO9n66L6oXeJS8phAdHUtR1KnTx5iPYPjh4zPPxLH4JkfaGoWrq9uMafMiuz6lm8JRA29vn26RUceO/cF8fPw49+rVS3GxAxFnNPUW0sz0Z6Z8hEjb5ccZLy/viPCuujSSkHC8a5funp5eLL7JmY9tWhv3Bztw4PAz/51mLjl+4i83N/fu3XsiE6AJ2kzrZ26bV41M7XGBtLZ23aeQbUUi0b9nTs2c8Tbi4M5cymEP9Kd7PePk5HzixF+QhE+eOvps3CCRKb4JtJUwZB7mtzqQiYB8YOb++fckKKLJudFxiNU3Oecba6zhcwOGHvnrIJjOK1cuzprxDjIRs4sOc1OfxgOcSVcgSE2QYc+e/ae8vKxXz2hmDhWLb3JkCoMGjdj5v627dv/YulXb4OBQZCL13mgzq7MeCpArVy6cP/8fpEQmhMU3uUkE+AeC0fx5747+zw5GplPfqc88IMOu+exDSG6Q+nSBo196NSSk9fad3+t8k8+duwSZTs+efa5dvxwTMwCZjtmpz8wpQj8sT4V636jZQQgboB7u4uK6aMEyZCKn9z28d61k6qchyHQafH9fSUnJ3aRbFy8mXr92efOm+t7Q1OyxDgIT37NpaSlz5k728fGNj1/l7V3f+5KYPdaBS39f+/adoOmGrITQWc+L+qs2YwufEVfzh8kbzwQ1Ht3mQublhbny4VLwWhlzbR9BUHh5iTIfTc9bPQ8VNSrbRwmTNKyEIB8vzJRPIiFUjcVxNCkhReYusjDT/js4SQnUSJz1KsuQRGrugC0yizbdXGXFCtQoyH0g9wk0c3m6mfKFRTnZOYj+2voQNXDuX1eUy9VDJjZBZsFrQeoPH9yXikXPvuYvxWVvAdM4sTsn/U7xlJXmdJQy8F0Ovf2TBwW55aSIUCmq34eoUZmitT64Cd0pqKzSOufOBDNJsdJFNKkN1F5OMP/rnay2Nlh3YeU9dPE1+5lo48GYlmYOFdJ+Jmlau7IXij6oujo4i15/LwjxwDLb4Fz/p1Qur7YujEDVVi5oOjXoihBmpxb9v0ynB12pINMgZCLv27evf/8Bjo4OzD2oipXm0OFDVd2LeQCtPa9RSKM9oRWyQkqS+aZUxfppGD6WiFuFuzq4IZ5gtIuQQWJjY/fs2ePu7o6wBHf5kpKSgoODcXaVNJoAABAASURBVNhn0yC4y4c5uHebvPbaa2o1vu0b3Nu8N27cMGm+Tz2De+a9e/eu1b1KsCDYPl5gbfvKysomTpyIMAZr21deXp6amoowBuvMC2UuyBcSYn6btK4RbB8vsLZ92dnZc+fORRiDte2Ty+X3799HGIN15oWiAxIgJo6gDSLYPl5gbfugyfH+++8jjMHa9hUVFWVlZSGMwTrzymSy/Px8f39/hCuC7eMF1rbvwoULq1evRhiDte0rKCjIyclBGIN15i0uLi4pKfHz80O4Itg+XmBt+xISErDa4P9JsLZ9UGu5ffs2whjcbV9paSkOvihrQ7B9vMDa9iUmJn7++ecIY7C2fTBUhIMHbRawzrxg+KDmLLR5Gy1Y275bt24tX74cYQzWtk+pVKakpCCMEcY6eCHYPl5gbfsyMjKEcV7zoShKsH3mo1KpIAEKtq/RgrXtKywsnDRpEsIYrG0fSZJ37txBGINj5oUUx5QYUHRA1Q/eEA6gCn3unNV2C6oNHDPv6NGjodCArmbIvNDpAgqCdngOGOEoX9++fVu3bq2fLeAYz/n1mBYd48eP9/Ly0n308fEZM2YMwg9M5YuKigoLC2MSIBi+kJAQFp/HVgTfiosuAXp4eLzwwgsIS/CVLzw8PCIiAsoQaHU888wzCEtMq7goStDer9OL89VKJWSpJy5kVnOjJ4Npg5u8adeXI0NULhjXenrXrHmuuJzWLmeuZdOQJ5+uF6J9FMHyXImUdHQSRcZ5tnvKGXHGhGpzaaF66wdpHk2k7Z5yF0tJNaVi3qvSPTZRuQK8MrTyNStWyz/hHJsgiardnysiMzfRrkVn0D/WHFZ3r1ftKsZlmJ42lc/R3kV/bbuBXZckhPhBcunJXx6pKbpDDxfEDa6p73ai7Nju7FcW8/Utgj87Pkn1D3EYNIHT3hpcbd+Jfdmdn/FGNsCYd4LSbpZwXEPMSb475+Q0hdr3ckW2gYOz5PD3nOYVcrJ96cmlIrENbXdo70QW5JVxiclJPpVCqSi3oW7BMrmK446IwgZ0BiCYTXM4IMhnAE0NkVtm41R0ECKSaEROFY0CX5ajKyFOqY+GqiRtQ0WH1uWskHnNRbshv+WKDq0vaGQ7CEUHLwhmvzwOcJOPtq3hYJqz7eNUwNBVWw7aCFzrGULmNQRBc/RC0Vg2nrco2k5KyxUdpIgkG8kmzdzgXHHhlPooNU2ZuIeeTCb78OOlg4b0efud6SkpSX1jIq9cuYjMZfjI2K3bvoODn/fujInrjrCBo+0zudi9eu3SkSMHp02dExEe6e7u8eq4iSa5nayNsHYdxr1S55uCadxpWLfeJ5NpPBnHxjwH2sHB+NcnI0vQrl0H+IfqGIqmrdlhtf/AHsbP5Ijn47pFRk2eNHvCm6O/+OzbTp06xy9bAIUayPrxyvflcllYWMfJb81iFLl3L/nAr3suXEzMzs4MahE8cODwYUNH1bgzZN71X685euRsQsKJJUtrztvd9sPegIDmMLa5afP6M/+dzsnJ7tAhYsSwF6OinkamIBKRiODkjoRjo820rWvha7u6ui1bvnDfz0cg9YHtq3qeWHzl6kWohG/4epuvT5NFi2d/9Ml7W7//GU6tW78ahJszZzHoe/9+6hdfftKkiV/UU70MPqJDh/A1qzfoPsK1pSUlXl4aN3dffrXy0OEDM6bPj46OTUg4/l7824sWLo/uE4M4o1ZTlkx9FJQcaou1OuQy2fx5SxkXsjH9BkAyhHIGPr777keQ5f2aNkNaf7OHDx84m/hPbfK5ubnrfNJCYs/ISF/75RYHBwd9b91wauBzw65du7x127cmyUdwHoDklvr4OTOrQWDzIJ37XWdnzYhqcXGRJoSm9+7d+d/ZhPT0NOasn5/x1WxJSXfWrvt08aIVISGaKVgGvXVDYiwsKmTcTXOBRjSyZJvXohg0BBRFLVg0S6lUvDlxekREpIuzy5Met5+kqLhoydI5w4a+8Ex0LBNi1Fs3tzfkGBGbRtudu7du3br+6ar1XbtUVOtACB9vX/arVqxYBPZxyuTZuhCLeOvWJjxLtjqIum51FBYWwF+dXqmpKfCvZRDbtpvbd3yfci9p07c79Td2toi3bk3JYcmiQ61peKC6BGoqUCj/b9e2SZNmFeTnfbV2FdR4sh/Wuv/X5csXvv1uLVTLQUFdoH+zQF/fJoy37hbNW7ZpE/bvmVNwDOHL4lehOoB7q8NiRYdBmjRpCub/h60bhw3vB/lu8cLlj/Ny310677Xxo37YsufJ+FC8Ik19ZY1+4PRp854fOZq/t26t/1JOyYVTCf3HtqzkK/JxSxr//CCGPV+kgjLjl7YwGlPo7zOAhUveqvmJNgJd4YzGKJxTnw0NdWi7DCxYbaaFdYO1INg+Xgi2zwAE54oLt4FKW8u9BFfv19wabaRtzbCiKWTJcV7NGg5bmmGlMVXcfF9zKzpsqtLH+Cu06BwXJGAQoeLCC07yiaDsIG0oBYrFJFJbrtHm6CIRiW1oNgwMLEodONk+TqL0HOCpUnArihoFsmJVUBtO6yq5pSkpcvO2+/3bDGQDXDpWCIaq+0BO40omTBvdvjKdUJODp+K7FSZ/zhx4nHKtcNInXDuGTZt1++PHD0ryFBI7EmqVSlW17Kx1YA0VbL0QbctRV99mFtoyPqFrXAi1fN0yXEJE0Gq6wvW2tudS7546h9AVVVEC6a0Y1gJDWpo2A11tSTFJVCw1YJxra16MqHioTgGxPakuoyV2xBvLghBnTJ60XPiQTjyaKytWUiq6hgrwmrSqmkNtiqiaK1KhCGQMqrp88IXVevKJEa1COvlS01ICAlqImbE0klnwUxmTGU+kNeF05RpzUqy5W5V82lO0CJFaTSueRWq/t5ogtDdk7uboJGnRwalNpBMyBdznfD/77LM7d+709PREWIJ7tVmpVEokEoQruMunUqlg/BfhiiAfLwT5eIG1fGq1WiQS4TxQgLV8mCc9JMjHE0E+Xgjy8QL3Lf8F+cxHSH28EOTjhSAfLzDvL0BC6uOJIB8vBPl4IcjHC6Ho4IWQ+nghyMcLQT5eYP1yMIjq5sZ1Ea5VwN3RWElJCcIYvLOGWAz5F2GMIB8vBPl4IcjHC0E+Xgjy8UKQjxeCfLwQ5OOFIB8vBPl4IcjHC0E+Xgjy8UKQjxeCfLzAcVnMtGnTcnJySJKEkbb79+/7+/vDS8LxoUOHEGbguEo3NjY2IyMjOTkZtIOPcJyZmalQKBB+4CjfiBEjAgOrbZ0JvfahoaEIPzBdI/7qq6/qb5fp5OT04osvIvzAVL5BgwYFBQVR2rWo8Ldly5YxMSZsvFxv4LtDwWuvvebu7g4H9vb2gnNtk4Hk1rp1ayhzmzVrNmTIEIQllqm4JB7JT71eWlaqVpRpMpxaVfOehHY7RkLvY+WSZvrJ7Z10a8MpWq1UaCbXi0Riw5vPanxFE5Xr1Kv+Vj6FRnoew0hSs5m6xI50dpM0bW7Xe7A3kiKe8JLvzKH8a6cL5TIVQSKRmJTYS8QSEUESlOoJ5yiExo82YUA/A1sUVW2BxJzVRq5lXySicoE5Mo6IhHegVBT8U6vUaiVl7ygKbOPUf5wvMhcz5bt4tOjM4UeQoJzcHP3be0kcGqQvnowbj4tyShFNtQxzHvA6J2/aNTBHvi3xqbJitU+gh29rrCegcKQopyzzZg5kj0kfmbyzssnyrZ+fbOcoDYlqhhoXGddy8zKLh0/1D2zlwP0q0+RbPy/ZN8jTO6RxOilXq9GtY6mvLAhy8+Vqi0yQD7QL7NjUxdceNWpuHEvt/0rTkHBOO5Jwrfd9szDFI8Ct0WsHtO3R/NDWLI6ROcm3c/UDRIr82nggG4B0IF19nDcuvscpstEY+dmq3MzyNk8HIJuhebiPSkH/vfOR0ZjG5du7Pt3R1Q7ZGE1CPG+eKzIazYh85XJaVqwK7u6HsKSkNH/eu09duvoXsjReLVygMXNq/2P2aEbkO/hdptTeRvc3dXRzuHuhmD2OEflyHpQ7e5jg5acx0TTEA3IeexwjKUupopqE1lWBW1T8+NdDn6emX1Eoytq0ioqNfsPXpwWEZz1MXr325ZmTNh87+cO1myfcXH0jOsYNjJvGuHS6eOXPw0e/kcuLwtr2ju41FtUZ9u6aDplrCcUdernUFoct9SVflkEXj7huugPUavWGzVOTUy88P2TB3OnbnZ08v9z4Ru7jB3BKLNKsY9u9/6POnfp//N7pl0fFn0j46fJ1jYHLepi0fc/SyM4DF8z+OTJi0P7fV6O6RCwV3b/FtjKCTb7s+2WkqK42QLp3/1JObuqYUfFtW/dwdfEaMmCmk6P7qX936iKEt+8X3iFGLJaEtOzi5eH/IOMWBP7z38/ubk3jnpng6OgaGtz1qcjhqC4RiUWlxWx73rJlXnmJqu72j0pNuywSSVoFV3hTgweBTCmpVS6QA5q10x3b27vIyzRWPDcvvWmTqn6RQP8wVKeQSFHG5tiZ3fbRFI/OVHbkZSXQYwnVDv1AZ6cqO6vdGrMmMlmRt1fVGKZUakLviBkQyMje9WzyOblICUKG6gYXZy/48m+MrWa8jPqxhTyrVJbpPpaXl6K6hKJoiZTN9LPJ5xfscPF4Hqob/P1aKxRyd/cm3p4VzcHHeRn6qc8gHu5+N26dgvEURugbt0+juoRSUi4ebL0kbL92UJg95F1FkYle3bnRKqRb21Y9dv/yQX5BdklpQcJ/e77Y8PrZC7+yXxXePhZaGr/8vhr62ZJSzv/z3x5Ul6hVVMsObD1XRup9MC71MDU/sJM3qgPeeGXNv4l7f9y1JC39qo93iy7hA3r3eIn9kjatnhrcf8a/Z/fOXxoFRfDYF+LXfTepjhxiyPMUYPfaRLLtvm6ku/TAN5mZ98rbRjdHtse9cw9hzPD199i+uxFTPXRSM7WyTjIv/sgK5GFRRsbCjHcHuHhKk89mhnQ3PDYEVnzpR3EGT6lUCqjZGaw5NvUJnv7Wt8hybNo25979ywZPKZXlEomBDjepxH7p27+jWsi+Uwij6t37G5GPw1iHGq17J6l9TMvazuflZxoMLysrsbc3bDhIUuzuZv7g9JMUFeWq1IYnAJbKipwcDY5tEZ4etXbE3Tya1jXWvfsAI7uVcxoqOvBNVlZaeZvegcg2SDufrVYq34gPMhqT01jH0El+IpJ+cDUH2QDFuWWywnIu2iHuI20TV7QszpVl3qyrWjQ+pF3KevW9uvHXsXHRPQc3p8BOXqgxUvxQnno5e+qnoSLOXXQmT9L4ZiGM4BFt+jQ2O3gvMau0sOytj0KlpsxaM2eK0I5VDx5nlbn6ODWPsGTpaS0eJhXmpRfYOZAc7Z0+Zk5Qe3Cn/NAPGTAOZ+8i9W7h7u7X8MZD5IXK7LuPy4rLoWOs09MePQebMybBa3pk0mX5v78/KsxVEKSmdiyWiEmJSDMpVM+NEa2d3KjzLMT8oTXzPCtjkMzMU701nhEIAAAAp0lEQVRpoRCF1F5V/dVoxu9r9ReuFqg/a5U5U+2e8JmEXgCkpjU+S2nawVnUPsr9qefckblYZnLugztlN88WFuSqlOWUUkmpFXr31M2qrXC7VNNXkUgMHRtVzokYoDuKplHNVyO1stfsPGek1z6B1Mxh1R4xjooqP6KKs2IpPI5w97YL6eRsqlsig+DuqwhzbHQI3FII8vFCkI8Xgny8EOTjhSAfL/4fAAD///YFhOgAAAAGSURBVAMAXtNZHRoAElIAAAAASUVORK5CYII=",
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "---\n",
      "config:\n",
      "  flowchart:\n",
      "    curve: linear\n",
      "---\n",
      "graph TD;\n",
      "\t__start__([<p>__start__</p>]):::first\n",
      "\tplan(plan)\n",
      "\texecute(execute)\n",
      "\tverify(verify)\n",
      "\tfinalize(finalize)\n",
      "\t__end__([<p>__end__</p>]):::last\n",
      "\t__start__ --> plan;\n",
      "\texecute --> verify;\n",
      "\tplan --> execute;\n",
      "\tverify -.-> execute;\n",
      "\tverify -.-> finalize;\n",
      "\tfinalize --> __end__;\n",
      "\tclassDef default fill:#f2f0ff,line-height:1.2\n",
      "\tclassDef first fill-opacity:0\n",
      "\tclassDef last fill:#bfb6fc\n",
      "\n"
     ]
    }
   ],
   "source": [
    "from IPython.display import Image, display\n",
    "\n",
    "arch = PEV(max_retries_per_step=2, executor_rounds=4)\n",
    "graph = arch.build()\n",
    "display(Image(graph.get_graph().draw_mermaid_png()))\n",
    "print(arch.diagram())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f64827f0",
   "metadata": {
    "papermill": {
     "duration": 0.004398,
     "end_time": "2026-05-27T07:36:31.405195+00:00",
     "exception": false,
     "start_time": "2026-05-27T07:36:31.400797+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## 8 · Live run\n",
    "\n",
    "Concrete task: a multi-step computation where each step has a *checkable* outcome. The Verifier should accept the lookup steps (they return concrete numbers with citations) and may flag the computation step if the executor doesn't actually show the arithmetic."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "09148f02",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-27T07:36:31.415262Z",
     "iopub.status.busy": "2026-05-27T07:36:31.415262Z",
     "iopub.status.idle": "2026-05-27T07:37:29.379984Z",
     "shell.execute_reply": "2026-05-27T07:37:29.379984Z"
    },
    "papermill": {
     "duration": 57.972112,
     "end_time": "2026-05-27T07:37:29.383619+00:00",
     "exception": false,
     "start_time": "2026-05-27T07:36:31.411507+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">Final answer</span> <span style=\"color: #00ff00; text-decoration-color: #00ff00\">──────────────────────────────────────────────────────────────────────────────────────────────────────</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;36mFinal answer\u001b[0m \u001b[92m──────────────────────────────────────────────────────────────────────────────────────────────────────\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">The population density of Singapore is approximately 7,574 people per square kilometer (5,450,000 / 720.2 km²).    \n",
       "Note that the calculated density provided earlier (8,437 people per square kilometer) was not actually calculated  \n",
       "in the log, so the correct calculation is provided here. The population and land area values are based on data from\n",
       "https://www.singstat.gov.sg and https://www.singstat.gov.sg/, with additional population information available at  \n",
       "https://www.worldometers.info/world-population/singapore-population/.                                              \n",
       "</pre>\n"
      ],
      "text/plain": [
       "The population density of Singapore is approximately 7,574 people per square kilometer (5,450,000 / 720.2 km²).    \n",
       "Note that the calculated density provided earlier (8,437 people per square kilometer) was not actually calculated  \n",
       "in the log, so the correct calculation is provided here. The population and land area values are based on data from\n",
       "https://www.singstat.gov.sg and https://www.singstat.gov.sg/, with additional population information available at  \n",
       "https://www.worldometers.info/world-population/singapore-population/.                                              \n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">steps: </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">3</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">  ·  pass: </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">  ·  fail-accepted: </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">  ·  total attempts: </span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">5</span> <span style=\"color: #00ff00; text-decoration-color: #00ff00\">───────────────────────────────────────────────────</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;36msteps: \u001b[0m\u001b[1;36m3\u001b[0m\u001b[1;36m  ·  pass: \u001b[0m\u001b[1;36m2\u001b[0m\u001b[1;36m  ·  fail-accepted: \u001b[0m\u001b[1;36m1\u001b[0m\u001b[1;36m  ·  total attempts: \u001b[0m\u001b[1;36m5\u001b[0m \u001b[92m───────────────────────────────────────────────────\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "TASK = (\n",
    "    \"Compute the population density (people per square kilometer) of Singapore. \"\n",
    "    \"Required steps: (1) look up Singapore's population (latest available year), \"\n",
    "    \"(2) look up Singapore's land area in km², (3) divide population by area to \"\n",
    "    \"get density. Cite a source URL for the population and the land area.\"\n",
    ")\n",
    "\n",
    "result = arch.run(TASK)\n",
    "\n",
    "print_header(\"Final answer\")\n",
    "print_md(result.output)\n",
    "print()\n",
    "print_header(\n",
    "    f\"steps: {result.metadata['steps_total']}  ·  \"\n",
    "    f\"pass: {result.metadata['steps_passed']}  ·  \"\n",
    "    f\"fail-accepted: {result.metadata['steps_fail_accepted']}  ·  \"\n",
    "    f\"total attempts: {result.metadata['total_attempts']}\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "16ed7755",
   "metadata": {
    "papermill": {
     "duration": 0.0,
     "end_time": "2026-05-27T07:37:29.395836+00:00",
     "exception": false,
     "start_time": "2026-05-27T07:37:29.395836+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "### 8.0 · What just happened, briefly\n",
    "\n",
    "Three counts to inspect above:\n",
    "\n",
    "- **`steps_passed` / `steps_total`** — fraction of steps that satisfied the Verifier on at least one attempt. 100% means the Verifier never rejected anything (could be sycophancy or genuinely clean execution).\n",
    "- **`steps_fail_accepted`** — steps the Verifier kept rejecting until retries ran out. This is the *honest* signal that the agent couldn't fully complete the task.\n",
    "- **`total_attempts` − `steps_total`** = total retry-rounds across all steps. If this is large, the Verifier is doing real work."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4c5849a6",
   "metadata": {
    "papermill": {
     "duration": 0.019912,
     "end_time": "2026-05-27T07:37:29.415748+00:00",
     "exception": false,
     "start_time": "2026-05-27T07:37:29.395836+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "### 8.1 · Per-step verification trace"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "939650bd",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-27T07:37:29.433191Z",
     "iopub.status.busy": "2026-05-27T07:37:29.430417Z",
     "iopub.status.idle": "2026-05-27T07:37:29.472099Z",
     "shell.execute_reply": "2026-05-27T07:37:29.472099Z"
    },
    "papermill": {
     "duration": 0.056351,
     "end_time": "2026-05-27T07:37:29.472099+00:00",
     "exception": false,
     "start_time": "2026-05-27T07:37:29.415748+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">›</span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span><span style=\"font-weight: bold\">] ✓ pass  (</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">attempts</span><span style=\"font-weight: bold\">=</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span><span style=\"font-weight: bold\">, </span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">confidence</span><span style=\"font-weight: bold\">=</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">5</span><span style=\"font-weight: bold\">/</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">5</span><span style=\"font-weight: bold\">)</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;35m›\u001b[0m \u001b[1m[\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1m]\u001b[0m\u001b[1m ✓ pass  \u001b[0m\u001b[1m(\u001b[0m\u001b[1;33mattempts\u001b[0m\u001b[1m=\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1m, \u001b[0m\u001b[1;33mconfidence\u001b[0m\u001b[1m=\u001b[0m\u001b[1;36m5\u001b[0m\u001b[1m/\u001b[0m\u001b[1;36m5\u001b[0m\u001b[1m)\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">step: Look up Singapore's population <span style=\"font-weight: bold\">(</span>latest available year<span style=\"font-weight: bold\">)</span> from a reliable source such as the World Bank or \n",
       "Singapore Department of Statistics, and record the value.\n",
       "</pre>\n"
      ],
      "text/plain": [
       "step: Look up Singapore's population \u001b[1m(\u001b[0mlatest available year\u001b[1m)\u001b[0m from a reliable source such as the World Bank or \n",
       "Singapore Department of Statistics, and record the value.\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">›</span> <span style=\"font-weight: bold\">    result</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;35m›\u001b[0m \u001b[1m    result\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">5.45</span> million  <span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://www.singstat.gov.sg</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;36m5.45\u001b[0m million  \u001b[4;94mhttps://www.singstat.gov.sg\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">›</span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">2</span><span style=\"font-weight: bold\">] ✓ pass  (</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">attempts</span><span style=\"font-weight: bold\">=</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">1</span><span style=\"font-weight: bold\">, </span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">confidence</span><span style=\"font-weight: bold\">=</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">5</span><span style=\"font-weight: bold\">/</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">5</span><span style=\"font-weight: bold\">)</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;35m›\u001b[0m \u001b[1m[\u001b[0m\u001b[1;36m2\u001b[0m\u001b[1m]\u001b[0m\u001b[1m ✓ pass  \u001b[0m\u001b[1m(\u001b[0m\u001b[1;33mattempts\u001b[0m\u001b[1m=\u001b[0m\u001b[1;36m1\u001b[0m\u001b[1m, \u001b[0m\u001b[1;33mconfidence\u001b[0m\u001b[1m=\u001b[0m\u001b[1;36m5\u001b[0m\u001b[1m/\u001b[0m\u001b[1;36m5\u001b[0m\u001b[1m)\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">step: Look up Singapore's land area in km² from a reliable source such as the World Bank or Singapore Department of\n",
       "Statistics, and record the value.\n",
       "</pre>\n"
      ],
      "text/plain": [
       "step: Look up Singapore's land area in km² from a reliable source such as the World Bank or Singapore Department of\n",
       "Statistics, and record the value.\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">›</span> <span style=\"font-weight: bold\">    result</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;35m›\u001b[0m \u001b[1m    result\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">720.2</span> km² <span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://www.singstat.gov.sg/</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;36m720.2\u001b[0m km² \u001b[4;94mhttps://www.singstat.gov.sg/\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">›</span> <span style=\"font-weight: bold\">[</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">3</span><span style=\"font-weight: bold\">] ✗ fail-accepted  (</span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">attempts</span><span style=\"font-weight: bold\">=</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">3</span><span style=\"font-weight: bold\">, </span><span style=\"color: #808000; text-decoration-color: #808000; font-weight: bold\">confidence</span><span style=\"font-weight: bold\">=</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">5</span><span style=\"font-weight: bold\">/</span><span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">5</span><span style=\"font-weight: bold\">)</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;35m›\u001b[0m \u001b[1m[\u001b[0m\u001b[1;36m3\u001b[0m\u001b[1m]\u001b[0m\u001b[1m ✗ fail-accepted  \u001b[0m\u001b[1m(\u001b[0m\u001b[1;33mattempts\u001b[0m\u001b[1m=\u001b[0m\u001b[1;36m3\u001b[0m\u001b[1m, \u001b[0m\u001b[1;33mconfidence\u001b[0m\u001b[1m=\u001b[0m\u001b[1;36m5\u001b[0m\u001b[1m/\u001b[0m\u001b[1;36m5\u001b[0m\u001b[1m)\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">step: Divide the population by the land area to get the population density in people per square kilometer.\n",
       "</pre>\n"
      ],
      "text/plain": [
       "step: Divide the population by the land area to get the population density in people per square kilometer.\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"><span style=\"color: #800080; text-decoration-color: #800080; font-weight: bold\">›</span> <span style=\"font-weight: bold\">    result</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\u001b[1;35m›\u001b[0m \u001b[1m    result\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">The population density of Singapore is <span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">8</span>,<span style=\"color: #008080; text-decoration-color: #008080; font-weight: bold\">437</span> people per square kilometer.   \n",
       "<span style=\"color: #0000ff; text-decoration-color: #0000ff; text-decoration: underline\">https://www.worldometers.info/world-population/singapore-population/</span>\n",
       "</pre>\n"
      ],
      "text/plain": [
       "The population density of Singapore is \u001b[1;36m8\u001b[0m,\u001b[1;36m437\u001b[0m people per square kilometer.   \n",
       "\u001b[4;94mhttps://www.worldometers.info/world-population/singapore-population/\u001b[0m\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "for i, t in enumerate(result.trace, 1):\n",
    "    badge = '✓' if t['verdict'] == 'pass' else '✗'\n",
    "    print_step(\n",
    "        f\"[{i}] {badge} {t['verdict']}  (attempts={t['attempts']}, confidence={t.get('confidence', '?')}/5)\",\n",
    "        f\"step: {t['step']}\"\n",
    "    )\n",
    "    snippet = (t['result'] or '')[:300].replace('\\n', ' ')\n",
    "    print_step(\"    result\", snippet + ('...' if t['result'] and len(t['result']) > 300 else ''))\n",
    "    if t.get('last_critique'):\n",
    "        print_step(\"    final critique (rejected)\", t['last_critique'][:300])\n",
    "    print()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e84b453c",
   "metadata": {
    "papermill": {
     "duration": 0.00815,
     "end_time": "2026-05-27T07:37:29.488435+00:00",
     "exception": false,
     "start_time": "2026-05-27T07:37:29.480285+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## 9 · What we just observed\n",
    "\n",
    "The cells above are live. Below: a quantitative + qualitative breakdown of the **actual** Plan-Execute-Verify trace the Nebius-hosted Llama-3.3-70B produced on this run.\n",
    "\n",
    "### 9.1 · Quantitative summary\n",
    "\n",
    "| Metric | Value |\n",
    "|---|---|\n",
    "| Steps executed | **3** |\n",
    "| Steps passed | **2** / 3 |\n",
    "| Steps `fail-accepted` | **1** |\n",
    "| Total attempts (incl. retries) | **5** |\n",
    "| Retry rounds | 2 |\n",
    "| Pass rate | 67% |\n",
    "\n",
    "### 9.2 · Per-step verdicts\n",
    "\n",
    "| # | Verdict | Attempts | Confidence | Step |\n",
    "|---|---|---|---|---|\n",
    "| 1 | pass | 1 | 5/5/5 | Look up Singapore's population (latest available year) from a reliable source such as the World Bank |\n",
    "| 2 | pass | 1 | 5/5/5 | Look up Singapore's land area in km² from a reliable source such as the World Bank or Singapore Depa |\n",
    "| 3 | fail-accepted | 3 | 5/5/5 | Divide the population by the land area to get the population density in people per square kilometer. |\n",
    "\n",
    "### 9.3 · Patterns surfaced in this run\n",
    "\n",
    "- **Partial success: 2/3 steps passed, 1 fail-accepted.** This is the honest PEV signal — the Verifier rejected some step(s) until retries ran out. Inspect the `last_critique` field on fail-accepted steps to see what the Verifier kept flagging.\n",
    "\n",
    "- **Retries were partially effective**: 0 of 1 retried step(s) recovered to `pass`; the rest stayed failed. When retry doesn't help, the step is likely genuinely impossible — force-accept and synthesize honestly.\n",
    "\n",
    "### 9.4 · The final answer (verbatim)\n",
    "\n",
    "> The population density of Singapore is approximately 7,574 people per square kilometer (5,450,000 / 720.2 km²).    \n",
    "> Note that the calculated density provided earlier (8,437 people per square kilometer) was not actually calculated  \n",
    "> in the log, so the correct calculation is provided here. The population and land area values are based on data from\n",
    "> https://www.singstat.gov.sg and https://www.singstat.gov.sg/, with additional population information available at  \n",
    "> https://www.worldometers.info/world-population/singapore-population/.\n",
    "\n",
    "### 9.5 · The takeaway\n",
    "\n",
    "The pass-rate metric is what makes PEV worth its extra cost over plain Planning: you have an **honest quality signal per task**. A run with 100% pass-rate and 0 retries either means the task was easy or the Verifier was lazy — check the per-step confidence. A run with `fail-accepted` steps is *useful information*: the agent reached the end of its plan but knows the answer is incomplete, and the final synthesis (if the prompt is doing its job) hedges accordingly."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2654ccd6",
   "metadata": {
    "papermill": {
     "duration": 0.0,
     "end_time": "2026-05-27T07:37:29.488435+00:00",
     "exception": false,
     "start_time": "2026-05-27T07:37:29.488435+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## 10 · Try other providers / verifier-side reasoning model\n",
    "\n",
    "PEV needs **structured output** (Plan + Verifier schemas). The Verifier's quality is the single biggest quality lever — try a reasoning model in the Verifier seat (the rest stays on Llama 3.3)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "34056e31",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-05-27T07:37:29.504175Z",
     "iopub.status.busy": "2026-05-27T07:37:29.504175Z",
     "iopub.status.idle": "2026-05-27T07:37:29.530764Z",
     "shell.execute_reply": "2026-05-27T07:37:29.530764Z"
    },
    "papermill": {
     "duration": 0.026589,
     "end_time": "2026-05-27T07:37:29.530764+00:00",
     "exception": false,
     "start_time": "2026-05-27T07:37:29.504175+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[skip] openai: no API key\n",
      "[skip] anthropic: no API key\n"
     ]
    }
   ],
   "source": [
    "from agentic_architectures.llm.factory import provider_supports_structured_output\n",
    "\n",
    "for p in [\"openai\", \"anthropic\"]:\n",
    "    key = settings.api_key_for(p)\n",
    "    if key is None or not key.get_secret_value():\n",
    "        print(f\"[skip] {p}: no API key\")\n",
    "        continue\n",
    "    if not provider_supports_structured_output(p):\n",
    "        print(f\"[skip] {p}: no structured output\")\n",
    "        continue\n",
    "    print_header(f\"Re-running PEV on {p}\")\n",
    "    r = PEV(llm=get_llm(provider=p), max_retries_per_step=1, executor_rounds=3).run(\n",
    "        \"What was the GDP of France in 2023? Provide a source.\"\n",
    "    )\n",
    "    print(r.output[:300])\n",
    "    print(f\"  steps: {r.metadata['steps_total']}, pass: {r.metadata['steps_passed']}, fail-accepted: {r.metadata['steps_fail_accepted']}\")\n",
    "    print()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f2635137",
   "metadata": {
    "papermill": {
     "duration": 0.010212,
     "end_time": "2026-05-27T07:37:29.549066+00:00",
     "exception": false,
     "start_time": "2026-05-27T07:37:29.538854+00:00",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## 11 · Failure modes, safety, extensions\n",
    "\n",
    "### 11.1 · Where this breaks\n",
    "\n",
    "| Failure | Mechanism | Mitigation |\n",
    "|---|---|---|\n",
    "| **Sycophantic verifier** | Verifier rubber-stamps weak results | Different model in Verifier seat; or score with `confidence < 4` threshold |\n",
    "| **Verifier infinite-loop bait** | Verifier rejects on impossible bar; `max_retries_per_step` cap fires repeatedly | The cap is per-step; thrash shows up as many `fail-accepted` |\n",
    "| **Retry produces same result** | Executor doesn't actually use the critique | Tighten retry prompt: \"MUST address each bullet of the critique\"; or use a stronger executor |\n",
    "| **Synthesis hides failed steps** | `_finalize` writes confident answer despite `fail-accepted` | Inspect the per-step trace (§ 8.1); add a hard \"if any fail-accepted, prefix answer with 'PARTIAL:'\" rule |\n",
    "| **Cost explosion** | Each step × (1 attempt + retries) × executor rounds = O(N·R·M) calls | Set `max_retries_per_step=1` for low-stakes tasks |\n",
    "\n",
    "### 11.2 · Production safety\n",
    "\n",
    "- **Always inspect the trace** — `result.metadata['steps_fail_accepted']` is your *honest signal* that the agent's answer is partial. Surface this to the user.\n",
    "- **Different model for Verifier** — same-model Verifier suffers from blind spots that match the Executor's. Even a smaller, faster, *different* model catches more.\n",
    "- **Per-step time budget** — add `timeout` per tool call so a hung tool doesn't block forever.\n",
    "\n",
    "### 11.3 · Three extensions\n",
    "\n",
    "1. **External verifier.** Replace the LLM Verifier with a *deterministic check* (regex for a number, JSON schema validator, code execution) when the step has a strict format. Bridge to **Dry-Run (nb 14)**.\n",
    "2. **Whole-plan replanning.** Combine PEV's per-step retry with Planning's whole-plan replan: after all steps execute, if too many fail-accepted, regenerate the plan with the failures as context.\n",
    "3. **Process Reward Model** — score each step on a continuous scale; bridge to **Tree of Thoughts (nb 09)** and **LATS (nb 22)**.\n",
    "\n",
    "### 11.4 · What to read next\n",
    "\n",
    "- [**14 · Dry-Run**](./14_dry_run.ipynb) — simulate the *side effects* of each step before live execution.\n",
    "- [**17 · Reflexive Metacognitive**](./17_reflexive_metacognitive.ipynb) — agent decides for itself when to escalate to a human.\n",
    "- [**24 · Corrective RAG**](./24_corrective_rag.ipynb) — PEV-style verification specialised for RAG retrieval.\n",
    "\n",
    "### 11.5 · References\n",
    "\n",
    "1. Hu, M. et al. *Tree-Planner.* 2023. [arXiv:2305.10142](https://arxiv.org/abs/2305.10142)\n",
    "2. Wang, L. et al. *Plan-and-Solve Prompting.* ACL 2023. [arXiv:2305.04091](https://arxiv.org/abs/2305.04091)\n",
    "3. LangGraph plan-execute-verify pattern — [tutorial](https://langchain-ai.github.io/langgraph/)\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.0"
  },
  "papermill": {
   "default_parameters": {},
   "duration": 66.266371,
   "end_time": "2026-05-27T07:37:30.683921+00:00",
   "environment_variables": {},
   "exception": null,
   "input_path": "all-agentic-architectures/notebooks/06_pev.ipynb",
   "output_path": "all-agentic-architectures/notebooks/06_pev.ipynb",
   "parameters": {},
   "start_time": "2026-05-27T07:36:24.417550+00:00",
   "version": "2.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}