{ "cells": [ { "cell_type": "markdown", "id": "d11f2ab1", "metadata": { "papermill": { "duration": 0.004014, "end_time": "2026-05-28T03:06:49.800378+00:00", "exception": false, "start_time": "2026-05-28T03:06:49.796364+00:00", "status": "completed" }, "tags": [] }, "source": [ "# 28 · Multi-Agent Debate — argue, then vote\n", "\n", "> **TL;DR.** N agents (same LLM, different personas) answer the question; each round they see the others' prior answers and revise; after K rounds, plain Python takes a majority vote over the final-round answers.\n", ">\n", "> **Reach for it when** the task is hard enough that a single chain-of-thought is unreliable AND there's value in agents *seeing each other's reasoning* (which Self-Consistency, nb 21, lacks).\n", "\n", "| Property | Value |\n", "|---|---|\n", "| Origin | Du et al., *Improving Factuality and Reasoning in Language Models through Multiagent Debate* (2023). [arXiv:2305.14325](https://arxiv.org/abs/2305.14325) |\n", "| Vote | Python `Counter.most_common(1)` on last-round answers — deterministic-picker |\n", "| Cost | `n_agents × n_rounds` LLM calls |\n", "\n", "**Why this is different from Self-Consistency (nb 21).** SC samples N independent paths blind to each other. Debate's agents *see prior-round answers* and can update — letting an initially-wrong agent change its mind when confronted with stronger arguments. The cross-pollination is the whole point." ] }, { "cell_type": "markdown", "id": "998ef524", "metadata": { "papermill": { "duration": 0.002979, "end_time": "2026-05-28T03:06:49.808424+00:00", "exception": false, "start_time": "2026-05-28T03:06:49.805445+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 2 · Architecture at a glance\n", "\n", "```mermaid\n", "flowchart LR\n", " A([task]) --> R1[ROUND 1<br/>N agents answer independently]\n", " R1 --> R2[ROUND 2<br/>each agent sees others' R1 answers, revises]\n", " R2 --> R3[…]\n", " R3 --> V[VOTE<br/>Counter on last round's answers]\n", " V --> Z([modal answer])\n", "\n", " style R1 fill:#e3f2fd,stroke:#1976d2\n", " style R2 fill:#fff3e0,stroke:#f57c00\n", " style V fill:#e8f5e9,stroke:#388e3c\n", "```" ] }, { "cell_type": "markdown", "id": "65bfbd31", "metadata": { "papermill": { "duration": 0.00201, "end_time": "2026-05-28T03:06:49.812429+00:00", "exception": false, "start_time": "2026-05-28T03:06:49.810419+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 3 · Theory\n", "\n", "### 3.0 · The promise\n", "\n", "If 2 of 3 agents settle on the same wrong answer at first, a simple majority vote (Self-Consistency) returns that wrong answer. Debate lets the *one* correct agent expose its reasoning; in round 2 the other agents read it and can update. The round-2 vote may then be 3-of-3 correct.\n", "\n", "### 3.1 · The risk\n", "\n", "The opposite also happens: the *one* wrong agent argues louder, and a previously-correct agent switches sides. The handoff §7 deterministic-picker discipline doesn't fix this — the picker (`Counter`) just tallies whatever answers the agents end up with.\n", "\n", "Mitigation: require *both* \"answer\" AND \"critique_of_others\" per response, so an agent has to engage with arguments rather than just restate its answer. 
Schema enforces this.\n", "\n", "### 3.2 · Where this sits\n", "\n", "| Pattern | Voter independence |\n", "|---|---|\n", "| [Self-Consistency (nb 21)](./21_self_consistency.ipynb) | Independent (no cross-talk) |\n", "| [Ensemble (nb 13)](./13_ensemble.ipynb) | Different roles, vote once |\n", "| **Debate (this nb)** | **Same role, K rounds of cross-talk** |\n", "| Council-of-judges (extension) | Different roles + K rounds |" ] }, { "cell_type": "markdown", "id": "61da9271", "metadata": { "papermill": { "duration": 0.002022, "end_time": "2026-05-28T03:06:49.816834+00:00", "exception": false, "start_time": "2026-05-28T03:06:49.814812+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 4 · Setup" ] }, { "cell_type": "code", "execution_count": 1, "id": "85a1f32b", "metadata": { "execution": { "iopub.execute_input": "2026-05-28T03:06:49.817655Z", "iopub.status.busy": "2026-05-28T03:06:49.817655Z", "iopub.status.idle": "2026-05-28T03:06:52.272282Z", "shell.execute_reply": "2026-05-28T03:06:52.272282Z" }, "papermill": { "duration": 2.454627, "end_time": "2026-05-28T03:06:52.272282+00:00", "exception": false, "start_time": "2026-05-28T03:06:49.817655+00:00", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "text/html": [ "
LLM: meta-llama/Llama-3.3-70B-Instruct ────────────────────────────────────────────────────────────────────────────\n",
       "
\n" ], "text/plain": [ "\u001b[1;36mLLM: meta-llama/Llama-\u001b[0m\u001b[1;36m3.3\u001b[0m\u001b[1;36m-70B-Instruct\u001b[0m \u001b[92m────────────────────────────────────────────────────────────────────────────\u001b[0m\n" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from agentic_architectures import get_llm, enable_langsmith, settings\n", "from agentic_architectures.architectures import Debate\n", "from agentic_architectures.ui import print_md, print_header\n", "\n", "enable_langsmith()\n", "llm = get_llm(provider=\"nebius\", model=\"meta-llama/Llama-3.3-70B-Instruct\", temperature=0.4)\n", "print_header(f\"LLM: {llm.model}\")" ] }, { "cell_type": "markdown", "id": "ac3c81f0", "metadata": { "papermill": { "duration": 0.0, "end_time": "2026-05-28T03:06:52.272282+00:00", "exception": false, "start_time": "2026-05-28T03:06:52.272282+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 5 · Library walkthrough" ] }, { "cell_type": "code", "execution_count": 2, "id": "3676ed4e", "metadata": { "execution": { "iopub.execute_input": "2026-05-28T03:06:52.302178Z", "iopub.status.busy": "2026-05-28T03:06:52.302178Z", "iopub.status.idle": "2026-05-28T03:06:52.357084Z", "shell.execute_reply": "2026-05-28T03:06:52.357084Z" }, "papermill": { "duration": 0.064959, "end_time": "2026-05-28T03:06:52.357084+00:00", "exception": false, "start_time": "2026-05-28T03:06:52.292125+00:00", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--- _DebateResponse schema ---\n", "{\n", " \"description\": \"One agent's per-round output.\",\n", " \"properties\": {\n", " \"answer\": {\n", " \"description\": \"JUST the final answer in the requested format \\u2014 no preface, no critique here. 
(Critiques go in the other field.)\",\n", " \"title\": \"Answer\",\n", " \"type\": \"string\"\n", " },\n", " \"critique_of_others\": {\n", " \"description\": \"2-3 sentences engaging with the other agents' prior answers \\u2014 ...\n", "\n", "--- Default agent personas ---\n", " [0] You are Agent A: rigorous, demands step-by-step reasoning before committing to an answer.\n", " [1] You are Agent B: skeptical, actively looks for counterexamples and edge cases.\n", " [2] You are Agent C: pragmatic, focuses on which answer best fits all available evidence.\n" ] } ], "source": [ "from agentic_architectures.architectures.debate import _DebateResponse, Debate\n", "import json\n", "print('--- _DebateResponse schema ---')\n", "print(json.dumps(_DebateResponse.model_json_schema(), indent=2)[:400] + '...')\n", "print()\n", "print('--- Default agent personas ---')\n", "for i, p in enumerate(Debate.AGENT_PERSONAS):\n", "    print(f' [{i}] {p}')" ] }, { "cell_type": "markdown", "id": "dbe360a4", "metadata": { "papermill": { "duration": 0.015725, "end_time": "2026-05-28T03:06:52.372809+00:00", "exception": false, "start_time": "2026-05-28T03:06:52.357084+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 7 · Build the graph" ] }, { "cell_type": "code", "execution_count": 3, "id": "a7f0f941", "metadata": { "execution": { "iopub.execute_input": "2026-05-28T03:06:52.372809Z", "iopub.status.busy": "2026-05-28T03:06:52.372809Z", "iopub.status.idle": "2026-05-28T03:06:55.313233Z", "shell.execute_reply": "2026-05-28T03:06:55.313233Z" }, "papermill": { "duration": 2.940424, "end_time": "2026-05-28T03:06:55.313233+00:00", "exception": false, "start_time": "2026-05-28T03:06:52.372809+00:00", "status": "completed" }, "tags": [] }, "outputs": [ { "data": { "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAL8AAAFNCAIAAADepSn3AAAQAElEQVR4nOydB3wU1fbHz2zPpvdGCglCAAkgReEBgdBRkI5AQHg2ePBUFPkjigiKYgHz9KnYEIOoD8QHKgEfGqSKSFVaqCGFkJBC2vad+Z/dTTabsCGZye4ys3u/n/fW2Tt3hs3Ob88999xyJAzDAIHACQkQCFwh6iFwh6iHwB2iHgJ3iHoI3CHqIXCHqIc71aX08X3lJde0WpWRNjJ6DUOJgKEBKAAGGIrB/+J/agvBVI7FIoqyvLWUm0pEVF0F02WWU7X1jfU1QcQAbXM3M6Z/xXxJLXV3sHyGRpURqZySyCi5UhIeI++aEuTlA62BIvEetmhr6K1rC8qLdAYDI5ZQXt5iqYwSS0U6tZEy6QAfGj52xiIbpuEjNAkFC23Ug5VNB0bzUzA/cssBFpvVw9TXFFNgZCgxVVvZgrmw/i3Kjq4/sKMehYimQa+hNWrGoDNKJFRItGLiU9HACaIedmS8erWiVOfjL+l4b8B9IwNB4BzYVnr+WHV1pS4wTJ72fCywhKinpfxvQ/GFk5WhUYrJz7QBt+OrN/PKi7Rd/hYwYHxIy68i6mkRGStz0bl5dEVbSgzuSmWR4ev0PL9g6dSFLf15EPU0z5Z3C4x6mPwsR+dAWHy5Mi84WjZyVnhLKhP1NMO6ZTkKb8m0RW7YWjXFhpV56NnPeLF5N0gEhKb5+q087FJ5lHSQGS/EYHTgvx8UNluTqKdJDv90s6JEP3VRDHgeM5bEXc9RXTiiun01op4mObKrNGVCi5p/t6RHalDWluu3r0PUY5/v1xYqlOKOvb3BU+k9IhBjobs2Ft+mDlGPffIvqe67PxQ8m873+l08WXWbCkQ9djj8vzIcc+h0r0sNz6ZNm5YtWwbsGTp0aEFBATiBPg8E4/jducNNCoioxw7nj1QHhUnBtZw5cwbYU1hYWF5eDk7DP1h2Yt/Nps6SMXY71FQa7kkNAueQk5Ozdu3ao0ePYqQtOTl55syZ3bp1e/zxx48dO4Znt2/f/uWXX7Zp0wZff/vtt0uXLoWEhKSkpMydO1ehUGCFRYsWicXiyMjIjIyMJ5544qOPPsLCBx98EOusXr0aHE1se+X5E03aHqIeO+Dgebd+AeAEdDodCqVXr17vvfceiuCTTz5ZsGDBjh07Pv7441mzZsXFxS1fvhyrffrpp+vXr3/11VcDAgKqqqreeustrPzkk0/iKalUev78+ZqamjVr1nTp0qVjx45PP/30tm3boqOdEgpP7Opz6veKps4S9TSmKFdHUSBVgjO4evVqWVnZ1KlTk5KS8O2qVavQ5BgMhkbV0tLSBg8e3LZtW8vbkydPHjx40KIeiqKuXbu2YcMGiylyNtF3KYABnRZkcjtniXoaU3ZdT1HgJGJjYwMDA19++eVRo0b16NGja9euPXv2vLUaGhhsttCJRjNj0VZQUH1LiqpyjXQsMMCUX9eFx8luPUW85sYwRhqchlwux9aqX79+X3311SOPPDJ27NjMzMxbq2G7hm3ZuHHjtm7deuTIkdmzZze6CbiUJn9NRD2NCQiTmuYHOo34+Hj0VH788Ud0XNq1a/fSSy+dO3fOtgJ601u2bJkyZQqqJyIiAkvQ9YE7CAO+QTK7Z4h6GhOVqKDpuvmdjgY7XN9//z0eYNMzYMCAN954QyKRnD171raOXq9Xq9VhYWGWt+ho7927F+4QN3L1lAi8fO2fJeqxA0bo/9xfCU6goqJixYoV6enpeXl56EF//vnn6Nag94OnYmJiTp069ccff1RXV6N9QpHl5+ffvHkT62OXvrKyEvtZt94Qa+Lrrl278FpwAuePVoma9gKJeuyg9BHfJsjRGlAoS5YswS46tkoTJkw4fvw4xn4SEhLw1Pjx47E/NW/evAsXLrz22mtonCZOnIiOUe/evefPn49vhwwZgr2tRjfEyNDo0aPxJugqgRO4crZK6dtk14rMDrP
Dnm9Lzh2pfGJVAng8Hy661HtocI+h9qNfxPbYIWViiE5rLLioBs/m9IFKmmaakg6QeE9ThEQpdm0smrUsvqkKkydPLi62M3vBaDSKRCKqiU4u9sAxfAxO4MSJE9iVs3vq9h8pKysLz9o99duO0qi2XtA0pOVqkn8/czHt/+IDwu3/wIqKivCpAEuioqLAadzqFbWEpj7S5T9VOzKuzXu73W2uJbanSRKTfTe/m/fYyrZ2z4aH827aoWOl+b8Nhcn9mhkqJn5Pk4ycFS6RiX78pPnJ4e7H5vQC3yBp/7FEPa1g9rK4gktq7IKBJ5G57vrNG7rpLViYTPye5vl06ZXYu7yHzQwDD+C/H1yrLjfMeKFFa9qJelrEx0su+wbIprr7wq6MlVf1GuaRV+JbWJ+op6V883ZeaaG2Y0+/1KluaIR+yjDt8RAZq5jwFItfCFEPC7KP1OzefN1oYKITvQZOCgsIdfXcZ4dTkqf79b83ivM0Eik1YmZ0bBK7uR9EPaw5lnXzxJ6bNZUGqYxS+ki8/MTevlKxlNFpGozLW3Zeqt0PygxG7DB0W7cRWO0wvnXDJ/MlIsY0vm+9tsEdGmzlJLKdBWC6jgHKdmIJxv9oe7s/YS+SMVA1VXpVlUFViRErBoexeg8LuftvXHYRI+rhzpFdlbnZNdU3dXodg08OXxuctuz9ZvqCa4O85i3Fal/NkqCshXUVaIYRWQstj0Zk2oGscU3r5dY7A4Dtk6ytbN2Irg6JnBKLKKmc8g2UxnVQdk9tVeCbqIe/pKenh4SEpKWlAV8hsWb+YjAYJBJePyCiHv5C1EPgDlEPgTt6vV4q5XVQgKiHvxDbQ+AOUQ+BO0Q9BO6geojfQ+AIsT0E7qB6xGJe701P1MNfiO0hcIeoh8AdEi0kcIfYHgJ3iHoI3CHqIXCH+D0E7hDbQ+AOUQ+BO0Q9BO4Q9RC4Q8bYCRwxGo08HyIFoh7ewjBMTAzfU6IS9fAUNDw5OTnAb8juTzyFoiiRSMRha0RXQtTDX7DDdWvyJV5B1MNf+K8e4vfwF6IeAneIegjcIeohcIeoh8Adoh4Cd4h6CNwh6iFwh6iHwB2iHgJ3iHoI3EH18HyUlKiHvxDbQ+AO/9VD9ornHcOGDSstLbU8F4qiaJrG4+7du3/++efAM8gMDd7Rq1cvMKWnEFkSGYvFYj8/vxkzZgD/IOrhHSiURvlpExISUlNTgX8Q9fCOpKSkPn36WN9KpdLJkycDLyHq4SPTp0+Pjo62HMfGxo4aNQp4CVEPH4mLi+vXrx+Yu12TJk0CvkL6XNzJv6g7d7hCXaMHc+eotpckMmVTM2fkq31rSdknkpgKGZt8fXiJ6eu35O6zpgoUme/DgFanPX7smEgk7tmrl8l5FpkyuNHW7H82uQFNaQAtxdYscbX/aF3eQdv8g5YKYsrLW9qlj19YrAxaAVEPR9Yvv6quMcrkotqEktYsfCLLo6p9S4lQH+YDVI/RfKo+X58lUSDYXs5QWFtkOWNkjBRjUo7pPGXOM1inGNs8gWC+zvSf+pIGKQEZk0obfHhUj1RK6bW00k8888U44ApRDxc+XXIlIkGZMikcBE7mp9dVVZrZL8cDJ4h6WPPp0qvR8d79JoaAW5C1saj8hmbWMi4WiHjN7Di26yY6H24jHSR1erhWbbx4VAXsIephx5VslZfS3b40uVJ87mgFsIeMkrJDVWUAmgL3wmgwqlVchmOJethBG3DUEtwM2igy6rn8VUQ9BO4Q9RC4Q9TDDowUu5vX0wqIethhGW0gWCDqYQclJranHqIeduBYFbE9Voh62CESEdNTD1EPO3CYwv3iPZwhIxUEEIkZsZSLEojtYY37TUqgjRSJNbsCijLN5SNYIN8ESyig+O03p/9r1exHXLQGg9gedmB3nXjNVoh6CNwh6nE6D44bPDPt0b37s/788/i2rVl+vn4HDuz5IuPjq7lX/P0D2rXr8NQ//y88PAJrPv/C0/j6+sp0y4U
//fTjqjdf3v7DXqVSuXzFYoqihgweiSVqtapTpy5zHn+qY8e7sZpKpVr5+ovHj//Rtm27B0dPBBdC/B52UOy9HqlU+mPmf1Elb735vtJLeeTo7y+9/NywYfdv+iZz2dJVRUWF6e+uavYmEonk9Jk/d/2cufbDDTu275fL5K+/scxy6u3Vr+Tn57791oevLH/7Ss6lQ7/vB1dB1MMOHGNn2+dCwfn5+f9z3sKePe5FEaz7/MMB/VMnTpiGhqdz5+R/zH3m0KH957LPNHsftUr13MKXoiKj8SaDU0fk5V1Fq1NScmP3r7umPvRwp453BwUFP/H4k3K5AlwFUQ87aCOXWHOH9p2sx5cvX0hK6tzo1Llzp5u9SUxsPDZhlmMfH198raqqLCwsANPa04T6G3boBK6CqMcVyGS1Szarq6u1Wq2tebAIQqWqafYmIntGr6LypukmXkpriZfCC1iCsWaRhEscgnjNLkWhMOlGo1FbS2rMugkOsrPEx0g3v2mhv1+A6YZajbWkJUJsBMaaaQOXCDqxPayhGO7hQnRZOrTvePr0n9YSy3FC4l34KpPKbJ89ejbN3jAiwrTTz6lTJy1v9Xo9euXgKoh62GHa5KB139m4sVP2H/h1y5avK6sqj5848sGHa+7p3uuudh3wFPbA0QG6fPkiHqMIsFqzdwsNDbv77q7r169FqWGb+OrKFygXxsJJy8UOhm7t0m3sq98oKf7P5g3//mA1hnl69rjvsUfnW06NfXBybm7O43OmG43G1EHD0qb9HaM7zf57zy9ekZ7+Ol6FhmfE8NGjRj7YEtk5BLKOnR1fvJKDfa6JT8eDG/HNmzm+AdRDz7Feyk5sDztMe+7wfJjUhRD1sMQkHyBYIOphB9VglyVPh6iHHWResy1EPQSMNYNISmLNzsctVyJjTJvWc2mOiXrYQVYi20LUwxLS5bKBqIclDOlx1UPUww5KTJEeuxWiHnYwRob4PVaIegjcIeohcIeohx0Kb7FO5269LrmSUiilwB4yO4wdKm2ZXsXrTLMc0KuZwDAuyXKIelqK0Wi8evVqjuZHjZrXOdLZUlHK6LR0yiQuuROIepqntLT02Wef1Wq1ERERy5e/ENfe55tVOeAuZH6c066bL3CCzC28HTRNi0Si119/vW/fvikpKdbyv/ZX/ZZZEh7nFZPkQ0GDHjxlmTVv+Vbr0mjVp1ejalNpWV6xaoPoozlnW12iOJucbOb7UOZIk7W+iKJopkECOfO8NdOl5nxcjKWSKfObOcFX3b5D5tuKxYyeyTlTU5xXM3RaRGKyEjhB1NMkGzZsKCoqWrhwod2zJ/ZUntxTrlbRBm3DhszmoZtzrNlxsevLb8naZ7eQMams7k5M42qUjTCYOqnUnjILizHXaXBPCqQykZeP9N5RwR3u4SgdIOqxi16vv3bt2tatW5966im4c/zrX/8KCgpilYldpVKNGzdOIpFMnTo1LS0NnAzxexqQk5Pz8MMP63S6Nm3a3FnpgGm5sQ+qh9Ulge6NzAAAEABJREFUSqUSnbPCwsL33ntv/PjxO3fuBGdCbE8tVVVVvr6+H3744YABAzp37gyC5bnnnsvKyrI4T97e3u3atZszZ07v3r3BCRD1mHjnnXcUCsXcuXOBT5SVlcnlclQAq6vS09MzMjKsi97x+QYGBnbp0gX/RnA0nt5yaTQajOKEhYXxTTpg9nv27NkDLElMTLRuugDmflh5eXl2djY4Ac9Vz+nTpx944AGMAcbFxU2fPh34R0BAADamwJLw8HDbq9B6HT16NDMzE5yAJ45zYX8qKirq5MmTn332Gdt2wZUsWLAA2INes2WnDgxWRUdHf/vtt+A0PMv2oKV54YUXdu3ahcfTpk3DnynwmOLiYuyBA0tiY2PR10G/59ixYz/88MNjjz125kzzG5Nxw4O85pqamoqKilOnTg0bNgyEAIYMpkyZgmFuaAX4V6MT7SSvziNsz+HDh/v06YMH2GAJRTpISEh
I6xtWvIPzOgRubnuwr9GhQ4ft27ejaKRSLlNY3APsrg8cOLB79+7gUNxWPTjaMH/+/CFDhkyaNAmEyfXr1zFUg50mcAT4PWzevBkcihuqB50b/KO0Wm1+fn6PHj1AsMycOXPx4sWdOrluD1S2uJvfg0H6CRMm4O8V+1OClg6YdpULdWxAYe/evY7tf7mP7cGYGMpl3759/fv3B0ITjBo1av369RhbB0fgDrYHG6mJEyfiwDIeu5N0CgoKDAYHz6Hetm2byHEJxoRtey5duuTn5ycWi3GEHAccwL0YM2bM2rVrMcoADgWdcRz8ckikVMC2B/vhS5YsQfUEBQW5n3TAPObg5cV64/eW3HbRokU4zAetRpC2B13j1NTUEydOdOvWDQjs0el0v/zyy8iRI6F1CMz2aDSalJQUy66lbi+d3Nxc2jnb3MlkstZLBwSkHhxtKCoqwhhgZmbmoEGDwANIS0vDXws4jeeffx778NAKhKGeTZs2ffHFF+jf+Pr68nlOhWNBf9l2npfDef3113fu3Nka88Zrvwe74jt27Bg7diz2rRITE4HAM/hre9Rq9eDBg6Ojo8E82xI8DPxVX73afI6c1vPdd98dPHgQOMFf9eDXt3///l69eoFHgh7eypUrwfmMHz8+IyODwzQ04K16NmzYcODAAfBgMCKMIVBwCatWrbLmrGQFT+c1l5SUYAQZPBiJRPL111+DSwgICABO8NRrxs45fn3BwcHgqeBzwXiPa2LoI0aMwMA9h58rT1suHIXxZOmA2e+ZOnUquISbN29yMyI8Vc/WrVuxLwAeDPo9Lhu8w6gPWnpgD0/9nrKyMqeGWfkP8Xu4U1paajQaHTWJSYgQv4c76PR4snSA+D2t4ZdffsEQFngwxO/hTkVFRX5+PngwxO/hDqoHY+eRkZHgqRC/hzv+/v6eLB0gfk9rOHz48Pvvvw8eDPF7uFNdXe2a+Qm8hfg93EH1oDlt06YNeCrE7+GOj4+PJ0sHiN/TGs6cObNq1SrwYATh9/Cr5Ro3blxOTg5+cYw5LYPllabp48ePA4F/8Mv2zJkzJygoCBWDArK8ooCSk5PB83DZvGYw+z04qgjs4Zd6hg8fnpCQYFvi6+v70EMPgedB/B4uzJgxw3ZeWGxsrENWPQoOQfg9vFPPgAEDkpKSLMdyuRw9IfBIBBHv4WOfa+bMmREREXgQExMzZswY8EgE4fc4MtacfahGzxhrk41ZUpHRQImBoW9Jjmc5oMFe7jOQw129kyacFZ0bfN+Q7CMqU/VbUpqZs99R1kLTP0rX17Em57PgrZC17a4AQWHxeziv02MFZ7/HMT32jJW51Tf1IjHotYw1VZ1ZHozI1PGuzV9nKxWrwBocmMVRl1WRMSe3M31Ca7I9c87ExscNEidaShq+lchE+D4oTDFlYTQIBIPBgC6gaxovVA+3xssB6lm7+HJ4rHLIQxHA4wVYN0uMe/9TKKJg6vMeHcJ2LK31ez5afLnHoPAh03ktHSQgRDxmXhuZUvzFK8IYfHX/eM/2z67LvcRJ9wlmS5RhsyI1KuOpAy5a4dsa3D/ecyNPGxLFPaPuHcHHR3ruqADU4/7ze3Qao0wpsG0PKQmoq/XAe9w/3qM3MHotCAu9njZoBaB4Ms5F4I4g/B5PzCxpN0TJN9zf76FE5vieoDB/YAG0XO7v9+DggOD2Cjd9YEYAknd/v4eiBGl7KCE4e+4f7xHIz7gRwvjEHuD3UAJ5FDYwwAgiNYdn+D1OyaPgTIjfcwsk3uNuuH+8x9RjF5r8TInxhLCVr/v7PUJsucxRBuL3NODO+D0iiqHEAgv4mNwemvg9Dbgzfg/NUIzR6U9i+YrFmTu2gYdB1nM5huxsR6YQN0HGuRpyZ9ZzsfKa0TYOHX7flxvX2ZbcP3rAx5+8h8cqlerV116cOHnE8JF9n5iTtnXbZkudQYN7Fl6/9tbbr4x+cKClZOdPP/xj/qyR9/fD12+3fMX2R2OaME8Rv6c
BAoj3iMXiPvf137cvy1py5OjvKJrBqSPwePGSJ69dy39lxepN32QOGDD4X+++cfacKWnvzkxTppznFi79YduvePDzLzvfeHN5+7uSvvry+0cfmYfq+fcHq4EdlFgIxsdDxrlY/I5TUoacv3AObYnl7f79u+PjExIT7zr0+4G//jrx3LNLOyZ19vcPmD5tdpcu3b7I+PjWO2Rmbk1O7v70U4sDA4Pu6d5r9sNztm7dVFlVCS0GnwothH6iR/g9rP7Nv/VNkcvlFvODH3fP3l8shufKlYsKhaJt2/oEgO3v6niru0PT9KnTJ3v17GMt6d69FxZeunQe3A70e2JiYsAl3KF4D8uoP0qkb58B+/bvnjwpDY1NVVXl0CGjwJRXoEShaJC2XqlUqtWNs9XpdDr8RX627gP8n215ZWUFtBjzGLsA/B78VeTl5YFL4Oz3uDrWPHDg0GUvL0K57N2X1blzcni4ab26t7e3RqO2rVajqgkJDm10LYoPVTVs6P3oGNmWx8W2BRYfWghzw0wqp1yW347zvoWujjWj44xaOfT7/qzdP81Ie9RS2KF9J41Gc+Fi9l3tOlhKzp49Fd/WTibbxMT2VdVV3bv1tLxFU1RYWIA+ELQcBgQxM0Aqle7btw9cwp3xezi0Avil9O2b8v3331ZU3ByYMsRS2Lt336ioNmvWrDyXfaasrBQbJlTPlEkzwLwJS2ho2JEjh46fOGIwGB57ZP6BA79i8BANO7Z9K155/pmFc7C85R8AvyWhjK5w6wdx4M7Ee8xPgvXveOAAU8+rxz29rTYDP/qrK1b7+fn/Y97D09LGHD12+JUVb2O3y3J2+rS/Hzv+x9KXnlVr1Fj48dqNf/55fNyEoQsX/aOmpvrVV9agIsHtQCevf//+4BLuzH7N7y+8FN/Jd8AEIaVC+u69q7Semb08HvgNNsqpqamuabzujN8jxBkaFAijz+X+fo8g5xZSIJSpzW7u94jQ9ghuhoZAvGZB+D2tm6GBtscouHnxwsDF8Z47sG+hENdzCQUP8HsY4a0lFRBu7vdQJgQmH5EExBIBfGb393tM6+qEtpiUNoDRIIDP7BF+jyDm6QkR9/d7TNcTr9lpkHEu3iGUPTTc3+8RIkKJFrq/30NwHu7v90illNxLYNZL7iWWyYTxmd3c75F5iVUVLGZm8QG9xujlL4D5QO7v98QkKEuuaUBQqCrprgOCgfcIwu9plXoGp4Uajcy+LaUgELak5/qHSBKTZcB7BOH3OCDD0ucv58hkku5DQ2Pay4GvZB+uOn2o3DdQOn5+JAgEtAeuMT93Mj8XsvmdgtIiHW2kjYYm72bO5ddMcMheHcbuZC77d7Nf1xTgkcnE0YnK+x8NB4GAfs/AgQNdkxuQM47psU9aYMq5p1ODUddE82nN+ldfAtYkgLcWltwoeebZZzK+yKgraZT7z/xq50KbvIQ2Z728xCCAxqoB7r+eqxEyL3xQjvmDxdVGLV3l5S+EPeKcg0eMczkJg8HALQLhTrh5vMd56PV6t1yl1XLIOBd3iO0h41zcIbaH+D3coWlaJPL0nciJ38MR0nIRv4c7RD3E7+EOUQ/xe7iD6vFwrxmI38MZYnuI38Mdoh7i93CHqIf4PdzBaCEZ5yJ+D0eI7SF+D3eIeojfwx2iHuL3cIeMkgLxezhDbA/xe7hD1OP+67mcB2m5iN/DHWJ7gPg9nCHqIX4Pd/z8/Hx9fcGz6d27N7iEBQsWsMozZIWnv+/Kysrq6mrwYGQyWXp6Ojif7du3R0ZG3oHMks4D/xhuvwZ3Qq1WFxUVxcfHgzO53wxwgqctF1EPmBZQe7344ovnzp0Dp4ECzcnJAa4Q9fCaZ555xqnqee21186ePQtc4WnLhYEyoh7kHjPgHDQaDZq3kSNHAleI7eE7Bw8ePHnyJDgBhUKxZMkSaAVEPXwnLi5u2bJl4AQyMjJqamqgFfBUPRinx8EKIABER0e/9NJLpaUO3t5vz549aNK8vb2
hFZAeuwBwhuvj4+PTymYLSMslFJ588knHmp8ePXoEB7d271iiHmGA5ufrr78GB7Fx48bMzExoNaTlEgazZs3CcVNwEB988EFWVha0GqIewaBSqYxGI0ZooHXgTfbu3euQqWek5RIMubm58+bNg1ZTVlbmkG2WgahHQCQnJyclJRUUFEArwF764sWLHTV3irRcQmLRokXQOlA9c+fOBQdBbI/A2LRpU2tmrM6cObNnz57gIChHNYEOYfTo0VbLTJm2lwfLxzt27BgQzKSnp4eEhKSlpQF7zpw5gxH8rl27goPgl+3BLwVjoCIz5mTvFB4kJCQAoY7HHnssNDQUOIENX0REBDgOfqlnypQpOChoW4LqGTNmDBDqwJGp4cOHA3uuXbuG/nJ4uCPzvPDO75k8ebLt0B2KaeLEiUCwAT3fjz76CFgSFRXVr18/cCi8Uw+6PtaZvNhyjRgxQqlUAsEGdFy2bdtWXFzc8ktwjAwH6sHR8LHPhd6Pv78/HrRp02bcuHFAuIXNmzezmlyRkZHRoUMHcDR8VM/QoUMtnvLgwYODgoKAcAsoHVZ76U83A46Ge4+9poLZ+11R4RWtTmMw6BlLsj7TvWwT9NUe1xUxNkndoGUHNjC0KcufbUF9Jfz3qSYTD1rOmPtwjMJbknC3z8BJISBwli5d2rdv35bMSjaakckcn+COi3qOZVUc212mUdFiiUjuI/cJUCgD5DJvGYgZsdGqEFOePvPjxtdaE8dYyi3/r6vDmKvbaoamzHXqsvtZVMlYbmrzMYwiSkzX1THfyvaelG1yQLEYvz5Nlba6RF1drjbqjEYD7eMvHZYWGZUotKSBdVy6dGnjxo0t8WbQE0CpOaPlYqceVQVsfOuKXsv4BCtju3KMOvABo47OOXFdXaH1C5TOXBoH7kvLRcYBFur5+esb2X9U+Ib6xHYTsG4acfHQNW21dujUyPY9WzXD946Qn59fXl7epUsXuEO0VD3fvX+t+LFBEuQAAAcPSURBVKomaZAb/kxrSrU5xwv7jQ3t2t8PhAa6Pr/++mtTPg2211lZWdgLAefQIr99+7qiwhz3lA7iHSzvPCR+/7Ybx3dXgNDAYa8LFy40dfbLL7906lLU5mdofPd+YUmepnOqOzsHSOfB8b9tz0FXu3uqPwiH2+/S4uXl5dRIfTO25889FYWXVe1TYsED6DQ4/sCPN4AGYbFu3Tp0je2eajTs43CaUc++72/E3O3IcTWeExDh8/HSHBAUbdu2Xbt27a3lGF/Ozs4GZ3I79Wxaky/1kvpFtHYatoBo0yUUI597tzh44aZTGTRo0Jw5cxpNGSsoKNiyZYszYjy23E49xQWaxHsiwcMIiQk4e0Rg7nNiYmKjNRLYC1u/fj04mSbV88MnhVK5ROzloi2D2XLir58XLr23uqYcHE1YOz+DAf7aXwXCoaamZtiwYbYloaGhgYGB4GSaVM+1y2rfUOEF0ByC3EtyYq/jdek80DXGAS8M/FjeZmZmvvbaa+B8muix60GvpaOSPHR8OyDKv/hiCQiKBQsWWI9//vlnh6z8ahb76vljdzklpsBp5OT++b/dn+bln/HxDuzYod+wQY8qFCY7d+DQ5l171s39+4cZ3zxfVHw5MrzdgL5Te93zgOWqH3e+d+Rkplym7J48PCzEiUGEkDifwuxiXTUj83Hil+BwTp482a5dO7RDa9asAZdgv+UquKgSiZz1xZWU5n20/p96vXb+458+PO2NwqILH66bi6PeeEoskarVVVu3vz157JK3VhxKvjt109ZXy29ex1MHD285ePjb8fc/99QTnwcHRu3a/Rk4ExzmP3dCSK4PcuXKlXfeeefixYs3btwAl2BfPTVVRlaTj1hx7OROiVg6a+ob4aHxEWEJkx58oaAw+9TZPZazRqN+6KBH42K64PPr2e1+HIYrKDyP5ft/25TceTDqSan0Q2vULsFhi5LsIhJTZdcdtuuAaxg7dqxCoZg9ezbnRRdsaUI
iOHLqtJYLm62YNp28vWt3tw8KjAwOanPl6glrhdjozpYDpZdp2FKtqUINlZTlhYe1tdZpE5UEzgS1q1ULLeoMMHDgwJUrV4KrsO/3iCQUUM767tSa6ryCM9jfti2srKoP0FG3zBLUaGto2iiX10+Pl8mcG8NEvYqkPFon2UIcuE60JdhXj0Ihqix1VoIWX9/gtnHdhqc+blvo7X27sUmF3FskEuv1GmuJVqcC50IFhsiBcFvsqyc4Sn49VwvOISr8rqMnMxPiu1tdq+vFl0ODb9eHQmsUGBCZk/tXyt9qS85mHwBnwtBMbHuyEqgZ7Ps9ne71o2ln2W3shNM0/f2Od3Q6TfGNqz/+9O/V/55WWHTx9ld1vXvIX2d2Y4gZj7P2ZVzNPwVOo6JYi8IOixXqlGeXYV89IdEysZgqueqULDXYaVo4/yuZ1Ct97cNvvjv5cs6xSWNfaNYLHpIy+94eD27NXI0OExqeMSOfhro9EhxOed5NLx+eDtHwiiZnpuIAe2U5065vFHge5/bkdu7l139Ca7cUdXuaDOoMnR6hVQss4OEQKotUtJEm0mkJTc5MDQyX+PhJco4Uxfe0PzsMQ8Cr37e/PNFL7qPW2m/1IkIT5j/+CTiOF1cObuoUxq/FYjt/YHxs8qMz3mnqqusXyqISiL/cIm63psKohY+WXOo0JN7+WaOhotL+Qnx0h2Uyhd1TIpEkwD8MHEdZ+bWmTun0WpnUTq9bIpb5+dlfS1qWX1N0oWTum2THoBZxu1nxYjnEdlSiE5Bkb14z/qyDAu+8V+TYz3D9/I0BYx0pbvemmcGsBx6NlCmonKNF4AFk78sLj/W6+2+enk235TQ/FPr3l+N1Ku2Fg63a6JX/nP011y9QPOGfntjH5ExL15J+uvQKJZIk3ueeX2723ryoBMXoxxy5p58nwGId+xev5qqqDHHJkcog9wnC3rhUWXS5NDJBOWE+sTqsYbeHxp4tJad+q5DKJdEdw7yDha2h0pyqkpxy/POHp0W0TSZddC5w2b/n23cLivI0YpFI7i3zD/cJivUBgUAboeTKzaqSGk2NngJo28V35MOkh8Ud7nuH7fm25PLpGnWVwWikzdtyWXZfqp2aY96CCax7e9VuAlV3rn5nJpvj+jqW7Z7AfDuKqv2AIgZoylpSv1uUiGFM5VC76VPt29ptxSgxMEZ8pRiaxjL8Y8USEUZBO/cJ7DFEeDtm8A1H7BWvh4unayrL9Fo1bXpIFkS1W4XVvhOLMPxvOcbxV6ORqStnaGOdwsQixlyHEpm3DjPdiqLEDGO01KRoI0rHdBYPzJvcmQQrEpssirUCxpZNM6QpswIZRiQR0QaaklBeXpKAcHl8RwUQHAe/Mg0QhAVPc+QQBAFRD4E7RD0E7hD1ELhD1EPgDlEPgTv/DwAA//+BJ6XeAAAABklEQVQDAAF+XtdsPn78AAAAAElFTkSuQmCC", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import Image, display\n", "arch = Debate(llm=llm, n_agents=3, n_rounds=2, sample_temperature=0.7)\n", "graph = arch.build()\n", "try:\n", " display(Image(graph.get_graph().draw_mermaid_png()))\n", "except Exception as e:\n", " print(f\"(PNG render unavailable: {e}; see § 2)\")\n", " print(graph.get_graph().draw_mermaid())" ] }, { "cell_type": "markdown", "id": "05433945", "metadata": { "papermill": { "duration": 
0.004904, "end_time": "2026-05-28T03:06:55.321135+00:00", "exception": false, "start_time": "2026-05-28T03:06:55.316231+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 8 · Live run — Sally-siblings trick problem\n", "\n", "Same trap as nb 21 Self-Consistency. With debate, we'd expect agents who slip on round 1 to be corrected by their peers in round 2." ] }, { "cell_type": "code", "execution_count": 4, "id": "90c7bfeb", "metadata": { "execution": { "iopub.execute_input": "2026-05-28T03:06:55.333078Z", "iopub.status.busy": "2026-05-28T03:06:55.333078Z", "iopub.status.idle": "2026-05-28T03:07:10.914859Z", "shell.execute_reply": "2026-05-28T03:07:10.914521Z" }, "papermill": { "duration": 15.587703, "end_time": "2026-05-28T03:07:10.914859+00:00", "exception": false, "start_time": "2026-05-28T03:06:55.327156+00:00", "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FINAL_ANSWER: 0\n", "EXPECTED: 1\n", "MATCH: False\n", "CONVERGENCE: partial\n", "ROUND_UNIQUE_COUNTS: [1, 2]\n", "FINAL_TALLY: {'0': 2, '2': 1}\n", "\n", "--- ROUND 1 ---\n", " Agent A: answer='2'\n", " critique: (round 1 — no prior answers)\n", " Agent B: answer='2'\n", " critique: (round 1 — no prior answers)\n", " Agent C: answer='2'\n", " critique: (round 1 — no prior answers)\n", "--- ROUND 2 ---\n", " Agent A: answer='0'\n", " critique: All agents agreed that Sally has 2 sisters in the prior round, but I disagree. Since each of Sally's brothers has 2 sisters, and one of them must be Sally herself, it means that there is only 1 other \n", " Agent B: answer='0'\n", " critique: I disagree with Agents A and C, who both said Sally has 2 sisters. 
Since each of her brothers has 2 sisters, and one of those sisters must be Sally herself, it means that there is only one other siste\n", " Agent C: answer='2'\n", " critique: Agents A and B also concluded that Sally has 2 sisters, which aligns with my own analysis that each of her brothers having 2 sisters implies Sally is one of those sisters, thus she must have 1 sister \n" ] } ], "source": [ "TASK = (\n", "    \"Sally is a girl with 3 brothers. Each of her brothers has 2 sisters. \"\n", "    \"How many sisters does Sally have? Return only the integer answer.\"\n", ")\n", "EXPECTED = \"1\"\n", "\n", "r = arch.run(TASK)\n", "print(f\"FINAL_ANSWER: {r.output}\")\n", "print(f\"EXPECTED: {EXPECTED}\")\n", "print(f\"MATCH: {r.output.strip() == EXPECTED}\")\n", "print(f\"CONVERGENCE: {r.metadata['convergence']}\")\n", "print(f\"ROUND_UNIQUE_COUNTS: {r.metadata['round_unique_answer_count']}\")\n", "print(f\"FINAL_TALLY: {r.metadata['final_tally']}\")\n", "print()\n", "for round_idx, rd in enumerate(r.metadata['rounds'], 1):\n", "    print(f\"--- ROUND {round_idx} ---\")\n", "    for resp in rd:\n", "        print(f\" Agent {chr(65+resp['agent_id'])}: answer={resp['answer']!r}\")\n", "        print(f\" critique: {resp['critique'][:200]}\")" ] }, { "cell_type": "markdown", "id": "3fbb56c7", "metadata": { "papermill": { "duration": 0.015809, "end_time": "2026-05-28T03:07:10.930668+00:00", "exception": false, "start_time": "2026-05-28T03:07:10.914859+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 9 · What we just observed\n", "\n", "The cells above ran a 3-agent × 2-round debate on the Sally-siblings trick problem.\n", "\n", "### 9.1 · Summary\n", "\n", "- **Winner**: `0` — ❌ differs from expected `1`\n", "- **Convergence**: partial\n", "- **Unique answers per round**: [1, 2]\n", "- **Final tally**: {'0': 2, '2': 1}\n", "\n", "### 9.2 · Per-round agent responses\n", "\n", "#### Round 1\n", "\n", "| Agent | Answer | Critique |\n", "|---|---|---|\n", "| Agent A | `2` | (round 1 — no 
prior answers) |\n", "| Agent B | `2` | (round 1 — no prior answers) |\n", "| Agent C | `2` | (round 1 — no prior answers) |\n", "\n", "#### Round 2\n", "\n", "| Agent | Answer | Critique |\n", "|---|---|---|\n", "| Agent A | `0` | All agents agreed that Sally has 2 sisters in the prior round, but I disagree. Since each of Sally's brothers has 2 sist… |\n", "| Agent B | `0` | I disagree with Agents A and C, who both said Sally has 2 sisters. Since each of her brothers has 2 sisters, and one of … |\n", "| Agent C | `2` | Agents A and B also concluded that Sally has 2 sisters, which aligns with my own analysis that each of her brothers havi… |\n", "\n", "### 9.3 · Patterns surfaced in this run\n", "\n", "- **⚠️ No full convergence** — 2 unique answers in the final round; majority vote decided. Consider more rounds or a judge LLM.\n", "\n", "- **❌ Majority vote was wrong** (got `0`, expected `1`). Group-think can fail; pair with verification.\n", "\n", "### 9.4 · The takeaway\n", "\n", "Debate's value is in `ROUND_UNIQUE_COUNTS`: a `[2, 1]` sequence means agents disagreed in round 1, then converged by round 2 — that's the cross-talk paying off. A `[1, 1]` sequence means everyone agreed from the start (debate wasted N×K calls); `[2, 2]` means no convergence and majority vote decided based on first-round noise.\n", "\n", "This run's `[1, 2]` is a fourth case: all three agents opened on the same wrong answer (`2`), and round 2 split them into `0` and `2` without any agent reaching `1`. Round-1 unanimity is therefore not evidence of correctness.\n", "\n", "The deterministic-picker is `Counter.most_common(1)` on the last round — same pattern as Self-Consistency (nb 21), but the votes were *informed by* peers' arguments, not independent." 
] }, { "cell_type": "markdown", "id": "cccf955c", "metadata": { "papermill": { "duration": 0.0, "end_time": "2026-05-28T03:07:10.932142+00:00", "exception": false, "start_time": "2026-05-28T03:07:10.932142+00:00", "status": "completed" }, "tags": [] }, "source": [ "## 11 · Failure modes & extensions\n", "\n", "| Failure | Cause | Mitigation |\n", "|---|---|---|\n", "| **Group-think convergence on wrong answer** | One loud agent shifts the others | Diverse personas; weight agent influence by confidence |\n", "| **No convergence** | Agents stubborn; final tally split | Add round 3+; or escalate to human judge |\n", "| **Cost** | `agents × rounds` calls per task | Cap agents=3, rounds=2 for most tasks |\n", "\n", "Extensions: (1) heterogeneous LLMs per agent (Llama + Qwen-Thinking + GPT), (2) weighted-by-confidence vote, (3) judge-LLM that picks the best argument rather than majority vote.\n", "\n", "Reference: Du et al. 2023 — [arXiv:2305.14325](https://arxiv.org/abs/2305.14325)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.0" }, "papermill": { "default_parameters": {}, "duration": 23.758117, "end_time": "2026-05-28T03:07:11.858419+00:00", "environment_variables": {}, "exception": null, "input_path": "all-agentic-architectures/notebooks/28_debate.ipynb", "output_path": "all-agentic-architectures/notebooks/28_debate.ipynb", "parameters": {}, "start_time": "2026-05-28T03:06:48.100302+00:00", "version": "2.7.0" } }, "nbformat": 4, "nbformat_minor": 5 }