{ "cells": [ { "cell_type": "markdown", "id": "f2b8088c", "metadata": {}, "source": [ "# Chapter 8 — Data Analysis and Reasoning Agents\n", "\n", "**Book:** *Agents* by Imran Ahmad (Packt Publishing, 2026) \n", "**Author:** Imran Ahmad \n", "**Chapter Pages:** 203–233\n", "\n", "> *\"Information is the oil of the 21st century, and analytics is the combustion engine.\"* \n", "> — **Peter Sondergaard**\n", "\n", "---\n", "\n", "## Introduction\n", "\n", "The modern era is defined not by a lack of data but by the difficulty of interpreting it. This chapter examines a specialized class of intelligent systems designed not only to process information but also to understand it, reason about it, and extract meaningful insights from it to guide decisions. Unlike task-oriented agents, the agents discussed here operate at a higher level of cognition — they function as **digital analysts, researchers, and critical thinkers** who can question assumptions, evaluate evidence, and synthesize knowledge.\n", "\n", "These agents convert raw data into **actionable intelligence**, closing the gap between information and understanding across business, finance, scientific research, and journalism.\n", "\n", "### Three Agent Archetypes\n", "\n", "| Agent | Purpose | Key Techniques |\n", "|:------|:--------|:---------------|\n", "| **Data Analysis Agent** (§8.1, pp. 204–211) | Transform raw data into structured insight | Cognitive loop, visualization recommendation, OLS regression, anomaly detection |\n", "| **Verification & Validation Agent** (§8.2, pp. 211–215) | Ensure insights are defensible and reproducible | Fact-checking, NLI evidence scoring, logical coherence, consistency analysis |\n", "| **General Problem Solver** (§8.3, pp. 
215–219) | Reason broadly and adapt across domains | 5-stage meta-reasoning: decompose → analogy → hypothesize → test → meta-learn |\n", "\n", "### Extended Case Studies & Integration\n", "\n", "| Section | Description |\n", "|:--------|:------------|\n", "| **Case Study 1** (§8.4, pp. 220–226) | Newsroom fact-checking assistant — claim extraction, evidence retrieval, tolerance-based verification |\n", "| **Case Study 2** (§8.5, pp. 226–231) | Cross-disciplinary GPS hypothesis engine — ecological resilience applied to power grid stability |\n", "| **Tri-Agent Pipeline** (§8.6, pp. 231–232) | Trust-then-escalate architecture wiring all three agents |\n", "\n", "---\n", "\n", "## Section Roadmap\n", "\n", "| Section | Topic | Chapter Ref | Book Pages |\n", "|:--------|:------|:------------|:-----------|\n", "| 0 | Environment Setup & Simulation Mode | — | — |\n", "| 1 | Data Analysis Agent: Visualization Recommender, Statistical Reasoning | §8.1, §8.1.1, §8.1.2 | pp. 204–211 |\n", "| 2 | Verification & Validation Agent: Theory, NLI Demo (BART-MNLI) | §8.2–§8.2.5 | pp. 211–215 |\n", "| 3 | General Problem Solver: Theory, Pseudocode Architecture | §8.3–§8.3.4 | pp. 215–219 |\n", "| 4 | Case Study 1: News Fact-Checking Assistant | §8.4 | pp. 220–226 |\n", "| 5 | Case Study 2: Cross-Disciplinary GPS Hypothesis Engine | §8.5 | pp. 226–231 |\n", "| 6 | Tri-Agent Pipeline Integration | §8.6 | pp. 231–232 |\n" ] }, { "cell_type": "markdown", "id": "6d914a1c", "metadata": {}, "source": [ "---\n", "## Section 0 — Environment Setup\n", "\n", "This cell imports the shared utilities, resolves the API key through a\n", "three-tier cascade (`.env` → `os.getenv` → `getpass`), and sets the global\n", "`SIMULATION_MODE` flag. If no key is found, the notebook activates Simulation\n", "Mode automatically." 
] }, { "cell_type": "code", "execution_count": null, "id": "5b6e31e2", "metadata": {}, "outputs": [], "source": [ "# ── Section 0: Environment Setup ─────────────────────────────\n", "# Ref: Strategy §4.1 — Three-tier API key resolution\n", "\n", "import sys, os, json, re, textwrap\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "# Add project root to path for utils imports\n", "if '.' not in sys.path:\n", " sys.path.insert(0, '.')\n", "\n", "from utils import load_api_key, log, MockLLM, llm_call, fail_gracefully\n", "\n", "# ── Resolve API key ──────────────────────────────────────────\n", "API_KEY, SIMULATION_MODE = load_api_key()\n", "\n", "client = None\n", "if not SIMULATION_MODE:\n", " try:\n", " from openai import OpenAI\n", " client = OpenAI(api_key=API_KEY)\n", " log.success('OpenAI client initialized. Running in LIVE mode.')\n", " except Exception as e:\n", " log.error(f'OpenAI client init failed: {e}')\n", " SIMULATION_MODE = True\n", " log.info('Falling back to SIMULATION MODE.')\n", "\n", "if SIMULATION_MODE:\n", " log.info('Running in SIMULATION MODE (no API key found, or client init failed).')\n", " log.info('All outputs are chapter-accurate mocks. See AGENTS.md.')" ] }, { "cell_type": "markdown", "id": "a09f65da", "metadata": {}, "source": [ "---\n", "## Section 1 — The Data Analysis Agent\n", "\n", "**Chapter Ref:** §8.1 (pp. 204–211)\n", "\n", "Data analysis is one of the most important capabilities in modern intelligent\n", "systems. By embedding reasoning capabilities into the analysis workflow, agents\n", "move beyond static queries to interpret intent, select appropriate methods, and\n", "explain results — turning data analysis from a manual, tool-driven process into\n", "an intelligent, conversational one.\n", "\n", "At the core of a Data Analysis agent lies a **cognitive loop** that integrates:\n", "\n", "1. 
**Intent analysis and planning** — The LLM Reasoning Core interprets the\n", " user's natural language query, extracting analytical intent and temporal scope.\n", "2. **Code formulation and execution** — The plan is translated into executable\n", " code (Python/SQL) and run against the dataset.\n", "3. **Visualization and analysis** — Results are rendered in the most effective\n", " visual form, accompanied by statistical reasoning.\n", "4. **Presentation and refinement** — Insights are delivered with a summary;\n", " the user can refine, creating a feedback loop.\n", "\n", "### Figure 8.1 — The Cognitive Loop of a Data Analysis Agent\n" ] }, { "cell_type": "code", "execution_count": null, "id": "3309e28a", "metadata": {}, "outputs": [], "source": [ "# ── Figure 8.1 — Cognitive Loop of a Data Analysis Agent (SVG) ──\n", "# Ref: §8.1, p. 204\n", "\n", "from IPython.display import SVG, display\n", "\n", "fig_8_1 = \"\"\"\n", "\n", " \n", " \n", " The Cognitive Loop of a Data Analysis Agent\n", " \n", " \n", " User Query\n", " \"Top products\n", " last quarter?\"\n", " \n", " \n", " LLM Reasoning Core\n", " Intent Analysis\n", " Plan Generation\n", " Statistical Reasoning\n", " \n", " \n", " Code Interpreter\n", " Dataset (CSV) · Python (Pandas, SQL)\n", " \n", " \n", " Visualization Engine\n", " Charts · Statistical Analysis\n", " \n", " \n", " Visualization\n", " (Bar Chart) & Summary\n", " Presents / Explains\n", " \n", " Insights / Refinement\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Figure 8.1 — A linguistic prompt is converted into a reproducible computational workflow (p. 204)\n", "\n", "\"\"\"\n", "display(SVG(fig_8_1))\n" ] }, { "cell_type": "markdown", "id": "3aaeaab1", "metadata": {}, "source": [ "### Section 1.1 — Visualization Recommendation Systems\n", "\n", "**Chapter Ref:** §8.1.1 (pp. 206–209)\n", "\n", "A visualization recommendation system operates as a decision-making pipeline:\n", "\n", "1. 
**Query analysis** — Parse the prompt to extract intent (trend, compare,\n", " relationship).\n", "2. **Schema recognition** — Examine variable types (temporal, categorical,\n", " numeric).\n", "3. **Decision branching** — Map intent + schema to chart type:\n", " - Time-series → Line Chart\n", " - Categorical Comparison → Bar Chart\n", " - Correlation / Relationship → Scatter Plot or Table\n", "4. **Output and refinement** — Render with a summary; the user can refine\n", " through follow-up queries.\n", "\n", "> **📝 Note — Production Considerations (p. 207)** \n", "> The rule-based `recommend_visualization()` below is intentionally simplified for pedagogy. In production, ambiguous queries like \"show me the data\" may match multiple keyword categories or none at all, requiring a confidence-scoring layer or an LLM-based intent classifier. As the chart palette grows beyond line/bar/scatter/table to heatmaps, box plots, and geographic maps, a flat keyword list becomes difficult to maintain — production systems typically replace it with a learned model or decision tree trained on query–visualization pairs.\n", "\n", "> **📝 Note — Self-Optimizing Visualization (p. 209)** \n", "> Over time, reinforcement mechanisms enable the system to learn from user behavior. 
If users frequently replace a bar chart with a line chart for trend-related queries, the model adapts its decision weights, transforming visualization recommendation into a self-optimizing subsystem that mirrors human analytical intuition.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "5e188e67", "metadata": {}, "outputs": [], "source": [ "# ── Section 1.1: Visualization Recommender ───────────────────\n", "# Ref: §8.1.1 — recommend_visualization() from chapter\n", "# Fallback: \"table\" per Strategy §4.3.4\n", "\n", "import pandas as pd\n", "\n", "@fail_gracefully(fallback_value='table', section='8.1.1')\n", "def recommend_visualization(df: pd.DataFrame, query: str) -> str:\n", " \"\"\"Recommend a visualization type based on the user's query.\n", "\n", " This is a pedagogical, rule-based implementation. Production systems\n", " should use an LLM-based intent classifier rather than keyword rules.\n", "\n", " Parameters\n", " ----------\n", " df : pd.DataFrame\n", " The dataset under analysis. Not used by these keyword rules; a\n", " fuller version would inspect column dtypes (schema recognition).\n", " query : str\n", " Natural language query from the user.\n", "\n", " Returns\n", " -------\n", " str\n", " One of: \"line\", \"bar\", \"scatter\", \"table\".\n", "\n", " Ref: §8.1.1, recommend_visualization() — Chapter 8\n", " \"\"\"\n", " # NOTE: This is a pedagogical implementation. 
Production systems should\n", " # use an LLM-based intent classifier rather than keyword rules.\n", " q = query.lower()\n", "\n", " if any(k in q for k in [\"trend\", \"over time\", \"timeline\"]):\n", " return \"line\"\n", " if any(k in q for k in [\"compare\", \"by \", \"across\", \"versus\"]):\n", " return \"bar\"\n", " if any(k in q for k in [\"relationship\", \"correlation\", \"scatter\"]):\n", " return \"scatter\"\n", " return \"table\"\n", "\n", "\n", "# ── Demo: 4 sample queries ──────────────────────────────────\n", "df_demo = pd.read_csv('data/sample_sales_data.csv')\n", "\n", "sample_queries = [\n", " \"Show me the sales trend over time\",\n", " \"Compare revenue across regions\",\n", " \"Is there a relationship between marketing spend and revenue?\",\n", " \"Show me the raw data\",\n", "]\n", "\n", "log.info('Visualization Recommender Demo', section='8.1.1')\n", "for query in sample_queries:\n", " viz_type = recommend_visualization(df_demo, query)\n", " log.success(f'Query: \"{query}\" → Recommended: {viz_type}', section='8.1.1')" ] }, { "cell_type": "markdown", "id": "3b6815f3", "metadata": {}, "source": [ "### Section 1.2 — Statistical Reasoning Capabilities\n", "\n", "**Chapter Ref:** §8.1.2 (pp. 209–211)\n", "\n", "Visualization provides perception, but statistical reasoning provides understanding.\n", "The agent performs three complementary functions:\n", "\n", "1. **Descriptive statistics** — `df.describe()`, standard deviation, correlation\n", " coefficients. These form the quantitative foundation for reasoning.\n", "2. **Inferential and diagnostic analysis** — OLS regression to quantify\n", " relationships (e.g., \"Did marketing spend influence revenue growth?\").\n", "3. **Anomaly detection** — Z-score-based outlier identification to flag\n", " data integrity issues and surface unexpected patterns.\n", "\n", "> **📝 Note — Hybrid Reasoning Layer (p. 
210)** \n", "> In practice, the statistical reasoning module operates as a hybrid of symbolic and numerical computation. Symbolic reasoning (often powered by the LLM core) interprets user intent and formulates hypotheses. Numerical computation (via Pandas, NumPy, Statsmodels) performs precise calculations. The outputs are synthesized into a coherent narrative that balances technical accuracy with communicative clarity.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "a04f820a", "metadata": {}, "outputs": [], "source": [ "# ── Section 1.2: Descriptive Statistics ─────────────────────\n", "# Ref: §8.1.2 — df.describe(), variability, correlation\n", "\n", "import numpy as np\n", "\n", "df = pd.read_csv('data/sample_sales_data.csv')\n", "\n", "log.info('Loading sample_sales_data.csv (100 rows, seed=42)', section='8.1.2')\n", "\n", "@fail_gracefully(fallback_value=None, section='8.1.2')\n", "def compute_descriptive_stats(df: pd.DataFrame) -> pd.DataFrame:\n", " \"\"\"Compute summary statistics for the sales dataset.\n", "\n", " Ref: §8.1.2 — Descriptive statistics and summarization\n", " \"\"\"\n", " summary = df.describe(include='all')\n", " variability = df['sales'].std()\n", " correlation = df[['sales', 'marketing_spend', 'revenue']].corr(method='pearson')\n", "\n", " log.info(f'Sales variability (std): {variability:.2f}', section='8.1.2')\n", " log.info(f'Pearson correlation (marketing_spend ↔ revenue): '\n", " f'{correlation.loc[\"marketing_spend\", \"revenue\"]:.3f}', section='8.1.2')\n", " return summary\n", "\n", "summary = compute_descriptive_stats(df)\n", "if summary is not None:\n", " display(summary)" ] }, { "cell_type": "code", "execution_count": null, "id": "62bdde03", "metadata": {}, "outputs": [], "source": [ "# ── Section 1.2: OLS Regression ─────────────────────────────\n", "# Ref: §8.1.2 — sm.OLS() regression, chapter verbatim\n", "\n", "import statsmodels.api as sm\n", "\n", "@fail_gracefully(fallback_value=None, section='8.1.2')\n", 
"def run_ols_regression(df: pd.DataFrame):\n", " \"\"\"Run OLS regression: revenue ~ marketing_spend.\n", "\n", " Ref: §8.1.2 — Inferential and diagnostic analysis\n", " \"\"\"\n", " X = sm.add_constant(df['marketing_spend'])\n", " model = sm.OLS(df['revenue'], X).fit()\n", "\n", " r_squared = model.rsquared\n", " p_value = model.pvalues.iloc[1] # p-value for marketing_spend\n", " coeff = model.params.iloc[1]\n", "\n", " log.success(\n", " f'OLS Results: R²={r_squared:.3f}, '\n", " f'marketing_spend coeff={coeff:.4f}, p={p_value:.2e}',\n", " section='8.1.2'\n", " )\n", " return model\n", "\n", "model = run_ols_regression(df)\n", "if model is not None:\n", " print(model.summary())" ] }, { "cell_type": "code", "execution_count": null, "id": "cb0914e2", "metadata": {}, "outputs": [], "source": [ "# ── Section 1.2: Anomaly Detection ──────────────────────────\n", "# Ref: §8.1.2 — Z-score outlier detection, chapter verbatim\n", "\n", "@fail_gracefully(fallback_value=pd.DataFrame(), section='8.1.2')\n", "def detect_anomalies(df: pd.DataFrame, column: str = 'sales',\n", " threshold: float = 3.0) -> pd.DataFrame:\n", " \"\"\"Detect anomalies using z-score method.\n", "\n", " Parameters\n", " ----------\n", " df : pd.DataFrame\n", " Input dataset.\n", " column : str\n", " Column to analyze for outliers.\n", " threshold : float\n", " Z-score threshold (default 3.0).\n", "\n", " Returns\n", " -------\n", " pd.DataFrame\n", " Rows where |z-score| > threshold.\n", "\n", " Ref: §8.1.2 — Anomaly detection and uncertainty quantification\n", " \"\"\"\n", " df = df.copy()\n", " df['zscore'] = (df[column] - df[column].mean()) / df[column].std()\n", " anomalies = df[df['zscore'].abs() > threshold]\n", "\n", " if len(anomalies) > 0:\n", " for _, row in anomalies.iterrows():\n", " log.success(\n", " f'Anomaly detected: date={row[\"date\"]}, region={row[\"region\"]}, '\n", " f'{column}={row[column]}, z-score={row[\"zscore\"]:.2f}',\n", " section='8.1.2'\n", " )\n", " else:\n", " 
log.info(f'No anomalies found (threshold z>{threshold})', section='8.1.2')\n", "\n", " return anomalies\n", "\n", "anomalies = detect_anomalies(df)\n", "if not anomalies.empty:\n", " display(anomalies[['date', 'region', 'product', 'sales', 'zscore']])" ] }, { "cell_type": "code", "execution_count": null, "id": "a31e46e9", "metadata": {}, "outputs": [], "source": [ "# ── Section 1.2: Natural Language Interpretation ─────────────\n", "# Ref: §8.1.2 — LLM-powered interpretation via llm_call()\n", "# Mock key: \"stats_interpretation\"\n", "\n", "@fail_gracefully(fallback_value='Interpretation unavailable.', section='8.1.2')\n", "def interpret_statistics(df: pd.DataFrame) -> str:\n", " \"\"\"Generate a natural language interpretation of the statistical results.\n", "\n", " Uses llm_call() with context_key='stats_interpretation'.\n", "\n", " Ref: §8.1.2 — Statistical reasoning with LLM interpretation\n", " \"\"\"\n", " system = (\n", " 'You are a data analyst. Interpret the following statistical results '\n", " 'in plain language for a business audience.'\n", " )\n", " corr_val = df['marketing_spend'].corr(df['revenue'])\n", " std_val = df['sales'].std()\n", " z = (df['sales'] - df['sales'].mean()) / df['sales'].std()\n", " n_anomalies = int((z.abs() > 3).sum())\n", " user = (\n", " f'Dataset summary:\\n'\n", " f'- marketing_spend ↔ revenue correlation: {corr_val:.3f}\\n'\n", " f'- Sales std: {std_val:.2f}\\n'\n", " f'- Anomalies (z>3): {n_anomalies}\\n'\n", " f'Provide a concise business interpretation.'\n", " )\n", " return llm_call(\n", " system, user,\n", " context_key='stats_interpretation',\n", " simulation_mode=SIMULATION_MODE,\n", " client=client,\n", " )\n", "\n", "interpretation = interpret_statistics(df)\n", "log.success('Statistical interpretation:', section='8.1.2')\n", "print(f'\\n{interpretation}')" ] }, { "cell_type": "markdown", "id": "2053f207", "metadata": {}, "source": [ "---\n", "## Section 2 — The Verification and Validation Agent\n", "\n", 
"**Chapter Ref:** §8.2 (pp. 211–215)\n", "\n", "While statistical reasoning allows agents to draw inferences from data,\n", "every analytical system must also be able to verify and defend its conclusions.\n", "The Verification and Validation (V&V) agent addresses this by ensuring that\n", "generated insights, predictions, and explanations remain trustworthy, factual,\n", "and logically coherent.\n", "\n", "A V&V agent performs four essential functions:\n", "\n", "### 2.1 — Fact-Checking (§8.2.1, pp. 211–212)\n", "\n", "When an analytical agent produces a statement such as *\"Revenue increased by\n", "12% last quarter\"*, the V&V agent retrieves corresponding records from verified\n", "data sources, performs independent calculations, and compares findings to the\n", "claim. The agent decomposes statements into discrete, verifiable claims and\n", "tests each as a hypothesis against reliable evidence.\n", "\n", "### 2.2 — Logical Coherence (§8.2.2, pp. 212–213)\n", "\n", "The agent evaluates whether reasoning steps follow valid inference patterns.\n", "\n", "> **📝 Note — Common Logical Errors (p. 212)** \n", "> Several categories of logical error recur in agent-generated analyses: **Premise–conclusion mismatches** (concluding revenue increased after noting a decline in unit sales), **Circular reasoning** (an intermediate step reuses the claim it's trying to prove), and **Scope violations** (a finding qualified for one segment is generalized to the entire population). The agent can decompose reasoning chains into directed graphs of claims and check each edge for valid inference.\n", "\n", "### 2.3 — Retrieval-Augmented Evaluation (§8.2.3, p. 213)\n", "\n", "This approach combines the interpretive flexibility of language models with\n", "deterministic verification techniques: retrieving historical records,\n", "recalculating figures independently, and checking that the results align with\n", "the stated conclusions.\n", "\n", "### 2.4 — Consistency Analysis (§8.2.4, pp. 
213–214)\n", "\n", "In multi-agent systems, V&V agents perform structured post-processing:\n", "length/formatting checks, structural validations, rule-based constraints,\n", "adversarial red-teaming, and human-in-the-loop triggers.\n" ] }, { "cell_type": "markdown", "id": "e1452303", "metadata": {}, "source": [ "### Section 2.5 — Handling Conflicting Evidence: NLI Demo\n", "\n", "**Chapter Ref:** §8.2.5 (pp. 214–215)\n", "\n", "The code below demonstrates NLI using `facebook/bart-large-mnli` (~1.6 GB).\n", "If the model or its dependencies (`transformers`, `torch`) are unavailable,\n", "the notebook falls back to precomputed scores matching the chapter's Q4 profit\n", "claim example.\n", "\n", "**Premise:** *\"In Q4, the company reported a 15% year-over-year increase in\n", "net profit.\"* \n", "**Hypothesis:** *\"The company's profits increased by 15% last quarter.\"*\n", "\n", "> **📝 Note — NLI Classification Labels (p. 212)** \n", "> NLI models classify the relationship between evidence and a claim as one of three outcomes: **Supports (Entailment)** — the evidence confirms the claim; **Refutes (Contradiction)** — the evidence disproves the claim; **Neutral** — the evidence is inconclusive or unrelated. 
Advanced V&V agents balance evidence across multiple sources, weight their credibility, and synthesize results into a confidence score.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "3722da9a", "metadata": {}, "outputs": [], "source": [ "# ── Section 2.5: NLI Demo (BART-MNLI) ──────────────────────\n", "# Ref: §8.2.5 — Handling Conflicting Evidence\n", "# Fallback: precomputed scores per Strategy §4.4\n", "\n", "# Premise and hypothesis from chapter §8.2.5\n", "premise = \"In Q4, the company reported a 15% year-over-year increase in net profit.\"\n", "hypothesis = \"The company's profits increased by 15% last quarter.\"\n", "\n", "nli_scores = None\n", "\n", "try:\n", " from transformers import AutoTokenizer, AutoModelForSequenceClassification\n", " import torch\n", "\n", " log.info('Loading BART-MNLI model (facebook/bart-large-mnli)...', section='8.2.5')\n", "\n", " tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-mnli')\n", " nli_model = AutoModelForSequenceClassification.from_pretrained('facebook/bart-large-mnli')\n", " id2label = nli_model.config.id2label\n", "\n", " inputs = tokenizer(premise, hypothesis, return_tensors='pt', truncation=True)\n", "\n", " with torch.no_grad():\n", " logits = nli_model(**inputs).logits[0]\n", " probs = torch.softmax(logits, dim=-1).tolist()\n", "\n", " nli_scores = {id2label[i].lower(): round(probs[i], 4) for i in range(len(probs))}\n", " log.success(f'Live NLI inference complete: {nli_scores}', section='8.2.5')\n", "\n", "except Exception as e: # ImportError, OSError (download/cache), or inference errors\n", " log.error(\n", " f'BART-MNLI model unavailable ({type(e).__name__}). 
'\n", " 'Displaying precomputed results.',\n", " section='8.2.5'\n", " )\n", " # Precomputed fallback — matches chapter's Q4 profit example\n", " nli_scores = {'entailment': 0.92, 'neutral': 0.05, 'contradiction': 0.03}\n", " log.info(f'Precomputed NLI scores: {nli_scores}', section='8.2.5')\n", "\n", "# ── Display results ─────────────────────────────────────────\n", "print(f'\\nPremise: \"{premise}\"')\n", "print(f'Hypothesis: \"{hypothesis}\"')\n", "print(f'\\nNLI Scores: {nli_scores}')\n", "\n", "if nli_scores:\n", " best = max(nli_scores, key=nli_scores.get)\n", " log.success(\n", " f'Verdict: {best.upper()} (confidence: {nli_scores[best]:.2%})',\n", " section='8.2.5'\n", " )" ] }, { "cell_type": "markdown", "id": "f4b5657e", "metadata": {}, "source": [ "---\n", "## Section 3 — The General Problem Solver\n", "\n", "**Chapter Ref:** §8.3 (pp. 215–219)\n", "\n", "One of the most ambitious designs in artificial intelligence is the **General\n", "Problem Solver (GPS)**. Unlike specialized agents confined to narrow domains,\n", "a GPS is designed to reason broadly, transfer knowledge between contexts, and\n", "develop new strategies for problems it has never encountered before.\n", "\n", "A GPS operates on the principle of **meta-reasoning** — the ability to reason\n", "about its own reasoning process. Instead of following a fixed set of rules, it\n", "evaluates its approach dynamically, identifies gaps in logic or knowledge, and\n", "refines its strategy.\n", "\n", "> **📝 Note — Aspirational Architecture (p. 219)** \n", "> The `GeneralProblemSolver` pseudocode below represents an aspirational architecture rather than a production-ready implementation. Current LLMs can approximate each stage through structured prompting, but a fully autonomous GPS loop remains an active area of research.\n", "\n", "> **📝 Note — Meta-Learning Strategies (p. 
218)** \n", "> The GPS employs several meta-learning strategies: summarizing successful and failed strategies to refine heuristics, using persistent memory stores (e.g., LangGraph CheckpointStore) to retain context across sessions, and dynamically adjusting tool choice, parameter settings, and prompt design based on observed outcomes.\n" ] }, { "cell_type": "markdown", "id": "2c69cd4a", "metadata": {}, "source": [ "### The Five-Stage Cognitive Cycle\n", "\n", "**Chapter Ref:** §8.3.2, Figure 8.4 (p. 217)\n", "\n", "The GPS operates through a cyclical process of reasoning and learning composed\n", "of five interdependent stages:\n", "\n", "1. **Decompose problem** — Break down complex or unfamiliar challenges into\n", " smaller, manageable components.\n", "2. **Cross-Domain Analogy Search** — Identify patterns or analogies from other\n", " fields (e.g., biological processes for engineering optimization).\n", "3. **Synthesize and Hypothesize** — Combine insights from various domains to\n", " generate hypotheses or solution strategies.\n", "4. **Test and Reflect** — Test proposed solutions through simulation or\n", " empirical data. Evaluate performance and adjust.\n", "5. **Meta-Learning and Adaptation** — Record effective strategies, contextual\n", " factors, and performance metrics to refine future reasoning.\n", "\n", "### Figure 8.4 — The GPS Domain-Agnostic Reasoning Framework\n" ] }, { "cell_type": "code", "execution_count": null, "id": "5997edcf", "metadata": {}, "outputs": [], "source": [ "# ── Figure 8.4 — GPS Domain-Agnostic Reasoning Framework (SVG) ──\n", "# Ref: §8.3.2, p. 217\n", "\n", "from IPython.display import SVG, display\n", "\n", "fig_8_4 = \"\"\"\n", "\n", " \n", " \n", " \n", " \n", " Meta-Learning\n", " & Adaptation\n", " \n", " \n", " 1. Decompose Problem\n", " Break into sub-problems\n", " \n", " \n", " 2. Analogy Search\n", " Cross-domain patterns\n", " \n", " \n", " 3. Synthesize &\n", " Hypothesize\n", " \n", " \n", " 4. 
Test & Reflect\n", " Evaluate & adjust\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " Figure 8.4 — The General Problem Solver's domain-agnostic reasoning framework (p. 217)\n", "\n", "\"\"\"\n", "display(SVG(fig_8_4))\n" ] }, { "cell_type": "code", "execution_count": null, "id": "37260503", "metadata": {}, "outputs": [], "source": [ "# ── Section 3: GeneralProblemSolver Pseudocode ──────────────\n", "# Ref: §8.3 — Aspirational architecture (pseudocode, clearly\n", "# annotated as non-production). Verbatim from chapter.\n", "\n", "class GeneralProblemSolver:\n", " \"\"\"Aspirational GPS architecture illustrating the five-stage cycle.\n", "\n", " NOTE: This is pseudocode. Each stage could be implemented as a\n", " separate LLM prompt within a framework such as LangGraph or CrewAI,\n", " with the orchestration loop managing state transitions and confidence\n", " thresholds.\n", "\n", " Ref: §8.3 — The General Problem Solver, Chapter 8\n", " \"\"\"\n", "\n", " def __init__(self, threshold: float = 0.70):\n", " self.threshold = threshold\n", " self.knowledge_base = []\n", "\n", " def solve(self, problem: str, max_iterations: int = 5):\n", " # Stage 1 -- Decompose\n", " sub_problems = self.decompose(problem)\n", "\n", " for iteration in range(max_iterations):\n", " # Stage 2 -- Analogy Search\n", " analogies = self.search_analogies(sub_problems)\n", "\n", " # Stage 3 -- Hypothesize\n", " hypotheses = self.generate_hypotheses(\n", " sub_problems, analogies\n", " )\n", "\n", " # Stage 4 -- Test\n", " results = self.test_hypotheses(hypotheses)\n", "\n", " # Stage 5 -- Meta-Learn\n", " if results.confidence >= self.threshold:\n", " self.update_knowledge_base(results)\n", " return results.solution\n", "\n", " # Confidence too low -- refine and retry\n", " sub_problems = self.refine(sub_problems, results)\n", "\n", " return self.best_effort_solution(results)\n", "\n", " # ── Stage stubs (would be LLM calls in production) 
──────\n", " def decompose(self, problem): raise NotImplementedError\n", " def search_analogies(self, subs): raise NotImplementedError\n", " def generate_hypotheses(self, subs, anl): raise NotImplementedError\n", " def test_hypotheses(self, hyps): raise NotImplementedError\n", " def update_knowledge_base(self, results): raise NotImplementedError\n", " def refine(self, subs, results): raise NotImplementedError\n", " def best_effort_solution(self, results): raise NotImplementedError\n", "\n", "\n", "log.info('GeneralProblemSolver pseudocode class defined.', section='8.3')\n", "log.info(\n", " 'This is an aspirational architecture — see §8.5 for a runnable '\n", " 'implementation using the GPS five-stage cycle.',\n", " section='8.3'\n", ")" ] }, { "cell_type": "markdown", "id": "3a0752a5", "metadata": {}, "source": [ "---\n", "## Section 4 — Case Study: News Fact-Checking Assistant\n", "\n", "**Chapter Ref:** §8.4 (pp. 220–226) — An Agent for Journalistic Integrity\n", "\n", "A major newsroom deployed a Verification and Validation agent to assist\n", "editors during high-pressure events where speed and accuracy must coexist.\n", "The agent is composed of three cooperating components:\n", "\n", "- **Claim Extractor** — Scans text for verifiable numerical statements and\n", " converts them into structured records with metric, value, entity, and period.\n", "- **Evidence Retriever** — Queries curated official sources to obtain\n", " authoritative values with provenance.\n", "- **Verifier** — Compares claimed and authoritative values with defined\n", " tolerances, then assigns a label: `Confirmed`, `Mostly True`, `Contradicted`,\n", " or `Unverified`.\n", "\n", "> **📝 Note — Tolerance Thresholds (p. 223)** \n", "> For percentage-based claims, the system applies a difference threshold of 0.5 percentage points. For monetary values, it applies a threshold of $500,000 to absorb common rounding in public communications. 
In production, calibrate tolerances by beat and metric.\n", "\n", "> **📝 Note — Operational Results (p. 226)** \n", "> Deploying this agent reduced verification time from hours to minutes. Editors retained control over the narrative while trusting that quantitative claims were consistent with official data. For production use, replace the in-memory store with your data platform, introduce freshness checks, and route low-confidence outcomes to a human review queue.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "80f80e2f", "metadata": {}, "outputs": [], "source": [ "# ── Section 4: Trusted Database & Article Text ──────────────\n", "# Ref: §8.4 — Authoritative data store and test article\n", "# (verbatim from chapter)\n", "\n", "from typing import List, Dict, Any\n", "\n", "# ── Authoritative data: The trusted internal database ───────\n", "# In production, this would be a warehouse or official API with\n", "# access control, versioning, and data freshness policies.\n", "\n", "trusted_database: Dict[str, Dict[str, Any]] = {\n", " \"ottawa_unemployment_rate_change_2024\": {\n", " \"value\": -0.048,\n", " \"source\": \"Statistics Canada, Labour Force Survey, Table 14-10-0287-01\",\n", " \"notes\": \"Annual change from 2023 to 2024 for Ottawa–Gatineau CMA.\"\n", " },\n", " \"city_budget_surplus_2024\": {\n", " \"value\": 15_200_000,\n", " \"source\": \"City of Ottawa Annual Financial Report 2024\",\n", " \"notes\": \"Reported surplus for the fiscal year ending 2024.\"\n", " }\n", "}\n", "\n", "article_text = (\n", " \"A new report on Ottawa's economy shows promising signs of recovery. \"\n", " \"According to official city documents, the city's unemployment rate \"\n", " \"fell by 5% last year, a significant improvement driven by the tech \"\n", " \"sector. 
Furthermore, the municipal government reported a budget \"\n", " \"surplus of $12 million for the 2024 fiscal year.\"\n", ")\n", "\n", "log.info('Trusted database loaded (2 entries).', section='8.4')\n", "log.info(f'Article text loaded ({len(article_text)} chars).', section='8.4')" ] }, { "cell_type": "code", "execution_count": null, "id": "5d04521d", "metadata": {}, "outputs": [], "source": [ "# ── Section 4: Claim Extraction ─────────────────────────────\n", "# Ref: §8.4 — extract_claims_from_text()\n", "# LLM-first with regex fallback, verbatim from chapter.\n", "# Mock key: \"claim_extraction\"\n", "\n", "@fail_gracefully(fallback_value=[], section='8.4')\n", "def extract_claims_from_text(text: str) -> List[Dict[str, Any]]:\n", " \"\"\"Extract verifiable numeric claims from article text.\n", "\n", " Uses llm_call() with context_key='claim_extraction' for robust\n", " parsing. If the LLM path fails, a regex fallback keeps the\n", " workflow operational.\n", "\n", " Returns claims with fields: claim_text, metric, value, entity, period.\n", "\n", " Ref: §8.4 — Claim Extraction (LLM-first with a safe fallback)\n", " \"\"\"\n", " system_prompt = (\n", " \"You are an expert fact-checker. Extract verifiable numeric claims \"\n", " \"from the text. 
For each claim, return claim_text, metric, value, \"\n", " \"entity, period as a JSON object in a JSON field called 'claims' \"\n", " \"which is a list.\"\n", " )\n", " user_prompt = f'Extract claims from: {text}'\n", "\n", " raw = llm_call(\n", " system_prompt, user_prompt,\n", " context_key='claim_extraction',\n", " simulation_mode=SIMULATION_MODE,\n", " client=client,\n", " )\n", "\n", " # ── Parse LLM / mock response ───────────────────────────\n", " try:\n", " data = json.loads(raw)\n", " claims = data.get('claims', [])\n", " for c in claims:\n", " c['claim_text'] = str(c.get('claim_text', '')).strip()\n", " c['metric'] = str(c.get('metric', '')).strip()\n", " c['value'] = str(c.get('value', '')).strip()\n", " c['entity'] = str(c.get('entity', '')).strip()\n", " c['period'] = str(c.get('period', '')).strip()\n", " if claims:\n", " return claims\n", " except (json.JSONDecodeError, AttributeError):\n", " log.error('LLM response not valid JSON; trying regex fallback.', section='8.4')\n", "\n", " # ── Regex fallback (from chapter) ───────────────────────\n", " claims = []\n", " m1 = re.search(r'(unemployment.*?fell by\\s+(\\d+(\\.\\d+)?)\\s*%)', text, re.I)\n", " if m1:\n", " claims.append({\n", " 'claim_text': m1.group(1),\n", " 'metric': 'unemployment rate change',\n", " # 'fell by' denotes a decrease, so store the change as negative\n", " 'value': f'-{m1.group(2)}%',\n", " 'entity': 'Ottawa',\n", " 'period': '2024'\n", " })\n", "\n", " m2 = re.search(\n", " r'budget surplus of\\s*\\$?\\s*([0-9]+(\\.\\d+)?)\\s*(million)?\\s*for the\\s*(\\d{4})',\n", " text, re.I\n", " )\n", " if m2:\n", " amount = float(m2.group(1)) * (1_000_000 if m2.group(3) else 1)\n", " claims.append({\n", " 'claim_text': m2.group(0),\n", " 'metric': 'budget surplus',\n", " 'value': str(amount),\n", " 'entity': 'Ottawa',\n", " 'period': m2.group(4)\n", " })\n", "\n", " return claims\n", "\n", "\n", "# ── Run extraction ──────────────────────────────────────────\n", "claims = 
extract_claims_from_text(article_text)\n", "log.success(f'Extracted {len(claims)} 
claim(s).', section='8.4')\n", "for i, c in enumerate(claims, 1):\n", " print(f' Claim {i}: {c[\"claim_text\"]}')\n", " print(f' metric={c[\"metric\"]}, value={c[\"value\"]}, '\n", " f'entity={c[\"entity\"]}, period={c[\"period\"]}')" ] }, { "cell_type": "code", "execution_count": null, "id": "8dfad59d", "metadata": {}, "outputs": [], "source": [ "# ── Section 4: Mapping, Parsing, and Verification ───────────\n", "# Ref: §8.4 — _map_to_db_key(), _parse_percentage(),\n", "# verify_claim() — verbatim from chapter.\n", "# Tolerances: 0.5 pp for percentages, $500K for monetary.\n", "\n", "def _map_to_db_key(metric: str, entity: str, period: str) -> str:\n", " \"\"\"Map a claim's fields to a canonical trusted_database key.\n", "\n", " Acts as a simplified Evidence Retriever: resolves each claim\n", " to an authoritative record.\n", "\n", " Ref: §8.4 — Mapping, parsing, and verification\n", " \"\"\"\n", " m = (metric or '').lower()\n", " e = (entity or '').lower()\n", " p = str(period or '')\n", " if 'unemployment' in m and 'ottawa' in e and '2024' in p:\n", " return 'ottawa_unemployment_rate_change_2024'\n", " if 'budget surplus' in m and 'ottawa' in e and '2024' in p:\n", " return 'city_budget_surplus_2024'\n", " return ''\n", "\n", "\n", "def _parse_percentage(value: str) -> float:\n", " \"\"\"Parse a percentage string into a decimal.\n", "\n", " Ref: §8.4 — _parse_percentage()\n", " \"\"\"\n", " v = value.replace(' ', '')\n", " return float(v[:-1]) / 100 if v.endswith('%') else float(v)\n", "\n", "\n", "@fail_gracefully(\n", " fallback_value={'status': 'Error', 'details': 'Verification unavailable'},\n", " section='8.4'\n", ")\n", "def verify_claim(claim: Dict[str, Any]) -> Dict[str, Any]:\n", " \"\"\"Verify a single claim against the trusted database.\n", "\n", " Applies tolerance-based comparison:\n", " - Percentage claims: 0.5 percentage-point threshold\n", " - Monetary claims: $500,000 threshold\n", "\n", " Returns a verdict dict with status, details, source, and 
notes.\n", "\n", " Ref: §8.4 — verify_claim()\n", " \"\"\"\n", " db_key = _map_to_db_key(\n", " claim.get('metric', ''),\n", " claim.get('entity', ''),\n", " claim.get('period', '')\n", " )\n", " if db_key not in trusted_database:\n", " return {\n", " 'claim': claim.get('claim_text', ''),\n", " 'status': 'Unverified',\n", " 'details': 'No matching internal data source.'\n", " }\n", "\n", " evidence = trusted_database[db_key]\n", " actual = evidence['value']\n", " claimed_raw = claim.get('value', '')\n", "\n", " try:\n", " if isinstance(claimed_raw, str) and '%' in claimed_raw:\n", " claimed = _parse_percentage(claimed_raw)\n", " diff = abs(claimed - actual)\n", " tol = 0.005 # 0.5 percentage points\n", " status = (\n", " 'Confirmed' if diff < 1e-9 # float-safe match after dividing by 100\n", " else ('Mostly True' if diff <= tol else 'Contradicted')\n", " )\n", " details = (\n", " f'Claimed {claimed_raw}, actual '\n", " f'{round(actual * 100, 2)}% '\n", " f'(\\u0394={round(diff * 100, 2)} pp)'\n", " )\n", " else:\n", " claimed = float(claimed_raw)\n", " diff = abs(claimed - actual)\n", " tol = 500_000 # $500K\n", " status = (\n", " 'Confirmed' if diff == 0\n", " else ('Mostly True' if diff <= tol else 'Contradicted')\n", " )\n", " details = (\n", " f'Claimed ${int(claimed):,}, actual '\n", " f'${int(actual):,} '\n", " f'(\\u0394=${int(diff):,})'\n", " )\n", " except Exception:\n", " return {\n", " 'claim': claim.get('claim_text', ''),\n", " 'status': 'Error',\n", " 'details': f'Could not parse value {claimed_raw}'\n", " }\n", "\n", " return {\n", " 'claim': claim.get('claim_text', ''),\n", " 'metric': claim.get('metric', ''),\n", " 'entity': claim.get('entity', ''),\n", " 'period': claim.get('period', ''),\n", " 'status': status,\n", " 'details': details,\n", " 'source': evidence['source'],\n", " 'notes': evidence.get('notes', '')\n", " }\n", "\n", "\n", "log.info('Verification functions defined.', section='8.4')" ] }, { "cell_type": "code", "execution_count": 
null, "id": "a741f824", "metadata": {}, 
"outputs": [], "source": [ "# ── Section 4: Orchestration & Editorial Report ─────────────\n", "# Ref: §8.4 — Orchestration, reporting, and demonstration\n", "\n", "log.info('Starting Fact-Check Agent...', section='8.4')\n", "\n", "if not claims:\n", " log.error('No verifiable numeric claims found.', section='8.4')\n", "else:\n", " log.info(f'Found {len(claims)} claim(s). Verifying...', section='8.4')\n", " print('-' * 72)\n", "\n", " for c in claims:\n", " r = verify_claim(c)\n", "\n", " # Color-code by verdict\n", " claim_text = r.get('claim', '')\n", " status = r.get('status', 'Unknown')\n", " details = r.get('details', '')\n", "\n", " if status in ('Confirmed', 'Mostly True'):\n", " log.success(f'[{status}] {claim_text}', section='8.4')\n", " else:\n", " log.error(f'[{status}] {claim_text}', section='8.4')\n", "\n", " print(f' Details: {details}')\n", " if r.get('source'):\n", " print(f' Source: {r[\"source\"]}')\n", " if r.get('notes'):\n", " print(f' Notes: {r[\"notes\"]}')\n", " print('-' * 72)\n", "\n", "log.success('Fact-Check Agent complete. Editorial report above.', section='8.4')" ] }, { "cell_type": "markdown", "id": "1e42aaf7", "metadata": {}, "source": [ "---\n", "## Section 5 — Case Study: Cross-Disciplinary Hypothesis Generation\n", "\n", "**Chapter Ref:** §8.5 (pp. 226–231) — Applying Ecological Resilience to Power Grid Stability\n", "\n", "A multidisciplinary research team is investigating whether principles from\n", "ecological network resilience can inform strategies for preventing cascading\n", "failures in electrical power grids. 
The GPS agent is organized into three\n", "cooperating modules mapping to the five-stage cycle from Figure 8.4:\n", "\n", "- **Decomposer** (Stage 1) — Breaks the research question into tractable\n", " sub-problems.\n", "- **Analogy Engine** (Stages 2 & 3) — Searches for cross-domain parallels\n", " in ecology and synthesizes them into a testable hypothesis.\n", "- **Hypothesis Evaluator** (Stages 4 & 5) — Scores the hypothesis against\n", " a rubric, logs the strategy, and decides whether to accept or refine.\n", "\n", "> **📝 Note — Failure-Driven Refinement (p. 230)** \n", "> The two iterations demonstrate the GPS's defining behavior: failure-driven refinement. In the first pass, the broad decomposition produces a hypothesis scoring 0.40 — below the 0.70 threshold. The meta-learning engine diagnoses the weakness, logs a refinement hint, and triggers a second iteration. The sharper decomposition then yields a hypothesis scoring 0.78, clearing the threshold. The strategy log provides a transparent audit trail of how the agent's reasoning evolved.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "12422cc7", "metadata": {}, "outputs": [], "source": [ "# ── Section 5: Setup & Configuration ────────────────────────\n", "# Ref: §8.5 — GPS Case Study Setup\n", "\n", "CONFIDENCE_THRESHOLD = 0.70\n", "\n", "RESEARCH_QUESTION = (\n", " \"Can ecological network resilience principles inform strategies \"\n", " \"for preventing cascading failures in electrical power grids?\"\n", ")\n", "\n", "log.info(f'GPS Confidence threshold: {CONFIDENCE_THRESHOLD}', section='8.5')\n", "log.info(f'Research question: {RESEARCH_QUESTION}', section='8.5')" ] }, { "cell_type": "code", "execution_count": null, "id": "94279719", "metadata": {}, "outputs": [], "source": [ "# ── Section 5, Stage 1: Decompose ───────────────────────────\n", "# Ref: §8.5 — decompose() with two fallback paths\n", "# Mock keys: \"gps_decompose_v1\" (broad), \"gps_decompose_v2\" (refined)\n", "\n", 
"@fail_gracefully(fallback_value=['Unable to decompose'], section='8.5')\n", "def decompose(question: str, refinement_hint: str = '') -> List[str]:\n", " \"\"\"Decompose a research question into sub-problems.\n", "\n", " On the first iteration, uses a broad decomposition.\n", " After a low-confidence result, the meta-learning stage provides\n", " a refinement_hint for a sharper second pass.\n", "\n", " Ref: §8.5 — Stage 1, Decompose\n", " \"\"\"\n", " context_key = 'gps_decompose_v2' if refinement_hint else 'gps_decompose_v1'\n", "\n", " prompt = f'Decompose this research question into 3 sub-problems:\\n{question}'\n", " if refinement_hint:\n", " prompt += f'\\nRefinement guidance: {refinement_hint}'\n", "\n", " result = llm_call(\n", " 'You are a research decomposition agent.',\n", " prompt,\n", " context_key=context_key,\n", " simulation_mode=SIMULATION_MODE,\n", " client=client,\n", " )\n", "\n", " # Parse JSON list or return as-is\n", " if result.startswith('['):\n", " return json.loads(result)\n", " return [result]\n", "\n", "\n", "log.info('decompose() defined.', section='8.5')" ] }, { "cell_type": "code", "execution_count": null, "id": "638a42c7", "metadata": {}, "outputs": [], "source": [ "# ── Section 5, Stage 2: Analogy Search ──────────────────────\n", "# Ref: §8.5 — search_analogies()\n", "# Mock key: \"gps_analogies\"\n", "\n", "@fail_gracefully(fallback_value={'error': 'Analogy search unavailable'}, section='8.5')\n", "def search_analogies(sub_problems: List[str]) -> Dict[str, str]:\n", " \"\"\"Search for cross-domain analogies from ecology.\n", "\n", " Ref: §8.5 — Stage 2, Cross-domain analogy search\n", " \"\"\"\n", " prompt = (\n", " 'For each sub-problem, find an analogy from ecology:\\n'\n", " + '\\n'.join(sub_problems)\n", " )\n", " result = llm_call(\n", " 'You are an ecology-to-engineering analogy expert.',\n", " prompt,\n", " context_key='gps_analogies',\n", " simulation_mode=SIMULATION_MODE,\n", " client=client,\n", " )\n", " try:\n", " 
return json.loads(result)\n", " except json.JSONDecodeError:\n", " return {'raw': result}\n", "\n", "\n", "log.info('search_analogies() defined.', section='8.5')" ] }, { "cell_type": "code", "execution_count": null, "id": "b50edb96", "metadata": {}, "outputs": [], "source": [ "# ── Section 5, Stage 3: Hypothesize ─────────────────────────\n", "# Ref: §8.5 — generate_hypothesis()\n", "# Mock key: \"gps_hypothesis\"\n", "\n", "@fail_gracefully(fallback_value='Hypothesis generation unavailable.', section='8.5')\n", "def generate_hypothesis(sub_problems: List[str],\n", " analogies: Dict[str, str]) -> str:\n", " \"\"\"Synthesize sub-problems and analogies into a testable hypothesis.\n", "\n", " Ref: §8.5 — Stage 3, Synthesize and hypothesize\n", " \"\"\"\n", " prompt = (\n", " 'Synthesize these sub-problems and analogies into one '\n", " 'testable hypothesis:\\n'\n", " f'Sub-problems: {json.dumps(sub_problems)}\\n'\n", " f'Analogies: {json.dumps(analogies)}'\n", " )\n", " return llm_call(\n", " 'You are a hypothesis synthesis agent.',\n", " prompt,\n", " context_key='gps_hypothesis',\n", " simulation_mode=SIMULATION_MODE,\n", " client=client,\n", " )\n", "\n", "\n", "log.info('generate_hypothesis() defined.', section='8.5')" ] }, { "cell_type": "code", "execution_count": null, "id": "5533d546", "metadata": {}, "outputs": [], "source": [ "# ── Section 5, Stage 4: Evaluate ────────────────────────────\n", "# Ref: §8.5 — evaluate_hypothesis()\n", "# Deterministic scores: iteration 1 → 0.40, iteration 2 → 0.78\n", "\n", "RUBRIC = ['specificity', 'cross_domain_grounding', 'testability']\n", "\n", "@fail_gracefully(\n", " fallback_value={'scores': {}, 'confidence': 0.0, 'passed': False},\n", " section='8.5'\n", ")\n", "def evaluate_hypothesis(hypothesis: str,\n", " iteration: int) -> Dict[str, Any]:\n", " \"\"\"Score a hypothesis against the rubric.\n", "\n", " Each criterion is scored 0-1. The mean becomes the overall\n", " confidence score. 
Deterministic scores simulate a weak first\n", " pass and a strong refined second pass.\n", "\n", " Ref: §8.5 — Stage 4, Test and reflect\n", " \"\"\"\n", " if iteration == 1:\n", " # Broad decomposition → low confidence\n", " scores = {\n", " 'specificity': 0.3,\n", " 'cross_domain_grounding': 0.5,\n", " 'testability': 0.4\n", " }\n", " else:\n", " # Refined decomposition → passes threshold\n", " scores = {\n", " 'specificity': 0.8,\n", " 'cross_domain_grounding': 0.85,\n", " 'testability': 0.7\n", " }\n", "\n", " confidence = sum(scores.values()) / len(scores)\n", " return {\n", " 'scores': scores,\n", " 'confidence': round(confidence, 2),\n", " 'passed': confidence >= CONFIDENCE_THRESHOLD\n", " }\n", "\n", "\n", "log.info('evaluate_hypothesis() defined.', section='8.5')" ] }, { "cell_type": "code", "execution_count": null, "id": "9183e86c", "metadata": {}, "outputs": [], "source": [ "# ── Section 5, Stage 5: Meta-Learning Orchestration Loop ────\n", "# Ref: §8.5 — 2-iteration loop demonstrating failure-driven\n", "# refinement. 
Verbatim logic from chapter.\n", "\n", "strategy_log: List[Dict[str, Any]] = []\n", "\n", "for iteration in range(1, 3):\n", " print(f'\\n{\"=\" * 60}')\n", " log.info(f'GPS Iteration {iteration}', section='8.5')\n", " print(f'{\"=\" * 60}')\n", "\n", " # Retrieve refinement hint from previous failure, if any\n", " hint = strategy_log[-1].get('refinement_hint', '') if strategy_log else ''\n", "\n", " # Stage 1 — Decompose\n", " subs = decompose(RESEARCH_QUESTION, refinement_hint=hint)\n", " log.info(f'Sub-problems:', section='8.5')\n", " for s in subs:\n", " print(f' - {s}')\n", "\n", " # Stage 2 — Analogy Search\n", " analogies = search_analogies(subs)\n", " log.info(f'Analogies:', section='8.5')\n", " for k, v in analogies.items():\n", " print(f' [{k}] {v}')\n", "\n", " # Stage 3 — Hypothesize\n", " hypothesis = generate_hypothesis(subs, analogies)\n", " log.info(f'Hypothesis: {hypothesis}', section='8.5')\n", "\n", " # Stage 4 — Evaluate\n", " result = evaluate_hypothesis(hypothesis, iteration)\n", " log.info(f'Evaluation: confidence={result[\"confidence\"]}, '\n", " f'passed={result[\"passed\"]}', section='8.5')\n", " for k, v in result['scores'].items():\n", " print(f' {k}: {v}')\n", "\n", " # Stage 5 — Meta-learn\n", " entry = {\n", " 'iteration': iteration,\n", " 'confidence': result['confidence'],\n", " 'passed': result['passed'],\n", " 'hypothesis': hypothesis\n", " }\n", "\n", " if not result['passed']:\n", " entry['refinement_hint'] = (\n", " 'Previous decomposition was too broad. 
Focus on quantitative '\n", " 'graph metrics that are measurable in both ecological and '\n", " 'engineered networks.'\n", " )\n", " log.error(\n", " f'BELOW THRESHOLD ({result[\"confidence\"]} < {CONFIDENCE_THRESHOLD}) '\n", " f'— refining.',\n", " section='8.5'\n", " )\n", " log.info(f'Refinement hint: {entry[\"refinement_hint\"]}', section='8.5')\n", " else:\n", " log.success(\n", " f'PASSED ({result[\"confidence\"]} >= {CONFIDENCE_THRESHOLD}) '\n", " f'— hypothesis accepted.',\n", " section='8.5'\n", " )\n", "\n", " strategy_log.append(entry)\n", "\n", "# ── Print strategy log ──────────────────────────────────────\n", "print(f'\\n{\"=\" * 60}')\n", "log.success('GPS Strategy Log:', section='8.5')\n", "print(json.dumps(strategy_log, indent=2))" ] }, { "cell_type": "markdown", "id": "516620c0", "metadata": {}, "source": [ "---\n", "## Section 6 — Bringing It All Together: The Tri-Agent Pipeline\n", "\n", "**Chapter Ref:** §8.6 (pp. 231–232)\n", "\n", "Throughout this chapter, we treated the Data Analysis agent, the Verification\n", "and Validation agent, and the General Problem Solver as independent modules.\n", "In practice, the three work best as stages in a single pipeline:\n", "\n", "1. **Data Analysis Agent** — Surfaces candidate insights from raw data.\n", "2. **Verification & Validation Agent** — Stress-tests those insights for\n", " factual accuracy and logical coherence.\n", "3. **General Problem Solver** — Steps in when neither agent can resolve a\n", " question from its existing repertoire.\n", "\n", "The pipeline follows a **trust-then-escalate** pattern: every insight produced\n", "by Stage 1 must pass through the verification gate in Stage 2 before reaching\n", "the user. 
Claims that fail verification are not discarded but routed to the\n", "GPS, which can decompose the question, search for analogies, or generate\n", "hypotheses for further testing.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "bf5ea58b", "metadata": {}, "outputs": [], "source": [ "# ── Section 6: Tri-Agent Pipeline ───────────────────────────\n", "# Ref: §8.6 — Bringing It All Together\n", "# Demonstrates the \"trust-then-escalate\" pattern by wiring\n", "# Data Analysis mock output → V&V verification → GPS fallback.\n", "\n", "def tri_agent_pipeline(\n", " user_query: str,\n", " dataset: pd.DataFrame,\n", " trusted_sources: Dict[str, Dict[str, Any]]\n", ") -> Dict[str, Any]:\n", " \"\"\"Orchestrate the three agents in a trust-then-escalate pipeline.\n", "\n", " Stage 1: Data Analysis Agent — generates candidate insights.\n", " Stage 2: V&V Agent — verifies each insight against trusted sources.\n", " Stage 3: GPS — decomposes and resolves unverified/flagged items.\n", "\n", " Note: verify_claim() reads the global trusted_database, so\n", " user_query and trusted_sources are not used in this demo.\n", "\n", " Ref: §8.6 — tri_agent_pipeline()\n", " \"\"\"\n", " report = {'verified': [], 'flagged': [], 'gps_resolved': []}\n", "\n", " # ── Stage 1: Data Analysis Agent ─────────────────────────\n", " log.info('Stage 1: Data Analysis Agent — generating insights...', section='8.6')\n", "\n", " # Simulate data analysis insights (mock outputs from earlier sections)\n", " insights = [\n", " {\n", " 'claim_text': \"Ottawa's unemployment rate fell by 5% in 2024\",\n", " 'metric': 'unemployment rate change',\n", " 'value': '-5%', # a fall is a negative change, matching the trusted record\n", " 'entity': 'Ottawa',\n", " 'period': '2024',\n", " 'viz_recommendation': recommend_visualization(dataset, 'Compare unemployment across years')\n", " },\n", " {\n", " 'claim_text': 'Budget surplus of $12 million for 2024',\n", " 'metric': 'budget surplus',\n", " 'value': '12000000',\n", " 'entity': 'Ottawa',\n", " 'period': 
'2024',\n", " 'viz_recommendation': recommend_visualization(dataset, 'Show budget trend over time')\n", " },\n", " {\n", " 'claim_text': 
'Marketing spend strongly correlates with revenue',\n", " 'metric': 'marketing correlation',\n", " 'value': '0.708',\n", " 'entity': 'All regions',\n", " 'period': '2024',\n", " 'viz_recommendation': recommend_visualization(dataset, 'Show relationship between spend and revenue')\n", " }\n", " ]\n", "\n", " for ins in insights:\n", " log.success(\n", " f'Insight: {ins[\"claim_text\"]} (viz: {ins[\"viz_recommendation\"]})',\n", " section='8.6'\n", " )\n", "\n", " # ── Stage 2: V&V Agent ──────────────────────────────────\n", " log.info('Stage 2: V&V Agent — verifying insights...', section='8.6')\n", "\n", " for insight in insights:\n", " verdict = verify_claim(insight)\n", " status = verdict.get('status', 'Unverified')\n", "\n", " if status in ('Confirmed', 'Mostly True'):\n", " insight['confidence'] = 0.95 if status == 'Confirmed' else 0.80\n", " report['verified'].append(insight)\n", " log.success(f'[{status}] {insight[\"claim_text\"]}', section='8.6')\n", " else:\n", " insight['flag'] = verdict.get('details', 'Verification failed')\n", " report['flagged'].append(insight)\n", " log.error(f'[{status}] {insight[\"claim_text\"]} — flagged for GPS', section='8.6')\n", "\n", " # ── Stage 3: GPS (fallback for flagged items) ───────────\n", " unresolved = report['flagged']\n", " if unresolved:\n", " log.info(\n", " f'Stage 3: GPS — resolving {len(unresolved)} flagged item(s)...',\n", " section='8.6'\n", " )\n", " for item in unresolved:\n", " # GPS decomposes the discrepancy into sub-questions\n", " question = (\n", " f'Why does the claimed value ({item[\"value\"]}) for '\n", " f'{item[\"metric\"]} differ from the trusted source? 
'\n", " f'Flag: {item[\"flag\"]}'\n", " )\n", " subs = decompose(question)\n", " analogies = search_analogies(subs)\n", " hypothesis = generate_hypothesis(subs, analogies)\n", "\n", " item['gps_hypothesis'] = hypothesis\n", " item['gps_sub_problems'] = subs\n", " report['gps_resolved'].append(item)\n", "\n", " log.success(\n", " f'GPS resolved: {item[\"claim_text\"]} → hypothesis generated',\n", " section='8.6'\n", " )\n", " else:\n", " log.success('No flagged items — GPS not needed.', section='8.6')\n", "\n", " return report\n", "\n", "\n", "# ── Run the pipeline ────────────────────────────────────────\n", "log.info('=' * 56, section='8.6')\n", "log.info('TRI-AGENT PIPELINE — Trust-Then-Escalate Demo', section='8.6')\n", "log.info('=' * 56, section='8.6')\n", "\n", "pipeline_report = tri_agent_pipeline(\n", " user_query='Analyze Ottawa economic indicators for 2024',\n", " dataset=df,\n", " trusted_sources=trusted_database\n", ")\n", "\n", "# ── Summary ─────────────────────────────────────────────────\n", "print(f'\\n{\"=\" * 60}')\n", "log.success('Pipeline Summary:', section='8.6')\n", "print(f' Verified insights: {len(pipeline_report[\"verified\"])}')\n", "print(f' Flagged insights: {len(pipeline_report[\"flagged\"])}')\n", "print(f' GPS-resolved: {len(pipeline_report[\"gps_resolved\"])}')\n", "print(f'{\"=\" * 60}')" ] }, { "cell_type": "markdown", "id": "f0303194", "metadata": {}, "source": [ "---\n", "## Summary\n", "\n", "> *Ref: Chapter Summary (pp. 232–233)*\n", "\n", "This notebook demonstrated the three agent archetypes from Chapter 8:\n", "\n", "1. **Data Analysis Agent** (§8.1) — Visualization recommendation, OLS\n", " regression, and z-score anomaly detection on synthetic sales data.\n", "2. **Verification & Validation Agent** (§8.2) — NLI-based evidence scoring\n", " (BART-MNLI), factual consistency checking with tolerance-based verification.\n", "3. 
**General Problem Solver** (§8.3) — Five-stage meta-reasoning cycle with\n", " failure-driven refinement (confidence 0.40 → 0.78 over two iterations).\n", "\n", "Two case studies grounded these architectures in practice:\n", "\n", "- **News Fact-Checking** (§8.4) — Claim extraction, evidence retrieval,\n", " tolerance-based verification under deadline pressure.\n", "- **Cross-Disciplinary GPS** (§8.5) — Ecological resilience principles\n", " applied to power grid stability, demonstrating the full meta-reasoning cycle.\n", "\n", "The **Tri-Agent Pipeline** (§8.6) wired these into a trust-then-escalate\n", "architecture where confidence scores accompany every finding and no unverified\n", "claim reaches a decision-maker without an explicit flag.\n", "\n", "> **What's Next:** Chapter 9 examines AI-powered coding agents, showing how the cognitive loop, verification pipeline, and meta-learning strategies introduced here translate into tools that generate, review, and refactor production code.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.10.0" } }, "nbformat": 4, "nbformat_minor": 5 }