{ "cells": [ { "cell_type": "markdown", "id": "7c212630", "metadata": {}, "source": [ "# Chapter 15: Education and Knowledge Agents\n", "\n", "> **Book:** *30 Agents Every AI Engineer Must Build* (Packt Publishing, 2026)  \n", "> **Author:** Imran Ahmad  \n", "> **Chapter Pages:** pp. 421–454  \n", "\n", "---\n", "\n", "## Introduction\n", "\n", "> *\"Education is not the filling of a pail, but the lighting of a fire.\"*  \n", "> — Commonly attributed to W.B. Yeats, paraphrasing Plutarch\n", "\n", "Teaching is hard — harder, in many ways, than playing chess or folding proteins. Those problems have clean objective functions. Education does not. To teach well, you need to **read a mind you cannot observe directly**, calibrate a challenge level that shifts with every interaction, and somehow make the whole experience feel less like an obstacle course and more like a conversation (Ch.15, p. 421).\n", "\n", "This chapter tackles two complementary challenges:\n", "\n", "1. **Individual instruction** — How a single agent can personalize curricula, track progress across dozens of competencies, and deliver feedback in real time, democratizing expert-level instruction at scale.\n", "2. **Collective reasoning** — How teams of agents, each with partial knowledge and limited reasoning capabilities, produce emergent behaviors that exceed the sum of their individual contributions.\n", "\n", "---\n", "\n", "### What This Notebook Implements\n", "\n", "**Part I — Education Intelligence Agent** (pp. 422–441)  \n", "A POMDP-based adaptive tutor that combines six core modules:\n", "- **Knowledge Graph** — A DAG curriculum with prerequisite relationships (pp. 423–426)\n", "- **Student Model** — Probabilistic mastery estimates using Bayesian Knowledge Tracing (pp. 425–434)\n", "- **Curriculum Planner** — ZPD-aligned objective selection with downstream impact scoring (pp. 426–429)\n", "- **Adaptive Placement Test** — IRT 2PL diagnostics with Fisher information item selection (pp. 
428–431)\n", "- **Spaced Repetition** — SM-2 algorithm for long-term retention scheduling (pp. 435–437)\n", "- **Feedback Generator** — Two-stage misconception detection with pedagogical nudges (pp. 438–440)\n", "\n", "**Part II — Collective Intelligence Agent** (pp. 441–453) \n", "A multi-agent collaboration pattern with:\n", "- **CollaborativeAgent** — Dual propose/critique pathways with confidence metadata (pp. 442–444)\n", "- **ConsensusEngine** — Weighted multi-round consensus with adversarial critic rotation (pp. 445–450)\n", "- **Emergent Intelligence** — Cross-pollination, constraint relaxation, and analogical transfer (pp. 452–453)\n", "\n", "---\n", "\n", "### Key Architectural Insight: POMDP Formulation\n", "\n", "The Education Intelligence agent is formally modeled as a **Partially Observable Markov Decision Process (POMDP)**, extending the stochastic decision-making frameworks from Chapter 13 (p. 422). A POMDP is required because the most critical variable — the student's true mastery — is a **hidden state** that is not directly accessible. The agent relies on noisy proxies: quiz responses, code submissions, and time-on-task telemetry.\n", "\n", "The core of this model is the **belief state** $b(s)$, a probability distribution over the possible knowledge states of the learner — the computational equivalent of a teacher's intuition about a student's understanding.\n", "\n", "> **Design Principle — Pedagogical Alignment (p. 422):** Every decision the agent makes must be grounded in established learning science. An agent that optimizes for task completion without considering cognitive load, spaced repetition, or the zone of proximal development will produce learning experiences that feel efficient in the short term but fail at durable knowledge acquisition.\n", "\n", "---\n", "\n", "### Key Theoretical Backbone: Condorcet Jury Theorem\n", "\n", "The Collective Intelligence agent draws on the **Condorcet Jury Theorem** (p. 
441): if each agent independently arrives at the correct answer with probability $p > 0.5$, then the probability that a majority vote produces the correct answer approaches 1 as the group grows. The theorem assumes independent votes. Because LLM-based agents that share a model tend to make correlated errors, the diversity mechanisms in this architecture exist to reduce that correlation.\n", "\n", "---\n", "\n", "### Mathematical Foundations\n", "\n", "| Formula | Book Page | Purpose |\n", "|---|---|---|\n", "| $G(m,d) = \\alpha \\cdot \\exp\\left(-\\frac{(d-m-\\delta)^2}{2\\sigma^2}\\right)$ | p. 423 | ZPD Gaussian expected learning gain |\n", "| $P(\\text{correct} \\mid \\theta, a, b) = \\frac{1}{1 + \\exp(-a(\\theta - b))}$ | p. 428 | 2PL Item Response Theory probability |\n", "| BKT posterior + learning transition | pp. 431–432 | Bayesian mastery belief-state update |\n", "| $\\text{ease} = \\max(1.3, \\text{ease} + 0.1 - (5-q)(0.08 + (5-q) \\cdot 0.02))$ | p. 436 | SM-2 spaced repetition scheduling |\n", "| $\\text{Score}(p_j) = \\sum_i [w_i \\cdot \\text{relevance}_i \\cdot \\text{score}_{ij}]$ | p. 445 | Expertise-weighted consensus aggregation |\n", "\n", "---\n", "\n", "**Simulation Mode:** No API key is required. All LLM calls route through a section-mapped `MockLLM` with educationally accurate pre-authored responses. Set an OpenAI API key in `.env` for Live Mode.\n" ] }, { "cell_type": "markdown", "id": "dbd0c33c", "metadata": {}, "source": [ "## Section 0: Setup & Configuration\n", "\n", "**Ref:** Ch.15, p. 421 (Technical Requirements); Strategy §3 (Simulation Mode)\n", "\n", "Three-tier API key resolution:\n", "1. **Tier 1:** `.env` file (loaded via `python-dotenv`)\n", "2. **Tier 2:** Shell environment variable\n", "3. **Tier 3:** Interactive `getpass` prompt (shown only when `sys.stdin` is a TTY)\n", "4. 
**Fallback:** Simulation Mode — no API key required\n", "\n", "The `get_llm_client()` factory returns either a `LiveLLM` wrapper (real OpenAI API) or a `MockLLM` instance, both exposing `.generate(prompt) -> str`." ] }, { "cell_type": "code", "execution_count": null, "id": "468a08e7", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Cell 0: Imports, API Key Detection, Mode Switch\n", "# Ref: Ch.15, p. 421; Strategy §3, §9\n", "# Author: Imran Ahmad\n", "# ===========================================================================\n", "\n", "import os\n", "import sys\n", "import math\n", "import json\n", "import uuid\n", "from datetime import datetime, timedelta\n", "from dataclasses import dataclass, field\n", "from typing import Optional\n", "\n", "import numpy as np\n", "import networkx as nx\n", "from dotenv import load_dotenv\n", "\n", "# --- Cross-cutting infrastructure (utils/) ---\n", "from resilience import ColorLogger, LogLevel, graceful_fallback\n", "from mock_llm import MockLLM\n", "\n", "# --- Initialize logger for setup phase ---\n", "logger = ColorLogger(\"Setup\")\n", "logger.info(\"Imports complete. 
Detecting API key...\")\n", "\n", "# === Three-Tier API Key Resolution ===\n", "load_dotenv()  # Tier 1: .env file (does not override variables already set in the shell)\n", "\n", "OPENAI_API_KEY = os.getenv(\"OPENAI_API_KEY\")  # Tier 2: environment variable\n", "\n", "# Tier 3: interactive prompt, attempted only when stdin is a TTY\n", "if not OPENAI_API_KEY:\n", "    try:\n", "        if sys.stdin.isatty():\n", "            from getpass import getpass\n", "            key = getpass(\"Enter OpenAI API key (or press Enter for Simulation Mode): \")\n", "            if key.strip():\n", "                OPENAI_API_KEY = key.strip()\n", "                os.environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY\n", "    except (EOFError, KeyboardInterrupt, OSError):\n", "        pass  # Non-interactive environment — skip gracefully\n", "\n", "# Fallback: a missing, empty, or placeholder key (one containing \"your-key\") selects Simulation Mode\n", "SIMULATION_MODE = not OPENAI_API_KEY or \"your-key\" in OPENAI_API_KEY\n", "\n", "if SIMULATION_MODE:\n", "    logger.info(\"No API key detected. Running in SIMULATION MODE with MockLLM.\")\n", "    logger.info(\"All LLM responses are pre-authored educational examples from Ch.15.\")\n", "else:\n", "    logger.success(f\"API key loaded (ends in ...{OPENAI_API_KEY[-4:]}). 
Running in LIVE MODE.\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "c70a0d10", "metadata": {}, "outputs": [], "source": [ "# Multi-provider LLM support (OpenAI / Anthropic / Google Gemini)\n", "# Set LLM_PROVIDER in .env to choose: openai | anthropic | google | auto\n", "# Auto-detection uses the first available key.\n", "# See supporting/llm_provider.py for details.\n", "\n", "import sys, os\n", "sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath('.')), ''))\n", "sys.path.insert(0, '..')\n", "\n", "try:\n", " from supporting.llm_provider import detect_provider, get_llm, PROVIDER_MODELS, print_provider_banner\n", " _PROVIDER, _PROVIDER_KEY, _PROVIDER_MODE = detect_provider()\n", " print_provider_banner(_PROVIDER, _PROVIDER_MODE)\n", "except ImportError:\n", " print('[INFO] supporting/llm_provider.py not found — using default OpenAI path')\n", " _PROVIDER, _PROVIDER_KEY, _PROVIDER_MODE = 'openai', os.getenv('OPENAI_API_KEY'), 'LIVE' if os.getenv('OPENAI_API_KEY') else 'SIMULATION'\n" ] }, { "cell_type": "code", "execution_count": null, "id": "6d34a2bb", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Cell 1: LLM Client Factory\n", "# Ref: Strategy §9 — Unified factory returning MockLLM or LiveLLM\n", "# Author: Imran Ahmad\n", "# ===========================================================================\n", "\n", "def get_llm_client():\n", " \"\"\"Return a MockLLM (Simulation) or LiveLLM (API) client.\n", "\n", " Both expose .generate(prompt: str, **kwargs) -> str.\n", " Ref: Strategy §9 — LLM Client Abstraction\n", " \"\"\"\n", " if SIMULATION_MODE:\n", " logger.info(\"LLM client: MockLLM (Simulation Mode)\")\n", " return MockLLM()\n", " else:\n", " from openai import OpenAI\n", " client = OpenAI()\n", "\n", " class LiveLLM:\n", " \"\"\"Thin wrapper around OpenAI chat completions API.\"\"\"\n", " def generate(self, prompt: str, **kwargs) -> str:\n", " 
response = client.chat.completions.create(\n", "                    model=\"gpt-4o\",\n", "                    messages=[{\"role\": \"user\", \"content\": prompt}],\n", "                    max_tokens=kwargs.get(\"max_tokens\", 1024),\n", "                    temperature=kwargs.get(\"temperature\", 0.7),\n", "                )\n", "                # message.content may be None in the OpenAI SDK; normalize to str\n", "                return response.choices[0].message.content or \"\"\n", "\n", "        logger.info(\"LLM client: LiveLLM (OpenAI gpt-4o)\")\n", "        return LiveLLM()\n", "\n", "\n", "llm_client = get_llm_client()\n", "logger.success(f\"LLM client initialized: {type(llm_client).__name__}\")\n" ] }, { "cell_type": "markdown", "id": "0e1ee485", "metadata": {}, "source": [ "## Cell Group 1: Supporting Type Definitions\n", "\n", "**Ref:** Ch.15, pp. 425–427; Strategy §8\n", "\n", "These lightweight dataclasses fill in the types referenced by the chapter's code listings but not explicitly defined there. They provide the minimal API surface that downstream components (`StudentModel`, `CurriculumPlanner`, `CollaborativeAgent`, `ConsensusEngine`) depend on.\n", "\n", "**11 dataclasses** + **2 utility classes** (`AgentMemory`, `Tool`)." ] }, { "cell_type": "code", "execution_count": null, "id": "70546d87", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Cell Group 1: Supporting Type Definitions\n", "# Ref: Ch.15, pp. 425–427 (StudentModel context); Strategy §8\n", "# Author: Imran Ahmad\n", "#\n", "# 11 dataclasses + 2 utility classes that provide the type contracts\n", "# consumed by all downstream Education and Collective Intelligence components.\n", "# ===========================================================================\n", "\n", "\n", "@dataclass\n", "class LearningObjective:\n", "    \"\"\"A single node in the knowledge graph DAG.\n", "\n", "    Ref: Ch.15, pp. 
423–426 — objectives at the level of a single concept/skill,\n", " each requiring roughly 15–45 minutes of instructional time.\n", " \"\"\"\n", " id: str\n", " title: str\n", " estimated_difficulty: float\n", " topic: str\n", " description: str = \"\"\n", "\n", "\n", "@dataclass\n", "class Exercise:\n", " \"\"\"A coding exercise or assessment item tied to learning objectives.\n", "\n", " Ref: Ch.15, pp. 433, 22 — exercises are the observable interactions that\n", " drive BKT updates and feedback generation.\n", " \"\"\"\n", " id: str\n", " description: str\n", " objective_ids: list[str] = field(default_factory=list)\n", " primary_objective: str = \"\"\n", " topic: str = \"\"\n", " difficulty: float = 0.5\n", "\n", "\n", "@dataclass\n", "class TestResults:\n", " \"\"\"Output of automated test execution against student code.\n", "\n", " Ref: Ch.15, p. 437 — automated tests verify correctness;\n", " static analysis checks style and anti-patterns.\n", " \"\"\"\n", " passed: bool\n", " summary: str\n", " details: dict = field(default_factory=dict)\n", "\n", "\n", "@dataclass\n", "class Feedback:\n", " \"\"\"Structured feedback returned by FeedbackGenerator.\n", "\n", " Ref: Ch.15, pp. 438–440 — includes content, addressed misconceptions,\n", " and computed mastery updates for the student model.\n", " \"\"\"\n", " content: str\n", " misconceptions_addressed: Optional[dict] = None\n", " mastery_updates: dict = field(default_factory=dict)\n", "\n", "\n", "@dataclass\n", "class Proposal:\n", " \"\"\"A solution proposal from a CollaborativeAgent.\n", "\n", " Ref: Ch.15, pp. 
442–444 — carries agent ID, content, confidence,\n", " and expertise relevance for downstream consensus scoring.\n", " \"\"\"\n", " id: str = \"\"\n", " agent_id: str = \"\"\n", " content: str = \"\"\n", " confidence: float = 0.5\n", " expertise_relevance: float = 0.5\n", "\n", "\n", "@dataclass\n", "class Evaluation:\n", " \"\"\"A scored critique of a Proposal by another agent.\n", "\n", " Ref: Ch.15, pp. 443–444 — 4-dimension scoring (correctness,\n", " completeness, feasibility, risks) plus free-text critique.\n", " \"\"\"\n", " evaluator_id: str = \"\"\n", " proposal_id: str = \"\"\n", " scores: dict = field(default_factory=dict)\n", " critique: str = \"\"\n", "\n", "\n", "@dataclass\n", "class Review:\n", " \"\"\"A spaced-repetition review item with scheduling metadata.\n", "\n", " Ref: Ch.15, pp. 435–436 — SM-2 algorithm tracks interval,\n", " repetition count, and computed priority from overdue days.\n", " \"\"\"\n", " objective_id: str = \"\"\n", " priority: float = 0.0\n", " interval: int = 1\n", " repetitions: int = 0\n", "\n", "\n", "@dataclass\n", "class Problem:\n", " \"\"\"A problem statement fed into the ConsensusEngine.\n", "\n", " Ref: Ch.15, pp. 447–449 — defines the shared context for\n", " multi-agent proposal generation.\n", " \"\"\"\n", " description: str = \"\"\n", " constraints: list[str] = field(default_factory=list)\n", "\n", "\n", "@dataclass\n", "class SharedContext:\n", " \"\"\"Shared deliberation state across CollaborativeAgents.\n", "\n", " Ref: Ch.15, pp. 
442–444 — accumulates proposals so later agents\n", " can see prior contributions and avoid duplication.\n", " \"\"\"\n", " proposals: list = field(default_factory=list)\n", " history: list = field(default_factory=list)\n", "\n", " def add_proposal(self, proposal: Proposal) -> None:\n", " \"\"\"Add a proposal to the shared state.\"\"\"\n", " self.proposals.append(proposal)\n", "\n", " def get_relevant(self, expertise_tags: list[str]) -> str:\n", " \"\"\"Return recent proposals as concatenated text for agent context.\"\"\"\n", " return \"\\n---\\n\".join(p.content for p in self.proposals[-5:])\n", "\n", "\n", "@dataclass\n", "class ConsensusResult:\n", " \"\"\"Final output of the ConsensusEngine after convergence.\n", "\n", " Ref: Ch.15, pp. 448–450 — includes synthesized proposal, consensus\n", " score, round count, and full audit trail.\n", " \"\"\"\n", " final_proposal: str = \"\"\n", " consensus_score: float = 0.0\n", " rounds: int = 0\n", " audit_trail: list = field(default_factory=list)\n", "\n", "\n", "@dataclass\n", "class MisconceptionResult:\n", " \"\"\"Output of the two-stage misconception detection pipeline.\n", "\n", " Ref: Ch.15, pp. 437–438 — rule-based first stage returns a result\n", " with confidence; LLM stage is invoked when confidence < 0.7.\n", " \"\"\"\n", " misconception_id: str = \"\"\n", " confidence: float = 0.0\n", " description: str = \"\"\n", " related_objectives: list[str] = field(default_factory=list)\n", " suggested_remediation: str = \"\"\n", "\n", "\n", "class AgentMemory:\n", " \"\"\"Lightweight working memory for CollaborativeAgent.\n", "\n", " Ref: Ch.15, p. 
442 — retains the agent's own prior reasoning\n", "    and interactions across turns within a consensus session.\n", "    \"\"\"\n", "\n", "    def __init__(self) -> None:\n", "        self.entries: list[str] = []\n", "\n", "    def add(self, entry: str) -> None:\n", "        \"\"\"Store a reasoning entry.\"\"\"\n", "        self.entries.append(entry)\n", "\n", "    def recent(self, n: int = 5) -> list[str]:\n", "        \"\"\"Return the n most recent entries (empty list when n <= 0).\"\"\"\n", "        # Guard n <= 0: entries[-0:] would otherwise return the whole list\n", "        return self.entries[-n:] if n > 0 else []\n", "\n", "\n", "class Tool:\n", "    \"\"\"Placeholder for agent tool interface.\n", "\n", "    Ref: Ch.15, p. 442 — CollaborativeAgent accepts a toolset;\n", "    this stub satisfies the constructor contract.\n", "    \"\"\"\n", "\n", "    def __init__(self, name: str, description: str = \"\") -> None:\n", "        self.name = name\n", "        self.description = description\n", "\n", "\n", "logger.success(\n", "    \"Type definitions loaded: 11 dataclasses + 2 utility classes \"\n", "    \"(AgentMemory, Tool)\"\n", ")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "7b4ed9af", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Test: Verify all dataclasses instantiate with defaults\n", "# ===========================================================================\n", "\n", "test_logger = ColorLogger(\"TypeTest\")\n", "\n", "_test_classes = [\n", "    (\"LearningObjective\", lambda: LearningObjective(\n", "        id=\"test\", title=\"Test\", estimated_difficulty=0.5, topic=\"test\")),\n", "    (\"Exercise\", lambda: Exercise(id=\"ex1\", description=\"Test exercise\")),\n", "    (\"TestResults\", lambda: TestResults(passed=True, summary=\"ok\")),\n", "    (\"Feedback\", lambda: Feedback(content=\"Good work\")),\n", "    (\"Proposal\", lambda: Proposal()),\n", "    (\"Evaluation\", lambda: Evaluation()),\n", "    (\"Review\", lambda: Review()),\n", "    (\"Problem\", lambda: Problem()),\n", "    (\"SharedContext\", lambda: SharedContext()),\n", "    (\"ConsensusResult\", lambda: 
ConsensusResult()),\n", " (\"MisconceptionResult\", lambda: MisconceptionResult()),\n", " (\"AgentMemory\", lambda: AgentMemory()),\n", " (\"Tool\", lambda: Tool(name=\"test_tool\")),\n", "]\n", "\n", "for name, factory in _test_classes:\n", " obj = factory()\n", " test_logger.success(f\"{name} instantiated: {type(obj).__name__}\")\n", "\n", "# Verify SharedContext API\n", "ctx = SharedContext()\n", "ctx.add_proposal(Proposal(id=\"p1\", content=\"Test proposal\"))\n", "assert len(ctx.proposals) == 1, \"SharedContext.add_proposal failed\"\n", "assert \"Test proposal\" in ctx.get_relevant([\"any\"]), \"SharedContext.get_relevant failed\"\n", "test_logger.success(\"SharedContext API verified (add_proposal + get_relevant)\")\n", "\n", "# Verify AgentMemory API\n", "mem = AgentMemory()\n", "mem.add(\"first\")\n", "mem.add(\"second\")\n", "assert mem.recent(1) == [\"second\"], \"AgentMemory.recent failed\"\n", "test_logger.success(\"AgentMemory API verified (add + recent)\")\n" ] }, { "cell_type": "markdown", "id": "ce1460ec", "metadata": {}, "source": [ "## Cell Group 2: Knowledge Graph — Synthetic Python Curriculum\n", "\n", "**Ref:** Ch.15, pp. 423–426 (knowledge graph design); p. 426 (granularity guidance); p. 437 (247 objectives in production, 12 topic clusters)\n", "\n", "> **Info Box — Granularity Matters (p. 424):** Objectives defined too broadly (\"understand object-oriented programming\") prevent fine-grained decisions. Objectives defined too narrowly (\"know that Python uses indentation for blocks\") make the graph unwieldy. In practice, objectives at the level of a single concept or skill, each requiring roughly 15 to 45 minutes of instructional time, hit the right balance.\n", "\n", "> **Production Note (p. 
424):** In production, the graph loads from a Neo4j database at startup, with topological ordering pre-computed to reduce prerequisite-checking from $O(|V| \\times |E|)$ to $O(|V|)$.\n", "\n", "The curriculum is represented as a **directed acyclic graph (DAG)** where nodes are learning objectives and edges are prerequisite relationships. The `KnowledgeGraph` class wraps `networkx.DiGraph` to provide the API surface consumed by `CurriculumPlanner` and `StudentModel`.\n", "\n", "Our synthetic graph contains **10 objectives** across the introductory Python track:\n", "\n", "```\n", "variables ──→ conditionals ──→ for_loops ──→ loop_termination ──→ nested_iteration\n", " │ │\n", " └──→ boolean_logic └──→ list_comprehensions\n", " │\n", "variables ──→ list_basics ──→ list_slicing ──────────→ list_comprehensions\n", " │\n", " └──→ functions_intro\n", "```\n", "\n", "Each objective is calibrated with a difficulty level (0.0–1.0) used by the ZPD-aligned expected-gain heuristic (p. 423):\n", "\n", "$$G(m, d) = \\alpha \\cdot \\exp\\!\\left(-\\frac{(d - m - \\delta)^2}{2\\sigma^2}\\right)$$\n" ] }, { "cell_type": "code", "execution_count": null, "id": "7b4294ec", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Cell Group 2: KnowledgeGraph Class\n", "# Ref: Ch.15, pp. 
423–426 (DAG curriculum); Strategy §7.1 (10-node spec)\n", "# Author: Imran Ahmad\n", "#\n", "# Wraps networkx.DiGraph to provide the API consumed by CurriculumPlanner:\n", "# - get_all_objectives() → list of LearningObjective\n", "# - get_prerequisites(obj_id) → list of LearningObjective\n", "# - get_dependents(obj_id) → list of str (downstream objective IDs)\n", "# - get_objective(obj_id) → LearningObjective\n", "# ===========================================================================\n", "\n", "\n", "class KnowledgeGraph:\n", " \"\"\"Directed acyclic graph of learning objectives with prerequisite edges.\n", "\n", " In production, the graph loads from a Neo4j database at startup with\n", " topological ordering pre-computed (Ch.15, p. 426). Here we use networkx\n", " for a self-contained educational implementation.\n", "\n", " Ref: Ch.15, pp. 423–426; Strategy §7.1\n", " \"\"\"\n", "\n", " def __init__(self) -> None:\n", " self.graph = nx.DiGraph()\n", " self.objectives: dict[str, LearningObjective] = {}\n", " self._logger = ColorLogger(\"KnowledgeGraph\")\n", "\n", " def add_objective(self, objective: LearningObjective) -> None:\n", " \"\"\"Add a learning objective as a node in the DAG.\"\"\"\n", " self.objectives[objective.id] = objective\n", " self.graph.add_node(objective.id, objective=objective)\n", "\n", " def add_prerequisite(self, prereq_id: str, dependent_id: str) -> None:\n", " \"\"\"Add a prerequisite edge: prereq_id must be mastered before dependent_id.\"\"\"\n", " self.graph.add_edge(prereq_id, dependent_id)\n", "\n", " def get_all_objectives(self) -> list[LearningObjective]:\n", " \"\"\"Return all objectives in topological order.\"\"\"\n", " ordered_ids = list(nx.topological_sort(self.graph))\n", " return [self.objectives[oid] for oid in ordered_ids if oid in self.objectives]\n", "\n", " def get_objective(self, objective_id: str) -> LearningObjective:\n", " \"\"\"Return a single objective by ID.\"\"\"\n", " return 
self.objectives[objective_id]\n", "\n", " def get_prerequisites(self, objective_id: str) -> list[LearningObjective]:\n", " \"\"\"Return immediate prerequisite objectives for the given objective.\"\"\"\n", " prereq_ids = list(self.graph.predecessors(objective_id))\n", " return [self.objectives[pid] for pid in prereq_ids if pid in self.objectives]\n", "\n", " def get_dependents(self, objective_id: str) -> list[str]:\n", " \"\"\"Return IDs of all downstream objectives (direct and transitive).\n", "\n", " Used by CurriculumPlanner._expected_gain() to compute the downstream\n", " impact multiplier: objectives unlocking many dependents are higher leverage.\n", " Ref: Ch.15, p. 427\n", " \"\"\"\n", " return list(nx.descendants(self.graph, objective_id))\n", "\n", " def summary(self) -> str:\n", " \"\"\"Return a human-readable summary of the graph.\"\"\"\n", " return (\n", " f\"KnowledgeGraph: {len(self.objectives)} objectives, \"\n", " f\"{self.graph.number_of_edges()} prerequisite edges\"\n", " )\n" ] }, { "cell_type": "code", "execution_count": null, "id": "c38e151a", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Populate Synthetic Knowledge Graph: 10 Introductory Python Objectives\n", "# Ref: Strategy §7.1 — difficulty, prerequisites, and downstream counts\n", "# Author: Imran Ahmad\n", "# ===========================================================================\n", "\n", "knowledge_graph = KnowledgeGraph()\n", "kg_logger = ColorLogger(\"KnowledgeGraph\")\n", "\n", "# --- Define 10 learning objectives (Strategy §7.1 table) ---\n", "OBJECTIVES = [\n", " LearningObjective(\"variables\", \"Variables & Assignment\", 0.15, \"data_types\",\n", " \"Variable declaration, assignment, and basic type inference\"),\n", " LearningObjective(\"conditionals\", \"Conditional Statements\", 0.30, \"control_flow\",\n", " \"if/elif/else branching and comparison operators\"),\n", " 
LearningObjective(\"boolean_logic\", \"Boolean Logic & Operators\", 0.35, \"control_flow\",\n", " \"and, or, not operators; truth tables; short-circuit evaluation\"),\n", " LearningObjective(\"list_basics\", \"List Fundamentals\", 0.25, \"data_structures\",\n", " \"List creation, indexing, len(), append(), basic iteration\"),\n", " LearningObjective(\"list_slicing\", \"List Slicing\", 0.40, \"data_structures\",\n", " \"Slice notation [start:stop:step], negative indices\"),\n", " LearningObjective(\"for_loops\", \"For Loop Iteration\", 0.45, \"control_flow\",\n", " \"for-in loops, range(), enumerate(), accumulation patterns\"),\n", " LearningObjective(\"loop_termination\", \"Loop Termination & Control\", 0.55, \"control_flow\",\n", " \"break, continue, while-True patterns, termination conditions\"),\n", " LearningObjective(\"nested_iteration\", \"Nested Iteration Patterns\", 0.70, \"control_flow\",\n", " \"Nested for loops, matrix traversal, complexity implications\"),\n", " LearningObjective(\"list_comprehensions\", \"List Comprehensions\", 0.65, \"data_structures\",\n", " \"[expr for x in iterable if cond], nested comprehensions\"),\n", " LearningObjective(\"functions_intro\", \"Functions Introduction\", 0.60, \"functions\",\n", " \"def, parameters, return values, docstrings, scope basics\"),\n", "]\n", "\n", "for obj in OBJECTIVES:\n", " knowledge_graph.add_objective(obj)\n", "\n", "# --- Define prerequisite edges (Strategy §7.1 DAG) ---\n", "PREREQUISITES = [\n", " (\"variables\", \"conditionals\"),\n", " (\"variables\", \"list_basics\"),\n", " (\"conditionals\", \"boolean_logic\"),\n", " (\"conditionals\", \"for_loops\"),\n", " (\"list_basics\", \"list_slicing\"),\n", " (\"for_loops\", \"loop_termination\"),\n", " (\"for_loops\", \"list_comprehensions\"),\n", " (\"list_slicing\", \"list_comprehensions\"),\n", " (\"loop_termination\", \"nested_iteration\"),\n", " (\"loop_termination\", \"list_comprehensions\"),\n", " (\"list_comprehensions\", 
\"functions_intro\"),\n", "]\n", "\n", "for prereq, dependent in PREREQUISITES:\n", " knowledge_graph.add_prerequisite(prereq, dependent)\n", "\n", "kg_logger.success(knowledge_graph.summary())\n", "kg_logger.info(\"Topological order: \" + \" → \".join(\n", " obj.id for obj in knowledge_graph.get_all_objectives()\n", "))\n" ] }, { "cell_type": "code", "execution_count": null, "id": "81cefe96", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Test: Knowledge Graph structure and API\n", "# Ref: Strategy §7.1 — verify prerequisite chains and downstream counts\n", "# ===========================================================================\n", "\n", "kg_test = ColorLogger(\"KGTest\")\n", "\n", "# F6: Planner respects prerequisites — for_loops requires conditionals\n", "prereqs_for_loops = [p.id for p in knowledge_graph.get_prerequisites(\"for_loops\")]\n", "assert \"conditionals\" in prereqs_for_loops, \"for_loops should require conditionals\"\n", "kg_test.success(f\"get_prerequisites('for_loops') = {prereqs_for_loops}\")\n", "\n", "# list_comprehensions requires for_loops AND list_slicing (AND loop_termination)\n", "prereqs_lc = [p.id for p in knowledge_graph.get_prerequisites(\"list_comprehensions\")]\n", "assert \"for_loops\" in prereqs_lc, \"list_comprehensions should require for_loops\"\n", "assert \"list_slicing\" in prereqs_lc, \"list_comprehensions should require list_slicing\"\n", "kg_test.success(f\"get_prerequisites('list_comprehensions') = {prereqs_lc}\")\n", "\n", "# variables has no prerequisites\n", "prereqs_vars = knowledge_graph.get_prerequisites(\"variables\")\n", "assert len(prereqs_vars) == 0, \"variables should have no prerequisites\"\n", "kg_test.success(\"variables has 0 prerequisites (root node)\")\n", "\n", "# Downstream count for variables (should unlock many)\n", "deps_vars = knowledge_graph.get_dependents(\"variables\")\n", "kg_test.success(f\"variables has 
{len(deps_vars)} downstream dependents: {deps_vars}\")\n", "\n", "# functions_intro has no dependents (leaf node)\n", "deps_func = knowledge_graph.get_dependents(\"functions_intro\")\n", "assert len(deps_func) == 0, \"functions_intro should be a leaf node\"\n", "kg_test.success(\"functions_intro has 0 dependents (leaf node)\")\n", "\n", "# DAG validation — no cycles\n", "assert nx.is_directed_acyclic_graph(knowledge_graph.graph), \"Graph must be a DAG\"\n", "kg_test.success(\"Graph is a valid DAG (no cycles)\")\n" ] }, { "cell_type": "markdown", "id": "fbc16046", "metadata": {}, "source": [ "---\n", "## Part I: Education Intelligence Agent\n", "\n", "> **Architecture Overview (p. 422):** The Education Intelligence agent replicates the instructor's diagnostic cycle computationally. It maintains a rich, evolving model of each student's knowledge state, plans individualized learning paths, delivers instruction through multiple modalities, and refines its understanding based on observed performance.\n", "\n", "### Figure 15.1 — Education Intelligence Agent Architecture\n", "\n", "The architecture adapts the general cognitive loop from Chapter 1 to the educational domain (p. 
424):\n", "\n", "```\n", " ┌──────────────────────────┐\n", " │ PERCEPTION MODULE │\n", " │ Input: Quiz, Code, │\n", " │ Telemetry │\n", " └────────────┬─────────────┘\n", " │\n", " ▼\n", " ┌─────────────────┐ ┌──────────────────┐ ┌─────────────────────┐\n", " │ LEARNING MODULE │ │ STUDENT MODEL │ │ CURRICULUM PLANNER │\n", " │ BKT / Bayesian │◄─┤ (Probabilistic ├─►│ ZPD & Knowledge │\n", " │ Update │ │ Mastery State) │ │ Graph │\n", " └─────────────────┘ └────────┬─────────┘ └─────────────────────┘\n", " │\n", " ▼\n", " ┌──────────────────────────┐\n", " │ CONTENT & FEEDBACK │\n", " │ Action: Hints, │\n", " │ Scaffolding │\n", " └──────────────────────────┘\n", " │\n", " ┌────────────┴─────────────┐\n", " │ CONSTRAINT LAYER: │\n", " │ Pedagogical Alignment │\n", " │ (ZPD, Cognitive Load, │\n", " │ Spaced Repetition) │\n", " └──────────────────────────┘\n", "```\n", "\n", "The architecture decomposes the full POMDP into **three tractable components** (p. 423):\n", "\n", "1. **Bayesian Knowledge Tracing** — Belief-state estimator: maintains probability distribution $b(s)$ over student mastery, updating after every observed response.\n", "2. **Curriculum Planner** — Action selection: identifies the skill whose expected learning gain is highest using ZPD heuristics.\n", "3. **Spaced Repetition Scheduler** — Temporal planning: determines when previously mastered skills should be reviewed.\n", "\n", "### Cell Group 3: Student Model\n", "\n", "**Ref:** Ch.15, pp. 425–427\n", "\n", "The `StudentModel` maintains per-student mastery state across all learning objectives. 
Every component in the Education Intelligence agent — the curriculum planner, the spaced repetition scheduler, and the feedback generator — queries this shared class.\n", "\n", "The mastery dictionary stores:\n", "- `p_mastery`: Probability of mastery (0.0–1.0), initialized at 0.1 for all objectives\n", "- `attempts`: Number of interactions with this objective\n", "- `last_seen`: Timestamp of most recent interaction\n", "- `recent_errors`: Rolling window of the 5 most recent error descriptions\n", "\n", "The class also stores per-objective scheduling metadata for the SM-2 spaced repetition algorithm (extended in Cell Group 7, pp. 435–437).\n" ] }, { "cell_type": "code", "execution_count": null, "id": "9aa4d1aa", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Cell Group 3: StudentModel\n", "# Ref: Ch.15, pp. 425–427 — Per-student mastery state\n", "# Author: Imran Ahmad\n", "# ===========================================================================\n", "\n", "\n", "class StudentModel:\n", " \"\"\"Maintains per-student mastery state across all learning objectives.\n", "\n", " Updated after every interaction. Provides the belief-state snapshot\n", " consumed by the CurriculumPlanner and FeedbackGenerator.\n", "\n", " Ref: Ch.15, pp. 
425–427\n", " \"\"\"\n", "\n", " def __init__(self, student_id: str, knowledge_graph: KnowledgeGraph) -> None:\n", " self.student_id = student_id\n", " self.knowledge_graph = knowledge_graph\n", " self._logger = ColorLogger(\"StudentModel\")\n", " # Initialize all objectives at p_mastery=0.1 (low prior)\n", " self.mastery: dict[str, dict] = {\n", " obj.id: {\n", " \"p_mastery\": 0.1,\n", " \"attempts\": 0,\n", " \"last_seen\": None,\n", " \"recent_errors\": [],\n", " }\n", " for obj in knowledge_graph.get_all_objectives()\n", " }\n", " # SM-2 scheduling metadata (extended in Cell Group 7)\n", " self._schedule_meta: dict[str, dict] = {\n", " obj.id: {\n", " \"reps\": 0,\n", " \"interval\": 1,\n", " \"ease_factor\": 2.5,\n", " \"next_review\": None,\n", " }\n", " for obj in knowledge_graph.get_all_objectives()\n", " }\n", " self._logger.info(\n", " f\"Initialized StudentModel for '{student_id}' \"\n", " f\"with {len(self.mastery)} objectives (all at p=0.1)\"\n", " )\n", "\n", " def get_mastery_state(self, student_id: str = None) -> dict[str, float]:\n", " \"\"\"Return current mastery probabilities for all objectives.\n", "\n", " The student_id parameter is accepted for API compatibility with\n", " CurriculumPlanner but uses this model's internal state.\n", " \"\"\"\n", " return {\n", " oid: state[\"p_mastery\"]\n", " for oid, state in self.mastery.items()\n", " }\n", "\n", " def get_recent_errors(self, objective_id_or_student_id: str,\n", " n: int = 10) -> list[str]:\n", " \"\"\"Return recent errors. Accepts objective_id or student_id.\n", "\n", " If the argument matches an objective ID, returns errors for that\n", " objective. 
Otherwise, returns errors across all objectives.\n", " \"\"\"\n", " if objective_id_or_student_id in self.mastery:\n", " # Apply the same n-item cap as the aggregate path\n", " return self.mastery[objective_id_or_student_id].get(\n", " \"recent_errors\", []\n", " )[-n:]\n", " # Aggregate across all objectives\n", " all_errors = []\n", " for state in self.mastery.values():\n", " all_errors.extend(state.get(\"recent_errors\", []))\n", " return all_errors[-n:]\n", "\n", " def get_mastered_objectives(self, student_id: str = None,\n", " threshold: float = 0.85) -> dict[str, dict]:\n", " \"\"\"Return objectives with mastery >= threshold.\n", "\n", " Returns dict mapping objective_id → full schedule metadata,\n", " consumed by SpacedRepetitionScheduler.get_due_reviews().\n", " \"\"\"\n", " return {\n", " oid: self._schedule_meta[oid]\n", " for oid, state in self.mastery.items()\n", " if state[\"p_mastery\"] >= threshold\n", " }\n", "\n", " def update_mastery(self, objective_id: str, new_p: float,\n", " error: str = None) -> None:\n", " \"\"\"Update mastery probability and optionally record an error.\n", "\n", " Ref: Ch.15, p. 
427 — rolling window of last 5 errors.\n", " \"\"\"\n", " state = self.mastery[objective_id]\n", " old_p = state[\"p_mastery\"]\n", " state[\"p_mastery\"] = new_p\n", " state[\"attempts\"] += 1\n", " state[\"last_seen\"] = datetime.utcnow()\n", " if error is not None:\n", " state[\"recent_errors\"].append(error)\n", " state[\"recent_errors\"] = state[\"recent_errors\"][-5:]\n", " self._logger.info(\n", " f\"Mastery update: {objective_id} \"\n", " f\"{old_p:.3f} → {new_p:.3f} \"\n", " f\"(attempt #{state['attempts']})\"\n", " )\n", "\n", " def get_objective_metadata(self, student_id: str,\n", " objective_id: str) -> dict:\n", " \"\"\"Return SM-2 scheduling metadata for an objective.\"\"\"\n", " return self._schedule_meta.get(objective_id, {})\n", "\n", " def update_objective_metadata(self, student_id: str,\n", " objective_id: str, **kwargs) -> None:\n", " \"\"\"Update SM-2 scheduling metadata fields.\"\"\"\n", " meta = self._schedule_meta.setdefault(objective_id, {})\n", " meta.update(kwargs)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "9547c3cc", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Test: StudentModel initialization and API\n", "# Ref: Strategy §11, C9 — Init → all at p=0.1. 
Update → in mastered set.\n", "# ===========================================================================\n", "\n", "sm_test = ColorLogger(\"StudentModelTest\")\n", "\n", "# Create student model for test student\n", "student_model = StudentModel(\"test_student\", knowledge_graph)\n", "\n", "# All start at p=0.1\n", "mastery_state = student_model.get_mastery_state()\n", "assert all(v == 0.1 for v in mastery_state.values()), \"All should start at 0.1\"\n", "sm_test.success(f\"All {len(mastery_state)} objectives initialized at p=0.1\")\n", "\n", "# Update variables to 0.9 → should appear in mastered set\n", "student_model.update_mastery(\"variables\", 0.9)\n", "mastered = student_model.get_mastered_objectives()\n", "assert \"variables\" in mastered, \"variables should be mastered at 0.9\"\n", "sm_test.success(\"variables updated to 0.9 → appears in get_mastered_objectives()\")\n", "\n", "# Error recording\n", "student_model.update_mastery(\"for_loops\", 0.3, error=\"break placement incorrect\")\n", "errors = student_model.get_recent_errors(\"for_loops\")\n", "assert len(errors) == 1, \"Should have 1 error recorded\"\n", "sm_test.success(f\"Error recorded: {errors}\")\n", "\n", "# Reset for downstream use\n", "student_model = StudentModel(\"test_student\", knowledge_graph)\n", "sm_test.success(\"StudentModel reset for downstream cell groups\")\n" ] }, { "cell_type": "markdown", "id": "8d9feca2", "metadata": {}, "source": [ "### Cell Group 4: Curriculum Planner\n", "\n", "**Ref:** Ch.15, pp. 426–429 (CurriculumPlanner class); p. 423 (ZPD Gaussian gain formula)\n", "\n", "The planner answers: *given what we currently believe about a student's mastery, what should they do next that is both feasible and maximally productive?*\n", "\n", "Two architectural commitments (p. 427):\n", "1. **Eligibility vs. ranking separation** — eligibility is enforced by the knowledge graph (`prereqs_met`); ranking encodes pedagogy via the ZPD heuristic.\n", "2. 
**Downstream impact bias** — objectives that unlock many dependents are treated as higher leverage, so the planner tends to clear bottlenecks early.\n", "\n", "**ZPD Gaussian expected-gain formula (p. 423):**\n", "\n", "$$G(m, d) = \\alpha \\cdot \\exp\\!\\left(-\\frac{(d - m - \\delta)^2}{2\\sigma^2}\\right) \\cdot (1 + 0.1 \\cdot |\\text{downstream}|)$$\n", "\n", "where $m$ is the student's current mastery estimate, $d$ is the objective's estimated difficulty, and $|\\text{downstream}|$ is the number of downstream dependents of the objective.\n", "\n", "Constants (p. 427): `delta = 0.2` (optimal gap), `sigma = 0.25` (ZPD width). The implementation in this notebook omits the scale factor, which is equivalent to $\\alpha = 1$ and leaves the ranking unchanged." ] }, { "cell_type": "code", "execution_count": null, "id": "885fc6a6", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Cell Group 4: CurriculumPlanner\n", "# Ref: Ch.15, pp. 426–429 — ZPD-aligned objective selection\n", "# Author: Imran Ahmad\n", "# ===========================================================================\n", "\n", "\n", "class CurriculumPlanner:\n", " \"\"\"Selects next learning objectives using prerequisite constraints\n", " and ZPD-aligned expected-gain ranking.\n", "\n", " The planner filters objectives whose prerequisites are satisfied,\n", " then ranks candidates by a Gaussian gain heuristic that favors\n", " tasks slightly above the student's current mastery.\n", "\n", " Ref: Ch.15, pp. 426–429\n", " \"\"\"\n", "\n", " def __init__(self, knowledge_graph: KnowledgeGraph,\n", " student_model: StudentModel,\n", " content_library: dict = None) -> None:\n", " self.graph = knowledge_graph\n", " self.model = student_model\n", " self.content = content_library or {}\n", " self.mastery_threshold = 0.8\n", " self._logger = ColorLogger(\"CurriculumPlanner\")\n", "\n", " def get_next_objectives(self, student_id: str,\n", " n: int = 3) -> list[LearningObjective]:\n", " \"\"\"Select the next learning objectives based on student mastery\n", " state and prerequisite constraints.\n", "\n", " Ref: Ch.15, pp. 
426–429\n", "\n", " Args:\n", " student_id: Student identifier (for API compatibility).\n", " n: Maximum number of objectives to return.\n", "\n", " Returns:\n", " Up to n LearningObjective instances, ranked by expected gain.\n", " \"\"\"\n", " self._logger.info(\n", " f\"Selecting next objectives for student '{student_id}'...\"\n", " )\n", " mastery = self.model.get_mastery_state(student_id)\n", " eligible = []\n", "\n", " for objective in self.graph.get_all_objectives():\n", " # Prerequisite check: all prereqs must be above threshold\n", " prereqs = self.graph.get_prerequisites(objective.id)\n", " prereqs_met = all(\n", " mastery.get(p.id, 0.0) >= self.mastery_threshold\n", " for p in prereqs\n", " )\n", " # Must not already be mastered\n", " not_mastered = (\n", " mastery.get(objective.id, 0.0) < self.mastery_threshold\n", " )\n", " if prereqs_met and not_mastered:\n", " eligible.append(objective)\n", "\n", " # Rank by expected learning gain (ZPD heuristic)\n", " ranked = sorted(\n", " eligible,\n", " key=lambda obj: self._expected_gain(\n", " obj, mastery, student_id\n", " ),\n", " reverse=True\n", " )\n", "\n", " selected = ranked[:n]\n", " if selected:\n", " top = selected[0]\n", " gain = self._expected_gain(top, mastery, student_id)\n", " self._logger.success(\n", " f\"Selected {len(selected)} objectives. \"\n", " f\"Top: '{top.id}' (gain={gain:.3f})\"\n", " )\n", " else:\n", " self._logger.warn(\"No eligible objectives found.\")\n", " return selected\n", "\n", " def _expected_gain(self, objective: LearningObjective,\n", " mastery: dict, student_id: str) -> float:\n", " \"\"\"Estimate learning gain using ZPD-aligned Gaussian model\n", " with downstream impact multiplier.\n", "\n", " G(m,d) = exp(-(d - m - delta)^2 / (2*sigma^2)) * (1 + 0.1*downstream)\n", "\n", " Constants (p. 427):\n", " delta = 0.2 — optimal gap (20% above current mastery)\n", " sigma = 0.25 — width of effective learning zone\n", "\n", " Ref: Ch.15, p. 423 (formula), p. 
427 (constants and rationale)\n", " \"\"\"\n", " current = mastery.get(objective.id, 0.0)\n", " difficulty = objective.estimated_difficulty\n", " delta = 0.2\n", " sigma = 0.25\n", "\n", " zpd_score = math.exp(\n", " -((difficulty - current - delta) ** 2)\n", " / (2 * sigma ** 2)\n", " )\n", "\n", " downstream = len(\n", " self.graph.get_dependents(objective.id)\n", " )\n", "\n", " return zpd_score * (1 + 0.1 * downstream)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "286c7e71", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Test: CurriculumPlanner objective selection\n", "# Ref: Strategy §11, C10 — Student with variables=0.9 → planner selects\n", "# conditionals, list_basics (prerequisites satisfied, not yet mastered)\n", "# ===========================================================================\n", "\n", "cp_test = ColorLogger(\"PlannerTest\")\n", "\n", "# Create student with variables mastered\n", "test_sm = StudentModel(\"planner_test\", knowledge_graph)\n", "test_sm.update_mastery(\"variables\", 0.9)\n", "\n", "planner = CurriculumPlanner(knowledge_graph, test_sm)\n", "next_objs = planner.get_next_objectives(\"planner_test\", n=3)\n", "\n", "next_ids = [obj.id for obj in next_objs]\n", "cp_test.success(f\"Selected objectives: {next_ids}\")\n", "\n", "# conditionals and list_basics should be eligible (variables is their only prereq)\n", "assert \"conditionals\" in next_ids, \"conditionals should be selected\"\n", "assert \"list_basics\" in next_ids, \"list_basics should be selected\"\n", "cp_test.success(\"F6: Prerequisite enforcement verified\")\n", "\n", "# for_loops should NOT be eligible (requires conditionals >= 0.8)\n", "assert \"for_loops\" not in next_ids, \"for_loops should not yet be eligible\"\n", "cp_test.success(\"for_loops correctly excluded (conditionals not yet mastered)\")\n", "\n", "# Verify gain calculation\n", "mastery = 
test_sm.get_mastery_state()\n", "for obj in next_objs:\n", " gain = planner._expected_gain(obj, mastery, \"planner_test\")\n", " cp_test.info(f\" {obj.id}: difficulty={obj.estimated_difficulty}, gain={gain:.3f}\")\n" ] }, { "cell_type": "markdown", "id": "89fd41a5", "metadata": {}, "source": [ "### Cell Group 5: Adaptive Placement Test (IRT 2PL)\n", "\n", "**Ref:** Ch.15, pp. 428–431 (full class listing); p. 428 (2PL IRT formula)\n", "\n", "> **Info Box — The Cold-Start Problem (p. 428):** When a student first enters the system, the agent has no interaction history, no reliable estimate of mastery, and no evidence about common misconceptions. If the agent guesses the starting level incorrectly, it can fail in two predictable ways: it can **overestimate** the learner and begin with opaque material (leading to rapid disengagement), or it can **underestimate** the learner and deliver trivial content (equally corrosive because it signals the system is not paying attention).\n", "\n", "The solution is to treat onboarding as an explicit diagnostic protocol. The adaptive placement test uses a **two-parameter logistic (2PL) Item Response Theory** model to rapidly estimate a student's latent ability. IRT separates three factors that naïve diagnostics confuse: **student ability**, **item difficulty**, and **item informativeness**.\n", "\n", "**2PL probability model (p. 428):**\n", "\n", "$$P(\\text{correct} \\mid \\theta, a, b) = \\frac{1}{1 + \\exp(-a(\\theta - b))}$$\n", "\n", "where $\\theta$ is learner ability, $b$ is item difficulty, and $a$ is discrimination (how sharply the item distinguishes students just below and just above the difficulty threshold).\n", "\n", "**Fisher information:** $I(\\theta) = a^2 \\cdot P(\\theta) \\cdot (1 - P(\\theta))$\n", "\n", "> **Practical Note (p. 
429):** This approach achieves a stable estimate with far fewer items than a fixed test, often in the 10–15 question range, because each question is chosen to maximize information given the current uncertainty.\n", "\n", "The test terminates once at least five items have been administered and the standard error falls below `se_threshold=0.3`, or when the item bank is exhausted. Because each item contributes at most $a^2/4$ information, the 15-item bank defined later in this notebook caps total information at about 6.9, so the standard error cannot fall below roughly 0.38; with the default threshold, the run therefore ends by exhausting the bank.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "ba8b078f", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Cell Group 5: AdaptivePlacementTest\n", "# Ref: Ch.15, pp. 429–431 — IRT 2PL adaptive diagnostic\n", "# Author: Imran Ahmad\n", "# ===========================================================================\n", "\n", "\n", "class AdaptivePlacementTest:\n", " \"\"\"Adaptive placement test using the 2PL IRT model.\n", "\n", " Selects items maximizing Fisher information at the current ability\n", " estimate, updates via Newton-Raphson MLE, and terminates when\n", " standard error falls below the configured threshold.\n", "\n", " Ref: Ch.15, pp. 428–431\n", " \"\"\"\n", "\n", " def __init__(self, item_bank: list[dict],\n", " se_threshold: float = 0.3) -> None:\n", " \"\"\"Initialize the adaptive test.\n", "\n", " Args:\n", " item_bank: List of dicts with keys 'skill', 'a' (discrimination),\n", " 'b' (difficulty), 'text', 'answer'.\n", " se_threshold: Stop when SE falls below this value (default 0.3).\n", " \"\"\"\n", " self.items = item_bank\n", " self.se_threshold = se_threshold\n", " self.theta: float = 0.0 # initial ability estimate\n", " self.responses: list[tuple] = []\n", " self._logger = ColorLogger(\"PlacementTest\")\n", "\n", " @staticmethod\n", " def p_correct(theta: float, a: float, b: float) -> float:\n", " \"\"\"2PL probability of correct response.\n", "\n", " P(correct | theta, a, b) = 1 / (1 + exp(-a * (theta - b)))\n", "\n", " Ref: Ch.15, p. 
428\n", " \"\"\"\n", " z = -a * (theta - b)\n", " z = max(-500.0, min(500.0, z)) # Prevent overflow\n", " return 1.0 / (1.0 + math.exp(z))\n", "\n", " def _information(self, item: dict) -> float:\n", " \"\"\"Fisher information at current theta.\n", "\n", " I(theta) = a^2 * P * (1 - P)\n", " Items with high discrimination provide more information.\n", "\n", " Ref: Ch.15, p. 430\n", " \"\"\"\n", " p = self.p_correct(self.theta, item['a'], item['b'])\n", " return (item['a'] ** 2) * p * (1 - p)\n", "\n", " def select_next_item(self, used_ids: set) -> Optional[dict]:\n", " \"\"\"Pick the unused item with maximum information.\n", "\n", " Ref: Ch.15, p. 430\n", " \"\"\"\n", " remaining = [i for i in self.items if id(i) not in used_ids]\n", " if not remaining:\n", " return None\n", " return max(remaining, key=self._information)\n", "\n", " def update_theta(self) -> None:\n", " \"\"\"Newton-Raphson MLE update over all responses.\n", "\n", " Iteratively refines the ability estimate by maximizing the\n", " log-likelihood of the observed response pattern.\n", "\n", " Ref: Ch.15, p. 430\n", " \"\"\"\n", " theta = self.theta\n", " for _ in range(25): # max iterations\n", " num = den = 0.0\n", " for item, correct in self.responses:\n", " p = self.p_correct(theta, item['a'], item['b'])\n", " num += item['a'] * (correct - p)\n", " den += (item['a'] ** 2) * p * (1 - p)\n", " if abs(den) < 1e-10:\n", " break\n", " step = num / den\n", " step = max(-1.0, min(1.0, step)) # Step-size damping\n", " theta += step\n", " theta = max(-4.0, min(4.0, theta)) # Clamp to prevent divergence\n", " self.theta = theta\n", "\n", " def standard_error(self) -> float:\n", " \"\"\"Compute standard error of the current ability estimate.\n", "\n", " SE = 1 / sqrt(sum of Fisher information across administered items).\n", "\n", " Ref: Ch.15, p. 
430\n", " \"\"\"\n", " info = sum(\n", " (it['a'] ** 2)\n", " * self.p_correct(self.theta, it['a'], it['b'])\n", " * (1 - self.p_correct(self.theta, it['a'], it['b']))\n", " for it, _ in self.responses\n", " )\n", " return 1.0 / math.sqrt(info) if info > 0 else float('inf')\n", "\n", " def run(self, get_response_fn) -> dict:\n", " \"\"\"Execute adaptive test.\n", "\n", " Args:\n", " get_response_fn: Callable that takes an item dict and returns\n", " bool (correct/incorrect).\n", "\n", " Returns:\n", " Dict with 'theta' (ability estimate), 'se' (standard error),\n", " and 'items_used' (number of items administered).\n", "\n", " Ref: Ch.15, pp. 430–431\n", " \"\"\"\n", " used = set()\n", " self._logger.info(\"Starting adaptive placement test...\")\n", "\n", " while True:\n", " item = self.select_next_item(used)\n", " if item is None:\n", " self._logger.warn(\"Item bank exhausted.\")\n", " break\n", "\n", " correct = get_response_fn(item)\n", " self.responses.append((item, int(correct)))\n", " used.add(id(item))\n", " self.update_theta()\n", "\n", " se = self.standard_error()\n", " self._logger.info(\n", " f\"Item {len(self.responses)}: \"\n", " f\"skill={item['skill']}, b={item['b']:.1f}, \"\n", " f\"correct={correct}, \"\n", " f\"theta={self.theta:.3f}, SE={se:.3f}\"\n", " )\n", "\n", " if (len(self.responses) >= 5 and se < self.se_threshold):\n", " self._logger.success(\n", " f\"Converged after {len(self.responses)} items \"\n", " f\"(SE={se:.3f} < {self.se_threshold})\"\n", " )\n", " break\n", "\n", " return {\n", " 'theta': self.theta,\n", " 'se': self.standard_error(),\n", " 'items_used': len(self.responses)\n", " }\n" ] }, { "cell_type": "code", "execution_count": null, "id": "b1a04ca2", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# IRT Item Bank: 15 items for Adaptive Placement Test\n", "# Ref: Strategy §7.2 — calibrated from 12,000 historical students (p. 
437)\n", "# Author: Imran Ahmad\n", "# ===========================================================================\n", "\n", "ITEM_BANK = [\n", " {\"skill\": \"variables\", \"a\": 1.2, \"b\": -1.5,\n", " \"text\": \"What is the output of: x = 3; print(x + 2)?\",\n", " \"answer\": \"5\"},\n", " {\"skill\": \"variables\", \"a\": 1.0, \"b\": -1.0,\n", " \"text\": \"What is the type of x after: x = '42'?\",\n", " \"answer\": \"str\"},\n", " {\"skill\": \"conditionals\", \"a\": 1.4, \"b\": -0.5,\n", " \"text\": \"What is the output of: print('yes' if 3 > 2 else 'no')?\",\n", " \"answer\": \"yes\"},\n", " {\"skill\": \"conditionals\", \"a\": 1.3, \"b\": 0.0,\n", " \"text\": \"What does this print? x=15; print('a' if x>20 else 'b' if x>10 else 'c')\",\n", " \"answer\": \"b\"},\n", " {\"skill\": \"boolean_logic\", \"a\": 1.1, \"b\": 0.2,\n", " \"text\": \"Evaluate: True and not False or False\",\n", " \"answer\": \"True\"},\n", " {\"skill\": \"list_basics\", \"a\": 1.2, \"b\": -0.3,\n", " \"text\": \"What is len([1, [2, 3], 4])?\",\n", " \"answer\": \"3\"},\n", " {\"skill\": \"list_slicing\", \"a\": 1.5, \"b\": 0.5,\n", " \"text\": \"What is [1,2,3,4,5][1:4]?\",\n", " \"answer\": \"[2, 3, 4]\"},\n", " {\"skill\": \"for_loops\", \"a\": 1.6, \"b\": 0.3,\n", " \"text\": \"Trace: total=0; for i in range(4): total+=i. What is total?\",\n", " \"answer\": \"6\"},\n", " {\"skill\": \"for_loops\", \"a\": 1.3, \"b\": 0.7,\n", " \"text\": \"What prints? for x in 'abc': print(x, end='')\",\n", " \"answer\": \"abc\"},\n", " {\"skill\": \"loop_termination\", \"a\": 1.7, \"b\": 1.0,\n", " \"text\": \"for i in range(5): if i==2: break; print(i). Output?\",\n", " \"answer\": \"0 1\"},\n", " {\"skill\": \"loop_termination\", \"a\": 1.4, \"b\": 1.2,\n", " \"text\": \"x=0; while True: x+=1; if x==3: break. What is x?\",\n", " \"answer\": \"3\"},\n", " {\"skill\": \"nested_iteration\", \"a\": 1.2, \"b\": 1.5,\n", " \"text\": \"for i in range(3): for j in range(2): print('*'). 
How many *?\",\n", " \"answer\": \"6\"},\n", " {\"skill\": \"list_comprehensions\", \"a\": 1.5, \"b\": 1.3,\n", " \"text\": \"Rewrite as comprehension: result=[]; for x in range(5): if x%2==0: result.append(x*2)\",\n", " \"answer\": \"[x*2 for x in range(5) if x%2==0]\"},\n", " {\"skill\": \"functions_intro\", \"a\": 1.1, \"b\": 1.0,\n", " \"text\": \"def f(x): x + 1. What does f(5) return?\",\n", " \"answer\": \"None\"},\n", " {\"skill\": \"list_slicing\", \"a\": 1.6, \"b\": 0.8,\n", " \"text\": \"nums=[10,20,30,40,50]. Fix: nums[1:3] to get [20,30,40]\",\n", " \"answer\": \"nums[1:4]\"},\n", "]\n", "\n", "irt_logger = ColorLogger(\"IRTItemBank\")\n", "irt_logger.success(f\"Item bank loaded: {len(ITEM_BANK)} items across \"\n", " f\"{len(set(i['skill'] for i in ITEM_BANK))} skills\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "992313b2", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Test: Adaptive Placement Test execution\n", "# Ref: Strategy §11, C11; Quality Checklist F5 — converges in [5, 15] items\n", "# ===========================================================================\n", "\n", "apt_test = ColorLogger(\"PlacementTestDemo\")\n", "\n", "def simulated_student_response(item: dict) -> bool:\n", " \"\"\"Simulate a student who knows basics but struggles with advanced topics.\n", "\n", " Uses the 2PL model with true ability theta=0.3 and a fixed\n", " random sequence for reproducibility. Models a student who has\n", " grasped variables/conditionals but not loops/advanced topics\n", " (matching Alex's initial profile, pp. 
440–441).\n", " \"\"\"\n", " true_theta = 0.3\n", " p = AdaptivePlacementTest.p_correct(true_theta, item['a'], item['b'])\n", " return _placement_rng.random() < p\n", "\n", "\n", "import random\n", "\n", "# Fixed seed for reproducible placement test results\n", "_placement_rng = random.Random(2024)\n", "\n", "\n", "placement_test = AdaptivePlacementTest(ITEM_BANK, se_threshold=0.3)\n", "result = placement_test.run(simulated_student_response)\n", "\n", "apt_test.success(\n", " f\"Placement complete: theta={result['theta']:.3f}, \"\n", " f\"SE={result['se']:.3f}, items_used={result['items_used']}\"\n", ")\n", "\n", "# F5: Should finish in 5–15 items (SE convergence or bank exhaustion)\n", "assert 5 <= result['items_used'] <= 15, (\n", " f\"Expected 5-15 items, got {result['items_used']}\"\n", ")\n", "apt_test.success(f\"F5: Finished in {result['items_used']} items (within [5,15] range)\")\n" ] }, { "cell_type": "markdown", "id": "d1bb187b", "metadata": {}, "source": [ "### Cell Group 6: Bayesian Knowledge Tracing (BKT)\n", "\n", "**Ref:** Ch.15, pp. 431–434 (standalone `bkt_update()` + `BKTTracker` class)\n", "\n", "BKT treats each learning objective as a **hidden mastery state** that the agent infers from observable evidence. The update is a two-step Bayesian calculation:\n", "\n", "**Step 1 — Posterior:** Compute $P(L_n \\mid \\text{observation})$ using slip/guess parameters as the observation model.\n", "\n", "**Step 2 — Learning transition:** Apply $P(T)$ to account for the possibility that the student learned from the interaction itself.\n", "\n", "$$P(L_{n+1}) = P(L_n \\mid \\text{obs}) + (1 - P(L_n \\mid \\text{obs})) \\cdot P(T)$$\n", "\n", "> **Info Box — Why Not Just Track Correct/Incorrect? (p. 431):** The four BKT parameters explain why the agent cannot equate \"correct\" with \"mastered\" or \"incorrect\" with \"not mastered\":\n", "> - **Initial mastery** $P(L_0)$: Starting belief before the first exercise. 
An intermediate-track student begins with a higher prior.\n", "> - **Transition probability** $P(T)$: Probability that one learning opportunity causes real learning (captures interaction design effectiveness).\n", "> - **Slip rate** $P(S)$: Chance of an incorrect outcome despite mastery. Common in programming: missing colons, indentation errors, off-by-one bugs.\n", "> - **Guess rate** $P(G)$: Chance of a correct outcome without mastery. Happens via trial-and-error edits until tests pass.\n", "\n", "**Default parameters (pp. 431–432):**\n", "- $P(L_0) = 0.1$ — Initial mastery prior\n", "- $P(T) = 0.1$ — Probability of learning per opportunity\n", "- $P(S) = 0.05$ — Slip rate (incorrect despite mastery)\n", "- $P(G) = 0.2$ — Guess rate (correct without mastery)\n", "\n", "> **Info Box — BKT Drives Action Selection (p. 434):** The updated posterior makes BKT operationally useful. If mastery remains high and the error looks like a slip, the agent offers a minimal correction. If mastery drops meaningfully, it switches to a diagnostic move. If the student produces correct answers but the model estimates low mastery, the agent infers guessing behavior and schedules a more discriminative assessment.\n", "\n", "**Verification targets (Quality Checklist F4):**\n", "- `bkt_update(0.1, True)` ≈ 0.34 (posterior only) / 0.41 (full two-step)\n", "- `bkt_update(0.34, False)`: the chapter quotes ≈ 0.28; with the default parameters above, the full two-step update evaluates to ≈ 0.13\n" ] }, { "cell_type": "code", "execution_count": null, "id": "d48ac8e6", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Cell Group 6: bkt_update() standalone function\n", "# Ref: Ch.15, pp. 
431–432 — Two-step Bayesian update\n", "# Author: Imran Ahmad\n", "# ===========================================================================\n", "\n", "\n", "def bkt_update(p_mastery: float, correct: bool,\n", " p_transit: float = 0.1,\n", " p_slip: float = 0.05,\n", " p_guess: float = 0.2) -> float:\n", " \"\"\"Compute updated mastery probability after one observation\n", " using standard Bayesian Knowledge Tracing.\n", "\n", " Args:\n", " p_mastery: Prior P(L_n), current mastery belief.\n", " correct: Whether the student response was correct.\n", " p_transit: P(T), probability of learning per opportunity.\n", " p_slip: P(S), probability of slip given mastery.\n", " p_guess: P(G), probability of guess given no mastery.\n", "\n", " Returns:\n", " Updated P(L_{n+1}) after incorporating evidence.\n", "\n", " Ref: Ch.15, pp. 431–432\n", " \"\"\"\n", " # Step 1: posterior P(L_n | observation)\n", " # Likelihood of the observed outcome (correct or incorrect)\n", " # under mastery (L) and under non-mastery (not L)\n", " if correct:\n", " p_obs_given_L = 1.0 - p_slip\n", " p_obs_given_not_L = p_guess\n", " else:\n", " p_obs_given_L = p_slip\n", " p_obs_given_not_L = 1.0 - p_guess\n", "\n", " # Marginal probability of the observation\n", " p_obs = (\n", " p_obs_given_L * p_mastery\n", " + p_obs_given_not_L * (1 - p_mastery)\n", " )\n", " p_posterior = (p_obs_given_L * p_mastery) / p_obs\n", "\n", " # Step 2: learning transition\n", " p_updated = p_posterior + (1 - p_posterior) * p_transit\n", "\n", " return p_updated\n" ] }, { "cell_type": "code", "execution_count": null, "id": "e727125c", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# BKTTracker class — reusable component for StudentModel integration\n", "# Ref: Ch.15, pp. 
432–433\n", "# Author: Imran Ahmad\n", "# ===========================================================================\n", "\n", "\n", "class BKTTracker:\n", " \"\"\"Encapsulates Bayesian Knowledge Tracing for integration\n", " with StudentModel.\n", "\n", " Wraps the two-step update into a reusable component with\n", " configurable parameters per course.\n", "\n", " Ref: Ch.15, pp. 432–433\n", " \"\"\"\n", "\n", " def __init__(self, p_transit: float = 0.1,\n", " p_slip: float = 0.05,\n", " p_guess: float = 0.2) -> None:\n", " self.p_transit = p_transit\n", " self.p_slip = p_slip\n", " self.p_guess = p_guess\n", " self._logger = ColorLogger(\"BKTTracker\")\n", "\n", " def update(self, p_mastery: float, correct: bool) -> float:\n", " \"\"\"Return updated P(L) after one observation.\n", "\n", " Ref: Ch.15, pp. 432–433\n", " \"\"\"\n", " # Likelihood of the observed outcome (correct or incorrect)\n", " # under mastery (L) and under non-mastery (not L)\n", " if correct:\n", " p_obs_L = 1.0 - self.p_slip\n", " p_obs_not_L = self.p_guess\n", " else:\n", " p_obs_L = self.p_slip\n", " p_obs_not_L = 1.0 - self.p_guess\n", "\n", " # Step 1: Bayesian posterior given the observation\n", " p_obs = (\n", " p_obs_L * p_mastery\n", " + p_obs_not_L * (1 - p_mastery)\n", " )\n", " posterior = (p_obs_L * p_mastery) / p_obs\n", "\n", " # Step 2: learning transition\n", " updated = posterior + (1 - posterior) * self.p_transit\n", " self._logger.info(\n", " f\"BKT: {p_mastery:.3f} → {updated:.3f} \"\n", " f\"(correct={correct})\"\n", " )\n", " return updated\n" ] }, { "cell_type": "code", "execution_count": null, "id": "f605e847", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Test: BKT update — verify against chapter worked example\n", "# Ref: Ch.15, p. 432; Quality Checklist F4\n", "#\n", "# NOTE: The chapter text (p. 432) quotes approximate values of ~0.34\n", "# and ~0.28. The value 0.345 is the posterior BEFORE the learning\n", "# transition (Step 1 only). The complete two-step BKT update\n", "# (posterior + transition with P(T)=0.1) gives 0.411. 
Both the\n", "# standalone function and BKTTracker implement the full two-step\n", "# update correctly. The trajectory still crosses 0.85 as expected.\n", "# ===========================================================================\n", "\n", "bkt_test = ColorLogger(\"BKTTest\")\n", "\n", "# F4 Test 1: First correct submission\n", "p1 = bkt_update(0.1, True)\n", "bkt_test.success(f\"bkt_update(0.1, True) = {p1:.4f}\")\n", "bkt_test.info(f\" Posterior (Step 1 only): {0.095/0.275:.4f} ≈ chapter's 0.34\")\n", "bkt_test.info(f\" Full update (Step 1+2): {p1:.4f} (with P(T)=0.1 transition)\")\n", "assert abs(p1 - 0.411) < 0.01, f\"F4 failed: expected ~0.411, got {p1:.4f}\"\n", "\n", "# F4 Test 2: Subsequent incorrect submission\n", "p2 = bkt_update(p1, False)\n", "bkt_test.success(f\"bkt_update({p1:.4f}, False) = {p2:.4f}\")\n", "bkt_test.info(\" Incorrect response drops mastery significantly\")\n", "assert p2 < p1, \"Incorrect answer should reduce mastery\"\n", "assert p2 < 0.85, \"Should remain below mastery threshold\"\n", "\n", "# Verify BKTTracker class produces identical results\n", "tracker = BKTTracker()\n", "p1_class = tracker.update(0.1, True)\n", "p2_class = tracker.update(p1_class, False)\n", "assert abs(p1 - p1_class) < 1e-10, \"BKTTracker and bkt_update must agree\"\n", "assert abs(p2 - p2_class) < 1e-10, \"BKTTracker and bkt_update must agree\"\n", "bkt_test.success(\"BKTTracker class matches standalone bkt_update()\")\n", "\n", "# Verify monotonic behavior over multiple correct answers\n", "p = 0.1\n", "trajectory = [p]\n", "for _ in range(10):\n", " p = bkt_update(p, True)\n", " trajectory.append(p)\n", "bkt_test.info(f\"10x correct trajectory: {[f'{v:.3f}' for v in trajectory]}\")\n", "assert trajectory[-1] > 0.85, \"10 correct answers should push past mastery threshold\"\n", "bkt_test.success(f\"After 10 correct: p={trajectory[-1]:.4f} > 0.85 threshold\")\n" ] }, { "cell_type": "markdown", "id": "4ffe11f4", "metadata": {}, "source": [ "### Cell Group 
7: Spaced Repetition Scheduler (SM-2)\n", "\n", "**Ref:** Ch.15, pp. 435–437\n", "\n", "The adaptive engine implements spaced repetition for long-term retention using a modified SM-2 (SuperMemo-2) algorithm.\n", "\n", "> **Info Box — SM-2 Core Logic (p. 435):**\n", "> - **Increasing intervals:** The most efficient way to move information from short-term to long-term memory is to review it at increasing intervals, with each successful recall pushing the next scheduled review further into the future.\n", "> - **Recall quality:** The algorithm uses a quality score (0–5) to adjust the \"ease factor\" of a concept.\n", "> - **Adaptive reinforcement:** If a student successfully recalls a concept, the interval grows; if they fail, the repetition count resets to 0 and the interval resets to 1 day, so the concept is reviewed daily until mastery is re-established.\n", "\n", "**SM-2 ease factor update (p. 436):**\n", "\n", "$$\\text{ease} = \\max\\!\\left(1.3,\\; \\text{ease} + 0.1 - (5 - q) \\cdot (0.08 + (5 - q) \\cdot 0.02)\\right)$$\n", "\n", "Here $q$ is the recall quality score (0–5). A score of $q = 5$ raises the ease factor by 0.1, $q = 4$ leaves it unchanged, and $q = 3$ lowers it by 0.14; the result never drops below 1.3.\n", "\n", "**Interval schedule (on a successful recall, $q \\geq 3$):** reps=0 → 1 day, reps=1 → 6 days, reps≥2 → `int(interval × ease)`, where `reps` counts the consecutive successful reviews before the current one.\n", "\n", "> **Practical Example (p. 437):** Consider a Python learner who has recently \"mastered\" list slicing but begins making boundary mistakes two weeks later. Rather than waiting for the mistake to reappear in a graded assignment, the scheduler surfaces slicing for review as soon as `next_review` becomes due. If the student answers correctly but hesitates, a mid-range quality score causes the interval to grow conservatively. If they cannot reconstruct the concept at all, the algorithm resets spacing to prevent drift from mastery into fragile knowledge.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "ac8d75d8", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Cell Group 7: SpacedRepetitionScheduler\n", "# Ref: Ch.15, pp. 
435–437 — SM-2 algorithm\n", "# Author: Imran Ahmad\n", "# ===========================================================================\n", "\n", "\n", "class SpacedRepetitionScheduler:\n", "    \"\"\"Manages review scheduling using the SM-2 algorithm.\n", "\n", "    Determines when mastered objectives should be reviewed to prevent\n", "    retention decay. Integrates with StudentModel's scheduling metadata.\n", "\n", "    Ref: Ch.15, pp. 435–437\n", "    \"\"\"\n", "\n", "    def __init__(self, student_model: StudentModel) -> None:\n", "        self.model = student_model\n", "        self._logger = ColorLogger(\"SpacedRepetition\")\n", "\n", "    def get_due_reviews(self, student_id: str,\n", "                        max_reviews: int = 5) -> list[Review]:\n", "        \"\"\"Identify objectives due for review based on spaced repetition schedule.\n", "\n", "        Scans mastered objectives and computes a priority score based on\n", "        how overdue each objective is (days_overdue / 7, capped at 1.0).\n", "\n", "        Ref: Ch.15, pp. 435–436\n", "        \"\"\"\n", "        mastered = self.model.get_mastered_objectives(student_id)\n", "        due = []\n", "        now = datetime.utcnow()\n", "\n", "        for objective_id, metadata in mastered.items():\n", "            next_review = metadata.get(\"next_review\")\n", "            if next_review and next_review <= now:\n", "                days_overdue = (now - next_review).days\n", "                priority = min(days_overdue / 7.0, 1.0)\n", "                due.append(Review(\n", "                    objective_id=objective_id,\n", "                    priority=priority,\n", "                    interval=metadata.get(\"interval\", 1),\n", "                    repetitions=metadata.get(\"reps\", 0)\n", "                ))\n", "\n", "        due.sort(key=lambda r: r.priority, reverse=True)\n", "        self._logger.info(\n", "            f\"{len(due)} objectives due for review \"\n", "            f\"(returning top {min(len(due), max_reviews)})\"\n", "        )\n", "        return due[:max_reviews]\n", "\n", "    def update_schedule(self, student_id: str,\n", "                        objective_id: str,\n", "                        quality: int) -> None:\n", "        \"\"\"Update review schedule using SM-2 algorithm.\n", "\n", "        Quality: 0 (complete failure) to 5 (perfect recall).\n", "\n", "        
SM-2 interval logic:\n", "        - quality >= 3: increment reps, grow interval\n", "        - quality < 3: reset reps to 0, interval to 1\n", "\n", "        Ease factor update (p. 436):\n", "            ease = max(1.3, ease + 0.1 - (5-q)*(0.08 + (5-q)*0.02))\n", "\n", "        Ref: Ch.15, pp. 436–437\n", "        \"\"\"\n", "        meta = self.model.get_objective_metadata(\n", "            student_id, objective_id\n", "        )\n", "        reps = meta.get(\"reps\", 0)\n", "        interval = meta.get(\"interval\", 1)\n", "        ease = meta.get(\"ease_factor\", 2.5)\n", "\n", "        if quality >= 3:\n", "            if reps == 0:\n", "                interval = 1\n", "            elif reps == 1:\n", "                interval = 6\n", "            else:\n", "                interval = int(interval * ease)\n", "            reps += 1\n", "        else:\n", "            reps = 0\n", "            interval = 1\n", "\n", "        # SM-2 ease factor update\n", "        ease = max(1.3, ease + 0.1 - (5 - quality)\n", "                   * (0.08 + (5 - quality) * 0.02))\n", "\n", "        next_review = datetime.utcnow() + timedelta(days=interval)\n", "\n", "        self.model.update_objective_metadata(\n", "            student_id, objective_id,\n", "            reps=reps, interval=interval,\n", "            ease_factor=ease, next_review=next_review\n", "        )\n", "\n", "        self._logger.info(\n", "            f\"Schedule updated: {objective_id} → \"\n", "            f\"interval={interval}d, reps={reps}, \"\n", "            f\"ease={ease:.2f}, quality={quality}\"\n", "        )\n" ] }, { "cell_type": "code", "execution_count": null, "id": "18d530c3", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Test: SpacedRepetitionScheduler\n", "# Ref: Strategy §11, C13; Quality Checklist F7\n", "# quality >= 3 → interval grows\n", "# quality < 3 → interval resets to 1, reps resets to 0\n", "# ===========================================================================\n", "\n", "sr_test = ColorLogger(\"SRTest\")\n", "\n", "# Create a student with one mastered objective\n", "sr_sm = StudentModel(\"sr_test\", knowledge_graph)\n", "sr_sm.update_mastery(\"variables\", 0.90)\n", "scheduler = SpacedRepetitionScheduler(sr_sm)\n", "\n", "# 
Schedule a successful review (quality=4)\n", "scheduler.update_schedule(\"sr_test\", \"variables\", quality=4)\n", "meta1 = sr_sm.get_objective_metadata(\"sr_test\", \"variables\")\n", "sr_test.success(f\"After quality=4: interval={meta1['interval']}, reps={meta1['reps']}\")\n", "assert meta1[\"reps\"] == 1, \"Reps should be 1 after first success\"\n", "assert meta1[\"interval\"] == 1, \"First interval should be 1 day\"\n", "\n", "# Second successful review\n", "scheduler.update_schedule(\"sr_test\", \"variables\", quality=4)\n", "meta2 = sr_sm.get_objective_metadata(\"sr_test\", \"variables\")\n", "sr_test.success(f\"After 2nd quality=4: interval={meta2['interval']}, reps={meta2['reps']}\")\n", "assert meta2[\"reps\"] == 2, \"Reps should be 2\"\n", "assert meta2[\"interval\"] == 6, \"Second interval should be 6 days\"\n", "\n", "# Third successful review — interval grows by ease factor\n", "scheduler.update_schedule(\"sr_test\", \"variables\", quality=4)\n", "meta3 = sr_sm.get_objective_metadata(\"sr_test\", \"variables\")\n", "sr_test.success(f\"After 3rd quality=4: interval={meta3['interval']}, reps={meta3['reps']}, ease={meta3['ease_factor']:.2f}\")\n", "assert meta3[\"interval\"] > 6, \"Interval should grow beyond 6\"\n", "\n", "# F7: Failed recall resets everything\n", "scheduler.update_schedule(\"sr_test\", \"variables\", quality=1)\n", "meta_fail = sr_sm.get_objective_metadata(\"sr_test\", \"variables\")\n", "sr_test.success(f\"After quality=1 (fail): interval={meta_fail['interval']}, reps={meta_fail['reps']}\")\n", "assert meta_fail[\"interval\"] == 1, \"F7: Failed recall should reset interval to 1\"\n", "assert meta_fail[\"reps\"] == 0, \"F7: Failed recall should reset reps to 0\"\n", "sr_test.success(\"F7: Spaced repetition resets on failure — PASS\")\n" ] }, { "cell_type": "markdown", "id": "5108badf", "metadata": {}, "source": [ "### Cell Group 8: Feedback Generator & Misconception Detection\n", "\n", "**Ref:** Ch.15, pp. 
438–440 (FeedbackGenerator class); pp. 437–438 (misconception detector)\n", "\n", "The feedback engine is the **Action** phase of the educational cognitive loop. It synthesizes a personalized, context-aware feedback message using:\n", "- The student's current mastery levels\n", "- Recent error patterns (from StudentModel)\n", "- Detected misconceptions (two-stage pipeline: rule-based → LLM fallback)\n", "\n", "> **Info Box — Two-Stage Misconception Detection (pp. 437–438):** The misconception detector uses a hybrid pipeline. Stage 1 is a fast, rule-based classifier that matches code against approximately 180 known Python misconception patterns (AST node patterns and output signatures), running in under 50 ms and catching about 70% of common misconceptions. When Stage 1 returns no match or low confidence, Stage 2 invokes the LLM. This hybrid approach reduced average feedback latency from 8 seconds (LLM-only) to 2.5 seconds while maintaining diagnostic accuracy above 85%.\n", "\n", "The prompt follows a **4-part pedagogical contract** (p. 439):\n", "1. Acknowledge what the student did correctly\n", "2. Identify the specific error without revealing the solution\n", "3. Ask a guiding question toward independent discovery\n", "4. Address the underlying conceptual misconception\n", "\n", "> **Info Box — Preventing Feedback Anti-Patterns (p. 440):** The structured prompt prevents two failure modes common in tutoring systems: \"rubber-stamp praise plus the correct answer\" and \"generic advice that does not connect to the student's actual misconception.\"\n", "\n", "All LLM calls in this component are wrapped with `@graceful_fallback`.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "d16c7ea5", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Cell Group 8: MisconceptionLibrary + FeedbackGenerator\n", "# Ref: Ch.15, pp. 
437–440 — Two-stage misconception detection + feedback\n", "# Author: Imran Ahmad\n", "# ===========================================================================\n", "\n", "\n", "class MisconceptionLibrary:\n", "    \"\"\"Two-stage misconception detection pipeline.\n", "\n", "    Stage 1: Fast rule-based classifier (~50ms, catches ~70% of cases).\n", "    Stage 2: LLM-based diagnostic (invoked when Stage 1 confidence < 0.7).\n", "\n", "    Ref: Ch.15, pp. 437–438\n", "    \"\"\"\n", "\n", "    def __init__(self, llm_client) -> None:\n", "        self.llm = llm_client\n", "        self._logger = ColorLogger(\"MisconceptionDetector\")\n", "        self._rules = self._build_rules()\n", "\n", "    def _build_rules(self) -> list[dict]:\n", "        \"\"\"Rule-based patterns for common Python misconceptions.\"\"\"\n", "        return [\n", "            {\n", "                \"id\": \"ctrl_flow_break_placement\",\n", "                \"pattern_keywords\": [\"break\", \"total\", \"+=\"],\n", "                \"confidence\": 0.82,\n", "                \"description\": \"Break placed after accumulation step\",\n", "                \"related_objectives\": [\"loop_termination\", \"control_flow_ordering\"],\n", "            },\n", "            {\n", "                \"id\": \"off_by_one_loop\",\n", "                \"pattern_keywords\": [\"range\", \"len\", \"-1\"],\n", "                \"confidence\": 0.75,\n", "                \"description\": \"Off-by-one error in loop bounds\",\n", "                \"related_objectives\": [\"for_loops\", \"list_slicing\"],\n", "            },\n", "            {\n", "                \"id\": \"mutable_default_arg\",\n", "                \"pattern_keywords\": [\"def\", \"=[]\", \"append\"],\n", "                \"confidence\": 0.88,\n", "                \"description\": \"Mutable default argument in function definition\",\n", "                \"related_objectives\": [\"functions_intro\"],\n", "            },\n", "        ]\n", "\n", "    def detect_rule_based(self, submission: str,\n", "                          objective_ids: list[str]) -> MisconceptionResult:\n", "        \"\"\"Stage 1: Fast rule-based misconception detection.\n", "\n", "        Matches submission against ~180 known patterns (3 shown here).\n", "        Returns result with confidence score.\n", "\n", "        Ref: Ch.15, pp. 
437–438\n", "        \"\"\"\n", "        submission_lower = submission.lower()\n", "        for rule in self._rules:\n", "            # Only the first two keywords of each rule must appear for a match.\n", "            if all(kw in submission_lower for kw in rule[\"pattern_keywords\"][:2]):\n", "                self._logger.info(\n", "                    f\"Rule-based match: {rule['id']} \"\n", "                    f\"(confidence={rule['confidence']})\"\n", "                )\n", "                return MisconceptionResult(\n", "                    misconception_id=rule[\"id\"],\n", "                    confidence=rule[\"confidence\"],\n", "                    description=rule[\"description\"],\n", "                    related_objectives=rule[\"related_objectives\"],\n", "                    suggested_remediation=\"trace_exercise\",\n", "                )\n", "        return MisconceptionResult(confidence=0.0)\n", "\n", "    @graceful_fallback(\n", "        fallback_value=MisconceptionResult(confidence=0.0),\n", "        component=\"MisconceptionDetector\"\n", "    )\n", "    def detect_llm(self, submission: str, exercise: Exercise,\n", "                   history: list) -> MisconceptionResult:\n", "        \"\"\"Stage 2: LLM-based misconception detection.\n", "\n", "        Invoked when rule-based stage returns confidence < 0.7.\n", "\n", "        Ref: Ch.15, pp. 
438, 24\n", "        \"\"\"\n", "        prompt = (\n", "            f\"Diagnose the misconception in this student submission.\\n\"\n", "            f\"Exercise: {exercise.description}\\n\"\n", "            f\"Submission:\\n{submission}\\n\"\n", "            f\"Recent error pattern: {history}\\n\"\n", "            f\"Return a JSON object with: misconception_id, confidence, \"\n", "            f\"related_objectives, suggested_remediation.\"\n", "        )\n", "        response = self.llm.generate(prompt)\n", "        try:\n", "            data = json.loads(response)\n", "            return MisconceptionResult(\n", "                misconception_id=data.get(\"misconception_id\", \"\"),\n", "                confidence=data.get(\"confidence\", 0.5),\n", "                description=data.get(\"description\", \"\"),\n", "                related_objectives=data.get(\"related_objectives\", []),\n", "                suggested_remediation=data.get(\"suggested_remediation\", \"\"),\n", "            )\n", "        except (json.JSONDecodeError, TypeError):\n", "            self._logger.warn(\"LLM response not valid JSON; using raw text.\")\n", "            return MisconceptionResult(\n", "                misconception_id=\"llm_detected\",\n", "                confidence=0.5,\n", "                description=response[:200],\n", "            )\n", "\n", "\n", "class FeedbackGenerator:\n", "    \"\"\"Generates personalized, context-aware pedagogical feedback.\n", "\n", "    Integrates student mastery, error history, and misconception detection\n", "    to produce a 'conceptual nudge' that guides the student toward\n", "    independent discovery rather than revealing the answer.\n", "\n", "    Ref: Ch.15, pp. 
438–440\n", "    \"\"\"\n", "\n", "    def __init__(self, llm_client, student_model: StudentModel,\n", "                 misconception_library: MisconceptionLibrary) -> None:\n", "        self.llm = llm_client\n", "        self.model = student_model\n", "        self.misconceptions = misconception_library\n", "        self._logger = ColorLogger(\"FeedbackGenerator\")\n", "\n", "    @graceful_fallback(\n", "        fallback_value=Feedback(content=\"[Feedback unavailable — see instructor.]\"),\n", "        component=\"FeedbackGenerator\"\n", "    )\n", "    def generate_feedback(self, student_id: str,\n", "                          exercise: Exercise,\n", "                          submission: str,\n", "                          test_results: TestResults) -> Feedback:\n", "        \"\"\"Generate personalized feedback addressing the student's\n", "        specific learning needs.\n", "\n", "        Follows the 4-part pedagogical contract (p. 439):\n", "        1. Acknowledge correct elements\n", "        2. Localize error without revealing solution\n", "        3. Ask guiding question\n", "        4. Address underlying misconception\n", "\n", "        Ref: Ch.15, pp. 438–440\n", "        \"\"\"\n", "        self._logger.info(\n", "            f\"Generating feedback for student '{student_id}' \"\n", "            f\"on exercise '{exercise.id}'\"\n", "        )\n", "\n", "        mastery = self.model.get_mastery_state(student_id)\n", "        history = self.model.get_recent_errors(student_id, n=10)\n", "\n", "        # Two-stage misconception detection\n", "        detected = self.misconceptions.detect_rule_based(\n", "            submission, exercise.objective_ids\n", "        )\n", "        if detected.confidence < 0.7:\n", "            self._logger.info(\n", "                f\"Rule-based confidence {detected.confidence:.2f} < 0.7; \"\n", "                f\"invoking LLM stage.\"\n", "            )\n", "            detected = self.misconceptions.detect_llm(\n", "                submission, exercise, history\n", "            )\n", "\n", "        prompt = (\n", "            f\"You are an expert Python tutor.\\n\"\n", "            f\"The student is working on: {exercise.description}\\n\"\n", "            f\"Their solution:\\n```python\\n{submission}\\n```\\n\"\n", "            f\"Test results: {test_results.summary}\\n\"\n", "            f\"Student context:\\n\"\n", "            f\"- Current mastery of {exercise.topic}: 
\"\n", "            f\"{mastery.get(exercise.primary_objective, 0.0):.0%}\\n\"\n", "            f\"- Recent error patterns: {history}\\n\"\n", "            f\"- Detected misconceptions: {detected}\\n\"\n", "            f\"Generate feedback that:\\n\"\n", "            f\"1. Acknowledges what the student did correctly\\n\"\n", "            f\"2. Identifies the specific error without revealing \"\n", "            f\"the complete solution\\n\"\n", "            f\"3. Asks a guiding question that leads toward \"\n", "            f\"discovering the fix independently\\n\"\n", "            f\"4. If a misconception is detected, address the \"\n", "            f\"underlying conceptual confusion\"\n", "        )\n", "\n", "        response = self.llm.generate(prompt)\n", "\n", "        feedback = Feedback(\n", "            content=response,\n", "            misconceptions_addressed=(\n", "                {\"id\": detected.misconception_id,\n", "                 \"confidence\": detected.confidence}\n", "                if detected.misconception_id else None\n", "            ),\n", "            mastery_updates={},\n", "        )\n", "\n", "        self._logger.success(\n", "            f\"Feedback generated ({len(response)} chars). \"\n", "            f\"Misconception: {detected.misconception_id or 'none detected'}\"\n", "        )\n", "        return feedback\n" ] }, { "cell_type": "markdown", "id": "72c6f0cd", "metadata": {}, "source": [ "### Cell Group 8b: Case Study — \"Alex\" End-to-End Demo\n", "\n", "**Ref:** Ch.15, pp. 440–441; Strategy §7.3\n", "\n", "A new learner, Alex, enters the Python programming tutor. This walkthrough demonstrates all Education Intelligence Agent components working together:\n", "\n", "| Stage | Exercise | Correct? | Agent Action |\n", "|-------|----------|----------|-------------|\n", "| 1 | Sum even numbers in a list | ✅ | Advance to next exercise |\n", "| 2 | Sum evens, stop on negative (break misplaced) | ❌ | Diagnostic tracing question |\n", "| 3 | Diagnostic: value of `total` after 3rd iteration? 
| ✅ | Provide Level 2 hint |\n", "| 4 | Resubmission: corrected `break` placement | ✅ | Schedule spaced review |\n", "| 5 | Spaced review exercise (2 days later) | ✅ | Mastery crosses 0.85 → advance |\n", "\n", "**Quality Checklist F9:** Alex's mastery must cross the 0.85 threshold." ] }, { "cell_type": "code", "execution_count": null, "id": "0e90c52d", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Case Study: Alex — End-to-End Education Intelligence Agent Demo\n", "# Ref: Ch.15, pp. 440–441; Strategy §7.3; Quality Checklist F9\n", "# Author: Imran Ahmad\n", "# ===========================================================================\n", "\n", "alex_logger = ColorLogger(\"Alex-CaseStudy\")\n", "alex_logger.info(\"=\"*60)\n", "alex_logger.info(\"CASE STUDY: Alex — Python Programming Tutor\")\n", "alex_logger.info(\"=\"*60)\n", "\n", "# --- Initialize all components ---\n", "alex_model = StudentModel(\"alex_001\", knowledge_graph)\n", "bkt = BKTTracker(p_transit=0.1, p_slip=0.05, p_guess=0.2)\n", "planner = CurriculumPlanner(knowledge_graph, alex_model)\n", "scheduler = SpacedRepetitionScheduler(alex_model)\n", "misconceptions = MisconceptionLibrary(llm_client)\n", "feedback_gen = FeedbackGenerator(llm_client, alex_model, misconceptions)\n", "\n", "# Pre-set: Alex has mastered variables and conditionals (from placement)\n", "# Ref: p. 
440 — \"high mastery estimates for variables (p=0.82)\n", "# and conditionals (p=0.78) but low estimates for loops (p=0.25)\"\n", "# NOTE: this demo seeds higher prerequisite values than the figures quoted above.\n", "alex_model.update_mastery(\"variables\", 0.90)\n", "alex_model.update_mastery(\"conditionals\", 0.85)\n", "alex_model.update_mastery(\"list_basics\", 0.85)\n", "\n", "# The objective we're tracking: for_loops\n", "OBJECTIVE = \"for_loops\"\n", "INITIAL_P = 0.1 # low prior for loops (the chapter quote above gives p=0.25)\n", "alex_model.update_mastery(OBJECTIVE, INITIAL_P)\n", "\n", "alex_logger.info(f\"Initial mastery for '{OBJECTIVE}': {INITIAL_P:.2f}\")\n", "alex_logger.info(\"\")\n", "\n", "# ------------------------------------------------------------------\n", "# STAGE 1: First correct submission — \"Sum even numbers\"\n", "# Ref: p. 440 — \"Alex receives the 'sum even numbers' exercise\n", "# and submits a correct solution.\"\n", "# ------------------------------------------------------------------\n", "alex_logger.info(\"STAGE 1: Sum even numbers in a list\")\n", "p = alex_model.get_mastery_state()[OBJECTIVE]\n", "p_new = bkt.update(p, correct=True)\n", "alex_model.update_mastery(OBJECTIVE, p_new)\n", "alex_logger.success(f\" Correct submission. Mastery: {p:.4f} → {p_new:.4f}\")\n", "alex_logger.info(\" Agent action: Advance to next exercise\")\n", "alex_logger.info(\"\")\n", "\n", "# ------------------------------------------------------------------\n", "# STAGE 2: Incorrect submission — break misplaced\n", "# Ref: p. 440 — \"The next exercise, requiring an early break condition,\n", "# produces an incorrect submission.\"\n", "# ------------------------------------------------------------------\n", "alex_logger.info(\"STAGE 2: Sum evens, stop on negative (break misplaced)\")\n", "p = alex_model.get_mastery_state()[OBJECTIVE]\n", "p_new = bkt.update(p, correct=False)\n", "alex_model.update_mastery(OBJECTIVE, p_new, error=\"break placement incorrect\")\n", "alex_logger.warn(f\" Incorrect submission. 
Mastery: {p:.4f} → {p_new:.4f}\")\n", "alex_logger.info(\" Agent action: Trigger diagnostic tracing question\")\n", "\n", "# Generate feedback for the incorrect submission\n", "exercise = Exercise(\n", "    id=\"sum_evens_break\",\n", "    description=\"Sum even numbers, stop early on negative value\",\n", "    objective_ids=[OBJECTIVE, \"loop_termination\"],\n", "    primary_objective=OBJECTIVE,\n", "    topic=\"control_flow\",\n", ")\n", "submission = (\n", "    'total = 0\\n'\n", "    'for x in nums:\\n'\n", "    '    if x < 0:\\n'\n", "    '        break\\n'\n", "    '    if x % 2 == 0:\\n'\n", "    '        total += x\\n'\n", ")\n", "test_results = TestResults(\n", "    passed=False,\n", "    summary=\"2/4 test cases failed: break triggers before accumulating valid even values\"\n", ")\n", "feedback = feedback_gen.generate_feedback(\n", "    \"alex_001\", exercise, submission, test_results\n", ")\n", "alex_logger.info(f\" Feedback preview: {feedback.content[:120]}...\")\n", "alex_logger.info(\"\")\n", "\n", "# ------------------------------------------------------------------\n", "# STAGE 3: Diagnostic — correct answer to tracing question\n", "# Ref: p. 441 — agent classifies error as genuine gap, selects\n", "# diagnostic tracing question\n", "# ------------------------------------------------------------------\n", "alex_logger.info(\"STAGE 3: Diagnostic tracing question (correct)\")\n", "p = alex_model.get_mastery_state()[OBJECTIVE]\n", "p_new = bkt.update(p, correct=True)\n", "alex_model.update_mastery(OBJECTIVE, p_new)\n", "alex_logger.success(f\" Correct diagnostic. Mastery: {p:.4f} → {p_new:.4f}\")\n", "alex_logger.info(\" Agent action: Provide Level 2 hint\")\n", "alex_logger.info(\"\")\n", "\n", "# ------------------------------------------------------------------\n", "# STAGE 4: Resubmission — corrected break placement\n", "# Ref: p. 
441 — \"Alex corrects the submission.\"\n", "# ------------------------------------------------------------------\n", "alex_logger.info(\"STAGE 4: Resubmission with corrected break placement\")\n", "p = alex_model.get_mastery_state()[OBJECTIVE]\n", "p_new = bkt.update(p, correct=True)\n", "alex_model.update_mastery(OBJECTIVE, p_new)\n", "alex_logger.success(f\" Correct resubmission. Mastery: {p:.4f} → {p_new:.4f}\")\n", "\n", "# Schedule spaced review\n", "scheduler.update_schedule(\"alex_001\", OBJECTIVE, quality=3)\n", "alex_logger.info(\" Agent action: Continue; schedule spaced review\")\n", "alex_logger.info(\"\")\n", "\n", "# ------------------------------------------------------------------\n", "# STAGE 5: Spaced review — 2 days later, correct without hints\n", "# Ref: p. 441 — \"After one more successful retrieval, mastery\n", "# crosses 0.85.\"\n", "# ------------------------------------------------------------------\n", "alex_logger.info(\"STAGE 5: Spaced review exercise (2 days later)\")\n", "p = alex_model.get_mastery_state()[OBJECTIVE]\n", "p_new = bkt.update(p, correct=True)\n", "alex_model.update_mastery(OBJECTIVE, p_new)\n", "\n", "mastery_threshold = 0.85\n", "crossed = p_new >= mastery_threshold\n", "if crossed:\n", " alex_logger.success(\n", " f\" Correct review. 
Mastery: {p:.4f} → {p_new:.4f} \"\n", "        f\"— MASTERY THRESHOLD {mastery_threshold} CROSSED!\"\n", "    )\n", "    alex_logger.success(\" Agent action: Advance Alex to nested_iteration\")\n", "else:\n", "    alex_logger.info(f\" Mastery: {p:.4f} → {p_new:.4f} (below threshold)\")\n", "\n", "# Update spaced repetition schedule\n", "scheduler.update_schedule(\"alex_001\", OBJECTIVE, quality=5)\n", "alex_logger.info(\"\")\n", "\n", "# ------------------------------------------------------------------\n", "# SUMMARY\n", "# ------------------------------------------------------------------\n", "alex_logger.info(\"=\"*60)\n", "alex_logger.info(\"CASE STUDY SUMMARY\")\n", "alex_logger.info(\"=\"*60)\n", "\n", "final_mastery = alex_model.get_mastery_state()\n", "alex_logger.info(f\"Final mastery for '{OBJECTIVE}': {final_mastery[OBJECTIVE]:.4f}\")\n", "\n", "# Show what planner recommends next\n", "next_objs = planner.get_next_objectives(\"alex_001\", n=3)\n", "alex_logger.info(f\"Next recommended objectives: {[o.id for o in next_objs]}\")\n", "\n", "# F9 Assertion\n", "assert final_mastery[OBJECTIVE] >= mastery_threshold, (\n", "    f\"F9 FAILED: Alex mastery {final_mastery[OBJECTIVE]:.4f} \"\n", "    f\"< {mastery_threshold}\"\n", ")\n", "alex_logger.success(f\"F9: Alex crosses {mastery_threshold} threshold — PASS\")\n" ] }, { "cell_type": "markdown", "id": "e003fa1a", "metadata": {}, "source": [ "---\n", "## Part II: Collective Intelligence Agent\n", "\n", "**Ref:** Ch.15, pp. 441–453\n", "\n", "> **Key Insight (p. 
441):** The central insight is that diversity of perspective, combined with effective aggregation mechanisms, consistently produces better outcomes than reliance on a single reasoner, even a highly capable one.\n", "\n", "The Collective Intelligence agent is not a single agent — it is an architectural pattern for organizing teams of agents that reason together, debate alternatives, and converge on solutions through structured consensus.\n", "\n", "### Figure 15.2 — Multi-Agent Collaboration Architecture (p. 442)\n", "\n", "```\n", "                  ┌───────────────┐\n", "                  │  FACILITATOR  │\n", "                  │ Protocol Mgr  │\n", "                  └───────┬───────┘\n", "                          │\n", "┌───────────────┐         │         ┌───────────────┐\n", "│   UX AGENT    │         │         │   ARCHITECT   │\n", "│  Usability &  │         │         │ System Design │\n", "│     APIs      │         │         │               │\n", "└───────┬───────┘         │         └───────┬───────┘\n", "        │                 │                 │\n", "        └───────┐         │         ┌───────┘\n", "                ▼         ▼         ▼\n", "            ┌───────────────────────────┐\n", "            │       SHARED STATE        │\n", "            │        REPOSITORY         │\n", "            │ (Proposals, Evaluations,  │\n", "            │        & History)         │\n", "            └───────────────────────────┘\n", "                ▲         ▲         ▲\n", "        ┌───────┘         │         └───────┐\n", "        │                 │                 │\n", "┌───────┴───────┐         │         ┌───────┴───────┐\n", "│  PERFORMANCE  │         │         │   SECURITY    │\n", "│ Optimization  │         │         │ Threat Model  │\n", "└───────────────┘         │         └───────────────┘\n", "                          │\n", "   Collaboration Protocol: Parallel Proposal Phase\n", "           → Cross-Evaluation → Synthesis\n", "```\n", "\n", "### Design Questions (p. 441)\n", "1. **Organization** — Flat peers or hierarchy with coordinators?\n", "2. **Communication** — Central bus or pairwise conversations?\n", "3. **Aggregation** — Voting, averaging, debate, or something else?\n", "\n", "### Safeguards Against Group Decision-Making Failures (pp. 446–447)\n", "- **Groupthink prevention:** Adversarial critic rotation (12% higher risk identification)\n", "- **Anchoring bias mitigation:** Randomized proposal viewing order\n", "- **Expertise calibration:** Relevance-gated voting weight\n", "\n", "### Theoretical Backbone: Condorcet Jury Theorem (p. 
441)\n", "If each agent independently arrives at the correct answer with probability $p > 0.5$, the probability that a majority vote produces the correct answer approaches 1 as the group grows. The aggregation rule satisfies **Pareto efficiency** and **dictator-freeness**. Arrow's impossibility theorem is partially sidestepped because the system uses cardinal evaluations rather than ordinal rankings.\n" ] }, { "cell_type": "markdown", "id": "0a380531", "metadata": {}, "source": [ "### Cell Group 9: CollaborativeAgent\n", "\n", "**Ref:** Ch.15, pp. 442–444\n", "\n", "The `CollaborativeAgent` class implements the core unit of collaboration: a role-specialized participant that can both **contribute a proposal** and **critique the proposals of others**.\n", "\n", "Two pathways:\n", "1. `propose_solution()` — Generates a Proposal informed by expertise and shared context\n", "2. `evaluate_proposal()` — Produces an Evaluation with 4-dimension scoring (correctness, completeness, feasibility, risks)\n", "\n", "All LLM calls are wrapped with `@graceful_fallback` for Simulation Mode resilience." ] }, { "cell_type": "code", "execution_count": null, "id": "06dd8db4", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Cell Group 9: CollaborativeAgent\n", "# Ref: Ch.15, pp. 442–444 — Dual-pathway collaborative agent\n", "# Author: Imran Ahmad\n", "# ===========================================================================\n", "\n", "\n", "class CollaborativeAgent:\n", "    \"\"\"Role-specialized participant in a multi-agent consensus system.\n", "\n", "    Each agent has a declared role, expertise profile, and LLM client.\n", "    It can propose solutions and evaluate other agents' proposals.\n", "\n", "    Ref: Ch.15, pp. 
442–444\n", "    \"\"\"\n", "\n", "    def __init__(self, agent_id: str = None, role: str = \"\",\n", "                 expertise: list[str] = None, llm_client=None,\n", "                 tools: list[Tool] = None) -> None:\n", "        self.id = agent_id or str(uuid.uuid4())[:8]\n", "        self.role = role\n", "        self._original_role = role\n", "        self.expertise = expertise or []\n", "        self.llm = llm_client\n", "        self.tools = tools or []\n", "        self.memory = AgentMemory()\n", "        self._logger = ColorLogger(f\"Agent-{self.id}\")\n", "        self._logger.info(\n", "            f\"Initialized: role='{self.role}', \"\n", "            f\"expertise={self.expertise}\"\n", "        )\n", "\n", "    def set_role(self, role: str) -> None:\n", "        \"\"\"Temporarily override role (e.g., adversarial critic).\"\"\"\n", "        self._logger.info(f\"Role override: '{self.role}' → '{role}'\")\n", "        self.role = role\n", "\n", "    def reset_role(self) -> None:\n", "        \"\"\"Restore original role after adversarial rotation.\"\"\"\n", "        self.role = self._original_role\n", "        self._logger.info(f\"Role restored: '{self.role}'\")\n", "\n", "    @graceful_fallback(\n", "        fallback_value=Proposal(content=\"[Proposal unavailable]\"),\n", "        component=\"CollaborativeAgent\"\n", "    )\n", "    def propose_solution(self, problem: Problem,\n", "                         context: SharedContext) -> Proposal:\n", "        \"\"\"Generate a solution proposal informed by expertise\n", "        and shared context.\n", "\n", "        Pulls relevant_history from shared context using expertise tags\n", "        to avoid re-litigating the entire conversation.\n", "\n", "        Ref: Ch.15, pp. 
442–444\n", "        \"\"\"\n", "        relevant_history = context.get_relevant(self.expertise)\n", "\n", "        prompt = (\n", "            f\"You are a {self.role} with expertise \"\n", "            f\"in {', '.join(self.expertise)}.\\n\"\n", "            f\"Problem: {problem.description}\\n\"\n", "            f\"Previous proposals from other agents:\\n\"\n", "            f\"{relevant_history}\\n\"\n", "            f\"Based on your expertise, propose a solution.\\n\"\n", "            f\"Explain your reasoning and identify any \"\n", "            f\"assumptions or uncertainties.\"\n", "        )\n", "\n", "        response = self.llm.generate(prompt)\n", "\n", "        proposal = Proposal(\n", "            id=f\"prop_{self.id}_{uuid.uuid4().hex[:6]}\",\n", "            agent_id=self.id,\n", "            content=response,\n", "            confidence=self._assess_confidence(problem, response),\n", "            expertise_relevance=self._compute_relevance(problem),\n", "        )\n", "\n", "        context.add_proposal(proposal)\n", "        self.memory.add(f\"Proposed: {response[:100]}...\")\n", "        self._logger.success(\n", "            f\"Proposal generated: {proposal.id} \"\n", "            f\"(confidence={proposal.confidence:.2f})\"\n", "        )\n", "        return proposal\n", "\n", "    @graceful_fallback(\n", "        fallback_value=Evaluation(scores={\"overall\": 5.0}),\n", "        component=\"CollaborativeAgent\"\n", "    )\n", "    def evaluate_proposal(self, proposal: Proposal,\n", "                          problem: Problem) -> Evaluation:\n", "        \"\"\"Evaluate another agent's proposal along four dimensions.\n", "\n", "        Dimensions: correctness, completeness, feasibility, risks/gaps.\n", "        Each scored 0–10. Produces a shared scoring schema for\n", "        consistent cross-agent comparison.\n", "\n", "        Ref: Ch.15, pp. 443–444\n", "        \"\"\"\n", "        # Use adversarial prompt if in critic role\n", "        role_prompt = self.role\n", "        if \"adversarial\" in self.role.lower() or \"critic\" in self.role.lower():\n", "            role_prompt = (\n", "                f\"adversarial critic. Identify weaknesses \"\n", "                f\"regardless of apparent consensus\"\n", "            )\n", "\n", "        prompt = (\n", "            f\"You are a {role_prompt}. 
Evaluate \"\n", " f\"the following proposal for solving:\\n\"\n", " f\"{problem.description}\\n\\n\"\n", " f\"Proposal by {proposal.agent_id}:\\n\"\n", " f\"{proposal.content}\\n\\n\"\n", " f\"Assess: correctness, completeness, feasibility, \"\n", " f\"and any risks or gaps from your perspective.\\n\"\n", " f\"Score each dimension from 0 to 10.\\n\"\n", " \"Report each score as '<label>: <score>/10', using the \"\n", " \"labels correctness, completeness, feasibility, and risks.\"\n", " )\n", "\n", " response = self.llm.generate(prompt)\n", " scores = self._parse_scores(response)\n", "\n", " evaluation = Evaluation(\n", " evaluator_id=self.id,\n", " proposal_id=proposal.id,\n", " scores=scores,\n", " critique=response,\n", " )\n", "\n", " self._logger.info(\n", " f\"Evaluated {proposal.id}: overall={scores.get('overall', 'N/A')}\"\n", " )\n", " return evaluation\n", "\n", " def _assess_confidence(self, problem: Problem,\n", " response: str) -> float:\n", " \"\"\"Estimate confidence in the proposal (heuristic).\"\"\"\n", " # Heuristic: longer, more detailed responses = higher confidence\n", " base = 0.5\n", " length_bonus = min(len(response) / 2000, 0.3)\n", " return min(base + length_bonus, 0.95)\n", "\n", " def _compute_relevance(self, problem: Problem) -> float:\n", " \"\"\"Compute expertise relevance to the problem.\"\"\"\n", " desc = problem.description.lower()\n", " matches = sum(1 for e in self.expertise if e.lower() in desc)\n", " return min(matches / max(len(self.expertise), 1), 1.0)\n", "\n", " def _parse_scores(self, response: str) -> dict:\n", " \"\"\"Extract numerical scores from evaluation response.\"\"\"\n", " import re\n", " scores = {}\n", " # Only '<label>: <score>/10' is recognized; other formats are skipped\n", " for dim in [\"correctness\", \"completeness\", \"feasibility\",\n", " \"risks\", \"overall\"]:\n", " pattern = rf\"{dim}[:\\s]*([0-9]+(?:\\.[0-9]+)?)/10\"\n", " match = re.search(pattern, response.lower())\n", " if match:\n", " scores[dim] = float(match.group(1))\n", " if \"overall\" not in scores and scores:\n", " scores[\"overall\"] = sum(scores.values()) / len(scores)\n", " elif not scores:\n", " scores[\"overall\"] = 5.0 # neutral default\n", 
" return scores\n" ] }, { "cell_type": "code", "execution_count": null, "id": "6acf40ce", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Test: CollaborativeAgent propose + evaluate\n", "# Ref: Strategy §11, D15\n", "# ===========================================================================\n", "\n", "ca_test = ColorLogger(\"AgentTest\")\n", "\n", "test_agent = CollaborativeAgent(\n", " agent_id=\"test_ped\",\n", " role=\"Pedagogy Specialist\",\n", " expertise=[\"scaffolding\", \"cognitive load\", \"formative feedback\"],\n", " llm_client=llm_client,\n", ")\n", "\n", "test_problem = Problem(\n", " description=\"Design a grading rubric for a merge-sorted-lists assignment.\"\n", ")\n", "test_context = SharedContext()\n", "\n", "# Test propose_solution\n", "proposal = test_agent.propose_solution(test_problem, test_context)\n", "assert isinstance(proposal, Proposal), \"Should return Proposal\"\n", "assert proposal.confidence > 0, \"Confidence should be positive\"\n", "assert len(proposal.content) > 0, \"Content should not be empty\"\n", "ca_test.success(f\"propose_solution() returned Proposal with confidence={proposal.confidence:.2f}\")\n", "\n", "# Test evaluate_proposal\n", "evaluation = test_agent.evaluate_proposal(proposal, test_problem)\n", "assert isinstance(evaluation, Evaluation), \"Should return Evaluation\"\n", "assert \"overall\" in evaluation.scores, \"Should have overall score\"\n", "ca_test.success(f\"evaluate_proposal() returned scores: {evaluation.scores}\")\n", "\n", "# Test role override (adversarial critic)\n", "test_agent.set_role(\"adversarial_critic\")\n", "assert test_agent.role == \"adversarial_critic\"\n", "test_agent.reset_role()\n", "assert test_agent.role == \"Pedagogy Specialist\"\n", "ca_test.success(\"Role override and reset verified\")\n" ] }, { "cell_type": "markdown", "id": "8157ea51", "metadata": {}, "source": [ "### Cell Group 10: ConsensusEngine\n", 
"\n", "**Ref:** Ch.15, pp. 445–450\n", "\n", "### Figure 15.3 — Consensus and Voting Mechanism (p. 446)\n", "\n", "```\n", " ┌─────────────────────────────────────────────────┐\n", " │ PHASE 1: INDEPENDENT PROPOSALS │\n", " │ [Agent 1] [Agent 2] ... [Agent N] │\n", " │ Solution X₁ Solution X₂ Solution Xₙ │\n", " └───────────────────────┬─────────────────────────┘\n", " ▼\n", " ┌─────────────────────────────────────────────────┐\n", " │ PHASE 2: CROSS-EVALUATION │\n", " │ ┌──────────────┐ │\n", " │ │ ADVERSARIAL │──► PEER SCORING & CRITIQUE │\n", " │ │ CRITIC │ (4 dimensions per eval) │\n", " │ └──────────────┘ │\n", " └───────────────────────┬─────────────────────────┘\n", " ▼\n", " ┌─────────────────────────────────────────────────┐\n", " │ PHASE 3: WEIGHTED SYNTHESIS │\n", " │ Score(pⱼ) = Σᵢ [wᵢ · relevanceᵢ · scoreᵢⱼ] │\n", " └───────────────────────┬─────────────────────────┘\n", " ▼\n", " ┌──────────────────────┐\n", " │ COLLECTIVE SOLUTION │\n", " └──────────────────────┘\n", " + AUDIT TRAIL\n", " (Rejected Elements + Reasoning Logs)\n", "```\n", "\n", "The `ConsensusEngine` orchestrates the multi-round consensus protocol:\n", "\n", "1. **Proposal generation** — Each agent produces an independent proposal\n", "2. **Cross-evaluation** — Every agent evaluates every other agent's proposal; one agent is rotated into an **adversarial critic** role per round\n", "3. **Weighted scoring** — Expertise-weighted consensus score:\n", "\n", "$$\\text{Score}(p_j) = \\sum_i \\left[ w_i \\cdot \\text{relevance}_i \\cdot \\text{score}_{ij} \\right]$$\n", "\n", "4. **Convergence detection** — If scores stabilize within tolerance, synthesize; otherwise refine and repeat\n", "\n", "> **Info Box — Social Choice Properties (p. 445):** This aggregation rule satisfies Pareto efficiency and dictator-freeness. 
Arrow's impossibility theorem is partially sidestepped because the system uses cardinal evaluations (numerical scores) rather than ordinal rankings.\n", "\n", "> **Info Box — Convergence Speed (p. 445):** Under balanced communication, agents' proposals converge to consensus at an exponential rate. Preventing any single agent from dominating not only reduces bias but also accelerates convergence.\n", "\n", "**Quality Checklist F8:** Consensus must converge in ≤ 3 rounds.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "7da75136", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Cell Group 10: ConsensusEngine\n", "# Ref: Ch.15, pp. 447–450 — Multi-round consensus protocol\n", "# Author: Imran Ahmad\n", "# ===========================================================================\n", "\n", "\n", "class ConsensusEngine:\n", " \"\"\"Orchestrates multi-agent consensus via a bounded multi-round protocol.\n", "\n", " Implements proposal → cross-evaluation → weighted synthesis with\n", " adversarial critic rotation and convergence detection.\n", "\n", " Ref: Ch.15, pp. 447–450\n", " \"\"\"\n", "\n", " def __init__(self, agents: list[CollaborativeAgent]) -> None:\n", " self.agents = agents\n", " self.expertise_weights: dict[str, float] = {\n", " a.id: 1.0 for a in agents\n", " }\n", " self._logger = ColorLogger(\"ConsensusEngine\")\n", "\n", " def run_consensus(self, problem: Problem,\n", " max_rounds: int = 3) -> ConsensusResult:\n", " \"\"\"Execute multi-round consensus protocol with convergence\n", " detection and early termination.\n", "\n", " Phase 1: Bootstrap — each agent produces an initial proposal.\n", " Phase 2: Critique — adversarial rotation, cross-evaluation.\n", " Phase 3: Convergence — synthesize or refine.\n", "\n", " Ref: Ch.15, pp. 
447–450\n", " \"\"\"\n", " self._logger.info(\n", " f\"Starting consensus: {len(self.agents)} agents, \"\n", " f\"max_rounds={max_rounds}\"\n", " )\n", "\n", " context = SharedContext()\n", " proposals = []\n", " audit_trail = []\n", "\n", " # Phase 1: Initial proposals\n", " self._logger.info(\"Phase 1: Generating initial proposals...\")\n", " for agent in self.agents:\n", " proposal = agent.propose_solution(problem, context)\n", " proposals.append(proposal)\n", " audit_trail.append(\n", " f\"Round 0: {agent.role} proposed {proposal.id}\"\n", " )\n", "\n", " # Phase 2: Critique-and-refine loop\n", " best_result = None\n", " scores = {}\n", " evaluations = []\n", "\n", " for round_num in range(max_rounds):\n", " self._logger.info(f\"Round {round_num + 1}: Cross-evaluation...\")\n", "\n", " # Adversarial critic rotation (pp. 446–447)\n", " critic_idx = round_num % len(self.agents)\n", " self.agents[critic_idx].set_role(\"adversarial_critic\")\n", " audit_trail.append(\n", " f\"Round {round_num + 1}: \"\n", " f\"{self.agents[critic_idx]._original_role} → adversarial critic\"\n", " )\n", "\n", " # Cross-evaluation: every agent evaluates every other's proposal\n", " evaluations = []\n", " for agent in self.agents:\n", " for proposal in proposals:\n", " if proposal.agent_id != agent.id:\n", " evaluation = agent.evaluate_proposal(\n", " proposal, problem\n", " )\n", " evaluations.append(evaluation)\n", "\n", " # Compute expertise-weighted consensus scores\n", " scores = self._compute_consensus_scores(\n", " proposals, evaluations\n", " )\n", "\n", " self._logger.info(\n", " f\"Round {round_num + 1} scores: \"\n", " + \", \".join(\n", " f\"{pid}={s:.2f}\" for pid, s in scores.items()\n", " )\n", " )\n", "\n", " # Convergence check\n", " if self._has_converged(scores, tolerance=0.5):\n", " self._logger.success(\n", " f\"Converged at round {round_num + 1}!\"\n", " )\n", " best_result = self._synthesize(\n", " proposals, evaluations, scores\n", " )\n", " best_result.rounds = 
round_num + 1\n", " best_result.audit_trail = audit_trail\n", " self.agents[critic_idx].reset_role()\n", " break\n", "\n", " # Refine proposals for next round\n", " proposals = self._refine_proposals(\n", " proposals, evaluations, context\n", " )\n", " self.agents[critic_idx].reset_role()\n", "\n", " if best_result is None:\n", " self._logger.warn(\n", " \"Max rounds reached. Synthesizing final result.\"\n", " )\n", " best_result = self._synthesize(\n", " proposals, evaluations, scores\n", " )\n", " best_result.rounds = max_rounds\n", " best_result.audit_trail = audit_trail\n", "\n", " return best_result\n", "\n", " def run_session(self, context_description: str,\n", " max_rounds: int = 3) -> dict:\n", " \"\"\"Convenience wrapper matching the chapter's runnable example (p. 452).\n", "\n", " Returns a dict with 'rounds', 'consensus_score', 'final_proposal',\n", " and 'audit_trail'.\n", " \"\"\"\n", " problem = Problem(description=context_description)\n", " result = self.run_consensus(problem, max_rounds=max_rounds)\n", " return {\n", " \"rounds\": result.rounds,\n", " \"consensus_score\": result.consensus_score,\n", " \"final_proposal\": result.final_proposal,\n", " \"audit_trail\": result.audit_trail,\n", " }\n", "\n", " def _compute_consensus_scores(\n", " self, proposals: list[Proposal],\n", " evaluations: list[Evaluation]\n", " ) -> dict[str, float]:\n", " \"\"\"Compute expertise-weighted consensus scores.\n", "\n", " Score(p_j) = sum_i [w_i * score_ij] / sum_i w_i\n", "\n", " w_i is the evaluator's expertise weight and score_ij is its\n", " 'overall' score for proposal j (default 5.0 if missing). The\n", " relevance_i factor from the chapter formula is not applied here.\n", "\n", " Ref: Ch.15, p. 445 (weighted voting formula), pp. 
448–449\n", " \"\"\"\n", " scores = {}\n", " for proposal in proposals:\n", " relevant_evals = [\n", " e for e in evaluations\n", " if e.proposal_id == proposal.id\n", " ]\n", " weighted_sum = sum(\n", " self.expertise_weights.get(e.evaluator_id, 1.0)\n", " * e.scores.get(\"overall\", 5.0)\n", " for e in relevant_evals\n", " )\n", " total_weight = sum(\n", " self.expertise_weights.get(e.evaluator_id, 1.0)\n", " for e in relevant_evals\n", " )\n", " scores[proposal.id] = (\n", " weighted_sum / total_weight\n", " if total_weight > 0 else 0.0\n", " )\n", " return scores\n", "\n", " def _has_converged(self, scores: dict,\n", " tolerance: float = 0.5) -> bool:\n", " \"\"\"Check if score distribution has stabilized.\n", "\n", " Convergence: range of scores is within tolerance.\n", " \"\"\"\n", " if not scores:\n", " return False\n", " values = list(scores.values())\n", " return (max(values) - min(values)) <= tolerance\n", "\n", " def _refine_proposals(\n", " self, proposals: list[Proposal],\n", " evaluations: list[Evaluation],\n", " context: SharedContext\n", " ) -> list[Proposal]:\n", " \"\"\"Produce refined proposals informed by critiques.\n", "\n", " In mock mode, returns the original proposals (refinement\n", " requires generative LLM). In live mode, each agent would\n", " receive its critiques and regenerate.\n", " \"\"\"\n", " self._logger.info(\"Refining proposals based on critiques...\")\n", " return proposals # Mock: pass-through\n", "\n", " def _synthesize(\n", " self, proposals: list[Proposal],\n", " evaluations: list[Evaluation],\n", " scores: dict[str, float]\n", " ) -> ConsensusResult:\n", " \"\"\"Synthesize the final result from scored proposals.\n", "\n", " Selects the highest-scored proposal as the basis, then\n", " uses LLM to synthesize a combined result if available.\n", "\n", " Ref: Ch.15, pp. 
448–449\n", " \"\"\"\n", " if not scores:\n", " return ConsensusResult()\n", "\n", " # Find best proposal\n", " best_id = max(scores, key=scores.get)\n", " best_proposal = next(\n", " (p for p in proposals if p.id == best_id),\n", " proposals[0]\n", " )\n", "\n", " # Attempt LLM synthesis\n", " try:\n", " all_contents = \"\\n---\\n\".join(\n", " f\"Agent {p.agent_id}: {p.content[:300]}\"\n", " for p in proposals\n", " )\n", " synthesis_prompt = (\n", " f\"Synthesize the following proposals into a final \"\n", " f\"consensus result. Combine the strongest elements.\\n\"\n", " f\"{all_contents}\"\n", " )\n", " synth_llm = self.agents[0].llm\n", " if synth_llm:\n", " synthesis = synth_llm.generate(synthesis_prompt)\n", " else:\n", " synthesis = best_proposal.content\n", " except Exception:\n", " synthesis = best_proposal.content\n", "\n", " avg_score = (\n", " sum(scores.values()) / len(scores) if scores else 0.0\n", " )\n", "\n", " self._logger.success(\n", " f\"Synthesis complete. Consensus score: {avg_score:.2f}\"\n", " )\n", "\n", " return ConsensusResult(\n", " final_proposal=synthesis,\n", " consensus_score=avg_score,\n", " )\n" ] }, { "cell_type": "code", "execution_count": null, "id": "3e3dcb50", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Test: ConsensusEngine convergence\n", "# Ref: Strategy §11, D16; Quality Checklist F8 — converges in ≤ 3 rounds\n", "# ===========================================================================\n", "\n", "ce_test = ColorLogger(\"ConsensusTest\")\n", "\n", "# Create three test agents\n", "test_agents = [\n", " CollaborativeAgent(\n", " agent_id=\"agt_ped\", role=\"Pedagogy Specialist\",\n", " expertise=[\"scaffolding\", \"cognitive load\", \"formative feedback\"],\n", " llm_client=llm_client,\n", " ),\n", " CollaborativeAgent(\n", " agent_id=\"agt_dom\", role=\"Domain Expert\",\n", " expertise=[\"algorithm correctness\", \"edge cases\", 
\"code style\"],\n", " llm_client=llm_client,\n", " ),\n", " CollaborativeAgent(\n", " agent_id=\"agt_asmt\", role=\"Assessment Specialist\",\n", " expertise=[\"rubric validity\", \"inter-rater reliability\", \"grade distribution\"],\n", " llm_client=llm_client,\n", " ),\n", "]\n", "\n", "engine = ConsensusEngine(test_agents)\n", "test_problem = Problem(\n", " description=\"Design a grading rubric for merging two sorted lists.\"\n", ")\n", "\n", "result = engine.run_consensus(test_problem, max_rounds=3)\n", "\n", "ce_test.success(f\"Rounds: {result.rounds}\")\n", "ce_test.success(f\"Consensus score: {result.consensus_score:.2f}\")\n", "ce_test.info(f\"Audit trail: {len(result.audit_trail)} entries\")\n", "for entry in result.audit_trail:\n", " ce_test.info(f\" {entry}\")\n", "\n", "# F8: Must converge in ≤ 3 rounds\n", "assert result.rounds <= 3, f\"F8 FAILED: took {result.rounds} rounds\"\n", "ce_test.success(f\"F8: Converged in {result.rounds} round(s) (≤ 3) — PASS\")\n", "\n", "assert len(result.final_proposal) > 0, \"Final proposal should not be empty\"\n", "ce_test.success(\"Final proposal is non-empty — PASS\")\n" ] }, { "cell_type": "markdown", "id": "b03477a7", "metadata": {}, "source": [ "### Cell Group 11: Rubric Design Case Study + Emergent Intelligence\n", "\n", "**Ref:** Ch.15, pp. 450–452 (rubric case study); pp. 
452–453 (emergent intelligence)\n", "\n", "Three agents collaboratively design a grading rubric for a merge-sorted-lists assignment:\n", "- **Pedagogy Agent** — Process-oriented, 40% strategy / 30% correctness / 30% readability\n", "- **Domain Expert** — Technical rigor, 50% correctness / 30% efficiency / 20% style\n", "- **Assessment Agent** — Binary reliability, 5 pass/fail criteria\n", "\n", "The consensus protocol produces a **hybrid rubric** combining the Assessment Agent's binary criteria with the Pedagogy Agent's partial-credit scale and the Domain Expert's edge-case coverage — a result none produced independently.\n", "\n", "> **Info Box — Emergent Intelligence Mechanisms (pp. 452–453):**\n", "> - **Cross-pollination prompting:** Exposes each agent to the strongest elements of competing proposals during refinement, triggering novel combinations. The pedagogy agent adopted the assessment agent's binary-criteria structure while retaining its own partial-credit scoring — a combination neither proposed independently.\n", "> - **TRIZ-inspired constraint relaxation:** When consensus stalls, the facilitator systematically loosens problem constraints, forcing agents to question unexamined assumptions.\n", "> - **Analogical transfer:** Agents with expertise outside the immediate domain introduce concepts from their field. An agent versed in clinical diagnostic reasoning might reframe misconception detection as differential diagnosis.\n", "\n", "> **Synergistic Information (p. 452):** The most compelling aspect of collective intelligence is not aggregation but emergence: solutions that arise from agent interaction and that no individual could have produced alone.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "01066210", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Cell Group 11: Rubric Design Case Study — Runnable CI Example\n", "# Ref: Ch.15, pp. 
450–452; Strategy §11, D17\n", "# Author: Imran Ahmad\n", "# ===========================================================================\n", "\n", "rubric_logger = ColorLogger(\"RubricCaseStudy\")\n", "rubric_logger.info(\"=\"*60)\n", "rubric_logger.info(\"CASE STUDY: Multi-Agent Rubric Design\")\n", "rubric_logger.info(\"=\"*60)\n", "\n", "# 1. Instantiate role-specialized agents (p. 451)\n", "pedagogy = CollaborativeAgent(\n", " agent_id=\"pedagogy_01\",\n", " role=\"Pedagogy Specialist\",\n", " expertise=[\"scaffolding\", \"cognitive load\", \"formative feedback\"],\n", " llm_client=llm_client,\n", ")\n", "\n", "domain = CollaborativeAgent(\n", " agent_id=\"domain_01\",\n", " role=\"Domain Expert\",\n", " expertise=[\"algorithm correctness\", \"edge cases\", \"code style\"],\n", " llm_client=llm_client,\n", ")\n", "\n", "assessment = CollaborativeAgent(\n", " agent_id=\"assessment_01\",\n", " role=\"Assessment Specialist\",\n", " expertise=[\"rubric validity\", \"inter-rater reliability\",\n", " \"grade distribution\"],\n", " llm_client=llm_client,\n", ")\n", "\n", "# 2. Build the consensus engine (p. 451)\n", "engine = ConsensusEngine(\n", " agents=[pedagogy, domain, assessment]\n", ")\n", "\n", "# 3. Define the shared problem context (pp. 450–451)\n", "context_desc = (\n", " \"Design a grading rubric for an introductory \"\n", " \"programming assignment: implement a function \"\n", " \"that merges two sorted lists.\"\n", ")\n", "\n", "# 4. Run the consensus session (p. 452)\n", "rubric_logger.info(\"Running consensus session...\")\n", "result = engine.run_session(context_desc)\n", "\n", "# 5. 
Inspect the output\n", "rubric_logger.info(\"\")\n", "rubric_logger.success(f\"Rounds: {result['rounds']}\")\n", "rubric_logger.success(f\"Consensus score: {result['consensus_score']:.2f}\")\n", "rubric_logger.info(\"\")\n", "rubric_logger.info(\"Final synthesized rubric:\")\n", "# Display first 500 chars of the proposal\n", "for line in result['final_proposal'][:500].split('\\n'):\n", " rubric_logger.info(f\" {line}\")\n", "if len(result['final_proposal']) > 500:\n", " rubric_logger.info(f\" ... ({len(result['final_proposal'])} chars total)\")\n", "\n", "rubric_logger.info(\"\")\n", "rubric_logger.info(\"Audit trail:\")\n", "for entry in result['audit_trail']:\n", " rubric_logger.info(f\" {entry}\")\n", "\n", "# Assertions\n", "assert result['rounds'] <= 3, \"Should converge in ≤ 3 rounds\"\n", "assert len(result['final_proposal']) > 0, \"Should produce a rubric\"\n", "rubric_logger.success(\"Rubric case study complete — hybrid rubric produced\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "f4b81b7e", "metadata": {}, "outputs": [], "source": [ "# ===========================================================================\n", "# Cell Group 11b: Emergent Intelligence Demonstration\n", "# Ref: Ch.15, pp. 452–453 — Cross-pollination prompting\n", "# Author: Imran Ahmad\n", "# ===========================================================================\n", "\n", "emerge_logger = ColorLogger(\"EmergentIntelligence\")\n", "emerge_logger.info(\"=\"*60)\n", "emerge_logger.info(\"EMERGENT INTELLIGENCE: Cross-Pollination Demo\")\n", "emerge_logger.info(\"=\"*60)\n", "\n", "# Cross-pollination: expose each agent to strongest elements\n", "# of competing proposals (pp. 452–453)\n", "cross_pollination_prompt = (\n", " \"You have access to the strongest elements from competing proposals. \"\n", " \"The Domain Expert emphasized edge-case testing. The Pedagogy Agent \"\n", " \"emphasized process-oriented assessment. 
Combine these into new criteria \"\n", " \"that neither agent proposed independently.\"\n", ")\n", "\n", "emerge_logger.info(\"Invoking cross-pollination with combined expertise...\")\n", "cross_response = llm_client.generate(cross_pollination_prompt)\n", "\n", "emerge_logger.info(\"\")\n", "emerge_logger.info(\"Cross-pollination result:\")\n", "for line in cross_response[:400].split('\\n'):\n", " emerge_logger.info(f\" {line}\")\n", "\n", "emerge_logger.info(\"\")\n", "emerge_logger.info(\"Emergent mechanisms (pp. 452–453); only the first is exercised above:\")\n", "emerge_logger.info(\" 1. Cross-pollination prompting — novel criterion combining\")\n", "emerge_logger.info(\" Domain Expert edge-case coverage with Pedagogy process emphasis\")\n", "emerge_logger.info(\" 2. TRIZ-inspired constraint relaxation — when consensus stalls,\")\n", "emerge_logger.info(\" Facilitator loosens constraints to explore alternatives\")\n", "emerge_logger.info(\" 3. Analogical transfer — agents from outside the domain\")\n", "emerge_logger.info(\" introduce concepts (e.g., clinical diagnostic reasoning)\")\n", "emerge_logger.success(\"Emergent intelligence demonstration complete\")\n" ] }, { "cell_type": "markdown", "id": "74387c76", "metadata": {}, "source": [ "---\n", "## Summary & Further Reading\n", "\n", "**Ref:** Ch.15, pp. 453–454\n", "\n", "This notebook implemented two complementary agent architectures from Chapter 15:\n", "\n", "### Education Intelligence Agent (pp. 422–441)\n", "Combined knowledge graph curricula, Bayesian Knowledge Tracing, and adaptive techniques grounded in cognitive science to create a personalized learning system. The Alex case study demonstrated all components in production: IRT-based placement, ZPD-aligned curriculum planning, BKT mastery tracking, SM-2 spaced repetition, and a two-stage misconception-detecting feedback generator. 
In production deployment, the programming tutor achieved a **34% improvement in assessment scores** and pushed course completion from 71% to 89% (p. 453).\n", "\n", "### Collective Intelligence Agent (pp. 441–453)\n", "Extended multi-agent collaboration into consensus-driven problem solving. Weighted voting with expertise calibration, adversarial critics for groupthink prevention, and cross-pollination for emergent insight produced synthesized designs that outperformed single-agent outputs. The rubric design case study showed three agents converging on a hybrid evaluation instrument that none produced independently.\n", "\n", "### Key Principle\n", "> *The most powerful agent systems combine deep domain specialization with systematic feedback loops and collaborative reasoning. Whether teaching a student or solving a design problem, the pattern holds: **perceive** the current state, **reason** about the best next action, **act**, observe the outcome, and **learn**.*\n", "\n", "> *Education and knowledge agents apply this pattern in domains where success is measured not by the agent's own performance but by the growth of the humans it serves.* — Ch.15, pp. 453–454\n", "\n", "### Further Reading\n", "- **Chapter 1:** Cognitive loop foundations\n", "- **Chapter 12:** Ethical and explainable agents (audit trail requirements)\n", "- **Chapter 13:** Stochastic decision-making (POMDP extensions)\n", "- **Chapter 16:** Embodied agents that bridge digital and physical intelligence\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.10.0" } }, "nbformat": 4, "nbformat_minor": 5 }