{ "cells": [ { "cell_type": "markdown", "id": "a81bcae4", "metadata": {}, "source": [ "# Chapter 9: Software Development Agents\n", "\n", "**Book:** *Agents* by Imran Ahmad (Packt, 2026)\n", "**Author:** Imran Ahmad\n", "**Chapter pages:** 235–279\n", "\n", "> *\"The art of programming is the art of organizing and mastering complexity.\"* — Edsger Dijkstra\n", "\n", "---\n", "\n", "## Introduction\n", "\n", "This notebook is the companion code for **Chapter 9** of *Agents* (Packt, 2026). The chapter explores how AI agents are reshaping the way software is built, tested, and improved. Software development has always been about managing complexity through abstraction — from assembly to high-level languages, from manual builds to automated pipelines, from ad-hoc testing to continuous integration. AI agents represent the next step in this progression, bringing **reasoning and adaptation** to the engineering process rather than operating on fixed rules and predefined workflows.\n", "\n", "Unlike a static analyzer that flags violations after code is written, an agent can generate compliant code from the start. Unlike a CI pipeline that executes a predetermined sequence of checks, an agent can plan a verification strategy, interpret failures, and refine its approach based on feedback. 
Tools like **Cursor** and **Devin** demonstrate this capability by autonomously navigating codebases, planning multi-step changes, and executing them across files.\n", "\n", "The chapter builds directly on the foundations laid in **Chapter 5** (cognitive architectures) and **Chapter 7** (tool-use frameworks), applying them to software engineering itself.\n", "\n", "### Three Agent Classes\n", "\n", "| Agent Class | Feedback Loop | Chapter Section | Book Pages |\n", "|:---|:---|:---:|:---:|\n", "| **Code-Generation** | generate → test → refine | §9.2 | 239–255 |\n", "| **Compliance-Driven** | scan → evaluate → remediate | §9.3 | 255–265 |\n", "| **Self-Improving** | execute → observe → learn → adapt | §9.4 | 265–278 |\n", "\n", "### Ecosystem & Tooling Layers (Figure 9.1, p. 237)\n", "\n", "The agent ecosystem is organized into four functional layers:\n", "1. **Orchestration Frameworks** — LangGraph, LangChain, Workflow Engines: define stateful workflows, refinement loops, memory flow, and tool coordination\n", "2. **Reasoning Cores** — LLMs with retrieval (RAG), tool definitions, and task decomposition capabilities\n", "3. **Quality & Security Gates** — Static analysis, type checking, policy-as-code, compliance rules embedded in the execution loop\n", "4. **Observability Platforms** — Tracing, metrics, dashboards, and feedback loops for reliability and improvement\n", "\n", "### The Shift Toward Autonomy (pp. 237–239)\n", "\n", "Key emerging trends include **Test-Driven Generation (TDG)** as the default mode, **repository-grounded reasoning** to combat hallucinations, **multi-agent specialization** (planner/coder/critic roles), **policy-aware coding** that shifts compliance left, and **hybrid deployment models** balancing hosted and on-premise LLMs.\n", "\n", "### Adoption Maturity Curve (Figure 9.2, p. 
239)\n", "\n", "Teams typically progress through four stages: **Drafting** (low-risk code generation) → **Review + Tests** (test synthesis and quality gates in CI) → **Conditional Autonomy** (agents generate PRs, humans approve merges) → **Feedback-Driven** (self-improving agents that learn from operational data).\n", "\n", "> **Note — Conditional Autonomy** (p. 238): The agent can independently execute the full development cycle (code generation, testing, refinement, and pull request creation) but requires explicit human approval before changes are merged into production branches. *The agent proposes; the developer disposes.*\n", "\n", "---\n", "\n", "All cells execute in **Simulation Mode** without an API key. Set `OPENAI_API_KEY` in a `.env` file to switch to Live Mode.\n" ] }, { "cell_type": "markdown", "id": "fbef6190", "metadata": {}, "source": [ "### Figure 9.1 — Agent Ecosystem: Four Functional Layers (p. 237)\n", "\n", "```\n", "┌─────────────────────────────────────────────────────────────────────┐\n", "│  1. Orchestration Frameworks                                        │\n", "│     LangGraph │ LangChain │ Workflow Engines                        │\n", "│     Define stateful workflows, refinement loops, tool coordination  │\n", "├─────────────────────────────────────────────────────────────────────┤\n", "│                                  ▼                                  │\n", "│  2. Reasoning Cores                                                 │\n", "│     LLMs │ RAG │ Tool Definitions │ Task Decomposition              │\n", "│     LLMs with retrieval, planning, and task decomposition           │\n", "├─────────────────────────────────────────────────────────────────────┤\n", "│                                  ▼                                  │\n", "│  3. Quality & Security Gates                                        │\n", "│     Static Analysis │ Type Checking │ Policy as Code │ Compliance   │\n", "│     Embed validation and compliance checks into the execution loop  │\n", "├─────────────────────────────────────────────────────────────────────┤\n", "│                                  ▼                                  │\n", "│  4. Observability Platforms                                         │\n", "│     Tracing │ Metrics │ Dashboards │ Feedback Loops                 │\n", "│     Provide tracing, metrics, and feedback for reliability          │\n", "└─────────────────────────────────────────────────────────────────────┘\n", "```\n" ] }, { "cell_type": "code", "execution_count": null, "id": "70f5e198", "metadata": {}, "outputs": [], "source": [ "# Cell 2: Install dependencies (run once)\n", "# Ref: requirements.txt — pinned versions for reproducibility\n", "!pip install -r requirements.txt -q" ] }, { "cell_type": "code", "execution_count": null, "id": "794715e7", "metadata": {}, "outputs": [], "source": [ "# Cell 3: Imports and API key resolution\n", "# Ref: src/utils.py — zero-hardcode key chain, ColorLog, @fail_gracefully\n", "\n", "import os\n", "import sys\n", "import json\n", "from datetime import datetime, timezone\n", "\n", "# Ensure chapter09 package is importable from repo root\n", "_repo_root = os.path.dirname(os.path.abspath(''))\n", "if _repo_root not in sys.path:\n", "    sys.path.insert(0, _repo_root)\n", "\n", "# --- Core Framework ---\n", "from chapter09.utils import ColorLog, fail_gracefully, get_api_key, is_simulation_mode\n", "\n", "# --- State Models (§9.2, §9.4) ---\n", "from chapter09.state_models import (\n", "    Task, AgentState, AdaptationType,\n", "    ImprovementHypothesis, PlannerOutput, FeedbackRecord,\n", ")\n", "\n", "# --- Mock Layer (Simulation Mode) ---\n", "from chapter09.mock_llm import MockLLM, MockTestRunner\n", "\n", "# --- Agent Nodes (§9.2) ---\n", "from chapter09.agent_nodes import build_workflow, extract_code_from_response\n", "\n", "# --- Compliance Engine (§9.3) ---\n", "from chapter09.compliance_engine import (\n", "    PolicyEngine, ComplianceScanner, RemediationGenerator,\n", "    AuditTrail, DataFlowAnalyzer,\n", ")\n", "\n", "# --- Self-Improving Agents (§9.4) ---\n", "from chapter09.self_improving import (\n", "    SensingLayer, CriticAgent, PlannerAgent,\n", "    LearningLayer, HITLCheckpoint, run_self_improvement_loop,\n", ")\n", 
"\n", "# --- API Key Resolution ---\n", "api_key = get_api_key()\n", "\n", "if is_simulation_mode():\n", "    llm = MockLLM()\n", "else:\n", "    from langchain_openai import ChatOpenAI\n", "    llm = ChatOpenAI(\n", "        model=\"gpt-4o\",\n", "        api_key=api_key,\n", "        temperature=0.2,\n", "        request_timeout=30,\n", "    )\n", "    ColorLog.success(\"Live Mode: ChatOpenAI initialized.\")" ] }, { "cell_type": "code", "execution_count": null, "id": "bb07b405", "metadata": {}, "outputs": [], "source": [ "# Cell 4: Environment verification\n", "# Confirms all modules loaded and displays active mode.\n", "\n", "ColorLog.header(\"ENVIRONMENT VERIFICATION\")\n", "\n", "from importlib.metadata import version as package_version\n", "\n", "import pydantic\n", "ColorLog.info(f\"Python: {sys.version.split()[0]}\")\n", "ColorLog.info(f\"Pydantic: {pydantic.__version__}\")\n", "\n", "try:\n", "    import langgraph  # noqa: F401 (import check only)\n", "    # langgraph may not expose __version__, so read the installed version from package metadata.\n", "    ColorLog.info(f\"LangGraph: {package_version('langgraph')}\")\n", "except Exception:\n", "    ColorLog.warning(\"LangGraph not available — workflow demos will use direct node calls.\")\n", "\n", "ColorLog.info(f\"Mode: {'Simulation (MockLLM)' if is_simulation_mode() else 'Live (ChatOpenAI)'}\")\n", "ColorLog.info(f\"LLM type: {type(llm).__name__}\")\n", "ColorLog.success(\"All modules loaded. Ready to proceed.\")" ] }, { "cell_type": "markdown", "id": "09ecbf2a", "metadata": {}, "source": [ "---\n", "\n", "## §9.2 — Code-Generation Agents (pp. 239–255)\n", "\n", "Code-Generation agents transform natural language specifications into working, tested implementations using **Test-Driven Generation (TDG)** — a methodology that adapts traditional test-driven development (TDD) for autonomous systems (p. 239).\n", "\n", "### Why TDG Matters (p. 239)\n", "\n", "Standard LLM code generation is a one-shot process: prompt → generate → accept/reject. This is inherently unreliable because the model may generate syntactically valid code that fails at runtime, produce correct logic that violates project conventions, or hallucinate APIs that don't exist. 
TDG eliminates this uncertainty by establishing an **executable specification** (the test suite) before any code is written. The agent's output is not considered complete until it demonstrably satisfies every assertion.\n", "\n", "### The Three-Phase TDG Workflow (Figure 9.3, p. 240)\n", "\n", "1. **Red** (Tester Agent) — Analyze requirements, identify edge cases, generate a comprehensive test suite. Tests *must* fail initially, proving they are active and the feature doesn't yet exist.\n", "2. **Green** (Developer Agent) — Write minimal code to make the test suite pass. Test output returns as structured feedback, forcing iteration until all assertions succeed.\n", "3. **Refactor** — Optimize code quality: eliminate duplication, extract reusable functions, apply style conventions. The test suite runs after every change to ensure behavior remains consistent.\n", "\n", "### Multi-Agent Orchestration (Figure 9.4, pp. 241–243)\n", "\n", "The implementation uses **LangGraph** for explicit state graphs with built-in conditional branching, persistent memory, and HITL checkpoints. An **orchestrator agent** manages the workflow, decomposing requirements into tasks and routing them to specialized workers (tester, developer, refactoring agents).\n", "\n", "> **Note — Framework Choice** (p. 242): CrewAI emphasizes role-based agent definitions with simplified configuration, making it well-suited for rapid prototyping. AutoGen provides a conversation-based framework with strong HITL support. LangGraph was chosen because its explicit state graph model makes control flow visible and debuggable.\n", "\n", "This section demonstrates the full TDG cycle through two progressively complex examples:\n", "- A **shipping calculator** function (single-agent, Stages 1–6)\n", "- A **full-stack user profile** feature (multi-agent, T1/T2/T3)\n" ] }, { "cell_type": "markdown", "id": "e9514c03", "metadata": {}, "source": [ "### Figure 9.3 — Three Phases of the TDD Agent Workflow (p. 
240)\n", "\n", "```\n", "┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐\n", "│ 🔴 RED PHASE │ │ 🟢 GREEN PHASE │ │ 🔵 REFACTOR │\n", "│ Tester Agent │────────▶│ Developer Agent │────────▶│ Clean Code │\n", "│ │ │ │ │ │\n", "│ Create failing │ │ Implement code to │ │ Optimize while │\n", "│ tests → spec for │ │ pass tests │ │ keeping tests │\n", "│ feature │ │ │ │ green │\n", "└─────────────────────┘ └──────────┬──────────┘ └──────────┬──────────┘\n", " ▲ │ │\n", " │ Tests fail? │\n", " │ ┌────────┴────────┐ │\n", " │ │ YES: Refine & │ │\n", " │ │ iterate back │──────────────────────┘\n", " │ └─────────────────┘\n", " │ │\n", " └──────────────────────────────────────────────────────────────────┘\n", " Feedback arrows: tests govern progress\n", "```\n" ] }, { "cell_type": "code", "execution_count": null, "id": "58982ef4", "metadata": {}, "outputs": [], "source": [ "# Cell 6: Inspect the Task and AgentState models\n", "# Ref: §9.2, \"Implementing State Management\" (pp. 247–248)\n", "\n", "ColorLog.header(\"STATE MODELS — Task & AgentState\")\n", "\n", "# Display Task model schema\n", "ColorLog.info(\"Task model fields:\")\n", "for name, field in Task.model_fields.items():\n", "    annotation = field.annotation\n", "    # Pydantic v2 marks required fields with a sentinel rather than None,\n", "    # so ask the field whether it is required instead of comparing to None.\n", "    default = \"required\" if field.is_required() else field.get_default(call_default_factory=True)\n", "    ColorLog.info(f\" {name}: {annotation} = {default}\")\n", "\n", "print()\n", "\n", "# Create a sample task matching the chapter's shipping calculator\n", "sample_task = Task(\n", "    task_id=\"shipping-calc\",\n", "    description=\"Implement calculate_shipping(cart_total, weight) with tiered discounts\",\n", "    task_type=\"backend\",\n", ")\n", "ColorLog.success(f\"Sample Task created: {sample_task.task_id} ({sample_task.task_type})\")\n", "ColorLog.info(f\" Status: {sample_task.status}, Iterations: {sample_task.iterations}\")" ] }, { "cell_type": "markdown", "id": "f44b3faa", "metadata": {}, "source": [ "> **Info Box — Pydantic** (p. 
248): Pydantic is a Python library for data validation using type annotations. It allows developers to define structured data models as classes whose fields are automatically validated at runtime, ensuring that agent state always conforms to the expected schema. The `Task` and `AgentState` models above enforce type safety on every state transition in the LangGraph workflow." ] }, { "cell_type": "code", "execution_count": null, "id": "e5b97f32", "metadata": {}, "outputs": [], "source": [ "# Cell 7: Stage 1 — Task Assignment\n", "# The orchestrator agent assigns the shipping calculator task.\n", "# Ref: §9.2, \"Stage 1: Task Assignment\" (p. 244)\n", "\n", "ColorLog.header(\"STAGE 1: TASK ASSIGNMENT\")\n", "\n", "task_prompt = (\n", " \"Assign the following task to the developer agent: \"\n", " \"Write a function calculate_shipping(cart_total, weight) \"\n", " \"that returns the shipping cost after applying appropriate discounts.\"\n", ")\n", "\n", "response = llm.invoke(task_prompt)\n", "ColorLog.info(\"Orchestrator assigned task to Developer Agent:\")\n", "print()\n", "print(response.content)" ] }, { "cell_type": "code", "execution_count": null, "id": "fa2b6921", "metadata": {}, "outputs": [], "source": [ "# Cell 8: Stage 2 — Code Synthesis (Initial Implementation)\n", "# The developer agent generates the first version of calculate_shipping.\n", "# This version is intentionally incomplete — missing negative weight validation.\n", "# Ref: §9.2, \"Stage 2: Code Synthesis\" (p. 244)\n", "\n", "ColorLog.header(\"STAGE 2: CODE SYNTHESIS (Initial)\")\n", "\n", "code_prompt = (\n", " \"You are an expert Python backend developer. 
\"\n", " \"Write a function calculate_shipping(cart_total, weight) that: \"\n", " \"- Base rate: $5.00 \"\n", " \"- Weight cost: $0.50 per unit \"\n", " \"- Tiered discounts: >$100 cart → 20%, >$50 → 10%, else 0% \"\n", " \"Generate the complete implementation.\"\n", ")\n", "\n", "response = llm.invoke(code_prompt)\n", "initial_code = extract_code_from_response(response.content, \"python\")\n", "\n", "ColorLog.info(\"Developer Agent generated initial implementation:\")\n", "print()\n", "print(initial_code)\n", "ColorLog.warning(\"Note: This initial version lacks negative weight validation.\")" ] }, { "cell_type": "code", "execution_count": null, "id": "f475fcf8", "metadata": {}, "outputs": [], "source": [ "# Cell 9: Stage 3 — Test Synthesis\n", "# The tester agent generates a comprehensive pytest suite including edge cases.\n", "# Critically, test_negative_weight expects a ValueError the initial code doesn't raise.\n", "# Ref: §9.2, \"Stage 3: Test Synthesis\" (p. 245)\n", "\n", "ColorLog.header(\"STAGE 3: TEST SYNTHESIS\")\n", "\n", "test_prompt = (\n", " \"Write a comprehensive test suite for calculate_shipping using pytest. \"\n", " \"Include tests for basic shipping, tier one and two discounts, \"\n", " \"zero weight, and negative weight edge case.\"\n", ")\n", "\n", "response = llm.invoke(test_prompt)\n", "test_code = extract_code_from_response(response.content, \"python\")\n", "\n", "ColorLog.info(\"Tester Agent generated pytest suite:\")\n", "print()\n", "print(test_code)\n", "ColorLog.info(\"Notice: test_negative_weight expects ValueError — this will fail initially.\")" ] }, { "cell_type": "code", "execution_count": null, "id": "5f682bd1", "metadata": {}, "outputs": [], "source": [ "# Cell 10: Stage 4 — Execution & Validation (First Run)\n", "# The MockTestRunner simulates pytest. The initial code lacks ValueError\n", "# handling, so test_negative_weight fails — exactly as described in §9.2 (pp. 
245–246).\n", "# Ref: §9.2, \"Stage 4: Execution and Validation\" (pp. 245–246)\n", "\n", "ColorLog.header(\"STAGE 4: EXECUTION & VALIDATION (Run 1)\")\n", "\n", "test_runner = MockTestRunner()\n", "\n", "# Run tests against the initial code (no ValueError handling)\n", "passed, output = test_runner.run(initial_code)\n", "\n", "print(output)\n", "print()\n", "\n", "if not passed:\n", " ColorLog.error(\"Tests FAILED — routing to refinement loop (Stage 5).\")\n", "else:\n", " ColorLog.success(\"All tests passed.\")" ] }, { "cell_type": "code", "execution_count": null, "id": "25db0446", "metadata": {}, "outputs": [], "source": [ "# Cell 11: Stage 5 — Refinement Loop\n", "# The developer agent receives the failure context and generates a revised\n", "# implementation that includes the missing ValueError for negative weight.\n", "# Ref: §9.2, \"Stage 5: The Refinement Loop\" (p. 246)\n", "\n", "ColorLog.header(\"STAGE 5: REFINEMENT LOOP\")\n", "\n", "refinement_prompt = (\n", " \"Your previous shipping code FAILED the following tests. Fix it. \"\n", " \"FAILED test_negative_weight - Did not raise ValueError. \"\n", " \"Expected exception ValueError but function returned -0.50. \"\n", " \"Fix the calculate_shipping code to handle negative weight by raising ValueError.\"\n", ")\n", "\n", "ColorLog.info(\"Developer Agent received failure context — refining...\")\n", "response = llm.invoke(refinement_prompt)\n", "refined_code = extract_code_from_response(response.content, \"python\")\n", "\n", "ColorLog.success(\"Developer Agent produced refined implementation:\")\n", "print()\n", "print(refined_code)\n", "ColorLog.info(\"Validation added: negative weight now raises ValueError.\")" ] }, { "cell_type": "code", "execution_count": null, "id": "8fc15ed9", "metadata": {}, "outputs": [], "source": [ "# Cell 12: Stage 6 — Success (All Tests Pass)\n", "# The refined code passes all 5 tests, completing the TDG cycle.\n", "# Ref: §9.2, \"Stage 6: Success and Advancement\" (p. 
247)\n", "\n", "ColorLog.header(\"STAGE 6: SUCCESS\")\n", "\n", "# Run tests against refined code (contains ValueError)\n", "passed, output = test_runner.run(refined_code)\n", "\n", "print(output)\n", "print()\n", "\n", "if passed:\n", " ColorLog.success(\n", " \"ALL 5 TESTS PASSED — TDG cycle complete. \"\n", " \"Code validated through evidence, not assumption.\"\n", " )\n", " ColorLog.info(\n", " f\"Refinement required {test_runner._run_count} test run(s) \"\n", " f\"to converge on a correct solution.\"\n", " )\n", "else:\n", " ColorLog.error(\"Unexpected failure — check refined code.\")" ] }, { "cell_type": "markdown", "id": "2b68ff8b", "metadata": {}, "source": [ "---\n", "\n", "### Full-Stack Implementation: User Profile Feature (pp. 249–255)\n", "\n", "The shipping calculator demonstrated TDG for a single function. Now we scale to a **full-stack feature** using multi-agent orchestration with **language-specific specialization**.\n", "\n", "#### From Single Agent to Multi-Agent (pp. 
249–250)\n", "\n", "The single-agent roles scale as follows:\n", "- **Orchestrator Agent** → **Project Manager Agent** — decomposes features into language-specific tasks with dependency management\n", "- **Developer Agent** → **Backend Agent** (Python/Flask) + **Frontend Agent** (TypeScript/React) — each runs its own generate-test-refine loop\n", "- **Tester Agent** — now generates framework-appropriate suites (`pytest` for backend, `Jest` for frontend)\n", "- New **Integration Stage** — validates cross-layer contracts after individual components pass\n", "\n", "#### Task Decomposition\n", "\n", "- **T1 (Backend):** Flask API endpoint — `GET /api/v1/users/{id}`\n", "- **T2 (Frontend):** React `UserProfile` component\n", "- **T3 (Integration):** Application routing\n", "\n", "The `build_workflow(llm)` function constructs a LangGraph `StateGraph` with 7 nodes and 2 conditional edges enforcing the `iterations < 3` loop limit.\n", "\n", "Ref: §9.2, \"Practical Implementation: From Single Feature to Full-Stack System\" (pp. 249–255)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "66a722e6", "metadata": {}, "outputs": [], "source": [ "# Cell 14: Build the LangGraph StateGraph\n", "# Ref: §9.2, \"Building the LangGraph Workflow\" — full listing (pp. 251–252)\n", "\n", "ColorLog.header(\"LANGGRAPH WORKFLOW CONSTRUCTION\")\n", "\n", "try:\n", " app = build_workflow(llm)\n", " ColorLog.success(f\"Workflow ready: {type(app).__name__}\")\n", " workflow_available = True\n", "except ImportError as e:\n", " ColorLog.warning(f\"LangGraph not installed: {e}\")\n", " ColorLog.info(\"Will demonstrate node functions individually instead.\")\n", " workflow_available = False" ] }, { "cell_type": "code", "execution_count": null, "id": "58aa57e8", "metadata": {}, "outputs": [], "source": [ "# Cell 15: Backend Agent — T1 (Flask User API)\n", "# Ref: §9.2, \"Implementing Agent Nodes\" — backend_agent_node listing (pp. 
250–251)\n", "\n", "ColorLog.header(\"T1: BACKEND AGENT — Flask User API\")\n", "\n", "backend_prompt = (\n", " \"You are an expert Python backend developer. \"\n", " \"Task: Create a GET /api/v1/users/{id} endpoint that returns \"\n", " \"user data (name, email, recent activity). \"\n", " \"Project Context: Flask REST API with SQLAlchemy ORM. \"\n", " \"Requirements: Use Flask blueprints, include error handling, \"\n", " \"follow PEP 8 style guidelines.\"\n", ")\n", "\n", "response = llm.invoke(backend_prompt)\n", "backend_code = extract_code_from_response(response.content, \"python\")\n", "\n", "ColorLog.success(\"Backend Agent generated Flask API code:\")\n", "print()\n", "print(backend_code[:800] + \"\\n...\" if len(backend_code) > 800 else backend_code)" ] }, { "cell_type": "code", "execution_count": null, "id": "fd9b9a37", "metadata": {}, "outputs": [], "source": [ "# Cell 16: Frontend Agent — T2 (React UserProfile)\n", "# Ref: §9.2, \"Agent Specialization for Full-Stack Development\" (p. 250)\n", "\n", "ColorLog.header(\"T2: FRONTEND AGENT — React UserProfile\")\n", "\n", "frontend_prompt = (\n", " \"You are an expert React/TypeScript frontend developer. \"\n", " \"Task: Create a React UserProfile component that fetches \"\n", " \"data from /api/v1/users/{id} and displays name, email, \"\n", " \"and recent activity. Use functional components with hooks.\"\n", ")\n", "\n", "response = llm.invoke(frontend_prompt)\n", "frontend_code = extract_code_from_response(response.content, \"typescript\")\n", "\n", "ColorLog.success(\"Frontend Agent generated React component:\")\n", "print()\n", "print(frontend_code[:800] + \"\\n...\" if len(frontend_code) > 800 else frontend_code)" ] }, { "cell_type": "code", "execution_count": null, "id": "9b9033cd", "metadata": {}, "outputs": [], "source": [ "# Cell 17: Integration — T3 (Routing)\n", "# Ref: §9.2, \"Integration (T3)\" (p. 
253)\n", "\n", "ColorLog.header(\"T3: INTEGRATION — Application Routing\")\n", "\n", "integration_prompt = (\n", "    \"Integrate the frontend routing to connect with the backend API. \"\n", "    \"Add the UserProfile component to the main application routing structure.\"\n", ")\n", "\n", "response = llm.invoke(integration_prompt)\n", "integration_code = extract_code_from_response(response.content, \"typescript\")\n", "\n", "ColorLog.success(\"Integration Agent connected backend + frontend:\")\n", "print()\n", "print(integration_code[:600] + \"\\n...\" if len(integration_code) > 600 else integration_code)" ] }, { "cell_type": "code", "execution_count": null, "id": "07a393bb", "metadata": {}, "outputs": [], "source": [ "# Cell 18: Execute Full LangGraph Workflow\n", "# Runs the complete multi-agent pipeline: planning → backend → frontend → integration → summary\n", "# Ref: §9.2, \"Execution and Measured Outcomes\" (pp. 253–254)\n", "\n", "ColorLog.header(\"FULL-STACK WORKFLOW EXECUTION\")\n", "\n", "if workflow_available:\n", "    # Chapter's exact initial_state\n", "    initial_state = {\n", "        \"user_story\": \"Add a user profile page with name, email, and recent activity\",\n", "        \"tasks\": [],\n", "        \"current_task\": None,\n", "        \"backend_code\": {},\n", "        \"frontend_code\": {},\n", "        \"test_code\": {},\n", "        \"messages\": [],\n", "        \"final_output\": None,\n", "    }\n", "\n", "    ColorLog.info(\"Invoking LangGraph workflow...\")\n", "    final_state = app.invoke(initial_state)\n", "\n", "    # Display measured outcomes\n", "    fo = final_state[\"final_output\"]\n", "    print()\n", "    ColorLog.header(\"MEASURED OUTCOMES\")\n", "    ColorLog.success(f\"Tasks Completed: {fo['tasks_completed']}/{fo['total_tasks']}\")\n", "    ColorLog.success(f\"Total Iterations: {fo['total_iterations']}\")\n", "    ColorLog.success(f\"Avg Iterations/Task: {fo['avg_iterations']}\")\n", "    ColorLog.success(f\"Backend Files: {fo['backend_files']}\")\n", "    ColorLog.success(f\"Frontend Files: {fo['frontend_files']}\")\n", 
"\n", "    print()\n", "    ColorLog.info(\"Task Details:\")\n", "    for detail in fo[\"task_details\"]:\n", "        icon = \"\\u2713\" if detail[\"status\"] == \"completed\" else \"\\u2717\"\n", "        ColorLog.info(\n", "            f\" {icon} {detail['task_id']} ({detail['task_type']}): \"\n", "            f\"{detail['status']}, {detail['iterations']} iteration(s)\"\n", "        )\n", "else:\n", "    ColorLog.info(\"LangGraph not available — individual node demos shown above.\")\n", "    ColorLog.info(\n", "        \"Install langgraph to run the full workflow: \"\n", "        \"pip install 'langgraph>=0.3.0'\"\n", "    )" ] }, { "cell_type": "markdown", "id": "46a67cd8", "metadata": {}, "source": [ "---\n", "\n", "## §9.3 — Compliance-Driven Agents (pp. 255–265)\n", "\n", "Compliance-Driven agents embed security and policy awareness directly into the development workflow. They operate through a **scan → evaluate → remediate** feedback loop, parallel to TDG's *generate → test → refine* cycle.\n", "\n", "### Why Compliance Needs Its Own Agent Class (pp. 255–256)\n", "\n", "Compliance validation operates on an entirely different knowledge domain from functional testing. TDG validates *functional correctness* — does the code produce expected output? Compliance validates adherence to *externally imposed constraints*: regulatory mandates (GDPR, PCI DSS, HIPAA), organizational security policies, and data governance rules. A function can pass every unit test while simultaneously violating a data retention policy. These are orthogonal concerns requiring distinct architectures, knowledge bases, and enforcement mechanisms.\n", "\n", "### Core Capabilities (pp. 256–258)\n", "\n", "This section demonstrates:\n", "- **Static Compliance Validation** — keyword-based policy rule evaluation (p. 257)\n", "- **Semantic Code Understanding** — LLM-powered contextual analysis (p. 257)\n", "- **Contextual Remediation** — automated fix generation, e.g., SHA-1 → SHA-256 (pp. 257–258)\n", "- **Data Flow Analysis** — PII/PHI variable tagging and tracing (p. 
258)\n", "- **Audit Trail Generation** — immutable compliance logging (p. 258)\n", "- **PCI DSS Case Study** — full scan-evaluate-remediate pipeline (pp. 262–265)\n", "\n", "### Architectural Components (pp. 259–260)\n", "\n", "The compliance agent integrates: a **Policy Engine** (declarative Rego rules), **Code & Infrastructure Analyzers** (AST parsing, SAST scanning), a **Language Model Layer** (translating policy failures into developer-friendly guidance), **CI/CD Integration** (pipeline gates), and **HITL Overrides** for ambiguous cases.\n", "\n", "> **Note — Traditional vs. LLM-Augmented Compliance** (pp. 259–260): Tools like SonarQube or Checkmarx detect known vulnerability patterns through predefined signatures — fast and reliable for known threats but blind to novel or contextual violations. The LLM-augmented architecture adds *semantic reasoning*: understanding what code intends to do, evaluating whether that intent complies with policy, and detecting violations that emerge from interactions between individually compliant components.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "5700011f", "metadata": {}, "outputs": [], "source": [ "# Cell 20: PolicyEngine class — Rego-like rule evaluation\n", "# Ref: §9.3, \"Architectural Components — Policy Engine\" (p. 
259)\n", "\n", "ColorLog.header(\"POLICY ENGINE — Rule Registry\")\n", "\n", "engine = PolicyEngine()\n", "engine.load_pci_dss()\n", "engine.load_hipaa()\n", "\n", "ColorLog.info(f\"Total rules loaded: {len(engine.rules)}\")\n", "print()\n", "for rule in engine.rules:\n", " ColorLog.info(f\" [{rule.severity}] {rule.rule_id}: {rule.description[:70]}...\")" ] }, { "cell_type": "code", "execution_count": null, "id": "81ec8b64", "metadata": {}, "outputs": [], "source": [ "# Cell 21: Static Compliance Validation — Detect card_number in logging\n", "# This is the chapter's primary example: a Code-Generation agent produces\n", "# payment code that logs the full card number — a PCI DSS violation.\n", "# Ref: §9.3, \"Static Compliance Validation\" (p. 257)\n", "\n", "ColorLog.header(\"STATIC COMPLIANCE VALIDATION\")\n", "\n", "# Code from the chapter's process_payment example\n", "vulnerable_code = '''\n", "def process_payment(card_number, cvv, amount):\n", " logger.info(f\"Processing payment for card {card_number}\") # Violation!\n", " result = payment_gateway.charge(card_number, cvv, amount)\n", " logger.info(f\"Payment result: {result}\")\n", " return result\n", "'''\n", "\n", "ColorLog.info(\"Scanning payment handler code...\")\n", "violations = engine.evaluate(vulnerable_code)\n", "\n", "print()\n", "for v in violations:\n", " ColorLog.error(f\" VIOLATION: {v['rule_id']} (line {v['line_number']})\")\n", " ColorLog.error(f\" Code: {v['line_content']}\")\n", " ColorLog.error(f\" Regulation: {v['regulation_ref']}\")\n", " ColorLog.info(f\" Hint: {v['remediation_hint']}\")\n", " print()" ] }, { "cell_type": "code", "execution_count": null, "id": "9504fe4b", "metadata": {}, "outputs": [], "source": [ "# Cell 22: Semantic Code Understanding — Anonymization function analysis\n", "# A function claims to anonymize data but retains identifiable fields.\n", "# Pattern matching misses this; semantic analysis catches it.\n", "# Ref: §9.3, \"Semantic Code Understanding\" (p. 
257)\n", "\n", "ColorLog.header(\"SEMANTIC CODE UNDERSTANDING\")\n", "\n", "anonymization_code = 'def anonymize_user_data(record):\\n \\\"\\\"\\\"Anonymizes patient records by removing all identifying information.\\\"\\\"\\\"\\n del record[\"name\"]\\n # Fields RETAINED: email, phone_number, date_of_birth, ip_address\\n return record'\n", "\n", "ColorLog.info(\"Running semantic analysis on anonymize_user_data...\")\n", "scanner = ComplianceScanner(engine, llm)\n", "result = scanner.full_scan(anonymization_code)\n", "\n", "print()\n", "ColorLog.info(f\"Static violations: {result['total_violations']}\")\n", "if result['semantic']:\n", " ColorLog.info(\"Semantic Analysis Result:\")\n", " print()\n", " print(result['semantic'][:600])" ] }, { "cell_type": "code", "execution_count": null, "id": "1ecd55a5", "metadata": {}, "outputs": [], "source": [ "# Cell 23: Contextual Remediation — SHA-1 to SHA-256 patch\n", "# Ref: §9.3, \"Contextual Intervention and Remediation\" (pp. 257–258)\n", "\n", "ColorLog.header(\"CONTEXTUAL REMEDIATION\")\n", "\n", "# Detect SHA-1 violation\n", "sha1_code = \"hash_value = hashlib.sha1(data).hexdigest()\"\n", "sha_violations = engine.evaluate(sha1_code)\n", "\n", "if sha_violations:\n", " ColorLog.error(f\"Violation detected: {sha_violations[0]['rule_id']}\")\n", " \n", " # Generate remediation\n", " remediator = RemediationGenerator(llm)\n", " fix = remediator.generate(sha_violations[0])\n", " \n", " print()\n", " ColorLog.info(f\"Original: {fix['original']}\")\n", " ColorLog.success(f\"Patched: {fix['patched']}\")\n", " ColorLog.info(f\"Auto-fix: {fix['auto_fixable']}\")\n", " ColorLog.info(f\"Rationale: {fix['explanation'][:100]}...\")\n", "else:\n", " ColorLog.success(\"No SHA-1 violations found.\")" ] }, { "cell_type": "code", "execution_count": null, "id": "cc205ba5", "metadata": {}, "outputs": [], "source": [ "# Cell 24: Data Flow Analysis — PII tagging and tracing\n", "# Ref: §9.3, \"Data Flow Analysis\" (p. 
258)\n", "\n", "ColorLog.header(\"DATA FLOW ANALYSIS\")\n", "\n", "data_flow_code = '''\n", "def handle_checkout(card_number, email, patient_record):\n", " logger.info(f\"Checkout for {email}, card: {card_number}\")\n", " analytics.track(email, event=\"checkout\")\n", " send_to_warehouse(patient_record)\n", " print(f\"Order confirmed for {email}\")\n", "'''\n", "\n", "analyzer = DataFlowAnalyzer()\n", "flows = analyzer.analyze(data_flow_code)\n", "\n", "print()\n", "for flow in flows:\n", " ColorLog.error(\n", " f\" [{flow['data_class']}] {flow['variable']} → \"\n", " f\"{flow['sink']} (line {flow['line_number']})\"\n", " )\n", " ColorLog.info(f\" {flow['recommendation']}\")\n", " print()" ] }, { "cell_type": "markdown", "id": "a2703825", "metadata": {}, "source": [ "> **Info Box — Deployment Across Regulated Industries** (pp. 260–262):\n", "> - **Healthcare (HIPAA):** Compliance agents scan for PHI in logs, error messages, and API responses. Detection combines pattern-matching for structured identifiers (medical record numbers, SSNs, dates of birth) with LLM-based semantic classification for unstructured data.\n", "> - **Fintech (PCI DSS / SOC 2):** Agents prevent deployments sending financial data to non-compliant regions, block insecure RNG for cryptographic operations, and enforce key rotation.\n", "> - **SaaS (ISO 27001):** Agents validate RBAC logic, verify permission checks at every entry point, and ensure audit logging captures all administrative actions." ] }, { "cell_type": "code", "execution_count": null, "id": "813e647b", "metadata": {}, "outputs": [], "source": [ "# Cell 25: PCI DSS Case Study — Full scan-evaluate-remediate pipeline\n", "# Simulates the fintech CI/CD integration from the chapter case study.\n", "# Ref: §9.3, \"Practical Implementation: Enforcing PCI DSS\" (pp. 
262–265)\n", "\n", "ColorLog.header(\"PCI DSS CASE STUDY — Full Pipeline\")\n", "\n", "# Simulated pull request code diff\n", "pr_code = '''\n", "import hashlib\n", "\n", "def process_payment(card_number, cvv, amount):\n", "    # Log transaction\n", "    logger.info(f\"Processing card {card_number} for ${amount}\")\n", "\n", "    # Hash for storage\n", "    card_hash = hashlib.sha1(card_number.encode()).hexdigest()\n", "\n", "    # Transmit to payment gateway\n", "    response = requests.post(\n", "        \"http://payments.internal/charge\",\n", "        json={\"card\": card_number, \"amount\": amount}\n", "    )\n", "    return response.json()\n", "'''\n", "\n", "ColorLog.info(\"Step 1: SCAN — Evaluating PR code against PCI DSS policies...\")\n", "pr_violations = engine.evaluate(pr_code)\n", "\n", "print()\n", "ColorLog.info(f\"Step 2: EVALUATE — {len(pr_violations)} violation(s) found:\")\n", "for v in pr_violations:\n", "    ColorLog.error(f\"  [{v['severity']}] {v['rule_id']}: line {v['line_number']}\")\n", "\n", "print()\n", "ColorLog.info(\"Step 3: REMEDIATE — Generating fixes...\")\n", "remediator = RemediationGenerator(llm)\n", "for v in pr_violations:\n", "    fix = remediator.generate(v)\n", "    if fix.get('auto_fixable'):\n", "        ColorLog.success(f\"  {v['rule_id']}: Auto-fix → {fix['patched'][:60]}...\")\n", "    else:\n", "        ColorLog.warning(f\"  {v['rule_id']}: Manual review needed\")" ] }, { "cell_type": "code", "execution_count": null, "id": "67e01c05", "metadata": {}, "outputs": [], "source": [ "# Cell 26: Audit Trail Generation — Immutable compliance log\n", "# Ref: §9.3, \"Audit Trail Generation\" (p. 
258)\n", "\n", "ColorLog.header(\"AUDIT TRAIL\")\n", "\n", "audit = AuditTrail()\n", "\n", "# Record the scan\n", "audit.append(\"scan\", {\n", " \"files_scanned\": 1,\n", " \"rules_evaluated\": len(engine.rules),\n", " \"trigger\": \"pull_request\",\n", "})\n", "\n", "# Record violations\n", "for v in pr_violations:\n", " audit.append(\"violation\", {\n", " \"rule_id\": v[\"rule_id\"],\n", " \"severity\": v[\"severity\"],\n", " \"line\": v[\"line_number\"],\n", " })\n", "\n", "# Record remediations\n", "audit.append(\"remediation\", {\n", " \"violations_addressed\": len(pr_violations),\n", " \"auto_fixed\": sum(1 for v in pr_violations if v[\"rule_id\"] in [\"PCI-DSS-3.4\", \"PCI-DSS-4.1\"]),\n", " \"manual_review\": sum(1 for v in pr_violations if v[\"rule_id\"] == \"PCI-DSS-3.3\"),\n", "})\n", "\n", "# Display as JSON\n", "print()\n", "ColorLog.info(f\"Audit trail: {audit.count} entries\")\n", "print(audit.to_json())" ] }, { "cell_type": "code", "execution_count": null, "id": "821fe981", "metadata": {}, "outputs": [], "source": [ "# Cell 27: Measured Outcomes — Compliance section summary\n", "# Ref: §9.3, \"Measured Outcomes\" (pp. 
264–265)\n", "\n", "ColorLog.header(\"§9.3 — MEASURED OUTCOMES\")\n", "\n", "ColorLog.success(\"Compliance Pipeline Results:\")\n", "ColorLog.info(f\"  Rules evaluated:     {len(engine.rules)}\")\n", "ColorLog.info(f\"  Violations detected: {len(pr_violations)}\")\n", "ColorLog.info(f\"  Auto-fixable:        {sum(1 for v in pr_violations if v['rule_id'] in ['PCI-DSS-3.4', 'PCI-DSS-4.1'])}\")\n", "ColorLog.info(f\"  Audit entries:       {audit.count}\")\n", "print()\n", "# Report the count actually detected in Cell 25 rather than a hard-coded number\n", "ColorLog.success(\n", "    f\"The scan-evaluate-remediate loop detected {len(pr_violations)} PCI DSS violation(s) \"\n", "    \"across card logging, weak cryptography, and unencrypted transmission.\"\n", ")\n", "ColorLog.info(\n", "    \"Chapter reference: Pre-deployment compliance violations dropped \"\n", "    \"by 85% within six months of agent deployment.\"\n", ")" ] }, { "cell_type": "markdown", "id": "63afc943", "metadata": {}, "source": [ "---\n", "\n", "## §9.4 — The Self-Improving Agent (pp. 265–278)\n", "\n", "Self-Improving agents execute a closed-loop **execute → observe → learn → adapt** control system (Figure 9.5, p. 267). Unlike the task-scoped loops in §9.2 and §9.3, this loop operates **continuously** across all agent types, transforming static automation into living systems that refine themselves in response to data, feedback, and operational metrics.\n", "\n", "### Closed-Loop Control System (Figure 9.5, pp. 
266–268)\n", "\n", "Components:\n", "- **Sensing Layer** — collects explicit feedback (ratings, corrections), implicit feedback (session duration, abandonment rates), and synthetic feedback (consistency checks, benchmark comparisons)\n", "- **Coder Agent** — produces solutions using current strategies, prompts, and tool invocation patterns\n", "- **Critic Agent** — evaluates output against KPIs: Task Completion Rate, Error Recovery Ratio, Latency Distribution, User Satisfaction Index, and Improvement Velocity\n", "- **Planner Agent** — analyzes deviations from expected performance and generates structured `ImprovementHypothesis` objects with confidence scores and evidence counts\n", "- **HITL Checkpoint** — human approval/rejection gate for significant changes (modifying core prompts, adjusting thresholds, changing model parameters)\n", "- **Learning Layer** — applies versioned adaptations with rollback support (prompt updates, retrieval strategy changes, threshold adjustments)\n", "- **Deploy & Test** — validates changes against held-out evaluation sets before production deployment\n", "\n", "> **Note — Over-Personalization Risk** (p. 277): An agent that optimizes too aggressively for individual user patterns may narrow the range of solutions it considers. Mitigations include diversity constraints, held-out evaluation sets from underrepresented patterns, and periodic human review of adaptation trajectories.\n" ] }, { "cell_type": "markdown", "id": "1156e55e", "metadata": {}, "source": [ "### Figure 9.5 — Self-Improvement Loop (p. 
267)\n", "\n", "```\n", "  ┌──────────────────┐\n", "  │  Task Execution  │◀─────────────────────────────────────────┐\n", "  └────────┬─────────┘                                          │\n", "           ▼                                                    │\n", "  ┌──────────────────┐                                          │\n", "  │  Sensing Layer   │  Collect Feedback                        │\n", "  │  Explicit ·      │  (ratings, telemetry,                    │\n", "  │  Implicit ·      │   benchmarks)                            │  Feedback\n", "  │  Synthetic       │                                          │  Loop\n", "  └────────┬─────────┘                                          │\n", "           ▼                                                    │\n", "  ┌──────────────────┐                                          │\n", "  │  Coder Agent     │  Produces Solution                       │\n", "  └────────┬─────────┘                                          │\n", "           ▼                                                    │\n", "  ┌──────────────────┐                                          │\n", "  │  Critic Agent    │  Reviews & Evaluates                     │\n", "  │  Against KPIs    │                                          │\n", "  └────────┬─────────┘                                          │\n", "           ▼                                                    │\n", "  ┌──────────────────┐    ┌──────────────────┐                  │\n", "  │  Planner Agent   │───▶│  Human           │                  │\n", "  │  Analyzes        │    │  Validation      │                  │\n", "  │  Deviations      │    │  (HITL)          │                  │\n", "  └────────┬─────────┘    └────────┬─────────┘                  │\n", "           │ Bypass                │                            │\n", "           ▼                       ▼                            │\n", "  ┌──────────────────┐                                          │\n", "  │  Learning Layer  │  Update Prompts / Adjust Parameters      │\n", "  └────────┬─────────┘                                          │\n", "           ▼                                                    │\n", "  ┌──────────────────┐                                          │\n", "  │  Deploy & Test   │  Refined Behavior                        │\n", "  └────────┬─────────┘                                          │\n", "           └────────────────────────────────────────────────────┘\n", "```\n" ] }, { "cell_type": "code", "execution_count": null, "id": "eb63f5ca", "metadata": {}, "outputs": [], "source": [ "# Cell 29: SensingLayer — Multi-modal feedback collection\n", "# Ref: §9.4, \"The Sensing Layer\" and \"Multi-Modal Feedback Collection\" (pp. 
266–267, 272)\n", "\n", "ColorLog.header(\"SENSING LAYER — Feedback Collection\")\n", "\n", "sensing = SensingLayer()\n", "\n", "# Explicit feedback — direct human signals\n", "sensing.collect_explicit(\n", " \"User rated 3/5 — 'Agent didn't understand my question'\",\n", " {\"rating\": 3, \"category\": \"comprehension\"},\n", ")\n", "sensing.collect_explicit(\n", " \"User rated 5/5 — 'Quick and accurate response'\",\n", " {\"rating\": 5, \"category\": \"satisfaction\"},\n", ")\n", "sensing.collect_explicit(\n", " \"Developer rejected async code — missing error handling\",\n", " {\"action\": \"rejection\", \"pattern\": \"async\"},\n", ")\n", "\n", "# Implicit feedback — behavioral indicators\n", "sensing.collect_implicit(\n", " \"Average 4.2 turns per resolution; 23% rephrased questions\",\n", " {\"avg_turns\": 4.2, \"rephrase_rate\": 0.23},\n", ")\n", "sensing.collect_implicit(\n", " \"Escalation rate: 45% for policy-related queries (target: 20%)\",\n", " {\"escalation_rate\": 0.45, \"target\": 0.20},\n", ")\n", "\n", "# Synthetic feedback — automated benchmarks\n", "sensing.collect_synthetic(\n", " \"Benchmark: 91% functional correctness, 78% code quality, 72% maintainability\",\n", " {\"correctness\": 0.91, \"quality\": 0.78, \"maintainability\": 0.72},\n", ")\n", "\n", "print()\n", "summary = sensing.get_summary()\n", "ColorLog.success(f\"Total records: {summary['total_records']}\")\n", "ColorLog.info(f\"By type: {summary['by_type']}\")" ] }, { "cell_type": "code", "execution_count": null, "id": "da022bb8", "metadata": {}, "outputs": [], "source": [ "# Cell 30: CriticAgent — KPI evaluation against thresholds\n", "# Ref: §9.4, \"The Critic Agent evaluates against certain KPIs\" (p. 
268)\n", "\n", "ColorLog.header(\"CRITIC AGENT — KPI Evaluation\")\n", "\n", "critic = CriticAgent(llm=llm)\n", "\n", "# Observed metrics (from chapter's customer support case study)\n", "observed_metrics = {\n", "    \"task_completion_rate\": 0.74,  # target: 0.80 — BELOW\n", "    \"error_recovery_ratio\": 0.89,  # target: 0.85 — ABOVE\n", "    \"latency_p95\": 2.3,  # target: 3.0s — OK\n", "    \"user_satisfaction_index\": 3.8,  # target: 4.0 — BELOW\n", "    \"improvement_velocity\": 0.12,  # target: 0.10 — ABOVE\n", "}\n", "\n", "evaluation = critic.evaluate(observed_metrics)\n", "\n", "print()\n", "ColorLog.info(\"KPI Scorecard:\")\n", "for metric, data in evaluation[\"scores\"].items():\n", "    if isinstance(data, dict) and \"observed\" in data:\n", "        status_color = ColorLog.success if data[\"status\"] in [\"ABOVE TARGET\", \"WITHIN RANGE\"] else ColorLog.error\n", "        status_color(\n", "            f\"  {metric}: {data['observed']} \"\n", "            f\"(target: {data['threshold']}) — {data['status']}\"\n", "        )" ] }, { "cell_type": "code", "execution_count": null, "id": "51dbe98f", "metadata": {}, "outputs": [], "source": [ "# Cell 31: PlannerAgent — Generate ImprovementHypothesis objects\n", "# Ref: §9.4, \"Learning Mechanisms and Feedback Translation\" (pp. 
269–271)\n", "\n", "ColorLog.header(\"PLANNER AGENT — Hypothesis Generation\")\n", "\n", "planner = PlannerAgent(llm=llm)\n", "planner_output = planner.generate_hypotheses(evaluation, sensing.get_summary())\n", "\n", "print()\n", "ColorLog.info(f\"Hypotheses generated: {len(planner_output.hypotheses)}\")\n", "ColorLog.info(f\"Requires human review: {planner_output.requires_human_review}\")\n", "print()\n", "\n", "for i, h in enumerate(planner_output.hypotheses, 1):\n", " ColorLog.info(f\"Hypothesis {i}:\")\n", " ColorLog.info(f\" Source signal: {h.source_signal}\")\n", " ColorLog.info(f\" Adaptation type: {h.adaptation_type.value}\")\n", " ColorLog.info(f\" Proposed change: {h.proposed_change}\")\n", " ColorLog.info(f\" Confidence: {h.confidence}\")\n", " ColorLog.info(f\" Evidence count: {h.evidence_count}\")\n", " ColorLog.info(f\" Rollback safe: {h.rollback_safe}\")\n", " print()" ] }, { "cell_type": "code", "execution_count": null, "id": "7e357166", "metadata": {}, "outputs": [], "source": [ "# Cell 32: Structured Improvement Record display (Figure 9.6 equivalent)\n", "# Ref: §9.4, \"Structured improvement record generated by the planner agent\" (p. 274, Figure 9.6)\n", "\n", "ColorLog.header(\"STRUCTURED IMPROVEMENT RECORD\")\n", "\n", "# Display as formatted JSON (Figure 9.6)\n", "record = planner_output.model_dump()\n", "record_json = json.dumps(record, indent=2, default=str)\n", "print(record_json)" ] }, { "cell_type": "code", "execution_count": null, "id": "3c5edffd", "metadata": {}, "outputs": [], "source": [ "# Cell 33: HITL Checkpoint — Approval/rejection flow\n", "# Ref: §9.4, \"HITL Checkpoint\" in Figure 9.5 (pp. 
267–268)\n", "\n", "ColorLog.header(\"HITL CHECKPOINT\")\n", "\n", "# Demonstrate approval flow\n", "checkpoint_approve = HITLCheckpoint(auto_approve=True)\n", "decisions = checkpoint_approve.review(planner_output)\n", "\n", "print()\n", "approved_count = sum(1 for d in decisions if d[\"approved\"])\n", "ColorLog.info(f\"Result: {approved_count}/{len(decisions)} hypotheses approved.\")" ] }, { "cell_type": "code", "execution_count": null, "id": "15f8bd64", "metadata": {}, "outputs": [], "source": [ "# Cell 34: Learning Layer — Apply approved adaptations\n", "# Ref: §9.4, \"The Learning Layer updates internal parameters\" (p. 268)\n", "\n", "ColorLog.header(\"LEARNING LAYER — Apply Adaptations\")\n", "\n", "learning = LearningLayer()\n", "adaptation_result = learning.apply(decisions)\n", "\n", "print()\n", "ColorLog.success(f\"Adaptations applied: {adaptation_result['adaptations_applied']}\")\n", "ColorLog.success(f\"Current version: v{adaptation_result['current_version']}\")\n", "ColorLog.info(f\"Total history: {adaptation_result['total_adaptations']} adaptation(s)\")" ] }, { "cell_type": "code", "execution_count": null, "id": "b3476ed5", "metadata": {}, "outputs": [], "source": [ "# Cell 35: Customer Support Agent Case Study\n", "# Runs the complete self-improvement loop end-to-end.\n", "# Ref: §9.4, \"Practical Implementation: Adaptive Customer Support Agent\" (pp. 
272–276)\n", "\n", "ColorLog.header(\"CASE STUDY: Adaptive Customer Support Agent\")\n", "\n", "ColorLog.info(\"Running complete self-improvement cycle...\")\n", "ColorLog.info(\"This simulates two quarters of continuous operation.\")\n", "print()\n", "\n", "loop_result = run_self_improvement_loop(\n", " llm=llm,\n", " auto_approve=True,\n", ")\n", "\n", "print()\n", "ColorLog.success(\"Self-improvement cycle complete.\")\n", "ColorLog.info(\n", " f\"Sensing: {loop_result['sensing_summary']['total_records']} feedback records\"\n", ")\n", "ColorLog.info(\n", " f\"Critic: {loop_result['critic_evaluation']['overall_health']}\"\n", ")\n", "ColorLog.info(\n", " f\"Planner: {len(loop_result['planner_output']['hypotheses'])} hypotheses\"\n", ")\n", "hitl_approved = sum(1 for d in loop_result['hitl_decisions'] if d['approved'])\n", "ColorLog.info(\n", " f\"HITL: {hitl_approved}/{len(loop_result['hitl_decisions'])} approved\"\n", ")\n", "if loop_result.get('adaptation_summary'):\n", " ColorLog.info(\n", " f\"Learning: {loop_result['adaptation_summary']['adaptations_applied']} applied\"\n", " )" ] }, { "cell_type": "code", "execution_count": null, "id": "726038a4", "metadata": {}, "outputs": [], "source": [ "# Cell 36: Governance Safeguards — Bias monitoring and rollback\n", "# Ref: §9.4, \"Governance and Ethical Safeguards\" (pp. 
276–277)\n", "\n", "ColorLog.header(\"GOVERNANCE — Rollback Demonstration\")\n", "\n", "ColorLog.info(\"Scenario: An adaptation caused over-promising resolution timeframes.\")\n", "ColorLog.info(f\"Current version: v{learning._version}\")\n", "ColorLog.info(f\"Total adaptations: {len(learning.get_history())}\")\n", "\n", "# Demonstrate rollback\n", "if learning._version > 1:\n", " rollback_result = learning.rollback(to_version=1)\n", " print()\n", " ColorLog.warning(f\"Rolled back to v{rollback_result['rolled_back_to']}\")\n", " ColorLog.info(f\"Removed {rollback_result['removed_count']} adaptation(s)\")\n", " ColorLog.info(\"Evaluation tests added to prevent similar regressions.\")\n", "else:\n", " ColorLog.info(\"Only v1 — demonstrating rollback concept.\")\n", " ColorLog.info(\"In production: every version is tagged with training data, \")\n", " ColorLog.info(\"evaluation metrics, and deployment timestamp.\")" ] }, { "cell_type": "code", "execution_count": null, "id": "b8d8e637", "metadata": {}, "outputs": [], "source": [ "# Cell 37: Measured improvement trajectory\n", "# Ref: §9.4, \"Measured Improvement Trajectory\" (pp. 275–276)\n", "\n", "ColorLog.header(\"§9.4 — MEASURED IMPROVEMENT TRAJECTORY\")\n", "\n", "ColorLog.success(\"Customer Support Agent Results (2 quarters):\")\n", "ColorLog.info(\" First-contact resolution: +18% improvement\")\n", "ColorLog.info(\" Avg ticket resolution time: -23% reduction\")\n", "ColorLog.info(\" Customer satisfaction: 3.8 → 4.2 (+0.4)\")\n", "ColorLog.info(\" Improvement velocity: Stable (no plateau)\")\n", "print()\n", "ColorLog.success(\n", " \"The execute-observe-learn-adapt loop achieved a virtuous cycle: \"\n", " \"every interaction contributed to future competence.\"\n", ")" ] }, { "cell_type": "markdown", "id": "c1ee439e", "metadata": {}, "source": [ "---\n", "\n", "## Summary — Three Feedback Loops Compared (pp. 
277–278)\n", "\n", "Each agent class addresses a distinct dimension of autonomous software engineering through a characteristic feedback loop:\n", "\n", "| Dimension | Agent Class | Loop Pattern | Key Components | Chapter Section | Pages |\n", "|:---|:---|:---|:---|:---:|:---:|\n", "| Functional Correctness | Code-Generation | generate → test → refine | LangGraph StateGraph, MockTestRunner, conditional edges | §9.2 | 239–255 |\n", "| Normative Compliance | Compliance-Driven | scan → evaluate → remediate | PolicyEngine, ComplianceScanner, AuditTrail | §9.3 | 255–265 |\n", "| Continuous Improvement | Self-Improving | execute → observe → learn → adapt | SensingLayer, CriticAgent, PlannerAgent, LearningLayer | §9.4 | 265–278 |\n", "\n", "The unifying principle: **structured feedback loops** combined with **multi-agent orchestration** and **human-in-the-loop checkpoints** enable progressive autonomy while maintaining governance and traceability.\n", "\n", "> The patterns established here — stateful workflow graphs, iterative refinement through concrete validation signals, and layered agent specialization — provide the building blocks for the more complex multi-domain agent systems explored in subsequent chapters.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "aea4345e", "metadata": {}, "outputs": [], "source": [ "# Cell 39: Complete execution metrics\n", "# Ref: Chapter Summary (pp. 
277–278)\n", "\n", "ColorLog.header(\"EXECUTION METRICS — All Sections\")\n", "\n", "print(\"Section | Key Metric | Value\")\n", "print(\"-------------------------|-------------------------------|--------\")\n", "print(\"§9.2 Code-Generation | TDG shipping iterations | 2 runs\")\n", "print(\"§9.2 Code-Generation | Full-stack tasks completed | 3/3\")\n", "\n", "violation_count = len(pr_violations) if 'pr_violations' in dir() else 3\n", "print(f\"§9.3 Compliance | PCI DSS violations detected | {violation_count}\")\n", "print(f\"§9.3 Compliance | Audit trail entries | {audit.count if 'audit' in dir() else 'N/A'}\")\n", "print(\"§9.4 Self-Improving | KPIs below target | 2/5\")\n", "print(\"§9.4 Self-Improving | Hypotheses generated | 3\")\n", "print(\"§9.4 Self-Improving | Adaptations applied | 3\")\n", "\n", "print()\n", "ColorLog.success(\"All sections executed successfully in Simulation Mode.\")\n", "ColorLog.info(f\"LLM: {type(llm).__name__}\")" ] }, { "cell_type": "markdown", "id": "6d26651c", "metadata": {}, "source": [ "---\n", "\n", "## Further Reading\n", "\n", "- **Chapter 5** — Foundational Cognitive Architectures (core patterns referenced throughout)\n", "- **Chapter 7** — Tool-Use Frameworks (agent tool integration)\n", "- **Chapter 10** — Conversational and Content Creation Agents (next chapter)\n", "\n", "**Key Figures in this Chapter:**\n", "- Figure 9.1 — Agent ecosystem and tooling layers (p. 237)\n", "- Figure 9.2 — Adoption maturity curve (p. 239)\n", "- Figure 9.3 — Three phases of the TDD agent workflow (p. 240)\n", "- Figure 9.4 — LangGraph architecture: generate, test, refine loop (p. 243)\n", "- Figure 9.5 — Self-improvement loop (p. 267)\n", "- Figure 9.6 — Structured improvement record (p. 274)\n", "\n", "**Get the full book:** Scan the QR code at [packtpub.com/unlock](https://packtpub.com/unlock) or search for *Agents* by Imran Ahmad.\n", "\n", "---\n", "\n", "*Companion code for Chapter 9 (pp. 
235–279) of \"Agents\" by Imran Ahmad (Packt, 2026)*\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.10.0" } }, "nbformat": 4, "nbformat_minor": 5 }