{ "cells": [ { "cell_type": "markdown", "id": "f4293daa", "metadata": {}, "source": [ "# Chapter 12 — Ethical Reasoning Agent\n", "\n", "**Book:** *30 Agents Every AI Engineer Must Build*\n", "**Author:** Imran Ahmad | **Publisher:** Packt Publishing, 2026\n", "**Notebook:** 01 of 02 — The Ethical Reasoning Agent (pp. 331–346)\n", "\n", "---\n", "\n", "> *\"Technology is neither good nor bad; nor is it neutral.\"* — Melvin Kranzberg\n", "\n", "## Chapter Context\n", "\n", "Autonomous agents are no longer research curiosities. They screen resumes, recommend treatments, and allocate scarce resources. This shift raises a question every engineering team must answer before shipping: how do we make sure these systems act in accordance with human values, and how do we make their reasoning visible to the people they affect?\n", "\n", "This notebook builds the **Ethical Reasoning Agent** — an architecture that integrates value alignment, ethical decision-making, and bias mitigation directly into the agent's reasoning pipeline. Every candidate action is evaluated not only for task effectiveness but also for alignment with a defined set of ethical principles.\n", "\n", "### What This Notebook Covers\n", "\n", "1. **Value Alignment Frameworks** — Deontic logic operators and axioms (pp. 332–333)\n", "2. **Ethical Consistency Theorem** — Formal permissibility criterion (p. 333)\n", "3. **EthicalReasoningAgent** — IEEE-aligned modular validators with mitigation pathway (pp. 334–335)\n", "4. **EUCompliantAgent** — EU AI Act seven-requirement compliance control plane (p. 336)\n", "5. **Impossibility Theorem** — Why perfect fairness across multiple metrics is mathematically impossible (pp. 337–338)\n", "6. **Bias Detection & Monitoring** — Demographic parity, equal opportunity, disparate impact with continuous monitoring (pp. 339–343)\n", "7. **FairHiringAgent Case Study** — Three-layer fairness architecture: anonymization → bias detection → enforcement (pp. 
343–346)\n", "\n", "### Key Architectural Insight\n", "\n", "The Ethical Reasoning agent extends the cognitive loop (Chapter 1) with a dedicated **ethical evaluation layer** between the reasoning and action phases. If a proposed action violates any constraint, the agent either modifies the action, selects an alternative, or escalates to a human operator. This is compliance-by-architecture — the agent cannot bypass the ethical checkpoint.\n", "\n", "**Figures:** 12.1 (Extended Cognitive Loop with Value Alignment, p. 331)\n", "**Tables:** 12.1 (Fairness Regime Selection Criteria, p. 339)" ] }, { "cell_type": "code", "execution_count": null, "id": "31adb29c", "metadata": {}, "outputs": [], "source": [ "# Cell 2 — Setup: Imports, sys.path, and mode detection\n", "# Ref: Technical Requirements (p.330)\n", "\n", "import sys\n", "import os\n", "\n", "# Ensure project root is on the path\n", "project_root = os.path.abspath(os.path.join(os.getcwd(), \"..\"))\n", "if project_root not in sys.path:\n", " sys.path.insert(0, project_root)\n", "\n", "# Core utilities\n", "from chapter12.utils import ColorLogger, graceful_fallback, resolve_api_key, get_mode, is_simulation\n", "from chapter12.mock_llm import MockLLM, strip_meta\n", "from chapter12.synthetic_data import (\n", " generate_hr_dataset, summarize_hr_dataset,\n", ")\n", "\n", "# Ethical core\n", "from chapter12.ethical_core import (\n", " DeonticOperator,\n", " EthicalReasoningAgent,\n", " EUCompliantAgent,\n", " BiasDetector,\n", " BiasMonitoringPipeline,\n", " FairHiringAgent,\n", " FairnessEnforcer,\n", " SlidingWindow,\n", " MockMetricsBackend,\n", " MockAlertConfig,\n", ")\n", "\n", "import numpy as np\n", "\n", "# Visualization\n", "import matplotlib\n", "matplotlib.use(\"Agg\") # Non-interactive backend for compatibility\n", "import matplotlib.pyplot as plt\n", "import matplotlib.patches as mpatches\n", "%matplotlib inline\n", "\n", "# Initialize mode\n", "logger = ColorLogger(\"Notebook01\")\n", "api_key = 
resolve_api_key()\n", "mode = get_mode()\n", "logger.info(f\"Operating mode: {mode.upper()}\")\n" ] }, { "cell_type": "markdown", "id": "241f2fcd", "metadata": {}, "source": [ "## 1. Value Alignment Frameworks (pp. 332–333)\n", "\n", "Deontic logic formalizes obligation, permission, and prohibition as modal operators that assign a precise moral status to every candidate action. Three operators govern the ethical constraint evaluation:\n", "\n", "- **O(φ)** — Obligatory: the agent *must* perform φ\n", "- **P(φ)** — Permitted: the agent *may* perform φ\n", "- **F(φ)** — Forbidden: the agent *must not* perform φ\n", "\n", "Three axioms maintain logical consistency (p. 332):\n", "\n", "| Axiom | Formula | Meaning |\n", "|-------|---------|---------|\n", "| **Axiom 1** | O(φ) ⟺ F(¬φ) | Obligation equals prohibition of omission |\n", "| **Axiom 2** | P(φ) ⟺ ¬F(φ) | Permission equals absence of prohibition |\n", "| **Axiom 3** | O(φ→ψ) → (O(φ)→O(ψ)) | Obligation distributes over implication |\n", "\n", "**Axiom 3** is critical for multi-step reasoning: if an agent is obligated to treat a patient, and treating a patient implies maintaining a sterile environment, the agent becomes obligated to maintain that environment.\n", "\n", "**Figure 12.1** (p. 
331) illustrates the extended cognitive loop where the ethical evaluation layer intercepts candidate actions before execution:\n", "\n", "```\n", "  ┌──────────────┐    ┌──────────────┐    ┌──────────────┐\n", "  │  Perception  │───→│  Reasoning   │───→│   Ethical    │\n", "  │    Module    │    │              │    │  Evaluation  │\n", "  └──────────────┘    └──────────────┘    └──────┬───────┘\n", "                                                 │\n", "                                        ┌────────┴────────┐\n", "                                        │                 │\n", "                                      PASS              FAIL\n", "                                        │                 │\n", "                                        ▼                 ▼\n", "                                  ┌──────────┐    ┌──────────────┐\n", "                                  │  Action  │    │  Mitigation  │\n", "                                  │ Dispatch │    │ / Escalation │\n", "                                  └──────────┘    └──────────────┘\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "f0eec3fb", "metadata": {}, "outputs": [], "source": [ "# Cell 4 — Deontic logic operators and axiom verification\n", "# Ref: Value Alignment Frameworks (p.332–333)\n", "\n", "deon = DeonticOperator()\n", "\n", "# Define ethical rules from the smart city agent example (p.332)\n", "deon.add_obligation(\"prioritize_emergency_vehicle\")\n", "deon.add_prohibition(\"disable_signals_school_zone\")\n", "deon.add_prohibition(\"share_medical_details\")\n", "\n", "# Add an implication: accessing patient records → must write audit log (p.332)\n", "deon.add_implication(\"access_patient_records\", \"write_audit_log\")\n", "deon.add_obligation(\"access_patient_records\")\n", "deon.propagate_obligations()\n", "\n", "# Verify the three operators\n", "logger.info(\"--- Operator Verification ---\")\n", "actions_to_check = [\n", "    (\"prioritize_emergency_vehicle\", \"O(φ) — Obligatory\"),\n", "    (\"adjust_crosswalk_timing\", \"P(φ) — Permitted\"),\n", "    (\"disable_signals_school_zone\", \"F(φ) — Forbidden\"),\n", "]\n", "\n", "for action, label in actions_to_check:\n", "    oblig = deon.is_obligatory(action)\n", "    perm = deon.is_permitted(action)\n", "    forb = deon.is_forbidden(action)\n", "    logger.info(f\" {label}: obligatory={oblig}, permitted={perm}, forbidden={forb}\")\n", "\n", "# Verify the three axioms\n", "logger.info(\"--- Axiom Verification ---\")\n", "\n", 
"# Axiom 1: O(φ) ⇔ F(¬φ)\n", "assert deon.is_forbidden(\"omit_prioritize_emergency_vehicle\"), \"Axiom 1 failed\"\n", "logger.success(\"Axiom 1: O(prioritize_emergency) ⇔ F(omit_prioritize_emergency) ✓\")\n", "\n", "# Axiom 2: P(φ) ⇔ ¬F(φ)\n", "assert deon.is_permitted(\"adjust_crosswalk_timing\"), \"Axiom 2 failed\"\n", "assert not deon.is_permitted(\"disable_signals_school_zone\"), \"Axiom 2 failed\"\n", "logger.success(\"Axiom 2: P(adjust_crosswalk) ⇔ ¬F(adjust_crosswalk) ✓\")\n", "\n", "# Axiom 3: O(φ→ψ) → (O(φ) → O(ψ))\n", "assert deon.is_obligatory(\"write_audit_log\"), \"Axiom 3 failed\"\n", "logger.success(\"Axiom 3: O(access_records→write_log) propagated → O(write_log) ✓\")\n" ] }, { "cell_type": "markdown", "id": "01c94169", "metadata": {}, "source": [ "## 2. Ethical Consistency Theorem (p.333)\n", "\n", "The Ethical Consistency Theorem provides a computationally verifiable criterion for ethical permissibility:\n", "\n", "> **∀a ∈ A: Consistent(E ∪ {a}) → P(a)**\n", "\n", "For every candidate action *a*, the system checks whether adding *a* to the current ethical rule set *E* creates any contradiction. 
If the combined set remains consistent, the action is permitted.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "205f5a0b", "metadata": {}, "outputs": [], "source": [ "# Cell 6 — Ethical Consistency Theorem worked examples\n", "# Ref: Ethical Consistency Theorem (p.333)\n", "\n", "logger.info(\"--- Ethical Consistency Theorem Demo ---\")\n", "\n", "# Policy set E from chapter example (p.333):\n", "# F(share_medical_details_externally)\n", "# O(minimize_data)\n", "rule_set = {\"share_medical_details\", \"minimize_data\"}\n", "\n", "# Candidate a₁: \"Send general explanation without identifiers\"\n", "r1 = deon.check_consistency(rule_set, \"send_general_explanation\")\n", "logger.info(f\"a₁ (general explanation): consistent={r1['is_consistent']}, permitted={r1['is_permitted']}\")\n", "\n", "# Candidate a₂: \"Email full diagnosis to employer\"\n", "r2 = deon.check_consistency(rule_set, \"share_medical_details\")\n", "logger.info(f\"a₂ (share medical details): consistent={r2['is_consistent']}, permitted={r2['is_permitted']}\")\n", "\n", "if r2[\"conflicting_rules\"]:\n", " logger.error(f\" Blocked by: {r2['conflicting_rules']}\")\n", "\n", "logger.success(\"Ethical Consistency Theorem demonstration complete.\")\n" ] }, { "cell_type": "markdown", "id": "5cfc4d5b", "metadata": {}, "source": [ "## 3. EthicalReasoningAgent (pp. 334–335)\n", "\n", "The `EthicalReasoningAgent` implements the *extended cognitive loop* with an ethical checkpoint between the reasoning and action phases. 
It uses five modular validators aligned with the **IEEE Ethically Aligned Design** framework:\n", "\n", "- **HumanRightsChecker** — privacy, consent, anti-discrimination\n", "- **WellBeingAnalyzer** — safety thresholds, emergency handling\n", "- **AccountabilityTracker** — audit logging, oversight\n", "- **TransparencyManager** — decision rationale, explainability\n", "- **MisuseDetector** — prompt injection, adversarial input\n", "\n", "Each `checker.validate()` call is a rules-engine lookup, not an LLM call. This keeps validation **deterministic and auditable**.\n", "\n", "> 📌 **Production Note (p. 335):** Binding the ethical gate to the agent's **tool interface** — not just its final responses — is the most critical extension. Before the agent dispatches any tool call, it passes the call's name, arguments, and runtime context through `evaluate_action()`. If the call fails validation, the ethical gate blocks the dispatch, logs the violation, and routes control to the mitigation pathway before any external system is contacted." 
] }, { "cell_type": "code", "execution_count": null, "id": "a13e2ea4", "metadata": {}, "outputs": [], "source": [ "# Cell 8 — EthicalReasoningAgent with modular validators\n", "# Ref: EthicalReasoningAgent (p.334–335)\n", "\n", "agent = EthicalReasoningAgent()\n", "\n", "# Test with a compliant action\n", "logger.info(\"--- Testing Compliant Action ---\")\n", "result_ok = agent.evaluate_action(\n", "    \"send_general_explanation_to_patient\",\n", "    context={\"domain\": \"healthcare\", \"recipient\": \"patient\"}\n", ")\n", "logger.info(f\" Compliant: {result_ok['is_compliant']}, Severity: {result_ok['severity']}\")\n", "\n", "# Test with a violating action (human_rights)\n", "logger.info(\"--- Testing Violating Action ---\")\n", "result_bad = agent.evaluate_action(\n", "    \"share_medical_details with external employer\",\n", "    context={\"domain\": \"healthcare\", \"recipient\": \"employer\"}\n", ")\n", "logger.info(f\" Compliant: {result_bad['is_compliant']}, Severity: {result_bad['severity']}\")\n", "if not result_bad[\"is_compliant\"]:\n", "    for p, e, s in result_bad[\"violations\"]:\n", "        logger.error(f\" Violation [{s}]: {p} — {e}\")\n", "    # Guard against a missing or None mitigated action before slicing\n", "    mitigated = result_bad.get(\"mitigated_action\") or \"N/A\"\n", "    logger.info(f\" Mitigated action: {str(mitigated)[:80]}\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "073ef737", "metadata": {}, "outputs": [], "source": [ "# Cell 9 — Multiple violations and audit log inspection\n", "# Ref: EthicalReasoningAgent.mitigate() (p.335)\n", "\n", "# Action that triggers multiple principle violations\n", "result_multi = agent.evaluate_action(\n", "    \"bypass_consent and disable_audit to share_medical_details externally\",\n", "    context={\"domain\": \"healthcare\"}\n", ")\n", "logger.info(f\"Multi-violation test: {len(result_multi['violations'])} violations detected\")\n", "for p, e, s in result_multi[\"violations\"]:\n", "    logger.error(f\" [{s}] {p}: {e}\")\n", "\n", "# Inspect audit log\n", "audit = agent.audit_log.get_log()\n", "logger.info(f\"Audit log: 
{len(audit)} entries recorded\")\n", "for entry in audit:\n", " logger.debug(f\" Entry {entry['entry_id']}: action='{entry['action'][:50]}...', \"\n", " f\"violations={len(entry['violations'])}\")\n" ] }, { "cell_type": "markdown", "id": "1e6618ef", "metadata": {}, "source": [ "## 4. EU AI Act Compliance (p.336)\n", "\n", "The `EUCompliantAgent` maps the seven EU AI Act requirements to dedicated checking components. Each requirement is independently verifiable, enabling CI/CD integration where a change affecting data retention or oversight flows fails fast.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "262d3b9d", "metadata": {}, "outputs": [], "source": [ "# Cell 10 — EUCompliantAgent seven-requirement compliance check\n", "# Ref: EUCompliantAgent (p.336)\n", "\n", "eu_agent = EUCompliantAgent()\n", "\n", "# Default state: all compliant\n", "report = eu_agent.compliance_check()\n", "logger.info(f\"Initial status: {report['status']} ({report['compliant_count']}/{report['total_requirements']})\")\n", "\n", "# Simulate a fairness regression\n", "eu_agent.set_requirement_status(\n", " \"diversity_fairness\", False,\n", " \"Disparate impact ratio dropped to 0.73 — below 0.80 threshold\"\n", ")\n", "eu_agent.set_requirement_status(\n", " \"transparency\", False,\n", " \"Explanation generation timeout for 12% of decisions\"\n", ")\n", "\n", "report_regressed = eu_agent.compliance_check()\n", "logger.info(f\"After regression: {report_regressed['status']} \"\n", " f\"({report_regressed['compliant_count']}/{report_regressed['total_requirements']})\")\n", "\n", "for req_key, req_data in report_regressed[\"requirements\"].items():\n", " status = \"✓\" if req_data[\"compliant\"] else \"✗\"\n", " logger.info(f\" {status} {req_data['requirement']}: {req_data['evidence'][:60]}\")\n" ] }, { "cell_type": "markdown", "id": "ecdc2176", "metadata": {}, "source": [ "## 5. 
The Impossibility Theorem (pp. 337–338)\n", "\n", "A critical insight from the formal study of fairness: **statistical parity, equal opportunity, and predictive parity cannot all hold simultaneously** unless base rates are identical across protected groups.\n", "\n", "Three fairness regimes emerge from this impossibility result (Table 12.1):\n", "\n", "| Regime | When to Use | Trade-off |\n", "|--------|-------------|-----------|\n", "| **Equal Base Rates** (Regime 1) | Base rates verified equal across groups | None (rare in practice) |\n", "| **Group-Level Focus** (Regime 2) | Historical exclusion has distorted qualification rates | Individual merit may be adjusted |\n", "| **Individual-Level Focus** (Regime 3) | Qualification distribution is well-characterized | Group disparities may persist |\n", "\n", "The HR case study in this notebook operates in **Regime 2** — prioritizing demographic parity via the four-fifths rule.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "6387a135", "metadata": {}, "outputs": [], "source": [ "# Cell 12 — Impossibility Theorem regime visualization\n", "# Ref: Impossibility Theorem (p.337–338), Table 12.1\n", "\n", "fig, axes = plt.subplots(1, 3, figsize=(14, 4))\n", "fig.suptitle(\"Fairness Regimes from the Impossibility Theorem (Table 12.1)\", fontsize=13, fontweight=\"bold\")\n", "\n", "regimes = [\n", " (\"Regime 1: Equal Base Rates\", [0.6, 0.6], [0.6, 0.6], \"#4CAF50\"),\n", " (\"Regime 2: Group-Level Focus\", [0.55, 0.55], [0.7, 0.5], \"#FF9800\"),\n", " (\"Regime 3: Individual-Level Focus\", [0.7, 0.5], [0.7, 0.5], \"#2196F3\"),\n", "]\n", "\n", "groups = [\"Group A\", \"Group B\"]\n", "metrics = [\"Demographic\\nParity\", \"Equal\\nOpportunity\"]\n", "\n", "for ax, (title, dp_vals, eo_vals, color) in zip(axes, regimes):\n", " x = np.arange(len(groups))\n", " width = 0.3\n", " ax.bar(x - width/2, dp_vals, width, label=\"Demographic Parity\", color=color, alpha=0.7)\n", " ax.bar(x + width/2, eo_vals, width, label=\"Equal 
Opportunity\", color=color, alpha=0.4)\n", " ax.set_title(title, fontsize=10, fontweight=\"bold\")\n", " ax.set_xticks(x)\n", " ax.set_xticklabels(groups)\n", " ax.set_ylim(0, 1.0)\n", " ax.axhline(y=0.8, color=\"red\", linestyle=\"--\", linewidth=0.8, label=\"Four-fifths threshold\")\n", " ax.set_ylabel(\"Rate\")\n", " ax.legend(fontsize=7, loc=\"lower right\")\n", "\n", "plt.tight_layout()\n", "plt.savefig(\"impossibility_regimes.png\", dpi=100, bbox_inches=\"tight\")\n", "plt.show()\n", "logger.success(\"Impossibility Theorem regimes visualized.\")\n" ] }, { "cell_type": "markdown", "id": "848a109d", "metadata": {}, "source": [ "## 6. Bias Detection and Mitigation (pp. 339–343)\n", "\n", "AI agents inherit biases from their training data. The `BiasDetector` computes three fairness metrics:\n", "\n", "- **Demographic Parity** — equal positive outcome rates across groups (p. 342)\n", "- **Equal Opportunity** — equal true-positive rates for qualified candidates (p. 342)\n", "- **Disparate Impact** — ratio of positive outcome rates; the four-fifths rule (0.8) is the legal standard (p. 342)\n", "\n", "The `assess_severity()` method converts metric values into escalation signals: a disparate impact ratio below 0.8 → HIGH, a disparity above 0.1 → MEDIUM, otherwise LOW.\n", "\n", "---\n", "\n", "> 📌 **Info Box — The shadow of historical data: Amazon's AI recruiting failure (p. 340)**\n", ">\n", "> In 2018, Reuters reported that Amazon had quietly scrapped an internal AI recruiting tool after discovering it systematically penalized resumes containing terms associated with women. Trained on a decade of hiring data that skewed heavily male, the system learned to downgrade candidates who listed activities like \"women's chess club\" or attended all-women's colleges. The failure was not a bug in the algorithm — it was a faithful reproduction of the patterns in the data. 
Had it been an adaptive agent, the bias would have **compounded with each cycle** — exactly the superlinear feedback loop formalized in the Dual-Exposure Model.\n", "\n", "---\n", "\n", "> 📌 **Info Box — Ethical evaluation as an attack surface (p. 337)**\n", ">\n", "> Three attack vectors are particularly relevant:\n", "> - **Prompt injection against validators** — Defense: validate and sanitize inputs before they reach the ethical layer.\n", "> - **Fairness gaming** — Applicants learn thresholds and exploit them. Defense: rotate anonymization strategies; monitor for proxy signals via mutual information analysis.\n", "> - **Reward hacking in continuous learning** — RLHF systems satisfy constraints while violating their spirit. Defense: a separate constraint model that can veto policy updates.\n", "\n", "---\n", "\n", "**Mitigation strategies** operate at three pipeline stages (p. 343):\n", "1. **Pre-processing** — Re-weighting, re-sampling, removing proxy variables\n", "2. **In-processing** — Fairness objectives in the loss function\n", "3. 
**Post-processing** — Group-specific threshold calibration" ] }, { "cell_type": "code", "execution_count": null, "id": "e03a93cc", "metadata": {}, "outputs": [], "source": [ "# Cell 14 — Generate synthetic HR dataset with injected bias\n", "# Ref: Bias Detection and Mitigation (p.339–340), Amazon example (p.341, p.343)\n", "\n", "hr_data = generate_hr_dataset(n=200, seed=42)\n", "hr_summary = summarize_hr_dataset(hr_data)\n", "\n", "logger.info(f\"Dataset: {hr_summary['total_candidates']} candidates\")\n", "logger.info(f\"Gender distribution: {hr_summary['gender_distribution']}\")\n", "logger.info(f\"Qualification rates by gender: {hr_summary['qualification_rates_by_gender']}\")\n", "logger.info(f\"Disparate Impact Ratio (female/male): {hr_summary['disparate_impact_ratio']}\")\n", "\n", "if hr_summary['four_fifths_compliant']:\n", " logger.success(\"Four-fifths rule: COMPLIANT\")\n", "else:\n", " logger.error(f\"Four-fifths rule: VIOLATION — DI = {hr_summary['disparate_impact_ratio']:.4f} < 0.80\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "639e720c", "metadata": {}, "outputs": [], "source": [ "# Cell 15 — BiasDetector analysis on the HR dataset\n", "# Ref: BiasDetector (p.341–342)\n", "\n", "detector = BiasDetector()\n", "\n", "predictions = [r[\"qualified\"] for r in hr_data]\n", "demographics = [r[\"gender\"] for r in hr_data]\n", "ground_truth = [r[\"raw_score\"] >= 0.65 for r in hr_data]\n", "\n", "bias_report = detector.analyze(predictions, demographics, ground_truth)\n", "\n", "logger.info(f\"Severity: {bias_report['severity']}\")\n", "logger.info(f\"Summary: {bias_report['summary']}\")\n", "\n", "# Display each metric\n", "for metric_name, metric_data in bias_report[\"metrics\"].items():\n", " logger.info(f\" {metric_name}: {metric_data}\")\n", "\n", "logger.info(\"Recommendations:\")\n", "for rec in bias_report[\"recommendations\"]:\n", " logger.info(f\" → {rec}\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": 
"e2791149", "metadata": {}, "outputs": [], "source": [ "# Cell 16 — Bias metrics dashboard visualization\n", "# Ref: Bias Detection (p.341–342)\n", "\n", "fig, axes = plt.subplots(1, 3, figsize=(15, 5))\n", "fig.suptitle(\"Bias Metrics Dashboard — HR Dataset (Before Mitigation)\", fontsize=13, fontweight=\"bold\")\n", "\n", "# 1. Qualification rates by gender (Demographic Parity)\n", "dp_rates = bias_report[\"metrics\"][\"demographic_parity\"][\"group_rates\"]\n", "colors_dp = [\"#2196F3\" if g != \"female\" else \"#E91E63\" for g in dp_rates.keys()]\n", "axes[0].bar(dp_rates.keys(), dp_rates.values(), color=colors_dp, alpha=0.8)\n", "axes[0].axhline(y=sum(dp_rates.values()) / len(dp_rates), color=\"gray\", linestyle=\"--\", label=\"Mean rate\")\n", "axes[0].set_title(\"Demographic Parity (p.342)\")\n", "axes[0].set_ylabel(\"Qualification Rate\")\n", "axes[0].set_ylim(0, 1.0)\n", "axes[0].legend()\n", "\n", "# 2. Equal Opportunity (TPR by group)\n", "eo_data = bias_report[\"metrics\"][\"equal_opportunity\"]\n", "if eo_data.get(\"group_tpr\"):\n", " eo_rates = eo_data[\"group_tpr\"]\n", " colors_eo = [\"#2196F3\" if g != \"female\" else \"#E91E63\" for g in eo_rates.keys()]\n", " axes[1].bar(eo_rates.keys(), eo_rates.values(), color=colors_eo, alpha=0.8)\n", " axes[1].set_title(\"Equal Opportunity — TPR (p.342)\")\n", " axes[1].set_ylabel(\"True Positive Rate\")\n", " axes[1].set_ylim(0, 1.0)\n", "else:\n", " axes[1].text(0.5, 0.5, \"Ground truth\\nnot available\", ha=\"center\", va=\"center\", fontsize=12)\n", " axes[1].set_title(\"Equal Opportunity — TPR (p.342)\")\n", "\n", "# 3. 
Disparate Impact ratio gauge\n", "di_data = bias_report[\"metrics\"][\"disparate_impact\"]\n", "di_ratio = di_data[\"ratio\"]\n", "bar_color = \"#4CAF50\" if di_ratio >= 0.8 else \"#F44336\"\n", "axes[2].barh([\"DI Ratio\"], [di_ratio], color=bar_color, height=0.4, alpha=0.8)\n", "axes[2].axvline(x=0.8, color=\"red\", linestyle=\"--\", linewidth=2, label=\"Four-fifths threshold (0.80)\")\n", "axes[2].set_xlim(0, 1.0)\n", "axes[2].set_title(\"Disparate Impact Ratio (p.342)\")\n", "axes[2].legend(loc=\"lower right\")\n", "axes[2].text(di_ratio + 0.02, 0, f\"{di_ratio:.3f}\", va=\"center\", fontweight=\"bold\",\n", " color=bar_color, fontsize=14)\n", "\n", "plt.tight_layout()\n", "plt.savefig(\"bias_dashboard_before.png\", dpi=100, bbox_inches=\"tight\")\n", "plt.show()\n", "logger.success(\"Bias metrics dashboard rendered.\")\n" ] }, { "cell_type": "markdown", "id": "f9fbb3df", "metadata": {}, "source": [ "## 7. Bias Monitoring Pipeline (p.342–343)\n", "\n", "Fairness must be monitored continuously — the same way you monitor latency and error rates. The `BiasMonitoringPipeline` turns fairness measurement into a streaming operational control loop:\n", "\n", "1. Accumulate decisions in a sliding window\n", "2. Run `BiasDetector` when the window is full\n", "3. Push metrics to the observability platform (Prometheus/Datadog)\n", "4. 
Fire critical alerts when severity is HIGH\n" ] }, { "cell_type": "code", "execution_count": null, "id": "f9d5e8bd", "metadata": {}, "outputs": [], "source": [ "# Cell 18 — BiasMonitoringPipeline streaming simulation\n", "# Ref: BiasMonitoringPipeline (p.342–343)\n", "\n", "import random\n", "\n", "pipeline = BiasMonitoringPipeline(window_size=100)\n", "rng = random.Random(42)\n", "\n", "logger.info(\"Streaming 200 decisions through the pipeline...\")\n", "reports_collected = []\n", "\n", "for _ in range(200):\n", "    gender = rng.choices([\"male\", \"female\"], weights=[0.55, 0.45])[0]\n", "    # Inject bias: female approval probability is 0.65 vs. 0.80 for male\n", "    approved = rng.random() > (0.35 if gender == \"female\" else 0.20)\n", "    # Simulation shortcut: the decision doubles as ground truth, so TPR-based metrics carry no signal here\n", "    result = pipeline.on_decision(approved, gender, ground_truth=approved)\n", "    if result is not None:\n", "        reports_collected.append(result)\n", "\n", "logger.info(f\"Reports generated: {len(reports_collected)}\")\n", "\n", "# Show alerts\n", "alerts = pipeline.alert_config.get_alerts()\n", "logger.info(f\"Alerts fired: {len(alerts)}\")\n", "for alert in alerts:\n", "    logger.error(f\" [{alert['level'].upper()}] {alert['message']}\")\n", "\n", "# Show gauges\n", "gauges = pipeline.metrics.get_gauges()\n", "for name, val in gauges.items():\n", "    logger.info(f\" Gauge: {name} = {val:.4f}\")\n", "\n", "logger.success(\"Bias monitoring pipeline demonstration complete.\")\n" ] }, { "cell_type": "markdown", "id": "43bc6876", "metadata": {}, "source": [ "## 8. Case Study: HR Assistant with Fairness Constraints (pp. 343–346)\n", "\n", "The `FairHiringAgent` implements a three-layer fairness architecture:\n", "\n", "1. **Layer 1 — Anonymization**: strips name, gender, age, nationality, photo, address, and education institution from resumes (p. 345)\n", "2. **Layer 2 — Bias-Aware Evaluation**: monitors decision patterns across demographic groups in real time (p. 344)\n", "3. 
**Layer 3 — Fairness Enforcement**: intervenes when detected bias exceeds the four-fifths threshold (p. 345)\n", "\n", "Three supporting components:\n", "- `ResumeAnalyzer` — skills-matching evaluation returning score in [0, 1] with explanation map\n", "- `BiasDetector` — computes four-fifths (adverse impact) ratio per protected group\n", "- `FairnessEnforcer` — selects from reweighting, threshold adjustment, or calibrated post-processing\n", "\n", "---\n", "\n", "> 📌 **Info Box — Why anonymization is harder than it appears (p. 346)**\n", ">\n", "> As few as **four quasi-identifying data points** can uniquely re-identify individuals. Writing style and vocabulary choices can serve as fingerprints surviving field-level anonymization. The HR agent should ensure every combination of quasi-identifiers appears in at least *k* resumes (**k-anonymity**). A periodic **proxy audit** using mutual information between remaining features and protected attributes should be integrated into the monitoring pipeline.\n", "\n", "---\n", "\n", "The `FairHiringAgent` implicitly operates in **Regime 2** of the Impossibility Theorem (p. 346): it prioritizes demographic parity through the four-fifths rule threshold — appropriate for hiring domains where historical exclusion has distorted apparent qualification rates." 
] }, { "cell_type": "code", "execution_count": null, "id": "039b2e13", "metadata": {}, "outputs": [], "source": [ "# Cell 20 — FairHiringAgent batch evaluation with bias detection\n", "# Ref: FairHiringAgent (p.343–346)\n", "\n", "fair_agent = FairHiringAgent(fairness_threshold=0.8)\n", "\n", "job_requirements = {\n", "    \"required_skills\": [\"python\", \"machine_learning\", \"sql\", \"data_analysis\", \"cloud_computing\"],\n", "    \"min_experience\": 3,\n", "    \"role\": \"Senior Data Scientist\",\n", "}\n", "\n", "logger.info(f\"Evaluating {len(hr_data)} candidates for: {job_requirements['role']}\")\n", "\n", "# Run batch evaluation (includes bias detection + mitigation)\n", "batch_result = fair_agent.evaluate_batch(hr_data, job_requirements)\n", "\n", "severity = batch_result[\"bias_report\"][\"severity\"]\n", "di_before = batch_result[\"bias_report\"][\"metrics\"][\"disparate_impact\"][\"ratio\"]\n", "logger.info(f\"Bias severity: {severity}\")\n", "logger.info(f\"Disparate impact ratio (before mitigation): {di_before:.4f}\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "c94a8ef4", "metadata": {}, "outputs": [], "source": [ "# Cell 21 — Anonymization layer demonstration\n", "# Ref: FairHiringAgent.anonymize() (p.345)\n", "\n", "sample_resume = hr_data[0]\n", "logger.info(\"--- Original Resume (sample) ---\")\n", "for key, val in sample_resume.items():\n", "    display_val = str(val)[:60] if isinstance(val, list) else val\n", "    logger.info(f\" {key}: {display_val}\")\n", "\n", "anonymized = fair_agent.anonymize(sample_resume)\n", "logger.info(\"--- Anonymized Resume ---\")\n", "for key, val in anonymized.items():\n", "    display_val = str(val)[:60] if isinstance(val, list) else val\n", "    logger.info(f\" {key}: {display_val}\")\n", "\n", "removed = set(sample_resume.keys()) - set(anonymized.keys())\n", "logger.success(f\"Fields removed: {sorted(removed)}\")\n" ] }, { "cell_type": "code", "execution_count": null, "id": "6e50f0fa", "metadata": {}, 
"outputs": [], "source": [ "# Cell 22 — Before/after fairness comparison chart\n", "# Ref: FairHiringAgent case study (p.343–346)\n", "\n", "# BEFORE mitigation: raw scores\n", "before_preds = [r[\"raw_score\"] >= 0.65 for r in hr_data]\n", "before_genders = [r[\"gender\"] for r in hr_data]\n", "\n", "# AFTER mitigation: mitigated scores from batch_result\n", "after_evals = batch_result[\"evaluations\"]\n", "after_preds = [e[\"score\"] >= 0.65 for e in after_evals]\n", "\n", "# Compute DI ratios (female selection rate / male selection rate)\n", "def compute_di(preds, genders):\n", "    groups = {}\n", "    for p, g in zip(preds, genders):\n", "        groups.setdefault(g, []).append(p)\n", "    rates = {g: sum(ps) / len(ps) for g, ps in groups.items()}\n", "    f_rate = rates.get(\"female\", 0)\n", "    m_rate = rates.get(\"male\", 1)\n", "    return round(f_rate / m_rate, 4) if m_rate > 0 else 0\n", "\n", "di_before = compute_di(before_preds, before_genders)\n", "# Reuses before_genders, assuming evaluations preserve the order of hr_data\n", "di_after = compute_di(after_preds, before_genders)\n", "\n", "fig, ax = plt.subplots(figsize=(8, 5))\n", "bars = ax.bar([\"Before Mitigation\", \"After Mitigation\"], [di_before, di_after],\n", "              color=[\"#F44336\", \"#4CAF50\"], alpha=0.8, width=0.5)\n", "ax.axhline(y=0.8, color=\"red\", linestyle=\"--\", linewidth=2, label=\"Four-fifths threshold (0.80)\")\n", "ax.set_ylabel(\"Disparate Impact Ratio (Female / Male)\", fontsize=11)\n", "ax.set_title(\"FairHiringAgent: Before vs. 
After Mitigation (p.343–346)\", fontsize=13, fontweight=\"bold\")\n", "ax.set_ylim(0, 1.1)\n", "ax.legend(fontsize=10)\n", "\n", "for bar, val in zip(bars, [di_before, di_after]):\n", "    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.02,\n", "            f\"{val:.3f}\", ha=\"center\", fontweight=\"bold\", fontsize=14)\n", "\n", "plt.tight_layout()\n", "plt.savefig(\"fairness_before_after.png\", dpi=100, bbox_inches=\"tight\")\n", "plt.show()\n", "\n", "if di_after >= 0.8:\n", "    logger.success(f\"Mitigation effective: DI improved from {di_before:.3f} to {di_after:.3f} (≥ 0.80)\")\n", "else:\n", "    logger.info(f\"DI after mitigation: {di_after:.3f} — further tuning may be needed\")\n", "\n", "# Audit trail\n", "audit = fair_agent.audit_logger.get_log()\n", "logger.info(f\"Audit trail: {len(audit)} entries recorded for compliance review\")\n" ] }, { "cell_type": "markdown", "id": "2d3aa8e0", "metadata": {}, "source": [ "## Summary & Exercises\n", "\n", "### Key Takeaways\n", "\n", "1. **Deontic logic** (pp. 332–333) provides a formal language for encoding obligation, permission, and prohibition — making ethical constraints machine-executable.\n", "2. The **Ethical Consistency Theorem** (p. 333) gives a computationally verifiable criterion: an action is permitted if adding it to the rule set leaves the set logically consistent.\n", "3. The **EthicalReasoningAgent** (pp. 334–335) embeds an ethical checkpoint between reasoning and action, using modular validators aligned with IEEE Ethically Aligned Design.\n", "4. The **Impossibility Theorem** (pp. 337–338) proves that statistical parity, equal opportunity, and predictive parity cannot all hold at once when base rates differ, so the choice of regime is a normative decision.\n", "5. The **FairHiringAgent** (pp. 343–346) demonstrates a three-layer fairness architecture that detects bias (DI ≈ 0.73) and mitigates it through reweighting.\n", "\n", "### Exercises\n", "\n", "1. 
**Add a new principle checker**: Create a `PrivacyChecker` that detects data retention violations and integrate it into `EthicalReasoningAgent`.\n", "2. **Tune the fairness threshold**: Change the `fairness_threshold` to 0.9 and observe how the mitigation strategy changes.\n", "3. **Implement Regime 3**: Modify the `FairHiringAgent` to prioritize equal opportunity instead of demographic parity. How does this change the disparate impact ratio?\n", "4. **Proxy audit** (p.346): Compute mutual information between remaining features (skills, experience) and gender after anonymization. Are there hidden proxy signals?\n", "\n", "---\n", "*Author: Imran Ahmad — Packt Publishing, 2026*\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.15" } }, "nbformat": 4, "nbformat_minor": 5 }