🏛️ Production-Grade MCP Server + Agentic System

A reference implementation of an MCP server designed to actually ship

Multi-tenant · Authenticated · Observable · Rate-limited · Cached · Circuit-broken · Governed

📖 Full Step-by-Step Blog Walkthrough

This repository is the companion codebase for a long-form blog post that walks through every single component end to end, with every line of code explained in context. Start there if you want to understand the "why" behind the architecture before reading the code.

🔗 Building a Production-Grade MCP Server Architecture with Agentic System →

🎯 What This Is

Most MCP tutorials end with a `@tool` decorator that returns "hello world". That is fine for a demo. It is not what ships.

This repository is a reference implementation of an MCP server designed to run in production: multi-tenant, authenticated, observable, rate-limited, cached, circuit-broken, and governed. It exposes a company's heterogeneous data layer (Postgres, Elasticsearch, S3, vector DB) to AI agents as a single, secure tool surface, and ships with a four-agent support copilot (Planner → Retriever → Synthesizer → Critic) that uses it end to end.

The codebase is deliberately organised around twelve components that keep showing up on the 3 AM pager when teams skip them. Each one lives in its own module and can be read, replaced, or extended independently.

🏗️ Architecture Overview

The complete production-grade system: the MCP server dispatch pipeline on the right, the four-agent orchestrator on the left, the data plane on top, observability on the bottom, and identity and governance as cross-cutting concerns.

🧩 The 12 Components

| # | Component | Lives in | What it gives you |
|---|---|---|---|
| 1 | 🚪 Transport & Session Layer | `server.py` | stdio for local, Streamable HTTP for remote, horizontal-scale-friendly sessions |
| 2 | 🔐 Authentication Server | `auth/oauth.py` | OAuth 2.1 + PKCE, short-lived JWTs, JWKS validation |
| 3 | ⚖️ Authorization & Policy Engine | `auth/policy.py` | Tool-level RBAC, tenant-scoped ABAC, deny-by-default |
| 4 | 📚 Tool Registry & Discovery | `tools/registry.py` | Dynamic toolsets, `.well-known` capability metadata |
| 5 | ✅ Input Validation Layer | `validation/schemas.py` | Pydantic schemas, enum constraints, agent-adversarial input as the default threat model |
| 6 | 🔧 Tool Execution Engine | `tools/base.py` | Three-level hierarchy (atomic / composed / workflow) |
| 7 | 🔄 Circuit Breaker & Retry | `reliability/` | Closed → open → half-open states, Adaptive Timeout Budget Allocation (ATBA) |
| 8 | 🚦 Rate Limiting & Quotas | `ratelimit/limiter.py` | Redis token bucket (Lua-atomic), per-tenant and per-tool |
| 9 | ⚡ Caching Layer | `cache/manager.py` | Two-tier (L1 in-process, L2 Redis), stampede prevention |
| 10 | 🧱 Structured Error Framework | `errors/framework.py` | Machine-readable errors with `retryable` and `hint` fields |
| 11 | 🔭 Observability Stack | `observability/` | OpenTelemetry traces, Prometheus metrics, audit logs |
| 12 | 🛡️ Governance & Multi-Tenancy | `governance/` | Tenant isolation, approval gates, outbound HTTP allowlisting |
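Component 10 is about machine-readable errors that carry `retryable` and `hint` fields. One way to picture it is a small dataclass that serialises into a JSON-RPC 2.0 error object. This is a minimal sketch with assumptions. Only `retryable` and `hint` come from the table above. The class name `ToolError`, the `error_code` and `details` fields and the default `-32000` code are hypothetical and not taken from `errors/framework.py`.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolError:
    """Hypothetical structured tool error; only `retryable` and `hint` come from the README."""

    code: str               # stable machine-readable code, e.g. "UPSTREAM_TIMEOUT" (assumed)
    message: str            # short human-readable summary
    retryable: bool         # True if the same call may succeed when retried unchanged
    hint: str = ""          # what the agent should change before trying again
    details: dict[str, Any] = field(default_factory=dict)

    def to_jsonrpc_error(self, rpc_code: int = -32000) -> dict[str, Any]:
        # JSON-RPC 2.0 error object; -32000..-32099 is the implementation-defined server-error range.
        return {
            "code": rpc_code,
            "message": self.message,
            "data": {
                "error_code": self.code,
                "retryable": self.retryable,
                "hint": self.hint,
                "details": self.details,
            },
        }


def should_retry(error: dict[str, Any]) -> bool:
    # Retry only when the server explicitly flags the error as retryable.
    return bool(error.get("data", {}).get("retryable", False))


# Usage: a timed-out search that the agent may retry, with a hint to narrow the query.
err = ToolError("UPSTREAM_TIMEOUT", "Search backend timed out", retryable=True,
                hint="Retry with a narrower date range")
print(json.dumps(err.to_jsonrpc_error(), indent=2))
```

In this sketch the agent never has to parse the prose `message` to decide what to do next. `retryable` tells it whether to retry the call at all, and `hint` tells it what to change.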

📖 Diving Deeper, Section by Section

Each diagram below links back to the corresponding section in the blog, where every line of code is walked through in detail.

📦 Data Persistence Layer

Postgres + Row-Level Security · Tenant isolation at the DB layer

🚪 Transport & Session Layer

Dual transport · Stateless session · Middleware chain
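A middleware chain is a series of async functions, each wrapping the next handler. Each one can short-circuit the request or enrich its context before passing it inward. Below is a stdlib-only sketch of that idea. All names are hypothetical: `Request`, `auth_middleware`, `tenant_middleware`, `compose` and `dispatch`. The sketch also assumes a lower-cased header dict. The repo's web layer uses Starlette, whose middleware API differs from this.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


@dataclass
class Request:
    headers: dict[str, str]                 # assumed already lower-cased
    body: dict[str, Any]
    context: dict[str, Any] = field(default_factory=dict)


Handler = Callable[[Request], Awaitable[dict[str, Any]]]
Middleware = Callable[[Request, Handler], Awaitable[dict[str, Any]]]


async def auth_middleware(req: Request, call_next: Handler) -> dict[str, Any]:
    auth = req.headers.get("authorization", "")
    if not auth.startswith("Bearer "):
        return {"error": "unauthenticated"}          # short-circuit: inner layers never run
    req.context["token"] = auth.removeprefix("Bearer ")
    return await call_next(req)


async def tenant_middleware(req: Request, call_next: Handler) -> dict[str, Any]:
    tenant = req.headers.get("x-tenant-id")
    if not tenant:
        return {"error": "missing tenant"}
    req.context["tenant"] = tenant                  # pinned for every layer below
    return await call_next(req)


def _bind(mw: Middleware, nxt: Handler) -> Handler:
    async def wrapped(req: Request) -> dict[str, Any]:
        return await mw(req, nxt)
    return wrapped


def compose(middlewares: list[Middleware], handler: Handler) -> Handler:
    # Wrap inside-out so the first middleware in the list is the outermost layer.
    for mw in reversed(middlewares):
        handler = _bind(mw, handler)
    return handler


async def dispatch(req: Request) -> dict[str, Any]:
    return {"ok": True, "tenant": req.context["tenant"], "method": req.body.get("method")}


pipeline = compose([auth_middleware, tenant_middleware], dispatch)
print(asyncio.run(pipeline(Request({"authorization": "Bearer dev-token", "x-tenant-id": "acme"},
                                   {"method": "tools/call"}))))
```

Ordering matters here. Authentication sits outermost, so an unauthenticated request never reaches tenant resolution or the dispatcher.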

🔐 Authentication, Policy & Governance

OAuth 2.1 · YAML policies · Human-in-the-loop approvals
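The policy semantics that `tests/test_policy.py` checks can be sketched in a few lines: default-deny, deny-beats-allow and glob matching. This sketch makes several assumptions. The rules are Python objects here rather than being parsed from `config/policy.yaml`. The field names (`effect`, `roles`, `tools`), the `is_allowed` function and the example tool names are all hypothetical. Glob matching uses the standard library's `fnmatch.fnmatchcase`.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    effect: str               # "allow" or "deny"
    roles: tuple[str, ...]    # role globs; "*" matches any role
    tools: tuple[str, ...]    # tool-name globs, e.g. "tickets.*"


def _matches(rule: Rule, role: str, tool: str) -> bool:
    role_ok = any(fnmatchcase(role, pattern) for pattern in rule.roles)
    tool_ok = any(fnmatchcase(tool, pattern) for pattern in rule.tools)
    return role_ok and tool_ok


def is_allowed(rules: list[Rule], role: str, tool: str) -> bool:
    matching = [r for r in rules if _matches(r, role, tool)]
    if any(r.effect == "deny" for r in matching):
        return False                                   # deny beats allow, regardless of rule order
    return any(r.effect == "allow" for r in matching)  # nothing matched => default deny


# Hypothetical policy: support agents read tickets and orders, nobody deletes, admins do the rest.
RULES = [
    Rule("allow", ("support_agent",), ("tickets.*", "orders.get")),
    Rule("deny", ("*",), ("*.delete",)),
    Rule("allow", ("admin",), ("*",)),
]
```

Because the final `any` over allow rules returns `False` when nothing matches, a newly registered tool stays unreachable until some rule explicitly allows it.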

🔧 Tool Execution Engine

Three-level hierarchy · Atomic · Composed · Workflow
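The three-level hierarchy can be pictured as an abstract base class with a `level` attribute. In that picture, a composed tool is a fixed, code-defined chain of atomic tools rather than something an LLM assembles at runtime. This is a sketch only. All class names, tool names and in-memory data here are hypothetical: `Tool`, `GetOrder`, `ListRefunds`, `OrderWithRefunds`, `orders.get` and so on. They stand in for the real backends behind `tools/base.py`, and Level 3 is not shown.

```python
import asyncio
from abc import ABC, abstractmethod
from enum import Enum
from typing import Any


class Level(Enum):
    ATOMIC = 1      # Level 1: exactly one backend call
    COMPOSED = 2    # Level 2: deterministic chain of atomic tools
    WORKFLOW = 3    # Level 3: multi-step procedure (not shown here)


class Tool(ABC):
    name: str
    level: Level

    @abstractmethod
    async def run(self, args: dict[str, Any]) -> dict[str, Any]: ...


# In-memory stand-ins for backend data (hypothetical).
_ORDERS = {"o_1": {"order_id": "o_1", "status": "refund_pending"}}
_REFUNDS = {"o_1": [{"amount": 42.0, "state": "held_for_review"}]}


class GetOrder(Tool):
    name, level = "orders.get", Level.ATOMIC

    async def run(self, args: dict[str, Any]) -> dict[str, Any]:
        return dict(_ORDERS[args["order_id"]])


class ListRefunds(Tool):
    name, level = "refunds.list", Level.ATOMIC

    async def run(self, args: dict[str, Any]) -> dict[str, Any]:
        return {"refunds": _REFUNDS.get(args["order_id"], [])}


class OrderWithRefunds(Tool):
    name, level = "orders.with_refunds", Level.COMPOSED

    def __init__(self) -> None:
        self._steps: list[Tool] = [GetOrder(), ListRefunds()]

    async def run(self, args: dict[str, Any]) -> dict[str, Any]:
        # Fixed composition: the steps are independent, so they run concurrently and are merged.
        results = await asyncio.gather(*(step.run(args) for step in self._steps))
        merged: dict[str, Any] = {}
        for part in results:
            merged.update(part)
        return merged


print(asyncio.run(OrderWithRefunds().run({"order_id": "o_1"})))
```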

🔄 Reliability Layer

Circuit breakers · Retry with jitter · ATBA budget allocator
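The closed → open → half-open state machine can be sketched as follows. This sketch rests on assumptions. The thresholds, the injectable `clock` and all names are hypothetical rather than taken from `reliability/circuit_breaker.py`; the `clock` exists so tests can move time without sleeping. The `backoff_delay` helper shows "full jitter" exponential backoff, which is one common way to implement retry with jitter. The repo itself lists tenacity for retries.

```python
import random
import time
from typing import Callable


class CircuitOpenError(Exception):
    """Raised instead of calling the backend while the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[[], object]) -> object:
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            self.state = "half_open"      # cool-down elapsed: let the next call through as a trial
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self) -> None:
        self.state, self.failures = "closed", 0

    def _on_failure(self) -> None:
        self.failures += 1
        # A failed half-open trial re-opens at once; while closed, wait for the threshold.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state, self.opened_at = "open", self.clock()


def backoff_delay(attempt: int, base: float = 0.1, cap: float = 5.0) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

In this sketch every exception counts as a failure. The README's `tests/test_circuit_breaker.py` also covers classifying errors as retryable or deterministic, which would decide what actually trips the breaker.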

⚡ Rate Limiting & Caching

Redis token bucket · Two-tier cache · Stampede lock
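A token bucket refills at a fixed rate up to a capacity and admits a request only if a token is available. That allows short bursts while enforcing an average rate. The repo performs this atomically inside Redis via a Lua script. The sketch below is a single-process, single-threaded stand-in for that, with hypothetical names (`TokenBucket`, `Limiter`) and an injectable clock. Keying buckets by `(tenant, tool)` mirrors the per-tenant and per-tool limits described for Component 8.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock
        self.tokens = capacity        # start full so an initial burst is allowed
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class Limiter:
    """One bucket per (tenant, tool) key, created lazily on first use."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets: dict[tuple[str, str], TokenBucket] = {}

    def allow(self, tenant: str, tool: str) -> bool:
        key = (tenant, tool)
        if key not in self.buckets:
            self.buckets[key] = TokenBucket(self.rate, self.capacity, self.clock)
        return self.buckets[key].allow()
```

With several server replicas, the read-refill-decrement step above would race, because each replica holds its own copy of the bucket. That is why the repo moves this step into a single Lua script executed by Redis.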

🔭 Observability Stack

OpenTelemetry · Prometheus · Audit logs · One trace ID
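"One trace ID" means every log line and audit record produced while serving a request carries the same identifier, without it being threaded through every function call. A stdlib-only sketch of that idea uses a `contextvars.ContextVar` and JSONL output. The names `current_trace_id`, `start_request` and `audit`, and the record shape, are hypothetical. The repo itself relies on OpenTelemetry span context and structlog, which this sketch does not reproduce.

```python
import contextvars
import io
import json
import time
import uuid
from typing import Any, TextIO

# Holds the trace ID for whichever request the current task/coroutine is serving.
current_trace_id = contextvars.ContextVar("current_trace_id", default="")


def start_request() -> str:
    # 32 lowercase hex chars: the same width as a W3C Trace Context / OpenTelemetry trace ID.
    trace_id = uuid.uuid4().hex
    current_trace_id.set(trace_id)
    return trace_id


def audit(sink: TextIO, event: str, **fields: Any) -> dict[str, Any]:
    record = {"ts": time.time(), "trace_id": current_trace_id.get(), "event": event, **fields}
    sink.write(json.dumps(record, sort_keys=True) + "\n")   # one JSON object per line (JSONL)
    return record


# Usage: both records share the trace ID without it being passed explicitly.
log = io.StringIO()
start_request()
audit(log, "tool.call", tool="orders.get", tenant="acme")
audit(log, "tool.result", tool="orders.get", ok=True)
print(log.getvalue())
```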

🤖 Multi-Agentic Architecture

Four-agent design · Planner · Retriever · Synthesizer · Critic

🎼 The Orchestrator Flow

End-to-end agent orchestration with one bounded revise loop
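The orchestration flow is plan, retrieve, synthesize and critique, with at most one revision. It can be sketched with the four agents passed in as plain callables. Everything in the sketch is hypothetical: the function signatures, `Verdict`, `RunResult` and the `trace` list. The real agents call Claude and the MCP server, whereas a test can pass stubs to make the control flow visible. `MAX_REVISIONS = 1` encodes the single bounded revise loop.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Verdict:
    approved: bool
    feedback: str = ""


@dataclass
class RunResult:
    draft: str
    approved: bool
    revisions: int
    trace: list[str] = field(default_factory=list)


MAX_REVISIONS = 1  # the critic may send a draft back at most once


def orchestrate(
    question: str,
    plan: Callable[[str], list[str]],
    retrieve: Callable[[list[str]], list[str]],
    synthesize: Callable[[str, list[str], str], str],
    critique: Callable[[str, list[str]], Verdict],
) -> RunResult:
    trace: list[str] = []
    steps = plan(question)
    trace.append("plan")
    evidence = retrieve(steps)
    trace.append("retrieve")
    draft = synthesize(question, evidence, "")
    trace.append("synthesize")
    revisions = 0
    while True:
        verdict = critique(draft, evidence)
        trace.append("critique")
        if verdict.approved or revisions >= MAX_REVISIONS:
            break  # approved, or the revision budget is spent: stop either way
        revisions += 1
        draft = synthesize(question, evidence, verdict.feedback)
        trace.append("revise")
    return RunResult(draft, verdict.approved, revisions, trace)
```

The bound guarantees termination: the critic runs at most twice per question, even if it never approves. In that case `approved` stays `False`, so the caller can flag the draft for review instead of looping.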

🚀 Quick Start

Prerequisites

Docker & Docker Compose
Python 3.11+ (only for running the CLI locally)
An Anthropic API key (for the agent layer)

1. Clone and Configure

```bash
git clone https://github.com/FareedKhan-dev/production-grade-mcp-agentic-system.git
cd production-grade-mcp-agentic-system
cp .env.example .env
```

Edit .env and set at minimum:

`ANTHROPIC_API_KEY`: for the agent layer
`ATLAS_AUTH_JWKS_URL`: your OAuth 2.1 provider's JWKS endpoint (or leave the default for local development)

2. Bring Up the Stack

```bash
docker compose up -d
```

That brings up the full local environment:

| Service | URL | What it is |
|---|---|---|
| 🏛️ MCP Server | `http://localhost:8080/mcp` | Streamable HTTP endpoint |
| 🔍 Discovery | `http://localhost:8080/.well-known/mcp-server` | Unauthenticated capability metadata |
| 📊 Metrics | `http://localhost:8080/metrics` | Prometheus scrape target |
| ❤️ Health | `http://localhost:8080/healthz` | Liveness probe |
| 🔭 Jaeger | `http://localhost:16686` | Distributed tracing UI |
| 📈 Grafana | `http://localhost:3000` | Metrics dashboards (admin / admin) |
| 🗄️ MinIO Console | `http://localhost:9001` | S3-compatible storage UI |

3. Run the Support Copilot CLI

```bash
pip install -e .

export ATLAS_MCP_URL=http://localhost:8080
export ATLAS_MCP_TOKEN=dev-token
export ATLAS_TENANT=acme
export ANTHROPIC_API_KEY=sk-ant-...

atlas-copilot "Why was the refund on order o_9002 for CUST-1001 delayed?"
```

You will see the four agents run end to end, followed by the final draft with [S1][S2] citations and a trace summary listing token counts, tool calls, and the `run_id` that ties the run back to Jaeger.

4. Connect from Claude Desktop / Cursor

Add this to your MCP host config:

```json
{
  "mcpServers": {
    "production-mcp": {
      "type": "http",
      "url": "http://localhost:8080/mcp",
      "headers": {
        "Authorization": "Bearer ${ATLAS_MCP_TOKEN}",
        "X-Tenant-Id": "acme"
      }
    }
  }
}
```
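Under the hood, a host talks to `/mcp` by POSTing JSON-RPC 2.0 messages. The sketch below builds a `tools/call` request with the same `Authorization` and `X-Tenant-Id` headers as the config above, using only `urllib.request`. It does not send the request. Several assumptions apply. The tool name `search_tickets` and its arguments are hypothetical. A real MCP client performs the `initialize` handshake first, and may need to echo back an `Mcp-Session-Id` header; both steps are omitted here. The `Accept` header lists both JSON and SSE because Streamable HTTP clients must accept either response type.

```python
import json
import os
import urllib.request
from typing import Any


def build_tool_call(base_url: str, token: str, tenant: str, tool: str,
                    arguments: dict[str, Any], request_id: int = 1) -> urllib.request.Request:
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/mcp",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # The server may answer with a single JSON body or an SSE stream.
            "Accept": "application/json, text/event-stream",
            "Authorization": f"Bearer {token}",
            "X-Tenant-Id": tenant,
        },
    )


req = build_tool_call(
    os.environ.get("ATLAS_MCP_URL", "http://localhost:8080"),
    os.environ.get("ATLAS_MCP_TOKEN", "dev-token"),
    os.environ.get("ATLAS_TENANT", "acme"),
    tool="search_tickets",                       # hypothetical tool name
    arguments={"customer_id": "CUST-1001"},
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```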

📂 Repository Structure

```text
.
├── 📄 README.md
├── 🐳 docker-compose.yml          # Full local stack: app + data + observability
├── 🐳 Dockerfile                  # Two-stage build, non-root runtime
├── 📜 LICENSE
├── 📦 pyproject.toml              # Dependencies, dev tools, CLI entry points
├── ⚙️  .env.example                # Every setting documented by component
│
├── 🔧 config/                     # Runtime configuration (hot-reloadable)
│   ├── http_allowlist.yaml       # Per-tenant outbound HTTP allowlist
│   └── policy.yaml               # YAML-driven authorization policies
│
├── 🚢 deploy/                     # Deployment sidecar configs
│   ├── otel/config.yaml          # OpenTelemetry Collector pipeline
│   ├── prometheus/prometheus.yml # Prometheus scrape targets
│   └── sql/init.sql              # Schema + RLS policies + seed data
│
├── 📚 docs/                       # Deep-dive documentation
│   ├── AGENT_SYSTEM.md           # Multi-agent orchestrator internals
│   ├── ARCHITECTURE.md           # The 12 components in detail
│   └── DEPLOYMENT.md             # K8s, Cloudflare Workers, bare-metal
│
├── 🧠 src/atlas_mcp/              # Main application source
│   ├── config.py                 # Centralized typed settings
│   ├── server.py                 # ⚡ Component 1: Transport & dispatch
│   │
│   ├── 🤖 agents/                 # Four-agent support copilot
│   │   ├── planner.py            # Emits retrieval plan JSON
│   │   ├── retriever.py          # Bounded tool-calling loop
│   │   ├── synthesizer.py        # Drafts reply with citations
│   │   ├── critic.py             # Approves or sends one revise
│   │   ├── orchestrator.py       # Wires the four agents together
│   │   ├── mcp_client.py         # Thin JSON-RPC MCP client
│   │   ├── memory.py             # STM (Redis) + LTM (vector)
│   │   └── cli.py                # atlas-copilot CLI entry point
│   │
│   ├── 🔐 auth/                   # Components 2 + 3
│   │   ├── oauth.py              # JWT + JWKS validation
│   │   ├── middleware.py         # Bearer token extraction
│   │   └── policy.py             # YAML-driven policy engine
│   │
│   ├── 🛡️  governance/             # Component 12
│   │   ├── tenant.py             # Tenant pinning middleware
│   │   └── approval.py           # Human-in-the-loop gate
│   │
│   ├── 🔧 tools/                  # Components 4 + 6
│   │   ├── registry.py           # In-memory tool index + discovery
│   │   ├── base.py               # Tool abstract base + metadata
│   │   ├── atomic/               # Level 1: one backend each
│   │   ├── composed/             # Level 2: deterministic chains
│   │   └── workflow/             # Level 3: multi-step procedures
│   │
│   ├── 🔄 reliability/            # Component 7
│   │   ├── circuit_breaker.py    # 3-state machine per tool
│   │   ├── retry.py              # Exponential backoff + jitter
│   │   └── atba.py               # Adaptive Timeout Budget Allocation
│   │
│   ├── 🚦 ratelimit/              # Component 8
│   │   └── limiter.py            # Redis token bucket (Lua-atomic)
│   │
│   ├── ⚡ cache/                   # Component 9
│   │   └── manager.py            # L1 + L2 cache with stampede lock
│   │
│   ├── 🧱 errors/                 # Component 10
│   │   └── framework.py          # Structured Error Recovery Framework (SERF)
│   │
│   ├── 🔭 observability/          # Component 11
│   │   ├── tracing.py            # OpenTelemetry spans
│   │   ├── metrics.py            # Prometheus instruments
│   │   └── audit.py              # Structured JSONL audit log
│   │
│   └── ✅ validation/             # Component 5
│       └── schemas.py            # Tool call envelope
│
└── 🧪 tests/                      # Narrow tests, load-bearing properties
    ├── test_circuit_breaker.py   # State machine transitions
    ├── test_errors.py            # SERF wire format + retry semantics
    └── test_policy.py            # Deny-beats-allow + default-deny
```

🎨 Tech Stack

| Layer | Technology |
|---|---|
| Language | Python 3.11+ |
| Web framework | Starlette + Uvicorn |
| MCP SDK | `mcp>=1.2.0` |
| Auth | PyJWT + Authlib (OAuth 2.1 resource server) |
| Validation | Pydantic v2 + Pydantic Settings |
| Database | asyncpg (PostgreSQL 16 with RLS) |
| Search | Elasticsearch 8 (async client) |
| Vector DB | Qdrant |
| Object storage | aioboto3 (MinIO / S3) |
| Cache + queues | Redis 7 (`redis[hiredis]`) |
| Reliability | tenacity (retries) + custom breaker + custom ATBA |
| Tracing | OpenTelemetry SDK + OTLP exporter |
| Metrics | prometheus_client |
| Logging | structlog (JSON) |
| LLM | Anthropic Messages API (Claude) |

🧪 Testing

The test suite is deliberately narrow and covers three load-bearing safety properties, one per file. Run it with:

```bash
pip install -e ".[dev]"
pytest -v
```

test_circuit_breaker.py — state machine transitions, retryable vs deterministic error classification
test_errors.py — SERF wire format, retry semantics, MCP-level error data
test_policy.py — default-deny, deny-beats-allow, glob matching, PII condition blocking

🛣️ Production Deployment

To run this in production (managed Postgres, a real OAuth provider, SIEM integration, Kubernetes), see docs/DEPLOYMENT.md.

Key swaps between local dev and production:

| Local (docker-compose) | Production |
|---|---|
| Dev JWT issuer | WorkOS AuthKit / Auth0 / Keycloak |
| MinIO | AWS S3 / GCS / Azure Blob |
| Local Postgres | AWS RDS / Cloud SQL / Supabase |
| Redis container | Upstash / ElastiCache / MemoryDB |
| Local OTel collector | Datadog / Honeycomb / Grafana Cloud |
| File-based audit log | Splunk / Chronicle / SIEM of choice |

📚 Documentation

📖 Blog Walkthrough — Building a Production-Grade MCP Server (recommended starting point)
🏗️ docs/ARCHITECTURE.md — The 12 components in depth
🤖 docs/AGENT_SYSTEM.md — Multi-agent orchestrator internals
🚢 docs/DEPLOYMENT.md — Production deployment options

📜 License

MIT. See LICENSE.

⭐ If this helped you, please consider starring the repo

Built with ☕ and a lot of 3 AM debugging

📖 Read the full blog walkthrough · 🐛 Report an issue · 💬 Start a discussion