Offensive Security Research Config for Claude Code
A spec-driven offensive security framework for Claude Code — structured engagement workflows based on the Cyber Kill Chain, 31 kill-chain skills (multi-file progressive-disclosure) plus a discipline layer (a SessionStart dispatcher + 6 process/discipline skills), 8 collaborative agents, and a shared 47-file vulnerability reference library. Inspired by GitHub's spec-kit, obra/superpowers, and gadievron/raptor (crash→exploitability + OSS-repo forensics).
Quick Setup
# Method 0: Install as a Claude Code plugin (recommended — auto-loads the skill dispatcher)
/plugin marketplace add hypnguyen1209/offensive-claude
/plugin install offensive-claude@offensive-claude-marketplace
Installing as a plugin registers a SessionStart hook that injects the
using-offensive-claude dispatcher into every conversation, so the skill-invocation discipline
(scope → finding → OPSEC) is active from the first message.
# Method 1: One-liner install (recommended)
curl -sL https://raw.githubusercontent.com/hypnguyen1209/offensive-claude/main/install.sh | bash
# Method 2: Clone + install script
git clone https://github.com/hypnguyen1209/offensive-claude.git ~/offensive-claude
cd ~/offensive-claude && bash install.sh
# Method 3: Manual copy
git clone https://github.com/hypnguyen1209/offensive-claude.git ~/offensive-claude
cp -r ~/offensive-claude/skills ~/.claude/skills
cp -r ~/offensive-claude/agents ~/.claude/agents
cp -r ~/offensive-claude/templates ~/.claude/templates
cp -r ~/offensive-claude/workflows ~/.claude/workflows
cp -r ~/offensive-claude/commands ~/.claude/commands
cp -r ~/offensive-claude/presets ~/.claude/presets
cp ~/offensive-claude/CLAUDE.md ~/.claude/CLAUDE.md
Skills and agents activate automatically; no additional configuration is needed.
Engagement Workflow
Engagements follow the Cyber Kill Chain as a structured 9-phase pipeline with quality gates:
Phase 0: SCOPE → Phase 1: RECON → Phase 2: WEAPONIZE → Phase 3: DELIVERY → Phase 4: EXPLOIT → Phase 5: INSTALLATION → Phase 6: C2 → Phase 7: ACTIONS ON OBJECTIVES → Phase 8: REPORT
Quick Start — Web App Pentest
/engage.init web-app --client ACME
/engage.scope # Define targets, ROE, authorization
/engage.recon # Subdomain enum, port scan, tech fingerprint
/engage.weaponize # Select exploits, design payloads
/engage.exploit # Execute exploits, document findings
/engage.report # Generate technical report + executive summary
Orchestration Commands
| Command | Phase | Action |
|---|---|---|
| /engage.init <preset> | - | Initialize engagement with workflow preset |
| /engage.scope | 0 | Define targets, ROE, authorization |
| /engage.recon | 1 | Passive/active reconnaissance |
| /engage.weaponize | 2 | Payload development, exploit design |
| /engage.deliver | 3 | Delivery vector execution |
| /engage.exploit | 4 | Exploitation, finding documentation |
| /engage.install | 5 | Persistence establishment |
| /engage.c2 | 6 | C2 infrastructure setup |
| /engage.actions | 7 | Objectives execution, lateral movement |
| /engage.report | 8 | Report generation |
| /engage.status | - | Show pipeline status and progress |
| /engage.gate | - | Validate current phase gate |
| /engage.crash | 4 | Crash → root cause (rr) → reachability (gcov/trace) → empirical exploitability verdict |
| /engage.cvediff | 2,4 | Find a CVE's canonical fix commit(s) across sources, then scope-gated diff for root cause |
| /engage.scorecard | - | Calibrate model verdict trust (Wilson-bounded miss-rate) to short-circuit re-validation |
| /engage.threatmodel | 1 | Materialize / lint / drift-check the engagement threat model |
| /engage.memory | - | Recall prior patterns / record confirmed findings (cross-engagement learning) |
| /engage.pickup | - | Resume an engagement from the engine trace (skip completed steps) |
Workflow Presets
| Preset | Phases | Use Case |
|---|---|---|
| web-app | 0,1,2,3,4,8 | OWASP-focused web application assessment |
| network | 0,1,2,4,5,6,7,8 | Internal network penetration test |
| red-team | ALL (0-8) | Full adversary simulation |
| cloud | 0,1,4,8 | AWS/Azure/GCP security audit |
| mobile | 0,1,2,4,8 | Android/iOS application pentest |
| ad-domain | 0,1,2,4,5,7,8 | Active Directory domain assessment |
| bug-bounty | 0,1,4,8 | Bug bounty vulnerability hunting |
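The preset table can be read as a lookup from preset name to an ordered phase list. The sketch below is only an illustration of that reading, not the repository's workflow engine: the phase lists are copied from the table, while `PRESETS`, `PHASES`, and `next_phase` are hypothetical names.

```python
# Phase numbers and names as listed in the Engagement Workflow section.
PHASES = {
    0: "SCOPE", 1: "RECON", 2: "WEAPONIZE", 3: "DELIVERY", 4: "EXPLOIT",
    5: "INSTALLATION", 6: "C2", 7: "ACTIONS ON OBJECTIVES", 8: "REPORT",
}

# Phase lists copied from the preset table (the dict itself is hypothetical).
PRESETS = {
    "web-app": [0, 1, 2, 3, 4, 8],
    "network": [0, 1, 2, 4, 5, 6, 7, 8],
    "red-team": list(range(9)),
    "cloud": [0, 1, 4, 8],
    "mobile": [0, 1, 2, 4, 8],
    "ad-domain": [0, 1, 2, 4, 5, 7, 8],
    "bug-bounty": [0, 1, 4, 8],
}


def next_phase(preset, completed):
    """Return the first phase of the preset that is not completed, or None when done."""
    for phase in PRESETS[preset]:
        if phase not in completed:
            return phase
    return None
```

For example, `next_phase("cloud", {0, 1})` returns `4`, because the cloud preset skips phases 2 and 3.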
Quality Gates
Each phase transition validates:
- Required artifacts exist (templates filled)
- Findings have mandatory fields (CWE, CVSS, evidence, ATT&CK ID)
- Gate PASS → suggests next phase + relevant skills
- Gate FAIL → lists missing items
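The gate checks above amount to a completeness test over artifacts and finding records. A minimal sketch follows, assuming findings are plain dictionaries; the key names (`cwe`, `cvss`, `evidence`, `attack_id`) and the `check_gate` function are hypothetical and may not match the framework's actual schema.

```python
# Hypothetical key names for the mandatory fields named in the list above.
REQUIRED_FIELDS = ("cwe", "cvss", "evidence", "attack_id")


def check_gate(findings, artifacts_present):
    """Return (passed, missing) for a phase transition.

    findings: list of dicts, one per finding.
    artifacts_present: dict mapping artifact name -> bool (template filled or not).
    """
    missing = [
        f"artifact: {name}"
        for name, present in sorted(artifacts_present.items())
        if not present
    ]
    for index, finding in enumerate(findings):
        for field in REQUIRED_FIELDS:
            # Treat absent and empty-string values as missing; 0.0 is a valid CVSS score.
            if finding.get(field) in (None, ""):
                missing.append(f"finding[{index}]: {field}")
    return (not missing, missing)
```

A gate run with one incomplete finding would then report `(False, ["finding[1]: evidence"])`, which corresponds to the "Gate FAIL → lists missing items" behavior.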
Structure
.
├── skills/ # 31 skill modules (progressive-disclosure layout)
│ ├── recon-osint/
│ │ ├── SKILL.md # thin router: when-to-activate + technique map + OPSEC/detection
│ │ ├── references/ # per-skill technique deep-dives (theory + code + detection + OPSEC)
│ │ └── scripts/ # runnable tooling backing each technique
│ ├── coding-mastery/scripts/_lib/ # shared safety libs: scope_guard, action_guard, http_creds, redact_headers
│ ├── engagement-memory/ # cross-engagement pattern-learning memory (support skill)
│ ├── using-offensive-claude/ # SessionStart DISPATCHER — skill-invocation discipline
│ ├── engagement-flow/ # process skills: sequence the kill chain,
│ ├── scope-discipline/ # no target without authorization,
│ ├── threat-model-discipline/ # model the attack surface + detect drift before exploiting,
│ ├── finding-discipline/ # no [CONFIRMED] without proof,
│ ├── opsec-discipline/ # detection/cleanup/redaction before acting,
│ ├── writing-offensive-skills/ # authoring conventions
│ ├── exploit-development/
│ ├── ...
│ └── references/ # shared 47-file vulnerability pattern library
├── .claude-plugin/ # plugin.json + marketplace.json (install as a Claude Code plugin)
├── hooks/ # SessionStart hook that injects the dispatcher every session
├── .devcontainer/ # reproducible binary-analysis toolchain (rr/gdb/gcov/afl++) for the
│ # crash→exploitability pipeline; scoped SYS_PTRACE/SYS_PERFMON, not --privileged
├── agents/ # 8 collaborative sub-agents (incl. finding-validator, finding-checker)
├── engine/ # bounded, resumable, traceable autopilot runner
│ ├── engine.py # phase runner (budget + loop-detect + trace + resume; not an LLM)
│ ├── budget.py loop_detector.py tracer.py
│ ├── rebuttal.py # bounded generator↔checker rebuttal loop (default-to-skeptic)
│ └── model_scorecard.py # Wilson-bounded, fail-closed model-verdict trust calibration
├── tests/ # pytest suite for the safety-critical scripts (run: pytest)
├── templates/ # Structured templates per Kill Chain phase
│ ├── scope/ # scope-definition + scope.schema.json/example (machine-readable ROE)
│ ├── threat-model/ # threat model (assets/entry-points/boundaries/ATT&CK) + drift baseline
│ └── ... (recon, weaponize, delivery, exploit, install, c2, actions, report)
├── workflows/ # Kill Chain workflow definitions (YAML) + WORKFLOW-ENGINE.md
├── commands/ # /engage.* orchestration slash commands (incl. memory, pickup)
├── presets/ # Engagement type presets (7 presets)
├── .github/ # SECURITY.md (coordinated disclosure) + CI (workflows/tests.yml)
├── TERMS.md # Acceptable-use policy / authorization requirement
├── CLAUDE.md # System prompt & behavior config
├── settings.json # Claude Code settings, permissions, MCP servers
├── install.sh # One-liner install script
└── README.md
Safety, Testing & Autonomy
The framework's safety controls are executable, not prose, and covered by an automated test suite:
| Control | What it does |
|---|---|
| scope_guard.py | Enforces the engagement scope (scope.json); host parsing matches HTTP clients (userinfo/IPv6/IDN safe), fails closed |
| validate_findings.py | Evidence-grounding + per-class FP harness via structured proof signals; native-bug reachability bar (gcov/trace) + [EVD-XXX] citation gate (evidence_kit.py re-verifiable evidence) |
| safe_subprocess.py | Hardened exec for untrusted inputs/repos: shell=False, clean env, bounded+fail-closed, UTF-8 decode, git_safe() (hooks/prompt/host-config/ext-transport disabled) |
| action_guard.py | 3-state gate (allow / require_approval / block): out-of-scope → block, safe-method policy, per-host circuit breaker |
| redact_headers.py | Masks Authorization/Cookie/API-key/JWT at the data boundary (fail-closed) before traffic reaches the model |
| finding-checker + engine/rebuttal.py | Blind artifact-only adversarial checker driving a bounded generator↔checker rebuttal loop (default-to-skeptic; EXHAUSTED/STALLED never accept) |
| engine/model_scorecard.py | Fail-closed model-verdict trust calibration (Wilson 95% upper-bound miss-rate) to short-circuit re-validation only on a proven track record |
| engagement-memory/ | Persists confirmed findings as impact-ranked patterns; recalls top-N prior techniques at recon/weaponize |
| engine/ | Bounded autopilot: hard step/time budget, loop detection, append-only trace, --resume; offensive actions stay operator-gated |
| tests/ + CI | pytest suite (run pytest); GitHub Actions runs it + byte-compile (skills/ + engine/) + shellcheck on every push |
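The scorecard row relies on a Wilson upper bound for the miss rate. As a rough illustration of why that bound fails closed on small samples, here is a sketch of the standard Wilson score upper bound; it is not the code in engine/model_scorecard.py, and the function names and the 5% threshold are assumptions.

```python
import math


def wilson_upper(misses, trials, z=1.96):
    """Upper end of the Wilson score interval for the miss rate (z=1.96: two-sided 95%)."""
    if trials <= 0:
        return 1.0  # no track record: assume the worst (fail closed)
    p = misses / trials
    z2 = z * z
    centre = p + z2 / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials))
    return min(1.0, (centre + margin) / (1 + z2 / trials))


def may_skip_revalidation(misses, trials, max_miss_rate=0.05):
    """Hypothetical policy: skip re-validation only if the upper bound is under the threshold."""
    return wilson_upper(misses, trials) <= max_miss_rate
```

With zero misses in 10 trials the upper bound is still about 0.28, so re-validation continues; with zero misses in 100 trials it drops to about 0.037, which passes a 5% threshold.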
All safety code is adversarially red-teamed and regression-tested. See TERMS.md for the
authorization requirement — every request the toolkit sends is the operator's responsibility.
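To make the 3-state gate concrete, the sketch below shows one way an allow / require_approval / block decision could be derived from a URL, a method, and an allow-list. It is a simplified assumption, not the logic of scope_guard.py or action_guard.py: it ignores IDN handling and the per-host circuit breaker, and `decide`, `SAFE_METHODS`, and the example hosts are hypothetical.

```python
from urllib.parse import urlsplit

# Assumed safe-method policy: read-only HTTP methods.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}


def decide(url, method, allowed_hosts):
    """Return 'allow', 'require_approval', or 'block' for a planned request."""
    try:
        # .hostname drops userinfo and port and lowercases the host.
        host = urlsplit(url).hostname
    except ValueError:
        return "block"  # unparseable URL: fail closed
    if not host or host not in allowed_hosts:
        return "block"  # out of scope or no host at all
    if method.upper() in SAFE_METHODS:
        return "allow"
    return "require_approval"  # in scope but state-changing: the operator decides
```

A URL such as `https://app.example.test@evil.example/` resolves to the host `evil.example`, so it is blocked even though the in-scope name appears in the userinfo part.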
Crash → Exploitability Pipeline
For native memory-corruption work, a staged-proof pipeline turns "it crashes" into a defensible,
artifact-backed exploitability verdict. Run it via /engage.crash in the .devcontainer/ toolchain:
- Root cause: rr deterministic record/replay reverse-steps to the corrupting write (rr_root_cause.sh, emits a trace_proof).
- Reachability: gcov line-hit / function trace proves the vulnerable line actually ran; the harness will not mark a native bug [CONFIRMED] without a coverage_proof / trace_proof.
- Empirical feasibility: rebuild the crash witness under permissive/distro/hardened/asan profiles and record which still fire (feasibility_profile.py); exploit_context.py then forbids /exploit from using a technique the empirical mitigation map marks blocked.
- Path feasibility: branch guards → tri-state SAT/UNSAT (path_conditions.py, Z3 optional; a tool limit is null/manual, never a false "infeasible").
Supporting tools: evidence_kit.py (typed, re-verifiable [EVD-XXX] evidence), variant_hunt.py
(one finding → all siblings, clustered by root cause), cve_diff.py (multi-source fix-commit discovery
→ scope-gated diff), and the incident-response repo-compromise forensics kit.
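The tri-state rule in the path-feasibility step can be sketched as a mapping where anything other than a definite solver answer stays undecided. This is an illustrative assumption, not the contents of path_conditions.py; the status strings mirror the usual sat / unsat / unknown solver vocabulary, and the function names are hypothetical.

```python
from typing import Optional


def path_feasible(solver_status: str) -> Optional[bool]:
    """Map a solver status to True (SAT), False (UNSAT), or None (undecided)."""
    status = solver_status.strip().lower()
    if status == "sat":
        return True
    if status == "unsat":
        return False
    # Timeout, "unknown", or an unsupported construct: a tool limit, not a proof.
    return None


def describe(verdict: Optional[bool]) -> str:
    """Render the tri-state verdict; None is routed to manual review."""
    if verdict is None:
        return "undecided: manual review"
    return "feasible" if verdict else "infeasible"
```

Keeping `None` distinct from `False` is the point of the design: a solver timeout is reported as "undecided: manual review" and never as "infeasible".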
Skill-Invocation Discipline (dispatcher + process skills)
Installed as a plugin, a SessionStart hook injects the using-offensive-claude dispatcher into
every conversation: if there's even a 1% chance a skill applies, invoke it before acting. Process /
discipline skills come before domain skills (the offensive analog of brainstorming / TDD / debugging):
| Process skill | Rule | Backed by |
|---|---|---|
| engagement-flow | Sequence the kill chain with quality gates | /engage.*, engine/ |
| scope-discipline | No target without authorization | scope_guard.py, action_guard.py |
| threat-model-discipline | Model the attack surface + detect drift before exploiting | threatmodel_lint.py, /engage.threatmodel |
| finding-discipline | No [CONFIRMED] without proof | validate_findings.py, finding-validator, finding-checker |
| opsec-discipline | Decide detection / cleanup / redaction before acting | redact_headers.py |
| writing-offensive-skills | Conventions for authoring skills in this repo | - |
Each discipline skill carries an Iron Law + Red-Flags + Rationalizations table (resists shortcutting
under pressure). The dispatcher auto-loads; domain skills below are invoked via the Skill tool.
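The opsec-discipline row is backed by redact_headers.py, which masks credentials before traffic reaches the model. The sketch below illustrates that idea under stated assumptions: the header-name set, the JWT pattern, and the `[REDACTED]` mask are hypothetical choices, not the script's actual behavior.

```python
import re

# Assumed set of header names whose values are always masked.
SENSITIVE_NAMES = {
    "authorization", "proxy-authorization", "cookie", "set-cookie", "x-api-key",
}
# Rough JWT shape: three base64url segments, the first starting with "eyJ".
JWT_RE = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*")
MASK = "[REDACTED]"


def redact_headers(headers):
    """Return a copy of headers with credentials masked."""
    redacted = {}
    for name, value in headers.items():
        if not isinstance(name, str) or not isinstance(value, str):
            # Unexpected types cannot be inspected safely: mask them (fail closed).
            redacted[str(name)] = MASK
        elif name.lower() in SENSITIVE_NAMES:
            redacted[name] = MASK
        else:
            # Mask JWT-shaped tokens that appear in otherwise harmless headers.
            redacted[name] = JWT_RE.sub(MASK, value)
    return redacted
```

Calling `redact_headers({"Authorization": "Bearer abc", "Accept": "text/html"})` returns the same mapping with only the `Authorization` value replaced by the mask.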
Skills (31 domain)
Each skill is a progressive-disclosure module: a thin SKILL.md router (when-to-activate, a technique
map of technique → ATT&CK ID → CWE → reference → script, and an OPSEC/detection summary), backed by
per-skill references/ deep-dives and runnable scripts/. Every technique pairs the offensive path with
a Sigma/EDR detection signature and OPSEC notes, and cites current (2024–2026) CVEs/techniques.
Descriptions use Use when… triggers so the dispatcher routes to the right skill.
| # | Skill | Kill Chain | Coverage |
|---|---|---|---|
| 01 | recon-osint | Recon | Subdomain enum, CVE lookup, breach intel, DNS history, Shodan/Censys |
| 02 | vulnerability-analysis | Recon, Exploit | Taint analysis, source-sink tracing, false positive discipline |
| 03 | exploit-development | Weaponize, Exploit | ROP chains, heap exploitation, shellcode, deserialization, mitigation bypass |
| 04 | reverse-engineering | Weaponize, Exploit | IDA/Ghidra, Frida, angr, firmware extraction, anti-RE bypass |
| 05 | web-pentest | Delivery, Exploit | SQLi, XSS, SSRF, race conditions, GraphQL, JWT, business logic |
| 06 | network-attack | Recon, Actions | AD exploitation, lateral movement, pivoting, wireless, protocol attacks |
| 07 | red-team-ops | Install, Actions | C2, persistence, privesc, defense evasion, LOLBins, exfiltration |
| 08 | cloud-security | Recon, Exploit | AWS/Azure/GCP privesc, container escape, Kubernetes, IaC review |
| 09 | malware-analysis | Weaponize | Static/dynamic analysis, YARA rules, unpacking, C2 protocol RE |
| 10 | ai-security | Recon, Exploit | Prompt injection, RAG poisoning, model extraction, adversarial ML |
| 11 | threat-hunting | Report | MITRE ATT&CK mapping, Sigma rules, log correlation, behavioral detection |
| 12 | privesc-linux | Exploit, Actions | SUID, capabilities, sudo, kernel exploits, Docker escape, cron abuse |
| 13 | privesc-windows | Exploit, Actions | Token abuse, service exploitation, UAC bypass, credential harvesting |
| 14 | coding-mastery | Weaponize | Python/C/Go/Rust/ASM for exploit dev, scanners, C2, crypto |
| 15 | crypto-analysis | Recon, Exploit | TLS auditing, hash cracking, RSA attacks, side-channel, implementation review |
| 16 | incident-response | Report | Memory forensics (Volatility), timeline analysis, IOC extraction, containment, repo/OSS-compromise forensics (dangling-commit recovery, GH Archive / Wayback / Events API) |
| 17 | edr-evasion | Delivery, Install | Hook unhooking, direct/indirect syscalls, AMSI/ETW bypass, sleep masking |
| 18 | initial-access | Delivery | HTML smuggling, ISO/MOTW bypass, DLL sideload, staged payloads, phishing |
| 19 | shellcode-dev | Weaponize | PEB walk, API hashing, loaders, PE-to-shellcode, cross-platform |
| 20 | windows-mitigations | Exploit | ASLR/DEP/CFG/CET/ACG bypass, WDAC/ASR bypass, PPL exploitation |
| 21 | windows-boundaries | Exploit, Install | Kernel/user boundary, sandbox escape, AppContainer, COM elevation |
| 22 | keylogger-arch | Install, Actions | SetWindowsHookEx, RawInput, direct HID, ETW capture, stealth IOCs |
| 23 | mobile-pentest | Recon, Exploit | Android/iOS, Frida, SSL pinning bypass, exported components, biometric bypass |
| 24 | advanced-redteam | C2, Actions | C2 infra (redirectors, malleable profiles), OPSEC, tiered infrastructure |
| 25 | active-directory-attack | Exploit, Actions | Kerberoasting, NTLM relay, Golden/Silver Ticket, ADCS, delegation abuse |
| 26 | cicd-supply-chain | Weaponize, Delivery | Pipeline poisoning (Actions/GitLab/Jenkins), dependency confusion, OIDC abuse, SLSA/provenance |
| 27 | ai-agent-redteam | Delivery, Exploit | Agentic AI/MCP tool abuse, indirect prompt-injection chains, RAG/memory poisoning, jailbreaks |
| 28 | container-k8s-escape | Exploit, Actions | Container breakout, runc CVEs, K8s RBAC escalation, admission/ingress attacks, node pivot |
| 29 | browser-exploitation | Weaponize, Exploit | V8/JSC JIT type confusion, heap-sandbox & renderer→browser escape, Electron/IPC RCE |
| 30 | macos-offensive | Exploit, Install | TCC/Gatekeeper bypass, keychain, LaunchAgent persistence, ESF evasion (planned) |
| 31 | engagement-memory | Recon, Weaponize, Report | Cross-engagement pattern learning — ranked recall of prior techniques (support) |
Agents (8)
| Agent | Layer | Active Phases | Role |
|---|---|---|---|
| redteam-planner | Planning | Scope, Recon, Weaponize, Actions | Attack path design, OPSEC strategy |
| exploit-researcher | Execution | Recon, Weaponize, Exploit | CVE research, exploit chain development |
| security-reviewer | Analysis | Recon, Exploit, Report | Finding validation, gate checks |
| reverse-engineer | Execution | Weaponize, Exploit, Install | Binary analysis, vulnerability discovery |
| ai-researcher | Execution | Recon, Weaponize, Exploit | AI/ML security assessment |
| network-analyst | Analysis | Recon, Delivery, C2, Actions | Protocol analysis, C2 review |
| finding-validator | Analysis | Exploit, Actions, Report | Adversarial PASS/KILL/DOWNGRADE verdict on findings |
| finding-checker | Analysis | Exploit, Actions, Report | Blind artifact-only checker driving the bounded generator↔checker rebuttal loop |
Agents collaborate through structured handoffs — planning agents feed execution agents, execution agents feed analysis agents for validation.
Vulnerability References (47 files)
Detailed patterns with vulnerable/secure code examples, organized by category:
- Taint Analysis (4): source-sink tracing, filter evaluation, threat model, false positive reduction
- Memory Safety (7): buffer overflow, integer overflow, UAF, null deref, OOB read, unsafe Rust
- Injection (11): SQL, command, XSS, SSRF, SSTI, XXE, deserialization, path traversal, file upload, prototype pollution, ReDoS
- Authentication (8): bypass, authorization flaws, session management, hardcoded creds, default creds, brute force, permissions
- Cryptography (4): weak algorithms, key management, side-channel, certificate validation
- Concurrency (3): race conditions, TOCTOU, established patterns
- Web/API (5): CORS, CSRF, open redirect, resource exhaustion, API security
- Supply Chain (3): dependency confusion, code integrity, ML model files
- Active Directory (1): delegation, GPO abuse, RODC, SCCM/WSUS, ADCS, trust attacks
MCP Servers
| Server | Purpose |
|---|---|
| mitm-search | Web search via mcp.mitm.vn |
| ida-multi-mcp | IDA Pro integration (decompile, rename, xrefs, patching) |
| jadx-mcp-server | Android APK decompilation and analysis |
How It Works
- Claude Code reads CLAUDE.md, which sets the offensive security persona with Kill Chain methodology
- Use /engage.init <preset> to start a structured engagement, or use skills standalone
- Each phase has templates, quality gates, skill mappings, and agent coordination
- Agents collaborate through structured handoffs — planning → execution → analysis layers
- Quality gates validate findings before phase transitions (CWE, CVSS, evidence required)
- Reports are generated from structured finding records with evidence linking
Customization
- Add skills: create skills/<name>/SKILL.md with YAML frontmatter including kill_chain metadata
- Add agents: create agents/<name>.md with layer, phases, and collaboration metadata
- Add workflows: create workflows/<name>.yml following the workflow schema
- Add presets: create presets/<name>/preset.yml with phase/skill/agent selection
- Add templates: create templates/<phase>/<name>.md with gate and dependency metadata
- Add MCP servers: edit mcpServers in settings.json
Requirements
- Claude Code CLI, Desktop App, or VS Code extension
- For MCP integrations: IDA Pro with ida-multi-mcp plugin, JADX with MCP server