  ██╗  ██╗███████╗ █████╗ ██████╗ ██████╗  ██████╗  ██████╗ ███╗   ███╗
  ██║  ██║██╔════╝██╔══██╗██╔══██╗██╔══██╗██╔═══██╗██╔═══██╗████╗ ████║
  ███████║█████╗  ███████║██║  ██║██████╔╝██║   ██║██║   ██║██╔████╔██║
  ██╔══██║██╔══╝  ██╔══██║██║  ██║██╔══██╗██║   ██║██║   ██║██║╚██╔╝██║
  ██║  ██║███████╗██║  ██║██████╔╝██║  ██║╚██████╔╝╚██████╔╝██║ ╚═╝ ██║
  ╚═╝  ╚═╝╚══════╝╚═╝  ╚═╝╚═════╝ ╚═╝  ╚═╝ ╚═════╝  ╚═════╝ ╚═╝     ╚═╝
                  The context compression layer for AI agents

60–95% fewer tokens · library · proxy · MCP · 6 algorithms · local-first · reversible

CI codecov PyPI npm Model: Kompress-v2-base License: Apache 2.0 Docs

Docs · Install · Proof · Agents · Discord · llms.txt · Enterprise

AI agents / LLMs: read /llms.txt here, or fetch the live index / full docs blob.



Headroom compresses everything your AI agent reads — tool outputs, logs, RAG chunks, files, and conversation history — before it reaches the LLM. Same answers, fraction of the tokens.

Headroom in action
Live: 10,144 → 1,260 tokens — same FATAL found.

What it does

  • Library: compress(messages) in Python or TypeScript, inline in any app
  • Proxy: headroom proxy --port 8787, zero code changes, any language
  • Agent wrap: headroom wrap claude|codex|aider|copilot|opencode in one command; for Cursor, it prints manual proxy settings to paste into the app
  • MCP server: headroom_compress, headroom_retrieve, headroom_stats for any MCP client
  • Cross-agent memory: shared store across Claude, Codex, Gemini, with auto-dedup
  • headroom learn: mines failed sessions and writes corrections to CLAUDE.md / AGENTS.md
  • Output token reduction: trims what the model writes back (not just what you send) by dropping ceremony and restated code and skipping deep "thinking" on routine steps. See Output token reduction.
  • Reversible (CCR): originals are cached for retrieval on demand
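The reversible (CCR) contract is simple: compress now, keep the original locally, and hand it back when asked. Below is a minimal sketch of that contract, assuming an in-memory dict, a truncation-style stand-in "compressor", and a marker format invented here for illustration. ReversibleStore, its methods, and the marker text are hypothetical, not Headroom's API.

```python
from __future__ import annotations

import hashlib
import time


class ReversibleStore:
    """Hypothetical CCR-style store: keep originals locally, return a short marker."""

    def __init__(self, ttl_seconds: float = 3600.0) -> None:
        self.ttl_seconds = ttl_seconds
        self._items: dict[str, tuple[float, str]] = {}

    def compress(self, text: str, keep_chars: int = 80) -> str:
        # Key by content hash so repeated identical inputs share one entry.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._items[key] = (time.monotonic(), text)
        # Stand-in "compression": a short preview plus a retrieval marker.
        return f"{text[:keep_chars]}... [truncated; retrieve key={key}]"

    def retrieve(self, key: str) -> str | None:
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, text = entry
        if time.monotonic() - stored_at > self.ttl_seconds:
            # Past the TTL the original is dropped and can no longer be recovered.
            del self._items[key]
            return None
        return text


store = ReversibleStore(ttl_seconds=600)
marker = store.compress("ERROR disk full\n" * 100)
key = marker.rsplit("key=", 1)[1].rstrip("]")
restored = store.retrieve(key)
```

A real pipeline would replace the truncation with a content-aware compressor and expose retrieval to the model as a tool, which is the role this README gives headroom_retrieve.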

How it works (30 seconds)

 Your agent / app
   (Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)
        │   prompts · tool outputs · logs · RAG results · files
        ▼
    ┌────────────────────────────────────────────────────┐
    │  Headroom   (runs locally — your data stays here)  │
    │  ────────────────────────────────────────────────  │
    │  CacheAligner  →  ContentRouter  →  CCR            │
    │                    ├─ SmartCrusher   (JSON)        │
    │                    ├─ CodeCompressor (AST)         │
    │                    └─ Kompress-base  (text, HF)    │
    │                                                    │
    │  Cross-agent memory  ·  headroom learn  ·  MCP     │
    └────────────────────────────────────────────────────┘
        │   compressed prompt  +  retrieval tool
        ▼
 LLM provider  (Anthropic · OpenAI · Bedrock · …)
  • ContentRouter — detects content type, selects the right compressor
  • SmartCrusher / CodeCompressor / Kompress-base — compress JSON, AST, or prose
  • CacheAligner — stabilizes prefixes so provider KV caches actually hit
  • CCR — stores originals locally; LLM calls headroom_retrieve if it needs them
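To make the routing step concrete, here is a hypothetical detector that picks a compressor by content type. The heuristics (a JSON parse first, then a Python-only AST check plus a keyword test) and the function names are assumptions for illustration; the real ContentRouter and CodeCompressor cover more languages and signals than this.

```python
from __future__ import annotations

import ast
import json

# Hypothetical mapping from detected type to the compressor names used in this README.
COMPRESSORS = {"json": "SmartCrusher", "code": "CodeCompressor", "text": "Kompress-base"}
CODE_HINTS = ("def ", "class ", "import ", "return ", " = ")


def detect_content_type(text: str) -> str:
    """Return 'json', 'code', or 'text' using cheap, illustrative heuristics."""
    stripped = text.strip()
    # 1. Structured JSON: only objects and arrays count.
    if stripped[:1] in ("{", "["):
        try:
            json.loads(stripped)
            return "json"
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            pass
    # 2. Source code: must parse as Python AND contain a code-like token,
    #    because a single bare word is also valid Python.
    try:
        ast.parse(stripped)
    except (SyntaxError, ValueError):
        return "text"
    if any(hint in stripped for hint in CODE_HINTS):
        return "code"
    return "text"


def route(text: str) -> str:
    """Pick the compressor name for a piece of content."""
    return COMPRESSORS[detect_content_type(text)]
```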

Architecture · CCR reversible compression · Kompress-v2-base model card

Get started (60 seconds)

    # 1. Install
    pip install "headroom-ai[all]"        # Python
    npm install headroom-ai               # Node / TypeScript

    # 2. Pick your mode
    headroom wrap claude                  # wrap a coding agent
    headroom proxy --port 8787            # drop-in proxy, zero code changes
    # or: from headroom import compress   # inline library

    # 3. See the savings
    headroom perf
    headroom dashboard                    # live savings dashboard (proxy must be running)

Granular extras: [proxy], [mcp], [ml], [code], [memory], [relevance], [image], [agno], [langchain], [evals], [pytorch-mps] (Apple-GPU memory-embedder offload — set HEADROOM_EMBEDDER_RUNTIME=pytorch_mps). Requires Python 3.10+.

Proof

Savings on real agent workloads:

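| Workload | Before (tokens) | After (tokens) | Savings |
|---|---|---|---|
| Code search (100 results) | 17,765 | 1,408 | 92% |
| SRE incident debugging | 65,694 | 5,118 | 92% |
| GitHub issue triage | 54,174 | 14,761 | 73% |
| Codebase exploration | 78,502 | 41,254 | 47% |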
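Each Savings figure is 1 - after / before. The sketch below recomputes every row from the token counts in the table and also shows the pooled (token-weighted) figure across all four workloads, which lands below the simple average because the largest workload compresses least.

```python
# Token counts copied from the table above: (before, after).
WORKLOADS = {
    "Code search (100 results)": (17_765, 1_408),
    "SRE incident debugging": (65_694, 5_118),
    "GitHub issue triage": (54_174, 14_761),
    "Codebase exploration": (78_502, 41_254),
}


def savings(before: int, after: int) -> float:
    """Fraction of tokens removed."""
    return 1 - after / before


for name, (before, after) in WORKLOADS.items():
    print(f"{name}: {savings(before, after):.0%}")

# Pooled savings weight each workload by its size in tokens.
total_before = sum(b for b, _ in WORKLOADS.values())
total_after = sum(a for _, a in WORKLOADS.values())
print(f"Pooled: {savings(total_before, total_after):.0%}")
```

This prints 92%, 92%, 73%, and 47% for the rows, matching the table, and 71% pooled (216,135 tokens in, 62,541 out).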

Accuracy preserved on standard benchmarks:

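| Benchmark | Category | N | Baseline | Headroom | Delta |
|---|---|---|---|---|---|
| GSM8K | Math | 100 | 0.870 | 0.870 | ±0.000 |
| TruthfulQA | Factual | 100 | 0.530 | 0.560 | +0.030 |
| SQuAD v2 | QA | 100 | | 97% | 19% compression |
| BFCL | Tools | 100 | | 97% | 32% compression |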

Reproduce: python -m headroom.evals suite --tier 1 · Full benchmarks & methodology

Output token reduction (cut what the model writes back)

Everything above shrinks the prompt you send. But you also pay for every token the model writes back — and on Opus-class models output costs 5× input. A lot of that output is waste: "Great, let me…" preambles, re-printing code you just showed it, and deep "thinking" on routine steps like reading a file.
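The 5× ratio means output can dominate the bill once input is compressed. Here is a small worked example with hypothetical token counts (20,000 input and 2,000 output tokens in one turn, with input cut by 60%), pricing one input token at one unit; only the 5× ratio comes from this README.

```python
def output_cost_share(input_tokens: int, output_tokens: int, output_price_ratio: float = 5.0) -> float:
    """Share of a turn's cost spent on output, pricing one input token at 1 unit."""
    input_cost = float(input_tokens)
    output_cost = output_tokens * output_price_ratio
    return output_cost / (input_cost + output_cost)


# Hypothetical turn: 20,000 input tokens, 2,000 output tokens.
before = output_cost_share(20_000, 2_000)  # 10,000 / 30,000
# Same turn after input compression removes 60% of the prompt.
after = output_cost_share(8_000, 2_000)    # 10,000 / 18,000
print(f"output share: {before:.0%} -> {after:.0%}")
```

In this made-up turn, output rises from a third of the cost to over half (33% to 56%) once the input is compressed, which is why the output side gets its own controls.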

Headroom can trim that too, from the proxy, without you changing any code:

  • Verbosity steering — appends a short "be terse, don't restate context" note to the end of the system prompt (so your prompt cache still hits).
  • Effort routing — when a turn is just the model resuming after a tool result (a file read, a passing test), it dials the model's thinking effort down. New questions and errors keep full effort.
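Both behaviors can be sketched in a few lines. This is a hypothetical illustration, not the proxy's code: the note text, the OpenAI-style message shape with a role of "tool", the is_error field, and the "low"/"high" effort labels are all assumptions.

```python
from __future__ import annotations

# Hypothetical wording; the real note text is not given in this README.
TERSE_NOTE = "Be terse. Do not restate context or code that is already in the conversation."


def steer_verbosity(system_prompt: str) -> str:
    """Append the note at the END so the cached system-prompt prefix is unchanged."""
    if system_prompt.endswith(TERSE_NOTE):
        return system_prompt  # idempotent: re-applying must not grow the prompt
    return f"{system_prompt}\n\n{TERSE_NOTE}"


def choose_effort(messages: list[dict]) -> str:
    """Lower effort only when the turn merely resumes after a successful tool result."""
    last = messages[-1]
    if last.get("role") == "tool" and not last.get("is_error", False):
        return "low"
    # New user questions and tool errors keep full effort.
    return "high"
```

Appending rather than prepending matters because provider prompt caches match on a stable prefix; a constant suffix leaves everything before it byte-identical.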

Turn it on:

    export HEADROOM_OUTPUT_SHAPER=1   # off by default
    headroom proxy --port 8787

Already running a proxy? These switches are read live on every request, but from the proxy's own environment, which was snapshotted when it launched. A proxy that headroom wrap reused (rather than started) would therefore not see a value you export afterwards. To handle this, headroom wrap now hot-syncs your current settings to the running proxy via a loopback POST /admin/runtime-env, so they take effect immediately with no restart (no cold start, no dropped requests, no lost caches). Set them before you run headroom wrap. On a shared proxy these overrides are global: the last explicit setting wins.

Learn the right terseness for you. People don't say how terse they want answers — they show it (they interrupt long replies, or move on before they could have read them). headroom learn --verbosity reads your past sessions and picks the level automatically:

    headroom learn --verbosity           # preview what it found (dry run)
    headroom learn --verbosity --apply   # save it; the proxy uses it from now on

See how many output tokens you saved. Output savings are counterfactual — we never see what the model would have written — so Headroom reports an honest estimate with a confidence range, never a made-up number:

    headroom output-savings
    # Reduction: 31.7% (95% CI 27.7% … 35.7%) [estimated]

Want a measured number instead of an estimate? Leave 10% of conversations unshaped as a control group: export HEADROOM_OUTPUT_HOLDOUT=0.1. The dashboard shows an Output Tokens Saved card next to input compression, labelled measured or estimated with the confidence band.
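A holdout only yields a measured number if group assignment is stable and independent of content. One common approach, assumed here rather than taken from Headroom, is to hash the conversation ID into [0, 1) and compare mean output tokens between shaped and control conversations. fraction=0.1 corresponds to HEADROOM_OUTPUT_HOLDOUT=0.1; the function names are hypothetical, and the confidence band the dashboard shows is not reproduced.

```python
from __future__ import annotations

import hashlib
from statistics import mean


def in_holdout(conversation_id: str, fraction: float = 0.1) -> bool:
    """Stable assignment: the same conversation always lands in the same group."""
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    # 48 bits divided by 2**48 is exact in a float, so the value is always in [0, 1).
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return bucket < fraction


def measured_reduction(shaped: list[int], control: list[int]) -> float:
    """Point estimate of output-token reduction: 1 - mean(shaped) / mean(control)."""
    return 1 - mean(shaped) / mean(control)
```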

→ Full write-up incl. the measurement methodology: docs/proposals/output-token-reduction.md


Agent compatibility matrix

| Agent | headroom wrap | Notes |
|---|---|---|
| Claude Code | Yes | --memory · --code-graph · --1m |
| Codex | Yes | shares memory with Claude |
| Cursor | Manual setup | starts proxy and prints base URLs for Cursor settings |
| Aider | Yes | starts proxy + launches |
| Copilot CLI | Yes | starts proxy + launches |
| OpenClaw | | installs as ContextEngine plugin |
| OpenCode | Yes | injects config · starts proxy + launches |
| Cortex Code | | 60–65% savings · library mode |

Any OpenAI-compatible client works via headroom proxy. MCP-native: headroom mcp install.

GitHub Copilot CLI subscription mode

Headroom can route GitHub Copilot CLI subscription traffic through the local proxy:

    headroom copilot-auth login
    headroom wrap copilot --subscription -- --model gpt-4o

This lets Headroom intercept OpenAI-compatible Copilot CLI requests and apply the same proxy compression pipeline before forwarding to GitHub Copilot's hosted API. The wrapper exchanges Headroom's reusable GitHub OAuth token for Copilot's short-lived API token and prints the upstream endpoint as COPILOT_PROVIDER_API_URL=... during launch.

headroom copilot-auth login stores a Headroom-specific Copilot OAuth token. This avoids relying on generic GitHub or Copilot CLI tokens that can read Copilot account metadata but may still be rejected by Copilot's token-exchange endpoint.

For GitHub Enterprise Server or custom-domain Copilot deployments, set the deployment domain before launching:

export GITHUB_COPILOT_ENTERPRISE_DOMAIN=ghe.example.com

For GitHub.com Enterprise Cloud URLs such as github.com/enterprises/your-enterprise, do not set an enterprise-domain override. Headroom uses GitHub's normal token-exchange endpoint and the Copilot API endpoint advertised for the signed-in account.

Platform support note: macOS auth reuse via Copilot CLI Keychain storage has been smoke-tested. Windows Credential Manager, Linux Secret Service / secret-tool, and Docker/CI token-injection paths are implemented or planned as auth-discovery paths, but still need real OS validation before they should be considered fully vetted. For Docker and CI, prefer passing an explicit GITHUB_COPILOT_TOKEN or GITHUB_COPILOT_GITHUB_TOKEN rather than relying on host keychain access.

When to use · When to skip

Great fit if you…

  • run AI coding agents daily and want savings without changing your code
  • work across multiple agents and want shared memory
  • need reversible compression — originals are retrievable via CCR within the configured TTL

Skip it if you…

  • only use a single provider's native compaction and don't need cross-agent memory
  • work in a sandboxed environment where local processes can't run

Integrations: drop Headroom into any stack

| Your setup | Hook in with |
|---|---|
| Any Python app | compress(messages, model=…) |
| Any TypeScript app | await compress(messages, { model }) |
| Anthropic / OpenAI SDK | withHeadroom(new Anthropic()) · withHeadroom(new OpenAI()) |
| Vercel AI SDK | wrapLanguageModel({ model, middleware: headroomMiddleware() }) |
| LiteLLM | litellm.callbacks = [HeadroomCallback()] |
| LangChain | HeadroomChatModel(your_llm) |
| Agno | HeadroomAgnoModel(your_model) |
| Strands | Strands guide |
| ASGI apps | app.add_middleware(CompressionMiddleware) |
| Multi-agent | SharedContext().put / .get |
| MCP clients | headroom mcp install |

What's inside
  • SmartCrusher — universal JSON: arrays of dicts, nested objects, mixed types.
  • CodeCompressor — AST-aware for Python, JS, Go, Rust, Java, C++.
  • Kompress-base — our HuggingFace model, trained on agentic traces.
  • Image compression — 40–90% reduction via trained ML router.
  • CacheAligner — stabilizes prefixes so Anthropic/OpenAI KV caches actually hit.
  • IntelligentContext — score-based context fitting with learned importance.
  • CCR — reversible compression; LLM retrieves originals on demand.
  • Cross-agent memory — shared store, agent provenance, auto-dedup.
  • SharedContext — compressed context passing across multi-agent workflows.
  • headroom learn — plugin-based failure mining for Claude, Codex, Gemini.
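Score-based context fitting, as described for IntelligentContext, can be approximated with a greedy pass: admit the most important items that still fit the token budget, then restore the original order. In this sketch the importance scores are supplied by the caller (the README's learned scoring is not reproduced), and the Item type and fit_to_budget name are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Item:
    text: str
    tokens: int
    importance: float  # given by the caller here; the README describes learned importance


def fit_to_budget(items: list[Item], budget: int) -> list[Item]:
    """Greedy fit: admit items by descending importance while they fit, keep original order."""
    by_importance = sorted(range(len(items)), key=lambda i: items[i].importance, reverse=True)
    kept: set[int] = set()
    used = 0
    for i in by_importance:
        if used + items[i].tokens <= budget:
            kept.add(i)
            used += items[i].tokens
    # Restore conversation order so the model still reads context chronologically.
    return [items[i] for i in sorted(kept)]
```

Greedy selection is a heuristic for this knapsack-style problem, not an optimal solver; it is shown only because it is the simplest correct instance of "fit by score under a budget".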
Pipeline internals

Headroom exposes one stable request lifecycle across compress(), the SDK, and the proxy:

Setup → Pre-Start → Post-Start → Input Received → Input Cached → Input Routed → Input Compressed → Input Remembered → Pre-Send → Post-Send → Response Received

  • Transforms do the work: CacheAligner, ContentRouter, SmartCrusher, CodeCompressor, Kompress-base, IntelligentContext / RollingWindow.
  • Pipeline extensions observe or customize lifecycle stages via on_pipeline_event(...).
  • Compression hooks sit alongside the canonical lifecycle as an additional extension seam.
  • Proxy extensions remain the server/app integration seam for ASGI middleware, routes, and startup policy.
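The stage names below come from the lifecycle above; everything else is an assumption. The sketch shows what an extension observing stages through on_pipeline_event(...) might look like, with a hypothetical (stage, payload) signature, a hypothetical TokenAudit extension, and a toy driver loop standing in for the real pipeline.

```python
from __future__ import annotations

from enum import Enum


class Stage(Enum):
    """Lifecycle stages, in the order listed in this README."""
    SETUP = "Setup"
    PRE_START = "Pre-Start"
    POST_START = "Post-Start"
    INPUT_RECEIVED = "Input Received"
    INPUT_CACHED = "Input Cached"
    INPUT_ROUTED = "Input Routed"
    INPUT_COMPRESSED = "Input Compressed"
    INPUT_REMEMBERED = "Input Remembered"
    PRE_SEND = "Pre-Send"
    POST_SEND = "Post-Send"
    RESPONSE_RECEIVED = "Response Received"


class TokenAudit:
    """Hypothetical extension that records token counts around compression."""

    def __init__(self) -> None:
        self.seen: list[Stage] = []
        self.tokens: dict[Stage, int] = {}

    def on_pipeline_event(self, stage: Stage, payload: dict) -> None:
        self.seen.append(stage)
        if stage in (Stage.INPUT_RECEIVED, Stage.INPUT_COMPRESSED) and "tokens" in payload:
            self.tokens[stage] = payload["tokens"]


def run_lifecycle(extensions: list, payloads: dict) -> None:
    """Toy driver: fire every stage, in order, to every extension."""
    for stage in Stage:  # Enum iteration follows definition order
        for ext in extensions:
            ext.on_pipeline_event(stage, payloads.get(stage, {}))


audit = TokenAudit()
run_lifecycle([audit], {Stage.INPUT_RECEIVED: {"tokens": 10_144},
                        Stage.INPUT_COMPRESSED: {"tokens": 1_260}})
```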

Provider and tool-specific behavior lives under headroom/providers/ so core orchestration stays focused on lifecycle, sequencing, and policy.

  • CLI/tool slices: headroom/providers/claude, copilot, codex, openclaw
  • Provider runtime slices: headroom/providers/claude, gemini, plus shared backend/runtime dispatch in headroom/providers/registry.py
  • Core files stay orchestration-first: wrap.py, client.py, cli/proxy.py, and proxy/server.py delegate provider-specific env shaping, API target normalization, backend selection, and transport dispatch.

Install

    pip install "headroom-ai[all]"                    # Python, everything
    npm install headroom-ai                           # TypeScript / Node
    docker pull ghcr.io/chopratejas/headroom:latest

Granular extras: [proxy], [mcp], [ml] (Kompress-base), [code], [memory], [relevance], [image], [agno], [langchain], [evals], [pytorch-mps] (Apple-GPU memory-embedder offload — set HEADROOM_EMBEDDER_RUNTIME=pytorch_mps). Requires Python 3.10+.

Using pipx? Choose a supported interpreter explicitly:

pipx install --python python3.13 "headroom-ai[all]"

Installation guide — Docker tags, persistent service, PowerShell, devcontainers.

Updating

    headroom update           # detects pip / pipx / uv tool and upgrades in place
    headroom update --check   # report the latest release without upgrading
    headroom update --pre     # include pre-releases

headroom update figures out how Headroom was installed (pip/venv, pip --user, pipx, uv tool) and runs the matching upgrade across macOS, Linux, and Windows. For git checkouts, editable installs, Docker images, and externally-managed system Pythons (PEP 668) it prints the correct manual step instead of guessing.

The proxy also shows a one-line "update available" notice on startup. It checks PyPI at most once a day, in the background, and never blocks. Opt out with HEADROOM_UPDATE_CHECK=off (also skipped in --stateless mode and CI).
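A once-a-day, opt-out check like the one described can be sketched with a timestamp file. The stamp-file location and the should_check_for_update helper are hypothetical; only the HEADROOM_UPDATE_CHECK=off opt-out comes from this README, and the background thread, --stateless, and CI skips are left out.

```python
from __future__ import annotations

import os
import time
from pathlib import Path

ONE_DAY_SECONDS = 24 * 60 * 60


def should_check_for_update(stamp_file: Path, now: float | None = None) -> bool:
    """Return True at most once per day; record the check time in stamp_file."""
    if os.environ.get("HEADROOM_UPDATE_CHECK", "").strip().lower() == "off":
        return False  # documented opt-out
    now = time.time() if now is None else now
    try:
        last: float | None = float(stamp_file.read_text())
    except (FileNotFoundError, ValueError):
        last = None  # never checked, or unreadable stamp
    if last is not None and now - last < ONE_DAY_SECONDS:
        return False
    stamp_file.parent.mkdir(parents=True, exist_ok=True)
    stamp_file.write_text(repr(now))
    return True
```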

Corporate / SSL-inspection environments

If pip install "headroom-ai[all]" fails with CERTIFICATE_VERIFY_FAILED (unable to get local issuer certificate), your network uses SSL inspection — a MITM proxy presenting a company-issued CA. The build backend (maturin) downloads rustup over a connection your TLS stack doesn't trust. Install Rust first so the build doesn't fetch it:

    # macOS / Linux
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh && rustup default stable

    # Windows
    winget install Rustlang.Rustup && rustup default stable

Restart your shell, then pip install "headroom-ai[all]". A prebuilt wheel avoids the Rust build entirely where available: pip install --only-binary headroom-ai headroom-ai. Prebuilt wheels are published for Windows (win_amd64), Linux (x86_64 / aarch64), and macOS (Apple Silicon), so installs on those platforms never need a local Rust toolchain — the Rust-first dance above is only for the platform-independent sdist fallback (e.g. Intel macOS).

Two runtime assets are fetched over TLS; if they are blocked, trust your corporate CA via REQUESTS_CA_BUNDLE / SSL_CERT_FILE / CURL_CA_BUNDLE:

  • cdn.pyke.io — the ONNX Runtime for the Rust core. Alternatively pre-provide it with ORT_STRATEGY=system and ORT_LIB_LOCATION=/path/to/onnxruntime.
  • huggingface.co — the kompress-base compression model. Pre-download it and run with HF_HUB_OFFLINE=1, or set HF_ENDPOINT to a trusted mirror.

Running with compression disabled (pure gateway) requires neither asset.

"Basic Constraints of CA cert not marked critical" (Python 3.13+ strict mode)

A different failure from the one above. If TLS fails with:

[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed:
Basic Constraints of CA cert not marked critical

then the corporate CA is found and trusted — adding it to a CA bundle changes nothing. Python 3.13 + OpenSSL 3.x enable VERIFY_X509_STRICT by default, which enforces RFC 5280 §4.2.1.9: a CA cert's basicConstraints must be marked critical. Inspection roots like Zscaler set CA:TRUE without the critical bit, so the chain is rejected.

Set HEADROOM_TLS_STRICT=0 to clear only the strict flag from every TLS context Headroom controls — the proxy's httpx upstream client and the urllib3/huggingface_hub path used for model downloads. Chain validation, signature, expiry, and hostname checks all stay on; this is strictly narrower than disabling verification.

HEADROOM_TLS_STRICT=0 headroom proxy --port 8787
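For reference, this is what clearing only the strict flag looks like with Python's standard ssl module. It illustrates the mechanism and is not Headroom's code; context_without_x509_strict is a hypothetical helper name. Certificate verification (CERT_REQUIRED) and hostname checking remain enabled on the resulting context.

```python
import ssl


def context_without_x509_strict() -> ssl.SSLContext:
    """Default verifying context with only VERIFY_X509_STRICT cleared."""
    ctx = ssl.create_default_context()  # CERT_REQUIRED + hostname checking
    # Clear one bit; every other verify flag is preserved. On Pythons whose
    # default context does not set the flag, this is a no-op.
    ctx.verify_flags = int(ctx.verify_flags) & ~int(ssl.VERIFY_X509_STRICT)
    return ctx


ctx = context_without_x509_strict()
```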

The Rust core's ONNX download (cdn.pyke.io) uses a separate TLS stack (rustls / OS trust store), unaffected by HEADROOM_TLS_STRICT. On Windows the corporate root must be in the machine certificate store (browsers already trust it there); or pre-provision ONNX Runtime with ORT_STRATEGY=system + ORT_LIB_LOCATION=/path/to/onnxruntime to skip the download entirely.

headroom learn

headroom learn in action

headroom learn — mines failed sessions, writes corrections to CLAUDE.md / AGENTS.md / GEMINI.md.

Documentation

| Start here | Go deeper |
|---|---|
| Quickstart | Architecture |
| Proxy | How compression works |
| MCP tools | CCR: reversible compression |
| Memory | Cache optimization |
| Failure learning | Benchmarks |
| Configuration | Limitations |

Compared to

Headroom runs locally, covers every content type, works with every major framework, and is reversible.

| Tool | Scope | Deploy | Local | Reversible |
|---|---|---|---|---|
| Headroom | All context: tools, RAG, logs, files, history | Proxy · library · middleware · MCP | Yes | Yes |
| RTK | CLI command outputs | CLI wrapper | Yes | No |
| lean-ctx | CLI commands, MCP tools, editor rules | CLI wrapper · MCP | Yes | No |
| Compresr, Token Co. | Text sent to their API | Hosted API call | No | No |
| OpenAI Compaction | Conversation history | Provider-native | No | No |

Attribution. Headroom ships with the excellent RTK binary for shell-output rewriting — git show --short, scoped ls, summarized installers. Huge thanks to the RTK team; their tool is a first-class part of our stack, and Headroom compresses everything downstream of it. Headroom can also use lean-ctx as the selected CLI context tool; set HEADROOM_CONTEXT_TOOL=lean-ctx before running headroom wrap ....

Contributing

    git clone https://github.com/chopratejas/headroom.git && cd headroom
    uv sync --extra dev && uv run pytest

Devcontainers in .devcontainer/ (default + memory-stack with Qdrant & Neo4j). See CONTRIBUTING.md.

Community

License

Apache 2.0 — see LICENSE.

About

Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.
Topics: agent, ai, anthropic, claude-code, compression, context-engineering, context-window, cursor, fastapi, langchain, llm, mcp, openai, prompt-engineering, proxy, python, rag, token-optimization, tokens, typescript

Languages

Python 79.3% · Rust 15.9% · TypeScript 2.6% · HTML 1.1% · Shell 0.4% · PowerShell 0.4% · Dockerfile 0.1% · PLpgSQL 0.1% · Makefile 0.0% · C 0.0% · JavaScript 0.0% · CSS 0.0% · HCL 0.0%

Commit Activity

Development activity over the past 52 weeks: 1,730 total commits, peak of 232 commits per week.
