
Headroom

Compress everything your AI agent reads. Same answers, fraction of the tokens.

Every tool call, DB query, file read, and RAG retrieval your agent makes is 70-95% boilerplate.
Headroom compresses it away before it hits the model.

Works with any agent — coding agents (Claude Code, Codex, Cursor, Aider), custom agents
(LangChain, LangGraph, Agno, Strands, OpenClaw), or your own Python and TypeScript code.



Where Headroom Fits

Your Agent / App
  (coding agents, customer support bots, RAG pipelines,
   data analysis agents, research agents, any LLM app)
      │
      │  tool calls, logs, DB reads, RAG results, file reads, API responses
      ▼
   Headroom  ← proxy, Python/TypeScript SDK, or framework integration
      │
      ▼
 LLM Provider  (OpenAI, Anthropic, Google, Bedrock, 100+ via LiteLLM)

Headroom sits between your application and the LLM provider. It intercepts requests, compresses the context, and forwards an optimized prompt. Use it as a transparent proxy (zero code changes), a Python function (compress()), or a framework integration (LangChain, LiteLLM, Agno).

What gets compressed

Headroom optimizes any data your agent injects into a prompt:

  • Tool outputs — shell commands, API calls, search results
  • Database queries — SQL results, key-value lookups
  • RAG retrievals — document chunks, embeddings results
  • File reads — code, logs, configs, CSVs
  • API responses — JSON, XML, HTML
  • Conversation history — long agent sessions with repetitive context

Quick Start

Python:

```bash
pip install "headroom-ai[all]"
```

TypeScript / Node.js:

```bash
npm install headroom-ai
```

Any agent — one function

Python:

```python
from headroom import compress

result = compress(messages, model="claude-sonnet-4-5-20250929")
response = client.messages.create(
    model="claude-sonnet-4-5-20250929",
    messages=result.messages,
)
print(f"Saved {result.tokens_saved} tokens ({result.compression_ratio:.0%})")
```

TypeScript:

```typescript
import { compress } from 'headroom-ai';

const result = await compress(messages, { model: 'gpt-4o' });
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: result.messages,
});
console.log(`Saved ${result.tokensSaved} tokens`);
```

Works with any LLM client — Anthropic, OpenAI, LiteLLM, Bedrock, Vercel AI SDK, or your own code.

Any agent — proxy (zero code changes)

```bash
headroom proxy --port 8787

# Point any LLM client at the proxy
ANTHROPIC_BASE_URL=http://localhost:8787 your-app
OPENAI_BASE_URL=http://localhost:8787/v1 your-app
```

Works with any language, any tool, any framework. Proxy docs

Coding agents — one command

```bash
headroom wrap claude   # Starts proxy + launches Claude Code
headroom wrap codex    # Starts proxy + launches OpenAI Codex CLI
headroom wrap aider    # Starts proxy + launches Aider
headroom wrap cursor   # Starts proxy + prints Cursor config
```

Headroom starts a proxy, points your tool at it, and compresses everything automatically.

Multi-agent — SharedContext

```python
from headroom import SharedContext

ctx = SharedContext()
ctx.put("research", big_agent_output)    # Agent A stores (compressed)
summary = ctx.get("research")            # Agent B reads (~80% smaller)
full = ctx.get("research", full=True)    # Agent B gets original if needed
```

Compress what moves between agents — any framework. SharedContext Guide

MCP Tools (Claude Code, Cursor)

```bash
headroom mcp install && claude
```

Gives your AI tool three MCP tools: headroom_compress, headroom_retrieve, headroom_stats. MCP Guide

Drop into your existing stack

| Your setup | Add Headroom | One-liner |
|---|---|---|
| Any Python app | `compress()` | `result = compress(messages, model="gpt-4o")` |
| Any TypeScript app | `compress()` | `const result = await compress(messages, { model: 'gpt-4o' })` |
| Vercel AI SDK | Middleware | `wrapLanguageModel({ model, middleware: headroomMiddleware() })` |
| OpenAI Node SDK | Wrap client | `const client = withHeadroom(new OpenAI())` |
| Anthropic TS SDK | Wrap client | `const client = withHeadroom(new Anthropic())` |
| Multi-agent | SharedContext | `ctx = SharedContext(); ctx.put("key", data)` |
| LiteLLM | Callback | `litellm.callbacks = [HeadroomCallback()]` |
| Any Python proxy | ASGI middleware | `app.add_middleware(CompressionMiddleware)` |
| Agno agents | Wrap model | `HeadroomAgnoModel(your_model)` |
| LangChain | Wrap model | `HeadroomChatModel(your_llm)` |
| OpenClaw | ContextEngine plugin | See OpenClaw plugin |
| Claude Code | Wrap | `headroom wrap claude` |
| Codex / Aider | Wrap | `headroom wrap codex` or `headroom wrap aider` |

Full Integration Guide | TypeScript SDK


Demo

Headroom Demo


Does It Actually Work?

100 production log entries. One critical error buried at position 67.

| | Baseline | Headroom |
|---|---|---|
| Input tokens | 10,144 | 1,260 |
| Correct answers | 4/4 | 4/4 |

Both responses: "payment-gateway, error PG-5523, fix: Increase max_connections to 500, 1,847 transactions affected."

87.6% fewer tokens. Same answer. Run it: `python examples/needle_in_haystack_test.py`

What Headroom kept

From 100 log entries, SmartCrusher kept 6: first 3 (boundary), the FATAL error at position 67 (anomaly detection), and last 2 (recency). The error was automatically preserved — not by keyword matching, but by statistical analysis of field variance.
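The selection idea can be sketched in a few lines of Python. This is a simplified illustration, not SmartCrusher's actual code; the `latency_ms` field name and the z-score threshold are hypothetical:

```python
from statistics import mean, pstdev

def keep_important(entries, head=3, tail=2, z=3.0):
    """Simplified sketch of boundary + anomaly selection (not Headroom's
    real implementation): keep the first `head` items, the last `tail`
    items, and any entry whose numeric field deviates sharply from the
    rest -- no keyword matching involved."""
    values = [e["latency_ms"] for e in entries]
    mu, sigma = mean(values), pstdev(values) or 1.0
    kept = set(range(head)) | set(range(len(entries) - tail, len(entries)))
    kept |= {i for i, v in enumerate(values) if abs(v - mu) > z * sigma}
    return [entries[i] for i in sorted(kept)]
```

On 100 synthetic entries with one extreme outlier at position 67, this keeps exactly six items: the first three, the outlier, and the last two — the same shape of result described above.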

Real Workloads

| Scenario | Before | After | Savings |
|---|---|---|---|
| Code search (100 results) | 17,765 | 1,408 | 92% |
| SRE incident debugging | 65,694 | 5,118 | 92% |
| Codebase exploration | 78,502 | 41,254 | 47% |
| GitHub issue triage | 54,174 | 14,761 | 73% |

Accuracy Benchmarks

Compression preserves accuracy — tested on real OSS benchmarks.

Standard Benchmarks — Baseline (direct to API) vs Headroom (through proxy):

| Benchmark | Category | N | Baseline | Headroom | Delta |
|---|---|---|---|---|---|
| GSM8K | Math | 100 | 0.870 | 0.870 | 0.000 |
| TruthfulQA | Factual | 100 | 0.530 | 0.560 | +0.030 |

Compression Benchmarks — Accuracy after full compression stack:

| Benchmark | Category | N | Accuracy | Compression | Method |
|---|---|---|---|---|---|
| SQuAD v2 | QA | 100 | 97% | 19% | Before/After |
| BFCL | Tool/Function | 100 | 97% | 32% | LLM-as-Judge |
| Tool Outputs (built-in) | Agent | 8 | 100% | 20% | Before/After |
| CCR Needle Retention | Lossless | 50 | 100% | 77% | Exact Match |

Run it yourself:

```bash
# Quick smoke test (8 cases, ~10s)
python -m headroom.evals quick -n 8 --provider openai --model gpt-4o-mini

# Full Tier 1 suite (~$3, ~15 min)
python -m headroom.evals suite --tier 1 -o eval_results/

# CI mode (exit 1 on regression)
python -m headroom.evals suite --tier 1 --ci
```

Full methodology: Benchmarks | Evals Framework


Key Capabilities

Lossless Compression

Headroom never throws data away. It compresses aggressively, stores the originals, and gives the LLM a tool to retrieve full details when needed. When it compresses 500 items to 20, it tells the model what was omitted ("87 passed, 2 failed, 1 error") so the model knows when to ask for more.
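That omission summary can be illustrated with a short sketch. The `status` field and function name are hypothetical, not the library's internal format:

```python
from collections import Counter

def summarize_omitted(items, kept_indices):
    """Illustrative sketch: count what compression dropped, by status,
    so the model knows whether to retrieve the originals. `status` is
    a hypothetical field; real summaries depend on the data's shape."""
    omitted = Counter(
        item["status"] for i, item in enumerate(items) if i not in kept_indices
    )
    return ", ".join(f"{n} {status}" for status, n in omitted.most_common())
```

Dropping 90 mixed test results would produce a string like `87 passed, 2 failed, 1 error` for the model to reason over.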

Smart Content Detection

Auto-detects what's in your context — JSON arrays, code, logs, plain text — and routes each to the best compressor. JSON goes to SmartCrusher, code goes through AST-aware compression (Python, JS, Go, Rust, Java, C++), text goes to Kompress (ModernBERT-based, with [ml] extra).
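A minimal version of that detection step might look like the following. This is an illustrative sketch with deliberately naive heuristics, not Headroom's actual router:

```python
import json

def detect_content_type(text: str) -> str:
    """Naive content sniffing sketch: JSON routes to a structured
    compressor, code-like text to an AST-aware one, everything else
    to a text compressor."""
    stripped = text.strip()
    if stripped[:1] in ("[", "{"):
        try:
            json.loads(stripped)
            return "json"   # -> SmartCrusher
        except ValueError:
            pass
    if any(tok in text for tok in ("def ", "class ", "function ", "import ", "fn ")):
        return "code"       # -> CodeCompressor
    return "text"           # -> Kompress
```

The real router also has to handle mixed payloads (e.g. logs containing JSON fragments), which is where per-segment routing earns its keep.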

Cache Optimization

Stabilizes message prefixes so your provider's KV cache actually works. Claude offers a 90% read discount on cached prefixes — but almost no framework takes advantage of it. Headroom does.
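The payoff is easy to see with back-of-envelope math. The rates below are placeholders for illustration, not current pricing:

```python
# Hypothetical numbers: a 10,000-token stable prefix resent over 50 turns.
input_rate = 3.00 / 1_000_000        # $/token, placeholder base input rate
cached_rate = input_rate * 0.10      # 90% read discount on cached prefixes
prefix_tokens, turns = 10_000, 50

uncached_cost = prefix_tokens * turns * input_rate
cached_cost = (prefix_tokens * input_rate                    # first turn fills the cache
               + prefix_tokens * (turns - 1) * cached_rate)  # later turns read it
print(f"uncached: ${uncached_cost:.2f}  cached: ${cached_cost:.3f}")
```

The catch is that cache hits require a byte-identical prefix across turns; one shifting timestamp early in the message list invalidates everything after it, which is the problem prefix stabilization solves.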

Failure Learning

```bash
headroom learn                # Analyze past Claude Code sessions, show recommendations
headroom learn --apply        # Write learnings to CLAUDE.md and MEMORY.md
headroom learn --all --apply  # Learn across all your projects
```

Reads your conversation history, finds every failed tool call, correlates it with what eventually succeeded, and writes specific corrections into your project files. Next session starts smarter. Learn docs

headroom learn demo

Image Compression

40-90% token reduction via trained ML router. Automatically selects the right resize/quality tradeoff per image.

All features
| Feature | What it does |
|---|---|
| Content Router | Auto-detects content type, routes to optimal compressor |
| SmartCrusher | Universal JSON compression — arrays of dicts, strings, numbers, mixed types, nested objects |
| CodeCompressor | AST-aware compression for Python, JS, Go, Rust, Java, C++ |
| Kompress | ModernBERT token compression (replaces LLMLingua-2) |
| CCR | Reversible compression — LLM retrieves originals when needed |
| Compression Summaries | Tells the LLM what was omitted ("3 errors, 12 failures") |
| CacheAligner | Stabilizes prefixes for provider KV cache hits |
| IntelligentContext | Score-based context management with learned importance |
| Image Compression | 40-90% token reduction via trained ML router |
| Memory | Persistent memory across conversations |
| Compression Hooks | Customize compression with pre/post hooks |
| Read Lifecycle | Detects stale/superseded Read outputs, replaces with CCR markers |
| `headroom learn` | Analyzes past failures, writes project-specific learnings to CLAUDE.md/MEMORY.md |
| `headroom wrap` | One-command setup for Claude Code, Codex, Aider, Cursor |
| SharedContext | Compressed inter-agent context sharing for multi-agent workflows |
| MCP Tools | `headroom_compress`, `headroom_retrieve`, `headroom_stats` for Claude Code/Cursor |

Headroom vs Alternatives

Context compression is a new space. Here's how the approaches differ:

| Tool | Approach | Scope | Deploy as | Framework integrations | Data stays local? | Reversible? |
|---|---|---|---|---|---|---|
| Headroom | Multi-algorithm compression | All context (tool outputs, DB reads, RAG, files, logs, history) | Proxy, Python library, ASGI middleware, or callback | LangChain, LangGraph, Agno, Strands, LiteLLM, MCP | Yes (OSS) | Yes (CCR) |
| RTK | CLI command rewriter | Shell command outputs | CLI wrapper | None | Yes (OSS) | No |
| Compresr | Cloud compression API | Text sent to their API | API call | None | No | No |
| Token Company | Cloud compression API | Text sent to their API | API call | None | No | No |

Use it however you want. Headroom works as a standalone proxy (headroom proxy), a one-function Python library (compress()), ASGI middleware, or a LiteLLM callback. Already using LiteLLM, LangChain, or Agno? Drop Headroom in without replacing anything.

Headroom + RTK work well together. RTK rewrites CLI commands (`git show` → `git show --short`), Headroom compresses everything else (JSON arrays, code, logs, RAG results, conversation history). Use both.

Headroom vs cloud APIs. Compresr and Token Company are hosted services — you send your context to their servers, they compress and return it. Headroom runs locally. Your data never leaves your machine. You also get lossless compression (CCR): the LLM can retrieve the full original when it needs more detail.


How It Works Inside

  Your prompt
      │
      ▼
  1. CacheAligner            Stabilize prefix for KV cache
      │
      ▼
  2. ContentRouter           Route each content type:
      │                         → SmartCrusher    (JSON)
      │                         → CodeCompressor  (code)
      │                         → Kompress        (text, with [ml])
      ▼
  3. IntelligentContext      Score-based token fitting
      │
      ▼
  LLM Provider

  Needs full details? LLM calls headroom_retrieve.
  Originals are in the Compressed Store — nothing is thrown away.

Overhead: 15-200ms compression latency (net positive for Sonnet/Opus). Full data: Latency Benchmarks


Integrations

| Integration | Status | Docs |
|---|---|---|
| `headroom wrap claude/codex/aider/cursor` | Stable | Proxy Docs |
| `compress()` — one function | Stable | Integration Guide |
| SharedContext — multi-agent | Stable | SharedContext Guide |
| LiteLLM callback | Stable | Integration Guide |
| ASGI middleware | Stable | Integration Guide |
| Proxy server | Stable | Proxy Docs |
| Agno | Stable | Agno Guide |
| MCP (Claude Code, Cursor, etc.) | Stable | MCP Guide |
| Strands | Stable | Strands Guide |
| LangChain | Stable | LangChain Guide |
| OpenClaw | Stable | OpenClaw plugin |

OpenClaw Plugin

The @headroom-ai/openclaw plugin integrates Headroom as a ContextEngine for OpenClaw. It compresses tool outputs, code, logs, and structured data inline — 70-90% token savings with zero LLM calls. The plugin can connect to a local or remote Headroom proxy and will auto-start one locally if needed.

Install

```bash
pip install "headroom-ai[proxy]"
openclaw plugins install --dangerously-force-unsafe-install headroom-ai/openclaw
```

Why `--dangerously-force-unsafe-install`? The plugin auto-starts `headroom proxy` as a subprocess when no running proxy is detected. OpenClaw blocks process-launching plugins by default, so this flag is required to permit that behavior.

Once installed, assign Headroom as the context engine in your OpenClaw config:

```json
{
  "plugins": {
    "entries": { "headroom": { "enabled": true } },
    "slots": { "contextEngine": "headroom" }
  }
}
```

The plugin auto-detects and auto-starts the proxy — no manual proxy management needed. See the plugin README for full configuration options, local development setup, and launcher details.


Cloud Providers

```bash
headroom proxy --backend bedrock --region us-east-1      # AWS Bedrock
headroom proxy --backend vertex_ai --region us-central1  # Google Vertex
headroom proxy --backend azure                           # Azure OpenAI
headroom proxy --backend openrouter                      # OpenRouter (400+ models)
```

Installation

```bash
pip install headroom-ai              # Core library
pip install "headroom-ai[all]"       # Everything including evals (recommended)
pip install "headroom-ai[proxy]"     # Proxy server + MCP tools
pip install "headroom-ai[mcp]"       # MCP tools only (no proxy)
pip install "headroom-ai[ml]"        # ML compression (Kompress, requires torch)
pip install "headroom-ai[agno]"      # Agno integration
pip install "headroom-ai[langchain]" # LangChain (experimental)
pip install "headroom-ai[evals]"     # Evaluation framework only
```

Python 3.10+


Documentation

| Guide | Covers |
|---|---|
| Integration Guide | LiteLLM, ASGI, `compress()`, proxy |
| Proxy Docs | Proxy server configuration |
| Architecture | How the pipeline works |
| CCR Guide | Reversible compression |
| Benchmarks | Accuracy validation |
| Latency Benchmarks | Compression overhead & cost-benefit analysis |
| Limitations | When compression helps, when it doesn't |
| Evals Framework | Prove compression preserves accuracy |
| Memory | Persistent memory |
| Agno | Agno agent framework |
| MCP | Context engineering toolkit (compress, retrieve, stats) |
| SharedContext | Compressed inter-agent context sharing |
| Learn | Offline failure learning for coding agents |
| Configuration | All options |

Community

Questions, feedback, or just want to follow along? Join us on Discord


Contributing

```bash
git clone https://github.com/chopratejas/headroom.git && cd headroom
pip install -e ".[dev]" && pytest
```

License

Apache License 2.0 — see LICENSE.
