Star 历史趋势
数据来源: GitHub API · 生成自 Stargazers.cn
README.md

Lynkr

The AI coding proxy that compresses tokens before they hit the model.

87.6% fewer tokens on JSON tool results. 53% fewer tokens on tool-heavy requests. 171ms semantic cache hits. Zero code changes.

npm version Tests License: Apache 2.0 Node.js Ask DeepWiki

87.6%
JSON Compression
53%
Tool Token Reduction
171ms
Semantic Cache Hits
13+
LLM Providers
0
Code Changes Required

Numbers from a live benchmark against LiteLLM on identical workloads. See full report →


Quick Start (2 Minutes)

1. Install Lynkr

npm install -g lynkr

2. Configure Lynkr

First run creates a .env file. Edit it with your provider settings.

Option A: Free & Local (Ollama) - Recommended for Testing

# Install Ollama first: https://ollama.com ollama pull qwen2.5-coder:latest

Create/edit .env in your project directory:

# Provider MODEL_PROVIDER=ollama FALLBACK_ENABLED=false # Ollama Configuration OLLAMA_ENDPOINT=http://localhost:11434 OLLAMA_MODEL=qwen2.5-coder:latest # Server PORT=8081 # Optional: Limits (remove for unlimited) POLICY_MAX_STEPS=50 POLICY_MAX_TOOL_CALLS=100 # Disable overly strict command filtering POLICY_SAFE_COMMANDS_ENABLED=false

Option B: Cloud (OpenRouter) - Recommended for Production

# Get API key from https://openrouter.ai

Create/edit .env:

# Provider MODEL_PROVIDER=openrouter OPENROUTER_API_KEY=sk-or-v1-your-key-here FALLBACK_ENABLED=false # Server PORT=8081 # Optional: Limits (remove for unlimited) POLICY_MAX_STEPS=50 POLICY_MAX_TOOL_CALLS=100 # Optional: Enable caching PROMPT_CACHE_ENABLED=true SEMANTIC_CACHE_ENABLED=true

Option C: Enterprise (AWS Bedrock)

Create/edit .env:

# Provider MODEL_PROVIDER=bedrock AWS_BEDROCK_API_KEY=your-aws-key AWS_BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0 FALLBACK_ENABLED=false # Server PORT=8081 # Optional: Limits (remove for unlimited) POLICY_MAX_STEPS=50 POLICY_MAX_TOOL_CALLS=100

Option D: Enterprise (Databricks)

Create/edit .env:

# Provider MODEL_PROVIDER=databricks DATABRICKS_API_BASE=https://your-workspace.cloud.databricks.com DATABRICKS_API_KEY=your-token FALLBACK_ENABLED=false # Server PORT=8081 # Optional: Limits (remove for unlimited) POLICY_MAX_STEPS=50 POLICY_MAX_TOOL_CALLS=100

Then start Lynkr:

lynkr start

3. Connect Your Tool

Claude Code

Windows (Command Prompt):

set ANTHROPIC_BASE_URL=http://localhost:8081 set ANTHROPIC_API_KEY=dummy claude "write a hello world in python"

Linux/macOS:

export ANTHROPIC_BASE_URL=http://localhost:8081 export ANTHROPIC_API_KEY=dummy claude "write a hello world in python"

Cursor IDE

  • Settings → Models → Override Base URL
  • Set to: http://localhost:8081/v1
  • API Key: any-value

Codex CLI

Edit ~/.codex/config.toml:

model_provider = "lynkr" [model_providers.lynkr] base_url = "http://localhost:8081/v1" wire_api = "responses"

Done! Your AI tool now uses your chosen provider.


Common Startup Errors

Error: unable to determine transport target for "pino-pretty"

Problem: You're running an older version (< 9.3.0).

Solution: Update to the latest version:

npm install -g lynkr@latest

If you must use an older version, set NODE_ENV=production before starting.

Warning: Missing tier configuration: TIER_SIMPLE, TIER_MEDIUM...

This is just a warning - you can ignore it. Tier routing is optional.

To remove the warning, add to .env:

TIER_SIMPLE=ollama:qwen2.5-coder:latest TIER_MEDIUM=ollama:qwen2.5-coder:latest TIER_COMPLEX=ollama:qwen2.5-coder:latest TIER_REASONING=ollama:qwen2.5-coder:latest

Warning: FALLBACK_PROVIDER='databricks' is enabled but missing credentials

Solution: Add to .env:

FALLBACK_ENABLED=false

Error: connect ECONNREFUSED ::1:11434 (Ollama)

Problem: Ollama is not running.

Solution:

ollama serve

Keep this terminal open, and start Lynkr in a new terminal.

Error: Connection refused or 404 Not Found

Problem: Lynkr is not running or wrong port.

Solution: Check Lynkr is running on the correct port:

curl http://localhost:8081/

Should return: {"service":"Lynkr","version":"9.x.x","status":"running"}


Why Lynkr?

AI coding tools lock you into one provider and send every token raw. Lynkr breaks both locks.

Claude Code / Cursor / Codex / Cline / Continue
                    ↓
                  Lynkr
          ┌─────────────────────┐
          │  Strip unused tools  │  ← 53% fewer tokens on tool calls
          │  Compress JSON blobs │  ← 87.6% on large tool results
          │  Semantic cache      │  ← 171ms hits, 0 tokens billed
          │  Route by complexity │  ← cheap model for simple, cloud for hard
          └─────────────────────┘
                    ↓
    Ollama | Bedrock | Azure | Moonshot | OpenRouter | OpenAI

What you get:

  • 53% fewer tokens on tool-heavy requests (Claude Code, Cursor sessions)
  • 87.6% compression on large JSON tool results (grep, file reads, test output)
  • Semantic cache serves repeated queries in 171ms with 0 tokens billed
  • Automatic tier routing — simple questions go to cheap models, complex ones escalate
  • ✅ Route through your company's infrastructure (Databricks, Azure, Bedrock)
  • Zero code changes — just change one environment variable

Supported Providers

ProviderTypeExample ModelsCost
OllamaLocalqwen2.5-coder, deepseek-coder, llama3Free
llama.cppLocalAny GGUF modelFree
LM StudioLocalLocal models with GUIFree
OpenRouterCloudGPT-4o, Claude 3.5, Llama 3, Gemini$
AWS BedrockCloudClaude, Llama, Mistral, Titan$$
DatabricksCloudClaude Sonnet 4.5, Opus 4.6$$$
Azure OpenAICloudGPT-4o, o1, o3$$$
Azure AnthropicCloudClaude Sonnet, Opus$$$
OpenAICloudGPT-4o, o3-mini$$$
DeepSeekCloudDeepSeek R1, Reasoner$

4 local providers for 100% offline, free usage. 10+ cloud providers for scale.


Advanced: Tier Routing (Save Even More)

Route different request types to different models automatically:

# .env file MODEL_PROVIDER=ollama FALLBACK_ENABLED=false # Use small/fast models for simple tasks TIER_SIMPLE=ollama:qwen2.5:3b # Use medium models for normal coding TIER_MEDIUM=ollama:qwen2.5:7b # Use powerful models for complex architecture TIER_COMPLEX=ollama:deepseek-r1:14b TIER_REASONING=ollama:deepseek-r1:14b # Optional: Limits (remove for unlimited) for long conversations POLICY_MAX_STEPS=50 POLICY_MAX_TOOL_CALLS=100

Lynkr analyzes each request and routes it to the appropriate tier. Simple questions use fast models. Complex refactoring uses powerful models.

Result: 70-90% of requests use cheaper/faster models. Only hard problems hit expensive models.


Complete .env Examples

MVP: Minimal Working Setup (Ollama)

Copy-paste ready configuration for immediate use:

# .env - Minimal Ollama Setup # ============================================ # REQUIRED: Provider Configuration # ============================================ MODEL_PROVIDER=ollama FALLBACK_ENABLED=false # ============================================ # REQUIRED: Ollama Settings # ============================================ OLLAMA_ENDPOINT=http://localhost:11434 OLLAMA_MODEL=qwen2.5-coder:latest # ============================================ # REQUIRED: Server Configuration # ============================================ PORT=8081 HOST=0.0.0.0 # ============================================ # REQUIRED: Claude Code/Cursor Compatibility # ============================================ POLICY_MAX_STEPS=50 POLICY_MAX_TOOL_CALLS=100 POLICY_SAFE_COMMANDS_ENABLED=false # ============================================ # OPTIONAL: Performance (Recommended) # ============================================ LOG_LEVEL=warn LOAD_SHEDDING_ENABLED=true LOAD_SHEDDING_HEAP_THRESHOLD=0.85

Steps:

  1. Install Ollama: curl -fsSL https://ollama.com/install.sh | sh
  2. Pull model: ollama pull qwen2.5-coder:latest
  3. Copy above to .env in your project directory
  4. Run: lynkr start

Production: Cloud with Tier Routing (OpenRouter)

Optimized for cost savings with smart routing:

# .env - Production OpenRouter Setup # ============================================ # REQUIRED: Provider Configuration # ============================================ MODEL_PROVIDER=openrouter OPENROUTER_API_KEY=sk-or-v1-your-key-here FALLBACK_ENABLED=false # ============================================ # REQUIRED: Server Configuration # ============================================ PORT=8081 HOST=0.0.0.0 # ============================================ # TIER ROUTING: Smart Cost Optimization # ============================================ # Simple queries → Cheap/fast model TIER_SIMPLE=openrouter:google/gemini-flash-1.5 # Normal coding → Balanced model TIER_MEDIUM=openrouter:anthropic/claude-3.5-sonnet # Complex refactoring → Powerful model TIER_COMPLEX=openrouter:anthropic/claude-opus-4 # Deep reasoning → Most capable model TIER_REASONING=openrouter:anthropic/claude-opus-4 # ============================================ # REQUIRED: Claude Code/Cursor Compatibility # ============================================ POLICY_MAX_STEPS=50 POLICY_MAX_TOOL_CALLS=100 POLICY_SAFE_COMMANDS_ENABLED=false # ============================================ # OPTIONAL: Token Optimization (60-80% savings) # ============================================ PROMPT_CACHE_ENABLED=true SEMANTIC_CACHE_ENABLED=true SEMANTIC_CACHE_THRESHOLD=0.95 TOOL_INJECTION_ENABLED=false # ============================================ # OPTIONAL: Performance Tuning # ============================================ LOG_LEVEL=warn LOAD_SHEDDING_ENABLED=true LOAD_SHEDDING_HEAP_THRESHOLD=0.85

Expected savings: 70-90% of requests use Gemini Flash ($). Only 10-30% use Claude Opus ($$$).


Enterprise: Databricks Foundation Models

For teams using Databricks Model Serving:

# .env - Enterprise Databricks Setup # ============================================ # REQUIRED: Provider Configuration # ============================================ MODEL_PROVIDER=databricks DATABRICKS_API_BASE=https://your-workspace.cloud.databricks.com DATABRICKS_API_KEY=dapi1234567890abcdef FALLBACK_ENABLED=false # ============================================ # REQUIRED: Model Configuration # ============================================ # Option 1: Single model (no tier routing) DATABRICKS_MODEL=databricks-meta-llama-3-1-405b-instruct # Option 2: Tier routing (comment out above, uncomment below) # TIER_SIMPLE=databricks:databricks-meta-llama-3-1-70b-instruct # TIER_MEDIUM=databricks:databricks-claude-sonnet-4-5 # TIER_COMPLEX=databricks:databricks-claude-opus-4-6 # TIER_REASONING=databricks:databricks-claude-opus-4-6 # ============================================ # REQUIRED: Server Configuration # ============================================ PORT=8081 HOST=0.0.0.0 # ============================================ # REQUIRED: Claude Code/Cursor Compatibility # ============================================ POLICY_MAX_STEPS=50 POLICY_MAX_TOOL_CALLS=100 POLICY_SAFE_COMMANDS_ENABLED=false # ============================================ # OPTIONAL: Enterprise Features # ============================================ LOG_LEVEL=info LOAD_SHEDDING_ENABLED=true LOAD_SHEDDING_HEAP_THRESHOLD=0.85 # Optional: Metrics for monitoring # PROMETHEUS_METRICS_ENABLED=true

Hybrid: Local + Cloud Fallback

Use free Ollama, fallback to cloud when needed:

# .env - Hybrid Setup (Advanced) # ============================================ # PRIMARY: Local Ollama # ============================================ MODEL_PROVIDER=ollama OLLAMA_ENDPOINT=http://localhost:11434 OLLAMA_MODEL=qwen2.5-coder:latest # ============================================ # FALLBACK: Cloud Provider # ============================================ FALLBACK_ENABLED=true FALLBACK_PROVIDER=openrouter OPENROUTER_API_KEY=sk-or-v1-your-key-here # ============================================ # TIER ROUTING: Mix Local + Cloud # ============================================ TIER_SIMPLE=ollama:qwen2.5:3b TIER_MEDIUM=ollama:qwen2.5:7b TIER_COMPLEX=openrouter:anthropic/claude-3.5-sonnet TIER_REASONING=openrouter:anthropic/claude-opus-4 # ============================================ # REQUIRED: Server Configuration # ============================================ PORT=8081 HOST=0.0.0.0 # ============================================ # REQUIRED: Claude Code/Cursor Compatibility # ============================================ POLICY_MAX_STEPS=50 POLICY_MAX_TOOL_CALLS=100 POLICY_SAFE_COMMANDS_ENABLED=false # ============================================ # OPTIONAL: Performance # ============================================ LOG_LEVEL=warn LOAD_SHEDDING_ENABLED=true

Best of both worlds: 80% of requests stay local (free). Complex tasks use cloud (paid).


Common Issues & Fixes

IssueSolution
"Service temporarily overloaded"Ollama model too large for RAM. Use smaller model or increase --max-old-space-size
"Route not found: HEAD /"Ignore - harmless health check from Claude Code
"Hallucinated tool calls"Normal - Lynkr automatically filters invalid tools
"Safe Command DSL blocked"Add POLICY_SAFE_COMMANDS_ENABLED=false to .env
"spawn graphify ENOENT"Optional feature. Set CODE_GRAPH_ENABLED=false in .env (see Advanced Features section for installation)
Slow first request (20+ sec)Ollama loading model into memory. Add OLLAMA_KEEP_ALIVE=30m in Ollama config
No response after N turnsRemove POLICY_MAX_STEPS and POLICY_MAX_TOOL_CALLS from .env (unlimited by default in v9.3.0+)

Advanced Features

Token Optimization (60-80% savings)

# Enable all optimizations PROMPT_CACHE_ENABLED=true SEMANTIC_CACHE_ENABLED=true TOOL_INJECTION_ENABLED=false CODE_MODE_ENABLED=true

Memory System (Titans-inspired)

MEMORY_ENABLED=true MEMORY_TTL=3600000 # 1 hour

Load Shedding & Resilience

LOAD_SHEDDING_ENABLED=true LOAD_SHEDDING_HEAP_THRESHOLD=0.85

Admin Hot-Reload (no restart needed)

curl -X POST http://localhost:8081/v1/admin/reload

Code Intelligence (Optional - Graphify)

Graphify provides AST-based code analysis for smarter routing decisions.

Installation (Rust required):

# Install Rust if not already installed curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh source $HOME/.cargo/env # Build and install graphify git clone https://github.com/safishamsi/graphify cd graphify cargo build --release sudo cp target/release/graphify /usr/local/bin/ # Verify installation graphify --version

Enable in .env:

CODE_GRAPH_ENABLED=true CODE_GRAPH_WORKSPACE=/path/to/your/project # Optional, defaults to cwd

Features:

  • AST-based complexity scoring
  • Structural code analysis (19 languages supported)
  • Enhanced routing decisions based on code structure

Note: Graphify is completely optional. If not installed, Lynkr falls back to simpler complexity analysis.


Installation Methods

NPM (recommended)

npm install -g lynkr

One-line installer

curl -fsSL https://raw.githubusercontent.com/Fast-Editor/Lynkr/main/install.sh | bash

Homebrew

brew tap fast-editor/lynkr brew install lynkr

Docker

git clone https://github.com/Fast-Editor/Lynkr.git cd Lynkr docker-compose up -d

From source

git clone https://github.com/Fast-Editor/Lynkr.git cd Lynkr npm install cp .env.example .env npm start

Documentation

GuideDescription
InstallationAll installation methods
Provider SetupConfiguration for all 12+ providers
Claude CodeClaude Code CLI integration
Cursor IDECursor setup + troubleshooting
Codex CLICodex configuration
Tier RoutingSmart model routing by complexity
Token Optimization60-80% cost reduction
TroubleshootingCommon issues and solutions
API ReferenceREST API endpoints
ProductionEnterprise deployment

Benchmark Results

Measured on real agentic coding workloads (Claude Code / Cursor sessions) with Ollama, Moonshot, and Azure OpenAI backends. Run with node benchmark-tier-routing.js.

Token compression

ScenarioTokens without LynkrTokens with LynkrReduction
14-tool request (read task)1,04254747%
14-tool request (write task)1,04341260%
Large JSON grep result (60 items)3,45842787.6%

Lynkr strips irrelevant tool schemas before forwarding (smart tool selection) and binary-compresses large JSON tool results (TOON) — both happen in-process with no added latency.

Semantic cache

Tokens billedResponse time
First call (cold)2,8571,891ms
Second call — paraphrased, cache hit0171ms

Near-identical prompts return cached responses in 171ms. Zero tokens billed on a cache hit.

Tier routing

RequestRouted to
"What does git stash do?"SIMPLE → local model (free)
JWT vs cookies security analysisCOMPLEX → cloud model (correct)

Lynkr scores each request on 15 dimensions (token count, code complexity, reasoning markers, risk signals, agentic patterns) and routes automatically. No caller changes needed.

Full benchmark report with methodology


Cost Comparison

ScenarioDirect AnthropicLynkr + OllamaLynkr + OpenRouter
Daily coding (8h)$10-30/day$0 (free)$2-8/day
Monthly (heavy use)$300-900$0$60-240

With tier routing + token optimization: additional 50-87% savings on cloud providers depending on workload.


Why Lynkr vs Alternatives

FeatureLynkrLiteLLMOpenRouterPortKey
Setupnpm install -g lynkrPython + Docker + PostgresAccount signupDocker stack
Claude Code native✅ Drop-in⚠️ Requires config⚠️ Partial
Cursor native✅ Drop-in⚠️ Partial⚠️ Partial
Local modelsOllama, llama.cpp, LM StudioOllama only
Automatic tier routing✅ 15-dimension scorer⚠️ Cost-only❌ Manual metadata
TOON JSON compression✅ up to 87.6%
Smart tool selection✅ up to 60% token reduction
Semantic cache✅ 171ms hits, 0 tokens✅ Prompt cache only
Long-term memory✅ SQLite, per-session
MCP integration✅ + Code Mode (96% reduction)
Self-hosted✅ Node.js only✅ Python stack❌ SaaS✅ Docker
DependenciesNode.js 20+Python, Prisma, PostgreSQLNoneDocker, Python

Lynkr's edge: Purpose-built for AI coding tools. Compresses tokens before they reach the model — not just after. Zero-config for Claude Code, Cursor, and Codex. Installs in one command.


Community


License

Apache 2.0 — See LICENSE.


Built by Vishal Veera Reddy for developers who want control over their AI tools.

关于 About

Streamline your workflow with Lynkr, a CLI tool that acts as an HTTP proxy for efficient code interactions using Claude Code CLI.
agentsaiclaudeclaudecodecode-assistantcode-assistantscode-generationdatabricksdeveloper-toolsllmllmopsllmsmcp

语言 Languages

JavaScript67.1%
Rust19.1%
Makefile11.5%
HTML1.0%
Python0.6%
Shell0.5%
DTrace0.2%
Dockerfile0.1%
C0.0%

提交活跃度 Commit Activity

代码提交热力图
过去 52 周的开发活跃度
173
Total Commits
峰值: 29次/周
Less
More

核心贡献者 Contributors