
CodeSeek

A code intelligence CLI for Claude Code: AST-based call graph analysis and semantic search, right from your terminal.

Quick Start

# Install via npm (handles setup wizard + binary download automatically)
npm install -g codeseek

# First run: interactive setup wizard configures your embedding model
codeseek

# Index your project
codeseek init

# Search code by symbol name
codeseek search main --limit 10

# Query call graph
codeseek callers main
codeseek callees process_data

# Register with Claude Code / Codex as MCP tools
codeseek install

# Check status
codeseek status

# Auto-index on git commits
codeseek install-hooks

Install

npm

npm install -g codeseek

The npm package ships a lightweight JS wrapper that handles:

| Step | Description |
| --- | --- |
| First-run wizard | Interactive CLI prompts for embedding API token, model, and base URL |
| Binary download | Automatically pulls the correct Rust binary for your platform from GitHub Releases |
| Pass-through | All commands (init, search, callers, etc.) are forwarded to the native binary |

Supported platforms:

| Platform | Architecture |
| --- | --- |
| macOS | arm64 (Apple Silicon), x64 (Intel) |
| Linux | x64 |

Homebrew

brew tap CodeBendKit/codeseek git@github.com:CodeBendKit/codeseek.git
brew install codeseek

From source

git clone https://github.com/CodeBendKit/codeseek.git
cd codeseek
./build.sh --release

build.sh compiles both the TypeScript wrapper (dist/) and the Rust binary, then installs to ~/.codeseek/bin/.

Commands

| Command | Description |
| --- | --- |
| codeseek | First-time setup wizard (configures embedding model interactively) |
| codeseek init | Build/update code index (full on first run, MD5-incremental thereafter) |
| codeseek status | Index statistics: functions, files, last update |
| codeseek search <query> | Symbol name search (falls back from vector → graph name match) |
| codeseek callers <symbol> | Find functions that call this symbol |
| codeseek callees <symbol> | Find functions this symbol calls |
| codeseek list | List all indexed projects with paths |
| codeseek install | Register codeseek as MCP tools in Claude Code / Codex |
| codeseek uninstall | Remove MCP integration |
| codeseek uninit | Delete the current project index |
| codeseek install-hooks | Install git hooks (post-commit/post-merge → codeseek init) |
| codeseek serve --mcp | Start MCP server (stdio JSON-RPC, used by Claude Code internally) |

All query commands support --json for machine-readable output.
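
The callers and callees queries are the two directions of a single edge lookup on a directed call graph. The following is a minimal sketch of that idea, not CodeSeek's implementation: the `CallGraph` class and the function names in the sample data (other than `main` and `process_data`, which appear in the Quick Start) are hypothetical.

```python
from collections import defaultdict


class CallGraph:
    """Toy directed call graph with one edge caller -> callee per call."""

    def __init__(self):
        self._callees = defaultdict(set)  # caller -> functions it calls
        self._callers = defaultdict(set)  # callee -> functions that call it

    def add_call(self, caller, callee):
        # Store the edge in both directions so each query is a single lookup.
        self._callees[caller].add(callee)
        self._callers[callee].add(caller)

    def callees(self, symbol):
        """Downstream: functions that `symbol` calls."""
        return sorted(self._callees.get(symbol, ()))

    def callers(self, symbol):
        """Upstream: functions that call `symbol`."""
        return sorted(self._callers.get(symbol, ()))


graph = CallGraph()
graph.add_call("main", "process_data")
graph.add_call("main", "load_config")      # hypothetical function name
graph.add_call("process_data", "parse_row")  # hypothetical function name

print(graph.callers("process_data"))
print(graph.callees("main"))
```

Keeping a reverse index alongside the forward one trades a little memory for constant-time upstream queries, which is the direction an agent typically needs when assessing the impact of a change.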

Claude Code / Codex Integration

codeseek install

Writes MCP server config to:

| Agent | Config file |
| --- | --- |
| Claude Code | ~/.claude.json (global, all projects) or ./.mcp.json (project-local) |
| Codex CLI | ~/.codex/config.toml |
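
For the project-local case, Claude Code reads MCP servers from a top-level `mcpServers` object in `.mcp.json`. The sketch below shows what such an entry could look like if it launches the `codeseek serve --mcp` command from the Commands table. The server name `codeseek`, the `merge_into` helper, and the exact fields CodeSeek writes are assumptions for illustration, not taken from the tool itself.

```python
import json


def mcp_entry(binary="codeseek"):
    """Build a stdio MCP server entry that runs `codeseek serve --mcp`.

    Assumption: the entry name and argument list are illustrative only.
    """
    return {"mcpServers": {"codeseek": {"command": binary, "args": ["serve", "--mcp"]}}}


def merge_into(existing, entry):
    """Add the entry without dropping servers that are already configured."""
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers.update(entry["mcpServers"])
    merged["mcpServers"] = servers
    return merged


# A config that already has another (hypothetical) server registered.
existing = {"mcpServers": {"other": {"command": "other-server"}}}
config = merge_into(existing, mcp_entry())
print(json.dumps(config, indent=2))
```

Merging rather than overwriting matters because the same file may hold entries for unrelated MCP servers.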

Claude Code auto-discovers these tools after restart:

| Tool | Capability |
| --- | --- |
| codeseek_search | Find symbols by name |
| codeseek_callers | Trace upstream callers |
| codeseek_callees | Trace downstream callees |
| codeseek_status | Check index health |

Remove integration:

codeseek uninstall

How It Works

Index Building (codeseek init)

Source files
  → Tree-sitter AST parse (7 languages)
  → Extract functions / classes / methods
  → Batch embed via API (20 texts per call, SQLite cache)
  → Store vectors in LanceDB
  → Build BM25 index in Tantivy
  → Serialize call graph (PetCodeGraph)
  → Save to ~/.codeseek/<project_hash>/
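
The batch embedding step (20 texts per call, backed by a SQLite cache) can be sketched as follows. This is an illustration under assumptions: the cache schema, the MD5-of-text cache key, and the `fake_embed` stand-in for the real API call are all hypothetical.

```python
import hashlib
import json
import sqlite3

BATCH_SIZE = 20  # texts per embedding call, as stated in the pipeline


def embed_with_cache(texts, embed_batch, db):
    """Return one vector per text, calling embed_batch only for cache misses."""
    db.execute("CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, vec TEXT)")
    # Assumption: cache entries are keyed by the MD5 of the text.
    keys = [hashlib.md5(t.encode("utf-8")).hexdigest() for t in texts]
    found = {}
    for key in set(keys):
        row = db.execute("SELECT vec FROM cache WHERE key = ?", (key,)).fetchone()
        if row:
            found[key] = json.loads(row[0])
    # Deduplicate misses while keeping first-seen order.
    misses = list(dict.fromkeys((k, t) for k, t in zip(keys, texts) if k not in found))
    for start in range(0, len(misses), BATCH_SIZE):
        batch = misses[start:start + BATCH_SIZE]
        vectors = embed_batch([t for _, t in batch])
        for (key, _), vec in zip(batch, vectors):
            found[key] = vec
            db.execute("INSERT OR REPLACE INTO cache VALUES (?, ?)", (key, json.dumps(vec)))
    db.commit()
    return [found[k] for k in keys]


calls = []


def fake_embed(batch):
    """Stand-in for the embedding API: records batch sizes, returns tiny vectors."""
    calls.append(len(batch))
    return [[float(len(text))] for text in batch]


db = sqlite3.connect(":memory:")
snippets = [f"fn f{i}() {{}}" for i in range(45)]
first = embed_with_cache(snippets, fake_embed, db)   # three API calls
second = embed_with_cache(snippets, fake_embed, db)  # served entirely from cache
print(calls)
```

With 45 distinct snippets the first pass issues batches of 20, 20, and 5; the second pass makes no calls at all, which is the point of caching embeddings between index runs.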

Indexing is idempotent: the first run performs a full build, and subsequent runs compare MD5 hashes so that only changed files are re-processed. Use codeseek install-hooks to re-index automatically on git commit and merge.
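
The MD5 comparison behind incremental indexing can be illustrated with a short sketch. The function names and the shape of the stored hash map (relative path to hex digest) are assumptions; the actual layout of file_hashes.json is not documented here.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path):
    """Hex MD5 digest of a file's bytes."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def changed_files(root, previous):
    """Compare current MD5s against the hashes saved by the previous run.

    Returns (current hashes, files to re-process, files that disappeared).
    """
    root = Path(root)
    current = {
        str(p.relative_to(root)): file_md5(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    changed = sorted(f for f, h in current.items() if previous.get(f) != h)
    removed = sorted(f for f in previous if f not in current)
    return current, changed, removed


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.py").write_text("def a(): pass\n")
    (root / "b.py").write_text("def b(): pass\n")

    # First run: no saved hashes, so every file counts as changed (full build).
    hashes, first_run, _ = changed_files(root, {})

    # Edit one file, then run again against the saved hashes.
    (root / "b.py").write_text("def b(): return 1\n")
    _, second_run, removed = changed_files(root, hashes)

print(first_run, second_run, removed)
```

Tracking removed files separately matters in practice: their functions have to be dropped from the index, not just skipped.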

Hybrid Search Pipeline (codeseek search)

                        ┌─────────────────────┐
User query ────────────→│  Embedding Model     │──→ Query vector
                        └─────────────────────┘
                                  │
          ┌───────────────────────┼───────────────────────┐
          ▼                       ▼                       ▼
   ┌─────────────┐       ┌─────────────┐       ┌─────────────┐
   │ Dense Search │       │ Sparse Search│       │ Graph Search │
   │ (LanceDB ANN)│       │ (Tantivy BM25)│      │ (PetCodeGraph)│
   └──────┬───────┘       └──────┬──────┘       └──────┬──────┘
          │                      │                      │
          └──────────────────────┼──────────────────────┘
                                 ▼
                        ┌─────────────────┐
                        │   RRF Fusion    │  ← Reciprocal Rank Fusion
                        │  (Top-20 candidates)│
                        └────────┬────────┘
                                 │
                                 ▼
                        ┌─────────────────┐
                        │    Reranker     │  ← Cross-Encoder fine re-ranking
                        │ (Qwen3-Reranker)│     scores each (query, code) pair
                        └────────┬────────┘
                                 │
                                 ▼
                        ┌─────────────────┐
                        │   Final Results  │  ← Top-5 (or Top-N)
                        └─────────────────┘

| Stage | Technology | Role | Speed |
| --- | --- | --- | --- |
| Dense Search | LanceDB + Embedding Model | Semantic vector similarity | Fast |
| Sparse Search | Tantivy BM25 | Keyword & token matching | Fast |
| RRF Fusion | Reciprocal Rank Fusion | Merge heterogeneous scores fairly | Instant |
| Reranker | Cross-Encoder (Qwen3-Reranker-4B) | Full-interaction precision scoring | ~1-2s |
| Fallback | PetCodeGraph | Graph-based name search (no API needed) | Instant |

If the embedding model or the reranker is unavailable, the pipeline falls back to graph-based name search.
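
Reciprocal Rank Fusion merges result lists by rank alone, so dense similarity scores and BM25 scores never need to be put on a common scale. Each document receives 1 / (k + rank) from every list it appears in. A minimal sketch, using the `rrf_k` = 60 and `rrf_top_k` = 20 values from the configuration as defaults; the function name, the tie-breaking rule, and the sample symbol names are assumptions:

```python
def rrf_fuse(rankings, k=60, top_k=20):
    """Fuse ranked lists of IDs with Reciprocal Rank Fusion.

    rankings: iterable of lists, each ordered best-first.
    Returns up to top_k IDs ordered by fused score (ties broken by ID).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:top_k]]


# Hypothetical results from the dense and sparse stages.
dense = ["parse_config", "load_config", "main"]
sparse = ["load_config", "read_file", "parse_config"]

fused = rrf_fuse([dense, sparse])
print(fused)
```

Here load_config wins (1/62 + 1/61) over parse_config (1/61 + 1/63) because it ranks well in both lists, while symbols found by only one stage fall to the bottom.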

Storage

  • Config: ~/.codeseek/config.json (global, shared across all projects)
  • Index: ~/.codeseek/<md5(project_root)>/
    • project.json — Project metadata
    • graph.bin — Serialized call graph
    • embeddings.lance/ — LanceDB vector data
    • tantivy_bm25/ — BM25 full-text index
    • file_hashes.json — MD5 incremental tracking

No daemon, no HTTP server. Every command is a standalone process.
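
As a rough illustration of the per-project layout, the index directory can be derived from the project root as sketched below. The exact string that CodeSeek hashes (for example, whether the path is canonicalized first) is an assumption, as are the function name and the sample paths.

```python
import hashlib
from pathlib import Path


def index_dir(project_root, home=None):
    """Map a project root to <home>/.codeseek/<md5(project_root)>.

    Assumption: the hash input is the project root path as a UTF-8 string.
    """
    home = Path(home) if home is not None else Path.home()
    digest = hashlib.md5(str(project_root).encode("utf-8")).hexdigest()
    return home / ".codeseek" / digest


path = index_dir("/work/my-project", home="/home/dev")
print(path)
```

Because the directory name depends only on the project path, every standalone command can locate the same index without a daemon keeping state.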

Supported Languages

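| Language | Functions | Structs/Classes | Call Graph |
| --- | --- | --- | --- |
| Rust | | | |
| Python | | | |
| JavaScript | | | |
| TypeScript | | | |
| Go | | | |
| C/C++ | | | |
| Java | | | |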

Configuration

~/.codeseek/config.json:

{
  "embedding": {
    "provider": "openai-compatible",
    "model": "Qwen/Qwen3-Embedding-4B",
    "api_token": "sk-...",
    "api_base_url": "https://api.siliconflow.cn/v1",
    "dimensions": 2560
  },
  "index": {
    "min_code_block_length": 16,
    "enable_reranker": true,
    "hybrid": {
      "enable_bm25": true,
      "bm25_top_k": 20,
      "vector_top_k": 20,
      "rrf_k": 60,
      "rrf_top_k": 20
    },
    "reranker": {
      "enabled": true,
      "model": "Qwen/Qwen3-Reranker-4B",
      "api_token": "sk-...",
      "api_base_url": "https://api.siliconflow.cn/v1/rerank",
      "top_n": 5,
      "candidate_multiplier": 5,
      "timeout_secs": 60
    }
  },
  "installed_hooks": {}
}

Model Roles

| Model | Role | When |
| --- | --- | --- |
| Qwen/Qwen3-Embedding-4B | Converts code → vectors for dense search | Index building |
| Qwen/Qwen3-Reranker-4B | Scores (query, code) pairs for precision | Search time |

Both models are set by the interactive wizard on first run; alternatively, create ~/.codeseek/config.json manually.
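
For orientation, the embedding fields in config.json map naturally onto a request to an OpenAI-compatible `/embeddings` endpoint. The sketch below only builds the request and does not send it. Whether CodeSeek constructs its requests this way is an assumption, and `embedding_request` is a hypothetical helper.

```python
import json
import urllib.request


def embedding_request(cfg, texts):
    """Build (but do not send) a POST to an OpenAI-compatible /embeddings endpoint."""
    emb = cfg["embedding"]
    body = {"model": emb["model"], "input": list(texts)}
    return urllib.request.Request(
        emb["api_base_url"].rstrip("/") + "/embeddings",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + emb["api_token"],
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same values as the example config; the token is a placeholder, not a real key.
cfg = {
    "embedding": {
        "model": "Qwen/Qwen3-Embedding-4B",
        "api_token": "sk-...",
        "api_base_url": "https://api.siliconflow.cn/v1",
    }
}
req = embedding_request(cfg, ["fn main() {}"])
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call; that step is left out so the example runs without network access or a valid token.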

Development

cd rust-core

# Build
cargo build

# Build + install to ~/.codeseek/bin/
cd .. && ./build.sh --release

# Run tests
cargo test

# Compile TypeScript wrapper
npm run build

License

MIT

Built with: Tree-sitter · Petgraph · LanceDB · Tantivy · Tokio · Clap

About

Rust-powered code intelligence CLI for AI coding agents. Builds call graphs and hybrid semantic search indexes (Dense + Sparse + RRF + Reranker) across 7 languages. Ships as native MCP tools for Claude Code and Codex CLI.

