Star 历史趋势
数据来源: GitHub API · 生成自 Stargazers.cn
README.md

graphify

English | 简体中文

CI PyPI Sponsor

An AI coding assistant skill. Type /graphify in Claude Code, Codex, OpenCode, OpenClaw, or Factory Droid - it reads your files, builds a knowledge graph, and gives you back structure you didn't know was there. Understand a codebase faster. Find the "why" behind architectural decisions.

Fully multimodal. Drop in code, PDFs, markdown, screenshots, diagrams, whiteboard photos, even images in other languages - graphify uses Claude vision to extract concepts and relationships from all of it and connects them into one graph. 19 languages supported via tree-sitter AST (Python, JS, TS, Go, Rust, Java, C, C++, Ruby, C#, Kotlin, Scala, PHP, Swift, Lua, Zig, PowerShell, Elixir, Objective-C).

Andrej Karpathy keeps a /raw folder where he drops papers, tweets, screenshots, and notes. graphify is the answer to that problem - 71.5x fewer tokens per query vs reading the raw files, persistent across sessions, honest about what it found vs guessed.

/graphify .                        # works on any folder - your codebase, notes, papers, anything
graphify-out/
├── graph.html       interactive graph - click nodes, search, filter by community
├── GRAPH_REPORT.md  god nodes, surprising connections, suggested questions
├── graph.json       persistent graph - query weeks later without re-reading
└── cache/           SHA256 cache - re-runs only process changed files

Add a .graphifyignore file to exclude folders you don't want in the graph:

# .graphifyignore
vendor/
node_modules/
dist/
*.generated.py

Same syntax as .gitignore. Patterns match against file paths relative to the folder you run graphify on.

How it works

graphify runs in two passes. First, a deterministic AST pass extracts structure from code files (classes, functions, imports, call graphs, docstrings, rationale comments) with no LLM needed. Second, Claude subagents run in parallel over docs, papers, and images to extract concepts, relationships, and design rationale. The results are merged into a NetworkX graph, clustered with Leiden community detection, and exported as interactive HTML, queryable JSON, and a plain-language audit report.

Clustering is graph-topology-based — no embeddings. Leiden finds communities by edge density. The semantic similarity edges that Claude extracts (semantically_similar_to, marked INFERRED) are already in the graph, so they influence community detection directly. The graph structure is the similarity signal — no separate embedding step or vector database needed.

Every relationship is tagged EXTRACTED (found directly in source), INFERRED (reasonable inference, with a confidence score), or AMBIGUOUS (flagged for review). You always know what was found vs guessed.

Install

Requires: Python 3.10+ and one of: Claude Code, Codex, OpenCode, OpenClaw, or Factory Droid

pip install graphifyy && graphify install

The PyPI package is temporarily named graphifyy while the graphify name is being reclaimed. The CLI and skill command are still graphify.

Platform support

PlatformInstall command
Claude Code (Linux/Mac)graphify install
Claude Code (Windows)graphify install (auto-detected) or graphify install --platform windows
Codexgraphify install --platform codex
OpenCodegraphify install --platform opencode
OpenClawgraphify install --platform claw
Factory Droidgraphify install --platform droid

Codex users also need multi_agent = true under [features] in ~/.codex/config.toml for parallel extraction. Factory Droid uses the Task tool for parallel subagent dispatch. OpenClaw uses sequential extraction (parallel agent support is still early on that platform).

Then open your AI coding assistant and type:

/graphify .

Note: Codex uses $ instead of / for skill calling, so type $graphify . instead.

Make your assistant always use the graph (recommended)

After building a graph, run this once in your project:

PlatformCommand
Claude Codegraphify claude install
Codexgraphify codex install
OpenCodegraphify opencode install
OpenClawgraphify claw install
Factory Droidgraphify droid install

Claude Code does two things: writes a CLAUDE.md section telling Claude to read graphify-out/GRAPH_REPORT.md before answering architecture questions, and installs a PreToolUse hook (settings.json) that fires before every Glob and Grep call. If a knowledge graph exists, Claude sees: "graphify: Knowledge graph exists. Read GRAPH_REPORT.md for god nodes and community structure before searching raw files." — so Claude navigates via the graph instead of grepping through every file.

Codex, OpenCode, OpenClaw, Factory Droid write the same rules to AGENTS.md in your project root. These platforms don't support PreToolUse hooks, so AGENTS.md is the always-on mechanism.

Uninstall with the matching uninstall command (e.g. graphify claude uninstall).

Always-on vs explicit trigger — what's the difference?

The always-on hook surfaces GRAPH_REPORT.md — a one-page summary of god nodes, communities, and surprising connections. Your assistant reads this before searching files, so it navigates by structure instead of keyword matching. That covers most everyday questions.

/graphify query, /graphify path, and /graphify explain go deeper: they traverse the raw graph.json hop by hop, trace exact paths between nodes, and surface edge-level detail (relation type, confidence score, source location). Use them when you want a specific question answered from the graph rather than a general orientation.

Think of it this way: the always-on hook gives your assistant a map. The /graphify commands let it navigate the map precisely.

Manual install (curl)
mkdir -p ~/.claude/skills/graphify curl -fsSL https://raw.githubusercontent.com/safishamsi/graphify/v3/graphify/skill.md \ > ~/.claude/skills/graphify/SKILL.md

Add to ~/.claude/CLAUDE.md:

- **graphify** (`~/.claude/skills/graphify/SKILL.md`) - any input to knowledge graph. Trigger: `/graphify`
When the user types `/graphify`, invoke the Skill tool with `skill: "graphify"` before doing anything else.

Usage

/graphify                          # run on current directory
/graphify ./raw                    # run on a specific folder
/graphify ./raw --mode deep        # more aggressive INFERRED edge extraction
/graphify ./raw --update           # re-extract only changed files, merge into existing graph
/graphify ./raw --cluster-only     # rerun clustering on existing graph, no re-extraction
/graphify ./raw --no-viz           # skip HTML, just produce report + JSON
/graphify ./raw --obsidian                          # also generate Obsidian vault (opt-in)
/graphify ./raw --obsidian --obsidian-dir ~/vaults/myproject  # write vault to a specific directory

/graphify add https://arxiv.org/abs/1706.03762        # fetch a paper, save, update graph
/graphify add https://x.com/karpathy/status/...       # fetch a tweet
/graphify add https://... --author "Name"             # tag the original author
/graphify add https://... --contributor "Name"        # tag who added it to the corpus

/graphify query "what connects attention to the optimizer?"
/graphify query "what connects attention to the optimizer?" --dfs   # trace a specific path
/graphify query "what connects attention to the optimizer?" --budget 1500  # cap at N tokens
/graphify path "DigestAuth" "Response"
/graphify explain "SwinTransformer"

/graphify ./raw --watch            # auto-sync graph as files change (code: instant, docs: notifies you)
/graphify ./raw --wiki             # build agent-crawlable wiki (index.md + article per community)
/graphify ./raw --svg              # export graph.svg
/graphify ./raw --graphml          # export graph.graphml (Gephi, yEd)
/graphify ./raw --neo4j            # generate cypher.txt for Neo4j
/graphify ./raw --neo4j-push bolt://localhost:7687    # push directly to a running Neo4j instance
/graphify ./raw --mcp              # start MCP stdio server

# git hooks - platform-agnostic, rebuild graph on commit and branch switch
graphify hook install
graphify hook uninstall
graphify hook status

# always-on assistant instructions - platform-specific
graphify claude install            # CLAUDE.md + PreToolUse hook (Claude Code)
graphify claude uninstall
graphify codex install             # AGENTS.md (Codex)
graphify opencode install          # AGENTS.md (OpenCode)
graphify claw install              # AGENTS.md (OpenClaw)
graphify droid install             # AGENTS.md (Factory Droid)

# query the graph directly from the terminal (no AI assistant needed)
graphify query "what connects attention to the optimizer?"
graphify query "show the auth flow" --dfs
graphify query "what is CfgNode?" --budget 500
graphify query "..." --graph path/to/graph.json

Works with any mix of file types:

TypeExtensionsExtraction
Code.py .ts .js .go .rs .java .c .cpp .rb .cs .kt .scala .php .swift .lua .zig .ps1 .ex .exs .m .mmAST via tree-sitter + call-graph + docstring/comment rationale
Docs.md .txt .rstConcepts + relationships + design rationale via Claude
Office.docx .xlsxConverted to markdown then extracted via Claude (requires pip install graphifyy[office])
Papers.pdfCitation mining + concept extraction
Images.png .jpg .webp .gifClaude vision - screenshots, diagrams, any language

What you get

God nodes - highest-degree concepts (what everything connects through)

Surprising connections - ranked by composite score. Code-paper edges rank higher than code-code. Each result includes a plain-English why.

Suggested questions - 4-5 questions the graph is uniquely positioned to answer

The "why" - docstrings, inline comments (# NOTE:, # IMPORTANT:, # HACK:, # WHY:), and design rationale from docs are extracted as rationale_for nodes. Not just what the code does - why it was written that way.

Confidence scores - every INFERRED edge has a confidence_score (0.0-1.0). You know not just what was guessed but how confident the model was. EXTRACTED edges are always 1.0.

Semantic similarity edges - cross-file conceptual links with no structural connection. Two functions solving the same problem without calling each other, a class in code and a concept in a paper describing the same algorithm.

Hyperedges - group relationships connecting 3+ nodes that pairwise edges can't express. All classes implementing a shared protocol, all functions in an auth flow, all concepts from a paper section forming one idea.

Token benchmark - printed automatically after every run. On a mixed corpus (Karpathy repos + papers + images): 71.5x fewer tokens per query vs reading raw files. The first run extracts and builds the graph (this costs tokens). Every subsequent query reads the compact graph instead of raw files — that's where the savings compound. The SHA256 cache means re-runs only re-process changed files.

Auto-sync (--watch) - run in a background terminal and the graph updates itself as your codebase changes. Code file saves trigger an instant rebuild (AST only, no LLM). Doc/image changes notify you to run --update for the LLM re-pass.

Git hooks (graphify hook install) - installs post-commit and post-checkout hooks. Graph rebuilds automatically after every commit and every branch switch. If a rebuild fails, the hook exits with a non-zero code so git surfaces the error instead of silently continuing. No background process needed.

Wiki (--wiki) - Wikipedia-style markdown articles per community and god node, with an index.md entry point. Point any agent at index.md and it can navigate the knowledge base by reading files instead of parsing JSON.

Worked examples

CorpusFilesReductionOutput
Karpathy repos + 5 papers + 4 images5271.5xworked/karpathy-repos/
graphify source + Transformer paper45.4xworked/mixed-corpus/
httpx (synthetic Python library)6~1xworked/httpx/

Token reduction scales with corpus size. 6 files fits in a context window anyway, so graph value there is structural clarity, not compression. At 52 files (code + papers + images) you get 71x+. Each worked/ folder has the raw input files and the actual output (GRAPH_REPORT.md, graph.json) so you can run it yourself and verify the numbers.

Privacy

graphify sends file contents to your AI coding assistant's underlying model API for semantic extraction of docs, papers, and images — Anthropic (Claude Code), OpenAI (Codex), or whichever provider your platform uses. Code files are processed locally via tree-sitter AST — no file contents leave your machine for code. No telemetry, usage tracking, or analytics of any kind. The only network calls are to your platform's model API during extraction, using your own API key.

Tech stack

NetworkX + Leiden (graspologic) + tree-sitter + vis.js. Semantic extraction via Claude (Claude Code), GPT-4 (Codex), or whichever model your platform runs. No Neo4j required, no server, runs entirely locally.

Star history

Star History Chart

Contributing

Worked examples are the most trust-building contribution. Run /graphify on a real corpus, save output to worked/{slug}/, write an honest review.md evaluating what the graph got right and wrong, submit a PR.

Extraction bugs - open an issue with the input file, the cache entry (graphify-out/cache/), and what was missed or invented.

See ARCHITECTURE.md for module responsibilities and how to add a language.

关于 About

AI coding assistant skill (Claude Code, Codex, OpenCode, OpenClaw). Turn any folder of code, docs, papers, or images into a queryable knowledge graph
claude-codecodexgraphragknowledge-graphopenclawskills

语言 Languages

Python100.0%

提交活跃度 Commit Activity

代码提交热力图
过去 52 周的开发活跃度
52
Total Commits
峰值: 52次/周
Less
More

核心贡献者 Contributors