# GBrain — Full Context > GBrain is a personal knowledge brain and GStack mod for agent platforms. Pluggable engines (PGLite default, Postgres+pgvector for scale), contract-first operations, 26 fat-markdown skills. Teaches agents brain ops, ingestion, enrichment, scheduling, identity, and access control. This file concatenates core GBrain documentation for single-fetch ingestion. For the link-only index, see `llms.txt`. Source of truth: https://github.com/garrytan/gbrain. # Core entry points ## AGENTS.md Source: https://raw.githubusercontent.com/garrytan/gbrain/master/AGENTS.md # Agents working on GBrain This is your install + operating protocol. Claude Code reads `./CLAUDE.md` automatically. Everyone else (Codex, Cursor, OpenClaw, Aider, Continue, or an LLM fetching via URL): start here. ## Install (5 min) 1. Clone: `git clone https://github.com/garrytan/gbrain ~/gbrain && cd ~/gbrain` 2. Install: `bun install` 3. Init the brain: `gbrain init` (defaults to PGLite, zero-config). For 1000+ files or multi-machine sync, init suggests Postgres + pgvector via Supabase. 4. Read [`./INSTALL_FOR_AGENTS.md`](./INSTALL_FOR_AGENTS.md) for the full 9-step flow (API keys, identity, cron, verification). ## Read this order 1. `./AGENTS.md` (this file) — install + operating protocol. 2. [`./CLAUDE.md`](./CLAUDE.md) — architecture reference, key files, trust boundaries, test layout. 3. [`./skills/RESOLVER.md`](./skills/RESOLVER.md) — skill dispatcher. Read before any task. ## Trust boundary (critical) GBrain distinguishes **trusted local CLI callers** (`OperationContext.remote = false`, set by `src/cli.ts`) from **untrusted agent-facing callers** (`remote = true`, set by `src/mcp/server.ts`). Security-sensitive operations like `file_upload` tighten filesystem confinement when `remote = true` and default to strict behavior when unset. If you are writing or reviewing an operation, consult `src/core/operations.ts` for the contract. ## Common tasks - **Configure:** [`docs/ENGINES.md`](./docs/ENGINES.md), [`docs/guides/live-sync.md`](./docs/guides/live-sync.md), [`docs/mcp/DEPLOY.md`](./docs/mcp/DEPLOY.md). - **Debug:** [`docs/GBRAIN_VERIFY.md`](./docs/GBRAIN_VERIFY.md), [`docs/guides/minions-fix.md`](./docs/guides/minions-fix.md), `gbrain doctor --fix`. - **Migrate:** [`docs/UPGRADING_DOWNSTREAM_AGENTS.md`](./docs/UPGRADING_DOWNSTREAM_AGENTS.md), [`skills/migrations/`](./skills/migrations/), `gbrain apply-migrations`. - **Everything else:** [`./llms.txt`](./llms.txt) is the full documentation map. [`./llms-full.txt`](./llms-full.txt) is the same map with core docs inlined for single-fetch ingestion. ## Before shipping Run `bun test` plus the E2E lifecycle described in `./CLAUDE.md` (spin up the test Postgres container, run `bun run test:e2e`, tear it down). Ship via the `/ship` skill, not by hand. ## Privacy Never commit real names of people, companies, or funds into public artifacts. See the Privacy rule in `./CLAUDE.md`. GBrain pages reference real contacts; public docs must use generic placeholders (`alice-example`, `acme-example`, `fund-a`). ## Forks If you are a fork, regenerate `llms.txt` + `llms-full.txt` with your own URL base before publishing: `LLMS_REPO_BASE=https://raw.githubusercontent.com/your-org/your-fork/main bun run build:llms`. --- ## CLAUDE.md Source: https://raw.githubusercontent.com/garrytan/gbrain/master/CLAUDE.md # CLAUDE.md GBrain is a personal knowledge brain and GStack mod for agent platforms. Pluggable engines: PGLite (embedded Postgres via WASM, zero-config default) or Postgres + pgvector + hybrid search in a managed Supabase instance. `gbrain init` defaults to PGLite; suggests Supabase for 1000+ files. GStack teaches agents how to code. GBrain teaches agents everything else: brain ops, signal detection, content ingestion, enrichment, cron scheduling, reports, identity, and access control. ## Architecture Contract-first: `src/core/operations.ts` defines ~41 shared operations (adds `find_orphans` in v0.12.3). CLI and MCP server are both generated from this single source. Engine factory (`src/core/engine-factory.ts`) dynamically imports the configured engine (`'pglite'` or `'postgres'`). Skills are fat markdown files (tool-agnostic, work with both CLI and plugin contexts). **Trust boundary:** `OperationContext.remote` distinguishes trusted local CLI callers (`remote: false` set by `src/cli.ts`) from untrusted agent-facing callers (`remote: true` set by `src/mcp/server.ts`). Security-sensitive operations like `file_upload` tighten filesystem confinement when `remote=true` and default to strict behavior when unset. ## Key files - `src/core/operations.ts` — Contract-first operation definitions (the foundation). Also exports upload validators: `validateUploadPath`, `validatePageSlug`, `validateFilename`. `OperationContext.remote` flags untrusted callers. - `src/core/engine.ts` — Pluggable engine interface (BrainEngine). `clampSearchLimit(limit, default, cap)` takes an explicit cap so per-operation caps can be tighter than `MAX_SEARCH_LIMIT`. Exports `LinkBatchInput` / `TimelineBatchInput` for the v0.12.1 bulk-insert API (`addLinksBatch` / `addTimelineEntriesBatch`). As of v0.13.1, `BrainEngine` has a `readonly kind: 'postgres' | 'pglite'` discriminator so migrations (`src/core/migrate.ts`) and other consumers can branch on engine without `instanceof` + dynamic imports. - `src/core/engine-factory.ts` — Engine factory with dynamic imports (`'pglite'` | `'postgres'`) - `src/core/pglite-engine.ts` — PGLite (embedded Postgres 17.5 via WASM) implementation, all 40 BrainEngine methods. `addLinksBatch` / `addTimelineEntriesBatch` use multi-row `unnest()` with manual `$N` placeholders. As of v0.13.1, `connect()` wraps `PGlite.create()` in a try/catch that emits an actionable error naming the macOS 26.3 WASM bug (#223) and pointing at `gbrain doctor`; the lock is released on failure so the next process can retry cleanly. v0.22.0: `searchKeyword` and `searchKeywordChunks` multiply `ts_rank` by the source-factor CASE expression at the chunk-grain level; `searchVector` becomes a two-stage CTE — inner CTE keeps `ORDER BY cc.embedding <=> vec` so HNSW stays usable, outer SELECT re-ranks by `raw_score * source_factor`. Inner LIMIT scales with offset to preserve pagination contract. - `src/core/pglite-schema.ts` — PGLite-specific DDL (pgvector, pg_trgm, triggers) - `src/core/postgres-engine.ts` — Postgres + pgvector implementation (Supabase / self-hosted). `addLinksBatch` / `addTimelineEntriesBatch` use `INSERT ... SELECT FROM unnest($1::text[], ...) JOIN pages ON CONFLICT DO NOTHING RETURNING 1` — 4-5 array params regardless of batch size, sidesteps the 65535-parameter cap. As of v0.12.3, `searchKeyword` / `searchVector` scope `statement_timeout` via `sql.begin` + `SET LOCAL` so the GUC dies with the transaction instead of leaking across the pooled postgres.js connection (contributed by @garagon). `getEmbeddingsByChunkIds` uses `tryParseEmbedding` so one corrupt row skips+warns instead of killing the query. v0.22.0: `searchKeyword`, `searchKeywordChunks`, and `searchVector` apply source-aware ranking by inlining the source-factor CASE and `NOT (col LIKE …)` hard-exclude clause from `src/core/search/sql-ranking.ts`. `searchVector` switches to a two-stage CTE (HNSW-safe inner ORDER BY, source-boost re-rank in the outer SELECT) and carries `p.source_id` through inner→outer for v0.18 multi-source callers. - `src/core/utils.ts` — Shared SQL utilities extracted from postgres-engine.ts. Exports `parseEmbedding(value)` (throws on unknown input, used by migration + ingest paths where data integrity matters) and as of v0.12.3 `tryParseEmbedding(value)` (returns `null` + warns once per process, used by search/rescore paths where availability matters more than strictness). - `src/core/db.ts` — Connection management, schema initialization - `src/commands/migrate-engine.ts` — Bidirectional engine migration (`gbrain migrate --to supabase/pglite`) - `src/core/import-file.ts` — importFromFile + importFromContent (chunk + embed + tags) - `src/core/sync.ts` — Pure sync functions (manifest parsing, filtering, slug conversion) - `src/core/storage.ts` — Pluggable storage interface (S3, Supabase Storage, local) - `src/core/supabase-admin.ts` — Supabase admin API (project discovery, pgvector check) - `src/core/file-resolver.ts` — File resolution with fallback chain (local -> .redirect.yaml -> .redirect -> .supabase) - `src/core/chunkers/` — 3-tier chunking (recursive, semantic, LLM-guided). v0.19.0 adds `code.ts` — tree-sitter-based semantic chunker for 29 languages with embedded-asset WASMs (`src/assets/wasm/`), `@dqbd/tiktoken` cl100k_base tokenizer, small-sibling merging. `CHUNKER_VERSION` constant folded into `importCodeFile`'s `content_hash` so chunker shape changes force clean re-chunks across releases. - `src/core/errors.ts` (v0.19.0) — `StructuredAgentError` + `buildError` + `serializeError`. Every new v0.19.0 agent-facing surface (code-def, code-refs, usage errors) uses this envelope; matches v0.17.0 `CycleReport.PhaseResult.error` shape. - `src/assets/wasm/` (v0.19.0) — 36 tree-sitter grammar WASMs + tree-sitter runtime. Committed to the repo so `bun --compile` embeds them deterministically via `import path from ... with { type: 'file' }`. The CI guard `scripts/check-wasm-embedded.sh` fails the build if the compiled binary ever silently falls through to recursive chunks. - `src/commands/code-def.ts` + `src/commands/code-refs.ts` (v0.19.0) — symbol definition + references lookup. Query `content_chunks.symbol_name` or chunk_text ILIKE with `page_kind='code'` filter. Auto-JSON when stdout is not a TTY (gh-CLI convention). Bypass the standard `searchKeyword` `DISTINCT ON (slug)` collapse so multiple call-sites from the same file surface. - `src/core/search/` — Hybrid search: vector + keyword + RRF + multi-query expansion + dedup. As of v0.22.0, `searchKeyword` / `searchKeywordChunks` / `searchVector` apply source-aware ranking at the SQL layer (curated content like `originals/`, `concepts/`, `writing/` outranks bulk content like `wintermute/chat/`, `daily/`, `media/x/`). `searchVector` uses a two-stage CTE so source-boost re-ranking doesn't kill the HNSW index. Hard-exclude prefixes (`test/`, `archive/`, `attachments/`, `.raw/` by default) filter at retrieval, not post-rank. Both gates honor `detail !== 'high'` so temporal queries surface chat pages normally. - `src/core/search/intent.ts` — Query intent classifier (entity/temporal/event/general → auto-selects detail level) - `src/core/search/eval.ts` — Retrieval eval harness: P@k, R@k, MRR, nDCG@k metrics + runEval() orchestrator - `src/core/search/source-boost.ts` (v0.22.0) — Source-type boost map keyed by slug prefix. `DEFAULT_SOURCE_BOOSTS` (originals/ 1.5, concepts/ 1.3, writing/ 1.4, people/companies/deals/ 1.2, daily/ 0.8, media/x/ 0.7, wintermute/chat/ 0.5) and `DEFAULT_HARD_EXCLUDES` (test/, archive/, attachments/, .raw/). `parseSourceBoostEnv` / `parseHardExcludesEnv` parse comma-separated `prefix:factor` pairs from `GBRAIN_SOURCE_BOOST` / `GBRAIN_SEARCH_EXCLUDE` env vars. `resolveBoostMap` and `resolveHardExcludes` merge defaults + env + caller `SearchOpts.exclude_slug_prefixes`/`include_slug_prefixes`. - `src/core/search/sql-ranking.ts` (v0.22.0) — Pure SQL string builders. `buildSourceFactorCase(slugColumn, boostMap, detail)` emits a CASE expression with longest-prefix-match wins (returns literal `'1.0'` when `detail === 'high'` for temporal-bypass parity with COMPILED_TRUTH_BOOST). `buildHardExcludeClause(slugColumn, prefixes)` emits `NOT (col LIKE 'p1%' OR col LIKE 'p2%')` — OR-chain wrapped in NOT, NOT `NOT LIKE ALL/ANY` (those quantifiers don't express set-exclusion). LIKE meta-character escape covers all three of `%`, `_`, AND `\` (backslash matters because it's Postgres LIKE's default escape char). Single-quote doubling on SQL string literals so injection-style inputs are inert text. - `src/commands/eval.ts` — `gbrain eval` command: single-run table + A/B config comparison - `src/core/embedding.ts` — OpenAI text-embedding-3-large, batch, retry, backoff - `src/core/check-resolvable.ts` — Resolver validation: reachability, MECE overlap, DRY checks, structured fix objects. v0.14.1: `CROSS_CUTTING_PATTERNS.conventions` is an array (notability gate accepts both `conventions/quality.md` and `_brain-filing-rules.md`). New `extractDelegationTargets()` parses `> **Convention:**`, `> **Filing rule:**`, and inline backtick references. DRY suppression is proximity-based via `DRY_PROXIMITY_LINES = 40`. - `src/core/repo-root.ts` — Shared `findRepoRoot(startDir?)` (v0.16.4): walks up from `startDir` (default `process.cwd()`) looking for `skills/RESOLVER.md`. Zero-dependency module imported by both `doctor.ts` and `check-resolvable.ts`. Parameterized `startDir` makes tests hermetic. - `src/commands/check-resolvable.ts` — Standalone CLI wrapper (v0.16.4) over `checkResolvable()`. Exports `parseFlags`, `resolveSkillsDir`, `DEFERRED`, `runCheckResolvable`. Exit rule: **1 on any issue (warnings OR errors)**, stricter than doctor's `ok` flag — honors README:259. Stable JSON envelope `{ok, skillsDir, report, autoFix, deferred, error, message}` — same shape on success and error paths. `--fix` path runs `autoFixDryViolations` BEFORE `checkResolvable` (same ordering as doctor). `scripts/skillify-check.ts` subprocess-calls `gbrain check-resolvable --json` (cached per process) and fails loud on binary-missing — no silent false-pass. **v0.19:** AGENTS.md workspaces now resolve natively (see `src/core/resolver-filenames.ts`) — gbrain inspects the 107-skill OpenClaw deployment whether the routing file is `RESOLVER.md` or `AGENTS.md`. `DEFERRED[]` is empty — Checks 5 + 6 shipped as real code, not issue URLs. - `src/core/resolver-filenames.ts` (v0.19) — central list of accepted routing filenames (`RESOLVER.md`, `AGENTS.md`). Shared by `findRepoRoot`, `check-resolvable`, and skillpack install so every code path walks the same fallback chain. - `src/commands/skillify.ts` + `src/core/skillify/{generator,templates}.ts` (v0.19) — `gbrain skillify scaffold ` creates all stubs for a new skill in one command: SKILL.md, script, tests, routing-eval.jsonl, resolver entry, filing-rules pointer. `gbrain skillify check