bumblebee

Bumblebee is a read-only inventory collector for package, extension, and developer-tool metadata on macOS and Linux developer endpoints.

It answers a narrow supply-chain response question: when an advisory names a package, extension, or version, which developer machines show a match in their on-disk metadata right now?

SBOMs help answer what shipped, and EDR helps answer what ran or touched the network, but supply-chain response often needs a different view: messy local state across lockfiles, package-manager metadata, extension manifests, and supported developer-tool configs.

Bumblebee turns that scattered on-disk state into structured NDJSON component records. When given an exposure catalog, it also flags exact matches, giving responders who already know what they are looking for a fast, read-only exposure check.

Scope

  • Single static binary, Go 1.25+, zero non-stdlib dependencies.
  • Three scan profiles (baseline, project, deep) for different populations and cadences.
  • Reads only the lockfiles, package-manager install metadata, extension manifests, and supported MCP JSON configs listed in docs/inventory-sources.md. No package manager execution (npm ls, pip show, go list, ...) and no source-file reads. MCP host configs can carry environment values and credentials in their env blocks; Bumblebee parses these configs for the server inventory it needs but does not emit those values in its records.

Coverage

| Family | Emitted ecosystem | Sources |
|---|---|---|
| npm | npm | package-lock.json, npm-shrinkwrap.json, node_modules/.package-lock.json, node_modules/<pkg>/package.json |
| pnpm | npm | pnpm-lock.yaml, .pnpm/.../package.json |
| Yarn | npm | yarn.lock (Classic + Berry) |
| Bun | npm | bun.lock; bun.lockb presence as diagnostic |
| PyPI | pypi | *.dist-info/METADATA, INSTALLER, direct_url.json, *.egg-info/PKG-INFO |
| Go modules | go | go.sum, go.mod |
| RubyGems | rubygems | Gemfile.lock, installed *.gemspec |
| Composer | packagist | composer.lock, vendor/composer/installed.json |
| MCP | mcp | JSON host configs: mcp.json, .mcp.json, claude_desktop_config.json, mcp_config.json, mcp_settings.json, cline_mcp_settings.json, plus ~/.gemini/settings.json (Gemini CLI / Code Assist). Non-JSON configs (Codex config.toml, Continue YAML) are not parsed in v0.1. |
| Editor extensions | editor-extension | VS Code, Cursor, Windsurf, VSCodium manifests |
| Browser extensions | browser-extension | Chromium-family (manifest.json) and Firefox (extensions.json) per profile |

Per-ecosystem detail: docs/inventory-sources.md.

Install

Requires Go 1.25+. Zero non-stdlib dependencies.

    # Install the latest tagged release into $GOBIN.
    go install github.com/perplexityai/bumblebee/cmd/bumblebee@latest

    # Or pin a specific tag.
    go install github.com/perplexityai/bumblebee/cmd/bumblebee@v0.1.1

To build from a checkout:

    go build -o bumblebee ./cmd/bumblebee
    go test ./...

Stamp an explicit version at build time:

    go build -ldflags "-X main.Version=v0.1.1" -o bumblebee ./cmd/bumblebee

bumblebee version prints the version plus the VCS revision, build time, and Go runtime, so a record emitted in production can be traced back to a specific build. Version precedence, highest first: the -ldflags override, the module version recorded by go install, then the in-tree default tracked in VERSION.
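The precedence above can be sketched in a few lines of Go. This is a minimal illustration, not Bumblebee's actual source: pickVersion and defaultVersion are hypothetical names, and treating "(devel)" as "no module version" is an assumption about how such a fallback might be written. Only runtime/debug.ReadBuildInfo is a real standard-library call.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// Version is intended to be stamped with -ldflags "-X main.Version=...".
// An empty value means no override was supplied at build time.
var Version = ""

// defaultVersion stands in for the in-tree default tracked in VERSION
// (hypothetical value for this sketch).
const defaultVersion = "v0.0.0-dev"

// pickVersion applies the precedence: ldflags override first, then the
// module version recorded by the Go toolchain, then the in-tree default.
func pickVersion(ldflags, module, fallback string) string {
	if ldflags != "" {
		return ldflags
	}
	// Assumption: "(devel)" means the toolchain recorded no usable module version.
	if module != "" && module != "(devel)" {
		return module
	}
	return fallback
}

func main() {
	module := ""
	if info, ok := debug.ReadBuildInfo(); ok {
		module = info.Main.Version
	}
	fmt.Println(pickVersion(Version, module, defaultVersion))
}
```

Keeping the decision in a pure function makes the ordering easy to test without rebuilding the binary with different flags.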

Self-test

After installing, run a built-in end-to-end check against embedded fixtures:

    bumblebee selftest
    # selftest OK (2 findings in 1ms)

The fixtures live inside the binary, use deliberately fake package names (bumblebee-selftest-evil@0.0.0), and make no network calls. A non-zero exit means the local install can no longer detect what it should, which makes selftest a fast pre-deployment smoke test for fleet rollouts.

Profiles

Bumblebee is a one-shot scanner: each invocation performs a single scan and exits. Cadence is the runner's responsibility (cron, launchd, systemd, MDM, etc.). Each record carries profile and a per-root root_kind so receivers can keep populations separate.

| Profile | Scans | Use for |
|---|---|---|
| baseline | Common global/user package roots, language toolchains, editor extensions, browser extensions, and MCP configs. | Recurring lightweight inventory via an external runner. |
| project | Configured development directories, such as ~/code, ~/src, or ~/work. | Recurring inventory for known project workspaces. |
| deep | Explicit --root paths, including broad roots like $HOME. | On-demand incident or campaign checks, usually with --ecosystem, --exposure-catalog, and --findings-only. |

baseline and project refuse bare-home roots; only deep walks them.
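As a rough illustration of the bare-home rule, the sketch below rejects a root equal to the home directory unless the profile is deep. checkRoot is a hypothetical helper written for this example, and it assumes "bare-home" means a root that resolves exactly to the user's home directory; Bumblebee's real check may differ.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
)

// checkRoot is a hypothetical validator, not Bumblebee's actual code.
// It refuses a root equal to the home directory for baseline and project.
func checkRoot(profile, root, home string) error {
	// Clean both paths so "/Users/alex/" and "/Users/alex" compare equal.
	bareHome := filepath.Clean(root) == filepath.Clean(home)
	switch profile {
	case "baseline", "project":
		if bareHome {
			return fmt.Errorf("profile %q refuses bare-home root %s; use deep", profile, root)
		}
		return nil
	case "deep":
		return nil
	default:
		return errors.New("unknown profile: " + profile)
	}
}

func main() {
	home := "/Users/alex"
	cases := [][2]string{
		{"baseline", "/Users/alex"},
		{"project", "/Users/alex/code"},
		{"deep", "/Users/alex/"},
	}
	for _, c := range cases {
		fmt.Printf("%s %s -> %v\n", c[0], c[1], checkRoot(c[0], c[1], home))
	}
}
```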

Quick start

    # Baseline global inventory.
    bumblebee scan --profile baseline > inventory.ndjson

    # Daily project sweep with explicit roots.
    bumblebee scan --profile project \
      --root "$HOME/code" \
      --root "$HOME/Developer"

    # Limit a run to selected emitted ecosystems.
    bumblebee scan --profile baseline \
      --ecosystem npm,pypi \
      --ecosystem go

    # On-demand exposure scan against a published advisory.
    bumblebee scan --profile deep \
      --root "$HOME" \
      --exposure-catalog ./catalog.json \
      --max-duration 10m

Preview the resolved roots without scanning:

    bumblebee roots --profile baseline
    # prints "<root_kind>\t<path>" lines

  • --root is a filesystem path to scan. It is repeatable, required for deep, and optional for the other profiles.
  • --ecosystem is repeatable and accepts comma-separated values.
  • --exposure-catalog accepts a JSON file or a directory of *.json catalogs (merged non-recursively; all files must share the same schema_version).
  • --findings-only requires --exposure-catalog and suppresses package records while keeping findings.

bumblebee scan --help lists every flag.
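A flag that is both repeatable and comma-separated, like --ecosystem, is commonly built in Go by implementing flag.Value. The sketch below shows that general pattern with the standard flag package; listFlag and parseEcosystems are hypothetical names, and nothing here is taken from Bumblebee's own flag handling.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// listFlag collects values from a flag that may be repeated and whose
// individual values may themselves be comma-separated.
type listFlag []string

func (l *listFlag) String() string { return strings.Join(*l, ",") }

// Set is called once per occurrence of the flag on the command line.
func (l *listFlag) Set(v string) error {
	for _, part := range strings.Split(v, ",") {
		if part = strings.TrimSpace(part); part != "" {
			*l = append(*l, part)
		}
	}
	return nil
}

// parseEcosystems parses only the --ecosystem flag from args.
func parseEcosystems(args []string) ([]string, error) {
	var eco listFlag
	fs := flag.NewFlagSet("scan", flag.ContinueOnError)
	fs.Var(&eco, "ecosystem", "emitted ecosystem to include (repeatable, comma-separated)")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return eco, nil
}

func main() {
	eco, err := parseEcosystems([]string{"--ecosystem", "npm,pypi", "--ecosystem", "go"})
	if err != nil {
		panic(err)
	}
	fmt.Println(eco) // prints "[npm pypi go]"
}
```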

Output

Records are NDJSON, one per line. Diagnostics go to stderr as NDJSON. Each run ends with a scan_summary record; receivers use it to decide whether to promote a run to current state. See docs/transport.md for HTTPS/file output and docs/state-model.md for the receiver-side current-state model.
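On the receiving side, one way to act on the trailing scan_summary is to buffer a run and promote it only if the summary is the last record. The Go sketch below assumes nothing about the summary beyond its record_type; readRun, readRunString, and the trimmed-down record struct are hypothetical, and a real receiver would follow docs/state-model.md.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// record holds only the fields this sketch needs; real records carry more.
type record struct {
	RecordType string `json:"record_type"`
	RunID      string `json:"run_id"`
}

// readRun decodes one NDJSON stream and reports whether it ended with a
// scan_summary record, which this sketch treats as "safe to promote".
func readRun(r io.Reader) (recs []record, complete bool, err error) {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // tolerate long lines
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue
		}
		var rec record
		if err := json.Unmarshal([]byte(line), &rec); err != nil {
			return nil, false, fmt.Errorf("bad NDJSON line: %w", err)
		}
		recs = append(recs, rec)
	}
	if err := sc.Err(); err != nil {
		return nil, false, err
	}
	complete = len(recs) > 0 && recs[len(recs)-1].RecordType == "scan_summary"
	return recs, complete, nil
}

// readRunString is a small convenience wrapper for in-memory input.
func readRunString(s string) ([]record, bool, error) {
	return readRun(strings.NewReader(s))
}

func main() {
	const sample = `{"record_type":"package","run_id":"r1"}
{"record_type":"scan_summary","run_id":"r1"}
`
	recs, complete, err := readRunString(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("records=%d promote=%v\n", len(recs), complete)
}
```

A truncated run (for example, a scan killed partway through) would lack the summary, so complete stays false and the previous state can be kept.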

Package record:

Example package record
    {
      "record_type": "package",
      "record_id": "package:...",
      "schema_version": "0.1.0",
      "scanner_name": "bumblebee",
      "scanner_version": "v0.1.1",
      "run_id": "9b1f0c2e4d5a6b7c8d9e0f1a2b3c4d5e",
      "scan_time": "2026-05-15T18:22:01.482Z",
      "endpoint": {
        "hostname": "alex-mbp",
        "os": "darwin",
        "arch": "arm64",
        "username": "alex",
        "uid": "501",
        "device_id": "MDM-7F4A2B"
      },
      "profile": "project",
      "ecosystem": "npm",
      "package_name": "@tanstack/query-core",
      "normalized_name": "@tanstack/query-core",
      "version": "5.59.20",
      "project_path": "/Users/alex/code/web-app",
      "root_kind": "project_root",
      "package_manager": "pnpm",
      "source_type": "pnpm-lockfile",
      "source_file": "/Users/alex/code/web-app/pnpm-lock.yaml",
      "has_lifecycle_scripts": false,
      "confidence": "high"
    }

confidence:

  • high — exact identity and version came from canonical metadata.
  • medium — identity is reliable, but version or source is partial.
  • low — config/path/spec reference only; not proof of an installed exact version.
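A receiver that wants to act only on stronger evidence could rank the three confidence levels and filter on a minimum. The helper names below (confidenceRank, meets) are hypothetical, and treating unknown values as failing the filter is a design choice made for this sketch.

```go
package main

import "fmt"

// confidenceRank orders the documented confidence levels; unknown values rank 0.
func confidenceRank(c string) int {
	switch c {
	case "high":
		return 3
	case "medium":
		return 2
	case "low":
		return 1
	}
	return 0
}

// meets reports whether confidence c is a known level at or above min.
func meets(c, min string) bool {
	r := confidenceRank(c)
	return r > 0 && r >= confidenceRank(min)
}

func main() {
	for _, c := range []string{"high", "medium", "low", "unknown"} {
		fmt.Printf("%-7s meets medium: %v\n", c, meets(c, "medium"))
	}
}
```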

Finding record (exposure-catalog match):

Example finding record
    {
      "record_type": "finding",
      "record_id": "finding:...",
      "schema_version": "0.1.0",
      "scanner_name": "bumblebee",
      "scanner_version": "v0.1.1",
      "run_id": "3a8c7d1e9f0b2a4c6d8e0f1a2b3c4d5e",
      "scan_time": "2026-05-15T18:22:01.482Z",
      "endpoint": {
        "hostname": "alex-mbp",
        "os": "darwin",
        "arch": "arm64",
        "username": "alex",
        "uid": "501",
        "device_id": "MDM-7F4A2B"
      },
      "profile": "deep",
      "finding_type": "package_exposure",
      "severity": "critical",
      "catalog_id": "advisory-2026-0042",
      "catalog_name": "example-pkg 1.2.3 (compromised release)",
      "ecosystem": "npm",
      "package_name": "example-pkg",
      "normalized_name": "example-pkg",
      "version": "1.2.3",
      "root_kind": "deep_home_root",
      "project_path": "/Users/alex/code/web-app",
      "source_type": "pnpm-lockfile",
      "source_file": "/Users/alex/code/web-app/pnpm-lock.yaml",
      "confidence": "high",
      "evidence": "exact name+version match (version=1.2.3)"
    }

record_id is a content-addressed hash of a canonical identity tuple per record type, stable across runs. Per-record-type field lists and dedupe guidance: docs/state-model.md.
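To make "content-addressed" concrete, the sketch below hashes a record type together with an ordered identity tuple. The choice of SHA-256, the NUL separator, and the fields passed in are all assumptions for illustration; the authoritative tuple per record type is in docs/state-model.md, and recordID is a hypothetical function.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// recordID derives a stable ID from a record type and an ordered identity
// tuple. Hash, separator, and field choice are assumptions of this sketch.
func recordID(recordType string, identity ...string) string {
	h := sha256.New()
	h.Write([]byte(recordType))
	for _, field := range identity {
		h.Write([]byte{0}) // separator so ("ab","c") and ("a","bc") differ
		h.Write([]byte(field))
	}
	return recordType + ":" + hex.EncodeToString(h.Sum(nil))
}

func main() {
	// Hypothetical identity tuple: hostname, ecosystem, name, version, source file.
	id := recordID("package", "alex-mbp", "npm", "@tanstack/query-core", "5.59.20",
		"/Users/alex/code/web-app/pnpm-lock.yaml")
	fmt.Println(id)
}
```

Because the ID depends only on the identity fields and not on run_id or scan_time, the same component seen in two runs yields the same ID, which is what lets a receiver deduplicate.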

Exposure Catalog Format

Minimal JSON, exact (ecosystem, name, version) matching only:

    {
      "schema_version": "0.1.0",
      "entries": [
        {
          "id": "advisory-2026-0042",
          "name": "example-pkg 1.2.3 (compromised release)",
          "ecosystem": "npm",
          "package": "example-pkg",
          "versions": ["1.2.3"],
          "severity": "critical"
        }
      ]
    }

The catalog must be a JSON object with schema_version and entries keys. Bare top-level arrays are rejected. Unsupported future schema_version values are rejected. Multiple catalog files can be loaded together by pointing --exposure-catalog at a directory; see the flag description above.
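The validation rules and exact-match semantics can be sketched as follows. The struct fields mirror the JSON keys shown in the example catalog; everything else (parseCatalog, match, accepting only schema_version "0.1.0", comparing names without normalization) is an assumption of this sketch rather than Bumblebee's implementation.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

type entry struct {
	ID        string   `json:"id"`
	Name      string   `json:"name"`
	Ecosystem string   `json:"ecosystem"`
	Package   string   `json:"package"`
	Versions  []string `json:"versions"`
	Severity  string   `json:"severity"`
}

type catalog struct {
	SchemaVersion string  `json:"schema_version"`
	Entries       []entry `json:"entries"`
}

// parseCatalog rejects bare arrays, missing keys, and unknown schema versions.
func parseCatalog(data []byte) (*catalog, error) {
	if t := bytes.TrimSpace(data); len(t) == 0 || t[0] != '{' {
		return nil, errors.New("catalog must be a JSON object")
	}
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(data, &raw); err != nil {
		return nil, err
	}
	for _, key := range []string{"schema_version", "entries"} {
		if _, ok := raw[key]; !ok {
			return nil, fmt.Errorf("catalog is missing %q", key)
		}
	}
	var c catalog
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, err
	}
	if c.SchemaVersion != "0.1.0" { // assumption: only this version is supported
		return nil, fmt.Errorf("unsupported schema_version %q", c.SchemaVersion)
	}
	return &c, nil
}

// match returns the first entry with an exact (ecosystem, name, version) match.
func (c *catalog) match(ecosystem, name, version string) *entry {
	for i := range c.Entries {
		e := &c.Entries[i]
		if e.Ecosystem != ecosystem || e.Package != name {
			continue
		}
		for _, v := range e.Versions {
			if v == version {
				return e
			}
		}
	}
	return nil
}

func main() {
	data := []byte(`{"schema_version":"0.1.0","entries":[{"id":"advisory-2026-0042",
		"name":"example-pkg 1.2.3 (compromised release)","ecosystem":"npm",
		"package":"example-pkg","versions":["1.2.3"],"severity":"critical"}]}`)
	c, err := parseCatalog(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.match("npm", "example-pkg", "1.2.3") != nil) // exact match
	fmt.Println(c.match("npm", "example-pkg", "1.2.4") != nil) // no range matching
}
```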

Sample exposure catalogs

The threat_intel/ directory holds maintained exposure catalogs built from public threat-intelligence reporting on recent supply-chain campaigns, assembled with Perplexity Computer and updated via PRs as new campaigns are reported. See threat_intel/README.md for the current catalog list and review guidance.

License

Apache License 2.0. See LICENSE.
