bumblebee
Bumblebee is a read-only inventory collector for package, extension, and developer-tool metadata on macOS and Linux developer endpoints.
It answers a narrow supply-chain response question: when an advisory names a package, extension, or version, which developer machines show a match in their on-disk metadata right now?
SBOMs help answer what shipped, and EDR helps answer what ran or touched the network, but supply-chain response often needs a different view: messy local state across lockfiles, package-manager metadata, extension manifests, and supported developer-tool configs.
Bumblebee turns that scattered on-disk state into structured NDJSON component records. When given an exposure catalog, it also flags exact matches, so responders who already know what they are looking for get a fast, read-only exposure check.
Scope
- Single static binary, Go 1.25+, zero non-stdlib dependencies.
- Three scan profiles (baseline, project, deep) for different populations and cadences.
- Reads only the lockfiles, package-manager install metadata, extension manifests, and supported MCP JSON configs listed in docs/inventory-sources.md.
  No package manager execution (npm ls, pip show, go list, ...) and no source-file reads.
  MCP host configs can carry environment values and credentials in their env blocks; Bumblebee parses these configs for the server inventory it needs but does not emit those values in its records.
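To illustrate the last point, the sketch below parses an MCP host config and keeps only server identity fields, dropping each env block. It assumes the common mcpServers layout used by files such as claude_desktop_config.json. The function name and output field names are hypothetical and do not describe Bumblebee's actual record schema or implementation.

```python
import json


def mcp_server_inventory(config_text):
    """Return one inventory entry per MCP server, without any env values."""
    config = json.loads(config_text)
    servers = config.get("mcpServers", {})  # assumed top-level key
    inventory = []
    for name, spec in sorted(servers.items()):
        inventory.append({
            "server_name": name,            # hypothetical field name
            "command": spec.get("command"),
            "args": spec.get("args", []),
            # The "env" block is deliberately never copied into the output.
        })
    return inventory


# Fake config with a fake secret, used only to show the redaction.
SAMPLE = json.dumps({
    "mcpServers": {
        "example-server": {
            "command": "npx",
            "args": ["-y", "example-mcp-server"],
            "env": {"EXAMPLE_API_KEY": "not-a-real-secret"},
        }
    }
})

if __name__ == "__main__":
    for entry in mcp_server_inventory(SAMPLE):
        print(json.dumps(entry))
```

The design choice worth noting is that redaction works by allow-listing the fields to emit rather than by trying to recognize secrets.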
Coverage
| Family | Emitted ecosystem | Sources |
|---|---|---|
| npm | npm | package-lock.json, npm-shrinkwrap.json, node_modules/.package-lock.json, node_modules/<pkg>/package.json |
| pnpm | npm | pnpm-lock.yaml, .pnpm/.../package.json |
| Yarn | npm | yarn.lock (Classic + Berry) |
| Bun | npm | bun.lock; bun.lockb presence as diagnostic |
| PyPI | pypi | *.dist-info/METADATA, INSTALLER, direct_url.json, *.egg-info/PKG-INFO |
| Go modules | go | go.sum, go.mod |
| RubyGems | rubygems | Gemfile.lock, installed *.gemspec |
| Composer | packagist | composer.lock, vendor/composer/installed.json |
| MCP | mcp | JSON host configs: mcp.json, .mcp.json, claude_desktop_config.json, mcp_config.json, mcp_settings.json, cline_mcp_settings.json, plus ~/.gemini/settings.json (Gemini CLI / Code Assist). Non-JSON configs (Codex config.toml, Continue YAML) are not parsed in v0.1. |
| Editor extensions | editor-extension | VS Code, Cursor, Windsurf, VSCodium manifests |
| Browser extensions | browser-extension | Chromium-family (manifest.json) and Firefox (extensions.json) per profile |
Per-ecosystem detail: docs/inventory-sources.md.
Install
Requires Go 1.25+. Zero non-stdlib dependencies.
# Install the latest tagged release into $GOBIN.
go install github.com/perplexityai/bumblebee/cmd/bumblebee@latest

# Or pin a specific tag.
go install github.com/perplexityai/bumblebee/cmd/bumblebee@v0.1.1
To build from a checkout:
go build -o bumblebee ./cmd/bumblebee
go test ./...
Stamp an explicit version at build time:
go build -ldflags "-X main.Version=v0.1.1" -o bumblebee ./cmd/bumblebee
bumblebee version prints the version plus the VCS revision, build
time, and Go runtime, so a record emitted in production can be traced
back to a specific build. Version precedence, highest first: -ldflags
override, module version recorded by go install, then the in-tree
default tracked in VERSION.
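The precedence described here amounts to taking the first non-empty candidate. A minimal sketch in Python follows (Bumblebee itself is written in Go). The function and parameter names are hypothetical, and the "unknown" fallback is an assumption for the case where every source is empty.

```python
def resolve_version(ldflags_version, module_version, default_version):
    """Return the first non-empty candidate, in precedence order."""
    for candidate in (ldflags_version, module_version, default_version):
        if candidate:
            return candidate
    return "unknown"  # assumed fallback; not described in the README


if __name__ == "__main__":
    # No -ldflags override, so the module version wins over the default.
    print(resolve_version("", "v0.1.1", "v0.1.0"))
```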
Self-test
After installing, run a built-in end-to-end check against embedded fixtures:
bumblebee selftest
# selftest OK (2 findings in 1ms)
The fixtures live inside the binary, use deliberately fake package
names (bumblebee-selftest-evil@0.0.0), and make no network calls. A
non-zero exit means the local install can no longer detect what it
should, which makes selftest a fast pre-deployment smoke test for
fleet rollouts.
Profiles
Bumblebee is a one-shot scanner: each invocation performs a single scan
and exits. Cadence is the runner's responsibility (cron, launchd, systemd,
MDM, etc.). Each record carries profile and a per-root root_kind so
receivers can keep populations separate.
| Profile | Scans | Use for |
|---|---|---|
| baseline | Common global/user package roots, language toolchains, editor extensions, browser extensions, and MCP configs. | Recurring lightweight inventory via an external runner. |
| project | Configured development directories, such as ~/code, ~/src, or ~/work. | Recurring inventory for known project workspaces. |
| deep | Explicit --root paths, including broad roots like $HOME. | On-demand incident or campaign checks, usually with --ecosystem, --exposure-catalog, and --findings-only. |
baseline and project refuse bare-home roots; only deep walks them.
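The bare-home rule can be pictured as a small guard that runs before any directory walk. The sketch below assumes that "bare-home" means a root equal to the user's home directory itself. The function name and error text are hypothetical, not Bumblebee's own.

```python
from pathlib import PurePosixPath


def check_roots(profile, roots, home):
    """Raise ValueError if a non-deep profile is given the bare home directory."""
    home_path = PurePosixPath(home)
    checked = []
    for root in roots:
        root_path = PurePosixPath(root)  # normalizes a trailing slash
        if root_path == home_path and profile != "deep":
            raise ValueError(
                f"profile {profile!r} refuses bare-home root {root_path}"
            )
        checked.append(str(root_path))
    return checked


if __name__ == "__main__":
    print(check_roots("project", ["/Users/alex/code"], "/Users/alex"))
    print(check_roots("deep", ["/Users/alex"], "/Users/alex"))
```

Subdirectories of the home directory pass for every profile; only the home directory itself is reserved for deep.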
Quick start
# Baseline global inventory.
bumblebee scan --profile baseline > inventory.ndjson

# Daily project sweep with explicit roots.
bumblebee scan --profile project \
  --root "$HOME/code" \
  --root "$HOME/Developer"

# Limit a run to selected emitted ecosystems.
bumblebee scan --profile baseline \
  --ecosystem npm,pypi \
  --ecosystem go

# On-demand exposure scan against a published advisory.
bumblebee scan --profile deep \
  --root "$HOME" \
  --exposure-catalog ./catalog.json \
  --max-duration 10m
Preview the resolved roots without scanning:
bumblebee roots --profile baseline # prints "<root_kind>\t<path>" lines
--root is a filesystem path to scan; it is repeatable, required for
deep, and optional for the other profiles. --ecosystem is repeatable
and also accepts comma-separated lists. --exposure-catalog accepts a
JSON file or a directory of *.json catalogs (merged non-recursively;
all files must share the same schema_version). --findings-only requires
--exposure-catalog and suppresses package records while keeping
findings. bumblebee scan --help lists every flag.
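A flag that is both repeatable and comma-separated reduces to one flat list of values. The sketch below shows that flattening for the --ecosystem values used in the quick start. The function name is hypothetical, and dropping duplicates and blank items is an assumption rather than documented behavior.

```python
def split_ecosystems(values):
    """Flatten repeated, comma-separated flag values, keeping first-seen order."""
    seen = []
    for value in values:
        for item in value.split(","):
            item = item.strip()
            if item and item not in seen:  # assumed: ignore blanks and repeats
                seen.append(item)
    return seen


if __name__ == "__main__":
    # Equivalent to: --ecosystem npm,pypi --ecosystem go
    print(split_ecosystems(["npm,pypi", "go"]))  # ['npm', 'pypi', 'go']
```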
Output
Records are NDJSON, one per line. Diagnostics go to stderr as NDJSON. Each
run ends with a scan_summary record; receivers use it to decide whether
to promote a run to current state. See docs/transport.md
for HTTPS/file output and docs/state-model.md for the
receiver-side current-state model.
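On the receiver side, the promotion rule can be sketched as "accept a run only if its last record is the summary". The snippet assumes the summary carries record_type set to "scan_summary", mirroring the package and finding examples. Any status fields inside the summary are not covered here, so docs/state-model.md remains the reference for the real decision.

```python
import json


def load_complete_run(lines):
    """Parse NDJSON lines; return the records only if the run ended with a summary."""
    records = [json.loads(line) for line in lines if line.strip()]
    if not records or records[-1].get("record_type") != "scan_summary":
        return None  # truncated or failed run: keep the previous current state
    return records


if __name__ == "__main__":
    run = [
        '{"record_type": "package", "package_name": "example-pkg"}',
        '{"record_type": "scan_summary"}',
    ]
    print(load_complete_run(run) is not None)      # complete run
    print(load_complete_run(run[:1]) is not None)  # truncated run
```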
Package record:
{
  "record_type": "package",
  "record_id": "package:...",
  "schema_version": "0.1.0",
  "scanner_name": "bumblebee",
  "scanner_version": "v0.1.1",
  "run_id": "9b1f0c2e4d5a6b7c8d9e0f1a2b3c4d5e",
  "scan_time": "2026-05-15T18:22:01.482Z",
  "endpoint": {
    "hostname": "alex-mbp",
    "os": "darwin",
    "arch": "arm64",
    "username": "alex",
    "uid": "501",
    "device_id": "MDM-7F4A2B"
  },
  "profile": "project",
  "ecosystem": "npm",
  "package_name": "@tanstack/query-core",
  "normalized_name": "@tanstack/query-core",
  "version": "5.59.20",
  "project_path": "/Users/alex/code/web-app",
  "root_kind": "project_root",
  "package_manager": "pnpm",
  "source_type": "pnpm-lockfile",
  "source_file": "/Users/alex/code/web-app/pnpm-lock.yaml",
  "has_lifecycle_scripts": false,
  "confidence": "high"
}
confidence:
- high: exact identity and version came from canonical metadata.
- medium: identity is reliable, but version or source is partial.
- low: config/path/spec reference only; not proof of an installed exact version.
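Because the three levels are ordered, a receiver can filter on a minimum confidence. The sketch below is one way to do that; the ranking table and function name are hypothetical, and records with a missing or unrecognized confidence value are treated as below every threshold.

```python
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def at_least(records, minimum):
    """Keep records whose confidence is at or above the given level."""
    floor = CONFIDENCE_RANK[minimum]
    return [
        r for r in records
        if CONFIDENCE_RANK.get(r.get("confidence"), -1) >= floor
    ]


if __name__ == "__main__":
    sample = [
        {"package_name": "example-a", "confidence": "high"},
        {"package_name": "example-b", "confidence": "low"},
    ]
    print([r["package_name"] for r in at_least(sample, "medium")])
```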
Finding record (exposure-catalog match):
{
  "record_type": "finding",
  "record_id": "finding:...",
  "schema_version": "0.1.0",
  "scanner_name": "bumblebee",
  "scanner_version": "v0.1.1",
  "run_id": "3a8c7d1e9f0b2a4c6d8e0f1a2b3c4d5e",
  "scan_time": "2026-05-15T18:22:01.482Z",
  "endpoint": {
    "hostname": "alex-mbp",
    "os": "darwin",
    "arch": "arm64",
    "username": "alex",
    "uid": "501",
    "device_id": "MDM-7F4A2B"
  },
  "profile": "deep",
  "finding_type": "package_exposure",
  "severity": "critical",
  "catalog_id": "advisory-2026-0042",
  "catalog_name": "example-pkg 1.2.3 (compromised release)",
  "ecosystem": "npm",
  "package_name": "example-pkg",
  "normalized_name": "example-pkg",
  "version": "1.2.3",
  "root_kind": "deep_home_root",
  "project_path": "/Users/alex/code/web-app",
  "source_type": "pnpm-lockfile",
  "source_file": "/Users/alex/code/web-app/pnpm-lock.yaml",
  "confidence": "high",
  "evidence": "exact name+version match (version=1.2.3)"
}
record_id is a content-addressed hash of a canonical identity tuple per
record type, stable across runs. Per-record-type field lists and dedupe
guidance: docs/state-model.md.
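A content-addressed ID of this kind can be sketched as a hash over a joined identity tuple. The tuple fields, the separator, and the use of SHA-256 below are all assumptions for illustration; the real canonical tuple per record type is defined in docs/state-model.md.

```python
import hashlib


def record_id(record_type, identity):
    """Hash a record type plus an ordered identity tuple into a stable ID."""
    # A NUL separator keeps ("a", "bc") distinct from ("ab", "c").
    canonical = "\x00".join([record_type, *identity])
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{record_type}:{digest}"


if __name__ == "__main__":
    # Hypothetical identity tuple: ecosystem, normalized name, version, source file.
    identity = ("npm", "example-pkg", "1.2.3", "/Users/alex/code/web-app/pnpm-lock.yaml")
    print(record_id("package", identity))
```

Because the ID depends only on the identity tuple and not on run_id or scan_time, the same component yields the same ID on every run, which is what lets receivers dedupe across runs.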
Exposure Catalog Format
Minimal JSON, exact (ecosystem, name, version) matching only:
{
  "schema_version": "0.1.0",
  "entries": [
    {
      "id": "advisory-2026-0042",
      "name": "example-pkg 1.2.3 (compromised release)",
      "ecosystem": "npm",
      "package": "example-pkg",
      "versions": ["1.2.3"],
      "severity": "critical"
    }
  ]
}
The catalog must be a JSON object with schema_version and entries
keys. Bare top-level arrays are rejected. Unsupported future
schema_version values are rejected. Multiple catalog files can be
loaded together by pointing --exposure-catalog at a directory; see
the flag description above.
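The loading and matching rules above can be sketched in a few functions. This is an illustration under stated assumptions, not Bumblebee's implementation: all function names are hypothetical, the supported schema set contains only the version shown in the example, and matching against a record's normalized_name field is an assumption.

```python
import json
from pathlib import Path

SUPPORTED_SCHEMA_VERSIONS = {"0.1.0"}  # assumption: only the version shown above


def load_catalog_file(path):
    """Load one catalog file, enforcing the object shape described above."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError(f"{path}: catalog must be a JSON object")
    if "schema_version" not in data or "entries" not in data:
        raise ValueError(f"{path}: missing schema_version or entries")
    if data["schema_version"] not in SUPPORTED_SCHEMA_VERSIONS:
        raise ValueError(f"{path}: unsupported schema_version")
    return data


def load_catalog(path):
    """Load a single file, or merge the *.json files directly inside a directory."""
    path = Path(path)
    # glob("*.json") does not descend into subdirectories.
    files = sorted(path.glob("*.json")) if path.is_dir() else [path]
    catalogs = [load_catalog_file(f) for f in files]
    if len({c["schema_version"] for c in catalogs}) > 1:
        raise ValueError("catalog files must share one schema_version")
    return [entry for c in catalogs for entry in c["entries"]]


def build_index(entries):
    """Map each exact (ecosystem, package, version) triple to its catalog entry."""
    return {
        (e["ecosystem"], e["package"], v): e
        for e in entries
        for v in e["versions"]
    }


def match(index, record):
    """Return the catalog entry that exactly matches a package record, or None."""
    key = (record["ecosystem"], record["normalized_name"], record["version"])
    return index.get(key)
```

Expanding each entry's versions list into one dictionary key per version keeps every lookup an exact triple comparison, with no range or prefix logic.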
Sample exposure catalogs
The threat_intel/ directory holds maintained exposure
catalogs built from public threat-intelligence reporting on recent
supply-chain campaigns, assembled with
Perplexity Computer and updated
via PRs as new campaigns are reported. See
threat_intel/README.md for the current
catalog list and review guidance.
License
Apache License 2.0. See LICENSE.