Bumblebee - Go GitHub Trending Open Source Project Recommendation

**Bumblebee** is a read-only inventory scanner for package, extension, and developer-tool metadata on macOS and Linux endpoints, built with Go 1.25+ and currently sitting at 607 stars on GitHub. Developed by Perplexity AI, it answers a very specific supply-chain security question: when a security advisory names a compromised package, which developer machines actually have it installed on disk right now?

Unlike traditional SBOM generators that tell you what shipped in a build, or EDR tools that monitor what ran or touched the network, Bumblebee fills a critical gap in the supply-chain response workflow — it gives you a read-only snapshot of messy local developer state across lockfiles, package manager metadata, extension manifests, and developer tool configurations.

## Why Should You Care About Bumblebee?

The security community has gotten reasonably good at answering "what was in the package I downloaded?" (SBOMs) and "what processes touched the network?" (EDR). But there's a persistent blind spot in supply-chain incident response: when a critical vulnerability in an npm package or a compromised RubyGem is disclosed, how do you quickly determine which of your 500+ developer machines have that specific package version installed?

This is exactly the problem Bumblebee solves, and it does so with a remarkably clean design philosophy. The tool never executes a package manager (`npm ls`, `pip show`, `go list`, and so on) and never reads source files. Instead, it parses the metadata files that package managers themselves write: `package-lock.json`, `pnpm-lock.yaml`, `*.dist-info/METADATA`, `Gemfile.lock`, `composer.lock`, and more. This makes it both fast and auditable: the scanner reads the same on-disk records that `npm install` and `pip install` leave behind.

In practice, this means Bumblebee can scan an entire organization's developer fleet in minutes, flag exact package+version matches against a threat-intelligence exposure catalog, and output structured NDJSON records that feed directly into a SIEM or security data pipeline. The tool ships as a single static Go binary with zero non-stdlib dependencies — which is exactly what you'd want from a security tool deployed at fleet scale.
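
To make the "parse the metadata, never run the package manager" idea concrete, here is a minimal Go sketch that extracts name and version pairs from a `package-lock.json` (lockfileVersion 2 or 3, where installed packages sit under the `packages` map) and prints them as NDJSON. This is not Bumblebee's code: the `Record` type, its field names, and `parsePackageLock` are hypothetical, and the real output schema may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sort"
	"strings"
)

// Record is a hypothetical output shape; Bumblebee's real schema may differ.
type Record struct {
	Ecosystem string `json:"ecosystem"`
	Name      string `json:"name"`
	Version   string `json:"version"`
}

// lockfile models only the part of package-lock.json (v2/v3) used here.
type lockfile struct {
	Packages map[string]struct {
		Version string `json:"version"`
	} `json:"packages"`
}

// parsePackageLock returns one record per installed package, sorted by
// name and then version. No package manager is executed.
func parsePackageLock(data []byte) ([]Record, error) {
	var lf lockfile
	if err := json.Unmarshal(data, &lf); err != nil {
		return nil, err
	}
	const marker = "node_modules/"
	var out []Record
	for path, pkg := range lf.Packages {
		// The root project uses the key ""; installed packages live under
		// node_modules/, possibly nested inside another package.
		idx := strings.LastIndex(path, marker)
		if idx < 0 || pkg.Version == "" {
			continue
		}
		out = append(out, Record{Ecosystem: "npm", Name: path[idx+len(marker):], Version: pkg.Version})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Name != out[j].Name {
			return out[i].Name < out[j].Name
		}
		return out[i].Version < out[j].Version
	})
	return out, nil
}

// sampleLock is made-up lockfile content used only for illustration.
const sampleLock = `{
  "lockfileVersion": 3,
  "packages": {
    "": {"name": "demo", "version": "1.0.0"},
    "node_modules/left-pad": {"version": "1.3.0"},
    "node_modules/@scope/pkg": {"version": "2.0.1"},
    "node_modules/a/node_modules/left-pad": {"version": "1.1.0"}
  }
}`

func main() {
	recs, err := parsePackageLock([]byte(sampleLock))
	if err != nil {
		fmt.Fprintln(os.Stderr, "parse error:", err)
		os.Exit(1)
	}
	enc := json.NewEncoder(os.Stdout) // Encode writes one JSON object per line
	for _, r := range recs {
		if err := enc.Encode(r); err != nil {
			fmt.Fprintln(os.Stderr, "encode error:", err)
			os.Exit(1)
		}
	}
}
```

Note that the nested `left-pad` copy shows up as its own record: a lockfile can pin several versions of the same package, which is exactly the detail an exposure check needs.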

## README Highlights

The project covers an impressive breadth of ecosystems:

- **JavaScript/TypeScript ecosystem**: Reads `package-lock.json`, `npm-shrinkwrap.json`, `node_modules/.package-lock.json`, `yarn.lock` (Classic + Berry), `pnpm-lock.yaml`, and `bun.lock`
- **Python**: Parses `*.dist-info/METADATA`, `INSTALLER`, `direct_url.json`, `*.egg-info/PKG-INFO` — note it reads PyPI distribution metadata directly from disk, not via the PyPI API
- **Go modules**: Reads `go.sum` and `go.mod`
- **Ruby**: `Gemfile.lock` and installed `*.gemspec`
- **PHP/Composer**: `composer.lock` and `vendor/composer/installed.json`
- **MCP server configs**: Parses JSON MCP host configs (`mcp.json`, `.mcp.json`, `claude_desktop_config.json`, `mcp_settings.json`, etc.) — important because MCP configs often contain environment values and credentials that tools should be aware of
- **Editor extensions**: VS Code, Cursor, Windsurf, VSCodium manifests
- **Browser extensions**: Chromium family (`manifest.json`) and Firefox (`extensions.json`)
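
As an illustration of reading Python distribution metadata straight from disk, the sketch below pulls `Name` and `Version` out of the header block of a `*.dist-info/METADATA` file, which uses email-style `Key: value` headers followed by a blank line and a free-form description. The function name `parseDistInfo` and the sample content are assumptions for this example, not Bumblebee's implementation.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseDistInfo scans the header block of a METADATA file and returns the
// Name and Version fields. ok is false if either one is missing.
func parseDistInfo(metadata string) (name, version string, ok bool) {
	sc := bufio.NewScanner(strings.NewReader(metadata))
	for sc.Scan() {
		line := sc.Text()
		if line == "" {
			break // a blank line ends the headers; the description follows
		}
		if line[0] == ' ' || line[0] == '\t' {
			continue // continuation of the previous header, not a new field
		}
		key, val, found := strings.Cut(line, ":")
		if !found {
			continue
		}
		switch strings.ToLower(strings.TrimSpace(key)) {
		case "name":
			name = strings.TrimSpace(val)
		case "version":
			version = strings.TrimSpace(val)
		}
	}
	return name, version, name != "" && version != ""
}

// sampleMetadata is made-up content used only for illustration.
const sampleMetadata = `Metadata-Version: 2.1
Name: requests
Version: 2.31.0
Summary: Python HTTP for Humans.

Version: this line is part of the description, not a header
`

func main() {
	name, version, ok := parseDistInfo(sampleMetadata)
	fmt.Println(name, version, ok)
}
```

Stopping at the first blank line matters: anything after it is description text and must not be mistaken for a header.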

The tool supports three scan profiles:
- **`baseline`**: Common global/user package roots, language toolchains, editor extensions, browser extensions, and MCP configs. Good for recurring lightweight inventory via an external runner (cron, launchd, systemd, MDM)
- **`project`**: Configured development directories (`~/code`, `~/src`, `~/work`). Good for recurring inventory for known project workspaces
- **`deep`**: Explicit `--root` paths including `$HOME`. On-demand incident or campaign checks with `--exposure-catalog` and `--findings-only`

## Real-World Use Cases

### 1. Emergency vulnerability response
When the xz utility or a popular npm package is found compromised, security teams can push Bumblebee across all developer machines and immediately get a list of machines with the affected version installed. No agent is needed, just the static binary.

### 2. Pre-acquisition security audit
Before acquiring a company or onboarding a new development team, run Bumblebee to inventory the full package landscape across their developer machines. You'd be surprised how many older versions of critical packages are sitting around in `node_modules/`.

### 3. Continuous compliance monitoring
Integrate Bumblebee with MDM (Jamf, Kandji, etc.) to run periodic baseline scans and catch when developers accidentally pull in packages that violate your organization's license or security policies. The NDJSON output format is designed to be ingested by data pipelines directly.

## Quick Start Guide

Here's a practical walkthrough for running a basic inventory scan on a developer machine:

```bash
# 1. Install (requires Go 1.25+)
go install github.com/perplexityai/bumblebee/cmd/bumblebee@latest

# 2. Run a built-in self-test to verify installation
bumblebee selftest
# Expected: "selftest OK (2 findings in 1ms)"
# The self-test uses fake package names (bumblebee-selftest-evil@0.0.0)
# and makes no network calls — a good pre-deployment smoke test.

# 3. Preview which roots will be scanned (without scanning)
bumblebee roots --profile baseline
# Prints tab-separated lines showing what the scanner will examine

# 4. Run a baseline global inventory scan
bumblebee scan --profile baseline > inventory.ndjson
# Outputs NDJSON records, one per discovered package
# Diagnostics go to stderr, data to stdout

# 5. Scan a specific project directory only
bumblebee scan --profile project --root "$HOME/code/myproject"

# 6. Filter to specific ecosystems only (npm and PyPI)
bumblebee scan --profile baseline --ecosystem npm,pypi

# 7. On-demand exposure check against an advisory catalog
bumblebee scan --profile deep --root "$HOME" --exposure-catalog ./advisory-catalog.json --findings-only
# --findings-only suppresses package records and only shows exposure matches
```

## Key Technical Highlights

- **Single static binary, zero runtime dependencies**: Compile once with `go build -o bumblebee ./cmd/bumblebee`, deploy everywhere. The binary includes embedded self-test fixtures, so `selftest` works even in air-gapped environments.

- **Content-addressed record identity**: Each NDJSON record has a `record_id` that is a content-addressed hash of a canonical identity tuple, stable across runs. This means receivers can deduplicate findings without relying on scan timing or machine IDs.

- **Exposure catalog matching**: When given an `--exposure-catalog` JSON file (or directory of files), Bumblebee flags exact `(ecosystem, package_name, version)` matches. The [`threat_intel/`](https://github.com/perplexityai/bumblebee/tree/main/threat_intel) directory in the repo already maintains exposure catalogs built from public threat-intelligence reporting on recent supply-chain campaigns.

- **MCP host config parsing**: The tool parses MCP JSON configs (`mcp.json`, `claude_desktop_config.json`, etc.) for server inventory, but notably does NOT emit the `env` values it finds — it respects the sensitivity of those credentials while still inventorying which MCP servers are configured.
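
The content-addressed `record_id` idea can be sketched in a few lines of Go. The example below hashes a canonical encoding of an identity tuple with SHA-256, so the same package found in the same place always yields the same ID regardless of when the scan ran. Which fields Bumblebee actually includes, how it canonicalizes them, and which hash it uses are not stated here, so the `Identity` type, the JSON-array encoding, and SHA-256 are all assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Identity is a hypothetical identity tuple; the fields Bumblebee
// actually hashes may differ.
type Identity struct {
	Ecosystem  string
	Name       string
	Version    string
	SourcePath string
}

// recordID hashes a canonical encoding of the tuple. Encoding the fields
// as a JSON array keeps field boundaries unambiguous, so ("ab", "c") and
// ("a", "bc") cannot collide the way plain concatenation would.
func recordID(id Identity) string {
	canonical, err := json.Marshal([]string{id.Ecosystem, id.Name, id.Version, id.SourcePath})
	if err != nil {
		panic(err) // not expected: marshaling a []string does not fail
	}
	sum := sha256.Sum256(canonical)
	return hex.EncodeToString(sum[:])
}

func main() {
	a := Identity{"npm", "left-pad", "1.3.0", "/home/dev/app/package-lock.json"}
	b := a
	b.Version = "1.3.1"

	fmt.Println("same input, same id:      ", recordID(a) == recordID(a))
	fmt.Println("different version, new id:", recordID(a) != recordID(b))
}
```

Because no timestamp or run counter goes into the hash, a receiver can deduplicate repeated scans by `record_id` alone.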

## Star Trend

⭐ 607 stars | 📈 Growing fast — project was created 2026-05-18 and pushed latest update 2026-05-23

## Comparison with Similar Tools

**Bumblebee vs. Anchore Syft**: Syft generates SBOMs from container images and filesystems — it's great for build-time attestation. Bumblebee targets developer endpoint inventory specifically, reads only metadata files (not package contents), and is designed for fleet-scale read-only scanning. Think of Syft as "what went into this build" and Bumblebee as "what do my developers actually have on their machines."

**Bumblebee vs. CycloneDX**: CycloneDX is a standard format for SBOMs. Bumblebee doesn't care about format — it outputs NDJSON with its own schema, which is designed for ingestion by data pipelines rather than as a compliance artifact. They're complementary: Bumblebee feeds data, CycloneDX formats it for sharing.

## Community Discussions from GitHub Issues

A recent security audit of Bumblebee v0.1.x surfaced some interesting discussions worth highlighting:

**Issue #5 — Security Audit: Multiple vulnerabilities in bumblebee v0.1.x (9 findings) — 1 comment**

An independent security researcher filed a detailed audit with 9 findings. The Perplexity team responded promptly and thoroughly: each finding was evaluated against the actual code and their threat model. The maintainer clarified that the `--all-users` flag (H-6) is intentional, documented behavior for operator mode, and that the HMAC replay window (H-3) follows the standard HMAC contract, where replay enforcement is the receiver's responsibility. The takeaway here is that the team has a well-defined threat model and distinguishes clearly between tool behavior and operator responsibilities. This kind of transparency is exactly what you'd want from a supply-chain security tool.

**Issue #6 — Security findings in bumblebee v0.1.x — please do not share publicly until maintainers respond — 2 comments**

A separate researcher filed findings with a "please do not share publicly" request. The maintainer pointed out the contradiction — private security disclosures should go through the `SECURITY.md` advisory process, not a public GitHub issue. This led to a helpful exchange clarifying responsible disclosure practices for security researchers. It's a reminder that even in open-source security work, coordinated disclosure via a proper security advisory (rather than public issues) is the professional norm.

## Common Pitfalls and Tips

1. **Using `--all-users` without understanding its scope**: The `--all-users` flag is an opt-in operator mode that enumerates ALL users on a multi-user machine, not just the invoking user. If you're running this in a shared enterprise environment, be aware that this emits package records for every user account on the system. The maintainers consider this intended behavior, but it's worth understanding before deploying fleet-wide.

2. **Catalog files must be JSON objects, not arrays**: The `--exposure-catalog` expects a JSON file with `{"schema_version": "...", "entries": [...]}` structure. A bare top-level array will be rejected. If you're building your own catalogs from advisory feeds, make sure to wrap them in the proper object structure — the schema is in the project's documentation.

3. **`baseline` and `project` profiles refuse bare-home roots**: Only the `deep` profile allows `--root "$HOME"`. If you want to scan your home directory, you must explicitly use `--profile deep`, which is intentional — deep scans are meant for targeted incident response, not routine inventory.
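
The catalog shape from pitfall 2 and the exact-match rule can be sketched together in Go. The snippet loads a `{"schema_version": "...", "entries": [...]}` object into a lookup set keyed on the `(ecosystem, package_name, version)` tuple and shows that a bare top-level array fails to parse. The entry key names, the `schema_version` value, and the `loadCatalog` helper are assumptions; the authoritative schema is in the project's documentation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Entry mirrors the (ecosystem, package_name, version) tuple. The JSON key
// names are assumed for this example.
type Entry struct {
	Ecosystem   string `json:"ecosystem"`
	PackageName string `json:"package_name"`
	Version     string `json:"version"`
}

// Catalog is the wrapping object; a bare array will not unmarshal into it.
type Catalog struct {
	SchemaVersion string  `json:"schema_version"`
	Entries       []Entry `json:"entries"`
}

// loadCatalog parses catalog bytes into a set for exact-match lookups.
func loadCatalog(data []byte) (map[Entry]bool, error) {
	var c Catalog
	if err := json.Unmarshal(data, &c); err != nil {
		return nil, fmt.Errorf("parse catalog: %w", err)
	}
	index := make(map[Entry]bool, len(c.Entries))
	for _, e := range c.Entries {
		index[e] = true
	}
	return index, nil
}

// sampleCatalog reuses the fake self-test package name; the schema_version
// value is a placeholder.
const sampleCatalog = `{
  "schema_version": "1",
  "entries": [
    {"ecosystem": "npm", "package_name": "bumblebee-selftest-evil", "version": "0.0.0"}
  ]
}`

func main() {
	index, err := loadCatalog([]byte(sampleCatalog))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	hit := Entry{"npm", "bumblebee-selftest-evil", "0.0.0"}
	miss := Entry{"npm", "bumblebee-selftest-evil", "0.0.1"}
	fmt.Println("exact version match:", index[hit])
	fmt.Println("other version match:", index[miss])

	// A bare top-level array is rejected by the decoder.
	_, err = loadCatalog([]byte(`[{"ecosystem": "npm"}]`))
	fmt.Println("bare array rejected:", err != nil)
}
```

Matching on the full tuple means a neighboring version of the same package does not produce a finding, which keeps incident triage lists short.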

## Summary

Bumblebee fills a genuinely underserved niche in the supply-chain security tooling landscape. While SBOM generators tell you what shipped and EDR tools tell you what ran, Bumblebee answers the question that incident responders actually need: which of my developer machines have this specific package installed right now? The read-only metadata approach is elegant: it makes the tool fast, auditable, and deployable as a single static binary. With coverage spanning npm, PyPI, Go, Ruby, PHP, MCP configs, editor extensions, and browser extensions, it reaches most of what lives on a typical developer's machine. The fact that Perplexity AI is actively maintaining threat-intel exposure catalogs in the repo is a nice touch that lowers the barrier to immediate use.

If you're a security engineer, platform engineer, or anyone responsible for developer fleet security, Bumblebee is worth adding to your toolbox. Even just running `bumblebee selftest` and a baseline scan on your own machine takes 5 minutes and gives you a structured inventory of every package across your development environment.

## Project Links

- [GitHub Repository](https://github.com/perplexityai/bumblebee)
- [@perplexityai on GitHub](https://github.com/perplexityai)

---

Bumblebee - Go GitHub Trending Open Source Project Recommendation | 2026-05-23