browser-harness: A Production-Grade Testing Framework for AI Browser Agents
As AI coding agents become increasingly sophisticated, the need for a unified framework to run, test, and benchmark their browser-interaction capabilities has never been greater. browser-harness (8,000+ stars on GitHub) is exactly that — a purpose-built testing infrastructure that lets you exercise AI agents against real web pages, collect structured traces, run regression suites, and measure performance across platforms including Linux, macOS, and Windows.
What Is browser-harness?
browser-harness is an open-source framework from the team behind browser-use, one of the most popular AI browser-automation libraries. While browser-use focuses on controlling Chrome for task automation, browser-harness zooms out to provide a complete evaluation layer — think CI for your AI agent. It ships with a daemon process, a CLI doctor check, Wayland/X11 display detection, Chrome profile management, and a structured logging pipeline that makes debugging flaky agent runs far less painful.
Core Technical Highlights
- Cross-Platform Daemon Relay: Supports both Unix-domain sockets (Linux/macOS) and a Windows-native relay via named pipes, so your test harness behaves consistently regardless of OS. Recent work (PR #162) fixed `socket.AF_UNIX` compatibility on native Windows so tests no longer crash when the daemon isn't attached.
- Intelligent Chrome Launch: Auto-detects Chrome installations, manages profiles, and handles the Chrome 147+ remote-debugging restriction on default profiles. The launch freshness check was recently hardened to handle `OSError` from permission issues and concurrent file modifications.
- Wayland-First Display Detection: On Linux it now checks the `WAYLAND_DISPLAY` / `XDG_SESSION_TYPE` environment variables before falling back to logind session queries, eliminating spurious `pgrep gnome-shell` scans that could false-match unrelated sessions.
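The environment-first ordering described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `detect_display_server` and its return values are hypothetical, and the logind fallback is only indicated by returning `None`.

```python
import os


def detect_display_server(env=None):
    """Guess the display server from environment variables only.

    Hypothetical helper for illustration. Returns "wayland", "x11",
    or None when the environment gives no answer (the point at which
    a real implementation would fall back to a logind session query).
    """
    env = os.environ if env is None else env

    # A non-empty WAYLAND_DISPLAY names the compositor socket in use.
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"

    # XDG_SESSION_TYPE is set by the login manager; normalise its case.
    session_type = env.get("XDG_SESSION_TYPE", "").strip().lower()
    if session_type in ("wayland", "x11"):
        return session_type

    # Values such as "tty" or an unset variable are inconclusive.
    return None


if __name__ == "__main__":
    print(detect_display_server())
```

Passing the environment as a parameter keeps the check testable without touching the real session, and no process scan is needed when the variables already answer the question.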
English Community Discussions
Issues on browser-harness are unusually active for a testing tool — here's what the community is talking about:
Issue #162 — Windows AF_UNIX Compatibility Fix (7 comments)
@Bortlesboat: "I tested this PR on native Windows and hit one test expectation issue. Environment: Windows 11 10.0.26200, Python 3.13.12 via `uv sync`, Chrome running but no daemon attached yet, `socket.AF_UNIX`: absent, `asyncio.start_unix_server`: absent."
— Bortlesboat on Issue #162
@catsmonster resolved it in commit 9715664: the doctor test no longer asserts exit code 0 when the daemon is absent — it now only checks that the doctor banner prints and the AF_UNIX crash path is avoided. This is a great example of a community-driven regression fix across platforms.
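The two "absent" entries in that environment report suggest a simple capability check. The sketch below is an assumption about how such a guard could look, not the code from the PR: `unix_sockets_available` and `pick_transport` are hypothetical names, and the transport labels are illustrative strings.

```python
import asyncio
import socket


def unix_sockets_available(socket_module=socket, asyncio_module=asyncio):
    """Return True only when both Unix-socket entry points exist.

    On interpreters without Unix-domain socket support, socket.AF_UNIX
    is missing and asyncio does not define start_unix_server, so
    probing with hasattr avoids an AttributeError at call time.
    """
    return hasattr(socket_module, "AF_UNIX") and hasattr(
        asyncio_module, "start_unix_server"
    )


def pick_transport(socket_module=socket, asyncio_module=asyncio):
    """Choose a transport label (illustrative names, not the real ones)."""
    if unix_sockets_available(socket_module, asyncio_module):
        return "unix-socket"
    return "named-pipe"


if __name__ == "__main__":
    print(pick_transport())
```

Probing for the attribute, rather than branching on the platform name, keeps the check correct on any interpreter build, whichever OS it runs on.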
Issue #122 — Security: Path Injection in Daemon (4 comments)
@qodo-ai-reviewer: "daemon.py interpolates BU_NAME directly into filesystem paths that are opened/written/unlinked; a BU_NAME containing path separators or '..' can redirect PID/LOG/PORT writes and cleanup unlinks outside the intended metadata/temp directory. Severity: action required | Category: security."
— qodo-ai-reviewer on Issue #122
@Will-hxw addressed all inline review comments, and the /tmp hardening subsequently landed in commit 018a605. A good reminder that even internal naming conventions need sanitization when they touch the filesystem.
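One common way to close this class of bug is to validate the name against an allow-list before it reaches any path operation. The sketch below assumes that approach; it is not the fix from the commit above, and `validate_name`, `metadata_path`, and the 64-character limit are hypothetical choices made for illustration.

```python
import re
from pathlib import Path

# Allow-list: letters, digits, underscore, hyphen. No separators, no dots.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]{1,64}")


def validate_name(name):
    """Return name unchanged if it is safe to embed in a filename."""
    if not _SAFE_NAME.fullmatch(name):
        raise ValueError(f"unsafe instance name: {name!r}")
    return name


def metadata_path(base_dir, name, suffix):
    """Build <base_dir>/<name>.<suffix>, refusing to leave base_dir.

    suffix is assumed to be a constant chosen by the caller
    (for example "pid" or "log"), not user input.
    """
    base = Path(base_dir).resolve()
    candidate = (base / f"{validate_name(name)}.{suffix}").resolve()
    # Defence in depth: the resolved file must sit directly in base.
    if candidate.parent != base:
        raise ValueError(f"path escapes {base}: {candidate}")
    return candidate


if __name__ == "__main__":
    print(metadata_path(".", "default", "pid"))
```

Rejecting bad names outright, rather than stripping characters from them, avoids two different inputs collapsing onto the same file.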
Issue #142 — Chrome 147+ Remote Debugging Block (3 comments)
@qodo-ai-reviewer: "`launch_chrome()` computes freshness by `rglob()` + `stat()` over the entire profile tree without handling `OSError`, so permission issues or concurrently-changing files can crash Chrome auto-launch on macOS/Windows. Severity: action required | Category: reliability."
— qodo-ai-reviewer on Issue #142
@1RB addressed this by improving the Wayland detection logic first, then fixing the rglob()/stat() code path with proper OSError guards — a fix that benefits all three desktop platforms.
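A guarded version of that walk might look like the sketch below. It assumes "freshness" means the newest modification time under the profile directory, which the review comment does not spell out, and `newest_mtime` is a hypothetical name rather than the project's function.

```python
from pathlib import Path


def newest_mtime(profile_dir):
    """Return the latest mtime found under profile_dir, or None.

    Entries that cannot be stat()ed (permission denied, deleted while
    the walk is in progress) are skipped instead of aborting the scan.
    """
    newest = None
    try:
        for path in Path(profile_dir).rglob("*"):
            try:
                mtime = path.stat().st_mtime
            except OSError:
                # This entry is unreadable or vanished; keep going.
                continue
            if newest is None or mtime > newest:
                newest = mtime
    except OSError:
        # The walk itself failed, e.g. the directory became unreadable.
        pass
    return newest


if __name__ == "__main__":
    print(newest_mtime("."))
```

Returning `None` when nothing is readable leaves the decision to the caller, which can treat an unknown freshness as "stale" or "fresh" depending on which is the safer default for launching Chrome.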
Summary
browser-harness fills a critical gap in the AI agent ecosystem: running your agent is not enough; you also need to evaluate it reliably across environments. With active cross-platform development (Windows support in particular is improving rapidly), a security-conscious codebase, and a healthy issue velocity, this is a project worth watching or, better, contributing to.
Language: Python | Stars: 8,000+ | Issues: 79 open
@browser-use / browser-harness — https://github.com/browser-use/browser-harness