browser-harness: A Production-Grade Testing Framework for AI Browser Agents

Table of Contents

What Is browser-harness?
Core Technical Highlights
English Community Discussions
Summary

As AI coding agents grow more capable, teams need a unified framework to run, test, and benchmark how those agents interact with a browser. browser-harness (8,000+ stars on GitHub) is built for that purpose: a testing infrastructure that lets you exercise AI agents against real web pages, collect structured traces, run regression suites, and measure performance on Linux, macOS, and Windows.

What Is browser-harness?

browser-harness is an open-source framework from the team behind browser-use, one of the most popular AI browser-automation libraries. While browser-use focuses on controlling Chrome for task automation, browser-harness zooms out to provide a complete evaluation layer — think CI for your AI agent. It ships with a daemon process, a CLI doctor check, Wayland/X11 display detection, Chrome profile management, and a structured logging pipeline that makes debugging flaky agent runs far less painful.

Core Technical Highlights

Cross-Platform Daemon Relay — Supports both Unix-domain sockets (Linux/macOS) and a Windows-native relay via named pipes, so your test harness behaves consistently regardless of OS. Recent work (PR #162) fixed socket.AF_UNIX compatibility on native Windows so tests no longer crash when the daemon isn't attached.
Intelligent Chrome Launch — Auto-detects Chrome installations, manages profiles, and handles the Chrome 147+ remote-debugging restriction on default profiles. The launch freshness check was recently hardened to handle OSError from permission issues and concurrent file modifications.
Wayland-First Display Detection — On Linux it now checks WAYLAND_DISPLAY / XDG_SESSION_TYPE environment variables before falling back to logind session queries, eliminating spurious pgrep gnome-shell scans that could false-match unrelated sessions.
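
The Wayland-first ordering described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function name detect_display_server and its return values are hypothetical, and the logind fallback is left to the caller.

```python
import os
from typing import Mapping, Optional


def detect_display_server(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return 'wayland' or 'x11' from environment variables, or None if undecided.

    Hypothetical helper: a None result means the caller should fall back to a
    slower check such as a logind session query.
    """
    env = os.environ if env is None else env

    # A non-empty WAYLAND_DISPLAY names the compositor socket, so check it first.
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"

    # XDG_SESSION_TYPE is set by the login manager; normalise case and whitespace.
    session_type = env.get("XDG_SESSION_TYPE", "").strip().lower()
    if session_type in ("wayland", "x11"):
        return session_type

    # Nothing conclusive in the environment.
    return None
```

Passing the environment as a parameter keeps the check easy to test, because no real session is needed to exercise each branch.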

English Community Discussions

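The browser-harness tracker is unusually active for a testing tool. GitHub numbers issues and pull requests in a single sequence, so some of the threads below (#162, for example) are pull requests. Here is what the community is discussing: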

Issue #162 — Windows AF_UNIX Compatibility Fix (7 comments)

@Bortlesboat: "I tested this PR on native Windows and hit one test expectation issue. Environment: Windows 11 10.0.26200, Python 3.13.12 via uv sync, Chrome running but no daemon attached yet, socket.AF_UNIX: absent, asyncio.start_unix_server: absent."
— Bortlesboat on Issue #162

@catsmonster resolved it in commit 9715664: the doctor test no longer asserts exit code 0 when the daemon is absent — it now only checks that the doctor banner prints and the AF_UNIX crash path is avoided. This is a great example of a community-driven regression fix across platforms.
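
The crash path in this thread comes from referencing socket.AF_UNIX or asyncio.start_unix_server on an interpreter that does not provide them. A capability check along the following lines avoids that. It is a sketch under assumptions: the function names and the "unix-socket" / "named-pipe" labels are hypothetical and are not taken from the browser-harness source.

```python
import asyncio
import socket


def unix_sockets_available(socket_mod=socket, asyncio_mod=asyncio) -> bool:
    """True only if both the socket family and the asyncio helper exist.

    The modules are injectable so the check can be tested on any platform.
    """
    return hasattr(socket_mod, "AF_UNIX") and hasattr(asyncio_mod, "start_unix_server")


def choose_transport(socket_mod=socket, asyncio_mod=asyncio) -> str:
    """Pick a transport label without ever touching a missing attribute."""
    if unix_sockets_available(socket_mod, asyncio_mod):
        return "unix-socket"
    return "named-pipe"


if __name__ == "__main__":
    print(choose_transport())
```

Probing for the attribute instead of comparing platform names matches the report quoted above, where both attributes were listed as absent on native Windows.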

Issue #122 — Security: Path Injection in Daemon (4 comments)

@qodo-ai-reviewer: "daemon.py interpolates BU_NAME directly into filesystem paths that are opened/written/unlinked; a BU_NAME containing path separators or '..' can redirect PID/LOG/PORT writes and cleanup unlinks outside the intended metadata/temp directory. Severity: action required | Category: security."
— qodo-ai-reviewer on Issue #122

@Will-hxw addressed all inline review comments, and the /tmp hardening subsequently landed in commit 018a605. It is a good reminder that even internal names need sanitizing when they end up in filesystem paths.
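
One common way to close this class of bug is to validate the name against an allow-list and then confirm that the final path still sits inside the intended directory. The sketch below illustrates that idea only. The helper names, the allowed character set, and the 64-character limit are assumptions and do not describe the fix that was actually merged.

```python
import re
from pathlib import Path

# Assumed policy: letters, digits, underscore, and hyphen, at most 64 characters.
_SAFE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,63}")


def is_safe_name(name: str) -> bool:
    """Reject names with path separators, dots, or anything outside the allow-list."""
    return _SAFE_NAME.fullmatch(name) is not None


def metadata_path(base_dir: Path, name: str, suffix: str) -> Path:
    """Build base_dir/<name>.<suffix>, refusing names that could escape base_dir."""
    if not is_safe_name(name):
        raise ValueError(f"unsafe daemon name: {name!r}")
    base = base_dir.resolve()
    candidate = (base / f"{name}.{suffix}").resolve()
    # Defence in depth: the resolved file must be a direct child of base_dir.
    if candidate.parent != base:
        raise ValueError(f"path escapes {base}: {candidate}")
    return candidate
```

For example, metadata_path(Path("/tmp/bh"), "default", "pid") yields a path ending in default.pid, while a name such as "../evil" raises ValueError before any file is opened.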

Issue #142 — Chrome 147+ Remote Debugging Block (3 comments)

@qodo-ai-reviewer: "launch_chrome() computes freshness by rglob()+stat() over the entire profile tree without handling OSError, so permission issues or concurrently-changing files can crash Chrome auto-launch on macOS/Windows. Severity: action required | Category: reliability."
— qodo-ai-reviewer on Issue #142

@1RB addressed this by improving the Wayland detection logic first, then fixing the rglob()/stat() code path with proper OSError guards — a fix that benefits all three desktop platforms.
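
The reliability problem in this thread is a tree scan that stops at the first file it cannot stat. A guarded version might look like the sketch below. Note the substitution: it uses os.walk, which ignores directory-listing errors by default, in place of the rglob() call named in the review, and newest_mtime is a hypothetical name, not the project's function.

```python
import os
from pathlib import Path


def newest_mtime(profile_dir: Path) -> float:
    """Return the newest file modification time under profile_dir.

    Returns 0.0 when the directory is missing or nothing in it is readable.
    """
    newest = 0.0
    # os.walk skips directories it cannot list, so a missing or locked
    # directory simply contributes no entries.
    for root, _dirs, files in os.walk(profile_dir):
        for name in files:
            try:
                mtime = os.stat(os.path.join(root, name)).st_mtime
            except OSError:
                # The file vanished or is unreadable: skip it instead of crashing.
                continue
            newest = max(newest, mtime)
    return newest
```

Skipping unreadable entries trades a little precision for robustness: the freshness value may miss a file that changed mid-scan, but the launch no longer fails because of it.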

Summary

browser-harness fills a critical gap in the AI agent ecosystem: running your agent is not enough; you also need to evaluate it reliably across environments. With active cross-platform development (Windows support in particular is improving rapidly), a security-conscious codebase, and a healthy issue velocity, this is a project to watch or, better, to contribute to.

Language: Python | Stars: 8,000+ | Issues: 79 open
@browser-use / browser-harness — https://github.com/browser-use/browser-harness