61 changes: 55 additions & 6 deletions SKILL.md
@@ -1,6 +1,6 @@
---
name: browser
description: Direct browser control via CDP. Use when the user wants to automate, scrape, test, or interact with web pages. Connects to the user's already-running Chrome.
description: Direct browser control via CDP. Use when the user wants to automate, scrape, test, or interact with web pages. Connects to Chrome, Chromium, or a manager-created cloud browser.
---

# browser-harness
@@ -24,6 +24,8 @@ PY
- Invoke as browser-harness — it's on $PATH. No cd, no uv run.
- Use the heredoc form for every multi-line command. It prevents shell quote mangling inside Python strings and JavaScript snippets.
- First navigation is new_tab(url), not goto_url(url) — goto runs in the user's active tab and clobbers their work.
- Local Chrome requires an explicit selected profile. Use `list_local_profiles()` to see stable ids like `google-chrome:Default`, then `use_local_profile(profile_id)`.
- Plain helper calls use the selected local browser. If you need an isolated or remote browser, call `browser_new(...)` before `new_tab(...)`.

## Tool call shape

@@ -33,11 +35,58 @@ browser-harness <<'PY'
PY
```

run.py calls ensure_daemon() before exec — you never start/stop manually unless you want to.
Legacy mode calls ensure_daemon() before exec. Manager mode starts when the script uses a `browser_*` lifecycle helper or `BH_MANAGER_MODE=1`.

### Remote browsers
Local Chrome sessions snapshot the selected profile when the daemon starts. If a profile is changed later, existing named daemons keep their current profile until restarted.

Use remote for parallel sub-agents (each gets its own isolated browser via a distinct BU_NAME) or on a headless server. BROWSER_USE_API_KEY must be set. start_remote_daemon, list_cloud_profiles, list_local_profiles, sync_local_profile are pre-imported.
### Managed browsers

Use this when you need an isolated browser, parallel sub-agents, a cloud browser, or a restart after the current browser gets blocked.

```bash
browser-harness <<'PY'
print(browser_status())
print(browser_new(backend="cloud", proxy_country="us"))
new_tab("https://example.com")
print(page_info())
print(browser_close())
PY
```

Lifecycle helpers:
- `browser_status()` — current binding state.
- `browser_new(backend="cloud"|"managed", profile="clean", proxy_country=None, reason=None)` — create and switch to a browser. Cloud responses include `live_url` when Browser Use returns one.
- `browser_list()` — browser ids visible to this run/agent.
- `browser_switch(browser_id)` — reuse an existing browser id.
- `browser_close(browser_id=None)` — close the active private browser, or release access to a shared one.

After `browser_new(...)` or `browser_switch(...)`, all normal page helpers work unchanged. If you are a sub-agent, create your own browser unless the parent gives you a browser id to reuse with `browser_switch(id)`.

If `browser_new(backend="cloud")` reports `cloud-auth-required`, run:

```bash
browser-harness auth login
```

It prints/opens a Browser Use login URL and waits while the user logs in online. The API key is stored locally and is never printed. In SSH/headless environments use:

```bash
browser-harness auth login --device-code
```

If the user directly provides an API key, store it through stdin instead of a command argument:

```bash
browser-harness auth login --api-key-stdin
```

Never put API keys in command-line arguments.

Then retry `browser_new(backend="cloud")`.

### Legacy remote browsers

Prefer `browser_new(backend="cloud")` for new work. The older named-daemon remote API is still available for advanced/manual cases. BROWSER_USE_API_KEY must be set or `browser-harness auth login` must have been completed. start_remote_daemon, list_cloud_profiles, list_local_profiles, sync_local_profile are pre-imported.

```bash
browser-harness <<'PY'
@@ -95,11 +144,11 @@ If you start struggling with a specific mechanic while navigating, look in inter
## Design constraints

- Coordinate clicks default. Input.dispatchMouseEvent goes through iframes/shadow/cross-origin at the compositor level.
- Connect to the user's running Chrome. Don't launch your own browser.
- Legacy mode connects to the user's selected local Chrome profile. Manager mode may create cloud or managed browsers via `browser_new`.
- cdp-use is only for CDPClient.send_raw. Prefer raw CDP strings over typed wrappers.
- run.py stays tiny. No argparse, subcommands, or extra control layer.
- Core helpers stay short. Put task-specific helper additions in `agent-workspace/agent_helpers.py`; daemon/bootstrap and remote session admin live in the core package.
- Don't add a manager layer. No retries framework, session manager, daemon supervisor, config system, or logging framework.
- Don't add another manager layer. Use the built-in `browser_*` lifecycle helpers.

## Gotchas (field-tested)

142 changes: 142 additions & 0 deletions TEMP_BROWSER_MANAGER_CONTEXT_DO_NOT_MERGE.md
@@ -0,0 +1,142 @@
# Temporary Browser Manager Context

Remove this file before merging the PR. It is session context for review and follow-up, not product documentation.

## Why This Branch Exists

The current browser-harness works unusually well because the LLM sees the actual Python helper surface and can directly control browser/page behavior with very little indirection. The goal of this branch is to preserve that property while adding a tiny lifecycle layer for cases the current harness handles poorly:

- many parallel agents;
- subagents needing either their own browser or a reused parent browser;
- remote/cloud browser creation from inside the harness flow;
- isolated per-browser daemon/runtime/tmp/artifact directories;
- safer cleanup and switching between browser backends.

The important constraint from the discussion was: do not turn the LLM into a browser manager with a complicated control plane. The LLM should see a small set of obvious helpers, then use the existing page helpers exactly as before.

## Final LLM-Facing Interface

The intended surface is:

```python
browser_status()
browser_new(backend="cloud"|"managed", profile="clean", proxy_country=None, reason=None)
browser_list()
browser_switch(browser_id)
browser_close(browser_id=None)
```

After `browser_new(...)` or `browser_switch(...)`, normal browser-harness helpers such as `new_tab`, `page_info`, `capture_screenshot`, `click_at_xy`, `js`, and `cdp` work unchanged.

For cloud browsers, missing auth should produce `cloud-auth-required`; the model should run `browser-harness auth login` and retry. The user logs in online and the API key is stored locally without being printed into chat. If a user directly provides an API key, the safe storage path is `browser-harness auth login --api-key-stdin`, never a command-line argument.

The model does not need to know about sockets, daemon names, runtime dirs, CDP URLs, Browser Use browser IDs, or process cleanup. Those are manager internals.

## Why Python Instead Of Rust

The manager was moved from the earlier Rust direction to Python because browser-harness is already a Python package, and the simplest install path matters more than a theoretically cleaner standalone daemon.

Python keeps the end-to-end flow simple:

```bash
uv tool install -e .
browser-harness <<'PY'
print(browser_new(backend="cloud", proxy_country="us"))
new_tab("https://example.com")
print(page_info())
print(browser_close())
PY
```

No separate Rust build, no extra binary distribution problem, and no cross-language install story. The manager daemon is just another Python module/script in the package.

## Architecture

The manager owns browser leases. A lease includes:

- `browser_id`;
- backend type: `cloud` or `managed`;
- per-browser harness daemon name;
- per-browser runtime/tmp/download/artifact/profile dirs;
- CDP endpoint info;
- owner agent and allowed agent ids;
- an active execution lock.
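
As a rough illustration of the fields listed above, a lease record could be modeled as below. This is only a sketch: the class name `BrowserLease`, its field names, and the `can_use` helper are hypothetical, and the branch's actual data structure is not shown in this note.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional, Set


@dataclass
class BrowserLease:
    """Hypothetical lease record; every name here is illustrative."""

    browser_id: str
    backend: str                       # "cloud" or "managed"
    daemon_name: str                   # per-browser harness daemon name
    runtime_dir: Path                  # root for runtime/tmp/download/artifact/profile dirs
    cdp_endpoint: Optional[str] = None
    owner_agent: str = ""
    allowed_agents: Set[str] = field(default_factory=set)
    locked_by: Optional[str] = None    # holder of the active execution lock, if any

    def __post_init__(self) -> None:
        # Reject backends the note does not mention.
        if self.backend not in ("cloud", "managed"):
            raise ValueError(f"unknown backend: {self.backend!r}")

    def can_use(self, agent_id: str) -> bool:
        # The owner and any explicitly allowed agent may use the browser.
        return agent_id == self.owner_agent or agent_id in self.allowed_agents
```

Keeping the owner and the allowed agent ids on the record itself is one way to make a reuse check a local lookup rather than a separate permission table.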

The runtime path is:

```text
LLM code
-> browser_* helper
-> manager_client over Unix socket
-> manager_daemon creates/switches/closes lease
-> per-browser browser_harness.daemon
-> existing page helpers talk to that daemon
```

The existing non-manager browser-harness path still works.

## Parallelism Reasoning

The branch tries to handle the obvious 100-agent failure modes:

- manager auto-start is single-flight via a file lock, so concurrent agents should not start competing managers;
- browser ids and daemon names are generated per lease;
- each lease gets isolated runtime/tmp/artifact/profile directories;
- manager registry state is persisted under the manager root;
- browser creation does not hold the global manager lock while slow cloud/local startup happens;
- execution locks are per client process, so two simultaneous `browser-harness` invocations from the same agent do not mutate the same browser at once;
- cross-run close/switch attempts are rejected.

This is still not a full stress-test result. It is the first implementation pass with targeted unit coverage for the scary cases.
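
The single-flight auto-start mentioned in the first bullet can be sketched with an advisory file lock. This is a minimal illustration under stated assumptions, not the branch's code: `single_flight`, `ensure_manager`, and the `is_running`/`start` callables are hypothetical names, and `fcntl.flock` is Unix-only.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def single_flight(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # blocks until no other process holds the lock
        yield
    finally:
        os.close(fd)                     # closing the descriptor releases the lock


def ensure_manager(lock_path, is_running, start):
    """Start the manager at most once; return True if this caller started it."""
    with single_flight(lock_path):
        if is_running():                 # re-check under the lock
            return False
        start()
        return True
```

The re-check inside the lock is the important part: an agent that waited on the lock finds the manager already running and skips the start.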

## Subagent Model

The harness cannot rely on controlling Codex subagent spawn parameters. The practical design is therefore prompt/interface based:

- default subagent behavior: call `browser_new(...)` and get an isolated browser;
- reuse behavior: parent gives a `browser_id`, subagent calls `browser_switch(browser_id)`;
- if the browser is busy, the manager returns `busy`, and the safe action is to wait or call `browser_new(...)`.

This keeps the LLM-visible protocol minimal and avoids requiring Codex runtime changes.
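
The decision a sub-agent makes can be sketched as follows. The helper `acquire_browser` is hypothetical, the two lifecycle helpers are passed in as callables so the sketch runs outside the harness, and the `{"status": "busy"}` return shape is an assumption about how `busy` is reported.

```python
import time


def acquire_browser(browser_switch, browser_new, browser_id=None, retries=3, delay=1.0):
    """Reuse the parent's browser if an id was handed over, else create a new one.

    browser_switch(browser_id) and browser_new() are injected callables; the
    dict shape {"status": ...} is assumed for illustration only.
    """
    if browser_id is None:
        return browser_new()              # default: isolated browser
    for _ in range(retries):
        result = browser_switch(browser_id)
        if result.get("status") != "busy":
            return result
        time.sleep(delay)                 # wait while another run holds the browser
    return browser_new()                  # still busy: fall back to a fresh browser
```

Inside the harness the two callables would presumably be the real helpers, for example `lambda: browser_new(backend="cloud")`.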

## Local Browser Note

The VM used for this work must not start local Chrome or Chromium. Local managed-browser code exists, but local startup was intentionally not smoke-tested here.

Cloud/live lifecycle should be tested separately with a Browser Use API key in the environment. Do not commit keys or put them in docs.

OAuth login was added after this note was first created. Cloud lifecycle can now also be tested after `browser-harness auth login`, which stores a local Browser Use API key outside the repo.

## Verification Done In This Session

Commands run:

```bash
uv run --with pytest pytest -q tests/unit
uv run python -m compileall -q src/browser_harness
```

Result at the time this note was written:

```text
101 passed
```

A no-browser protocol smoke test was also run:

- auto-start Python manager;
- `browser_status()` returned `no-active-browser`;
- `browser_list()` returned `[]`;
- test manager was killed afterward.

No local Chrome/Chromium was started.

## What To Review Before Merge

- Decide whether manager mode should be enabled by AST-detecting lifecycle helper calls, env vars only, or both.
- Live-test `browser_new(backend="cloud")` and `browser_close()` with a real Browser Use key.
- Live-test `browser_new(backend="managed")` on a laptop, not the VM.
- Stress-test many parallel agents/processes using the same manager root.
- Decide whether stale lease cleanup needs a sweeper.
- Decide whether profile support should remain `profile="clean"` only for the first version.
- Remove this file before merging.
49 changes: 47 additions & 2 deletions install.md
@@ -48,6 +48,8 @@ This makes new Codex or Claude Code sessions in other folders load the runtime b

```text
Chrome / Browser Use cloud -> CDP WS -> browser_harness.daemon -> IPC -> browser_harness.run
                                        ^
                                        optional browser_harness.manager_daemon owns many isolated browser leases
```

- Protocol is one JSON line each way.
@@ -58,6 +60,47 @@ Chrome / Browser Use cloud -> CDP WS -> browser_harness.daemon -> IPC -> browser
- BU_CDP_WS overrides local Chrome discovery for remote browsers.
- BU_CDP_URL overrides local Chrome discovery with a specific DevTools HTTP endpoint (used for Way 2).
- BU_BROWSER_ID + BROWSER_USE_API_KEY lets the daemon stop a Browser Use cloud browser on shutdown.
- Manager mode auto-starts `browser-harness-manager` when `browser_status`, `browser_new`, `browser_list`, `browser_switch`, or `browser_close` is used.
- Cloud manager mode reads Browser Use auth from `BROWSER_USE_API_KEY` first, then the local `browser-harness auth login` store.

## Browser Use Cloud auth

For cloud browsers, prefer OAuth login over pasting API keys:

```bash
browser-harness auth login
```

The command generates a PKCE login request, opens or prints a Browser Use login URL, waits for the local callback, exchanges the code for an API key, and stores it in a private local file. The key is never printed.
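
For readers unfamiliar with PKCE, the verifier/challenge pair at the start of that flow can be sketched as below. It follows the generic S256 method from RFC 7636; the function names are hypothetical, and how browser-harness actually builds its login request is not shown here.

```python
import base64
import hashlib
import secrets


def _b64url(data: bytes) -> str:
    # URL-safe base64 without padding, as PKCE requires.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def challenge_for(verifier: str) -> str:
    """Return the S256 code challenge for a given code verifier."""
    return _b64url(hashlib.sha256(verifier.encode("ascii")).digest())


def make_pkce_pair():
    """Return (code_verifier, code_challenge)."""
    verifier = _b64url(secrets.token_bytes(32))   # 32 random bytes -> 43 characters
    return verifier, challenge_for(verifier)
```

The client sends only the challenge with the login request and keeps the verifier local until the code exchange, so an intercepted authorization code is not enough to obtain a key.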

Headless/SSH fallback:

```bash
browser-harness auth login --device-code
```

If you already have a Browser Use API key, store it safely through stdin:

```bash
browser-harness auth login --api-key-stdin
```

Do not pass API keys as command-line arguments; they can leak through shell history and process listings.

Other auth commands:

```bash
browser-harness auth status
browser-harness auth logout
```

Key resolution order for cloud browser creation:

```text
BROWSER_USE_API_KEY
-> stored browser-harness auth key
-> cloud-auth-required
```
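
The resolution order above can be sketched as a small lookup. This is an illustration under assumptions: `resolve_api_key`, `CloudAuthRequired`, the `store_path` argument, and the plain-text key file are all hypothetical, since the real store location and format are not described here.

```python
import os
from pathlib import Path


class CloudAuthRequired(Exception):
    """Raised when no Browser Use API key can be found."""


def resolve_api_key(store_path, env=None):
    """Return the key from the environment first, then from the stored key file."""
    env = os.environ if env is None else env
    key = env.get("BROWSER_USE_API_KEY", "").strip()
    if key:
        return key                        # the environment variable wins
    path = Path(store_path)               # hypothetical location of the stored key
    if path.is_file():
        key = path.read_text(encoding="utf-8").strip()
        if key:
            return key
    raise CloudAuthRequired("cloud-auth-required")
```

Note that the sketch never prints or logs the key, in line with the guidance above.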

# Browser connection setup and troubleshooting

@@ -67,10 +110,12 @@ This section is the source of truth for how browser-harness connects to a browse

Browser-harness can connect to any Chrome or Chromium-based browser on your computer, or to a Browser Use cloud browser.

**Cloud browsers** are managed by the Browser Use cloud API. Start one in Python with `start_remote_daemon("work", ...)`. Authentication is via the `BROWSER_USE_API_KEY` environment variable; the harness handles the WebSocket URL itself. To carry your local Chrome cookies into a cloud browser, install `profile-use` once (`curl -fsSL https://browser-use.com/profile.sh | sh`), then call `uuid = sync_local_profile("MyChromeProfile")` followed by `start_remote_daemon("work", profileId=uuid)`. Cookies are the only thing synced — not localStorage, not extensions, not history.
**Cloud browsers** are managed by the Browser Use cloud API. In manager mode, start one with `browser_new(backend="cloud", proxy_country="us")`; for legacy named daemons use `start_remote_daemon("work", ...)`. Authentication is via `BROWSER_USE_API_KEY` or `browser-harness auth login`; the harness handles the WebSocket URL itself. To carry your local Chrome cookies into a cloud browser, install `profile-use` once (`curl -fsSL https://browser-use.com/profile.sh | sh`), then call `uuid = sync_local_profile("MyChromeProfile")` followed by `start_remote_daemon("work", profileId=uuid)`. Cookies are the only thing synced — not localStorage, not extensions, not history.

**Local browsers** require remote debugging to be enabled. There are two ways, and they suit different use cases.

Local Way 1 also requires an explicit selected profile before the harness attaches. Run `list_local_profiles()` to get stable ids such as `google-chrome:Default`, then `use_local_profile("google-chrome:Default")`. The daemon snapshots that selected profile at startup and refuses to attach to an arbitrary available Chrome profile.

*Way 1: chrome://inspect/#remote-debugging checkbox — uses your real profile.* In your running Chrome, navigate to `chrome://inspect/#remote-debugging` and tick the "Allow remote debugging for this browser instance" checkbox. This setting is per-profile and sticky: tick it once and it persists across every future Chrome launch of that profile. Then run any `browser-harness` command. On Chrome 144 and later, the first attach by the harness triggers an in-browser "Allow remote debugging?" popup that you must click Allow on. The popup may reappear on later attaches under conditions that are not fully characterized.[^1] This path inherits your everyday Chrome's logins, extensions, history, and bookmarks, which makes it the right choice for an agent helping you with tasks in your real browser.

*Way 2: command-line flag — uses an isolated profile, no popups ever.* Launch Chrome with `--remote-debugging-port=9222 --user-data-dir=<path>`. Two precisions:
@@ -98,7 +143,7 @@ If the user hasn't said which connection method to use, default to Way 1 if Chro
PY
```

If it prints page info, you're done.
If it prints page info, you're done. If it reports `needs-profile`, run `list_local_profiles()`, choose a stable profile id with the user, call `use_local_profile(profile_id)`, then retry.

2. Otherwise run `browser-harness --doctor`. The two lines that matter for connection are `chrome running` and `daemon alive`.

1 change: 1 addition & 0 deletions pyproject.toml
@@ -16,6 +16,7 @@ dependencies = [

[project.scripts]
browser-harness = "browser_harness.run:main"
browser-harness-manager = "browser_harness.manager_daemon:main"

[tool.setuptools]
package-dir = {"" = "src"}