Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 25 additions & 0 deletions .github/workflows/release.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
name: release

on:
release:
types: [published]

jobs:
publish:
name: publish to PyPI
runs-on: ubuntu-latest
environment: pypi
permissions:
contents: read
id-token: write
steps:
- uses: actions/checkout@v4

@cubic-dev-ai cubic-dev-ai Bot Jun 21, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Release workflow uses mutable action tags instead of immutable commit SHAs, creating CI/CD supply-chain risk for PyPI publishing.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .github/workflows/release.yml, line 16:

<comment>Release workflow uses mutable action tags instead of immutable commit SHAs, creating CI/CD supply-chain risk for PyPI publishing.</comment>

<file context>
@@ -0,0 +1,25 @@
+      contents: read
+      id-token: write
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
</file context>
Fix with cubic

- uses: actions/setup-python@v5
with:
python-version: "3.12"
- name: Build distributions
run: |
python -m pip install --upgrade build
python -m build
- name: Publish distributions
uses: pypa/gh-action-pypi-publish@release/v1
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,9 @@ __pycache__/
*.pyc
*.log
.env
.browser-harness-dev/
build/
dist/
uv.lock
*.egg-info/
.idea/
Expand Down
28 changes: 21 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,9 +23,7 @@ One websocket to Chrome, nothing between. The agent writes what's missing during
Paste into Claude Code or Codex:

```text
Set up https://github.com/browser-use/browser-harness for me.

Read `install.md` and follow the steps to install browser-harness and connect it to my browser.
Install browser-harness with uv, register the skill from `browser-harness skill`, and connect it to my browser.
```

The agent will open `chrome://inspect/#remote-debugging`. Tick the checkbox so the agent can connect to your browser:
Expand All @@ -51,23 +49,39 @@ Stealth, sub-agents, or headless deployment.<br>
- `install.md` — first-time install and browser bootstrap
- `SKILL.md` — day-to-day usage
- `src/browser_harness/` — protected core package
- `agent-workspace/agent_helpers.py` — helper code the agent edits
- `agent-workspace/domain-skills/` — reusable site-specific skills the agent edits
- `${XDG_CONFIG_HOME:-~/.config}/browser-harness/agent-workspace/agent_helpers.py` — helper code the agent edits
- `${XDG_CONFIG_HOME:-~/.config}/browser-harness/agent-workspace/domain-skills/` — reusable site-specific skills the agent edits

Plain `browser-harness` helper calls use the selected local browser profile. For isolated or cloud work, start with `browser_new("private")` or `browser_new("cloud")`, keep the returned short `id`, and call `browser(id)` before page helpers in each script; cloud responses include a `live_url` preview when available.

## Development

From a checkout, use `./browser-harness` to run the current working tree without activating a virtualenv or depending on the globally installed command:

```bash
./browser-harness <<'PY'
print(browser_new("private"))
PY
```

Normal agent-facing docs should keep using `browser-harness`; the `./browser-harness` launcher is only for local repo testing.

The dev launcher uses a short checkout-specific manager path under `/tmp`, so it does not attach to a stale global manager or another task's default manager.

## Contributing

PRs and improvements welcome. The best way to help: **contribute a new domain skill** under [agent-workspace/domain-skills/](agent-workspace/domain-skills/) for a site or task you use often (LinkedIn outreach, ordering on Amazon, filing expenses, etc.). Each skill teaches the agent the selectors, flows, and edge cases it would otherwise have to rediscover.

- **Skills are written by the harness, not by you.** Just run your task with the agent — when it figures something non-obvious out, it files the skill itself (see [SKILL.md](SKILL.md)). Please don't hand-author skill files; agent-generated ones reflect what actually works in the browser.
- Open a PR with the generated `agent-workspace/domain-skills/<site>/` folder — small and focused is great.
- Open a PR with the generated `domain-skills/<site>/` folder copied into this repo's `agent-workspace/domain-skills/` examples — small and focused is great.
- Bug fixes, docs tweaks, and helper improvements are equally welcome.
- Browse existing skills (`github/`, `linkedin/`, `amazon/`, ...) to see the shape.

If you're not sure where to start, open an issue and we'll point you somewhere useful.

## Domain skills

Set `BH_DOMAIN_SKILLS=1` to enable [agent-workspace/domain-skills/](agent-workspace/domain-skills/) — community-contributed per-site playbooks `goto_url` surfaces by domain. Contribute via PR.
Set `BH_DOMAIN_SKILLS=1` to enable domain skills from the agent workspace. This repo's [agent-workspace/domain-skills/](agent-workspace/domain-skills/) directory contains examples to contribute via PR.

---

Expand Down
144 changes: 73 additions & 71 deletions SKILL.md
Original file line number Diff line number Diff line change
@@ -1,66 +1,105 @@
---
name: browser
description: Direct browser control via CDP. Use when the user wants to automate, scrape, test, or interact with web pages. Connects to the user's already-running Chrome.
name: browser-harness
description: Always use browser-harness for any web interaction: automation, scraping, testing, or site/app work.
---

# browser-harness

Direct browser control via CDP. For task-specific edits, use `agent-workspace/agent_helpers.py`. For setup, install, or connection problems, read install.md.
Managed browsers have short explicit ids. Create or receive an id, then select it inside each script.

Domain skills (community-contributed per-site playbooks under `agent-workspace/domain-skills/`) are off by default. Set `BH_DOMAIN_SKILLS=1` to enable them; see the bottom section.

**If `BH_DOMAIN_SKILLS=1` and the task is site-specific, read every file in the matching `agent-workspace/domain-skills/<site>/` directory before inventing an approach.**

## Usage
Create and use a private browser:

```bash
browser-harness <<'PY'
b = browser_new("private")
browser(b["id"])
new_tab("https://docs.browser-use.com")
wait_for_load()
print({"id": b["id"], "page": page_info()})
PY
```

Use an existing managed browser:

```bash
browser-harness <<'PY'
browser("abc123")
print(page_info())
PY
```

- Invoke as browser-harness — it's on $PATH. No cd, no uv run.
- Use the heredoc form for every multi-line command. It prevents shell quote mangling inside Python strings and JavaScript snippets.
- First navigation is new_tab(url), not goto_url(url) — goto runs in the user's active tab and clobbers their work.
`browser(id)` selects a browser for this script only. Do not rely on a current browser across separate shell commands. Sharing an id means sharing that browser's tabs, cookies, downloads, and session state.

## Tool call shape
Inspect managed browsers:

```bash
browser-harness <<'PY'
# any python. helpers pre-imported. daemon auto-starts.
print(browser_list())
print(browser_status("abc123"))
PY
```

run.py calls ensure_daemon() before exec — you never start/stop manually unless you want to.
`browser_list()` shows known managed browser ids and their owners.

## Choose Browser

- User's logged-in local Chrome: use normal helpers. If setup asks for a profile, run `browser_profiles()`, ask the user which `id` to use, then run `browser_use_profile(id)` and retry.
- Isolated local browser: `browser_new("private")`, then keep the returned `id`.
- Browser Use cloud browser with live view: `browser_new("cloud")`, then keep the returned `id`.
- Managed browser page work: call `browser(id)` first in the script.
- Subagent: if the parent gives an id, start browser scripts with `browser(id)` and do not close it unless asked.
- Done with a private or cloud browser: `browser_close(id)`.
- Done with all browsers you created: `browser_close_owned()`.

## Browser Helpers

```python
browser_status(id)
browser_profiles()
browser_use_profile(profile_id)
browser_new("private")
browser_new("cloud")
browser(id)
browser_list()
browser_close(id)
browser_close_owned()
```

`browser_profiles()` and `browser_use_profile(...)` are local setup calls. They do not start browser work.

### Remote browsers
Inside one Python script, `browser(id)` attaches the process to that browser so normal page helpers work: `new_tab`, `page_info`, `capture_screenshot`, `click_at_xy`, `type_text`, `js`, and `cdp`.

Use remote for parallel sub-agents (each gets its own isolated browser via a distinct BU_NAME) or on a headless server. BROWSER_USE_API_KEY must be set. start_remote_daemon, list_cloud_profiles, list_local_profiles, sync_local_profile are pre-imported.
If `browser_new("cloud")` reports `cloud-auth-required`, run:

```bash
browser-harness <<'PY'
start_remote_daemon("work") # default — clean browser, no profile
# start_remote_daemon("work", profileName="my-work") # reuse a cloud profile (already logged in)
# start_remote_daemon("work", profileId="<uuid>") # same, but by UUID
# start_remote_daemon("work", proxyCountryCode="de", timeout=120) # DE proxy, 2-hour timeout
# start_remote_daemon("work", proxyCountryCode=None) # disable the Browser Use proxy
PY
browser-harness auth login
```

BU_NAME=work browser-harness <<'PY'
new_tab("https://example.com")
print(page_info())
PY
If the user directly provides an API key, store it through stdin:

```bash
browser-harness auth login --api-key-stdin
```

start_remote_daemon prints liveUrl and auto-opens it in the local browser (if a GUI is detected) so the user can watch along. Headless servers print only — share the URL with the user. The daemon PATCHes the cloud browser to stop on shutdown, which persists profile state. Running remote daemons bill until timeout.
Never put API keys in command-line arguments.

Profiles (cookies-only login state) live in interaction-skills/profile-sync.md — covers list_cloud_profiles(), the chat-driven "which profile?" pattern, and sync_local_profile() for uploading a local Chrome profile.
## Page Workflow

## Interaction skills
- First navigation is `new_tab(url)`, not `goto_url(url)`.
- Screenshots are the default way to understand and verify visible state: `capture_screenshot()`.
- If using `view_image`, call it after `capture_screenshot()` returns the PNG path; do not parallelize capture and viewing.
- Click visible targets by screenshot coordinates: `click_at_xy(x, y)`.
- Use `js(...)` for DOM inspection or extraction when coordinates are the wrong tool.
- After navigation, call `wait_for_load()`.
- If the current tab is stale or internal, call `ensure_real_tab()`.
- If a tab/session dies (`target-gone`, `browser session ended`), open a fresh tab; if status is not ready, create a new browser.
- If redirected to a login wall, stop and ask the user. Do not type credentials from screenshots.
- For anything helpers do not cover, use raw CDP: `cdp("Domain.method", params)`.

## Interaction Skills

If you get stuck on a browser mechanic, check `interaction-skills/` for focused guidance:

If you start struggling with a specific mechanic while navigating, look in interaction-skills/ for helpers. They cover reusable UI mechanics like dialogs, tabs, dropdowns, iframes, and uploads. The available interaction skills are:
- connection.md
- cookies.md
- cross-origin-iframes.md
Expand All @@ -79,45 +118,8 @@ If you start struggling with a specific mechanic while navigating, look in inter
- uploads.md
- viewport.md

## What actually works

- Screenshots first: use capture_screenshot() to understand the current page quickly, find visible targets, and decide whether you need a click, a selector, or more navigation.
- Clicking: capture_screenshot() → read the pixel off the image → click_at_xy(x, y) → capture_screenshot() to verify. Suppress the Playwright-habit reflex of "locate first, then click" — no getBoundingClientRect, no selector hunt. Drop to DOM only when the target has no visible geometry (hidden input, 0×0 node). Hit-testing happens in Chrome's browser process, so clicks go through iframes / shadow DOM / cross-origin without extra work.
- Bulk HTTP: http_get(url) + ThreadPoolExecutor. No browser for static pages (249 Netflix pages in 2.8s).
- After goto: wait_for_load().
- Wrong/stale tab: ensure_real_tab(). Use it when the current tab is stale or internal; the daemon also auto-recovers from stale sessions on the next call.
- Verification: print(page_info()) is the simplest "is this alive?" check, but screenshots are the default way to verify whether a visible action actually worked.
- DOM reads: use js(...) for inspection and extraction when the screenshot shows that coordinates are the wrong tool.
- Iframe sites (Azure blades, Salesforce): click_at_xy(x, y) passes through; only drop to iframe DOM work when coordinate clicks are the wrong tool.
- Auth wall: redirected to login → stop and ask the user. Don't type credentials from screenshots.
- Raw CDP for anything helpers don't cover: cdp("Domain.method", params).

## Design constraints

- Coordinate clicks default. Input.dispatchMouseEvent goes through iframes/shadow/cross-origin at the compositor level.
- Connect to the user's running Chrome. Don't launch your own browser.
- cdp-use is only for CDPClient.send_raw. Prefer raw CDP strings over typed wrappers.
- run.py stays tiny. No argparse, subcommands, or extra control layer.
- Core helpers stay short. Put task-specific helper additions in `agent-workspace/agent_helpers.py`; daemon/bootstrap and remote session admin live in the core package.
- Don't add a manager layer. No retries framework, session manager, daemon supervisor, config system, or logging framework.

## Gotchas (field-tested)

- Omnibox popups are fake page targets. Filter chrome://omnibox-popup... and other internals when you need a real tab.
- CDP target order != Chrome's visible tab-strip order. Use UI automation when the user means "the first/second tab I can see"; Target.activateTarget only shows a known target.
- Default daemon sessions can go stale. ensure_real_tab() re-attaches to a real page.
- Browser Use API is camelCase on the wire. cdpUrl, proxyCountryCode, etc.
- Remote cdpUrl is HTTPS, not ws. Resolve the websocket URL via /json/version.
- Stop cloud browsers with PATCH /browsers/{id} + {"action":"stop"}.
- After every meaningful action, re-screenshot before assuming it worked. Use the image to verify changed state, open menus, navigation, visible errors, and whether the page is in the state you expected.
- Use screenshots to drive exploration. They are often the fastest way to find the next click target, notice hidden blockers, and decide if a selector is even worth writing.
- Prefer compositor-level actions over framework hacks. Try screenshots, coordinate clicks, and raw key input before adding DOM-specific workarounds.
- If you need framework-specific DOM tricks, check interaction-skills/ first. That is where dropdown, dialog, iframe, shadow DOM, and form-specific guidance belongs.

## Domain skills (opt-in)

Only applies when `BH_DOMAIN_SKILLS=1`. Otherwise ignore — `agent-workspace/domain-skills/` is dormant and `goto_url` won't surface skill files.
## Domain Skills

When enabled, search `agent-workspace/domain-skills/<host>/` before inventing an approach. `goto_url` returns up to 10 skill filenames for the navigated host.
Domain skills are off by default. If `BH_DOMAIN_SKILLS=1` and the task is site-specific, read every file in `$BH_AGENT_WORKSPACE/domain-skills/<site>/` before inventing an approach. Default workspace: `~/.config/browser-harness/agent-workspace`.

If you learn anything non-obvious — a private API, stable selector, framework quirk, URL pattern, hidden wait, or site-specific trap — open a PR to `agent-workspace/domain-skills/<site>/`. Capture the durable shape of the site (the map, not the diary). Don't write pixel coordinates (break on layout), task narration, or secrets — the directory is public.
When enabled, `goto_url(...)` returns up to 10 matching skill filenames for the current host.
19 changes: 19 additions & 0 deletions browser-harness
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
#!/usr/bin/env bash
set -euo pipefail

ROOT="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)"
DEV_ID="$(printf '%s' "$ROOT" | cksum | awk '{print $1}')"

export BH_MANAGER_ROOT="${BH_MANAGER_ROOT:-/tmp/bhm-dev-$DEV_ID}"
export BH_MANAGER_SOCKET="${BH_MANAGER_SOCKET:-$BH_MANAGER_ROOT/manager.sock}"

if [ -x "$ROOT/.venv/bin/python" ]; then
exec "$ROOT/.venv/bin/python" -m browser_harness.run "$@"
fi

if command -v uv >/dev/null 2>&1; then
exec uv --directory "$ROOT" run python -m browser_harness.run "$@"
fi

export PYTHONPATH="$ROOT/src${PYTHONPATH:+:$PYTHONPATH}"
exec python3 -m browser_harness.run "$@"
Loading