--- title: browser - Browser description: Control a browser to access and interact with web pages --- Control a Chromium browser for web navigation, element interaction and content extraction. Supports JavaScript-rendered pages and uses a compact DOM snapshot so the Agent can efficiently understand page structure. ## Installation ```bash cow install-browser ``` This command will: - Install the `playwright` Python package (with auto-fallback for older systems) - Install system dependencies on Linux - Download the Chromium browser (Linux servers automatically use the headless build) - Detect China-mainland networks and use mirror acceleration ```bash pip install playwright playwright install chromium ``` On Linux servers, install system dependencies as well: ```bash sudo playwright install-deps chromium ``` On older systems (e.g. Ubuntu 18.04, glibc < 2.28), install a compatible version: ```bash pip install playwright==1.28.0 python -m playwright install chromium ``` To accelerate the Chromium download from China: ```bash export PLAYWRIGHT_DOWNLOAD_HOST=https://registry.npmmirror.com/-/binary/playwright python -m playwright install chromium ``` 1. Supported on Ubuntu 20.04+, Debian 10+, macOS and Windows. Older systems such as Ubuntu 18.04 will fall back to a compatible version automatically. 2. The browser tool has heavy dependencies (~300MB) and is optional. For lightweight web content retrieval, use the `web_fetch` tool. ## Workflow A typical browser workflow for the Agent: 1. **`navigate`** — Open the target URL 2. **`snapshot`** — Get a compact DOM with auto-numbered interactive elements (`ref`) 3. **`click` / `fill` / `select`** — Operate elements by `ref` 4. **`snapshot`** — Snapshot again to verify the result ## Supported Actions | Action | Description | Key parameters | | --- | --- | --- | | `navigate` | Open URL | `url` | | `snapshot` | Get structured page text (primary way) | `selector` (optional) | | `click` | Click an element | `ref` or `selector` | | `fill` | Fill text into an input | `ref` or `selector`, `text` | | `select` | Select a dropdown option | `ref` or `selector`, `value` | | `scroll` | Scroll the page | `direction` (up/down/left/right) | | `screenshot` | Save a screenshot to the workspace | `full_page` | | `wait` | Wait for an element or timeout | `selector`, `timeout` | | `press` | Press a key (Enter, Tab, etc.) | `key` | | `back` / `forward` | Browser back / forward | - | | `get_text` | Get an element's text content | `selector` | | `evaluate` | Run JavaScript | `script` | ## Use Cases - Access a URL to retrieve dynamic page content - Fill in forms and log in - Operate web elements (click buttons, select options, etc.) - Verify the result of a deployed web page - Scrape content that requires JS rendering ## Run Mode The browser picks a mode based on the runtime environment: | Environment | Mode | | --- | --- | | macOS / Windows | Headed (browser window visible) | | Linux desktop (with DISPLAY) | Headed | | Linux server (no DISPLAY) | Headless | You can override it in `config.json`: ```json { "tools": { "browser": { "headless": true } } } ``` ## Persistent Login **Log in to a target site once and the Agent can keep using it.** Two ways are supported: ### Option 1: Persistent mode (default) Works out of the box. Login state is saved under `~/.cow/browser_profile`. No configuration needed. To disable persistence and start with a clean environment every time: ```json { "tools": { "browser": { "persistent": false } } } ``` ### Option 2: CDP mode (attach to real Chrome) Have the Agent connect to a separately launched real Chrome (instead of the Chromium bundled with Playwright) for full browser fingerprints. Useful for sites with strict bot detection. Launch Chrome with a debugging port and a dedicated user data directory: ```bash "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \ --remote-debugging-port=9222 \ --user-data-dir="$HOME/.cow/chrome-cdp" ``` ```bash google-chrome \ --remote-debugging-port=9222 \ --user-data-dir="$HOME/.cow/chrome-cdp" ``` ```powershell & "C:\Program Files\Google\Chrome\Application\chrome.exe" ` --remote-debugging-port=9222 ` --user-data-dir="$env:USERPROFILE\.cow\chrome-cdp" ``` Then point the Agent at the endpoint in `config.json`: ```json { "tools": { "browser": { "cdp_endpoint": "http://localhost:9222" } } } ``` Chrome 137+ requires `--remote-debugging-port` to be paired with a dedicated `--user-data-dir`. As a result, the CDP-launched Chrome **cannot directly reuse the login state of your daily Chrome**; you'll need to log in once inside this dedicated profile.