mirror of
https://github.com/zhayujie/chatgpt-on-wechat.git
synced 2026-06-02 00:57:41 +08:00
173 lines
5.0 KiB
Plaintext
173 lines
5.0 KiB
Plaintext
---
|
|
title: browser - Browser
|
|
description: Control a browser to access and interact with web pages
|
|
---
|
|
|
|
Control a Chromium browser for web navigation, element interaction and content extraction. Supports JavaScript-rendered pages and uses a compact DOM snapshot so the Agent can efficiently understand page structure.
|
|
|
|
## Installation
|
|
|
|
<Tabs>
|
|
<Tab title="CLI install (recommended)">
|
|
```bash
|
|
cow install-browser
|
|
```
|
|
|
|
This command will:
|
|
- Install the `playwright` Python package (with auto-fallback for older systems)
|
|
- Install system dependencies on Linux
|
|
- Download the Chromium browser (Linux servers automatically use the headless build)
|
|
- Detect China-mainland networks and use mirror acceleration
|
|
</Tab>
|
|
<Tab title="Manual install">
|
|
```bash
|
|
pip install playwright
|
|
playwright install chromium
|
|
```
|
|
|
|
On Linux servers, install system dependencies as well:
|
|
```bash
|
|
sudo playwright install-deps chromium
|
|
```
|
|
|
|
On older systems (e.g. Ubuntu 18.04, glibc < 2.28), install a compatible version:
|
|
```bash
|
|
pip install playwright==1.28.0
|
|
python -m playwright install chromium
|
|
```
|
|
|
|
To accelerate the Chromium download from China:
|
|
```bash
|
|
export PLAYWRIGHT_DOWNLOAD_HOST=https://registry.npmmirror.com/-/binary/playwright
|
|
python -m playwright install chromium
|
|
```
|
|
</Tab>
|
|
</Tabs>
|
|
|
|
<Note>
|
|
1. Supported on Ubuntu 20.04+, Debian 10+, macOS and Windows. Older systems such as Ubuntu 18.04 will fall back to a compatible version automatically.
|
|
2. The browser tool has heavy dependencies (~300MB) and is optional. For lightweight web content retrieval, use the `web_fetch` tool.
|
|
</Note>
|
|
|
|
## Workflow
|
|
|
|
A typical browser workflow for the Agent:
|
|
|
|
1. **`navigate`** — Open the target URL
|
|
2. **`snapshot`** — Get a compact DOM with auto-numbered interactive elements (`ref`)
|
|
3. **`click` / `fill` / `select`** — Operate elements by `ref`
|
|
4. **`snapshot`** — Snapshot again to verify the result
|
|
|
|
## Supported Actions
|
|
|
|
| Action | Description | Key parameters |
|
|
| --- | --- | --- |
|
|
| `navigate` | Open URL | `url` |
|
|
| `snapshot` | Get structured page text (primary way) | `selector` (optional) |
|
|
| `click` | Click an element | `ref` or `selector` |
|
|
| `fill` | Fill text into an input | `ref` or `selector`, `text` |
|
|
| `select` | Select a dropdown option | `ref` or `selector`, `value` |
|
|
| `scroll` | Scroll the page | `direction` (up/down/left/right) |
|
|
| `screenshot` | Save a screenshot to the workspace | `full_page` |
|
|
| `wait` | Wait for an element or timeout | `selector`, `timeout` |
|
|
| `press` | Press a key (Enter, Tab, etc.) | `key` |
|
|
| `back` / `forward` | Browser back / forward | - |
|
|
| `get_text` | Get an element's text content | `selector` |
|
|
| `evaluate` | Run JavaScript | `script` |
|
|
|
|
## Use Cases
|
|
|
|
- Access a URL to retrieve dynamic page content
|
|
- Fill in forms and log in
|
|
- Operate web elements (click buttons, select options, etc.)
|
|
- Verify the result of a deployed web page
|
|
- Scrape content that requires JS rendering
|
|
|
|
## Run Mode
|
|
|
|
The browser picks a mode based on the runtime environment:
|
|
|
|
| Environment | Mode |
|
|
| --- | --- |
|
|
| macOS / Windows | Headed (browser window visible) |
|
|
| Linux desktop (with DISPLAY) | Headed |
|
|
| Linux server (no DISPLAY) | Headless |
|
|
|
|
You can override it in `config.json`:
|
|
|
|
```json
|
|
{
|
|
"tools": {
|
|
"browser": {
|
|
"headless": true
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
## Persistent Login
|
|
|
|
**Log in to a target site once and the Agent can keep using it.** Two ways are supported:
|
|
|
|
### Option 1: Persistent mode (default)
|
|
|
|
Works out of the box. Login state is saved under `~/.cow/browser_profile`. No configuration needed.
|
|
|
|
To disable persistence and start with a clean environment every time:
|
|
|
|
```json
|
|
{
|
|
"tools": {
|
|
"browser": {
|
|
"persistent": false
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
### Option 2: CDP mode (attach to real Chrome)
|
|
|
|
Have the Agent connect to a separately launched real Chrome (instead of the Chromium bundled with Playwright) for full browser fingerprints. Useful for sites with strict bot detection.
|
|
|
|
Launch Chrome with a debugging port and a dedicated user data directory:
|
|
|
|
<Tabs>
|
|
<Tab title="macOS">
|
|
```bash
|
|
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
|
|
--remote-debugging-port=9222 \
|
|
--user-data-dir="$HOME/.cow/chrome-cdp"
|
|
```
|
|
</Tab>
|
|
<Tab title="Linux">
|
|
```bash
|
|
google-chrome \
|
|
--remote-debugging-port=9222 \
|
|
--user-data-dir="$HOME/.cow/chrome-cdp"
|
|
```
|
|
</Tab>
|
|
<Tab title="Windows">
|
|
```powershell
|
|
& "C:\Program Files\Google\Chrome\Application\chrome.exe" `
|
|
--remote-debugging-port=9222 `
|
|
--user-data-dir="$env:USERPROFILE\.cow\chrome-cdp"
|
|
```
|
|
</Tab>
|
|
</Tabs>
|
|
|
|
Then point the Agent at the endpoint in `config.json`:
|
|
|
|
```json
|
|
{
|
|
"tools": {
|
|
"browser": {
|
|
"cdp_endpoint": "http://localhost:9222"
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
<Note>
|
|
Chrome 137+ requires `--remote-debugging-port` to be paired with a dedicated `--user-data-dir`. As a result, the CDP-launched Chrome **cannot directly reuse the login state of your daily Chrome**; you'll need to log in once inside this dedicated profile.
|
|
</Note>
|