Files
chatgpt-on-wechat/docs/tools/browser.mdx
zhayujie a0dfdb79df feat(browser): persistent login + CDP attach mode #2809
Browser sessions now reuse a Chromium user profile across runs by default
(`~/.cow/browser_profile`), so users only log in to a site once.
Three launch modes are selectable via `tools.browser` in config.json:
  - persistent (default): Playwright Chromium with a persistent user_data_dir
  - cdp: attach to an externally launched real Chrome via `cdp_endpoint`
    (full fingerprints, ideal for sites with strict bot detection)
  - fresh: clean context every run, set `persistent: false`

Also:
  - Self-heal when the user closes the browser window mid-session: detect
    closed page/context/browser via close listeners and exception scanning,
    then transparently relaunch on the next request.
  - Graceful CDP shutdown: disconnect only, never kill the user's Chrome.
  - Friendly errors when the CDP endpoint is unreachable or the persistent
    profile is locked, so the LLM can guide the user instead of looping.
  - Fix tool config being silently overwritten by workspace config in
    AgentInitializer; per-tool user settings (e.g. browser.cdp_endpoint)
    are now merged instead of replaced.
  - Update zh / en / ja docs with the new login-persistence section,
    including the Chrome 137+ requirement to pair --remote-debugging-port
    with a dedicated --user-data-dir.
2026-05-19 11:52:11 +08:00

173 lines
4.7 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
title: browser - 浏览器
description: 控制浏览器访问和操作网页
---
控制 Chromium 浏览器进行网页导航、元素交互和内容提取。支持 JavaScript 渲染的动态页面,使用精简 DOM 快照让 Agent 高效理解页面结构。
## 安装
<Tabs>
<Tab title="CLI 安装(推荐)">
```bash
cow install-browser
```
该命令会自动完成:
- 安装 `playwright` Python 包(旧系统自动降级兼容版本)
- 在 Linux 上安装系统依赖
- 下载 Chromium 浏览器Linux 服务器自动使用无头精简版)
- 自动检测国内网络并使用镜像加速
</Tab>
<Tab title="手动安装">
```bash
pip install playwright
playwright install chromium
```
Linux 服务器还需安装系统依赖:
```bash
sudo playwright install-deps chromium
```
如果系统较旧(如 Ubuntu 18.04glibc < 2.28),需安装兼容版本:
```bash
pip install playwright==1.28.0
python -m playwright install chromium
```
国内网络下载 Chromium 较慢,可设置镜像加速:
```bash
export PLAYWRIGHT_DOWNLOAD_HOST=https://registry.npmmirror.com/-/binary/playwright
python -m playwright install chromium
```
</Tab>
</Tabs>
<Note>
1. 支持 Ubuntu 20.04+、Debian 10+、macOS、Windows。Ubuntu 18.04 等旧系统会自动降级安装兼容版本。
2. 浏览器工具依赖较重约300MB为可选安装。轻量的网页内容获取可使用 `web_fetch` 工具。
</Note>
## 工作流程
Agent 使用浏览器的典型流程:
1. **`navigate`** — 打开目标 URL
2. **`snapshot`** — 获取页面精简 DOM交互元素自动编号ref
3. **`click` / `fill` / `select`** — 通过 ref 编号操作元素
4. **`snapshot`** — 再次快照验证操作结果
## 支持的操作
| 操作 | 说明 | 关键参数 |
| --- | --- | --- |
| `navigate` | 打开 URL | `url` |
| `snapshot` | 获取页面结构化文本(主要方式) | `selector`(可选) |
| `click` | 点击元素 | `ref` 或 `selector` |
| `fill` | 填入文本 | `ref` 或 `selector``text` |
| `select` | 下拉选择 | `ref` 或 `selector``value` |
| `scroll` | 滚动页面 | `direction`up/down/left/right |
| `screenshot` | 截图保存到工作区 | `full_page` |
| `wait` | 等待元素或超时 | `selector``timeout` |
| `press` | 按键Enter、Tab 等) | `key` |
| `back` / `forward` | 浏览器前进/后退 | - |
| `get_text` | 获取元素文本内容 | `selector` |
| `evaluate` | 执行 JavaScript | `script` |
## 使用场景
- 访问指定 URL 获取动态页面内容
- 填写表单、登录操作
- 操作网页元素(点击按钮、选择选项等)
- 验证部署后的网页效果
- 抓取需要 JS 渲染的动态内容
## 运行模式
浏览器会根据运行环境自动选择模式:
| 环境 | 模式 |
| --- | --- |
| macOS / Windows | 有头模式(显示浏览器窗口) |
| Linux 桌面(有 DISPLAY | 有头模式 |
| Linux 服务器(无 DISPLAY | 无头模式headless |
可在 `config.json` 中手动覆盖:
```json
{
"tools": {
"browser": {
"headless": true
}
}
}
```
## 登录态持久化
**只需登录一次目标网站Agent 后续可直接使用**。提供两种方式:
### 方式一Persistent 模式(默认)
开箱即用,登录信息保存在 `~/.cow/browser_profile`。无需任何配置。
如需关闭持久化模式,每次都用纯净环境:
```json
{
"tools": {
"browser": {
"persistent": false
}
}
}
```
### 方式二CDP 模式(接管真实 Chrome
让 Agent 连接独立启动的真实 Chrome而非 Playwright 自带的 Chromium获得完整浏览器指纹适合反爬严格的网站。
启动 Chrome 时加上调试端口和独立用户目录:
<Tabs>
<Tab title="macOS">
```bash
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
--remote-debugging-port=9222 \
--user-data-dir="$HOME/.cow/chrome-cdp"
```
</Tab>
<Tab title="Linux">
```bash
google-chrome \
--remote-debugging-port=9222 \
--user-data-dir="$HOME/.cow/chrome-cdp"
```
</Tab>
<Tab title="Windows">
```powershell
& "C:\Program Files\Google\Chrome\Application\chrome.exe" `
--remote-debugging-port=9222 `
--user-data-dir="$env:USERPROFILE\.cow\chrome-cdp"
```
</Tab>
</Tabs>
在 `config.json` 中配置端点:
```json
{
"tools": {
"browser": {
"cdp_endpoint": "http://localhost:9222"
}
}
}
```
<Note>
Chrome 137+ 限制 `--remote-debugging-port` 必须搭配独立 `--user-data-dir`,因此 CDP 启动的 Chrome **无法直接复用你日常 Chrome 的登录态**,需要在独立目录中重新登录一次。
</Note>