mirror of
https://github.com/zhayujie/chatgpt-on-wechat.git
synced 2026-06-02 00:57:41 +08:00
Browser sessions now reuse a Chromium user profile across runs by default
(`~/.cow/browser_profile`), so users only log in to a site once.
Three launch modes are selectable via `tools.browser` in config.json:
- persistent (default): Playwright Chromium with a persistent user_data_dir
- cdp: attach to an externally launched real Chrome via `cdp_endpoint`
(full fingerprints, ideal for sites with strict bot detection)
- fresh: clean context every run, set `persistent: false`
Also:
- Self-heal when the user closes the browser window mid-session: detect
closed page/context/browser via close listeners and exception scanning,
then transparently relaunch on the next request.
- Graceful CDP shutdown: disconnect only, never kill the user's Chrome.
- Friendly errors when the CDP endpoint is unreachable or the persistent
profile is locked, so the LLM can guide the user instead of looping.
- Fix tool config being silently overwritten by workspace config in
AgentInitializer; per-tool user settings (e.g. browser.cdp_endpoint)
are now merged instead of replaced.
- Update zh / en / ja docs with the new login-persistence section,
including the Chrome 137+ requirement to pair --remote-debugging-port
with a dedicated --user-data-dir.
173 lines
4.7 KiB
Plaintext
173 lines
4.7 KiB
Plaintext
---
|
||
title: browser - 浏览器
|
||
description: 控制浏览器访问和操作网页
|
||
---
|
||
|
||
控制 Chromium 浏览器进行网页导航、元素交互和内容提取。支持 JavaScript 渲染的动态页面,使用精简 DOM 快照让 Agent 高效理解页面结构。
|
||
|
||
## 安装
|
||
|
||
<Tabs>
|
||
<Tab title="CLI 安装(推荐)">
|
||
```bash
|
||
cow install-browser
|
||
```
|
||
|
||
该命令会自动完成:
|
||
- 安装 `playwright` Python 包(旧系统自动降级兼容版本)
|
||
- 在 Linux 上安装系统依赖
|
||
- 下载 Chromium 浏览器(Linux 服务器自动使用无头精简版)
|
||
- 自动检测国内网络并使用镜像加速
|
||
</Tab>
|
||
<Tab title="手动安装">
|
||
```bash
|
||
pip install playwright
|
||
playwright install chromium
|
||
```
|
||
|
||
Linux 服务器还需安装系统依赖:
|
||
```bash
|
||
sudo playwright install-deps chromium
|
||
```
|
||
|
||
如果系统较旧(如 Ubuntu 18.04,glibc < 2.28),需安装兼容版本:
|
||
```bash
|
||
pip install playwright==1.28.0
|
||
python -m playwright install chromium
|
||
```
|
||
|
||
国内网络下载 Chromium 较慢,可设置镜像加速:
|
||
```bash
|
||
export PLAYWRIGHT_DOWNLOAD_HOST=https://registry.npmmirror.com/-/binary/playwright
|
||
python -m playwright install chromium
|
||
```
|
||
</Tab>
|
||
</Tabs>
|
||
|
||
<Note>
|
||
1. 支持 Ubuntu 20.04+、Debian 10+、macOS、Windows。Ubuntu 18.04 等旧系统会自动降级安装兼容版本。
|
||
2. 浏览器工具依赖较重(约300MB),为可选安装。轻量的网页内容获取可使用 `web_fetch` 工具。
|
||
</Note>
|
||
|
||
## 工作流程
|
||
|
||
Agent 使用浏览器的典型流程:
|
||
|
||
1. **`navigate`** — 打开目标 URL
|
||
2. **`snapshot`** — 获取页面精简 DOM,交互元素自动编号(ref)
|
||
3. **`click` / `fill` / `select`** — 通过 ref 编号操作元素
|
||
4. **`snapshot`** — 再次快照验证操作结果
|
||
|
||
## 支持的操作
|
||
|
||
| 操作 | 说明 | 关键参数 |
|
||
| --- | --- | --- |
|
||
| `navigate` | 打开 URL | `url` |
|
||
| `snapshot` | 获取页面结构化文本(主要方式) | `selector`(可选) |
|
||
| `click` | 点击元素 | `ref` 或 `selector` |
|
||
| `fill` | 填入文本 | `ref` 或 `selector`,`text` |
|
||
| `select` | 下拉选择 | `ref` 或 `selector`,`value` |
|
||
| `scroll` | 滚动页面 | `direction`(up/down/left/right) |
|
||
| `screenshot` | 截图保存到工作区 | `full_page` |
|
||
| `wait` | 等待元素或超时 | `selector`,`timeout` |
|
||
| `press` | 按键(Enter、Tab 等) | `key` |
|
||
| `back` / `forward` | 浏览器前进/后退 | - |
|
||
| `get_text` | 获取元素文本内容 | `selector` |
|
||
| `evaluate` | 执行 JavaScript | `script` |
|
||
|
||
## 使用场景
|
||
|
||
- 访问指定 URL 获取动态页面内容
|
||
- 填写表单、登录操作
|
||
- 操作网页元素(点击按钮、选择选项等)
|
||
- 验证部署后的网页效果
|
||
- 抓取需要 JS 渲染的动态内容
|
||
|
||
## 运行模式
|
||
|
||
浏览器会根据运行环境自动选择模式:
|
||
|
||
| 环境 | 模式 |
|
||
| --- | --- |
|
||
| macOS / Windows | 有头模式(显示浏览器窗口) |
|
||
| Linux 桌面(有 DISPLAY) | 有头模式 |
|
||
| Linux 服务器(无 DISPLAY) | 无头模式(headless) |
|
||
|
||
可在 `config.json` 中手动覆盖:
|
||
|
||
```json
|
||
{
|
||
"tools": {
|
||
"browser": {
|
||
"headless": true
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
## 登录态持久化
|
||
|
||
**只需登录一次目标网站,Agent 后续可直接使用**。提供两种方式:
|
||
|
||
### 方式一:Persistent 模式(默认)
|
||
|
||
开箱即用,登录信息保存在 `~/.cow/browser_profile`。无需任何配置。
|
||
|
||
如需关闭持久化模式,每次都用纯净环境:
|
||
|
||
```json
|
||
{
|
||
"tools": {
|
||
"browser": {
|
||
"persistent": false
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
### 方式二:CDP 模式(接管真实 Chrome)
|
||
|
||
让 Agent 连接独立启动的真实 Chrome(而非 Playwright 自带的 Chromium),获得完整浏览器指纹,适合反爬严格的网站。
|
||
|
||
启动 Chrome 时加上调试端口和独立用户目录:
|
||
|
||
<Tabs>
|
||
<Tab title="macOS">
|
||
```bash
|
||
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
|
||
--remote-debugging-port=9222 \
|
||
--user-data-dir="$HOME/.cow/chrome-cdp"
|
||
```
|
||
</Tab>
|
||
<Tab title="Linux">
|
||
```bash
|
||
google-chrome \
|
||
--remote-debugging-port=9222 \
|
||
--user-data-dir="$HOME/.cow/chrome-cdp"
|
||
```
|
||
</Tab>
|
||
<Tab title="Windows">
|
||
```powershell
|
||
& "C:\Program Files\Google\Chrome\Application\chrome.exe" `
|
||
--remote-debugging-port=9222 `
|
||
--user-data-dir="$env:USERPROFILE\.cow\chrome-cdp"
|
||
```
|
||
</Tab>
|
||
</Tabs>
|
||
|
||
在 `config.json` 中配置端点:
|
||
|
||
```json
|
||
{
|
||
"tools": {
|
||
"browser": {
|
||
"cdp_endpoint": "http://localhost:9222"
|
||
}
|
||
}
|
||
}
|
||
```
|
||
|
||
<Note>
|
||
Chrome 137+ 限制 `--remote-debugging-port` 必须搭配独立 `--user-data-dir`,因此 CDP 启动的 Chrome **无法直接复用你日常 Chrome 的登录态**,需要在独立目录中重新登录一次。
|
||
</Note>
|