feat(browser): persistent login + CDP attach mode #2809

Browser sessions now reuse a Chromium user profile across runs by default
(`~/.cow/browser_profile`), so users only log in to a site once.
Three launch modes are selectable via `tools.browser` in config.json:
  - persistent (default): Playwright Chromium with a persistent user_data_dir
  - cdp: attach to an externally launched real Chrome via `cdp_endpoint`
    (full fingerprints, ideal for sites with strict bot detection)
  - fresh: clean context every run, set `persistent: false`

Also:
  - Self-heal when the user closes the browser window mid-session: detect
    closed page/context/browser via close listeners and exception scanning,
    then transparently relaunch on the next request.
  - Graceful CDP shutdown: disconnect only, never kill the user's Chrome.
  - Friendly errors when the CDP endpoint is unreachable or the persistent
    profile is locked, so the LLM can guide the user instead of looping.
  - Fix tool config being silently overwritten by workspace config in
    AgentInitializer; per-tool user settings (e.g. browser.cdp_endpoint)
    are now merged instead of replaced.
  - Update zh / en / ja docs with the new login-persistence section,
    including the Chrome 137+ requirement to pair --remote-debugging-port
    with a dedicated --user-data-dir.
This commit is contained in:
zhayujie
2026-05-19 11:52:11 +08:00
parent a85c5f9d4e
commit a0dfdb79df
6 changed files with 592 additions and 50 deletions

View File

@@ -4,6 +4,15 @@ Browser tool - Control a Chromium browser for web navigation and interaction.
Uses Playwright under the hood. Browser instance is lazily started on first
use, reused across tool calls within the same session, and cleaned up via
close().
Launch modes (configured under `tools.browser` in config.json):
- persistent (default): Chromium runs with a persistent user_data_dir
(default `~/.cow/browser_profile`), so cookies and login state survive
across runs. The user only needs to log in once.
- cdp: When `cdp_endpoint` is set, attach to an externally launched Chrome
via the Chrome DevTools Protocol. Lets the agent reuse the user's real
browser (with all logins / extensions / true fingerprints).
- fresh: Set `persistent` to false to fall back to a clean context every run.
"""
import json
@@ -25,7 +34,10 @@ class BrowserTool(BaseTool):
"get_text, press, evaluate.\n\n"
"Workflow: navigate (auto-includes snapshot with element refs) → click/fill/select by ref → snapshot to verify.\n\n"
"Use snapshot as the primary way to read pages. Use screenshot + send to show key results to the user. "
"For login/CAPTCHA/authorization etc., screenshot and ask the user for help."
"For login/CAPTCHA/authorization etc., screenshot and ask the user for help. "
"Login state is persisted across sessions (cookies / localStorage are kept in a "
"user profile directory), so once the user logs in to a site, the agent can keep "
"using it without logging in again."
)
params: dict = {