95 Commits

Author SHA1 Message Date
zhayujie
136b0b89e8 fix: optimize browser memory 2026-05-28 19:09:26 +08:00
zhayujie
bccce2d7cb feat(models): support xiaomi mimo 2026-05-28 10:49:52 +08:00
zhayujie
83cd6ad158 fix(browser): preserve non-http schemes in navigate URL 2026-05-27 18:42:21 +08:00
zhayujie
ad2db1a776 feat(mcp): support streamable-http mcp protocol 2026-05-26 12:11:59 +08:00
zhayujie
c5a3f991c5 fix(scheduler): make cron pushes survive restart on weixin channel 2026-05-25 12:15:57 +08:00
zhayujie
bd85fee7d7 fix(models): persist explicit provider for vision and image capabilities 2026-05-23 20:43:25 +08:00
zhayujie
571897e2fd fix: modify default model in vision tool 2026-05-22 18:18:16 +08:00
zhayujie
cc10d230b0 Merge pull request #2826 from zhayujie/feat-multi-model
feat: multi-provider model console
2026-05-22 11:08:13 +08:00
zhayujie
2517f2add8 feat(models): support gpt-5.5 2026-05-22 11:04:55 +08:00
zhayujie
8c25395805 feat(models): support gemini-3.5-flash 2026-05-22 10:39:04 +08:00
zhayujie
b7734c3926 feat(search): multi-provider web search + console integration
Search tool now supports 4 backends with unified output (bocha,
qianfan, zhipu, linkai) and a routing layer:
  - strategy 'auto' (default): pick first configured in canonical order
    bocha > qianfan > zhipu > linkai
  - strategy 'fixed': pin a specific provider
  - agent may pass `provider` to override per-call (only exposed when
    ≥2 providers configured + auto strategy)
2026-05-21 19:58:03 +08:00
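The routing rules described in commit b7734c3926 can be sketched roughly as below. This is a minimal illustration, not the project's code: the function name `pick_search_provider` and its parameters are hypothetical, and only the provider order and the two strategy names come from the commit message.

```python
from typing import Dict, Optional

# Canonical priority order quoted in the commit message.
CANONICAL_ORDER = ["bocha", "qianfan", "zhipu", "linkai"]


def pick_search_provider(
    configured: Dict[str, bool],
    strategy: str = "auto",
    fixed_provider: Optional[str] = None,
    requested: Optional[str] = None,
) -> str:
    """Return the provider to use for one search call (hypothetical helper)."""
    available = [name for name in CANONICAL_ORDER if configured.get(name)]
    if not available:
        raise ValueError("no search provider is configured")

    if strategy == "fixed":
        # 'fixed' pins one provider; a per-call override is ignored here.
        if fixed_provider not in available:
            raise ValueError(f"fixed provider {fixed_provider!r} is not configured")
        return fixed_provider

    if strategy != "auto":
        raise ValueError(f"unknown strategy {strategy!r}")

    # Per-call override is only honoured under 'auto' with two or more providers.
    if requested in available and len(available) >= 2:
        return requested

    # Default: first configured provider in canonical order.
    return available[0]
```

For example, with only `zhipu` and `linkai` configured, the `auto` strategy would select `zhipu` unless the agent passes `provider="linkai"` for that call.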
zhayujie
09fa624797 fix(scheduler): once tasks with tz-aware schedule never fire 2026-05-21 16:18:36 +08:00
zhayujie
b8333e351c feat(voice): rework TTS/ASR stack and unify tool/skill config schema 2026-05-21 16:00:54 +08:00
zhayujie
a0dfdb79df feat(browser): persistent login + CDP attach mode #2809
Browser sessions now reuse a Chromium user profile across runs by default
(`~/.cow/browser_profile`), so users only log in to a site once.
Three launch modes are selectable via `tools.browser` in config.json:
  - persistent (default): Playwright Chromium with a persistent user_data_dir
  - cdp: attach to an externally launched real Chrome via `cdp_endpoint`
    (full fingerprints, ideal for sites with strict bot detection)
  - fresh: clean context every run, set `persistent: false`

Also:
  - Self-heal when the user closes the browser window mid-session: detect
    closed page/context/browser via close listeners and exception scanning,
    then transparently relaunch on the next request.
  - Graceful CDP shutdown: disconnect only, never kill the user's Chrome.
  - Friendly errors when the CDP endpoint is unreachable or the persistent
    profile is locked, so the LLM can guide the user instead of looping.
  - Fix tool config being silently overwritten by workspace config in
    AgentInitializer; per-tool user settings (e.g. browser.cdp_endpoint)
    are now merged instead of replaced.
  - Update zh / en / ja docs with the new login-persistence section,
    including the Chrome 137+ requirement to pair --remote-debugging-port
    with a dedicated --user-data-dir.
2026-05-19 11:52:11 +08:00
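The "merged instead of replaced" fix in commit a0dfdb79df can be pictured as a recursive dictionary merge. The sketch below is an assumption about the general shape only: `merge_tool_config` is a hypothetical name, the `cwd` key is illustrative, and which side wins on a conflicting key is a guess rather than something the commit states.

```python
from typing import Any, Dict


def merge_tool_config(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge `override` into a copy of `base` (hypothetical helper).

    Nested dicts are merged key by key; any other value in `override`
    replaces the value in `base`. Neither input is mutated.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_tool_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# Per-tool user settings survive when workspace-level values are layered on top.
user_tools = {"browser": {"cdp_endpoint": "http://127.0.0.1:9222"}}
workspace_tools = {"browser": {"cwd": "/path/to/workspace"}}  # illustrative key
merged = merge_tool_config(user_tools, workspace_tools)
```

With a plain `dict.update` at the top level, the `browser` entry from the second dict would replace the first wholesale, which is the kind of silent overwrite the commit describes.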
zhayujie
a85c5f9d4e fix(scheduler): make scheduler init idempotent to prevent duplicate task runs 2026-05-18 18:36:48 +08:00
zhayujie
fe871aad77 fix(tools): unify text file truncation thresholds in read tool 2026-05-13 16:15:06 +08:00
zhayujie
29e66cb186 fix(mcp): correct hot-reload sync on default Agent 2026-05-08 15:40:29 +08:00
zhayujie
307769b949 feat(mcp): load MCP servers asynchronously at startup
Boot MCP servers (npx/uvx) on a background thread instead of blocking
agent init. Built-in tools serve traffic immediately while MCP comes
online; each new agent reads whatever is ready at creation time.
Idempotent via _mcp_loaded flag — concurrent sessions never re-fork
subprocesses. Per-server failures are isolated and warmup is triggered
in app.py so loading overlaps with channel startup.
2026-05-08 15:22:42 +08:00
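A minimal sketch of the background-loading pattern described in commit 307769b949, assuming each server can be represented by a callable that returns its tool names. Only the `_mcp_loaded` flag name comes from the commit message; the `McpWarmup` class and its methods are hypothetical.

```python
import threading
from typing import Callable, Dict, List, Optional


class McpWarmup:
    """Load MCP servers on a background thread, at most once (hypothetical class)."""

    def __init__(self, loaders: Dict[str, Callable[[], List[str]]]):
        self._loaders = loaders
        self._lock = threading.Lock()
        self._mcp_loaded = False  # flag name taken from the commit message
        self._ready: Dict[str, List[str]] = {}
        self._thread: Optional[threading.Thread] = None

    def start(self) -> bool:
        """Start loading; return True only for the call that actually started it."""
        with self._lock:
            if self._mcp_loaded:
                return False  # idempotent: later callers never re-fork servers
            self._mcp_loaded = True
            self._thread = threading.Thread(target=self._load_all, daemon=True)
            self._thread.start()
        return True

    def _load_all(self) -> None:
        for name, loader in self._loaders.items():
            try:
                tools = loader()
            except Exception as exc:  # per-server failures stay isolated
                print(f"MCP server {name!r} failed to load: {exc}")
                continue
            with self._lock:
                self._ready[name] = tools

    def ready_tools(self) -> Dict[str, List[str]]:
        """Snapshot of whatever has finished loading so far."""
        with self._lock:
            return dict(self._ready)

    def wait(self, timeout: Optional[float] = None) -> None:
        if self._thread is not None:
            self._thread.join(timeout)
```

A new agent would call `ready_tools()` at creation time and simply get fewer MCP tools if some servers are still starting.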
ooaaooaa123
b861eef26f fix(mcp): address PR review feedback on stability and config
Stability fixes in mcp_client.py:
- Fix stderr buffer overflow: start daemon thread to continuously drain
  stderr pipe, preventing 64KB buffer fill that blocks child process
- Fix notification interference: loop readline and skip JSON-RPC messages
  without 'id' field (notifications) instead of treating them as responses
- Fix concurrent race condition: wrap send+receive in _call_lock so
  multiple sessions cannot interleave reads/writes on the same client
- Fix missing timeout: use select.select() with 30s timeout in
  _readline_with_timeout() to prevent infinite block on dead MCP server
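
The notification fix above can be illustrated with a small reader that skips JSON-RPC messages lacking an `id`. This is a sketch under assumptions: `read_response` is a hypothetical name, and it reads from any iterable of lines rather than from the real subprocess pipe, with no timeout handling.

```python
import json
from typing import Any, Dict, Iterable


def read_response(lines: Iterable[str]) -> Dict[str, Any]:
    """Return the first JSON-RPC message that carries an 'id' (hypothetical helper)."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines between messages
        message = json.loads(raw)
        if "id" not in message:
            continue  # a notification, not a response: keep reading
        return message
    raise EOFError("stream ended before a response arrived")
```

Without the `id` check, a progress notification emitted by the server before its reply would be mistaken for the reply itself.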

Config improvements in tool_manager.py:
- Add _normalize_mcp_configs() to support both list format (mcp_servers)
  and dict format (mcpServers used by Claude Desktop / Cursor)
- Add _load_mcp_configs() to load from ~/cow/mcp.json first, falling back
  to config.json mcp_servers field for backward compatibility

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 09:58:40 +08:00
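The two config shapes mentioned in commit b861eef26f (a list under `mcp_servers`, a name-keyed dict under `mcpServers`) can be reduced to one form along these lines. The function below is a stand-in written for illustration, not the project's `_normalize_mcp_configs()`; its exact behaviour is an assumption.

```python
from typing import Any, Dict, List, Union

RawConfig = Union[List[Dict[str, Any]], Dict[str, Dict[str, Any]], None]


def normalize_mcp_configs(raw: RawConfig) -> List[Dict[str, Any]]:
    """Return a list of server configs, each with a 'name' key (illustrative)."""
    if not raw:
        return []
    if isinstance(raw, list):
        # Already in list format: copy entries so callers can mutate safely.
        return [dict(entry) for entry in raw]
    if isinstance(raw, dict):
        # Dict format: the key is the server name.
        return [{"name": name, **settings} for name, settings in raw.items()]
    raise TypeError(f"unsupported MCP config type: {type(raw).__name__}")
```

Both formats then flow through the same loading code, so a config copied from another MCP client would not need to be rewritten by hand.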
ooaaooaa123
caaf006a49 fix(mcp): wire MCP tools into agent and fix env var inheritance
Two bugs found during end-to-end validation with Amap and Chrome DevTools
MCP servers:

1. MCP tools were loaded into ToolManager._mcp_tool_instances but never
   added to the agent's tool list. AgentInitializer._load_tools() only
   iterated tool_classes (built-in tools). Added a second pass to append
   all MCP tool instances.

2. When an MCP server config contains an "env" dict, it was passed directly
   to subprocess.Popen, replacing the entire process environment. This
   caused npx to fail because PATH and other inherited vars were missing.
   Fixed by merging config env on top of os.environ.

Validated with:
- @amap/amap-maps-mcp-server (12 tools, stdio + API key env var)
- chrome-devtools-mcp (29 tools, stdio + remote debugging port)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 20:40:56 +08:00
ooaaooaa123
b2429ec30c feat(mcp): add MCP (Model Context Protocol) tool integration
Allows CowAgent to dynamically load tools from any MCP server at startup,
extending the agent from a fixed toolset to an open, extensible tool ecosystem.

## What's added

- `agent/tools/mcp/mcp_client.py`: lightweight JSON-RPC client supporting both
  stdio (subprocess) and SSE (HTTP) transports — zero extra dependencies
- `agent/tools/mcp/mcp_tool.py`: `McpTool` wraps a single MCP tool as a
  `BaseTool`, with dynamic name/description/params set at instance level
- `agent/tools/tool_manager.py`: new `_load_mcp_tools()` loads MCP servers at
  startup via `McpClientRegistry`; falls back gracefully on any error; no-op
  when `mcp_servers` is not configured
- `config.py`: registers `mcp_servers` in `available_setting` with inline docs

## Design

- No new dependencies — JSON-RPC implemented from scratch using stdlib only
- MCP clients are long-lived (initialized once, shared across tool calls)
- `McpClientRegistry` holds all subprocess handles and shuts them down cleanly
- Server init failures are non-fatal: logged as warnings, agent continues normally
- Zero overhead when `mcp_servers` is absent from config

## Config example

```json
"mcp_servers": [
  {
    "name": "filesystem",
    "type": "stdio",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
  }
]
```

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 20:16:04 +08:00
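To give a feel for the "JSON-RPC from stdlib only" design in commit b2429ec30c, here is a minimal sketch of framing a request for a stdio transport as one line of JSON. The helper name `make_request` and the sample tool name and arguments are hypothetical; `tools/list` and `tools/call` are method names from the MCP specification.

```python
import itertools
import json
from typing import Any, Dict, Optional

_ids = itertools.count(1)  # monotonically increasing request ids


def make_request(method: str, params: Optional[Dict[str, Any]] = None) -> bytes:
    """Encode one JSON-RPC 2.0 request as a newline-terminated UTF-8 line."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    # json.dumps emits a single line, so the trailing newline delimits the message.
    return (json.dumps(message) + "\n").encode("utf-8")


frame = make_request(
    "tools/call",
    {"name": "read_file", "arguments": {"path": "/tmp/a.txt"}},  # illustrative tool
)
```

A client would write `frame` to the server's stdin and then read lines from its stdout until the response with the matching `id` arrives.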
zhayujie
a5790d82f6 feat(qianfan): scope vision support to multimodal models 2026-05-06 16:11:10 +08:00
zhayujie
63f99af1e6 Merge pull request #2800 from jimmyzhuu/feat/qianfan-vision-provider
Add Qianfan support to Vision tool
2026-05-06 15:39:12 +08:00
zhayujie
4eed2568aa fix(bash): reduce safety check false positives 2026-05-06 15:36:44 +08:00
jimmyzhuu
fb7962c7f2 fix: use available qianfan vision model 2026-05-06 13:34:39 +08:00
jimmyzhuu
fccb7ff9ed feat: route qianfan vision provider 2026-05-06 13:25:59 +08:00
zhayujie
80e9062041 fix(vision): respect tool.vision.model and add automatic fallback #2792 2026-05-03 22:28:32 +08:00
zhayujie
aea081703f fix(scheduler): inject delivered output into receiver session with sliding window
Further refinements on top of #2795:

- persist real session_id (notify_session_id) at task creation so group chats
  correctly map back to the user's actual conversation
- mark scheduler turns with [SCHEDULED] (recognise legacy "Scheduled task"
  prefix too for backward-compatible pruning)
- prune both the DB and in-memory history to scheduler_inject_max_per_session
  (default 3); only marker-tagged pairs are touched, and regular user turns are
  never deleted
- injection for send_message tasks is gated by scheduler_inject_send_message
  (default false), since fixed reminder text rarely benefits follow-up Q&A

Co-authored-by: huangrichao2020 <grdomai43881@gmail.com>
2026-05-03 21:27:24 +08:00
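The sliding-window pruning in commit aea081703f could look roughly like this. It is a sketch under assumptions: the history is modelled as a list of `{"role", "content"}` dicts, a "pair" is taken to be a marked user turn followed by an assistant turn, and `prune_scheduled_pairs` is a hypothetical name. The two marker strings and the default of 3 come from the commit message.

```python
from typing import Dict, List

# Current marker plus the legacy prefix mentioned in the commit message.
MARKERS = ("[SCHEDULED]", "Scheduled task")


def prune_scheduled_pairs(
    history: List[Dict[str, str]], max_pairs: int = 3
) -> List[Dict[str, str]]:
    """Keep only the newest `max_pairs` scheduler pairs; leave other turns alone."""
    # Index of every user turn that opens a scheduler pair.
    starts = [
        i
        for i, msg in enumerate(history)
        if msg["role"] == "user"
        and msg["content"].startswith(MARKERS)
        and i + 1 < len(history)
        and history[i + 1]["role"] == "assistant"
    ]
    # Everything except the last `max_pairs` pairs is excess.
    excess = starts[:-max_pairs] if max_pairs > 0 else starts
    drop = set()
    for i in excess:
        drop.update((i, i + 1))
    return [msg for i, msg in enumerate(history) if i not in drop]
```

Because only indices that begin with a marker are ever added to `drop`, ordinary conversation turns pass through untouched however long the history grows.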
tingchim2pro
f150d7d83a fix: remember scheduled task outputs in receiver session (v2)
Address review feedback from #2794:

1. Use notify_session_id instead of receiver for correct group chat mapping
   - Task creation should store the real session_id in action.notify_session_id
   - Falls back to receiver for backward compatibility with old tasks

2. Add injection to all four execution branches:
   - _execute_agent_task
   - _execute_send_message
   - _execute_tool_call
   - _execute_skill_call (also fixed missing channel.send)

3. Add config switch and content truncation:
   - scheduler_inject_to_session (default: true) to toggle the feature
   - 2000 char limit prevents high-frequency tasks from bloating sessions

Fixes #2793
2026-05-02 19:00:50 +08:00
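Points 1 and 3 of commit f150d7d83a can be sketched as two small helpers. The field names `notify_session_id` and `receiver` and the 2000-character limit come from the commit message; the function names and the plain-dict shape of `action` are assumptions made for illustration.

```python
from typing import Any, Dict

MAX_INJECT_CHARS = 2000  # limit quoted in the commit message


def resolve_session_id(action: Dict[str, Any]) -> str:
    """Prefer the stored session id; fall back to receiver for old tasks."""
    return action.get("notify_session_id") or action["receiver"]


def truncate_output(text: str, limit: int = MAX_INJECT_CHARS) -> str:
    """Cap injected output so frequent tasks cannot bloat a session."""
    return text if len(text) <= limit else text[:limit]
```

A task created before the change has no `notify_session_id`, so the fallback keeps it delivering to the same place as before.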
zhayujie
c9c99de3d9 fix(bash): scope safety confirm to destructive deletions outside workspace 2026-04-28 10:18:47 +08:00
zhayujie
81e8bb62ae feat(skill): support gpt-image-2 in image generation skill 2026-04-22 20:39:49 +08:00
zhayujie
2c13e1b923 feat(models): support kimi-k2.6 2026-04-22 12:01:40 +08:00
zhayujie
d4e5ecd497 fix: restore Python 3.7 compatibility by deferring Literal import in truncate.py 2026-04-15 12:29:09 +08:00
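For context, `typing.Literal` only exists from Python 3.8 onward, so a module-level import fails on 3.7. One common way to defer it is shown below; this is an illustration of the general technique, not necessarily how `truncate.py` does it, and the `truncate` function and `Side` alias are hypothetical.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by static type checkers, never at runtime,
    # so Python 3.7 never attempts the import.
    from typing import Literal

    Side = Literal["head", "tail"]


def truncate(text: str, limit: int, keep: "Side" = "head") -> str:
    """Keep `limit` characters from the head or the tail of `text`."""
    if len(text) <= limit:
        return text
    if keep == "head":
        return text[:limit]
    return text[len(text) - limit:]
```

The string annotation `"Side"` is never resolved at runtime, so the function definition stays importable on older interpreters.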
zhayujie
5162da5654 Merge branch 'master' into feat-knowledge 2026-04-12 16:46:38 +08:00
zhayujie
a1d82f6193 feat(knowledge): add cli and update docs 2026-04-12 16:39:06 +08:00
zhayujie
ea78e3d0c6 feat(knowledge): document link supports jumping to view 2026-04-11 20:16:43 +08:00
zhayujie
26693acc3f feat(vision): prioritize main model for image recognition with multi-provider fallback
- Add call_vision method to all bot implementations (DashScope, Claude,
  Gemini, ZhipuAI, MiniMax, Doubao, Moonshot, OpenAICompatibleBot)
  using each vendor's native multimodal API format
- Remove call_with_tools/call_vision from Bot base class to fix MRO
  shadowing issue with OpenAICompatibleBot mixin
- Refactor vision tool provider resolution: MainModel → other configured
  models (auto-discovered) → OpenAI → LinkAI, with automatic fallback
- Return actual model name used in call_vision responses
- Sync config.json API keys to .env bidirectionally on startup
- Fix bot instance cache to detect bot_type/use_linkai config changes
- Add SSE reconnection support for web console
- Preserve image path hints in Gemini text for correct vision tool calls
- Update docs/tools/vision.mdx
2026-04-11 19:46:11 +08:00
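The provider chain with automatic fallback from commit 26693acc3f can be sketched as a loop over candidate callables. This is a simplified illustration: `call_with_fallback` is a hypothetical name, and real providers would take richer arguments than an image path.

```python
from typing import Callable, List, Tuple

VisionCall = Callable[[str], str]  # image path in, description out (assumed shape)


def call_with_fallback(
    providers: List[Tuple[str, VisionCall]], image_path: str
) -> Tuple[str, str]:
    """Try providers in priority order; return (name_used, result)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(image_path)
        except Exception as exc:
            errors.append(f"{name}: {exc}")  # remember why, then try the next one
    raise RuntimeError("all vision providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the result mirrors the commit's note that responses report the model actually used.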
zhayujie
4d5375f6d6 fix(win): add Windows platform hint in bash tool description 2026-04-08 16:54:26 +08:00
zhayujie
424557fedb fix(win): use PowerShell instead of cmd.exe 2026-04-08 16:50:45 +08:00
zhayujie
89251e603f fix(win): use PowerShell instead of cmd.exe for bash tool on Windows 2026-04-08 16:18:56 +08:00
zhayujie
ad86deb014 fix: prioritize using a custom master model for vision 2026-04-08 15:16:59 +08:00
zhayujie
360e3670eb feat(browser): detect implicit interactive elements 2026-04-07 01:41:14 +08:00
zhayujie
66b71c50e9 feat(wecom_bot): add Wecom Bot QR code scan auth 2026-03-31 21:27:50 +08:00
zhayujie
1ae2918064 feat: support installing the browser from chat 2026-03-31 15:15:17 +08:00
zhayujie
b6571e5cad fix: browser resource optimization 2026-03-30 21:39:38 +08:00
zhayujie
7549d48cf1 fix: browser thread bug 2026-03-30 21:27:08 +08:00
zhayujie
fa149cf4aa fix(browser): multi-thread browser instance bug 2026-03-30 00:57:19 +08:00
zhayujie
d09ae49287 feat(browser): auto-snapshot on navigate, screenshot prompt guidance
Browser tool enhancements:
- Navigate action now auto-includes snapshot result, saving one LLM round-trip
- Wait for networkidle + 800ms after navigation for SPA/JS-rendered pages
- Prompt guides agent to screenshot key results and ask user for login/CAPTCHA help
- Pin Playwright to version 1.52.0; fall back from the mirror to the official CDN on failure

Web console file/image support:
- SSE real-time push for images and files via on_event (file_to_send)
- Added /api/file endpoint to serve local files for web preview
- Frontend renders images in media-content container (survives delta/done overwrites)
- File attachment cards with download links; RFC 5987 encoding for non-ASCII filenames

Tool workspace fix:
- Inject workspace_dir as cwd into send and browser tools (previously only file tools)
- Screenshots now save to ~/cow/tmp/ instead of project directory
2026-03-29 19:09:11 +08:00
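The RFC 5987 encoding mentioned in commit d09ae49287 applies to the `filename*` parameter of a `Content-Disposition` header. A minimal sketch using only the standard library follows; the helper name `content_disposition` and the choice of an ASCII fallback are assumptions, not taken from the project.

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Build an attachment header value that survives non-ASCII filenames."""
    # Plain fallback for old clients: non-ASCII characters become '?'.
    ascii_fallback = (
        filename.encode("ascii", "replace").decode("ascii")
        .replace("\\", "_")
        .replace('"', "_")
    )
    # RFC 5987 extended value: charset, empty language tag, percent-encoded UTF-8.
    encoded = quote(filename, safe="")
    return f"attachment; filename=\"{ascii_fallback}\"; filename*=UTF-8''{encoded}"


header = content_disposition("报告.pdf")
```

Clients that understand `filename*` use the UTF-8 name, while older ones fall back to the quoted `filename`.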
zhayujie
3458621147 feat: add browser tool 2026-03-29 14:59:06 +08:00
zhayujie
22b8ca0095 feat: optimize vision image compression 2026-03-23 21:18:04 +08:00