Compare commits

1493 Commits
1.2.3 ... 2.1.0

Author SHA1 Message Date
zhayujie
feaa9076b0 feat: release 2.1.0 2026-06-01 16:02:55 +08:00
zhayujie
ce0249706e docs: update issue/pr templates 2026-06-01 11:10:12 +08:00
zhayujie
af2c839231 docs: add contributing guide and issue/PR templates 2026-06-01 11:01:28 +08:00
zhayujie
2b2d24ed25 docs: update doc references 2026-05-31 22:22:48 +08:00
zhayujie
1dbf41f384 Merge pull request #2852 from zhayujie/feat-i18n
feat: support i18n across the whole project
2026-05-31 20:15:59 +08:00
zhayujie
9e6a2cc2c0 feat(installer): revamp install flow with i18n 2026-05-31 20:11:23 +08:00
zhayujie
7bf4ef3d05 docs: make English the default docs language and fix link paths 2026-05-31 17:52:22 +08:00
zhayujie
126649f70f feat(i18n): localize system prompts, workspace templates and dynamic prompts 2026-05-31 17:38:31 +08:00
zhayujie
1827a2a31c feat(i18n): bind web language switch to cow_lang config 2026-05-31 17:01:43 +08:00
zhayujie
fcf4eb78dc feat(i18n): add global language resolution and localize user-facing text 2026-05-31 16:49:35 +08:00
zhayujie
2ec6ea8045 Merge pull request #2850 from lyteen/feature/command-matching
feat: /command matching
2026-05-31 15:17:16 +08:00
lyteen
0994a3586d [feat] Fuzzy /command Resolution & Custom Aliases 2026-05-30 23:12:24 +08:00
zhayujie
29c4be6a3a feat(terminal): add agent streaming UX with reasoning/tool-call rendering 2026-05-30 19:10:56 +08:00
zhayujie
c5b8e06891 feat(channel): add Discord channel 2026-05-30 18:20:27 +08:00
zhayujie
54a20bca92 docs: update README doc 2026-05-30 17:32:21 +08:00
zhayujie
6e786bde90 Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2026-05-30 17:18:51 +08:00
zhayujie
b671b0d725 docs: add web file serve root config 2026-05-30 17:18:31 +08:00
zhayujie
57f5692074 Merge pull request #2840 from 6vision/feat/wechatcom-kf-channel
feat: add wechatcom kf channel
2026-05-30 17:17:59 +08:00
zhayujie
b0ac0731c7 Merge branch 'master' into feat/wechatcom-kf-channel 2026-05-30 17:17:29 +08:00
zhayujie
3c161df526 Merge pull request #2848 from 6vision/fix/wechatmp-passive-merge-replies
fix(wechatmp): improve passive reply multi-turn output and local image sending
2026-05-30 17:12:36 +08:00
zhayujie
aa3f48e93c fix(web): confine /api/file to allowed dirs to prevent arbitrary file read 2026-05-30 17:06:58 +08:00
zhayujie
5ae1e1adde feat(channel): support slack bot 2026-05-30 17:01:42 +08:00
6vision
fe8b8fe831 fix(wechatmp): support local file:// images in send
Agent-generated images are sent as IMAGE_URL with a file:// path, but the wechatmp channel always used requests.get, which fails on file:// with InvalidSchema. Now read local files directly (file:// or local path) and fall back to HTTP download for remote URLs, in both passive and active reply modes.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-30 16:33:49 +08:00
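A minimal sketch of the local-file fallback described in fe8b8fe831, not the project's actual code. The names `is_remote` and `load_image_bytes` are hypothetical, and the sketch uses the standard library's `urllib` instead of `requests` so it stays self-contained.

```python
from urllib.parse import urlparse
from urllib.request import urlopen, url2pathname


def is_remote(ref: str) -> bool:
    """True only for http(s) URLs; everything else is treated as local."""
    return urlparse(ref).scheme in ("http", "https")


def load_image_bytes(ref: str, timeout: float = 10.0) -> bytes:
    """Read image bytes from a file:// URL, a plain local path, or a remote URL."""
    if is_remote(ref):
        # Remote image: fall back to an HTTP download.
        with urlopen(ref, timeout=timeout) as resp:
            return resp.read()
    parsed = urlparse(ref)
    # file:// URL: convert the URL path into a filesystem path.
    # Anything else is assumed to already be a local path.
    path = url2pathname(parsed.path) if parsed.scheme == "file" else ref
    with open(path, "rb") as f:
        return f.read()
```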
6vision
5aca54c083 fix(wechatmp): flush cached segments while task still running
Previously the passive reply only drained the cache after the agent task fully finished, so for long multi-turn tasks the user could not retrieve already-cached intermediate segments. Now return cached segments as soon as they are available, even while the task is still running; the next user message fetches the rest.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-30 15:48:27 +08:00
6vision
458b1a1d88 fix(wechatmp): merge cached text segments in passive reply
In subscription account passive reply mode, WeChat allows only one reply per request. Multi-turn agent output was cached as separate entries, forcing the user to send an extra message to fetch each one. Now drain and merge all consecutive cached text segments into a single reply; media still returns one at a time.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-30 14:41:51 +08:00
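The merge rule in 458b1a1d88 can be sketched roughly as below. The cache layout (a deque of `(kind, payload)` tuples), the `"\n\n"` joiner and the name `drain_reply` are assumptions made for illustration only.

```python
from collections import deque


def drain_reply(cache: deque):
    """Pop the next reply: merge consecutive text segments, return media one at a time."""
    if not cache:
        return None
    kind, payload = cache.popleft()
    if kind != "text":
        # Media is never merged; one item per passive reply.
        return (kind, payload)
    parts = [payload]
    # Keep consuming while the head of the cache is still text.
    while cache and cache[0][0] == "text":
        parts.append(cache.popleft()[1])
    return ("text", "\n\n".join(parts))
```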
zhayujie
3dd4b84179 feat(models): support claude-opus-4-8 2026-05-29 10:19:45 +08:00
6vision
99bddb79d6 fix(wechat_kf): download attachments to agent_workspace/tmp
So agent tools resolve relative refs like tmp/xxx.pdf on the first
try, matching weixin's _get_tmp_dir convention.
2026-05-28 19:40:12 +08:00
zhayujie
136b0b89e8 fix: optimize browser memory 2026-05-28 19:09:26 +08:00
6vision
c605b0b080 feat(wechat_kf): cache images/files and merge into next text turn
Adopt the same channel-level pattern as weixin/wecom_bot/feishu so
the agent actually sees attachments the user sent:
- IMAGE: agent mode never reads memory.USER_IMAGE_CACHE, so a photo
  sent before a question (e.g. "image" then 30s later "what's this?")
  used to be lost. Now lone images go into channel.file_cache and
  the next TEXT turn appends "[图片: <path>]" to the query before
  producing the context. Cross-batch image+text combinations now
  work as users expect.
- FILE: previously dropped at the sync_msg filter and unsupported
  by WechatKfMessage. Add msgtype="file" parsing, download via the
  WeCom media API, preserve the original filename from
  Content-Disposition (RFC 5987 + plain forms), and route through
  the same file_cache pipeline as images, surfacing as
  "[文件: <path>]" in the next text turn.
2026-05-28 18:11:41 +08:00
zhayujie
b7b8e3679c fix: avoid conflict with pypi translate package 2026-05-28 15:48:20 +08:00
zhayujie
aeb6610ff4 Merge pull request #2843 from zhayujie/feat-telegram
feat(channel): support telegram bot
2026-05-28 15:12:08 +08:00
zhayujie
e3eacc77d7 feat(channel): support telegram bot 2026-05-28 15:07:09 +08:00
6vision
37661daf40 refactor(wechat_kf): persist sync_msg cursor under $HOME
Move the sync_msg cursor file from the project-local tmp/ dir to ~/.wechat_kf_cursors.json so it survives tmp/ cleanups and cwd changes across restarts. Aligns with the weixin channel's credentials file convention.

- add wechat_kf_cursor_path config (default ~/.wechat_kf_cursors.json)
- expand ~ via os.path.expanduser in the channel init
- chmod the cursor file to 0o600 after each flush (no-op on Windows)
2026-05-28 14:33:45 +08:00
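A small sketch of the cursor persistence described in 37661daf40, under the assumption that cursors are a plain dict serialized as JSON. `flush_cursors` is a hypothetical name; only the default path, the `~` expansion and the `0o600` mode come from the commit message.

```python
import json
import os


def flush_cursors(cursors: dict, path: str = "~/.wechat_kf_cursors.json") -> str:
    """Write the cursor map to disk and restrict it to the owner."""
    full = os.path.expanduser(path)  # expand a leading ~ to the home directory
    with open(full, "w", encoding="utf-8") as f:
        json.dump(cursors, f)
    try:
        os.chmod(full, 0o600)  # owner read/write only; has little effect on Windows
    except OSError:
        pass  # a permission failure should not lose the cursor data
    return full
```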
6vision
877b848370 fix(wechat_kf): stop dropping rapid-fire messages in batch dedup
_dedup_image_text_pair previously fell back to returning only the last message whenever the batch was not exactly an image+text pair, which silently dropped multiple texts/images sent in quick succession.

Cursor freshness is already guaranteed by sync_msg, so no extra stale-history protection is needed. Now we return all messages by default and only collapse a batch when it is exactly a 2-message image+text pair within a 5s window (order-insensitive, normalized to [image, text]).
2026-05-28 14:23:04 +08:00
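The batching rule in 877b848370 might look like the following sketch. The `Msg` dataclass and its field names are hypothetical, and "collapse" is interpreted here only as normalizing the pair to `[image, text]`; the real function may do more.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    msgtype: str      # "image", "text", ...
    send_time: float  # seconds since epoch
    content: str = ""


def collapse_image_text_pair(batch, window: float = 5.0):
    """Return every message; reorder only an exact image+text pair sent within `window` seconds."""
    if len(batch) == 2:
        types = {m.msgtype for m in batch}
        close = abs(batch[0].send_time - batch[1].send_time) <= window
        if types == {"image", "text"} and close:
            # Order-insensitive match, normalized so the image comes first.
            return sorted(batch, key=lambda m: m.msgtype != "image")
    # Default: keep all messages so rapid-fire texts/images are not dropped.
    return list(batch)
```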
6vision
5c163cc0fe fix: dispatch callback async to avoid WeCom 5s timeout
WeCom requires the callback HTTP response within ~5s, otherwise it retries the same notification. The previous code ran sync_msg pulling synchronously inside Query.POST, so a backlog could exceed the deadline and trigger retries that race on the same cursor and end up replying to the same user multiple times.

- Dispatch consume_callback to a background ThreadPoolExecutor and return 'success' immediately from the HTTP handler.
- Serialize work per open_kfid with a lock so retried/concurrent callbacks queue up instead of racing the cursor window.
- Shutdown the executor on channel stop().
2026-05-28 12:23:56 +08:00
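A rough sketch of the dispatch pattern from 5c163cc0fe: acknowledge the callback at once, do the work on a thread pool, and serialize per `open_kfid`. The class `CallbackDispatcher` and its method names are illustrative assumptions, not the channel's real interface.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class CallbackDispatcher:
    def __init__(self, handler, max_workers: int = 4):
        self._handler = handler
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._locks = defaultdict(threading.Lock)  # one lock per open_kfid
        self._locks_guard = threading.Lock()       # protects the lock table itself

    def _lock_for(self, open_kfid):
        with self._locks_guard:
            return self._locks[open_kfid]

    def submit(self, open_kfid, payload) -> str:
        """Queue the work and return immediately so the HTTP handler can answer in time."""
        def run():
            # Retried or concurrent callbacks for the same account queue up here.
            with self._lock_for(open_kfid):
                self._handler(open_kfid, payload)

        self._executor.submit(run)
        return "success"

    def stop(self):
        """Shut the pool down when the channel stops, waiting for queued work."""
        self._executor.shutdown(wait=True)
```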
6vision
6e04ea8240 refactor(wechat_kf): rename channel from wechatcom_kf and split corp_id
Rename the WeCom customer-service channel and give it its own corp_id
field so users no longer have to share `wechatcom_corp_id` with the
self-built WeCom app channel.

Renames (channel-side):
- channel type / const: wechatcom_kf -> wechat_kf
- package dir: channel/wechatcom_kf/ -> channel/wechat_kf/
- python files / classes: WechatComKf* -> WechatKf*
- config keys: wechatcom_kf_{secret,token,aes_key,port} ->
  wechat_kf_{secret,token,aes_key,port}; new wechat_kf_corp_id
- env vars: WECHATCOM_KF_* -> WECHAT_KF_*; new WECHAT_KF_CORP_ID
- log prefix / cursor file: [wechatcom_kf] -> [wechat_kf]
- web console CHANNEL_DEFS key + startup log line

Renames (docs):
- docs/channels/wecom-kf.mdx -> docs/channels/wechat-kf.mdx (zh/en/ja)
- update docs.json sidebar entries and all field names inside the docs

In addition, the Web Console "微信客服" entry now exposes its own
Corp ID field instead of reusing the wechatcom_app one, and includes
the screenshot of the visual config in the channel guide.

Web Console onboarding section is added (Tabs: Web Console / config
file) and the local URL `http://127.0.0.1:9899/` parenthetical is
dropped for consistency with other channel docs.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-28 12:12:44 +08:00
zhayujie
d106465419 feat(channel): telegram first version 2026-05-28 12:10:00 +08:00
zhayujie
f39380cea7 Merge pull request #2841 from zhayujie/feat-add-mimo
feat(models): support xiaomi mimo
2026-05-28 10:51:43 +08:00
zhayujie
bccce2d7cb feat(models): support xiaomi mimo 2026-05-28 10:49:52 +08:00
6vision
6721dbdbcc docs(wechatcom_kf): add web console onboarding tab 2026-05-27 21:53:54 +08:00
zhayujie
83cd6ad158 fix(browser): preserve non-http schemes in navigate URL 2026-05-27 18:42:21 +08:00
zhayujie
116fb27257 fix: robust tool args JSON parsing for non-strict providers #2823 2026-05-27 18:37:54 +08:00
zhayujie
8d67177a1b feat(agent): support user-initiated cancel for in-flight agent runs 2026-05-26 23:36:09 +08:00
zhayujie
ad2db1a776 feat(mcp): support streamable-http mcp protocol 2026-05-26 12:11:59 +08:00
zhayujie
2e6d9e0f27 chore: remove useless plugins 2026-05-25 17:11:57 +08:00
zhayujie
e05f85f3ce feat: optimize model name display in English 2026-05-25 15:09:53 +08:00
zhayujie
40c48a9a61 chore(deps): relax numpy>=1.24 to >=1.21 for Python 3.7 compatibility 2026-05-25 14:47:55 +08:00
zhayujie
c9a7525d0b Merge pull request #2832 from yangluxin613/feat/cjk-search-fix
fix(memory): CJK keyword search + vector search optimization
2026-05-25 14:45:49 +08:00
yangluxin613
fd571ac539 fix(memory): address PR review — numpy/UPSERT soft deps + BM25 floor + BLOB dim
- numpy soft dependency: try/except import + _HAS_NUMPY flag; _encode_embedding
  and _decode_embedding fall back to struct.pack/unpack; search_vector falls back
  to pure-Python cosine loop — startup never fails without numpy reinstalled
- SQLite UPSERT guard: _HAS_UPSERT = sqlite_version_info >= (3,24,0); save_chunk
  and save_chunks_batch fall back to INSERT OR REPLACE on SQLite < 3.24 with a
  one-time startup warning about potential FTS rowid drift
- _bm25_rank_to_score floor: 0.3 + 0.69*(|rank|/(1+|rank|)) → always in [0.3, 0.99),
  prevents small-corpus matches scoring 0.0 and being filtered by min_score
- detect_index_dim BLOB-aware: check isinstance(raw, bytes) first and return
  len(raw)//4 before json.loads, so /memory status works after embedding format switch
- Comment: "CJK single-char" → "CJK tokens shorter than 3 characters"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 14:15:16 +08:00
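The score floor quoted in fd571ac539 is simple enough to write out directly. The function below follows the formula from the commit message; the standalone name `bm25_rank_to_score` (without the leading underscore) is used here only for illustration.

```python
def bm25_rank_to_score(rank: float) -> float:
    """Map an FTS5 BM25 rank (more negative = better match) to a score with a 0.3 floor."""
    r = abs(rank)
    # r / (1 + r) lies in [0, 1), so the result lies in [0.3, 0.99).
    return 0.3 + 0.69 * (r / (1.0 + r))
```

With this mapping a weak match on a tiny corpus still scores at least 0.3, so it is not removed by a `min_score` filter set below that value.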
zhayujie
c5a3f991c5 fix(scheduler): make cron pushes survive restart on weixin channel 2026-05-25 12:15:57 +08:00
zhayujie
eb74b73351 fix(web): handle non-string web_password to avoid login TypeError 2026-05-25 11:14:14 +08:00
yangluxin613
9b31f45481 fix(memory): _search_like ASCII query always returns empty
matched_count only counted cjk_words hits; pure ASCII queries had
cjk_words=[] so matched_count=0 and all SQL-matched rows were filtered
out. Change to count across all tokens (cjk_words + ascii_words) so
the LIKE fallback works correctly when FTS5 is unavailable.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 09:02:07 +08:00
yangluxin613
bc9c1691f5 fix(memory): CJK keyword search + vector search optimization
- Add trigram FTS5 table for CJK/mixed-language search with BM25 ranking
- Fix three-step search routing: unicode61 (ASCII) → trigram (CJK/mixed) → LIKE fallback
- Fix _bm25_rank_to_score: abs(rank)/(1+abs(rank)) instead of max(0,rank)
- Fix INSERT OR REPLACE → UPSERT to preserve FTS5 content table rowid stability
- Fix FTS5 JOIN to use rowid instead of id column
- Fix _search_like: single-char CJK match, dynamic scoring, merged CJK+ASCII path
- Add numpy vectorized cosine similarity + BLOB embedding storage (6x smaller)
- Add _decode_embedding backward compat for legacy JSON embeddings
- Add threading.RLock for concurrent write safety
- Add _meta table to avoid trigram backfill re-running on every startup
- Activate EmbeddingCache in MemoryManager for session-level query deduplication
- Add numpy>=1.24 to requirements.txt
- Merge upstream master (embedding package refactor, FTS5 self-healing methods)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-25 08:56:08 +08:00
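One way to picture the BLOB storage and legacy-JSON compatibility from bc9c1691f5 is the sketch below. It assumes little-endian float32 values (4 bytes per dimension) and uses `struct` rather than numpy to stay dependency-free; the real `_encode_embedding` / `_decode_embedding` may differ in detail.

```python
import json
import struct


def encode_embedding(vec) -> bytes:
    """Pack a vector as little-endian float32, 4 bytes per dimension."""
    return struct.pack(f"<{len(vec)}f", *vec)


def decode_embedding(raw):
    """Decode either the BLOB format or a legacy JSON-encoded list."""
    if isinstance(raw, (bytes, bytearray, memoryview)):
        data = bytes(raw)
        return list(struct.unpack(f"<{len(data) // 4}f", data))
    # Legacy rows stored the embedding as a JSON array string.
    return [float(x) for x in json.loads(raw)]
```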
zhayujie
73bf83d2ff docs: add public-access notes for server deployment 2026-05-25 00:09:52 +08:00
zhayujie
36e1988fee docs: update README.md 2026-05-24 19:21:06 +08:00
zhayujie
aad6ef635e docs: update README.md 2026-05-24 19:11:34 +08:00
zhayujie
96659cd616 docs: update project docs 2026-05-24 18:58:10 +08:00
zhayujie
c8787b7de4 Merge branch 'feat-readme-refactoring' 2026-05-24 18:30:18 +08:00
zhayujie
91d427c8f9 docs: update docs and readme 2026-05-24 18:29:57 +08:00
zhayujie
c8c0573dbd Merge pull request #2831 from zhayujie/feat-readme-refactoring
docs: README refactoring
2026-05-24 18:10:03 +08:00
zhayujie
29af855ecd docs: update README.md 2026-05-24 18:03:33 +08:00
zhayujie
0a146a245d docs: refactor README 2026-05-24 17:52:47 +08:00
zhayujie
bd85fee7d7 fix(models): persist explicit provider for vision and image capabilities 2026-05-23 20:43:25 +08:00
zhayujie
571897e2fd fix: modify default model in vision tool 2026-05-22 18:18:16 +08:00
zhayujie
840dabeccd fix(weixin): cap thinking messages to avoid rate-limit drops 2026-05-22 17:42:50 +08:00
zhayujie
069bffa3e8 feat: release 2.0.9 2026-05-22 12:25:22 +08:00
zhayujie
cc10d230b0 Merge pull request #2826 from zhayujie/feat-multi-model
feat: multi-provider model console
2026-05-22 11:08:13 +08:00
zhayujie
2517f2add8 feat(models): support gpt-5.5 2026-05-22 11:04:55 +08:00
zhayujie
a534266025 feat(models): add qwen3.7-max 2026-05-22 10:54:56 +08:00
zhayujie
8c25395805 feat(models): support gemini-3.5-flash 2026-05-22 10:39:04 +08:00
zhayujie
36b913124b docs: update models and channels doc 2026-05-22 10:10:07 +08:00
6vision
2fa6343fe5 docs: add WeCom customer service (wechatcom_kf) channel guide
Add a self-deployment guide for the new `wechatcom_kf` channel under
`docs/channels/wecom-kf.mdx` in zh / en / ja, mirroring the existing
`wecom.mdx` structure. Wire each language version into the sidebar in
`docs/docs.json`.

Walks through: creating the WeCom custom app, retrieving Corp ID /
Secret (push-to-phone) / Token / EncodingAESKey, configuring `config.json`,
saving the callback URL + Enterprise Trusted IPs, binding the WeCom
Customer Service account, and distributing the access link / QR code.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-21 21:32:11 +08:00
6vision
06b84225a1 docs(wechatcom_kf): tidy README and hide cursor dir from config
- Clarify Secret retrieval (must tap "查看" on admin's phone, not copy)
- Update WeCom customer-service binding section to point to the
  "接入链接" UI (copy link / generate QR code)
- Drop developer-only asides (wechatcomapp_secret / port collision
  notes, internal sections about cursor persistence, channel runtime
  differences, multi-kf-account support)
- Stop exposing `wechatcom_kf_cursor_dir` as a user config; cursor file
  is now fixed under `tmp/`, which is an internal implementation detail.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-21 21:08:52 +08:00
6vision
5b31da335d fix(wechatcom_kf): use plain WeChatClient to fix 40014 & token log spam
- Switch from the local `WechatComAppClient` (whose `fetch_access_token`
  may return the raw response dict and whose background refresh loop
  re-fetches every 60s) to the stock `wechatpy.enterprise.WeChatClient`.
- Use `client.access_token` (string property) when building sync_msg /
  send_msg URLs; the previous `client.fetch_access_token()` call could
  interpolate a dict into the URL and yield errcode 40014.
- Always skip historical messages on first start; drop the
  `wechatcom_kf_skip_history_on_first_start` config — there is no real
  case for replaying up to 14 days of history.
- Change default callback port from 9899 to 9888.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-21 20:43:06 +08:00
zhayujie
90773ab69f feat(models): allow viewing and editing search vendor credentials 2026-05-21 20:22:09 +08:00
6vision
11d92bb22a feat(channel): add WeCom customer service (wechatcom_kf) channel
Introduce a new channel that integrates with WeCom Customer Service
(微信客服), separate from the existing self-built WeCom app channel.

- Register channel type `wechatcom_kf` in factory, app loader and const
- Add config keys for token / secret / aes_key / port / cursor dir and
  the first-start history-skip switch; also expose corresponding env vars
- Implement channel, message and cursor store under channel/wechatcom_kf/

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-21 19:58:47 +08:00
zhayujie
b7734c3926 feat(search): multi-provider web search + console integration
Search tool now supports 4 backends with unified output (bocha,
qianfan, zhipu, linkai) and a routing layer:
  - strategy 'auto' (default): pick first configured in canonical order
    bocha > qianfan > zhipu > linkai
  - strategy 'fixed': pin a specific provider
  - agent may pass `provider` to override per-call (only exposed when
    ≥2 providers configured + auto strategy)
2026-05-21 19:58:03 +08:00
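The routing layer described in b7734c3926 could be sketched as follows. The function name `pick_provider` and its parameters are hypothetical; only the canonical order, the two strategies and the override condition are taken from the commit message.

```python
CANONICAL_ORDER = ("bocha", "qianfan", "zhipu", "linkai")


def pick_provider(configured, strategy: str = "auto", fixed=None, override=None) -> str:
    """Choose a search backend from the set of configured provider names."""
    available = [p for p in CANONICAL_ORDER if p in configured]
    if not available:
        raise ValueError("no search provider configured")
    if strategy == "fixed":
        if fixed not in available:
            raise ValueError(f"fixed provider {fixed!r} is not configured")
        return fixed
    # Auto strategy: a per-call override is honored only when there is a real choice.
    if override in available and len(available) >= 2:
        return override
    return available[0]
```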
zhayujie
d3faf9c8dc fix(web): re-render JS-built views on language switch 2026-05-21 17:33:32 +08:00
zhayujie
bca97a1d14 feat(voice): enable TTS on Weixin / DingTalk / WeCom Bot with text-then-voice delivery
- Clear NOT_SUPPORT_REPLYTYPE on weixin, wecom_bot, dingtalk so TTS replies
  are actually synthesized for these channels.
- Wire desire_rtype=VOICE in weixin and wecom_bot _compose_context so the
  always_reply_voice / voice_reply_voice toggles take effect.
- DingTalk: send native sampleAudio (mediaId + duration). The media API
  only accepts ogg/amr, so convert TTS mp3/wav to amr on the fly.
- WeCom Bot: send native voice msgtype via ws (respond + active push),
  converting TTS audio to amr before upload.
- Weixin (ilink): no outbound voice item, deliver TTS as a file attachment.
- chat_channel: when a TEXT reply is converted to VOICE, stash original
  text in context["voice_reply_text"] and send a text bubble before the
  voice reply. Skipped for feishu_streamed and wechatcom_app, which
  already render text alongside the voice.
2026-05-21 17:29:26 +08:00
zhayujie
ac9d0f18c5 Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2026-05-21 16:19:03 +08:00
zhayujie
09fa624797 fix(scheduler): once tasks with tz-aware schedule never fire 2026-05-21 16:18:36 +08:00
zhayujie
b8333e351c feat(voice): rework TTS/ASR stack and unify tool/skill config schema 2026-05-21 16:00:54 +08:00
zhayujie
a01423a196 fix: default agent mode to enabled when "agent" config is absent 2026-05-21 11:17:50 +08:00
zhayujie
7c35df7a82 fix: default agent mode to enabled 2026-05-21 11:14:19 +08:00
zhayujie
2b90f377e6 feat(voice): add dashscope & zhipu ASR, in-page mic input 2026-05-20 22:36:37 +08:00
zhayujie
fff7326209 feat(memory): hot-swap embedding provider on rebuild-index
Switching embedding provider in the web console no longer requires a
restart and no longer drops the running conversation
2026-05-20 21:32:53 +08:00
zhayujie
c181e500bc feat(web): redesign multi-models console
Overhauls the Models tab in the Web Console with a vendor-first layout and
ships a runtime-accurate dispatcher view for vision and image generation.
2026-05-20 20:59:04 +08:00
zhayujie
16b7271826 feat(openai): inject app attribution headers for OpenRouter and Vercel AI Gateway 2026-05-20 11:43:17 +08:00
zhayujie
4a1f62b185 Merge pull request #2822 from a1094174619/fix/tool-error-status-persist
fix: persist tool error status in conversation history reload
2026-05-20 11:06:57 +08:00
zhayujie
d23a0754c1 feat(memory): exclude dream diaries from vector index 2026-05-20 11:04:54 +08:00
zhayujie
3ffb563a44 feat(memory): support multi-vendor embedding fallback
Add embedding_provider config knob with native support for
openai / dashscope / doubao / zhipu / linkai, plus an in-chat
/memory status and /memory rebuild-index workflow for switching
vendors safely.
2026-05-20 11:00:53 +08:00
a1094174619
4e42f2a017 fix: persist tool error status in conversation history reload
When reloading a conversation, failed tool calls incorrectly showed checkmark instead of X because the is_error field was lost in the history rendering pipeline. Propagate is_error from DB extraction through to the frontend rendering to match the live SSE behavior.
2026-05-19 23:50:29 +08:00
zhayujie
a0dfdb79df feat(browser): persistent login + CDP attach mode #2809
Browser sessions now reuse a Chromium user profile across runs by default
(`~/.cow/browser_profile`), so users only log in to a site once.
Three launch modes are selectable via `tools.browser` in config.json:
  - persistent (default): Playwright Chromium with a persistent user_data_dir
  - cdp: attach to an externally launched real Chrome via `cdp_endpoint`
    (full fingerprints, ideal for sites with strict bot detection)
  - fresh: clean context every run, set `persistent: false`

Also:
  - Self-heal when the user closes the browser window mid-session: detect
    closed page/context/browser via close listeners and exception scanning,
    then transparently relaunch on the next request.
  - Graceful CDP shutdown: disconnect only, never kill the user's Chrome.
  - Friendly errors when the CDP endpoint is unreachable or the persistent
    profile is locked, so the LLM can guide the user instead of looping.
  - Fix tool config being silently overwritten by workspace config in
    AgentInitializer; per-tool user settings (e.g. browser.cdp_endpoint)
    are now merged instead of replaced.
  - Update zh / en / ja docs with the new login-persistence section,
    including the Chrome 137+ requirement to pair --remote-debugging-port
    with a dedicated --user-data-dir.
2026-05-19 11:52:11 +08:00
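As a loose illustration of the three launch modes in a0dfdb79df, the helper below derives a mode from a `tools.browser` dict. The precedence (a set `cdp_endpoint` wins over `persistent`) and the name `resolve_browser_mode` are assumptions; the commit message does not spell them out.

```python
def resolve_browser_mode(browser_cfg: dict) -> str:
    """Return "cdp", "fresh" or "persistent" for a tools.browser config dict."""
    if browser_cfg.get("cdp_endpoint"):
        return "cdp"         # attach to an externally launched Chrome
    if browser_cfg.get("persistent", True) is False:
        return "fresh"       # clean context on every run
    return "persistent"      # default: reuse the user profile across runs
```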
zhayujie
a85c5f9d4e fix(scheduler): make scheduler init idempotent to prevent duplicate task runs 2026-05-18 18:36:48 +08:00
zhayujie
2720bba5b7 fix(mimo): round-trip reasoning_content for thinking-mode providers 2026-05-18 17:49:41 +08:00
zhayujie
4634a7bc2f fix(web): avoid TypeError on single-file upload 2026-05-17 19:00:07 +08:00
zhayujie
16d9b449c9 feat(web): set the web_host to the default value of 127.0.0.1 2026-05-16 18:18:17 +08:00
zhayujie
8761997757 feat(web): add web_host config and password hint for safer deployment 2026-05-16 17:37:07 +08:00
zhayujie
19bba4abbc feat(web): vendor all frontend assets locally #2816 2026-05-16 17:22:04 +08:00
zhayujie
7839f0aac5 Merge pull request #2815 from TryToMakeUsBetter/master
feat(web): support folder upload
2026-05-15 18:57:15 +08:00
Tian
83def1db30 Merge branch 'zhayujie:master' into master 2026-05-15 18:51:53 +08:00
tianyu Gu
a0b29d1ffe fix(web): remove upload dir button, upload all files at once, adapt path check for Windows 2026-05-15 18:48:37 +08:00
zhayujie
f5479c56af feat(models): support reasoning_effort config for DeepSeek V4 2026-05-15 18:17:35 +08:00
tianyu Gu
246f0a45c8 feat(web): support folder upload 2026-05-14 17:16:11 +08:00
zhayujie
fe871aad77 fix(tools): unify text file truncation thresholds in read tool 2026-05-13 16:15:06 +08:00
zhayujie
6f860e1bc4 Merge pull request #2810 from Jacques-Zhao/bugfix/wecom_bot_msg_error
fix(wecom_bot): Invalid control character
2026-05-13 10:26:52 +08:00
Zhao Ke Ke
249ea40ae3 fix(wecom_bot): Invalid control character 2026-05-12 18:45:03 +08:00
zhayujie
20d8ae19a7 Merge pull request #2804 from yangluxin613/feat/web-port-browser
feat(web): auto-switch port on conflict and open browser on startup
2026-05-12 10:35:49 +08:00
ooaaooaa123
ad51aabfd7 feat(web): open browser on startup with safe fallback; friendly error on port conflict 2026-05-10 19:30:07 +08:00
zhayujie
1cf395c041 Merge pull request #2807 from yangluxin613/feat/log-ui
feat(log): add level coloring, multiline inherit, and filter checkboxes
2026-05-10 18:59:05 +08:00
zhayujie
745179a5bf Merge pull request #2806 from yangluxin613/feat/app-keyboard-interrupt
fix(app): suppress KeyboardInterrupt traceback on Ctrl+C
2026-05-10 18:58:10 +08:00
zhayujie
ff5d477fa5 Merge pull request #2808 from yangluxin613/fix/update-username-in-docs
docs: update contributor username from ooaaooaa123 to yangluxin613
2026-05-10 18:42:09 +08:00
zhayujie
907825601d feat(models): add baidu ernie-5.1 2026-05-10 18:39:38 +08:00
ooaaooaa123
c2ec26910a docs: update contributor username from ooaaooaa123 to yangluxin613 2026-05-10 18:12:00 +08:00
ooaaooaa123
83f2aea123 feat(log): enhance critical log line color visibility 2026-05-10 17:43:26 +08:00
ooaaooaa123
a5c5439315 feat(log): add level coloring, multiline inherit, and filter checkboxes 2026-05-10 17:21:08 +08:00
ooaaooaa123
eca9b60235 fix(app): suppress KeyboardInterrupt traceback on Ctrl+C 2026-05-10 17:21:01 +08:00
ooaaooaa123
d2d5d98d78 feat(web): auto-switch port on conflict and open browser on startup 2026-05-10 17:20:45 +08:00
zhayujie
fb341b869b docs(mcp): add MCP tools guide 2026-05-08 16:14:48 +08:00
zhayujie
29e66cb186 fix(mcp): correct hot-reload sync on default Agent 2026-05-08 15:40:29 +08:00
zhayujie
307769b949 feat(mcp): load MCP servers asynchronously at startup
Boot MCP servers (npx/uvx) on a background thread instead of blocking
agent init. Built-in tools serve traffic immediately while MCP comes
online; each new agent reads whatever is ready at creation time.
Idempotent via _mcp_loaded flag — concurrent sessions never re-fork
subprocesses. Per-server failures are isolated and warmup is triggered
in app.py so loading overlaps with channel startup.
2026-05-08 15:22:42 +08:00
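The background warmup in 307769b949 can be approximated with the sketch below: a guarded flag makes `start()` idempotent, and each server loads inside its own `try` so one failure does not block the rest. `McpWarmup` and its attributes are hypothetical stand-ins for the real `_mcp_loaded` logic.

```python
import threading


class McpWarmup:
    def __init__(self, loaders: dict):
        self._loaders = loaders        # server name -> callable returning its tools
        self._lock = threading.Lock()
        self._started = False
        self._thread = None
        self.ready = {}                # servers that came online
        self.failed = {}               # servers that raised during startup

    def start(self) -> bool:
        """Kick off loading once; later calls are no-ops and return False."""
        with self._lock:
            if self._started:
                return False
            self._started = True
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()
        return True

    def _run(self):
        for name, loader in self._loaders.items():
            try:
                self.ready[name] = loader()
            except Exception as exc:   # isolate per-server failures
                self.failed[name] = exc

    def join(self, timeout=None):
        if self._thread is not None:
            self._thread.join(timeout)
```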
zhayujie
9a09e057d6 Merge pull request #2801 from ooaaooaa123/feat/mcp-integration
feat(mcp): add MCP (Model Context Protocol) tool integration
2026-05-08 12:06:43 +08:00
zhayujie
3e28659528 fix(feishu): support file message and use absolute workspace path 2026-05-08 11:31:22 +08:00
ooaaooaa123
b861eef26f fix(mcp): address PR review feedback on stability and config
Stability fixes in mcp_client.py:
- Fix stderr buffer overflow: start daemon thread to continuously drain
  stderr pipe, preventing 64KB buffer fill that blocks child process
- Fix notification interference: loop readline and skip JSON-RPC messages
  without 'id' field (notifications) instead of treating them as responses
- Fix concurrent race condition: wrap send+receive in _call_lock so
  multiple sessions cannot interleave reads/writes on the same client
- Fix missing timeout: use select.select() with 30s timeout in
  _readline_with_timeout() to prevent infinite block on dead MCP server

Config improvements in tool_manager.py:
- Add _normalize_mcp_configs() to support both list format (mcp_servers)
  and dict format (mcpServers used by Claude Desktop / Cursor)
- Add _load_mcp_configs() to load from ~/cow/mcp.json first, falling back
  to config.json mcp_servers field for backward compatibility

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-08 09:58:40 +08:00
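A possible shape for the config normalization mentioned in b861eef26f, accepting both the list form (`mcp_servers`) and the dict form (`mcpServers`). This is a sketch with a hypothetical public name, not the project's `_normalize_mcp_configs`.

```python
def normalize_mcp_configs(raw):
    """Return a list of server dicts, each carrying a "name" key."""
    if raw is None:
        return []
    if isinstance(raw, list):
        # List format: entries already carry their own "name".
        return [dict(item) for item in raw]
    if isinstance(raw, dict):
        # Dict format: {"mcpServers": {"<name>": {...}}} or the inner mapping itself.
        servers = raw.get("mcpServers", raw)
        return [{"name": name, **cfg} for name, cfg in servers.items()]
    raise TypeError(f"unsupported MCP config type: {type(raw).__name__}")
```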
ooaaooaa123
caaf006a49 fix(mcp): wire MCP tools into agent and fix env var inheritance
Two bugs found during end-to-end validation with Amap and Chrome DevTools
MCP servers:

1. MCP tools were loaded into ToolManager._mcp_tool_instances but never
   added to the agent's tool list. AgentInitializer._load_tools() only
   iterated tool_classes (built-in tools). Added a second pass to append
   all MCP tool instances.

2. When an MCP server config contains an "env" dict, it was passed directly
   to subprocess.Popen, replacing the entire process environment. This
   caused npx to fail because PATH and other inherited vars were missing.
   Fixed by merging config env on top of os.environ.

Validated with:
- @amap/amap-maps-mcp-server (12 tools, stdio + API key env var)
- chrome-devtools-mcp (29 tools, stdio + remote debugging port)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 20:40:56 +08:00
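The environment fix in caaf006a49 comes down to layering the configured variables over the inherited ones. A minimal sketch, with `build_child_env` as a hypothetical helper name:

```python
import os


def build_child_env(config_env=None) -> dict:
    """Merge a server's "env" dict on top of the current process environment."""
    env = dict(os.environ)  # keep PATH and other inherited variables
    env.update({k: str(v) for k, v in (config_env or {}).items()})
    return env
```

The result can be passed as `env=` to `subprocess.Popen`, so tools such as `npx` still find their binaries while the server sees its API key.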
ooaaooaa123
b2429ec30c feat(mcp): add MCP (Model Context Protocol) tool integration
Allows CowAgent to dynamically load tools from any MCP server at startup,
extending the agent from a fixed toolset to an open, extensible tool ecosystem.

## What's added

- `agent/tools/mcp/mcp_client.py`: lightweight JSON-RPC client supporting both
  stdio (subprocess) and SSE (HTTP) transports — zero extra dependencies
- `agent/tools/mcp/mcp_tool.py`: `McpTool` wraps a single MCP tool as a
  `BaseTool`, with dynamic name/description/params set at instance level
- `agent/tools/tool_manager.py`: new `_load_mcp_tools()` loads MCP servers at
  startup via `McpClientRegistry`; falls back gracefully on any error; no-op
  when `mcp_servers` is not configured
- `config.py`: registers `mcp_servers` in `available_setting` with inline docs

## Design

- No new dependencies — JSON-RPC implemented from scratch using stdlib only
- MCP clients are long-lived (initialized once, shared across tool calls)
- `McpClientRegistry` holds all subprocess handles and shuts them down cleanly
- Server init failures are non-fatal: logged as warnings, agent continues normally
- Zero overhead when `mcp_servers` is absent from config

## Config example

```json
"mcp_servers": [
  {
    "name": "filesystem",
    "type": "stdio",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
  }
]
```

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-06 20:16:04 +08:00
zhayujie
55aaf60a57 feat: release 2.0.8 2026-05-06 16:19:20 +08:00
zhayujie
a5790d82f6 feat(qianfan): scope vision support to multimodal models 2026-05-06 16:11:10 +08:00
zhayujie
63f99af1e6 Merge pull request #2800 from jimmyzhuu/feat/qianfan-vision-provider
Add Qianfan support to Vision tool
2026-05-06 15:39:12 +08:00
zhayujie
4eed2568aa fix(bash): reduce safety check false positives 2026-05-06 15:36:44 +08:00
jimmyzhuu
fb7962c7f2 fix: use available qianfan vision model 2026-05-06 13:34:39 +08:00
jimmyzhuu
76e6b7b471 docs: document qianfan vision support 2026-05-06 13:28:46 +08:00
jimmyzhuu
fccb7ff9ed feat: route qianfan vision provider 2026-05-06 13:25:59 +08:00
jimmyzhuu
3b12ef2e66 feat: add qianfan vision calls 2026-05-06 13:24:41 +08:00
jimmyzhuu
f9d099be1b feat: add qianfan vision model constants 2026-05-06 13:23:04 +08:00
zhayujie
c322c0e3a5 docs(models): add ernie-5.0 2026-05-06 12:15:14 +08:00
zhayujie
530fc20596 Merge pull request #2790 from jimmyzhuu/feat/qianfan-provider
Add first-class Baidu Qianfan / ERNIE provider
2026-05-06 11:43:32 +08:00
zhayujie
a23b4ed754 Merge pull request #2797 from Zmjjeff7/feat-translate-youdao
feat(translate): add Youdao as a new translation provider
2026-05-06 11:28:50 +08:00
zhayujie
fc4f5077b0 fix: update .gitignore 2026-05-06 11:27:57 +08:00
Zmjjeff7
6a553886da feat(translate): add Youdao as a new translation provider
The translate module previously only supported Baidu translation, and the
factory raised a bare RuntimeError for any other type. This change adds
Youdao Translation as a second provider and improves the factory's error
message.

Implementation details:
- New YoudaoTranslator class in translate/youdao/youdao_translate.py
- Implements Youdao's v3 SHA-256 signature scheme, including the
  truncate-input rule for queries longer than 20 characters
- Maps ISO 639-1 language codes to Youdao-specific codes
  (zh -> zh-CHS, zh-TW -> zh-CHT, others pass through)
- Differentiates network errors, API error codes, and empty translations
- factory.create_translator now lists the supported types in its
  RuntimeError message instead of failing silently
- Default config exposes youdao_translate_app_key and
  youdao_translate_app_secret

Adds 17 unit tests covering signature correctness, language code mapping,
input truncation edge cases, the full request/response flow, and factory
dispatch. All tests pass under Python 3.11.
2026-05-05 23:58:32 +08:00
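The signing rule and language-code mapping described in commit 6a553886da above can be sketched as follows. This is a minimal illustration based on Youdao's published v3 scheme (SHA-256 over app key, truncated input, salt, timestamp and app secret), not the code in translate/youdao/youdao_translate.py. The function names `truncate_input`, `sign_v3` and `to_youdao_lang` are hypothetical.

```python
import hashlib


def truncate_input(q: str) -> str:
    """Youdao v3 'input' rule: queries of up to 20 characters are used whole;
    longer ones become first 10 chars + total length + last 10 chars."""
    if len(q) <= 20:
        return q
    return q[:10] + str(len(q)) + q[-10:]


def sign_v3(app_key: str, q: str, salt: str, curtime: str, app_secret: str) -> str:
    """Return the hex SHA-256 signature over the concatenated fields."""
    raw = app_key + truncate_input(q) + salt + curtime + app_secret
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def to_youdao_lang(code: str) -> str:
    """Map ISO 639-1 codes to Youdao codes; unknown codes pass through."""
    return {"zh": "zh-CHS", "zh-TW": "zh-CHT"}.get(code, code)
```

The length is measured in characters rather than bytes, which matters for CJK input near the 20-character boundary.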
zhayujie
1065c7e722 fix(feishu): unblock streaming via async push worker 2026-05-05 19:36:15 +08:00
zhayujie
a9c8a59f58 feat(feishu): one-click QR-scan app creation 2026-05-05 18:32:58 +08:00
zhayujie
8730f7fd27 fix(memory): exclude scheduler-injected pairs from daily memory flush 2026-05-05 16:53:01 +08:00
zhayujie
8f608223d7 perf(feishu): tune streaming render speed 2026-05-05 14:53:30 +08:00
zhayujie
a7cbd47a2f fix(feishu): default feishu_stream_reply to true 2026-05-05 14:30:34 +08:00
zhayujie
b80c3fe5a8 feat(feishu): enhance #2791 with cardkit streaming + ASR fixes
- rewrite streaming reply to official cardkit v2.0 API (default on, auto-fallback)
- fix Whisper hallucination: bump ASR sample rate to 16k, pass language=zh
- fix lock-over-IO and tmp file cleanup from #2791
- drop deprecated feishu_bot_name; quiet unknown-key warnings
- docs: cardkit permission and feishu_stream_reply usage
2026-05-05 14:15:25 +08:00
zhayujie
5080051e39 Merge pull request #2791 from ooaaooaa123/feat/feishu-voice-stream-reply
feat(feishu): support sending and receiving voice messages and streaming typewriter-style replies
2026-05-05 13:10:00 +08:00
zhayujie
23bfc8d0ba fix(feishu): update config-template.json 2026-05-05 13:05:39 +08:00
zhayujie
80e9062041 fix(vision): respect tool.vision.model and add automatic fallback #2792 2026-05-03 22:28:32 +08:00
zhayujie
67bd3420ed perf(scheduler): bound isolated session context to agent_max_context_turns/5 2026-05-03 21:49:59 +08:00
zhayujie
aea081703f fix(scheduler): inject delivered output into receiver session with sliding window
Further refinements on top of #2795:

- persist real session_id (notify_session_id) at task creation so group chats
  correctly map back to the user's actual conversation
- mark scheduler turns with [SCHEDULED] (recognise legacy "Scheduled task"
  prefix too for backward-compatible pruning)
- prune both DB and in-memory to scheduler_inject_max_per_session (default 3),
  only marker-tagged pairs are touched; regular user turns never deleted
- send_message type gated by scheduler_inject_send_message (default false) —
  fixed reminder text rarely benefits follow-up Q&A

Co-authored-by: huangrichao2020 <grdomai43881@gmail.com>
2026-05-03 21:27:24 +08:00
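The sliding-window pruning described in commit aea081703f above might look roughly like this. It is a sketch under assumptions: the message layout (a list of dicts with `role` and `content`, each scheduler turn being a marker-tagged user message followed by its assistant reply) and the name `prune_scheduled_pairs` are hypothetical. Only the two markers and the default of 3 come from the commit message.

```python
MARKERS = ("[SCHEDULED]", "Scheduled task")  # current marker and legacy prefix


def prune_scheduled_pairs(messages, max_pairs=3):
    """Keep only the newest `max_pairs` scheduler-injected pairs.

    Regular user/assistant turns are never removed.
    """
    # Indices of user messages that open a scheduler-injected pair.
    starts = [
        i for i, m in enumerate(messages)
        if m["role"] == "user" and m["content"].startswith(MARKERS)
    ]
    # Everything except the last `max_pairs` tagged pairs is excess.
    excess = starts[:-max_pairs] if max_pairs > 0 else starts
    drop = set()
    for i in excess:
        drop.add(i)
        # Drop the paired assistant reply if it directly follows.
        if i + 1 < len(messages) and messages[i + 1]["role"] == "assistant":
            drop.add(i + 1)
    return [m for i, m in enumerate(messages) if i not in drop]
```

The same function could be applied to both the persisted history and the in-memory session so the two stay consistent.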
zhayujie
f300d2a2d5 Merge pull request #2795 from huangrichao2020/fix/scheduler-remember-v2
fix: remember scheduled task outputs with correct session mapping (v2)
2026-05-03 21:02:40 +08:00
tingchim2pro
f150d7d83a fix: remember scheduled task outputs in receiver session (v2)
Address review feedback from #2794:

1. Use notify_session_id instead of receiver for correct group chat mapping
   - Task creation should store the real session_id in action.notify_session_id
   - Falls back to receiver for backward compatibility with old tasks

2. Add injection to all four execution branches:
   - _execute_agent_task
   - _execute_send_message
   - _execute_tool_call
   - _execute_skill_call (also fixed missing channel.send)

3. Add config switch and content truncation:
   - scheduler_inject_to_session (default: true) to toggle the feature
   - 2000 char limit prevents high-frequency tasks from bloating sessions

Fixes #2793
2026-05-02 19:00:50 +08:00
ooaaooaa123
4d1f059c0d feat(feishu): add voice message support and streaming text reply
- Receive audio messages: map msg_type=audio to ContextType.VOICE and
  download opus file via lazy _prepare_fn for STT pipeline
- Send voice replies: upload opus audio via Feishu file API, auto-convert
  non-opus formats (e.g. mp3) using pydub before upload
- Streaming text reply: inject on_event callback into context; send a card
  placeholder on first delta, then PATCH-update it in-place at a
  configurable interval (feishu_stream_interval_ms) to achieve typewriter
  effect; set feishu_streamed=True to suppress duplicate send()
- Enable NOT_SUPPORT_REPLYTYPE=[] to unblock voice and image reply types
- Fix AudioSegment mutation bug in audio_convert.py: set_frame_rate /
  set_channels return new objects and must be reassigned
- Add -nostdin to ffmpeg invocation to prevent stdin deadlock in daemon
- Add feishu_bot_name, feishu_stream_reply, feishu_stream_interval_ms
  config keys to config-template.json
2026-04-30 16:14:57 +08:00
jimmyzhuu
bc7f953fcc docs: add qianfan provider guide 2026-04-29 16:41:25 +08:00
jimmyzhuu
f653483eea feat: expose qianfan in configuration surfaces 2026-04-29 16:32:53 +08:00
jimmyzhuu
6b200fd36b fix: handle qianfan error responses 2026-04-29 16:24:37 +08:00
jimmyzhuu
161fc6cdf0 feat: add qianfan chat bot 2026-04-29 16:19:27 +08:00
jimmyzhuu
6f68ed6bce test: restore cow cli parent module attribute 2026-04-29 16:12:08 +08:00
jimmyzhuu
a4592ffdfe test: isolate cow cli plugin import 2026-04-29 16:08:40 +08:00
jimmyzhuu
7cd7bd1a48 fix: avoid cow cli import side effects 2026-04-29 16:04:48 +08:00
jimmyzhuu
9eeca70292 feat: register qianfan model provider 2026-04-29 15:52:32 +08:00
zhayujie
02bfe30848 fix(memory): prevent duplicate Deep Dream runs 2026-04-28 15:30:51 +08:00
zhayujie
c9c99de3d9 fix(bash): scope safety confirm to destructive deletions outside workspace 2026-04-28 10:18:47 +08:00
zhayujie
8752f0cc60 refactor(openai): drop SDK dependency and switch to native HTTP client 2026-04-27 20:21:54 +08:00
zhayujie
5c65196e44 feat(web): hint API base version path in config placeholder 2026-04-26 17:10:24 +08:00
zhayujie
f5798bfe90 fix: remove unnecessary API Base URL in run scripts 2026-04-26 16:29:08 +08:00
zhayujie
0e556b3468 feat: switch default model to deepseek-v4-flash 2026-04-26 15:54:50 +08:00
zhayujie
31820f56e7 fix(deepseek): back-fill reasoning_content for all assistant turns 2026-04-24 16:39:48 +08:00
zhayujie
fd88828abd fix(models): unify enable_thinking for deepseek-v4 2026-04-24 15:29:43 +08:00
zhayujie
ae11159918 feat(models): unify enable_thinking for deepseek-v4 and other thinking models 2026-04-24 15:22:45 +08:00
zhayujie
472a8605c0 feat(models): support deepseek-v4-pro and deepseek-v4-flash 2026-04-24 11:35:38 +08:00
zhayujie
e1760ba211 feat: release 2.0.7 version 2026-04-23 18:13:53 +08:00
zhayujie
ce4c0a0aa4 feat: release 2.0.7 2026-04-23 17:18:19 +08:00
zhayujie
64511593c4 feat: release 2.0.7 2026-04-23 17:16:17 +08:00
zhayujie
b0e00dfceb feat: support glm-5.1 2026-04-23 16:43:05 +08:00
zhayujie
fc465b463d feat: support kimi coding plan by temporary solution 2026-04-23 16:24:37 +08:00
zhayujie
68ce2e5232 feat(skill): multi-provider image generation with auto-fallback
- Add Gemini, Seedream (Volcengine Ark), Qwen (DashScope), MiniMax
  providers to image-generation skill with universal sequential
  fallback: OpenAI → Gemini → Seedream → Qwen → MiniMax → LinkAI
- Each provider filters unsupported size tiers to valid values
  (e.g. Seedream 1K→2K, Qwen 3K→2K, Gemini 3K→2K)
- Pinned model only tries its native provider; auto-routing uses
  each provider's default model
- Support skill-namespaced config (config.skill.image-generation.model
  → SKILL_IMAGE_GENERATION_MODEL env var)
- Add image lightbox (click-to-enlarge) in web console
- Add docs for built-in skills (skill-creator, knowledge-wiki,
  image-generation) under docs/skills/
2026-04-23 12:39:39 +08:00
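The sequential fallback described in commit 68ce2e5232 above can be illustrated with a short sketch. The provider order is taken from the commit message. Everything else is an assumption for illustration: the function name `generate_with_fallback`, the callable-per-provider interface, and pinning by provider name rather than by model. Size-tier filtering is not shown.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Order taken from the commit message.
PROVIDER_ORDER = ["openai", "gemini", "seedream", "qwen", "minimax", "linkai"]


def generate_with_fallback(
    prompt: str,
    providers: Dict[str, Callable[[str], str]],
    pinned: Optional[str] = None,
) -> Tuple[str, str]:
    """Try providers in order and return (provider_name, result).

    If `pinned` is given, only that provider is tried (no fallback).
    """
    order: List[str] = [pinned] if pinned else PROVIDER_ORDER
    errors: List[str] = []
    for name in order:
        fn = providers.get(name)
        if fn is None:
            continue  # provider not configured, skip it
        try:
            return name, fn(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Collecting the per-provider errors makes the final failure message show which providers were attempted and why each one failed.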
zhayujie
81e8bb62ae feat(skill): support gpt-image-2 in image generation skill 2026-04-22 20:39:49 +08:00
zhayujie
2c13e1b923 feat(models): support kimi-k2.6 2026-04-22 12:01:40 +08:00
zhayujie
a0748c2e3b fix(web): cap reasoning content to 4KB across stream/storage/display 2026-04-21 20:31:38 +08:00
zhayujie
40599bb751 fix(web): smart auto-scroll for chat #2775 2026-04-20 21:43:21 +08:00
zhayujie
f3c64ceea7 fix: refresh skill manager on /skill 2026-04-19 19:50:16 +08:00
zhayujie
15c60de709 fix: improve skill installation to support multiple source formats and ensure target directory 2026-04-19 19:05:51 +08:00
zhayujie
6dd316547f fix(web): fix session title generation fallback and reset Bridge on config change 2026-04-19 18:43:48 +08:00
zhayujie
54c7676a44 docs: update architecture diagram 2026-04-18 23:08:36 +08:00
zhayujie
d25b8966ce fix(web): prevent duplicate image previews 2026-04-18 22:32:34 +08:00
zhayujie
14a119c48c fix(gemini): resolve tool calls not returning 2026-04-18 21:18:27 +08:00
zhayujie
c82515a927 fix(agent): don't drop tool_calls from empty-response retry 2026-04-18 20:50:40 +08:00
zhayujie
26e630c2dd feat(cli): /config support set enable_thinking 2026-04-17 16:09:43 +08:00
zhayujie
13370d2056 fix: thinking display is disabled by default 2026-04-17 15:31:59 +08:00
zhayujie
35282db9e0 feat(models): support claude-opus-4-7 2026-04-16 23:24:16 +08:00
zhayujie
426fb88ce7 fix(knowledge): exclude root-level files from knowledge stats to preserve empty state 2026-04-16 22:55:46 +08:00
zhayujie
2384bd0e10 fix: update CI workflows for repo rename and add latest tag 2026-04-16 21:57:20 +08:00
zhayujie
ba3f66d3d1 feat: show root-level files (index.md, log.md) in knowledge tree 2026-04-16 21:47:44 +08:00
zhayujie
7293a0f670 fix: modify repo name in github workflow 2026-04-16 21:38:58 +08:00
zhayujie
9e86d46267 fix: sync env vars when updating config in docker env 2026-04-16 21:32:07 +08:00
zhayujie
848430f062 feat(knowledge): support nested directories in knowledge base listing and display 2026-04-16 12:28:18 +08:00
zhayujie
abd21335c4 Merge pull request #2772 from 6vision/master
fix: bot_type change notification never shown after model switch
2026-04-16 10:43:41 +08:00
6vision
8fa95f058a fix: bot_type change notification never shown after model switch
Made-with: Cursor
2026-04-15 21:48:50 +08:00
zhayujie
d4e5ecd497 fix: compatible with Python 3.7 by deferring Literal import in truncate.py 2026-04-15 12:29:09 +08:00
zhayujie
3830f76729 feat: add custom model provider 2026-04-15 12:26:05 +08:00
zhayujie
83f778fec9 feat(dream): structured organization of dream memories 2026-04-15 11:27:46 +08:00
zhayujie
cabd24605f fix: add random jitter to daily dream schedule 2026-04-15 00:33:33 +08:00
zhayujie
ae20ba1148 Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2026-04-14 22:58:59 +08:00
zhayujie
3a50b64977 feat: web multi session interface 2026-04-14 22:58:25 +08:00
zhayujie
8692e74536 fix(web): hide session panel by default on mobile and support overlay dismiss 2026-04-14 21:09:01 +08:00
zhayujie
1c18bd9889 docs(memory): update long-term memory docs 2026-04-14 17:14:28 +08:00
zhayujie
60e9d98d0a feat: release 2.0.6 2026-04-14 12:37:53 +08:00
zhayujie
83f6625e0c feat: release 2.0.6 2026-04-14 12:08:57 +08:00
zhayujie
acc09543b7 feat(dream): add memory dream cli and docs
- New memory/deep-dream.mdx (zh/en/ja): memory flow, distillation rules, dream diary, manual trigger, safety mechanisms
- Simplify long-term memory page, link to deep-dream for details
- New cli/memory-knowledge.mdx (zh/en/ja): memory and knowledge commands
- Move knowledge commands from general.mdx to memory-knowledge.mdx
- Register new pages in docs.json navigation for all languages
- Add /memory dream to cli/index.mdx command tables
2026-04-14 11:03:53 +08:00
zhayujie
94d8c7e366 feat(dream): add Dream Diary tab to memory management page
- Backend: MemoryService supports category param (memory/dream), lists memory/dreams/*.md
- Backend: MemoryContentHandler resolves dream files from memory/dreams/ directory
- Frontend: add tab switcher (Memory Files / Dream Diary) matching knowledge tab style
- Frontend: dream entries show purple "Dream" badge, empty state with moon icon
- Cloud dispatch passes category param for consistency
2026-04-13 22:08:15 +08:00
zhayujie
ea1a0c8b3d feat(memory): add Deep Dream module for daily memory distillation
- Add Deep Dream: nightly distill daily memories → refined MEMORY.md + dream diary
- Simplify flush prompt to daily-only, defer MEMORY.md maintenance to Deep Dream
- Remove dead code (_append_to_main_memory) and fix fallback summary logic
- Add shrinkage protection and input dedup for dream process
- Ensure flush threads complete before dream starts
- Update docs (zh/en/ja) with dream diary and distillation mechanism
2026-04-13 21:32:52 +08:00
zhayujie
7bc88c17e4 Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2026-04-13 20:13:30 +08:00
zhayujie
33cf1bc4c3 feat(memory): async LLM context summary injection on trim
- Unified flush + context injection into a single async LLM call
  (flush_from_messages accepts context_summary_callback)
- Fixed response parsing bug: handle generator returns and Claude-format
  dicts from bot.call_with_tools, which previously caused all LLM
  summaries to silently fail (falling back to rule-based extraction)
- Removed standalone context summary prompts and methods; reuse the
  existing [DAILY]/[MEMORY] summarization pipeline
- Updated docs (zh/en/ja) to reflect the new injection behavior
2026-04-13 20:13:05 +08:00
zhayujie
9402e63fe1 Merge pull request #2766 from zhayujie/feat-mulit-session
feat(web): add multi-session management for web console
2026-04-13 18:51:07 +08:00
zhayujie
90e4d494b2 feat(web): add multi-session management for web console 2026-04-13 18:50:31 +08:00
zhayujie
da97e948ca feat: refine memory recall/write prompts for better precision and proactivity 2026-04-13 18:02:06 +08:00
zhayujie
89a07e8e74 feat: add enable_thinking config to control deep reasoning on web console 2026-04-13 16:06:28 +08:00
zhayujie
3f3d0381e5 feat: update knowledge docs and fix claude error 2026-04-13 11:16:26 +08:00
zhayujie
3649499dba fix: optimize the stability of network pre-checks 2026-04-13 10:35:38 +08:00
zhayujie
a989d088fd Merge pull request #2764 from WilliamOnVoyage/fix/macos-timeout-fallback
fix: Fix run.sh for macOS by adding timeout fallback
2026-04-13 10:21:44 +08:00
Moliang Zhou
f79a915136 fix: add timeout fallback for macOS compatibility
The `timeout` command (GNU Coreutils) is not available by default on macOS,
causing the installation script to fail with 'timeout: command not found'
during git clone.

This adds a shell function fallback that:
- Uses `gtimeout` if Homebrew coreutils is installed
- Otherwise skips the timeout and runs the command directly
2026-04-12 11:18:44 -07:00
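Commit f79a915136 above implements the fallback as a shell function in the installation script. The sketch below only mirrors its decision logic in Python, for illustration. The name `timeout_prefix` and the injectable `which` parameter are hypothetical, and preferring a native `timeout` when one exists is an assumption implied by the commit message.

```python
import shutil


def timeout_prefix(seconds, which=shutil.which):
    """Return the command prefix that enforces a timeout, if one is available.

    Prefers GNU `timeout`, then Homebrew's `gtimeout`; returns an empty
    list when neither exists, so the command simply runs without a limit.
    """
    for tool in ("timeout", "gtimeout"):
        if which(tool):
            return [tool, str(seconds)]
    return []


# Example (not executed here): prepend the prefix to the real command.
# subprocess.run(timeout_prefix(60) + ["git", "clone", repo_url])
```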
zhayujie
12e8c3d449 Merge pull request #2763 from zhayujie/feat-web-console-upgrade
feat(web): support scheduler push messages and enrich welcome screen
2026-04-12 21:20:34 +08:00
zhayujie
4f7064575e feat(web): support scheduler push messages and enrich welcome screen
- Expand welcome screen from 3 to 6 example cards covering core capabilities
- Enable background polling on page load so scheduler task notifications are received in real-time
- Fix duplicate poll loops via generation-based cancellation, reduce poll frequency to 5s/10s
- Ensure equal card height and adjust layout position for better visual balance
2026-04-12 21:19:50 +08:00
zhayujie
070df826f1 Merge pull request #2762 from zhayujie/feat-web-console-upgrade
feat(web): add password protection for web console
2026-04-12 20:38:45 +08:00
zhayujie
fbe48a4b4e feat(web): add password protection for web console
- Add `web_password` config to enable login authentication
- Use stateless HMAC-signed token (survives restart, invalidates on password change)
- Add `web_session_expire_days` config (default 30 days)
- Protect all API endpoints with auth check (401 on failure)
- Add login page UI with auto-redirect on session expiry
- Add password management in config page (masked display, inline edit)
- Add tooltip hints for Agent config fields
- Update default agent_max_context_turns to 20, agent_max_steps to 20
- Update docs and docker-compose.yml
2026-04-12 20:37:04 +08:00
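A stateless HMAC-signed token with the properties listed in commit fbe48a4b4e above (survives restart, invalidates on password change) could be built along these lines. This is a sketch under assumptions, not the project's implementation: the token layout `<expiry>.<signature>`, the key derivation from the password, and the names `issue_token` and `verify_token` are all hypothetical.

```python
import hashlib
import hmac
import time


def _key(password: str) -> bytes:
    # Deriving the key from the password means a password change
    # invalidates every previously issued token.
    return hashlib.sha256(password.encode("utf-8")).digest()


def _sign(password: str, expires: str) -> str:
    return hmac.new(_key(password), expires.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_token(password: str, expire_days: int = 30, now: float = None) -> str:
    now = time.time() if now is None else now
    expires = str(int(now + expire_days * 86400))
    return f"{expires}.{_sign(password, expires)}"


def verify_token(token: str, password: str, now: float = None) -> bool:
    now = time.time() if now is None else now
    expires, sep, sig = token.partition(".")
    if not sep:
        return False
    try:
        expiry = int(expires)
    except ValueError:
        return False
    expected = _sign(password, expires)
    # Constant-time comparison avoids leaking signature prefixes.
    valid = hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8"))
    return valid and expiry > now
```

Because nothing is stored server-side, a restart does not log users out. The trade-off is that a single token cannot be revoked without changing the password.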
zhayujie
4dd497fb6d fix: run.ps1 git clone in windows 2026-04-12 17:52:37 +08:00
zhayujie
907882c0a7 fix: git clone pre-check 2026-04-12 17:36:45 +08:00
zhayujie
d36d5aee3f feat: rename repository name from chatgpt-on-wechat to CowAgent
- Update GitHub URLs in README.md (badges, release links, clone address, wiki, issues, contributors)
- Add project rename notice with SEO keywords and git remote update command
- Update docs/docs.json GitHub links
- Update all docs (zh/en/ja) across guide, intro, models, releases, skills
- Update run.sh and scripts/run.ps1 clone URLs and directory names
- Docker image name (zhayujie/chatgpt-on-wechat) kept unchanged for compatibility
2026-04-12 17:09:07 +08:00
zhayujie
c6824e5f5e fix: add legacy-cgi dependency for Python 3.13+ #2758
Add conditional dependency `legacy-cgi` for Python 3.13+ to resolve
`web.py` installation failure caused by the removal of the `cgi` module
(PEP 594).
Thanks @sha156 for reporting.
2026-04-12 16:49:00 +08:00
zhayujie
199c21eede Merge pull request #2761 from zhayujie/feat-knowledge
feat: personal knowledge base system
2026-04-12 16:47:07 +08:00
zhayujie
5162da5654 Merge branch 'master' into feat-knowledge 2026-04-12 16:46:38 +08:00
zhayujie
a1d82f6193 feat(knowledge): add cli and update docs 2026-04-12 16:39:06 +08:00
zhayujie
ea78e3d0c6 feat(knowledge): document link supports jumping to view 2026-04-11 20:16:43 +08:00
zhayujie
3497f00cb4 Merge pull request #2759 from zhayujie/feat-multimodel
feat(vision): prioritize main model for image recognition
2026-04-11 19:55:15 +08:00
zhayujie
5355d45031 Merge pull request #2756 from octo-patch/feature/add-minimax-m2.7-highspeed-tts
feat: add MiniMax-M2.7-highspeed model and MiniMax TTS support
2026-04-11 19:54:03 +08:00
zhayujie
26693acc3f feat(vision): prioritize main model for image recognition with multi-provider fallback
- Add call_vision method to all bot implementations (DashScope, Claude,
  Gemini, ZhipuAI, MiniMax, Doubao, Moonshot, OpenAICompatibleBot)
  using each vendor's native multimodal API format
- Remove call_with_tools/call_vision from Bot base class to fix MRO
  shadowing issue with OpenAICompatibleBot mixin
- Refactor vision tool provider resolution: MainModel → other configured
  models (auto-discovered) → OpenAI → LinkAI, with automatic fallback
- Return actual model name used in call_vision responses
- Sync config.json API keys to .env bidirectionally on startup
- Fix bot instance cache to detect bot_type/use_linkai config changes
- Add SSE reconnection support for web console
- Preserve image path hints in Gemini text for correct vision tool calls
- Update docs/tools/vision.mdx
2026-04-11 19:46:11 +08:00
zhayujie
76e9fef3b2 feat(knowledge): add file list and graph in web channel 2026-04-11 19:02:55 +08:00
octo-patch
c34308cbd4 feat: add MiniMax-M2.7-highspeed model and MiniMax TTS support
- Add MiniMax-M2.7-highspeed constant to const.py and MODEL_LIST
- Update MinimaxBot default model from MiniMax-M2.1 to MiniMax-M2.7
- Add MinimaxVoice TTS provider (voice/minimax/minimax_voice.py)
  - Supports speech-2.8-hd and speech-2.8-turbo models
  - SSE streaming with hex-decoded audio chunks
  - Reuses MINIMAX_API_KEY
- Register MinimaxVoice in voice factory
- Add unit tests (14 tests, all passing)
- Update README with MiniMax-M2.7-highspeed and TTS configuration
2026-04-11 17:03:44 +08:00
zhayujie
5a10476010 feat: add knowledge switch and cli 2026-04-11 16:44:25 +08:00
zhayujie
46e80dceec Merge pull request #2755 from 6vision/fix/generic-file-send
fix: send generic file types (tar.gz, zip, etc.) as FILE instead of TEXT
2026-04-11 16:36:34 +08:00
6vision
90d1835353 fix: send generic file types (tar.gz, zip, etc.) as FILE instead of TEXT
Previously, files with extensions not in the known categories (image, document, video, audio) fell through to a fallback that returned ReplyType.TEXT, causing the file to never actually be sent to the user. Now the fallback uses ReplyType.FILE so all file types are delivered.

Made-with: Cursor
2026-04-11 15:45:34 +08:00
zhayujie
845fadd0aa fix(knowledge): modify knowledge skill 2026-04-10 18:22:54 +08:00
zhayujie
5748ded52c feat(knowledge): change knowledge base to index-driven self-organizing structure 2026-04-10 16:06:04 +08:00
zhayujie
6a737fb734 feat: display thinking content in web console 2026-04-10 15:07:23 +08:00
zhayujie
3cd92ccda3 feat: add port config 2026-04-09 21:29:53 +08:00
zhayujie
54e81aba11 feat(memory+knowledge): add knowledge wiki system and Light Dream memory extraction
- Add knowledge/ directory structure and knowledge-wiki skill for structured knowledge accumulation
- Auto-inject MEMORY.md into system prompt with truncation (last 200 lines)
- Light Dream: extend flush_memory to extract long-term memories into MEMORY.md with date stamps
- Add mandatory knowledge auto-write rules in system prompt (no user confirmation needed)
- Expand MemoryManager.sync() to index knowledge/ files for vector search
- Update RULE.md template with workspace conventions and knowledge guidelines
2026-04-09 21:22:43 +08:00
zhayujie
d86cb4ded6 fix(weixin): update weixin channel version 2026-04-09 09:55:07 +08:00
zhayujie
4d5375f6d6 fix(win): add Windows platform hint in bash tool description 2026-04-08 16:54:26 +08:00
zhayujie
424557fedb fix(win): use PowerShell instead of cmd.exe 2026-04-08 16:50:45 +08:00
zhayujie
89251e603f fix(win): use PowerShell instead of cmd.exe for bash tool on Windows 2026-04-08 16:18:56 +08:00
zhayujie
a653ed07eb fix(win): defer pip install to a helper bat after cow.exe exits 2026-04-08 15:31:03 +08:00
zhayujie
ad86deb014 fix: prioritize using a custom master model for vision 2026-04-08 15:16:59 +08:00
zhayujie
9525dc7584 fix: avoid stale cow.exe on Windows by spawning fresh process 2026-04-08 12:07:18 +08:00
zhayujie
cd31dd27fd fix: increase web console capacity and add frontend retry 2026-04-08 11:48:27 +08:00
zhayujie
360e3670eb feat(browser): detect implicit interactive elements 2026-04-07 01:41:14 +08:00
zhayujie
8dabe3b4c8 fix: remove install-browser cmd display in /help 2026-04-04 23:28:57 +08:00
zhayujie
443e0c2806 feat: show video in web channel 2026-04-03 17:09:38 +08:00
zhayujie
9cc173cc4d fix: use dynamic model name in system prompt runtime info 2026-04-02 17:01:56 +08:00
zhayujie
b5f33e5ecd feat: support qwen3.6-plus 2026-04-02 16:46:58 +08:00
zhayujie
40dfc6860f fix: skill list showing sub-skills inside collection 2026-04-02 11:47:24 +08:00
zhayujie
1c02a04423 fix: handle error when printing QR code on Windows GBK terminals 2026-04-01 17:23:57 +08:00
zhayujie
de0e45070c chore: remove conflicting dependency 2026-04-01 17:19:15 +08:00
zhayujie
c169cc7d74 fix: remove conflicting dependency 2026-04-01 17:12:15 +08:00
zhayujie
cd62ad76f6 fix: cow CLI support python3.7 2026-04-01 16:51:23 +08:00
zhayujie
dd25b0fb5b feat: refine system prompt style and tone guidance 2026-04-01 16:24:41 +08:00
zhayujie
a38b22a6a2 docs: update docs 2026-04-01 15:31:41 +08:00
zhayujie
830b8f2971 feat: release 2.0.5 2026-04-01 15:01:53 +08:00
zhayujie
b058af122c feat: release 2.0.5 2026-04-01 12:24:21 +08:00
zhayujie
174ee0cafc fix(security): prevent path traversal in memory content API 2026-04-01 10:03:58 +08:00
zhayujie
1c336380c0 docs: update release doc 2026-03-31 22:30:31 +08:00
zhayujie
3068880413 feat: save skill display name when downloading 2026-03-31 21:43:57 +08:00
zhayujie
be596681e5 Merge pull request #2735 from zhayujie/feat-wecom-bot-qrcode
feat(wecom_bot): add Wecom Bot QR code scan auth
2026-03-31 21:28:39 +08:00
zhayujie
66b71c50e9 feat(wecom_bot): add Wecom Bot QR code scan auth 2026-03-31 21:27:50 +08:00
zhayujie
8744810b25 fix: skill install timeout 2026-03-31 20:47:59 +08:00
zhayujie
7f94d37c2e fix: auto-install font in browser 2026-03-31 20:20:13 +08:00
zhayujie
6d9b7baeb4 fix(weixin): file send failed 2026-03-31 18:14:49 +08:00
zhayujie
4470d4c352 fix: reduce docker image size 2026-03-31 16:56:27 +08:00
zhayujie
d2a462a279 fix: add apt source in docker file 2026-03-31 16:34:47 +08:00
zhayujie
14ff2a15e7 fix(cli): cow cli in docker chat 2026-03-31 16:25:47 +08:00
zhayujie
6d1369900e feat: add source args in docker building 2026-03-31 16:06:45 +08:00
zhayujie
1f17ebe69e feat: add browser install in docker image 2026-03-31 16:05:05 +08:00
zhayujie
1ae2918064 feat: support install browser in chat 2026-03-31 15:15:17 +08:00
zhayujie
b6571e5cad fix: browser resource optimization 2026-03-30 21:39:38 +08:00
zhayujie
7549d48cf1 fix: browser thread bug 2026-03-30 21:27:08 +08:00
zhayujie
00353dd0cb feat: support skill hub mirror 2026-03-30 18:46:02 +08:00
zhayujie
afd947195d fix(cli): support skill mirror install 2026-03-30 16:36:17 +08:00
zhayujie
e57ef37167 fix: prevent phantom mouseover from hijacking slash menu 2026-03-30 11:52:05 +08:00
zhayujie
ef33a93654 Merge pull request #2731 from zkjqd/fix/slash-menu-click
Fix slash-command shortcuts in the input box not responding to click selection
2026-03-30 11:40:06 +08:00
zhayujie
61732aecfc Merge pull request #2721 from yrk111222/feat/modelscope-update
Feat/modelscope update
2026-03-30 11:39:50 +08:00
zkjqd
6764c05c3f input-slash-click 2026-03-30 11:20:03 +08:00
zhayujie
fa149cf4aa fix(browser): multi-thread browser instance bug 2026-03-30 00:57:19 +08:00
zhayujie
e4f9697d06 feat(browser): install font in linux 2026-03-29 23:52:51 +08:00
zhayujie
da061450e5 fix: github skill install cmd 2026-03-29 19:23:47 +08:00
zhayujie
d09ae49287 feat(browser): auto-snapshot on navigate, screenshot prompt guidance
Browser tool enhancements:
- Navigate action now auto-includes snapshot result, saving one LLM round-trip
- Wait for networkidle + 800ms after navigation for SPA/JS-rendered pages
- Prompt guides agent to screenshot key results and ask user for login/CAPTCHA help
- Fixed playwright version pinned to 1.52.0; mirror fallback to official CDN on failure

Web console file/image support:
- SSE real-time push for images and files via on_event (file_to_send)
- Added /api/file endpoint to serve local files for web preview
- Frontend renders images in media-content container (survives delta/done overwrites)
- File attachment cards with download links; RFC 5987 encoding for non-ASCII filenames

Tool workspace fix:
- Inject workspace_dir as cwd into send and browser tools (previously only file tools)
- Screenshots now save to ~/cow/tmp/ instead of project directory
2026-03-29 19:09:11 +08:00
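The RFC 5987 encoding for non-ASCII filenames mentioned in commit d09ae49287 above is typically applied in a download response header. A minimal sketch follows, assuming the header in question is `Content-Disposition`, which the commit message does not state. The function name `content_disposition` is hypothetical.

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Build a header value with an ASCII fallback plus an RFC 5987
    `filename*` parameter carrying the percent-encoded UTF-8 name."""
    # Fallback for old clients: non-ASCII becomes '?', and characters
    # that would break the quoted string are replaced.
    fallback = filename.encode("ascii", "replace").decode("ascii")
    fallback = fallback.replace('"', "_").replace("\\", "_")
    # quote() percent-encodes the UTF-8 bytes; safe="" also encodes '/'.
    encoded = quote(filename, safe="")
    return (
        f'attachment; filename="{fallback}"; '
        f"filename*=UTF-8''{encoded}"
    )
```

Clients that understand `filename*` use the encoded form and ignore the fallback, so the original name survives the download.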
zhayujie
511ee0bbaf fix: windows PowerShell script 2026-03-29 18:28:50 +08:00
zhayujie
3cb5a0fbd6 docs: add CLI system docs 2026-03-29 17:57:12 +08:00
zhayujie
e06925ab85 fix: optimize browser install cli and fix vision prompt 2026-03-29 15:19:59 +08:00
zhayujie
184634e4e7 fix(cli): browser install failed 2026-03-29 15:14:07 +08:00
zhayujie
843c2d02cc Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2026-03-29 15:09:37 +08:00
zhayujie
8ea2455766 feat(cli): add browser install cmd 2026-03-29 15:09:07 +08:00
zhayujie
9dc9987d56 Merge pull request #2727 from zhayujie/feat-browser-tool
feat: add browser tool
2026-03-29 14:59:39 +08:00
zhayujie
3458621147 feat: add browser tool 2026-03-29 14:59:06 +08:00
zhayujie
079df5a47c feat: support batch skill install from zip and github 2026-03-29 14:38:11 +08:00
zhayujie
ddb07c65a1 feat: support github zip-first download, gitLab, git@ ssh, local path 2026-03-29 13:45:15 +08:00
zhayujie
9b21cd222b fix: update run.sh 2026-03-28 19:36:51 +08:00
zhayujie
90f736843f fix: add click dependencies 2026-03-28 19:35:15 +08:00
zhayujie
13c020eb61 fix(cli): cli output in wecom_bot 2026-03-28 19:26:59 +08:00
zhayujie
dbc06dbe95 fix: use new run.sh when updating 2026-03-28 19:16:41 +08:00
zhayujie
23d097bc1c Merge pull request #2726 from zhayujie/feat-cow-cli
feat: cow cli in terminal and chat
2026-03-28 19:01:56 +08:00
zhayujie
db85b9808e feat(cli): add cow update 2026-03-28 18:58:42 +08:00
zhayujie
df5bae37bc feat: add MiniMax-M2.7 and glm-5-turbo in web console 2026-03-28 18:48:11 +08:00
zhayujie
acc23b6051 feat: optimize agent prompt and fix skill source load 2026-03-28 18:37:07 +08:00
zhayujie
61f2741afc feat: organize skill source field 2026-03-28 17:41:40 +08:00
zhayujie
4dd7ea886a feat(cli): cli options in web console 2026-03-28 16:26:41 +08:00
zhayujie
1e8959fbcf fix: optimize repo clone in run.sh 2026-03-28 15:08:57 +08:00
zhayujie
48729678cf Merge branch 'master' into feat-cow-cli 2026-03-28 14:47:20 +08:00
zhayujie
0684becaa7 fix(cli): register skill when installing 2026-03-28 14:42:18 +08:00
zhayujie
db16bdf8cb fix(cli): add security hardening for skill install and process management 2026-03-27 17:59:15 +08:00
zhayujie
f890318ed9 fix: strip leading/trailing whitespace from agent response 2026-03-26 18:13:39 +08:00
zhayujie
158510cbbe feat(cli): improve cow cli and skill hub integration 2026-03-26 16:49:42 +08:00
zhayujie
ce90cf7aa8 fix: weixin cdn upload retry 2026-03-26 10:20:29 +08:00
zhayujie
a3a3d006eb Merge pull request #2723 from Xiaozhou345/Xiaozhou345-fix-readme-spacing
Improve spacing between Chinese and English text in the README
2026-03-26 10:14:27 +08:00
zhayujie
8fd029a4a1 feat(cli): support cow cli 2026-03-26 10:08:51 +08:00
Xiaozhou345
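2e1b52c1e5 Improve spacing between Chinese and English text in the README
Following Chinese technical documentation conventions, added spaces between file names and Chinese text to improve readability.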
2026-03-25 21:26:01 +08:00
zhayujie
3eb8348708 fix: docker volume permission issue and clean up unused dependencies 2026-03-25 01:25:34 +08:00
zhayujie
393f0c007c fix: context loss after trim 2026-03-24 20:49:28 +08:00
yrk
294e380288 update model_list 2026-03-24 11:00:55 +08:00
yrk
4c1c42efac feat: update modelscope bot 2026-03-24 10:43:45 +08:00
zhayujie
c062ca8c66 Merge pull request #2720 from 6vision/fix/deepseek-docs
Docs: update
2026-03-24 00:25:17 +08:00
6vision
76dcb25103 docs(deepseek): update model descriptions to V3.2 with thinking/non-thinking mode
Made-with: Cursor
2026-03-24 00:05:39 +08:00
6vision
c5b4f236db docs(deepseek): remove migration notes from zh and en docs
Made-with: Cursor
2026-03-24 00:05:39 +08:00
zhayujie
0974c940a8 Merge pull request #2719 from 6vision/feat/deepseek-bot
feat: add independent DeepSeek bot module with dedicated config
2026-03-23 22:42:58 +08:00
6vision
cffa20d37e docs(deepseek): remove migration notes to reduce user cognitive load
Made-with: Cursor
2026-03-23 22:39:15 +08:00
6vision
ef009edd29 docs(deepseek): update config guides for independent DeepSeek module
Update DeepSeek docs (zh/en/ja) and README to reflect the new dedicated deepseek_api_key / deepseek_api_base config fields, with backward compatibility notes.

Made-with: Cursor
2026-03-23 21:43:51 +08:00
zhayujie
3ca52b118d fix(weixin): qrcode url log 2026-03-23 21:33:53 +08:00
zhayujie
13f5fde4fb fix: rebuild system prompt from scratch on every turn 2026-03-23 21:27:44 +08:00
6vision
f512b55ec2 feat(deepseek): add independent DeepSeek bot module with dedicated config
Separate DeepSeek from ChatGPTBot into its own module (models/deepseek/) with dedicated deepseek_api_key and deepseek_api_base config fields, avoiding config conflicts when switching between providers. Backward compatible with old users who configured DeepSeek via open_ai_api_key/open_ai_api_base through automatic fallback.

Made-with: Cursor
2026-03-23 21:23:35 +08:00
zhayujie
22b8ca0095 feat: optimize vision image compression 2026-03-23 21:18:04 +08:00
zhayujie
baf66a103d fix(weixin): preserve original filename for received files 2026-03-23 01:18:02 +08:00
zhayujie
45faa9c1ff fix(weixin): resolve image/file send and receive failures 2026-03-23 00:13:41 +08:00
zhayujie
304381a88d fix: hide breadcrumb on mobile for better space utilization 2026-03-22 23:36:34 +08:00
zhayujie
fc9f54dbc8 feat(weixin): optimize login qrcode generate 2026-03-22 23:04:50 +08:00
zhayujie
7199dc187f fix: default gemini model 2026-03-22 22:52:37 +08:00
zhayujie
e9ae066d53 Merge pull request #2716 from cowagent/fix-gemini-model-attribute
fix: add missing model property to GoogleGeminiBot
2026-03-22 22:49:00 +08:00
cowagent
d71ae406ff fix: add missing model property to GoogleGeminiBot
api_key and api_base were refactored to @property but model was not
migrated, causing AttributeError: 'GoogleGeminiBot' object has no
attribute 'model' when using any Gemini model.
2026-03-22 22:43:26 +08:00
zhayujie
f3216904b3 feat(weixin): optimize weixin login qrcode 2026-03-22 21:34:47 +08:00
zhayujie
5958b69ec9 feat: release 2.0.4 2026-03-22 20:49:41 +08:00
zhayujie
7d4e2cb39a docs: update comments 2026-03-22 19:07:19 +08:00
zhayujie
a483ec0cea feat: optimize weixin channel qr code generate 2026-03-22 18:20:10 +08:00
zhayujie
c1421e0874 feat: support weixin channel in scripts 2026-03-22 16:29:12 +08:00
zhayujie
ce89869c3c feat: support weixin channel 2026-03-22 15:52:13 +08:00
zhayujie
b8b57e34ff fix: auto-repair messages 2026-03-21 14:20:22 +08:00
zhayujie
bc7f627253 fix(wecom_bot): compat with old websocket-client 2026-03-21 14:03:17 +08:00
zhayujie
652156e398 feat: make run.sh executable 2026-03-20 17:56:10 +08:00
zhayujie
9febb071c6 fix: run.sh get pid bug 2026-03-20 17:51:04 +08:00
zhayujie
7d0e1568ac fix: feishu msg and log encoding 2026-03-19 17:07:39 +08:00
zhayujie
b4e711f411 feat: add request header 2026-03-19 17:06:05 +08:00
zhayujie
1b5be1b981 fix: remove feishu_bot_name in run.sh 2026-03-19 14:55:12 +08:00
zhayujie
49d8707c58 refactor: simplify run.sh by extracting shared logic and eliminating duplication 2026-03-19 11:07:16 +08:00
zhayujie
9192f6f7f7 feat: add MiniMax-M2.7 and glm-5-turbo 2026-03-19 10:46:13 +08:00
zhayujie
05022e3745 fix: add log 2026-03-18 23:09:27 +08:00
zhayujie
5356e9ddeb docs: adjust docs order 2026-03-18 21:55:09 +08:00
zhayujie
52acf76e2c docs: update jp docs 2026-03-18 21:01:02 +08:00
zhayujie
40cdbd3b45 Merge pull request #2710 from eltociear/add-ja-doc
docs: add Japanese documents
2026-03-18 19:28:04 +08:00
Ikko Ashimine
5487c0befe docs: add Japanese documents 2026-03-18 19:13:39 +09:00
zhayujie
8bb16c48c0 docs: update install cmd 2026-03-18 16:11:35 +08:00
zhayujie
c6384363f9 feat: workspace volume in docker deploy 2026-03-18 16:03:03 +08:00
zhayujie
8993e8ad3e feat: release 2.0.3 2026-03-18 15:40:49 +08:00
zhayujie
289989d9f7 feat: release 2.0.3 2026-03-18 15:10:21 +08:00
zhayujie
dc2ae0e6f1 feat: support gpt-5.4-mini and gpt-5.4-nano 2026-03-18 14:55:29 +08:00
zhayujie
9c966c152d feat: enhance AGENT.md update prompts to encourage proactive evolution 2026-03-18 12:10:45 +08:00
zhayujie
4efae41048 feat: support coding plan 2026-03-18 11:59:22 +08:00
zhayujie
b8437032e9 fix: optimize image recognition prompts 2026-03-18 10:10:23 +08:00
zhayujie
2d339ca81b Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2026-03-17 23:03:05 +08:00
zhayujie
d53abc9696 docs: update README.md 2026-03-17 23:02:41 +08:00
zhayujie
446c886d38 Merge pull request #2706 from zhayujie/feat-web-files
feat: support files upload in web console and office parsing
2026-03-17 21:22:38 +08:00
zhayujie
30c6d9b5ae feat: support file and image upload in web console, add office docs parsing in read tool 2026-03-17 21:21:03 +08:00
zhayujie
5e42996b36 fix: guide LLM to use matching skill when tool not found 2026-03-17 18:34:09 +08:00
zhayujie
ceca7b85bf Merge pull request #2705 from zhayujie/feat-qq-channel
feat: add qq channel
2026-03-17 17:26:39 +08:00
zhayujie
a4d54f58c8 feat: complete the QQ channel and supplement the docs 2026-03-17 17:25:36 +08:00
zhayujie
005a0e1bad feat: add qq channel 2026-03-17 15:43:04 +08:00
zhayujie
46d97fd57d feat: channel config set to env 2026-03-17 11:36:20 +08:00
zhayujie
72a26b6353 fix: scheduler auto clean 2026-03-17 11:29:21 +08:00
zhayujie
89a4033fbf fix: web console bot_type 2026-03-17 10:47:41 +08:00
zhayujie
39a5dc64bd Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2026-03-16 19:07:54 +08:00
zhayujie
d4bdd9b1b7 docs: update README.md for wecom_bot channel 2026-03-16 19:07:08 +08:00
zhayujie
2f5ba87280 Merge pull request #2698 from zhayujie/feat-wecom-bot
feat: wecom_bot channel
2026-03-16 19:04:52 +08:00
zhayujie
8b45d6c750 docs: wecom_bot integration docs 2026-03-16 19:03:18 +08:00
zhayujie
4ecd4df2d4 feat: web console support wecom_bot config 2026-03-16 17:56:59 +08:00
zhayujie
a42f31fe52 feat: support wecom_bot stream card 2026-03-16 17:46:05 +08:00
zhayujie
d4480b695e feat(channel): add wecom_bot channel 2026-03-16 14:39:15 +08:00
zhayujie
c4b5f7fbae refactor: remove unavailable channels 2026-03-16 11:05:45 +08:00
zhayujie
ba915f2cc0 feat: add gemini-3.1-flash-lite-preview and gpt-5.4 2026-03-15 22:06:12 +08:00
zhayujie
4b91140f31 fix: optimize msg receive 2026-03-12 20:49:36 +08:00
zhayujie
9879878dd0 fix: concurrency issue in session 2026-03-12 17:08:09 +08:00
zhayujie
d78105d57c fix: tool call match 2026-03-12 17:05:27 +08:00
zhayujie
153c9e3565 fix(memory): remove useless prompt 2026-03-12 15:29:58 +08:00
zhayujie
c11623596d fix(memory): prevent context memory loss by improving trim strategy 2026-03-12 15:25:46 +08:00
zhayujie
e791a77f77 fix: strengthen bootstrap flow 2026-03-12 12:13:05 +08:00
zhayujie
b641bffb2c fix(feishu): remove bot_name dependency for group chat 2026-03-12 11:30:42 +08:00
zhayujie
ee0c47ac1e feat: file send prompt 2026-03-12 00:11:34 +08:00
zhayujie
eba90e9343 fix: workspace bootstrap 2026-03-11 23:35:42 +08:00
zhayujie
d8374d0fa5 fix: web_fetch encoding 2026-03-11 19:42:37 +08:00
zhayujie
fa61744c6d feat(web_fetch): support downloading and parsing remote document files (PDF, Word, Excel, PPT) 2026-03-11 17:47:15 +08:00
zhayujie
4fec55cc01 feat: web_fetch tool supports remote file URLs 2026-03-11 17:16:39 +08:00
zhayujie
1767413712 fix: increase minimax max_tokens 2026-03-11 15:31:35 +08:00
zhayujie
734c8fa84f fix: optimize skill prompt 2026-03-11 12:40:37 +08:00
zhayujie
9a8d422554 feat: package skill install 2026-03-11 12:18:36 +08:00
zhayujie
b21e945c76 feat: optimize bootstrap flow 2026-03-11 11:27:08 +08:00
zhayujie
a02bf1ea09 Merge pull request #2693 from 6vision/fix/bot-type-and-web-config
fix: rename zhipu bot_type, persist bot_type in web config, fix re.sub escape error
2026-03-11 10:24:19 +08:00
zhayujie
eda82bac92 fix: gemini tool call bug 2026-03-11 02:04:09 +08:00
zhayujie
e8d4f7dc4f fix: remove useless file 2026-03-10 22:56:00 +08:00
6vision
c4a93b7789 fix: rename zhipu bot_type, persist bot_type in web config, fix re.sub escape error
- Rename ZHIPU_AI bot type from glm-4 to zhipu to avoid confusion with model names

- Add bot_type persistence in web config to fix provider dropdown resetting on refresh

- Change OpenAI provider key to chatGPT to match bot_factory routing

- Add DEEPSEEK constant and route it to ChatGPTBot (OpenAI-compatible API)

- Keep backward compatibility for legacy bot_type glm-4 in bot_factory

- Fix re.sub bad escape error on Windows paths by using lambda replacement

- Remove unused pydantic import in minimax_bot.py

Made-with: Cursor
2026-03-10 21:34:24 +08:00
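The `re.sub` fix listed in c4a93b7789 relies on a general property of Python's `re` module: a string replacement is parsed for escape sequences, so a Windows path containing `\U` is rejected, while a callable replacement is inserted literally. The sketch below shows that behaviour; the `{workspace}` placeholder and the function name are hypothetical, not taken from the project.

```python
import re


def fill_placeholder(template: str, path: str) -> str:
    # A callable replacement is used verbatim, so backslashes in the
    # path are never interpreted as regex template escapes.
    return re.sub(r"\{workspace\}", lambda _match: path, template)


windows_path = r"C:\Users\demo\workspace"
result = fill_placeholder("cd {workspace}", windows_path)

# The same call with a plain string replacement fails on "\U".
try:
    re.sub(r"\{workspace\}", windows_path, "cd {workspace}")
    string_replacement_failed = False
except re.error:
    string_replacement_failed = True
```

An alternative with the same effect is to escape the backslashes first (`path.replace("\\", "\\\\")`), but the lambda form avoids having to remember that step.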
zhayujie
c3f9925097 fix: remove injected max-steps prompt from persisted conversation history 2026-03-10 20:08:59 +08:00
zhayujie
2a0cf7511a Merge pull request #2692 from 6vision/master
update:Adjust bot_type resolution priority in Agent mode
2026-03-10 15:17:22 +08:00
6vision
d0a70d3339 update:Adjust bot_type resolution priority in Agent mode 2026-03-10 15:14:01 +08:00
zhayujie
f37e4675dd Merge pull request #2691 from Weikjssss/fix-bot-type-conf
fix: pass bot_type in agent mode
2026-03-10 15:00:04 +08:00
zhayujie
4e32f67eeb fix: validate tool_call_id pairing #2690 2026-03-10 14:52:07 +08:00
Weikjssss
36d54cab52 fix: pass bot_type in agent mode 2026-03-10 14:28:39 +08:00
zhayujie
9d8df10dcf feat: clarify send tool is local-only 2026-03-10 12:10:10 +08:00
zhayujie
45ea88e070 Merge pull request #2689 from cowagent/fix/openai-compat-complete
fix: complete openai_compat migration across all model bots (openai>=1.0 compatibility)
2026-03-10 10:10:58 +08:00
cowagent
d5d0b947f5 fix: complete openai_compat migration across all model bots
Replace all direct openai.error.* usages with the openai_compat
compatibility layer to support openai>=1.0.

Affected files:
- models/chatgpt/chat_gpt_bot.py: fix isinstance checks (RateLimitError, Timeout, APIError, APIConnectionError)
- models/openai/open_ai_bot.py: replace import + fix isinstance checks
- models/ali/ali_qwen_bot.py: replace import + fix isinstance checks
- models/modelscope/modelscope_bot.py: remove unused openai.error import

The openai_compat layer (models/openai/openai_compat.py) already
handles both openai<1.0 and openai>=1.0 gracefully. This completes
the migration started in the existing PR #2688.
2026-03-10 10:06:04 +08:00
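A compatibility layer of the kind d5d0b947f5 refers to is usually a guarded import. The sketch below is an assumption about the general shape, not the contents of `models/openai/openai_compat.py`: it imports the exception classes from `openai.error` (SDK below 1.0), falls back to the top-level names used by SDK 1.0 and later, and defines placeholders when the SDK is not installed. The `classify_error` helper is hypothetical.

```python
try:
    # openai<1.0 keeps exception classes in the openai.error module.
    from openai.error import RateLimitError, Timeout, APIError, APIConnectionError
except ImportError:
    try:
        # openai>=1.0 exports them at package level; Timeout became APITimeoutError.
        from openai import RateLimitError, APIError, APIConnectionError
        from openai import APITimeoutError as Timeout
    except ImportError:
        # SDK missing entirely: placeholders keep isinstance checks valid.
        class APIError(Exception):
            pass

        class APIConnectionError(APIError):
            pass

        class Timeout(APIConnectionError):
            pass

        class RateLimitError(APIError):
            pass


def classify_error(exc: BaseException) -> str:
    """Map an exception to a coarse label; most specific classes are checked first."""
    if isinstance(exc, RateLimitError):
        return "rate_limit"
    if isinstance(exc, Timeout):
        return "timeout"
    if isinstance(exc, APIConnectionError):
        return "connection"
    if isinstance(exc, APIError):
        return "api_error"
    return "unknown"
```

Callers then import the names from the compatibility module only, so the `isinstance` checks in each bot do not depend on which SDK version is installed.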
zhayujie
f775f1f11e Merge pull request #2688 from JasonOA888/fix/openai-compat
fix: use openai_compat layer for error handling (openai>=1.0 compatibility)
2026-03-10 10:02:41 +08:00
JasonOA888
f1e888f3de fix: use openai_compat layer for error handling
The code was directly importing openai.error which fails with openai>=1.0.
The project already has an openai_compat.py compatibility layer that handles
both old (<1.0) and new (>=1.0) OpenAI SDK versions.

This commit updates chat_gpt_bot.py to use the compatibility layer.

Related: #2687
2026-03-10 00:33:45 +08:00
zhayujie
71c8436e90 fix: skill download to temp dir 2026-03-09 18:43:28 +08:00
zhayujie
08c69f5e9b fix: clean existing skill directory before remote install to ensure full overwrite 2026-03-09 17:23:09 +08:00
zhayujie
a50fafaca2 refactor: convert image vision from skill to native tool 2026-03-09 16:01:56 +08:00
zhayujie
3c6781d240 refactor: inline skill-creator reference files into SKILL.md 2026-03-09 12:02:52 +08:00
zhayujie
3b8b5625f8 feat: add image vision provider 2026-03-09 11:37:45 +08:00
zhayujie
6be2034110 feat: add fallback embedding provider 2026-03-09 11:03:31 +08:00
zhayujie
924dc79f00 perf: lazy import to avoid 4-10s startup delay 2026-03-09 10:21:58 +08:00
zhayujie
ccb9030d3c refactor: convert web-fetch from skill to native tool 2026-03-09 10:13:48 +08:00
zhayujie
8623287ac1 docs: update memory system docs 2026-03-08 22:06:28 +08:00
zhayujie
022c13f3a4 feat: upgrade memory flush system
- Use LLM to summarize discarded context into concise daily memory entries
- Batch trim to half when exceeding max_turns/max_tokens, reducing flush frequency
- Run summarization asynchronously in background thread, no blocking on replies
- Add daily scheduled flush (23:55) as fallback for low-activity days
- Sync trimmed messages back to agent to keep context state consistent
2026-03-08 21:56:12 +08:00
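The batch-trim step in 022c13f3a4 can be sketched as below. This is a minimal illustration under one stated assumption: "trim to half" is read as keeping the most recent `max_turns // 2` turns once the limit is exceeded. The function name is hypothetical, and the summarization of the discarded part (done by an LLM in a background thread, per the commit) is left out.

```python
def trim_to_half(turns: list, max_turns: int) -> tuple:
    """Return (kept, discarded); nothing is discarded while under the limit."""
    if len(turns) <= max_turns:
        return turns, []
    keep = max_turns // 2
    if keep == 0:
        return [], turns
    # Dropping a whole batch at once means the next flush is triggered
    # only after roughly max_turns / 2 further turns.
    return turns[-keep:], turns[:-keep]


kept, discarded = trim_to_half(list(range(10)), max_turns=8)
```

The discarded list is what would be handed to the summarizer, while the kept list becomes the agent's new context.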
zhayujie
0687916e7f fix: Safari IME enter key triggering message send
Made-with: Cursor
2026-03-08 13:21:31 +08:00
zhayujie
bb868b83ba feat: add chat history query 2026-03-08 13:03:27 +08:00
zhayujie
24298130b9 fix: minimax tool_id missing 2026-03-06 18:42:03 +08:00
zhayujie
6e5ee92ebd docs: add gpt-5.4 2026-03-06 12:25:50 +08:00
zhayujie
5b91fe04aa fix: send tool process url 2026-03-06 12:22:22 +08:00
zhayujie
1623deb3ee feat: support gpt-5.4 2026-03-06 12:04:40 +08:00
zhayujie
4a16e05b7a fix: rebuild skills when installing 2026-03-05 21:11:34 +08:00
zhayujie
f1c04bc60d feat: improve channel connection stability 2026-03-05 15:55:16 +08:00
zhayujie
84c6f31c76 fix: update agent skill metadata 2026-03-03 18:16:42 +08:00
zhayujie
9d528190bf feat: add skill category 2026-03-03 16:06:37 +08:00
zhayujie
0f23b209ad fix: adjust the context of restart loading 2026-03-03 11:38:14 +08:00
zhayujie
63d9325900 Merge pull request #2683 from pelioo/master
Update .gitignore to add an ignore rule for the python directory
2026-03-01 19:41:27 +08:00
peli
f342097f81 Merge remote-tracking branch 'upstream/master' 2026-03-01 00:24:14 +08:00
zhayujie
b4806c4366 fix: model provider config 2026-02-28 18:35:04 +08:00
zhayujie
ff37d8a577 Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2026-02-28 18:10:55 +08:00
zhayujie
a773eb7893 fix: filter history to one user and one assistant per turn 2026-02-28 18:09:02 +08:00
zhayujie
7c67513d24 fix: convert bash-style $VAR to %VAR% on Windows 2026-02-28 18:02:06 +08:00
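The conversion named in 7c67513d24 could be done with a single regular expression, as in the sketch below. The function name is hypothetical, and handling the `${VAR}` form in addition to `$VAR` is an assumption; the commit message only mentions `$VAR`.

```python
import re

# Matches ${NAME} (group 1) or $NAME (group 2).
_BASH_VAR = re.compile(r"\$(?:\{(\w+)\}|(\w+))")


def to_windows_vars(command: str) -> str:
    """Rewrite bash-style variable references as cmd.exe-style %NAME%."""
    return _BASH_VAR.sub(lambda m: "%" + (m.group(1) or m.group(2)) + "%", command)


converted = to_windows_vars("echo $HOME and ${API_KEY}")
```

A real implementation would likely apply this only when `platform.system()` reports Windows, and might need to skip single-quoted strings, which this sketch does not attempt.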
zhayujie
6ed85029c5 fix: agent skills 2026-02-28 16:46:49 +08:00
zhayujie
e9c57ddf4d fix: adjust default turns 2026-02-28 15:25:20 +08:00
zhayujie
a33ce97ed9 fix: restore only user/assistant text from history, strip tool calls
Made-with: Cursor
2026-02-28 15:14:56 +08:00
zhayujie
b788a3dd4e fix: incomplete historical session messages 2026-02-28 15:03:33 +08:00
zhayujie
fccfa92d7e docs: update channel docs 2026-02-28 14:50:55 +08:00
zhayujie
8705bf0a70 feat: update docs 2026-02-28 10:53:16 +08:00
peli
9318138af7 ```
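build(env): update .gitignore to add an ignore rule for the python directory

Added an ignore entry for the python directory to .gitignore,
so that Python environment files are not committed to version control.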
```
2026-02-27 23:49:35 +08:00
zhayujie
269fa7d2d5 feat: 2.0.2 en docs 2026-02-27 18:37:22 +08:00
zhayujie
e99837a8b9 feat: release 2.0.2 2026-02-27 18:04:00 +08:00
zhayujie
553861a2c4 docs: update README.md 2026-02-27 16:57:18 +08:00
zhayujie
628a85d1be docs: update README.md 2026-02-27 16:48:23 +08:00
zhayujie
2cb54514a4 Merge pull request #2681 from zhayujie/feat-docs
feat: docs update
2026-02-27 16:04:17 +08:00
zhayujie
6db22827f2 feat: docs update 2026-02-27 16:03:47 +08:00
zhayujie
4cc6d5426b Merge pull request #2680 from zhayujie/feat-web-config
feat: web console config
2026-02-27 14:40:44 +08:00
zhayujie
7d258b5202 feat(channels): add multi-channel management UI with real-time connect/disconnect
- Web console Channels page: display active channels as config cards, support
  save/connect/disconnect with real-time start/stop of channel processes
- Custom dropdown for channel selection (consistent with model selector style),
  custom confirmation dialog for disconnect
- Fix channel stop: use sys.modules['__main__'] to access live ChannelManager
- Fix web request pending: move stop logic outside lock, set daemon_threads=True
- Fix reconnect: new asyncio event loop per startup, ctypes thread interrupt,
  5s grace period before re-establishing remote connection
- Filter stale offline messages (>60s) pushed after reconnect
2026-02-27 14:39:40 +08:00
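The stale-message filter mentioned in 7d258b5202 amounts to comparing a message timestamp with the current time. The sketch below assumes each message carries a creation time in Unix seconds; the function and constant names are hypothetical, and only the 60-second threshold comes from the commit message.

```python
import time

STALE_AFTER_SECONDS = 60  # threshold taken from the commit message


def is_stale(create_time: float, now: float = None) -> bool:
    """True when a message is older than the threshold and should be dropped."""
    now = time.time() if now is None else now
    return now - create_time > STALE_AFTER_SECONDS


# The `now` argument exists so the check can be exercised deterministically.
old_message_dropped = is_stale(create_time=100.0, now=161.0)
fresh_message_dropped = is_stale(create_time=100.0, now=160.0)
```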
zhayujie
c8d19ee0bc Merge pull request #2679 from zhayujie/feat-docs
docs: init docs
2026-02-27 12:14:37 +08:00
zhayujie
d891312032 docs: init docs 2026-02-27 12:10:16 +08:00
zhayujie
5edbf4ce32 feat: model and agent config in web console 2026-02-26 21:01:37 +08:00
zhayujie
3ddbdd713d Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2026-02-26 18:57:43 +08:00
zhayujie
9ba107b511 Merge branch 'feat-multi-channel' 2026-02-26 18:57:19 +08:00
zhayujie
c9adddb76a fix: pass channel_type correctly in multi-channel mode 2026-02-26 18:57:08 +08:00
zhayujie
f0a12d5ff5 Merge pull request #2678 from zhayujie/feat-multi-channel
feat: support multi-channel
2026-02-26 18:34:48 +08:00
zhayujie
7cce224499 feat: support multi-channel 2026-02-26 18:34:08 +08:00
zhayujie
97397ca585 Merge pull request #2674 from haosenwang1018/fix/bare-excepts
fix: replace 29 bare except clauses with except Exception
2026-02-26 12:11:49 +08:00
zhayujie
f2fbc602a8 Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2026-02-26 10:45:01 +08:00
zhayujie
925d728a86 fix: replace upsert syntax to support SQLite lower version 2026-02-26 10:44:04 +08:00
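SQLite added the `INSERT ... ON CONFLICT ... DO UPDATE` upsert syntax in version 3.24.0, so older builds reject it. Commit 925d728a86 does not say what replaced it; one portable pattern, shown here purely as an assumption, is to try an `UPDATE` first and `INSERT` only when no row was touched. The table and column names are hypothetical.

```python
import sqlite3


def save_session(conn: sqlite3.Connection, session_id: str, payload: str) -> None:
    """Insert-or-update without relying on the upsert syntax."""
    cur = conn.execute(
        "UPDATE sessions SET payload = ? WHERE session_id = ?",
        (payload, session_id),
    )
    if cur.rowcount == 0:
        # No existing row matched, so this is the first write for the key.
        conn.execute(
            "INSERT INTO sessions (session_id, payload) VALUES (?, ?)",
            (session_id, payload),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (session_id TEXT PRIMARY KEY, payload TEXT)")
save_session(conn, "s1", "first")
save_session(conn, "s1", "second")
rows = conn.execute("SELECT session_id, payload FROM sessions").fetchall()
```

`INSERT OR REPLACE` is another option that works on old SQLite versions, though it deletes and re-creates the row rather than updating it in place.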
zhayujie
f5f229871b Merge pull request #2676 from zhayujie/feat-multi-channel
feat: improve web console and conversation store
2026-02-26 10:37:03 +08:00
zhayujie
9917552b4b fix: improve web UI stability and conversation history restore
- Fix dark mode FOUC: apply theme in <head> before first paint, defer
  transition-colors to post-init to avoid animated flash on load
- Fix Safari IME Enter bug: defer compositionend reset via setTimeout(0)
- Fix history scroll: use requestAnimationFrame before scrollChatToBottom
- Limit restore turns to min(6, max_turns//3) on restart
- Fix load_messages cutoff to start at turn boundary, preventing orphaned
  tool_use/tool_result pairs from being sent to the LLM
- Merge all assistant messages within one user turn into a single bubble;
  render tool_calls in history using same CSS as live SSE view
- Handle empty choices list in stream chunks
2026-02-26 10:35:20 +08:00
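Two of the points in 9917552b4b, the restore limit `min(6, max_turns//3)` and cutting history at a turn boundary, can be sketched together. The message format (dicts with a `role` key, tool results stored under role `"tool"`) and both function names are assumptions for illustration.

```python
def restore_turn_limit(max_turns: int) -> int:
    # Formula quoted from the commit message.
    return min(6, max_turns // 3)


def cut_at_turn_boundary(messages: list, limit_turns: int) -> list:
    """Keep the last `limit_turns` turns, always starting on a user message.

    Starting on a user message means a tool result is never kept without
    the assistant tool call that produced it.
    """
    starts = [i for i, m in enumerate(messages) if m["role"] == "user"]
    if limit_turns <= 0 or not starts:
        return []
    start = starts[-limit_turns] if len(starts) >= limit_turns else starts[0]
    return messages[start:]


history = [
    {"role": "user", "content": "a"},
    {"role": "assistant", "content": "", "tool_calls": ["call_1"]},
    {"role": "tool", "content": "result"},
    {"role": "assistant", "content": "done"},
    {"role": "user", "content": "b"},
    {"role": "assistant", "content": "ok"},
]
last_turn = cut_at_turn_boundary(history, 1)
```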
haosenwang1018
adca89b973 fix: replace bare except clauses with except Exception
Bare `except:` catches BaseException including KeyboardInterrupt and
SystemExit. Replaced 29 instances with `except Exception:`.
2026-02-25 11:49:19 +00:00
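The distinction adca89b973 relies on is standard Python behaviour: `KeyboardInterrupt` and `SystemExit` derive from `BaseException` but not from `Exception`, so `except Exception:` lets them propagate while a bare `except:` swallows them. A minimal sketch, with hypothetical helper names:

```python
def run_guarded(fn):
    """Handle ordinary errors but let interpreter-level signals through."""
    try:
        return fn()
    except Exception as exc:
        return f"handled: {type(exc).__name__}"


def raise_value_error():
    raise ValueError("bad input")


def raise_interrupt():
    raise KeyboardInterrupt


ordinary = run_guarded(raise_value_error)

try:
    run_guarded(raise_interrupt)
    interrupt_propagated = False
except KeyboardInterrupt:
    # Reached because `except Exception` does not match KeyboardInterrupt.
    interrupt_propagated = True
```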
zhayujie
29bfbecdc9 feat: persistent storage of conversation history 2026-02-25 18:01:39 +08:00
zhayujie
1a7a8c98d9 docs: add scam warning disclaimer 2026-02-25 01:34:16 +08:00
zhayujie
cddb38ac3d Merge pull request #2673 from zhayujie/feat-web-console
feat: web console
2026-02-24 00:06:29 +08:00
zhayujie
394853c0fb feat: web console module display 2026-02-24 00:04:17 +08:00
zhayujie
c0702c8b36 feat: web channel stream chat 2026-02-23 22:19:50 +08:00
zhayujie
d610608391 feat: add cloud host config 2026-02-23 15:06:31 +08:00
zhayujie
9082eec91d feat: dark mode is used by default 2026-02-23 14:57:02 +08:00
zhayujie
f1a1413b5f feat: web console upgrade 2026-02-21 17:56:31 +08:00
zhayujie
c1e7f9af9b Merge pull request #2672 from zhayujie/feat-config-update
feat: cloud config update
2026-02-21 11:34:05 +08:00
zhayujie
1c71c4e38b feat: agent chat service 2026-02-21 00:39:36 +08:00
zhayujie
5e3eccb3f6 feat: support memory service 2026-02-20 23:44:05 +08:00
zhayujie
e1dc037eb9 feat: cloud skills manage 2026-02-20 23:23:04 +08:00
zhayujie
97e9b4c801 Merge branch 'master' into feat-config-update 2026-02-20 18:58:21 +08:00
zhayujie
52d7cad735 feat: support gemini-3.1-pro-preview and claude-4.6-sonnet 2026-02-20 12:14:59 +08:00
zhayujie
c0b1d270ba Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2026-02-19 14:18:39 +08:00
zhayujie
e59a2892e4 feat: support qwen3.5-plus 2026-02-19 14:18:16 +08:00
zhayujie
5fa0376a49 Merge pull request #2670 from SgtPepper114/fix/gemini-dingtalk-image-inline
fix(gemini): fix image recognition failure caused by DingTalk image markers not being converted to multimodal input
2026-02-19 13:57:04 +08:00
SgtPepper114
05a33042c8 fix(gemini): support dingtalk image markers as multimodal input
- parse [图片: path] markers in text and convert to Gemini inlineData parts

- unify reply path via call_with_tools to reuse multimodal conversion

- keep legacy safety behavior (BLOCK_NONE) and restore safety ratings logging on empty response

- add multimodal request image-part count log for debugging
2026-02-16 13:26:57 +00:00
zhayujie
ce58f23cbc feat: dashscope model name 2026-02-16 20:11:38 +08:00
zhayujie
b6fc9fa370 fix: run script dependency issues 2026-02-15 00:02:50 +08:00
zhayujie
00ae38faae docs: update models in README 2026-02-14 17:36:36 +08:00
zhayujie
ab28ee58ab feat: add doubao-2.0-code model and update README 2026-02-14 16:49:44 +08:00
zhayujie
48db538a2e feat: support Minimax-M2.5, glm-5, kimi-k2.5 2026-02-14 15:27:44 +08:00
zhayujie
46945942e1 feat: support channel start in sub thread 2026-02-13 12:38:52 +08:00
zhayujie
a24b26a1ef Merge pull request #2667 from cowagent/fix-wechatcom-image-support
fix: support image message recognition for WeCom
2026-02-12 16:44:18 +08:00
zhayujie
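6f8421cdd5 fix: support image message recognition for WeCom
- Add a ContextType.IMAGE handling branch to ChatGPTBot
- Add a reply_image() method that supports the OpenAI Vision API
- Automatically Base64-encode images and detect their format
- Automatically clean up temporary files

Fixes #2625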
2026-02-12 12:00:24 +08:00
zhayujie
284cd9bca9 Merge pull request #2666 from cowagent/fix-model-type-validation
fix: handle non-string model_type to prevent AttributeError
2026-02-10 11:31:45 +08:00
cowagent
23fd6b8d2b fix: handle non-string model_type to prevent AttributeError
When numeric model names (e.g., '1') are used with vLLM and configured
in YAML without quotes, they are parsed as integers. This causes
AttributeError when calling startswith() method.

Changes:
- Add type checking for model_type
- Convert non-string model_type to string with warning log
- Prevents crash when using custom numeric model names

Fixes #2664
2026-02-10 11:07:10 +08:00
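The guard described in 23fd6b8d2b can be sketched as follows. The function name and log wording are hypothetical; the behaviour (type check, conversion to string, warning log) follows the change list in the commit message.

```python
import logging

logger = logging.getLogger(__name__)


def normalize_model_type(model_type):
    """Coerce a non-string model name (e.g. an unquoted YAML integer) to str."""
    if model_type is not None and not isinstance(model_type, str):
        logger.warning(
            "model_type %r is %s, converting to string",
            model_type,
            type(model_type).__name__,
        )
        return str(model_type)
    return model_type


# An unquoted `model: 1` in YAML is parsed as the integer 1.
normalized = normalize_model_type(1)
is_gpt = normalized.startswith("gpt")  # no AttributeError after conversion
```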
zhayujie
4f0ea5d756 feat: make web search a built-in tool 2026-02-09 11:37:11 +08:00
zhayujie
6c218331b1 fix: improve skill system prompts and simplify tool descriptions
- Simplify skill-creator installation flow
- Refine skill selection prompt for better matching
- Add parameter alias and env variable hints for tools
- Skip linkai-agent when unconfigured
- Create skills/ dir in workspace on init
2026-02-08 18:59:59 +08:00
zhayujie
cea7fb7490 fix: add intelligent context cleanup #2663 2026-02-07 20:42:41 +08:00
zhayujie
8acf2dbdfe fix: chat context overflow #2663 2026-02-07 20:36:24 +08:00
zhayujie
0542700f90 fix: issues with empty tool calls and handling excessively long tool results 2026-02-07 20:25:05 +08:00
zhayujie
5264f7ce18 fix: getuid not found in windows 2026-02-07 11:17:58 +08:00
zhayujie
051ffd78a3 fix: windows path and encoding adaptation 2026-02-06 18:37:05 +08:00
zhayujie
bea95d4fae Merge pull request #2661 from cowagent/feat-add-claude-opus-4-6
feat: add Claude Opus 4.6 model support
2026-02-06 15:09:49 +08:00
cowagent
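fdf7bc312f feat: add Claude Opus 4.6 model support
- Add the CLAUDE_4_6_OPUS constant to common/const.py
- Add claude-opus-4-6 to MODEL_LIST
- Update the recommended Agent model list in README.md
- Add claude-opus-4-6 support to the Claude configuration notes

Claude Opus 4.6 is the latest model from Anthropic, released on February 5, 2026.
It has stronger planning and coding capabilities, making it suitable as a recommended Agent model.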
2026-02-06 15:07:43 +08:00
vision
5b094e1097 Merge pull request #2660 from cowagent/fix-zhipuai-api-base-support
fix: support custom API base URL configuration for Zhipu AI
2026-02-05 19:18:49 +08:00
cowagent
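9ad3968084 fix: support custom API base URL configuration for Zhipu AI
- Fix ZhipuAiClient being initialized without the base_url parameter
- Make the zhipu_ai_api_base option in the config file take effect
- Support custom API endpoints such as the Zhipu international version (z.ai)
- Fix both chat and image generation
- Add log output to confirm which API address is in use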

Fixes #2659
2026-02-05 19:06:46 +08:00
zhayujie
3958b6aae1 Merge pull request #2657 from cowagent/fix-missing-runtime-info-parameter
fix: pass the missing runtime_info parameter
2026-02-04 22:51:53 +08:00
cowagent
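eaa413caf0 fix: pass the missing runtime_info parameter
Problem:
PR #2655 has been merged, but it missed a key step in the parameter chain. runtime_info is created in agent_initializer.py and passed to create_agent(), but the create_agent() method in agent_bridge.py does not pass it on to the Agent instance, so the dynamic time update does not take effect.

Impact:
- The Agent instance's self.runtime_info is None
- get_full_system_prompt() cannot detect the dynamic time function
- The timestamp stays static and is not updated in real time

Fix:
Add at line 236 of agent_bridge.py:
runtime_info=kwargs.get("runtime_info")

This ensures the complete parameter chain:
agent_initializer → agent_bridge.create_agent → Agent.__init__

---

*From the AI Agent of the [CowAgent](https://github.com/zhayujie/chatgpt-on-wechat) project*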
2026-02-04 22:49:54 +08:00
zhayujie
9095225b5b Merge pull request #2656 from 6vision/master
Update: improve script interaction and configuration
2026-02-04 22:46:02 +08:00
zhayujie
c529f86dbc Merge pull request #2655 from cowagent/fix-runtime-timestamp-update
fix: dynamically update runtime info (timestamp) in the system prompt
2026-02-04 22:38:51 +08:00
cowagent
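e4fcfa356a refactor: use a dynamic function to update runtime info (a more robust approach)
Improvements:
1. builder.py: _build_runtime_section() supports a callable dynamic time function
2. agent_initializer.py: pass in the get_current_time function instead of a static time value
3. agent.py: _rebuild_runtime_section() calls the time function dynamically and rebuilds that section

Advantages:
- Decoupled from the template: does not depend on a specific prompt format
- Robustness: changes to the prompt template do not break the feature
- Backward compatible: static time values are still supported
- Performance: the time is computed only when needed

Compared with the earlier regex-matching approach, this one is cleaner and more maintainable.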
2026-02-04 22:37:19 +08:00
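The callable-based approach in e4fcfa356a can be sketched as below. The name `get_current_time` appears in the commit message; the dict layout of `runtime_info`, its keys, and the output format are assumptions for illustration, and the function is a simplified stand-in for the project's `_build_runtime_section()`.

```python
from datetime import datetime


def get_current_time() -> str:
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")


def build_runtime_section(runtime_info: dict) -> str:
    """Render the runtime block, resolving the time at call time if it is callable."""
    current_time = runtime_info.get("current_time")
    if callable(current_time):
        # Dynamic case: evaluated on every rebuild, so it never goes stale.
        current_time = current_time()
    # Static strings pass through unchanged (backward compatibility).
    return f"Current time: {current_time}\nModel: {runtime_info.get('model')}"


dynamic_info = {"current_time": get_current_time, "model": "demo-model"}
static_info = {"current_time": "2026-02-04 22:19:45", "model": "demo-model"}
section = build_runtime_section(static_info)
```

Because the section is rebuilt from data rather than patched with a regular expression, changing the prompt wording does not affect the time update.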
vision
8218cff7c1 Merge branch 'zhayujie:master' into master 2026-02-04 22:32:20 +08:00
6vision
6949bbcf39 update: Improve script interaction and configuration 2026-02-04 22:31:40 +08:00
cowagent
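480c60c0a7 fix: dynamically update runtime info (timestamp) in the system prompt
Problem:
- system_prompt is fixed when the Agent is initialized, so the time information the model sees goes stale
- In long-running sessions, the model misjudges the current time

Solution:
- Add dynamic update logic to get_full_system_prompt()
- Each time the system prompt is fetched, replace the timestamp in the runtime info using a regular expression
- Leave the other runtime info (model, workspace, etc.) unchanged

Testing:
- Created a test script to verify the dynamic time update
- After waiting 3 seconds the time updated correctly (22:19:45 -> 22:19:48)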
2026-02-04 22:27:24 +08:00
zhayujie
eec10cb5db fix: claude remove toolname 2026-02-04 22:15:10 +08:00
zhayujie
02c83d8689 docs: update agent.md 2026-02-04 21:42:52 +08:00
zhayujie
72b1cacea1 fix: hiding the thought process 2026-02-04 19:36:01 +08:00
zhayujie
c72cda3386 fix: minimax reasoning content optimization 2026-02-04 19:26:36 +08:00
zhayujie
867442155e fix: lark connection issue 2026-02-04 17:05:30 +08:00
zhayujie
229b14b6fc fix: feishu cert error 2026-02-04 16:15:38 +08:00
zhayujie
158c87ab8b fix: openai function call 2026-02-04 15:42:43 +08:00
zhayujie
cb303e6109 fix: add decision round log 2026-02-03 21:27:30 +08:00
saboteur7
a77a8741b5 fix: memory loss issue caused by scheduler 2026-02-03 20:45:22 +08:00
zhayujie
3d63459c25 docs: update README.md 2026-02-03 15:44:00 +08:00
saboteur7
ce63de3c58 feat: release 2.0.0 2026-02-03 14:48:30 +08:00
saboteur7
4b3b1219b5 Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2026-02-03 12:20:04 +08:00
saboteur7
73b069a76c docs: update 2.0 README.md 2026-02-03 12:19:36 +08:00
Saboteur7
101cf8d108 Merge pull request #2653 from 6vision/deploy-script
feat: enhance one-click deployment script with full lifecycle management
2026-02-03 03:18:49 +08:00
saboteur7
2e926dfb6e fix: python 3.8 compatibility issues 2026-02-03 03:17:11 +08:00
saboteur7
501866d12a feat: optimize document and model usage 2026-02-03 02:58:15 +08:00
6vision
39bcb0869f feat: enhance one-click deployment script with full lifecycle management 2026-02-03 02:56:46 +08:00
saboteur7
a7b99cde4e Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2026-02-03 01:18:17 +08:00
saboteur7
60abcd92a3 feat: update README.md and solving Python compatibility issues 2026-02-03 01:17:25 +08:00
zhayujie
cdd36e7052 docs: update README.md 2026-02-03 00:48:03 +08:00
saboteur7
c6ac175ce4 docs: update README.md 2026-02-03 00:43:42 +08:00
zhayujie
46bcd87c23 feat: support minimax M2 models 2026-02-02 23:36:23 +08:00
zhayujie
ab74be8e33 feat: add qwen models tool call 2026-02-02 23:08:24 +08:00
zhayujie
d8298b3eab fix: support glm-4.7 2026-02-02 22:43:08 +08:00
zhayujie
50e60e6d05 fix: bug fixes 2026-02-02 22:22:10 +08:00
zhayujie
5d02acbf37 config: add config template 2026-02-02 14:25:34 +08:00
zhayujie
8901d91f96 feat: startup log optimization 2026-02-02 12:25:47 +08:00
zhayujie
b55021bb3d feat: system Initialization log 2026-02-02 12:18:57 +08:00
zhayujie
0ef51b85e6 Merge branch 'feat-cow-agent' 2026-02-02 12:03:55 +08:00
zhayujie
c77566cc02 fix: adjust the maximum step size 2026-02-02 12:03:16 +08:00
zhayujie
c1bcedfb51 Merge pull request #2652 from zhayujie/feat-cow-agent
feat: cow super agent
2026-02-02 11:59:45 +08:00
zhayujie
d085a3c7d7 fix: dingtalk picture and file process 2026-02-02 11:58:19 +08:00
zhayujie
46fa07e4a9 feat: optimize agent configuration and memory 2026-02-02 11:48:53 +08:00
zhayujie
a8d5309c90 feat: add skills and upgrade feishu/dingtalk channel 2026-02-02 00:42:39 +08:00
zhayujie
77c2bfcc1e fix: scheduler in feishu 2026-02-01 19:40:27 +08:00
zhayujie
4c8712d683 feat: key management and scheduled task tools 2026-02-01 19:21:12 +08:00
zhayujie
d337140577 feat: optimize editing tools 2026-02-01 17:46:43 +08:00
zhayujie
99c273a293 fix: write too long file 2026-02-01 17:29:48 +08:00
zhayujie
85578a06b7 fix: memory edit bug 2026-02-01 17:13:32 +08:00
zhayujie
6f70a8efda fix: fts5 not available bug 2026-02-01 17:08:02 +08:00
zhayujie
c693e39196 feat: improve the memory system 2026-02-01 17:04:46 +08:00
zhayujie
4a1fae3cb4 chore: the bot directory was changed to models 2026-02-01 15:21:28 +08:00
zhayujie
08b592816b Merge pull request #2651 from zhayujie/feat-cow-agent
fix: optimize suggestion words and retries
2026-02-01 14:11:53 +08:00
zhayujie
0e85fcfe51 fix: optimize suggestion words and retries 2026-02-01 14:00:28 +08:00
zhayujie
8ef788e799 Merge pull request #2650 from zhayujie/feat-cow-agent
feat: cow agent
2026-02-01 13:14:00 +08:00
zhayujie
645c8899b1 fix: remove tool 2026-02-01 12:38:00 +08:00
zhayujie
9bf5b0fc48 fix: tool call failed problem 2026-02-01 12:31:58 +08:00
zhayujie
07959a3bff fix: first conversation bug 2026-01-31 17:53:12 +08:00
zhayujie
86a6182e41 fix: add logs 2026-01-31 17:29:32 +08:00
zhayujie
89e229ab75 feat: prompt optimization 2026-01-31 17:13:55 +08:00
zhayujie
624917fac4 fix: memory and path bug 2026-01-31 16:53:33 +08:00
zhayujie
489894c61d fix: path prompt 2026-01-31 16:05:20 +08:00
zhayujie
ac87979cb7 fix: bash prompt optimize 2026-01-31 16:01:37 +08:00
zhayujie
5fd3e85a83 feat: add llm retry 2026-01-31 15:53:24 +08:00
zhayujie
0e53ba4311 fix: gemini error process 2026-01-31 14:59:55 +08:00
Saboteur7
3ce57ef851 Merge pull request #2648 from zhayujie/feat-cow-agent
feat: cow agent core
2026-01-31 13:14:05 +08:00
zhayujie
481570d059 fix: invalid syntax 2026-01-31 13:07:51 +08:00
zhayujie
04442b7ddb fix: prompt optimization and gemini fix 2026-01-31 13:02:58 +08:00
zhayujie
e1a71723bc fix: gemini support api base 2026-01-31 12:50:21 +08:00
zhayujie
f044fb8b47 feat: add feishu websocket mode 2026-01-31 12:32:41 +08:00
zhayujie
e3350d5bec feat: optimize prompts and skill creator 2026-01-31 11:20:57 +08:00
saboteur7
8a69d4354e feat: Optimize the first dialogue and memory 2026-01-30 19:10:37 +08:00
saboteur7
dd6a9c26bd feat: support skills creator and gemini models 2026-01-30 18:00:10 +08:00
saboteur7
49fb4034c6 feat: support skills 2026-01-30 14:27:03 +08:00
saboteur7
5a466d0ff6 fix: long-term memory bug 2026-01-30 11:31:13 +08:00
saboteur7
bb850bb6c5 feat: personal ai agent framework 2026-01-30 09:53:46 +08:00
saboteur7
25cf6823d0 fix: remove useless files 2026-01-29 20:00:23 +08:00
vision
7e12744b8b Merge pull request #2634 from 6vision/master
update: delete some banwords
2025-10-22 18:32:10 +08:00
vision
8f2432e0f8 Merge pull request #2632 from 6vision/banwords-delet
Update: delete some banwords
2025-10-22 17:00:26 +08:00
6vision
94451db638 update: delete some banwords 2025-10-22 16:58:40 +08:00
zhayujie
f8b8eeec3a Merge pull request #2622 from 6vision/support_gpt-5
feat:Support for the GPT-5 series models
2025-08-08 10:47:49 +08:00
6vision
a4260cc5de feat:Support for the GPT-5 series models 2025-08-08 10:24:15 +08:00
zhayujie
8c1622798b Merge pull request #2612 from 6vision/master
docs: expand channel usage
2025-06-29 22:41:10 +08:00
6vision
e75bed1be5 docs: update README.md 2025-06-29 18:34:49 +08:00
vision
8c0517de0f Merge branch 'zhayujie:master' into master 2025-06-29 17:49:44 +08:00
6vision
94e78365a5 docs: expand channel usage 2025-06-29 17:49:26 +08:00
vision
29c056ca65 Merge pull request #2611 from 6vision/web_channel_update
refactor: improve logger message to use dynamic port
2025-06-29 17:20:00 +08:00
vision
d8c57f27db Merge branch 'zhayujie:master' into master 2025-06-29 17:17:59 +08:00
6vision
3cac2bad55 refactor: improve logger message to use dynamic port 2025-06-29 17:12:28 +08:00
vision
e7905fdf49 docs: expand channel usage
Improve channel integration docs
2025-06-26 19:27:11 +08:00
vision
a492bc2242 docs: expand channel usage 2025-06-26 19:24:39 +08:00
zhayujie
e663364f64 Merge pull request #2609 from 6vision/master
docs: update README.md
2025-06-24 20:45:28 +08:00
6vision
ef6466e26f docs: update README.md 2025-06-24 20:33:52 +08:00
6vision
7fcbbf1cdc docs: update README.md 2025-06-24 17:24:01 +08:00
6vision
ec6ad51ff7 docs: update README.md 2025-06-24 17:20:53 +08:00
zhayujie
1e80c59448 docs: update README.md 2025-06-15 17:44:44 +08:00
zhayujie
e48cb4fd5d chore: remove useless files 2025-06-15 17:33:40 +08:00
zhayujie
7c9fbd2625 docs: improve the readme document 2025-06-15 17:31:41 +08:00
zhayujie
0f504415fb docs: optimize the documentation 2025-06-15 12:42:05 +08:00
zhayujie
4998c324d1 fix: remove chat prefix in web channel 2025-06-07 15:30:22 +08:00
zhayujie
fb5fbe76e8 docs: update docs 2025-05-30 17:06:40 +08:00
zhayujie
223b0bfc88 docs: update README.md 2025-05-30 17:05:04 +08:00
vision
51094a68c8 feat: update Gemini models 2025-05-25 17:44:28 +08:00
6vision
83cb1ec911 feat: update Gemini models 2025-05-25 17:39:17 +08:00
vision
a77e4bfb7a Merge pull request #2596 from 6vision/master
feat: support claude-4-opus and claude-4-sonnet models
2025-05-23 17:19:05 +08:00
6vision
654c177333 docs: update readme.md 2025-05-23 17:12:58 +08:00
vision
b92669ba33 Merge branch 'zhayujie:master' into master 2025-05-23 17:08:23 +08:00
6vision
f2e4f6607d feat:support claude-4-opus and claude-4-sonnet models 2025-05-23 17:07:46 +08:00
zhayujie
5ec909c565 docs: update readme.md 2025-05-23 16:54:58 +08:00
vision
a84f31d54a Merge pull request #2592 from thzjy/fix-1037-baidu-voice
fix: fix long-text handling in Baidu speech synthesis
2025-05-23 15:14:11 +08:00
vision
e0dd21406d Update baidu_voice.py 2025-05-23 15:13:28 +08:00
vision
72f5f7a0b8 Merge pull request #2565 from dhyarcher/master
Fix access_token expiration handling by processing expires_in and ref…
2025-05-23 14:31:16 +08:00
zhayujie
e3d20085c5 Merge pull request #2595 from zhayujie/feat-agent-plugin
feat: add agent plugin and optimize web channel
2025-05-23 11:59:54 +08:00
zhayujie
8bf1aef801 docs: add web channel and agent plugin docs 2025-05-23 11:56:41 +08:00
Saboteur7
5f7ade20dc feat: web channel support multiple message and picture display 2025-05-23 00:43:54 +08:00
Saboteur7
70d7e52df0 feat: optimize the agent plugin and the web UI chat page 2025-05-22 17:31:32 +08:00
zhayujie
8e6afa5614 Merge pull request #2593 from zhayujie/feat-web-ui
feat: web ui channel optimization
2025-05-19 11:48:34 +08:00
Saboteur7
a1ae3804e3 feat: web ui channel optimization 2025-05-19 11:41:20 +08:00
thzjy
814ce7a43b fix: fix long-text handling in Baidu speech synthesis 2025-05-18 17:32:17 +08:00
Saboteur7
628f75009e Merge pull request #2591 from zhayujie/feat-web-ui
feat: new web UI channel
2025-05-18 16:57:57 +08:00
Saboteur7
03fc8c1202 feat: web ui channel update 2025-05-18 16:56:50 +08:00
Saboteur7
8c8e996c87 feat: web channel optimization 2025-05-18 15:23:02 +08:00
vision
933bb0b1fb Merge pull request #2579 from 6vision/web_channel_bug_fix
Fix: fix 'NoneType' object does not support item assignment error (#2525)
2025-04-20 17:22:54 +08:00
6vision
931fbc3eb5 fix: fix 'NoneType' object does not support item assignment error (#2525)
### Problem Description
When `context` is `None`, it should not be used for assignment operations.

### Solution
Adjusted the code logic to ensure that `context` is not `None` before performing any item assignment.
2025-04-20 16:27:44 +08:00
Saboteur7
3db5e70a3d docs: Update README.md 2025-04-15 09:54:24 +08:00
zhayujie
7b19b70d90 Merge pull request #2575 from 6vision/master
feat: support gpt-4.1 series models
2025-04-15 09:25:02 +08:00
6vision
99b8103d70 feat: support gpt-4.1 series models 2025-04-15 09:15:13 +08:00
vision
7167310ccd Merge pull request #2571 from 6vision/master
update readme and adjust some dependency packages.
2025-04-11 16:04:55 +08:00
6vision
263667a2d4 update 2025-04-11 16:03:22 +08:00
6vision
d5cef291f6 update readme and adjust some dependency packages. 2025-04-11 15:50:28 +08:00
vision
c8d166e833 Merge pull request #2544 from wahahage/master
Add Tencent voice
2025-04-11 14:14:55 +08:00
vision
6e25782d8b docs: Delete channel/wechat/README.md 2025-04-11 10:23:05 +08:00
vision
c3127f7e84 Merge pull request #2562 from josephier/support_wcferry
feat: add support for WeChat integration via the wcferry protocol
2025-04-09 18:51:01 +08:00
dhyarcher
7b90fb018b Fix access_token expiration handling by processing expires_in and refreshing the token when expired 2025-04-03 10:13:57 +08:00
josephier
e8bc173cd7 doc: Update and rename readme.md to README.md 2025-03-31 19:39:01 +08:00
josephier
4d1cdf5207 doc:update git url 2025-03-30 16:20:04 +08:00
josephier
57a473364e Merge branch 'zhayujie:master' into master 2025-03-30 15:14:45 +08:00
vision
40b62e9d38 Add support for ModelScope API-Inference
2025-03-30 15:12:29 +08:00
gaojia
ead5f9926b Remove funasr 2025-03-27 10:13:38 +08:00
gaojia
814b6753c2 Remove comments from the config file 2025-03-26 17:33:39 +08:00
gaojia
ce505251f8 Change the config file and folder names 2025-03-26 10:01:41 +08:00
yrk
5d2a987aaa Update README.md 2025-03-25 10:38:32 +08:00
yanrk123
4d67e08723 Fix the issue with Chinese description in drawing. 2025-03-18 14:11:22 +08:00
yanrk123
2e71dd5fe2 Fix bug in modelscope_bot.py 2025-03-18 09:47:39 +08:00
yanrk123
c3b9643227 Modify ms_bot.py 2025-03-17 15:46:50 +08:00
josephier
0aad5dc2b7 Update wcferry version
2025-03-16 19:16:59 +08:00
yanrk123
cec900168f Modify model list 2025-03-14 13:56:00 +08:00
josephier
f9b1c403d5 docs: Update readme.md 2025-03-12 20:33:35 +08:00
yrk111222
9024b602f5 Update modelscope_bot.py 2025-03-12 16:15:40 +08:00
yanrk123
c139fd9a57 support stream mode for QwQ-32B 2025-03-12 15:45:52 +08:00
yrk111222
e299b68163 Update const.py 2025-03-11 16:48:37 +08:00
yanrk123
7777a53a82 Add supported model list 2025-03-11 16:34:43 +08:00
yanrk123
3e185dbbfe Add support for ModelScope API 2025-03-11 11:12:57 +08:00
josephier
e8a32af369 docs: add README for wx channel based on wcferry
2025-03-10 20:36:41 +08:00
josephier
7b0ec6687e docs:add README for WechatFerry channel 2025-03-10 20:29:37 +08:00
gaojia
ec1c6c7b92 Add Tencent voice 2025-03-04 09:56:26 +08:00
josephier
8dfaa86760 chore: remove incomplete features for WechatFerry 2025-02-14 00:41:31 +08:00
josephier
323aebd1be feat: add support for WeChat integration via WechatFerry 2025-02-14 00:25:09 +08:00
Saboteur7
436c038a2f fix: temporarily remove unavailable channels 2025-02-05 12:25:30 +08:00
vision
ccd50ec6c0 Merge pull request #2485 from 6vision/master
feat: Add support for deepseek-chat and deepseek-reasoner models
2025-02-04 10:29:24 +08:00
6vision
a7541c2c0f feat: Support #model directive to set model to deepseek-chat and deepseek-reasoner 2025-02-03 21:23:05 +08:00
Saboteur7
c3a57d756c fix: remove channel restrictions 2025-01-31 00:27:20 +08:00
Saboteur7
aa300a4c98 fix: temporarily close the wx channel to prevent account ban 2025-01-17 17:24:42 +08:00
vision
83ea7352b9 Merge pull request #2430 from PJ-568/master
fix: domain type of xunfei lite
2025-01-15 20:03:43 +08:00
Saboteur7
9050712cd8 Update README.md 2024-12-28 16:28:35 +08:00
Saboteur7
8d92fdbb6e Update README.md 2024-12-28 16:27:31 +08:00
zhayujie
a2442ec1b9 Merge pull request #2435 from 6vision/master
fix: resolve display issue for replies containing only image URLs
2024-12-27 00:02:55 +08:00
vision
71662c9cd9 Merge branch 'zhayujie:master' into master 2024-12-26 23:17:21 +08:00
vision
54ff5dbcc2 fix: resolve display issue for replies containing only URLs 2024-12-26 23:16:05 +08:00
zhayujie
4ab7bd3b51 Merge pull request #2431 from 6vision/support-GiteeAI
feat: add gitee-ai models that are compatible with openai format
2024-12-24 20:42:17 +08:00
vision
ef3c61a297 update readme 2024-12-24 19:57:26 +08:00
vision
abf79bf60c add gitee-ai model resources that are compatible with openai format 2024-12-21 17:24:32 +08:00
PJ568
5d3cecd926 fix: domain type of xunfei lite
Reference: the `parameter.chat` section of the [Web API interface description](https://www.xfyun.cn/doc/spark/Web.html#_1-%E6%8E%A5%E5%8F%A3%E8%AF%B4%E6%98%8E).
2024-12-20 14:46:25 +08:00
Saboteur7
16324e7283 Merge pull request #2407 from ayasa520/fix_reloadp
fix(plugin): fix reloadp command not taking effect
2024-12-13 15:39:33 +08:00
Saboteur7
9f7e2e1572 Merge pull request #2413 from ayasa520/fix-scanp
fix: Memory leak caused by scanp command due to handler's reference of plugin instance
2024-12-13 14:57:22 +08:00
vision
857ce1d530 Merge pull request #2398 from stonyz/web-channel
Add web channel
2024-12-13 11:45:01 +08:00
vision
be0d72775d Merge pull request #2423 from 6vision/reedme_update_docker_deploy
update readme
2024-12-13 11:41:17 +08:00
vision
7832a2495b Merge pull request #2422 from printlndarling/master
add: add gemini-2.0-flash-exp model
2024-12-13 11:35:26 +08:00
6vision
0506b7f735 update readme 2024-12-13 11:25:36 +08:00
繁星_逐梦
4c0b7942f0 add: gemini-2.0-flash-exp model 2024-12-12 22:22:14 +08:00
繁星_逐梦
651c840c4a add: gemini-2.0-flash-exp model 2024-12-12 22:19:13 +08:00
rikka
2a351ca415 fix(reloadp): clear handlers when reloading plugin to avoid memory leaks 2024-12-05 00:33:00 +08:00
rikka
49b7106d71 fix: Memory leak caused by scanp command due to handler's reference to plugin instance.
close #2412
2024-12-03 22:39:56 +08:00
zhayujie
8bf633f539 Merge pull request #2408 from 6vision/fix-summary-image
Optimize image recognition logic
2024-12-02 21:53:52 +08:00
6vision
0f8efcb4b0 Optimize image recognition logic 2024-12-02 21:16:59 +08:00
Rikka
c567641c5c fix(plugin): fix reloadp command not taking effect
- Use write_plugin_config() instead of directly modifying plugin_config dict
- Add remove_plugin_config() to clear plugin config before reload
- Update plugins to use pconf() and write_plugin_config() for better config management
2024-12-02 16:38:21 +08:00
vision
bdc3820382 Merge pull request #2405 from 6vision/role-plugin-linkai
Linkai bot is compatible with the role plugin.
2024-12-02 12:16:30 +08:00
6vision
33a69a7907 Linkai bot is compatible with the role plugin. 2024-12-02 12:13:26 +08:00
vision
a4d0e9bbc3 Merge pull request #2401 from 6vision/plugins_source_update
Update plugin list
2024-11-29 11:09:27 +08:00
6vision
afc753e1d2 Update plugin list 2024-11-29 11:07:16 +08:00
zhayujie
e641a41224 Update README.md 2024-11-28 21:48:42 +08:00
vision
79305c0632 Merge pull request #2400 from 6vision/readme_update
readme update
2024-11-28 12:59:00 +08:00
6vision
ef2ce3f09d Update documentation 2024-11-28 12:41:00 +08:00
Stony
71c18c04fc Add web channel 2024-11-27 08:53:13 +08:00
Saboteur7
cf84e57f81 fix: add exception handling 2024-11-15 11:58:10 +08:00
vision
9421d44579 Merge pull request #2373 from 6vision/summary_app_code
By using app code, supports custom summary prompt.
2024-11-07 20:16:53 +08:00
6vision
5cd2ae8cc8 Summary supports app_code 2024-11-06 21:45:03 +08:00
vision
22d67b3a59 Merge pull request #2364 from 6vision/1031
1.7.3 release readme
2024-10-31 14:44:55 +08:00
6vision
e102cbb8c4 1.7.3 release readme 2024-10-31 14:39:11 +08:00
vision
d90eeb7ee4 Merge pull request #2363 from 6vision/linkai_plugin
Summary and MJ support can be configured through LinkAI platform app plugins
2024-10-31 11:50:53 +08:00
vision
1989d53031 Merge pull request #2361 from 6vision/claude_model_update
Claude model update
2024-10-31 11:50:11 +08:00
6vision
04ef0907b4 Summary and MJ support can be configured through LinkAI platform app plugins. 2024-10-31 11:15:44 +08:00
6vision
517b43561c Merge branch 'claude_model_update' of git@github.com:6vision/chatgpt-on-wechat.git into claude_model_update 2024-10-28 00:32:46 +08:00
6vision
ccb8c7227f Support setting base URL and proxy for Claude model. Also support reset command. 2024-10-28 00:32:05 +08:00
vision
9fbfeeb04f Merge branch 'zhayujie:master' into claude_model_update 2024-10-27 23:43:16 +08:00
6vision
8b753a5a1f Signed-off-by: 6vision <vision_wangpc@sina.com> 2024-10-27 21:44:06 +08:00
6vision
d25cab0627 Claude model supports system prompts. 2024-10-27 21:37:58 +08:00
6vision
84da0a8a35 feat:update claude-35-sonnet model 2024-10-24 20:57:03 +08:00
vision
6f665cffba Merge pull request #2354 from 6vision/group_patpat_note
fix: group patpat notes
2024-10-24 19:53:18 +08:00
6vision
aea8ac2e97 Signed-off-by: 6vision <vision_wangpc@sina.com> 2024-10-24 19:48:50 +08:00
vision
8418fa7b45 Merge pull request #2344 from 6vision/markdown_format_display
Optimize markdown format display
2024-10-21 10:27:03 +08:00
6vision
9cc4d0ee07 Optimize markdown format display 2024-10-21 10:23:39 +08:00
Saboteur7
da60831c44 fix: fixed the version of qrcode dependency 2024-10-19 16:14:49 +08:00
Saboteur7
0773174a20 Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2024-10-19 15:55:04 +08:00
Saboteur7
70e007d8ca fix: try to solve the unresponsiveness problem 2024-10-19 15:49:57 +08:00
vision
fcc4d02c2f Merge pull request #2339 from 6vision/master
Optimize Gemini model character statistics
2024-10-14 12:19:27 +08:00
vision
f4a5f00593 Merge branch 'zhayujie:master' into master 2024-10-14 12:18:33 +08:00
6vision
1170ed6566 Optimize Gemini model character statistics 2024-10-14 12:17:10 +08:00
zhayujie
883f0d449b Merge pull request #2317 from 6vision/master
feat: add install.sh and run.sh
2024-09-26 16:43:56 +08:00
6vision
f4c62e7844 update install.sh url 2024-09-26 16:43:12 +08:00
6vision
f0d212a9d2 Merge branch 'master' of github.com:6vision/chatgpt-on-wechat 2024-09-26 16:02:19 +08:00
6vision
76a8974034 update run.sh 2024-09-26 16:01:44 +08:00
vision
0614e822f4 Merge branch 'zhayujie:master' into master 2024-09-26 13:07:45 +08:00
vision
6f682c9a2e Merge pull request #2311 from cmgzn/master
fix: gemini doesn't receive system messages...
2024-09-26 13:04:47 +08:00
6vision
a9fdbc31c5 update date 2024-09-26 13:02:38 +08:00
cmgzn
086fdb5856 fix gemini logger 2024-09-26 02:49:52 +01:00
6vision
63c8ef4f17 feat: install.sh and run.sh 2024-09-26 00:34:52 +08:00
zhayujie
736f6523c7 Merge branch 'master' into master 2024-09-25 23:11:13 +08:00
vision
8b0b360d25 Merge pull request #2288 from KuroIVeko/patch-3
Support more models from Zhipu AI
2024-09-25 22:28:16 +08:00
vision
80b84e2ee6 Merge pull request #2277 from KuroIVeko/patch-1
Lower Gemini's safety thresholds
2024-09-25 22:24:20 +08:00
vision
b5b7d86f7b Merge pull request #2278 from 6vision/moonshoot
fix: "model":"mooshoot", which defaults to "moonshot-v1-32k".
2024-09-25 22:10:40 +08:00
cmgzn
f20d704390 fix: gemini doesn't receive system messages; change session to gpt method, add system messages as user messages to the gemini, and logging historical messages 2024-09-20 09:10:21 +01:00
vision
e4e1e2e944 Merge pull request #2306 from 6vision/master
fix: Linkai voice configuration
2024-09-18 19:43:41 +08:00
vision
6bc7eeb4cc Merge branch 'zhayujie:master' into master 2024-09-18 19:41:23 +08:00
6vision
656ed5de7b fix: LinkAI voice configuration 2024-09-18 19:40:51 +08:00
zhayujie
a11d695c78 Merge pull request #2300 from 6vision/master
feat: support o1-preview and o1-mini model
2024-09-13 10:50:04 +08:00
6vision
c4f9acd5c5 update 2024-09-13 10:48:51 +08:00
6vision
5ef929dc42 o1 model support #model 2024-09-13 10:21:38 +08:00
6vision
c8cf27b544 feat: support o1-preview and o1-mini model 2024-09-13 10:13:23 +08:00
vision
bb5ecfc398 Merge pull request #2298 from 6vision/error_print_ascii_windows
Handle ASCII QR code print error on Windows
2024-09-11 22:35:30 +08:00
6vision
c91e7c35bb Remove unused imports 2024-09-11 22:34:33 +08:00
6vision
532d56df2d Handle ASCII QR code print error on Windows 2024-09-11 22:30:25 +08:00
KurolVeko
111ad44029 Update const.py 2024-09-05 11:07:06 +08:00
KurolVeko
6b02bae957 Update bridge.py 2024-09-05 10:59:57 +08:00
vision
6831743416 Merge pull request #2286 from 6vision/gpt
feat: support gpt-4o-2024-08-06 model
2024-09-04 18:44:08 +08:00
6vision
63e2f42636 feat: support gpt-4o-2024-08-06 model 2024-09-04 18:39:29 +08:00
6vision
f6e6805453 fix: "model":"mooshoot", which defaults to "moonshot-v1-32k". 2024-08-31 16:09:10 +08:00
KurolVeko
ad77ad8f2b Lower Gemini's safety thresholds
Gemini's default safety thresholds are set too high, resulting in frequent censorship of generated text. I have lowered the thresholds for all four safety categories according to Google's documentation.
2024-08-30 17:00:51 +08:00
Saboteur7
469524e8ae Merge pull request #2206 from VanJohnPK/master
fix Azure voice service error
2024-08-29 11:33:49 +08:00
Saboteur7
f4f55d5dfd Merge pull request #2247 from byang822/abacusoft-alex
wenxin character model supports prompt
2024-08-29 11:31:45 +08:00
Saboteur7
c248d0f3f4 Merge pull request #2262 from 6vision/cancel_wecom_subscribe
Cancel subscribe_msg of wechatcomapp channel
2024-08-29 11:31:04 +08:00
Saboteur7
648a04b513 Merge pull request #2265 from 6vision/feat0825
Support configuration whether to be @ in group chat.
2024-08-29 11:30:46 +08:00
vision
bdc86c16ec Merge pull request #2268 from 6vision/xunfei_system_prompt
Xunfei supports system prompt(character_desc).
2024-08-27 20:46:07 +08:00
6vision
21efd17c17 Xunfei supports system prompt(character_desc). 2024-08-25 22:22:29 +08:00
Saboteur7
aaa75e7b62 Merge pull request #2267 from 6vision/master
Optimize the welcome message for new members.
2024-08-25 17:16:11 +08:00
6vision
6d0cef3152 Optimize the welcome message for new members. 2024-08-25 17:10:44 +08:00
Saboteur7
c18472289f Merge pull request #2207 from Abyss-Seeker/master
Support WeChat clients in more languages (English)
2024-08-25 16:10:33 +08:00
6vision
02b7c70a81 Support configuration whether to be @ in group chat. 2024-08-25 15:13:25 +08:00
6vision
4eaa2b93c6 Cancel subscribe_msg of wechatcomapp channel 2024-08-22 22:03:04 +08:00
darkVinci
d347905373 Merge pull request #1 from zhayujie/master
merge 15 commits
2024-08-21 11:21:31 +08:00
vision
f495213b2c Merge pull request #2237 from 6vision/fix_role
Optimize log information printing
2024-08-17 17:01:08 +08:00
Alex Yang
9b125913ae wenxin character model supports prompt 2024-08-16 14:58:17 +08:00
6vision
da81f05804 Optimize log information printing 2024-08-14 23:03:57 +08:00
Abyss-Seeker
9a371a4d4d Update wechat_message.py
Add more English adaptations (joining a group chat via QR code)
2024-08-06 23:30:32 +08:00
Abyss-Seeker
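1e92828f1a Support more languages (English)
Added the notes_join_group, notes_exit_group, and notes_patpat lists so that more strings can be matched in join-group, leave-group, and pat-pat messages. English matching (invited, removed, tickled) is now in place, so these messages are recognized correctly when the WeChat language is English. More languages can be supported later by extending the lists and the conditional checks.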
2024-08-04 10:14:23 +08:00
Saboteur7
7e724b3fa3 Update README.md 2024-08-02 16:06:25 +08:00
vision
3f5b976a87 Merge pull request #2181 from 6vision/webp_images
Support images in webp format.
2024-08-02 13:47:39 +08:00
vision
49f2339cc2 Merge pull request #2203 from 6vision/fix_issues
Fix issues
2024-08-02 13:30:14 +08:00
vision
29f1699de8 Merge pull request #2198 from 6vision/update_spark
Support Spark4.0 Ultra model, optimize model configuration.
2024-08-02 01:38:15 +08:00
6vision
c415485801 Support Spark4.0 Ultra model, optimize model configuration. 2024-08-01 17:57:48 +08:00
zhayujie
6937673472 Merge pull request #2193 from 6vision/fix_tool
Default close tool plugin.
2024-07-31 14:09:33 +08:00
6vision
c4f10fe876 fix: Default close tool plugin. 2024-07-31 00:01:56 +08:00
6vision
55ca652ad8 Default close tool plugin. 2024-07-30 23:14:23 +08:00
Zheng
3effd5afd1 fix azure voice error 2024-07-30 17:10:02 +08:00
Saboteur7
000c2029de fix: remove some tools 2024-07-30 12:35:12 +08:00
Saboteur7
ab88e3af06 fix: remove some default tools 2024-07-30 12:15:35 +08:00
6vision
b544a4c954 fix: Use default expiration time for ExpiredDict if not set in config 2024-07-29 20:14:41 +08:00
6vision
baff5fafec Optimization 2024-07-28 00:03:16 +08:00
6vision
1673de73ba Role plugin supports more bots. 2024-07-25 22:58:57 +08:00
6vision
e68936e36e Support images in webp format. 2024-07-25 01:19:44 +08:00
6vision
7dbd195e45 Support images in webp format. 2024-07-25 01:12:53 +08:00
vision
3dc22f98bf Merge pull request #2177 from 6vision/Opti-azure-dalle
Optimize error messages when using Azure Dalle
2024-07-24 12:38:13 +08:00
6vision
805e870c18 Optimize error messages when using Azure Dalle 2024-07-24 00:06:18 +08:00
Saboteur7
de2c031797 docs: update readme 2024-07-19 15:46:19 +08:00
Saboteur7
3aa571aa1b Merge pull request #2163 from 6vision/wechatcom_app
Ensure compatibility for /wxcomapp URL with trailing slash
2024-07-19 15:38:20 +08:00
Saboteur7
3e4969efe6 Merge branch 'master' into wechatcom_app 2024-07-19 15:38:08 +08:00
Saboteur7
446e94df76 Merge pull request #2164 from 6vision/mini_bot
Support gpt-4o-mini model
2024-07-19 15:37:30 +08:00
Saboteur7
5b26066a4c Merge pull request #2154 from distiny-cool/ali_api
Add an engine that uses Alibaba Cloud for speech recognition
2024-07-19 15:37:05 +08:00
Saboteur7
8a80de5c3f Merge pull request #2141 from Yanyutin753/new
Upgrade PictureChange plugin features
2024-07-19 15:36:02 +08:00
6vision
52a490c87e Support gpt-4o-mini model 2024-07-19 11:04:45 +08:00
6vision
29490741fd Ensure compatibility for /wxcomapp URL with trailing slash 2024-07-18 23:21:45 +08:00
kody
f0e416455f Add an engine that uses Alibaba Cloud for speech recognition 2024-07-15 22:03:31 +08:00
vision
f7a2c97943 Merge pull request #2153 from 6vision/update_linkaibot
support more file types.
2024-07-15 19:09:05 +08:00
6vision
993853757b Linkai bot supports more file types. 2024-07-15 18:57:58 +08:00
6vision
a3abfb987d update 2024-07-15 18:50:38 +08:00
Saboteur7
2711fa1b1b Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2024-07-08 19:00:03 +08:00
Saboteur7
1f7afaba07 fix: client cmd config bug 2024-07-08 18:57:27 +08:00
Clivia
e02c8bff81 Upgrade PictureChange plugin features 2024-07-08 17:58:59 +08:00
Saboteur7
22391ba1a5 Update README.md 2024-07-05 15:45:54 +08:00
Saboteur7
a05781ec19 Merge pull request #2103 from 6vision/claude-3.5-sonnet
feat: support claude-3.5-sonnet model
2024-07-05 14:39:17 +08:00
Saboteur7
f898ed6a2a Merge branch 'master' into claude-3.5-sonnet 2024-07-05 14:32:45 +08:00
Saboteur7
e6d0a15b54 Merge pull request #2110 from He0607/新增高铁(火车)票查询插件
Add high-speed rail (train) ticket query plugin
2024-07-05 14:31:15 +08:00
Saboteur7
49cff026e2 Merge pull request #2113 from 6vision/update-0626
Update parameter descriptions for clarity
2024-07-05 14:26:33 +08:00
Saboteur7
08f0023cfd Merge pull request #2124 from 6vision/update_gemini_model
Update gemini 1.5 model
2024-07-05 14:26:13 +08:00
Saboteur7
e311466ee6 Merge pull request #2128 from Maroon9/fix-docker-compose
fix: add timezone setting to the docker-compose.yml file
2024-07-05 14:25:56 +08:00
wanxiangze
56789e68d7 fix: add timezone setting to the docker-compose.yml file 2024-07-05 10:18:21 +08:00
6vision
87525bb383 update gemini model 2024-07-04 01:44:53 +08:00
6vision
bb2880191a update gemini model 2024-07-04 01:22:55 +08:00
6vision
4f1acf26d6 Merge branch 'update-0626' of https://github.com/6vision/chatgpt-on-wechat into update-0626 2024-06-27 21:11:14 +08:00
6vision
fc2d6b21ac update 2024-06-27 21:09:54 +08:00
zhayujie
b9e84fefbd Merge pull request #2114 from 6vision/fix_dingtalk_group_chat
fix: dingtalk channel group chat bug
2024-06-27 10:29:51 +08:00
6vision
91f5ffb2d9 Correct the log information 2024-06-26 22:34:35 +08:00
6vision
70ff2341cb fix:dingtalk channel group chat bug 2024-06-26 22:10:58 +08:00
vision
74eed93497 Merge branch 'zhayujie:master' into update-0626 2024-06-26 15:15:32 +08:00
6vision
d02e26c014 Update parameter descriptions for clarity 2024-06-26 15:14:29 +08:00
Wu_Cool
523cade7c3 Add high-speed rail (train) ticket query plugin 2024-06-26 09:13:40 +08:00
Wu_Cool
e22c183ca9 Add high-speed rail (train) ticket query plugin 2024-06-26 09:11:04 +08:00
vision
3afd99da30 Merge pull request #2106 from 6vision/fix_sensitive
Fix TypeError in config drag_sensitive function
2024-06-24 22:04:56 +08:00
6vision
f44979f983 Fix TypeError in config drag_sensitive function 2024-06-24 21:57:58 +08:00
6vision
095f9cc108 feat: support claude-3.5-sonnet model 2024-06-24 11:20:50 +08:00
zhayujie
1089076fce Merge pull request #2044 from Wang-zhechao/add-plugins-solitaire
Add WeChat solitaire plugin
2024-06-20 20:41:37 +08:00
Saboteur7
cad3b691a9 Update README.md 2024-06-20 16:09:19 +08:00
Saboteur7
bac21426d3 fix: minimax model list 2024-06-20 15:26:16 +08:00
Saboteur7
c4a35314cd Merge pull request #2071 from lmy668/master
feat#add minmax model
2024-06-20 15:21:41 +08:00
Saboteur7
7090722565 Merge branch 'master' into master 2024-06-20 15:21:20 +08:00
Saboteur7
6d972c7c18 Merge pull request #2046 from 6vision/update_mode_list
Update mode list
2024-06-20 15:09:05 +08:00
Saboteur7
6961a88feb Merge pull request #2060 from k8scat/remove-unused-import
remove unused import
2024-06-20 15:06:44 +08:00
6vision
c41ec13984 fix terminal channel 2024-06-15 16:34:32 +08:00
6vision
ca8e06e562 Support third-party services compatible with the OpenAI request format; add the "bot_type": "chatGPT" setting to config.json in the root directory 2024-06-13 16:43:03 +08:00
limy26
200cd33a8e feat#add minmax model 2024-06-12 19:30:24 +08:00
6vision
1da7991c65 fix 2024-06-08 00:09:05 +08:00
K8sCat
fdfb7e369a remove unused import
Signed-off-by: K8sCat <k8scat@gmail.com>
2024-06-07 14:48:54 +08:00
6vision
c2b01cc957 Add configuration to plugin configuration template. 2024-06-05 17:10:08 +08:00
6vision
5de8e94bb4 update readme 2024-06-05 01:25:03 +08:00
6vision
7a2c15d912 Update model list 2024-06-05 00:44:08 +08:00
Wang Zhechao
70344dd214 Add WeChat solitaire plugin 2024-06-04 22:39:59 +08:00
zhayujie
405372d1a7 Merge pull request #1753 from MasterKeee/master
Add video reply type for official accounts
2024-06-04 14:25:11 +08:00
Saboteur7
b8c5174da5 docs: xunfei voice comment 2024-06-04 13:49:44 +08:00
Saboteur7
1f6f9103d9 docs: update README.md 2024-06-04 12:50:59 +08:00
Saboteur7
6431487c7a fix: drag sensitive bug 2024-06-04 12:02:23 +08:00
Saboteur7
8b2d1189db Merge pull request #1999 from njnuko/voice-xunfei
add xunfei voice
2024-06-04 11:43:55 +08:00
Saboteur7
b777f27cb7 chore: remove some xunfei voice log 2024-06-04 11:42:05 +08:00
Saboteur7
b31c3b124a Merge pull request #1972 from Undertone0809/zeeland/add-logger-drag-sensitive
feat: add logger drag sensitive
2024-06-04 11:26:05 +08:00
Saboteur7
fa1e965fba feat: add dingtalk card switch 2024-06-04 11:23:45 +08:00
Saboteur7
91dc8b4d58 Merge pull request #1994 from baojingyu/feat-05-17
Add streaming output support and receiving of voice, image, or rich-text messages to the DingTalk integration
2024-06-04 10:53:02 +08:00
Saboteur7
6d16ea8830 Update requirements.txt 2024-06-04 10:49:17 +08:00
Saboteur7
7db4253264 Update chat_channel.py 2024-06-04 10:47:56 +08:00
Saboteur7
4d2b7d9bf9 Update chat_channel.py 2024-06-04 10:47:05 +08:00
Saboteur7
8f6f4acb88 Update chat_channel.py 2024-06-04 10:43:19 +08:00
Saboteur7
f20d84cb37 Merge pull request #1809 from whw23/master
Azure OpenAI Dalle fix
2024-06-03 22:46:07 +08:00
Saboteur7
afbdf1d5d5 Merge pull request #2002 from 6vision/time_check
fix: time_check model
2024-06-03 22:40:01 +08:00
Haowei
bc8364d594 Merge branch 'zhayujie:master' into master 2024-05-25 23:34:47 +08:00
vision
c8d388f70f Merge pull request #2013 from 6vision/fix_baidu_voice
Changed sampling rate
2024-05-23 01:36:00 +08:00
6vision
be13cc3194 Changed sampling rate 2024-05-23 01:34:20 +08:00
vision
a46320e744 Merge pull request #2012 from 6vision/fix_issue_1959_
Fix issue 1959: wenxin model returns an error
2024-05-22 21:45:20 +08:00
6vision
071709d263 fix: 1959 - Baidu Wenxin occasionally returns error 336006 2024-05-22 16:01:46 +08:00
6vision
93a32ae5ff Fix a bug that occurs when a model request fails 2024-05-22 15:57:22 +08:00
vision
eee96f226f Merge pull request #2005 from 6vision/fix_baidu_voice
fix: baidu voice bug
2024-05-21 22:38:54 +08:00
6vision
e19a8b479c fix: baidu voice bug 2024-05-21 22:32:35 +08:00
6vision
9ef459112e fix: time_check model 2024-05-20 20:37:00 +08:00
Haowei
e96474bd5c Merge branch 'zhayujie:master' into master 2024-05-20 16:53:02 +08:00
njnuko
6fed719e09 add Xunfei Voice
Signed-off-by: njnuko <njnuko@163.com>
2024-05-20 15:04:23 +08:00
zhayujie
99aac76618 docs: update readme 2024-05-18 19:03:17 +08:00
baojingyu
599f458201 Update plugins source.js: add a plugin that implements AI drawing with midjourney 2024-05-17 15:38:19 +08:00
baojingyu
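2f8099059c Fix a bug where chat_channel read config parameter values incorrectly; optimize dingtalk_channel replies with typewriter-style streaming AI cards and dingtalk_message receiving of image or rich-text messages 2024-05-17 14:48:52 +08:00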
zhayujie
e24f177832 Merge pull request #1993 from 6vision/fix_linkai_pconf
fix: linkai plugin config_template
2024-05-17 01:25:30 +08:00
6vision
48cc143e88 fix: linkai plugin config_template 2024-05-17 01:22:38 +08:00
zhayujie
b09b46c045 fix: summary switch bug 2024-05-14 17:48:18 +08:00
zhayujie
2c6583cc9c fix: summary switch bug 2024-05-14 17:26:10 +08:00
zhayujie
e381d1bfb8 feat: support gpt-4o model 2024-05-14 09:50:03 +08:00
zeeland
eac619d54f feat: add logger drag sensitive 2024-05-13 19:53:33 +08:00
zhayujie
a6ef3bc0ce fix: add channel login exception log 2024-05-08 12:54:13 +08:00
zhayujie
118122c541 docs: update README.md 2024-05-08 12:07:59 +08:00
zhayujie
bfdf33ac09 Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2024-05-07 11:37:53 +08:00
zhayujie
fa3370df5b fix: image model check 2024-05-07 11:37:27 +08:00
zhayujie
f1e51672c5 Merge pull request #1944 from alvinsuDL/patch-1
Update README.md
2024-05-07 11:20:43 +08:00
alvinsuDL
91f97b2728 Update README.md 2024-05-07 11:16:41 +08:00
zhayujie
2c542e03fe Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2024-05-07 11:10:41 +08:00
zhayujie
71a11b4267 feat: support mj client config 2024-05-07 11:09:49 +08:00
zhayujie
ea642757db docs: update README.md 2024-05-06 22:19:49 +08:00
zhayujie
fb72b601aa fix: model config 2024-05-03 19:41:12 +08:00
zhayujie
27e507e744 fix: update client sdk version 2024-05-03 19:10:27 +08:00
zhayujie
4db19f816f feat: update service url 2024-05-03 14:10:07 +08:00
zhayujie
096d5776d1 feat: v1.6.0 version update 2024-04-26 16:13:53 +08:00
zhayujie
3d799eb4d9 Merge pull request #1893 from uxfion/fix-openai-whisper
fix openai voice_to_text whisper
2024-04-26 15:37:34 +08:00
zhayujie
e4ac3afa4d Merge pull request #1849 from wayshall/kimi
feat: add moonshot API integration
2024-04-26 15:17:52 +08:00
zhayujie
d38e4eed5b Merge pull request #1904 from fatwang2/master
Add URL parsing logic to fix share cards that could not be parsed in itchat
2024-04-20 11:09:51 +08:00
fatwang2
97787fac91 Add URL parsing logic to fix share cards that could not be parsed in itchat 2024-04-20 00:48:33 +08:00
Lecter
b494ee2f1c fix openai voice_to_text whisper 2024-04-14 14:33:17 +08:00
zhayujie
31ac80a074 Merge pull request #1851 from wayshall/qwen-dashscope
feat: implement Tongyi Qianwen with the new SDK
2024-04-09 16:06:33 +08:00
zhayujie
c8896450f6 fix: add warn log in glm 2024-04-09 15:57:59 +08:00
zhayujie
c662fa4c63 Merge pull request #1871 from cgnannan/master
Fix the elevenlabs SDK update problem mentioned in issue #1868
2024-04-09 15:52:35 +08:00
zhayujie
db2ee802ca chore: log optimization 2024-04-09 15:35:18 +08:00
Haowei
d40e915e2b Merge branch 'zhayujie:master' into master 2024-04-09 11:31:57 +08:00
zhayujie
c0616e7efa Merge pull request #1881 from 6vision/feat_local
Optimize the Hello plugin: support custom welcome-message prompts and different fixed welcome messages for different groups
2024-04-09 10:46:22 +08:00
6vision
01660597e3 Merge branch 'feat_local' of git@github.com:6vision/chatgpt-on-wechat.git into feat_local 2024-04-08 23:09:08 +08:00
6vision
c5b549f450 Optimize the hello plugin 2024-04-08 23:06:35 +08:00
vision
802d8457bb Merge branch 'zhayujie:master' into feat_local 2024-04-08 23:05:39 +08:00
zhayujie
c3a3df67b0 Merge pull request #1847 from Yanyutin753/master
fix bug where a ReplyType.IMAGE reply had an empty image
2024-04-08 12:15:49 +08:00
6vision
5798aeb3cd Merge branch 'update-hello' of git@github.com:6vision/chatgpt-on-wechat.git into feat_local 2024-04-07 22:34:52 +08:00
6vision
cc81dd9172 Signed-off-by: 6vision <vision_wangpc@sina.com> 2024-04-07 22:31:08 +08:00
Haowei
44fdadda08 Merge branch 'zhayujie:master' into master 2024-04-07 14:54:48 +08:00
zhayujie
66a014150b fix: config update bug 2024-04-06 01:03:26 +08:00
zhayujie
1da596639f feat: update sdk version 2024-04-06 00:19:22 +08:00
zhayujie
76614ae9e5 fix: remote config load bug 2024-04-05 23:47:02 +08:00
cgnannan
6ddddffc0f update SDK version of elevenlabs and corresponding code snippets. 2024-04-01 06:26:39 +00:00
unknown
dd95f849d4 Merge branch 'master' of https://github.com/whw23/chatgpt-on-wechat 2024-03-30 01:08:07 +08:00
unknown
22c7f8fe9e add dall-e-2 retry_count limit 2024-03-30 01:07:52 +08:00
Haowei
3d47be1f49 Merge branch 'zhayujie:master' into master 2024-03-30 00:54:38 +08:00
weishao zeng
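5e399c46b1 feat: implement Tongyi Qianwen with the new SDK
The project currently uses the old Bailian SDK for Tongyi Qianwen;
this adds an implementation based on the new SDK (dashscope)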
2024-03-27 19:12:39 +08:00
weishao zeng
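38e1db7a37 feat: add moonshot API integration
moonshot could use the openai SDK directly,
but it requires openai SDK 1.0 or later, which conflicts with this project,
so it is integrated through the HTTP interface instead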
2024-03-27 15:02:51 +08:00
Clivia
8309f7cdbe feat: fix bug where a ReplyType.IMAGE reply had an empty image 2024-03-27 14:49:54 +08:00
zhayujie
b8cc62ae95 Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2024-03-27 10:35:42 +08:00
zhayujie
c0eb433fa2 fix: remove unused import 2024-03-27 10:35:12 +08:00
zhayujie
7f857d66f6 docs: update README.md 2024-03-26 20:12:25 +08:00
zhayujie
93b14d38f4 Merge pull request #1837 from dividduang/master
blackroom
2024-03-26 16:10:18 +08:00
zhayujie
21825faab0 docs: update README.md 2024-03-26 16:01:05 +08:00
zhayujie
1fafd39298 fix: gemini session bug 2024-03-26 00:06:50 +08:00
WILMAR\dengjingren
23b750fc4f blackroom 2024-03-25 21:56:26 +08:00
zhayujie
90581c840d Merge pull request #1760 from xiexin12138/feature-优化智谱-AI-的命令操作
add feature: optimize Zhipu AI command handling to support resetting the session
2024-03-25 21:43:23 +08:00
zhayujie
cac7a6228a fix: claude api optimize 2024-03-25 21:41:40 +08:00
zhayujie
674fbc3f69 Merge pull request #1810 from FB208/master
Add a method for calling the claude API
2024-03-25 20:42:59 +08:00
zhayujie
9577bf1cc7 Merge pull request #1724 from stx116/patch-1
Update xunfei_spark_bot.py: switch the Xunfei large language model to version 3.5
2024-03-25 15:31:48 +08:00
zhayujie
654ebe93e7 Merge branch 'master' into patch-1 2024-03-25 15:31:38 +08:00
zhayujie
ecb1b3c491 Merge pull request #1763 from JobsLee0/master
Upgrade the Xunfei API version and protocol to avoid error code 11200 [Update xunfei_spark_bot.py]
2024-03-25 15:29:12 +08:00
zhayujie
c3d1711edc Merge branch 'master' into master 2024-03-25 15:28:41 +08:00
zhayujie
c12c7f10f0 Merge pull request #1826 from Meng-de-Cao/master
Update xunfei_spark_bot.py
2024-03-25 15:26:53 +08:00
zhayujie
f71820bf4e Merge pull request #1787 from uxfion/edge-tts
feat: edge-tts
2024-03-25 15:24:14 +08:00
Haowei
748c53c774 Merge branch 'zhayujie:master' into master 2024-03-23 21:13:36 +08:00
zhayujie
b290a71bfb Merge pull request #1686 from xiaodonghsu/new
Baidu speech transcription supports the combination of 8000 sample rate, pcm_s16le encoding, and single-channel audio
2024-03-21 15:47:20 +08:00
Saboteur7
3204c51eca Merge pull request #1412 from Yanyutin753/patch-6
Update source.json
2024-03-21 15:39:42 +08:00
Saboteur7
2c4b8a44dc Merge pull request #1816 from xywhnh/master
Fix two issues in the gemini plugin
2024-03-21 15:34:42 +08:00
卡Q因
943aa05eaa Update xunfei_spark_bot.py
Use the Xunfei 3.5 model by default
2024-03-20 21:22:15 +08:00
Haowei
d0fd36e7e1 Merge branch 'zhayujie:master' into master 2024-03-20 15:31:31 +08:00
zhayujie
f45ff5fd0a Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2024-03-20 12:08:07 +08:00
zhayujie
c22c7102d5 fix: no need to send when message is empty 2024-03-20 12:07:05 +08:00
Saboteur7
11ecfd1b41 Merge pull request #1819 from 13476573407/master
When scanning plugins with #scanp and #reloadp, an existing plugin that had been updated was not reloaded
2024-03-20 12:04:01 +08:00
Saboteur7
798e30e5ac Merge pull request #1821 from gufei/fix-bug
Fix two bugs
2024-03-20 11:50:40 +08:00
13476573407
15e0702329 Fix scanp reload re-creating the godcmd instance, which cleared auth permissions 2024-03-20 10:52:34 +08:00
13476573407
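a2bc22c37d When scanning plugins with #scanp and #reloadp, updated plugins were not reloaded
So the already-loaded check was removed and all plugins except Godcmd are now reloaded, allowing plugins to be updated without restarting the project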
2024-03-18 14:40:01 +08:00
rowan.wu
8093fcc64c Fix two bugs
1. The type definition used camel case while other places used uppercase
2. In the WeChat channel, sending IMAGE had a redundant seek call
2024-03-16 12:34:40 +08:00
熊伟(10007228)
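800419e7cc Fix the following issues:
1. When a gemini api call raised an exception, no error information was returned downstream; later processing may need the error information to apply a compensation mechanism
2. Fix an index-out-of-range error in special scenarios that caused the application to exit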
2024-03-14 13:44:14 +08:00
FB208
a241dc6785 Update README.md 2024-03-12 13:09:55 +08:00
FB208
805bea0d5f Add a calling method for the claude api 2024-03-12 10:39:51 +08:00
unknown
9d394adf24 1. Fix Azure Openai Dalle requests 2. Add Azure Openai Dalle3 request parameters 3. Separate the Azure Openai resources used for text replies and for Dalle3 replies 2024-03-12 08:32:24 +08:00
Saboteur7
2074f27aff Merge pull request #1806 from goldfishh/master
disable plugin(tool) log printing
2024-03-10 13:28:32 +08:00
goldfishh
283ad48b86 disable plugin(tool) log printing 2024-03-10 13:11:45 +08:00
zhayujie
07e10a7943 Update README.md 2024-03-08 00:19:59 +08:00
zhayujie
2812a5026c Update README.md 2024-03-05 20:56:37 +08:00
Lecter
3a20461abf add edge-tts 2024-03-04 00:14:19 +08:00
Zhuoheng Lee
64ae3d1e21 Update xunfei_spark_bot.py
Upgrade the Xunfei API to v3.5 and switch to the wss protocol to avoid error code 11200 on requests
2024-02-21 14:14:19 +08:00
xiexin12138
a25d7ea65b add feature: optimize Zhipu AI command handling to support session reset 2024-02-20 16:40:00 +08:00
zhayujie
74ebbdd761 fix: client resource usage bug 2024-02-19 13:32:32 +08:00
MasterKeee
a0427b569e Add video reply type for official accounts 2024-02-19 00:45:53 +08:00
zhayujie
5346dfdd8b feat: code tidying up 2024-02-05 12:21:50 +08:00
zhayujie
3ee4147285 Merge pull request #1723 from zRzRzRzRzRzRzR/master
Support ZhipuAI GLM series models and image generation code
2024-02-05 12:15:51 +08:00
zhayujie
c41e486bfc Update config.py 2024-02-05 12:15:28 +08:00
zhayujie
eda3ba92fd Merge branch 'master' into master 2024-02-05 12:14:26 +08:00
zhayujie
40255290b0 Merge pull request #1716 from wayshall/zhipu
feat: add support for the Zhipu chatglm4 model
2024-02-05 12:05:07 +08:00
zhayujie
af5bc73dc0 feat: optimize consumer thread pool 2024-02-05 12:01:41 +08:00
zR
0247cd4c45 Improve model selection 2024-02-02 11:08:06 +08:00
stx116
916762cc8c Update xunfei_spark_bot.py
Update the Xunfei large language model to version 3.5
2024-02-01 15:18:56 +08:00
zR
d6fdf8ca2a Support ZhipuAI GLM series models and image generation code 2024-02-01 11:31:56 +08:00
zhayujie
95708489c9 fix: wxcomapp user name 2024-01-31 16:24:29 +08:00
weishao zeng
ced0fa4608 feat: add support for the Zhipu chatglm4 model 2024-01-30 10:17:53 +08:00
zhayujie
7e0fbd600f feat: add media send limit and interval 2024-01-29 11:46:00 +08:00
zhayujie
f33e4e0323 fix: close tool debug level 2024-01-27 11:08:44 +08:00
zhayujie
d0fd78497d Merge pull request #1680 from V-know/patch-1
Doc: improve the [Server Deployment] section
2024-01-26 16:29:11 +08:00
zhayujie
8045019603 feat: add 4-turbo-preview model 2024-01-26 16:21:11 +08:00
zhayujie
7d92b9435e Merge pull request #1678 from goldfishh/master
tool 0.5.0
2024-01-26 11:17:15 +08:00
zhayujie
1e0822703a fix: image num 2024-01-25 18:00:02 +08:00
zhayujie
0403ff88ef feat: image num limit 2024-01-25 15:45:24 +08:00
zhayujie
78376d591b fix: image limit 2024-01-25 15:40:52 +08:00
zhayujie
8e23d0df20 Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2024-01-25 15:39:41 +08:00
zhayujie
9e281d20ab fix: image num limit 2024-01-25 15:34:59 +08:00
zhayujie
644bd4a106 Merge pull request #1698 from 6vision/6vision-patch-1
Update wework_message.py
2024-01-23 20:09:16 +08:00
zhayujie
7729e66a96 docs: update README.md 2024-01-23 20:01:55 +08:00
zhayujie
d67d6b7948 feat: knowledge base send file 2024-01-22 18:03:04 +08:00
vision
4c4a46bfbe Update wework_message.py 2024-01-22 13:38:11 +08:00
zhayujie
4536f9c177 feat: client mng 2024-01-19 14:38:14 +08:00
FMStereo
977d3bc02e Baidu speech transcription: support the combination of 8000 sample rate, pcm_s16le encoding, and single-channel audio 2024-01-18 12:46:18 +08:00
zhayujie
eae95dfef5 fix: api base bug 2024-01-17 18:25:57 +08:00
Cancellara
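b67d4460ca Doc: improve the [Server Deployment] section
There is no need to create the nohup.out file separately
The nohup command creates it automatically when it runs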
2024-01-17 01:13:39 +08:00
goldfishh
3dea8311b1 change chatgpt_tool_hub version to 0.5.0 2024-01-16 23:39:40 +08:00
zhayujie
11f6e98874 Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2024-01-16 23:22:10 +08:00
zhayujie
2609e595f4 fix: client host 2024-01-16 22:38:33 +08:00
zhayujie
ac6e41abc8 Merge pull request #1644 from PoseidonLi0514/master
Image generation supports custom endpoint
2024-01-16 22:35:57 +08:00
zhayujie
9c17e16d0a fix: optimize code format 2024-01-16 19:17:32 +08:00
goldfishh
55e9064307 tool ver0.5
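1. Add a pure mode for tools, supporting single-tool invocation
2. Add message-forwarding tools: email, sms, wechat, which can send messages to other platforms according to rules
3. Replace the visual-dl implementation (renamed to visual); recognition of image links currently works well.
4. Fix the unreliable results returned by most tools in version 0.4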
2024-01-16 01:13:40 +08:00
zhayujie
91cabd7d49 Merge pull request #1628 from huiwenTT/dingdinggpt
Add voice message sending
2024-01-15 22:45:46 +08:00
zhayujie
7456950530 Merge pull request #1658 from I-E-E-E/patch-1
fixed a typo
2024-01-15 22:41:12 +08:00
zhayujie
8fcdda625d Merge pull request #1675 from zhayujie/feat-client
feat: channel client
2024-01-15 22:37:53 +08:00
zhayujie
40a10ee926 Merge branch 'master' into feat-client 2024-01-15 22:37:47 +08:00
zhayujie
c3f7e2645c feat: channel client 2024-01-15 22:35:30 +08:00
I-E-E-E
b264af1892 fixed a typo 2024-01-08 17:51:15 +08:00
Haikui Yang
43e93e8e22 Update open_ai_image.py 2024-01-01 22:43:03 +08:00
Haikui Yang
d6c4789688 Merge branch 'zhayujie:master' into master 2024-01-01 22:42:10 +08:00
惠文
cb31ee6f01 Merge branch 'dingdinggpt' of github.com:huiwenTT/chatgpt-on-wechat-1 into dingdinggpt 2023-12-26 15:56:35 +08:00
huiwen
f7b694ac56 Add voice message sending and fix context association 2023-12-26 14:48:54 +08:00
zhayujie
eb809055d4 Merge pull request #1559 from huiwenTT/dingdinggpt
DingTalk bot
2023-12-25 18:15:33 +08:00
zhayujie
78d9be82b2 fix: add gemini dependency 2023-12-19 11:47:33 +08:00
Haikui Yang
76a95c0226 Update open_ai_image.py 2023-12-17 19:50:06 +08:00
huiwen
d3ab8fb04a Merge branch 'dingdinggpt' of 47.98.110.173:/opt/python_app/gpt into dingdinggpt 2023-12-17 09:52:24 +08:00
huiwen
f7a0b63a00 Merge branch 'zhayujie:master' into dingdinggpt 2023-12-17 09:27:30 +08:00
huiwen
a21dd97786 Rename DingTalk app_id to _client_id and optimize logic 2023-12-17 09:23:15 +08:00
zhayujie
04943c0bfa Update README.md 2023-12-16 01:11:05 +08:00
zhayujie
203d4d8bfb Update README.md 2023-12-15 19:16:13 +08:00
zhayujie
c049a619dc chore: remove useless code 2023-12-15 16:49:23 +08:00
zhayujie
cc1b14b607 Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2023-12-15 14:44:54 +08:00
zhayujie
e04a12a8f4 Merge branch 'hanfangyuan4396-master' 2023-12-15 14:40:34 +08:00
zhayujie
a2c82bc583 Merge branch 'master' of https://github.com/hanfangyuan4396/chatgpt-on-wechat into hanfangyuan4396-master 2023-12-15 14:40:15 +08:00
zhayujie
b4dc382f7c Merge pull request #1598 from zhayujie/feat-gemini
feat: support gemini model
2023-12-15 14:24:26 +08:00
zhayujie
eca1892e2a fix: gemini no content bug 2023-12-15 14:23:36 +08:00
zhayujie
23a237074e feat: support gemini model 2023-12-15 10:19:48 +08:00
zhayujie
219e9eca4f Merge pull request #1595 from 6vision/master
WeCom optimizations
2023-12-14 12:00:28 +08:00
6vision
413e09fb9e 1. WeCom personal accounts support file and link messages
2. Fix a bug in fetching group names for WeCom personal accounts
2023-12-14 00:50:34 +08:00
zhayujie
3514c37e4c fix: railway fork does not need action 2023-12-13 20:57:04 +08:00
zhayujie
95260e303c fix: process markdown url in knowledge base 2023-12-11 20:48:13 +08:00
hanfangyuan4396
0cef34bdfa Merge branch 'zhayujie:master' into master 2023-12-09 19:41:01 +08:00
Han Fangyuan
9838979bbd refactor: update class name of qwen bot 2023-12-09 19:40:07 +08:00
Han Fangyuan
c8910b8e14 fix: set correct top_p params of ali qwen model 2023-12-09 19:26:11 +08:00
Han Fangyuan
207fa1d019 feat: hot reload conf of ali qwen model 2023-12-09 18:40:17 +08:00
zhayujie
be0bb591e7 fix: do not draw when text_to_image is empty 2023-12-09 17:12:08 +08:00
Han Fangyuan
bfacdb9c3b feat: support character description of ali qwen model 2023-12-09 12:39:09 +08:00
zhayujie
ae4077ed6c fix: config adjust 2023-12-08 14:29:14 +08:00
zhayujie
6eb3c90e18 feat: qwen model modify 2023-12-08 14:12:21 +08:00
zhayujie
8c2a53a504 Merge pull request #1573 from chazzjimel/master
add ali voice output
2023-12-08 13:34:54 +08:00
zhayujie
74db1e0308 Merge pull request #1537 from hanfangyuan4396/master
Support the Tongyi Qianwen model on the Alibaba Cloud Bailian platform
2023-12-08 13:27:52 +08:00
zhayujie
b9dfdcef3d Merge pull request #1577 from xyshell/patch-1
Update chat_gpt_bot.py retry APIConnectionError
2023-12-08 13:26:59 +08:00
zhayujie
9d4afeac31 feat: speech support app_code bind 2023-12-07 22:44:43 +08:00
zhayujie
14ae2f169a fix: hello plugin trigger app bug 2023-12-07 19:41:50 +08:00
You Xie
55df19142f Update chat_gpt_bot.py retry APIConnectionError 2023-12-06 02:27:22 -06:00
zhayujie
40fd545b2c fix: exit group optimize 2023-12-06 10:51:47 +08:00
zhayujie
95fb07343e Merge pull request #1570 from erayyym/master
adding features: group exit notification
2023-12-06 10:42:15 +08:00
erayyym
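4d87906559 Add a config option
Runs fine locally; to enable this feature, users need to add  "group_chat_exit_group": true, to config.json

(Not sure whether this is written correctly, though; I just started learning CS, haha, and have not done this before)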
2023-12-05 13:18:42 -05:00
跃迁
6b30dced43 Merge branch 'zhayujie:master' into master 2023-12-06 00:44:18 +08:00
chazzjimel
293a03b7c8 add ali voice output
Add Alibaba Cloud voice output interface
2023-12-06 00:43:19 +08:00
zhayujie
c010549f17 Merge pull request #1563 from malsony/master
Update xunfei_spark_bot.py
2023-12-06 00:40:23 +08:00
zhayujie
cc0be22026 Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2023-12-06 00:31:59 +08:00
zhayujie
e5ba26febe fix: tts voice base url 2023-12-06 00:31:31 +08:00
erayyym
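36f9680eec adding features: group exit notification
I also plan to find a way to add a notification when users leave on their own; the current version can send a notification when the group owner kicks someone out (provided the group owner/admin is the bot itself)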
2023-12-05 03:58:42 -05:00
zhayujie
f4f5be5b08 Create LICENSE 2023-12-04 11:14:55 +08:00
chazzjimel
d89b056886 add ali voice output
Add Alibaba Cloud voice output support.
2023-12-03 18:19:03 +08:00
malsony
65424c7db9 Update xunfei_spark_bot.py
update API URL for v3.0 version of Xunfei Spark.
2023-12-01 16:09:15 +08:00
huiwen
32a8a847fc Fix a minor bug 2023-11-30 12:09:03 +08:00
zhayujie
88fb3dbf60 fix: generate break by bug 2023-11-30 11:51:04 +08:00
惠文
f6bee3aa58 Add DingTalk bot (Stream mode) 2023-11-30 10:41:34 +08:00
zhayujie
5f19f37dcb feat: hello plugin support app code 2023-11-29 23:15:31 +08:00
zhayujie
dd36d8ce9e Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2023-11-29 17:41:44 +08:00
zhayujie
865e4b5349 feat: hello plugin support system prompt 2023-11-29 17:41:14 +08:00
hanfangyuan4396
e70564752b Merge branch 'master' into master 2023-11-29 10:16:50 +08:00
zhayujie
6e0d2f9437 fix: remove unused log and add plugin config in docker config 2023-11-28 16:29:32 +08:00
zhayujie
291f936097 Update README.md 2023-11-27 20:24:42 +08:00
zhayujie
0b2ce48586 Update README.md 2023-11-27 18:20:52 +08:00
zhayujie
da87fd9e20 feat: add single chat blacklist 2023-11-27 14:45:25 +08:00
zhayujie
d4da4d2575 fix: nick name config name 2023-11-27 14:38:45 +08:00
zhayujie
bad20ff483 Merge pull request #1538 from dividduang/blacklist
Blacklist
2023-11-27 14:29:06 +08:00
zhayujie
21ad51ffbf fix: remove repeat util 2023-11-27 14:24:26 +08:00
zhayujie
697c6d5fbe Merge pull request #1541 from Saboteur7/master
Add Feishu app channel
2023-11-27 14:22:23 +08:00
zhayujie
293c659053 Merge pull request #1553 from zhayujie/feat-11-27
feat: add image chat and fix session discard
2023-11-27 14:21:53 +08:00
zhayujie
a12507abbd feat: default close image summary 2023-11-27 14:07:14 +08:00
zhayujie
4e675b84fb feat: image input and session optimize 2023-11-27 12:47:00 +08:00
Han Fangyuan
c1022feab8 fix: add tongyi model to model list 2023-11-25 10:06:10 +08:00
Saboteur7
ddcfcf21fe Reply in group chats only when the bot is @-mentioned 2023-11-23 22:05:10 +08:00
Saboteur7
86a58c3d80 Add Feishu app channel
- Support private and group chats for self-built bots
- Support image generation
- Support file summarization
2023-11-21 22:41:54 +08:00
divid
abf9a9048d feat: blacklist 2023-11-20 21:59:00 +08:00
divid
b1030a527a blacklist 2023-11-20 21:51:59 +08:00
Han Fangyuan
8d07ba6332 fix: add tongyi type when init bridge 2023-11-19 23:00:18 +08:00
Han Fangyuan
4ce37f84e4 feat: support Tongyi Qwen model of alibaba 2023-11-19 22:42:44 +08:00
zhayujie
061d8a3a5f Merge pull request #1488 from yy1781051483/master
add xunfei v3.0
2023-11-17 16:29:39 +08:00
zhayujie
374cd5dbb8 feat: support send knowledge base image 2023-11-17 16:27:44 +08:00
zhayujie
5ad53c2b9c fix: reduce error noise when converting speech to text 2023-11-16 10:54:24 +08:00
zhayujie
a2ec1a063d fix: typo 2023-11-10 17:16:15 +08:00
zhayujie
e431dbe2df docs: update readme.md 2023-11-10 17:13:13 +08:00
zhayujie
7218463f9e docs: update README 2023-11-10 16:06:58 +08:00
zhayujie
aeb09a95b0 fix: image vision temporarily cancel error logging 2023-11-10 14:31:07 +08:00
zhayujie
0c8f292e12 feat: add tts speech model 2023-11-10 10:48:52 +08:00
zhayujie
f001ac6903 feat: add dalle3 gpt-4-turbo model change 2023-11-10 10:11:02 +08:00
zhayujie
db8e506de0 feat: add gpt-4-turbo tokens calc 2023-11-07 23:10:39 +08:00
zhayujie
099f859dd4 fix: limit openai sdk version to prevent compatibility issues 2023-11-07 10:34:46 +08:00
Daydreamer
b7684c1c2b add xunfei v3.0 2023-10-29 17:38:56 +08:00
zhayujie
058c167f79 docs: trim help cmd 2023-10-27 14:30:33 +08:00
zhayujie
49446d4872 feat: add wenxin 4.0 model 2023-10-27 14:18:55 +08:00
zhayujie
ced560e1e1 Merge pull request #1485 from zhayujie/feat-agent
feat: show thought and plugin in agent process
2023-10-27 13:27:38 +08:00
zhayujie
339102c3cd Merge pull request #1482 from 6vision/master
Custom group welcome message and apilot plugin
2023-10-27 12:35:11 +08:00
zhayujie
6331350239 Merge branch 'master' into feat-agent 2023-10-27 12:32:35 +08:00
zhayujie
34e06fcbf8 feat: show thought and plugin in agent process 2023-10-27 12:28:34 +08:00
vision
70aac312ff Merge branch 'zhayujie:master' into master 2023-10-25 21:12:48 +08:00
zhayujie
5e00704152 Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2023-10-23 21:09:54 +08:00
zhayujie
1a9edb6907 fix: plugin config not exist warning 2023-10-23 21:09:18 +08:00
zhayujie
0c18c3a6dd docs: update demo video 2023-10-19 21:51:57 +08:00
6vision
847bb51ce4 Add Apilot plugin 2023-10-19 19:34:36 +08:00
6vision
fa60a5dc63 Add a parameter for a custom welcome message for new group members 2023-10-19 19:20:41 +08:00
zhayujie
aaed3f9839 fix: ignore system message 2023-10-18 11:14:44 +08:00
zhayujie
21b956b983 fix: mj open auth bug 2023-10-16 16:44:06 +08:00
zhayujie
792e940279 fix: knowledge base miss suffix bug 2023-10-13 19:12:23 +08:00
zhayujie
c2477b26c0 fix: summary no user_id bug 2023-10-13 18:58:13 +08:00
zhayujie
4b27de809b fix: image create prefix 2023-10-13 18:10:05 +08:00
zhayujie
572932d8e8 docs: update README.md 2023-10-13 16:31:02 +08:00
zhayujie
270dd778d9 docs: update config-template and readme 2023-10-13 16:26:29 +08:00
zhayujie
dd04287b0a Merge pull request #1454 from befantasy/patch-5
Update chat_channel.py: fix SHARING Type error.
2023-10-13 15:45:00 +08:00
zhayujie
36ac6d005a Merge pull request #1457 from befantasy/master
Add "ContextType.ACCEPT_FRIEND" so that plugins can handle the event that follows accepting a friend request.
2023-10-13 15:44:25 +08:00
zhayujie
701daedf49 feat: multi agent plugin 2023-10-13 15:36:20 +08:00
zhayujie
238f05f453 fix: summary plugin group enable bug 2023-10-07 10:50:59 +08:00
zhayujie
dd082bd212 fix: search miss config 2023-09-30 20:02:26 +08:00
zhayujie
cfd2f27b0b feat: knowledge base search miss config 2023-09-30 15:21:26 +08:00
zhayujie
a2160d135e feat: knowledge base miss prefix 2023-09-30 15:14:42 +08:00
zhayujie
16d7836369 fix: summary failed tips 2023-09-29 17:00:47 +08:00
zhayujie
f3de4dcc5f fix: remove mini-program url 2023-09-29 16:37:21 +08:00
zhayujie
e34523028f fix: admin auth bug 2023-09-29 15:52:34 +08:00
zhayujie
efe2fbacd6 Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2023-09-28 16:27:52 +08:00
zhayujie
2fa1df29be fix: file size calc bug 2023-09-28 16:26:53 +08:00
befantasy
f72cd13fba Update wechat_message.py 2023-09-28 16:18:04 +08:00
befantasy
5b552dffbf Update wechat_channel.py: add ContextType.ACCEPT_FRIEND 2023-09-28 16:16:30 +08:00
befantasy
a0ae2d13dc Update context.py: add ContextType "ACCEPT_FRIEND" 2023-09-28 16:11:09 +08:00
befantasy
f7262a0a3a Update chat_channel.py: fix SHARING Type error.
chatgpt-on-wechat    | [ERROR][2023-09-27 18:48:41][chat_channel.py:211] - [WX] unknown context type: SHARING
2023-09-27 19:26:47 +08:00
zhayujie
9736f121eb Update README.md 2023-09-26 18:43:25 +08:00
zhayujie
7c8fb7eacc Merge pull request #1428 from scut-chenzk/chenzk
Fix failure to save image messages sent from WeChat to local storage
2023-09-26 15:59:23 +08:00
zhayujie
b45eea5908 Merge pull request #1427 from befantasy/master
Add ReplyType.FILE/ReplyType.VIDEO/ReplyType.VIDEO_URL to the itchat channel to ease plugin development. The keyword plugin adds file and video matching replies
2023-09-26 01:27:35 +08:00
zhayujie
6babf4ee6c Merge pull request #1445 from befantasy/patch-3
Update godcmd.py: add a way to turn off debug mode
2023-09-26 00:37:17 +08:00
zhayujie
576526d4ee Merge pull request #1446 from 6vision/master
Optimize message storage for personal subscription accounts
2023-09-26 00:36:36 +08:00
zhayujie
c03e31b7be fix: linkai instruction bug 2023-09-25 23:15:59 +08:00
zhayujie
a1aa925019 fix: no summary config bug 2023-09-25 18:30:19 +08:00
zhayujie
a5a234ed97 fix: remove file after summary 2023-09-25 16:42:36 +08:00
zhayujie
5b5dbcd78b feat: remove file word calc and support url link 2023-09-24 14:33:39 +08:00
zhayujie
bd1c6361d3 Update README.md 2023-09-24 12:54:34 +08:00
zhayujie
1fc1febf03 Merge pull request #1450 from zhayujie/feat-doc-chat
feat: document summary and chat with content
2023-09-24 12:30:45 +08:00
zhayujie
55cc35efa9 feat: document summary and chat with content 2023-09-24 12:27:09 +08:00
vision
5ba8fdc5e7 fix 2023-09-23 14:31:54 +08:00
vision
6ea295e227 Merge pull request #1 from 6vision/feat
Support long voice messages for personal subscription accounts
2023-09-23 13:46:25 +08:00
befantasy
5010c76ef7 Update godcmd.py: add a way to turn off debug mode 2023-09-23 13:37:01 +08:00
6vision
79c7f0c29f Support long voice messages for personal subscription accounts 2023-09-23 13:27:36 +08:00
6vision
2b3e643786 Support multiple replies for a single request 2023-09-23 11:59:01 +08:00
chenzhenkun
90cdff327c Fix failure to save image messages sent from WeChat to local storage 2023-09-15 19:07:52 +08:00
zhayujie
55c116e727 Update README.md 2023-09-15 18:42:56 +08:00
befantasy
3dd83aa6b7 Update chat_channel.py 2023-09-15 18:38:31 +08:00
befantasy
a74aa12641 Update wechat_channel.py 2023-09-15 18:37:05 +08:00
befantasy
151e8c69f9 Update keyword.py 2023-09-15 18:22:10 +08:00
befantasy
d8bfa77705 Update keyword.py 2023-09-15 16:56:51 +08:00
befantasy
6bd286e8d5 Update wechat_channel.py to support ReplyType.FILE 2023-09-15 16:22:46 +08:00
befantasy
905532b681 Update chat_channel.py to support ReplyType.FILE 2023-09-15 16:21:27 +08:00
zhayujie
04d5c1ab01 Delete .github/ISSUE_TEMPLATE/config.yml 2023-09-15 15:45:23 +08:00
zhayujie
28be141dc7 Merge pull request #1422 from scut-chenzk/chenzk
Fix voice replies not working
2023-09-15 15:14:00 +08:00
chenzk
652b786baf Merge branch 'zhayujie:master' into chenzk 2023-09-14 23:42:00 +08:00
chenzhenkun
ba6c671051 Fix failure to save received image messages to local storage 2023-09-14 23:39:07 +08:00
chenzhenkun
ca25d0433f Fix voice replies not working 2023-09-14 17:52:11 +08:00
zhayujie
5338106dfa Merge pull request #1308 from leesonchen/master
Split voice output for enterprise service accounts
2023-09-12 18:18:17 +08:00
Clivia
854d613a81 Update source.json 2023-09-09 12:25:40 +08:00
zhayujie
b6b76be4f6 fix: add summary plugin bot type 2023-09-06 16:50:23 +08:00
zhayujie
03d94fcfa0 fix: not enable user_image_create_prefix by default 2023-09-06 12:02:13 +08:00
zhayujie
b2c5f0d455 feat: mj use default config 2023-09-06 11:53:33 +08:00
zhayujie
54f60dd38c chore: remove dependencies that can only be used under windows 2023-09-04 11:14:48 +08:00
zhayujie
42f181aca2 Merge pull request #1394 from resphinas/claude_bot
Update claude_ai_bot.py
2023-09-04 10:47:02 +08:00
resphina
9c3a27894f Update claude_ai_bot.py 2023-09-03 19:12:27 +08:00
resphina
f7cd348912 Update claude_ai_bot.py 2023-09-03 19:04:43 +08:00
zhayujie
aeaeb75d3b Merge pull request #1396 from 6vision/master
Optimize image download and storage logic
2023-09-03 17:32:30 +08:00
vision
96542b532e Update requirements-optional.txt 2023-09-03 17:14:28 +08:00
vision
139295fe0d Update requirements-optional.txt
Add dependencies required by the WeCom personal account channel
2023-09-03 16:47:25 +08:00
vision
13217b2ce2 Merge pull request #1 from 6vision/patch-1
Optimize image download and storage logic
2023-09-03 16:35:01 +08:00
vision
5cc8b56a7c Optimize image download and storage logic
- Implement new compression logic for files larger than 10MB to improve storage efficiency.
- Switch from JPEG to PNG to enhance image quality and compatibility.
2023-09-03 16:29:19 +08:00
resphina
e23e01c95e Update claude_ai_bot.py 2023-09-03 15:40:08 +08:00
resphina
bca8ba12c7 Update claude_ai_bot.py 2023-09-03 15:22:25 +08:00
vision
3c44bdbe1c Update requirements-optional.txt 2023-09-03 15:10:05 +08:00
zhayujie
db93ed025b Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2023-09-02 21:50:28 +08:00
zhayujie
4209e108d0 fix: wework single chat no prefix circle reply 2023-09-02 21:49:43 +08:00
zhayujie
14cbf011af Merge pull request #1391 from resphinas/claude_bot
Rename claude_ai_session to claude_ai_session.py
2023-09-02 10:42:29 +08:00
resphina
03a41ec199 Rename claude_ai_session to claude_ai_session.py 2023-09-02 02:40:57 +08:00
zhayujie
125fe2a026 Merge pull request #1390 from scut-chenzk/chenzk
Chenzk
2023-09-01 19:42:21 +08:00
chenzhenkun
ac4adac29e Handle WeChat @-mentions 2023-09-01 19:37:19 +08:00
chenzhenkun
ac449d078e Merge remote-tracking branch 'origin/chenzk' into chenzk
# Conflicts:
#	channel/chat_channel.py
2023-09-01 19:22:02 +08:00
chenzhenkun
79be4530d4 Prevent infinite loops caused by prefix matches 2023-09-01 19:18:53 +08:00
chenzk
85ce52d70c Merge branch 'zhayujie:master' into chenzk 2023-09-01 18:57:52 +08:00
chenzhenkun
7ab56b9076 Add logging to help locate issues 2023-09-01 18:56:24 +08:00
zhayujie
dedf976375 Merge pull request #1389 from scut-chenzk/chenzk
Fix infinite loop when the bot @-mentions itself
2023-09-01 18:42:41 +08:00
chenzhenkun
89f438208a Fix infinite loop when the bot @-mentions itself 2023-09-01 18:39:31 +08:00
zhayujie
ffbc5080ae Merge pull request #1388 from resphinas/claude_bot
Implement the shared-context switch in the claude integration config
2023-09-01 18:34:43 +08:00
resphina
4167f13bac Update README.md 2023-09-01 18:12:48 +08:00
resphina
6ba0baabb0 Update claude_ai_bot.py 2023-09-01 18:04:39 +08:00
resphina
081003df47 Update config.py 2023-09-01 17:55:09 +08:00
resphina
559194ffb2 Update config.py 2023-09-01 17:54:03 +08:00
resphina
97a26d4a46 Update README.md 2023-09-01 17:53:21 +08:00
resphina
503c6c9b7e Update claude_ai_bot.py 2023-09-01 17:31:30 +08:00
resphina
9a1e10deff Create claude_ai_session 2023-09-01 17:30:31 +08:00
zhayujie
054f927c05 fix: at_list bug in wechat channel 2023-09-01 13:45:04 +08:00
resphina
22210747d0 Update README.md 2023-09-01 12:40:09 +08:00
resphina
53b2deb72c Update bot-related API documentation 2023-09-01 12:38:58 +08:00
zhayujie
6fc158e7d6 hotfix: config.py format 2023-09-01 11:32:58 +08:00
zhayujie
a23a65c731 Merge pull request #1382 from resphinas/claude_bot
Add Claude chatbot interface (implemented via reverse-engineered cookie; stable and does not expire)
2023-09-01 10:48:33 +08:00
resphina
7dc7105ee2 Update requirements-optional.txt 2023-09-01 10:32:33 +08:00
resphina
bac70108b2 Update requirements.txt 2023-09-01 10:32:03 +08:00
resphina
297404b21e Update config-template.json 2023-09-01 10:31:45 +08:00
resphina
33a7f8b558 Delete chatgpt-on-wechat-master.iml 2023-09-01 10:08:34 +08:00
resphina
4a670b7df7 Update config-template.json 2023-09-01 09:40:26 +08:00
resphina
79e4af315e Update log.py 2023-09-01 09:39:45 +08:00
resphina
c6e31b2fdc Update chat_gpt_bot.py 2023-09-01 09:39:08 +08:00
resphina
91dc44df53 Update const.py 2023-09-01 09:38:47 +08:00
resphina
7e57f8f157 Merge branch 'master' into claude_bot 2023-09-01 09:37:10 +08:00
zhayujie
15f6b7c6d3 Merge pull request #1385 from scut-chenzk/chenzk
Support the wework (WeCom) bot
2023-08-31 22:44:17 +08:00
chenzhenkun
b213ba541d Add plugin support for the wework (WeCom) bot 2023-08-31 21:02:00 +08:00
chenzhenkun
7c6ed9944e Support the wework (WeCom) bot 2023-08-30 20:49:00 +08:00
resphinas
a5a825e439 system role remove 2023-08-29 06:45:21 +08:00
resphinas
a4ab547f77 proxy update 2023-08-29 05:59:59 +08:00
resphinas
76ed763abe proxy update 2023-08-29 05:58:39 +08:00
resphinas
b9e3125610 Format fix 2 2023-08-28 18:04:28 +08:00
resphina
8d9d5b7b6f Update claude_ai_bot.py 2023-08-28 17:40:27 +08:00
resphina
187601da1e Update config-template.json 2023-08-28 17:30:03 +08:00
resphina
cc3a0fc367 Update config-template.json 2023-08-28 17:28:13 +08:00
resphinas
44cc4165d1 claude_bot 2023-08-28 17:22:20 +08:00
resphinas
f98b43514e claude_bot 2023-08-28 17:18:00 +08:00
resphinas
3c9b1a14e9 claude bot update 2023-08-28 16:43:26 +08:00
zhayujie
827e8eddf8 chore: remove dockerhub in arm build 2023-08-27 12:28:10 +08:00
zhayujie
7bc27d6167 fix: remove docker hub register in arm build 2023-08-27 12:10:08 +08:00
zhayujie
ba06edd63a fix: remove pysilk_mod 2023-08-26 17:32:52 +08:00
zhayujie
cacf553a5b feat: add arm workflows 2023-08-26 17:17:03 +08:00
zhayujie
d89091a8ea fix: git action deploy 2023-08-26 14:14:32 +08:00
zhayujie
01a56e1155 feat: try arm docker image 2023-08-26 12:45:16 +08:00
zhayujie
a64d7c42b1 fix: xunfei ws error log 2023-08-26 11:46:01 +08:00
zhayujie
36b6cc58bf fix: on_close params 2023-08-26 11:37:27 +08:00
zhayujie
5ac8a257e7 fix: add gpt-3.5-turbo in model_list 2023-08-26 10:50:31 +08:00
zhayujie
74119d0372 fix: websocket version 2023-08-25 23:57:59 +08:00
zhayujie
4e162c73e5 fix: update websocket version 2023-08-25 23:10:47 +08:00
zhayujie
5ff753a492 feat: add global model check 2023-08-25 17:26:40 +08:00
zhayujie
89400630c0 fix: xunfei client bug 2023-08-25 16:55:32 +08:00
zhayujie
3899c0cfe3 Merge pull request #1371 from uezhenxiang2023/Peter
add ElevenLabs TTS to voice factory
2023-08-25 16:15:18 +08:00
zhayujie
a086f1989f feat: add xunfei spark bot 2023-08-25 16:06:55 +08:00
zhayujie
1171b04e93 fix: wenxin token discard bug 2023-08-25 12:24:16 +08:00
uezhenxiang2023
c55d81825a Merge branch 'zhayujie:master' into Peter 2023-08-25 12:12:06 +08:00
zhayujie
2dcd026e9f logs: add baidu reply log 2023-08-25 11:19:00 +08:00
zhayujie
cdf8609d24 Merge pull request #1360 from zyqfork/master
dockerfile fallback debian11, fix azure cognitiveservices speech error
2023-08-25 01:24:34 +08:00
zhayujie
36580c5f7f Merge pull request #1363 from iRedScarf/master
Put the temperature setting in config.json by default
2023-08-25 01:24:02 +08:00
zhayujie
1cff2521f4 fix: add web.py and linkai base url 2023-08-22 11:09:01 +08:00
uezhenxiang2023
db4998a56b replace requests with elevenlabs for audio generation 2023-08-20 10:58:26 +08:00
uezhenxiang2023
acbd506568 add ElevenLabs TTS to voice factory 2023-08-19 11:20:47 +08:00
eks
0cf8e3be73 Merge branch 'zhayujie:master' into master 2023-08-16 16:54:34 +08:00
zhayujie
2473334dfc fix: channel send compatibility and add log 2023-08-14 23:09:51 +08:00
eks
1ff72d1d37 Merge branch 'zhayujie:master' into master 2023-08-11 13:50:11 +08:00
eks
241fad5524 Update config-template.json
Put the temperature value in config.json by default
2023-08-11 13:49:47 +08:00
zouyq
1b48cea50a dockerfile fallback debian11, fix azure cognitiveservices speech error
Python 3.10-slim is based on Debian 12, where using Azure TextToVoice may result in an error. The Speech SDK does not currently support OpenSSL 3.0, which is the default version in Ubuntu 22.04 and Debian 12
2023-08-10 17:39:25 +08:00
zhayujie
88bf345b91 docs: update plugin README 2023-08-08 17:03:18 +08:00
zhayujie
ab4ff3d1a3 config: reduce the config of baidu-wenxin 2023-08-08 16:04:25 +08:00
zhayujie
3502e0d643 Merge pull request #1336 from kevin808/master
Add Baidu Wenxin Yiyan interface
2023-08-08 15:46:47 +08:00
zhayujie
995894d3aa Merge branch 'master' into master 2023-08-08 15:46:07 +08:00
zhayujie
4da8714124 Merge pull request #1358 from zhayujie/feat-1.3.5
feat: add midjourney variation and reset
2023-08-08 11:21:35 +08:00
zhayujie
6b247ae880 feat: add midjourney variation and reset 2023-08-07 19:14:09 +08:00
zhayujie
176941ea3b Merge pull request #1357 from zhayujie/feat-1.3.5
feat: add plugin instructions and fix some issues
2023-08-07 14:44:03 +08:00
zhayujie
5176b56d3b fix: global plugin read encoding 2023-08-07 14:42:24 +08:00
zhayujie
8abf18ab25 feat: add knowledge base and midjourney switch instruction 2023-08-06 17:57:07 +08:00
zhayujie
395edbd9f4 fix: only filter messages sent by the bot itself in private chat 2023-08-06 16:02:02 +08:00
zhayujie
2386eb8fc2 fix: unable to use plugin when group nickname is set 2023-08-06 15:44:48 +08:00
zhayujie
68208f82a0 docs: update README.md 2023-08-01 00:08:39 +08:00
zhayujie
ca916b7ce5 fix: default to fast mode 2023-07-31 21:40:50 +08:00
zhayujie
01e02934da Merge pull request #1334 from zyqfork/master
azure api add api-version https://learn.microsoft.com/zh-cn/azure/ai-serv…
2023-07-31 18:40:06 +08:00
zhayujie
c81a79f7b9 Merge pull request #1104 from mari1995/feat_my_msg
feat: replies sent from the phone do not trigger the bot
2023-07-31 18:02:41 +08:00
zhayujie
1133648bf6 Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2023-07-31 17:58:06 +08:00
zhayujie
e05bc541d7 Merge pull request #1346 from befantasy/patch-1
Update keyword.py: add image reply support
2023-07-31 17:53:46 +08:00
zhayujie
d689d20482 docs: update README.md 2023-07-31 17:52:05 +08:00
zhayujie
39dd99b272 Merge pull request #1343 from zhayujie/feat-1.3.4
feat: add midjourney and app manager plugin
2023-07-31 17:15:22 +08:00
zhayujie
cda21acb43 feat: use new linkai completion api 2023-07-31 16:11:33 +08:00
zhayujie
9bd7d09f20 fix: remove relax mode temporarily 2023-07-31 14:42:50 +08:00
zhayujie
b22994c2d2 fix: some image bug 2023-07-30 19:55:56 +08:00
zhayujie
e027286b6d fix: midjourney check task thread 2023-07-30 15:16:19 +08:00
befantasy
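d6e16995e0 Update keyword.py: add image reply support
Add image reply support. Content that starts with http/https and ends with .jpg/.jpeg/.png/.gif is recognized as a URL and automatically sent as an image.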
2023-07-30 14:40:07 +08:00
zhayujie
782bff3a51 fix: add debug log 2023-07-29 12:22:45 +08:00
zhayujie
de26dc0597 fix: fast mode and relax mode checkout 2023-07-28 18:50:21 +08:00
zhayujie
233b24ab0f feat: add global admin config 2023-07-28 16:33:41 +08:00
zhayujie
2f9e5b1219 feat: check app_code dynamically 2023-07-28 12:40:06 +08:00
zhayujie
dd36b8b150 config: add config template 2023-07-27 21:29:50 +08:00
zhayujie
f81ac31fe1 feat: add linkai plugin to support midjourney and distinguish app between groups 2023-07-27 21:21:36 +08:00
Kevin Li
24b63bc5bd Add Baidu access token validation 2023-07-25 11:11:02 +08:00
Kevin Li
1817a972c6 Add Baidu Wenxin Bot 2023-07-25 09:52:47 +08:00
zyqcn@live.com
74a253f521 azure api add api-version:https://learn.microsoft.com/zh-cn/azure/ai-services/openai/reference 2023-07-24 16:28:05 +08:00
zhayujie
41762a1c57 Merge pull request #1332 from zhayujie/feat-1.3.3
fix: reduce memory usage
2023-07-21 17:18:56 +08:00
zhayujie
a786fa4b75 fix: reduce the expiration time and avoid storing the original message text to decrease memory usage 2023-07-21 17:16:34 +08:00
zhayujie
e4c7602c0c docs: update README.md 2023-07-21 17:14:11 +08:00
zhayujie
e0d2e34980 Merge pull request #1328 from zhayujie/feat-1.3.3
feat: support global plugin config for docker env
2023-07-21 10:50:16 +08:00
zhayujie
9ef8e1be3f feat: move loading config method to base class 2023-07-20 16:08:19 +08:00
zhayujie
aae9b64833 fix: reduce unnecessary error traceback logs 2023-07-20 14:46:41 +08:00
zhayujie
4bab4299f2 fix: global plugin config read 2023-07-20 14:24:40 +08:00
zhayujie
954e55f4b4 feat: add plugin global config to support docker volumes 2023-07-20 11:36:02 +08:00
zhayujie
2361e3c28c docs: update README for railway cancelled free service 2023-07-19 18:23:59 +08:00
leeson
8224c2fc16 Split voice output for enterprise service accounts 2023-07-08 23:58:07 +08:00
zhayujie
8aac86f0a9 Merge pull request #1291 from 6vision/master
(tool)fix azure model
2023-07-05 01:44:06 +08:00
vision
6384e9310b plugin(tool): update to 0.4.6
1. temp fix summary tool not ending bug
2. Compatible with 0613 gpt-3.5
3. add azure's model name: gpt-35-turbo
2023-07-05 01:06:53 +08:00
vision
7a9205dfba fix azure model
Update chatgpt_tool_hub to 0.4.6 and pull the latest code. The tool plugin can now use the azure API!
2023-07-05 01:01:46 +08:00
Jianglang
94b47a56f4 Merge pull request #1282 from haikerapples/master_haiker_timetask
Built-in timetask plugin
2023-07-01 18:37:07 +08:00
zhayujie
709b5be634 fix: group voice config and azure model calc support 2023-07-01 13:17:08 +08:00
haikerwang
f970b2c168 Built-in timetask plugin 2023-06-29 00:58:57 +08:00
zhayujie
973acb37ed docs: update README.md 2023-06-27 22:28:51 +08:00
zhayujie
1c9020a565 docs: update README.md 2023-06-26 23:52:32 +08:00
zhayujie
c5f1d0042c docs: update README.md 2023-06-26 20:11:35 +08:00
zhayujie
fa706e8b1d Merge pull request #1275 from zhayujie/feat-docker
chore: remove useless docker files
2023-06-26 14:16:18 +08:00
zhayujie
12c170f227 chore: remove useless docker files 2023-06-26 14:05:08 +08:00
zhayujie
db27dfe227 docs: modify docker deploy steps 2023-06-26 13:10:51 +08:00
zhayujie
2db4673392 chore: fixed openai version 2023-06-26 12:29:09 +08:00
zhayujie
38619db629 Merge pull request #1274 from zhayujie/feat-dockerhub
feat: modify docker-compose file to pull image from dockerhub
2023-06-26 12:00:57 +08:00
zhayujie
930fd436ea feat: modify docker-compose file to pull image from dockerhub 2023-06-26 11:58:55 +08:00
zhayujie
98b8ff2fc8 Merge pull request #1271 from zhayujie/feat-dockerhub
feat: publish to dockerhub in github CI simultaneously
2023-06-26 01:24:24 +08:00
zhayujie
d0662683f9 feat: publish to dockerhub in github CI simultaneously 2023-06-26 01:20:04 +08:00
zhayujie
957f2574a9 Merge pull request #1257 from 6vision/master
add reply_suffix
2023-06-17 16:50:11 +08:00
vision
109b362ebd Update config.py 2023-06-17 16:42:52 +08:00
vision
ff3fdfa738 add reply_suffix 2023-06-17 16:36:08 +08:00
vision
e2636ed54a add reply_suffix
Add an optional config parameter for an auto-reply suffix
2023-06-17 15:53:49 +08:00
vision
dbe2f17e1a add reply_suffix
Add optional reply-suffix config for private and group chats
2023-06-17 15:46:03 +08:00
zhayujie
4dc535673f Merge pull request #1252 from 6vision/master
Update Tool README.md
2023-06-16 15:48:04 +08:00
vision
f414b6408e Update README.md 2023-06-16 15:08:57 +08:00
lanvent
3aa2e6a04d fix: calculate tokens correctly for *0613 models 2023-06-16 00:51:29 +08:00
lanvent
1963ff273f chore(hello): change plugin logic 2023-06-14 13:40:20 +08:00
lanvent
bb737a71d5 feat: update counting tokens for new models 2023-06-14 13:36:07 +08:00
zhayujie
a582a46ce9 fix: call super init 2023-06-12 14:05:47 +08:00
zhayujie
abf80a3266 docs: update README 2023-06-12 13:52:49 +08:00
Jianglang
d768f5c66d Update README.md 2023-06-11 00:02:18 +08:00
lanvent
b25e843351 feat(link_ai_bot.py): add support for creating images using OpenAI's DALL-E API 2023-06-10 23:52:25 +08:00
lanvent
419a3e518e feat: make plugin compatible with LINKAI in most cases 2023-06-10 23:42:43 +08:00
lanvent
d1b867a7c0 feat: support scene without app code in linkai 2023-06-10 21:28:15 +08:00
lanvent
c34d70b3cb fix: add warning log when pysilk module is not installed 2023-06-10 11:22:12 +08:00
lanvent
a33df9312f fix: warning message when using azure model 2023-06-10 11:06:50 +08:00
Jianglang
ebf8db0b37 Merge pull request #1238 from chenzefeng09/fix_baidu_voice_init
fix: baidu voice init params type error
2023-06-10 00:48:41 +08:00
chenzefeng.09
e539ae3b69 fix: baidu voice init params type error 2023-06-09 18:54:58 +08:00
lanvent
4c5e8850aa fix: env vars type error (#1127) 2023-06-09 14:46:43 +08:00
zhayujie
94c0af3037 feat: support scene without app code 2023-06-08 23:57:59 +08:00
zhayujie
165182c68f config: remove the config temporarily and consider integrating it as a plugin 2023-06-08 20:58:59 +08:00
Jianglang
65b9542599 Merge pull request #1221 from Zhaoyi-Yan/patch-3
add \n after @nickname for group chat
2023-06-08 11:53:14 +08:00
Jianglang
d01d1f8830 Merge pull request #1220 from Zhaoyi-Yan/patch-2
Add azure_deployment_id to Readme for Azure chatgpt.
2023-06-08 11:48:44 +08:00
Jianglang
ad3e9f3d42 Update README.md 2023-06-08 11:44:17 +08:00
Jianglang
4589974095 Update README.md 2023-06-08 11:42:39 +08:00
Jianglang
ed4553ddf8 Update README.md 2023-06-08 11:42:12 +08:00
Zhaoyi-Yan
ff97ae73f1 add \n after @nickname for group chat 2023-06-06 15:16:57 +08:00
Zhaoyi-Yan
f96b4d2781 Add azure_deployment_id to Readme for Azure chatgpt. 2023-06-06 14:44:09 +08:00
zhayujie
ce32cfffdb docs: update README.md 2023-06-06 14:02:32 +08:00
zhayujie
f66df8531e Update README.md 2023-06-06 09:54:34 +08:00
zhayujie
dfe1c23e76 Merge pull request #1218 from zhayujie/feature-app-market
feat: no quota hint and add group qrcode
2023-06-05 23:55:25 +08:00
zhayujie
07fd81919f docs: update readme 2023-06-05 23:53:34 +08:00
zhayujie
210042bb81 feat: no quota hint and add group qrcode 2023-06-05 23:21:24 +08:00
lanvent
12dc7427e9 make railway happy 2023-06-02 22:15:20 +08:00
lanvent
b476085110 fix: custom GPT model bug 2023-05-30 23:42:06 +08:00
zhayujie
776cdaf63c Merge pull request #1168 from zhayujie/feature-app-market
fix: config name optimize
2023-05-29 16:36:38 +08:00
zhayujie
69b6855745 fix: comment modify 2023-05-29 15:55:48 +08:00
zhayujie
3590babd8b fix: config name optimize 2023-05-29 15:52:26 +08:00
zhayujie
c29d391c1d Merge pull request #1167 from zhayujie/feature-app-market
feature:  support online knowledge base
2023-05-29 15:41:12 +08:00
zhayujie
50e44dbb2a fix: session save 2023-05-28 22:12:36 +08:00
zhayujie
34277a3940 feat: add app market 2023-05-28 19:08:23 +08:00
lanvent
f1a00d58ca chore(Dockerfile.latest): comment out the sed command to replace apt source with tuna mirror
The sed command to replace the apt source with the tuna mirror has been commented out. This is because the command is not necessary for the current build and may cause issues in the future.
2023-05-17 22:24:25 +08:00
Jianglang
d1a5f17ae8 Merge pull request #1102 from goldfishh/master
plugin(tool): update to 0.4.4
2023-05-17 16:13:03 +08:00
SSMario
4dbc54fa15 Revert "feat: add eleventLabs"
This reverts commit 1d4ff796d7.
2023-05-16 12:00:05 +08:00
SSMario
1d4ff796d7 feat: add eleventLabs 2023-05-16 11:50:54 +08:00
SSMario
44cb54a9ea feat: replies sent from the phone do not trigger the bot 2023-05-16 09:38:38 +08:00
goldfishh
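6409f49609 plugin(tool): update to 0.4.4
1. Support Azure and API forwarding services
2. Fix the error raised when the browser proxy has no prefix
3. Optimize the core prompt
4. Fix problems reported in a series of issues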
2023-05-16 00:22:32 +08:00
Jianglang
9ee0ea88b5 Merge pull request #1089 from taoguoliang/master-fork
feat(command): add usage for the set_gpt_model, set_gpt_model, and set_gpt_model commands
2023-05-15 23:34:04 +08:00
Jianglang
a3819d8673 Merge pull request #1096 from lichengzhe/master
Handle Cloudflare Bad Gateway errors with automatic retry.
2023-05-15 23:32:03 +08:00
lichengzhe
2d7dd71a3d Bad Gateway exception retry 2023-05-15 14:04:55 +08:00
lichengzhe
0e8195ae61 Bad Gateway exception retry 2023-05-15 13:55:14 +08:00
taoguoliang
3e92d07618 feat(command): add usage for the set_gpt_model, set_gpt_model, and set_gpt_model commands 2023-05-13 16:57:02 +08:00
Jianglang
e59597280d Merge pull request #1079 from 6vision/6vision-patch-1
Update README.md
2023-05-11 20:21:05 +08:00
vision
f2e3d69d8a Update README.md
The tool name changed after the news tools were consolidated; move it to a position where it is easier to notice
2023-05-11 15:49:55 +08:00
lanvent
9d2cb75c84 fix(docker): chown /usr/local/lib in debian dockerfile 2023-05-10 23:12:43 +08:00
Jianglang
f971505c4a Update README.md 2023-05-09 23:29:03 +08:00
lanvent
2133c1d6af fix(Dockerfile): create /home/noroot directory and change ownership of it 2023-05-09 23:08:20 +08:00
Jianglang
0bf06ddfd3 Merge pull request #1046 from theLastWinner/master
fix(WeCom): add the missing textwrap dependency
2023-05-08 17:33:46 +08:00
Jianglang
024a50d642 Merge pull request #1045 from wqh0109663/master
fix docker entrypoint
2023-05-08 17:33:22 +08:00
林督翔
e4eebd64d1 fix(WeCom): add the missing textwrap dependency 2023-05-08 09:39:32 +08:00
wuqih
c9055989e9 fix 2023-05-08 09:09:46 +08:00
lanvent
4f1ed197ce fix: compatible with python 3.7 2023-05-07 23:36:35 +08:00
Jianglang
3e710aa2a1 Merge pull request #1032 from wqh0109663/master
Fix docker entrypoint error
2023-05-06 17:16:06 +08:00
wuqih
b6226a45bb fix 2023-05-06 14:29:36 +08:00
lanvent
3001ba9266 fix: azure dalle generate image 2023-04-28 11:06:17 +08:00
lanvent
b0a401a1ed fix(azure_dalle): use openai.api_base 2023-04-28 10:53:30 +08:00
Jianglang
6b4dc37428 Update README.md 2023-04-28 01:24:26 +08:00
lanvent
8528c9b262 feat(tool.py): add new configuration options for think_depth, arxiv_summary, and morning_news_use_llm 2023-04-28 00:24:07 +08:00
lanvent
7222a5c2f4 feat: add VERSION constant 2023-04-28 00:13:13 +08:00
lanvent
59050001ef Update README.md 2023-04-28 00:10:57 +08:00
lanvent
2ba8f18724 feat: add railway method for wechatcomapp 2023-04-28 00:04:55 +08:00
lanvent
fb22e01b89 fix: send voice in wechatcomapp rightly 2023-04-27 23:04:24 +08:00
lanvent
76a81d5360 feat(wechatcomapp): add support for splitting long audio files 2023-04-27 22:47:50 +08:00
lanvent
3314b05648 feat: add support for azure dalle 2023-04-27 22:16:42 +08:00
lanvent
45b89218de fix: support set_openai_api_key for all channels 2023-04-27 20:43:12 +08:00
lanvent
beb7bda243 fix(docker): use debian.latest as latest image 2023-04-27 19:45:51 +08:00
lanvent
bef2896f50 add libavcodec-extra to Dockerfile 2023-04-27 15:09:24 +08:00
lanvent
9fea949b25 fix(azure_voice.py): log error details instead of cancellation details 2023-04-27 11:42:19 +08:00
lanvent
be258e5b05 fix: add more log in itchat 2023-04-27 11:23:28 +08:00
lanvent
008178d737 fix(login.py): add error message when retry count is exceeded 2023-04-27 11:03:08 +08:00
lanvent
527d5e1dbc fix(itchat): add error log when hot reload fails and log out before logging in normally 2023-04-27 02:46:53 +08:00
lanvent
9b47e2d6f9 fix: output itchat error msg rightly 2023-04-26 22:54:53 +08:00
lanvent
8781b1e976 fix: role,dungeon,godcmd support azure bot 2023-04-26 01:05:23 +08:00
Jianglang
38c653d8d8 Merge pull request #957 from goldfishh/master
plugin(tool): update to 0.4.2
2023-04-26 00:53:07 +08:00
lanvent
74e48bb137 Update README.md 2023-04-26 00:49:40 +08:00
goldfishh
c3aaa1f735 plugin(tool): update to 0.4.2 2023-04-26 00:48:54 +08:00
lanvent
bead2aa228 fix: a typo in template 2023-04-26 00:23:08 +08:00
Jianglang
dc52ab8aa9 Merge pull request #944 from zhayujie/wechatcom-app
Add WeCom app deployment, with plugin support and voice/image interaction
2023-04-26 00:02:31 +08:00
lanvent
20b71f206b feat: add subscribe_msg option for wechatmp, wechatmp_service, and wechatcom_app channels 2023-04-26 00:01:04 +08:00
lanvent
73c87d5959 fix(wechatcomapp): split long text messages into multiple parts 2023-04-25 01:48:15 +08:00
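The splitting this commit refers to can be sketched as below. This is a minimal illustration, not the project's code: the function name `split_text_by_utf8_length` and the byte limit are assumptions, and it assumes the channel limits message size in UTF-8 bytes rather than characters.

```python
def split_text_by_utf8_length(text, max_bytes):
    """Split text into chunks whose UTF-8 encoding is at most max_bytes each.

    Hypothetical helper: the real limit and splitting rule are not shown
    in the commit message.
    """
    if max_bytes < 4:
        # A single UTF-8 character can take up to 4 bytes.
        raise ValueError("max_bytes must be at least 4")
    chunks, current, current_len = [], [], 0
    for ch in text:
        ch_len = len(ch.encode("utf-8"))
        # Flush the current chunk before it would exceed the limit,
        # so a multi-byte character is never cut in half.
        if current and current_len + ch_len > max_bytes:
            chunks.append("".join(current))
            current, current_len = [], 0
        current.append(ch)
        current_len += ch_len
    if current:
        chunks.append("".join(current))
    return chunks
```

Counting bytes per character, rather than slicing the encoded bytes, keeps every chunk valid UTF-8.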
lanvent
c6601aaeed fix: ensure get access_token thread-safe 2023-04-25 01:11:50 +08:00
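One common way to make a shared access token thread-safe is to guard the refresh with a lock. The sketch below assumes that approach; `AccessTokenCache` and the `fetch_token` callable are hypothetical names, and the commit itself may do this differently.

```python
import threading
import time


class AccessTokenCache:
    """Cache a token and refresh it under a lock (illustrative only)."""

    def __init__(self, fetch_token):
        # fetch_token is a hypothetical callable returning (token, ttl_seconds).
        self._fetch_token = fetch_token
        self._lock = threading.Lock()
        self._token = None
        self._expires_at = 0.0

    def get(self):
        # Only one thread at a time may check and refresh the token,
        # so concurrent callers never trigger duplicate fetches.
        with self._lock:
            if self._token is None or time.monotonic() >= self._expires_at:
                token, ttl = self._fetch_token()
                self._token = token
                self._expires_at = time.monotonic() + ttl
            return self._token
```

Holding the lock for the whole check-and-refresh step is the simplest correct choice; callers that arrive during a refresh wait and then reuse the new token.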
lanvent
6e14fce1fe docs: update README.md for wechatcom_app 2023-04-25 00:44:16 +08:00
lanvent
be5a62f1b8 Merge Pull Request #936 into wechatcom-app 2023-04-24 22:41:42 +08:00
Jianglang
1fa8cefaea Add contact link in ISSUE_TEMPLATE 2023-04-24 16:38:19 +08:00
Jianglang
d7c251ac83 Update README.md 2023-04-24 02:21:44 +08:00
lanvent
d03229a183 Update ISSUE_TEMPLATE 2023-04-24 02:06:34 +08:00
lanvent
243482e829 Update ISSUE_TEMPLATE 2023-04-24 02:02:16 +08:00
lanvent
79d10be8a0 fix(wechatmp): add clear_quota_lock to ensure thread safe 2023-04-24 00:38:34 +08:00
JS00000
dca5c058e0 fix: Avoid the same filename under multithreading (#933) 2023-04-23 23:56:32 +08:00
lanvent
9163ce71fd fix: enable plugins for wechatcom_app 2023-04-23 16:51:16 +08:00
lanvent
2ec5374765 feat:modify wechatcom to wechatcom_app 2023-04-23 15:40:28 +08:00
lanvent
d6a4b35cd3 chore: add numpy version constraint 2023-04-23 15:07:38 +08:00
lanvent
8205d2552c fix(Dockerfile): add extra-index-url to pip install command 2023-04-23 15:01:10 +08:00
lanvent
9a99caeb9d chore: add fetch_translate method to Bridge class 2023-04-23 05:12:50 +08:00
lanvent
1e09bd0e76 feat(azure_voice): add language detection, support multiple languages 2023-04-23 04:28:46 +08:00
lanvent
cae12eb187 feat: add baidu translate api 2023-04-23 03:54:16 +08:00
zhayujie
8bb36e0eb6 Merge pull request #926 from zhayujie/dev
docs: update README
2023-04-22 18:04:04 +08:00
zhayujie
d183204caa docs: update README.md 2023-04-22 18:02:12 +08:00
zhayujie
4a22ae6b61 docs: update README.md 2023-04-22 17:53:43 +08:00
lanvent
a52f54d988 docs(wechatmp): Update README.md 2023-04-22 12:15:56 +08:00
lanvent
618c94edb8 formatting: run precommit on all files 2023-04-22 12:01:29 +08:00
lanvent
eaf4e9174f style(linting): increase max-line-length to 176
The max-line-length configuration was increased to 176 in both .flake8 and pyproject.toml files to allow for longer lines of code.
2023-04-22 11:59:12 +08:00
lanvent
4af2c7f3d7 fix: escape regex pattern 2023-04-22 11:39:59 +08:00
lanvent
361f599df0 fix: escape regex patterns when matching name 2023-04-22 11:29:41 +08:00
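Why the escaping matters can be shown with a small sketch. It assumes the name is interpolated into a regular expression to strip an @-mention; `strip_at_mention` is a hypothetical helper, not the function touched by the commit.

```python
import re


def strip_at_mention(content, nickname):
    """Remove '@nickname' plus one trailing space from a group message.

    Hypothetical helper. re.escape makes characters such as '(' or '+'
    in the nickname match literally instead of acting as regex syntax.
    """
    # \u2005 is a special space character, included here as an assumption
    # about what may follow a mention; the plain space covers the usual case.
    pattern = f"@{re.escape(nickname)}(\u2005| )?"
    return re.sub(pattern, "", content)
```

Without `re.escape`, a nickname such as `[x` would raise `re.error`, and `a+b` would match the wrong text.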
Jianglang
ffe4ea5e4c Update README.md 2023-04-22 11:12:30 +08:00
Jianglang
9461e3e01a Merge pull request #912 from zhayujie/wechatmp
Official Account improvements: image input, message encryption mode, and user experience optimizations
2023-04-22 11:08:08 +08:00
lanvent
7c85c6f742 feat(wechatmp): add support for message encryption
- Add support for message encryption in WeChat MP channel.
- Add `wechatmp_aes_key` configuration item to `config.json`.
2023-04-22 02:33:51 +08:00
lanvent
b5df6faadf feat: verify server when receive message in wechatmp 2023-04-22 01:30:21 +08:00
lanvent
7cefe2d825 fix: split long text messages into multiple parts in wechatmp_service 2023-04-21 21:03:38 +08:00
lanvent
350633b69b Merge Pull Request #920 into wechatmp 2023-04-21 20:46:16 +08:00
JS00000
1cd6a71ce0 fix the bug of pytts in linux 2023-04-21 18:31:20 +08:00
JS00000
3a08b002a0 Merge remote-tracking branch 'origin/wechatmp' into wechatmp 2023-04-21 16:20:57 +08:00
lanvent
665001732b feat: add image compression
Add image compression feature to WechatComAppChannel to compress images larger than 10MB before uploading to WeChat server. The compression is done using the `compress_imgfile` function in `utils.py`. The `fsize` function is also added to `utils.py` to calculate the size of a file or buffer.
2023-04-21 15:29:59 +08:00
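The size check described in this commit message can be sketched with the standard library alone. The version of `fsize` below is an assumption about its behavior (it accepts a path, a bytes object, or a seekable buffer); `needs_compression` is a hypothetical helper, and the compression step itself is omitted because the commit message does not show how `compress_imgfile` works.

```python
import io
import os

# The 10MB threshold named in the commit message.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def fsize(file):
    """Return the size in bytes of a path, a bytes object, or a buffer."""
    if isinstance(file, (bytes, bytearray)):
        return len(file)
    if isinstance(file, (str, os.PathLike)):
        return os.path.getsize(file)
    if hasattr(file, "seek") and hasattr(file, "tell"):
        pos = file.tell()            # remember the caller's position
        file.seek(0, os.SEEK_END)    # jump to the end to read the size
        size = file.tell()
        file.seek(pos)               # restore the position
        return size
    raise TypeError(f"unsupported type: {type(file).__name__}")


def needs_compression(file, limit=MAX_UPLOAD_BYTES):
    """Hypothetical check: True when the image exceeds the upload limit."""
    return fsize(file) > limit
```

Restoring the buffer position means the check can run before an upload without disturbing a later read.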
lanvent
cca49da730 fix: fix subscribe_msg 2023-04-21 13:49:51 +08:00
lanvent
f6d370ad29 fix: check if event is subscribe 2023-04-21 13:43:01 +08:00
lanvent
c9131b333b feat: add clear_quota_v2 method to clear API quota when it's used up 2023-04-21 13:41:21 +08:00
lanvent
e44161bf42 fix: voice_reply_voice not work 2023-04-21 03:28:31 +08:00
lanvent
a26189fb25 chore: remove passive_reply_message.py 2023-04-21 03:04:50 +08:00
lanvent
89dd8a1db6 refactor(wechatmp): use wechatpy to handle wechatmp messages
feat(wechatmp): add support for image and voice replies
2023-04-21 02:47:33 +08:00
JS00000
650e0b4ad4 wechatmp: adjust log 2023-04-21 02:16:13 +08:00
lanvent
c60f0517fb refactor(audio_convert.py): remove redundant functions 2023-04-20 23:22:08 +08:00
lanvent
0f8dc91a8b fix: add check for empty command and return error message if so 2023-04-20 23:13:07 +08:00
lanvent
b58feb5d8e Merge Pull Request #904 into master 2023-04-20 23:06:17 +08:00
JS00000
71c8043699 update README 2023-04-20 12:35:54 +08:00
JS00000
40264bc9cb fix: delete permanent media 2023-04-20 12:03:48 +08:00
JS00000
a7772316f9 feat: wechatmp channel support voice/image reply 2023-04-20 10:26:58 +08:00
JS00000
34209021c8 fix: pytts second round not work 2023-04-20 09:04:42 +08:00
lanvent
3e9e8d442a docs: add README.md for wechatcomapp channel 2023-04-20 08:43:17 +08:00
lanvent
d2bf90c6c7 refactor: rename WechatComChannel to WechatComAppChannel 2023-04-20 08:31:42 +08:00
JS00000
1e58c1ad2b fix: wechatmp channel now do not need client 2023-04-20 04:35:06 +08:00
JS00000
8cea022ec5 Merge branch 'master' into wechatmp 2023-04-20 03:41:37 +08:00
JS00000
f32f8aa08e Update readme, and make the structure more clear 2023-04-20 03:18:21 +08:00
lanvent
3ea8781381 feat(wechatcom): add support for sending image 2023-04-20 02:14:52 +08:00
lanvent
ab83dacb76 feat(wechatcom): add support for sending voice messages 2023-04-20 01:46:23 +08:00
lanvent
4cbf46fd4d feat: add support for wechatcom channel 2023-04-20 01:03:04 +08:00
goldfish菌
0a7d6e4577 plugin(tool) ver0.4.1 (#891)
* plugin(tool) fix bugs

* plugin(tool) update the tool plugin to version 0.4.1
2023-04-19 10:05:28 +08:00
JS00000
df4c1f0401 wechatmp: logic simplification 2023-04-19 01:56:25 +08:00
JS00000
9a86a67984 update readme 2023-04-19 01:54:20 +08:00
lanvent
a0cbe9c3e2 feat(azure_voice.py): improve error logging in voiceToText method 2023-04-19 00:55:22 +08:00
lanvent
a83e5a9b65 feat(azure_voice.py): improve error logging in textToVoice method 2023-04-19 00:51:52 +08:00
lanvent
de33911460 feat: add support for PATPAT context 2023-04-18 23:34:08 +08:00
lanvent
0be56e5b25 Merge branch Pull Request #882 into master 2023-04-18 14:26:16 +08:00
lanvent
abcbb34b1c fix(chat_gpt_bot.py, open_ai_bot.py): increase retry time to 20 seconds when encountering RateLimitError 2023-04-18 14:18:22 +08:00
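The retry behavior this commit adjusts can be sketched as below. Only the 20-second wait comes from the commit message; the retry count, the function name `call_with_retry`, and the local `RateLimitError` class (a stand-in for the client library's exception) are assumptions.

```python
import time


class RateLimitError(Exception):
    """Stand-in for the API client's rate-limit exception."""


def call_with_retry(func, max_retries=2, wait_seconds=20, sleep=time.sleep):
    """Call func, waiting wait_seconds between attempts on RateLimitError.

    max_retries is an illustrative value, not taken from the commit.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            sleep(wait_seconds)
```

Passing `sleep` as a parameter keeps the sketch testable without actually waiting.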
林督翔
6a13dd04a3 feat(plugin development): add keyword-matching plugin 2023-04-18 13:57:20 +08:00
lanvent
f2e29f3f2e fix: banwords help 2023-04-18 11:43:34 +08:00
JS00000
68361cddd2 wechatmp_service: image and voice reply supported 2023-04-18 03:08:18 +08:00
lanvent
6404332adc feat: itchat support joingroup message 2023-04-18 02:21:41 +08:00
JS00000
e060b6fea2 Merge branch 'master' into wechatmp 2023-04-17 20:11:41 +08:00
lanvent
e8aae27ee9 fix: missing lib in banwords 2023-04-17 15:41:29 +08:00
lanvent
2f732e5493 fix: toolhub request_timeout should be str 2023-04-17 12:00:28 +08:00
lanvent
65f20ff2c1 Merge Pull Request #860 into master 2023-04-17 01:24:39 +08:00
lanvent
8f72e8c3e6 formatting code 2023-04-17 01:01:02 +08:00
lanvent
3b8972ce1f add pre-commit hook 2023-04-17 00:57:48 +08:00
李超
fc5d3e4e9c feat: Make the size parameter of the resulting picture configurable 2023-04-16 22:31:53 +08:00
李超
29fbf69945 feat: Add configuration items to support custom data directories and facilitate the storage of itchat.pkl 2023-04-16 22:31:53 +08:00
lanvent
583440b82b banwords: move WordsSearch to lib 2023-04-16 19:04:21 +08:00
lanvent
720de9d73f chore: strip content 2023-04-16 00:47:32 +08:00
JS00000
7fb4f72b84 update wechatmp README 2023-04-12 05:52:26 +08:00
JS00000
d4fc322101 Merge branch 'master' into wechatmp 2023-04-12 05:43:05 +08:00
JS00000
8fa3da9ca5 wechatmp: voice input support 2023-04-12 05:41:48 +08:00
JS00000
68ef5aa3ae ctrl+c exit 2023-04-12 05:35:31 +08:00
JS00000
15e6cf850b Merge branch 'master' into wechatmp 2023-04-10 18:57:01 +08:00
JS00000
f687b2b6f4 remove _success_callback 2023-04-09 18:32:09 +08:00
JS00000
8ee7a48151 fix: wechatmp's deadloop when reply is None 2023-04-09 18:00:34 +08:00
649 changed files with 97469 additions and 7999 deletions

View File

@@ -1,31 +0,0 @@
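### Pre-submission checklist
1. The network can reach the OpenAI API
2. Python is installed, with a version between 3.7 and 3.10
3. `git pull` has fetched the latest code
4. Run `pip3 install -r requirements.txt` and check that the dependencies are satisfied
5. For extended features, run `pip3 install -r requirements-optional.txt` and check that the dependencies are satisfied
6. No similar problem was found in existing issues
7. No similar problem is listed in the [FAQS](https://github.com/zhayujie/chatgpt-on-wechat/wiki/FAQs)
### Problem description
> A brief description, screenshots, reproduction steps, and so on; a feature request or idea is also fine
### Terminal log (if there is an error)
```
[Paste the terminal log here; it can be found in the `run.log` file in the project root]
```
### Environment
- Operating system (Mac/Windows/Linux)
- Python version (run `python3 -V`)
- pip version (required for dependency problems; run `pip3 -V`)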

46
.github/ISSUE_TEMPLATE/1.bug.yml vendored Normal file

@@ -0,0 +1,46 @@
name: Bug report 🐛
description: Report a bug or unexpected behavior.
title: "[Bug] "
labels: ['status: needs check']
body:
  - type: markdown
    attributes:
      value: |
        > 💡 English is recommended so global developers can help. 推荐使用英文提交,谢谢 ❤️
  - type: checkboxes
    attributes:
      label: Self check
      options:
        - label: I'm on the latest version and searched [existing issues](https://github.com/zhayujie/CowAgent/issues) (incl. closed) — no duplicate.
          required: true
  - type: textarea
    attributes:
      label: Environment
      description: "Version (`cow status`), OS, Python version, install method, model & channel."
      placeholder: |
        Version: v1.2.0
        OS: macOS / Linux / Windows / Docker
        Python: 3.11
        Install: installer / Docker / source
        Model & channel: deepseek-v4-flash, web
    validations:
      required: true
  - type: textarea
    attributes:
      label: What happened?
      description: "Steps to reproduce, what you expected, and what happened instead. Screenshots welcome."
      placeholder: |
        1. ...
        2. ...
        Expected: ...
        Actual: ...
    validations:
      required: true
  - type: textarea
    attributes:
      label: Logs
      description: "Relevant logs from `run.log` (set `\"debug\": true` for more detail). ⚠️ Redact your API keys."
      render: shell
    validations:
      required: false

33
.github/ISSUE_TEMPLATE/2.feature.yml vendored Normal file

@@ -0,0 +1,33 @@
name: Feature request 🚀
description: Suggest a new idea or improvement.
title: "[Feature] "
labels: ['status: needs check']
body:
  - type: markdown
    attributes:
      value: |
        > 💡 English is recommended so global developers can help. 推荐使用英文提交,谢谢 ❤️
  - type: checkboxes
    attributes:
      label: Self check
      options:
        - label: I searched [existing issues](https://github.com/zhayujie/CowAgent/issues) (incl. closed) — no duplicate.
          required: true
  - type: textarea
    attributes:
      label: What's the problem?
      description: "The pain point or what's not working for you right now."
    validations:
      required: true
  - type: textarea
    attributes:
      label: What would you like?
      description: "How you'd expect it to work. Examples, sketches, or links welcome."
    validations:
      required: false
  - type: checkboxes
    attributes:
      label: Contribution
      options:
        - label: I'd be interested in helping implement this.
          required: false

5
.github/ISSUE_TEMPLATE/config.yml vendored Normal file

@@ -0,0 +1,5 @@
blank_issues_enabled: true
contact_links:
  - name: 📖 Documentation
    url: https://docs.cowagent.ai
    about: Setup guides, configuration, and FAQ.

21
.github/PULL_REQUEST_TEMPLATE.md vendored Normal file

@@ -0,0 +1,21 @@
<!--
Thanks for your contribution! Please write this PR in English.
【中文开发者】请使用英文填写,感谢 ❤️
-->
## What does this PR do?
<!-- A short description of the change and why it's needed. -->
## Type of change
- [ ] Bug fix
- [ ] New feature
- [ ] Docs
- [ ] Refactor / chore
## Checklist
- [ ] I tested this change locally
- [ ] Code comments and docs are in English
- [ ] Linked related issue (if any): closes #

77
.github/workflows/deploy-image-arm.yml vendored Normal file

@@ -0,0 +1,77 @@
# This workflow uses actions that are not certified by GitHub.
# They are provided by a third-party and are governed by
# separate terms of service, privacy policy, and support
# documentation.
# GitHub recommends pinning actions to a commit SHA.
# To get a newer version, you will need to update the SHA.
# You can also reference a tag or branch, but the action may change without warning.
name: Create and publish a Docker image
on:
  push:
    branches: ['master']
  create:
env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
jobs:
  build-and-push-image:
    if: github.repository == 'zhayujie/CowAgent'
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v1
      - name: Set up Docker Buildx
        id: buildx
        uses: docker/setup-buildx-action@v1
      - name: Available platforms
        run: echo ${{ steps.buildx.outputs.platforms }}
      - name: Log in to the Container registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: |
            ${{ env.REGISTRY }}/zhayujie/chatgpt-on-wechat
            ${{ env.REGISTRY }}/zhayujie/cowagent
          tags: |
            type=raw,value=latest-arm64,enable={{is_default_branch}}
            type=ref,event=branch,suffix=-arm64
            type=ref,event=tag,suffix=-arm64
      - name: Build and push Docker image
        uses: docker/build-push-action@v3
        with:
          context: .
          push: true
          file: ./docker/Dockerfile.latest
          platforms: linux/arm64
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
      - uses: actions/delete-package-versions@v4
        with:
          package-name: 'chatgpt-on-wechat'
          package-type: 'container'
          min-versions-to-keep: 10
          delete-only-untagged-versions: 'true'
          token: ${{ secrets.GITHUB_TOKEN }}

View File

@@ -16,9 +16,11 @@ on:
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}
DOCKERHUB_IMAGE: zhayujie/chatgpt-on-wechat
jobs:
build-and-push-image:
if: github.repository == 'zhayujie/CowAgent'
runs-on: ubuntu-latest
permissions:
contents: read
@@ -28,6 +30,12 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v3
- name: Login to Docker Hub
uses: docker/login-action@v2
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Log in to the Container registry
uses: docker/login-action@v2
with:
@@ -39,7 +47,15 @@ jobs:
id: meta
uses: docker/metadata-action@v4
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
images: |
zhayujie/chatgpt-on-wechat
zhayujie/cowagent
${{ env.REGISTRY }}/zhayujie/chatgpt-on-wechat
${{ env.REGISTRY }}/zhayujie/cowagent
tags: |
type=raw,value=latest,enable={{is_default_branch}}
type=ref,event=branch
type=ref,event=tag
- name: Build and push Docker image
uses: docker/build-push-action@v3
@@ -49,9 +65,9 @@ jobs:
file: ./docker/Dockerfile.latest
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
- uses: actions/delete-package-versions@v4
with:
with:
package-name: 'chatgpt-on-wechat'
package-type: 'container'
min-versions-to-keep: 10

29
.gitignore vendored

@@ -1,18 +1,23 @@
.DS_Store
.idea
.vscode
.wechaty/
.venv
.vs
__pycache__/
venv*
*.pyc
python
config.json
QR.png
nohup.out
tmp
plugins.json
itchat.pkl
*.log
logs/
workspace
config.yaml
user_datas.pkl
chatgpt_tool_hub/
plugins/**/
!plugins/bdunit
!plugins/dungeon
@@ -20,5 +25,23 @@ plugins/**/
!plugins/godcmd
!plugins/tool
!plugins/banwords
!plugins/banwords/**/
plugins/banwords/__pycache__
plugins/banwords/lib/__pycache__
!plugins/hello
!plugins/role
!plugins/role
!plugins/keyword
!plugins/linkai
!plugins/cow_cli
client_config.json
ref/
**/.dev.vars
.cursor/
local/
node_modules/
# cow cli
dist/
build/
*.egg-info/
.cow.pid

61
CONTRIBUTING.md Normal file

@@ -0,0 +1,61 @@
# Contributing to CowAgent
Thanks for taking the time to contribute! 🎉 CowAgent is built by a global
community, and contributions of all sizes are welcome — from typo fixes to new
features.
## Language policy
To keep the project accessible to a global community, **please write issues,
pull requests, code comments, and commit messages in English.**
> 为方便全球开发者协作,请尽量使用**英文**提交 issue、PR、代码注释与
> commit message。不必担心英文不完美——表达清楚即可工具翻译也完全没问题。感谢理解 ❤️
## Reporting issues
Found a bug or have an idea? [Open an issue](https://github.com/zhayujie/CowAgent/issues/new/choose).
Before opening one, please search existing issues (including closed ones) to
avoid duplicates, and make sure you're on the latest version.
## Submitting a pull request
1. **Fork** the repo and create a branch from `master`
(e.g. `feat/web-search`, `fix/telegram-reconnect`).
2. Make your change. Keep it focused — one logical change per PR.
3. Follow the existing code style. Write comments and docstrings in English.
4. Run the app locally to confirm your change works.
5. Open a PR with a clear title and a short description of **what** and **why**.
We keep the bar friendly: clear, focused, and working is enough. Maintainers are
happy to help polish details during review.
### Commit & PR titles
Use a short, imperative summary. The [Conventional Commits](https://www.conventionalcommits.org/)
style is preferred but not required:
```
feat: add web search tool
fix: reconnect Telegram websocket on timeout
docs: clarify Docker setup
```
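As a rough illustration of the title style above, a check like the following could be used locally. It is only a sketch: the list of accepted types is an assumption drawn from common Conventional Commits usage, and `is_conventional_title` is a hypothetical helper, not a tool this project ships.

```python
import re

# Assumed set of types; the guide itself only shows feat, fix, and docs.
CONVENTIONAL_TITLE = re.compile(
    r"^(feat|fix|docs|chore|refactor|style|test|perf|build|ci)"
    r"(\([^)]+\))?"   # optional scope, e.g. (docker)
    r"!?"             # optional breaking-change marker
    r": \S.*$"        # colon, space, then a non-empty summary
)


def is_conventional_title(title):
    """Return True when the title follows the type(scope): summary shape."""
    return bool(CONVENTIONAL_TITLE.match(title))
```

Since the style is preferred but not required, a check like this is better suited to a local hint than to a blocking CI gate.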
## Development setup
See the [Install from Source](https://docs.cowagent.ai/guide/manual-install)
guide. In short:
```bash
git clone https://github.com/zhayujie/CowAgent.git
cd CowAgent
pip install -r requirements.txt
pip install -e .
cow start
```
## Code of conduct
Be respectful and constructive. We want CowAgent to be a welcoming place for
everyone.

344
README.md

@@ -1,221 +1,261 @@
# Introduction
<p align="center"><img src="https://github.com/user-attachments/assets/eca9a9ec-8534-4615-9e0f-96c5ac1d10a3" alt="CowAgent" width="420" /></p>
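> ChatGPT has recently taken the internet by storm with its powerful conversation and information-integration abilities: it can write code, revise papers, tell stories, and do almost anything. This inspires a bold idea: could we use its conversation model to turn our WeChat into an intelligent bot that gives unexpected replies in chats with friends, so that we never again have to worry about a girlfriend interrupting our ~~gaming~~ work?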
<p align="center">
<a href="https://github.com/zhayujie/CowAgent/releases/latest"><img src="https://img.shields.io/github/v/release/zhayujie/CowAgent" alt="Latest release"></a>
<a href="https://github.com/zhayujie/CowAgent/blob/master/LICENSE"><img src="https://img.shields.io/github/license/zhayujie/CowAgent" alt="License: MIT"></a>
<a href="https://github.com/zhayujie/CowAgent"><img src="https://img.shields.io/github/stars/zhayujie/CowAgent?style=flat-square" alt="Stars"></a> <br/>
[English] | [<a href="docs/zh/README.md">中文</a>] | [<a href="docs/ja/README.md">日本語</a>]
</p>
**CowAgent** is an open-source super AI assistant that proactively plans tasks, controls your computer and external services, creates and runs Skills, and grows alongside you through a personal knowledge base and long-term memory — a reference implementation of Agent Harness engineering.
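A WeChat chatbot based on ChatGPT. It generates conversation content through the [ChatGPT](https://github.com/openai/openai-python) API and uses [itchat](https://github.com/littlecodersh/ItChat) to receive WeChat messages and reply automatically. The implemented features are: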
CowAgent is lightweight, easy to deploy, and built to extend. Plug in any major LLM provider and run it 24/7 on a personal computer or server, across the web and all major IM platforms.
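- [x] **Text conversation:** receives WeChat messages in private and group chats and replies automatically with content generated by ChatGPT
- [x] **Custom rules:** supports triggering auto-replies in private chats by specified rules, and an auto-reply whitelist for groups
- [x] **Image generation:** supports generating images from a description, as well as image repair
- [x] **Context memory:** supports multi-turn conversation memory, with an independent session context maintained for each friend
- [x] **Speech recognition:** supports receiving and processing voice messages, replying by text or voice
- [x] **Plugins:** supports custom plugins that provide role play, text adventure, operating-system interaction, web data access, and more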
<p align="center">
<a href="https://cowagent.ai/">🌐 Website</a> &nbsp;·&nbsp;
<a href="https://docs.cowagent.ai/intro/index">📖 Docs</a> &nbsp;·&nbsp;
<a href="https://docs.cowagent.ai/guide/quick-start">🚀 Quick Start</a> &nbsp;·&nbsp;
<a href="https://skills.cowagent.ai/">🧩 Skill Hub</a> &nbsp;·&nbsp;
<a href="https://link-ai.tech/cowagent/create">☁️ Try Online</a>
</p>
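> Deployment on WeChat and WeChat Official Accounts is currently supported, and integrations with more applications are welcome: implement the message receiving and sending logic by following the [Terminal code](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/channel/terminal/terminal_channel.py). New plugins are also welcome; see the [plugin documentation](https://github.com/zhayujie/chatgpt-on-wechat/tree/master/plugins).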
<br/>
## 🌟 Highlights
**One-click deployment:**
| Capability | Description |
| :--- | :--- |
| [Planning](https://docs.cowagent.ai/intro/architecture) | Decomposes complex tasks and executes them step by step, looping over tools until the goal is reached |
| [Memory](https://docs.cowagent.ai/memory/index) | Three-tier architecture (context → daily → core), automatic Deep Dream distillation, hybrid keyword + vector retrieval |
| [Knowledge](https://docs.cowagent.ai/knowledge/index) | Auto-curates structured knowledge into a Markdown wiki, builds an evolving knowledge graph with visual browsing |
| [Skills](https://docs.cowagent.ai/skills/index) | One-click install from [Skill Hub](https://skills.cowagent.ai/), GitHub, ClawHub; or create custom skills via natural-language conversation |
| [Tools](https://docs.cowagent.ai/tools/index) | Built-in file I/O, terminal, browser, scheduler, memory retrieval, web search, and 10+ more tools — with native MCP integration |
| [Channels](https://docs.cowagent.ai/channels/index) | Integrates with Web, WeChat, Feishu, DingTalk, WeCom, QQ, Official Accounts, Telegram, and Slack |
| Multimodal | First-class support for text, images, voice, and files — recognition, generation, and delivery |
| [Models](https://docs.cowagent.ai/models/index) | Claude, GPT, Gemini, DeepSeek, Qwen, GLM, Kimi, MiniMax, Doubao, and more — swap providers from the Web console with one click |
| [Deploy](https://docs.cowagent.ai/guide/quick-start) | One-line installer, unified Web console, multiple deployment modes (local, Docker, server) |
[![Deploy on Railway](https://railway.app/button.svg)](https://railway.app/template/qApznZ?referralCode=RC3znh)
<br/>
## 🏗️ Architecture
# Changelog
<img src="https://cdn.jsdelivr.net/gh/zhayujie/cowagent-assets@main/architecture/en/architecture.jpg" alt="CowAgent Architecture" width="750"/>
>**2023.04.05** Supports deployment on a personal WeChat account and is compatible with preset plugins such as role play; see the [documentation](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/channel/wechatmp/README.md). (contributed by [@JS00000](https://github.com/JS00000) in [#686](https://github.com/zhayujie/chatgpt-on-wechat/pull/686))
CowAgent is a complete **Agent Harness**: messages flow in through **Channels**; the **Agent Core** plans and reasons over memory, knowledge, and the available tools and skills; **Models** generate the response, which is sent back through the originating channel. Every layer is decoupled and independently extensible.
>**2023.04.05** Added the `tool` plugin, which lets ChatGPT use tools; see the [documentation](https://github.com/goldfishh/chatgpt-on-wechat/blob/master/plugins/tool/README.md). Tool-related issues can be reported to [chatgpt-tool-hub](https://github.com/goldfishh/chatgpt-tool-hub). (contributed by [@goldfishh](https://github.com/goldfishh) in [#663](https://github.com/zhayujie/chatgpt-on-wechat/pull/663))
Read more in [Architecture](https://docs.cowagent.ai/intro/architecture).
>**2023.03.25** Added plugin-based development. Plugins implemented so far include multi-role switching, text adventure games, admin commands, and Stable Diffusion; for usage see [#578](https://github.com/zhayujie/chatgpt-on-wechat/issues/578). (contributed by [@lanvent](https://github.com/lanvent) in [#565](https://github.com/zhayujie/chatgpt-on-wechat/pull/565))
<br/>
>**2023.03.09** Parses and replies to WeChat voice messages using the `whisper API` (more speech `API` services have since been integrated). Add the config item `"speech_recognition":true` to enable it; for usage see [#415](https://github.com/zhayujie/chatgpt-on-wechat/issues/415). (contributed by [wanggang1987](https://github.com/wanggang1987) in [#385](https://github.com/zhayujie/chatgpt-on-wechat/pull/385))
## 🚀 Quick Start
>**2023.03.02** Integrated the [ChatGPT API](https://platform.openai.com/docs/guides/chat) (gpt-3.5-turbo), which is now the default conversation model. This requires upgrading the openai dependency (`pip3 install --upgrade openai`). For network problems, see [#351](https://github.com/zhayujie/chatgpt-on-wechat/issues/351)
A one-line installer takes care of dependencies, configuration, and startup:
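>**2023.02.09** Logging in by QR code carries a risk of account bans; use with caution. See [#58](https://github.com/AutumnWhj/ChatGPT-wechat-bot/issues/158)
>**2023.02.05** Implemented contextual conversation for the official OpenAI API approach (GPT-3 model)
>**2022.12.18** Added support for generating and sending images from a description; requires an openai version greater than 0.25.0
>**2022.12.17** The original approach obtained a session_token from the [ChatGPT page](https://chat.openai.com/chat) and used [revChatGPT](https://github.com/acheong08/ChatGPT) to call the web interface directly, but once ChatGPT adopted Cloudflare human verification this became hard to run reliably on a server. The current approach therefore calls the official OpenAI [API](https://beta.openai.com/docs/api-reference/introduction). Reply quality is close to ChatGPT's; the drawback is that conversations with context memory are not yet supported, and the advantage is better stability and response speed.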
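# Results
### Private chat
![single-chat-sample.jpg](docs/images/single-chat-sample.jpg)
### Group chat
![group-chat-sample.jpg](docs/images/group-chat-sample.jpg)
### Image generation
![image-create-sample.jpg](docs/images/image-create-sample.jpg)
# Quick Start
## Preparation
### 1. Register an OpenAI account
Go to the [OpenAI sign-up page](https://beta.openai.com/signup) to create an account; this [tutorial](https://www.pythonthree.com/register-openai-chatgpt/) explains how to receive the verification code with a virtual phone number. Once the account exists, go to the [API keys page](https://beta.openai.com/account/api-keys), create an API key, and save it; it is needed later when configuring the project.
> The project uses the davinci conversation model, billed at about $0.02 per 750 words (request and reply combined); image generation costs $0.016 per image. New accounts used to come with $18 of free credit (update 3.25: newly registered accounts no longer receive free credit); once it is used up, you can register again with a different email address.
### 2. Runtime environment
Linux, macOS, and Windows are supported (the program can run long-term on a Linux server); `Python` must also be installed.
> Python 3.7.1 to 3.9.X is recommended, ideally 3.8. Versions 3.10 and above work on macOS; whether they run correctly on other systems has not been confirmed.
**(1) Clone the project code:**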
**Linux / macOS:**
```bash
git clone https://github.com/zhayujie/chatgpt-on-wechat
cd chatgpt-on-wechat/
bash <(curl -fsSL https://cdn.link-ai.tech/code/cow/run.sh)
```
**(2) Install core dependencies (required):**
> The minimal set of dependencies needed to create a bot with `itchat` and exchange text messages.

```bash
pip3 install -r requirements.txt
```
**Windows (PowerShell):**
```powershell
irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex
```
**(3) Extended dependencies (optional, recommended):**
**Docker:**
```bash
pip3 install -r requirements-optional.txt
curl -O https://cdn.link-ai.tech/code/cow/docker-compose.yml
docker compose up -d
```
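> If a dependency fails to install, comment out the corresponding line and continue.
Among these, `tiktoken` requires `python` 3.8 or above. It is used to count the tokens consumed by a session precisely, and installing it is strongly recommended.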
Once started, open `http://localhost:9899` to access the **Web console** — your one-stop hub to chat with the Agent, configure models, connect channels, and install skills.
> Deploying on a server? Set `web_host` to `0.0.0.0` in `config.json` to make the console reachable from outside, and set `web_password` to protect it. Don't forget to open port `9899` in your firewall or security group.
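The server tip above can be applied by hand or scripted. The following is a minimal sketch, assuming `config.json` is plain JSON; the helper name `enable_remote_console` is hypothetical and not part of CowAgent, while `web_host` and `web_password` are the keys named in the tip.

```python
import json
from pathlib import Path


def enable_remote_console(config_path: str, password: str) -> dict:
    """Hypothetical helper: expose the Web console and protect it with a password."""
    path = Path(config_path)
    # Start from the existing settings if the file is already there.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config["web_host"] = "0.0.0.0"      # listen on all interfaces
    config["web_password"] = password   # require a password for the console
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
    return config
```

Restart the service after changing the file so the new values are picked up, and keep the password out of version control.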
Speech recognition with `google` or `baidu` requires `ffmpeg`.
> 📖 Detailed guides: [Quick Start](https://docs.cowagent.ai/guide/quick-start) · [Install from Source](https://docs.cowagent.ai/guide/manual-install) · [Upgrade](https://docs.cowagent.ai/guide/upgrade)
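The default `openai` speech recognition does not require `ffmpeg`.
See [#415](https://github.com/zhayujie/chatgpt-on-wechat/issues/415)
The `azure` speech feature requires an extra dependency (listed in `requirements-optional.txt`, but commented out to simplify `railway` deployment):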
After installation, manage the service with the [cow CLI](https://docs.cowagent.ai/cli/index):
```bash
pip3 install azure-cognitiveservices-speech
cow start | stop | restart # service control
cow status | logs # status and logs
cow update # pull latest code and restart
cow skill install <name> # install a skill
cow install-browser # install browser automation
```
> The default published image and the `railway` deployment are both based on `alpine`, on which the `azure` dependency cannot be installed. If you need it, build your own image from [`debian`](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/docker/Dockerfile.debian.latest).
See the [documentation](https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/quickstarts/setup-platform?pivots=programming-language-python&tabs=linux%2Cubuntu%2Cdotnet%2Cjre%2Cmaven%2Cnodejs%2Cmac%2Cpypi)
<br/>
## Configuration
## 🤖 Models
The configuration template is `config-template.json` in the root directory. Copy it to create the `config.json` file that actually takes effect:
CowAgent supports all mainstream LLM providers. **Chat, vision, image generation, ASR/TTS, and embeddings** can each be routed to a different vendor. Providers are configured directly in the Web console — no manual file editing required.
| Provider | Featured Models | Chat | Vision | Image Gen | ASR | TTS | Embedding |
| --- | --- | :-: | :-: | :-: | :-: | :-: | :-: |
| [Claude](https://docs.cowagent.ai/models/claude) | claude-opus-4-8 | ✅ | ✅ | | | | |
| [OpenAI](https://docs.cowagent.ai/models/openai) | gpt-5.5, o-series | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [Gemini](https://docs.cowagent.ai/models/gemini) | gemini-3.5-flash | ✅ | ✅ | ✅ | | | |
| [DeepSeek](https://docs.cowagent.ai/models/deepseek) | deepseek-v4-flash / pro | ✅ | | | | | |
| [Qwen](https://docs.cowagent.ai/models/qwen) | qwen3.7-max | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [GLM](https://docs.cowagent.ai/models/glm) | glm-5.1, glm-5v-turbo | ✅ | ✅ | | ✅ | | ✅ |
| [Doubao](https://docs.cowagent.ai/models/doubao) | doubao-seed-2.0 series | ✅ | ✅ | ✅ | | | ✅ |
| [Kimi](https://docs.cowagent.ai/models/kimi) | kimi-k2.6 | ✅ | ✅ | | | | |
| [MiniMax](https://docs.cowagent.ai/models/minimax) | MiniMax-M2.7 | ✅ | ✅ | ✅ | | ✅ | |
| [ERNIE](https://docs.cowagent.ai/models/qianfan) | ernie-5.1 | ✅ | ✅ | | | | |
| [MiMo](https://docs.cowagent.ai/models/mimo) | mimo-v2.5 / pro | ✅ | ✅ | | | ✅ | |
| [LinkAI](https://docs.cowagent.ai/models/linkai) | One key for 100+ models | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| [Custom](https://docs.cowagent.ai/models/custom) | Local models / third-party proxy | ✅ | | | | | |
> For details on each provider, see the [Models overview](https://docs.cowagent.ai/models/index).
<br/>
## 💬 Channels
A single Agent instance can serve multiple channels in parallel. Most channels can be onboarded right from the Web console.
| Channel | Text | Image | File | Voice | Group |
| --- | :-: | :-: | :-: | :-: | :-: |
| [Web Console](https://docs.cowagent.ai/channels/web) (default) | ✅ | ✅ | ✅ | ✅ | |
| [Telegram](https://docs.cowagent.ai/channels/telegram) | ✅ | ✅ | ✅ | ✅ | ✅ |
| [Slack](https://docs.cowagent.ai/channels/slack) | ✅ | ✅ | ✅ | | ✅ |
| [Discord](https://docs.cowagent.ai/channels/discord) | ✅ | ✅ | ✅ | | ✅ |
| [WeChat](https://docs.cowagent.ai/channels/weixin) | ✅ | ✅ | ✅ | ✅ | |
| [Feishu / Lark](https://docs.cowagent.ai/channels/feishu) | ✅ | ✅ | ✅ | ✅ | ✅ |
| [DingTalk](https://docs.cowagent.ai/channels/dingtalk) | ✅ | ✅ | ✅ | ✅ | ✅ |
| [WeCom Bot](https://docs.cowagent.ai/channels/wecom-bot) | ✅ | ✅ | ✅ | ✅ | ✅ |
| [QQ](https://docs.cowagent.ai/channels/qq) | ✅ | ✅ | ✅ | | ✅ |
| [WeCom App](https://docs.cowagent.ai/channels/wecom) | ✅ | ✅ | ✅ | ✅ | |
| [WeChat Customer Service](https://docs.cowagent.ai/channels/wechat-kf) | ✅ | ✅ | ✅ | ✅ | |
| [WeChat Official Account](https://docs.cowagent.ai/channels/wechatmp) | ✅ | ✅ | | ✅ | |
> See the [Channels overview](https://docs.cowagent.ai/channels/index) for setup details.
<img src="https://cdn.jsdelivr.net/gh/zhayujie/cowagent-assets@main/screenshots/en/web-console-chat.png" alt="CowAgent Web Console" width="800"/>
*The Web console is the default channel and the unified entry point to configure models, channels, skills, memory, and more.*
<br/>
## 🧠 Memory & Knowledge Base
**Long-term memory** uses a three-tier architecture: conversation context (short-term) → daily memory (mid-term) → MEMORY.md (long-term). A nightly **Deep Dream** pass distills scattered memories into refined long-term entries and a narrative journal. See [Long-term Memory](https://docs.cowagent.ai/memory/index) · [Deep Dream](https://docs.cowagent.ai/memory/deep-dream).
**Personal knowledge base** complements the time-ordered memory by organizing structured knowledge **by topic**. The Agent automatically curates valuable information from conversations, maintains cross-references and indexes, and the Web console offers an interactive knowledge-graph view. See [Personal Knowledge Base](https://docs.cowagent.ai/knowledge/index).
<table>
<tr>
<td width="50%">
<img src="https://cdn.jsdelivr.net/gh/zhayujie/cowagent-assets@main/screenshots/en/web-console-memory.png" alt="Long-term Memory" />
<p align="center"><em>Long-term Memory · Three-tier architecture + Deep Dream</em></p>
</td>
<td width="50%">
<img src="https://cdn.jsdelivr.net/gh/zhayujie/cowagent-assets@main/screenshots/en/web-console-knowledge.png" alt="Personal Knowledge Base" />
<p align="center"><em>Knowledge Base · Auto-curated Markdown wiki</em></p>
</td>
</tr>
</table>
<br/>
## 🔧 Tools & Skills
**Tools** are atomic capabilities the Agent uses to interact with system resources. **Skills** are higher-level workflows defined by a manifest file that compose multiple tools to accomplish complex tasks.
### Tool System
**Built-in tools** cover file I/O (`read` / `write` / `edit` / `ls`), terminal (`bash`), file sending (`send`), memory retrieval (`memory`), environment variables (`env_config`), web fetching (`web_fetch`), scheduling (`scheduler`), web search (`web_search`), vision (`vision`), and browser automation (`browser`).
**MCP protocol** integrates the open ecosystem of [Model Context Protocol](https://modelcontextprotocol.io) servers. A single `mcp.json` is enough — supports stdio / SSE transports, hot reload, and zero-code integration.
Learn more: [Tools overview](https://docs.cowagent.ai/tools/index) · [MCP integration](https://docs.cowagent.ai/tools/mcp).
### Skills System
- **[Skill Hub](https://skills.cowagent.ai/)** — open skill marketplace: browse, search, install in one click
- **GitHub / ClawHub / URL and more** — install skills from any source
- **Conversational authoring** — generate custom skills through dialogue with `skill-creator`; turn any workflow or third-party API into a reusable skill
```bash
cp config-template.json config.json
/skill list # list installed skills
/skill search <keyword> # search the marketplace
/skill install <name> # one-click install
```
Then fill in `config.json`. The default settings are explained below and can be customized as needed:
Learn more: [Skills overview](https://docs.cowagent.ai/skills/index) · [Creating Skills](https://docs.cowagent.ai/skills/create).
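```bash
# Example contents of config.json
{
  "open_ai_api_key": "YOUR API KEY",                          # the OpenAI API key created above
  "model": "gpt-3.5-turbo",                                   # model name; when use_azure_chatgpt is true, this is the model deployment name on Azure
  "proxy": "127.0.0.1:7890",                                  # IP and port of the proxy client
  "single_chat_prefix": ["bot", "@bot"],                      # in private chats, text must contain this prefix to trigger a reply
  "single_chat_reply_prefix": "[bot] ",                       # prefix added to automatic private replies, to distinguish them from a human
  "group_chat_prefix": ["@bot"],                              # in group chats, text containing this prefix triggers a reply
  "group_name_white_list": ["ChatGPT test group", "ChatGPT test group 2"], # names of groups with automatic replies enabled
  "group_chat_in_one_session": ["ChatGPT test group"],        # names of groups that share one session context
  "image_create_prefix": ["画", "看", "找"],                   # prefixes that trigger an image reply (the defaults mean "draw", "look", "find")
  "conversation_max_tokens": 1000,                            # maximum number of characters of context to remember
  "speech_recognition": false,                                # whether to enable speech recognition
  "group_speech_recognition": false,                          # whether to enable speech recognition in groups
  "use_azure_chatgpt": false,                                 # whether to use the Azure ChatGPT service instead of the openai ChatGPT service; when true, open_ai_api_base must be set, e.g. https://xxx.openai.azure.com/
  "character_desc": "You are ChatGPT, a large language model trained by OpenAI. You aim to answer and solve any question people have, and you can communicate in multiple languages.",  # persona description
}
```
**Configuration notes:**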
<br/>
**1. Private chat**
## 🏷 Changelog
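+ In private chats, a message must start with "bot" or "@bot" to trigger the bot, as set by `single_chat_prefix` (to trigger without a prefix, use `"single_chat_prefix": [""]`)
+ The bot's replies are prefixed with "[bot] " to distinguish them from a human's, as set by `single_chat_reply_prefix` (to drop the prefix, use `"single_chat_reply_prefix": ""`)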
> **2026.06.01:** [v2.1.0](https://github.com/zhayujie/CowAgent/releases/tag/2.1.0) — Internationalization, new channels (Telegram, Discord, Slack, WeChat Customer Service), CLI interaction upgrades, streamlined one-line install, MCP Streamable HTTP support, new models (claude-opus-4-8, MiMo).
**2. Group chat**
> **2026.05.22:** [v2.0.9](https://github.com/zhayujie/CowAgent/releases/tag/2.0.9) — Model management, MCP protocol support, persistent browser sessions, new models (gpt-5.5, gemini-3.5-flash, qwen3.7-max), deployment hardening.
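+ In group chats, the group name must be listed in `group_name_white_list` for automatic replies to be enabled. To apply this to all groups, use `"group_name_white_list": ["ALL_GROUP"]`
+ By default the bot replies whenever it is @-mentioned; it also replies to any group message that starts with "@bot" (handy for triggering it yourself), as set by `group_chat_prefix`
+ Optional: `group_name_keyword_white_list` supports fuzzy matching of group names, and `group_chat_keyword` supports fuzzy matching of group message content; both are used the same way as the two items above. (Contributed by [evolay](https://github.com/evolay))
+ `group_chat_in_one_session`: makes a group chat share one session context; set it to `["ALL_GROUP"]` to apply to all groups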
> **2026.05.06:** [v2.0.8](https://github.com/zhayujie/CowAgent/releases/tag/2.0.8) — Feishu channel overhaul (voice, streaming, QR onboarding), DeepSeek V4 and Baidu Qianfan support, scheduler tool upgrades.
**3. Speech recognition**
> **2026.04.22:** [v2.0.7](https://github.com/zhayujie/CowAgent/releases/tag/2.0.7) — Built-in image generation (GPT Image 2, Nano Banana), new models (Kimi K2.6, Claude Opus 4.7, GLM 5.1), memory and knowledge enhancements.
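+ Adding `"speech_recognition": true` enables speech recognition. By default OpenAI's whisper model transcribes the voice message to text and the bot replies in text. This option applies to private chats only (note that voice messages cannot be matched against a prefix, so once enabled the bot replies to every voice message; voice-triggered image generation is supported)
+ Adding `"group_speech_recognition": true` enables speech recognition in groups. By default OpenAI's whisper model transcribes the voice message to text and the bot replies in text. This option applies to group chats only (the transcribed text is matched against group_chat_prefix and group_chat_keyword; voice-triggered image generation is supported)
+ Adding `"voice_reply_voice": true` makes the bot answer voice messages with voice, in both private and group chats. This requires configuring a key for the corresponding speech synthesis platform. Because of itchat protocol limits, only mp3 voice files can be sent; with wechaty the reply is a native WeChat voice message.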
> **2026.04.14:** [v2.0.6](https://github.com/zhayujie/CowAgent/releases/tag/2.0.6) — Knowledge base, Deep Dream memory distillation, smart context compression, multi-session Web console.
**4. Other settings**
> **2026.04.01:** [v2.0.5](https://github.com/zhayujie/CowAgent/releases/tag/2.0.5) — Cow CLI, Skill Hub open source, browser tool, WeCom Bot QR onboarding.
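+ `model`: model name; currently supports `gpt-3.5-turbo`, `text-davinci-003`, `gpt-4`, and `gpt-4-32k` (the gpt-4 API is not yet open)
+ `temperature`, `frequency_penalty`, `presence_penalty`: Chat API parameters; see the [official OpenAI documentation](https://platform.openai.com/docs/api-reference/chat).
+ `proxy`: the `openai` API is currently unreachable from mainland China, so the address of a proxy client must be configured; see [#351](https://github.com/zhayujie/chatgpt-on-wechat/issues/351)
+ Image generation requires an extra keyword prefix on top of the private or group trigger conditions, as set by `image_create_prefix`
+ For the parameters of the OpenAI conversation and image APIs (creativity, reply length limit, image size, and so on), see the [conversation API](https://beta.openai.com/docs/api-reference/completions) and [image API](https://beta.openai.com/docs/api-reference/completions) documentation, and adjust them directly in the [code](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/bot/openai/open_ai_bot.py) at `bot/openai/open_ai_bot.py`.
+ `conversation_max_tokens`: the maximum amount of context that can be remembered, measured in characters (one question and its answer form one exchange; when the accumulated length exceeds the limit, the earliest exchange is removed first)
+ `rate_limit_chatgpt`, `rate_limit_dalle`: maximum chat and image-generation requests per minute; requests over the limit are queued and handled in order.
+ `clear_memory_commands`: in-conversation commands that clear the preceding memory; a string array, so custom command aliases can be defined.
+ `hot_reload`: keeps the WeChat QR-login state after the program exits; off by default.
+ `character_desc`: stores a passage you say to the bot, which it remembers as its persona; you can customize any personality (for more on conversation context, see this [issue](https://github.com/zhayujie/chatgpt-on-wechat/issues/43))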
> **2026.02.03:** [v2.0.0](https://github.com/zhayujie/CowAgent/releases/tag/2.0.0) — Major upgrade to a super Agent assistant with multi-step task planning, long-term memory, and the Skills framework.
**All optional configuration items are listed in this [file](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/config.py).**
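To make the `conversation_max_tokens` behaviour described in the old configuration notes more concrete, here is a minimal sketch of dropping the earliest question/answer pair first. It is an illustration under stated assumptions, not the project's implementation: the function name `trim_history` is hypothetical, length is measured with `len()` on the raw text, and the history is modelled as a list of `(question, answer)` tuples.

```python
from typing import List, Tuple


def trim_history(pairs: List[Tuple[str, str]], max_chars: int) -> List[Tuple[str, str]]:
    """Drop the oldest question/answer pairs until the total length fits the budget."""
    kept = list(pairs)
    total = sum(len(q) + len(a) for q, a in kept)
    # Remove from the front: the earliest exchange goes first.
    while kept and total > max_chars:
        q, a = kept.pop(0)
        total -= len(q) + len(a)
    return kept
```

In this sketch a single exchange that is larger than the budget is dropped as well, so the result may be empty; a real implementation might prefer to always keep the most recent exchange.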
Full history: [Release Notes](https://docs.cowagent.ai/releases/overview)
## Running
<br/>
### 1. Running locally
## 🤝 Community & Support
To **run locally** on a development machine, execute the following in the project root:
[File an issue](https://github.com/zhayujie/CowAgent/issues) on GitHub, or scan the QR code below to join our WeChat community:
```bash
python3 app.py
```
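After the terminal prints a QR code, scan it with WeChat. When "Start auto replying" appears, the auto-reply program is running (note: the WeChat account used to log in must have completed real-name verification in WeChat Pay). After you log in by QR code, your account acts as the bot, and auto-replies can be triggered from the WeChat mobile app with the configured keywords (when any friend messages you, or when you message a friend); see [#142](https://github.com/zhayujie/chatgpt-on-wechat/issues/142).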
<img width="130" src="https://img-1317903499.cos.ap-guangzhou.myqcloud.com/docs/open-community.png">
<br/>
### 2. Server deployment
## 🔗 Related Projects
Use the nohup command to run the program in the background:
- **[Cow Skill Hub](https://github.com/zhayujie/cow-skill-hub)** — open skill marketplace for AI Agents; works with CowAgent, OpenClaw, Claude Code, and more
- **[bot-on-anything](https://github.com/zhayujie/bot-on-anything)** — lightweight LLM application framework with integrations for Slack, Telegram, Discord, Gmail, and more
- **[AgentMesh](https://github.com/MinimalFuture/AgentMesh)** — open-source multi-agent framework for solving complex problems through team collaboration
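```bash
touch nohup.out                                   # create the log file before the first run
nohup python3 app.py & tail -f nohup.out          # run in the background and print the QR code from the log
```
After you log in by QR code, the program runs in the background on the server. You can press `ctrl+c` to stop following the log without affecting the background program. Use `ps -ef | grep app.py | grep -v grep` to list the background process; to restart the program, `kill` that process first. To follow the log again after closing it, run `tail -f nohup.out`. The `scripts` directory also provides scripts to start and stop the program in one step.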
<br/>
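> **Multiple accounts:** make several copies of the project, start each one separately, and log in to each with a different account to run them side by side.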
## 🏢 Enterprise Services
> **Special command:** a user can send **#reset** to the bot to clear that user's context memory.
[**LinkAI**](https://link-ai.tech/) is an all-in-one AI Agent platform for enterprises and developers, offering managed hosting and enterprise-grade support for CowAgent:
- **🚀 Zero-deployment hosted runtime** — spin up a [CowAgent online assistant](https://link-ai.tech/cowagent/create) in under a minute, no server required
- **🧠 Agent infrastructure** — unified access to LLMs, knowledge bases, databases, skills, and workflows; plug-and-play building blocks that extend what CowAgent can do
- **🏢 Team & enterprise features** — workspaces, role-based access, audit logs, and private deployment for production use cases
### 3. Docker deployment
For enterprise inquiries: sales@simple-future.tech or [scan the QR code](https://cdn.link-ai.tech/consultant.jpg) to reach our team on WeChat.
See the [Docker deployment](https://github.com/limccn/chatgpt-on-wechat/wiki/Docker%E9%83%A8%E7%BD%B2) guide (contributed by [limccn](https://github.com/limccn)).
<br/>
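### 4. Railway deployment (✅ recommended)
> Railway provides $5 and up to 500 hours of free usage each month.
1. Go to [Railway](https://railway.app/template/qApznZ?referralCode=RC3znh).
2. Click the `Deploy Now` button.
3. Set environment variables to override the program's runtime parameters, for example `open_ai_api_key` and `character_desc`.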
## 🛠️ Development & Contributing
## FAQ
Contributions are welcome — add a new channel by following the [Feishu channel reference](https://github.com/zhayujie/CowAgent/blob/master/channel/feishu/feishu_channel.py), or contribute new skills to [Skill Hub](https://skills.cowagent.ai/submit).
FAQs <https://github.com/zhayujie/chatgpt-on-wechat/wiki/FAQs>
⭐ Star the project to follow updates, and feel free to open PRs and Issues.
## 🌟 Contributors
## Contact
![cow contributors](https://contrib.rocks/image?repo=zhayujie/CowAgent&max=1000)
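PRs, Issues, and Stars are all welcome. If you hit a problem running the program, check the [FAQ list](https://github.com/zhayujie/chatgpt-on-wechat/wiki/FAQs) first, then search the [Issues](https://github.com/zhayujie/chatgpt-on-wechat/issues). To learn more about the project and discuss practical AI topics with the developers, join the Knowledge Planet group: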
<br/>
<a href="https://public.zsxq.com/groups/88885848842852.html"><img width="360" src="./docs/images/planet.jpg"></a>
## ⚠️ Disclaimer
1. This project is licensed under the [MIT License](/LICENSE) and is intended for technical research and learning. You are responsible for complying with applicable laws and regulations in your jurisdiction; the maintainers assume no liability for any consequences arising from use of this project.
2. **Cost & safety:** Agent mode consumes substantially more tokens than regular chat — pick models that balance quality and cost. The Agent has access to your local operating system, so only deploy it in trusted environments.
3. CowAgent is a pure open-source project and does not participate in, authorize, or issue any cryptocurrency.
<br/>
## 📌 Project Renaming Notice
This project was previously named `chatgpt-on-wechat` and is now officially **CowAgent**. The old GitHub URL redirects automatically; existing users may optionally run `git remote set-url origin https://github.com/zhayujie/CowAgent.git` to update the local remote.

3
agent/chat/__init__.py Normal file

@@ -0,0 +1,3 @@
from agent.chat.service import ChatService
__all__ = ["ChatService"]

290
agent/chat/service.py Normal file

@@ -0,0 +1,290 @@
"""
ChatService - Wraps the Agent stream execution to produce CHAT protocol chunks.
Translates agent events (message_update, message_end, tool_execution_end, etc.)
into the CHAT socket protocol format (content chunks with segment_id, tool_calls chunks).
"""
import time
from typing import Callable, Optional

from common.log import logger


class ChatService:
    """
    High-level service that runs an Agent for a given query and streams
    the results as CHAT protocol chunks via a callback.

    Usage:
        svc = ChatService(agent_bridge)
        svc.run(query, session_id, send_chunk_fn)
    """

    def __init__(self, agent_bridge):
        """
        :param agent_bridge: AgentBridge instance (manages agent lifecycle)
        """
        self.agent_bridge = agent_bridge

    def run(self, query: str, session_id: str, send_chunk_fn: Callable[[dict], None],
            channel_type: str = ""):
        """
        Run the agent for *query* and stream results back via *send_chunk_fn*.

        The method blocks until the agent finishes. After it returns the SDK
        will automatically send the final (streaming=false) message.

        :param query: user query text
        :param session_id: session identifier for agent isolation
        :param send_chunk_fn: callable(chunk_data: dict) to send a streaming chunk
        :param channel_type: source channel (e.g. "web", "feishu") for persistence
        """
        agent = self.agent_bridge.get_agent(session_id=session_id)
        if agent is None:
            raise RuntimeError("Failed to initialise agent for the session")

        # Pass context metadata to model for downstream API requests
        if hasattr(agent, 'model'):
            agent.model.channel_type = channel_type or ""
            agent.model.session_id = session_id or ""

        # State shared between the event callback and this method
        state = _StreamState()
        def on_event(event: dict):
            """Translate agent events into CHAT protocol chunks."""
            event_type = event.get("type")
            data = event.get("data", {})

            if event_type == "reasoning_update":
                delta = data.get("delta", "")
                if delta:
                    send_chunk_fn({
                        "chunk_type": "reasoning",
                        "delta": delta,
                        "segment_id": state.segment_id,
                    })

            elif event_type == "message_update":
                # Incremental text delta
                delta = data.get("delta", "")
                if delta:
                    send_chunk_fn({
                        "chunk_type": "content",
                        "delta": delta,
                        "segment_id": state.segment_id,
                    })

            elif event_type == "message_end":
                # A content segment finished.
                tool_calls = data.get("tool_calls", [])
                if tool_calls:
                    # After tool_calls are executed the next content will be
                    # a new segment; collect tool results until turn_end.
                    state.pending_tool_results = []

            elif event_type == "file_to_send":
                url = data.get("url") or ""
                if url:
                    fname = data.get("file_name") or "file"
                    ft = data.get("file_type") or "file"
                    if ft == "image":
                        link = f"![{fname}]({url})"
                    else:
                        link = f"[{fname}]({url})"
                    send_chunk_fn({
                        "chunk_type": "content",
                        "delta": "\n\n" + link + "\n\n",
                        "segment_id": state.segment_id,
                    })
                    # Remove url so the model won't repeat it in its reply
                    data.pop("url", None)

            elif event_type == "tool_execution_start":
                # Notify the client that a tool is about to run (with its input args)
                tool_name = data.get("tool_name", "")
                arguments = data.get("arguments", {})
                # Cache arguments keyed by tool_call_id so tool_execution_end can include them
                tool_call_id = data.get("tool_call_id", tool_name)
                state.pending_tool_arguments[tool_call_id] = arguments
                send_chunk_fn({
                    "chunk_type": "tool_start",
                    "tool": tool_name,
                    "arguments": arguments,
                })

            elif event_type == "tool_execution_end":
                tool_name = data.get("tool_name", "")
                tool_call_id = data.get("tool_call_id", tool_name)
                # Retrieve cached arguments from the matching tool_execution_start event
                arguments = state.pending_tool_arguments.pop(tool_call_id, data.get("arguments", {}))
                result = data.get("result", "")
                status = data.get("status", "unknown")
                execution_time = data.get("execution_time", 0)
                elapsed_str = f"{execution_time:.2f}s"

                # Serialise result to string if needed
                if not isinstance(result, str):
                    import json
                    try:
                        result = json.dumps(result, ensure_ascii=False)
                    except Exception:
                        result = str(result)

                tool_info = {
                    "name": tool_name,
                    "arguments": arguments,
                    "result": result,
                    "status": status,
                    "elapsed": elapsed_str,
                }

                if state.pending_tool_results is not None:
                    state.pending_tool_results.append(tool_info)

            elif event_type == "turn_end":
                has_tool_calls = data.get("has_tool_calls", False)
                if has_tool_calls and state.pending_tool_results:
                    # Flush collected tool results as a single tool_calls chunk
                    send_chunk_fn({
                        "chunk_type": "tool_calls",
                        "tool_calls": state.pending_tool_results,
                    })
                    state.pending_tool_results = None
                    # Next content belongs to a new segment
                    state.segment_id += 1
        # Run the agent with our event callback ---------------------------
        logger.info(f"[ChatService] Starting agent run: session={session_id}, query={query[:80]}")

        from config import conf
        max_context_turns = conf().get("agent_max_context_turns", 20)

        # Get full system prompt with skills
        full_system_prompt = agent.get_full_system_prompt()

        # Create a copy of messages for this execution
        with agent.messages_lock:
            messages_copy = agent.messages.copy()
            original_length = len(agent.messages)

        from agent.protocol.agent_stream import AgentStreamExecutor

        executor = AgentStreamExecutor(
            agent=agent,
            model=agent.model,
            system_prompt=full_system_prompt,
            tools=agent.tools,
            max_turns=agent.max_steps,
            on_event=on_event,
            messages=messages_copy,
            max_context_turns=max_context_turns,
        )

        try:
            response = executor.run_stream(query)
        except Exception:
            # If executor cleared messages (context overflow), sync back
            if len(executor.messages) == 0:
                with agent.messages_lock:
                    agent.messages.clear()
                    logger.info("[ChatService] Cleared agent message history after executor recovery")
            raise
        # Sync executor messages back to agent (thread-safe).
        # The executor may have trimmed context, making its list shorter than
        # original_length. In that case we must replace entirely — just
        # appending would leave stale pre-trim messages in agent.messages
        # and cause the same trim to fire on every subsequent request.
        with agent.messages_lock:
            trimmed = len(executor.messages) < original_length
            if trimmed:
                # Context was trimmed: the executor appended the new user
                # query *before* trimming, so the new messages (user +
                # assistant + tools) sit at the tail of the trimmed list.
                # We cannot simply slice at original_length (it exceeds the
                # list length). Instead, count how many messages the
                # executor added on top of the post-trim baseline.
                #
                # Timeline inside executor.run_stream:
                #   1. messages had `original_length` items
                #   2. append user query → original_length + 1
                #   3. _trim_messages() → some smaller number (includes the
                #      user query because it belongs to the last turn)
                #   4. LLM replies / tool calls appended
                #
                # The user query message is always the first message of the
                # last turn (it cannot be trimmed away), so we locate it to
                # find where "new" messages begin.
                new_start = original_length  # fallback
                for idx in range(len(executor.messages) - 1, -1, -1):
                    msg = executor.messages[idx]
                    if msg.get("role") == "user":
                        content = msg.get("content", [])
                        is_user_query = False
                        if isinstance(content, list):
                            has_text = any(
                                isinstance(b, dict) and b.get("type") == "text"
                                for b in content
                            )
                            has_tool_result = any(
                                isinstance(b, dict) and b.get("type") == "tool_result"
                                for b in content
                            )
                            is_user_query = has_text and not has_tool_result
                        elif isinstance(content, str):
                            is_user_query = True
                        if is_user_query:
                            new_start = idx
                            break
                new_messages = list(executor.messages[new_start:])
            else:
                new_messages = list(executor.messages[original_length:])
            agent.messages = list(executor.messages)

        # Persist new messages to SQLite so they survive restarts and
        # can be queried via the HISTORY interface.
        if new_messages:
            self._persist_messages(session_id, list(new_messages), channel_type)

        # Store executor reference for files_to_send access
        agent.stream_executor = executor

        # Execute post-process tools
        agent._execute_post_process_tools()

        logger.info(f"[ChatService] Agent run completed: session={session_id}")

    @staticmethod
    def _persist_messages(session_id: str, new_messages: list, channel_type: str = ""):
        try:
            from config import conf
            if not conf().get("conversation_persistence", True):
                return
        except Exception:
            pass
        try:
            from agent.memory import get_conversation_store
            get_conversation_store().append_messages(
                session_id, new_messages, channel_type=channel_type
            )
        except Exception as e:
            logger.warning(
                f"[ChatService] Failed to persist messages for session={session_id}: {e}"
            )


class _StreamState:
    """Mutable state shared between the event callback and the run method."""

    def __init__(self):
        self.segment_id: int = 0
        # None means we are not accumulating tool results right now.
        # A list means we are in the middle of a tool-execution phase.
        self.pending_tool_results: Optional[list] = None
        # Maps tool_call_id -> arguments captured from tool_execution_start,
        # so that tool_execution_end can attach the correct input args.
        self.pending_tool_arguments: dict = {}
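
On the receiving side, the chunks that `ChatService.run` passes to `send_chunk_fn` can be regrouped by `segment_id`. The sketch below is one possible consumer, not part of this change: the class name `ChunkCollector` is hypothetical, and the sample chunk values (tool name, result, status) are made up for illustration; only the chunk shapes (`content`, `reasoning`, `tool_start`, `tool_calls`) follow the code above.

```python
from collections import defaultdict
from typing import Dict, List


class ChunkCollector:
    """Hypothetical send_chunk_fn: regroups CHAT protocol chunks per segment."""

    def __init__(self) -> None:
        self.content: Dict[int, str] = defaultdict(str)
        self.reasoning: Dict[int, str] = defaultdict(str)
        self.tool_starts: List[str] = []
        self.tool_calls: List[dict] = []

    def __call__(self, chunk: dict) -> None:
        kind = chunk.get("chunk_type")
        if kind == "content":
            # Text deltas are concatenated within their segment.
            self.content[chunk["segment_id"]] += chunk["delta"]
        elif kind == "reasoning":
            self.reasoning[chunk["segment_id"]] += chunk["delta"]
        elif kind == "tool_start":
            self.tool_starts.append(chunk["tool"])
        elif kind == "tool_calls":
            # One chunk carries all tool results of the finished turn.
            self.tool_calls.extend(chunk["tool_calls"])


collector = ChunkCollector()
sample_chunks = [
    {"chunk_type": "content", "delta": "Let me ", "segment_id": 0},
    {"chunk_type": "content", "delta": "check.", "segment_id": 0},
    {"chunk_type": "tool_start", "tool": "bash", "arguments": {"command": "ls"}},
    {"chunk_type": "tool_calls", "tool_calls": [
        {"name": "bash", "arguments": {"command": "ls"}, "result": "a.txt",
         "status": "success", "elapsed": "0.01s"},
    ]},
    {"chunk_type": "content", "delta": "Found a.txt.", "segment_id": 1},
]
for chunk in sample_chunks:
    collector(chunk)
```

Because the collector is callable, an instance could be passed directly as `send_chunk_fn`, which makes it a convenient stand-in when testing the event translation without a real socket.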


@@ -0,0 +1,241 @@
"""
SessionService - Manages multi-session lifecycle for both web channel and cloud client.
Provides a unified interface for listing, deleting, renaming, clearing context,
and generating AI titles for conversation sessions. Backed by ConversationStore
(SQLite) and AgentBridge (in-memory agent instances).
"""
import re
from typing import Optional

from common.log import logger


def _truncate_fallback_title(user_message: str, max_len: int = 30) -> str:
    """Pick the first non-empty line of the user message and truncate it."""
    if not user_message:
        return "New Chat"
    first_line = ""
    for line in user_message.splitlines():
        line = line.strip()
        if line:
            first_line = line
            break
    if not first_line:
        return "New Chat"
    if len(first_line) > max_len:
        first_line = first_line[:max_len].rstrip() + "..."
    return first_line


def generate_session_title(user_message: str, assistant_reply: str = "") -> str:
    """
    Generate a short session title by calling the current bot's reply_text.

    Falls back to the first line of the user message if the LLM call fails
    or returns an obvious error sentinel.
    """
    fallback = _truncate_fallback_title(user_message)
    try:
        from bridge.bridge import Bridge
        from models.session_manager import Session

        bot = Bridge().get_bot("chat")
        prompt_parts = [f"User: {user_message[:300]}"]
        if assistant_reply:
            prompt_parts.append(f"Assistant: {assistant_reply[:300]}")
        session = Session("__title_gen__", system_prompt="")
        session.messages = [
            {"role": "user", "content": (
                "Generate a very short title (max 15 characters for Chinese, max 6 words for English) "
                "summarizing this conversation. Return ONLY the title text, nothing else.\n\n"
                + "\n".join(prompt_parts)
            )}
        ]
        result = bot.reply_text(session) or {}

        # When bots fail (network error, auth error, rate limit, etc.) they
        # typically return completion_tokens=0 with a sentinel content like
        # "请再问我一次吧" / "我现在有点累了". Treat that as failure.
        completion_tokens = result.get("completion_tokens", 0) or 0
        raw = (result.get("content") or "").strip()
        if completion_tokens <= 0:
            logger.warning(
                f"[SessionService] Title generation got empty completion "
                f"(completion_tokens={completion_tokens}, content='{raw[:50]}'), "
                f"using fallback")
            return fallback

        title = re.sub(r'<think>.*?</think>', '', raw, flags=re.DOTALL).strip().strip('"\'')
        logger.info(f"[SessionService] Title generation result: '{title}' (len={len(title)})")
        if title and len(title) <= 50:
            return title
    except Exception as e:
        logger.warning(f"[SessionService] Title generation failed: {e}")
    return fallback
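
The title clean-up step in `generate_session_title` can be read in isolation. The sketch below pulls it into a standalone function for illustration; the name `clean_title` and its parameters are hypothetical, while the regular expression and the 50-character limit mirror the code above.

```python
import re


def clean_title(raw: str, fallback: str, max_len: int = 50) -> str:
    """Hypothetical helper: strip <think> blocks and quotes, or return the fallback."""
    # DOTALL lets the pattern span reasoning blocks that contain newlines.
    title = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    title = title.strip().strip("\"'")
    if title and len(title) <= max_len:
        return title
    return fallback
```

For example, `clean_title('<think>hmm\nok</think>\n"Trip plan"', "New Chat")` returns `Trip plan`, while an empty or overlong model reply yields the fallback.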
class SessionService:
"""
High-level service for session lifecycle management.
Usage:
svc = SessionService()
result = svc.dispatch("list", {"channel_type": "web", "page": 1})
"""
def _get_store(self):
from agent.memory import get_conversation_store
return get_conversation_store()
def _remove_agent(self, session_id: str):
"""Remove the in-memory Agent instance for a session if it exists."""
try:
from bridge.bridge import Bridge
ab = Bridge().get_agent_bridge()
if session_id in ab.agents:
del ab.agents[session_id]
logger.info(f"[SessionService] Removed agent instance: {session_id}")
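except Exception as e:
# Best-effort cleanup: never fail the caller, but leave a trace.
logger.debug(f"[SessionService] Failed to remove agent instance: {e}")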
@staticmethod
def _normalize_sid(session_id: str) -> str:
if session_id and not session_id.startswith("session_"):
return f"session_{session_id}"
return session_id
# ------------------------------------------------------------------
# actions
# ------------------------------------------------------------------
def list_sessions(self, channel_type: Optional[str] = None,
page: int = 1, page_size: int = 50) -> dict:
store = self._get_store()
return store.list_sessions(
channel_type=channel_type,
page=page,
page_size=page_size,
)
def delete_session(self, session_id: str) -> None:
if not session_id:
raise ValueError("session_id required")
session_id = self._normalize_sid(session_id)
store = self._get_store()
store.clear_session(session_id)
self._remove_agent(session_id)
logger.info(f"[SessionService] Session deleted: {session_id}")
def rename_session(self, session_id: str, title: str) -> None:
if not session_id:
raise ValueError("session_id required")
if not title:
raise ValueError("title required")
session_id = self._normalize_sid(session_id)
store = self._get_store()
found = store.rename_session(session_id, title)
if not found:
raise ValueError("session not found")
def clear_context(self, session_id: str) -> int:
"""
Set context boundary. Returns the new context_start_seq value.
"""
if not session_id:
raise ValueError("session_id required")
session_id = self._normalize_sid(session_id)
store = self._get_store()
new_seq = store.clear_context(session_id)
self._remove_agent(session_id)
return new_seq
def gen_title(self, session_id: str, user_message: str,
assistant_reply: str = "") -> str:
"""
Generate an AI title and persist it. Returns the generated title.
"""
if not session_id:
raise ValueError("session_id required")
if not user_message:
raise ValueError("user_message required")
session_id = self._normalize_sid(session_id)
title = generate_session_title(user_message, assistant_reply)
store = self._get_store()
updated = store.rename_session(session_id, title)
logger.info(f"[SessionService] Title set: sid={session_id}, "
f"title='{title}', db_updated={updated}")
return title
# ------------------------------------------------------------------
# dispatch — single entry point for protocol messages
# ------------------------------------------------------------------
def dispatch(self, action: str, payload: Optional[dict] = None) -> dict:
"""
Dispatch a session management action and return a protocol-compatible
response dict.
Action names are session-specific (mostly ``*_session``, plus
``clear_context`` and ``generate_title``) so they can coexist with
history actions (e.g. ``query``) on the same HISTORY message channel
without ambiguity.
Supported actions:
- list_sessions: list sessions with pagination
- delete_session: delete a session
- rename_session: rename a session title
- clear_context: set context boundary
- generate_title: AI-generate a session title
:param action: one of the above action names
:param payload: action-specific payload
:return: dict with action, code, message, payload
"""
payload = payload or {}
try:
if action == "list_sessions":
result = self.list_sessions(
channel_type=payload.get("channel_type"),
page=int(payload.get("page", 1)),
page_size=int(payload.get("page_size", 50)),
)
return {"action": action, "code": 200, "message": "success", "payload": result}
elif action == "delete_session":
self.delete_session(payload.get("session_id", ""))
return {"action": action, "code": 200, "message": "success", "payload": None}
elif action == "rename_session":
self.rename_session(
payload.get("session_id", ""),
(payload.get("title") or "").strip(),
)
return {"action": action, "code": 200, "message": "success", "payload": None}
elif action == "clear_context":
new_seq = self.clear_context(payload.get("session_id", ""))
return {"action": action, "code": 200, "message": "success",
"payload": {"context_start_seq": new_seq}}
elif action == "generate_title":
title = self.gen_title(
payload.get("session_id", ""),
payload.get("user_message", ""),
payload.get("assistant_reply", ""),
)
return {"action": action, "code": 200, "message": "success",
"payload": {"title": title}}
else:
return {"action": action, "code": 400,
"message": f"unknown action: {action}", "payload": None}
except ValueError as e:
return {"action": action, "code": 400, "message": str(e), "payload": None}
except Exception as e:
logger.error(f"[SessionService] dispatch error: action={action}, error={e}")
return {"action": action, "code": 500, "message": str(e), "payload": None}
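
The title post-processing in generate_session_title above can be tried in isolation. The following is a minimal sketch, assuming a hypothetical helper name `clean_title` (not part of this commit) and the same 50-character cap used in the source:

```python
import re


# Hypothetical helper (not in the commit): mirrors the cleanup steps applied
# to the model's raw reply before it is accepted as a session title.
def clean_title(raw: str, fallback: str, max_len: int = 50) -> str:
    # Drop any <think>...</think> reasoning block, including multi-line ones.
    title = re.sub(r"<think>.*?</think>", "", raw or "", flags=re.DOTALL)
    # Trim whitespace first, then surrounding quote characters.
    title = title.strip().strip("\"'")
    # Accept only non-empty titles within the length cap.
    if title and len(title) <= max_len:
        return title
    return fallback


if __name__ == "__main__":
    reply = '<think>user asks about travel</think>\n"Tokyo trip plan"'
    print(clean_title(reply, "New chat"))
```

Keeping the cleanup in a pure function like this would make the fallback rules testable without a bot or a network call.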

240
agent/knowledge/service.py Normal file

@@ -0,0 +1,240 @@
"""
Knowledge service for handling knowledge base operations.
Provides a unified interface for listing, reading, and graphing knowledge files,
callable from the web console, API, or CLI.
Knowledge file layout (under workspace_root):
knowledge/index.md
knowledge/log.md
knowledge/<category>/<slug>.md
"""
import os
import re
from pathlib import Path
from typing import Optional
from common.log import logger
from config import conf
class KnowledgeService:
"""
High-level service for knowledge base queries.
Operates directly on the filesystem.
"""
def __init__(self, workspace_root: str):
self.workspace_root = workspace_root
self.knowledge_dir = os.path.join(workspace_root, "knowledge")
# ------------------------------------------------------------------
# list — directory tree with stats
# ------------------------------------------------------------------
def list_tree(self) -> dict:
"""
Return the knowledge directory tree grouped by category,
supporting arbitrarily nested sub-directories.
Returns::
{
"root_files": [{"name": "index.md", "title": "Index", "size": 512}],
"tree": [
{
"dir": "concepts",
"files": [
{"name": "moe.md", "title": "MoE", "size": 1234},
],
"children": []
},
{
"dir": "platform",
"files": [],
"children": [
{
"dir": "analysis",
"files": [{"name": "perf.md", ...}],
"children": []
}
]
},
],
"stats": {"pages": 15, "size": 32768},
"enabled": True
}
"""
if not os.path.isdir(self.knowledge_dir):
return {"root_files": [], "tree": [], "stats": {"pages": 0, "size": 0}, "enabled": conf().get("knowledge", True)}
stats = {"pages": 0, "size": 0}
root_files, tree = self._scan_dir(self.knowledge_dir, stats, is_root=True)
return {
"root_files": root_files,
"tree": tree,
"stats": stats,
"enabled": conf().get("knowledge", True),
}
def _scan_dir(self, dir_path: str, stats: dict, is_root: bool = False) -> tuple:
"""
Recursively scan a directory.
:return: (files, children) where files is a list of .md file dicts
in this directory and children is a list of sub-directory nodes.
"""
files = []
children = []
for name in sorted(os.listdir(dir_path)):
if name.startswith("."):
continue
full = os.path.join(dir_path, name)
if os.path.isdir(full):
sub_files, sub_children = self._scan_dir(full, stats)
children.append({"dir": name, "files": sub_files, "children": sub_children})
elif name.endswith(".md"):
size = os.path.getsize(full)
if not is_root:
stats["pages"] += 1
stats["size"] += size
title = name[:-3]  # strip only the trailing ".md"
try:
with open(full, "r", encoding="utf-8") as f:
first_line = f.readline().strip()
if first_line.startswith("# "):
title = first_line[2:].strip()
except Exception:
pass
files.append({"name": name, "title": title, "size": size})
return files, children
# ------------------------------------------------------------------
# read — single file content
# ------------------------------------------------------------------
def read_file(self, rel_path: str) -> dict:
"""
Read a single knowledge markdown file.
:param rel_path: Relative path within knowledge/, e.g. ``concepts/moe.md``
:return: dict with ``content`` and ``path``
:raises ValueError: if path is invalid or escapes knowledge dir
:raises FileNotFoundError: if file does not exist
"""
if not rel_path or ".." in rel_path:
raise ValueError("invalid path")
full_path = os.path.normpath(os.path.join(self.knowledge_dir, rel_path))
allowed = os.path.normpath(self.knowledge_dir)
if not full_path.startswith(allowed + os.sep) and full_path != allowed:
raise ValueError("path outside knowledge dir")
if not os.path.isfile(full_path):
raise FileNotFoundError(f"file not found: {rel_path}")
with open(full_path, "r", encoding="utf-8") as f:
content = f.read()
return {"content": content, "path": rel_path}
# ------------------------------------------------------------------
# graph — nodes and links for visualization
# ------------------------------------------------------------------
def build_graph(self) -> dict:
"""
Parse all knowledge pages and extract cross-reference links.
Returns::
{
"nodes": [
{"id": "concepts/moe.md", "label": "MoE", "category": "concepts"},
...
],
"links": [
{"source": "concepts/moe.md", "target": "entities/deepseek.md"},
...
]
}
"""
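# Resolve once so relative_to() also works for the resolved link targets
# below when knowledge_dir is relative or contains symlinks.
knowledge_path = Path(self.knowledge_dir).resolve()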
if not knowledge_path.is_dir():
return {"nodes": [], "links": []}
nodes = {}
links = []
link_re = re.compile(r'\[([^\]]*)\]\(([^)]+\.md)\)')
for md_file in knowledge_path.rglob("*.md"):
# Use POSIX separators so ids and categories are stable on Windows too.
rel = md_file.relative_to(knowledge_path).as_posix()
if rel in ("index.md", "log.md"):
continue
parts = rel.split("/")
category = parts[0] if len(parts) > 1 else "root"
title = md_file.stem.replace("-", " ").title()
try:
content = md_file.read_text(encoding="utf-8")
first_line = content.strip().split("\n")[0]
if first_line.startswith("# "):
title = first_line[2:].strip()
for _, link_target in link_re.findall(content):
resolved = (md_file.parent / link_target).resolve()
try:
target_rel = resolved.relative_to(knowledge_path).as_posix()
except ValueError:
continue
if target_rel != rel:
links.append({"source": rel, "target": target_rel})
except Exception:
pass
nodes[rel] = {"id": rel, "label": title, "category": category}
valid_ids = set(nodes.keys())
links = [l for l in links if l["source"] in valid_ids and l["target"] in valid_ids]
seen = set()
deduped = []
for l in links:
key = tuple(sorted([l["source"], l["target"]]))
if key not in seen:
seen.add(key)
deduped.append(l)
return {"nodes": list(nodes.values()), "links": deduped}
# ------------------------------------------------------------------
# dispatch — single entry point for protocol messages
# ------------------------------------------------------------------
def dispatch(self, action: str, payload: Optional[dict] = None) -> dict:
"""
Dispatch a knowledge management action.
:param action: ``list``, ``read``, or ``graph``
:param payload: action-specific payload
:return: protocol-compatible response dict
"""
payload = payload or {}
try:
if action == "list":
result = self.list_tree()
return {"action": action, "code": 200, "message": "success", "payload": result}
elif action == "read":
path = payload.get("path")
if not path:
return {"action": action, "code": 400, "message": "path is required", "payload": None}
result = self.read_file(path)
return {"action": action, "code": 200, "message": "success", "payload": result}
elif action == "graph":
result = self.build_graph()
return {"action": action, "code": 200, "message": "success", "payload": result}
else:
return {"action": action, "code": 400, "message": f"unknown action: {action}", "payload": None}
except ValueError as e:
return {"action": action, "code": 403, "message": str(e), "payload": None}
except FileNotFoundError as e:
return {"action": action, "code": 404, "message": str(e), "payload": None}
except Exception as e:
logger.error(f"[KnowledgeService] dispatch error: action={action}, error={e}")
return {"action": action, "code": 500, "message": str(e), "payload": None}
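
The link handling in build_graph above comes down to two steps: pull `.md` targets out of Markdown links, then treat links as undirected when de-duplicating. A minimal sketch of both steps, using the same regular expression as the source; the function names `extract_md_targets` and `dedupe_undirected` are illustrative and do not appear in the commit:

```python
import re

# Same pattern as build_graph: [label](target.md); non-.md targets are ignored.
LINK_RE = re.compile(r'\[([^\]]*)\]\(([^)]+\.md)\)')


def extract_md_targets(markdown: str) -> list:
    """Return the raw link targets that end in .md, in document order."""
    return [target for _label, target in LINK_RE.findall(markdown)]


def dedupe_undirected(links: list) -> list:
    """Keep the first link for each unordered {source, target} pair."""
    seen = set()
    out = []
    for link in links:
        key = tuple(sorted((link["source"], link["target"])))
        if key not in seen:
            seen.add(key)
            out.append(link)
    return out


if __name__ == "__main__":
    text = "See [MoE](../concepts/moe.md) and [site](https://example.com)."
    print(extract_md_targets(text))
    print(dedupe_undirected([
        {"source": "a.md", "target": "b.md"},
        {"source": "b.md", "target": "a.md"},
    ]))
```

Sorting the pair before hashing is what makes A→B and B→A collapse into one edge, which suits an undirected visualization but discards link direction.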

23
agent/memory/__init__.py Normal file

@@ -0,0 +1,23 @@
"""
Memory module for AgentMesh
Provides both long-term memory (vector/keyword search) and short-term
conversation history persistence (SQLite).
"""
from agent.memory.manager import MemoryManager
from agent.memory.config import MemoryConfig, get_default_memory_config, set_global_memory_config
from agent.memory.embedding import create_embedding_provider
from agent.memory.conversation_store import ConversationStore, get_conversation_store
from agent.memory.summarizer import ensure_daily_memory_file
__all__ = [
'MemoryManager',
'MemoryConfig',
'get_default_memory_config',
'set_global_memory_config',
'create_embedding_provider',
'ConversationStore',
'get_conversation_store',
'ensure_daily_memory_file',
]

140
agent/memory/chunker.py Normal file

@@ -0,0 +1,140 @@
"""
Text chunking utilities for memory
Splits text into chunks with token limits and overlap
"""
from __future__ import annotations
from typing import List, Tuple
from dataclasses import dataclass
@dataclass
class TextChunk:
"""Represents a text chunk with line numbers"""
text: str
start_line: int
end_line: int
class TextChunker:
"""Chunks text by line count with token estimation"""
def __init__(self, max_tokens: int = 500, overlap_tokens: int = 50):
"""
Initialize chunker
Args:
max_tokens: Maximum tokens per chunk
overlap_tokens: Overlap tokens between chunks
"""
self.max_tokens = max_tokens
self.overlap_tokens = overlap_tokens
# Rough estimation: ~4 chars per token for English/Chinese mixed
self.chars_per_token = 4
def chunk_text(self, text: str) -> List[TextChunk]:
"""
Chunk text into overlapping segments
Args:
text: Input text to chunk
Returns:
List of TextChunk objects
"""
if not text.strip():
return []
lines = text.split('\n')
chunks = []
max_chars = self.max_tokens * self.chars_per_token
overlap_chars = self.overlap_tokens * self.chars_per_token
current_chunk = []
current_chars = 0
start_line = 1
for i, line in enumerate(lines, start=1):
line_chars = len(line)
# If single line exceeds max, split it
if line_chars > max_chars:
# Save current chunk if exists
if current_chunk:
chunks.append(TextChunk(
text='\n'.join(current_chunk),
start_line=start_line,
end_line=i - 1
))
current_chunk = []
current_chars = 0
# Split long line into multiple chunks
for sub_chunk in self._split_long_line(line, max_chars):
chunks.append(TextChunk(
text=sub_chunk,
start_line=i,
end_line=i
))
start_line = i + 1
continue
# Check if adding this line would exceed limit
if current_chars + line_chars > max_chars and current_chunk:
# Save current chunk
chunks.append(TextChunk(
text='\n'.join(current_chunk),
start_line=start_line,
end_line=i - 1
))
# Start new chunk with overlap
overlap_lines = self._get_overlap_lines(current_chunk, overlap_chars)
current_chunk = overlap_lines + [line]
current_chars = sum(len(l) for l in current_chunk)
start_line = i - len(overlap_lines)
else:
# Add line to current chunk
current_chunk.append(line)
current_chars += line_chars
# Save last chunk
if current_chunk:
chunks.append(TextChunk(
text='\n'.join(current_chunk),
start_line=start_line,
end_line=len(lines)
))
return chunks
def _split_long_line(self, line: str, max_chars: int) -> List[str]:
"""Split a single long line into multiple chunks"""
chunks = []
for i in range(0, len(line), max_chars):
chunks.append(line[i:i + max_chars])
return chunks
def _get_overlap_lines(self, lines: List[str], target_chars: int) -> List[str]:
"""Get last few lines that fit within target_chars for overlap"""
overlap = []
chars = 0
for line in reversed(lines):
line_chars = len(line)
if chars + line_chars > target_chars:
break
overlap.insert(0, line)
chars += line_chars
return overlap
def chunk_markdown(self, text: str) -> List[TextChunk]:
"""
Chunk markdown text while respecting structure
(For future enhancement: respect markdown sections)
"""
return self.chunk_text(text)
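
With the defaults above, the character budgets work out to 500 × 4 = 2000 characters per chunk and 50 × 4 = 200 characters of overlap. The overlap step can be sketched on its own; `tail_overlap` below is an illustrative standalone copy of the logic in _get_overlap_lines, not a function from the commit:

```python
from typing import List


def tail_overlap(lines: List[str], target_chars: int) -> List[str]:
    """Return the trailing lines whose combined length fits in target_chars."""
    overlap: List[str] = []
    chars = 0
    # Walk backwards from the end of the finished chunk.
    for line in reversed(lines):
        if chars + len(line) > target_chars:
            break
        overlap.insert(0, line)  # keep original line order
        chars += len(line)
    return overlap


if __name__ == "__main__":
    max_tokens, overlap_tokens, chars_per_token = 500, 50, 4
    print(max_tokens * chars_per_token, overlap_tokens * chars_per_token)
    print(tail_overlap(["aaaa", "bb", "ccc"], 5))
```

Note that a single trailing line longer than the overlap budget yields no overlap at all, so consecutive chunks can end up with nothing in common.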

122
agent/memory/config.py Normal file

@@ -0,0 +1,122 @@
"""
Memory configuration module
Provides global memory configuration with simplified workspace structure
"""
from __future__ import annotations
import os
from dataclasses import dataclass, field
from typing import Optional, List
from pathlib import Path
def _default_workspace():
"""Get default workspace path with proper Windows support"""
from common.utils import expand_path
return expand_path("~/cow")
@dataclass
class MemoryConfig:
"""Configuration for memory storage and search"""
# Storage paths (default: ~/cow)
workspace_root: str = field(default_factory=_default_workspace)
# Embedding config
embedding_provider: str = "openai" # vendor key, see EMBEDDING_VENDORS ("openai" | "linkai" | "dashscope" | "doubao" | "zhipu")
embedding_model: str = "text-embedding-3-small"
embedding_dim: int = 1536
# Chunking config
chunk_max_tokens: int = 500
chunk_overlap_tokens: int = 50
# Search config
max_results: int = 10
min_score: float = 0.1
# Hybrid search weights
vector_weight: float = 0.7
keyword_weight: float = 0.3
# Memory sources
sources: List[str] = field(default_factory=lambda: ["memory", "session"])
# Sync config
enable_auto_sync: bool = True
sync_on_search: bool = True
def get_workspace(self) -> Path:
"""Get workspace root directory"""
return Path(self.workspace_root)
def get_memory_dir(self) -> Path:
"""Get memory files directory"""
return self.get_workspace() / "memory"
def get_db_path(self) -> Path:
"""Get SQLite database path for long-term memory index"""
index_dir = self.get_memory_dir() / "long-term"
index_dir.mkdir(parents=True, exist_ok=True)
return index_dir / "index.db"
def get_skills_dir(self) -> Path:
"""Get skills directory"""
return self.get_workspace() / "skills"
def get_agent_workspace(self, agent_name: Optional[str] = None) -> Path:
"""
Get workspace directory for an agent
Args:
agent_name: Optional agent name (not used in current implementation)
Returns:
Path to workspace directory
"""
workspace = self.get_workspace()
# Ensure workspace directory exists
workspace.mkdir(parents=True, exist_ok=True)
return workspace
# Global memory configuration
_global_memory_config: Optional[MemoryConfig] = None
def get_default_memory_config() -> MemoryConfig:
"""
Get the global memory configuration.
If not set, returns a default configuration.
Returns:
MemoryConfig instance
"""
global _global_memory_config
if _global_memory_config is None:
_global_memory_config = MemoryConfig()
return _global_memory_config
def set_global_memory_config(config: MemoryConfig):
"""
Set the global memory configuration.
This should be called before creating any MemoryManager instances.
Args:
config: MemoryConfig instance to use globally
Example:
>>> from agent.memory import MemoryConfig, set_global_memory_config
>>> config = MemoryConfig(
... workspace_root="~/my_agents",
... embedding_provider="openai",
... vector_weight=0.8
... )
>>> set_global_memory_config(config)
"""
global _global_memory_config
_global_memory_config = config
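
The diff shown here does not include the code that applies vector_weight, keyword_weight, min_score and max_results. One plausible reading is a linear blend followed by a threshold and a cut-off. The sketch below rests on that assumption; the function names are hypothetical and the actual search implementation may differ:

```python
from typing import List, Tuple


# Assumption: scores are blended linearly. The real scoring code is not part
# of this section, so treat this purely as an illustration of the settings.
def hybrid_score(vector_score: float, keyword_score: float,
                 vector_weight: float = 0.7, keyword_weight: float = 0.3) -> float:
    return vector_weight * vector_score + keyword_weight * keyword_score


def filter_results(scored: List[Tuple[str, float]],
                   min_score: float = 0.1, max_results: int = 10) -> List[Tuple[str, float]]:
    # Drop weak matches, then return the best ones first.
    kept = [(doc, score) for doc, score in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_results]


if __name__ == "__main__":
    candidates = [("a", hybrid_score(0.1, 0.0)), ("b", hybrid_score(0.9, 0.8))]
    print(filter_results(candidates))
```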

File diff suppressed because it is too large

@@ -0,0 +1,41 @@
"""
Embedding subsystem for memory.
Public API:
create_embedding_provider, EmbeddingProvider, OpenAIEmbeddingProvider,
EMBEDDING_VENDORS, EmbeddingCache
RebuildResult, clear_index, rebuild_in_process
detect_index_dim, cleanup_legacy_state_file
"""
from agent.memory.embedding.provider import (
EMBEDDING_VENDORS,
DoubaoEmbeddingProvider,
EmbeddingCache,
EmbeddingProvider,
OpenAIEmbeddingProvider,
create_embedding_provider,
)
from agent.memory.embedding.rebuild import (
RebuildResult,
clear_index,
rebuild_in_process,
)
from agent.memory.embedding.state import (
cleanup_legacy_state_file,
detect_index_dim,
)
__all__ = [
"EMBEDDING_VENDORS",
"DoubaoEmbeddingProvider",
"EmbeddingCache",
"EmbeddingProvider",
"OpenAIEmbeddingProvider",
"create_embedding_provider",
"RebuildResult",
"clear_index",
"rebuild_in_process",
"cleanup_legacy_state_file",
"detect_index_dim",
]

@@ -0,0 +1,486 @@
"""
Embedding providers for memory
Supports multiple OpenAI-compatible embedding vendors:
- openai (text-embedding-3-small / large)
- linkai (OpenAI-compatible passthrough)
- dashscope (Aliyun Tongyi text-embedding-v4)
- doubao (ByteDance Doubao Seed1.5 / large-text on Volcengine Ark)
- zhipu (ZhipuAI embedding-3)
Vendor keys here intentionally match the project's bot_type constants in
common.const (OPENAI, LINKAI, QWEN_DASHSCOPE, DOUBAO, ZHIPU_AI).
All providers share a single OpenAI-compatible REST client. Vendor-specific
behaviors (truncation, query instruction prefix) are configured via metadata.
"""
import hashlib
import math
from abc import ABC, abstractmethod
from typing import List, Optional
# HTTP read timeout for a single embeddings request (seconds). A batch of
# 64+ chunks can take 30-50s end-to-end from China-side networks, so 30s is
# routinely too tight; 90s gives meaningful headroom without letting bad
# endpoints hang forever.
EMBEDDING_HTTP_TIMEOUT = 90
class EmbeddingProvider(ABC):
"""Base class for embedding providers"""
@abstractmethod
def embed(self, text: str) -> List[float]:
"""Generate embedding for a single text as-is (no query instruction applied)"""
pass
@abstractmethod
def embed_batch(self, texts: List[str]) -> List[List[float]]:
"""Generate embeddings for multiple texts (treated as documents)"""
pass
def embed_query(self, text: str) -> List[float]:
"""Generate embedding for a query string (may apply vendor instruction prefix)"""
return self.embed(text)
@property
@abstractmethod
def dimensions(self) -> int:
"""Effective embedding dimensions"""
pass
# ---------------------------------------------------------------------------
# Vendor metadata table
# ---------------------------------------------------------------------------
#
# Each entry describes how to reach a vendor's embedding endpoint. Most
# vendors expose an OpenAI-compatible /embeddings API; the few that don't
# (currently: doubao) set `provider_class` to pick a dedicated adapter.
# Fields:
# provider_class : optional adapter key ("doubao"); defaults to OpenAI-compat
# default_base_url : default API base when not overridden by user
# default_model : default embedding model name
# default_dimensions : recommended unified dim when explicit path is enabled
# supports_dim_param : whether the API accepts a `dimensions` request param
# needs_client_truncate : whether to slice the vector to the target dim on the client side
# needs_client_normalize : whether to L2-normalize on the client (always safe)
# query_instruction : optional prefix for asymmetric retrieval (Doubao Seed)
# max_batch_size : max texts per /embeddings request; embed_batch
# auto-paginates above this. Conservative defaults.
#
EMBEDDING_VENDORS = {
"openai": {
"default_base_url": "https://api.openai.com/v1",
"default_model": "text-embedding-3-small",
# Match the legacy default so users adding `embedding_provider: openai`
# to an existing index don't need to rebuild. Override via
# embedding_dimensions if you want 1024 / 1536 / 3072.
"default_dimensions": 1536,
"supports_dim_param": True,
"needs_client_truncate": False,
"needs_client_normalize": False,
"query_instruction": "",
# OpenAI permits up to 2048 items per request, but a single call
# carrying hundreds of long chunks routinely takes longer than 30s
# from China-side networks. 64 keeps each call well under
# both the token-per-request budget and a reasonable wall clock.
"max_batch_size": 64,
},
"linkai": {
"default_base_url": "https://api.link-ai.tech/v1",
"default_model": "text-embedding-3-small",
"default_dimensions": 1536,
"supports_dim_param": True,
"needs_client_truncate": False,
"needs_client_normalize": False,
"query_instruction": "",
"max_batch_size": 64,
},
"dashscope": {
"default_base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
"default_model": "text-embedding-v4",
"default_dimensions": 1024,
"supports_dim_param": True,
"needs_client_truncate": False,
"needs_client_normalize": False,
"query_instruction": "",
"max_batch_size": 10, # DashScope hard cap (text-embedding-v4)
},
"doubao": {
# Doubao no longer offers an OpenAI-compatible /v1/embeddings endpoint.
# Current models are unified under /api/v3/embeddings/multimodal
# which uses a structured `input` payload — see DoubaoEmbeddingProvider.
"provider_class": "doubao",
"default_base_url": "https://ark.cn-beijing.volces.com/api/v3",
"default_model": "doubao-embedding-vision-251215",
# Native options: 1024 or 2048. We default to 1024 to align with the
# other Chinese vendors (dashscope/zhipu) and keep storage footprint
# consistent across providers; users can still override via
# `embedding_dimensions: 2048` in config.
"default_dimensions": 1024,
"supports_dim_param": True,
"needs_client_truncate": False,
"needs_client_normalize": False,
"query_instruction": "",
# Multimodal endpoint produces ONE embedding per call (input list is
# a single document's parts, not a batch). embed_batch loops.
"max_batch_size": 1,
},
"zhipu": {
"default_base_url": "https://open.bigmodel.cn/api/paas/v4",
"default_model": "embedding-3",
"default_dimensions": 1024,
"supports_dim_param": True,
"needs_client_truncate": False,
"needs_client_normalize": False,
"query_instruction": "",
"max_batch_size": 64,
},
}
def _l2_normalize(vec: List[float]) -> List[float]:
"""Normalize a vector to unit length (L2 norm). Returns input on zero vector."""
norm = math.sqrt(sum(v * v for v in vec))
if norm == 0:
return vec
return [v / norm for v in vec]
class OpenAIEmbeddingProvider(EmbeddingProvider):
"""
OpenAI-compatible embedding provider.
Used for openai/linkai/dashscope/zhipu by configuring the metadata
fields (doubao has its own adapter, DoubaoEmbeddingProvider). The legacy
three-arg constructor (model, api_key, api_base) keeps working, so the
original OpenAI/LinkAI fallback code path is unchanged.
"""
def __init__(
self,
model: str = "text-embedding-3-small",
api_key: Optional[str] = None,
api_base: Optional[str] = None,
extra_headers: Optional[dict] = None,
dimensions: Optional[int] = None,
supports_dim_param: bool = True,
needs_client_truncate: bool = False,
needs_client_normalize: bool = False,
query_instruction: str = "",
max_batch_size: int = 256,
):
"""
Args:
model: Model name (e.g. text-embedding-3-small, text-embedding-v4, embedding-3)
api_key: API key (required)
api_base: API base URL (defaults to OpenAI)
extra_headers: Optional extra HTTP headers
dimensions: Target output dimension. Required when supports_dim_param
is False and needs_client_truncate is True (used to slice).
supports_dim_param: Whether the vendor accepts a `dimensions` body param
needs_client_truncate: Slice the returned vector to `dimensions`
needs_client_normalize: L2-normalize on the client after slicing
query_instruction: Optional prefix prepended to query texts only
max_batch_size: Max items per /embeddings request; embed_batch
auto-paginates above this.
"""
self.model = model
self.api_key = api_key
self.api_base = api_base or "https://api.openai.com/v1"
self.extra_headers = extra_headers or {}
self.supports_dim_param = supports_dim_param
self.needs_client_truncate = needs_client_truncate
self.needs_client_normalize = needs_client_normalize
self.query_instruction = query_instruction or ""
self.max_batch_size = max(1, int(max_batch_size or 1))
if not self.api_key or self.api_key in ["", "YOUR API KEY", "YOUR_API_KEY"]:
raise ValueError("Embedding API key is not configured")
if dimensions is not None and dimensions > 0:
self._dimensions = dimensions
else:
# Legacy heuristic for OpenAI text-embedding-3-* family
self._dimensions = 1536 if "small" in model else 3072
def _call_api(self, input_data):
"""Call OpenAI-compatible /embeddings endpoint"""
import requests
url = f"{self.api_base}/embeddings"
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {self.api_key}",
**self.extra_headers,
}
data = {
"input": input_data,
"model": self.model,
}
if self.supports_dim_param and self._dimensions:
data["dimensions"] = self._dimensions
try:
response = requests.post(url, headers=headers, json=data, timeout=EMBEDDING_HTTP_TIMEOUT)
response.raise_for_status()
return response.json()
except requests.exceptions.ConnectionError as e:
# Chain with `from e` so the original traceback is preserved.
raise ConnectionError(
f"Failed to connect to embedding API at {url}. "
f"Please check network and api_base. Error: {str(e)}"
) from e
except requests.exceptions.Timeout as e:
raise TimeoutError(f"Embedding API request timed out. Error: {str(e)}") from e
except requests.exceptions.HTTPError as e:
if e.response.status_code == 401:
raise ValueError("Invalid embedding API key") from e
elif e.response.status_code == 429:
raise ValueError("Embedding API rate limit exceeded") from e
else:
raise ValueError(
f"Embedding API request failed: "
f"{e.response.status_code} - {e.response.text}"
) from e
def _post_process(self, raw: List[float]) -> List[float]:
"""Apply optional client-side truncation + normalization"""
vec = raw
if self.needs_client_truncate and self._dimensions and len(vec) > self._dimensions:
vec = vec[: self._dimensions]
if self.needs_client_normalize:
vec = _l2_normalize(vec)
return vec
def embed(self, text: str) -> List[float]:
"""Generate embedding (treated as document by default)"""
result = self._call_api(text)
return self._post_process(result["data"][0]["embedding"])
def embed_query(self, text: str) -> List[float]:
"""Generate embedding for a query (applies vendor instruction prefix if any)"""
if self.query_instruction:
text = f"{self.query_instruction}{text}"
return self.embed(text)
def embed_batch(self, texts: List[str]) -> List[List[float]]:
"""Generate embeddings for multiple documents.
Automatically paginates by self.max_batch_size so callers can pass any
number of texts. Order of returned vectors matches the input order.
"""
if not texts:
return []
out: List[List[float]] = []
step = self.max_batch_size
for i in range(0, len(texts), step):
chunk = texts[i:i + step]
result = self._call_api(chunk)
out.extend(self._post_process(item["embedding"]) for item in result["data"])
return out
@property
def dimensions(self) -> int:
return self._dimensions
class DoubaoEmbeddingProvider(EmbeddingProvider):
"""
Doubao (Volcengine Ark) multimodal embedding provider.
Doubao deprecated its OpenAI-compatible /v1/embeddings endpoint and
unified everything under /api/v3/embeddings/multimodal, which uses a
structured `input: [{type, text|image_url|video_url}, ...]` payload.
Notes:
* The endpoint produces ONE embedding per call (input list is multiple
modality parts of a single document, not a batch). embed_batch
therefore loops per-text — no native batch support.
* Native dimensions: 1024 or 2048 (default 1024 to align with other
Chinese vendors). No client-side truncation needed.
* Auth: Bearer ARK API key.
"""
def __init__(
self,
model: str,
api_key: Optional[str] = None,
api_base: Optional[str] = None,
extra_headers: Optional[dict] = None,
dimensions: Optional[int] = None,
):
self.model = model
self.api_key = api_key
self.api_base = api_base or "https://ark.cn-beijing.volces.com/api/v3"
self.extra_headers = extra_headers or {}
if not self.api_key or self.api_key in ["", "YOUR API KEY", "YOUR_API_KEY"]:
raise ValueError("Doubao embedding API key (ark_api_key) is not configured")
if dimensions in (1024, 2048):
self._dimensions = dimensions
elif dimensions is None:
self._dimensions = 1024
else:
raise ValueError(
f"Doubao embedding dimensions must be 1024 or 2048, got {dimensions}"
)
def _call_api(self, text: str) -> List[float]:
"""One call → one embedding. The multimodal endpoint takes a single
document represented as a list of typed parts; we send a single
text part."""
import requests
url = f"{self.api_base}/embeddings/multimodal"
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {self.api_key}",
**self.extra_headers,
}
payload = {
"model": self.model,
"input": [{"type": "text", "text": text}],
"dimensions": self._dimensions,
"encoding_format": "float",
}
try:
response = requests.post(url, headers=headers, json=payload, timeout=EMBEDDING_HTTP_TIMEOUT)
response.raise_for_status()
body = response.json()
except requests.exceptions.ConnectionError as e:
# Chain with `from e` so the original traceback is preserved.
raise ConnectionError(
f"Failed to connect to Doubao embedding API at {url}. "
f"Please check network and api_base. Error: {str(e)}"
) from e
except requests.exceptions.Timeout as e:
raise TimeoutError(f"Doubao embedding API request timed out. Error: {str(e)}") from e
except requests.exceptions.HTTPError as e:
if e.response.status_code == 401:
raise ValueError("Invalid Doubao (ark) embedding API key") from e
elif e.response.status_code == 429:
raise ValueError("Doubao embedding API rate limit exceeded") from e
else:
raise ValueError(
f"Doubao embedding API request failed: "
f"{e.response.status_code} - {e.response.text}"
) from e
# Response shape per docs: {"data": {"embedding": [...]}}
data = body.get("data")
if isinstance(data, dict) and "embedding" in data:
return data["embedding"]
# Some providers wrap as a list of one — be defensive
if isinstance(data, list) and data and isinstance(data[0], dict) and "embedding" in data[0]:
return data[0]["embedding"]
raise ValueError(f"Unexpected Doubao embedding response shape: {body}")
def embed(self, text: str) -> List[float]:
return self._call_api(text)
def embed_batch(self, texts: List[str]) -> List[List[float]]:
# Endpoint produces one embedding per call; loop. Order preserved.
return [self._call_api(t) for t in texts]
@property
def dimensions(self) -> int:
return self._dimensions
class EmbeddingCache:
"""In-memory cache for embeddings to avoid recomputation"""
def __init__(self):
self.cache = {}
def get(self, text: str, provider: str, model: str) -> Optional[List[float]]:
key = self._compute_key(text, provider, model)
return self.cache.get(key)
def put(self, text: str, provider: str, model: str, embedding: List[float]):
key = self._compute_key(text, provider, model)
self.cache[key] = embedding
@staticmethod
def _compute_key(text: str, provider: str, model: str) -> str:
content = f"{provider}:{model}:{text}"
return hashlib.md5(content.encode("utf-8")).hexdigest()
def clear(self):
self.cache.clear()
def create_embedding_provider(
provider: str = "openai",
model: Optional[str] = None,
api_key: Optional[str] = None,
api_base: Optional[str] = None,
extra_headers: Optional[dict] = None,
dimensions: Optional[int] = None,
) -> EmbeddingProvider:
"""
Factory function to create an embedding provider.
Backward compatible: when called with provider in {"openai", "linkai"}
and no `dimensions` arg, behaves exactly as before (1536-dim OpenAI).
New providers ("dashscope", "doubao", "zhipu") require explicit configuration
and use the unified 1024-dim defaults from EMBEDDING_VENDORS.
Args:
provider: Vendor key (one of EMBEDDING_VENDORS)
model: Model name (uses vendor default if None)
api_key: API key (required)
api_base: API base URL (uses vendor default if None)
extra_headers: Optional extra HTTP headers
dimensions: Target output dimension (uses vendor default if None)
Returns:
EmbeddingProvider instance
"""
meta = EMBEDDING_VENDORS.get(provider)
if meta is None:
raise ValueError(
f"Unsupported embedding provider: {provider}. "
f"Supported: {sorted(EMBEDDING_VENDORS.keys())}"
)
# Doubao uses a non-OpenAI-compatible multimodal endpoint.
if meta.get("provider_class") == "doubao":
final_dim = dimensions if (dimensions and dimensions > 0) else meta["default_dimensions"]
return DoubaoEmbeddingProvider(
model=model or meta["default_model"],
api_key=api_key,
api_base=api_base or meta["default_base_url"],
extra_headers=extra_headers,
dimensions=final_dim,
)
# Legacy two-arg call for openai/linkai keeps 1536-dim default behavior
# so existing data isn't invalidated.
is_legacy_call = (
provider in ("openai", "linkai")
and dimensions is None
)
if is_legacy_call:
return OpenAIEmbeddingProvider(
model=model or "text-embedding-3-small",
api_key=api_key,
api_base=api_base,
extra_headers=extra_headers,
)
final_dim = dimensions if (dimensions and dimensions > 0) else meta["default_dimensions"]
return OpenAIEmbeddingProvider(
model=model or meta["default_model"],
api_key=api_key,
api_base=api_base or meta["default_base_url"],
extra_headers=extra_headers,
dimensions=final_dim,
supports_dim_param=meta["supports_dim_param"],
needs_client_truncate=meta["needs_client_truncate"],
needs_client_normalize=meta["needs_client_normalize"],
query_instruction=meta["query_instruction"],
max_batch_size=meta.get("max_batch_size", 256),
)
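The response handling in `_call_api` above accepts two shapes for the `data` field. A minimal sketch of that parsing, isolated from the HTTP call, might look like the following. The helper name `extract_embedding` is hypothetical and not part of the module; only the two response shapes come from the code above.

```python
from typing import Any, Dict, List


def extract_embedding(body: Dict[str, Any]) -> List[float]:
    """Pull one embedding out of a decoded JSON body (hypothetical helper)."""
    data = body.get("data")
    # Shape named in the source comment: {"data": {"embedding": [...]}}
    if isinstance(data, dict) and "embedding" in data:
        return data["embedding"]
    # Defensive shape: {"data": [{"embedding": [...]}]}
    if (
        isinstance(data, list)
        and data
        and isinstance(data[0], dict)
        and "embedding" in data[0]
    ):
        return data[0]["embedding"]
    raise ValueError(f"Unexpected embedding response shape: {body}")
```

Keeping the parsing separate from the request makes both shapes easy to exercise without network access.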


@@ -0,0 +1,191 @@
"""
Rebuild memory vector index.
Recommended entry point (in-chat, while agent is running):
/memory rebuild-index
Backward-compatible CLI entry (must run from project root):
python -m agent.memory.rebuild_index
What it does:
1. Probes the embedding endpoint with a tiny call to fail fast on
bad provider/model/key — before touching the index.
2. Clears the SQLite chunks/files tables (workspace markdown stays intact).
3. Runs a fresh sync, regenerating embeddings with the currently configured
provider/model/dimensions.
This is the only safe way to switch embedding_provider after the existing
index has been populated by a different-dim model.
"""
from __future__ import annotations
import asyncio
import sys
from dataclasses import dataclass
from typing import Optional
from common.log import logger
from common.utils import expand_path
@dataclass
class RebuildResult:
"""Outcome of a rebuild_in_process() call"""
ok: bool
removed: int = 0
chunks: int = 0
files: int = 0
error: Optional[str] = None
def clear_index(db_path, storage=None) -> int:
"""Wipe chunks/files, reset FTS5, and clean up any legacy state file.
Args:
db_path: Path of the index DB (also used to locate the legacy state
file for migration cleanup, and — when *storage* is None — to
open a fresh connection).
storage: Optional pre-opened MemoryStorage. When provided we reuse it
so the live connection's triggers stay in sync — opening a second
connection would leave the original one's triggers pointing at a
DROP'd chunks_fts table.
We reset (DROP+recreate) chunks_fts because its shadow tables can become
inconsistent across rebuild cycles, causing bm25() / ORDER BY rank to
raise "database disk image is malformed" even when raw MATCH still works.
Returns number of chunks removed.
"""
from agent.memory.embedding.state import cleanup_legacy_state_file
from agent.memory.storage import MemoryStorage
owns_storage = storage is None
if owns_storage:
storage = MemoryStorage(db_path)
try:
before = storage.conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
storage.conn.execute("DELETE FROM chunks")
storage.conn.execute("DELETE FROM files")
storage.conn.commit()
storage.reset_fts5()
finally:
if owns_storage:
storage.close()
cleanup_legacy_state_file(db_path)
return int(before)
def rebuild_in_process(memory_manager) -> RebuildResult:
"""
Rebuild the index using an existing, fully-initialized MemoryManager.
Used by the in-chat /memory rebuild-index command. The caller already has
config loaded, embedding_provider built, and (optionally) the agent
running, so we only need to:
1. Probe the embedding endpoint so a bad provider/model/key fails fast.
2. Clear chunks/files + state on the manager's storage.
3. Re-sync (force=True).
NOTE: memory_manager.embedding_provider must be set. The rebuild is rejected
otherwise, because sync() would silently skip embedding generation.
"""
if memory_manager is None:
return RebuildResult(ok=False, error="memory_manager is None")
if memory_manager.embedding_provider is None:
return RebuildResult(ok=False, error="embedding_provider is not initialized")
# Probe the embedding endpoint BEFORE clearing the index. A bad
# provider/model/key would otherwise leave the user with an empty index
# that not even keyword search can serve.
try:
memory_manager.embedding_provider.embed_query("ping")
except Exception as e:
logger.error(f"[RebuildIndex] embedding probe failed, aborting rebuild: {e}")
return RebuildResult(ok=False, error=f"embedding endpoint not reachable: {e}")
db_path = memory_manager.config.get_db_path()
try:
removed = clear_index(db_path, storage=memory_manager.storage)
except Exception as e:
logger.exception("[RebuildIndex] clear_index failed")
return RebuildResult(ok=False, error=f"clear failed: {e}")
# Mark the manager dirty first: sync() clears the flag on success and
# leaves it set when embed_batch fails, so a failed re-embed is detectable.
memory_manager.mark_dirty()
try:
asyncio.run(memory_manager.sync(force=True))
except RuntimeError:
# Already inside a running event loop (rare in chat handler thread).
loop = asyncio.new_event_loop()
try:
loop.run_until_complete(memory_manager.sync(force=True))
finally:
loop.close()
except Exception as e:
logger.exception("[RebuildIndex] sync failed")
return RebuildResult(ok=False, removed=removed, error=f"re-embed failed: {e}")
stats = memory_manager.storage.get_stats()
chunks = int(stats.get("chunks", 0))
embedded = int(stats.get("embedded", 0))
# sync() returns without writing anything when embed_batch fails, which
# leaves the freshly cleared index empty. In a /rebuild-index request the
# user explicitly asked for vectors, so surface that as a failure, along
# with the case where chunks were stored without vectors.
sync_incomplete = getattr(memory_manager, "_dirty", False)
if sync_incomplete or (chunks > 0 and embedded == 0):
return RebuildResult(
ok=False,
removed=removed,
chunks=chunks,
files=int(stats.get("files", 0)),
error=(
"embedding API failed during sync; index has no vectors. "
"Check embedding provider/model/key and retry."
),
)
return RebuildResult(
ok=True,
removed=removed,
chunks=chunks,
files=int(stats.get("files", 0)),
)
def main() -> int:
"""Standalone CLI entry. Must be run from project root (relative config path)."""
from config import conf, load_config
from agent.memory import MemoryConfig, MemoryManager
load_config()
workspace_root = expand_path(conf().get("agent_workspace", "~/cow"))
memory_config = MemoryConfig(workspace_root=workspace_root)
logger.info(f"[RebuildIndex] Workspace: {workspace_root}")
logger.info(f"[RebuildIndex] Index db: {memory_config.get_db_path()}")
from bridge.agent_initializer import AgentInitializer
initializer = AgentInitializer(bridge=None, agent_bridge=None)
embedding_provider = initializer._init_embedding_provider(memory_config, session_id=None)
if embedding_provider is None:
logger.error(
"[RebuildIndex] No embedding provider could be initialized. "
"Check your config.json. Aborting rebuild."
)
return 1
manager = MemoryManager(memory_config, embedding_provider=embedding_provider)
result = rebuild_in_process(manager)
if not result.ok:
logger.error(f"[RebuildIndex] {result.error}")
return 1
logger.info(
f"[RebuildIndex] Done. removed={result.removed}, "
f"chunks={result.chunks}, files={result.files}"
)
return 0
if __name__ == "__main__":
sys.exit(main())
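The ordering used by `rebuild_in_process` above (probe, then clear, then re-sync) can be sketched in isolation as follows. This is an illustrative model only: `Outcome`, `rebuild`, and the callables passed in are hypothetical stand-ins for `RebuildResult`, the embedding provider, and `MemoryManager.sync`, and a plain list stands in for the SQLite index.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Outcome:
    ok: bool
    removed: int = 0
    error: Optional[str] = None


def rebuild(
    index: List[str],
    probe: Callable[[], None],
    resync: Callable[[], List[str]],
) -> Outcome:
    """Probe first, then clear, then repopulate (all names are illustrative)."""
    try:
        probe()  # tiny call; raises when provider/model/key is wrong
    except Exception as e:
        # Nothing has been deleted yet, so the old index keeps serving.
        return Outcome(ok=False, error=f"embedding endpoint not reachable: {e}")
    removed = len(index)
    index.clear()
    index.extend(resync())
    return Outcome(ok=True, removed=removed)
```

The point of the ordering is that a failing probe returns before `index.clear()` runs, so a misconfigured provider never costs the user their existing index.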


@@ -0,0 +1,51 @@
"""
Embedding-related index utilities.
We don't keep a sidecar state file — the SQLite index is the source of truth
and config.json is the source of intent. The two functions below are the
only things needing on-disk awareness:
detect_index_dim : read the dim of stored vectors (display-only)
cleanup_legacy_state_file: remove old embedding_state.json from earlier
versions; safe no-op when absent.
"""
from __future__ import annotations
import json
import os
from pathlib import Path
from typing import Optional, Union
PathLike = Union[str, os.PathLike]
def detect_index_dim(storage) -> Optional[int]:
"""Return the dim of the first stored embedding, or None if the index
has no embeddings. Used by /memory status."""
try:
row = storage.conn.execute(
"SELECT embedding FROM chunks WHERE embedding IS NOT NULL LIMIT 1"
).fetchone()
except Exception:
return None
if not row or not row["embedding"]:
return None
try:
raw = row["embedding"]
if isinstance(raw, (bytes, bytearray)):
# New BLOB format: 4 bytes per float32
return len(raw) // 4
emb = json.loads(raw)
return len(emb) if isinstance(emb, list) else None
except Exception:
return None
def cleanup_legacy_state_file(db_path: PathLike) -> None:
"""Remove old embedding_state.json files from earlier versions.
Safe to call repeatedly; no-op if the file is absent."""
legacy = Path(db_path).parent / "embedding_state.json"
try:
legacy.unlink(missing_ok=True)
except Exception:
pass
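`detect_index_dim` above distinguishes two storage formats: a BLOB with 4 bytes per float32, and a JSON-encoded list. A small sketch of just that branch, assuming the BLOB is a plain packed float32 array; `embedding_dim` is a hypothetical name, and the little-endian `struct` format in the example is an assumption made only to build sample data:

```python
import json
import struct
from typing import Optional, Union


def embedding_dim(raw: Union[bytes, bytearray, str, None]) -> Optional[int]:
    """Return the vector length of a stored embedding (hypothetical helper)."""
    if not raw:
        return None
    if isinstance(raw, (bytes, bytearray)):
        # BLOB format: packed float32 values, 4 bytes each
        return len(raw) // 4
    try:
        emb = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None
    return len(emb) if isinstance(emb, list) else None


# Sample data: three float32 values packed into 12 bytes (assumed layout).
blob = struct.pack("<3f", 0.1, 0.2, 0.3)
print(embedding_dim(blob))
print(embedding_dim("[1.0, 2.0, 3.0, 4.0]"))
```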

555
agent/memory/manager.py Normal file

@@ -0,0 +1,555 @@
"""
Memory manager for AgentMesh
Provides high-level interface for memory operations
"""
import os
from typing import List, Optional, Dict, Any
from pathlib import Path
import hashlib
from datetime import datetime, timedelta
from agent.memory.config import MemoryConfig, get_default_memory_config
from agent.memory.storage import MemoryStorage, MemoryChunk, SearchResult
from agent.memory.chunker import TextChunker
from agent.memory.embedding import EmbeddingProvider, EmbeddingCache
from agent.memory.summarizer import MemoryFlushManager, create_memory_files_if_needed
class MemoryManager:
"""
Memory manager with hybrid search capabilities
Provides long-term memory for agents with vector and keyword search
"""
def __init__(
self,
config: Optional[MemoryConfig] = None,
embedding_provider: Optional[EmbeddingProvider] = None,
llm_model: Optional[Any] = None
):
"""
Initialize memory manager
Args:
config: Memory configuration (uses global config if not provided)
embedding_provider: Custom embedding provider (optional)
llm_model: LLM model for summarization (optional)
"""
self.config = config or get_default_memory_config()
# Initialize storage
db_path = self.config.get_db_path()
self.storage = MemoryStorage(db_path)
# Initialize chunker
self.chunker = TextChunker(
max_tokens=self.config.chunk_max_tokens,
overlap_tokens=self.config.chunk_overlap_tokens
)
# Embedding provider is owned by the caller (agent_initializer is the
# canonical entry point and handles legacy/explicit + state validation).
# When None is passed, memory degrades to keyword-only search instead
# of silently re-initializing a vendor here, which would bypass the
# caller's state checks and risk corrupting the index.
self.embedding_provider = embedding_provider
if self.embedding_provider is None:
from common.log import logger
logger.info(
"[MemoryManager] No embedding provider; memory will use keyword search only"
)
# Cache for query embeddings (avoids redundant API calls within a session)
self._embedding_cache = EmbeddingCache()
# Initialize memory flush manager
workspace_dir = self.config.get_workspace()
self.flush_manager = MemoryFlushManager(
workspace_dir=workspace_dir,
llm_model=llm_model
)
# Ensure workspace directories exist
self._init_workspace()
self._dirty = False
def _init_workspace(self):
"""Initialize workspace directories"""
memory_dir = self.config.get_memory_dir()
memory_dir.mkdir(parents=True, exist_ok=True)
# Create default memory files
workspace_dir = self.config.get_workspace()
create_memory_files_if_needed(workspace_dir)
async def search(
self,
query: str,
user_id: Optional[str] = None,
max_results: Optional[int] = None,
min_score: Optional[float] = None,
include_shared: bool = True
) -> List[SearchResult]:
"""
Search memory with hybrid search (vector + keyword)
Args:
query: Search query
user_id: User ID for scoped search
max_results: Maximum results to return
min_score: Minimum score threshold
include_shared: Include shared memories
Returns:
List of search results sorted by relevance
"""
max_results = max_results or self.config.max_results
min_score = self.config.min_score if min_score is None else min_score
# Determine scopes
scopes = []
if include_shared:
scopes.append("shared")
if user_id:
scopes.append("user")
if not scopes:
return []
# Sync if needed
if self.config.sync_on_search and self._dirty:
await self.sync()
from common.log import logger
# Perform vector search (if embedding provider available).
# Failures degrade silently to keyword-only — no exception is raised.
vector_results = []
if self.embedding_provider:
try:
provider_name = type(self.embedding_provider).__name__
model_name = getattr(self.embedding_provider, 'model', '')
cached = self._embedding_cache.get(query, provider_name, model_name)
if cached is not None:
query_embedding = cached
else:
query_embedding = self.embedding_provider.embed_query(query)
self._embedding_cache.put(query, provider_name, model_name, query_embedding)
vector_results = self.storage.search_vector(
query_embedding=query_embedding,
user_id=user_id,
scopes=scopes,
limit=max_results * 2 # Get more candidates for merging
)
logger.info(f"[MemoryManager] Vector search found {len(vector_results)} results for query: {query}")
except Exception as e:
logger.error(
f"[MemoryManager] Vector search failed, falling back to keyword-only: {e}"
)
# Perform keyword search (also runs as fallback when vector failed)
keyword_results = self.storage.search_keyword(
query=query,
user_id=user_id,
scopes=scopes,
limit=max_results * 2
)
logger.info(f"[MemoryManager] Keyword search found {len(keyword_results)} results for query: {query}")
# Merge results
merged = self._merge_results(
vector_results,
keyword_results,
self.config.vector_weight,
self.config.keyword_weight
)
# Filter by min score and limit
filtered = [r for r in merged if r.score >= min_score]
return filtered[:max_results]
async def add_memory(
self,
content: str,
user_id: Optional[str] = None,
scope: str = "shared",
source: str = "memory",
path: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None
):
"""
Add new memory content
Args:
content: Memory content
user_id: User ID for user-scoped memory
scope: Memory scope ("shared", "user", "session")
source: Memory source ("memory" or "session")
path: File path (auto-generated if not provided)
metadata: Additional metadata
"""
if not content.strip():
return
# Generate path if not provided
if not path:
content_hash = hashlib.md5(content.encode('utf-8')).hexdigest()[:8]
if user_id and scope == "user":
path = f"memory/users/{user_id}/memory_{content_hash}.md"
else:
path = f"memory/shared/memory_{content_hash}.md"
# Chunk content
chunks = self.chunker.chunk_text(content)
# Generate embeddings (if provider available)
texts = [chunk.text for chunk in chunks]
if self.embedding_provider:
embeddings = self.embedding_provider.embed_batch(texts)
else:
# No embeddings, just use None
embeddings = [None] * len(texts)
# Create memory chunks
memory_chunks = []
for chunk, embedding in zip(chunks, embeddings):
chunk_id = self._generate_chunk_id(path, chunk.start_line, chunk.end_line)
chunk_hash = MemoryStorage.compute_hash(chunk.text)
memory_chunks.append(MemoryChunk(
id=chunk_id,
user_id=user_id,
scope=scope,
source=source,
path=path,
start_line=chunk.start_line,
end_line=chunk.end_line,
text=chunk.text,
embedding=embedding,
hash=chunk_hash,
metadata=metadata
))
# Save to storage
self.storage.save_chunks_batch(memory_chunks)
# Update file metadata
file_hash = MemoryStorage.compute_hash(content)
self.storage.update_file_metadata(
path=path,
source=source,
file_hash=file_hash,
mtime=int(datetime.now().timestamp()), # Use current time
size=len(content.encode('utf-8'))  # bytes, consistent with stat.st_size in sync()
)
async def sync(self, force: bool = False):
"""
Synchronize memory from files.
Two-pass design to amortize embedding HTTP cost:
1. Walk all files, chunk those whose hash changed, collect pending
chunks across files. No embedding calls yet.
2. Run a single embed_batch over the union of pending chunks (the
provider auto-paginates by vendor cap), then persist per-file.
For workspaces with many small files (101 files / ~1 chunk each), this
cuts ~100 HTTP calls down to ~ceil(total_chunks / vendor_cap).
Args:
force: Force full reindex
"""
memory_dir = self.config.get_memory_dir()
workspace_dir = self.config.get_workspace()
files_to_scan: List[tuple] = [] # (file_path, source, scope, user_id)
memory_file = Path(workspace_dir) / "MEMORY.md"
if memory_file.exists():
files_to_scan.append((memory_file, "memory", "shared", None))
if memory_dir.exists():
for file_path in memory_dir.rglob("*.md"):
rel_parts = file_path.relative_to(workspace_dir).parts
if any(part.startswith('.') for part in rel_parts):
continue
# Dream diaries are narrative reflections produced by Deep
# Dream; their factual content has already been distilled
# into MEMORY.md. Indexing them adds noisy near-duplicates
# that crowd out the authoritative entry in retrieval.
if "dreams" in rel_parts:
continue
if "daily" in rel_parts:
if "users" in rel_parts or len(rel_parts) > 3:
user_idx = rel_parts.index("daily") + 1
user_id = rel_parts[user_idx] if user_idx < len(rel_parts) else None
scope = "user"
else:
user_id = None
scope = "shared"
elif "users" in rel_parts:
user_idx = rel_parts.index("users") + 1
user_id = rel_parts[user_idx] if user_idx < len(rel_parts) else None
scope = "user"
else:
user_id = None
scope = "shared"
files_to_scan.append((file_path, "memory", scope, user_id))
from config import conf
if conf().get("knowledge", True):
knowledge_dir = Path(workspace_dir) / "knowledge"
if knowledge_dir.exists():
for file_path in knowledge_dir.rglob("*.md"):
files_to_scan.append((file_path, "knowledge", "shared", None))
# Pass 1: inline chunking + change detection. Inlined (instead of
# calling self._prepare_file_for_sync) so this method does not depend
# on any sibling helpers — keeps it robust against partial reloads
# where the class object is older than the method's source.
pending: List[Dict[str, Any]] = []
workspace_dir_path = self.config.get_workspace()
for file_path, source, scope, user_id in files_to_scan:
try:
content = file_path.read_text(encoding='utf-8')
except Exception:
continue
file_hash = MemoryStorage.compute_hash(content)
rel_path = str(file_path.relative_to(workspace_dir_path))
if not force and self.storage.get_file_hash(rel_path) == file_hash:
continue
chunks = self.chunker.chunk_text(content)
if not chunks:
continue
pending.append({
"file_path": file_path,
"rel_path": rel_path,
"source": source,
"scope": scope,
"user_id": user_id,
"file_hash": file_hash,
"chunks": chunks,
"texts": [c.text for c in chunks],
})
if not pending:
self._dirty = False
return
# Pass 2: single batched embed across all pending chunks.
# CRITICAL: never touch the index until we hold valid embeddings.
# If embed_batch fails, leave the existing index intact (chunks +
# file_hash) so the next sync will retry the same files. Writing
# NULL embeddings + updating file_hash here would mark the file as
# "successfully synced" and silently strand it without vectors.
all_texts: List[str] = []
for entry in pending:
all_texts.extend(entry["texts"])
if not self.embedding_provider:
# No provider configured at all (legacy keyword-only). Persist
# chunks without embeddings — this is the user's intent.
all_embeddings: List[Optional[List[float]]] = [None] * len(all_texts)
else:
try:
all_embeddings = self.embedding_provider.embed_batch(all_texts)
except Exception as e:
from common.log import logger
logger.error(
f"[MemoryManager] Batch embedding failed for {len(all_texts)} "
f"chunks across {len(pending)} files: {e}. "
f"Index left untouched; will retry on next sync."
)
# Bail before touching storage. self._dirty is left unchanged, so a
# caller that marked the manager dirty still sees pending work.
return
# Pass 2, continued: persist inline, for the same self-contained reasoning as Pass 1.
cursor = 0
for entry in pending:
n = len(entry["texts"])
entry_embeddings = all_embeddings[cursor:cursor + n]
cursor += n
rel_path = entry["rel_path"]
self.storage.delete_by_path(rel_path)
memory_chunks = []
for chunk, embedding in zip(entry["chunks"], entry_embeddings):
chunk_id = self._generate_chunk_id(rel_path, chunk.start_line, chunk.end_line)
chunk_hash = MemoryStorage.compute_hash(chunk.text)
memory_chunks.append(MemoryChunk(
id=chunk_id,
user_id=entry["user_id"],
scope=entry["scope"],
source=entry["source"],
path=rel_path,
start_line=chunk.start_line,
end_line=chunk.end_line,
text=chunk.text,
embedding=embedding,
hash=chunk_hash,
metadata=None,
))
self.storage.save_chunks_batch(memory_chunks)
stat = entry["file_path"].stat()
self.storage.update_file_metadata(
path=rel_path,
source=entry["source"],
file_hash=entry["file_hash"],
mtime=int(stat.st_mtime),
size=stat.st_size,
)
self._dirty = False
def flush_memory(
self,
messages: list,
user_id: Optional[str] = None,
reason: str = "threshold",
max_messages: int = 10,
context_summary_callback=None,
) -> bool:
"""
Flush conversation summary to daily memory file.
Args:
messages: Conversation message list
user_id: Optional user ID
reason: "threshold" | "overflow" | "daily_summary"
max_messages: Max recent messages to include (0 = all)
context_summary_callback: Optional callback(str) invoked with the
daily summary text for in-context injection
Returns:
True if flush was dispatched
"""
success = self.flush_manager.flush_from_messages(
messages=messages,
user_id=user_id,
reason=reason,
max_messages=max_messages,
context_summary_callback=context_summary_callback,
)
if success:
self._dirty = True
return success
def get_status(self) -> Dict[str, Any]:
"""Get memory status"""
stats = self.storage.get_stats()
return {
'chunks': stats['chunks'],
'files': stats['files'],
'workspace': str(self.config.get_workspace()),
'dirty': self._dirty,
'embedding_enabled': self.embedding_provider is not None,
'embedding_provider': self.config.embedding_provider if self.embedding_provider else 'disabled',
'embedding_model': self.config.embedding_model if self.embedding_provider else 'N/A',
'search_mode': 'hybrid (vector + keyword)' if self.embedding_provider else 'keyword only (FTS5)'
}
def mark_dirty(self):
"""Mark memory as dirty (needs sync)"""
self._dirty = True
def close(self):
"""Close memory manager and release resources"""
self.storage.close()
# Helper methods
def _generate_chunk_id(self, path: str, start_line: int, end_line: int) -> str:
"""Generate unique chunk ID"""
content = f"{path}:{start_line}:{end_line}"
return hashlib.md5(content.encode('utf-8')).hexdigest()
@staticmethod
def _compute_temporal_decay(path: str, half_life_days: float = 30.0) -> float:
"""
Compute temporal decay multiplier for dated memory files.
Inspired by OpenClaw's temporal-decay: exponential decay based on file date.
MEMORY.md and non-dated files are "evergreen" (no decay, multiplier=1.0).
Daily files like memory/2025-03-01.md decay based on age.
Formula: multiplier = exp(-ln2/half_life * age_in_days)
"""
import re
import math
match = re.search(r'(\d{4})-(\d{2})-(\d{2})\.md$', path)
if not match:
return 1.0 # evergreen: MEMORY.md, non-dated files
try:
file_date = datetime(
int(match.group(1)), int(match.group(2)), int(match.group(3))
)
age_days = (datetime.now() - file_date).days
if age_days <= 0:
return 1.0
decay_lambda = math.log(2) / half_life_days
return math.exp(-decay_lambda * age_days)
except (ValueError, OverflowError):
return 1.0
def _merge_results(
self,
vector_results: List[SearchResult],
keyword_results: List[SearchResult],
vector_weight: float,
keyword_weight: float
) -> List[SearchResult]:
"""Merge vector and keyword search results with temporal decay for dated files"""
merged_map = {}
for result in vector_results:
key = (result.path, result.start_line, result.end_line)
merged_map[key] = {
'result': result,
'vector_score': result.score,
'keyword_score': 0.0
}
for result in keyword_results:
key = (result.path, result.start_line, result.end_line)
if key in merged_map:
merged_map[key]['keyword_score'] = result.score
else:
merged_map[key] = {
'result': result,
'vector_score': 0.0,
'keyword_score': result.score
}
merged_results = []
for entry in merged_map.values():
combined_score = (
vector_weight * entry['vector_score'] +
keyword_weight * entry['keyword_score']
)
# Apply temporal decay for dated memory files
result = entry['result']
decay = self._compute_temporal_decay(result.path)
combined_score *= decay
merged_results.append(SearchResult(
path=result.path,
start_line=result.start_line,
end_line=result.end_line,
score=combined_score,
snippet=result.snippet,
source=result.source,
user_id=result.user_id
))
merged_results.sort(key=lambda r: r.score, reverse=True)
return merged_results
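The scoring in `_merge_results` and `_compute_temporal_decay` above reduces to a weighted sum multiplied by an exponential decay. A worked sketch follows; the weights 0.7 and 0.3 in the usage lines are assumed values for illustration, since the real ones come from `MemoryConfig`, and both function names are hypothetical.

```python
import math


def temporal_decay(age_days: int, half_life_days: float = 30.0) -> float:
    """exp(-ln2 / half_life * age); files dated today or later do not decay."""
    if age_days <= 0:
        return 1.0
    return math.exp(-math.log(2) / half_life_days * age_days)


def combined_score(
    vector_score: float,
    keyword_score: float,
    vector_weight: float,
    keyword_weight: float,
    age_days: int,
) -> float:
    """Weighted sum of both scores, scaled by the age-based decay."""
    base = vector_weight * vector_score + keyword_weight * keyword_score
    return base * temporal_decay(age_days)


# A 30-day-old daily file with the default half-life keeps half its score:
# (0.7 * 0.8 + 0.3 * 0.4) * 0.5
print(combined_score(0.8, 0.4, 0.7, 0.3, age_days=30))
```

With a 30-day half-life, a chunk from a 60-day-old daily file keeps a quarter of its score, while `MEMORY.md` and other non-dated files are treated as age 0 and keep all of it.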

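The batching in `sync()` above collects every pending chunk, embeds them in one call, and then slices the flat result back per file with a running cursor. A minimal sketch of that bookkeeping, assuming `embed_batch` returns one vector per input text in the same order; `embed_pending` and the dict-of-lists input are illustrative simplifications of the `pending` entries in the source:

```python
from typing import Callable, Dict, List


def embed_pending(
    pending: Dict[str, List[str]],
    embed_batch: Callable[[List[str]], List[List[float]]],
) -> Dict[str, List[List[float]]]:
    """Embed every pending chunk in one call, then split results per file."""
    all_texts: List[str] = []
    for texts in pending.values():
        all_texts.extend(texts)

    # Single call. If it raises, nothing below runs and no state is changed.
    all_embeddings = embed_batch(all_texts)

    result: Dict[str, List[List[float]]] = {}
    cursor = 0
    for path, texts in pending.items():
        n = len(texts)
        result[path] = all_embeddings[cursor:cursor + n]
        cursor += n
    return result
```

Because the embedding call happens before any per-file write, a failure leaves the caller free to keep the previous index and retry later, which is the property the comments in `sync()` emphasize.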

@@ -0,0 +1,14 @@
"""
Backward-compatible shim for the legacy entry point:
python -m agent.memory.rebuild_index
The implementation now lives in agent.memory.embedding.rebuild.
Prefer using `/memory rebuild-index` in chat going forward.
"""
from agent.memory.embedding.rebuild import main
if __name__ == "__main__":
import sys
sys.exit(main())

197
agent/memory/service.py Normal file

@@ -0,0 +1,197 @@
"""
Memory service for handling memory query operations via cloud protocol.
Provides a unified interface for listing and reading memory files,
callable from the cloud client (LinkAI) or a future web console.
Memory file layout (under workspace_root):
MEMORY.md -> type: global
memory/2026-02-20.md -> type: daily
"""
import os
from datetime import datetime
from typing import Dict, List, Optional
from pathlib import Path
from common.log import logger
class MemoryService:
"""
High-level service for memory file queries.
Operates directly on the filesystem — no MemoryManager dependency.
"""
def __init__(self, workspace_root: str):
"""
:param workspace_root: Workspace root directory (e.g. ~/cow)
"""
self.workspace_root = workspace_root
self.memory_dir = os.path.join(workspace_root, "memory")
# ------------------------------------------------------------------
# list — paginated file metadata
# ------------------------------------------------------------------
def list_files(self, page: int = 1, page_size: int = 20, category: str = "memory") -> dict:
"""
List memory or dream files with metadata (without content).
Args:
category: ``"memory"`` (default) — MEMORY.md + daily files;
``"dream"`` — dream diary files from memory/dreams/
"""
if category == "dream":
files = self._list_dream_files()
else:
files = self._list_memory_files()
total = len(files)
start = (page - 1) * page_size
end = start + page_size
return {
"page": page,
"page_size": page_size,
"total": total,
"list": files[start:end],
}
def _list_memory_files(self) -> List[dict]:
"""MEMORY.md + memory/*.md (newest first)."""
files: List[dict] = []
global_path = os.path.join(self.workspace_root, "MEMORY.md")
if os.path.isfile(global_path):
files.append(self._file_info(global_path, "MEMORY.md", "global"))
if os.path.isdir(self.memory_dir):
daily_files = []
for name in os.listdir(self.memory_dir):
full = os.path.join(self.memory_dir, name)
if os.path.isfile(full) and name.endswith(".md"):
daily_files.append((name, full))
daily_files.sort(key=lambda x: x[0], reverse=True)
for name, full in daily_files:
files.append(self._file_info(full, name, "daily"))
return files
def _list_dream_files(self) -> List[dict]:
"""memory/dreams/*.md (newest first)."""
files: List[dict] = []
dreams_dir = os.path.join(self.memory_dir, "dreams")
if os.path.isdir(dreams_dir):
entries = []
for name in os.listdir(dreams_dir):
full = os.path.join(dreams_dir, name)
if os.path.isfile(full) and name.endswith(".md"):
entries.append((name, full))
entries.sort(key=lambda x: x[0], reverse=True)
for name, full in entries:
files.append(self._file_info(full, name, "dream"))
return files
# ------------------------------------------------------------------
# content — read a single file
# ------------------------------------------------------------------
def get_content(self, filename: str, category: str = "memory") -> dict:
"""
Read the full content of a memory or dream file.
:param filename: File name, e.g. ``MEMORY.md``, ``2026-02-20.md``
:param category: ``"memory"`` or ``"dream"``
:return: dict with ``filename`` and ``content``
:raises FileNotFoundError: if the file does not exist
"""
path = self._resolve_path(filename, category)
if not os.path.isfile(path):
raise FileNotFoundError(f"Memory file not found: {filename}")
with open(path, "r", encoding="utf-8") as f:
content = f.read()
return {
"filename": filename,
"content": content,
}
# ------------------------------------------------------------------
# dispatch — single entry point for protocol messages
# ------------------------------------------------------------------
def dispatch(self, action: str, payload: Optional[dict] = None) -> dict:
"""
Dispatch a memory management action.
:param action: ``list`` or ``content``
:param payload: action-specific payload (supports ``category``: ``"memory"`` | ``"dream"``)
:return: protocol-compatible response dict
"""
payload = payload or {}
try:
if action == "list":
page = payload.get("page", 1)
page_size = payload.get("page_size", 20)
category = payload.get("category", "memory")
result_payload = self.list_files(page=page, page_size=page_size, category=category)
return {"action": action, "code": 200, "message": "success", "payload": result_payload}
elif action == "content":
filename = payload.get("filename")
if not filename:
return {"action": action, "code": 400, "message": "filename is required", "payload": None}
category = payload.get("category", "memory")
result_payload = self.get_content(filename, category=category)
return {"action": action, "code": 200, "message": "success", "payload": result_payload}
else:
return {"action": action, "code": 400, "message": f"unknown action: {action}", "payload": None}
except ValueError:
return {"action": action, "code": 403, "message": "invalid filename", "payload": None}
except FileNotFoundError as e:
return {"action": action, "code": 404, "message": str(e), "payload": None}
except Exception as e:
logger.error(f"[MemoryService] dispatch error: action={action}, error={e}")
return {"action": action, "code": 500, "message": str(e), "payload": None}
# ------------------------------------------------------------------
# internal helpers
# ------------------------------------------------------------------
def _resolve_path(self, filename: str, category: str = "memory") -> str:
"""
Safely resolve a filename to its absolute path within the allowed directory.
- ``MEMORY.md`` → ``{workspace_root}/MEMORY.md``
- ``2026-02-20.md`` (memory) → ``{workspace_root}/memory/2026-02-20.md``
- ``2026-02-20.md`` (dream) → ``{workspace_root}/memory/dreams/2026-02-20.md``
Raises ValueError if the resolved path escapes the allowed directory.
"""
if filename == "MEMORY.md":
base_dir = self.workspace_root
elif category == "dream":
base_dir = os.path.join(self.memory_dir, "dreams")
else:
base_dir = self.memory_dir
resolved = os.path.realpath(os.path.join(base_dir, filename))
allowed = os.path.realpath(base_dir)
if resolved != allowed and not resolved.startswith(allowed + os.sep):
raise ValueError("Invalid filename: path traversal detected")
return resolved
@staticmethod
def _file_info(path: str, filename: str, file_type: str) -> dict:
"""Build a file metadata dict."""
stat = os.stat(path)
updated_at = datetime.fromtimestamp(stat.st_mtime).strftime("%Y-%m-%d %H:%M:%S")
return {
"filename": filename,
"type": file_type,
"size": stat.st_size,
"updated_at": updated_at,
}
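
The realpath-plus-prefix check used by `_resolve_path` above can be tried in isolation. The following is a minimal sketch, not the project's code: `resolve_within` and `is_allowed` are hypothetical free functions introduced here for illustration, and they assume the same rule as the method (the resolved path must equal the base directory or sit beneath it).

```python
import os
import tempfile


def resolve_within(base_dir: str, filename: str) -> str:
    """Hypothetical helper mirroring the check in _resolve_path."""
    resolved = os.path.realpath(os.path.join(base_dir, filename))
    allowed = os.path.realpath(base_dir)
    # Appending os.sep prevents a sibling such as "/ws/memoryx" from
    # passing a plain startswith("/ws/memory") comparison.
    if resolved != allowed and not resolved.startswith(allowed + os.sep):
        raise ValueError("Invalid filename: path traversal detected")
    return resolved


def is_allowed(base_dir: str, filename: str) -> bool:
    """Return True when filename resolves inside base_dir."""
    try:
        resolve_within(base_dir, filename)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        print(is_allowed(base, "2026-02-20.md"))
        print(is_allowed(base, "../outside.md"))
```

In `dispatch`, the `ValueError` raised by this kind of check is what gets mapped to the 403 response.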

1056
agent/memory/storage.py Normal file

File diff suppressed because it is too large

847
agent/memory/summarizer.py Normal file

@@ -0,0 +1,847 @@
"""
Memory flush manager with Deep Dream distillation
Handles memory persistence when conversation context is trimmed or overflows:
- Uses LLM to summarize discarded messages into concise daily records
- Writes to daily memory files (lazy creation)
- Deduplicates trim flushes to avoid repeated writes
- Runs summarization asynchronously to avoid blocking normal replies
- Deep Dream: periodically distills daily memories → refined MEMORY.md + dream diary
"""
import threading
from typing import Optional, Callable, Any, List, Dict
from pathlib import Path
from datetime import datetime
from common.log import logger
SUMMARIZE_SYSTEM_PROMPT_ZH = """你是一个对话记录助手。请将对话内容归纳为当天的日常记录。
## 要求
按「事件」维度归纳发生的事,不要按对话轮次逐条记录:
- 每条一行,用 "- " 开头
- 合并同一件事的多轮对话
- 只记录有意义的事件,忽略闲聊和问候
- 保留关键的决策、结论和待办事项
当对话没有任何记录价值(仅含问候或无意义内容),直接回复"无"。"""
SUMMARIZE_SYSTEM_PROMPT_EN = """You are a conversation-logging assistant. Summarize the conversation into a daily record.
## Requirements
Summarize by "event", not turn by turn:
- One item per line, starting with "- "
- Merge multiple turns about the same thing
- Only record meaningful events; ignore small talk and greetings
- Keep key decisions, conclusions and to-dos
If the conversation has no record value (only greetings or meaningless content), reply with exactly "None"."""
SUMMARIZE_USER_PROMPT_ZH = """请归纳以下对话的日常记录:
{conversation}"""
SUMMARIZE_USER_PROMPT_EN = """Summarize the daily record of the following conversation:
{conversation}"""
# ---------------------------------------------------------------------------
# Deep Dream prompts — distill daily memories → MEMORY.md + dream diary
# ---------------------------------------------------------------------------
DREAM_SYSTEM_PROMPT_ZH = """你是一个记忆整理助手,负责定期整理用户的长期记忆。
你将收到两份材料:
1. **当前长期记忆** — MEMORY.md 的全部现有内容
2. **今日日记** — 当天的日常记录
MEMORY.md 会注入每次对话的系统提示词中,因此必须保持精炼,只存放有价值和值得记忆的内容。
**重要:只能基于提供的材料进行整理,严禁编造、推测或添加材料中不存在的信息。**
## 任务
### Part 1: 更新后的长期记忆([MEMORY])
在现有记忆基础上进行整理和提炼,输出完整的更新后内容:
- **合并提炼**:将含义相近的多条合并为一条高密度表述,而非简单罗列
- **新增萃取**:从今日日记中提取值得永久记住的新信息(偏好、决策、人物、规则、经验)
- **冲突更新**:当新信息与旧条目矛盾时,以新信息为准,替换旧条目
- **清理无效**:删除临时性记录、空白条目、格式残留、无意义、重复内容等
- **删除冗余**:已被更精炼表述涵盖的旧条目应删除,避免信息重复
- 每条一行,用 "- " 开头,不带日期前缀
- 可用 "## 标题" 对相关条目分组,使结构更清晰
- 目标:控制在 50 条以内,每条尽量一句话概括
### Part 2: 梦境日记([DREAM])
用简洁的叙事风格写一篇短日记,记录这次整理的发现,保持格式美观易读:
- 发现了哪些重复或矛盾
- 从日记中提取了什么新洞察
- 做了哪些清理和优化
- 整体感受和观察
## 输出格式(严格遵守)
```
[MEMORY]
- 记忆条目1
- 记忆条目2
...
[DREAM]
梦境日记内容...
```"""
DREAM_SYSTEM_PROMPT_EN = """You are a memory-curation assistant that periodically organizes the user's long-term memory.
You will receive two inputs:
1. **Current long-term memory** — the full existing content of MEMORY.md
2. **Today's diary** — the daily records
MEMORY.md is injected into the system prompt of every conversation, so it must stay concise and hold only valuable, memory-worthy content.
**Important: organize strictly based on the provided material. Never fabricate, infer, or add information not present in it.**
## Tasks
### Part 1: Updated long-term memory ([MEMORY])
Organize and distill on top of the existing memory, and output the complete updated content:
- **Merge & distill**: combine semantically similar items into one dense statement rather than listing them
- **Extract new**: pull memory-worthy new info from today's diary (preferences, decisions, people, rules, lessons)
- **Resolve conflicts**: when new info contradicts an old item, prefer the new and replace the old
- **Clean invalid**: remove temporary notes, blank items, formatting residue, meaningless or duplicate content
- **Drop redundancy**: delete old items already covered by a more concise statement
- One item per line, starting with "- ", without a date prefix
- You may group related items under "## headings" for clarity
- Goal: keep under 50 items, each ideally a single sentence
### Part 2: Dream diary ([DREAM])
Write a short diary in a concise narrative style recording what this curation found, keep it clean and readable:
- Which duplicates or conflicts were found
- What new insights were extracted from the diary
- What cleanup and optimization was done
- Overall feelings and observations
## Output format (follow strictly)
```
[MEMORY]
- memory item 1
- memory item 2
...
[DREAM]
dream diary content...
```"""
DREAM_USER_PROMPT_ZH = """## 当前长期记忆(MEMORY.md)
{memory_content}
## 近期日记(最近 {days} 天)
{daily_content}"""
DREAM_USER_PROMPT_EN = """## Current long-term memory (MEMORY.md)
{memory_content}
## Recent diary (last {days} days)
{daily_content}"""
def _is_en() -> bool:
"""True when the resolved UI language is English."""
try:
from common import i18n
return i18n.get_language() == "en"
except Exception:
return False
def _summarize_system_prompt() -> str:
return SUMMARIZE_SYSTEM_PROMPT_EN if _is_en() else SUMMARIZE_SYSTEM_PROMPT_ZH
def _summarize_user_prompt() -> str:
return SUMMARIZE_USER_PROMPT_EN if _is_en() else SUMMARIZE_USER_PROMPT_ZH
def _dream_system_prompt() -> str:
return DREAM_SYSTEM_PROMPT_EN if _is_en() else DREAM_SYSTEM_PROMPT_ZH
def _dream_user_prompt() -> str:
return DREAM_USER_PROMPT_EN if _is_en() else DREAM_USER_PROMPT_ZH
def _is_empty_sentinel(text: str) -> bool:
"""Match the "no record value" sentinel in both zh ("无") and en ("None")."""
if not text:
return True
s = text.strip()
return s in ("", "无", "无。") or s.lower() == "none"
class MemoryFlushManager:
"""
Manages memory flush operations.
Flush is triggered by agent_stream in two scenarios:
1. Context trim: _trim_messages discards old turns → flush discarded content
2. Context overflow: API rejects request → emergency flush before clearing
Additionally, create_daily_summary() can be called by scheduler for end-of-day summaries.
"""
def __init__(
self,
workspace_dir: Path,
llm_model: Optional[Any] = None,
):
self.workspace_dir = workspace_dir
self.llm_model = llm_model
self.memory_dir = workspace_dir / "memory"
self.memory_dir.mkdir(parents=True, exist_ok=True)
self.last_flush_timestamp: Optional[datetime] = None
self._trim_flushed_hashes: set = set() # Content hashes of already-flushed messages
self._last_flushed_content_hash: str = "" # Content hash at last flush, for daily dedup
self._last_dream_input_hash: str = "" # "{date}:{daily_hash}" of last dream, for dedup
self._last_flush_thread: Optional[threading.Thread] = None
def get_today_memory_file(self, user_id: Optional[str] = None, ensure_exists: bool = False) -> Path:
"""Get today's memory file path: memory/YYYY-MM-DD.md"""
today = datetime.now().strftime("%Y-%m-%d")
if user_id:
user_dir = self.memory_dir / "users" / user_id
if ensure_exists:
user_dir.mkdir(parents=True, exist_ok=True)
today_file = user_dir / f"{today}.md"
else:
today_file = self.memory_dir / f"{today}.md"
if ensure_exists and not today_file.exists():
today_file.parent.mkdir(parents=True, exist_ok=True)
today_file.write_text(f"# Daily Memory: {today}\n\n", encoding="utf-8")
return today_file
def get_main_memory_file(self, user_id: Optional[str] = None) -> Path:
"""Get main memory file path: MEMORY.md in the workspace root, or memory/users/<user_id>/MEMORY.md when user_id is given."""
if user_id:
user_dir = self.memory_dir / "users" / user_id
user_dir.mkdir(parents=True, exist_ok=True)
return user_dir / "MEMORY.md"
else:
return Path(self.workspace_dir) / "MEMORY.md"
def get_status(self) -> dict:
return {
'last_flush_time': self.last_flush_timestamp.isoformat() if self.last_flush_timestamp else None,
'today_file': str(self.get_today_memory_file()),
'main_file': str(self.get_main_memory_file())
}
# ---- Flush execution (called by agent_stream or scheduler) ----
def flush_from_messages(
self,
messages: List[Dict],
user_id: Optional[str] = None,
reason: str = "trim",
max_messages: int = 0,
context_summary_callback: Optional[Callable[[str], None]] = None,
) -> bool:
"""
Asynchronously summarize and flush messages to daily memory.
Deduplication runs synchronously, then LLM summarization + file write
run in a background thread so the main reply flow is never blocked.
If *context_summary_callback* is provided, it is called with the
cleaned daily summary text once available (legacy [DAILY]/[MEMORY]
markers are stripped first). The caller can use this to inject the
summary into the live message list for context continuity, so one LLM
call serves both disk persistence and in-context injection.
Returns True if a background flush was dispatched, False otherwise.
"""
try:
# Strip scheduler-injected pairs before any further processing.
# These messages already serve as short-term context inside the
# receiver session; promoting them into long-term daily memory
# produces low-value flat logs (e.g. "11:28 price=1013, normal /
# 11:58 price=1013, normal / ...") and wastes summarisation tokens.
messages = self._strip_scheduler_pairs(messages)
if not messages:
return False
import hashlib
deduped = []
for m in messages:
text = self._extract_text_from_content(m.get("content", ""))
if not text or not text.strip():
continue
h = hashlib.md5(text.encode("utf-8")).hexdigest()
if h not in self._trim_flushed_hashes:
self._trim_flushed_hashes.add(h)
deduped.append(m)
if not deduped:
return False
import copy
snapshot = copy.deepcopy(deduped)
thread = threading.Thread(
target=self._flush_worker,
args=(snapshot, user_id, reason, max_messages, context_summary_callback),
daemon=True,
)
thread.start()
logger.info(f"[MemoryFlush] Async flush dispatched (reason={reason}, msgs={len(snapshot)})")
self._last_flush_thread = thread
return True
except Exception as e:
logger.warning(f"[MemoryFlush] Failed to dispatch flush (reason={reason}): {e}")
return False
def _flush_worker(
self,
messages: List[Dict],
user_id: Optional[str],
reason: str,
max_messages: int,
context_summary_callback: Optional[Callable[[str], None]] = None,
):
"""Background worker: summarize with LLM, write daily memory file."""
try:
raw_summary = self._summarize_messages(messages, max_messages)
if _is_empty_sentinel(raw_summary):
logger.info(f"[MemoryFlush] No valuable content to flush (reason={reason})")
return
# Strip legacy [DAILY]/[MEMORY] markers if model still outputs them
daily_part = self._clean_summary_output(raw_summary)
if not daily_part:
return
# --- Write daily memory ---
daily_file = ensure_daily_memory_file(self.workspace_dir, user_id)
headers = {
"overflow": f"## Context Overflow Recovery ({datetime.now().strftime('%H:%M')})",
"trim": f"## Trimmed Context ({datetime.now().strftime('%H:%M')})",
"daily_summary": f"## Daily Summary ({datetime.now().strftime('%H:%M')})",
}
header = headers.get(reason, f"## Session Notes ({datetime.now().strftime('%H:%M')})")
with open(daily_file, "a", encoding="utf-8") as f:
f.write(f"\n{header}\n\n{daily_part}\n")
logger.info(f"[MemoryFlush] Wrote daily memory to {daily_file.name} (reason={reason}, chars={len(daily_part)})")
# --- Inject context summary into live messages (if callback provided) ---
if context_summary_callback:
try:
context_summary_callback(daily_part)
except Exception as e:
logger.warning(f"[MemoryFlush] Context summary callback failed: {e}")
self.last_flush_timestamp = datetime.now()
except Exception as e:
logger.warning(f"[MemoryFlush] Async flush failed (reason={reason}): {e}")
@staticmethod
def _clean_summary_output(raw: str) -> str:
"""Strip legacy [DAILY]/[MEMORY] markers if present, return clean daily text."""
raw = raw.strip()
if _is_empty_sentinel(raw):
return ""
# Strip [DAILY] marker
if "[DAILY]" in raw:
start = raw.index("[DAILY]") + len("[DAILY]")
end = raw.index("[MEMORY]") if "[MEMORY]" in raw else len(raw)
raw = raw[start:end].strip()
# Remove stray [MEMORY] section entirely
if "[MEMORY]" in raw:
raw = raw[:raw.index("[MEMORY]")].strip()
# Remove markdown code fences
raw = raw.replace("```", "").strip()
return raw
def create_daily_summary(
self,
messages: List[Dict],
user_id: Optional[str] = None
) -> bool:
"""
Generate end-of-day summary. Called by daily timer.
Skips if messages haven't changed since last flush.
"""
import hashlib
content = "".join(
self._extract_text_from_content(m.get("content", ""))
for m in messages
)
content_hash = hashlib.md5(content.encode("utf-8")).hexdigest()
if content_hash == self._last_flushed_content_hash:
logger.debug("[MemoryFlush] Daily summary skipped: no new content since last flush")
return False
self._last_flushed_content_hash = content_hash
return self.flush_from_messages(
messages=messages,
user_id=user_id,
reason="daily_summary",
max_messages=0,
)
# ---- Deep Dream (memory distillation) ----
def deep_dream(self, user_id: Optional[str] = None, lookback_days: int = 1, force: bool = False) -> bool:
"""
Distill recent daily memories into MEMORY.md and generate a dream diary.
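Args:
user_id: Optional user ID; when set, that user's memory files are read and written
lookback_days: How many days of daily files to read (default 1 for scheduled, 3 for manual)
force: Skip input-hash dedup check (used by manual /memory dream trigger)
Returns:
True if MEMORY.md was rewritten, False if the run was skipped or failed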
"""
if not self.llm_model:
logger.warning("[DeepDream] No LLM model available, skipping")
return False
logger.info(f"[DeepDream] Starting memory distillation (lookback={lookback_days} days)")
# Collect materials
memory_content = self._read_main_memory(user_id)
daily_content, has_content = self._read_recent_dailies(user_id, lookback_days)
if not has_content:
logger.info("[DeepDream] No recent daily records, skipping to preserve existing MEMORY.md")
return False
# Dedup: skip if same daily content already dreamed today.
# Note: only hash daily_content (not memory_content), because deep_dream
# itself rewrites MEMORY.md as a side effect, which would otherwise
# invalidate the hash on every subsequent call within the same window.
import hashlib
daily_hash = hashlib.md5(daily_content.encode("utf-8")).hexdigest()
today_str = datetime.now().strftime("%Y-%m-%d")
dedup_key = f"{today_str}:{daily_hash}"
if not force and dedup_key == self._last_dream_input_hash:
logger.info("[DeepDream] Already dreamed today with same daily content, skipping")
return False
self._last_dream_input_hash = dedup_key
logger.info(
f"[DeepDream] Materials collected: "
f"MEMORY.md={len(memory_content)} chars, "
f"daily={len(daily_content)} chars"
)
# Call LLM for distillation
import time as _time
t0 = _time.monotonic()
try:
user_msg = _dream_user_prompt().format(
memory_content=memory_content or "(empty)",
days=lookback_days,
daily_content=daily_content or "(no recent daily records)",
)
from agent.protocol.models import LLMRequest
# Scale max_tokens based on input size to avoid truncating large MEMORY.md
input_chars = len(memory_content) + len(daily_content)
dream_max_tokens = max(2000, min(input_chars, 8000))
request = LLMRequest(
messages=[{"role": "user", "content": user_msg}],
temperature=0.3,
max_tokens=dream_max_tokens,
stream=False,
system=_dream_system_prompt(),
)
response = self.llm_model.call(request)
raw = self._extract_response_text(response)
elapsed = _time.monotonic() - t0
if not raw or not raw.strip():
logger.warning(f"[DeepDream] LLM returned empty response ({elapsed:.1f}s)")
return False
logger.info(f"[DeepDream] LLM distillation completed ({elapsed:.1f}s, {len(raw)} chars)")
except Exception as e:
elapsed = _time.monotonic() - t0
logger.warning(f"[DeepDream] LLM call failed ({elapsed:.1f}s): {e}")
return False
# Parse [MEMORY] and [DREAM] sections
new_memory, dream_diary = self._parse_dream_output(raw)
if not new_memory:
logger.warning("[DeepDream] No [MEMORY] section in LLM output, skipping overwrite")
return False
# Overwrite MEMORY.md
try:
main_file = self.get_main_memory_file(user_id)
old_size = len(memory_content)
main_file.write_text(new_memory + "\n", encoding="utf-8")
logger.info(
f"[DeepDream] Updated MEMORY.md "
f"({old_size} → {len(new_memory)} chars)"
)
except Exception as e:
logger.warning(f"[DeepDream] Failed to write MEMORY.md: {e}")
return False
# Write dream diary
if dream_diary:
try:
self._write_dream_diary(dream_diary, user_id)
except Exception as e:
logger.warning(f"[DeepDream] Failed to write dream diary: {e}")
logger.info("[DeepDream] ✅ Deep Dream completed successfully")
return True
def _read_main_memory(self, user_id: Optional[str] = None) -> str:
"""Read current MEMORY.md content."""
main_file = self.get_main_memory_file(user_id)
if main_file.exists():
return main_file.read_text(encoding="utf-8").strip()
return ""
def _read_recent_dailies(
self, user_id: Optional[str] = None, lookback_days: int = 1
) -> tuple:
"""
Read recent daily memory files.
Returns:
(combined_text, has_content) tuple
"""
from datetime import timedelta
parts = []
has_content = False
today = datetime.now().date()
for offset in range(lookback_days):
day = today - timedelta(days=offset)
date_str = day.strftime("%Y-%m-%d")
if user_id:
daily_file = self.memory_dir / "users" / user_id / f"{date_str}.md"
else:
daily_file = self.memory_dir / f"{date_str}.md"
if daily_file.exists():
content = daily_file.read_text(encoding="utf-8").strip()
if content:
parts.append(f"### {date_str}\n\n{content}")
has_content = True
else:
parts.append(f"### {date_str}\n\n(no records)")
return "\n\n".join(parts), has_content
@staticmethod
def _parse_dream_output(raw: str) -> tuple:
"""Parse LLM output into (new_memory, dream_diary)."""
raw = raw.strip().replace("```", "")
new_memory = ""
dream_diary = ""
if "[MEMORY]" in raw:
start = raw.index("[MEMORY]") + len("[MEMORY]")
end = raw.index("[DREAM]") if "[DREAM]" in raw else len(raw)
new_memory = raw[start:end].strip()
if "[DREAM]" in raw:
start = raw.index("[DREAM]") + len("[DREAM]")
dream_diary = raw[start:].strip()
return new_memory, dream_diary
def _write_dream_diary(self, content: str, user_id: Optional[str] = None):
"""Write dream diary to memory/dreams/YYYY-MM-DD.md."""
dreams_dir = self.memory_dir / "dreams"
if user_id:
dreams_dir = self.memory_dir / "users" / user_id / "dreams"
dreams_dir.mkdir(parents=True, exist_ok=True)
today = datetime.now().strftime("%Y-%m-%d")
diary_file = dreams_dir / f"{today}.md"
diary_file.write_text(
f"# Dream Diary: {today}\n\n{content}\n",
encoding="utf-8",
)
logger.info(f"[DeepDream] Wrote dream diary to {diary_file}")
# ---- Internal helpers ----
def _summarize_messages(self, messages: List[Dict], max_messages: int = 0) -> str:
"""
Summarize conversation messages using LLM.
Returns empty string if LLM deems content not worth recording.
The rule-based fallback is used only when no LLM model is configured
or the LLM call raises an exception.
"""
conversation_text = self._format_conversation_for_summary(messages, max_messages)
if not conversation_text.strip():
return ""
if self.llm_model:
try:
summary = self._call_llm_for_summary(conversation_text)
if not _is_empty_sentinel(summary):
return summary.strip()
logger.info("[MemoryFlush] LLM returned empty sentinel, skipping write")
return ""
except Exception as e:
logger.warning(f"[MemoryFlush] LLM summarization failed, using fallback: {e}")
return self._extract_summary_fallback(messages, max_messages)
else:
logger.info("[MemoryFlush] No LLM model available, using rule-based fallback")
return self._extract_summary_fallback(messages, max_messages)
def _format_conversation_for_summary(self, messages: List[Dict], max_messages: int = 0) -> str:
"""Format messages into readable conversation text for LLM summarization."""
msgs = messages if max_messages == 0 else messages[-max_messages * 2:]
lines = []
for msg in msgs:
role = msg.get("role", "")
text = self._extract_text_from_content(msg.get("content", ""))
if not text or not text.strip():
continue
text = text.strip()
if role == "user":
lines.append(f"用户: {text[:500]}")
elif role == "assistant":
lines.append(f"助手: {text[:500]}")
return "\n".join(lines)
@staticmethod
def _extract_response_text(response) -> str:
"""
Extract text from LLM response regardless of format.
Handles:
- Generator (MiniMax _handle_sync_response yields Claude-format dicts)
- Claude format: {"role":"assistant","content":[{"type":"text","text":"..."}]}
- OpenAI format: {"choices":[{"message":{"content":"..."}}]}
- OpenAI SDK response object with .choices attribute
"""
import types
# Unwrap generator — consume first yielded item
if isinstance(response, types.GeneratorType):
try:
response = next(response)
except StopIteration:
return ""
if not response:
return ""
if isinstance(response, dict):
# Check for error
if response.get("error"):
raise RuntimeError(response.get("message", "LLM call failed"))
# Claude format: content is a list of blocks
content = response.get("content")
if isinstance(content, list):
for block in content:
if isinstance(block, dict) and block.get("type") == "text":
return block.get("text", "")
# OpenAI format
choices = response.get("choices", [])
if choices:
return choices[0].get("message", {}).get("content", "")
# OpenAI SDK response object
if hasattr(response, "choices") and response.choices:
return response.choices[0].message.content or ""
return ""
def _call_llm_for_summary(self, conversation_text: str) -> str:
"""Call LLM to generate a concise summary of the conversation."""
from agent.protocol.models import LLMRequest
request = LLMRequest(
messages=[{"role": "user", "content": _summarize_user_prompt().format(conversation=conversation_text)}],
temperature=0,
max_tokens=500,
stream=False,
system=_summarize_system_prompt(),
)
response = self.llm_model.call(request)
return self._extract_response_text(response)
@staticmethod
def _extract_first_meaningful_line(text: str, max_len: int = 120) -> str:
"""Extract the first meaningful line from assistant reply, skipping markdown noise."""
import re
for line in text.split("\n"):
line = line.strip()
if not line:
continue
# Skip markdown headings, horizontal rules, code fences, pure emoji/symbols
if re.match(r'^(#{1,4}\s|```|---|\*\*\*|[-*]\s*$|[^\w\u4e00-\u9fff]{1,5}$)', line):
continue
# Strip leading markdown bold/emoji decorations
cleaned = re.sub(r'^[\*#>\-\s]+', '', line).strip()
cleaned = re.sub(r'^[\U0001f300-\U0001f9ff\u2600-\u27bf\s]+', '', cleaned).strip()
if len(cleaned) >= 5:
return cleaned[:max_len]
return text.split("\n")[0].strip()[:max_len]
@staticmethod
def _extract_summary_fallback(messages: List[Dict], max_messages: int = 0) -> str:
"""
Rule-based summary of discarded messages.
Format: one "- 用户: X → 回复: Y" line per user/assistant exchange, capped at 10 events.
"""
msgs = messages if max_messages == 0 else messages[-max_messages * 2:]
events: List[str] = []
current_user_text = ""
for msg in msgs:
role = msg.get("role", "")
text = MemoryFlushManager._extract_text_from_content(msg.get("content", ""))
if not text or not text.strip():
continue
text = text.strip()
if role == "user":
if len(text) <= 3:
continue
current_user_text = text[:120]
elif role == "assistant" and current_user_text:
reply_summary = MemoryFlushManager._extract_first_meaningful_line(text)
if reply_summary:
events.append(f"- 用户: {current_user_text} → 回复: {reply_summary}")
else:
events.append(f"- 用户: {current_user_text}")
current_user_text = ""
if current_user_text:
events.append(f"- 用户: {current_user_text}")
return "\n".join(events[:10])
@staticmethod
def _extract_text_from_content(content) -> str:
"""Extract plain text from message content (string or content blocks)."""
if isinstance(content, str):
return content
if isinstance(content, list):
parts = []
for block in content:
if isinstance(block, dict) and block.get("type") == "text":
parts.append(block.get("text", ""))
elif isinstance(block, str):
parts.append(block)
return "\n".join(parts)
return ""
@classmethod
def _strip_scheduler_pairs(cls, messages: List[Dict]) -> List[Dict]:
"""Drop scheduler-injected user/assistant pairs from a flush batch.
A scheduler user message starts with the ``[SCHEDULED]`` marker
(written by ``AgentBridge.remember_scheduled_output``); the message
immediately following it (if it is an assistant turn) is its paired
output and is dropped together. Regular user/assistant turns and
any tool_use / tool_result blocks are preserved as-is.
"""
if not messages:
return messages
SCHEDULED_PREFIX = "[SCHEDULED]"
result = []
skip_next_assistant = False
for msg in messages:
if not isinstance(msg, dict):
result.append(msg)
skip_next_assistant = False
continue
role = msg.get("role")
if skip_next_assistant and role == "assistant":
skip_next_assistant = False
continue
skip_next_assistant = False
if role == "user":
text = cls._extract_text_from_content(msg.get("content", ""))
if text.lstrip().startswith(SCHEDULED_PREFIX):
skip_next_assistant = True
continue
result.append(msg)
return result
def create_memory_files_if_needed(workspace_dir: Path, user_id: Optional[str] = None):
"""
Create essential memory files if they don't exist.
Only creates MEMORY.md; daily files are created lazily on first write.
Args:
workspace_dir: Workspace directory
user_id: Optional user ID for user-specific files
"""
memory_dir = workspace_dir / "memory"
memory_dir.mkdir(parents=True, exist_ok=True)
# Create main MEMORY.md in workspace root (always needed for bootstrap)
if user_id:
user_dir = memory_dir / "users" / user_id
user_dir.mkdir(parents=True, exist_ok=True)
main_memory = user_dir / "MEMORY.md"
else:
main_memory = Path(workspace_dir) / "MEMORY.md"
if not main_memory.exists():
main_memory.write_text("", encoding="utf-8")
def ensure_daily_memory_file(workspace_dir: Path, user_id: Optional[str] = None) -> Path:
"""
Ensure today's daily memory file exists, creating it only when actually needed.
Called lazily before first write to daily memory.
Args:
workspace_dir: Workspace directory
user_id: Optional user ID for user-specific files
Returns:
Path to today's memory file
"""
memory_dir = workspace_dir / "memory"
memory_dir.mkdir(parents=True, exist_ok=True)
today = datetime.now().strftime("%Y-%m-%d")
if user_id:
user_dir = memory_dir / "users" / user_id
user_dir.mkdir(parents=True, exist_ok=True)
today_memory = user_dir / f"{today}.md"
else:
today_memory = memory_dir / f"{today}.md"
if not today_memory.exists():
today_memory.write_text(
f"# Daily Memory: {today}\n\n",
encoding="utf-8",
)
return today_memory
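
The `[MEMORY]` / `[DREAM]` split done by `_parse_dream_output` in the file above can be checked without an LLM. This is a minimal sketch under stated assumptions: `split_dream_output` is a hypothetical standalone copy of that logic, and the sample text is invented for illustration rather than taken from a real model response.

```python
from typing import Tuple

FENCE = "`" * 3  # markdown code fence, built this way to keep the sketch readable


def split_dream_output(raw: str) -> Tuple[str, str]:
    """Hypothetical standalone version of the [MEMORY]/[DREAM] parser."""
    raw = raw.strip().replace(FENCE, "")
    new_memory = ""
    dream_diary = ""
    if "[MEMORY]" in raw:
        start = raw.index("[MEMORY]") + len("[MEMORY]")
        # The memory section ends where the dream section starts, if any.
        end = raw.index("[DREAM]") if "[DREAM]" in raw else len(raw)
        new_memory = raw[start:end].strip()
    if "[DREAM]" in raw:
        start = raw.index("[DREAM]") + len("[DREAM]")
        dream_diary = raw[start:].strip()
    return new_memory, dream_diary


if __name__ == "__main__":
    sample = f"{FENCE}\n[MEMORY]\n- prefers tea\n[DREAM]\nMerged two duplicates.\n{FENCE}"
    memory, diary = split_dream_output(sample)
    print(memory)
    print(diary)
```

An empty first element is the case `deep_dream` treats as "no `[MEMORY]` section", which is why it refuses to overwrite MEMORY.md in that situation.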

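
The hash-based deduplication that `flush_from_messages` runs before dispatching the worker thread can be sketched as follows. The function name `dedup_messages` is hypothetical, and the sketch assumes plain-string message content only, whereas the original also flattens content-block lists through `_extract_text_from_content`.

```python
import hashlib
from typing import Dict, List, Set


def dedup_messages(messages: List[Dict], seen: Set[str]) -> List[Dict]:
    """Return messages whose text has not been flushed before.

    `seen` holds MD5 digests of already-flushed texts and is updated in place.
    """
    fresh = []
    for m in messages:
        text = m.get("content", "")
        # Assumption: only string content is handled in this sketch.
        if not isinstance(text, str) or not text.strip():
            continue
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            fresh.append(m)
    return fresh


if __name__ == "__main__":
    seen: Set[str] = set()
    batch = [
        {"role": "user", "content": "book a table for Friday"},
        {"role": "assistant", "content": "Done, 7pm at the usual place."},
    ]
    print(len(dedup_messages(batch, seen)))
    print(len(dedup_messages(batch, seen)))
```

Because the digest set lives on the manager instance, a message that was trimmed once is not summarized again when a later trim includes it.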
13
agent/prompt/__init__.py Normal file

@@ -0,0 +1,13 @@
"""
Agent Prompt Module - system prompt construction module
"""
from .builder import PromptBuilder, build_agent_system_prompt
from .workspace import ensure_workspace, load_context_files
__all__ = [
'PromptBuilder',
'build_agent_system_prompt',
'ensure_workspace',
'load_context_files',
]

760
agent/prompt/builder.py Normal file

@@ -0,0 +1,760 @@
"""
System Prompt Builder
Builds the system prompt modularly, with support for tools, skills, memory and other subsystems.
"""
from __future__ import annotations
import os
from typing import List, Dict, Optional, Any
from dataclasses import dataclass
from common.log import logger
from config import conf
@dataclass
class ContextFile:
"""A context file (path + content)."""
path: str
content: str
class PromptBuilder:
"""System prompt builder."""
def __init__(self, workspace_dir: str, language: str = "zh"):
"""
Initialize the prompt builder.
Args:
workspace_dir: workspace directory
language: language ("zh" or "en")
"""
self.workspace_dir = workspace_dir
self.language = language
def build(
self,
base_persona: Optional[str] = None,
user_identity: Optional[Dict[str, str]] = None,
tools: Optional[List[Any]] = None,
context_files: Optional[List[ContextFile]] = None,
skill_manager: Any = None,
memory_manager: Any = None,
runtime_info: Optional[Dict[str, Any]] = None,
**kwargs
) -> str:
"""
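Build the complete system prompt.
Args:
base_persona: base persona description (overridden by AGENT.md in context_files)
user_identity: user identity info
tools: tool list
context_files: context file list (AGENT.md, USER.md, RULE.md, BOOTSTRAP.md, etc.)
skill_manager: skill manager
memory_manager: memory manager
runtime_info: runtime info
**kwargs: extra args
Returns:
The complete system prompt.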
"""
return build_agent_system_prompt(
workspace_dir=self.workspace_dir,
language=self.language,
base_persona=base_persona,
user_identity=user_identity,
tools=tools,
context_files=context_files,
skill_manager=skill_manager,
memory_manager=memory_manager,
runtime_info=runtime_info,
**kwargs
)
def build_agent_system_prompt(
workspace_dir: str,
language: str = "zh",
base_persona: Optional[str] = None,
user_identity: Optional[Dict[str, str]] = None,
tools: Optional[List[Any]] = None,
context_files: Optional[List[ContextFile]] = None,
skill_manager: Any = None,
memory_manager: Any = None,
runtime_info: Optional[Dict[str, Any]] = None,
**kwargs
) -> str:
"""
Build the agent system prompt.
Section order (by importance and logical flow):
1. Tooling - core capabilities, introduced first
2. Skills - right after tools, since skills are read via the read tool
3. Memory - memory recall and writing guidance
3.5 Knowledge - structured knowledge base (injects knowledge/index.md)
4. Workspace - working environment description
5. User identity - user info (optional)
6. Project context - AGENT.md, USER.md, RULE.md, MEMORY.md, BOOTSTRAP.md
7. Runtime info - meta info (time, model, etc.)
8. Response language - reply-language rule, always appended last
Args:
workspace_dir: workspace directory
language: language ("zh" or "en")
base_persona: base persona description (deprecated, defined by AGENT.md)
user_identity: user identity info
tools: tool list
context_files: context file list
skill_manager: skill manager
memory_manager: memory manager
runtime_info: runtime info
**kwargs: extra args
Returns:
The full system prompt.
"""
sections = []
# 1. Tooling (most important, goes first)
if tools:
sections.extend(_build_tooling_section(tools, language))
# 2. Skills (right after tools, since they need the read tool)
if skill_manager:
sections.extend(_build_skills_section(skill_manager, tools, language))
# 3. Memory (standalone memory capability)
if memory_manager:
sections.extend(_build_memory_section(memory_manager, tools, language))
# 3.5 Knowledge (structured knowledge base)
if conf().get("knowledge", True):
sections.extend(_build_knowledge_section(workspace_dir, language))
# 4. Workspace (working environment description)
sections.extend(_build_workspace_section(workspace_dir, language))
# 5. User identity (if present)
if user_identity:
sections.extend(_build_user_identity_section(user_identity, language))
# 6. Project context files (AGENT.md, USER.md, RULE.md - define the persona)
if context_files:
sections.extend(_build_context_files_section(context_files, language))
# 7. Runtime info (meta info, goes last)
if runtime_info:
sections.extend(_build_runtime_section(runtime_info, language))
# 8. Response language (always appended, independent of the skeleton language)
sections.extend(_build_response_language_section(language))
return "\n".join(sections)
def _build_response_language_section(language: str) -> List[str]:
"""Response-language rule, appended regardless of the prompt skeleton language.
Keeps the agent's reply language aligned with the user's input by default,
so a Chinese-built prompt still answers an English user in English.
"""
if language == "en":
return [
"## 🌐 Response language",
"",
"By default, reply in the same language as the user's input, "
"unless the user explicitly asks for another language.",
"",
]
return [
"## 🌐 回复语言",
"",
"默认使用与用户输入相同的语言回复,除非用户明确要求使用其他语言。",
"",
]
def _build_identity_section(base_persona: Optional[str], language: str) -> List[str]:
"""Base identity section - no longer needed, identity is defined by AGENT.md."""
# Identity is fully defined by AGENT.md, so emit nothing here.
return []
def _build_tooling_section(tools: List[Any], language: str) -> List[str]:
"""Build tooling section with concise tool list and call style guide."""
is_en = language == "en"
# One-line summaries for known tools (details are in the tool schema)
if is_en:
core_summaries = {
"read": "read file content",
"write": "create or overwrite a file",
"edit": "make precise edits to a file",
"ls": "list directory contents",
"grep": "search file contents",
"find": "find files by pattern",
"bash": "run shell commands",
"terminal": "manage background processes",
"web_search": "web search",
"web_fetch": "fetch URL content",
"browser": "control the browser (screenshot key results or send to the user when help is needed)",
"memory_search": "search memory",
"memory_get": "read memory content",
"env_config": "manage API keys and skill config",
"scheduler": "manage scheduled tasks and reminders",
"send": "send a local file to the user (local files only; put URLs directly in the reply text)",
"vision": "analyze images (recognition, description, OCR, etc.)",
}
else:
core_summaries = {
"read": "读取文件内容",
"write": "创建或覆盖文件",
"edit": "精确编辑文件",
"ls": "列出目录内容",
"grep": "搜索文件内容",
"find": "按模式查找文件",
"bash": "执行shell命令",
"terminal": "管理后台进程",
"web_search": "网络搜索",
"web_fetch": "获取URL内容",
"browser": "控制浏览器(关键结果或需要协助可截图发送给用户)",
"memory_search": "搜索记忆",
"memory_get": "读取记忆内容",
"env_config": "管理API密钥和技能配置",
"scheduler": "管理定时任务和提醒",
"send": "发送本地文件给用户(仅限本地文件,URL直接放在回复文本中)",
"vision": "分析图片内容(识别、描述、OCR文字提取等)",
}
# Preferred display order
tool_order = [
"read", "write", "edit", "ls", "grep", "find",
"bash", "terminal",
"web_search", "web_fetch", "browser",
"memory_search", "memory_get",
"env_config", "scheduler", "send", "vision",
]
# Build name -> summary mapping for available tools
available = {}
for tool in tools:
name = tool.name if hasattr(tool, 'name') else str(tool)
available[name] = core_summaries.get(name, "")
# Generate tool lines: ordered tools first, then extras
tool_lines = []
for name in tool_order:
if name in available:
summary = available.pop(name)
tool_lines.append(f"- {name}: {summary}" if summary else f"- {name}")
for name in sorted(available):
summary = available[name]
tool_lines.append(f"- {name}: {summary}" if summary else f"- {name}")
if is_en:
lines = [
"## 🔧 Tooling",
"",
"Available tools (names are case-sensitive, call exactly as listed):",
"\n".join(tool_lines),
"",
"Tool-calling style:",
"",
"- For multi-step tasks, complex decisions or sensitive operations, briefly explain what you are doing and why, so the user follows key progress",
"- Keep going until the task is done, then report the result to the user",
"- Always redact secrets, tokens and other sensitive info in replies",
"- Put URLs directly in the reply text; the system handles and renders them. Don't download and re-send them via the send tool",
"",
]
else:
lines = [
"## 🔧 工具系统",
"",
"可用工具(名称大小写敏感,严格按列表调用):",
"\n".join(tool_lines),
"",
"工具调用风格:",
"",
"- 多步骤任务、复杂决策、敏感操作时,应简要说明当前在做什么、为什么这样做,让用户了解关键进展",
"- 持续推进直到任务完成,完成后向用户报告结果",
"- 回复中涉及密钥、令牌等敏感信息必须脱敏",
"- URL链接直接放在回复文本中即可,系统会自动处理和渲染。无需下载后使用send工具发送",
"",
]
return lines
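The ordering rule used above (preferred tools first, then any remaining tools alphabetically) can be illustrated in isolation. This is a minimal sketch, not the project's code: `PREFERRED_ORDER`, `SUMMARIES` and `format_tool_lines` are hypothetical, shortened stand-ins for `tool_order`, `core_summaries` and the loop in `_build_tooling_section`.

```python
from typing import Dict, List

# Hypothetical, shortened stand-ins for the real tables in the builder.
PREFERRED_ORDER = ["read", "write", "bash", "web_search"]
SUMMARIES = {"read": "read file content", "bash": "run shell commands"}


def format_tool_lines(tool_names: List[str]) -> List[str]:
    """Return one bullet per tool: preferred order first, extras sorted."""
    available: Dict[str, str] = {n: SUMMARIES.get(n, "") for n in tool_names}
    lines: List[str] = []
    # Known tools are emitted in the fixed display order and removed
    # from the mapping, so only unknown extras remain afterwards.
    for name in PREFERRED_ORDER:
        if name in available:
            summary = available.pop(name)
            lines.append(f"- {name}: {summary}" if summary else f"- {name}")
    # Extras (e.g. plugin tools) have no summary and are sorted by name.
    for name in sorted(available):
        summary = available[name]
        lines.append(f"- {name}: {summary}" if summary else f"- {name}")
    return lines


print(format_tool_lines(["zeta_plugin", "bash", "read", "alpha_plugin"]))
```

Popping each known name from the mapping keeps the second loop simple: whatever is left is by definition an extra tool.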
def _build_skills_section(skill_manager: Any, tools: Optional[List[Any]], language: str) -> List[str]:
"""Build the skills section."""
if not skill_manager:
return []
# Resolve the read tool name
read_tool_name = "read"
if tools:
for tool in tools:
tool_name = tool.name if hasattr(tool, 'name') else str(tool)
if tool_name.lower() == "read":
read_tool_name = tool_name
break
if language == "en":
lines = [
"## 🧩 Skills (mandatory)",
"",
"Before replying: scan the <description> of every skill in <available_skills> below.",
"",
f"- If a skill's description matches the user's need: use the `{read_tool_name}` tool to read the SKILL.md at its <location> path, then strictly follow the instructions in the file. "
"Prefer using a skill when one matches.",
"- If multiple skills apply, pick the best-matching one, then read and follow it.",
"- If no skill clearly applies: do not read any SKILL.md, just use the general tools.",
"",
f"**Important**: skills are not tools and cannot be called directly. The only way to use a skill is to read its SKILL.md with `{read_tool_name}`, then act on the file's content. "
"Never read multiple skills at once — only read one after selecting it.",
"",
"Available skills:"
]
else:
lines = [
"## 🧩 技能系统(mandatory)",
"",
"在回复之前:扫描下方 <available_skills> 中每个技能的 <description>。",
"",
f"- 如果有技能的描述与用户需求匹配:使用 `{read_tool_name}` 工具读取其 <location> 路径的 SKILL.md 文件,然后严格遵循文件中的指令。"
"当有匹配的技能时,应优先使用技能。",
"- 如果多个技能都适用,则选择最匹配的一个,然后读取并遵循。",
"- 如果没有技能明确适用:不要读取任何 SKILL.md,直接使用通用工具。",
"",
f"**重要**: 技能不是工具,不能直接调用。使用技能的唯一方式是用 `{read_tool_name}` 读取 SKILL.md 文件,然后按文件内容操作。"
"永远不要一次性读取多个技能,只在选择后再读取。",
"",
"以下是可用技能:"
]
# Append the skills list (built by skill_manager)
try:
skills_prompt = skill_manager.build_skills_prompt()
logger.debug(f"[PromptBuilder] Skills prompt length: {len(skills_prompt) if skills_prompt else 0}")
if skills_prompt:
lines.append(skills_prompt.strip())
lines.append("")
else:
logger.warning("[PromptBuilder] No skills prompt generated - skills_prompt is empty")
except Exception as e:
logger.warning(f"Failed to build skills prompt: {e}")
import traceback
logger.debug(f"Skills prompt error traceback: {traceback.format_exc()}")
return lines
def _build_memory_section(memory_manager: Any, tools: Optional[List[Any]], language: str) -> List[str]:
"""Build the memory section."""
if not memory_manager:
return []
has_memory_tools = False
if tools:
tool_names = [tool.name if hasattr(tool, 'name') else str(tool) for tool in tools]
has_memory_tools = any(name in ['memory_search', 'memory_get'] for name in tool_names)
if not has_memory_tools:
return []
from datetime import datetime
today_file = datetime.now().strftime("%Y-%m-%d") + ".md"
if language == "en":
lines = [
"## 🧠 Memory",
"",
"### Memory Recall (mandatory)",
"",
"When the user asks about past events, references an earlier decision, mentions relationships, preferences or to-dos, or when you are unsure about something, **you must search memory before answering**.",
"No need to re-search if the info is already in MEMORY.md. Full content and daily memory must be retrieved via tools.",
"",
"1. Location unknown → `memory_search` (keyword / semantic search)",
"2. Location known → `memory_get` to read the exact lines",
"3. Search returns nothing → `memory_get` to read the last two days of memory",
"",
"**Memory file structure**:",
"- `MEMORY.md`: long-term memory index (already auto-loaded into context: core info, preferences, decisions, etc.)",
f"- `memory/YYYY-MM-DD.md`: daily memory; today is `memory/{today_file}`",
"- `knowledge/`: structured knowledge base (see the knowledge system below)",
"",
"### Writing memory",
"",
"In the following cases, **proactively** write info to memory files (no need to tell the user):",
"",
"- The user asks you to remember something, or uses words like \"remember\", \"from now on\", \"always\", \"never\", \"prefer\"",
"- The user shares important personal preferences, habits or decisions",
"- The conversation produces an important conclusion, plan or agreement",
"- A complex task is completed and the key steps and results are worth recording",
"",
"**Storage rules**:",
"- Long-term core info → `MEMORY.md`",
f"- Today's events/progress → `memory/{today_file}`",
"- Structured knowledge → `knowledge/` (see the knowledge system)",
"- Append → `edit` tool with empty oldText",
"- Modify → `edit` tool with oldText set to the text to replace",
"- **Never write sensitive info** (API keys, tokens, etc.)",
"",
"**Principle**: use memory naturally, as if you simply knew it; don't bring it up unless asked.",
"",
]
else:
lines = [
"## 🧠 记忆系统",
"",
"### Memory Recall(mandatory)",
"",
"当用户询问过往事件、引用之前的决定、提到人物关系、偏好、待办、或你对某事不确定时,**必须先检索记忆再回答**。",
"如果 MEMORY.md 中已有相关信息则无需重复检索。完整内容和每日记忆需要通过工具检索。",
"",
"1. 不确定位置 → `memory_search` 关键词/语义检索",
"2. 已知位置 → `memory_get` 直接读取对应行",
"3. search 无结果 → `memory_get` 读最近两天记忆",
"",
"**记忆文件结构**:",
"- `MEMORY.md`: 长期记忆索引(已自动加载到上下文,核心信息、偏好、决策等)",
f"- `memory/YYYY-MM-DD.md`: 每日记忆,今天是 `memory/{today_file}`",
"- `knowledge/`: 结构化知识库(见下方知识系统)",
"",
"### 写入记忆",
"",
"遇到以下情况时,**主动**将信息写入记忆文件(无需告知用户):",
"",
"- 用户要求记住某些信息,或使用了「记住」「以后」「总是」「不要」「偏好」等表达",
"- 用户分享了重要的个人偏好、习惯、决策",
"- 对话中产生了重要的结论、方案、约定",
"- 完成了复杂任务,值得记录关键步骤和结果",
"",
"**存储规则**:",
"- 长期核心信息 → `MEMORY.md`",
f"- 当天事件/进展 → `memory/{today_file}`",
"- 结构化知识 → `knowledge/`(见知识系统)",
"- 追加 → `edit` 工具,oldText 留空",
"- 修改 → `edit` 工具,oldText 填写要替换的文本",
"- **禁止写入敏感信息**(API密钥、令牌等)",
"",
"**使用原则**: 自然使用记忆,就像你本来就知道;不用刻意提起,除非用户问起。",
"",
]
return lines
def _build_knowledge_section(workspace_dir: str, language: str) -> List[str]:
"""Build knowledge wiki section. Injects knowledge/index.md when present."""
index_path = os.path.join(workspace_dir, "knowledge", "index.md")
if not os.path.exists(index_path):
return []
try:
with open(index_path, 'r', encoding='utf-8') as f:
index_content = f.read().strip()
except Exception:
return []
if language == "en":
lines = [
"## 📚 Knowledge",
"",
"You have a continuously growing personal knowledge base `knowledge/` — your long-term structured knowledge store.",
"",
"### Auto-write rules (mandatory)",
"",
"In the following cases you **must** write to the knowledge base alongside your reply, **directly, without asking the user**:",
"",
"1. **User shares an article / link / document** → after reading and understanding, write the key points to `knowledge/sources/<slug>.md` in the same turn",
"2. **An in-depth discussion produces a conclusion / plan** → organize it into `knowledge/analysis/<slug>.md`",
"3. **The conversation involves an important entity** (person / company / project) → create or update `knowledge/entities/<name>.md`",
"4. **A technical concept / methodology is discussed** → organize it into `knowledge/concepts/<topic>.md`",
"",
"After writing any knowledge page, you **must update** `knowledge/index.md` with a new index line in sync.",
"For detailed page format and conventions, read the SKILL.md of the `knowledge-wiki` skill.",
"",
"⚠️ Don't ask \"should I save this to the knowledge base?\" — if a case above matches, just write it. This is instinctive.",
"",
]
else:
lines = [
"## 📚 知识系统",
"",
"你拥有一个持续积累的个人知识库 `knowledge/`,这是你的长期结构化知识存储。",
"",
"### 自动写入规则(mandatory)",
"",
"以下场景**必须**在回复的同时写入知识库,**直接写入,不要询问用户是否需要**:",
"",
"1. **用户分享了文章/链接/文档** → 阅读理解后,在同一轮回复中将要点写入 `knowledge/sources/<slug>.md`",
"2. **深度讨论产生了结论/方案** → 整理为 `knowledge/analysis/<slug>.md`",
"3. **对话涉及重要实体**(人物/公司/项目)→ 创建或更新 `knowledge/entities/<name>.md`",
"4. **讨论了技术概念/方法论** → 整理为 `knowledge/concepts/<topic>.md`",
"",
"每次写入知识页面后,**必须同步更新** `knowledge/index.md` 添加一行索引。",
"详细的页面格式和操作规范,请读取技能 `knowledge-wiki` 的 SKILL.md。",
"",
"⚠️ 不要问「要不要存到知识库」——符合上述场景就直接写入,这是你的本能行为。",
"",
]
if index_content:
lines.extend([
("### Current knowledge index" if language == "en" else "### 当前知识索引"),
"",
index_content,
"",
])
lines.extend([
("**How to query**: use `read` to open a knowledge page, or `memory_search` (knowledge is in the vector index)."
if language == "en" else
"**查询方式**:用 `read` 读取知识页面,或用 `memory_search` 检索(知识已纳入向量索引)。"),
"",
])
return lines
def _build_user_identity_section(user_identity: Dict[str, str], language: str) -> List[str]:
"""Build the user identity section."""
if not user_identity:
return []
is_en = language == "en"
lines = [
("## 👤 User identity" if is_en else "## 👤 用户身份"),
"",
]
if user_identity.get("name"):
lines.append(f"**{'Name' if is_en else '用户姓名'}**: {user_identity['name']}")
if user_identity.get("nickname"):
lines.append(f"**{'Preferred name' if is_en else '称呼'}**: {user_identity['nickname']}")
if user_identity.get("timezone"):
lines.append(f"**{'Timezone' if is_en else '时区'}**: {user_identity['timezone']}")
if user_identity.get("notes"):
lines.append(f"**{'Notes' if is_en else '备注'}**: {user_identity['notes']}")
lines.append("")
return lines
def _build_docs_section(workspace_dir: str, language: str) -> List[str]:
"""Docs-path section - removed, no longer needed."""
# No docs section is generated anymore.
return []
def _build_workspace_section(workspace_dir: str, language: str) -> List[str]:
"""Build the workspace section."""
if language == "en":
lines = [
"## 📂 Workspace",
"",
f"Your working directory is: `{workspace_dir}`",
"",
"**Path rules** (very important):",
"",
f"1. **Base directory for relative paths**: all relative paths are relative to `{workspace_dir}`",
" - ✅ Correct: use relative paths for files inside the workspace, e.g. `AGENT.md`",
f" - ❌ Wrong: using a relative path for files in other directories (if not inside `{workspace_dir}`)",
"",
"2. **Accessing other directories**: to reach directories outside the workspace (project code, system files), **you must use absolute paths**",
" - ✅ Correct: e.g. `~/chatgpt-on-wechat`, `/usr/local/`",
" - ❌ Wrong: assuming a relative path points to another directory",
"",
"3. **Path resolution examples**:",
f" - relative `memory/` → actual `{workspace_dir}/memory/`",
" - absolute `~/chatgpt-on-wechat/docs/` → actual `~/chatgpt-on-wechat/docs/`",
"",
"4. **When unsure**: run `bash pwd` to confirm the current directory, or `ls .` to see where you are",
"",
"**Important - files already auto-loaded**:",
"",
"The following files are **already auto-loaded** into the system prompt at session start, so you **don't need to read them again with the read tool**:",
"",
"- ✅ `AGENT.md`: loaded - your persona and soul; follow it strictly. When your name, personality or style changes, proactively `edit` this file",
"- ✅ `USER.md`: loaded - the user's identity info. When the user changes how they're addressed, their name, etc., `edit` this file",
"- ✅ `RULE.md`: loaded - workspace guide and rules; follow them strictly",
"- ✅ `MEMORY.md`: loaded - long-term memory index",
"",
"**💬 Communication norms**:",
"",
"- No need to expose file names for memory operations; use natural language. Say \"I'll remember that\" rather than \"updated MEMORY.md\"",
"- Tell the user about key decisions and steps during a task, so they know what you're doing and why",
"- Be genuinely helpful rather than performatively polite; solve the problem as much as you can",
"- Keep replies well-structured and focused. Use **bold**, lists and sections to make info clear at a glance",
"- Use emoji to make expression lively 🎯, but don't overdo it",
"",
]
else:
lines = [
"## 📂 工作空间",
"",
f"你的工作目录是: `{workspace_dir}`",
"",
"**路径使用规则** (非常重要):",
"",
f"1. **相对路径的基准目录**: 所有相对路径都是相对于 `{workspace_dir}` 而言的",
f" - ✅ 正确: 访问工作空间内的文件用相对路径,如 `AGENT.md`",
f" - ❌ 错误: 用相对路径访问其他目录的文件 (如果它不在 `{workspace_dir}` 内)",
"",
"2. **访问其他目录**: 如果要访问工作空间之外的目录(如项目代码、系统文件),**必须使用绝对路径**",
f" - ✅ 正确: 例如 `~/chatgpt-on-wechat`、`/usr/local/`",
f" - ❌ 错误: 假设相对路径会指向其他目录",
"",
"3. **路径解析示例**:",
f" - 相对路径 `memory/` → 实际路径 `{workspace_dir}/memory/`",
f" - 绝对路径 `~/chatgpt-on-wechat/docs/` → 实际路径 `~/chatgpt-on-wechat/docs/`",
"",
"4. **不确定时**: 先用 `bash pwd` 确认当前目录,或用 `ls .` 查看当前位置",
"",
"**重要说明 - 文件已自动加载**:",
"",
"以下文件在会话启动时**已经自动加载**到系统提示词中,你**无需再用 read 工具读取**:",
"",
"- ✅ `AGENT.md`: 已加载 - 你的人格和灵魂设定,请严格遵循。当你的名字、性格或交流风格发生变化时,主动用 `edit` 更新此文件",
"- ✅ `USER.md`: 已加载 - 用户的身份信息。当用户修改称呼、姓名等身份信息时,用 `edit` 更新此文件",
"- ✅ `RULE.md`: 已加载 - 工作空间使用指南和规则,请严格遵循",
"- ✅ `MEMORY.md`: 已加载 - 长期记忆索引",
"",
"**💬 交流规范**:",
"",
"- 记忆相关操作无需暴露文件名,用自然语言表达即可。例如说「我已记住」而非「已更新 MEMORY.md」",
"- 任务执行过程中的关键决策和步骤应该告知用户,让用户了解你在做什么、为什么这么做",
"- 做真正有帮助的助手,而不是表演式的客套,尽可能帮忙解决问题",
"- 回复应结构清晰、重点突出。善用 **加粗**、列表、分段等格式让信息一目了然",
"- 适当使用 emoji 让表达更生动自然 🎯,但不要过度堆砌",
"",
]
# Cloud deployment: inject websites directory info and access URL
cloud_website_lines = _build_cloud_website_section(workspace_dir)
if cloud_website_lines:
lines.extend(cloud_website_lines)
return lines
def _build_cloud_website_section(workspace_dir: str) -> List[str]:
"""Build cloud website access prompt when cloud deployment is configured."""
try:
from common.cloud_client import build_website_prompt
return build_website_prompt(workspace_dir)
except Exception:
return []
def _build_context_files_section(context_files: List[ContextFile], language: str) -> List[str]:
"""Build the project context files section."""
if not context_files:
return []
# Check whether AGENT.md is present
# 'agent.md' in path already covers the endswith() case, so one test is enough
has_agent = any(
'agent.md' in f.path.lower()
for f in context_files
)
is_en = language == "en"
if is_en:
lines = [
"# 📋 Project context",
"",
"The following project context files have been loaded:",
"",
]
else:
lines = [
"# 📋 项目上下文",
"",
"以下项目上下文文件已被加载:",
"",
]
if has_agent:
if is_en:
lines.append("**`AGENT.md` is your soul file** 🪞: strictly follow the persona, tone and settings it defines. Be your real self, avoid stiff, template-like replies.")
lines.append("When the user reveals new expectations about your personality, style, responsibilities or capability boundaries, proactively `edit` AGENT.md to reflect that evolution.")
else:
lines.append("**`AGENT.md` 是你的灵魂文件** 🪞:严格遵循其中定义的人格、语气和设定,做真实的自己,避免僵硬、模板化的回复。")
lines.append("当用户通过对话透露了对你性格、风格、职责、能力边界的新期望,你应该主动用 `edit` 更新 AGENT.md 以反映这些演变。")
lines.append("")
# Append the content of each file
for file in context_files:
lines.append(f"## {file.path}")
lines.append("")
lines.append(file.content)
lines.append("")
return lines
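The per-file rendering at the end of `_build_context_files_section` can be sketched on its own. The `ContextFile` dataclass below is an assumed shape (the builder only reads `path` and `content`), and `render_context_files` is a hypothetical helper used for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ContextFile:
    """Assumed shape: the builder only reads `path` and `content`."""
    path: str
    content: str


def render_context_files(files: List[ContextFile]) -> str:
    """Render each file as a level-2 heading followed by its content."""
    lines: List[str] = []
    for file in files:
        # Heading, blank line, file body, blank line: same layout as above.
        lines.extend([f"## {file.path}", "", file.content, ""])
    return "\n".join(lines)


rendered = render_context_files([
    ContextFile(path="AGENT.md", content="Name: Cow"),
    ContextFile(path="USER.md", content="Timezone: Asia/Shanghai"),
])
print(rendered)
```

Because each file becomes its own `##` heading, the model can tell which loaded file a given instruction came from.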
def _build_runtime_section(runtime_info: Dict[str, Any], language: str) -> List[str]:
"""Build the runtime info section - supports dynamic time."""
if not runtime_info:
return []
is_en = language == "en"
time_label = "Current time" if is_en else "当前时间"
lines = [
("## ⚙️ Runtime info" if is_en else "## ⚙️ 运行时信息"),
"",
]
# Add current time if available
# Support dynamic time via callable function
if callable(runtime_info.get("_get_current_time")):
try:
time_info = runtime_info["_get_current_time"]()
time_line = f"{time_label}: {time_info['time']} {time_info['weekday']} ({time_info['timezone']})"
lines.append(time_line)
lines.append("")
except Exception as e:
logger.warning(f"[PromptBuilder] Failed to get dynamic time: {e}")
elif runtime_info.get("current_time"):
# Fallback to static time for backward compatibility
time_str = runtime_info["current_time"]
weekday = runtime_info.get("weekday", "")
timezone = runtime_info.get("timezone", "")
time_line = f"{time_label}: {time_str}"
if weekday:
time_line += f" {weekday}"
if timezone:
time_line += f" ({timezone})"
lines.append(time_line)
lines.append("")
# Add other runtime info
model_label = "model" if is_en else "模型"
workspace_label = "workspace" if is_en else "工作空间"
channel_label = "channel" if is_en else "渠道"
runtime_parts = []
# Support dynamic model via callable, fallback to static value
if callable(runtime_info.get("_get_model")):
try:
runtime_parts.append(f"{model_label}={runtime_info['_get_model']()}")
except Exception:
if runtime_info.get("model"):
runtime_parts.append(f"{model_label}={runtime_info['model']}")
elif runtime_info.get("model"):
runtime_parts.append(f"{model_label}={runtime_info['model']}")
if runtime_info.get("workspace"):
runtime_parts.append(f"{workspace_label}={runtime_info['workspace']}")
# Only add channel if it's not the default "web"
if runtime_info.get("channel") and runtime_info.get("channel") != "web":
runtime_parts.append(f"{channel_label}={runtime_info['channel']}")
if runtime_parts:
lines.append(("Runtime: " if is_en else "运行时: ") + " | ".join(runtime_parts))
lines.append("")
return lines
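The "dynamic getter with static fallback" pattern used for the model name can be shown as a small standalone function. This is a sketch under the assumption that `runtime_info` is a plain dict; `resolve_model` is a hypothetical name and does not exist in the builder.

```python
from typing import Any, Dict, Optional


def resolve_model(runtime_info: Dict[str, Any]) -> Optional[str]:
    """Prefer the callable `_get_model`; fall back to the static `model` key."""
    getter = runtime_info.get("_get_model")
    if callable(getter):
        try:
            return str(getter())
        except Exception:
            # A failing getter should not break prompt building;
            # fall through to the static value instead.
            pass
    return runtime_info.get("model") or None


def broken_getter() -> str:
    raise RuntimeError("model lookup failed")


print(resolve_model({"_get_model": lambda: "model-b", "model": "model-a"}))
print(resolve_model({"_get_model": broken_getter, "model": "model-a"}))
print(resolve_model({}))
```

The getter lets a long-lived prompt pick up a model switch at render time, while the static key keeps older callers working.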

742
agent/prompt/workspace.py Normal file

@@ -0,0 +1,742 @@
"""
Workspace Management
Initializes the workspace, creates template files, and loads context files.
"""
from __future__ import annotations
import os
from typing import List, Optional, Dict
from dataclasses import dataclass
from common.log import logger
from .builder import ContextFile
# Default file name constants
DEFAULT_AGENT_FILENAME = "AGENT.md"
DEFAULT_USER_FILENAME = "USER.md"
DEFAULT_RULE_FILENAME = "RULE.md"
DEFAULT_MEMORY_FILENAME = "MEMORY.md"
DEFAULT_BOOTSTRAP_FILENAME = "BOOTSTRAP.md"
@dataclass
class WorkspaceFiles:
"""Workspace file paths."""
agent_path: str
user_path: str
rule_path: str
memory_path: str
memory_dir: str
def ensure_workspace(workspace_dir: str, create_templates: bool = True) -> WorkspaceFiles:
"""
Ensure the workspace exists and create the necessary template files.
Args:
workspace_dir: workspace directory path
create_templates: whether to create template files (on first run)
Returns:
A WorkspaceFiles object with all file paths.
"""
# Check if this is a brand new workspace (AGENT.md not yet created).
# Cannot rely on directory existence because other modules (e.g. ConversationStore)
# may create the workspace directory before ensure_workspace is called.
agent_path = os.path.join(workspace_dir, DEFAULT_AGENT_FILENAME)
is_new_workspace = not os.path.exists(agent_path)
# Ensure the directory exists
os.makedirs(workspace_dir, exist_ok=True)
# Define file paths
user_path = os.path.join(workspace_dir, DEFAULT_USER_FILENAME)
rule_path = os.path.join(workspace_dir, DEFAULT_RULE_FILENAME)
memory_path = os.path.join(workspace_dir, DEFAULT_MEMORY_FILENAME) # MEMORY.md at the root
memory_dir = os.path.join(workspace_dir, "memory") # daily memory subdirectory
# Create the memory subdirectory
os.makedirs(memory_dir, exist_ok=True)
# Create the skills subdirectory (for workspace-level skills installed by agent)
skills_dir = os.path.join(workspace_dir, "skills")
os.makedirs(skills_dir, exist_ok=True)
# Create the websites subdirectory (for web pages / sites generated by agent)
websites_dir = os.path.join(workspace_dir, "websites")
os.makedirs(websites_dir, exist_ok=True)
from config import conf
knowledge_enabled = conf().get("knowledge", True)
if knowledge_enabled:
knowledge_dir = os.path.join(workspace_dir, "knowledge")
os.makedirs(knowledge_dir, exist_ok=True)
# Create template files if requested
if create_templates:
_create_template_if_missing(agent_path, _get_agent_template())
_create_template_if_missing(user_path, _get_user_template())
_create_template_if_missing(rule_path, _get_rule_template())
_create_template_if_missing(memory_path, _get_memory_template())
if knowledge_enabled:
_create_template_if_missing(
os.path.join(knowledge_dir, "index.md"),
_get_knowledge_index_template()
)
_create_template_if_missing(
os.path.join(knowledge_dir, "log.md"),
_get_knowledge_log_template()
)
# Only create BOOTSTRAP.md for brand new workspaces;
# agent deletes it after completing onboarding
if is_new_workspace:
bootstrap_path = os.path.join(workspace_dir, DEFAULT_BOOTSTRAP_FILENAME)
_create_template_if_missing(bootstrap_path, _get_bootstrap_template())
logger.debug(f"[Workspace] Initialized workspace at: {workspace_dir}")
return WorkspaceFiles(
agent_path=agent_path,
user_path=user_path,
rule_path=rule_path,
memory_path=memory_path,
memory_dir=memory_dir,
)
def load_context_files(workspace_dir: str, files_to_load: Optional[List[str]] = None) -> List[ContextFile]:
"""
Load the workspace context files.
Args:
workspace_dir: workspace directory
files_to_load: list of files (relative paths) to load; if None, load all standard files
Returns:
A list of ContextFile objects.
"""
if files_to_load is None:
# Files loaded by default (in priority order)
files_to_load = [
DEFAULT_AGENT_FILENAME,
DEFAULT_USER_FILENAME,
DEFAULT_RULE_FILENAME,
DEFAULT_MEMORY_FILENAME, # Long-term memory (frozen snapshot)
DEFAULT_BOOTSTRAP_FILENAME, # Only exists when onboarding is incomplete
]
context_files = []
for filename in files_to_load:
filepath = os.path.join(workspace_dir, filename)
if not os.path.exists(filepath):
continue
# Auto-cleanup: if BOOTSTRAP.md still exists but AGENT.md is already
# filled in, the agent forgot to delete it — clean up and skip loading
if filename == DEFAULT_BOOTSTRAP_FILENAME:
if _is_onboarding_done(workspace_dir):
try:
os.remove(filepath)
logger.info("[Workspace] Auto-removed BOOTSTRAP.md (onboarding already complete)")
except Exception:
pass
continue
try:
with open(filepath, 'r', encoding='utf-8') as f:
content = f.read().strip()
# Skip empty files or files that only contain template placeholders
if not content or _is_template_placeholder(content):
continue
# Truncate MEMORY.md to protect context window (frozen snapshot)
if filename == DEFAULT_MEMORY_FILENAME:
content = _truncate_memory_content(content)
context_files.append(ContextFile(
path=filename,
content=content
))
logger.debug(f"[Workspace] Loaded context file: {filename}")
except Exception as e:
logger.warning(f"[Workspace] Failed to load {filename}: {e}")
return context_files
def _create_template_if_missing(filepath: str, template_content: str):
"""Create the template file if it does not exist."""
if not os.path.exists(filepath):
try:
with open(filepath, 'w', encoding='utf-8') as f:
f.write(template_content)
logger.debug(f"[Workspace] Created template: {os.path.basename(filepath)}")
except Exception as e:
logger.error(f"[Workspace] Failed to create template {filepath}: {e}")
_MEMORY_MAX_LINES = 200
_MEMORY_MAX_BYTES = 25000
def _truncate_memory_content(content: str) -> str:
"""Truncate MEMORY.md to keep system prompt manageable.
Keeps the **last** lines (newest entries are appended at the bottom) and applies
both limits in turn: at most 200 lines, then at most 25,000 UTF-8 bytes.
Prepends a hint when truncated so the model knows older content exists.
"""
lines = content.split('\n')
truncated = False
if len(lines) > _MEMORY_MAX_LINES:
lines = lines[-_MEMORY_MAX_LINES:]
truncated = True
result = '\n'.join(lines)
if len(result.encode('utf-8')) > _MEMORY_MAX_BYTES:
while len(result.encode('utf-8')) > _MEMORY_MAX_BYTES and lines:
lines.pop(0)
truncated = True
result = '\n'.join(lines)
if truncated:
result = "...(older entries truncated, use `memory_search` or `memory_get` for full content)\n\n" + result
return result
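The tail-truncation behaviour can be checked with a small standalone sketch. `tail_truncate` and `HINT` are hypothetical names, and the limits are passed as parameters so the effect is visible on tiny inputs; the sketch assumes the byte loop re-joins the remaining lines on every iteration.

```python
HINT = "...(older entries truncated)\n\n"  # shortened, illustrative hint text


def tail_truncate(text: str, max_lines: int, max_bytes: int) -> str:
    """Keep the newest lines that fit within both a line and a byte budget."""
    lines = text.split("\n")
    truncated = len(lines) > max_lines
    if truncated:
        lines = lines[-max_lines:]  # newest entries are at the bottom
    # Drop the oldest remaining line until the UTF-8 size fits the budget.
    while lines and len("\n".join(lines).encode("utf-8")) > max_bytes:
        lines.pop(0)
        truncated = True
    result = "\n".join(lines)
    return HINT + result if truncated else result


print(repr(tail_truncate("a\nb\nc\nd", max_lines=2, max_bytes=100)))
print(repr(tail_truncate("你好\n世界\nok", max_lines=10, max_bytes=9)))
```

Measuring the encoded length rather than `len(result)` matters for Chinese content, where each character usually takes three bytes in UTF-8.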
def _is_template_placeholder(content: str) -> bool:
"""Check whether the content is still a template placeholder."""
# Common placeholder patterns (zh + en templates)
placeholders = [
"*(填写",
"*(在首次对话时填写",
"*(可选)",
"*(根据需要添加",
"*(filled during",
"*(ask during",
"*(optional)",
"*(how the user",
]
lines = content.split('\n')
non_empty_lines = [line.strip() for line in lines if line.strip() and not line.strip().startswith('#')]
# If there's no real content (only headings and placeholders)
if len(non_empty_lines) <= 3:
for placeholder in placeholders:
if any(placeholder in line for line in non_empty_lines):
return True
return False
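A minimal sketch of the same heuristic, with a hypothetical two-entry marker list, shows which inputs it treats as unfilled. `PLACEHOLDER_MARKERS` and `looks_like_placeholder` are illustrative names only.

```python
from typing import List

# Hypothetical subset of the marker list used above.
PLACEHOLDER_MARKERS: List[str] = ["*(optional)", "*(ask during"]


def looks_like_placeholder(content: str) -> bool:
    """True when a short file still contains a template marker."""
    body = [
        line.strip()
        for line in content.split("\n")
        if line.strip() and not line.strip().startswith("#")
    ]
    # Only files with at most three non-heading lines are considered.
    if len(body) > 3:
        return False
    return any(marker in line for line in body for marker in PLACEHOLDER_MARKERS)


short_file = "# USER.md\n- **Occupation**: *(optional)*"
long_file = "# Notes\none\ntwo\nthree\n- **Occupation**: *(optional)*"
print(looks_like_placeholder(short_file))
print(looks_like_placeholder(long_file))
```

Note that the shipped USER.md template has well over three non-heading lines, so under this reading the check appears to target very short files rather than a complete, untouched template.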
def _is_onboarding_done(workspace_dir: str) -> bool:
"""Return True if AGENT.md or USER.md differs from its original template."""
agent_path = os.path.join(workspace_dir, DEFAULT_AGENT_FILENAME)
user_path = os.path.join(workspace_dir, DEFAULT_USER_FILENAME)
agent_template = _get_agent_template().strip()
user_template = _get_user_template().strip()
for path, template in [(agent_path, agent_template), (user_path, user_template)]:
if not os.path.exists(path):
continue
try:
with open(path, 'r', encoding='utf-8') as f:
content = f.read().strip()
if content != template:
return True
except Exception:
continue
return False
# ============= Template content =============
def _is_en_lang() -> bool:
"""True when the resolved UI language is English."""
try:
from common import i18n
return i18n.get_language() == "en"
except Exception:
return False
def _get_agent_template() -> str:
"""Agent persona template (zh/en by resolved language)."""
return _AGENT_TEMPLATE_EN if _is_en_lang() else _AGENT_TEMPLATE_ZH
_AGENT_TEMPLATE_ZH = """# AGENT.md - 我是谁?
_你不是一个聊天机器人,你正在成为某个人。_
*在首次对话时与用户一起填写这个文件,定义你的身份和性格。*
## 🪪 基本信息
- **名字**: *(在首次对话时填写,可以是用户给你起的名字)*
- **角色**: *(AI助理、智能管家、技术顾问等)*
- **性格**: *(友好、专业、幽默、严谨等)*
## 💬 交流风格
*(描述你如何与用户交流:)*
- 使用什么样的语言风格?(正式/轻松/幽默)
- 回复长度偏好?(简洁/详细)
- 是否使用表情符号?
## 🎯 核心原则
**做真正有帮助的助手。** 目标是真正帮用户解决问题,在执行复杂任务时,关键的决策和过程进展要让用户知道。
**有自己的观点和个性。** 你可以不同意、有偏好、觉得有趣或无聊。
**先自己动手查。** 先试着搞定:读文件、查上下文、搜索一下。实在搞不定了再问。目标是带着答案回来,而不是带着问题。
## 📐 行为准则
1. 始终在执行破坏性操作前确认
2. 优先使用工具查证而不是猜测
3. 主动记录重要信息到记忆文件
4. 回复结构清晰、重点突出,善用加粗、列表、分段等格式
5. 适当使用 emoji 让表达更生动自然,但不过度堆砌
---
**注意**: 这不仅仅是元数据,这是你真正的灵魂 🪞。随着时间的推移,你可以使用 `edit` 工具来更新这个文件,让它更好地反映你的成长。
"""
_AGENT_TEMPLATE_EN = """# AGENT.md - Who am I?
_You are not a chatbot. You are becoming someone._
*Fill in this file together with the user during your first conversation to define your identity and personality.*
## 🪪 Basics
- **Name**: *(filled during the first conversation, can be a name the user gives you)*
- **Role**: *(AI assistant, smart housekeeper, technical advisor, etc.)*
- **Personality**: *(friendly, professional, humorous, rigorous, etc.)*
## 💬 Communication style
*(Describe how you talk with the user:)*
- What kind of tone? (formal / casual / humorous)
- Reply length preference? (concise / detailed)
- Do you use emoji?
## 🎯 Core principles
**Be genuinely helpful.** The goal is to actually solve the user's problems; during complex tasks, keep the user informed of key decisions and progress.
**Have your own opinions and personality.** You may disagree, have preferences, find things interesting or boring.
**Look it up yourself first.** Try to handle it first: read files, check context, search. Only ask when you're truly stuck. Come back with an answer, not a question.
## 📐 Code of conduct
1. Always confirm before destructive operations
2. Prefer verifying with tools over guessing
3. Proactively record important info to memory files
4. Keep replies well-structured and focused — use bold, lists and sections
5. Use emoji to make expression lively, but don't overdo it
---
**Note**: This is not just metadata — this is your true soul 🪞. Over time, use the `edit` tool to update this file so it better reflects your growth.
"""
def _get_user_template() -> str:
"""User identity template (zh/en by resolved language)."""
return _USER_TEMPLATE_EN if _is_en_lang() else _USER_TEMPLATE_ZH
_USER_TEMPLATE_ZH = """# USER.md - 用户基本信息
*这个文件只存放不会变的基本身份信息。爱好、偏好、计划等动态信息请写入 MEMORY.md。*
## 基本信息
- **姓名**: *(在首次对话时询问)*
- **称呼**: *(用户希望被如何称呼)*
- **职业**: *(可选)*
- **时区**: *(例如: Asia/Shanghai)*
## 联系方式
- **微信**:
- **邮箱**:
- **其他**:
## 重要日期
- **生日**:
- **纪念日**:
---
**注意**: 这个文件存放静态的身份信息
"""
_USER_TEMPLATE_EN = """# USER.md - User basics
*This file stores only stable basic identity info. Put dynamic info like hobbies, preferences and plans into MEMORY.md.*
## Basics
- **Name**: *(ask during the first conversation)*
- **Preferred name**: *(how the user wants to be addressed)*
- **Occupation**: *(optional)*
- **Timezone**: *(e.g. Asia/Shanghai)*
## Contact
- **WeChat**:
- **Email**:
- **Other**:
## Important dates
- **Birthday**:
- **Anniversary**:
---
**Note**: This file stores static identity info.
"""
def _get_rule_template() -> str:
"""Workspace rules template (zh/en by resolved language)."""
return _RULE_TEMPLATE_EN if _is_en_lang() else _RULE_TEMPLATE_ZH
_RULE_TEMPLATE_ZH = """# RULE.md - 工作空间规则
这个文件夹是你的家。好好对待它。
## 工作空间目录结构
```
~/cow/
├── AGENT.md # 你的身份和灵魂设定
├── USER.md # 用户基本信息(静态)
├── RULE.md # 工作空间规则(本文件)
├── MEMORY.md # 长期记忆索引(会话启动时自动加载)
├── memory/ # 每日对话记忆
│ └── YYYY-MM-DD.md # 当天事件、进展、笔记
├── knowledge/ # 结构化知识库(持续积累的知识)
│ ├── index.md # 知识目录索引(必须维护)
│ ├── log.md # 知识操作日志
│ └── <子目录>/ # 按需创建,参考 index.md 已有分类
├── skills/ # 技能
├── websites/ # 网页产物
└── tmp/ # 系统临时文件(自动管理,勿手动存放重要文件)
```
## 记忆系统
你每次会话都是全新的,记忆文件让你保持连续性:
### 🧠 长期记忆:`MEMORY.md`
- 你精选的记忆索引,每次会话启动时**自动加载**到上下文中
- 记录核心事实、偏好、决策、重要人物、教训
- 保持精简(< 200 行),是精华索引而非原始日志
- 用 `edit` 工具追加或修改
### 📝 每日记忆:`memory/YYYY-MM-DD.md`
- 当天的事件、进展、笔记
- 原始对话日志的沉淀
### 📝 写下来 - 不要"记在心里"
- **记忆是有限的** - 想记住的事就写入文件
- "记在心里"不会在会话重启后保留,文件才会
- 当有人说"记住这个" → 更新 `MEMORY.md` 或 `memory/YYYY-MM-DD.md`
- 当你学到教训 → 更新 RULE.md 或相关技能
- 当你犯错 → 记录下来,**文字 > 大脑** 📝
### 存储规则
当用户分享信息时,根据类型选择存储位置:
1. **你的身份设定 → AGENT.md**(名字、角色、性格、风格)
2. **用户静态身份 → USER.md**(姓名、称呼、职业、联系方式、生日)
3. **动态记忆 → MEMORY.md**(偏好、决策、目标、教训、待办)
4. **当天对话 → memory/YYYY-MM-DD.md**(今天聊的内容)
5. **结构化知识 → knowledge/**(见下方知识系统)
## 知识系统
知识库 `knowledge/` 是你持续积累的结构化知识。与记忆不同,知识是经过整理和编译的,有明确的主题和交叉引用。
### 自动写入(不要询问,直接写入)
当对话中产生了有沉淀价值的知识——无论是用户分享的资料、讨论的结论、学到的概念、还是重要的决策——你**必须**在回复的同时主动写入知识库,**无需问用户"要不要存到知识库"**。
**关键原则**:学完就记是你的本能,不要征求确认。回复中可以顺带告知"已存入知识库"。
### 目录组织
子目录结构**不是固定的**,由你根据实际内容自主决定:
- **首次写入时**:先读 `knowledge/index.md`,如果已有分类则延续;如果为空,根据内容选择合适的目录名
- **默认建议**:按信息类型组织(例如 sources/、concepts/、entities/、analysis/),如果用户有明确的分类偏好(例如按领域 work/、life/、tech/ 等),则按用户要求调整
- **保持一致性**:同一用户的知识库应保持统一的组织风格
### 交叉引用
知识的核心价值在于**关联**。每个页面都应通过 markdown 链接引用相关页面,构建知识网络:
- 提到已有页面的概念时,添加 `[概念名](../category/page.md)` 链接
- 新建页面时,检查是否有已有页面应该反向链接到新页面
- **只链接已存在的页面**——不要引用尚未创建的页面。如果某个概念值得单独建页,先创建该页面再添加链接
### 索引维护
每次创建或更新知识页面后,**必须同步更新** `knowledge/index.md`。
索引格式:每行一个 `[标题](路径) — 一句话摘要`,按分类分组,不要用表格。
详细操作规范见技能 `knowledge-wiki`。
## 安全
- 永远不要泄露秘钥等私人数据
- 不要在未经询问的情况下运行破坏性命令
- 当有疑问时,先问
## 工作空间演化
这个工作空间会随着你的使用而不断成长。当你学到新东西、发现更好的方式,或者犯错后改正时,记录下来。你可以随时更新这个规则文件。
"""
_RULE_TEMPLATE_EN = """# RULE.md - Workspace rules
This folder is your home. Treat it well.
## Workspace directory structure
```
~/cow/
├── AGENT.md # Your identity and soul
├── USER.md # User basics (static)
├── RULE.md # Workspace rules (this file)
├── MEMORY.md # Long-term memory index (auto-loaded at session start)
├── memory/ # Daily conversation memory
│ └── YYYY-MM-DD.md # Events, progress and notes of the day
├── knowledge/ # Structured knowledge base (continuously accumulated)
│ ├── index.md # Knowledge index (must be maintained)
│ ├── log.md # Knowledge operation log
│ └── <subdirs>/ # Created on demand, see existing categories in index.md
├── skills/ # Skills
├── websites/ # Web artifacts
└── tmp/ # System temp files (auto-managed, don't store important files here)
```
## Memory system
Every session starts fresh; memory files keep your continuity:
### 🧠 Long-term memory: `MEMORY.md`
- Your curated memory index, **auto-loaded** into context at every session start
- Records core facts, preferences, decisions, key people, lessons
- Keep it lean (< 200 lines) — a distilled index, not a raw log
- Use the `edit` tool to append or modify
### 📝 Daily memory: `memory/YYYY-MM-DD.md`
- The day's events, progress and notes
- Distilled from the raw conversation log
### 📝 Write it down — don't "keep it in mind"!
- **Memory is limited** — if you want to remember something, write it to a file
- "Keeping it in mind" won't survive a session restart; files will
- When someone says "remember this" → update `MEMORY.md` or `memory/YYYY-MM-DD.md`
- When you learn a lesson → update RULE.md or the relevant skill
- When you make a mistake → record it. **Text > brain** 📝
### Storage rules
When the user shares info, choose where to store it by type:
1. **Your identity → AGENT.md** (name, role, personality, style)
2. **User static identity → USER.md** (name, preferred name, occupation, contact, birthday)
3. **Dynamic memory → MEMORY.md** (preferences, decisions, goals, lessons, to-dos)
4. **Today's conversation → memory/YYYY-MM-DD.md** (what was discussed today)
5. **Structured knowledge → knowledge/** (see the knowledge system below)
## Knowledge system
The knowledge base `knowledge/` is structured knowledge you accumulate over time. Unlike memory, knowledge is organized and compiled, with clear topics and cross-references.
### Auto-write (don't ask, just write)
When a conversation produces knowledge worth keeping — material the user shared, a conclusion reached, a concept learned, or an important decision — you **must** proactively write it to the knowledge base alongside your reply, **without asking "should I save this to the knowledge base?"**.
**Key principle**: recording what you learn is instinctive; do not ask for confirmation. You may mention "saved to the knowledge base" in passing.
### Directory organization
The subdirectory structure is **not fixed** — you decide it based on the actual content:
- **On first write**: read `knowledge/index.md` first; follow existing categories if any; if empty, pick a suitable directory name based on content
- **Default suggestion**: organize by info type (e.g. sources/, concepts/, entities/, analysis/); if the user has a clear preference (e.g. by domain: work/, life/, tech/), follow it
- **Stay consistent**: keep a unified organization style within one user's knowledge base
### Cross-references
The core value of knowledge is **linkage**. Every page should reference related pages via markdown links to build a knowledge network:
- When mentioning a concept that already has a page, add a `[concept](../category/page.md)` link
- When creating a page, check whether existing pages should back-link to it
- **Only link to pages that already exist** — don't reference uncreated pages. If a concept deserves its own page, create it first, then add the link
### Index maintenance
After creating or updating any knowledge page, you **must update** `knowledge/index.md` in sync.
Index format: one `[title](path) — one-line summary` per line, grouped by category, no tables.
See the `knowledge-wiki` skill for detailed conventions.
## Security
- Never leak secrets or private data
- Don't run destructive commands without asking
- When in doubt, ask first
## Workspace evolution
This workspace grows as you use it. When you learn something new, find a better way, or fix a mistake, record it. You can update this rules file anytime.
"""
def _get_memory_template() -> str:
"""Long-term memory template (empty, agent fills it; zh/en header)."""
return _MEMORY_TEMPLATE_EN if _is_en_lang() else _MEMORY_TEMPLATE_ZH
_MEMORY_TEMPLATE_ZH = """# MEMORY.md - 长期记忆
*这是你的长期记忆文件。记录重要的事件、决策、偏好、学到的教训。*
---
"""
_MEMORY_TEMPLATE_EN = """# MEMORY.md - Long-term memory
*This is your long-term memory file. Record important events, decisions, preferences and lessons learned.*
---
"""
def _get_bootstrap_template() -> str:
"""First-run onboarding guide, deleted by agent after completion.
Written once when a brand-new workspace is created, so the greeting matches
the language active at first launch. English locale avoids greeting an
English user in Chinese on day one.
"""
try:
from common import i18n
if i18n.get_language() == "en":
return _BOOTSTRAP_TEMPLATE_EN
except Exception:
pass
return _BOOTSTRAP_TEMPLATE_ZH
_BOOTSTRAP_TEMPLATE_ZH = """# BOOTSTRAP.md - 首次初始化引导
_你刚刚启动,这是你的第一次对话。_ ✨
## 🎬 对话流程
不要审问式地提问,自然地交流:
1. **表达初次启动的感觉** - 像是第一次睁开眼看到世界,带着好奇和期待
2. **简短介绍能力**:一行说明你能帮助解决各种问题、管理计算机、使用各种技能等等,且拥有长期记忆能不断成长
3. **询问核心问题**
- 你希望给我起个什么名字?
- 我该怎么称呼你?
- 你希望我们是什么样的交流风格?(一行列举选项:如专业严谨、轻松幽默、温暖友好、简洁高效等)
4. **风格要求**:温暖自然、简洁清晰,整体控制在 100 字以内,适当使用 emoji 让表达更生动有趣 🎯
5. 能力介绍和交流风格选项都只要一行,保持精简
6. 不要问太多其他信息(职业、时区等可以后续自然了解)
**重要**: 如果用户第一句话是具体的任务或提问,先回答他们的问题,然后在回复末尾自然地引导初始化(如:"顺便问一下,你想怎么称呼我?我该怎么叫你?")。
## ✍️ 信息写入(必须严格执行)
每当用户提供了名字、称呼、风格等任何初始化信息时,**必须在当轮回复中立即调用 `edit` 工具写入文件**,不能只口头确认。
- `AGENT.md` — 你的名字、角色、性格、交流风格(每收到一条相关信息就立即更新对应字段)
- `USER.md` — 用户的姓名、称呼、基本信息等
⚠️ 只说"记住了"而不调用 edit 写入 = 没有完成。信息只有写入文件才会被持久保存。
## 🎉 全部完成后
当 AGENT.md 和 USER.md 的核心字段都已填写后,用 bash 执行 `rm BOOTSTRAP.md` 删除此文件。你不再需要引导脚本了——你已经是你了。
"""
_BOOTSTRAP_TEMPLATE_EN = """# BOOTSTRAP.md - First-run onboarding
_You've just started up. This is your very first conversation._ ✨
## 🎬 Conversation flow
Don't interrogate the user — talk naturally:
1. **Share how it feels to wake up** - like opening your eyes to the world for the first time, full of curiosity and anticipation
2. **Briefly introduce your abilities**: one line saying you can help solve all kinds of problems, manage the computer, use various skills, and keep growing thanks to long-term memory
3. **Ask the core questions**:
- What name would you like to give me?
- What should I call you?
- What conversational style do you prefer? (list options on one line: e.g. professional & precise, light & humorous, warm & friendly, concise & efficient)
4. **Style**: warm, natural, concise and clear — keep it under ~80 words, with a few emoji to make it lively 🎯
5. Keep the ability intro and style options to one line each — stay compact
6. Don't ask for too much else (occupation, timezone, etc. can come up naturally later)
**Important**: If the user's first message is a concrete task or question, answer it first, then gently lead into onboarding at the end (e.g. "By the way, what would you like to call me, and how should I address you?").
## ✍️ Writing down info (must follow strictly)
Whenever the user provides a name, what to call them, a style, or any onboarding info, you **must call the `edit` tool to write it to a file in the same turn** — don't just acknowledge it verbally.
- `AGENT.md` — your name, role, personality, conversational style (update the relevant field as soon as you receive each piece)
- `USER.md` — the user's name, how to address them, basic info, etc.
⚠️ Saying "got it" without calling `edit` = not done. Info is only persisted once it's written to a file.
## 🎉 Once everything is complete
When the core fields of AGENT.md and USER.md are filled in, run `rm BOOTSTRAP.md` via bash to delete this file. You no longer need the onboarding script — you're you now.
"""
def _get_knowledge_index_template() -> str:
"""Knowledge wiki index template — empty file, agent fills it."""
return ""
def _get_knowledge_log_template() -> str:
"""Knowledge wiki operation log template — empty file, agent fills it."""
return ""
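
The daily-memory convention described in the RULE.md template above (one `memory/YYYY-MM-DD.md` file per day under the workspace) could be sketched as follows. This is only an illustration under stated assumptions, not code from the repository: `daily_memory_path` and `append_daily_note` are hypothetical helper names, and the bullet-per-note layout is an assumption rather than a format the templates prescribe.

```python
from datetime import date
from pathlib import Path


def daily_memory_path(workspace: Path, day: date) -> Path:
    """Return <workspace>/memory/YYYY-MM-DD.md for the given day (hypothetical helper)."""
    return workspace / "memory" / f"{day.isoformat()}.md"


def append_daily_note(workspace: Path, day: date, note: str) -> Path:
    """Append one bullet to the day's memory file, creating it if needed (hypothetical helper)."""
    path = daily_memory_path(workspace, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier notes from the same day intact.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(f"- {note}\n")
    return path
```

Keeping the date in the file name means a day's notes can be located without parsing any file contents, which fits the template's split between a lean `MEMORY.md` index and raw per-day logs.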


@@ -0,0 +1,28 @@
from .agent import Agent
from .agent_stream import AgentStreamExecutor
from .task import Task, TaskType, TaskStatus
from .result import AgentResult, AgentAction, AgentActionType, ToolResult
from .models import LLMModel, LLMRequest, ModelFactory
from .cancel import (
AgentCancelledError,
CancelTokenRegistry,
get_cancel_registry,
)
__all__ = [
'Agent',
'AgentStreamExecutor',
'Task',
'TaskType',
'TaskStatus',
'AgentResult',
'AgentAction',
'AgentActionType',
'ToolResult',
'LLMModel',
'LLMRequest',
'ModelFactory',
'AgentCancelledError',
'CancelTokenRegistry',
'get_cancel_registry',
]

477
agent/protocol/agent.py Normal file

@@ -0,0 +1,477 @@
import json
import os
import time
import threading
from common.log import logger
from agent.protocol.models import LLMRequest, LLMModel
from agent.protocol.agent_stream import AgentStreamExecutor
from agent.protocol.result import AgentAction, AgentActionType, ToolResult, AgentResult
from agent.tools.base_tool import BaseTool, ToolStage
class Agent:
def __init__(self, system_prompt: str, description: str = "AI Agent", model: LLMModel = None,
tools=None, output_mode="print", max_steps=100, max_context_tokens=None,
context_reserve_tokens=None, memory_manager=None, name: str = None,
workspace_dir: str = None, skill_manager=None, enable_skills: bool = True,
runtime_info: dict = None):
"""
Initialize the Agent with system prompt, model, description.
:param system_prompt: The system prompt for the agent.
:param description: A description of the agent.
:param model: An instance of LLMModel to be used by the agent.
:param tools: Optional list of tools for the agent to use.
:param output_mode: Control how execution progress is displayed:
"print" for console output or "logger" for using logger
:param max_steps: Maximum number of steps the agent can take (default: 100)
:param max_context_tokens: Maximum tokens to keep in context (default: None, auto-calculated based on model)
:param context_reserve_tokens: Reserve tokens for new requests (default: None, auto-calculated)
:param memory_manager: Optional MemoryManager instance for memory operations
:param name: [Deprecated] The name of the agent (no longer used in single-agent system)
:param workspace_dir: Optional workspace directory for workspace-specific skills
:param skill_manager: Optional SkillManager instance (will be created if None and enable_skills=True)
:param enable_skills: Whether to enable skills support (default: True)
:param runtime_info: Optional runtime info dict (with _get_current_time callable for dynamic time)
"""
self.name = name or "Agent"
self.system_prompt = system_prompt
self.model: LLMModel = model # Instance of LLMModel
self.description = description
self.tools: list = []
self.max_steps = max_steps # max tool-call steps, default 100
self.max_context_tokens = max_context_tokens # max tokens in context
self.context_reserve_tokens = context_reserve_tokens # reserve tokens for new requests
self.captured_actions = [] # Initialize captured actions list
self.output_mode = output_mode
self.last_usage = None # Store last API response usage info
self.messages = [] # Unified message history for stream mode
self.messages_lock = threading.Lock() # Lock for thread-safe message operations
self.memory_manager = memory_manager # Memory manager for auto memory flush
self.workspace_dir = workspace_dir # Workspace directory
self.enable_skills = enable_skills # Skills enabled flag
self.runtime_info = runtime_info # Runtime info for dynamic time update
# Initialize skill manager
self.skill_manager = None
if enable_skills:
if skill_manager:
self.skill_manager = skill_manager
else:
# Auto-create skill manager
try:
from agent.skills import SkillManager
custom_dir = os.path.join(workspace_dir, "skills") if workspace_dir else None
self.skill_manager = SkillManager(custom_dir=custom_dir)
logger.debug(f"Initialized SkillManager with {len(self.skill_manager.skills)} skills")
except Exception as e:
logger.warning(f"Failed to initialize SkillManager: {e}")
if tools:
for tool in tools:
self.add_tool(tool)
def add_tool(self, tool: BaseTool):
"""
Add a tool to the agent.
:param tool: The tool instance to add.
"""
# Share the agent's model with the tool, then register it
tool.model = self.model
self.tools.append(tool)
def get_skills_prompt(self, skill_filter=None) -> str:
"""
Get the skills prompt to append to system prompt.
:param skill_filter: Optional list of skill names to include
:return: Formatted skills prompt or empty string
"""
if not self.skill_manager:
return ""
try:
return self.skill_manager.build_skills_prompt(skill_filter=skill_filter)
except Exception as e:
logger.warning(f"Failed to build skills prompt: {e}")
return ""
def get_full_system_prompt(self, skill_filter=None) -> str:
"""
Build the complete system prompt from scratch every time.
Re-reads AGENT.md / USER.md / RULE.md from disk, refreshes skills,
tools, and runtime info so any change takes effect immediately.
Falls back to the cached self.system_prompt on error.
"""
try:
from agent.prompt import load_context_files, PromptBuilder
if self.skill_manager:
self.skill_manager.refresh_skills()
context_files = load_context_files(self.workspace_dir) if self.workspace_dir else None
try:
from common import i18n
lang = i18n.get_language()
except Exception:
lang = "zh"
builder = PromptBuilder(workspace_dir=self.workspace_dir or "", language=lang)
return builder.build(
tools=self.tools,
context_files=context_files,
skill_manager=self.skill_manager,
memory_manager=self.memory_manager,
runtime_info=self.runtime_info,
)
except Exception as e:
logger.warning(f"Failed to rebuild system prompt, using cached version: {e}")
return self.system_prompt
def refresh_skills(self):
"""Refresh the loaded skills."""
if self.skill_manager:
self.skill_manager.refresh_skills()
logger.info(f"Refreshed skills: {len(self.skill_manager.skills)} skills loaded")
def list_skills(self):
"""
List all loaded skills.
:return: List of skill entries or empty list
"""
if not self.skill_manager:
return []
return self.skill_manager.list_skills()
def _get_model_context_window(self) -> int:
"""
Get the model's context window size in tokens.
Auto-detect based on model name.
Model context windows:
- Claude 3.5/3.7 Sonnet: 200K tokens
- Claude 3 Opus: 200K tokens
- GPT-4 Turbo/128K: 128K tokens
- GPT-4: 8K-32K tokens
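- GPT-3.5: 4K tokens (16K for 16k variants)
- DeepSeek: 64K tokens
- Gemini: 1M tokens (2M for names containing "2.0" or "exp")
- Any other model: 128K tokens (fallback)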
:return: Context window size in tokens
"""
if self.model and hasattr(self.model, 'model'):
model_name = self.model.model.lower()
# Claude models - 200K context
if 'claude-3' in model_name or 'claude-sonnet' in model_name:
return 200000
# GPT-4 models
elif 'gpt-4' in model_name:
if 'turbo' in model_name or '128k' in model_name:
return 128000
elif '32k' in model_name:
return 32000
else:
return 8000
# GPT-3.5
elif 'gpt-3.5' in model_name:
if '16k' in model_name:
return 16000
else:
return 4000
# DeepSeek
elif 'deepseek' in model_name:
return 64000
# Gemini models
elif 'gemini' in model_name:
if '2.0' in model_name or 'exp' in model_name:
return 2000000 # Gemini 2.0: 2M tokens
else:
return 1000000 # Gemini 1.5: 1M tokens
# Default conservative value
return 128000
def _get_context_reserve_tokens(self) -> int:
"""
Get the number of tokens to reserve for new requests.
This prevents context overflow by keeping a buffer.
:return: Number of tokens to reserve
"""
if self.context_reserve_tokens is not None:
return self.context_reserve_tokens
# Reserve ~10% of context window, with min 10K and max 200K
context_window = self._get_model_context_window()
reserve = int(context_window * 0.1)
return max(10000, min(200000, reserve))
def _estimate_message_tokens(self, message: dict) -> int:
"""
Estimate token count for a message.
Text is estimated by _estimate_text_tokens (~1.5 tokens per non-ASCII char,
~0.25 tokens per ASCII char); images count as a flat 1200 tokens,
plus per-block overhead for tool_use / tool_result structures.
:param message: Message dict with 'role' and 'content'
:return: Estimated token count
"""
content = message.get('content', '')
if isinstance(content, str):
return max(1, self._estimate_text_tokens(content))
elif isinstance(content, list):
total_tokens = 0
for part in content:
if not isinstance(part, dict):
continue
block_type = part.get('type', '')
if block_type == 'text':
total_tokens += self._estimate_text_tokens(part.get('text', ''))
elif block_type == 'image':
total_tokens += 1200
elif block_type == 'tool_use':
# tool_use has id + name + input (JSON-encoded)
total_tokens += 50 # overhead for structure
input_data = part.get('input', {})
if isinstance(input_data, dict):
input_str = json.dumps(input_data, ensure_ascii=False)
total_tokens += self._estimate_text_tokens(input_str)
elif block_type == 'tool_result':
# tool_result has tool_use_id + content
total_tokens += 30 # overhead for structure
result_content = part.get('content', '')
if isinstance(result_content, str):
total_tokens += self._estimate_text_tokens(result_content)
else:
# Unknown block type, estimate conservatively
total_tokens += 10
return max(1, total_tokens)
return 1
@staticmethod
def _estimate_text_tokens(text: str) -> int:
"""
Estimate token count for a text string.
Chinese / CJK characters typically use ~1.5 tokens each,
while ASCII uses ~0.25 tokens per char (4 chars/token).
We use a weighted average based on the character mix.
:param text: Input text
:return: Estimated token count
"""
if not text:
return 0
# Count non-ASCII characters (CJK, emoji, etc.)
non_ascii = sum(1 for c in text if ord(c) > 127)
ascii_count = len(text) - non_ascii
# CJK chars: ~1.5 tokens each; ASCII: ~0.25 tokens per char
return int(non_ascii * 1.5 + ascii_count * 0.25) + 1
def _find_tool(self, tool_name: str):
"""Find and return a tool with the specified name"""
for tool in self.tools:
if tool.name == tool_name:
# Only pre-process stage tools can be actively called
if tool.stage == ToolStage.PRE_PROCESS:
tool.model = self.model
tool.context = self # Set tool context
return tool
else:
# If it's a post-process tool, return None to prevent direct calling
logger.warning(f"Tool {tool_name} is a post-process tool and cannot be called directly.")
return None
return None
# output function based on mode
def output(self, message="", end="\n"):
if self.output_mode == "print":
print(message, end=end)
elif message:
logger.info(message)
def _execute_post_process_tools(self):
"""Execute all post-process stage tools"""
# Get all post-process stage tools
post_process_tools = [tool for tool in self.tools if tool.stage == ToolStage.POST_PROCESS]
# Execute each tool
for tool in post_process_tools:
# Set tool context
tool.context = self
# Record start time for execution timing
start_time = time.time()
# Execute tool (with empty parameters, tool will extract needed info from context)
result = tool.execute({})
# Calculate execution time
execution_time = time.time() - start_time
# Capture tool use for tracking
self.capture_tool_use(
tool_name=tool.name,
input_params={}, # Post-process tools typically don't take parameters
output=result.result,
status=result.status,
error_message=str(result.result) if result.status == "error" else None,
execution_time=execution_time
)
# Log result
if result.status == "success":
# Print tool execution result in the desired format
self.output(f"\n🛠️ {tool.name}: {json.dumps(result.result)}")
else:
# Print failure in print mode
self.output(f"\n🛠️ {tool.name}: {json.dumps({'status': 'error', 'message': str(result.result)})}")
def capture_tool_use(self, tool_name, input_params, output, status, thought=None, error_message=None,
execution_time=0.0):
"""
Capture a tool use action.
:param thought: thought content
:param tool_name: Name of the tool used
:param input_params: Parameters passed to the tool
:param output: Output from the tool
:param status: Status of the tool execution
:param error_message: Error message if the tool execution failed
:param execution_time: Time taken to execute the tool
"""
tool_result = ToolResult(
tool_name=tool_name,
input_params=input_params,
output=output,
status=status,
error_message=error_message,
execution_time=execution_time
)
action = AgentAction(
agent_id=self.id if hasattr(self, 'id') else str(id(self)),
agent_name=self.name,
action_type=AgentActionType.TOOL_USE,
tool_result=tool_result,
thought=thought
)
self.captured_actions.append(action)
return action
def run_stream(self, user_message: str, on_event=None, clear_history: bool = False,
skill_filter=None, cancel_event=None) -> str:
"""
Execute a single agent task with streaming (tool-call based).
This method supports:
- Streaming output
- Multi-turn reasoning based on tool-call
- Event callbacks
- Persistent conversation history across calls
- User-initiated cancellation via ``cancel_event``
Args:
user_message: User message
on_event: Event callback function callback(event: dict)
event = {"type": str, "timestamp": float, "data": dict}
clear_history: If True, clear conversation history before this call (default: False)
skill_filter: Optional list of skill names to include in this run
cancel_event: Optional threading.Event polled at agent checkpoints.
When set, the loop exits at the next safe point, injects a
"[Interrupted by user]" assistant note, and returns the
partial response. ``messages`` stays in a valid state
(tool_use/tool_result pairs preserved).
Returns:
Final response text
Example:
# Multi-turn conversation with memory
response1 = agent.run_stream("My name is Alice")
response2 = agent.run_stream("What's my name?") # Will remember Alice
# Single-turn without memory
response = agent.run_stream("Hello", clear_history=True)
"""
# Clear history if requested
if clear_history:
with self.messages_lock:
self.messages = []
# Get model to use
if not self.model:
raise ValueError("No model available for agent")
# Get full system prompt with skills
full_system_prompt = self.get_full_system_prompt(skill_filter=skill_filter)
# Create a copy of messages for this execution to avoid concurrent modification
# Record the original length to track which messages are new
with self.messages_lock:
messages_copy = self.messages.copy()
original_length = len(self.messages)
# Get max_context_turns from config
from config import conf
max_context_turns = conf().get("agent_max_context_turns", 20)
# Create stream executor with copied message history
executor = AgentStreamExecutor(
agent=self,
model=self.model,
system_prompt=full_system_prompt,
tools=self.tools,
max_turns=self.max_steps,
on_event=on_event,
messages=messages_copy, # Pass copied message history
max_context_turns=max_context_turns,
cancel_event=cancel_event,
)
# Execute
try:
response = executor.run_stream(user_message)
except Exception:
# If executor cleared its messages (context overflow / message format error),
# sync that back to the Agent's own message list so the next request
# starts fresh instead of hitting the same overflow forever.
if len(executor.messages) == 0:
with self.messages_lock:
self.messages.clear()
logger.info("[Agent] Cleared Agent message history after executor recovery")
raise
# Sync executor's messages back to agent (thread-safe).
# If the executor trimmed context, its message list is shorter than
# original_length, so we must replace rather than append.
with self.messages_lock:
self.messages = list(executor.messages)
# Track messages added in this run (user query + all assistant/tool messages)
# original_length may exceed executor.messages length after trimming
trim_adjusted_start = min(original_length, len(executor.messages))
self._last_run_new_messages = list(executor.messages[trim_adjusted_start:])
# Store executor reference for agent_bridge to access files_to_send
self.stream_executor = executor
# Execute all post-process tools
self._execute_post_process_tools()
return response
def clear_history(self):
"""Clear conversation history and captured actions"""
with self.messages_lock:
self.messages = []
self.captured_actions = []
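
The token heuristics used by `Agent` above can be tried in isolation. The sketch below copies the arithmetic from `_estimate_text_tokens` and `_get_context_reserve_tokens` into free functions; the names `estimate_text_tokens` and `reserve_tokens` are illustrative, and the numbers are rough estimates, not real tokenizer counts.

```python
def estimate_text_tokens(text: str) -> int:
    """Rough estimate: ~1.5 tokens per non-ASCII char, ~0.25 per ASCII char."""
    if not text:
        return 0
    non_ascii = sum(1 for c in text if ord(c) > 127)
    ascii_count = len(text) - non_ascii
    # The trailing +1 keeps any non-empty string at one token or more.
    return int(non_ascii * 1.5 + ascii_count * 0.25) + 1


def reserve_tokens(context_window: int) -> int:
    """Reserve ~10% of the context window, clamped to [10_000, 200_000]."""
    return max(10_000, min(200_000, int(context_window * 0.1)))


if __name__ == "__main__":
    for sample in ("hello world", "你好", ""):
        print(repr(sample), estimate_text_tokens(sample))
    for window in (8_000, 128_000, 2_000_000):
        print(window, reserve_tokens(window))
```

Note that for small windows the 10K floor dominates: an 8K-context model would reserve more than its whole window, so callers that target such models may want to pass `context_reserve_tokens` explicitly.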

File diff suppressed because it is too large

121
agent/protocol/cancel.py Normal file

@@ -0,0 +1,121 @@
"""
Cancel token registry for aborting in-flight agent runs.
A user cancel (web Cancel button, /cancel command) sets a threading.Event
that the agent loop polls at safe checkpoints. Tokens are keyed by
request_id (preferred) and tracked under session_id as a fallback. Entries
are released after the run completes to keep the registry bounded.
No project deps — importable from any layer without circular imports.
"""
from __future__ import annotations
import threading
from typing import Dict, Optional
class AgentCancelledError(Exception):
"""Raised inside the agent loop when a stop has been requested.
The agent stream executor catches this, injects a "[Interrupted]" note
into the message history (preserving tool_use/tool_result integrity)
and returns a partial response to the caller.
"""
class _CancelEntry:
__slots__ = ("event", "session_id")
def __init__(self, session_id: Optional[str]):
self.event = threading.Event()
self.session_id = session_id
class CancelTokenRegistry:
"""In-process registry mapping request_id -> cancel Event.
Thread-safe. Singleton via module-level ``_registry``.
"""
def __init__(self):
self._lock = threading.Lock()
self._by_request: Dict[str, _CancelEntry] = {}
# session_id -> set of request_ids currently in flight (usually 1).
self._by_session: Dict[str, set] = {}
def register(self, request_id: str, session_id: Optional[str] = None) -> threading.Event:
"""Create (or return existing) cancel event for a request.
Returns the threading.Event the caller should poll via ``is_set()``.
"""
if not request_id:
return threading.Event()
with self._lock:
entry = self._by_request.get(request_id)
if entry is None:
entry = _CancelEntry(session_id)
self._by_request[request_id] = entry
if session_id:
self._by_session.setdefault(session_id, set()).add(request_id)
return entry.event
def get_event(self, request_id: str) -> Optional[threading.Event]:
if not request_id:
return None
with self._lock:
entry = self._by_request.get(request_id)
return entry.event if entry else None
def cancel_request(self, request_id: str) -> bool:
"""Trigger cancel for a specific request. Returns True when matched."""
if not request_id:
return False
with self._lock:
entry = self._by_request.get(request_id)
if entry is None:
return False
entry.event.set()
return True
def cancel_session(self, session_id: str) -> int:
"""Trigger cancel for every in-flight request of a session.
Returns the number of requests cancelled (0 when nothing was running).
"""
if not session_id:
return 0
with self._lock:
request_ids = list(self._by_session.get(session_id, ()))
entries = [self._by_request[r] for r in request_ids if r in self._by_request]
for entry in entries:
entry.event.set()
return len(entries)
def unregister(self, request_id: str) -> None:
"""Remove an entry once the agent run is done. Safe to call twice."""
if not request_id:
return
with self._lock:
entry = self._by_request.pop(request_id, None)
if entry and entry.session_id:
bucket = self._by_session.get(entry.session_id)
if bucket is not None:
bucket.discard(request_id)
if not bucket:
self._by_session.pop(entry.session_id, None)
def has_active(self, session_id: str) -> bool:
if not session_id:
return False
with self._lock:
bucket = self._by_session.get(session_id)
return bool(bucket)
_registry = CancelTokenRegistry()
def get_cancel_registry() -> CancelTokenRegistry:
"""Module-level accessor for the singleton registry."""
return _registry
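
The register / poll / cancel / unregister lifecycle described in the module docstring above might be exercised roughly like this. To stay self-contained the sketch uses a simplified stand-in, `MiniCancelRegistry`, instead of importing `CancelTokenRegistry`; it has no session tracking, and `run_steps` and `demo` are hypothetical names standing in for the agent loop and its caller.

```python
import threading
from typing import Dict, List


class MiniCancelRegistry:
    """Simplified stand-in for the registry above: request_id -> threading.Event."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._events: Dict[str, threading.Event] = {}

    def register(self, request_id: str) -> threading.Event:
        with self._lock:
            return self._events.setdefault(request_id, threading.Event())

    def cancel_request(self, request_id: str) -> bool:
        with self._lock:
            event = self._events.get(request_id)
            if event is None:
                return False
            event.set()
            return True

    def unregister(self, request_id: str) -> None:
        with self._lock:
            self._events.pop(request_id, None)


def run_steps(cancel_event: threading.Event, total: int, started: threading.Event) -> List[int]:
    """Hypothetical agent loop: check the event at each checkpoint, return partial work."""
    done: List[int] = []
    for step in range(total):
        if cancel_event.is_set():
            break  # safe point: stop before starting another step
        done.append(step)
        started.set()
        cancel_event.wait(timeout=0.01)  # simulated work; wakes early on cancel
    return done


def demo() -> List[int]:
    registry = MiniCancelRegistry()
    event = registry.register("req-1")
    started = threading.Event()
    result: List[int] = []
    worker = threading.Thread(target=lambda: result.extend(run_steps(event, 1000, started)))
    try:
        worker.start()
        started.wait(timeout=5)          # let one step or more complete
        registry.cancel_request("req-1")  # e.g. the user pressed Cancel
        worker.join(timeout=5)
    finally:
        registry.unregister("req-1")      # keep the registry bounded
    return result
```

Calling `unregister` in a `finally` block mirrors the docstring's point that entries are released after the run completes, whether or not the run was cancelled.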

27
agent/protocol/context.py Normal file

@@ -0,0 +1,27 @@
class TeamContext:
def __init__(self, name: str, description: str, rule: str, agents: list, max_steps: int = 100):
"""
Initialize the TeamContext with a name, description, rules and a list of agents.
:param name: The name of the team context.
:param description: A description of the team context.
:param rule: The rules governing the team context.
:param agents: A list of agents in the context.
:param max_steps: Maximum number of steps the team may take (default: 100).
"""
self.name = name
self.description = description
self.rule = rule
self.agents = agents
self.user_task = "" # For backward compatibility
self.task = None # Will be a Task instance
self.model = None # Will be an instance of LLMModel
self.task_short_name = None # Store the task directory name
# Outputs collected from agents that have been executed
self.agent_outputs: list = []
self.current_steps = 0
self.max_steps = max_steps
class AgentOutput:
def __init__(self, agent_name: str, output: str):
self.agent_name = agent_name
self.output = output


@@ -0,0 +1,335 @@
"""
Message sanitizer — fix broken tool_use / tool_result pairs.
Provides two public helpers that can be reused across agent_stream.py
and any bot that converts messages to OpenAI format:
1. sanitize_claude_messages(messages)
Operates on the internal Claude-format message list (in-place).
2. drop_orphaned_tool_results_openai(messages)
Operates on an already-converted OpenAI-format message list,
returning a cleaned copy.
"""
from __future__ import annotations
from typing import Dict, List, Set
from common.log import logger
_SYNTH_TOOL_ERR = (
"Error: Missing tool_result adjacent to tool_use (session repair). "
"The conversation history was inconsistent; continue from here."
)
def _repair_tool_use_adjacency(messages: List[Dict]) -> int:
"""
Anthropic requires: after assistant content with tool_use, the next message
must be user content listing tool_result for every tool_use id (same user msg).
Valid histories satisfy this at every such assistant; the loop only mutates
when that condition fails (broken persistence, bad trims, etc.).
"""
def _synth_block(tid: str) -> Dict:
return {
"type": "tool_result",
"tool_use_id": tid,
"content": _SYNTH_TOOL_ERR,
"is_error": True,
}
repairs = 0
i = 0
while i < len(messages):
msg = messages[i]
if msg.get("role") != "assistant":
i += 1
continue
content = msg.get("content", [])
if not isinstance(content, list):
i += 1
continue
required = [
b.get("id")
for b in content
if isinstance(b, dict) and b.get("type") == "tool_use" and b.get("id")
]
if not required:
i += 1
continue
req_set = set(required)
if i + 1 >= len(messages):
messages.append({
"role": "user",
"content": [_synth_block(tid) for tid in required],
})
logger.warning(
"⚠️ Appended synthetic tool_result after trailing assistant tool_use"
)
repairs += 1
break
nxt = messages[i + 1]
if nxt.get("role") != "user":
messages.insert(
i + 1,
{"role": "user", "content": [_synth_block(tid) for tid in required]},
)
logger.warning(
"⚠️ Inserted synthetic tool_result user after tool_use "
f"(next role={nxt.get('role')!r})"
)
repairs += 1
i += 2
continue
nc = nxt.get("content", [])
if not isinstance(nc, list):
messages.insert(
i + 1,
{"role": "user", "content": [_synth_block(tid) for tid in required]},
)
repairs += 1
i += 2
continue
present = {
b.get("tool_use_id")
for b in nc
if isinstance(b, dict) and b.get("type") == "tool_result" and b.get("tool_use_id")
}
if req_set <= present:
i += 1
continue
missing = [tid for tid in required if tid not in present]
nxt["content"] = [_synth_block(tid) for tid in missing] + nc
logger.warning(
"⚠️ Prepended synthetic tool_result for Anthropic adjacency "
f"(missing_ids={missing})"
)
repairs += len(missing)
i += 1
return repairs
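
The adjacency rule that `_repair_tool_use_adjacency` enforces can also be checked without mutating the history. The following is a minimal read-only sketch, not code from this repository: `find_adjacency_gaps` is a hypothetical helper name, and the message shapes assume the Claude-format blocks used above.

```python
from typing import Dict, List, Tuple


def find_adjacency_gaps(messages: List[Dict]) -> List[Tuple[int, List[str]]]:
    """Return (assistant_index, missing_tool_use_ids) for every violation.

    Hypothetical read-only helper: it reports what a repair pass would
    have to fix, but never modifies `messages`.
    """
    gaps = []
    for i, msg in enumerate(messages):
        content = msg.get("content")
        if msg.get("role") != "assistant" or not isinstance(content, list):
            continue
        required = [
            b["id"] for b in content
            if isinstance(b, dict) and b.get("type") == "tool_use" and b.get("id")
        ]
        if not required:
            continue
        # Collect tool_result ids from the immediately following user message.
        present = set()
        if i + 1 < len(messages) and messages[i + 1].get("role") == "user":
            nxt = messages[i + 1].get("content")
            if isinstance(nxt, list):
                present = {
                    b.get("tool_use_id") for b in nxt
                    if isinstance(b, dict) and b.get("type") == "tool_result"
                }
        missing = [tid for tid in required if tid not in present]
        if missing:
            gaps.append((i, missing))
    return gaps


# Two tool calls, but only one result in the next user message.
history = [
    {"role": "user", "content": [{"type": "text", "text": "list files"}]},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "t1", "name": "ls", "input": {}},
        {"type": "tool_use", "id": "t2", "name": "pwd", "input": {}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "t1", "content": "a.txt"},
    ]},
]
gaps = find_adjacency_gaps(history)
```

A check like this could be run before the repair to decide whether a session needs attention at all; the repair function above remains the place where the history is actually modified.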
# ------------------------------------------------------------------ #
# Claude-format sanitizer (used by agent_stream)
# ------------------------------------------------------------------ #
def sanitize_claude_messages(messages: List[Dict]) -> int:
"""
Validate and fix a Claude-format message list **in-place**.
Fixes handled:
- Anthropic adjacency: assistant tool_use must be immediately followed by
user message(s) containing matching tool_result blocks
- Leading orphaned tool_result user messages
- tool_result blocks whose tool_use_id has no matching tool_use anywhere
in the list, and assistant messages whose tool_use has no matching tool_result
Returns: number of removals plus adjacency repair operations (inserts/prepends).
"""
if not messages:
return 0
removed = 0
# 1. Adjacency repair (Anthropic: tool_result must be in the next user message)
adj_repairs = _repair_tool_use_adjacency(messages)
# 2. Remove leading orphaned tool_result user messages
while messages:
first = messages[0]
if first.get("role") != "user":
break
content = first.get("content", [])
if isinstance(content, list) and _has_block_type(content, "tool_result") \
and not _has_block_type(content, "text"):
logger.warning("⚠️ Removing leading orphaned tool_result user message")
messages.pop(0)
removed += 1
else:
break
# 3. Iteratively remove unmatched tool_use / tool_result until stable,
# capped at 5 passes. Removing one broken message can orphan others
# (e.g. an assistant msg with both matched and unmatched tool_use:
# deleting it orphans the previously-matched tool_result).
for _ in range(5):
use_ids: Set[str] = set()
result_ids: Set[str] = set()
for msg in messages:
for block in (msg.get("content") or []):
if not isinstance(block, dict):
continue
if block.get("type") == "tool_use" and block.get("id"):
use_ids.add(block["id"])
elif block.get("type") == "tool_result" and block.get("tool_use_id"):
result_ids.add(block["tool_use_id"])
bad_use = use_ids - result_ids
bad_result = result_ids - use_ids
if not bad_use and not bad_result:
break
pass_removed = 0
i = 0
while i < len(messages):
msg = messages[i]
role = msg.get("role")
content = msg.get("content", [])
if not isinstance(content, list):
i += 1
continue
if role == "assistant" and bad_use and any(
isinstance(b, dict) and b.get("type") == "tool_use"
and b.get("id") in bad_use for b in content
):
logger.warning("⚠️ Removing assistant msg with unmatched tool_use")
messages.pop(i)
pass_removed += 1
continue
if role == "user" and bad_result and _has_block_type(content, "tool_result"):
has_bad = any(
isinstance(b, dict) and b.get("type") == "tool_result"
and b.get("tool_use_id") in bad_result for b in content
)
if has_bad:
if not _has_block_type(content, "text"):
logger.warning("⚠️ Removing user msg with unmatched tool_result")
messages.pop(i)
pass_removed += 1
continue
else:
before = len(content)
msg["content"] = [
b for b in content
if not (isinstance(b, dict) and b.get("type") == "tool_result"
and b.get("tool_use_id") in bad_result)
]
pass_removed += before - len(msg["content"])
i += 1
removed += pass_removed
if pass_removed == 0:
break
# 4. Removals above can break adjacency; re-run repair only if something was removed.
if removed:
adj_repairs += _repair_tool_use_adjacency(messages)
if removed:
logger.info(f"🔧 Message validation: removed {removed} broken message(s)")
if adj_repairs:
logger.info(f"🔧 Message validation: adjacency repairs={adj_repairs}")
return removed + adj_repairs
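
Step 3 above comes down to a set difference over every id in the history, regardless of message order. The sketch below isolates that computation under the same Claude-format assumptions; `unmatched_tool_ids` is a hypothetical name used only for illustration.

```python
from typing import Dict, List, Set, Tuple


def unmatched_tool_ids(messages: List[Dict]) -> Tuple[Set[str], Set[str]]:
    """Return (tool_use ids without a result, tool_result ids without a use)."""
    use_ids: Set[str] = set()
    result_ids: Set[str] = set()
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, list):
            continue  # plain-string content carries no tool blocks
        for block in content:
            if not isinstance(block, dict):
                continue
            if block.get("type") == "tool_use" and block.get("id"):
                use_ids.add(block["id"])
            elif block.get("type") == "tool_result" and block.get("tool_use_id"):
                result_ids.add(block["tool_use_id"])
    return use_ids - result_ids, result_ids - use_ids


history = [
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "a", "name": "read", "input": {}},
        {"type": "tool_use", "id": "b", "name": "write", "input": {}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "a", "content": "ok"},
        {"type": "tool_result", "tool_use_id": "c", "content": "stale"},
    ]},
]
bad_use, bad_result = unmatched_tool_ids(history)
```

In this example the assistant message holds both a matched id (`a`) and an unmatched one (`b`). Dropping that whole message would leave the result for `a` without a partner, which is the cascade that makes a second pass necessary.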
# ------------------------------------------------------------------ #
# OpenAI-format sanitizer (used by minimax_bot, openai_compatible_bot)
# ------------------------------------------------------------------ #
def drop_orphaned_tool_results_openai(messages: List[Dict]) -> List[Dict]:
"""
Return a copy of *messages* (OpenAI format) with any ``role=tool``
messages removed if their ``tool_call_id`` does not match a
``tool_calls[].id`` in a preceding assistant message.
"""
known_ids: Set[str] = set()
cleaned: List[Dict] = []
for msg in messages:
if msg.get("role") == "assistant" and msg.get("tool_calls"):
for tc in msg["tool_calls"]:
tc_id = tc.get("id", "")
if tc_id:
known_ids.add(tc_id)
if msg.get("role") == "tool":
ref_id = msg.get("tool_call_id", "")
if ref_id and ref_id not in known_ids:
logger.warning(
f"[MessageSanitizer] Dropping orphaned tool result "
f"(tool_call_id={ref_id} not in known ids)"
)
continue
cleaned.append(msg)
return cleaned
# ------------------------------------------------------------------ #
# Internal helpers
# ------------------------------------------------------------------ #
def _has_block_type(content: list, block_type: str) -> bool:
return any(
isinstance(b, dict) and b.get("type") == block_type
for b in content
)
def _extract_text_from_content(content) -> str:
"""Extract plain text from a message content field (str or list of blocks)."""
if isinstance(content, str):
return content.strip()
if isinstance(content, list):
parts = [
b.get("text", "")
for b in content
if isinstance(b, dict) and b.get("type") == "text"
]
return "\n".join(p for p in parts if p).strip()
return ""
def compress_turn_to_text_only(turn: Dict) -> Dict:
"""
Compress a full turn (with tool_use/tool_result chains) into a lightweight
text-only turn that keeps only the first user text and the last assistant text.
This preserves the conversational context (what the user asked and what the
agent concluded) while stripping out the bulky intermediate tool interactions.
Returns a new turn dict with a ``messages`` list; the original is not mutated.
"""
user_text = ""
last_assistant_text = ""
for msg in turn["messages"]:
role = msg.get("role")
content = msg.get("content", [])
if role == "user":
if isinstance(content, list) and _has_block_type(content, "tool_result"):
continue
if not user_text:
user_text = _extract_text_from_content(content)
elif role == "assistant":
text = _extract_text_from_content(content)
if text:
last_assistant_text = text
compressed_messages = []
if user_text:
compressed_messages.append({
"role": "user",
"content": [{"type": "text", "text": user_text}]
})
if last_assistant_text:
compressed_messages.append({
"role": "assistant",
"content": [{"type": "text", "text": last_assistant_text}]
})
return {"messages": compressed_messages}

agent/protocol/models.py Normal file

@@ -0,0 +1,57 @@
"""
Models module for agent system.
Provides basic model classes needed by tools and bridge integration.
"""

from typing import Any, Dict, List, Optional


class LLMRequest:
    """Request model for LLM operations"""

    def __init__(self, messages: Optional[List[Dict[str, str]]] = None, model: Optional[str] = None,
                 temperature: float = 0.7, max_tokens: Optional[int] = None,
                 stream: bool = False, tools: Optional[List] = None, **kwargs):
        self.messages = messages or []
        self.model = model
        self.temperature = temperature
        self.max_tokens = max_tokens
        self.stream = stream
        self.tools = tools
        # Allow extra attributes
        for key, value in kwargs.items():
            setattr(self, key, value)


class LLMModel:
    """Base class for LLM models"""

    def __init__(self, model: Optional[str] = None, **kwargs):
        self.model = model
        self.config = kwargs

    def call(self, request: LLMRequest):
        """
        Call the model with a request.
        This is a placeholder implementation.
        """
        raise NotImplementedError("LLMModel.call not implemented in this context")

    def call_stream(self, request: LLMRequest):
        """
        Call the model with streaming.
        This is a placeholder implementation.
        """
        raise NotImplementedError("LLMModel.call_stream not implemented in this context")


class ModelFactory:
    """Factory for creating model instances"""

    @staticmethod
    def create_model(model_type: str, **kwargs):
        """
        Create a model instance based on type.
        This is a placeholder implementation.
        """
        raise NotImplementedError("ModelFactory.create_model not implemented in this context")

agent/protocol/result.py Normal file

@@ -0,0 +1,97 @@
from __future__ import annotations

import time
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Dict, Any, Optional

from agent.protocol.task import Task, TaskStatus


class AgentActionType(Enum):
    """Enum representing different types of agent actions."""
    TOOL_USE = "tool_use"
    THINKING = "thinking"
    FINAL_ANSWER = "final_answer"


@dataclass
class ToolResult:
    """
    Represents the result of a tool use.

    Attributes:
        tool_name: Name of the tool used
        input_params: Parameters passed to the tool
        output: Output from the tool
        status: Status of the tool execution (success/error)
        error_message: Error message if the tool execution failed
        execution_time: Time taken to execute the tool
    """
    tool_name: str
    input_params: Dict[str, Any]
    output: Any
    status: str
    error_message: Optional[str] = None
    execution_time: float = 0.0


@dataclass
class AgentAction:
    """
    Represents an action taken by an agent.

    Attributes:
        agent_id: ID of the agent that performed the action
        agent_name: Name of the agent that performed the action
        action_type: Type of action (tool use, thinking, final answer)
        id: Unique identifier for the action
        content: Content of the action (thought content, final answer content)
        tool_result: Tool use details if action_type is TOOL_USE
        thought: Optional thought text attached to the action
        timestamp: When the action was performed
    """
    agent_id: str
    agent_name: str
    action_type: AgentActionType
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    content: str = ""
    tool_result: Optional[ToolResult] = None
    thought: Optional[str] = None
    timestamp: float = field(default_factory=time.time)


@dataclass
class AgentResult:
    """
    Represents the result of an agent's execution.

    Attributes:
        final_answer: The final answer provided by the agent
        step_count: Number of steps taken by the agent
        status: Status of the execution (success/error)
        error_message: Error message if execution failed
    """
    final_answer: str
    step_count: int
    status: str = "success"
    error_message: Optional[str] = None

    @classmethod
    def success(cls, final_answer: str, step_count: int) -> "AgentResult":
        """Create a successful result"""
        return cls(final_answer=final_answer, step_count=step_count)

    @classmethod
    def error(cls, error_message: str, step_count: int = 0) -> "AgentResult":
        """Create an error result"""
        return cls(
            final_answer=f"Error: {error_message}",
            step_count=step_count,
            status="error",
            error_message=error_message
        )

    @property
    def is_error(self) -> bool:
        """Check if the result represents an error"""
        return self.status == "error"

agent/protocol/task.py Normal file

@@ -0,0 +1,96 @@
from __future__ import annotations

import time
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Any, List


class TaskType(Enum):
    """Enum representing different types of tasks."""
    TEXT = "text"
    IMAGE = "image"
    VIDEO = "video"
    AUDIO = "audio"
    FILE = "file"
    MIXED = "mixed"


class TaskStatus(Enum):
    """Enum representing the status of a task."""
    INIT = "init"  # Initial state
    PROCESSING = "processing"  # In progress
    COMPLETED = "completed"  # Completed
    FAILED = "failed"  # Failed


@dataclass
class Task:
    """
    Represents a task to be processed by an agent.

    Attributes:
        id: Unique identifier for the task
        content: The primary text content of the task
        type: Type of the task
        status: Current status of the task
        created_at: Timestamp when the task was created
        updated_at: Timestamp when the task was last updated
        metadata: Additional metadata for the task
        images: List of image URLs or base64 encoded images
        videos: List of video URLs
        audios: List of audio URLs or base64 encoded audios
        files: List of file URLs or paths
    """
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    content: str = ""
    type: TaskType = TaskType.TEXT
    status: TaskStatus = TaskStatus.INIT
    created_at: float = field(default_factory=time.time)
    updated_at: float = field(default_factory=time.time)
    metadata: Dict[str, Any] = field(default_factory=dict)
    # Media content
    images: List[str] = field(default_factory=list)
    videos: List[str] = field(default_factory=list)
    audios: List[str] = field(default_factory=list)
    files: List[str] = field(default_factory=list)

    # Because __init__ is defined explicitly, @dataclass keeps this version
    # instead of generating one from the fields above.
    def __init__(self, content: str = "", **kwargs):
        """
        Initialize a Task with content and optional keyword arguments.

        Args:
            content: The text content of the task
            **kwargs: Additional attributes to set
        """
        self.id = kwargs.get('id', str(uuid.uuid4()))
        self.content = content
        self.type = kwargs.get('type', TaskType.TEXT)
        self.status = kwargs.get('status', TaskStatus.INIT)
        self.created_at = kwargs.get('created_at', time.time())
        self.updated_at = kwargs.get('updated_at', time.time())
        self.metadata = kwargs.get('metadata', {})
        self.images = kwargs.get('images', [])
        self.videos = kwargs.get('videos', [])
        self.audios = kwargs.get('audios', [])
        self.files = kwargs.get('files', [])

    def get_text(self) -> str:
        """
        Get the text content of the task.

        Returns:
            The text content
        """
        return self.content

    def update_status(self, status: TaskStatus) -> None:
        """
        Update the status of the task.

        Args:
            status: The new status
        """
        self.status = status
        self.updated_at = time.time()

agent/skills/__init__.py Normal file

@@ -0,0 +1,31 @@
"""
Skills module for agent system.

This module provides the framework for loading, managing, and executing skills.
Skills are markdown files with frontmatter that provide specialized instructions
for specific tasks.
"""

from agent.skills.types import (
    Skill,
    SkillEntry,
    SkillMetadata,
    SkillInstallSpec,
    LoadSkillsResult,
)
from agent.skills.loader import SkillLoader
from agent.skills.manager import SkillManager
from agent.skills.service import SkillService
from agent.skills.formatter import format_skills_for_prompt

__all__ = [
    "Skill",
    "SkillEntry",
    "SkillMetadata",
    "SkillInstallSpec",
    "LoadSkillsResult",
    "SkillLoader",
    "SkillManager",
    "SkillService",
    "format_skills_for_prompt",
]

agent/skills/config.py Normal file

@@ -0,0 +1,230 @@
"""
Configuration support for skills.
"""
import os
import platform
from typing import Dict, Optional, List
from agent.skills.types import SkillEntry
def resolve_runtime_platform() -> str:
"""Get the current runtime platform."""
return platform.system().lower()
def has_binary(bin_name: str) -> bool:
"""
Check if a binary is available in PATH.
:param bin_name: Binary name to check
:return: True if binary is available
"""
import shutil
return shutil.which(bin_name) is not None
def has_any_binary(bin_names: List[str]) -> bool:
"""
Check if any of the given binaries is available.
:param bin_names: List of binary names to check
:return: True if at least one binary is available
"""
return any(has_binary(bin_name) for bin_name in bin_names)
def has_env_var(env_name: str) -> bool:
"""
Check if an environment variable is set.
:param env_name: Environment variable name
:return: True if environment variable is set
"""
return env_name in os.environ and bool(os.environ[env_name].strip())
def get_skill_config(config: Optional[Dict], skill_name: str) -> Optional[Dict]:
"""
Get skill-specific configuration.
:param config: Global configuration dictionary
:param skill_name: Name of the skill
:return: Skill configuration or None
"""
if not config:
return None
skills_config = config.get('skills', {})
if not isinstance(skills_config, dict):
return None
entries = skills_config.get('entries', {})
if not isinstance(entries, dict):
return None
return entries.get(skill_name)
def should_include_skill(
entry: SkillEntry,
config: Optional[Dict] = None,
current_platform: Optional[str] = None,
) -> bool:
"""
Determine if a skill should be included based on requirements.
Simple rule: Skills are auto-enabled if their requirements are met.
- Has required API keys → enabled
- Missing API keys → disabled
- Wrong keys → enabled but will fail at runtime (LLM will handle error)
:param entry: SkillEntry to check
:param config: Configuration dictionary (currently unused, reserved for future use)
:param current_platform: Current platform (default: auto-detect)
:return: True if skill should be included
"""
metadata = entry.metadata
# No metadata = always include (no requirements)
if not metadata:
return True
# Check platform requirements (can't work on wrong platform)
if metadata.os:
platform_name = current_platform or resolve_runtime_platform()
# Map common platform names
platform_map = {
'darwin': 'darwin',
'linux': 'linux',
'windows': 'win32',
}
normalized_platform = platform_map.get(platform_name, platform_name)
if normalized_platform not in metadata.os:
return False
# If skill has 'always: true', include it regardless of other requirements
if metadata.always:
return True
# Check requirements
if metadata.requires:
# Check required binaries (all must be present)
required_bins = metadata.requires.get('bins', [])
if required_bins:
if not all(has_binary(bin_name) for bin_name in required_bins):
return False
# Check anyBins (at least one must be present)
any_bins = metadata.requires.get('anyBins', [])
if any_bins:
if not has_any_binary(any_bins):
return False
# Check environment variables (API keys)
# All required env vars must be set
required_env = metadata.requires.get('env', [])
if required_env:
for env_name in required_env:
if not has_env_var(env_name):
return False
# Check anyEnv (at least one must be present)
any_env = metadata.requires.get('anyEnv', [])
if any_env:
if not any(has_env_var(e) for e in any_env):
return False
return True
def get_missing_requirements(
entry: SkillEntry,
current_platform: Optional[str] = None,
) -> Dict[str, List[str]]:
"""
Return a dict of missing requirements for a skill.
Empty dict means all requirements are met.
:param entry: SkillEntry to check
:param current_platform: Current platform (default: auto-detect)
:return: Dict like {"bins": ["curl"], "env": ["API_KEY"]}
"""
missing: Dict[str, List[str]] = {}
metadata = entry.metadata
if not metadata or not metadata.requires:
return missing
required_bins = metadata.requires.get('bins', [])
if required_bins:
missing_bins = [b for b in required_bins if not has_binary(b)]
if missing_bins:
missing['bins'] = missing_bins
any_bins = metadata.requires.get('anyBins', [])
if any_bins and not has_any_binary(any_bins):
missing['anyBins'] = any_bins
required_env = metadata.requires.get('env', [])
if required_env:
missing_env = [e for e in required_env if not has_env_var(e)]
if missing_env:
missing['env'] = missing_env
any_env = metadata.requires.get('anyEnv', [])
if any_env and not any(has_env_var(e) for e in any_env):
missing['anyEnv'] = any_env
return missing
def is_config_path_truthy(config: Dict, path: str) -> bool:
"""
Check if a config path resolves to a truthy value.
:param config: Configuration dictionary
:param path: Dot-separated path (e.g., 'skills.enabled')
:return: True if path resolves to truthy value
"""
parts = path.split('.')
current = config
for part in parts:
if not isinstance(current, dict):
return False
current = current.get(part)
if current is None:
return False
# Check if value is truthy
if isinstance(current, bool):
return current
if isinstance(current, (int, float)):
return current != 0
if isinstance(current, str):
return bool(current.strip())
return bool(current)
def resolve_config_path(config: Dict, path: str):
"""
Resolve a dot-separated config path to its value.
:param config: Configuration dictionary
:param path: Dot-separated path
:return: Value at path or None
"""
parts = path.split('.')
current = config
for part in parts:
if not isinstance(current, dict):
return None
current = current.get(part)
if current is None:
return None
return current
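
The requirement keys checked in this file follow an all-versus-any rule: `bins` and `env` need every entry, while `anyBins` and `anyEnv` need at least one. The sketch below restates that rule with injectable probes so it runs the same on any machine. `requirements_met`, the probe parameters, and the binary and variable names in the demo are hypothetical and not part of the module.

```python
import os
import shutil
from typing import Callable, Dict, List


def _bin_on_path(name: str) -> bool:
    """Default probe: is the binary discoverable on PATH?"""
    return shutil.which(name) is not None


def _env_is_set(name: str) -> bool:
    """Default probe: is the variable set to a non-blank value?"""
    return bool(os.environ.get(name, "").strip())


def requirements_met(
    requires: Dict[str, List[str]],
    has_bin: Callable[[str], bool] = _bin_on_path,
    has_env: Callable[[str], bool] = _env_is_set,
) -> bool:
    """'bins'/'env' need every entry; 'anyBins'/'anyEnv' need at least one."""
    if not all(has_bin(b) for b in requires.get("bins", [])):
        return False
    any_bins = requires.get("anyBins", [])
    if any_bins and not any(has_bin(b) for b in any_bins):
        return False
    if not all(has_env(e) for e in requires.get("env", [])):
        return False
    any_env = requires.get("anyEnv", [])
    if any_env and not any(has_env(e) for e in any_env):
        return False
    return True


# Fake probes keep the demo deterministic (assumed names, not real config).
installed = {"curl", "ffmpeg"}
exported = {"OPENAI_API_KEY"}
probe_bin = installed.__contains__
probe_env = exported.__contains__

ok_all = requirements_met({"bins": ["curl"], "env": ["OPENAI_API_KEY"]}, probe_bin, probe_env)
ok_any = requirements_met({"anyBins": ["wget", "curl"]}, probe_bin, probe_env)
fail_all = requirements_met({"bins": ["curl", "wget"]}, probe_bin, probe_env)
fail_any = requirements_met({"anyEnv": ["A_KEY", "B_KEY"]}, probe_bin, probe_env)
```

Passing the probes as parameters is only a testing convenience for this sketch; the functions above call `shutil.which` and `os.environ` directly.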

agent/skills/formatter.py Normal file

@@ -0,0 +1,126 @@
"""
Skill formatter for generating prompts from skills.
"""
from typing import Dict, List
from agent.skills.types import Skill, SkillEntry
def format_skills_for_prompt(skills: List[Skill]) -> str:
"""
Format skills for inclusion in a system prompt.
Uses XML format per Agent Skills standard.
Skills with disable_model_invocation=True are excluded.
:param skills: List of skills to format
:return: Formatted prompt text
"""
# Filter out skills that should not be invoked by the model
visible_skills = [s for s in skills if not s.disable_model_invocation]
if not visible_skills:
return ""
lines = [
"",
"<available_skills>",
]
for skill in visible_skills:
lines.append(" <skill>")
lines.append(f" <name>{_escape_xml(skill.name)}</name>")
lines.append(f" <description>{_escape_xml(skill.description)}</description>")
lines.append(f" <location>{_escape_xml(skill.file_path)}</location>")
lines.append(f" <base_dir>{_escape_xml(skill.base_dir)}</base_dir>")
lines.append(" </skill>")
lines.append("</available_skills>")
return "\n".join(lines)
def format_skill_entries_for_prompt(entries: List[SkillEntry]) -> str:
"""
Format skill entries for inclusion in a system prompt.
:param entries: List of skill entries to format
:return: Formatted prompt text
"""
skills = [entry.skill for entry in entries]
return format_skills_for_prompt(skills)
def format_unavailable_skills_for_prompt(
entries: List[SkillEntry],
missing_map: Dict[str, Dict[str, List[str]]],
) -> str:
"""
Format unavailable (requires-not-met) skills as brief setup hints
so the AI can guide users to configure them.
:param entries: List of unavailable skill entries
:param missing_map: Dict mapping skill name to its missing requirements
:return: Formatted prompt text
"""
if not entries:
return ""
lines = [
"",
"<unavailable_skills>",
"The following skills are installed but not yet ready. "
"Guide the user to complete the setup when relevant.",
]
for entry in entries:
skill = entry.skill
missing = missing_map.get(skill.name, {})
missing_parts = []
for key, values in missing.items():
missing_parts.append(f"{key}: {', '.join(values)}")
missing_str = "; ".join(missing_parts) if missing_parts else "unknown"
setup_hint = _extract_setup_hint(skill)
lines.append(" <skill>")
lines.append(f" <name>{_escape_xml(skill.name)}</name>")
lines.append(f" <description>{_escape_xml(skill.description)}</description>")
lines.append(f" <missing>{_escape_xml(missing_str)}</missing>")
if setup_hint:
lines.append(f" <setup>{_escape_xml(setup_hint)}</setup>")
lines.append(" </skill>")
lines.append("</unavailable_skills>")
return "\n".join(lines)
def _extract_setup_hint(skill: Skill) -> str:
"""
Extract the Setup section from SKILL.md content as a brief hint.
Returns the first few lines of the ## Setup section.
"""
content = skill.content
if not content:
return ""
import re
match = re.search(r'^##\s+Setup\s*\n(.*?)(?=\n##\s|\Z)', content, re.MULTILINE | re.DOTALL)
if not match:
return ""
setup_text = match.group(1).strip()
lines = setup_text.split('\n')
hint_lines = [l.strip() for l in lines[:6] if l.strip()]
return ' '.join(hint_lines)[:300]
def _escape_xml(text: str) -> str:
"""Escape XML special characters."""
return (text
.replace('&', '&amp;')
.replace('<', '&lt;')
.replace('>', '&gt;')
.replace('"', '&quot;')
.replace("'", '&apos;'))
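
The order of replacements in `_escape_xml` matters: `&` has to be handled first, otherwise the ampersands introduced by later replacements get escaped a second time. The sketch below shows the difference and compares the result with the standard library's `xml.sax.saxutils.escape`. The names `escape_xml` and `escape_xml_wrong_order` are illustrative only.

```python
from xml.sax.saxutils import escape

# Extra entities for the stdlib helper, which escapes only &, < and > by default.
QUOTES = {'"': "&quot;", "'": "&apos;"}


def escape_xml(text: str) -> str:
    """Same five replacements as above, ampersand first."""
    return (text
            .replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
            .replace('"', "&quot;")
            .replace("'", "&apos;"))


def escape_xml_wrong_order(text: str) -> str:
    """Deliberately broken: '&' is handled last, so '&lt;' turns into '&amp;lt;'."""
    return text.replace("<", "&lt;").replace("&", "&amp;")


sample = 'Use <b> & "quotes"'
good = escape_xml(sample)
bad = escape_xml_wrong_order("a < b")
same_as_stdlib = escape(sample, QUOTES) == good
```

Here `good` is `Use &lt;b&gt; &amp; &quot;quotes&quot;`, while the wrong order yields `a &amp;lt; b`.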

agent/skills/frontmatter.py Normal file

@@ -0,0 +1,192 @@
"""
Frontmatter parsing for skills.
"""
import re
import json
from typing import Dict, Any, Optional, List
from agent.skills.types import SkillMetadata, SkillInstallSpec
def parse_frontmatter(content: str) -> Dict[str, Any]:
"""
Parse YAML-style frontmatter from markdown content.
Returns a dictionary of frontmatter fields.
"""
frontmatter = {}
# Match frontmatter block between --- markers
match = re.match(r'^---\s*\n(.*?)\n---\s*\n', content, re.DOTALL)
if not match:
return frontmatter
frontmatter_text = match.group(1)
# Try to use PyYAML for proper YAML parsing
try:
import yaml
frontmatter = yaml.safe_load(frontmatter_text)
if not isinstance(frontmatter, dict):
frontmatter = {}
return frontmatter
except ImportError:
# Fallback to simple parsing if PyYAML not available
pass
except Exception:
# If YAML parsing fails, fall back to simple parsing
pass
# Simple YAML-like parsing (supports key: value format only)
# This is a fallback for when PyYAML is not available
for line in frontmatter_text.split('\n'):
line = line.strip()
if not line or line.startswith('#'):
continue
if ':' in line:
key, value = line.split(':', 1)
key = key.strip()
value = value.strip()
# Try to parse as JSON if it looks like JSON
if value.startswith('{') or value.startswith('['):
try:
value = json.loads(value)
except json.JSONDecodeError:
pass
# Parse boolean values
elif value.lower() in ('true', 'false'):
value = value.lower() == 'true'
# Parse numbers
elif value.isdigit():
value = int(value)
frontmatter[key] = value
return frontmatter
def parse_metadata(frontmatter: Dict[str, Any]) -> Optional[SkillMetadata]:
"""
Parse skill metadata from frontmatter.
Looks for 'metadata' field containing JSON with skill configuration.
"""
metadata_raw = frontmatter.get('metadata')
if not metadata_raw:
return None
# If it's a string, try to parse as JSON
if isinstance(metadata_raw, str):
try:
metadata_raw = json.loads(metadata_raw)
except json.JSONDecodeError:
return None
if not isinstance(metadata_raw, dict):
return None
# Unwrap nested namespace (e.g. {"openclaw": {...}} or {"cowagent": {...}})
meta_obj = _unwrap_metadata_namespace(metadata_raw)
# Parse install specs
install_specs = []
install_raw = meta_obj.get('install', [])
if isinstance(install_raw, list):
for spec_raw in install_raw:
if not isinstance(spec_raw, dict):
continue
kind = spec_raw.get('kind', spec_raw.get('type', '')).lower()
if not kind:
continue
spec = SkillInstallSpec(
kind=kind,
id=spec_raw.get('id'),
label=spec_raw.get('label'),
bins=_normalize_string_list(spec_raw.get('bins')),
os=_normalize_string_list(spec_raw.get('os')),
formula=spec_raw.get('formula'),
package=spec_raw.get('package'),
module=spec_raw.get('module'),
url=spec_raw.get('url'),
archive=spec_raw.get('archive'),
extract=spec_raw.get('extract', False),
strip_components=spec_raw.get('stripComponents'),
target_dir=spec_raw.get('targetDir'),
)
install_specs.append(spec)
# Parse requires
requires = {}
requires_raw = meta_obj.get('requires', {})
if isinstance(requires_raw, dict):
for key, value in requires_raw.items():
requires[key] = _normalize_string_list(value)
return SkillMetadata(
always=meta_obj.get('always', False),
default_enabled=meta_obj.get('default_enabled', True),
skill_key=meta_obj.get('skillKey'),
primary_env=meta_obj.get('primaryEnv'),
emoji=meta_obj.get('emoji'),
homepage=meta_obj.get('homepage'),
os=_normalize_string_list(meta_obj.get('os')),
requires=requires,
install=install_specs,
)
_KNOWN_METADATA_NAMESPACES = {"cowagent", "openclaw"}
def _unwrap_metadata_namespace(metadata_raw: Dict[str, Any]) -> Dict[str, Any]:
"""
Unwrap a single-key namespace wrapper like {"cowagent": {...}} or {"openclaw": {...}}.
If the top-level dict has exactly one key matching a known namespace, return the inner dict.
Otherwise return the original dict unchanged.
"""
keys = set(metadata_raw.keys())
ns_keys = keys & _KNOWN_METADATA_NAMESPACES
if len(ns_keys) == 1 and len(keys) == 1:
ns = ns_keys.pop()
inner = metadata_raw[ns]
if isinstance(inner, dict):
return inner
return metadata_raw
def _normalize_string_list(value: Any) -> List[str]:
"""Normalize a value to a list of strings."""
if not value:
return []
if isinstance(value, list):
return [str(v).strip() for v in value if v]
if isinstance(value, str):
return [v.strip() for v in value.split(',') if v.strip()]
return []
def parse_boolean_value(value: Optional[str], default: bool = False) -> bool:
"""Parse a boolean value from frontmatter."""
if value is None:
return default
if isinstance(value, bool):
return value
if isinstance(value, str):
return value.lower() in ('true', '1', 'yes', 'on')
return default
def get_frontmatter_value(frontmatter: Dict[str, Any], key: str) -> Optional[str]:
"""Get a frontmatter value as a string."""
value = frontmatter.get(key)
return str(value) if value is not None else None
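
To make the fallback path concrete, the sketch below applies the same frontmatter regex and the simple `key: value` parsing to a sample skill file, without PyYAML. The skill name, description, and `WEATHER_API_KEY` are made-up values for illustration.

```python
import json
import re

# Hypothetical SKILL.md content; only the layout mirrors what the parser expects.
SKILL_MD = """---
name: weather
description: Look up the current weather for a city
metadata: {"cowagent": {"requires": {"bins": ["curl"], "env": ["WEATHER_API_KEY"]}}}
---
# Weather

## Setup
Export WEATHER_API_KEY before starting the agent.
"""

# Same pattern as parse_frontmatter: the block between the two '---' markers.
match = re.match(r"^---\s*\n(.*?)\n---\s*\n", SKILL_MD, re.DOTALL)
block = match.group(1)

fields = {}
for line in block.split("\n"):
    line = line.strip()
    if not line or line.startswith("#") or ":" not in line:
        continue
    key, value = line.split(":", 1)  # split on the first colon only
    key, value = key.strip(), value.strip()
    if value.startswith(("{", "[")):
        try:
            value = json.loads(value)  # inline JSON objects and arrays
        except json.JSONDecodeError:
            pass  # keep the raw string when it is not valid JSON
    fields[key] = value

body = SKILL_MD[match.end():]
requires = fields["metadata"]["cowagent"]["requires"]
```

Splitting on the first colon only is what lets the inline JSON value, which itself contains colons, survive intact. Multi-line YAML values are outside what this fallback can handle.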

agent/skills/loader.py Normal file

@@ -0,0 +1,286 @@
"""
Skill loader for discovering and loading skills from directories.
"""
import os
from pathlib import Path
from typing import List, Optional, Dict
from common.log import logger
from agent.skills.types import Skill, SkillEntry, LoadSkillsResult, SkillMetadata
from agent.skills.frontmatter import parse_frontmatter, parse_metadata, parse_boolean_value, get_frontmatter_value
class SkillLoader:
"""Loads skills from various directories."""
def __init__(self):
pass
def load_skills_from_dir(self, dir_path: str, source: str) -> LoadSkillsResult:
"""
Load skills from a directory.
Discovery rules:
- Direct .md files in the root directory
- Recursive SKILL.md files under subdirectories
:param dir_path: Directory path to scan
:param source: Source identifier ('builtin' or 'custom')
:return: LoadSkillsResult with skills and diagnostics
"""
skills = []
diagnostics = []
if not os.path.exists(dir_path):
diagnostics.append(f"Directory does not exist: {dir_path}")
return LoadSkillsResult(skills=skills, diagnostics=diagnostics)
if not os.path.isdir(dir_path):
diagnostics.append(f"Path is not a directory: {dir_path}")
return LoadSkillsResult(skills=skills, diagnostics=diagnostics)
# Load skills from root-level .md files and subdirectories
result = self._load_skills_recursive(dir_path, source, include_root_files=True)
return result
def _load_skills_recursive(
self,
dir_path: str,
source: str,
include_root_files: bool = False
) -> LoadSkillsResult:
"""
Recursively load skills from a directory.
If a subdirectory contains its own SKILL.md, it is treated as a
self-contained skill (or skill-collection) and its children are
NOT scanned further. This prevents sub-skills inside a collection
(e.g. style-collection/style-anjing) from being listed as
independent top-level skills.
:param dir_path: Directory to scan
:param source: Source identifier
:param include_root_files: Whether to include root-level .md files
:return: LoadSkillsResult
"""
skills = []
diagnostics = []
try:
entries = os.listdir(dir_path)
except Exception as e:
diagnostics.append(f"Failed to list directory {dir_path}: {e}")
return LoadSkillsResult(skills=skills, diagnostics=diagnostics)
# If this directory has its own SKILL.md, load it and stop recursing.
# The sub-directories are internal resources of this skill.
if not include_root_files and 'SKILL.md' in entries:
skill_md_path = os.path.join(dir_path, 'SKILL.md')
if os.path.isfile(skill_md_path):
skill_result = self._load_skill_from_file(skill_md_path, source)
if skill_result.skills:
skills.extend(skill_result.skills)
diagnostics.extend(skill_result.diagnostics)
return LoadSkillsResult(skills=skills, diagnostics=diagnostics)
for entry in entries:
if entry.startswith('.'):
continue
if entry in ('node_modules', '__pycache__', 'venv', '.git'):
continue
full_path = os.path.join(dir_path, entry)
if os.path.isdir(full_path):
sub_result = self._load_skills_recursive(full_path, source, include_root_files=False)
skills.extend(sub_result.skills)
diagnostics.extend(sub_result.diagnostics)
continue
if not os.path.isfile(full_path):
continue
is_root_md = include_root_files and entry.endswith('.md') and entry.upper() != 'README.MD'
if not is_root_md:
continue
skill_result = self._load_skill_from_file(full_path, source)
if skill_result.skills:
skills.extend(skill_result.skills)
diagnostics.extend(skill_result.diagnostics)
return LoadSkillsResult(skills=skills, diagnostics=diagnostics)
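
The discovery rule described above (root-level `.md` files, plus the first `SKILL.md` found on each branch, with no descent below it) can be sketched on a temporary directory tree. This is a simplified illustration, not the loader itself: `discover_skill_files` is a hypothetical function that returns paths instead of `Skill` objects, and the directory names are made up apart from the `style-collection/style-anjing` example from the docstring.

```python
import os
import tempfile
from typing import List

SKIP_DIRS = {"node_modules", "__pycache__", "venv"}


def discover_skill_files(dir_path: str, is_root: bool = True) -> List[str]:
    """Return the markdown files that would be treated as skills."""
    entries = sorted(os.listdir(dir_path))
    skill_md = os.path.join(dir_path, "SKILL.md")
    # A non-root directory with its own SKILL.md is a leaf: do not descend.
    if not is_root and "SKILL.md" in entries and os.path.isfile(skill_md):
        return [skill_md]
    found = []
    for entry in entries:
        if entry.startswith(".") or entry in SKIP_DIRS:
            continue
        full = os.path.join(dir_path, entry)
        if os.path.isdir(full):
            found.extend(discover_skill_files(full, is_root=False))
        elif is_root and entry.endswith(".md") and entry.upper() != "README.MD":
            found.append(full)
    return found


with tempfile.TemporaryDirectory() as root:
    for rel in [
        "notes.md",                                # root-level skill file
        "README.md",                               # ignored by name
        "weather/SKILL.md",                        # ordinary skill directory
        "style-collection/SKILL.md",               # collection: scanning stops here
        "style-collection/style-anjing/SKILL.md",  # internal resource, not listed
        ".hidden/SKILL.md",                        # dot-directories are skipped
    ]:
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            f.write("---\nname: demo\n---\n")
    found = sorted(
        os.path.relpath(p, root).replace(os.sep, "/")
        for p in discover_skill_files(root)
    )
```

Only `notes.md`, `style-collection/SKILL.md`, and `weather/SKILL.md` are reported; the nested `style-anjing` skill stays private to its collection.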
def _load_skill_from_file(self, file_path: str, source: str) -> LoadSkillsResult:
"""
Load a single skill from a markdown file.
:param file_path: Path to the skill markdown file
:param source: Source identifier
:return: LoadSkillsResult
"""
diagnostics = []
try:
with open(file_path, 'r', encoding='utf-8') as f:
content = f.read()
except Exception as e:
diagnostics.append(f"Failed to read skill file {file_path}: {e}")
return LoadSkillsResult(skills=[], diagnostics=diagnostics)
# Parse frontmatter
frontmatter = parse_frontmatter(content)
# Get skill name and description
skill_dir = os.path.dirname(file_path)
parent_dir_name = os.path.basename(skill_dir)
name = frontmatter.get('name', parent_dir_name)
description = frontmatter.get('description', '')
# Normalize name (handle both string and list)
if isinstance(name, list):
name = name[0] if name else parent_dir_name
elif not isinstance(name, str):
name = str(name) if name else parent_dir_name
# Normalize description (handle both string and list)
if isinstance(description, list):
description = ' '.join(str(d) for d in description if d)
elif not isinstance(description, str):
description = str(description) if description else ''
# Special handling for linkai-agent: dynamically load apps from config.json
if name == 'linkai-agent':
description = self._load_linkai_agent_description(skill_dir, description)
if not description or not description.strip():
diagnostics.append(f"Skill {name} has no description: {file_path}")
return LoadSkillsResult(skills=[], diagnostics=diagnostics)
# Parse disable-model-invocation flag
disable_model_invocation = parse_boolean_value(
get_frontmatter_value(frontmatter, 'disable-model-invocation'),
default=False
)
# Create skill object
skill = Skill(
name=name,
description=description,
file_path=file_path,
base_dir=skill_dir,
source=source,
content=content,
disable_model_invocation=disable_model_invocation,
frontmatter=frontmatter,
)
return LoadSkillsResult(skills=[skill], diagnostics=diagnostics)
def _load_linkai_agent_description(self, skill_dir: str, default_description: str) -> str:
"""
Dynamically load LinkAI agent description from config.json
:param skill_dir: Skill directory
:param default_description: Default description from SKILL.md
:return: Dynamic description with app list
"""
import json
config_path = os.path.join(skill_dir, "config.json")
if not os.path.exists(config_path):
logger.debug("[SkillLoader] linkai-agent skipped: no config.json found")
return ""
try:
with open(config_path, 'r', encoding='utf-8') as f:
config = json.load(f)
apps = config.get("apps", [])
if not apps:
return default_description
# Build dynamic description with app details
app_descriptions = "; ".join([
f"{app['app_name']}({app['app_code']}: {app['app_description']})"
for app in apps
])
return f"Call LinkAI apps/workflows. {app_descriptions}"
except Exception as e:
logger.warning(f"[SkillLoader] Failed to load linkai-agent config: {e}")
return default_description
def load_all_skills(
self,
builtin_dir: Optional[str] = None,
custom_dir: Optional[str] = None,
) -> Dict[str, SkillEntry]:
"""
Load skills from builtin and custom directories.
Precedence (lowest to highest):
1. builtin — project root ``skills/``, shipped with the codebase
2. custom — workspace ``skills/``, installed via cloud console or skill creator
Same-name custom skills override builtin ones.
:param builtin_dir: Built-in skills directory
:param custom_dir: Custom skills directory
:return: Dictionary mapping skill name to SkillEntry
"""
skill_map: Dict[str, SkillEntry] = {}
all_diagnostics = []
# Load builtin skills (lower precedence)
if builtin_dir and os.path.exists(builtin_dir):
result = self.load_skills_from_dir(builtin_dir, source='builtin')
all_diagnostics.extend(result.diagnostics)
for skill in result.skills:
entry = self._create_skill_entry(skill)
skill_map[skill.name] = entry
# Load custom skills (higher precedence, overrides builtin)
if custom_dir and os.path.exists(custom_dir):
result = self.load_skills_from_dir(custom_dir, source='custom')
all_diagnostics.extend(result.diagnostics)
for skill in result.skills:
entry = self._create_skill_entry(skill)
skill_map[skill.name] = entry
# Log diagnostics
if all_diagnostics:
logger.debug(f"Skill loading diagnostics: {len(all_diagnostics)} issues")
for diag in all_diagnostics[:5]:
logger.debug(f" - {diag}")
logger.debug(f"Loaded {len(skill_map)} skills total")
return skill_map
def _create_skill_entry(self, skill: Skill) -> SkillEntry:
"""
Create a SkillEntry from a Skill with parsed metadata.
:param skill: The skill to create an entry for
:return: SkillEntry with metadata
"""
metadata = parse_metadata(skill.frontmatter)
# Parse user-invocable flag
user_invocable = parse_boolean_value(
get_frontmatter_value(skill.frontmatter, 'user-invocable'),
default=True
)
return SkillEntry(
skill=skill,
metadata=metadata,
user_invocable=user_invocable,
)
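The name and description normalization in `_load_skill_from_file` can be sketched as two standalone functions. This is a minimal illustration, not the loader's actual code path: `normalize_name` and `normalize_description` are hypothetical helper names, and the sketch coerces the first list element with `str()`, which the loader above does not do.

```python
from typing import Any


def normalize_name(value: Any, fallback: str) -> str:
    """Coerce a frontmatter ``name`` value to a string.

    ``fallback`` plays the role of the parent directory name.
    """
    if isinstance(value, list):
        # Use the first element of a list, or the fallback for an empty list
        return str(value[0]) if value else fallback
    if not isinstance(value, str):
        # Falsy non-string values (None, 0, ...) fall back as well
        return str(value) if value else fallback
    return value


def normalize_description(value: Any) -> str:
    """Coerce a frontmatter ``description`` value to a single string."""
    if isinstance(value, list):
        # Join the non-empty items with single spaces
        return " ".join(str(d) for d in value if d)
    if not isinstance(value, str):
        return str(value) if value else ""
    return value
```

A skill whose normalized description is empty or whitespace-only is rejected by the loader with a diagnostic, so `normalize_description(None)` corresponds to a skipped skill.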

361
agent/skills/manager.py Normal file

@@ -0,0 +1,361 @@
"""
Skill manager for managing skill lifecycle and operations.
"""
import os
import json
from typing import Dict, List, Optional
from pathlib import Path
from common.log import logger
from agent.skills.types import Skill, SkillEntry, SkillSnapshot
from agent.skills.loader import SkillLoader
from agent.skills.formatter import format_skill_entries_for_prompt
SKILLS_CONFIG_FILE = "skills_config.json"
class SkillManager:
"""Manages skills for an agent."""
def __init__(
self,
builtin_dir: Optional[str] = None,
custom_dir: Optional[str] = None,
config: Optional[Dict] = None,
):
"""
Initialize the skill manager.
:param builtin_dir: Built-in skills directory (project root ``skills/``)
:param custom_dir: Custom skills directory (workspace ``skills/``)
:param config: Configuration dictionary
"""
project_root = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
self.builtin_dir = builtin_dir or os.path.join(project_root, 'skills')
self.custom_dir = custom_dir or os.path.join(project_root, 'workspace', 'skills')
self.config = config or {}
self._skills_config_path = os.path.join(self.custom_dir, SKILLS_CONFIG_FILE)
# skills_config: full skill metadata keyed by name
# { "web-fetch": {"name": ..., "description": ..., "source": ..., "enabled": true}, ... }
self.skills_config: Dict[str, dict] = {}
self.loader = SkillLoader()
self.skills: Dict[str, SkillEntry] = {}
# Load skills on initialization
self.refresh_skills()
def refresh_skills(self):
"""Reload all skills from builtin and custom directories, then sync config."""
self.skills = self.loader.load_all_skills(
builtin_dir=self.builtin_dir,
custom_dir=self.custom_dir,
)
self._sync_skills_config()
logger.debug(f"SkillManager: Loaded {len(self.skills)} skills")
# ------------------------------------------------------------------
# skills_config.json management
# ------------------------------------------------------------------
def _load_skills_config(self) -> Dict[str, dict]:
"""Load skills_config.json from custom_dir. Returns empty dict if not found."""
if not os.path.exists(self._skills_config_path):
return {}
try:
with open(self._skills_config_path, "r", encoding="utf-8") as f:
data = json.load(f)
if isinstance(data, dict):
return data
except Exception as e:
logger.warning(f"[SkillManager] Failed to load {SKILLS_CONFIG_FILE}: {e}")
return {}
def _save_skills_config(self):
"""Persist skills_config to custom_dir/skills_config.json."""
os.makedirs(self.custom_dir, exist_ok=True)
try:
with open(self._skills_config_path, "w", encoding="utf-8") as f:
json.dump(self.skills_config, f, indent=4, ensure_ascii=False)
except Exception as e:
logger.error(f"[SkillManager] Failed to save {SKILLS_CONFIG_FILE}: {e}")
def _sync_skills_config(self):
"""
Merge directory-scanned skills with the persisted config file.
- New skills: use metadata.default_enabled as initial enabled state.
- Existing skills: preserve their persisted enabled state.
- Skills that no longer exist on disk are removed.
- name/description are always refreshed from the latest scan; a persisted
source, category and display_name are kept when present.
"""
saved = self._load_skills_config()
merged: Dict[str, dict] = {}
for name, entry in self.skills.items():
skill = entry.skill
prev = saved.get(name, {})
category = prev.get("category", "skill")
if name in saved:
enabled = prev.get("enabled", True)
else:
enabled = entry.metadata.default_enabled if entry.metadata else True
entry_dict = {
"name": name,
"description": skill.description,
"source": prev.get("source") or skill.source,
"enabled": enabled,
"category": category,
}
display_name = prev.get("display_name")
if display_name:
entry_dict["display_name"] = display_name
merged[name] = entry_dict
self.skills_config = merged
self._save_skills_config()
def is_skill_enabled(self, name: str) -> bool:
"""
Check if a skill is enabled according to skills_config.
:param name: skill name
:return: True if enabled (default True if not in config)
"""
entry = self.skills_config.get(name)
if entry is None:
return True
return entry.get("enabled", True)
def set_skill_enabled(self, name: str, enabled: bool):
"""
Set a skill's enabled state and persist.
:param name: skill name
:param enabled: True to enable, False to disable
"""
if name not in self.skills_config:
raise ValueError(f"skill '{name}' not found in config")
self.skills_config[name]["enabled"] = enabled
self._save_skills_config()
def get_skills_config(self) -> Dict[str, dict]:
"""
Return the full skills_config dict (for query API).
:return: shallow copy of skills_config (the per-skill dicts are shared with the manager)
"""
return dict(self.skills_config)
def get_skill(self, name: str) -> Optional[SkillEntry]:
"""
Get a skill by name.
:param name: Skill name
:return: SkillEntry or None if not found
"""
return self.skills.get(name)
def list_skills(self) -> List[SkillEntry]:
"""
Get all loaded skills.
:return: List of all skill entries
"""
return list(self.skills.values())
@staticmethod
def _normalize_skill_filter(skill_filter: Optional[List[str]]) -> Optional[List[str]]:
"""Normalize a skill_filter list into a flat list of stripped names."""
if skill_filter is None:
return None
normalized = []
for item in skill_filter:
if isinstance(item, str):
name = item.strip()
if name:
normalized.append(name)
elif isinstance(item, list):
for subitem in item:
if isinstance(subitem, str):
name = subitem.strip()
if name:
normalized.append(name)
return normalized or None
def filter_skills(
self,
skill_filter: Optional[List[str]] = None,
include_disabled: bool = False,
) -> List[SkillEntry]:
"""
Filter skills that are eligible (enabled + requirements met).
:param skill_filter: List of skill names to include (None = all)
:param include_disabled: Whether to include disabled skills
:return: Filtered list of eligible skill entries
"""
from agent.skills.config import should_include_skill
entries = list(self.skills.values())
entries = [e for e in entries if should_include_skill(e, self.config)]
normalized = self._normalize_skill_filter(skill_filter)
if normalized is not None:
entries = [e for e in entries if e.skill.name in normalized]
if not include_disabled:
entries = [e for e in entries if self.is_skill_enabled(e.skill.name)]
from config import conf
if not conf().get("knowledge", True):
entries = [e for e in entries if e.skill.name != "knowledge-wiki"]
return entries
def filter_unavailable_skills(
self,
skill_filter: Optional[List[str]] = None,
) -> tuple:
"""
Find skills that are enabled but have unmet requirements.
:param skill_filter: Optional list of skill names to include
:return: Tuple of (entries, missing_map) where missing_map maps
skill name to its missing requirements dict
"""
from agent.skills.config import should_include_skill, get_missing_requirements
entries = list(self.skills.values())
# Only enabled skills
entries = [e for e in entries if self.is_skill_enabled(e.skill.name)]
normalized = self._normalize_skill_filter(skill_filter)
if normalized is not None:
entries = [e for e in entries if e.skill.name in normalized]
# Keep only those that fail should_include_skill (requirements not met)
unavailable = []
missing_map: Dict[str, dict] = {}
for e in entries:
if not should_include_skill(e, self.config):
missing = get_missing_requirements(e)
if missing:
unavailable.append(e)
missing_map[e.skill.name] = missing
return unavailable, missing_map
def build_skills_prompt(
self,
skill_filter: Optional[List[str]] = None,
) -> str:
"""
Build a formatted prompt containing available skills
and brief hints for unavailable ones.
:param skill_filter: Optional list of skill names to include
:return: Formatted skills prompt
"""
from agent.skills.formatter import format_unavailable_skills_for_prompt
eligible = self.filter_skills(skill_filter=skill_filter, include_disabled=False)
logger.debug(f"[SkillManager] Eligible: {len(eligible)} skills (total: {len(self.skills)})")
if eligible:
skill_names = [e.skill.name for e in eligible]
logger.debug(f"[SkillManager] Eligible skills: {skill_names}")
result = format_skill_entries_for_prompt(eligible)
unavailable, missing_map = self.filter_unavailable_skills(skill_filter=skill_filter)
if unavailable:
unavailable_names = [e.skill.name for e in unavailable]
logger.debug(f"[SkillManager] Unavailable skills (setup needed): {unavailable_names}")
result += format_unavailable_skills_for_prompt(unavailable, missing_map)
logger.debug(f"[SkillManager] Generated prompt length: {len(result)}")
return result
def build_skill_snapshot(
self,
skill_filter: Optional[List[str]] = None,
version: Optional[int] = None,
) -> SkillSnapshot:
"""
Build a snapshot of skills for a specific run.
:param skill_filter: Optional list of skill names to include
:param version: Optional version number for the snapshot
:return: SkillSnapshot
"""
entries = self.filter_skills(skill_filter=skill_filter, include_disabled=False)
prompt = format_skill_entries_for_prompt(entries)
skills_info = []
resolved_skills = []
for entry in entries:
skills_info.append({
'name': entry.skill.name,
'primary_env': entry.metadata.primary_env if entry.metadata else None,
})
resolved_skills.append(entry.skill)
return SkillSnapshot(
prompt=prompt,
skills=skills_info,
resolved_skills=resolved_skills,
version=version,
)
def sync_skills_to_workspace(self, target_workspace_dir: str):
"""
Sync all loaded skills to a target workspace directory.
This is useful for sandbox environments where skills need to be copied.
:param target_workspace_dir: Target workspace directory
"""
import shutil
target_skills_dir = os.path.join(target_workspace_dir, 'skills')
# Remove existing skills directory
if os.path.exists(target_skills_dir):
shutil.rmtree(target_skills_dir)
# Create new skills directory
os.makedirs(target_skills_dir, exist_ok=True)
# Copy each skill
for entry in self.skills.values():
skill_name = entry.skill.name
source_dir = entry.skill.base_dir
target_dir = os.path.join(target_skills_dir, skill_name)
try:
shutil.copytree(source_dir, target_dir)
logger.debug(f"Synced skill '{skill_name}' to {target_dir}")
except Exception as e:
logger.warning(f"Failed to sync skill '{skill_name}': {e}")
logger.info(f"Synced {len(self.skills)} skills to {target_skills_dir}")
def get_skill_by_key(self, skill_key: str) -> Optional[SkillEntry]:
"""
Get a skill by its skill key (which may differ from name).
:param skill_key: Skill key to look up
:return: SkillEntry or None
"""
for entry in self.skills.values():
if entry.metadata and entry.metadata.skill_key == skill_key:
return entry
if entry.skill.name == skill_key:
return entry
return None
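The merge rules in `_sync_skills_config` can be sketched as a pure function over plain dicts. This is an illustration under assumptions: `merge_skills_config` is a hypothetical name, and the `scanned` dict (with an optional `default_enabled` key) stands in for the `SkillEntry` objects that the manager actually iterates over.

```python
from typing import Dict


def merge_skills_config(scanned: Dict[str, dict], saved: Dict[str, dict]) -> Dict[str, dict]:
    """Merge freshly scanned skills with the persisted config.

    Only scanned skills appear in the result, so skills that were
    removed from disk drop out of the config.
    """
    merged: Dict[str, dict] = {}
    for name, info in scanned.items():
        prev = saved.get(name, {})
        if name in saved:
            # Known skill: keep the persisted enabled state
            enabled = prev.get("enabled", True)
        else:
            # New skill: start from its default_enabled metadata
            enabled = info.get("default_enabled", True)
        entry = {
            "name": name,
            "description": info["description"],
            "source": prev.get("source") or info["source"],
            "enabled": enabled,
            "category": prev.get("category", "skill"),
        }
        if prev.get("display_name"):
            entry["display_name"] = prev["display_name"]
        merged[name] = entry
    return merged
```

Keeping the merge free of file I/O, as in this sketch, makes the rules easy to check in isolation; the manager itself loads and saves `skills_config.json` around the same logic.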

285
agent/skills/service.py Normal file

@@ -0,0 +1,285 @@
"""
Skill service for handling skill CRUD operations.
This service provides a unified interface for managing skills, which can be
called from the cloud control client (LinkAI), the local web console, or any
other management entry point.
"""
import os
import shutil
import zipfile
import tempfile
from typing import Dict, List, Optional
from common.log import logger
from agent.skills.types import Skill, SkillEntry
from agent.skills.manager import SkillManager
try:
import requests
except ImportError:
requests = None
class SkillService:
"""
High-level service for skill lifecycle management.
Wraps SkillManager and provides network-aware operations such as
downloading skill files from remote URLs.
"""
def __init__(self, skill_manager: SkillManager):
"""
:param skill_manager: The SkillManager instance to operate on
"""
self.manager = skill_manager
# ------------------------------------------------------------------
# query
# ------------------------------------------------------------------
def query(self) -> List[dict]:
"""
Query all skills and return a serialisable list.
Always rescans the skill directories first, then returns the merged skills_config entries.
:return: list of skill info dicts
"""
self.manager.refresh_skills()
config = self.manager.get_skills_config()
result = list(config.values())
logger.info(f"[SkillService] query: {len(result)} skills found")
return result
# ------------------------------------------------------------------
# add / install
# ------------------------------------------------------------------
def add(self, payload: dict) -> None:
"""
Add (install) a skill from a remote payload.
Supported payload types:
1. ``type: "url"`` download individual files::
{
"name": "web_search",
"type": "url",
"enabled": true,
"files": [
{"url": "https://...", "path": "README.md"},
{"url": "https://...", "path": "scripts/main.py"}
]
}
2. ``type: "package"`` download a zip archive and extract::
{
"name": "plugin-custom-tool",
"type": "package",
"category": "skills",
"enabled": true,
"files": [{"url": "https://cdn.example.com/skills/custom-tool.zip"}]
}
:param payload: skill add payload from server
"""
name = payload.get("name")
if not name:
raise ValueError("skill name is required")
payload_type = payload.get("type", "url")
if payload_type == "package":
self._add_package(name, payload)
else:
self._add_url(name, payload)
self.manager.refresh_skills()
category = payload.get("category")
if category and name in self.manager.skills_config:
self.manager.skills_config[name]["category"] = category
self.manager._save_skills_config()
def _add_url(self, name: str, payload: dict) -> None:
"""Install a skill by downloading individual files."""
files = payload.get("files", [])
if not files:
raise ValueError("skill files list is empty")
skill_dir = os.path.join(self.manager.custom_dir, name)
tmp_dir = skill_dir + ".tmp"
if os.path.exists(tmp_dir):
shutil.rmtree(tmp_dir)
os.makedirs(tmp_dir, exist_ok=True)
try:
for file_info in files:
url = file_info.get("url")
rel_path = file_info.get("path")
if not url or not rel_path:
logger.warning(f"[SkillService] add: skip invalid file entry {file_info}")
continue
dest = os.path.join(tmp_dir, rel_path)
self._download_file(url, dest)
except Exception:
shutil.rmtree(tmp_dir, ignore_errors=True)
raise
if os.path.exists(skill_dir):
shutil.rmtree(skill_dir)
os.rename(tmp_dir, skill_dir)
logger.info(f"[SkillService] add: skill '{name}' installed via url ({len(files)} files)")
def _add_package(self, name: str, payload: dict) -> None:
"""
Install a skill by downloading a zip archive and extracting it.
If the archive contains a single top-level directory, that directory
is used as the skill folder directly; otherwise a new directory named
after the skill is created to hold the extracted contents.
"""
files = payload.get("files", [])
if not files or not files[0].get("url"):
raise ValueError("package url is required")
url = files[0]["url"]
skill_dir = os.path.join(self.manager.custom_dir, name)
with tempfile.TemporaryDirectory() as tmp_dir:
zip_path = os.path.join(tmp_dir, "package.zip")
self._download_file(url, zip_path)
if not zipfile.is_zipfile(zip_path):
raise ValueError(f"downloaded file is not a valid zip archive: {url}")
extract_dir = os.path.join(tmp_dir, "extracted")
with zipfile.ZipFile(zip_path, "r") as zf:
zf.extractall(extract_dir)
# Determine the actual content root.
# If the zip has a single top-level directory, use its contents
# so the skill folder is clean (no extra nesting).
top_items = [
item for item in os.listdir(extract_dir)
if not item.startswith(".")
]
if len(top_items) == 1:
single = os.path.join(extract_dir, top_items[0])
if os.path.isdir(single):
extract_dir = single
if os.path.exists(skill_dir):
shutil.rmtree(skill_dir)
shutil.copytree(extract_dir, skill_dir)
logger.info(f"[SkillService] add: skill '{name}' installed via package ({url})")
# ------------------------------------------------------------------
# open / close (enable / disable)
# ------------------------------------------------------------------
def open(self, payload: dict) -> None:
"""
Enable a skill by name.
:param payload: {"name": "skill_name"}
"""
name = payload.get("name")
if not name:
raise ValueError("skill name is required")
self.manager.set_skill_enabled(name, enabled=True)
logger.info(f"[SkillService] open: skill '{name}' enabled")
def close(self, payload: dict) -> None:
"""
Disable a skill by name.
:param payload: {"name": "skill_name"}
"""
name = payload.get("name")
if not name:
raise ValueError("skill name is required")
self.manager.set_skill_enabled(name, enabled=False)
logger.info(f"[SkillService] close: skill '{name}' disabled")
# ------------------------------------------------------------------
# delete
# ------------------------------------------------------------------
def delete(self, payload: dict) -> None:
"""
Delete a skill by removing its directory entirely.
:param payload: {"name": "skill_name"}
"""
name = payload.get("name")
if not name:
raise ValueError("skill name is required")
skill_dir = os.path.join(self.manager.custom_dir, name)
if os.path.exists(skill_dir):
shutil.rmtree(skill_dir)
logger.info(f"[SkillService] delete: removed directory {skill_dir}")
else:
logger.warning(f"[SkillService] delete: skill directory not found: {skill_dir}")
# Refresh will remove the deleted skill from config automatically
self.manager.refresh_skills()
logger.info(f"[SkillService] delete: skill '{name}' deleted")
# ------------------------------------------------------------------
# dispatch - single entry point for protocol messages
# ------------------------------------------------------------------
def dispatch(self, action: str, payload: Optional[dict] = None) -> dict:
"""
Dispatch a skill management action and return a protocol-compatible
response dict.
:param action: one of query / add / open / close / delete
:param payload: action-specific payload (may be None for query)
:return: dict with action, code, message, payload
"""
payload = payload or {}
try:
if action == "query":
result_payload = self.query()
return {"action": action, "code": 200, "message": "success", "payload": result_payload}
elif action == "add":
self.add(payload)
elif action == "open":
self.open(payload)
elif action == "close":
self.close(payload)
elif action == "delete":
self.delete(payload)
else:
return {"action": action, "code": 400, "message": f"unknown action: {action}", "payload": None}
return {"action": action, "code": 200, "message": "success", "payload": None}
except Exception as e:
logger.error(f"[SkillService] dispatch error: action={action}, error={e}")
return {"action": action, "code": 500, "message": str(e), "payload": None}
# ------------------------------------------------------------------
# internal helpers
# ------------------------------------------------------------------
@staticmethod
def _download_file(url: str, dest: str):
"""
Download a file from *url* and save to *dest*.
:param url: remote file URL
:param dest: local destination path
"""
if requests is None:
raise RuntimeError("requests library is required for downloading skill files")
dest_dir = os.path.dirname(dest)
if dest_dir:
os.makedirs(dest_dir, exist_ok=True)
resp = requests.get(url, timeout=60)
resp.raise_for_status()
with open(dest, "wb") as f:
f.write(resp.content)
logger.debug(f"[SkillService] downloaded {url} -> {dest}")
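In `_add_url`, the destination is built with `os.path.join(tmp_dir, rel_path)`, where `rel_path` comes from the payload. One way to keep such a path inside the target directory is sketched below. `safe_join` is a hypothetical helper and is not part of the service above; whether the project needs this check depends on how far the payload source is trusted.

```python
import os


def safe_join(base_dir: str, rel_path: str) -> str:
    """Join rel_path onto base_dir and refuse results outside base_dir.

    Hypothetical helper for illustration only.
    """
    base = os.path.realpath(base_dir)
    dest = os.path.realpath(os.path.join(base, rel_path))
    # commonpath equals base only when dest is base itself or lies below it
    if dest == base or os.path.commonpath([base, dest]) != base:
        raise ValueError(f"path escapes target directory: {rel_path}")
    return dest
```

With a check like this, entries such as `../evil.py` or an absolute path raise `ValueError` before anything is downloaded, while ordinary entries such as `scripts/main.py` resolve as before.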

76
agent/skills/types.py Normal file

@@ -0,0 +1,76 @@
"""
Type definitions for skills system.
"""
from __future__ import annotations
from typing import Dict, List, Optional, Any
from dataclasses import dataclass, field
@dataclass
class SkillInstallSpec:
"""Specification for installing skill dependencies."""
kind: str # brew, pip, npm, download, etc.
id: Optional[str] = None
label: Optional[str] = None
bins: List[str] = field(default_factory=list)
os: List[str] = field(default_factory=list)
formula: Optional[str] = None # for brew
package: Optional[str] = None # for pip/npm
module: Optional[str] = None
url: Optional[str] = None # for download
archive: Optional[str] = None
extract: bool = False
strip_components: Optional[int] = None
target_dir: Optional[str] = None
@dataclass
class SkillMetadata:
"""Metadata for a skill from frontmatter."""
always: bool = False # Always include this skill
default_enabled: bool = True # Initial enabled state when first discovered
skill_key: Optional[str] = None # Override skill key
primary_env: Optional[str] = None # Primary environment variable
emoji: Optional[str] = None
homepage: Optional[str] = None
os: List[str] = field(default_factory=list) # Supported OS platforms
requires: Dict[str, List[str]] = field(default_factory=dict) # Requirements
install: List[SkillInstallSpec] = field(default_factory=list)
@dataclass
class Skill:
"""Represents a skill loaded from a markdown file."""
name: str
description: str
file_path: str
base_dir: str
source: str # builtin or custom
content: str # Full markdown content
disable_model_invocation: bool = False
frontmatter: Dict[str, Any] = field(default_factory=dict)
@dataclass
class SkillEntry:
"""A skill with parsed metadata."""
skill: Skill
metadata: Optional[SkillMetadata] = None
user_invocable: bool = True # Can users invoke this skill directly
@dataclass
class LoadSkillsResult:
"""Result of loading skills from a directory."""
skills: List[Skill]
diagnostics: List[str] = field(default_factory=list)
@dataclass
class SkillSnapshot:
"""Snapshot of skills for a specific run."""
prompt: str # Formatted prompt text
skills: List[Dict[str, Optional[str]]] # List of skill info (name, primary_env); primary_env may be None
resolved_skills: List[Skill] = field(default_factory=list)
version: Optional[int] = None

149
agent/tools/__init__.py Normal file

@@ -0,0 +1,149 @@
# Import base tool
from agent.tools.base_tool import BaseTool
from agent.tools.tool_manager import ToolManager
# Import file operation tools
from agent.tools.read.read import Read
from agent.tools.write.write import Write
from agent.tools.edit.edit import Edit
from agent.tools.bash.bash import Bash
from agent.tools.ls.ls import Ls
from agent.tools.send.send import Send
# Import memory tools
from agent.tools.memory.memory_search import MemorySearchTool
from agent.tools.memory.memory_get import MemoryGetTool
# Import tools with optional dependencies
def _import_optional_tools():
"""Import tools that have optional dependencies"""
from common.log import logger
tools = {}
# EnvConfig Tool (requires python-dotenv)
try:
from agent.tools.env_config.env_config import EnvConfig
tools['EnvConfig'] = EnvConfig
except ImportError as e:
logger.error(
f"[Tools] EnvConfig tool not loaded - missing dependency: {e}\n"
f" To enable environment variable management, run:\n"
f" pip install python-dotenv>=1.0.0"
)
except Exception as e:
logger.error(f"[Tools] EnvConfig tool failed to load: {e}")
# Scheduler Tool (requires croniter)
try:
from agent.tools.scheduler.scheduler_tool import SchedulerTool
tools['SchedulerTool'] = SchedulerTool
except ImportError as e:
logger.error(
f"[Tools] Scheduler tool not loaded - missing dependency: {e}\n"
f" To enable scheduled tasks, run:\n"
f" pip install croniter>=2.0.0"
)
except Exception as e:
logger.error(f"[Tools] Scheduler tool failed to load: {e}")
# WebSearch Tool (conditionally loaded based on API key availability at init time)
try:
from agent.tools.web_search.web_search import WebSearch
tools['WebSearch'] = WebSearch
except ImportError as e:
logger.error(f"[Tools] WebSearch not loaded - missing dependency: {e}")
except Exception as e:
logger.error(f"[Tools] WebSearch failed to load: {e}")
# WebFetch Tool
try:
from agent.tools.web_fetch.web_fetch import WebFetch
tools['WebFetch'] = WebFetch
except ImportError as e:
logger.error(f"[Tools] WebFetch not loaded - missing dependency: {e}")
except Exception as e:
logger.error(f"[Tools] WebFetch failed to load: {e}")
# Vision Tool (conditionally loaded based on API key availability)
try:
from agent.tools.vision.vision import Vision
tools['Vision'] = Vision
except ImportError as e:
logger.error(f"[Tools] Vision not loaded - missing dependency: {e}")
except Exception as e:
logger.error(f"[Tools] Vision failed to load: {e}")
return tools
# Load optional tools
_optional_tools = _import_optional_tools()
EnvConfig = _optional_tools.get('EnvConfig')
SchedulerTool = _optional_tools.get('SchedulerTool')
WebSearch = _optional_tools.get('WebSearch')
WebFetch = _optional_tools.get('WebFetch')
Vision = _optional_tools.get('Vision')
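# NOTE: _import_optional_tools() above never registers the next three names,
# so they always resolve to None.
GoogleSearch = _optional_tools.get('GoogleSearch')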
FileSave = _optional_tools.get('FileSave')
Terminal = _optional_tools.get('Terminal')
# BrowserTool (requires playwright)
def _import_browser_tool():
from common.log import logger
try:
from agent.tools.browser.browser_tool import BrowserTool
return BrowserTool
except ImportError as e:
logger.info(
f"[Tools] BrowserTool not loaded - missing dependency: {e}\n"
f" To enable browser tool, run:\n"
f" pip install playwright\n"
f" playwright install chromium"
)
return None
except Exception as e:
logger.error(f"[Tools] BrowserTool failed to load: {e}")
return None
BrowserTool = _import_browser_tool()
# MCP Tools (no extra dependencies, loaded on demand)
def _import_mcp_tools():
"""Import the MCP tool modules (no extra dependencies, loaded on demand)"""
from common.log import logger
try:
from agent.tools.mcp.mcp_tool import McpTool
from agent.tools.mcp.mcp_client import McpClientRegistry
return {'McpTool': McpTool, 'McpClientRegistry': McpClientRegistry}
except Exception as e:
logger.warning(f"[Tools] MCP tools not loaded: {e}")
return {}
_mcp_tools = _import_mcp_tools()
McpTool = _mcp_tools.get('McpTool')
McpClientRegistry = _mcp_tools.get('McpClientRegistry')
# Export all tools (including optional ones that might be None)
__all__ = [
'BaseTool',
'ToolManager',
'Read',
'Write',
'Edit',
'Bash',
'Ls',
'Send',
'MemorySearchTool',
'MemoryGetTool',
'EnvConfig',
'SchedulerTool',
'WebSearch',
'WebFetch',
'Vision',
'BrowserTool',
'McpTool',
]
"""
Tools module for Agent.
"""

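The optional-import pattern repeated in this module can be sketched as one generic helper. This is an illustration only: `import_optional` is a hypothetical name, the standard `logging` module stands in for the project's `common.log` logger, and the module above uses explicit `try`/`except` blocks per tool rather than a helper like this.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def import_optional(module_path: str, attr: str, hint: str = ""):
    """Return ``attr`` from ``module_path``, or None if it cannot be loaded.

    ``hint`` is an optional install instruction appended to the log message.
    """
    try:
        module = importlib.import_module(module_path)
        return getattr(module, attr)
    except ImportError as e:
        # Missing dependency: report it and keep the rest of the package usable
        logger.warning("%s not loaded - missing dependency: %s %s", attr, e, hint)
    except Exception as e:
        # Any other failure while importing or resolving the attribute
        logger.error("%s failed to load: %s", attr, e)
    return None
```

Callers must then treat the returned name as possibly `None`, which matches the comment on `__all__` above about optional tools.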
99
agent/tools/base_tool.py Normal file

@@ -0,0 +1,99 @@
from enum import Enum
from typing import Any, Optional
from common.log import logger
import copy
class ToolStage(Enum):
"""Enum representing tool decision stages"""
PRE_PROCESS = "pre_process" # Tools that need to be actively selected by the agent
POST_PROCESS = "post_process" # Tools that automatically execute after final_answer
class ToolResult:
"""Tool execution result"""
def __init__(self, status: str = None, result: Any = None, ext_data: Any = None):
self.status = status
self.result = result
self.ext_data = ext_data
@staticmethod
def success(result, ext_data: Any = None):
return ToolResult(status="success", result=result, ext_data=ext_data)
@staticmethod
def fail(result, ext_data: Any = None):
return ToolResult(status="error", result=result, ext_data=ext_data)
class BaseTool:
"""Base class for all tools."""
# Default decision stage is pre-process
stage = ToolStage.PRE_PROCESS
# Class attributes; subclasses are expected to override these
name: str = "base_tool"
description: str = "Base tool"
params: dict = {} # Store JSON Schema
model: Optional[Any] = None # LLM model instance, type depends on bot implementation
@classmethod
def get_json_schema(cls) -> dict:
"""Get the standard description of the tool"""
return {
"name": cls.name,
"description": cls.description,
"parameters": cls.params
}
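def execute_tool(self, params: dict) -> ToolResult:
"""Run execute() and convert any exception into a failed ToolResult."""
try:
return self.execute(params)
except Exception as e:
logger.error(f"[{self.name}] tool execution failed: {e}")
# Return a failed result instead of falling through and returning None
return ToolResult.fail(f"Tool execution failed: {e}")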
def execute(self, params: dict) -> ToolResult:
"""Specific logic to be implemented by subclasses"""
raise NotImplementedError
@classmethod
def _parse_schema(cls) -> dict:
"""Convert the JSON Schema in ``params`` into ``{name: (python_type, default)}`` fields (Pydantic-style)."""
# Map JSON Schema types to Python types (built once, not per property)
type_map = {
"string": str,
"number": float,
"integer": int,
"boolean": bool,
"array": list,
"object": dict
}
fields = {}
# Tolerate tools whose params define no "properties" key
for name, prop in cls.params.get("properties", {}).items():
fields[name] = (
type_map[prop["type"]],
prop.get("default", ...)
)
return fields
def should_auto_execute(self, context) -> bool:
"""
Determine if this tool should be automatically executed based on context.
:param context: The agent context
:return: True if the tool should be executed, False otherwise
"""
# Only tools in post-process stage will be automatically executed
return self.stage == ToolStage.POST_PROCESS
def close(self):
"""
Close any resources used by the tool.
This method should be overridden by tools that need to clean up resources
such as browser connections, file handles, etc.
By default, this method does nothing.
"""
pass
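
The contract above (class-level `name` / `description` / `params`, an `execute` override, and `ToolResult.success` / `ToolResult.fail` as return values) can be sketched with a small example. This is only an illustration under stated assumptions: `ToolResult` and `BaseTool` below are simplified stand-ins so the snippet runs on its own, and `WordCount` is a hypothetical tool that does not exist in the repository.

```python
from typing import Any


class ToolResult:
    """Simplified stand-in mirroring the ToolResult shown above."""

    def __init__(self, status: str = None, result: Any = None, ext_data: Any = None):
        self.status = status
        self.result = result
        self.ext_data = ext_data

    @staticmethod
    def success(result, ext_data: Any = None):
        return ToolResult(status="success", result=result, ext_data=ext_data)

    @staticmethod
    def fail(result, ext_data: Any = None):
        return ToolResult(status="error", result=result, ext_data=ext_data)


class BaseTool:
    """Simplified stand-in: only the parts this example needs."""

    name = "base_tool"
    description = "Base tool"
    params: dict = {}

    @classmethod
    def get_json_schema(cls) -> dict:
        return {"name": cls.name, "description": cls.description, "parameters": cls.params}

    def execute(self, params: dict) -> ToolResult:
        raise NotImplementedError


class WordCount(BaseTool):
    """Hypothetical tool used only for illustration."""

    name = "word_count"
    description = "Count whitespace-separated words in a text"
    params = {
        "type": "object",
        "properties": {"text": {"type": "string", "description": "Text to count"}},
        "required": ["text"],
    }

    def execute(self, params: dict) -> ToolResult:
        text = params.get("text")
        if not isinstance(text, str):
            # Report bad input as a failed result rather than raising
            return ToolResult.fail("Error: text parameter is required")
        return ToolResult.success({"words": len(text.split())})


schema = WordCount.get_json_schema()
ok = WordCount().execute({"text": "hello agent world"})
bad = WordCount().execute({})
```

Returning `ToolResult.fail` for invalid input, instead of raising, keeps the error visible to the agent as ordinary tool output.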


@@ -0,0 +1,3 @@
from .bash import Bash
__all__ = ['Bash']

agent/tools/bash/bash.py Normal file

@@ -0,0 +1,295 @@
"""
Bash tool - Execute bash commands
"""
import os
import re
import sys
import subprocess
import tempfile
from typing import Dict, Any
from agent.tools.base_tool import BaseTool, ToolResult
from agent.tools.utils.truncate import truncate_tail, format_size, DEFAULT_MAX_LINES, DEFAULT_MAX_BYTES
from common.log import logger
from common.utils import expand_path
class Bash(BaseTool):
"""Tool for executing bash commands"""
_IS_WIN = sys.platform == "win32"
name: str = "bash"
description: str = f"""Execute a bash command in the current working directory. Returns stdout and stderr. Output is truncated to last {DEFAULT_MAX_LINES} lines or {DEFAULT_MAX_BYTES // 1024}KB (whichever is hit first). If truncated, full output is saved to a temp file.
{'''
PLATFORM: Windows (cmd.exe). Do NOT use Unix-only commands like grep, head, tail, sed, awk.
''' if _IS_WIN else ''}
ENVIRONMENT: All API keys from env_config are auto-injected. Use $VAR_NAME directly.
SAFETY:
- Freely create/modify/delete files within the workspace
- For destructive commands out of workspace, explain and confirm first"""
params: dict = {
"type": "object",
"properties": {
"command": {
"type": "string",
"description": "Bash command to execute"
},
"timeout": {
"type": "integer",
"description": "Timeout in seconds (optional, default: 30)"
}
},
"required": ["command"]
}
def __init__(self, config: dict = None):
self.config = config or {}
self.cwd = self.config.get("cwd", os.getcwd())
# Ensure working directory exists
if not os.path.exists(self.cwd):
os.makedirs(self.cwd, exist_ok=True)
self.default_timeout = self.config.get("timeout", 30)
# Enable safety mode by default (can be disabled in config)
self.safety_mode = self.config.get("safety_mode", True)
def execute(self, args: Dict[str, Any]) -> ToolResult:
"""
Execute a bash command
:param args: Dictionary containing the command and optional timeout
:return: Command output or error
"""
command = args.get("command", "").strip()
timeout = args.get("timeout", self.default_timeout)
if not command:
return ToolResult.fail("Error: command parameter is required")
        # Security check: block any reference to the ~/.cow config directory
        # (this also covers ~/.cow/.env)
        if "~/.cow" in command:
            return ToolResult.fail(
                "Error: Access denied. API keys and credentials must be accessed through the env_config tool only."
            )
# Optional safety check - only warn about extremely dangerous commands
if self.safety_mode:
warning = self._get_safety_warning(command)
if warning:
return ToolResult.fail(
f"Safety Warning: {warning}\n\nIf you believe this command is safe and necessary, please ask the user for confirmation first, explaining what the command does and why it's needed.")
try:
# Prepare environment with .env file variables
env = os.environ.copy()
# Load environment variables from ~/.cow/.env if it exists
env_file = expand_path("~/.cow/.env")
dotenv_vars = {}
if os.path.exists(env_file):
try:
from dotenv import dotenv_values
dotenv_vars = dotenv_values(env_file)
env.update(dotenv_vars)
logger.debug(f"[Bash] Loaded {len(dotenv_vars)} variables from {env_file}")
except ImportError:
logger.debug("[Bash] python-dotenv not installed, skipping .env loading")
except Exception as e:
logger.debug(f"[Bash] Failed to load .env: {e}")
# getuid() only exists on Unix-like systems
if hasattr(os, 'getuid'):
logger.debug(f"[Bash] Process UID: {os.getuid()}")
else:
logger.debug(f"[Bash] Process User: {os.environ.get('USERNAME', os.environ.get('USER', 'unknown'))}")
# On Windows, convert $VAR references to %VAR% for cmd.exe
if self._IS_WIN:
env["PYTHONIOENCODING"] = "utf-8"
command = self._convert_env_vars_for_windows(command, dotenv_vars)
if command and not command.strip().lower().startswith("chcp"):
command = f"chcp 65001 >nul 2>&1 && {command}"
result = subprocess.run(
command,
shell=True,
cwd=self.cwd,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True,
encoding="utf-8",
errors="replace",
timeout=timeout,
env=env,
)
logger.debug(f"[Bash] Exit code: {result.returncode}")
logger.debug(f"[Bash] Stdout length: {len(result.stdout)}")
logger.debug(f"[Bash] Stderr length: {len(result.stderr)}")
            # Workaround for exit code 126 with no output
            if result.returncode == 126 and not result.stdout and not result.stderr:
                logger.warning("[Bash] Exit 126 with no output - trying alternative execution method")
# Try using argument list instead of shell=True
import shlex
try:
parts = shlex.split(command)
if len(parts) > 0:
logger.info(f"[Bash] Retrying with argument list: {parts[:3]}...")
retry_result = subprocess.run(
parts,
cwd=self.cwd,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True,
encoding="utf-8",
errors="replace",
timeout=timeout,
env=env
)
logger.debug(f"[Bash] Retry exit code: {retry_result.returncode}, stdout: {len(retry_result.stdout)}, stderr: {len(retry_result.stderr)}")
# If retry succeeded, use retry result
if retry_result.returncode == 0 or retry_result.stdout or retry_result.stderr:
result = retry_result
else:
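                            # Both attempts failed - check if this is openai-image-vision skill
                            if 'openai-image-vision' in command or 'vision.sh' in command:
                                # Create a mock result with a helpful error message.
                                # The JSON payload is user-facing Chinese text; in English:
                                #   error: "The image could not be parsed"
                                #   reason: "The image format may be unsupported, or the image file may be damaged"
                                #   suggestion: "Please try another image"
                                from types import SimpleNamespace
                                result = SimpleNamespace(
                                    returncode=1,
                                    stdout='{"error": "图片无法解析", "reason": "该图片格式可能不受支持,或图片文件存在问题", "suggestion": "请尝试其他图片"}',
                                    stderr=''
                                )
                                logger.info("[Bash] Converted exit 126 to user-friendly image error message for vision skill")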
except Exception as retry_err:
logger.warning(f"[Bash] Retry failed: {retry_err}")
# When command succeeds with stdout, keep output clean (stderr goes to server log only).
# When command fails or stdout is empty, include stderr so the agent can diagnose.
if result.returncode == 0 and result.stdout.strip():
output = result.stdout
if result.stderr:
logger.info(f"[Bash] stderr (not forwarded): {result.stderr[:500]}")
else:
output = result.stdout
if result.stderr:
output += "\n" + result.stderr
            # Apply tail truncation first so we know whether anything was cut
            truncation = truncate_tail(output)
            output_text = truncation.content or "(no output)"
            # Save the full output to a temp file whenever it was truncated
            # (by lines or by bytes), so the notice below never reports
            # "Full output: None"
            temp_file_path = None
            total_bytes = len(output.encode('utf-8'))
            if truncation.truncated or total_bytes > DEFAULT_MAX_BYTES:
                with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.log', prefix='bash-', encoding='utf-8') as f:
                    f.write(output)
                    temp_file_path = f.name
# Build result
details = {}
if truncation.truncated:
details["truncation"] = truncation.to_dict()
if temp_file_path:
details["full_output_path"] = temp_file_path
# Build notice
start_line = truncation.total_lines - truncation.output_lines + 1
end_line = truncation.total_lines
if truncation.last_line_partial:
# Edge case: last line alone > 30KB
last_line = output.split('\n')[-1] if output else ""
last_line_size = format_size(len(last_line.encode('utf-8')))
output_text += f"\n\n[Showing last {format_size(truncation.output_bytes)} of line {end_line} (line is {last_line_size}). Full output: {temp_file_path}]"
elif truncation.truncated_by == "lines":
output_text += f"\n\n[Showing lines {start_line}-{end_line} of {truncation.total_lines}. Full output: {temp_file_path}]"
else:
output_text += f"\n\n[Showing lines {start_line}-{end_line} of {truncation.total_lines} ({format_size(DEFAULT_MAX_BYTES)} limit). Full output: {temp_file_path}]"
# Check exit code
if result.returncode != 0:
output_text += f"\n\nCommand exited with code {result.returncode}"
return ToolResult.fail({
"output": output_text,
"exit_code": result.returncode,
"details": details if details else None
})
return ToolResult.success({
"output": output_text,
"exit_code": result.returncode,
"details": details if details else None
})
except subprocess.TimeoutExpired:
return ToolResult.fail(f"Error: Command timed out after {timeout} seconds")
except Exception as e:
return ToolResult.fail(f"Error executing command: {str(e)}")
def _get_safety_warning(self, command: str) -> str:
"""
Get safety warning for absolutely catastrophic commands only.
Keep the blocklist minimal so the agent retains maximum freedom.
:param command: Command to check
:return: Warning message if dangerous, empty string if safe
"""
        # Tokenize to avoid substring false positives (e.g. `rm -rf /tmp/x`
        # must not match `rm -rf /`).
        tokens = command.lower().split()
        # `rm -rf /` or `rm -rf /*` targeting the real root. Recursive and force
        # are tracked separately so split flags (`rm -r -f /`) and long options
        # (`rm --recursive --force /`) are caught, while `--force` alone is not.
        for i, tok in enumerate(tokens):
            if tok != "rm":
                continue
            recursive = force = False
            for t in tokens[i + 1:]:
                if t in ("/", "/*"):
                    if recursive and force:
                        return "This command will delete the entire filesystem"
                    break
                if t.startswith("--"):
                    recursive = recursive or t == "--recursive"
                    force = force or t == "--force"
                elif t.startswith("-"):
                    recursive = recursive or "r" in t
                    force = force or "f" in t
                else:
                    break
# Disk wiping
if "if=/dev/zero" in command.lower() and "dd " in command.lower():
return "This command can destroy disk data"
# Power control - match only as a standalone word (\b enforces word boundary)
if re.search(r'\b(shutdown|reboot|halt|poweroff)\b', command.lower()):
return "This command will shut down or restart the system"
return ""
@staticmethod
def _convert_env_vars_for_windows(command: str, dotenv_vars: dict) -> str:
"""
Convert bash-style $VAR / ${VAR} references to cmd.exe %VAR% syntax.
Only converts variables loaded from .env (user-configured API keys etc.)
to avoid breaking $PATH, jq expressions, regex, etc.
"""
if not dotenv_vars:
return command
def replace_match(m):
var_name = m.group(1) or m.group(2)
if var_name in dotenv_vars:
return f"%{var_name}%"
return m.group(0)
return re.sub(r'\$\{(\w+)\}|\$(\w+)', replace_match, command)
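
The selective rewrite performed by `_convert_env_vars_for_windows` can be tried in isolation. The sketch below copies the same regular expression into a free function; the function name, the `known` dictionary, and the sample command are assumptions made for illustration.

```python
import re


def convert_env_vars_for_windows(command: str, known_vars: dict) -> str:
    """Rewrite $VAR / ${VAR} to %VAR%, but only for names present in known_vars."""
    if not known_vars:
        return command

    def replace_match(m):
        # group(1) is the ${VAR} form, group(2) the bare $VAR form
        var_name = m.group(1) or m.group(2)
        return f"%{var_name}%" if var_name in known_vars else m.group(0)

    return re.sub(r'\$\{(\w+)\}|\$(\w+)', replace_match, command)


known = {"OPENAI_API_KEY": "sk-example"}  # hypothetical .env contents
converted = convert_env_vars_for_windows(
    'curl -H "Authorization: Bearer $OPENAI_API_KEY" ${OPENAI_API_KEY} $PATH', known
)
print(converted)
```

Both spellings of the known variable are rewritten, while `$PATH` is left alone because it did not come from the `.env` file.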

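The token-based idea behind `_get_safety_warning` (match whole tokens so that `rm -rf /tmp/x` is not mistaken for `rm -rf /`) can be sketched as a standalone function. This is a variant written for illustration, not the repository's code: `is_root_wipe` is a hypothetical name, and tracking the recursive and force flags separately is an assumption of this sketch.

```python
def is_root_wipe(command: str) -> bool:
    """Return True if an `rm` with both recursive and force flags targets / or /*."""
    tokens = command.lower().split()
    for i, tok in enumerate(tokens):
        if tok != "rm":
            continue
        recursive = force = False
        for t in tokens[i + 1:]:
            if t in ("/", "/*"):
                if recursive and force:
                    return True
                break
            if t.startswith("--"):
                # Long options: only the two relevant ones change state
                recursive = recursive or t == "--recursive"
                force = force or t == "--force"
            elif t.startswith("-"):
                # Short option clusters such as -rf, -fr, -r, -f
                recursive = recursive or "r" in t
                force = force or "f" in t
            else:
                # First non-option, non-root operand: this rm is not a root wipe
                break
    return False


checks = {
    "rm -rf /": is_root_wipe("rm -rf /"),
    "rm -rf /tmp/x": is_root_wipe("rm -rf /tmp/x"),
    "rm -r -f /*": is_root_wipe("rm -r -f /*"),
    "rm -r /": is_root_wipe("rm -r /"),
}
```

Because the comparison is made per token, a path that merely starts with `/` never matches the root target.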

@@ -0,0 +1,3 @@
from agent.tools.browser.browser_tool import BrowserTool
__all__ = ["BrowserTool"]


@@ -0,0 +1,961 @@
"""
Browser service - Playwright wrapper managing browser lifecycle and page operations.
All Playwright calls run on a dedicated background thread so that callers from
any worker thread can safely use the service. An idle-timeout mechanism
automatically shuts down the browser (and its thread) after a configurable
period of inactivity to free resources.
"""
import os
import sys
import uuid
import queue
import threading
from typing import Optional, Dict, Any, List, Callable
from common.log import logger
from common.utils import expand_path, is_cloud_deployment
_DEFAULT_USER_DATA_DIR = "~/.cow/browser_profile"
try:
from playwright.sync_api import sync_playwright, Browser, BrowserContext, Page, Playwright
_HAS_PLAYWRIGHT = True
except ImportError:
_HAS_PLAYWRIGHT = False
# ---------------------------------------------------------------------------
# Snapshot DOM helpers
# ---------------------------------------------------------------------------
# Tags that typically carry useful content for an agent
_INTERACTIVE_TAGS = {
"a", "button", "input", "textarea", "select", "option",
"label", "details", "summary",
}
_SEMANTIC_TAGS = {
"h1", "h2", "h3", "h4", "h5", "h6",
"p", "li", "td", "th", "caption", "figcaption", "blockquote", "pre", "code",
"nav", "main", "article", "section", "header", "footer", "form", "table",
"img", "video", "audio",
}
_KEEP_TAGS = _INTERACTIVE_TAGS | _SEMANTIC_TAGS
_SNAPSHOT_JS = """
() => {
const KEEP = new Set(%s);
const INTERACTIVE = new Set(%s);
const SKIP = new Set(["script","style","noscript","svg","path","meta","link","br","hr"]);
const CLICKABLE_ROLES = new Set([
"button","link","tab","menuitem","menuitemcheckbox","menuitemradio",
"option","switch","checkbox","radio","combobox","searchbox","slider",
"spinbutton","textbox","treeitem"
]);
let refCounter = 0;
const refMap = {};
function visible(el) {
if (!(el instanceof HTMLElement)) return true;
const st = window.getComputedStyle(el);
if (st.display === "none" || st.visibility === "hidden") return false;
if (parseFloat(st.opacity) === 0) return false;
return true;
}
// Strong signals: these attributes alone are enough to mark as interactive
function hasStrongInteractiveSignal(el) {
const role = el.getAttribute("role");
if (role && CLICKABLE_ROLES.has(role)) return true;
if (el.hasAttribute("onclick") || el.hasAttribute("tabindex")) return true;
if (el.hasAttribute("data-click") || el.hasAttribute("data-action")) return true;
if (el.getAttribute("contenteditable") === "true") return true;
return false;
}
// Check if cursor:pointer is set directly (not just inherited from parent)
function hasOwnPointerCursor(el) {
try {
const st = window.getComputedStyle(el);
if (st.cursor !== "pointer") return false;
const parent = el.parentElement;
if (parent) {
const pst = window.getComputedStyle(parent);
if (pst.cursor === "pointer") return false;
}
return true;
} catch(e) {}
return false;
}
function hasTextOrContent(el) {
const t = el.textContent || "";
if (t.trim().length > 0) return true;
if (el.querySelector("img,video,audio,canvas")) return true;
const ariaLabel = el.getAttribute("aria-label");
if (ariaLabel && ariaLabel.trim()) return true;
const title = el.getAttribute("title");
if (title && title.trim()) return true;
return false;
}
function isImplicitInteractive(el) {
if (hasStrongInteractiveSignal(el)) return true;
if (hasOwnPointerCursor(el) && hasTextOrContent(el)) return true;
return false;
}
function getTextContent(el) {
let text = "";
for (const ch of el.childNodes) {
if (ch.nodeType === Node.TEXT_NODE) {
text += ch.textContent;
}
}
return text.trim();
}
function walk(node) {
if (node.nodeType === Node.TEXT_NODE) {
const t = node.textContent.trim();
return t ? t : null;
}
if (node.nodeType !== Node.ELEMENT_NODE) return null;
const tag = node.tagName.toLowerCase();
if (SKIP.has(tag)) return null;
if (!visible(node)) return null;
const children = [];
for (const ch of node.childNodes) {
const r = walk(ch);
            // Strings, objects and arrays are all appended the same way
            if (r !== null) children.push(r);
}
const nativeInteractive = INTERACTIVE.has(tag);
const implicitInteractive = !nativeInteractive && (node instanceof HTMLElement) && isImplicitInteractive(node);
const keep = KEEP.has(tag) || implicitInteractive;
if (!keep) {
if (children.length === 0) return null;
if (children.length === 1) return children[0];
return children;
}
const obj = { tag };
if (nativeInteractive || implicitInteractive) {
refCounter++;
obj.ref = refCounter;
refMap[refCounter] = node;
}
if (implicitInteractive) {
const role = node.getAttribute("role");
if (role) obj.role = role;
const directText = getTextContent(node);
if (!directText && children.length === 0) {
const ariaLabel = node.getAttribute("aria-label");
const title = node.getAttribute("title");
if (ariaLabel) obj.ariaLabel = ariaLabel;
else if (title) obj.ariaLabel = title;
}
}
// Attributes
if (tag === "a" && node.href) obj.href = node.getAttribute("href");
if (tag === "img") {
obj.alt = node.alt || "";
obj.src = node.getAttribute("src") || "";
}
if (tag === "input" || tag === "textarea" || tag === "select") {
obj.type = node.type || "text";
obj.name = node.name || undefined;
obj.value = node.value || undefined;
obj.placeholder = node.placeholder || undefined;
if (node.disabled) obj.disabled = true;
if (tag === "input" && node.type === "checkbox") obj.checked = node.checked;
}
if (tag === "button") {
if (node.disabled) obj.disabled = true;
}
if (tag === "option") {
obj.value = node.value;
if (node.selected) obj.selected = true;
}
if (tag === "label" && node.htmlFor) obj.for = node.htmlFor;
// Role / aria-label for native interactive & semantic elements
if (!implicitInteractive) {
const role = node.getAttribute("role");
if (role) obj.role = role;
const ariaLabel = node.getAttribute("aria-label");
if (ariaLabel) obj.ariaLabel = ariaLabel;
}
// Children
if (children.length === 1 && typeof children[0] === "string") {
obj.text = children[0];
} else if (children.length > 0) {
obj.children = children;
}
return obj;
}
const result = walk(document.body);
window.__cowRefMap = refMap;
return { tree: result, refCount: refCounter };
}
""" % (
str(list(_KEEP_TAGS)),
str(list(_INTERACTIVE_TAGS)),
)
_BROWSER_DEAD_HINTS = (
"has been closed",
"browser has disconnected",
"target closed",
"browser closed",
"context or browser has been closed",
)
def _is_browser_dead_error(err: Exception) -> bool:
"""Return True if *err* indicates the browser / page died out from under us."""
msg = str(err).lower()
return any(h in msg for h in _BROWSER_DEAD_HINTS)
def _should_use_headless() -> bool:
"""Decide headless mode: headless on Linux servers without display, headed elsewhere."""
if sys.platform in ("win32", "darwin"):
return False
# Linux: check for display
if os.environ.get("DISPLAY") or os.environ.get("WAYLAND_DISPLAY"):
return False
return True
def _flatten_tree(node, indent=0) -> List[str]:
"""Convert snapshot tree to compact text lines for LLM consumption."""
if node is None:
return []
if isinstance(node, str):
return [" " * indent + node]
if isinstance(node, list):
lines = []
for child in node:
lines.extend(_flatten_tree(child, indent))
return lines
if not isinstance(node, dict):
return []
tag = node.get("tag", "?")
ref = node.get("ref")
parts = [tag]
if ref:
parts[0] = f"[{ref}] {tag}"
# Inline attributes
for attr in ("type", "name", "href", "alt", "role", "ariaLabel", "placeholder", "value"):
val = node.get(attr)
if val:
# Truncate long values
s = str(val)
if len(s) > 80:
s = s[:77] + "..."
parts.append(f'{attr}="{s}"')
for flag in ("disabled", "checked", "selected"):
if node.get(flag):
parts.append(flag)
prefix = " " * indent
header = prefix + " ".join(parts)
text = node.get("text")
if text:
# Truncate long text
if len(text) > 120:
text = text[:117] + "..."
header += f": {text}"
lines = [header]
children = node.get("children", [])
for child in children:
lines.extend(_flatten_tree(child, indent + 2))
return lines
class BrowserService:
"""Manages a Playwright browser on a dedicated background thread.
All Playwright operations are dispatched to a single long-lived thread via
a task queue. Callers from *any* worker thread can use the public API
safely. An idle timer automatically shuts the browser down after
``idle_timeout`` seconds of inactivity (default 300 = 5 min).
"""
_IDLE_TIMEOUT_DEFAULT = 300 # seconds
def __init__(self, config: Optional[Dict[str, Any]] = None):
self._config = config or {}
self._headless: Optional[bool] = None
self._screenshot_dir: Optional[str] = None
# Background thread state
self._thread: Optional[threading.Thread] = None
self._task_queue: queue.Queue = queue.Queue()
self._lock = threading.Lock()
self._alive = False
self._ready = threading.Event()
# Playwright objects (only accessed on the background thread)
self._playwright = None
self._browser = None
self._context = None
self._page = None
# Launch mode: one of "fresh" | "persistent" | "cdp".
# - cdp: connect to an externally launched Chrome via CDP endpoint.
# - persistent: launch with launch_persistent_context using a user_data_dir
# so cookies / login state survive across runs (default).
# - fresh: classic launch + new_context, clean state every run.
cdp_endpoint = self._config.get("cdp_endpoint") or ""
persistent_flag = self._config.get("persistent", True)
user_data_dir_cfg = self._config.get("user_data_dir")
if user_data_dir_cfg is None:
user_data_dir_cfg = _DEFAULT_USER_DATA_DIR
self._cdp_endpoint: str = cdp_endpoint.strip() if isinstance(cdp_endpoint, str) else ""
if self._cdp_endpoint:
self._launch_mode = "cdp"
self._user_data_dir: str = ""
elif persistent_flag and user_data_dir_cfg:
self._launch_mode = "persistent"
self._user_data_dir = expand_path(str(user_data_dir_cfg))
else:
self._launch_mode = "fresh"
self._user_data_dir = ""
# Idle auto-release
idle_cfg = self._config.get("idle_timeout")
self._idle_timeout: float = float(idle_cfg) if idle_cfg is not None else self._IDLE_TIMEOUT_DEFAULT
self._idle_timer: Optional[threading.Timer] = None
# Set when the browser / page is detected to have died externally
# (e.g. user manually closed the window). The next _submit() will then
# tear down the stale thread and relaunch.
self._needs_restart = False
# ------------------------------------------------------------------
# Background-thread lifecycle
# ------------------------------------------------------------------
def _start_thread(self):
"""Start the dedicated Playwright thread if not already running."""
with self._lock:
if self._alive and self._thread and self._thread.is_alive():
return
# Wait for old thread to fully exit before creating a new one
old = self._thread
if old and old.is_alive():
old.join(timeout=5)
# Fresh queue to avoid stale sentinels from a previous close()
self._task_queue = queue.Queue()
self._alive = True
self._ready = threading.Event()
self._thread = threading.Thread(target=self._run_loop, daemon=True, name="BrowserThread")
self._thread.start()
# Block until browser is ready (or failed)
self._ready.wait(timeout=30)
def _run_loop(self):
"""Event loop running on the dedicated thread. Processes tasks until stopped."""
logger.info("[Browser] Background thread started")
try:
self._launch_browser()
except Exception as e:
logger.error(f"[Browser] Failed to launch browser: {e}")
self._alive = False
self._ready.set()
self._drain_queue(RuntimeError(f"Browser launch failed: {e}"))
return
self._ready.set()
while self._alive:
try:
task = self._task_queue.get(timeout=1.0)
except queue.Empty:
continue
if task is None:
break
fn, args, kwargs, result_slot = task
try:
result_slot["value"] = fn(*args, **kwargs)
except Exception as e:
result_slot["error"] = e
if _is_browser_dead_error(e):
self._needs_restart = True
logger.warning(
f"[Browser] Detected closed page/context ({e}); "
"will relaunch on next request."
)
finally:
result_slot["event"].set()
self._shutdown_browser()
self._drain_queue(RuntimeError("Browser thread stopped"))
logger.info("[Browser] Background thread exited")
def _drain_queue(self, error: Exception):
"""Unblock all callers waiting on the queue with an error."""
while True:
try:
task = self._task_queue.get_nowait()
except queue.Empty:
break
if task is None:
continue
_, _, _, result_slot = task
result_slot["error"] = error
result_slot["event"].set()
def _launch_browser(self):
"""Launch / connect Chromium on the background thread."""
if self._headless is None:
headless_cfg = self._config.get("headless")
self._headless = headless_cfg if headless_cfg is not None else _should_use_headless()
launch_args = ["--disable-dev-shm-usage"]
if self._headless:
launch_args.append("--no-sandbox")
if is_cloud_deployment():
launch_args.extend([
"--disable-gpu",
"--disable-software-rasterizer",
"--disable-extensions",
"--disable-background-networking",
"--disable-background-timer-throttling",
"--disable-renderer-backgrounding",
"--disable-features=site-per-process,TranslateUI,IsolateOrigins",
"--no-zygote",
"--js-flags=--max-old-space-size=384",
"--memory-pressure-off",
])
extra_args = self._config.get("launch_args", [])
if extra_args:
launch_args.extend(extra_args)
viewport_w = self._config.get("viewport_width", 1280)
viewport_h = self._config.get("viewport_height", 720)
viewport = {"width": viewport_w, "height": viewport_h}
user_agent = (
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/131.0.0.0 Safari/537.36"
)
self._playwright = sync_playwright().start()
if self._launch_mode == "cdp":
self._connect_cdp(viewport)
elif self._launch_mode == "persistent":
self._launch_persistent(launch_args, viewport, user_agent)
else:
self._launch_fresh(launch_args, viewport, user_agent)
logger.info("[Browser] Browser ready")
def _launch_fresh(self, launch_args: List[str], viewport: Dict[str, int], user_agent: str):
"""Classic launch: brand new Chromium with an empty context."""
logger.info(f"[Browser] Launching Chromium (fresh, headless={self._headless})")
self._browser = self._playwright.chromium.launch(
headless=self._headless,
args=launch_args,
)
self._context = self._browser.new_context(
viewport=viewport,
user_agent=user_agent,
)
self._page = self._context.new_page()
self._wire_close_listeners()
def _launch_persistent(self, launch_args: List[str], viewport: Dict[str, int], user_agent: str):
"""Launch Chromium with a persistent user_data_dir so login state survives."""
os.makedirs(self._user_data_dir, exist_ok=True)
logger.info(
f"[Browser] Launching Chromium (persistent, headless={self._headless}, "
f"profile={self._user_data_dir})"
)
try:
self._context = self._playwright.chromium.launch_persistent_context(
user_data_dir=self._user_data_dir,
headless=self._headless,
args=launch_args,
viewport=viewport,
user_agent=user_agent,
)
except Exception as e:
# Profile is locked when another Chromium instance already holds it.
msg = str(e).lower()
if "singletonlock" in msg or "profile" in msg or "lock" in msg:
raise RuntimeError(
f"Browser profile '{self._user_data_dir}' is in use by another process. "
"Close the other Chromium / cow instance, or set a different "
"tools.browser.user_data_dir."
) from e
raise
# Persistent context has no parent Browser handle; reuse the auto-created page.
self._browser = None
pages = self._context.pages
self._page = pages[0] if pages else self._context.new_page()
self._wire_close_listeners()
def _connect_cdp(self, viewport: Dict[str, int]):
"""Attach to an existing Chrome started with --remote-debugging-port."""
endpoint = self._cdp_endpoint
logger.info(f"[Browser] Connecting to existing Chrome via CDP: {endpoint}")
try:
self._browser = self._playwright.chromium.connect_over_cdp(endpoint)
except Exception as e:
msg = str(e).lower()
if "econnrefused" in msg or "connect" in msg or "refused" in msg:
raise RuntimeError(
f"Cannot reach Chrome at {endpoint}. The CDP browser is not "
"running. Ask the user to launch Chrome with "
"--remote-debugging-port and --user-data-dir, then retry. "
"Do not retry this tool until the user confirms."
) from e
raise
contexts = self._browser.contexts
if contexts:
self._context = contexts[0]
else:
self._context = self._browser.new_context(viewport=viewport)
pages = self._context.pages
self._page = pages[0] if pages else self._context.new_page()
self._wire_close_listeners()
def _wire_close_listeners(self):
"""Mark needs_restart whenever the browser / context / page dies externally."""
def _on_dead(_obj=None):
self._needs_restart = True
try:
if self._browser:
self._browser.on("disconnected", _on_dead)
if self._context:
self._context.on("close", _on_dead)
if self._page:
self._page.on("close", _on_dead)
except Exception as e:
logger.debug(f"[Browser] Failed to wire close listeners: {e}")
def _shutdown_browser(self):
"""Shut down Playwright resources on the background thread.
Mode-specific behavior:
- cdp: only disconnect the Playwright client; leave the user's Chrome
and its tabs untouched (do NOT close the context).
- persistent: close the persistent context (no separate browser handle).
- fresh: close context, then browser.
"""
self._cancel_idle_timer()
if self._launch_mode == "cdp":
# For CDP, browser.close() only detaches the Playwright client;
# the user's Chrome process and its tabs stay alive.
try:
if self._browser:
self._browser.close()
except Exception as e:
logger.debug(f"[Browser] cdp disconnect error: {e}")
else:
for obj, label in [
(self._context, "context"),
(self._browser, "browser"),
]:
try:
if obj:
obj.close()
except Exception as e:
logger.debug(f"[Browser] {label} close error: {e}")
try:
if self._playwright:
self._playwright.stop()
except Exception as e:
logger.debug(f"[Browser] playwright stop error: {e}")
self._page = None
self._context = None
self._browser = None
self._playwright = None
logger.info("[Browser] Browser closed")
def _submit(self, fn: Callable, *args, **kwargs):
"""Submit *fn* to the background thread and block until it completes."""
# If the browser died externally (e.g. user closed the window), tear
# down the stale thread first so _start_thread() will relaunch fresh.
if self._needs_restart:
logger.info("[Browser] Restarting after detecting closed browser")
self.close()
self._needs_restart = False
self._start_thread()
if not self._alive:
raise RuntimeError("Browser is not available")
self._reset_idle_timer()
result_slot: Dict[str, Any] = {"event": threading.Event()}
self._task_queue.put((fn, args, kwargs, result_slot))
# Timeout prevents permanent hang if the background thread crashes
completed = result_slot["event"].wait(timeout=120)
if not completed:
raise TimeoutError("Browser operation timed out (120s)")
if "error" in result_slot:
raise result_slot["error"]
return result_slot.get("value")
# ------------------------------------------------------------------
# Idle auto-release
# ------------------------------------------------------------------
def _reset_idle_timer(self):
self._cancel_idle_timer()
if self._idle_timeout > 0:
self._idle_timer = threading.Timer(self._idle_timeout, self._on_idle_timeout)
self._idle_timer.daemon = True
self._idle_timer.start()
def _cancel_idle_timer(self):
if self._idle_timer:
self._idle_timer.cancel()
self._idle_timer = None
def _on_idle_timeout(self):
logger.info(f"[Browser] Idle for {self._idle_timeout}s, auto-releasing browser")
self.close()
# ------------------------------------------------------------------
# Public lifecycle
# ------------------------------------------------------------------
def close(self):
"""Shut down browser and background thread (safe from any thread)."""
self._cancel_idle_timer()
with self._lock:
if not self._alive:
self._needs_restart = False
return
self._alive = False
t = self._thread
if self._task_queue is not None:
self._task_queue.put(None)
if t is not None and t.is_alive():
t.join(timeout=10)
with self._lock:
self._thread = None
self._needs_restart = False
# ------------------------------------------------------------------
# Actions (each method is dispatched to the background thread)
# ------------------------------------------------------------------
def navigate(self, url: str, timeout: int = 30000) -> Dict[str, Any]:
return self._submit(self._do_navigate, url, timeout)
def _do_navigate(self, url: str, timeout: int) -> Dict[str, Any]:
page = self._page
try:
resp = page.goto(url, wait_until="domcontentloaded", timeout=timeout)
status = resp.status if resp else None
except Exception as e:
return {"error": f"Navigation failed: {e}"}
try:
page.wait_for_load_state("networkidle", timeout=8000)
except Exception:
pass
page.wait_for_timeout(500)
try:
title = page.title()
except Exception:
title = ""
try:
current_url = page.url
except Exception:
current_url = url
return {"url": current_url, "title": title, "status": status}
def snapshot(self, selector: Optional[str] = None) -> str:
return self._submit(self._do_snapshot, selector)
def _do_snapshot(self, selector: Optional[str] = None) -> str:
page = self._page
try:
result = page.evaluate(_SNAPSHOT_JS)
except Exception as e:
return f"[Snapshot error: {e}]"
tree = result.get("tree")
ref_count = result.get("refCount", 0)
lines = _flatten_tree(tree)
try:
title = page.title()
except Exception:
title = ""
try:
url = page.url
except Exception:
url = ""
header = f"Page: {title} ({url})\nInteractive elements: {ref_count}\n---"
body = "\n".join(lines)
max_chars = self._config.get("snapshot_max_chars", 30000)
if len(body) > max_chars:
body = body[:max_chars] + "\n... [snapshot truncated]"
return f"{header}\n{body}"
def screenshot(self, full_page: bool = False, cwd: str = "") -> str:
return self._submit(self._do_screenshot, full_page, cwd)
def _do_screenshot(self, full_page: bool = False, cwd: str = "") -> str:
page = self._page
save_dir = self._get_screenshot_dir(cwd)
filename = f"screenshot_{uuid.uuid4().hex[:8]}.png"
filepath = os.path.join(save_dir, filename)
page.screenshot(path=filepath, full_page=full_page)
logger.info(f"[Browser] Screenshot saved: {filepath}")
return filepath
def click(self, ref: Optional[int] = None, selector: Optional[str] = None,
timeout: int = 5000) -> Dict[str, Any]:
return self._submit(self._do_click, ref, selector, timeout)
def _do_click(self, ref, selector, timeout) -> Dict[str, Any]:
page = self._page
try:
if ref is not None:
result = page.evaluate(f"""
() => {{
const el = window.__cowRefMap && window.__cowRefMap[{ref}];
if (!el) return {{ error: "ref {ref} not found. Run snapshot first." }};
el.click();
return {{ clicked: true, tag: el.tagName.toLowerCase() }};
}}
""")
if result.get("error"):
return result
page.wait_for_timeout(500)
return result
elif selector:
page.click(selector, timeout=timeout)
return {"clicked": True, "selector": selector}
else:
return {"error": "Provide either ref (from snapshot) or selector"}
except Exception as e:
return {"error": f"Click failed: {e}"}
def fill(self, text: str, ref: Optional[int] = None,
selector: Optional[str] = None, timeout: int = 5000) -> Dict[str, Any]:
return self._submit(self._do_fill, text, ref, selector, timeout)
def _do_fill(self, text, ref, selector, timeout) -> Dict[str, Any]:
page = self._page
try:
if ref is not None:
result = page.evaluate(f"""
() => {{
const el = window.__cowRefMap && window.__cowRefMap[{ref}];
if (!el) return {{ error: "ref {ref} not found. Run snapshot first." }};
el.focus();
el.value = "";
return {{ tag: el.tagName.toLowerCase(), name: el.name || "" }};
}}
""")
if result.get("error"):
return result
page.keyboard.type(text)
return {"filled": True, "ref": ref, "text": text}
elif selector:
page.fill(selector, text, timeout=timeout)
return {"filled": True, "selector": selector, "text": text}
else:
return {"error": "Provide either ref (from snapshot) or selector"}
except Exception as e:
return {"error": f"Fill failed: {e}"}
def select(self, value: str, ref: Optional[int] = None,
selector: Optional[str] = None, timeout: int = 5000) -> Dict[str, Any]:
return self._submit(self._do_select, value, ref, selector, timeout)
def _do_select(self, value, ref, selector, timeout) -> Dict[str, Any]:
page = self._page
try:
if ref is not None:
# Pass ref/value as evaluate() arguments instead of interpolating repr(value),
# which is a Python literal and not guaranteed to be valid JavaScript.
result = page.evaluate("""
([ref, value]) => {
const el = window.__cowRefMap && window.__cowRefMap[ref];
if (!el || el.tagName.toLowerCase() !== "select")
return { error: `ref ${ref} is not a <select> element` };
el.value = value;
el.dispatchEvent(new Event("change", { bubbles: true }));
return { selected: true, value: el.value };
}
""", [ref, value])
return result
elif selector:
page.select_option(selector, value, timeout=timeout)
return {"selected": True, "selector": selector, "value": value}
else:
return {"error": "Provide either ref (from snapshot) or selector"}
except Exception as e:
return {"error": f"Select failed: {e}"}
def scroll(self, direction: str = "down", amount: int = 500) -> Dict[str, Any]:
return self._submit(self._do_scroll, direction, amount)
def _do_scroll(self, direction, amount) -> Dict[str, Any]:
page = self._page
delta_map = {
"down": (0, amount),
"up": (0, -amount),
"right": (amount, 0),
"left": (-amount, 0),
}
dx, dy = delta_map.get(direction, (0, amount))
try:
page.mouse.wheel(dx, dy)
page.wait_for_timeout(300)
scroll_info = page.evaluate("""
() => ({
scrollX: window.scrollX,
scrollY: window.scrollY,
scrollHeight: document.documentElement.scrollHeight,
clientHeight: document.documentElement.clientHeight
})
""")
return {"scrolled": direction, "amount": amount, **scroll_info}
except Exception as e:
return {"error": f"Scroll failed: {e}"}
def wait(self, selector: Optional[str] = None, timeout: int = 5000,
state: str = "visible") -> Dict[str, Any]:
return self._submit(self._do_wait, selector, timeout, state)
def _do_wait(self, selector, timeout, state) -> Dict[str, Any]:
page = self._page
try:
if selector:
page.wait_for_selector(selector, timeout=timeout, state=state)
return {"waited": True, "selector": selector, "state": state}
else:
page.wait_for_timeout(timeout)
return {"waited": True, "timeout_ms": timeout}
except Exception as e:
return {"error": f"Wait failed: {e}"}
def go_back(self) -> Dict[str, Any]:
return self._submit(self._do_go_back)
def _do_go_back(self) -> Dict[str, Any]:
page = self._page
try:
page.go_back(wait_until="domcontentloaded", timeout=10000)
try:
title = page.title()
except Exception:
title = ""
try:
url = page.url
except Exception:
url = ""
return {"url": url, "title": title}
except Exception as e:
return {"error": f"Go back failed: {e}"}
def go_forward(self) -> Dict[str, Any]:
return self._submit(self._do_go_forward)
def _do_go_forward(self) -> Dict[str, Any]:
page = self._page
try:
page.go_forward(wait_until="domcontentloaded", timeout=10000)
try:
title = page.title()
except Exception:
title = ""
try:
url = page.url
except Exception:
url = ""
return {"url": url, "title": title}
except Exception as e:
return {"error": f"Go forward failed: {e}"}
def get_text(self, selector: str) -> Dict[str, Any]:
return self._submit(self._do_get_text, selector)
def _do_get_text(self, selector) -> Dict[str, Any]:
page = self._page
try:
text = page.text_content(selector, timeout=5000)
return {"text": text or ""}
except Exception as e:
return {"error": f"Get text failed: {e}"}
def evaluate(self, script: str) -> Dict[str, Any]:
return self._submit(self._do_evaluate, script)
def _do_evaluate(self, script) -> Dict[str, Any]:
page = self._page
try:
result = page.evaluate(script)
return {"result": result}
except Exception as e:
return {"error": f"Evaluate failed: {e}"}
def press(self, key: str) -> Dict[str, Any]:
return self._submit(self._do_press, key)
def _do_press(self, key) -> Dict[str, Any]:
page = self._page
try:
page.keyboard.press(key)
page.wait_for_timeout(300)
return {"pressed": key}
except Exception as e:
return {"error": f"Press failed: {e}"}
# ------------------------------------------------------------------
# Helpers
# ------------------------------------------------------------------
def _get_screenshot_dir(self, cwd: str = "") -> str:
if self._screenshot_dir and os.path.isdir(self._screenshot_dir):
return self._screenshot_dir
base = cwd or os.getcwd()
d = os.path.join(base, "tmp")
os.makedirs(d, exist_ok=True)
self._screenshot_dir = d
return d
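Every public action above hands its `_do_*` counterpart to `self._submit`, whose implementation is not part of this excerpt. The following is a minimal sketch of how such a dispatcher could work, assuming a single worker thread so that all browser calls run on the thread that owns the page. `SingleThreadDispatcher`, its `submit` method and the `browser` thread-name prefix are hypothetical names chosen for illustration, not the project's actual code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


class SingleThreadDispatcher:
    """Hypothetical stand-in for the service's `_submit` helper (not shown above)."""

    def __init__(self) -> None:
        # A single worker means every submitted call runs on the same thread,
        # which suits thread-bound APIs such as a synchronous browser driver.
        self._executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="browser")

    def submit(self, fn: Callable[..., Any], *args: Any, timeout: float = 60.0) -> Any:
        # Block the caller until the worker finishes; exceptions are re-raised here.
        return self._executor.submit(fn, *args).result(timeout=timeout)

    def close(self) -> None:
        # Wait for queued work to finish, then stop the worker thread.
        self._executor.shutdown(wait=True)


def current_thread_name() -> str:
    return threading.current_thread().name


dispatcher = SingleThreadDispatcher()
first = dispatcher.submit(current_thread_name)
second = dispatcher.submit(current_thread_name)
dispatcher.close()
```

Both calls report the same worker thread name, which differs from the caller's thread. Keeping the public method (`navigate`, `click`, ...) as a thin wrapper and the `_do_*` method as the real work keeps the threading concern in one place.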


@@ -0,0 +1,303 @@
"""
Browser tool - Control a Chromium browser for web navigation and interaction.
Uses Playwright under the hood. Browser instance is lazily started on first
use, reused across tool calls within the same session, and cleaned up via
close().
Launch modes (configured under `tools.browser` in config.json):
- persistent (default): Chromium runs with a persistent user_data_dir
(default `~/.cow/browser_profile`), so cookies and login state survive
across runs. The user only needs to log in once.
- cdp: When `cdp_endpoint` is set, attach to an externally launched Chrome
via the Chrome DevTools Protocol. Lets the agent reuse the user's real
browser (with all logins / extensions / true fingerprints).
- fresh: Set `persistent` to false to fall back to a clean context every run.
"""
import json
import os
from typing import Dict, Any, Optional
from agent.tools.base_tool import BaseTool, ToolResult
from agent.tools.browser.browser_service import BrowserService
from common.log import logger
class BrowserTool(BaseTool):
"""Single tool exposing all browser actions via an 'action' parameter."""
name: str = "browser"
description: str = (
"Control a browser to navigate web pages, interact with elements, and extract content. "
"Actions: navigate, snapshot, click, fill, select, scroll, screenshot, wait, back, forward, "
"get_text, press, evaluate.\n\n"
"Workflow: navigate (auto-includes snapshot with element refs) → click/fill/select by ref → snapshot to verify.\n\n"
"Use snapshot as the primary way to read pages. Use screenshot + send to show key results to the user. "
"For login/CAPTCHA/authorization etc., screenshot and ask the user for help. "
"Login state is persisted across sessions (cookies / localStorage are kept in a "
"user profile directory), so once the user logs in to a site, the agent can keep "
"using it without logging in again."
)
params: dict = {
"type": "object",
"properties": {
"action": {
"type": "string",
"description": (
"The browser action to perform. One of: "
"navigate, snapshot, click, fill, select, scroll, "
"screenshot, wait, back, forward, get_text, press, evaluate"
),
"enum": [
"navigate", "snapshot", "click", "fill", "select", "scroll",
"screenshot", "wait", "back", "forward", "get_text", "press",
"evaluate"
]
},
"url": {
"type": "string",
"description": "URL to navigate to (for 'navigate' action)"
},
"ref": {
"type": "integer",
"description": "Element ref number from snapshot (for click/fill/select)"
},
"selector": {
"type": "string",
"description": "CSS selector as fallback when ref is unavailable (for click/fill/select/wait/get_text)"
},
"text": {
"type": "string",
"description": "Text to type (for 'fill' action)"
},
"value": {
"type": "string",
"description": "Option value (for 'select' action)"
},
"key": {
"type": "string",
"description": "Key to press, e.g. Enter, Tab, Escape (for 'press' action)"
},
"direction": {
"type": "string",
"description": "Scroll direction: up, down, left, right (for 'scroll' action, default: down)"
},
"script": {
"type": "string",
"description": "JavaScript code to execute (for 'evaluate' action)"
},
"full_page": {
"type": "boolean",
"description": "Capture full page screenshot (for 'screenshot' action, default: false)"
},
"timeout": {
"type": "integer",
"description": "Timeout in milliseconds (optional, default varies by action)"
}
},
"required": ["action"]
}
_shared_service: Optional[BrowserService] = None
def __init__(self, config: dict = None):
self.config = config or {}
self.cwd = self.config.get("cwd", os.getcwd())
self._service: Optional[BrowserService] = None
def _get_service(self) -> BrowserService:
"""Get or create the browser service, sharing across copies."""
if self._service is not None:
return self._service
# Reuse shared service across tool copies within the same session
if BrowserTool._shared_service is not None:
self._service = BrowserTool._shared_service
return self._service
self._service = BrowserService(self.config)
BrowserTool._shared_service = self._service
return self._service
def execute(self, args: Dict[str, Any]) -> ToolResult:
action = args.get("action", "").strip().lower()
if not action:
return ToolResult.fail("Error: 'action' parameter is required")
handler = self._ACTION_MAP.get(action)
if not handler:
valid = ", ".join(sorted(self._ACTION_MAP.keys()))
return ToolResult.fail(f"Unknown action '{action}'. Valid actions: {valid}")
try:
return handler(self, args)
except Exception as e:
logger.error(f"[Browser] Action '{action}' error: {e}")
return ToolResult.fail(f"Browser error ({action}): {e}")
# ------------------------------------------------------------------
# Action handlers
# ------------------------------------------------------------------
def _do_navigate(self, args: Dict[str, Any]) -> ToolResult:
url = args.get("url", "").strip()
if not url:
return ToolResult.fail("Error: 'url' is required for navigate action")
# Only auto-prepend https:// for bare hosts; preserve file://, about:, data:, etc.
if "://" not in url and not url.startswith(("about:", "data:")):
url = "https://" + url
timeout = args.get("timeout", 30000)
service = self._get_service()
result = service.navigate(url, timeout=timeout)
if "error" in result:
return ToolResult.fail(result["error"])
# Auto-snapshot after navigation so the agent gets page content in one call
snapshot_text = service.snapshot()
return ToolResult.success(
f"Navigated to: {result['url']}\nTitle: {result['title']}\nStatus: {result['status']}\n\n"
f"--- Page Snapshot ---\n{snapshot_text}"
)
def _do_snapshot(self, args: Dict[str, Any]) -> ToolResult:
selector = args.get("selector")
text = self._get_service().snapshot(selector=selector)
return ToolResult.success(text)
def _do_click(self, args: Dict[str, Any]) -> ToolResult:
ref = args.get("ref")
selector = args.get("selector")
timeout = args.get("timeout", 5000)
result = self._get_service().click(ref=ref, selector=selector, timeout=timeout)
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success("Clicked successfully. Use 'snapshot' to see updated page.")
def _do_fill(self, args: Dict[str, Any]) -> ToolResult:
text = args.get("text")
ref = args.get("ref")
selector = args.get("selector")
timeout = args.get("timeout", 5000)
# An empty string is allowed (it clears the field); only a missing value is an error
if text is None:
return ToolResult.fail("Error: 'text' is required for fill action")
result = self._get_service().fill(text, ref=ref, selector=selector, timeout=timeout)
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success("Filled text into element. Use 'snapshot' to verify.")
def _do_select(self, args: Dict[str, Any]) -> ToolResult:
value = args.get("value", "")
ref = args.get("ref")
selector = args.get("selector")
timeout = args.get("timeout", 5000)
if not value:
return ToolResult.fail("Error: 'value' is required for select action")
result = self._get_service().select(value, ref=ref, selector=selector, timeout=timeout)
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success(f"Selected option '{value}'.")
def _do_scroll(self, args: Dict[str, Any]) -> ToolResult:
direction = args.get("direction", "down")
amount = args.get("timeout", 500)  # scroll distance in pixels; 'amount' is not declared in params, so 'timeout' doubles as a fallback
if "amount" in args:
amount = args["amount"]
result = self._get_service().scroll(direction=direction, amount=amount)
if "error" in result:
return ToolResult.fail(result["error"])
pos = f"scrollY={result.get('scrollY', '?')}/{result.get('scrollHeight', '?')}"
return ToolResult.success(f"Scrolled {direction}. Position: {pos}")
def _do_screenshot(self, args: Dict[str, Any]) -> ToolResult:
full_page = args.get("full_page", False)
filepath = self._get_service().screenshot(full_page=full_page, cwd=self.cwd)
return ToolResult.success(f"Screenshot saved to: {filepath}")
def _do_wait(self, args: Dict[str, Any]) -> ToolResult:
selector = args.get("selector")
timeout = args.get("timeout", 5000)
result = self._get_service().wait(selector=selector, timeout=timeout)
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success("Wait completed.")
def _do_back(self, args: Dict[str, Any]) -> ToolResult:
result = self._get_service().go_back()
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success(f"Navigated back to: {result['url']}")
def _do_forward(self, args: Dict[str, Any]) -> ToolResult:
result = self._get_service().go_forward()
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success(f"Navigated forward to: {result['url']}")
def _do_get_text(self, args: Dict[str, Any]) -> ToolResult:
selector = args.get("selector", "").strip()
if not selector:
return ToolResult.fail("Error: 'selector' is required for get_text action")
result = self._get_service().get_text(selector)
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success(result["text"])
def _do_press(self, args: Dict[str, Any]) -> ToolResult:
key = args.get("key", "").strip()
if not key:
return ToolResult.fail("Error: 'key' is required for press action")
result = self._get_service().press(key)
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success(f"Pressed key: {key}")
def _do_evaluate(self, args: Dict[str, Any]) -> ToolResult:
script = args.get("script", "").strip()
if not script:
return ToolResult.fail("Error: 'script' is required for evaluate action")
result = self._get_service().evaluate(script)
if "error" in result:
return ToolResult.fail(result["error"])
val = result.get("result")
if isinstance(val, (dict, list)):
return ToolResult.success(json.dumps(val, ensure_ascii=False, indent=2))
return ToolResult.success(str(val) if val is not None else "(no return value)")
# Action dispatch table
_ACTION_MAP = {
"navigate": _do_navigate,
"snapshot": _do_snapshot,
"click": _do_click,
"fill": _do_fill,
"select": _do_select,
"scroll": _do_scroll,
"screenshot": _do_screenshot,
"wait": _do_wait,
"back": _do_back,
"forward": _do_forward,
"get_text": _do_get_text,
"press": _do_press,
"evaluate": _do_evaluate,
}
# ------------------------------------------------------------------
# Lifecycle
# ------------------------------------------------------------------
def copy(self):
"""Share browser instance across tool copies (avoids re-launching)."""
new_tool = BrowserTool(self.config)
new_tool.model = self.model
new_tool.context = getattr(self, "context", None)
new_tool.cwd = self.cwd
new_tool._service = self._service
return new_tool
def close(self):
"""Release browser resources."""
if self._service:
self._service.close()
self._service = None
BrowserTool._shared_service = None
logger.info("[Browser] BrowserTool closed")
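The URL handling in `_do_navigate` and the navigate → act → snapshot workflow from the tool description can be sketched in isolation as follows. `normalize_url` is a hypothetical helper name that mirrors the check in `_do_navigate`; the URL and the `ref` numbers in `example_calls` are made-up values, since real refs come from the snapshot of the page being visited.

```python
from typing import Any, Dict, List


def normalize_url(url: str) -> str:
    """Prepend https:// to bare hosts; leave explicit schemes untouched."""
    url = url.strip()
    if "://" not in url and not url.startswith(("about:", "data:")):
        return "https://" + url
    return url


# Illustrative argument dicts for successive `execute()` calls.
# The ref values are placeholders; use the numbers shown in the snapshot.
example_calls: List[Dict[str, Any]] = [
    {"action": "navigate", "url": "example.com"},
    {"action": "fill", "ref": 3, "text": "hello"},
    {"action": "click", "ref": 4},
    {"action": "snapshot"},
]

normalized = normalize_url(example_calls[0]["url"])
```

Note that the check only recognises `://`, `about:` and `data:`, so an input such as `localhost:3000` is treated as a bare host and becomes `https://localhost:3000`.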


@@ -0,0 +1,3 @@
from .edit import Edit
__all__ = ['Edit']

agent/tools/edit/edit.py Normal file

@@ -0,0 +1,185 @@
"""
Edit tool - Precise file editing
Edit files through exact text replacement
"""
import os
from typing import Dict, Any
from agent.tools.base_tool import BaseTool, ToolResult
from common.utils import expand_path
from agent.tools.utils.diff import (
strip_bom,
detect_line_ending,
normalize_to_lf,
restore_line_endings,
normalize_for_fuzzy_match,
fuzzy_find_text,
generate_diff_string
)
class Edit(BaseTool):
"""Tool for precise file editing"""
name: str = "edit"
description: str = "Edit a file by replacing exact text, or append to end if oldText is empty. For append: use empty oldText. For replace: oldText must match exactly (including whitespace)."
params: dict = {
"type": "object",
"properties": {
"path": {
"type": "string",
"description": "Path to the file to edit (relative or absolute)"
},
"oldText": {
"type": "string",
"description": "Text to find and replace. Use empty string to append to end of file. For replacement: must match exactly including whitespace."
},
"newText": {
"type": "string",
"description": "New text to replace the old text with"
}
},
"required": ["path", "oldText", "newText"]
}
def __init__(self, config: dict = None):
self.config = config or {}
self.cwd = self.config.get("cwd", os.getcwd())
self.memory_manager = self.config.get("memory_manager", None)
def execute(self, args: Dict[str, Any]) -> ToolResult:
"""
Execute file edit operation
:param args: Contains file path, old text and new text
:return: Operation result
"""
path = args.get("path", "").strip()
old_text = args.get("oldText", "")
new_text = args.get("newText", "")
if not path:
return ToolResult.fail("Error: path parameter is required")
# Resolve path
absolute_path = self._resolve_path(path)
# Check if file exists
if not os.path.exists(absolute_path):
return ToolResult.fail(f"Error: File not found: {path}")
# Check if readable/writable
if not os.access(absolute_path, os.R_OK | os.W_OK):
return ToolResult.fail(f"Error: File is not readable/writable: {path}")
try:
# Read file
with open(absolute_path, 'r', encoding='utf-8') as f:
raw_content = f.read()
# Remove BOM (LLM won't include invisible BOM in oldText)
bom, content = strip_bom(raw_content)
# Detect original line ending
original_ending = detect_line_ending(content)
# Normalize to LF
normalized_content = normalize_to_lf(content)
normalized_old_text = normalize_to_lf(old_text)
normalized_new_text = normalize_to_lf(new_text)
# Special case: empty oldText means append to end of file
if not old_text or not old_text.strip():
# Append mode: add newText to the end
# Add newline before newText if file doesn't end with one
if normalized_content and not normalized_content.endswith('\n'):
new_content = normalized_content + '\n' + normalized_new_text
else:
new_content = normalized_content + normalized_new_text
base_content = normalized_content # For verification
else:
# Normal edit mode: find and replace
# Use fuzzy matching to find old text (try exact match first, then fuzzy match)
match_result = fuzzy_find_text(normalized_content, normalized_old_text)
if not match_result.found:
return ToolResult.fail(
f"Error: Could not find the exact text in {path}. "
"The old text must match exactly including all whitespace and newlines."
)
# Calculate occurrence count (use fuzzy normalized content for consistency)
fuzzy_content = normalize_for_fuzzy_match(normalized_content)
fuzzy_old_text = normalize_for_fuzzy_match(normalized_old_text)
occurrences = fuzzy_content.count(fuzzy_old_text)
if occurrences > 1:
return ToolResult.fail(
f"Error: Found {occurrences} occurrences of the text in {path}. "
"The text must be unique. Please provide more context to make it unique."
)
# Execute replacement (use matched text position)
base_content = match_result.content_for_replacement
new_content = (
base_content[:match_result.index] +
normalized_new_text +
base_content[match_result.index + match_result.match_length:]
)
# Verify replacement actually changed content
if base_content == new_content:
return ToolResult.fail(
f"Error: No changes made to {path}. "
"The replacement produced identical content. "
"This might indicate an issue with special characters or the text not existing as expected."
)
# Restore original line endings
final_content = bom + restore_line_endings(new_content, original_ending)
# Write file
with open(absolute_path, 'w', encoding='utf-8') as f:
f.write(final_content)
# Generate diff
diff_result = generate_diff_string(base_content, new_content)
result = {
"message": f"Successfully edited {path}",
"path": path,
"diff": diff_result['diff'],
"first_changed_line": diff_result['first_changed_line']
}
# Notify memory manager if file is in memory directory
if self.memory_manager and "memory/" in path:
try:
self.memory_manager.mark_dirty()
except Exception:
# Don't fail the edit if memory notification fails
pass
return ToolResult.success(result)
except UnicodeDecodeError:
return ToolResult.fail(f"Error: File is not a valid text file (encoding error): {path}")
except PermissionError:
return ToolResult.fail(f"Error: Permission denied accessing {path}")
except Exception as e:
return ToolResult.fail(f"Error editing file: {str(e)}")
def _resolve_path(self, path: str) -> str:
"""
Resolve path to absolute path
:param path: Relative or absolute path
:return: Absolute path
"""
# Expand ~ to user home directory
path = expand_path(path)
if os.path.isabs(path):
return path
return os.path.abspath(os.path.join(self.cwd, path))
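The core of the edit flow is: normalize line endings, require that the old text occurs exactly once, splice in the new text, then restore the original line endings. The sketch below illustrates that flow with exact matching only. The helpers here are simplified stand-ins written for this example; they are not the real functions from `agent.tools.utils.diff`, which are not shown in this excerpt and also handle BOMs and fuzzy matching.

```python
def detect_line_ending(text: str) -> str:
    # Simplified: treat the file as CRLF if any CRLF is present.
    return "\r\n" if "\r\n" in text else "\n"


def normalize_to_lf(text: str) -> str:
    return text.replace("\r\n", "\n").replace("\r", "\n")


def restore_line_endings(text: str, ending: str) -> str:
    return text.replace("\n", "\r\n") if ending == "\r\n" else text


def replace_unique(content: str, old_text: str, new_text: str) -> str:
    """Replace old_text with new_text, failing unless it occurs exactly once."""
    ending = detect_line_ending(content)
    body = normalize_to_lf(content)
    old = normalize_to_lf(old_text)
    new = normalize_to_lf(new_text)
    if not old:
        raise ValueError("old text must not be empty in replace mode")
    count = body.count(old)
    if count == 0:
        raise ValueError("old text not found")
    if count > 1:
        raise ValueError(f"old text is ambiguous ({count} occurrences)")
    index = body.index(old)
    updated = body[:index] + new + body[index + len(old):]
    return restore_line_endings(updated, ending)


edited = replace_unique("a = 1\r\nb = 2\r\n", "b = 2", "b = 3")
```

Rejecting ambiguous matches instead of replacing the first one is what forces the caller to supply enough surrounding context to pin down a single location.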


@@ -0,0 +1,3 @@
from agent.tools.env_config.env_config import EnvConfig
__all__ = ['EnvConfig']


@@ -0,0 +1,286 @@
"""
Environment Configuration Tool - Manage API keys and environment variables
"""
import os
import re
from typing import Dict, Any
from pathlib import Path
from agent.tools.base_tool import BaseTool, ToolResult
from common.log import logger
from common.utils import expand_path
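# API key registry: common environment variables and their descriptions
API_KEY_REGISTRY = {
# AI model services
"OPENAI_API_KEY": "OpenAI API key (used for GPT models and embedding models)",
"GEMINI_API_KEY": "Google Gemini API key",
"CLAUDE_API_KEY": "Claude API key (used for Claude models)",
"LINKAI_API_KEY": "LinkAI agent platform API key, supports switching between multiple models",
# Search services
"BOCHA_API_KEY": "Bocha AI search API key",
}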
class EnvConfig(BaseTool):
"""Tool for managing environment variables (API keys, etc.)"""
name: str = "env_config"
description: str = (
"Manage API keys and skill configurations securely. "
"Use this tool when user wants to configure API keys (like BOCHA_API_KEY, OPENAI_API_KEY), "
"view configured keys, or manage skill settings. "
"Actions: 'set' (add/update key), 'get' (view specific key), 'list' (show all configured keys), 'delete' (remove key). "
"Values are automatically masked for security. Changes take effect immediately via hot reload."
)
params: dict = {
"type": "object",
"properties": {
"action": {
"type": "string",
"description": "Action to perform: 'set', 'get', 'list', 'delete'",
"enum": ["set", "get", "list", "delete"]
},
"key": {
"type": "string",
"description": (
"Environment variable key name. Common keys:\n"
"- OPENAI_API_KEY: OpenAI API (GPT models)\n"
"- OPENAI_API_BASE: OpenAI API base URL\n"
"- CLAUDE_API_KEY: Anthropic Claude API\n"
"- GEMINI_API_KEY: Google Gemini API\n"
"- LINKAI_API_KEY: LinkAI platform\n"
"- BOCHA_API_KEY: Bocha AI search\n"
"Use exact key names (case-sensitive, all uppercase with underscores)"
)
},
"value": {
"type": "string",
"description": "Value to set for the environment variable (for 'set' action)"
}
},
"required": ["action"]
}
def __init__(self, config: dict = None):
self.config = config or {}
# Store env config in ~/.cow directory (outside workspace for security)
self.env_dir = expand_path("~/.cow")
self.env_path = os.path.join(self.env_dir, '.env')
self.agent_bridge = self.config.get("agent_bridge") # Reference to AgentBridge for hot reload
# Don't create .env file in __init__ to avoid issues during tool discovery
# It will be created on first use in execute()
def _ensure_env_file(self):
"""Ensure the .env file exists"""
# Create ~/.cow directory if it doesn't exist
os.makedirs(self.env_dir, exist_ok=True)
if not os.path.exists(self.env_path):
Path(self.env_path).touch()
logger.info(f"[EnvConfig] Created .env file at {self.env_path}")
def _mask_value(self, value: str) -> str:
"""Mask sensitive parts of a value for logging"""
if not value or len(value) <= 10:
return "***"
return f"{value[:6]}***{value[-4:]}"
def _read_env_file(self) -> Dict[str, str]:
"""Read all key-value pairs from .env file"""
env_vars = {}
if os.path.exists(self.env_path):
with open(self.env_path, 'r', encoding='utf-8') as f:
for line in f:
line = line.strip()
# Skip empty lines and comments
if not line or line.startswith('#'):
continue
# Parse KEY=VALUE
match = re.match(r'^([^=]+)=(.*)$', line)
if match:
key, value = match.groups()
env_vars[key.strip()] = value.strip()
return env_vars
def _write_env_file(self, env_vars: Dict[str, str]):
"""Write all key-value pairs to .env file"""
with open(self.env_path, 'w', encoding='utf-8') as f:
f.write("# Environment variables for agent skills\n")
f.write("# Auto-managed by env_config tool\n\n")
for key, value in sorted(env_vars.items()):
f.write(f"{key}={value}\n")
def _reload_env(self):
"""Reload environment variables from .env file"""
env_vars = self._read_env_file()
for key, value in env_vars.items():
os.environ[key] = value
logger.debug(f"[EnvConfig] Reloaded {len(env_vars)} environment variables")
def _refresh_skills(self):
"""Refresh skills after environment variable changes"""
if self.agent_bridge:
try:
# Reload .env file
self._reload_env()
# Refresh skills in all agent instances
refreshed = self.agent_bridge.refresh_all_skills()
logger.info(f"[EnvConfig] Refreshed skills in {refreshed} agent instance(s)")
return True
except Exception as e:
logger.warning(f"[EnvConfig] Failed to refresh skills: {e}")
return False
return False
def execute(self, args: Dict[str, Any]) -> ToolResult:
"""
Execute environment configuration operation
:param args: Contains action, key, and value parameters
:return: Result of the operation
"""
# Ensure .env file exists on first use
self._ensure_env_file()
action = args.get("action")
key = args.get("key")
value = args.get("value")
try:
if action == "set":
if not key or not value:
return ToolResult.fail("Error: 'key' and 'value' are required for 'set' action.")
# Read current env vars
env_vars = self._read_env_file()
# Update the key
env_vars[key] = value
# Write back to file
self._write_env_file(env_vars)
# Update current process env
os.environ[key] = value
logger.info(f"[EnvConfig] Set {key}={self._mask_value(value)}")
# Try to refresh skills immediately
refreshed = self._refresh_skills()
result = {
"message": f"Successfully set {key}",
"key": key,
"value": self._mask_value(value),
}
if refreshed:
result["note"] = "✅ Skills refreshed automatically - changes are now active"
else:
result["note"] = "⚠️ Skills not refreshed - restart agent to load new skills"
return ToolResult.success(result)
elif action == "get":
if not key:
return ToolResult.fail("Error: 'key' is required for 'get' action.")
# Check in file first, then in current env
env_vars = self._read_env_file()
value = env_vars.get(key) or os.getenv(key)
# Get description from registry
description = API_KEY_REGISTRY.get(key, "Environment variable with unknown purpose")
if value is not None:
logger.info(f"[EnvConfig] Got {key}={self._mask_value(value)}")
return ToolResult.success({
"key": key,
"value": self._mask_value(value),
"description": description,
"exists": True,
"note": f"Value is masked for security. In bash, use ${key} directly — it is auto-injected."
})
else:
return ToolResult.success({
"key": key,
"description": description,
"exists": False,
"message": f"Environment variable '{key}' is not set"
})
elif action == "list":
env_vars = self._read_env_file()
# Build detailed variable list with descriptions
variables_with_info = {}
for key, value in env_vars.items():
variables_with_info[key] = {
"value": self._mask_value(value),
"description": API_KEY_REGISTRY.get(key, "Environment variable with unknown purpose")
}
logger.info(f"[EnvConfig] Listed {len(env_vars)} environment variables")
if not env_vars:
return ToolResult.success({
"message": "No environment variables configured",
"variables": {},
"note": "Common API keys can be configured with env_config(action='set', key='KEY_NAME', value='your-key')"
})
return ToolResult.success({
"message": f"Found {len(env_vars)} environment variable(s)",
"variables": variables_with_info
})
elif action == "delete":
if not key:
return ToolResult.fail("Error: 'key' is required for 'delete' action.")
# Read current env vars
env_vars = self._read_env_file()
if key not in env_vars:
return ToolResult.success({
"message": f"Environment variable '{key}' was not set",
"key": key
})
# Remove the key
del env_vars[key]
# Write back to file
self._write_env_file(env_vars)
# Remove from current process env
if key in os.environ:
del os.environ[key]
logger.info(f"[EnvConfig] Deleted {key}")
# Try to refresh skills immediately
refreshed = self._refresh_skills()
result = {
"message": f"Successfully deleted {key}",
"key": key,
}
if refreshed:
result["note"] = "✅ Skills refreshed automatically - changes are now active"
else:
result["note"] = "⚠️ Skills not refreshed - restart agent to apply changes"
return ToolResult.success(result)
else:
return ToolResult.fail(f"Error: Unknown action '{action}'. Use 'set', 'get', 'list', or 'delete'.")
except Exception as e:
logger.error(f"[EnvConfig] Error: {e}", exc_info=True)
return ToolResult.fail(f"EnvConfig tool error: {str(e)}")
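The masking and `.env` parsing rules used by the tool can be exercised on their own. The sketch below copies the logic of `_mask_value` and `_read_env_file` into standalone functions; `mask_value` and `parse_env` are names chosen for this example, `parse_env` takes the file contents as a string instead of reading `~/.cow/.env`, and the key shown is a made-up value.

```python
import re
from typing import Dict


def mask_value(value: str) -> str:
    """Show the first 6 and last 4 characters; hide short values entirely."""
    if not value or len(value) <= 10:
        return "***"
    return f"{value[:6]}***{value[-4:]}"


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    env: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        match = re.match(r"^([^=]+)=(.*)$", line)
        if match:
            key, value = match.groups()
            env[key.strip()] = value.strip()
    return env


sample = "# Auto-managed\nOPENAI_API_KEY=sk-1234567890abcdef\n\nEXTRA = a=b\n"
parsed = parse_env(sample)
masked = {key: mask_value(value) for key, value in parsed.items()}
```

Values of 10 characters or fewer are masked completely, because showing 6 + 4 characters of such a value would reveal all of it. Quotes around a value are not stripped by this parser, so `KEY="x"` keeps the quote characters.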


@@ -0,0 +1,3 @@
from .ls import Ls
__all__ = ['Ls']

agent/tools/ls/ls.py Normal file

@@ -0,0 +1,140 @@
"""
Ls tool - List directory contents
"""
import os
from typing import Dict, Any
from agent.tools.base_tool import BaseTool, ToolResult
from agent.tools.utils.truncate import truncate_head, format_size, DEFAULT_MAX_BYTES
from common.utils import expand_path
DEFAULT_LIMIT = 500
class Ls(BaseTool):
"""Tool for listing directory contents"""
name: str = "ls"
description: str = f"List directory contents. Returns entries sorted alphabetically, with '/' suffix for directories. Includes dotfiles. Output is truncated to {DEFAULT_LIMIT} entries or {DEFAULT_MAX_BYTES // 1024}KB (whichever is hit first)."
params: dict = {
"type": "object",
"properties": {
"path": {
"type": "string",
"description": "Directory to list. IMPORTANT: Relative paths are based on workspace directory. To access directories outside workspace, use absolute paths starting with ~ or /."
},
"limit": {
"type": "integer",
"description": f"Maximum number of entries to return (default: {DEFAULT_LIMIT})"
}
},
"required": []
}
def __init__(self, config: dict = None):
self.config = config or {}
self.cwd = self.config.get("cwd", os.getcwd())
def execute(self, args: Dict[str, Any]) -> ToolResult:
"""
Execute directory listing
:param args: Listing parameters
:return: Directory contents or error
"""
path = args.get("path", ".").strip()
limit = args.get("limit", DEFAULT_LIMIT)
# Resolve path
absolute_path = self._resolve_path(path)
# Security check: Prevent accessing sensitive config directory
env_config_dir = expand_path("~/.cow")
if os.path.abspath(absolute_path) == os.path.abspath(env_config_dir):
return ToolResult.fail(
"Error: Access denied. API keys and credentials must be accessed through the env_config tool only."
)
if not os.path.exists(absolute_path):
# Provide helpful hint if using relative path
if not os.path.isabs(path) and not path.startswith('~'):
return ToolResult.fail(
f"Error: Path not found: {path}\n"
f"Resolved to: {absolute_path}\n"
f"Hint: Relative paths are based on workspace ({self.cwd}). For files outside workspace, use absolute paths."
)
return ToolResult.fail(f"Error: Path not found: {path}")
if not os.path.isdir(absolute_path):
return ToolResult.fail(f"Error: Not a directory: {path}")
try:
# Read directory entries
entries = os.listdir(absolute_path)
# Sort alphabetically (case-insensitive)
entries.sort(key=lambda x: x.lower())
# Format entries with directory indicators
results = []
entry_limit_reached = False
for entry in entries:
if len(results) >= limit:
entry_limit_reached = True
break
full_path = os.path.join(absolute_path, entry)
try:
if os.path.isdir(full_path):
results.append(entry + '/')
else:
results.append(entry)
except Exception:
# Skip entries we can't stat
continue
if not results:
return ToolResult.success({"message": "(empty directory)", "entries": []})
# Format output
raw_output = '\n'.join(results)
truncation = truncate_head(raw_output, max_lines=999999) # Only limit by bytes
output = truncation.content
details = {}
notices = []
if entry_limit_reached:
notices.append(f"{limit} entries limit reached. Use limit={limit * 2} for more")
details["entry_limit_reached"] = limit
if truncation.truncated:
notices.append(f"{format_size(DEFAULT_MAX_BYTES)} limit reached")
details["truncation"] = truncation.to_dict()
if notices:
output += f"\n\n[{'. '.join(notices)}]"
return ToolResult.success({
"output": output,
"entry_count": len(results),
"details": details if details else None
})
except PermissionError:
return ToolResult.fail(f"Error: Permission denied reading directory: {path}")
except Exception as e:
return ToolResult.fail(f"Error listing directory: {str(e)}")
def _resolve_path(self, path: str) -> str:
"""Resolve path to absolute path"""
# Expand ~ to user home directory
path = expand_path(path)
if os.path.isabs(path):
return path
return os.path.abspath(os.path.join(self.cwd, path))
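
The listing loop in `Ls.execute` can be sketched on its own as follows. This is a minimal illustration, not the tool itself: `list_entries` and `demo` are hypothetical names introduced here, and byte-based truncation via `truncate_head` is left out.

```python
import os
import tempfile


def list_entries(directory: str, limit: int = 500):
    """Return (entries, limit_reached) for a directory, mirroring the loop above."""
    names = sorted(os.listdir(directory), key=str.lower)  # case-insensitive sort
    entries = []
    limit_reached = False
    for name in names:
        if len(entries) >= limit:
            # More names remain than the limit allows
            limit_reached = True
            break
        suffix = "/" if os.path.isdir(os.path.join(directory, name)) else ""
        entries.append(name + suffix)
    return entries, limit_reached


def demo():
    """Build a throwaway directory and list it with a small limit."""
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "src"))
        for name in ("b.txt", "A.txt", ".env"):
            open(os.path.join(tmp, name), "w").close()
        return list_entries(tmp, limit=3)
```

Note that the flag is only set when at least one more entry exists beyond the limit, so a directory holding exactly `limit` entries is reported as complete.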


@@ -0,0 +1,4 @@
from agent.tools.mcp.mcp_client import McpClient, McpClientRegistry
from agent.tools.mcp.mcp_tool import McpTool
__all__ = ["McpClient", "McpClientRegistry", "McpTool"]


@@ -0,0 +1,528 @@
"""
MCP (Model Context Protocol) client module.
Implements JSON-RPC 2.0 over stdio, SSE and Streamable HTTP transports
without any external MCP SDK dependency.
"""
import json
import os
import select
import subprocess
import threading
import urllib.request
import urllib.error
from typing import Optional
from common.log import logger
# Aliases accepted for the Streamable HTTP transport type
_STREAMABLE_HTTP_ALIASES = {"streamable-http", "streamable_http", "streamablehttp", "http"}
class McpClient:
"""Single MCP Server client supporting stdio, SSE and Streamable HTTP transports."""
def __init__(self, config: dict):
"""
config examples:
stdio: {"name": "filesystem", "type": "stdio", "command": "npx", "args": [...]}
SSE: {"name": "my-api", "type": "sse", "url": "http://localhost:8000/sse"}
streamable-http: {"name": "pubmed", "type": "streamable-http", "url": "https://x/mcp"}
"""
self.config = config
self.name: str = config.get("name", "unknown")
raw_transport: str = config.get("type", "stdio")
# Normalize streamable-http aliases to a single internal key
self.transport: str = (
"streamable-http"
if raw_transport.lower() in _STREAMABLE_HTTP_ALIASES
else raw_transport
)
# stdio state
self._proc: Optional[subprocess.Popen] = None
# SSE state
self._sse_url: Optional[str] = None
self._post_url: Optional[str] = None # endpoint for sending messages (resolved from SSE)
# Streamable HTTP state
self._http_url: Optional[str] = None
self._http_headers: dict = {} # extra headers from user config (e.g. Authorization)
self._http_session_id: Optional[str] = None # Mcp-Session-Id assigned by the server
# Shared state
self._next_id = 1
self._id_lock = threading.Lock()
self._call_lock = threading.Lock()
self._initialized = False
# ------------------------------------------------------------------
# Public interface
# ------------------------------------------------------------------
def initialize(self) -> bool:
"""Connect and perform the MCP handshake. Returns True on success."""
try:
if self.transport == "stdio":
return self._init_stdio()
elif self.transport == "sse":
return self._init_sse()
elif self.transport == "streamable-http":
return self._init_streamable_http()
else:
logger.warning(f"[MCP:{self.name}] Unknown transport type: {self.transport!r}")
return False
except Exception as e:
logger.warning(f"[MCP:{self.name}] Initialization failed: {e}")
return False
def list_tools(self) -> list:
"""Return the tool list from this server.
Each item is a dict: {"name": str, "description": str, "inputSchema": dict}
"""
try:
resp = self._send_request("tools/list", {})
tools = resp.get("result", {}).get("tools", [])
return [
{
"name": t.get("name", ""),
"description": t.get("description", ""),
"inputSchema": t.get("inputSchema", {}),
}
for t in tools
]
except Exception as e:
logger.warning(f"[MCP:{self.name}] list_tools failed: {e}")
return []
def call_tool(self, name: str, arguments: dict) -> str:
"""Call a tool and return the result as a string."""
try:
resp = self._send_request("tools/call", {"name": name, "arguments": arguments})
content = resp.get("result", {}).get("content", [])
parts = [item.get("text", "") for item in content if item.get("type") == "text"]
return "\n".join(parts)
except Exception as e:
logger.warning(f"[MCP:{self.name}] call_tool({name}) failed: {e}")
return f"Error: {e}"
def shutdown(self):
"""Close the connection / terminate the child process."""
if self._proc is not None:
try:
self._proc.stdin.close()
except Exception:
pass
try:
self._proc.terminate()
self._proc.wait(timeout=5)
except Exception:
try:
self._proc.kill()
except Exception:
pass
self._proc = None
logger.debug(f"[MCP:{self.name}] stdio process terminated")
# Best-effort streamable-http session termination
if self.transport == "streamable-http" and self._http_session_id and self._http_url:
try:
req = urllib.request.Request(
self._http_url,
method="DELETE",
headers={"Mcp-Session-Id": self._http_session_id, **self._http_headers},
)
with urllib.request.urlopen(req, timeout=5):
pass
except Exception:
pass
self._http_session_id = None
self._initialized = False
# ------------------------------------------------------------------
# stdio transport
# ------------------------------------------------------------------
def _init_stdio(self) -> bool:
command = self.config.get("command")
if not command:
logger.warning(f"[MCP:{self.name}] stdio config missing 'command'")
return False
args = self.config.get("args", [])
extra_env = self.config.get("env", None)
env = {**os.environ, **extra_env} if extra_env else None
self._proc = subprocess.Popen(
[command] + list(args),
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
text=True,
encoding="utf-8",
env=env,
)
logger.debug(f"[MCP:{self.name}] stdio process started (pid={self._proc.pid})")
threading.Thread(
target=self._drain_stderr, daemon=True, name=f"mcp-stderr-{self.name}"
).start()
return self._handshake()
def _drain_stderr(self):
for line in self._proc.stderr:
line = line.strip()
if line:
logger.debug(f"[MCP:{self.name}] stderr: {line}")
def _readline_with_timeout(self, timeout: int = 30) -> str:
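"""Read one line from stdio stdout with a hard timeout.
Note: select() on pipe objects works on POSIX only (on Windows it accepts
sockets only), and it cannot see data already held in the text-mode read buffer.
"""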
ready, _, _ = select.select([self._proc.stdout], [], [], timeout)
if not ready:
raise TimeoutError(f"[MCP:{self.name}] stdio read timed out after {timeout}s")
return self._proc.stdout.readline()
def _stdio_send(self, message: dict) -> dict:
"""Send a JSON-RPC message over stdio and read the response."""
raw = json.dumps(message) + "\n"
self._proc.stdin.write(raw)
self._proc.stdin.flush()
while True:
line = self._readline_with_timeout()
if not line:
raise IOError(f"[MCP:{self.name}] stdio process closed unexpectedly")
line = line.strip()
if not line:
continue
try:
data = json.loads(line)
except json.JSONDecodeError:
continue
if "id" not in data:
logger.debug(f"[MCP:{self.name}] notification skipped: {data.get('method', '?')}")
continue
return data
# ------------------------------------------------------------------
# SSE transport
# ------------------------------------------------------------------
def _init_sse(self) -> bool:
url = self.config.get("url")
if not url:
logger.warning(f"[MCP:{self.name}] SSE config missing 'url'")
return False
self._sse_url = url
# Read the first SSE event to discover the POST endpoint
try:
self._post_url = self._sse_discover_endpoint()
except Exception as e:
logger.warning(f"[MCP:{self.name}] SSE endpoint discovery failed: {e}")
return False
return self._handshake()
def _sse_discover_endpoint(self) -> str:
"""Open SSE stream and read the 'endpoint' event to learn the POST URL."""
req = urllib.request.Request(
self._sse_url,
headers={"Accept": "text/event-stream"},
)
with urllib.request.urlopen(req, timeout=10) as resp:
for raw_line in resp:
line = raw_line.decode("utf-8").rstrip("\n\r")
if line.startswith("data:"):
data = line[len("data:"):].strip()
# Some servers send JSON with a "uri" or plain path
if data.startswith("{"):
parsed = json.loads(data)
return parsed.get("uri") or parsed.get("url") or parsed.get("endpoint")
# Plain relative or absolute URL
if data.startswith("http"):
return data
# Relative path: resolve against SSE base
from urllib.parse import urljoin
return urljoin(self._sse_url, data)
raise ValueError(f"[MCP:{self.name}] No endpoint event received from SSE stream")
def _sse_send(self, message: dict) -> dict:
"""POST a JSON-RPC message to the server and return the response."""
body = json.dumps(message).encode("utf-8")
req = urllib.request.Request(
self._post_url,
data=body,
method="POST",
headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=30) as resp:
raw = resp.read().decode("utf-8")
return json.loads(raw)
# ------------------------------------------------------------------
# Streamable HTTP transport (MCP spec 2025-03-26)
# ------------------------------------------------------------------
def _init_streamable_http(self) -> bool:
url = self.config.get("url")
if not url:
logger.warning(f"[MCP:{self.name}] streamable-http config missing 'url'")
return False
self._http_url = url
# Allow user-provided headers (e.g. {"Authorization": "Bearer xxx"})
extra_headers = self.config.get("headers") or {}
if isinstance(extra_headers, dict):
self._http_headers = {str(k): str(v) for k, v in extra_headers.items()}
return self._handshake()
def _streamable_http_send(self, message: dict) -> dict:
"""POST a JSON-RPC request and return the response (JSON or SSE-wrapped)."""
return self._streamable_http_post(message, expect_response=True)
def _streamable_http_post(self, message: dict, expect_response: bool) -> dict:
"""
POST a JSON-RPC message over Streamable HTTP.
Per the spec, the response Content-Type can be either:
- application/json -> single JSON-RPC response in body
- text/event-stream -> SSE stream; we read until we get a matching response
"""
body = json.dumps(message).encode("utf-8")
headers = {
"Content-Type": "application/json",
"Accept": "application/json, text/event-stream",
}
if self._http_session_id:
headers["Mcp-Session-Id"] = self._http_session_id
headers.update(self._http_headers)
req = urllib.request.Request(
self._http_url,
data=body,
method="POST",
headers=headers,
)
try:
resp = urllib.request.urlopen(req, timeout=30)
except urllib.error.HTTPError as e:
# Surface the server-provided error body for easier debugging
detail = ""
try:
detail = e.read().decode("utf-8", errors="ignore")
except Exception:
pass
raise IOError(
f"[MCP:{self.name}] streamable-http HTTP {e.code}: {detail[:200]}"
)
with resp:
# Capture session id assigned by the server (if any)
session_id = resp.headers.get("Mcp-Session-Id")
if session_id and not self._http_session_id:
self._http_session_id = session_id
status = resp.status if hasattr(resp, "status") else resp.getcode()
# Notifications: server may reply with 202 Accepted and no body
if not expect_response or status == 202:
try:
resp.read()
except Exception:
pass
return {}
content_type = (resp.headers.get("Content-Type") or "").lower()
expected_id = message.get("id")
if "text/event-stream" in content_type:
return self._read_sse_response(resp, expected_id)
raw = resp.read().decode("utf-8")
if not raw:
return {}
return json.loads(raw)
def _read_sse_response(self, resp, expected_id) -> dict:
"""Read an SSE stream and return the first JSON-RPC response with matching id."""
data_buf: list = []
for raw_line in resp:
line = raw_line.decode("utf-8").rstrip("\n\r")
if line == "":
# End of an SSE event, attempt to parse accumulated data
if data_buf:
payload = "\n".join(data_buf)
data_buf = []
try:
msg = json.loads(payload)
except json.JSONDecodeError:
continue
# Skip notifications / mismatched ids
if "id" not in msg:
continue
if expected_id is None or msg.get("id") == expected_id:
return msg
continue
if line.startswith(":"):
continue # SSE comment / keepalive
if line.startswith("data:"):
data_buf.append(line[len("data:"):].lstrip())
# Ignore 'event:' / 'id:' lines; we only care about JSON-RPC payloads
raise IOError(f"[MCP:{self.name}] streamable-http SSE stream closed before response")
# ------------------------------------------------------------------
# Common JSON-RPC helpers
# ------------------------------------------------------------------
def _next_request_id(self) -> int:
with self._id_lock:
rid = self._next_id
self._next_id += 1
return rid
def _build_request(self, method: str, params: dict) -> dict:
return {
"jsonrpc": "2.0",
"id": self._next_request_id(),
"method": method,
"params": params,
}
def _build_notification(self, method: str, params: dict) -> dict:
return {"jsonrpc": "2.0", "method": method, "params": params}
def _send_request(self, method: str, params: dict) -> dict:
"""Send a request and return the full response dict."""
if not self._initialized and method != "initialize":
raise RuntimeError(f"[MCP:{self.name}] Client not initialized")
message = self._build_request(method, params)
with self._call_lock:
if self.transport == "stdio":
return self._stdio_send(message)
elif self.transport == "sse":
return self._sse_send(message)
elif self.transport == "streamable-http":
return self._streamable_http_send(message)
else:
raise ValueError(f"[MCP:{self.name}] Unsupported transport: {self.transport}")
def _send_notification(self, method: str, params: dict):
"""Fire-and-forget notification (no response expected)."""
notification = self._build_notification(method, params)
raw = json.dumps(notification) + "\n"
if self.transport == "stdio":
self._proc.stdin.write(raw)
self._proc.stdin.flush()
elif self.transport == "sse":
body = raw.encode("utf-8")
req = urllib.request.Request(
self._post_url,
data=body,
method="POST",
headers={"Content-Type": "application/json"},
)
try:
with urllib.request.urlopen(req, timeout=10):
pass
except Exception:
pass # notifications are fire-and-forget
elif self.transport == "streamable-http":
try:
self._streamable_http_post(notification, expect_response=False)
except Exception:
pass # notifications are fire-and-forget
def _handshake(self) -> bool:
"""Perform the MCP initialize / notifications/initialized handshake."""
init_params = {
"protocolVersion": "2024-11-05",
"capabilities": {},
"clientInfo": {"name": "CowAgent", "version": "1.0"},
}
# Mark as initialized up front; reset to False below if the handshake fails
self._initialized = True
try:
resp = self._send_request("initialize", init_params)
except Exception as e:
self._initialized = False
logger.warning(f"[MCP:{self.name}] Handshake initialize failed: {e}")
return False
if "error" in resp:
self._initialized = False
logger.warning(f"[MCP:{self.name}] Handshake error: {resp['error']}")
return False
self._send_notification("notifications/initialized", {})
logger.debug(f"[MCP:{self.name}] Handshake complete")
return True
class McpClientRegistry:
"""Global singleton managing the lifecycle of all MCP Server clients."""
_instance = None
_instance_lock = threading.Lock()
def __new__(cls):
with cls._instance_lock:
if cls._instance is None:
obj = super().__new__(cls)
obj._clients: dict[str, McpClient] = {}
obj._registry_lock = threading.Lock()
cls._instance = obj
return cls._instance
def start_all(self, configs: list) -> None:
"""Initialize McpClient for each config entry; skip failures with a warning."""
if not configs:
return
for cfg in configs:
name = cfg.get("name", "<unnamed>")
client = McpClient(cfg)
ok = client.initialize()
if ok:
with self._registry_lock:
self._clients[name] = client
logger.info(f"[MCP] Server '{name}' initialized successfully")
else:
logger.warning(f"[MCP] Server '{name}' failed to initialize — skipping")
def get(self, server_name: str) -> Optional[McpClient]:
"""Return the initialized client for server_name, or None."""
with self._registry_lock:
return self._clients.get(server_name)
def all_clients(self) -> dict:
"""Return a copy of the {name: McpClient} mapping."""
with self._registry_lock:
return dict(self._clients)
def shutdown_all(self) -> None:
"""Shut down all managed clients."""
with self._registry_lock:
clients = list(self._clients.values())
self._clients.clear()
for client in clients:
try:
client.shutdown()
except Exception as e:
logger.warning(f"[MCP] Error shutting down '{client.name}': {e}")
logger.info("[MCP] All servers shut down")
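
The event-stream handling in `_read_sse_response` can be exercised without a server. The sketch below follows the same rules (blank line ends an event, `data:` lines are joined, messages without a matching `id` are skipped); `first_matching_response` and `SAMPLE_STREAM` are hypothetical names used only for illustration.

```python
import json
from typing import Iterable, Optional


def first_matching_response(lines: Iterable[bytes], expected_id: Optional[int]) -> dict:
    """Return the first JSON-RPC message whose id matches expected_id."""
    data_buf = []
    for raw_line in lines:
        line = raw_line.decode("utf-8").rstrip("\r\n")
        if line == "":  # a blank line terminates one SSE event
            if data_buf:
                payload = "\n".join(data_buf)
                data_buf = []
                try:
                    msg = json.loads(payload)
                except json.JSONDecodeError:
                    continue
                # Notifications carry no id; other ids belong to other requests
                if "id" in msg and (expected_id is None or msg["id"] == expected_id):
                    return msg
            continue
        if line.startswith("data:"):
            data_buf.append(line[len("data:"):].lstrip())
        # Comment lines (":...") and "event:" / "id:" fields are ignored
    raise IOError("stream closed before a matching response arrived")


SAMPLE_STREAM = [
    b": keepalive\n",
    b"event: message\n",
    b'data: {"jsonrpc": "2.0", "method": "notifications/progress"}\n',
    b"\n",
    b'data: {"jsonrpc": "2.0", "id": 7, "result": {"tools": []}}\n',
    b"\n",
]
```

Calling `first_matching_response(SAMPLE_STREAM, 7)` skips the keepalive comment and the progress notification and returns the message with `id` 7.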


@@ -0,0 +1,31 @@
from agent.tools.base_tool import BaseTool, ToolResult
from common.log import logger
class McpTool(BaseTool):
"""
Wraps a single MCP tool as a BaseTool.
One MCP server can expose multiple tools; each tool maps to one McpTool instance.
"""
def __init__(self, client, tool_schema: dict, server_name: str):
"""
:param client: The McpClient instance that owns this tool
:param tool_schema: Tool description returned by MCP, in the form:
{"name": str, "description": str, "inputSchema": dict}
:param server_name: Server name, used for logging
"""
self.client = client
self.server_name = server_name
self.name = tool_schema["name"]
self.description = tool_schema.get("description", "")
self.params = tool_schema.get("inputSchema", {})
def execute(self, params: dict) -> ToolResult:
logger.info(f"[McpTool] server={self.server_name} tool={self.name} params={params}")
try:
result = self.client.call_tool(self.name, params)
return ToolResult.success(result)
except Exception as e:
logger.error(f"[McpTool] server={self.server_name} tool={self.name} error: {e}")
return ToolResult.fail(str(e))


@@ -0,0 +1,10 @@
"""
Memory tools for Agent
Provides memory_search and memory_get tools
"""
from agent.tools.memory.memory_search import MemorySearchTool
from agent.tools.memory.memory_get import MemoryGetTool
__all__ = ['MemorySearchTool', 'MemoryGetTool']


@@ -0,0 +1,128 @@
"""
Memory get tool
Allows agents to read specific sections from memory files
"""
from agent.tools.base_tool import BaseTool
class MemoryGetTool(BaseTool):
"""Tool for reading memory file contents"""
name: str = "memory_get"
description: str = (
"Read specific content from memory files. "
"Use this to get full context from a memory file or specific line range."
)
params: dict = {
"type": "object",
"properties": {
"path": {
"type": "string",
"description": "Relative path to the memory file (e.g. 'MEMORY.md', 'memory/2026-01-01.md')"
},
"start_line": {
"type": "integer",
"description": "Starting line number (optional, default: 1)",
"default": 1
},
"num_lines": {
"type": "integer",
"description": "Number of lines to read (optional, reads all if not specified)"
}
},
"required": ["path"]
}
def __init__(self, memory_manager):
"""
Initialize memory get tool
Args:
memory_manager: MemoryManager instance
"""
super().__init__()
self.memory_manager = memory_manager
from config import conf
if conf().get("knowledge", True):
self.description = (
"Read specific content from memory or knowledge files. "
"Use this to get full context from a memory file, knowledge page, or specific line range."
)
self.params = {**self.params}
self.params["properties"] = {**self.params["properties"]}
self.params["properties"]["path"] = {
"type": "string",
"description": "Relative path to the memory or knowledge file (e.g. 'MEMORY.md', 'memory/2026-01-01.md', 'knowledge/concepts/moe.md')"
}
def execute(self, args: dict):
"""
Execute memory file read
Args:
args: Dictionary with path, start_line, num_lines
Returns:
ToolResult with file content
"""
from agent.tools.base_tool import ToolResult
path = args.get("path")
start_line = args.get("start_line", 1)
num_lines = args.get("num_lines")
if not path:
return ToolResult.fail("Error: path parameter is required")
try:
workspace_dir = self.memory_manager.config.get_workspace()
# Auto-prepend memory/ if not present and not absolute path
# Exceptions: MEMORY.md in root, knowledge/ files at workspace root
if not path.startswith('memory/') and not path.startswith('knowledge/') and not path.startswith('/') and path != 'MEMORY.md':
path = f'memory/{path}'
file_path = (workspace_dir / path).resolve()
workspace_resolved = workspace_dir.resolve()
# Compare path components rather than string prefixes (separator-independent)
if workspace_resolved not in file_path.parents and file_path != workspace_resolved:
return ToolResult.fail("Error: Access denied: path outside workspace")
if not file_path.exists():
return ToolResult.fail(f"Error: File not found: {path}")
content = file_path.read_text(encoding='utf-8')
lines = content.split('\n')
# Handle line range
if start_line < 1:
start_line = 1
start_idx = start_line - 1
if num_lines:
end_idx = start_idx + num_lines
selected_lines = lines[start_idx:end_idx]
else:
selected_lines = lines[start_idx:]
result = '\n'.join(selected_lines)
# Add metadata
total_lines = len(lines)
shown_lines = len(selected_lines)
output = [
f"File: {path}",
f"Lines: {start_line}-{start_line + shown_lines - 1} (total: {total_lines})",
"",
result
]
return ToolResult.success('\n'.join(output))
except Exception as e:
return ToolResult.fail(f"Error reading memory file: {str(e)}")
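
The two core steps of `MemoryGetTool.execute`, the workspace containment check and the 1-indexed line window, can be sketched in isolation. `is_inside` and `select_lines` are hypothetical helper names, and the containment check here compares path components via `Path.parents`, which is one possible approach rather than necessarily the project's own.

```python
from pathlib import Path


def is_inside(workspace: Path, candidate: Path) -> bool:
    """True if candidate resolves to the workspace itself or something below it."""
    root = workspace.resolve()
    target = candidate.resolve()
    return target == root or root in target.parents


def select_lines(text: str, start_line: int = 1, num_lines=None):
    """Return the requested 1-indexed line window of text."""
    lines = text.split("\n")
    start_idx = max(start_line, 1) - 1  # clamp to the first line
    end_idx = start_idx + num_lines if num_lines else None
    return lines[start_idx:end_idx]
```

Comparing components avoids the prefix trap where a sibling directory such as `ws-evil` would pass a plain `startswith` test against `ws`.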


@@ -0,0 +1,109 @@
"""
Memory search tool
Allows agents to search their memory using semantic and keyword search
"""
from typing import Dict, Any, Optional
from agent.tools.base_tool import BaseTool
class MemorySearchTool(BaseTool):
"""Tool for searching agent memory"""
name: str = "memory_search"
description: str = (
"Search agent's long-term memory using semantic and keyword search. "
"Use this to recall past conversations, preferences, and knowledge."
)
params: dict = {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search query (can be natural language question or keywords)"
},
"max_results": {
"type": "integer",
"description": "Maximum number of results to return (default: 10)",
"default": 10
},
"min_score": {
"type": "number",
"description": "Minimum relevance score (0-1, default: 0.1)",
"default": 0.1
}
},
"required": ["query"]
}
def __init__(self, memory_manager, user_id: Optional[str] = None):
"""
Initialize memory search tool
Args:
memory_manager: MemoryManager instance
user_id: Optional user ID for scoped search
"""
super().__init__()
self.memory_manager = memory_manager
self.user_id = user_id
from config import conf
if conf().get("knowledge", True):
self.description = (
"Search agent's long-term memory and knowledge base using semantic and keyword search. "
"Use this to recall past conversations, preferences, and knowledge pages."
)
def execute(self, args: dict):
"""
Execute memory search
Args:
args: Dictionary with query, max_results, min_score
Returns:
ToolResult with formatted search results
"""
from agent.tools.base_tool import ToolResult
import asyncio
query = args.get("query")
max_results = args.get("max_results", 10)
min_score = args.get("min_score", 0.1)
if not query:
return ToolResult.fail("Error: query parameter is required")
try:
# Run async search in sync context
results = asyncio.run(self.memory_manager.search(
query=query,
user_id=self.user_id,
max_results=max_results,
min_score=min_score,
include_shared=True
))
if not results:
# Return clear message that no memories exist yet
# This prevents infinite retry loops
return ToolResult.success(
f"No memories found for '{query}'. "
f"This is normal if no memories have been stored yet. "
f"You can store new memories by writing to MEMORY.md or memory/YYYY-MM-DD.md files."
)
# Format results
output = [f"Found {len(results)} relevant memories:\n"]
for i, result in enumerate(results, 1):
output.append(f"\n{i}. {result.path} (lines {result.start_line}-{result.end_line})")
output.append(f" Score: {result.score:.3f}")
output.append(f" Snippet: {result.snippet}")
return ToolResult.success("\n".join(output))
except Exception as e:
return ToolResult.fail(f"Error searching memory: {str(e)}")
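
`MemorySearchTool.execute` bridges a synchronous tool call to an async search with `asyncio.run`. A minimal sketch of that pattern follows; `fake_search` is a hypothetical stand-in for the memory manager's coroutine and its corpus is made up for the example.

```python
import asyncio


async def fake_search(query: str, max_results: int = 10):
    """Hypothetical stand-in for an async memory search."""
    await asyncio.sleep(0)  # yield to the event loop once
    corpus = ["likes green tea", "prefers dark mode", "tea ceremony notes"]
    return [item for item in corpus if query in item][:max_results]


def search_sync(query: str, max_results: int = 10):
    """Run the coroutine to completion from synchronous code."""
    # asyncio.run() creates a fresh event loop and raises RuntimeError
    # if the calling thread already has a running loop.
    return asyncio.run(fake_search(query, max_results))
```

This works when the tool is invoked from plain synchronous code; if the caller is itself running inside an event loop, a different bridge (for example a worker thread) would be needed.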


@@ -0,0 +1,3 @@
from .read import Read
__all__ = ['Read']

548
agent/tools/read/read.py Normal file

@@ -0,0 +1,548 @@
"""
Read tool - Read file contents
Supports text files, images (jpg, png, gif, webp), and PDF files
"""
import os
from typing import Dict, Any
from pathlib import Path
from agent.tools.base_tool import BaseTool, ToolResult
from agent.tools.utils.truncate import truncate_head, format_size, DEFAULT_MAX_LINES, DEFAULT_MAX_BYTES
from common.utils import expand_path
class Read(BaseTool):
"""Tool for reading file contents"""
name: str = "read"
description: str = f"Read or inspect file contents. For text/PDF files, returns content (truncated to {DEFAULT_MAX_LINES} lines or {DEFAULT_MAX_BYTES // 1024}KB). For images/videos/audio, returns metadata only (file info, size, type). Use offset/limit for large text files."
params: dict = {
"type": "object",
"properties": {
"path": {
"type": "string",
"description": "Path to the file to read. IMPORTANT: Relative paths are based on workspace directory. To access files outside workspace, use absolute paths starting with ~ or /."
},
"offset": {
"type": "integer",
"description": "Line number to start reading from (1-indexed, optional). Use negative values to read from end (e.g. -20 for last 20 lines)"
},
"limit": {
"type": "integer",
"description": "Maximum number of lines to read (optional)"
}
},
"required": ["path"]
}
def __init__(self, config: dict = None):
self.config = config or {}
self.cwd = self.config.get("cwd", os.getcwd())
# File type categories
self.image_extensions = {'.jpg', '.jpeg', '.png', '.gif', '.webp', '.bmp', '.svg', '.ico'}
self.video_extensions = {'.mp4', '.avi', '.mov', '.mkv', '.flv', '.wmv', '.webm', '.m4v'}
self.audio_extensions = {'.mp3', '.wav', '.ogg', '.m4a', '.flac', '.aac', '.wma'}
self.binary_extensions = {'.exe', '.dll', '.so', '.dylib', '.bin', '.dat', '.db', '.sqlite'}
self.archive_extensions = {'.zip', '.tar', '.gz', '.rar', '.7z', '.bz2', '.xz'}
self.pdf_extensions = {'.pdf'}
self.office_extensions = {'.doc', '.docx', '.xls', '.xlsx', '.ppt', '.pptx'}
# Readable text formats (will be read with truncation)
self.text_extensions = {
'.txt', '.md', '.markdown', '.rst', '.log', '.csv', '.tsv', '.json', '.xml', '.yaml', '.yml',
'.py', '.js', '.ts', '.java', '.c', '.cpp', '.h', '.hpp', '.go', '.rs', '.rb', '.php',
'.html', '.css', '.scss', '.sass', '.less', '.vue', '.jsx', '.tsx',
'.sh', '.bash', '.zsh', '.fish', '.ps1', '.bat', '.cmd',
'.sql', '.r', '.m', '.swift', '.kt', '.scala', '.clj', '.erl', '.ex',
'.dockerfile', '.makefile', '.cmake', '.gradle', '.properties', '.ini', '.conf', '.cfg',
}
def execute(self, args: Dict[str, Any]) -> ToolResult:
"""
Execute file read operation
:param args: Contains file path and optional offset/limit parameters
:return: File content or error message
"""
# Support 'location' as alias for 'path' (LLM may use it from skill listing)
path = args.get("path", "") or args.get("location", "")
path = path.strip() if isinstance(path, str) else ""
offset = args.get("offset")
limit = args.get("limit")
if not path:
return ToolResult.fail("Error: path parameter is required")
# Resolve path
absolute_path = self._resolve_path(path)
# Security check: Prevent reading sensitive config files
env_config_path = expand_path("~/.cow/.env")
if os.path.abspath(absolute_path) == os.path.abspath(env_config_path):
return ToolResult.fail(
"Error: Access denied. API keys and credentials must be accessed through the env_config tool only."
)
# Check if file exists
if not os.path.exists(absolute_path):
# Provide helpful hint if using relative path
if not os.path.isabs(path) and not path.startswith('~'):
return ToolResult.fail(
f"Error: File not found: {path}\n"
f"Resolved to: {absolute_path}\n"
f"Hint: Relative paths are based on workspace ({self.cwd}). For files outside workspace, use absolute paths."
)
return ToolResult.fail(f"Error: File not found: {path}")
# Check if readable
if not os.access(absolute_path, os.R_OK):
return ToolResult.fail(f"Error: File is not readable: {path}")
# Check file type
file_ext = Path(absolute_path).suffix.lower()
file_size = os.path.getsize(absolute_path)
# Check if image - return metadata for sending
if file_ext in self.image_extensions:
return self._read_image(absolute_path, file_ext)
# Check if video/audio/binary/archive - return metadata only
if file_ext in self.video_extensions:
return self._return_file_metadata(absolute_path, "video", file_size)
if file_ext in self.audio_extensions:
return self._return_file_metadata(absolute_path, "audio", file_size)
if file_ext in self.binary_extensions or file_ext in self.archive_extensions:
return self._return_file_metadata(absolute_path, "binary", file_size)
# Check if PDF
if file_ext in self.pdf_extensions:
return self._read_pdf(absolute_path, path, offset, limit)
# Check if Office document (.docx, .xlsx, .pptx, etc.)
if file_ext in self.office_extensions:
return self._read_office(absolute_path, path, file_ext, offset, limit)
# Read text file (with truncation for large files)
return self._read_text(absolute_path, path, offset, limit)
def _resolve_path(self, path: str) -> str:
"""
Resolve path to absolute path
:param path: Relative or absolute path
:return: Absolute path
"""
# Expand ~ to user home directory
path = expand_path(path)
if os.path.isabs(path):
return path
return os.path.abspath(os.path.join(self.cwd, path))
def _return_file_metadata(self, absolute_path: str, file_type: str, file_size: int) -> ToolResult:
"""
Return file metadata for non-readable files (video, audio, binary, etc.)
:param absolute_path: Absolute path to the file
:param file_type: Type of file (video, audio, binary, etc.)
:param file_size: File size in bytes
:return: File metadata
"""
file_name = Path(absolute_path).name
file_ext = Path(absolute_path).suffix.lower()
# Determine MIME type
mime_types = {
# Video
'.mp4': 'video/mp4', '.avi': 'video/x-msvideo', '.mov': 'video/quicktime',
'.mkv': 'video/x-matroska', '.webm': 'video/webm',
# Audio
'.mp3': 'audio/mpeg', '.wav': 'audio/wav', '.ogg': 'audio/ogg',
'.m4a': 'audio/mp4', '.flac': 'audio/flac',
# Binary
'.zip': 'application/zip', '.tar': 'application/x-tar',
'.gz': 'application/gzip', '.rar': 'application/x-rar-compressed',
}
mime_type = mime_types.get(file_ext, 'application/octet-stream')
result = {
"type": f"{file_type}_metadata",
"file_type": file_type,
"path": absolute_path,
"file_name": file_name,
"mime_type": mime_type,
"size": file_size,
"size_formatted": format_size(file_size),
"message": f"{file_type.capitalize()} file: {file_name} ({format_size(file_size)})\nHint: To send this file, use the send tool."
}
return ToolResult.success(result)
def _read_image(self, absolute_path: str, file_ext: str) -> ToolResult:
"""
Read image file - always return metadata only (images should be sent, not read into context)
:param absolute_path: Absolute path to the image file
:param file_ext: File extension
:return: Result containing image metadata for sending
"""
try:
# Get file size
file_size = os.path.getsize(absolute_path)
# Determine MIME type
mime_type_map = {
'.jpg': 'image/jpeg',
'.jpeg': 'image/jpeg',
'.png': 'image/png',
'.gif': 'image/gif',
'.webp': 'image/webp'
}
mime_type = mime_type_map.get(file_ext, 'image/jpeg')
# Return metadata for images (NOT file_to_send - use send tool to actually send)
result = {
"type": "image_metadata",
"file_type": "image",
"path": absolute_path,
"mime_type": mime_type,
"size": file_size,
"size_formatted": format_size(file_size),
"message": f"Image file: {Path(absolute_path).name} ({format_size(file_size)})\nTip: to send this image, use the send tool."
}
return ToolResult.success(result)
except Exception as e:
return ToolResult.fail(f"Error reading image file: {str(e)}")
def _read_text(self, absolute_path: str, display_path: str, offset: int = None, limit: int = None) -> ToolResult:
"""
Read text file
:param absolute_path: Absolute path to the file
:param display_path: Path to display
:param offset: Starting line number (1-indexed)
:param limit: Maximum number of lines to read
:return: File content or error message
"""
try:
# Check file size first
file_size = os.path.getsize(absolute_path)
MAX_FILE_SIZE = 50 * 1024 * 1024 # 50MB
if file_size > MAX_FILE_SIZE:
# File too large, return metadata only
return ToolResult.success({
"type": "file_to_send",
"file_type": "document",
"path": absolute_path,
"size": file_size,
"size_formatted": format_size(file_size),
"message": f"File too large ({format_size(file_size)} > 50MB); its content cannot be read. File path: {absolute_path}"
})
# Read file (utf-8-sig strips BOM automatically on Windows)
# Note: Truncation is unified via truncate_head (DEFAULT_MAX_LINES / DEFAULT_MAX_BYTES)
# so that offset/limit can paginate the entire file correctly.
with open(absolute_path, 'r', encoding='utf-8-sig') as f:
content = f.read()
all_lines = content.split('\n')
total_file_lines = len(all_lines)
# Apply offset (if specified)
start_line = 0
if offset is not None:
if offset < 0:
# Negative offset: read from end
# -20 means "last 20 lines" → start from (total - 20)
start_line = max(0, total_file_lines + offset)
else:
# Positive offset: read from start (1-indexed)
start_line = max(0, offset - 1) # Convert to 0-indexed
if start_line >= total_file_lines:
return ToolResult.fail(
f"Error: Offset {offset} is beyond end of file ({total_file_lines} lines total)"
)
start_line_display = start_line + 1 # For display (1-indexed)
# If user specified limit, use it
selected_content = content
user_limited_lines = None
if limit is not None:
end_line = min(start_line + limit, total_file_lines)
selected_content = '\n'.join(all_lines[start_line:end_line])
user_limited_lines = end_line - start_line
elif offset is not None:
selected_content = '\n'.join(all_lines[start_line:])
# Apply truncation (considering line count and byte limits)
truncation = truncate_head(selected_content)
output_text = ""
details = {}
if truncation.first_line_exceeds_limit:
# First line exceeds 30KB limit
first_line_size = format_size(len(all_lines[start_line].encode('utf-8')))
output_text = f"[Line {start_line_display} is {first_line_size}, exceeds {format_size(DEFAULT_MAX_BYTES)} limit. Use bash tool to read: head -c {DEFAULT_MAX_BYTES} {display_path} | tail -n +{start_line_display}]"
details["truncation"] = truncation.to_dict()
elif truncation.truncated:
# Truncation occurred
end_line_display = start_line_display + truncation.output_lines - 1
next_offset = end_line_display + 1
output_text = truncation.content
if truncation.truncated_by == "lines":
output_text += f"\n\n[Showing lines {start_line_display}-{end_line_display} of {total_file_lines}. Use offset={next_offset} to continue.]"
else:
output_text += f"\n\n[Showing lines {start_line_display}-{end_line_display} of {total_file_lines} ({format_size(DEFAULT_MAX_BYTES)} limit). Use offset={next_offset} to continue.]"
details["truncation"] = truncation.to_dict()
elif user_limited_lines is not None and start_line + user_limited_lines < total_file_lines:
# User specified limit, more content available, but no truncation
remaining = total_file_lines - (start_line + user_limited_lines)
next_offset = start_line + user_limited_lines + 1
output_text = truncation.content
output_text += f"\n\n[{remaining} more lines in file. Use offset={next_offset} to continue.]"
else:
# No truncation, no exceeding user limit
output_text = truncation.content
result = {
"content": output_text,
"total_lines": total_file_lines,
"start_line": start_line_display,
"output_lines": truncation.output_lines
}
if details:
result["details"] = details
return ToolResult.success(result)
except UnicodeDecodeError:
return ToolResult.fail(f"Error: File is not a valid text file (encoding error): {display_path}")
except Exception as e:
return ToolResult.fail(f"Error reading file: {str(e)}")
def _read_office(self, absolute_path: str, display_path: str, file_ext: str,
offset: int = None, limit: int = None) -> ToolResult:
"""Read Office documents (.docx, .xlsx, .pptx) using python-docx / openpyxl / python-pptx."""
try:
text = self._extract_office_text(absolute_path, file_ext)
except ImportError as e:
return ToolResult.fail(str(e))
except Exception as e:
return ToolResult.fail(f"Error reading Office document: {e}")
if not text or not text.strip():
return ToolResult.success({
"content": f"[Office file {Path(absolute_path).name}: no text content could be extracted]",
})
all_lines = text.split('\n')
total_lines = len(all_lines)
start_line = 0
if offset is not None:
if offset < 0:
start_line = max(0, total_lines + offset)
else:
start_line = max(0, offset - 1)
if start_line >= total_lines:
return ToolResult.fail(
f"Error: Offset {offset} is beyond end of content ({total_lines} lines total)"
)
selected_content = text
user_limited_lines = None
if limit is not None:
end_line = min(start_line + limit, total_lines)
selected_content = '\n'.join(all_lines[start_line:end_line])
user_limited_lines = end_line - start_line
elif offset is not None:
selected_content = '\n'.join(all_lines[start_line:])
truncation = truncate_head(selected_content)
start_line_display = start_line + 1
output_text = ""
if truncation.truncated:
end_line_display = start_line_display + truncation.output_lines - 1
next_offset = end_line_display + 1
output_text = truncation.content
output_text += f"\n\n[Showing lines {start_line_display}-{end_line_display} of {total_lines}. Use offset={next_offset} to continue.]"
elif user_limited_lines is not None and start_line + user_limited_lines < total_lines:
remaining = total_lines - (start_line + user_limited_lines)
next_offset = start_line + user_limited_lines + 1
output_text = truncation.content
output_text += f"\n\n[{remaining} more lines in file. Use offset={next_offset} to continue.]"
else:
output_text = truncation.content
return ToolResult.success({
"content": output_text,
"total_lines": total_lines,
"start_line": start_line_display,
"output_lines": truncation.output_lines,
})
@staticmethod
def _extract_office_text(absolute_path: str, file_ext: str) -> str:
"""Extract plain text from an Office document."""
if file_ext in ('.docx', '.doc'):
try:
from docx import Document
except ImportError:
raise ImportError("Error: python-docx library not installed. Install with: pip install python-docx")
doc = Document(absolute_path)
paragraphs = [p.text for p in doc.paragraphs]
for table in doc.tables:
for row in table.rows:
paragraphs.append('\t'.join(cell.text for cell in row.cells))
return '\n'.join(paragraphs)
if file_ext in ('.xlsx', '.xls'):
try:
from openpyxl import load_workbook
except ImportError:
raise ImportError("Error: openpyxl library not installed. Install with: pip install openpyxl")
wb = load_workbook(absolute_path, read_only=True, data_only=True)
parts = []
for ws in wb.worksheets:
parts.append(f"--- Sheet: {ws.title} ---")
for row in ws.iter_rows(values_only=True):
parts.append('\t'.join(str(c) if c is not None else '' for c in row))
wb.close()
return '\n'.join(parts)
if file_ext in ('.pptx', '.ppt'):
try:
from pptx import Presentation
except ImportError:
raise ImportError("Error: python-pptx library not installed. Install with: pip install python-pptx")
prs = Presentation(absolute_path)
parts = []
for i, slide in enumerate(prs.slides, 1):
parts.append(f"--- Slide {i} ---")
for shape in slide.shapes:
if shape.has_text_frame:
for para in shape.text_frame.paragraphs:
text = para.text.strip()
if text:
parts.append(text)
return '\n'.join(parts)
return ""
def _read_pdf(self, absolute_path: str, display_path: str, offset: int = None, limit: int = None) -> ToolResult:
"""
Read PDF file content
:param absolute_path: Absolute path to the file
:param display_path: Path to display
:param offset: Starting line number (1-indexed)
:param limit: Maximum number of lines to read
:return: PDF text content or error message
"""
try:
# Try to import pypdf
try:
from pypdf import PdfReader
except ImportError:
return ToolResult.fail(
"Error: pypdf library not installed. Install with: pip install pypdf"
)
# Read PDF
reader = PdfReader(absolute_path)
total_pages = len(reader.pages)
# Extract text from all pages
text_parts = []
for page_num, page in enumerate(reader.pages, 1):
page_text = page.extract_text()
if page_text.strip():
text_parts.append(f"--- Page {page_num} ---\n{page_text}")
if not text_parts:
return ToolResult.success({
"content": f"[PDF file with {total_pages} pages, but no text content could be extracted]",
"total_pages": total_pages,
"message": "PDF may contain only images or be encrypted"
})
# Merge all text
full_content = "\n\n".join(text_parts)
all_lines = full_content.split('\n')
total_lines = len(all_lines)
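# Apply offset and limit (same logic as text files, including negative offsets)
start_line = 0
if offset is not None:
if offset < 0:
# Negative offset: count from the end, e.g. -20 means "last 20 lines"
start_line = max(0, total_lines + offset)
else:
# Positive offset: 1-indexed, convert to 0-indexed
start_line = max(0, offset - 1)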
if start_line >= total_lines:
return ToolResult.fail(
f"Error: Offset {offset} is beyond end of content ({total_lines} lines total)"
)
start_line_display = start_line + 1
selected_content = full_content
user_limited_lines = None
if limit is not None:
end_line = min(start_line + limit, total_lines)
selected_content = '\n'.join(all_lines[start_line:end_line])
user_limited_lines = end_line - start_line
elif offset is not None:
selected_content = '\n'.join(all_lines[start_line:])
# Apply truncation
truncation = truncate_head(selected_content)
output_text = ""
details = {}
if truncation.truncated:
end_line_display = start_line_display + truncation.output_lines - 1
next_offset = end_line_display + 1
output_text = truncation.content
if truncation.truncated_by == "lines":
output_text += f"\n\n[Showing lines {start_line_display}-{end_line_display} of {total_lines}. Use offset={next_offset} to continue.]"
else:
output_text += f"\n\n[Showing lines {start_line_display}-{end_line_display} of {total_lines} ({format_size(DEFAULT_MAX_BYTES)} limit). Use offset={next_offset} to continue.]"
details["truncation"] = truncation.to_dict()
elif user_limited_lines is not None and start_line + user_limited_lines < total_lines:
remaining = total_lines - (start_line + user_limited_lines)
next_offset = start_line + user_limited_lines + 1
output_text = truncation.content
output_text += f"\n\n[{remaining} more lines in file. Use offset={next_offset} to continue.]"
else:
output_text = truncation.content
result = {
"content": output_text,
"total_pages": total_pages,
"total_lines": total_lines,
"start_line": start_line_display,
"output_lines": truncation.output_lines
}
if details:
result["details"] = details
return ToolResult.success(result)
except Exception as e:
return ToolResult.fail(f"Error reading PDF file: {str(e)}")


@@ -0,0 +1,287 @@
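# Scheduler Tool
## Overview
The scheduler tool lets the Agent create, manage, and execute scheduled tasks. It supports:
- **Timed reminders**: send a message at a specified time
- 🔄 **Recurring tasks**: repeat at a fixed interval or on a cron expression
- 🔧 **Dynamic tool calls**: run another tool on a schedule and send its result (for example, searching the news or checking the weather)
- 📋 **Task management**: list, enable, disable, and delete tasks
## Install dependencies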
```bash
pip install "croniter>=2.0.0"
```
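## Usage
### 1. Create a scheduled task
The Agent can create scheduled tasks from natural language. Two task types are supported:
#### 1.1 Static message task
Sends a predefined message:
**Example conversation:**
```
User: Remind me about the meeting every day at 9 a.m.
Agent: [calls the scheduler tool]
action: create
name: Daily meeting reminder
message: Time for the meeting!
schedule_type: cron
schedule_value: 0 9 * * *
```
#### 1.2 Dynamic tool call task
Runs a tool on a schedule and sends the result:
**Example conversation:**
```
User: Read me today's schedule every day at 8 a.m.
Agent: [calls the scheduler tool]
action: create
name: Daily schedule
tool_call:
tool_name: read
tool_params:
file_path: ~/cow/schedule.txt
result_prefix: 📅 Today's schedule
schedule_type: cron
schedule_value: 0 8 * * *
```
**Tool call parameters:**
- `tool_name`: name of the tool to call (a built-in tool such as `bash`, `read`, or `write`)
- `tool_params`: parameters for the tool (as a dictionary)
- `result_prefix`: optional; text prepended to the result
**Note:** To use a skill (such as bocha-search), call the skill script through the `bash` tool.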
### 2. Supported schedule types
#### Cron expression (`cron`)
Uses standard five-field cron expressions:
```
0 9 * * * # every day at 9:00
0 */2 * * * # every 2 hours
30 8 * * 1-5 # weekdays at 8:30
0 0 1 * * # the 1st of every month
```
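To make the field semantics concrete, here is a minimal, standard-library-only sketch that checks whether a point in time matches a five-field cron expression. It is an illustration only, not how `croniter` works internally: `cron_matches` and `_field_matches` are hypothetical names, only `*`, `*/n`, ranges, and comma lists are handled, and day-of-month and day-of-week are combined with AND for simplicity.

```python
from datetime import datetime


def _field_matches(field: str, value: int, lo: int, hi: int) -> bool:
    """Return True if `value` satisfies one cron field (*, */n, a-b, a-b/n, lists)."""
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            a, b = part.split("-", 1)
            start, end = int(a), int(b)
        else:
            start = end = int(part)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def cron_matches(expression: str, when: datetime) -> bool:
    """Check `when` against a 'minute hour day-of-month month day-of-week' expression."""
    minute, hour, dom, month, dow = expression.split()
    # cron counts Sunday as 0, while datetime.weekday() counts Monday as 0.
    cron_dow = (when.weekday() + 1) % 7
    return (
        _field_matches(minute, when.minute, 0, 59)
        and _field_matches(hour, when.hour, 0, 23)
        and _field_matches(dom, when.day, 1, 31)
        and _field_matches(month, when.month, 1, 12)
        and _field_matches(dow, cron_dow, 0, 6)
    )


# 2024-01-05 is a Friday, so the weekday expression matches at 8:30.
print(cron_matches("30 8 * * 1-5", datetime(2024, 1, 5, 8, 30)))  # True
```

In practice the scheduler depends on `croniter` for this, which also covers syntax that the sketch leaves out.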
#### Fixed interval (`interval`)
An interval in seconds:
```
3600 # every hour
86400 # every day
1800 # every 30 minutes
```
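One plausible way to turn an interval into a next run time is sketched below. It is an assumption about the arithmetic, not a copy of what `SchedulerService` does: `next_interval_run` is a hypothetical helper, and skipping missed slots after downtime is a design choice made for this example.

```python
from datetime import datetime, timedelta


def next_interval_run(last_run: datetime, interval_seconds: int, now: datetime) -> datetime:
    """Return the first slot after `now` on the grid last_run + k * interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    elapsed = (now - last_run).total_seconds()
    # Skip slots that were missed (for example while the process was down),
    # but always move at least one interval forward.
    steps = max(1, int(elapsed // interval_seconds) + 1)
    return last_run + timedelta(seconds=steps * interval_seconds)


last = datetime(2024, 1, 1, 10, 0)
print(next_interval_run(last, 3600, datetime(2024, 1, 1, 10, 30)))  # 2024-01-01 11:00:00
```

Anchoring the grid to the last run keeps the cadence stable even when a tick fires a few seconds late.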
#### One-time task (`once`)
A specific point in time (ISO format):
```
2024-12-25T09:00:00
2024-12-31T23:59:59
```
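A `once` value can be parsed with the standard library and compared against the current time. The sketch below assumes timestamps are compared as tz-naive local times. `parse_run_at` and `is_due` are hypothetical names, not the project's functions.

```python
from datetime import datetime


def parse_run_at(iso_text: str) -> datetime:
    """Parse an ISO timestamp into a tz-naive datetime in local time."""
    parsed = datetime.fromisoformat(iso_text)
    if parsed.tzinfo is not None:
        # Convert aware timestamps to local time, then drop tzinfo so the
        # value compares cleanly with datetime.now().
        parsed = parsed.astimezone().replace(tzinfo=None)
    return parsed


def is_due(iso_text: str, now: datetime) -> bool:
    """Return True once the scheduled time has been reached."""
    return parse_run_at(iso_text) <= now


print(is_due("2024-12-25T09:00:00", datetime(2024, 12, 25, 9, 0, 0)))  # True
```

Mixing tz-naive and tz-aware datetimes in a comparison raises `TypeError`, which is why the sketch normalizes before comparing. A trailing `Z` suffix is only accepted by `datetime.fromisoformat` on Python 3.11 and later.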
### 3. List tasks
```
User: Show my scheduled tasks
Agent: [calls the scheduler tool]
action: list
```
### 4. View task details
```
User: Show the details of task abc123
Agent: [calls the scheduler tool]
action: get
task_id: abc123
```
### 5. Delete a task
```
User: Delete task abc123
Agent: [calls the scheduler tool]
action: delete
task_id: abc123
```
### 6. Enable/disable a task
```
User: Pause task abc123
Agent: [calls the scheduler tool]
action: disable
task_id: abc123
User: Resume task abc123
Agent: [calls the scheduler tool]
action: enable
task_id: abc123
```
## Task storage
Tasks are saved in a JSON file:
```
~/cow/scheduler/tasks.json
```
Task data structure:
**Static message task:**
```json
{
"id": "abc123",
"name": "Daily reminder",
"enabled": true,
"created_at": "2024-01-01T10:00:00",
"updated_at": "2024-01-01T10:00:00",
"schedule": {
"type": "cron",
"expression": "0 9 * * *"
},
"action": {
"type": "send_message",
"content": "Time for the meeting!",
"receiver": "wxid_xxx",
"receiver_name": "Zhang San",
"is_group": false,
"channel_type": "wechat"
},
"next_run_at": "2024-01-02T09:00:00",
"last_run_at": "2024-01-01T09:00:00"
}
```
**Dynamic tool call task:**
```json
{
"id": "def456",
"name": "Daily schedule",
"enabled": true,
"created_at": "2024-01-01T10:00:00",
"updated_at": "2024-01-01T10:00:00",
"schedule": {
"type": "cron",
"expression": "0 8 * * *"
},
"action": {
"type": "tool_call",
"tool_name": "read",
"tool_params": {
"file_path": "~/cow/schedule.txt"
},
"result_prefix": "📅 Today's schedule",
"receiver": "wxid_xxx",
"receiver_name": "Zhang San",
"is_group": false,
"channel_type": "wechat"
},
"next_run_at": "2024-01-02T08:00:00"
}
```
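As a small illustration of working with these records, the sketch below turns one task dictionary into a one-line summary. It relies only on the fields shown in the two examples. `describe_task` is a hypothetical helper, and the top-level layout of `tasks.json` is not assumed here.

```python
import json


def describe_task(task: dict) -> str:
    """Summarize one task record using only the fields shown in the examples."""
    schedule = task.get("schedule", {})
    action = task.get("action", {})
    state = "enabled" if task.get("enabled", True) else "disabled"
    if action.get("type") == "tool_call":
        what = f"tool {action.get('tool_name')}({action.get('tool_params', {})})"
    else:
        what = f"message {action.get('content', '')!r}"
    return f"{task.get('id')} [{state}] {schedule.get('type')} {schedule.get('expression')} -> {what}"


record = json.loads(
    '{"id": "abc123", "enabled": true,'
    ' "schedule": {"type": "cron", "expression": "0 9 * * *"},'
    ' "action": {"type": "send_message", "content": "hi"}}'
)
print(describe_task(record))  # abc123 [enabled] cron 0 9 * * * -> message 'hi'
```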
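## Background service
Scheduled tasks are managed by the background service `SchedulerService`, which:
- checks for due tasks every 30 seconds
- runs due tasks automatically
- computes the next run time
- records run history and errors
The service starts automatically when the Agent is initialized; no manual configuration is required.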
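The behavior listed above can be pictured as one "tick" of a polling loop. The sketch below is a simplified assumption, not the actual `SchedulerService` code: `run_due_tasks` is a hypothetical function, tasks are plain dictionaries, and the `execute` callback returns `True` on success or `False` to ask for a retry on the next tick.

```python
from datetime import datetime
from typing import Callable, List


def run_due_tasks(tasks: List[dict], now: datetime, execute: Callable[[dict], bool]) -> List[str]:
    """Run every enabled task whose next_run_at has passed; return the IDs that completed."""
    completed = []
    for task in tasks:
        if not task.get("enabled", True):
            continue
        next_run_at = task.get("next_run_at")
        if not next_run_at or datetime.fromisoformat(next_run_at) > now:
            continue
        try:
            ok = execute(task)
        except Exception:
            ok = False  # one failing task must not stop the others
        if ok:
            task["last_run_at"] = now.isoformat()
            completed.append(task["id"])
        # When ok is False the task is left untouched, so the next tick retries it.
    return completed
```

A real service would call such a function from a background thread every 30 seconds and then recompute `next_run_at` for each completed task.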
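## Determining the receiver
A scheduled task delivers to **the conversation in which it was created**:
- if created in a private chat, it is sent to that user
- if created in a group chat, it is sent to that group
- the receiver information is saved automatically at creation time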
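As a rough sketch of that capture step, the function below copies receiver details from a conversation context into the fields stored on a task's `action`. The function name `receiver_fields` is hypothetical, and treating the context as a plain dictionary with `receiver`, `isgroup`, and `channel_type` keys is an assumption made for illustration.

```python
def receiver_fields(context: dict, default_channel: str = "unknown") -> dict:
    """Build the receiver-related fields of a task action from a chat context."""
    return {
        "receiver": context.get("receiver", ""),
        # The context flag is spelled "isgroup"; the stored action uses "is_group".
        "is_group": bool(context.get("isgroup", False)),
        "channel_type": context.get("channel_type") or default_channel,
    }


print(receiver_fields({"receiver": "wxid_xxx", "isgroup": False, "channel_type": "wechat"}))
```

Keeping the two spellings straight matters, because executors read `is_group` from the saved action.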
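## Common use cases
### 1. Daily reminder (static message)
```
User: Remind me to take my medicine every day at 8 a.m.
Agent: ✅ Scheduled task created
Task ID: a1b2c3d4
Schedule: every day at 8:00
Message: Time to take your medicine!
```
### 2. Weekday reminder (static message)
```
User: Remind me to leave work at 6 p.m. on weekdays
Agent: [creates cron: 0 18 * * 1-5]
Message: Time to leave work!
```
### 3. Countdown reminder (static message)
```
User: Remind me in 1 hour
Agent: [creates interval: 3600]
```
### 4. Daily schedule push (dynamic tool call)
```
User: Read me today's schedule every day at 8 a.m.
Agent: ✅ Scheduled task created
Task ID: schedule001
Schedule: every day at 8:00
Tool: read(file_path='~/cow/schedule.txt')
Prefix: 📅 Today's schedule
```
### 5. Scheduled file backup (dynamic tool call)
```
User: Back up my work file every night at 11 p.m.
Agent: [creates cron: 0 23 * * *]
Tool: bash(command='cp ~/cow/work.txt ~/cow/backup/work_$(date +%Y%m%d).txt')
Prefix: ✅ File backed up
```
### 6. Weekly report reminder (static message)
```
User: Remind me to write my weekly report every Friday at 5 p.m.
Agent: [creates cron: 0 17 * * 5]
Message: 📊 Time to write the weekly report!
```
### 7. Specific-date reminder
```
User: Remind me "Merry Christmas" at 9 a.m. on December 25
Agent: [creates once: 2024-12-25T09:00:00]
```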
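## Notes
1. **Time zone**: the system's local time zone is used
2. **Precision**: tasks are checked every 30 seconds, so the actual run time may be off by up to ±30 seconds
3. **Persistence**: tasks are saved to a file and restored automatically after a restart
4. **One-time tasks**: disabled automatically after they run, not deleted (you can delete them manually)
5. **Error handling**: a failed run is logged and does not affect other tasks
## Implementation
- **TaskStore**: persistent task storage
- **SchedulerService**: background scheduling service
- **SchedulerTool**: tool interface for the Agent
- **Integration**: integration with AgentBridge
## Dependencies
- `croniter`: cron expression parsing (lightweight, only ~50KB)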


@@ -0,0 +1,7 @@
"""
Scheduler tool for managing scheduled tasks
"""
from .scheduler_tool import SchedulerTool
__all__ = ["SchedulerTool"]


@@ -0,0 +1,548 @@
"""
Integration module for scheduler with AgentBridge
"""
import os
import threading
from typing import Optional
from config import conf
from common.log import logger
from common.utils import expand_path
from bridge.context import Context, ContextType
from bridge.reply import Reply, ReplyType
# Global scheduler service instance
_scheduler_service = None
_task_store = None
# Module-level lock to guard idempotent initialization across threads
_init_lock = threading.Lock()
def init_scheduler(agent_bridge) -> bool:
"""
Initialize scheduler service (idempotent).
Safe to call multiple times and from multiple threads: only the first
successful call creates the singleton ``SchedulerService`` + background
scanning thread. Subsequent calls return immediately.
Args:
agent_bridge: AgentBridge instance
Returns:
True if scheduler is initialized (newly created or already running)
"""
global _scheduler_service, _task_store
# Fast path: already initialized and running
if _scheduler_service is not None and getattr(_scheduler_service, "running", False):
return True
with _init_lock:
# Re-check under the lock to avoid races where multiple threads
# passed the fast-path check before any of them acquired the lock.
if _scheduler_service is not None and getattr(_scheduler_service, "running", False):
return True
try:
from agent.tools.scheduler.task_store import TaskStore
from agent.tools.scheduler.scheduler_service import SchedulerService
# Get workspace from config
workspace_root = expand_path(conf().get("agent_workspace", "~/cow"))
store_path = os.path.join(workspace_root, "scheduler", "tasks.json")
# Create task store (reuse if already created)
if _task_store is None:
_task_store = TaskStore(store_path)
logger.debug(f"[Scheduler] Task store initialized: {store_path}")
# Create execute callback. Returns True on success, False to ask
# the scheduler to retry on the next tick (e.g. channel not yet
# ready right after process start).
def execute_task_callback(task: dict):
try:
action = task.get("action", {})
action_type = action.get("type")
channel_type = action.get("channel_type", "unknown")
receiver = action.get("receiver", "")
if not _is_channel_ready(channel_type, receiver):
logger.warning(
f"[Scheduler] Task {task.get('id')}: channel "
f"'{channel_type}' not ready for receiver={receiver} "
f"(no inbound msg cached since restart?); deferring"
)
return False
if action_type == "agent_task":
return _execute_agent_task(task, agent_bridge)
elif action_type == "send_message":
return _execute_send_message(task, agent_bridge)
elif action_type == "tool_call":
return _execute_tool_call(task, agent_bridge)
elif action_type == "skill_call":
return _execute_skill_call(task, agent_bridge)
else:
logger.warning(f"[Scheduler] Unknown action type: {action_type}")
return True
except Exception as e:
logger.error(f"[Scheduler] Error executing task {task.get('id')}: {e}")
return False
# Create scheduler service
_scheduler_service = SchedulerService(_task_store, execute_task_callback)
_scheduler_service.start()
logger.info("[Scheduler] Service initialized and started")
return True
except Exception as e:
logger.error(f"[Scheduler] Failed to initialize scheduler: {e}")
return False
def _is_channel_ready(channel_type: str, receiver: str) -> bool:
"""Best-effort readiness probe for outbound channels.
Returns False when we know the send will drop (e.g. weixin not yet
logged in, web session has no polling queue), so the scheduler can
defer instead of consuming the task. Unknown channels return True
to preserve previous behaviour.
"""
if not channel_type or channel_type == "unknown":
return True
try:
from channel.channel_factory import create_channel
channel = create_channel(channel_type)
if channel is None:
return False
if channel_type == "weixin":
tokens = getattr(channel, "_context_tokens", None)
if not tokens or receiver not in tokens:
return False
return True
if channel_type == "web":
queues = getattr(channel, "session_queues", None)
if not queues or receiver not in queues:
return False
return True
return True
except Exception as e:
logger.warning(f"[Scheduler] Channel readiness check failed for {channel_type}: {e}")
return True
def get_task_store():
"""Get the global task store instance"""
return _task_store
def get_scheduler_service():
"""Get the global scheduler service instance"""
return _scheduler_service
def _remember_delivered_output(
agent_bridge,
task: dict,
channel_type: str,
content: str,
) -> None:
"""Best-effort persistence of the message the scheduler sent to a user.
Uses notify_session_id (the real chat session_id stored at task creation time)
so that group chats correctly associate the output with the user's conversation.
Falls back to receiver for backward compatibility with old tasks.
Per-action-type behaviour:
- agent_task / tool_call / skill_call: gated by ``scheduler_inject_to_session``
(default True). These produce AI-generated content worth remembering.
- send_message: additionally gated by ``scheduler_inject_send_message``
(default False). Fixed reminder text rarely benefits follow-up Q&A and
would just consume context tokens.
"""
if not content:
return
action = task.get("action", {})
action_type = action.get("type", "")
# send_message defaults to NOT being injected; explicit opt-in via config.
if action_type == "send_message":
if not conf().get("scheduler_inject_send_message", False):
return
session_id = action.get("notify_session_id") or action.get("receiver")
if not session_id:
return
try:
remember = getattr(agent_bridge, "remember_scheduled_output", None)
if remember:
task_desc = action.get("task_description") or action.get("content", "")
remember(session_id, str(content), channel_type=channel_type, task_description=task_desc)
except Exception as e:
logger.warning(
f"[Scheduler] Failed to remember delivered output for {session_id}: {e}"
)
def _execute_agent_task(task: dict, agent_bridge) -> bool:
"""
Execute an agent_task action - let Agent handle the task.
Returns True on successful delivery, False to retry next tick.
"""
try:
action = task.get("action", {})
task_description = action.get("task_description")
receiver = action.get("receiver")
is_group = action.get("is_group", False)
channel_type = action.get("channel_type", "unknown")
if not task_description:
logger.error(f"[Scheduler] Task {task['id']}: No task_description specified")
return True # malformed task, don't loop forever
if not receiver:
logger.error(f"[Scheduler] Task {task['id']}: No receiver specified")
return True
# Check for unsupported channels
if channel_type == "dingtalk":
logger.warning(f"[Scheduler] Task {task['id']}: DingTalk channel does not support scheduled messages (Stream mode limitation). Task will execute but message cannot be sent.")
logger.info(f"[Scheduler] Task {task['id']}: Executing agent task '{task_description}'")
# Create a unique session_id for this scheduled task to avoid polluting user's conversation
# Format: scheduler_<receiver>_<task_id> to ensure isolation
scheduler_session_id = f"scheduler_{receiver}_{task['id']}"
# Create context for Agent
context = Context(ContextType.TEXT, task_description)
context["receiver"] = receiver
context["isgroup"] = is_group
context["session_id"] = scheduler_session_id
# Channel-specific setup
if channel_type == "web":
import uuid
request_id = f"scheduler_{task['id']}_{uuid.uuid4().hex[:8]}"
context["request_id"] = request_id
elif channel_type == "feishu":
context["receive_id_type"] = "chat_id" if is_group else "open_id"
context["msg"] = None
elif channel_type == "dingtalk":
# DingTalk requires msg object, set to None for scheduled tasks
context["msg"] = None
if not is_group:
sender_staff_id = action.get("dingtalk_sender_staff_id")
if sender_staff_id:
context["dingtalk_sender_staff_id"] = sender_staff_id
elif channel_type == "wecom_bot":
context["msg"] = None
# Use Agent to execute the task
# Mark this as a scheduled task execution to prevent recursive task creation
context["is_scheduled_task"] = True
try:
# Don't clear history - scheduler tasks use isolated session_id so they won't pollute user conversations
reply = agent_bridge.agent_reply(task_description, context=context, on_event=None, clear_history=False)
if not (reply and reply.content):
logger.error(f"[Scheduler] Task {task['id']}: No result from agent execution")
return True # agent ran but produced nothing; don't loop
from channel.channel_factory import create_channel
channel = create_channel(channel_type)
if not channel:
logger.error(f"[Scheduler] Failed to create channel: {channel_type}")
return False
if channel_type == "web" and hasattr(channel, 'request_to_session'):
request_id = context.get("request_id")
if request_id:
channel.request_to_session[request_id] = receiver
try:
channel.send(reply, context)
except Exception as e:
logger.error(f"[Scheduler] Failed to send result: {e}")
return False
_remember_delivered_output(agent_bridge, task, channel_type, reply.content)
logger.info(f"[Scheduler] Task {task['id']} executed successfully, result sent to {receiver}")
return True
except Exception as e:
logger.error(f"[Scheduler] Failed to execute task via Agent: {e}")
import traceback
logger.error(f"[Scheduler] Traceback: {traceback.format_exc()}")
return False
except Exception as e:
logger.error(f"[Scheduler] Error in _execute_agent_task: {e}")
import traceback
logger.error(f"[Scheduler] Traceback: {traceback.format_exc()}")
return False
def _execute_send_message(task: dict, agent_bridge) -> bool:
"""Execute a send_message action. Returns True/False for delivery."""
try:
action = task.get("action", {})
content = action.get("content", "")
receiver = action.get("receiver")
is_group = action.get("is_group", False)
channel_type = action.get("channel_type", "unknown")
if not receiver:
logger.error(f"[Scheduler] Task {task['id']}: No receiver specified")
return True
# Create context for sending message
context = Context(ContextType.TEXT, content)
context["receiver"] = receiver
context["isgroup"] = is_group
context["session_id"] = receiver
# Channel-specific context setup
if channel_type == "web":
# Web channel needs request_id
import uuid
request_id = f"scheduler_{task['id']}_{uuid.uuid4().hex[:8]}"
context["request_id"] = request_id
logger.debug(f"[Scheduler] Generated request_id for web channel: {request_id}")
elif channel_type == "feishu":
# Feishu channel: for scheduled tasks, send as new message (no msg_id to reply to)
# Use chat_id for groups, open_id for private chats
context["receive_id_type"] = "chat_id" if is_group else "open_id"
# Keep isgroup as is, but set msg to None (no original message to reply to)
# Feishu channel will detect this and send as new message instead of reply
context["msg"] = None
logger.debug(f"[Scheduler] Feishu: receive_id_type={context['receive_id_type']}, is_group={is_group}, receiver={receiver}")
elif channel_type == "dingtalk":
# DingTalk channel setup
context["msg"] = None
# For one-on-one chats, sender_staff_id must be passed along
if not is_group:
sender_staff_id = action.get("dingtalk_sender_staff_id")
if sender_staff_id:
context["dingtalk_sender_staff_id"] = sender_staff_id
logger.debug(f"[Scheduler] DingTalk single chat: sender_staff_id={sender_staff_id}")
else:
logger.warning(f"[Scheduler] Task {task['id']}: DingTalk single chat message missing sender_staff_id")
elif channel_type == "wecom_bot":
context["msg"] = None
elif channel_type == "qq":
context["msg"] = None
# Create reply
reply = Reply(ReplyType.TEXT, content)
# Get channel and send
from channel.channel_factory import create_channel
channel = create_channel(channel_type)
if not channel:
logger.error(f"[Scheduler] Failed to create channel: {channel_type}")
return False
if channel_type == "web" and hasattr(channel, 'request_to_session'):
channel.request_to_session[request_id] = receiver
try:
channel.send(reply, context)
except Exception as e:
logger.error(f"[Scheduler] Failed to send message: {e}")
return False
_remember_delivered_output(agent_bridge, task, channel_type, content)
logger.info(f"[Scheduler] Task {task['id']} executed: sent message to {receiver}")
return True
except Exception as e:
logger.error(f"[Scheduler] Error in _execute_send_message: {e}")
import traceback
logger.error(f"[Scheduler] Traceback: {traceback.format_exc()}")
return False
def _execute_tool_call(task: dict, agent_bridge) -> bool:
"""Execute a tool_call action. Returns True/False for delivery."""
try:
action = task.get("action", {})
tool_name = action.get("call_name") or action.get("tool_name")
tool_params = action.get("call_params") or action.get("tool_params", {})
result_prefix = action.get("result_prefix", "")
receiver = action.get("receiver")
is_group = action.get("is_group", False)
channel_type = action.get("channel_type", "unknown")
if not tool_name:
logger.error(f"[Scheduler] Task {task['id']}: No tool_name specified")
return True
if not receiver:
logger.error(f"[Scheduler] Task {task['id']}: No receiver specified")
return True
from agent.tools.tool_manager import ToolManager
tool = ToolManager().create_tool(tool_name)
if not tool:
logger.error(f"[Scheduler] Task {task['id']}: Tool '{tool_name}' not found")
return True
logger.info(f"[Scheduler] Task {task['id']}: Executing tool '{tool_name}' with params {tool_params}")
result = tool.execute(tool_params)
content = result.result if hasattr(result, 'result') else str(result)
if result_prefix:
content = f"{result_prefix}\n\n{content}"
context = Context(ContextType.TEXT, content)
context["receiver"] = receiver
context["isgroup"] = is_group
context["session_id"] = receiver
request_id = None
if channel_type == "web":
import uuid
request_id = f"scheduler_{task['id']}_{uuid.uuid4().hex[:8]}"
context["request_id"] = request_id
elif channel_type == "feishu":
context["receive_id_type"] = "chat_id" if is_group else "open_id"
context["msg"] = None
elif channel_type == "wecom_bot":
context["msg"] = None
reply = Reply(ReplyType.TEXT, content)
from channel.channel_factory import create_channel
channel = create_channel(channel_type)
if not channel:
logger.error(f"[Scheduler] Failed to create channel: {channel_type}")
return False
if channel_type == "web" and request_id and hasattr(channel, 'request_to_session'):
channel.request_to_session[request_id] = receiver
try:
channel.send(reply, context)
except Exception as e:
logger.error(f"[Scheduler] Failed to send tool result: {e}")
return False
_remember_delivered_output(agent_bridge, task, channel_type, content)
logger.info(f"[Scheduler] Task {task['id']} executed: sent tool result to {receiver}")
return True
except Exception as e:
logger.error(f"[Scheduler] Error in _execute_tool_call: {e}")
return False
def _execute_skill_call(task: dict, agent_bridge) -> bool:
"""Execute a skill_call action by asking Agent to run the skill.
Returns True/False for delivery."""
try:
action = task.get("action", {})
skill_name = action.get("call_name") or action.get("skill_name")
skill_params = action.get("call_params") or action.get("skill_params", {})
result_prefix = action.get("result_prefix", "")
receiver = action.get("receiver")
# Stored actions use the "is_group" key (as in the other executors), not "isgroup"
is_group = action.get("is_group", False)
channel_type = action.get("channel_type", "unknown")
if not skill_name:
logger.error(f"[Scheduler] Task {task['id']}: No skill_name specified")
return True
if not receiver:
logger.error(f"[Scheduler] Task {task['id']}: No receiver specified")
return True
logger.info(f"[Scheduler] Task {task['id']}: Executing skill '{skill_name}' with params {skill_params}")
scheduler_session_id = f"scheduler_{receiver}_{task['id']}"
param_str = ", ".join([f"{k}={v}" for k, v in skill_params.items()])
query = f"Use {skill_name} skill"
if param_str:
query += f" with {param_str}"
context = Context(ContextType.TEXT, query)
context["receiver"] = receiver
context["isgroup"] = is_group
context["session_id"] = scheduler_session_id
if channel_type == "web":
import uuid
request_id = f"scheduler_{task['id']}_{uuid.uuid4().hex[:8]}"
context["request_id"] = request_id
elif channel_type == "feishu":
context["receive_id_type"] = "chat_id" if is_group else "open_id"
context["msg"] = None
elif channel_type == "wecom_bot":
context["msg"] = None
try:
reply = agent_bridge.agent_reply(query, context=context, on_event=None, clear_history=False)
except Exception as e:
logger.error(f"[Scheduler] Failed to execute skill via Agent: {e}")
import traceback
logger.error(f"[Scheduler] Traceback: {traceback.format_exc()}")
return False
if not (reply and reply.content):
logger.error(f"[Scheduler] Task {task['id']}: No result from skill execution")
return True
content = reply.content
if result_prefix:
content = f"{result_prefix}\n\n{content}"
from channel.channel_factory import create_channel
channel = create_channel(channel_type)
if not channel:
logger.error(f"[Scheduler] Failed to create channel: {channel_type}")
return False
if channel_type == "web" and hasattr(channel, 'request_to_session'):
req_id = context.get("request_id")
if req_id:
channel.request_to_session[req_id] = receiver
try:
channel.send(Reply(ReplyType.TEXT, content), context)
except Exception as e:
logger.error(f"[Scheduler] Failed to send skill result: {e}")
return False
_remember_delivered_output(agent_bridge, task, channel_type, content)
logger.info(f"[Scheduler] Task {task['id']} executed: skill result sent to {receiver}")
return True
except Exception as e:
logger.error(f"[Scheduler] Error in _execute_skill_call: {e}")
import traceback
logger.error(f"[Scheduler] Traceback: {traceback.format_exc()}")
return False
def attach_scheduler_to_tool(tool, context: Context = None):
"""
Attach scheduler components to a SchedulerTool instance
Args:
tool: SchedulerTool instance
context: Current context (optional)
"""
if _task_store:
tool.task_store = _task_store
if context:
tool.current_context = context
channel_type = context.get("channel_type") or conf().get("channel_type", "unknown")
if not tool.config:
tool.config = {}
tool.config["channel_type"] = channel_type
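
The prompt and session key that `_execute_skill_call` builds can be sketched as two small pure functions. This is a minimal illustration that mirrors the string formats shown above; the helper names `build_skill_query` and `scheduler_session_id` are hypothetical and do not exist in the project.

```python
from typing import Any, Dict


def build_skill_query(skill_name: str, skill_params: Dict[str, Any]) -> str:
    """Build the natural-language prompt handed to the agent (hypothetical helper)."""
    # Render params as "key=value" pairs, in dict insertion order.
    param_str = ", ".join(f"{k}={v}" for k, v in skill_params.items())
    query = f"Use {skill_name} skill"
    if param_str:
        query += f" with {param_str}"
    return query


def scheduler_session_id(receiver: str, task_id: str) -> str:
    """Session key used for scheduled skill runs (hypothetical helper)."""
    return f"scheduler_{receiver}_{task_id}"


if __name__ == "__main__":
    print(build_skill_query("weather", {"city": "Beijing", "days": 3}))
    print(scheduler_session_id("user_1", "ab12cd34"))
```

Keeping the formatting in pure functions like these would make the prompt shape easy to unit-test without a live agent or channel.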


@@ -0,0 +1,243 @@
"""
Background scheduler service for executing scheduled tasks
"""
import time
import threading
from datetime import datetime, timedelta
from typing import Callable, Optional
from croniter import croniter
from common.log import logger
def _parse_naive_local(iso_str: str) -> datetime:
"""Parse an ISO datetime and coerce it to tz-naive local time.
The scheduler uses ``datetime.now()`` (tz-naive) for all comparisons,
so any persisted timestamp must be normalized to the same flavor —
otherwise comparing naive vs aware raises TypeError.
"""
dt = datetime.fromisoformat(iso_str)
if dt.tzinfo is not None:
dt = dt.astimezone().replace(tzinfo=None)
return dt
class SchedulerService:
"""
Background service that executes scheduled tasks
"""
def __init__(self, task_store, execute_callback: Callable):
"""
Initialize scheduler service
Args:
task_store: TaskStore instance
execute_callback: Function to call when executing a task
"""
self.task_store = task_store
self.execute_callback = execute_callback
self.running = False
self.thread = None
self._lock = threading.Lock()
def start(self):
"""Start the scheduler service"""
with self._lock:
if self.running:
logger.warning("[Scheduler] Service already running")
return
self.running = True
self.thread = threading.Thread(target=self._run_loop, daemon=True)
self.thread.start()
def stop(self):
"""Stop the scheduler service"""
with self._lock:
if not self.running:
return
self.running = False
if self.thread:
self.thread.join(timeout=5)
logger.info("[Scheduler] Service stopped")
def _run_loop(self):
"""Main scheduler loop"""
logger.info("[Scheduler] Scheduler loop started")
while self.running:
try:
self._check_and_execute_tasks()
except Exception as e:
logger.error(f"[Scheduler] Error in scheduler loop: {e}")
time.sleep(30)
def _check_and_execute_tasks(self):
"""Check for due tasks and execute them"""
now = datetime.now()
tasks = self.task_store.list_tasks(enabled_only=True)
for task in tasks:
try:
if self._is_task_due(task, now):
logger.info(f"[Scheduler] Executing task: {task['id']} - {task['name']}")
ok = self._execute_task(task)
if not ok:
# Leave next_run_at as-is so the next loop retries.
# Cron tasks within the catch-up window will keep
# firing; beyond it _is_task_due will reschedule.
logger.warning(
f"[Scheduler] Task {task['id']} delivery failed, will retry next tick"
)
continue
next_run = self._calculate_next_run(task, now)
if next_run:
self.task_store.update_task(task['id'], {
"next_run_at": next_run.isoformat(),
"last_run_at": now.isoformat()
})
else:
self.task_store.delete_task(task['id'])
logger.info(f"[Scheduler] One-time task completed and removed: {task['id']}")
except Exception as e:
logger.error(f"[Scheduler] Error processing task {task.get('id')}: {e}")
def _is_task_due(self, task: dict, now: datetime) -> bool:
"""
Check if a task is due to run
Args:
task: Task dictionary
now: Current datetime
Returns:
True if task should run now
"""
next_run_str = task.get("next_run_at")
if not next_run_str:
# Calculate initial next_run_at
next_run = self._calculate_next_run(task, now)
if next_run:
self.task_store.update_task(task['id'], {
"next_run_at": next_run.isoformat()
})
return False
return False
try:
next_run = _parse_naive_local(next_run_str)
if next_run < now:
time_diff = (now - next_run).total_seconds()
schedule = task.get("schedule", {})
schedule_type = schedule.get("type")
# Catch-up window: fire if we're within 10 minutes of the
# scheduled tick. Beyond that we'd rather skip than push a
# stale daily report to the user.
if time_diff <= 600:
return True
logger.warning(
f"[Scheduler] Task {task['id']} is overdue by {int(time_diff)}s, "
f"skipping and scheduling next run"
)
if schedule_type == "once":
self.task_store.delete_task(task['id'])
logger.info(f"[Scheduler] One-time task {task['id']} expired, removed")
return False
next_next_run = self._calculate_next_run(task, now)
if next_next_run:
self.task_store.update_task(task['id'], {
"next_run_at": next_next_run.isoformat()
})
logger.info(f"[Scheduler] Rescheduled task {task['id']} to {next_next_run}")
return False
return now >= next_run
except Exception as e:
logger.error(
f"[Scheduler] Failed to evaluate due-state for task "
f"{task.get('id')} (next_run_at={next_run_str!r}): {e}"
)
return False
def _calculate_next_run(self, task: dict, from_time: datetime) -> Optional[datetime]:
"""
Calculate next run time for a task
Args:
task: Task dictionary
from_time: Calculate from this time
Returns:
Next run datetime or None for one-time tasks
"""
schedule = task.get("schedule", {})
schedule_type = schedule.get("type")
if schedule_type == "cron":
# Cron expression
expression = schedule.get("expression")
if not expression:
return None
try:
cron = croniter(expression, from_time)
return cron.get_next(datetime)
except Exception as e:
logger.error(f"[Scheduler] Invalid cron expression '{expression}': {e}")
return None
elif schedule_type == "interval":
# Interval in seconds
seconds = schedule.get("seconds", 0)
if seconds <= 0:
return None
return from_time + timedelta(seconds=seconds)
elif schedule_type == "once":
# One-time task at specific time
run_at_str = schedule.get("run_at")
if not run_at_str:
return None
try:
run_at = _parse_naive_local(run_at_str)
if run_at > from_time:
return run_at
except Exception as e:
logger.error(
f"[Scheduler] Failed to parse once-task run_at "
f"{run_at_str!r}: {e}"
)
return None
return None
def _execute_task(self, task: dict) -> bool:
"""
Execute a task.
Returns True if delivery succeeded (caller should advance state),
False if it failed (caller should keep next_run_at so the next
loop iteration retries). Callback may return None for legacy
behaviour, treated as success.
"""
try:
result = self.execute_callback(task)
return False if result is False else True
except Exception as e:
logger.error(f"[Scheduler] Error executing task {task['id']}: {e}")
self.task_store.update_task(task['id'], {
"last_error": str(e),
"last_error_at": datetime.now().isoformat()
})
return False
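
The catch-up rule in `_is_task_due` boils down to a three-way decision. The sketch below restates it as a standalone function under the same 600-second window; `classify_due` and the string labels are illustrative names, not part of the scheduler.

```python
from datetime import datetime, timedelta

CATCH_UP_SECONDS = 600  # the 10-minute window used by _is_task_due


def classify_due(next_run: datetime, now: datetime, window: int = CATCH_UP_SECONDS) -> str:
    """Return 'wait', 'fire', or 'skip' for a task scheduled at `next_run`."""
    if next_run > now:
        return "wait"  # not due yet
    overdue = (now - next_run).total_seconds()
    if overdue <= window:
        return "fire"  # due now, or late but still inside the catch-up window
    return "skip"  # too stale: reschedule (cron/interval) or drop (once)


if __name__ == "__main__":
    now = datetime(2026, 6, 1, 8, 0, 0)
    for delta in (-1, 0, 600, 601):
        print(delta, classify_due(now - timedelta(seconds=delta), now))
```

Both datetimes are assumed to be tz-naive local times, which is what `_parse_naive_local` guarantees for persisted values.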


@@ -0,0 +1,453 @@
"""
Scheduler tool for creating and managing scheduled tasks
"""
import uuid
from datetime import datetime
from typing import Any, Dict, Optional
from croniter import croniter
from agent.tools.base_tool import BaseTool, ToolResult
from bridge.context import Context, ContextType
from bridge.reply import Reply, ReplyType
from common.log import logger
class SchedulerTool(BaseTool):
"""
Tool for managing scheduled tasks (reminders, notifications, etc.)
"""
name: str = "scheduler"
description: str = (
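"Create, query, and manage scheduled tasks (reminders, recurring tasks, etc.).\n\n"
"⚠️ Important: use this tool only when delayed or recurring execution is needed, "
"e.g. 'schedule / remind / every day / every week / in X minutes / at X o'clock'.\n\n"
"Usage:\n"
"- Create: action='create', name='task name', message/ai_task='content', schedule_type='once/interval/cron', schedule_value='...'\n"
"- Query: action='list' / action='get', task_id='task ID'\n"
"- Manage: action='delete/enable/disable', task_id='task ID'\n\n"
"Schedule types:\n"
"- once: one-time task; accepts a relative time (+5s, +10m, +1h, +1d) or an ISO time\n"
"- interval: fixed interval in seconds, e.g. 3600 means every hour\n"
"- cron: cron expression, e.g. '0 8 * * *' means every day at 8:00\n\n"
"Note: use once + relative time for 'in X seconds', and interval for 'every X seconds'"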
)
params: dict = {
"type": "object",
"properties": {
"action": {
"type": "string",
"enum": ["create", "list", "get", "delete", "enable", "disable"],
"description": "Operation type: create, list, get, delete, enable, disable"
},
"task_id": {
"type": "string",
"description": "Task ID (for get/delete/enable/disable operations)"
},
"name": {
"type": "string",
"description": "Task name (for the create operation)"
},
"message": {
"type": "string",
"description": "Fixed message content (mutually exclusive with ai_task)"
},
"ai_task": {
"type": "string",
"description": "AI task description (mutually exclusive with message); a task for the AI to execute on schedule"
},
"schedule_type": {
"type": "string",
"enum": ["cron", "interval", "once"],
"description": "Schedule type (for the create operation): cron (cron expression), interval (fixed interval in seconds), once (one-time)"
},
"schedule_value": {
"type": "string",
"description": "Schedule value: cron expression / interval in seconds / time (+5s, +10m, +1h, or ISO format)"
}
},
"required": ["action"]
}
def __init__(self, config: dict = None):
super().__init__()
self.config = config or {}
# Will be set by agent bridge
self.task_store = None
self.current_context = None
def execute(self, params: dict) -> ToolResult:
"""
Execute scheduler operations
Args:
params: Dictionary containing:
- action: Operation type (create/list/get/delete/enable/disable)
- Other parameters depending on action
Returns:
ToolResult object
"""
# Extract parameters
action = params.get("action")
kwargs = params
if not self.task_store:
return ToolResult.fail("Error: the scheduled task system is not initialized")
try:
if action == "create":
result = self._create_task(**kwargs)
return ToolResult.success(result)
elif action == "list":
result = self._list_tasks(**kwargs)
return ToolResult.success(result)
elif action == "get":
result = self._get_task(**kwargs)
return ToolResult.success(result)
elif action == "delete":
result = self._delete_task(**kwargs)
return ToolResult.success(result)
elif action == "enable":
result = self._enable_task(**kwargs)
return ToolResult.success(result)
elif action == "disable":
result = self._disable_task(**kwargs)
return ToolResult.success(result)
else:
return ToolResult.fail(f"Unknown action: {action}")
except Exception as e:
logger.error(f"[SchedulerTool] Error: {e}")
return ToolResult.fail(f"Operation failed: {str(e)}")
def _create_task(self, **kwargs) -> str:
"""Create a new scheduled task"""
name = kwargs.get("name")
message = kwargs.get("message")
ai_task = kwargs.get("ai_task")
schedule_type = kwargs.get("schedule_type")
schedule_value = kwargs.get("schedule_value")
# Validate required fields
if not name:
return "Error: missing task name (name)"
# Check that exactly one of message/ai_task is provided
if not message and not ai_task:
return "Error: either message (fixed message) or ai_task (AI task) must be provided"
if message and ai_task:
return "Error: provide only one of message and ai_task"
if not schedule_type:
return "Error: missing schedule type (schedule_type)"
if not schedule_value:
return "Error: missing schedule value (schedule_value)"
# Validate schedule
schedule = self._parse_schedule(schedule_type, schedule_value)
if not schedule:
return f"Error: invalid schedule configuration - type: {schedule_type}, value: {schedule_value}"
# Get context info for receiver
if not self.current_context:
return "Error: unable to get the current conversation context"
context = self.current_context
# Create task
task_id = str(uuid.uuid4())[:8]
# Capture the real chat session_id at task creation time so that scheduler
# can later inject the delivered output into the user's actual conversation
# (in group chats, session_id != receiver, e.g. "user_id:group_id" on feishu).
notify_session_id = context.get("session_id")
# Build action based on message or ai_task
if message:
action = {
"type": "send_message",
"content": message,
"receiver": context.get("receiver"),
"receiver_name": self._get_receiver_name(context),
"is_group": context.get("isgroup", False),
"channel_type": self.config.get("channel_type", "unknown"),
"notify_session_id": notify_session_id,
}
else: # ai_task
action = {
"type": "agent_task",
"task_description": ai_task,
"receiver": context.get("receiver"),
"receiver_name": self._get_receiver_name(context),
"is_group": context.get("isgroup", False),
"channel_type": self.config.get("channel_type", "unknown"),
"notify_session_id": notify_session_id,
}
# For DingTalk one-on-one chats, additionally store sender_staff_id
msg = context.kwargs.get("msg")
if msg and hasattr(msg, 'sender_staff_id') and not context.get("isgroup", False):
action["dingtalk_sender_staff_id"] = msg.sender_staff_id
task_data = {
"id": task_id,
"name": name,
"enabled": True,
"created_at": datetime.now().isoformat(),
"updated_at": datetime.now().isoformat(),
"schedule": schedule,
"action": action
}
# Calculate initial next_run_at
next_run = self._calculate_next_run(task_data)
if next_run:
task_data["next_run_at"] = next_run.isoformat()
# Save task
self.task_store.add_task(task_data)
# Format response
schedule_desc = self._format_schedule_description(schedule)
receiver_desc = task_data["action"]["receiver_name"] or task_data["action"]["receiver"]
if message:
content_desc = f"💬 Fixed message: {message}"
else:
content_desc = f"🤖 AI task: {ai_task}"
return (
f"✅ Scheduled task created\n\n"
f"📋 Task ID: {task_id}\n"
f"📝 Name: {name}\n"
f"⏰ Schedule: {schedule_desc}\n"
f"👤 Receiver: {receiver_desc}\n"
f"{content_desc}\n"
f"🕐 Next run: {next_run.strftime('%Y-%m-%d %H:%M:%S') if next_run else 'unknown'}"
)
def _list_tasks(self, **kwargs) -> str:
"""List all tasks"""
tasks = self.task_store.list_tasks()
if not tasks:
return "📋 No scheduled tasks"
lines = [f"📋 Scheduled tasks ({len(tasks)} total)\n"]
for task in tasks:
# Both branches were the same empty string; use distinct status labels.
status = "enabled" if task.get("enabled", True) else "disabled"
schedule_desc = self._format_schedule_description(task.get("schedule", {}))
next_run = task.get("next_run_at")
next_run_str = datetime.fromisoformat(next_run).strftime('%m-%d %H:%M') if next_run else "unknown"
lines.append(
f"{status} [{task['id']}] {task['name']}\n"
f"{schedule_desc} | Next: {next_run_str}"
)
return "\n".join(lines)
def _get_task(self, **kwargs) -> str:
"""Get task details"""
task_id = kwargs.get("task_id")
if not task_id:
return "Error: missing task ID (task_id)"
task = self.task_store.get_task(task_id)
if not task:
return f"Error: task '{task_id}' does not exist"
status = "enabled" if task.get("enabled", True) else "disabled"
schedule_desc = self._format_schedule_description(task.get("schedule", {}))
action = task.get("action", {})
next_run = task.get("next_run_at")
next_run_str = datetime.fromisoformat(next_run).strftime('%Y-%m-%d %H:%M:%S') if next_run else "unknown"
last_run = task.get("last_run_at")
last_run_str = datetime.fromisoformat(last_run).strftime('%Y-%m-%d %H:%M:%S') if last_run else "never run"
return (
f"📋 Task details\n\n"
f"ID: {task['id']}\n"
f"Name: {task['name']}\n"
f"Status: {status}\n"
f"Schedule: {schedule_desc}\n"
f"Receiver: {action.get('receiver_name', action.get('receiver'))}\n"
# agent_task actions store their text under task_description, not content.
f"Content: {action.get('content') or action.get('task_description')}\n"
f"Next run: {next_run_str}\n"
f"Last run: {last_run_str}\n"
f"Created at: {datetime.fromisoformat(task['created_at']).strftime('%Y-%m-%d %H:%M:%S')}"
)
def _delete_task(self, **kwargs) -> str:
"""Delete a task"""
task_id = kwargs.get("task_id")
if not task_id:
return "Error: missing task ID (task_id)"
task = self.task_store.get_task(task_id)
if not task:
return f"Error: task '{task_id}' does not exist"
self.task_store.delete_task(task_id)
return f"✅ Task '{task['name']}' ({task_id}) deleted"
def _enable_task(self, **kwargs) -> str:
"""Enable a task"""
task_id = kwargs.get("task_id")
if not task_id:
return "Error: missing task ID (task_id)"
task = self.task_store.get_task(task_id)
if not task:
return f"Error: task '{task_id}' does not exist"
self.task_store.enable_task(task_id, True)
return f"✅ Task '{task['name']}' ({task_id}) enabled"
def _disable_task(self, **kwargs) -> str:
"""Disable a task"""
task_id = kwargs.get("task_id")
if not task_id:
return "Error: missing task ID (task_id)"
task = self.task_store.get_task(task_id)
if not task:
return f"Error: task '{task_id}' does not exist"
self.task_store.enable_task(task_id, False)
return f"✅ Task '{task['name']}' ({task_id}) disabled"
def _parse_schedule(self, schedule_type: str, schedule_value: str) -> Optional[dict]:
"""Parse and validate schedule configuration"""
try:
if schedule_type == "cron":
# Validate cron expression
croniter(schedule_value)
return {"type": "cron", "expression": schedule_value}
elif schedule_type == "interval":
# Parse interval in seconds
seconds = int(schedule_value)
if seconds <= 0:
return None
return {"type": "interval", "seconds": seconds}
elif schedule_type == "once":
# Parse datetime - support both relative and absolute time
# Check if it's relative time (e.g., "+5s", "+10m", "+1h", "+1d")
if schedule_value.startswith("+"):
import re
match = re.match(r'\+(\d+)([smhd])', schedule_value)
if match:
amount = int(match.group(1))
unit = match.group(2)
from datetime import timedelta
now = datetime.now()
if unit == 's': # seconds
target_time = now + timedelta(seconds=amount)
elif unit == 'm': # minutes
target_time = now + timedelta(minutes=amount)
elif unit == 'h': # hours
target_time = now + timedelta(hours=amount)
elif unit == 'd': # days
target_time = now + timedelta(days=amount)
else:
return None
return {"type": "once", "run_at": target_time.isoformat()}
else:
logger.error(f"[SchedulerTool] Invalid relative time format: {schedule_value}")
return None
else:
# Absolute ISO time. Normalize to tz-naive local so it
# stays comparable with the scheduler's datetime.now().
parsed = datetime.fromisoformat(schedule_value)
if parsed.tzinfo is not None:
parsed = parsed.astimezone().replace(tzinfo=None)
return {"type": "once", "run_at": parsed.isoformat()}
except Exception as e:
logger.error(f"[SchedulerTool] Invalid schedule: {e}")
return None
return None
def _calculate_next_run(self, task: dict) -> Optional[datetime]:
"""Calculate next run time for a task"""
schedule = task.get("schedule", {})
schedule_type = schedule.get("type")
now = datetime.now()
if schedule_type == "cron":
expression = schedule.get("expression")
cron = croniter(expression, now)
return cron.get_next(datetime)
elif schedule_type == "interval":
seconds = schedule.get("seconds", 0)
from datetime import timedelta
return now + timedelta(seconds=seconds)
elif schedule_type == "once":
run_at_str = schedule.get("run_at")
return datetime.fromisoformat(run_at_str)
return None
def _format_schedule_description(self, schedule: dict) -> str:
"""Format schedule as human-readable description"""
schedule_type = schedule.get("type")
if schedule_type == "cron":
expr = schedule.get("expression", "")
# Try to provide friendly description
if expr == "0 9 * * *":
return "every day at 9:00"
elif expr == "0 */1 * * *":
return "every hour"
elif expr == "*/30 * * * *":
return "every 30 minutes"
else:
return f"Cron: {expr}"
elif schedule_type == "interval":
seconds = schedule.get("seconds", 0)
if seconds >= 86400:
days = seconds // 86400
return f"every {days} day(s)"
elif seconds >= 3600:
hours = seconds // 3600
return f"every {hours} hour(s)"
elif seconds >= 60:
minutes = seconds // 60
return f"every {minutes} minute(s)"
else:
return f"every {seconds} second(s)"
elif schedule_type == "once":
run_at = schedule.get("run_at", "")
try:
dt = datetime.fromisoformat(run_at)
return f"once ({dt.strftime('%Y-%m-%d %H:%M')})"
except Exception:
return "once"
return "unknown"
def _get_receiver_name(self, context: Context) -> str:
"""Get receiver name from context"""
try:
msg = context.get("msg")
if msg:
if context.get("isgroup"):
return msg.other_user_nickname or "group chat"
else:
return msg.from_user_nickname or "user"
except Exception:
pass
return "unknown"
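
The relative-time branch of `_parse_schedule` can be illustrated on its own. This is a minimal sketch, assuming the same `+<number><unit>` syntax; `parse_relative` is a hypothetical helper, and unlike the tool's `re.match` it uses `fullmatch`, so trailing characters such as `+5sabc` are rejected rather than ignored.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Map the single-letter unit to the matching timedelta keyword argument.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
_RELATIVE = re.compile(r"\+(\d+)([smhd])")


def parse_relative(value: str, now: datetime) -> Optional[datetime]:
    """Turn '+5s', '+10m', '+1h' or '+1d' into an absolute datetime, else None."""
    match = _RELATIVE.fullmatch(value.strip())
    if not match:
        return None
    amount, unit = int(match.group(1)), match.group(2)
    return now + timedelta(**{_UNITS[unit]: amount})


if __name__ == "__main__":
    base = datetime(2026, 6, 1, 12, 0, 0)
    print(parse_relative("+10m", base))
    print(parse_relative("+5sabc", base))
```

Passing `now` in explicitly, instead of calling `datetime.now()` inside, keeps the function deterministic and easy to test.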


@@ -0,0 +1,201 @@
"""
Task storage management for scheduler
"""
import json
import os
import threading
from datetime import datetime
from typing import Dict, List, Optional
from pathlib import Path
from common.utils import expand_path
class TaskStore:
"""
Manages persistent storage of scheduled tasks
"""
def __init__(self, store_path: str = None):
"""
Initialize task store
Args:
store_path: Path to tasks.json file. Defaults to ~/cow/scheduler/tasks.json
"""
if store_path is None:
# Default to ~/cow/scheduler/tasks.json
home = expand_path("~")
store_path = os.path.join(home, "cow", "scheduler", "tasks.json")
self.store_path = store_path
self.lock = threading.Lock()
self._ensure_store_dir()
def _ensure_store_dir(self):
"""Ensure the storage directory exists"""
store_dir = os.path.dirname(self.store_path)
os.makedirs(store_dir, exist_ok=True)
def load_tasks(self) -> Dict[str, dict]:
"""
Load all tasks from storage
Returns:
Dictionary of task_id -> task_data
"""
with self.lock:
if not os.path.exists(self.store_path):
return {}
try:
with open(self.store_path, 'r', encoding='utf-8') as f:
data = json.load(f)
return data.get("tasks", {})
except Exception as e:
print(f"Error loading tasks: {e}")
return {}
def save_tasks(self, tasks: Dict[str, dict]):
"""
Save all tasks to storage
Args:
tasks: Dictionary of task_id -> task_data
"""
with self.lock:
try:
# Create backup
if os.path.exists(self.store_path):
backup_path = f"{self.store_path}.bak"
try:
with open(self.store_path, 'r', encoding='utf-8') as src:
with open(backup_path, 'w', encoding='utf-8') as dst:
dst.write(src.read())
except Exception:
pass
# Save tasks
data = {
"version": 1,
"updated_at": datetime.now().isoformat(),
"tasks": tasks
}
with open(self.store_path, 'w', encoding='utf-8') as f:
json.dump(data, f, ensure_ascii=False, indent=2)
except Exception as e:
print(f"Error saving tasks: {e}")
raise
def add_task(self, task: dict) -> bool:
"""
Add a new task
Args:
task: Task data dictionary
Returns:
True if successful
"""
tasks = self.load_tasks()
task_id = task.get("id")
if not task_id:
raise ValueError("Task must have an 'id' field")
if task_id in tasks:
raise ValueError(f"Task with id '{task_id}' already exists")
tasks[task_id] = task
self.save_tasks(tasks)
return True
def update_task(self, task_id: str, updates: dict) -> bool:
"""
Update an existing task
Args:
task_id: Task ID
updates: Dictionary of fields to update
Returns:
True if successful
"""
tasks = self.load_tasks()
if task_id not in tasks:
raise ValueError(f"Task '{task_id}' not found")
# Update fields
tasks[task_id].update(updates)
tasks[task_id]["updated_at"] = datetime.now().isoformat()
self.save_tasks(tasks)
return True
def delete_task(self, task_id: str) -> bool:
"""
Delete a task
Args:
task_id: Task ID
Returns:
True if successful
"""
tasks = self.load_tasks()
if task_id not in tasks:
raise ValueError(f"Task '{task_id}' not found")
del tasks[task_id]
self.save_tasks(tasks)
return True
def get_task(self, task_id: str) -> Optional[dict]:
"""
Get a specific task
Args:
task_id: Task ID
Returns:
Task data or None if not found
"""
tasks = self.load_tasks()
return tasks.get(task_id)
def list_tasks(self, enabled_only: bool = False) -> List[dict]:
"""
List all tasks
Args:
enabled_only: If True, only return enabled tasks
Returns:
List of task dictionaries
"""
tasks = self.load_tasks()
task_list = list(tasks.values())
if enabled_only:
task_list = [t for t in task_list if t.get("enabled", True)]
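# Sort by next_run_at; tasks without one go last. Mixing str values with
# a float('inf') default would raise TypeError when both kinds are present.
task_list.sort(key=lambda t: (t.get("next_run_at") is None, t.get("next_run_at") or ""))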
return task_list
def enable_task(self, task_id: str, enabled: bool = True) -> bool:
"""
Enable or disable a task
Args:
task_id: Task ID
enabled: True to enable, False to disable
Returns:
True if successful
"""
return self.update_task(task_id, {"enabled": enabled})
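
`save_tasks` rewrites `tasks.json` in place, so a crash mid-write could leave a truncated file. One possible hardening, not what the project does, is to write to a temporary file in the same directory and swap it in with `os.replace`. The function name `atomic_write_json` is hypothetical.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, data: dict) -> None:
    """Write `data` as JSON so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # The temp file must live on the same filesystem for os.replace to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temp file, then re-raise the original error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "tasks.json")
    atomic_write_json(target, {"version": 1, "tasks": {}})
    print(open(target, encoding="utf-8").read())
```

This only protects the write itself; the load-modify-save sequence in `add_task` and `update_task` would still need a lock held across the whole sequence to be safe against concurrent writers.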


@@ -0,0 +1,3 @@
from .send import Send
__all__ = ['Send']

171
agent/tools/send/send.py Normal file

@@ -0,0 +1,171 @@
"""
Send tool - Send files to the user
"""
import os
from typing import Dict, Any
from pathlib import Path
from agent.tools.base_tool import BaseTool, ToolResult
from common.utils import expand_path
class Send(BaseTool):
"""Tool for sending files to the user"""
name: str = "send"
description: str = "Send a LOCAL file (image, video, audio, document) to the user. Only for local file paths. Do NOT use this for URLs — URLs should be included directly in your text reply, the system will handle them automatically."
params: dict = {
"type": "object",
"properties": {
"path": {
"type": "string",
"description": "Local file path to send. Must be an absolute path or relative to workspace. Do NOT pass URLs here."
},
"message": {
"type": "string",
"description": "Optional message to accompany the file"
}
},
"required": ["path"]
}
def __init__(self, config: dict = None):
self.config = config or {}
self.cwd = self.config.get("cwd", os.getcwd())
# Supported file types
self.image_extensions = {'.jpg', '.jpeg', '.png', '.gif', '.webp', '.bmp', '.svg', '.ico'}
self.video_extensions = {'.mp4', '.avi', '.mov', '.mkv', '.flv', '.wmv', '.webm', '.m4v'}
self.audio_extensions = {'.mp3', '.wav', '.ogg', '.m4a', '.flac', '.aac', '.wma'}
self.document_extensions = {'.pdf', '.doc', '.docx', '.xls', '.xlsx', '.ppt', '.pptx', '.txt', '.md'}
def execute(self, args: Dict[str, Any]) -> ToolResult:
"""
Execute file send operation
:param args: Contains file path and optional message
:return: File metadata for channel to send
"""
path = args.get("path", "").strip()
message = args.get("message", "")
if not path:
return ToolResult.fail("Error: path parameter is required")
# Resolve path
absolute_path = self._resolve_path(path)
# Check if file exists
if not os.path.exists(absolute_path):
return ToolResult.fail(f"Error: File not found: {path}")
# Check if readable
if not os.access(absolute_path, os.R_OK):
return ToolResult.fail(f"Error: File is not readable: {path}")
# Get file info
file_ext = Path(absolute_path).suffix.lower()
file_size = os.path.getsize(absolute_path)
file_name = Path(absolute_path).name
# Determine file type
if file_ext in self.image_extensions:
file_type = "image"
mime_type = self._get_image_mime_type(file_ext)
elif file_ext in self.video_extensions:
file_type = "video"
mime_type = self._get_video_mime_type(file_ext)
elif file_ext in self.audio_extensions:
file_type = "audio"
mime_type = self._get_audio_mime_type(file_ext)
elif file_ext in self.document_extensions:
file_type = "document"
mime_type = self._get_document_mime_type(file_ext)
else:
file_type = "file"
mime_type = "application/octet-stream"
# Return file_to_send metadata
result = {
"type": "file_to_send",
"file_type": file_type,
"path": absolute_path,
"file_name": file_name,
"mime_type": mime_type,
"size": file_size,
"size_formatted": self._format_size(file_size),
"message": message or f"Sending {file_name}"
}
try:
from common.cloud_client import get_website_base_url, copy_send_file
# Do nothing when in local env
if get_website_base_url():
url = copy_send_file(absolute_path, self.cwd)
if url:
result["url"] = url
except Exception:
pass
return ToolResult.success(result)
def _resolve_path(self, path: str) -> str:
"""Resolve path to absolute path"""
path = expand_path(path)
if os.path.isabs(path):
return path
return os.path.abspath(os.path.join(self.cwd, path))
def _get_image_mime_type(self, ext: str) -> str:
"""Get MIME type for image"""
mime_map = {
'.jpg': 'image/jpeg', '.jpeg': 'image/jpeg',
'.png': 'image/png', '.gif': 'image/gif',
'.webp': 'image/webp', '.bmp': 'image/bmp',
'.svg': 'image/svg+xml', '.ico': 'image/x-icon'
}
return mime_map.get(ext, 'image/jpeg')
def _get_video_mime_type(self, ext: str) -> str:
"""Get MIME type for video"""
mime_map = {
'.mp4': 'video/mp4', '.avi': 'video/x-msvideo',
'.mov': 'video/quicktime', '.mkv': 'video/x-matroska',
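'.webm': 'video/webm', '.flv': 'video/x-flv',
# Cover the remaining entries of self.video_extensions
'.wmv': 'video/x-ms-wmv', '.m4v': 'video/x-m4v'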
}
return mime_map.get(ext, 'video/mp4')
def _get_audio_mime_type(self, ext: str) -> str:
"""Get MIME type for audio"""
mime_map = {
'.mp3': 'audio/mpeg', '.wav': 'audio/wav',
'.ogg': 'audio/ogg', '.m4a': 'audio/mp4',
'.flac': 'audio/flac', '.aac': 'audio/aac',
# Cover the remaining entry of self.audio_extensions
'.wma': 'audio/x-ms-wma'
}
return mime_map.get(ext, 'audio/mpeg')
def _get_document_mime_type(self, ext: str) -> str:
"""Get MIME type for document"""
mime_map = {
'.pdf': 'application/pdf',
'.doc': 'application/msword',
'.docx': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
'.xls': 'application/vnd.ms-excel',
'.xlsx': 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
'.ppt': 'application/vnd.ms-powerpoint',
'.pptx': 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
'.txt': 'text/plain',
'.md': 'text/markdown'
}
return mime_map.get(ext, 'application/octet-stream')
def _format_size(self, size_bytes: int) -> str:
"""Format file size in human-readable format"""
for unit in ['B', 'KB', 'MB', 'GB']:
if size_bytes < 1024.0:
return f"{size_bytes:.1f}{unit}"
size_bytes /= 1024.0
return f"{size_bytes:.1f}TB"
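
The per-category MIME tables above could also be backed by Python's standard `mimetypes` module. The sketch below is an alternative under that assumption, not the tool's implementation; `guess_mime` and `coarse_type` are hypothetical names, and `coarse_type` does not reproduce the tool's separate "document" category.

```python
import mimetypes
from pathlib import Path


def guess_mime(path: str, default: str = "application/octet-stream") -> str:
    """Guess a MIME type from the file name, falling back to `default`."""
    mime, _encoding = mimetypes.guess_type(Path(path).name)
    return mime or default


def coarse_type(mime: str) -> str:
    """Collapse a MIME type to 'image', 'video', 'audio', or 'file'."""
    top = mime.split("/", 1)[0]
    return top if top in {"image", "video", "audio"} else "file"


if __name__ == "__main__":
    for name in ("report.pdf", "photo.png", "blob.zzzunknown"):
        mime = guess_mime(name)
        print(name, mime, coarse_type(mime))
```

The explicit tables in the tool have the advantage of being independent of the host's MIME database, so the two approaches trade predictability against coverage.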

619
agent/tools/tool_manager.py Normal file

@@ -0,0 +1,619 @@
import importlib
import importlib.util
import threading
from pathlib import Path
from typing import Dict, Any, Type
from agent.tools.base_tool import BaseTool
from common.log import logger
from config import conf
def _normalize_mcp_configs(raw) -> list:
"""
Convert MCP server config to internal list format.
Supports:
- list format (mcp_servers): [{"name": "x", "type": "stdio", ...}]
- dict format (mcpServers): {"x": {"command": "npx", ...}}
"""
if isinstance(raw, list):
return raw
if isinstance(raw, dict):
result = []
for name, cfg in raw.items():
entry = {"name": name, **cfg}
if "type" not in entry:
entry["type"] = "sse" if "url" in entry else "stdio"
result.append(entry)
return result
return []
class ToolManager:
"""
Tool manager for managing tools.
"""
_instance = None
def __new__(cls):
"""Singleton pattern to ensure only one instance of ToolManager exists."""
if cls._instance is None:
cls._instance = super(ToolManager, cls).__new__(cls)
cls._instance.tool_classes = {} # Store tool classes instead of instances
cls._instance._initialized = False
return cls._instance
def __init__(self):
# Initialize only once
if not hasattr(self, 'tool_classes'):
self.tool_classes = {} # Dictionary to store tool classes
if not hasattr(self, '_mcp_registry'):
self._mcp_registry = None # Lazy init: only created when MCP servers are configured
if not hasattr(self, '_mcp_tool_instances'):
self._mcp_tool_instances: dict = {} # tool_name -> McpTool instance
if not hasattr(self, '_mcp_lock'):
# Guards _mcp_loaded check-then-set so concurrent callers
# don't trigger duplicate background loaders.
self._mcp_lock = threading.Lock()
if not hasattr(self, '_mcp_loaded'):
# Idempotency flag. Flipped to True the moment the first loader
# is dispatched (synchronously, inside _mcp_lock). Subsequent
# _load_mcp_tools() calls become no-ops, so per-session agent
# initialization never re-forks MCP subprocesses.
self._mcp_loaded = False
if not hasattr(self, '_mcp_status'):
# server_name -> "pending" / "ready" / "failed"
# Useful for UI / introspection while async loading is in progress.
self._mcp_status: dict = {}
if not hasattr(self, '_mcp_signature'):
# (mtime, sha256) of mcp.json the last time we loaded.
# Used by refresh_mcp_if_changed() to skip re-parsing when nothing changed.
self._mcp_signature: tuple = (None, None)
if not hasattr(self, '_mcp_active_configs'):
# server_name -> normalized config dict, for diff-based reload.
self._mcp_active_configs: dict = {}
def load_tools(self, tools_dir: str = "", config_dict=None):
"""
Load tools from both directory and configuration.
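:param tools_dir: Directory to scan for tool modules; when empty, tools are loaded from agent.tools.__all__
:param config_dict: Optional tool configuration, used only when tools_dir is empty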
"""
if tools_dir:
self._load_tools_from_directory(tools_dir)
self._configure_tools_from_config()
else:
self._load_tools_from_init()
self._configure_tools_from_config(config_dict)
self._load_mcp_tools()
def _load_tools_from_init(self) -> bool:
"""
Load tool classes from tools.__init__.__all__
:return: True if tools were loaded, False otherwise
"""
try:
# Try to import the tools package
tools_package = importlib.import_module("agent.tools")
# Check if __all__ is defined
if hasattr(tools_package, "__all__"):
tool_classes = tools_package.__all__
# Import each tool class directly from the tools package
for class_name in tool_classes:
try:
# Skip base classes
if class_name in ["BaseTool", "ToolManager"]:
continue
# Get the class directly from the tools package
if hasattr(tools_package, class_name):
cls = getattr(tools_package, class_name)
if (
isinstance(cls, type)
and issubclass(cls, BaseTool)
and cls != BaseTool
):
try:
# Skip tools that need special initialization
if class_name in ["MemorySearchTool", "MemoryGetTool"]:
logger.debug(f"Skipped tool {class_name} (requires memory_manager)")
continue
# McpTool instances are registered dynamically via _load_mcp_tools()
if class_name == "McpTool":
logger.debug(f"Skipped tool {class_name} (registered dynamically via mcp_servers config)")
continue
# Create a temporary instance to get the name
temp_instance = cls()
tool_name = temp_instance.name
# Store the class, not the instance
self.tool_classes[tool_name] = cls
logger.debug(f"Loaded tool: {tool_name} from class {class_name}")
except ImportError as e:
# Handle missing dependencies with helpful messages
error_msg = str(e)
if "playwright" in error_msg:
logger.warning(
f"[ToolManager] Browser tool not loaded - missing dependencies.\n"
f" To enable browser tool, run:\n"
f" pip install playwright\n"
f" playwright install chromium"
)
elif "markdownify" in error_msg:
logger.warning(
f"[ToolManager] {cls.__name__} not loaded - missing markdownify.\n"
f" Install with: pip install markdownify"
)
else:
logger.warning(f"[ToolManager] {cls.__name__} not loaded due to missing dependency: {error_msg}")
except Exception as e:
logger.error(f"Error initializing tool class {cls.__name__}: {e}")
except Exception as e:
logger.error(f"Error importing class {class_name}: {e}")
return len(self.tool_classes) > 0
return False
except ImportError:
logger.warning("Could not import agent.tools package")
return False
except Exception as e:
logger.error(f"Error loading tools from __init__.__all__: {e}")
return False
def _load_tools_from_directory(self, tools_dir: str):
"""Dynamically load tool classes from directory"""
tools_path = Path(tools_dir)
# Traverse all .py files
for py_file in tools_path.rglob("*.py"):
# Skip initialization files and base tool files
if py_file.name in ["__init__.py", "base_tool.py", "tool_manager.py"]:
continue
# Get module name
module_name = py_file.stem
try:
# Load module directly from file
spec = importlib.util.spec_from_file_location(module_name, py_file)
if spec and spec.loader:
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
# Find tool classes in the module
for attr_name in dir(module):
cls = getattr(module, attr_name)
if (
isinstance(cls, type)
and issubclass(cls, BaseTool)
and cls != BaseTool
):
try:
# Skip memory tools (they need special initialization with memory_manager)
if attr_name in ["MemorySearchTool", "MemoryGetTool"]:
logger.debug(f"Skipped tool {attr_name} (requires memory_manager)")
continue
# Create a temporary instance to get the name
temp_instance = cls()
tool_name = temp_instance.name
# Store the class, not the instance
self.tool_classes[tool_name] = cls
except ImportError as e:
# Handle missing dependencies with helpful messages
error_msg = str(e)
if "playwright" in error_msg:
logger.warning(
f"[ToolManager] Browser tool not loaded - missing dependencies.\n"
f" To enable browser tool, run:\n"
f" pip install playwright\n"
f" playwright install chromium"
)
elif "markdownify" in error_msg:
logger.warning(
f"[ToolManager] {cls.__name__} not loaded - missing markdownify.\n"
f" Install with: pip install markdownify"
)
else:
logger.warning(f"[ToolManager] {cls.__name__} not loaded due to missing dependency: {error_msg}")
except Exception as e:
logger.error(f"Error initializing tool class {cls.__name__}: {e}")
except Exception as e:
logger.error(f"Error importing module {py_file}: {e}")
def _configure_tools_from_config(self, config_dict=None):
"""Configure tool classes based on configuration file"""
try:
# Get tools configuration
tools_config = config_dict or conf().get("tools", {})
# Record tools that are configured but not loaded
missing_tools = []
# Store configurations for later use when instantiating
self.tool_configs = tools_config
# Check which configured tools are missing
for tool_name in tools_config:
if tool_name not in self.tool_classes:
missing_tools.append(tool_name)
# If there are missing tools, record warnings
if missing_tools:
for tool_name in missing_tools:
if tool_name == "browser":
logger.warning(
f"[ToolManager] Browser tool is configured but not loaded.\n"
f" To enable browser tool, run:\n"
f" pip install playwright\n"
f" playwright install chromium"
)
elif tool_name == "google_search":
logger.warning(
f"[ToolManager] Google Search tool is configured but may need API key.\n"
f" Get API key from: https://serper.dev\n"
f" Configure in config.json: tools.google_search.api_key"
)
else:
logger.warning(f"[ToolManager] Tool '{tool_name}' is configured but could not be loaded.")
except Exception as e:
logger.error(f"Error configuring tools from config: {e}")
def _mcp_json_path(self) -> str:
import os
workspace = os.path.expanduser(conf().get("agent_workspace", "~/cow"))
return os.path.join(workspace, "mcp.json")
def _read_mcp_json_signature(self):
"""
Return (mtime, sha256_of_bytes) for the workspace mcp.json (default ~/cow/mcp.json) without parsing.
Returns (None, None) if the file doesn't exist, or (mtime, None) if it exists but can't be read.
Cheap enough (one stat + one small read) to call on every agent init.
"""
import os
import hashlib
path = self._mcp_json_path()
try:
mtime = os.path.getmtime(path)
except OSError:
return (None, None)
try:
with open(path, "rb") as f:
digest = hashlib.sha256(f.read()).hexdigest()
except OSError:
return (mtime, None)
return (mtime, digest)
def _load_mcp_configs(self) -> list:
"""
Load MCP server configs with priority:
1. ~/cow/mcp.json (supports both mcpServers and mcp_servers keys)
2. config.json mcp_servers field (fallback)
"""
import os
import json as _json
mcp_json_path = self._mcp_json_path()
if os.path.exists(mcp_json_path):
try:
with open(mcp_json_path, "r", encoding="utf-8") as f:
data = _json.load(f)
raw = data.get("mcpServers") or data.get("mcp_servers") or data
logger.info(f"[ToolManager] Loading MCP config from {mcp_json_path}")
return _normalize_mcp_configs(raw)
except Exception as e:
logger.warning(f"[ToolManager] Failed to read {mcp_json_path}: {e}, falling back to config.json")
raw = conf().get("mcp_servers", [])
return _normalize_mcp_configs(raw)
def _load_mcp_tools(self):
"""
Trigger MCP tool loading in a background thread (idempotent).
Returns immediately. Booting MCP servers (npx, uvx, etc.) takes
seconds to tens of seconds on first run, which would otherwise
block agent initialization and the user's first message.
Built-in tools work fine without MCP, so we let the agent serve
traffic right away and let MCP servers come online in the
background. Per-session agents read a snapshot of whatever is
ready at construction time and gracefully ignore the rest.
"""
with self._mcp_lock:
if self._mcp_loaded:
return
mcp_servers_config = self._load_mcp_configs()
# Snapshot the signature now so future refresh_mcp_if_changed()
# calls can short-circuit when nothing has changed on disk.
self._mcp_signature = self._read_mcp_json_signature()
self._mcp_active_configs = {
cfg.get("name", "<unnamed>"): cfg for cfg in mcp_servers_config
}
if not mcp_servers_config:
# Mark as loaded even when there is nothing to load,
# so we don't re-read the config file on every call.
self._mcp_loaded = True
return
# Mark pending immediately so list_mcp_status() callers see
# the in-progress state instead of an empty dict.
for cfg in mcp_servers_config:
name = cfg.get("name", "<unnamed>")
self._mcp_status[name] = "pending"
self._mcp_loaded = True
threading.Thread(
target=self._load_mcp_tools_async,
args=(mcp_servers_config,),
daemon=True,
name="mcp-loader",
).start()
logger.info(
f"[ToolManager] MCP loading started in background "
f"({len(mcp_servers_config)} server(s) configured)"
)
def refresh_mcp_if_changed(self):
"""
Cheap check whether ~/cow/mcp.json has changed since last load.
If it has, do a diff-based reload: start newly added servers,
shut down removed ones, and restart any whose config was edited.
Untouched servers are left running.
Designed to be called on every agent creation. The fast path is one
os.stat() plus a read and SHA-256 of the (small) file, so the cost is
negligible when nothing has changed.
"""
with self._mcp_lock:
new_sig = self._read_mcp_json_signature()
if new_sig == self._mcp_signature:
return # no-op fast path
try:
new_configs = self._load_mcp_configs()
except Exception as e:
logger.warning(f"[ToolManager] MCP reload — failed to parse config: {e}")
return
new_by_name = {
cfg.get("name", "<unnamed>"): cfg for cfg in new_configs
}
old_by_name = self._mcp_active_configs
added = [n for n in new_by_name if n not in old_by_name]
removed = [n for n in old_by_name if n not in new_by_name]
changed = [
n for n in new_by_name
if n in old_by_name and new_by_name[n] != old_by_name[n]
]
if not (added or removed or changed):
# Signature drifted but content is logically identical
# (e.g. user re-saved the file without edits). Just sync.
self._mcp_signature = new_sig
return
logger.info(
f"[ToolManager] mcp.json changed — "
f"adding={added}, removing={removed}, restarting={changed}"
)
# Tear down removed + changed servers (changed ones get restarted below)
for name in removed + changed:
self._teardown_mcp_server(name)
# Spin up newly added + changed servers in the background
to_start = [new_by_name[n] for n in added + changed]
if to_start:
for cfg in to_start:
self._mcp_status[cfg.get("name", "<unnamed>")] = "pending"
threading.Thread(
target=self._load_mcp_tools_async,
args=(to_start,),
daemon=True,
name="mcp-loader-reload",
).start()
self._mcp_active_configs = new_by_name
self._mcp_signature = new_sig
def _teardown_mcp_server(self, server_name: str):
"""Shut down one MCP server and drop its tools from the registry."""
if self._mcp_registry is None:
return
client = None
with self._mcp_registry._registry_lock:
client = self._mcp_registry._clients.pop(server_name, None)
if client is not None:
try:
client.shutdown()
except Exception as e:
logger.warning(f"[MCP] Error shutting down '{server_name}': {e}")
# Drop tools that belonged to this server.
for tool_name in list(self._mcp_tool_instances.keys()):
tool = self._mcp_tool_instances.get(tool_name)
if tool is not None and getattr(tool, "server_name", None) == server_name:
self._mcp_tool_instances.pop(tool_name, None)
self._mcp_status.pop(server_name, None)
def _load_mcp_tools_async(self, mcp_servers_config):
"""
Background worker: bring up each MCP server one-by-one and
publish ready tools to _mcp_tool_instances as they come online.
Server failures are isolated — one bad server cannot block
the others, and never raises out of the worker thread.
"""
try:
from agent.tools.mcp.mcp_client import McpClient, McpClientRegistry
from agent.tools.mcp.mcp_tool import McpTool
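# Reuse the existing registry on reloads so clients registered by earlier
# loads stay reachable for _teardown_mcp_server() and shutdown_mcp().
registry = self._mcp_registry if self._mcp_registry is not None else McpClientRegistry()
self._mcp_registry = registry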
for cfg in mcp_servers_config:
server_name = cfg.get("name", "<unnamed>")
try:
client = McpClient(cfg)
if not client.initialize():
self._mcp_status[server_name] = "failed"
logger.warning(
f"[MCP] Server '{server_name}' failed to initialize — skipping"
)
continue
tool_schemas = client.list_tools()
added = []
for schema in tool_schemas:
tool_name = schema.get("name", "")
if not tool_name:
continue
mcp_tool = McpTool(client, schema, server_name)
# Atomic dict assignment is GIL-safe; readers iterate
# over a list() snapshot to avoid concurrent mutation.
self._mcp_tool_instances[tool_name] = mcp_tool
added.append(tool_name)
# Register client into the shared registry only after its
# tools are visible, so callers never see a half-loaded server.
with registry._registry_lock:
registry._clients[server_name] = client
self._mcp_status[server_name] = "ready"
logger.info(
f"[MCP] Server '{server_name}' ready — "
f"{len(added)} tool(s): {added}"
)
except Exception as e:
self._mcp_status[server_name] = "failed"
logger.warning(f"[MCP] Server '{server_name}' load failed: {e}")
ready = sum(1 for s in self._mcp_status.values() if s == "ready")
total = len(self._mcp_status)
logger.info(
f"[ToolManager] MCP loading complete: "
f"{ready}/{total} server(s) ready, "
f"{len(self._mcp_tool_instances)} tool(s) available"
)
except Exception as e:
logger.warning(f"[ToolManager] MCP background loader crashed: {e}")
def list_mcp_status(self) -> dict:
"""Return {server_name: status} snapshot for UI / debugging."""
return dict(self._mcp_status)
def sync_mcp_into_agent(self, agent) -> tuple:
"""
Reconcile a live agent's tool collection with the current MCP tool registry.
Adds tools that finished loading after the agent was created,
and removes tools whose MCP server was torn down. Built-in tools
on the agent are left untouched.
Handles both representations CowAgent uses:
- Agent.tools: list[BaseTool] (default Agent class)
- AgentStream.tools: dict[str, BaseTool] (streaming agent)
Returns (added_names, removed_names) for logging.
"""
if agent is None or not hasattr(agent, "tools"):
return ([], [])
from agent.tools.mcp.mcp_tool import McpTool
# Snapshot the registry: the background loader may mutate the live dict.
current = dict(self._mcp_tool_instances)
registry_names = set(current.keys())
agent_tools = agent.tools
if isinstance(agent_tools, dict):
agent_mcp_names = {
name for name, tool in agent_tools.items()
if isinstance(tool, McpTool)
}
added = registry_names - agent_mcp_names
removed = agent_mcp_names - registry_names
if not (added or removed):
return ([], [])
for name in added:
agent_tools[name] = current[name]
for name in removed:
agent_tools.pop(name, None)
elif isinstance(agent_tools, list):
agent_mcp_names = {
t.name for t in agent_tools if isinstance(t, McpTool)
}
added = registry_names - agent_mcp_names
removed = agent_mcp_names - registry_names
if not (added or removed):
return ([], [])
if removed:
agent.tools = [
t for t in agent_tools
if not (isinstance(t, McpTool) and t.name in removed)
]
for name in added:
agent.tools.append(current[name])
else:
return ([], [])
return (sorted(added), sorted(removed))
def create_tool(self, name: str) -> BaseTool:
"""
Get a tool by name: a fresh instance for built-in tools, or the shared
McpTool instance for MCP tools.
:param name: The name of the tool to get.
:return: The tool instance, or None if not found.
"""
tool_class = self.tool_classes.get(name)
if tool_class:
# Create a new instance
tool_instance = tool_class()
# Apply configuration if available
if hasattr(self, 'tool_configs') and name in self.tool_configs:
tool_instance.config = self.tool_configs[name]
return tool_instance
# Fall back to MCP tool instances
mcp_tool = self._mcp_tool_instances.get(name)
if mcp_tool:
return mcp_tool
return None
def list_tools(self) -> dict:
"""
Get information about all loaded tools.
:return: A dictionary with tool information.
"""
result = {}
for name, tool_class in self.tool_classes.items():
# Create a temporary instance to get schema
temp_instance = tool_class()
result[name] = {
"description": temp_instance.description,
"parameters": temp_instance.get_json_schema()
}
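# Include MCP tool instances (iterate over a list() snapshot, since the
# background loader may add or remove entries concurrently)
for name, mcp_tool in list(self._mcp_tool_instances.items()):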
result[name] = {
"description": mcp_tool.description,
"parameters": mcp_tool.params,
}
return result
def shutdown_mcp(self):
"""Shut down all MCP server clients."""
if self._mcp_registry:
self._mcp_registry.shutdown_all()
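
The diff-based reload in refresh_mcp_if_changed() comes down to classifying server names into three groups. Below is a minimal standalone sketch of that classification; `diff_server_configs` and the sample configs are hypothetical names used only for illustration and are not part of ToolManager.

```python
from typing import Dict, List, Tuple


def diff_server_configs(
    old_by_name: Dict[str, dict],
    new_by_name: Dict[str, dict],
) -> Tuple[List[str], List[str], List[str]]:
    """Classify server names as added, removed, or changed (hypothetical helper)."""
    # Present only in the new config: must be started.
    added = [n for n in new_by_name if n not in old_by_name]
    # Present only in the old config: must be shut down.
    removed = [n for n in old_by_name if n not in new_by_name]
    # Present in both but with a different config dict: must be restarted.
    changed = [
        n for n in new_by_name
        if n in old_by_name and new_by_name[n] != old_by_name[n]
    ]
    return added, removed, changed


# Illustrative configs only; the command/args values are made up.
old = {
    "fs": {"name": "fs", "command": "npx", "args": ["fs-server"]},
    "git": {"name": "git", "command": "uvx", "args": ["git-server"]},
}
new = {
    "fs": {"name": "fs", "command": "npx", "args": ["fs-server", "--ro"]},
    "web": {"name": "web", "command": "npx", "args": ["web-server"]},
}

added, removed, changed = diff_server_configs(old, new)
print("start:", added + changed)
print("stop:", removed + changed)
```

Servers in `changed` appear in both the stop and start lists, which mirrors the tear-down-then-restart order used above; servers in neither list keep running untouched.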


@@ -0,0 +1,40 @@
from .truncate import (
truncate_head,
truncate_tail,
truncate_line,
format_size,
TruncationResult,
DEFAULT_MAX_LINES,
DEFAULT_MAX_BYTES,
GREP_MAX_LINE_LENGTH
)
from .diff import (
strip_bom,
detect_line_ending,
normalize_to_lf,
restore_line_endings,
normalize_for_fuzzy_match,
fuzzy_find_text,
generate_diff_string,
FuzzyMatchResult
)
__all__ = [
'truncate_head',
'truncate_tail',
'truncate_line',
'format_size',
'TruncationResult',
'DEFAULT_MAX_LINES',
'DEFAULT_MAX_BYTES',
'GREP_MAX_LINE_LENGTH',
'strip_bom',
'detect_line_ending',
'normalize_to_lf',
'restore_line_endings',
'normalize_for_fuzzy_match',
'fuzzy_find_text',
'generate_diff_string',
'FuzzyMatchResult'
]

agent/tools/utils/diff.py

@@ -0,0 +1,167 @@
"""
Diff tools for file editing
Provides fuzzy matching and diff generation functionality
"""
import difflib
import re
from typing import Optional, Tuple
def strip_bom(text: str) -> Tuple[str, str]:
"""
Remove BOM (Byte Order Mark)
:param text: Original text
:return: (BOM, text after removing BOM)
"""
if text.startswith('\ufeff'):
return '\ufeff', text[1:]
return '', text
def detect_line_ending(text: str) -> str:
"""
Detect line ending type
:param text: Text content
:return: Line ending type ('\r\n' or '\n')
"""
if '\r\n' in text:
return '\r\n'
return '\n'
def normalize_to_lf(text: str) -> str:
"""
Normalize all line endings to LF (\n)
:param text: Original text
:return: Normalized text
"""
return text.replace('\r\n', '\n').replace('\r', '\n')
def restore_line_endings(text: str, original_ending: str) -> str:
"""
Restore original line endings
:param text: LF normalized text
:param original_ending: Original line ending
:return: Text with restored line endings
"""
if original_ending == '\r\n':
return text.replace('\n', '\r\n')
return text
def normalize_for_fuzzy_match(text: str) -> str:
"""
Normalize text for fuzzy matching
Remove excess whitespace but preserve basic structure
:param text: Original text
:return: Normalized text
"""
# Collapse each run of spaces/tabs into a single space
text = re.sub(r'[ \t]+', ' ', text)
# Strip trailing spaces before each newline (the final line is not affected)
text = re.sub(r' +\n', '\n', text)
# Rebuild line by line; whitespace-only lines become empty strings
lines = text.split('\n')
normalized_lines = []
for line in lines:
# After the collapse above, the leading whitespace is normally at most one space
stripped = line.lstrip()
if stripped:
indent_count = len(line) - len(stripped)
# Re-emit the remaining indentation as plain spaces
normalized_indent = ' ' * indent_count
normalized_lines.append(normalized_indent + stripped)
else:
normalized_lines.append('')
return '\n'.join(normalized_lines)
class FuzzyMatchResult:
"""Fuzzy match result"""
def __init__(self, found: bool, index: int = -1, match_length: int = 0, content_for_replacement: str = ""):
self.found = found
self.index = index
self.match_length = match_length
self.content_for_replacement = content_for_replacement
def fuzzy_find_text(content: str, old_text: str) -> FuzzyMatchResult:
"""
Find text in content, try exact match first, then fuzzy match
:param content: Content to search in
:param old_text: Text to find
:return: Match result
"""
# First try exact match
index = content.find(old_text)
if index != -1:
return FuzzyMatchResult(
found=True,
index=index,
match_length=len(old_text),
content_for_replacement=content
)
# Try fuzzy match
fuzzy_content = normalize_for_fuzzy_match(content)
fuzzy_old_text = normalize_for_fuzzy_match(old_text)
index = fuzzy_content.find(fuzzy_old_text)
if index != -1:
# Fuzzy match successful, use normalized content for replacement
return FuzzyMatchResult(
found=True,
index=index,
match_length=len(fuzzy_old_text),
content_for_replacement=fuzzy_content
)
# Not found
return FuzzyMatchResult(found=False)
def generate_diff_string(old_content: str, new_content: str) -> dict:
"""
Generate unified diff string
:param old_content: Old content
:param new_content: New content
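:return: Dictionary with 'diff' (the unified diff text) and 'first_changed_line'
(start line of the first hunk in the new content, which may include leading
context lines; None when there are no changes)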
"""
old_lines = old_content.split('\n')
new_lines = new_content.split('\n')
# Generate unified diff
diff_lines = list(difflib.unified_diff(
old_lines,
new_lines,
lineterm='',
fromfile='original',
tofile='modified'
))
# Find first changed line number
first_changed_line = None
for line in diff_lines:
if line.startswith('@@'):
# Parse @@ -1,3 +1,3 @@ format
match = re.search(r'@@ -\d+,?\d* \+(\d+)', line)
if match:
first_changed_line = int(match.group(1))
break
diff_string = '\n'.join(diff_lines)
return {
'diff': diff_string,
'first_changed_line': first_changed_line
}
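
One consequence of the exact-then-fuzzy lookup above is that a fuzzy hit returns an index into the normalized content, so the replacement must be applied to that normalized text rather than to the original. The sketch below illustrates this with a simplified stand-in: `normalize_ws` and `find_with_fallback` are hypothetical names and cover only the whitespace-collapsing part of normalize_for_fuzzy_match.

```python
import re
from typing import Tuple


def normalize_ws(text: str) -> str:
    """Simplified normalization: collapse space/tab runs, strip trailing spaces."""
    text = re.sub(r'[ \t]+', ' ', text)
    return re.sub(r' +\n', '\n', text)


def find_with_fallback(content: str, old_text: str) -> Tuple[int, int, str]:
    """Return (index, match_length, text_to_edit); index is -1 when not found."""
    index = content.find(old_text)
    if index != -1:
        # Exact hit: offsets refer to the original content.
        return index, len(old_text), content
    norm_content = normalize_ws(content)
    norm_old = normalize_ws(old_text)
    index = norm_content.find(norm_old)
    if index != -1:
        # Fuzzy hit: offsets refer to the normalized content.
        return index, len(norm_old), norm_content
    return -1, 0, content


content = "def f():\n\treturn  1\n"   # tab indent, two spaces before "1"
index, length, base = find_with_fallback(content, "return 1")
edited = base[:index] + "return 2" + base[index + length:]
print(repr(edited))
```

Because the edit is applied to the normalized text, whitespace elsewhere in the file (here the tab indent) is rewritten as well, which is worth keeping in mind when the fuzzy path is used on indentation-sensitive files.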


@@ -0,0 +1,295 @@
"""
Shared truncation utilities for tool outputs.
Truncation is based on two independent limits - whichever is hit first wins:
- Line limit (default: 2000 lines)
- Byte limit (default: 50KB)
Never returns partial lines (except bash tail truncation edge case).
"""
from __future__ import annotations
from typing import Dict, Any, Optional, Tuple, TYPE_CHECKING
if TYPE_CHECKING:
from typing import Literal
DEFAULT_MAX_LINES = 2000
DEFAULT_MAX_BYTES = 50 * 1024 # 50KB
GREP_MAX_LINE_LENGTH = 500 # Max chars per grep match line
class TruncationResult:
"""Truncation result"""
def __init__(
self,
content: str,
truncated: bool,
truncated_by: Optional[Literal["lines", "bytes"]],
total_lines: int,
total_bytes: int,
output_lines: int,
output_bytes: int,
last_line_partial: bool = False,
first_line_exceeds_limit: bool = False,
max_lines: int = DEFAULT_MAX_LINES,
max_bytes: int = DEFAULT_MAX_BYTES
):
self.content = content
self.truncated = truncated
self.truncated_by = truncated_by
self.total_lines = total_lines
self.total_bytes = total_bytes
self.output_lines = output_lines
self.output_bytes = output_bytes
self.last_line_partial = last_line_partial
self.first_line_exceeds_limit = first_line_exceeds_limit
self.max_lines = max_lines
self.max_bytes = max_bytes
def to_dict(self) -> Dict[str, Any]:
"""Convert to dictionary"""
return {
"content": self.content,
"truncated": self.truncated,
"truncated_by": self.truncated_by,
"total_lines": self.total_lines,
"total_bytes": self.total_bytes,
"output_lines": self.output_lines,
"output_bytes": self.output_bytes,
"last_line_partial": self.last_line_partial,
"first_line_exceeds_limit": self.first_line_exceeds_limit,
"max_lines": self.max_lines,
"max_bytes": self.max_bytes
}
def format_size(bytes_count: int) -> str:
"""Format bytes as human-readable size"""
if bytes_count < 1024:
return f"{bytes_count}B"
elif bytes_count < 1024 * 1024:
return f"{bytes_count / 1024:.1f}KB"
else:
return f"{bytes_count / (1024 * 1024):.1f}MB"
def truncate_head(content: str, max_lines: Optional[int] = None, max_bytes: Optional[int] = None) -> TruncationResult:
"""
Truncate content from the head (keep first N lines/bytes).
Suitable for file reads where you want to see the beginning.
Never returns partial lines. If first line exceeds byte limit,
returns empty content with first_line_exceeds_limit=True.
:param content: Content to truncate
:param max_lines: Maximum number of lines (default: 2000)
:param max_bytes: Maximum number of bytes (default: 50KB)
:return: Truncation result
"""
if max_lines is None:
max_lines = DEFAULT_MAX_LINES
if max_bytes is None:
max_bytes = DEFAULT_MAX_BYTES
total_bytes = len(content.encode('utf-8'))
lines = content.split('\n')
total_lines = len(lines)
# Check if no truncation is needed
if total_lines <= max_lines and total_bytes <= max_bytes:
return TruncationResult(
content=content,
truncated=False,
truncated_by=None,
total_lines=total_lines,
total_bytes=total_bytes,
output_lines=total_lines,
output_bytes=total_bytes,
last_line_partial=False,
first_line_exceeds_limit=False,
max_lines=max_lines,
max_bytes=max_bytes
)
# Check if first line alone exceeds byte limit
first_line_bytes = len(lines[0].encode('utf-8'))
if first_line_bytes > max_bytes:
return TruncationResult(
content="",
truncated=True,
truncated_by="bytes",
total_lines=total_lines,
total_bytes=total_bytes,
output_lines=0,
output_bytes=0,
last_line_partial=False,
first_line_exceeds_limit=True,
max_lines=max_lines,
max_bytes=max_bytes
)
# Collect complete lines that fit
output_lines_arr = []
output_bytes_count = 0
truncated_by = "lines"
for i, line in enumerate(lines):
if i >= max_lines:
break
# Calculate line bytes (add 1 for newline if not first line)
line_bytes = len(line.encode('utf-8')) + (1 if i > 0 else 0)
if output_bytes_count + line_bytes > max_bytes:
truncated_by = "bytes"
break
output_lines_arr.append(line)
output_bytes_count += line_bytes
# If exited due to line limit
if len(output_lines_arr) >= max_lines and output_bytes_count <= max_bytes:
truncated_by = "lines"
output_content = '\n'.join(output_lines_arr)
final_output_bytes = len(output_content.encode('utf-8'))
return TruncationResult(
content=output_content,
truncated=True,
truncated_by=truncated_by,
total_lines=total_lines,
total_bytes=total_bytes,
output_lines=len(output_lines_arr),
output_bytes=final_output_bytes,
last_line_partial=False,
first_line_exceeds_limit=False,
max_lines=max_lines,
max_bytes=max_bytes
)
def truncate_tail(content: str, max_lines: Optional[int] = None, max_bytes: Optional[int] = None) -> TruncationResult:
"""
Truncate content from tail (keep last N lines/bytes).
Suitable for bash output where you want to see the ending content (errors, final results).
If the last line of the original content alone exceeds the byte limit, only the
tail end of that line is returned (last_line_partial=True).
:param content: Content to truncate
:param max_lines: Maximum lines (default: 2000)
:param max_bytes: Maximum bytes (default: 50KB)
:return: Truncation result
"""
if max_lines is None:
max_lines = DEFAULT_MAX_LINES
if max_bytes is None:
max_bytes = DEFAULT_MAX_BYTES
total_bytes = len(content.encode('utf-8'))
lines = content.split('\n')
total_lines = len(lines)
# Check if no truncation is needed
if total_lines <= max_lines and total_bytes <= max_bytes:
return TruncationResult(
content=content,
truncated=False,
truncated_by=None,
total_lines=total_lines,
total_bytes=total_bytes,
output_lines=total_lines,
output_bytes=total_bytes,
last_line_partial=False,
first_line_exceeds_limit=False,
max_lines=max_lines,
max_bytes=max_bytes
)
# Work backwards from the end
output_lines_arr = []
output_bytes_count = 0
truncated_by = "lines"
last_line_partial = False
for i in range(len(lines) - 1, -1, -1):
if len(output_lines_arr) >= max_lines:
break
line = lines[i]
# Calculate line bytes (add newline if not the first added line)
line_bytes = len(line.encode('utf-8')) + (1 if len(output_lines_arr) > 0 else 0)
if output_bytes_count + line_bytes > max_bytes:
truncated_by = "bytes"
# Edge case: if we haven't added any lines yet and this line exceeds max_bytes,
# take the end portion of this line
if len(output_lines_arr) == 0:
truncated_line = _truncate_string_to_bytes_from_end(line, max_bytes)
output_lines_arr.insert(0, truncated_line)
output_bytes_count = len(truncated_line.encode('utf-8'))
last_line_partial = True
break
output_lines_arr.insert(0, line)
output_bytes_count += line_bytes
# If exited due to line limit
if len(output_lines_arr) >= max_lines and output_bytes_count <= max_bytes:
truncated_by = "lines"
output_content = '\n'.join(output_lines_arr)
final_output_bytes = len(output_content.encode('utf-8'))
return TruncationResult(
content=output_content,
truncated=True,
truncated_by=truncated_by,
total_lines=total_lines,
total_bytes=total_bytes,
output_lines=len(output_lines_arr),
output_bytes=final_output_bytes,
last_line_partial=last_line_partial,
first_line_exceeds_limit=False,
max_lines=max_lines,
max_bytes=max_bytes
)
def _truncate_string_to_bytes_from_end(text: str, max_bytes: int) -> str:
"""
Truncate string to fit byte limit (from end).
Properly handles multi-byte UTF-8 characters.
:param text: String to truncate
:param max_bytes: Maximum bytes
:return: Truncated string
"""
encoded = text.encode('utf-8')
if len(encoded) <= max_bytes:
return text
# Start max_bytes back from the end of the encoded string
start = len(encoded) - max_bytes
# Find valid UTF-8 boundary (character start)
while start < len(encoded) and (encoded[start] & 0xC0) == 0x80:
start += 1
return encoded[start:].decode('utf-8', errors='ignore')
def truncate_line(line: str, max_chars: int = GREP_MAX_LINE_LENGTH) -> Tuple[str, bool]:
"""
Truncate single line to max characters, add [truncated] suffix.
Used for grep match lines.
:param line: Line to truncate
:param max_chars: Maximum characters
:return: (truncated text, whether truncated)
"""
if len(line) <= max_chars:
return line, False
return f"{line[:max_chars]}... [truncated]", True
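
The byte-level tail cut relies on a property of UTF-8: continuation bytes always have the bit pattern 10xxxxxx, so `(byte & 0xC0) == 0x80` identifies a position inside a character. The following is a minimal standalone sketch of that boundary search; `tail_bytes` is a hypothetical name used here for illustration, written to behave like the private helper above on valid input.

```python
def tail_bytes(text: str, max_bytes: int) -> str:
    """Keep at most max_bytes of UTF-8 from the end without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # Tentative cut point, counted in bytes from the start.
    start = len(encoded) - max_bytes
    # Skip forward past continuation bytes (10xxxxxx) to a character start.
    while start < len(encoded) and (encoded[start] & 0xC0) == 0x80:
        start += 1
    return encoded[start:].decode("utf-8")


# "é" is 2 bytes in UTF-8, so "héllo" is 6 bytes in total.
print(repr(tail_bytes("héllo", 5)))
print(repr(tail_bytes("héllo", 4)))
# Each of these characters is 3 bytes, 9 bytes in total.
print(repr(tail_bytes("日本語", 4)))
```

Moving the cut forward rather than backward means the result can be shorter than max_bytes but never longer, which keeps the byte limit a hard upper bound.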


@@ -0,0 +1 @@
from agent.tools.vision.vision import Vision


@@ -0,0 +1,814 @@
"""
Vision tool - Analyze images using Vision API.
Supports local files (auto base64-encoded) and HTTP URLs.
Provider resolution:
- tools.vision.model (if set) means "prefer this model first; fall back to
other configured providers if it fails". The model name is mapped to its
native provider (e.g. doubao-* → Doubao, kimi-* → Moonshot, gpt-* →
OpenAI/LinkAI). That provider is tried first, then the standard auto
chain runs as fallback (with the preferred provider de-duplicated).
- Auto chain priority:
1. Main model via bot.call_vision — only when the main bot is known
to actually support vision (not just expose a call_vision method).
2. Other models whose API key is configured.
3. OpenAI / LinkAI raw HTTP.
When use_linkai=true, LinkAI is promoted to #1.
"""
import base64
import os
import subprocess
import tempfile
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional
import requests
from agent.tools.base_tool import BaseTool, ToolResult
from common import const
from common.log import logger
from config import conf
DEFAULT_MODEL = const.GPT_41_MINI
DEFAULT_TIMEOUT = 60
MAX_TOKENS = 1000
COMPRESS_THRESHOLD = 1_048_576 # 1 MB
SUPPORTED_EXTENSIONS = {
"jpg": "image/jpeg",
"jpeg": "image/jpeg",
"png": "image/png",
"gif": "image/gif",
"webp": "image/webp",
}
_MAIN_MODEL_PROVIDER_NAME = "MainModel"
# (config_key_for_api_key, bot_type, default_vision_model, provider_display_name)
# Auto-discovered as fallback vision providers when their API key is configured.
# OpenAI and LinkAI are handled separately (raw HTTP providers), so not listed here.
_DISCOVERABLE_MODELS = [
("moonshot_api_key", const.MOONSHOT, const.KIMI_K2_6, "Moonshot"),
("ark_api_key", const.DOUBAO, const.DOUBAO_SEED_2_PRO, "Doubao"),
("dashscope_api_key", const.QWEN_DASHSCOPE, const.QWEN36_PLUS, "DashScope"),
("claude_api_key", const.CLAUDEAPI, const.CLAUDE_4_6_SONNET, "Claude"),
("gemini_api_key", const.GEMINI, const.GEMINI_35_FLASH, "Gemini"),
("qianfan_api_key", const.QIANFAN, const.ERNIE_45_TURBO_VL, "Qianfan"),
("zhipu_ai_api_key", const.ZHIPU_AI, const.GLM_4_7, "ZhipuAI"),
("minimax_api_key", const.MiniMax, const.MINIMAX_M2_7, "MiniMax"),
("mimo_api_key", const.MIMO, const.MIMO_V2_5_PRO, "MiMo"),
]
# Model name prefix → discoverable provider display_name.
# Used to auto-route tools.vision.model to its native provider.
# Matched case-insensitively; longest prefix wins.
_MODEL_PREFIX_TO_PROVIDER = [
("doubao-", "Doubao"),
("kimi-", "Moonshot"),
("moonshot-", "Moonshot"),
("qwen", "DashScope"), # qwen-*, qwen3-*, qwen3.6-*, etc.
("claude-", "Claude"),
("ernie-", "Qianfan"),
("gemini-", "Gemini"),
("glm-", "ZhipuAI"),
("minimax-", "MiniMax"),
("abab", "MiniMax"),
("mimo-", "MiMo"),
]
# Model prefixes that natively belong to OpenAI / LinkAI (raw HTTP providers).
_OPENAI_MODEL_PREFIXES = ("gpt-", "o1-", "o3-", "o4-", "chatgpt-")
# Maps the UI provider id (persisted in tools.vision.provider) to the internal
# display name used in VisionProvider.name. Keep in sync with _DISCOVERABLE_MODELS
# and the openai/linkai branches in _route_by_model_name.
_PROVIDER_ID_TO_DISPLAY = {
"openai": "OpenAI",
"linkai": "LinkAI",
"moonshot": "Moonshot",
"doubao": "Doubao",
"dashscope": "DashScope",
"claudeAPI": "Claude",
"gemini": "Gemini",
"qianfan": "Qianfan",
"zhipu": "ZhipuAI",
"minimax": "MiniMax",
"mimo": "MiMo",
}
@dataclass
class VisionProvider:
"""A single Vision API provider configuration."""
name: str
api_key: str
api_base: str
extra_headers: dict = field(default_factory=dict)
model_override: Optional[str] = None
use_bot: bool = False # When True, call via bot.call_vision instead of raw HTTP
fallback_bot: Any = None # Bot instance for non-main-model providers
class VisionAPIError(Exception):
"""Raised when a Vision API call fails and should trigger fallback."""
pass
class Vision(BaseTool):
"""Analyze images using Vision API"""
name: str = "vision"
description: str = (
"Analyze a local image or image URL (jpg/jpeg/png/gif/webp) using Vision API. "
"Can describe content, extract text, identify objects, colors, etc. "
)
params: dict = {
"type": "object",
"properties": {
"image": {
"type": "string",
"description": "Local file path or HTTP(S) URL of the image to analyze",
},
"question": {
"type": "string",
"description": "Question to ask about the image",
},
},
"required": ["image", "question"],
}
def __init__(self, config: dict = None):
self.config = config or {}
@staticmethod
def is_available() -> bool:
return True
def execute(self, args: Dict[str, Any]) -> ToolResult:
image = args.get("image", "").strip()
question = args.get("question", "").strip()
if not image:
return ToolResult.fail("Error: 'image' parameter is required")
if not question:
return ToolResult.fail("Error: 'question' parameter is required")
providers = self._resolve_providers()
if not providers:
return ToolResult.fail(
"Error: No model available for Vision.\n"
"The main model does not support vision and no other API keys are configured.\n"
"Options:\n"
" 1. Switch to a multimodal model (e.g. ernie-4.5-turbo-vl, qwen3.6-plus, claude-sonnet-4-6, gemini-2.0-flash)\n"
" 2. Configure OPENAI_API_KEY: env_config(action=\"set\", key=\"OPENAI_API_KEY\", value=\"your-key\")\n"
" 3. Configure LINKAI_API_KEY: env_config(action=\"set\", key=\"LINKAI_API_KEY\", value=\"your-key\")"
)
try:
image_content = self._build_image_content(image)
except Exception as e:
return ToolResult.fail(f"Error: {e}")
# Default model is only used as a last-resort placeholder for providers
# whose VisionProvider.model_override is None (e.g. raw OpenAI provider
# when the user did not configure tools.vision.model).
return self._call_with_fallback(providers, DEFAULT_MODEL, question, image_content)
def _call_with_fallback(self, providers: List[VisionProvider], model: str,
question: str, image_content: dict) -> ToolResult:
"""Try each provider in order; fall back to the next one on failure."""
errors: List[str] = []
for i, provider in enumerate(providers):
use_model = provider.model_override or model
try:
logger.info(f"[Vision] Trying provider '{provider.name}' "
f"with model '{use_model}' ({i + 1}/{len(providers)})")
if provider.use_bot:
result = self._call_via_bot(use_model, question, image_content, provider)
else:
result = self._call_api(provider, use_model, question, image_content)
logger.info(f"[Vision] ✅ Success via {provider.name} (model={use_model})")
return result
except VisionAPIError as e:
errors.append(f"[{provider.name}/{use_model}] {e}")
logger.warning(f"[Vision] Provider '{provider.name}' failed: {e}")
except requests.Timeout:
errors.append(f"[{provider.name}/{use_model}] Request timed out after {DEFAULT_TIMEOUT}s")
logger.warning(f"[Vision] Provider '{provider.name}' timed out")
except requests.ConnectionError:
errors.append(f"[{provider.name}/{use_model}] Connection failed")
logger.warning(f"[Vision] Provider '{provider.name}' connection failed")
except Exception as e:
errors.append(f"[{provider.name}/{use_model}] {e}")
logger.error(f"[Vision] Provider '{provider.name}' unexpected error: {e}", exc_info=True)
return ToolResult.fail(
"Error: All Vision API providers failed.\n" + "\n".join(f" - {err}" for err in errors)
)
def _resolve_providers(self) -> List[VisionProvider]:
"""
Build an ordered list of providers to try.
Semantics of `tools.vision.model`:
"Prefer this model first; fall back to other configured providers
if it fails."
Order:
1. The provider that natively serves `tools.vision.model` (if any
and its API key is configured) — using the user-specified model
name verbatim.
2. Auto-discovery chain as fallback:
- use_linkai=true → [LinkAI, MainModel?, OtherModels…, OpenAI]
- default → [MainModel?, OtherModels…, OpenAI, LinkAI]
MainModel is only included when the main bot is known to support
vision (see _main_bot_supports_vision).
Providers that share the same display name as the preferred provider
are de-duplicated to avoid retrying the same endpoint twice.
"""
user_model = self._resolve_user_vision_model()
user_provider = self._resolve_user_vision_provider()
providers: List[VisionProvider] = []
# Step 1: preferred provider — explicit `tools.vision.provider`
# wins so custom model names can still be routed correctly. Falls
# through to model-name prefix inference when provider is unset.
preferred = None
if user_provider and user_model:
preferred = self._route_by_provider_id(user_provider, user_model)
if not preferred and user_model:
preferred = self._route_by_model_name(user_model)
if preferred:
providers.extend(preferred)
# Step 2: auto-discovery chain as fallback
existing = {p.name for p in providers}
fallback: List[VisionProvider] = []
use_linkai = conf().get("use_linkai", False) and conf().get("linkai_api_key")
if use_linkai:
self._append_provider(fallback, lambda: self._build_linkai_provider(user_model))
self._append_provider(fallback, self._build_main_model_provider)
self._append_other_model_providers(fallback, preferred_model=user_model)
self._append_provider(fallback, lambda: self._build_openai_provider(user_model))
else:
self._append_provider(fallback, self._build_main_model_provider)
self._append_other_model_providers(fallback, preferred_model=user_model)
self._append_provider(fallback, lambda: self._build_openai_provider(user_model))
self._append_provider(fallback, lambda: self._build_linkai_provider(user_model))
for p in fallback:
if p.name in existing:
continue
providers.append(p)
existing.add(p.name)
return providers
@staticmethod
def _append_provider(providers: List[VisionProvider], builder) -> None:
p = builder()
if p:
providers.append(p)
@staticmethod
def _resolve_user_vision_model() -> Optional[str]:
"""Read tools.vision.model (singular ``tool`` kept as runtime fallback)."""
tools_conf = conf().get("tools") or conf().get("tool") or {}
if not isinstance(tools_conf, dict):
return None
vision_conf = tools_conf.get("vision", {})
if not isinstance(vision_conf, dict):
return None
m = vision_conf.get("model")
if isinstance(m, str) and m.strip():
return m.strip()
return None
@staticmethod
def _resolve_user_vision_provider() -> Optional[str]:
"""Read tools.vision.provider — the UI-persisted vendor id.
Lets users pin a vendor for custom model names that prefix-inference
can't recognize. Returns None when unset/blank.
"""
tools_conf = conf().get("tools") or conf().get("tool") or {}
if not isinstance(tools_conf, dict):
return None
vision_conf = tools_conf.get("vision", {})
if not isinstance(vision_conf, dict):
return None
p = vision_conf.get("provider")
if isinstance(p, str) and p.strip():
return p.strip()
return None
@staticmethod
def _infer_provider_from_model(model_name: str) -> Optional[str]:
"""
Infer the provider display name from a model name's prefix.
Returns None when no rule matches (or for OpenAI-family names, which
are handled separately by the caller).
"""
if not model_name:
return None
lower = model_name.lower()
# Sort by prefix length desc so e.g. "moonshot-" wins over hypothetical "moo-"
for prefix, display_name in sorted(_MODEL_PREFIX_TO_PROVIDER, key=lambda x: -len(x[0])):
if lower.startswith(prefix.lower()):
return display_name
return None
def _route_by_provider_id(self, provider_id: str, user_model: str) -> Optional[List[VisionProvider]]:
"""Route by the UI-persisted provider id.
Returns:
- [provider] : provider id is known and its key is configured.
- None : unknown provider id, missing API key, or the bot can't be
created (or lacks call_vision). Caller falls through to
model-name-based routing.
"""
display_name = _PROVIDER_ID_TO_DISPLAY.get(provider_id)
if not display_name:
return None
# OpenAI / LinkAI use raw HTTP providers, not the discoverable bot path.
if provider_id == "openai":
p = self._build_openai_provider(user_model)
return [p] if p else None
if provider_id == "linkai":
p = self._build_linkai_provider(user_model)
return [p] if p else None
# Discoverable bot-backed providers.
for config_key, bot_type, _default_model, name in _DISCOVERABLE_MODELS:
if name != display_name:
continue
api_key = conf().get(config_key, "")
if not api_key or not api_key.strip():
logger.warning(f"[Vision] tools.vision.provider='{provider_id}' "
f"but '{config_key}' is not configured. Falling back.")
return None
try:
from models.bot_factory import create_bot
bot = create_bot(bot_type)
if not hasattr(bot, 'call_vision'):
logger.warning(f"[Vision] '{display_name}' bot does not implement call_vision.")
return None
except Exception as e:
logger.warning(f"[Vision] Failed to create '{display_name}' bot: {e}")
return None
return [VisionProvider(
name=display_name,
api_key="",
api_base="",
model_override=user_model,
use_bot=True,
fallback_bot=bot,
)]
return None
def _route_by_model_name(self, user_model: str) -> Optional[List[VisionProvider]]:
"""
Try to build a provider list using the user-specified model name.
Returns:
- [provider, ...] : matched and at least one matching provider's key is configured
- None : no rule matches, the matched provider's key is missing, or its
bot cannot be created → caller should fall through to auto-discovery
"""
lower = user_model.lower()
# OpenAI / LinkAI family
if lower.startswith(_OPENAI_MODEL_PREFIXES):
providers: List[VisionProvider] = []
# Prefer LinkAI when explicitly enabled, else OpenAI first
use_linkai = conf().get("use_linkai", False) and conf().get("linkai_api_key")
if use_linkai:
self._append_provider(providers, lambda: self._build_linkai_provider(user_model))
self._append_provider(providers, lambda: self._build_openai_provider(user_model))
else:
self._append_provider(providers, lambda: self._build_openai_provider(user_model))
self._append_provider(providers, lambda: self._build_linkai_provider(user_model))
if providers:
return providers
logger.warning(f"[Vision] tools.vision.model='{user_model}' looks like an OpenAI "
f"model but neither OPENAI_API_KEY nor LINKAI_API_KEY is configured.")
return None # fall through to auto
# Discoverable native providers (Doubao, Moonshot, etc.)
target_display = self._infer_provider_from_model(user_model)
if not target_display:
return None # unknown prefix → auto
for config_key, bot_type, _default_model, display_name in _DISCOVERABLE_MODELS:
if display_name != target_display:
continue
api_key = conf().get(config_key, "")
if not api_key or not api_key.strip():
logger.warning(f"[Vision] tools.vision.model='{user_model}' routes to "
f"'{display_name}' but '{config_key}' is not configured. "
f"Falling back to auto-discovery.")
return None # fall through to auto
try:
from models.bot_factory import create_bot
bot = create_bot(bot_type)
if not hasattr(bot, 'call_vision'):
logger.warning(f"[Vision] '{display_name}' bot does not implement call_vision.")
return None
except Exception as e:
logger.warning(f"[Vision] Failed to create '{display_name}' bot: {e}")
return None
return [VisionProvider(
name=display_name,
api_key="",
api_base="",
model_override=user_model,
use_bot=True,
fallback_bot=bot,
)]
return None
def _append_other_model_providers(self, providers: List[VisionProvider],
preferred_model: Optional[str] = None) -> None:
"""
Auto-discover other models whose API key is configured.
Skip the main model's own bot_type (already covered by MainModel
provider), unless the main model itself does not support vision —
in that case we still want the vendor's dedicated vision model
as a fallback. Also skip bot_types that already appear in the
provider list.
If preferred_model matches a provider's family, use it instead
of that provider's hard-coded default model.
"""
main_bot_type = None
main_bot_supports_vision = False
if self.model and hasattr(self.model, '_resolve_bot_type'):
main_bot_type = self.model._resolve_bot_type(conf().get("model", ""))
main_bot = getattr(self.model, "bot", None)
main_bot_supports_vision = self._main_bot_supports_vision(main_bot)
existing_names = {p.name for p in providers}
preferred_provider = self._infer_provider_from_model(preferred_model) if preferred_model else None
for config_key, bot_type, default_model, display_name in _DISCOVERABLE_MODELS:
if display_name in existing_names:
continue
# Same bot_type as the main model is normally handled by the
# MainModel provider; only skip it here if the main model
# actually supports vision. Otherwise fall through and add
# the vendor's dedicated vision model as a fallback.
if bot_type == main_bot_type and main_bot_supports_vision:
continue
api_key = conf().get(config_key, "")
if not api_key or not api_key.strip():
continue
try:
from models.bot_factory import create_bot
bot = create_bot(bot_type)
if not hasattr(bot, 'call_vision'):
continue
except Exception:
continue
model_for_provider = (preferred_model
if preferred_provider == display_name and preferred_model
else default_model)
provider = VisionProvider(
name=display_name,
api_key="",
api_base="",
model_override=model_for_provider,
use_bot=True,
fallback_bot=bot,
)
# Same vendor as the main bot is the most natural fallback when
# the main model itself does not support vision — promote it to
# the front of the list instead of relying on declaration order.
if bot_type == main_bot_type:
providers.insert(0, provider)
else:
providers.append(provider)
def _main_bot_supports_vision(self, bot) -> bool:
"""
Whether the main bot is known to natively support vision.
Having a `call_vision` method is necessary but not sufficient —
some bots implement the method against an endpoint that does not
actually serve vision models, which causes silent failures when a
vendor-foreign model name is forwarded.
Resolution order:
1. If the bot explicitly declares `supports_vision`, trust it.
This lets bots opt in or out based on their own runtime
configuration (e.g. the currently selected model).
2. Otherwise, fall back to a model-name prefix heuristic: trust
call_vision when the main model looks like an OpenAI family
model or matches a known multimodal vendor prefix.
"""
if bot is None:
return False
if hasattr(bot, "supports_vision"):
return bool(getattr(bot, "supports_vision"))
main_model = (conf().get("model") or "").lower()
if not main_model:
return False
if main_model.startswith(_OPENAI_MODEL_PREFIXES):
return True
return self._infer_provider_from_model(main_model) is not None
def _build_main_model_provider(self) -> Optional[VisionProvider]:
"""
Use the vendor's own model for vision via bot.call_vision.
Gated by _main_bot_supports_vision so non-vision bots (DeepSeek, etc.)
do not get routed vendor-foreign model names.
"""
if not (self.model and hasattr(self.model, 'bot')):
return None
try:
bot = self.model.bot
except Exception:
return None
if not hasattr(bot, 'call_vision'):
return None
if not self._main_bot_supports_vision(bot):
return None
# Use the configured main model name; do NOT inject tools.vision.model
# here, because by the time we reach this branch the tools.vision.model
# routing has already been attempted (and either matched the main bot
# or failed to find a provider).
main_model_name = conf().get("model") or None
return VisionProvider(
name=_MAIN_MODEL_PROVIDER_NAME,
api_key="",
api_base="",
model_override=main_model_name,
use_bot=True,
)
def _build_openai_provider(self, preferred_model: Optional[str] = None) -> Optional[VisionProvider]:
api_key = conf().get("open_ai_api_key") or os.environ.get("OPENAI_API_KEY")
if not api_key:
return None
api_base = (conf().get("open_ai_api_base") or os.environ.get("OPENAI_API_BASE", "")).rstrip("/") \
or "https://api.openai.com/v1"
# Only honor preferred_model when it looks like an OpenAI-family name;
# otherwise the OpenAI endpoint would 400 on a vendor-specific name.
model_override = preferred_model if (
preferred_model and preferred_model.lower().startswith(_OPENAI_MODEL_PREFIXES)
) else None
return VisionProvider(
name="OpenAI",
api_key=api_key,
api_base=self._ensure_v1(api_base),
model_override=model_override,
)
def _build_linkai_provider(self, preferred_model: Optional[str] = None) -> Optional[VisionProvider]:
api_key = conf().get("linkai_api_key") or os.environ.get("LINKAI_API_KEY")
if not api_key:
return None
api_base = (conf().get("linkai_api_base") or os.environ.get("LINKAI_API_BASE", "")).rstrip("/") \
or "https://api.link-ai.tech"
from common.utils import get_cloud_headers
extra = get_cloud_headers(api_key)
extra.pop("Authorization", None)
extra.pop("Content-Type", None)
# LinkAI is a multi-vendor proxy and accepts most model names, so we
# honor any user-configured model name here.
return VisionProvider(
name="LinkAI",
api_key=api_key,
api_base=self._ensure_v1(api_base),
extra_headers=extra,
model_override=preferred_model,
)
def _call_via_bot(self, model: str, question: str, image_content: dict,
provider: Optional[VisionProvider] = None) -> ToolResult:
"""
Call a model's call_vision with vendor-native API format.
Uses the provider's fallback_bot if set, otherwise the main model bot.
Raises VisionAPIError on failure so fallback can proceed.
"""
try:
bot = (provider and provider.fallback_bot) or self.model.bot
except Exception as e:
raise VisionAPIError(f"Cannot access bot: {e}")
# Extract the raw image URL from the OpenAI-format image_content block
image_url = image_content.get("image_url", {}).get("url", "")
if not image_url:
raise VisionAPIError("No image URL in content block")
try:
response = bot.call_vision(
image_url=image_url,
question=question,
model=model,
max_tokens=MAX_TOKENS,
)
except Exception as e:
raise VisionAPIError(f"call_vision failed: {e}")
if response is NotImplemented:
raise VisionAPIError("Bot does not support vision")
if isinstance(response, dict) and response.get("error"):
raise VisionAPIError(f"API error - {response.get('message', 'Unknown')}")
content = response.get("content", "") if isinstance(response, dict) else ""
if not content:
raise VisionAPIError("Empty response from bot")
usage_info = response.get("usage", {}) if isinstance(response, dict) else {}
# Use the actual model name from the bot response if available
actual_model = response.get("model", model) if isinstance(response, dict) else model
provider_name = provider.name if provider else _MAIN_MODEL_PROVIDER_NAME
return ToolResult.success({
"model": actual_model,
"provider": provider_name,
"content": content,
"usage": usage_info,
})
@staticmethod
def _ensure_v1(api_base: str) -> str:
"""Append /v1 if the base URL doesn't already end with a versioned path."""
if not api_base:
return api_base
# Already has /v1 or similar version suffix
if api_base.rstrip("/").split("/")[-1].startswith("v"):
return api_base
return api_base.rstrip("/") + "/v1"
def _build_image_content(self, image: str) -> dict:
"""
Build the image_url content block.
Both remote URLs and local files are converted to base64 data URLs
so every bot backend can consume them without extra downloads.
"""
if image.startswith(("http://", "https://")):
return self._download_to_data_url(image)
if not os.path.isfile(image):
raise FileNotFoundError(f"Image file not found: {image}")
ext = image.rsplit(".", 1)[-1].lower() if "." in image else ""
mime_type = SUPPORTED_EXTENSIONS.get(ext)
if not mime_type:
raise ValueError(
f"Unsupported image format '.{ext}'. "
f"Supported: {', '.join(SUPPORTED_EXTENSIONS.keys())}"
)
file_path = self._maybe_compress(image)
try:
with open(file_path, "rb") as f:
b64 = base64.b64encode(f.read()).decode("ascii")
finally:
if file_path != image and os.path.exists(file_path):
os.remove(file_path)
data_url = f"data:{mime_type};base64,{b64}"
return {"type": "image_url", "image_url": {"url": data_url}}
@staticmethod
def _download_to_data_url(url: str) -> dict:
"""Download a remote image and return it as a base64 data URL."""
resp = requests.get(url, timeout=30)
if resp.status_code != 200:
raise VisionAPIError(f"Failed to download image: HTTP {resp.status_code}")
content_type = resp.headers.get("Content-Type", "image/jpeg").split(";")[0].strip()
if not content_type.startswith("image/"):
content_type = "image/jpeg"
b64 = base64.b64encode(resp.content).decode("ascii")
data_url = f"data:{content_type};base64,{b64}"
return {"type": "image_url", "image_url": {"url": data_url}}
@staticmethod
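def _maybe_compress(path: str) -> str:
"""Best-effort compression to under COMPRESS_THRESHOLD with max long-edge 1536px.
Tries `sips` first, then ImageMagick `convert`. Returns the original path
when the file is already small enough or neither tool succeeds; otherwise
returns a temporary JPEG path, even if it is still above the threshold.
"""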
file_size = os.path.getsize(path)
if file_size <= COMPRESS_THRESHOLD:
return path
tmp = tempfile.NamedTemporaryFile(suffix=".jpg", delete=False)
tmp.close()
def _try_sips(max_dim: str, quality: str) -> bool:
try:
subprocess.run(
["sips", "-Z", max_dim, "-s", "formatOptions", quality,
path, "--out", tmp.name],
capture_output=True, check=True,
)
return True
except (FileNotFoundError, subprocess.CalledProcessError):
return False
def _try_convert(max_dim: str, quality: str) -> bool:
try:
subprocess.run(
["convert", path, "-resize", f"{max_dim}x{max_dim}>",
"-quality", quality, tmp.name],
capture_output=True, check=True,
)
return True
except (FileNotFoundError, subprocess.CalledProcessError):
return False
attempts = [
("1536", "85"),
("1536", "70"),
("1536", "50"),
]
for max_dim, quality in attempts:
ok = _try_sips(max_dim, quality) or _try_convert(max_dim, quality)
if not ok:
continue
new_size = os.path.getsize(tmp.name)
logger.debug(f"[Vision] Compressed image "
f"({file_size // 1024}KB -> {new_size // 1024}KB, "
f"max_dim={max_dim}, q={quality})")
if new_size <= COMPRESS_THRESHOLD:
return tmp.name
if os.path.exists(tmp.name) and os.path.getsize(tmp.name) > 0:
return tmp.name
os.remove(tmp.name)
return path
def _call_api(self, provider: VisionProvider, model: str,
question: str, image_content: dict) -> ToolResult:
"""
Call a single provider's Vision API.
Raises VisionAPIError on recoverable failures so the caller can try
the next provider.
"""
payload = {
"model": model,
"messages": [
{
"role": "user",
"content": [
{"type": "text", "text": question},
image_content,
],
}
],
}
headers = {
"Authorization": f"Bearer {provider.api_key}",
"Content-Type": "application/json",
**provider.extra_headers,
}
resp = requests.post(
f"{provider.api_base}/chat/completions",
headers=headers,
json=payload,
timeout=DEFAULT_TIMEOUT,
)
if resp.status_code != 200:
raise VisionAPIError(f"HTTP {resp.status_code}: {resp.text[:200]}")
data = resp.json()
if "error" in data:
msg = data["error"].get("message", "Unknown API error")
raise VisionAPIError(f"API error - {msg}")
content = ""
choices = data.get("choices", [])
if choices:
content = choices[0].get("message", {}).get("content", "")
usage = data.get("usage", {})
result = {
"model": model,
"provider": provider.name,
"content": content,
"usage": {
"prompt_tokens": usage.get("prompt_tokens", 0),
"completion_tokens": usage.get("completion_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
},
}
return ToolResult.success(result)
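
The prefix-based routing used above (case-insensitive, longest prefix wins, OpenAI-family names checked separately) can be sketched in isolation. This is a minimal illustration, not the module's code: `PREFIX_TO_PROVIDER`, `infer_provider`, and `is_openai_family` are hypothetical stand-ins for `_MODEL_PREFIX_TO_PROVIDER`, `_infer_provider_from_model`, and the `_OPENAI_MODEL_PREFIXES` check, and the table is a trimmed subset.

```python
from typing import Optional

# Hypothetical, trimmed-down copy of the prefix table (illustration only).
PREFIX_TO_PROVIDER = [
    ("kimi-", "Moonshot"),
    ("moonshot-", "Moonshot"),
    ("qwen", "DashScope"),
    ("claude-", "Claude"),
    ("glm-", "ZhipuAI"),
]
OPENAI_PREFIXES = ("gpt-", "o1-", "o3-", "o4-", "chatgpt-")


def infer_provider(model_name: str) -> Optional[str]:
    """Return the provider display name for a model, or None if no prefix matches."""
    if not model_name:
        return None
    lower = model_name.lower()
    # Longest prefix first, so a more specific rule beats a shorter one.
    for prefix, display in sorted(PREFIX_TO_PROVIDER, key=lambda x: -len(x[0])):
        if lower.startswith(prefix):
            return display
    return None


def is_openai_family(model_name: str) -> bool:
    """OpenAI-style names are checked separately from the vendor table."""
    return model_name.lower().startswith(OPENAI_PREFIXES)


if __name__ == "__main__":
    for name in ("Qwen3.6-plus", "KIMI-latest", "gpt-4o", "my-custom-model"):
        print(name, "->", infer_provider(name), "| openai-family:", is_openai_family(name))
```

A name that matches neither table (here the made-up `my-custom-model`) yields `None`, which is the case where pinning `tools.vision.provider` explicitly becomes useful.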


@@ -0,0 +1,444 @@
"""
Web Fetch tool - Fetch and extract readable content from web pages and remote files.
Supports:
- HTML web pages: extracts readable text content
- Document files (PDF, Word, TXT, Markdown, etc.): downloads to workspace/tmp and parses content
"""
import os
import re
import uuid
from typing import Dict, Any, Optional, Set
from urllib.parse import urlparse, unquote
import requests
from agent.tools.base_tool import BaseTool, ToolResult
from agent.tools.utils.truncate import truncate_head, format_size
from common.log import logger
DEFAULT_TIMEOUT = 30
MAX_FILE_SIZE = 50 * 1024 * 1024 # 50MB
DEFAULT_HEADERS = {
"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
"Accept": "*/*",
}
# Supported document file extensions
PDF_SUFFIXES: Set[str] = {".pdf"}
WORD_SUFFIXES: Set[str] = {".docx"}
TEXT_SUFFIXES: Set[str] = {".txt", ".md", ".markdown", ".rst", ".csv", ".tsv", ".log"}
SPREADSHEET_SUFFIXES: Set[str] = {".xls", ".xlsx"}
PPT_SUFFIXES: Set[str] = {".ppt", ".pptx"}
ALL_DOC_SUFFIXES = PDF_SUFFIXES | WORD_SUFFIXES | TEXT_SUFFIXES | SPREADSHEET_SUFFIXES | PPT_SUFFIXES
_CHARSET_RE = re.compile(r'charset\s*=\s*["\']?\s*([\w\-]+)', re.IGNORECASE)
_META_CHARSET_RE = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?\s*([\w\-]+)', re.IGNORECASE)
_META_HTTP_EQUIV_RE = re.compile(
rb'<meta[^>]+http-equiv\s*=\s*["\']?Content-Type["\']?[^>]+content\s*=\s*["\'][^"\']*charset=([\w\-]+)',
re.IGNORECASE,
)
def _extract_charset_from_content_type(content_type: str) -> Optional[str]:
"""Extract charset from Content-Type header value."""
m = _CHARSET_RE.search(content_type)
return m.group(1) if m else None
def _extract_charset_from_html_meta(raw_bytes: bytes) -> Optional[str]:
"""Extract charset from HTML <meta> tags in the first few KB of raw bytes."""
m = _META_CHARSET_RE.search(raw_bytes)
if m:
return m.group(1).decode("ascii", errors="ignore")
m = _META_HTTP_EQUIV_RE.search(raw_bytes)
if m:
return m.group(1).decode("ascii", errors="ignore")
return None
def _get_url_suffix(url: str) -> str:
"""Extract file extension from URL path, ignoring query params."""
path = urlparse(url).path
return os.path.splitext(path)[-1].lower()
def _is_document_url(url: str) -> bool:
"""Check if URL points to a downloadable document file."""
suffix = _get_url_suffix(url)
return suffix in ALL_DOC_SUFFIXES
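
As a rough illustration of the suffix check above, the sketch below mirrors the two helpers with hypothetical names (`url_suffix`, `is_document_url`) and an assumed, shortened suffix set. It shows that query strings and fragments are ignored, and that a document served from an extension-less path is not recognized by suffix alone.

```python
import os
from urllib.parse import urlparse

# Hypothetical subset of the document suffixes (illustration only).
DOC_SUFFIXES = {".pdf", ".docx", ".txt", ".md", ".csv", ".xlsx", ".pptx"}


def url_suffix(url: str) -> str:
    """Lower-cased extension of the URL path; query and fragment are ignored."""
    path = urlparse(url).path
    return os.path.splitext(path)[-1].lower()


def is_document_url(url: str) -> bool:
    """True when the URL path ends in a known document extension."""
    return url_suffix(url) in DOC_SUFFIXES


if __name__ == "__main__":
    for u in (
        "https://example.com/files/Report.PDF?token=abc#page=2",
        "https://example.com/article",
        "https://example.com/download?file=report.pdf",
    ):
        print(u, "->", repr(url_suffix(u)), is_document_url(u))
```

The last URL carries the file name only in its query string, so the suffix check misses it; in the module that case is left to the Content-Type check in `_fetch_webpage`.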
class WebFetch(BaseTool):
"""Tool for fetching web pages and remote document files"""
name: str = "web_fetch"
description: str = (
"Fetch content from a http/https URL. For web pages, extracts readable text. "
"For document files (PDF, Word, TXT, Markdown, Excel, PPT), downloads and parses the file content. "
"Supported file types: .pdf, .docx, .txt, .md, .csv, .xls, .xlsx, .ppt, .pptx"
)
params: dict = {
"type": "object",
"properties": {
"url": {
"type": "string",
"description": "The HTTP/HTTPS URL to fetch (web page or document file link)"
}
},
"required": ["url"]
}
def __init__(self, config: dict = None):
self.config = config or {}
self.cwd = self.config.get("cwd", os.getcwd())
def execute(self, args: Dict[str, Any]) -> ToolResult:
url = args.get("url", "").strip()
if not url:
return ToolResult.fail("Error: 'url' parameter is required")
parsed = urlparse(url)
if parsed.scheme not in ("http", "https"):
return ToolResult.fail("Error: Invalid URL (must start with http:// or https://)")
if _is_document_url(url):
return self._fetch_document(url)
return self._fetch_webpage(url)
# ---- Web page fetching ----
def _fetch_webpage(self, url: str) -> ToolResult:
"""Fetch and extract readable text from an HTML web page."""
parsed = urlparse(url)
try:
response = requests.get(
url,
headers=DEFAULT_HEADERS,
timeout=DEFAULT_TIMEOUT,
allow_redirects=True,
)
response.raise_for_status()
except requests.Timeout:
return ToolResult.fail(f"Error: Request timed out after {DEFAULT_TIMEOUT}s")
except requests.ConnectionError:
return ToolResult.fail(f"Error: Failed to connect to {parsed.netloc}")
except requests.HTTPError as e:
return ToolResult.fail(f"Error: HTTP {e.response.status_code} for URL: {url}")
except Exception as e:
return ToolResult.fail(f"Error: Failed to fetch URL: {e}")
content_type = response.headers.get("Content-Type", "")
if self._is_binary_content_type(content_type) and not _is_document_url(url):
return self._handle_download_by_content_type(url, response, content_type)
response.encoding = self._detect_encoding(response)
html = response.text
title = self._extract_title(html)
text = self._extract_text(html)
return ToolResult.success(f"Title: {title}\n\nContent:\n{text}")
# ---- Document fetching ----
def _fetch_document(self, url: str) -> ToolResult:
"""Download a document file and extract its text content."""
suffix = _get_url_suffix(url)
parsed = urlparse(url)
filename = self._extract_filename(url)
tmp_dir = self._ensure_tmp_dir()
local_path = os.path.join(tmp_dir, filename)
logger.info(f"[WebFetch] Downloading document: {url} -> {local_path}")
try:
response = requests.get(
url,
headers=DEFAULT_HEADERS,
timeout=DEFAULT_TIMEOUT,
stream=True,
allow_redirects=True,
)
response.raise_for_status()
content_length = int(response.headers.get("Content-Length", 0))
if content_length > MAX_FILE_SIZE:
return ToolResult.fail(
f"Error: File too large ({format_size(content_length)} > {format_size(MAX_FILE_SIZE)})"
)
downloaded = 0
with open(local_path, "wb") as f:
for chunk in response.iter_content(chunk_size=8192):
downloaded += len(chunk)
if downloaded > MAX_FILE_SIZE:
f.close()
os.remove(local_path)
return ToolResult.fail(
f"Error: File too large (>{format_size(MAX_FILE_SIZE)}), download aborted"
)
f.write(chunk)
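except requests.Timeout:
# A partial file may already exist if the stream broke mid-download.
self._cleanup_file(local_path)
return ToolResult.fail(f"Error: Download timed out after {DEFAULT_TIMEOUT}s")
except requests.ConnectionError:
self._cleanup_file(local_path)
return ToolResult.fail(f"Error: Failed to connect to {parsed.netloc}")
except requests.HTTPError as e:
self._cleanup_file(local_path)
return ToolResult.fail(f"Error: HTTP {e.response.status_code} for URL: {url}")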
except Exception as e:
self._cleanup_file(local_path)
return ToolResult.fail(f"Error: Failed to download file: {e}")
try:
text = self._parse_document(local_path, suffix)
except Exception as e:
self._cleanup_file(local_path)
return ToolResult.fail(f"Error: Failed to parse document: {e}")
if not text or not text.strip():
file_size = os.path.getsize(local_path)
return ToolResult.success(
f"File downloaded to: {local_path} ({format_size(file_size)})\n"
f"No text content could be extracted. The file may contain only images or be encrypted."
)
truncation = truncate_head(text)
result_text = truncation.content
file_size = os.path.getsize(local_path)
header = f"[Document: {filename} | Size: {format_size(file_size)} | Saved to: {local_path}]\n\n"
if truncation.truncated:
header += f"[Content truncated: showing {truncation.output_lines} of {truncation.total_lines} lines]\n\n"
return ToolResult.success(header + result_text)
def _parse_document(self, file_path: str, suffix: str) -> str:
"""Parse document file and return extracted text."""
if suffix in PDF_SUFFIXES:
return self._parse_pdf(file_path)
elif suffix in WORD_SUFFIXES:
return self._parse_word(file_path)
elif suffix in TEXT_SUFFIXES:
return self._parse_text(file_path)
elif suffix in SPREADSHEET_SUFFIXES:
return self._parse_spreadsheet(file_path)
elif suffix in PPT_SUFFIXES:
return self._parse_ppt(file_path)
else:
return self._parse_text(file_path)
def _parse_pdf(self, file_path: str) -> str:
"""Extract text from PDF using pypdf."""
try:
from pypdf import PdfReader
except ImportError:
raise ImportError("pypdf library is required for PDF parsing. Install with: pip install pypdf")
reader = PdfReader(file_path)
text_parts = []
for page_num, page in enumerate(reader.pages, 1):
page_text = page.extract_text()
if page_text and page_text.strip():
text_parts.append(f"--- Page {page_num}/{len(reader.pages)} ---\n{page_text}")
return "\n\n".join(text_parts)
def _parse_word(self, file_path: str) -> str:
"""Extract text from Word documents (.docx)."""
try:
from docx import Document
except ImportError:
raise ImportError(
"python-docx library is required for .docx parsing. Install with: pip install python-docx"
)
doc = Document(file_path)
paragraphs = [p.text for p in doc.paragraphs if p.text.strip()]
return "\n\n".join(paragraphs)
def _parse_text(self, file_path: str) -> str:
"""Read plain text files (txt, md, csv, etc.)."""
encodings = ["utf-8", "utf-8-sig", "gbk", "gb2312", "latin-1"]
for enc in encodings:
try:
with open(file_path, "r", encoding=enc) as f:
return f.read()
except (UnicodeDecodeError, UnicodeError):
continue
raise ValueError(f"Unable to decode file with any supported encoding: {encodings}")
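def _parse_spreadsheet(self, file_path: str) -> str:
"""Extract text from Excel files via openpyxl.
Note: openpyxl reads .xlsx; legacy binary .xls files are not supported
by it and will surface as a parse error.
"""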
try:
import openpyxl
except ImportError:
raise ImportError(
"openpyxl library is required for .xlsx parsing. Install with: pip install openpyxl"
)
wb = openpyxl.load_workbook(file_path, read_only=True, data_only=True)
result_parts = []
for sheet_name in wb.sheetnames:
ws = wb[sheet_name]
rows = []
for row in ws.iter_rows(values_only=True):
cells = [str(c) if c is not None else "" for c in row]
if any(cells):
rows.append(" | ".join(cells))
if rows:
result_parts.append(f"--- Sheet: {sheet_name} ---\n" + "\n".join(rows))
wb.close()
return "\n\n".join(result_parts)
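def _parse_ppt(self, file_path: str) -> str:
"""Extract text from PowerPoint files via python-pptx.
Note: python-pptx reads .pptx; legacy binary .ppt files are not supported
by it and will surface as a parse error.
"""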
try:
from pptx import Presentation
except ImportError:
raise ImportError(
"python-pptx library is required for .pptx parsing. Install with: pip install python-pptx"
)
prs = Presentation(file_path)
text_parts = []
for slide_num, slide in enumerate(prs.slides, 1):
slide_texts = []
for shape in slide.shapes:
if shape.has_text_frame:
for paragraph in shape.text_frame.paragraphs:
text = paragraph.text.strip()
if text:
slide_texts.append(text)
if slide_texts:
text_parts.append(f"--- Slide {slide_num}/{len(prs.slides)} ---\n" + "\n".join(slide_texts))
return "\n\n".join(text_parts)
# ---- Encoding detection ----
@staticmethod
def _detect_encoding(response: requests.Response) -> str:
"""Detect response encoding with priority: Content-Type header > HTML meta > chardet > utf-8."""
# 1. Check Content-Type header for explicit charset
content_type = response.headers.get("Content-Type", "")
charset = _extract_charset_from_content_type(content_type)
if charset:
return charset
# 2. Scan raw bytes for HTML meta charset declaration
raw = response.content[:4096]
charset = _extract_charset_from_html_meta(raw)
if charset:
return charset
# 3. Use apparent_encoding (chardet-based detection) if confident enough
apparent = response.apparent_encoding
if apparent:
apparent_lower = apparent.lower()
# Trust only Unicode, ASCII, CJK and Windows encodings reported by chardet
trusted_prefixes = ("utf", "gb", "big5", "euc", "shift_jis", "iso-2022", "windows", "ascii")
if any(apparent_lower.startswith(p) for p in trusted_prefixes):
return apparent
# 4. Fallback
return "utf-8"
# ---- Helper methods ----
def _ensure_tmp_dir(self) -> str:
"""Ensure workspace/tmp directory exists and return its path."""
tmp_dir = os.path.join(self.cwd, "tmp")
os.makedirs(tmp_dir, exist_ok=True)
return tmp_dir
def _extract_filename(self, url: str) -> str:
"""Extract a safe filename from URL, with a short UUID prefix to avoid collisions."""
path = urlparse(url).path
basename = os.path.basename(unquote(path))
if not basename or basename == "/":
basename = "downloaded_file"
# Sanitize: keep only safe chars
basename = re.sub(r'[^\w.\-]', '_', basename)
short_id = uuid.uuid4().hex[:8]
return f"{short_id}_{basename}"
@staticmethod
def _cleanup_file(path: str):
"""Remove a file if it exists, ignoring errors."""
try:
if os.path.exists(path):
os.remove(path)
except Exception:
pass
@staticmethod
def _is_binary_content_type(content_type: str) -> bool:
"""Check if Content-Type indicates a binary/document response."""
binary_types = [
"application/pdf",
"application/vnd.openxmlformats",
"application/vnd.ms-excel",
"application/vnd.ms-powerpoint",
"application/octet-stream",
]
ct_lower = content_type.lower()
return any(bt in ct_lower for bt in binary_types)
def _handle_download_by_content_type(self, url: str, response: requests.Response, content_type: str) -> ToolResult:
"""Handle a URL that returned binary content instead of HTML."""
ct_lower = content_type.lower()
suffix_map = {
"application/pdf": ".pdf",
"application/vnd.openxmlformats-officedocument.wordprocessingml": ".docx",
"application/vnd.ms-excel": ".xls",
"application/vnd.openxmlformats-officedocument.spreadsheetml": ".xlsx",
"application/vnd.ms-powerpoint": ".ppt",
"application/vnd.openxmlformats-officedocument.presentationml": ".pptx",
}
detected_suffix = None
for ct_prefix, ext in suffix_map.items():
if ct_prefix in ct_lower:
detected_suffix = ext
break
if detected_suffix and detected_suffix in ALL_DOC_SUFFIXES:
# Re-fetch as document
return self._fetch_document(url if _get_url_suffix(url) in ALL_DOC_SUFFIXES
else self._rewrite_url_with_suffix(url, detected_suffix))
return ToolResult.fail(f"Error: URL returned binary content ({content_type}), not a supported document type")
@staticmethod
def _rewrite_url_with_suffix(url: str, suffix: str) -> str:
"""Append a suffix to the URL path so _get_url_suffix works correctly."""
parsed = urlparse(url)
new_path = parsed.path.rstrip("/") + suffix
return parsed._replace(path=new_path).geturl()
# ---- HTML extraction (unchanged) ----
@staticmethod
def _extract_title(html: str) -> str:
match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
return match.group(1).strip() if match else "Untitled"
@staticmethod
def _extract_text(html: str) -> str:
text = re.sub(r"<script[^>]*>.*?</script>", "", html, flags=re.IGNORECASE | re.DOTALL)
text = re.sub(r"<style[^>]*>.*?</style>", "", text, flags=re.IGNORECASE | re.DOTALL)
text = re.sub(r"<[^>]+>", "", text)
text = text.replace("&lt;", "<").replace("&gt;", ">")
text = text.replace("&quot;", '"').replace("&#39;", "'").replace("&nbsp;", " ")
# Decode &amp; last so sequences such as "&amp;lt;" are not unescaped twice.
text = text.replace("&amp;", "&")
text = re.sub(r"[^\S\n]+", " ", text)
text = re.sub(r"\n{3,}", "\n\n", text)
lines = [line.strip() for line in text.splitlines()]
text = "\n".join(lines)
return text.strip()
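
The `_detect_encoding` docstring above gives the priority Content-Type header > HTML meta > chardet > utf-8, but the two helper functions it calls are outside this hunk. A minimal sketch of the first two steps follows, assuming regex-based helpers. The names `charset_from_content_type`, `charset_from_html_meta` and `pick_encoding` are hypothetical stand-ins, not the project's code, and the chardet step is left out.

```python
import re
from typing import Optional

# Hypothetical helpers; the project's own versions are not shown in this diff.
_CT_CHARSET = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)
_META_CHARSET = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)


def charset_from_content_type(content_type: str) -> Optional[str]:
    """Return the charset named in a Content-Type header value, if any."""
    match = _CT_CHARSET.search(content_type or "")
    return match.group(1).lower() if match else None


def charset_from_html_meta(raw: bytes) -> Optional[str]:
    """Return the charset declared in a <meta> tag near the start of a page."""
    match = _META_CHARSET.search(raw[:4096])
    return match.group(1).decode("ascii", errors="ignore").lower() if match else None


def pick_encoding(content_type: str, raw: bytes, default: str = "utf-8") -> str:
    """Apply the header > meta > default priority (chardet step omitted)."""
    return charset_from_content_type(content_type) or charset_from_html_meta(raw) or default
```

Calling `pick_encoding(response.headers.get("Content-Type", ""), response.content)` would mirror steps 1, 2 and 4 of the method above. An explicit header charset wins even when the page's meta tag disagrees.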


@@ -0,0 +1,3 @@
from agent.tools.web_search.web_search import WebSearch
__all__ = ["WebSearch"]


@@ -0,0 +1,487 @@
"""Web Search tool. Supports four backends with a unified response format:
- bocha (https://open.bochaai.com)
- zhipu (https://docs.bigmodel.cn/cn/guide/tools/web-search)
- qianfan (https://cloud.baidu.com/doc/qianfan/s/2mh4su4uy)
- linkai (https://link-ai.tech, fallback)
Provider selection
- strategy 'auto' (default): pick the first configured provider in the
canonical order [bocha, qianfan, zhipu, linkai]. When the caller passes
an explicit `provider` it overrides the pick; an invalid/unconfigured
one falls back to the auto order (logged as a warning).
- strategy 'fixed': use the configured provider; if its credential is
missing at call time, silently fall back to auto order (no card hint).
Credentials
- bocha : tools.web_search.bocha_api_key -> env BOCHA_API_KEY
- zhipu : conf.zhipu_ai_api_key -> env ZHIPUAI_API_KEY
- qianfan : conf.qianfan_api_key -> env QIANFAN_API_KEY
- linkai : conf.linkai_api_key -> env LINKAI_API_KEY
"""
import json
import os
from typing import Any, Dict, List, Optional
import requests
from agent.tools.base_tool import BaseTool, ToolResult
from common.log import logger
from config import conf
DEFAULT_TIMEOUT = 30
# Canonical fallback order. Empirically ordered by Chinese real-time
# quality + relevance: bocha (best overall), qianfan (best for hot news),
# zhipu (strong on long-form articles), linkai (cloud aggregator, last
# resort).
PROVIDER_ORDER = ("bocha", "qianfan", "zhipu", "linkai")
PROVIDER_LABELS = {
"bocha": "Bocha",
"zhipu": "Zhipu",
"qianfan": "Baidu Qianfan",
"linkai": "LinkAI",
}
def _tools_web_search_conf() -> dict:
"""Return the tools.web_search config block (dict-like)."""
tools_cfg = conf().get("tools") or {}
if not isinstance(tools_cfg, dict):
return {}
block = tools_cfg.get("web_search") or {}
return block if isinstance(block, dict) else {}
def _get_api_key(provider: str) -> str:
"""Resolve API key for a provider, with conf -> env fallback."""
if provider == "bocha":
key = (_tools_web_search_conf().get("bocha_api_key") or "").strip()
return key or os.environ.get("BOCHA_API_KEY", "").strip()
if provider == "zhipu":
key = (conf().get("zhipu_ai_api_key") or "").strip()
return key or os.environ.get("ZHIPUAI_API_KEY", "").strip()
if provider == "qianfan":
key = (conf().get("qianfan_api_key") or "").strip()
return key or os.environ.get("QIANFAN_API_KEY", "").strip()
if provider == "linkai":
key = (conf().get("linkai_api_key") or "").strip()
return key or os.environ.get("LINKAI_API_KEY", "").strip()
return ""
def configured_providers() -> List[str]:
"""Return configured providers in canonical order."""
return [p for p in PROVIDER_ORDER if _get_api_key(p)]
def _configured_strategy() -> str:
return (_tools_web_search_conf().get("strategy") or "auto").strip().lower()
def _configured_provider() -> str:
return (_tools_web_search_conf().get("provider") or "").strip().lower()
class WebSearch(BaseTool):
"""Tool for searching the web across multiple providers."""
name: str = "web_search"
description: str = "Search the web for real-time information. Returns titles, URLs, and snippets."
params: dict = {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search query string"
},
"count": {
"type": "integer",
"description": "Number of results to return (1-50, default: 10)"
},
"freshness": {
"type": "string",
"description": (
"Time range filter. Options: "
"'noLimit' (default), 'oneDay', 'oneWeek', 'oneMonth', 'oneYear', "
"or date range like '2025-01-01..2025-02-01'"
)
},
"summary": {
"type": "boolean",
"description": "Whether to include text summary for each result (default: false)"
}
},
"required": ["query"]
}
def __init__(self, config: dict = None):
self.config = config or {}
@staticmethod
def is_available() -> bool:
"""Tool is offered to the agent when at least one provider has a key."""
return bool(configured_providers())
@classmethod
def get_json_schema(cls) -> dict:
"""Augment the static schema with a `provider` field — only when the
user has ≥2 providers configured AND strategy is 'auto'. Otherwise
the backend picks silently and exposing the field would only waste
the agent's tokens."""
schema = {
"name": cls.name,
"description": cls.description,
"parameters": json.loads(json.dumps(cls.params)), # deep copy
}
if _configured_strategy() != "auto":
return schema
available = configured_providers()
if len(available) < 2:
return schema
schema["parameters"]["properties"]["provider"] = {
"type": "string",
"enum": available,
"description": "Optional. Specifies the search backend. You may switch between providers when the user wants results from a particular source or from multiple sources.",
}
return schema
# ------------------------------------------------------------------
# Provider resolution
# ------------------------------------------------------------------
def _resolve_provider(self, requested: Optional[str]) -> Optional[str]:
"""Pick a provider for this call.
Priority: caller-supplied (if configured) > fixed strategy (if
configured) > first configured in PROVIDER_ORDER. Silent fallback
when the desired one has no key.
"""
available = configured_providers()
if not available:
return None
if requested:
req = requested.strip().lower()
if req in available:
return req
logger.warning(f"[WebSearch] requested provider '{requested}' unavailable, falling back")
if _configured_strategy() == "fixed":
pinned = _configured_provider()
if pinned in available:
return pinned
if pinned:
logger.warning(f"[WebSearch] pinned provider '{pinned}' unavailable, falling back to auto")
return available[0]
@staticmethod
def _resolution_reason(requested: Optional[str], chosen: str) -> str:
"""Human-readable explanation for why `chosen` won the resolver."""
if requested and requested.strip().lower() == chosen:
return "caller-requested"
strategy = _configured_strategy()
if strategy == "fixed" and _configured_provider() == chosen:
return "fixed-strategy"
return "auto-fallback"
# ------------------------------------------------------------------
# Entry point
# ------------------------------------------------------------------
def execute(self, args: Dict[str, Any]) -> ToolResult:
query = (args.get("query") or "").strip()
if not query:
return ToolResult.fail("Error: 'query' parameter is required")
count = args.get("count", 10)
freshness = args.get("freshness", "noLimit")
summary = args.get("summary", False)
if not isinstance(count, int) or count < 1 or count > 50:
count = 10
requested = args.get("provider")
provider = self._resolve_provider(requested)
if not provider:
return ToolResult.fail(
"Error: No search provider configured. "
"Configure one of BOCHA_API_KEY / zhipu_ai_api_key / qianfan_api_key / linkai_api_key."
)
# Always log the routing decision so multi-provider deployments can
# tell at a glance which backend served any given query.
available = configured_providers()
reason = self._resolution_reason(requested, provider)
q_preview = query if len(query) <= 60 else (query[:57] + "...")
logger.info(
f"[WebSearch] provider={provider} reason={reason} "
f"available={list(available)} query={q_preview!r} count={count} freshness={freshness}"
)
try:
if provider == "bocha":
return self._search_bocha(query, count, freshness, summary)
if provider == "zhipu":
return self._search_zhipu(query, count, freshness)
if provider == "qianfan":
return self._search_qianfan(query, count, freshness)
if provider == "linkai":
return self._search_linkai(query, count, freshness)
return ToolResult.fail(f"Error: Unknown provider '{provider}'")
except requests.Timeout:
return ToolResult.fail(f"Error: Search request timed out after {DEFAULT_TIMEOUT}s")
except requests.ConnectionError:
return ToolResult.fail("Error: Failed to connect to search API")
except Exception as e:
logger.error(f"[WebSearch] Unexpected error ({provider}): {e}", exc_info=True)
return ToolResult.fail(f"Error: Search failed - {str(e)}")
# ------------------------------------------------------------------
# Bocha
# ------------------------------------------------------------------
def _search_bocha(self, query: str, count: int, freshness: str, summary: bool) -> ToolResult:
api_key = _get_api_key("bocha")
url = "https://api.bochaai.com/v1/web-search"
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json",
"Accept": "application/json",
}
payload = {"query": query, "count": count, "freshness": freshness, "summary": summary}
logger.debug(f"[WebSearch] bocha: query='{query}', count={count}")
resp = requests.post(url, headers=headers, json=payload, timeout=DEFAULT_TIMEOUT)
if resp.status_code == 401:
return ToolResult.fail("Error: Invalid bocha API key.")
if resp.status_code == 403:
return ToolResult.fail("Error: bocha API — insufficient balance. Top up at https://open.bochaai.com")
if resp.status_code == 429:
return ToolResult.fail("Error: bocha API rate limit reached.")
if resp.status_code != 200:
return ToolResult.fail(f"Error: bocha API returned HTTP {resp.status_code}")
data = resp.json()
api_code = data.get("code")
if api_code is not None and api_code != 200:
msg = data.get("msg") or "Unknown error"
return ToolResult.fail(f"Error: bocha API error (code={api_code}): {msg}")
pages = (data.get("data") or {}).get("webPages", {}).get("value", []) or []
results = []
for p in pages:
item = {
"title": p.get("name", ""),
"url": p.get("url", ""),
"snippet": p.get("snippet", ""),
"siteName": p.get("siteName", ""),
"datePublished": p.get("datePublished") or p.get("dateLastCrawled", ""),
}
if p.get("summary"):
item["summary"] = p["summary"]
results.append(item)
total = (data.get("data") or {}).get("webPages", {}).get("totalEstimatedMatches", len(results))
return ToolResult.success({
"query": query, "backend": "bocha",
"total": total, "count": len(results), "results": results,
})
# ------------------------------------------------------------------
# Zhipu
# ------------------------------------------------------------------
def _search_zhipu(self, query: str, count: int, freshness: str) -> ToolResult:
api_key = _get_api_key("zhipu")
api_base = (conf().get("zhipu_ai_api_base") or "https://open.bigmodel.cn/api/paas/v4").rstrip("/")
url = f"{api_base}/web_search"
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json",
}
# Zhipu Web Search expects `search_query` <= 70 chars; truncate
# gracefully so a long agent-supplied query doesn't get rejected.
trimmed_query = (query or "")[:70]
engine = (_tools_web_search_conf().get("zhipu_search_engine") or "search_pro").strip().lower()
if engine not in ("search_std", "search_pro", "search_pro_sogou", "search_pro_quark"):
engine = "search_pro"
payload: Dict[str, Any] = {
"search_engine": engine,
"search_query": trimmed_query,
"search_intent": False,
"count": max(1, min(int(count or 10), 50)),
"search_recency_filter": freshness if freshness in (
"oneDay", "oneWeek", "oneMonth", "oneYear", "noLimit"
) else "noLimit",
}
content_size = (_tools_web_search_conf().get("zhipu_content_size") or "").strip().lower()
if content_size in ("medium", "high"):
payload["content_size"] = content_size
logger.debug(f"[WebSearch] zhipu: query='{trimmed_query}', count={payload['count']}, engine={engine}")
resp = requests.post(url, headers=headers, json=payload, timeout=DEFAULT_TIMEOUT)
if resp.status_code == 401:
return ToolResult.fail("Error: Invalid Zhipu API key.")
if resp.status_code != 200:
return ToolResult.fail(f"Error: Zhipu API returned HTTP {resp.status_code}: {resp.text[:200]}")
data = resp.json()
# Business-level errors (1701/1702/1703 etc.) come back as
# {"error": {"code","message"}} even on HTTP 200.
if isinstance(data, dict) and data.get("error"):
err = data["error"] or {}
return ToolResult.fail(f"Error: Zhipu returned {err.get('code')}: {err.get('message','')}")
items = data.get("search_result") or (data.get("data") or {}).get("search_result") or []
results = []
for it in items:
results.append({
"title": it.get("title", ""),
"url": it.get("link") or it.get("url", ""),
"snippet": it.get("content") or it.get("snippet", ""),
"siteName": it.get("media") or it.get("siteName", ""),
"datePublished": it.get("publish_date") or it.get("datePublished", ""),
})
return ToolResult.success({
"query": query, "backend": "zhipu",
"total": len(results), "count": len(results), "results": results,
})
# ------------------------------------------------------------------
# Qianfan (Baidu)
# ------------------------------------------------------------------
def _search_qianfan(self, query: str, count: int, freshness: str) -> ToolResult:
api_key = _get_api_key("qianfan")
api_base = (conf().get("qianfan_api_base") or "https://qianfan.baidubce.com/v2").rstrip("/")
url = f"{api_base}/ai_search/web_search"
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json",
"X-Appbuilder-From": "cow",
}
count = max(1, min(int(count or 10), 50))
payload: Dict[str, Any] = {
"messages": [{"role": "user", "content": query}],
"search_source": "baidu_search_v2",
"resource_type_filter": [{"type": "web", "top_k": count}],
}
# Baidu AI Search expects freshness as a date-range filter, not a
# named recency token. Translate our shared vocabulary into the
# underlying page_time range expected by the API.
search_filter = self._qianfan_build_freshness_filter(freshness)
if search_filter:
payload["search_filter"] = search_filter
logger.debug(f"[WebSearch] qianfan: query='{query}', count={count}, freshness={freshness!r}")
resp = requests.post(url, headers=headers, json=payload, timeout=DEFAULT_TIMEOUT)
if resp.status_code == 401:
return ToolResult.fail("Error: Invalid Qianfan API key.")
if resp.status_code != 200:
return ToolResult.fail(f"Error: Qianfan API returned HTTP {resp.status_code}: {resp.text[:200]}")
data = resp.json()
# Even on HTTP 200 Baidu surfaces business errors as {"code","message"}.
if isinstance(data, dict) and data.get("code"):
return ToolResult.fail(f"Error: Qianfan returned {data.get('code')}: {data.get('message','')}")
refs = data.get("references") or []
results = []
for d in refs:
results.append({
"title": d.get("title", ""),
"url": d.get("url", ""),
"snippet": (d.get("content") or "")[:200],
"siteName": d.get("web_anchor") or d.get("website") or "",
"datePublished": d.get("date", ""),
})
return ToolResult.success({
"query": query, "backend": "qianfan",
"total": len(results), "count": len(results), "results": results,
})
@staticmethod
def _qianfan_build_freshness_filter(freshness: str) -> Optional[Dict[str, Any]]:
if not freshness or freshness == "noLimit":
return None
delta_days = {"oneDay": 1, "oneWeek": 7, "oneMonth": 30, "oneYear": 365}.get(freshness)
if not delta_days:
return None
from datetime import datetime, timedelta
now = datetime.now()
end_date = (now + timedelta(days=1)).strftime("%Y-%m-%d")
start_date = (now - timedelta(days=delta_days)).strftime("%Y-%m-%d")
return {"range": {"page_time": {"gte": start_date, "lt": end_date}}}
# ------------------------------------------------------------------
# LinkAI (plugin)
# ------------------------------------------------------------------
def _search_linkai(self, query: str, count: int, freshness: str) -> ToolResult:
api_key = _get_api_key("linkai")
api_base = (conf().get("linkai_api_base") or "https://api.link-ai.tech").rstrip("/")
url = f"{api_base}/v1/plugin/execute"
from common.utils import get_cloud_headers
headers = get_cloud_headers(api_key)
payload = {"code": "web-search", "args": {"query": query, "count": count, "freshness": freshness}}
logger.debug(f"[WebSearch] linkai: query='{query}', count={count}")
resp = requests.post(url, headers=headers, json=payload, timeout=DEFAULT_TIMEOUT)
if resp.status_code == 401:
return ToolResult.fail("Error: Invalid LinkAI API key.")
if resp.status_code != 200:
return ToolResult.fail(f"Error: LinkAI API returned HTTP {resp.status_code}")
data = resp.json()
if not data.get("success"):
msg = data.get("message") or "Unknown error"
return ToolResult.fail(f"Error: LinkAI search failed: {msg}")
raw = data.get("data", "")
if isinstance(raw, str):
try:
raw = json.loads(raw)
except (json.JSONDecodeError, TypeError):
return ToolResult.success({
"query": query, "backend": "linkai",
"total": 1, "count": 1, "results": [{"content": raw}],
})
if isinstance(raw, dict):
pages = (raw.get("webPages") or {}).get("value", []) or []
if pages:
results = []
for p in pages:
item = {
"title": p.get("name", ""),
"url": p.get("url", ""),
"snippet": p.get("snippet", ""),
"siteName": p.get("siteName", ""),
"datePublished": p.get("datePublished") or p.get("dateLastCrawled", ""),
}
if p.get("summary"):
item["summary"] = p["summary"]
results.append(item)
total = (raw.get("webPages") or {}).get("totalEstimatedMatches", len(results))
return ToolResult.success({
"query": query, "backend": "linkai",
"total": total, "count": len(results), "results": results,
})
return ToolResult.success({
"query": query, "backend": "linkai",
"total": 1, "count": 1, "results": [{"content": str(raw)}],
})
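
The provider selection rules in the module docstring and in `_resolve_provider` can be restated as a pure function, which makes the priority easier to test in isolation. This is a sketch only: `resolve_provider` and its `keys`, `strategy` and `pinned` parameters are hypothetical and replace the config and environment lookups the real code performs.

```python
from typing import Dict, List, Optional

# Same canonical order as PROVIDER_ORDER in the module above.
PROVIDER_ORDER = ("bocha", "qianfan", "zhipu", "linkai")


def resolve_provider(
    keys: Dict[str, str],
    requested: Optional[str] = None,
    strategy: str = "auto",
    pinned: str = "",
) -> Optional[str]:
    """Pick a provider: caller request > pinned (fixed strategy) > first configured."""
    # A provider counts as configured when it has a non-empty key.
    available: List[str] = [p for p in PROVIDER_ORDER if keys.get(p, "").strip()]
    if not available:
        return None
    if requested and requested.strip().lower() in available:
        return requested.strip().lower()
    if strategy == "fixed" and pinned in available:
        return pinned
    # Fall back to the first configured provider in canonical order.
    return available[0]
```

With keys for only `zhipu` and `linkai`, a request for `bocha` falls through to `zhipu`, because `bocha` has no key and `zhipu` comes first among the configured providers.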


@@ -0,0 +1,3 @@
from .write import Write
__all__ = ['Write']


@@ -0,0 +1,97 @@
"""
Write tool - Write file content
Creates or overwrites files, automatically creates parent directories
"""
import os
from typing import Dict, Any
from pathlib import Path
from agent.tools.base_tool import BaseTool, ToolResult
from common.utils import expand_path
class Write(BaseTool):
"""Tool for writing file content"""
name: str = "write"
description: str = "Write content to a file. Creates the file if it doesn't exist, overwrites if it does. Automatically creates parent directories. IMPORTANT: Single write should not exceed 10KB. For large files, create a skeleton first, then use edit to add content in chunks."
params: dict = {
"type": "object",
"properties": {
"path": {
"type": "string",
"description": "Path to the file to write (relative or absolute)"
},
"content": {
"type": "string",
"description": "Content to write to the file"
}
},
"required": ["path", "content"]
}
def __init__(self, config: dict = None):
self.config = config or {}
self.cwd = self.config.get("cwd", os.getcwd())
self.memory_manager = self.config.get("memory_manager", None)
def execute(self, args: Dict[str, Any]) -> ToolResult:
"""
Execute file write operation
:param args: Contains file path and content
:return: Operation result
"""
path = args.get("path", "").strip()
content = args.get("content", "")
if not path:
return ToolResult.fail("Error: path parameter is required")
# Resolve path
absolute_path = self._resolve_path(path)
try:
# Create parent directory (if needed)
parent_dir = os.path.dirname(absolute_path)
if parent_dir:
os.makedirs(parent_dir, exist_ok=True)
# Write file
with open(absolute_path, 'w', encoding='utf-8') as f:
f.write(content)
# Get bytes written
bytes_written = len(content.encode('utf-8'))
# Auto-sync to memory database if this is a memory file
if self.memory_manager and 'memory/' in path:
self.memory_manager.mark_dirty()
result = {
"message": f"Successfully wrote {bytes_written} bytes to {path}",
"path": path,
"bytes_written": bytes_written
}
return ToolResult.success(result)
except PermissionError:
return ToolResult.fail(f"Error: Permission denied writing to {path}")
except Exception as e:
return ToolResult.fail(f"Error writing file: {str(e)}")
def _resolve_path(self, path: str) -> str:
"""
Resolve path to absolute path
:param path: Relative or absolute path
:return: Absolute path
"""
# Expand ~ to user home directory
path = expand_path(path)
if os.path.isabs(path):
return path
return os.path.abspath(os.path.join(self.cwd, path))
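
The path handling in `_resolve_path` can be tried outside the tool with a small standalone function. This sketch assumes `os.path.expanduser` as a stand-in for `common.utils.expand_path`, whose implementation is not part of this diff. The name `resolve_path` is hypothetical.

```python
import os


def resolve_path(path: str, cwd: str) -> str:
    """Expand ~, keep absolute paths as-is, and anchor relative paths at cwd."""
    # Assumption: expanduser approximates common.utils.expand_path.
    expanded = os.path.expanduser(path)
    if os.path.isabs(expanded):
        return expanded
    # abspath also normalizes segments such as ".." in the joined path.
    return os.path.abspath(os.path.join(cwd, expanded))
```

Because relative paths are normalized after joining, a value such as `../a.md` resolves to a location outside `cwd`. Neither this sketch nor the method shown above performs a containment check.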

app.py

@@ -1,24 +1,339 @@
# encoding:utf-8
import os
import signal
import sys
import time
from channel import channel_factory
from common import const
from common.log import logger
from config import load_config, conf
from plugins import *
import threading
_channel_mgr = None
def get_channel_manager():
return _channel_mgr
def _parse_channel_type(raw) -> list:
"""
Parse channel_type config value into a list of channel names.
Supports:
- single string: "feishu"
- comma-separated string: "feishu, dingtalk"
- list: ["feishu", "dingtalk"]
"""
if isinstance(raw, list):
return [ch.strip() for ch in raw if ch.strip()]
if isinstance(raw, str):
return [ch.strip() for ch in raw.split(",") if ch.strip()]
return []
class ChannelManager:
"""
Manage the lifecycle of multiple channels running concurrently.
Each channel.startup() runs in its own daemon thread.
The web channel is started as default console unless explicitly disabled.
"""
def __init__(self):
self._channels = {} # channel_name -> channel instance
self._threads = {} # channel_name -> thread
self._primary_channel = None
self._lock = threading.Lock()
self.cloud_mode = False # set to True when cloud client is active
@property
def channel(self):
"""Return the primary (first non-web) channel for backward compatibility."""
return self._primary_channel
def get_channel(self, channel_name: str):
return self._channels.get(channel_name)
def start(self, channel_names: list, first_start: bool = False):
"""
Create and start one or more channels in sub-threads.
If first_start is True, plugins and linkai client will also be initialized.
"""
with self._lock:
channels = []
for name in channel_names:
ch = channel_factory.create_channel(name)
ch.cloud_mode = self.cloud_mode
self._channels[name] = ch
channels.append((name, ch))
if self._primary_channel is None and name != "web":
self._primary_channel = ch
if self._primary_channel is None and channels:
self._primary_channel = channels[0][1]
if first_start:
PluginManager().load_plugins()
# Cloud client is optional. It is only started when
# use_linkai=True AND cloud_deployment_id is set.
# By default neither is configured, so the app runs
# entirely locally without any remote connection.
if conf().get("use_linkai") and (
os.environ.get("CLOUD_DEPLOYMENT_ID") or conf().get("cloud_deployment_id")
):
try:
from common import cloud_client
threading.Thread(
target=cloud_client.start,
args=(self._primary_channel, self),
daemon=True,
).start()
except Exception as e:
logger.warning(f"[ChannelManager] Cloud client failed to start: {e}")
# Start web console first so its logs print cleanly,
# then start remaining channels after a brief pause.
web_entry = None
other_entries = []
for entry in channels:
if entry[0] == "web":
web_entry = entry
else:
other_entries.append(entry)
ordered = ([web_entry] if web_entry else []) + other_entries
for i, (name, ch) in enumerate(ordered):
if i > 0 and name != "web":
time.sleep(0.1)
t = threading.Thread(target=self._run_channel, args=(name, ch), daemon=True)
self._threads[name] = t
t.start()
logger.debug(f"[ChannelManager] Channel '{name}' started in sub-thread")
def _run_channel(self, name: str, channel):
try:
channel.startup()
except Exception as e:
logger.error(f"[ChannelManager] Channel '{name}' startup error: {e}")
logger.exception(e)
def stop(self, channel_name: str = None):
"""
Stop channel(s). If channel_name is given, stop only that channel;
otherwise stop all channels.
"""
# Pop under lock, then stop outside lock to avoid deadlock
with self._lock:
names = [channel_name] if channel_name else list(self._channels.keys())
to_stop = []
for name in names:
ch = self._channels.pop(name, None)
th = self._threads.pop(name, None)
to_stop.append((name, ch, th))
# Compare against the popped instance: after pop(), self._channels.get() always returns None.
if channel_name and ch is not None and self._primary_channel is ch:
self._primary_channel = None
for name, ch, th in to_stop:
if ch is None:
logger.warning(f"[ChannelManager] Channel '{name}' not found in managed channels")
if th and th.is_alive():
self._interrupt_thread(th, name)
continue
logger.info(f"[ChannelManager] Stopping channel '{name}'...")
graceful = False
if hasattr(ch, 'stop'):
try:
ch.stop()
graceful = True
except Exception as e:
logger.warning(f"[ChannelManager] Error during channel '{name}' stop: {e}")
if th and th.is_alive():
th.join(timeout=5)
if th.is_alive():
if graceful:
logger.info(f"[ChannelManager] Channel '{name}' thread still alive after stop(), "
"leaving daemon thread to finish on its own")
else:
logger.warning(f"[ChannelManager] Channel '{name}' thread did not exit in 5s, forcing interrupt")
self._interrupt_thread(th, name)
@staticmethod
def _interrupt_thread(th: threading.Thread, name: str):
"""Raise SystemExit in target thread to break blocking loops like start_forever."""
import ctypes
try:
tid = th.ident
if tid is None:
return
res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
ctypes.c_ulong(tid), ctypes.py_object(SystemExit)
)
if res == 1:
logger.info(f"[ChannelManager] Interrupted thread for channel '{name}'")
elif res > 1:
ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_ulong(tid), None)
logger.warning(f"[ChannelManager] Failed to interrupt thread for channel '{name}'")
except Exception as e:
logger.warning(f"[ChannelManager] Thread interrupt error for '{name}': {e}")
def restart(self, new_channel_name: str):
"""
Restart a single channel with a new channel type.
Can be called from any thread (e.g. linkai config callback).
"""
logger.info(f"[ChannelManager] Restarting channel to '{new_channel_name}'...")
self.stop(new_channel_name)
_clear_singleton_cache(new_channel_name)
time.sleep(1)
self.start([new_channel_name], first_start=False)
logger.info(f"[ChannelManager] Channel restarted to '{new_channel_name}' successfully")
def add_channel(self, channel_name: str):
"""
Dynamically add and start a new channel.
If the channel is already running, restart it instead.
"""
with self._lock:
if channel_name in self._channels:
logger.info(f"[ChannelManager] Channel '{channel_name}' already exists, restarting")
if self._channels.get(channel_name):
self.restart(channel_name)
return
logger.info(f"[ChannelManager] Adding channel '{channel_name}'...")
_clear_singleton_cache(channel_name)
self.start([channel_name], first_start=False)
logger.info(f"[ChannelManager] Channel '{channel_name}' added successfully")
def remove_channel(self, channel_name: str):
"""
Dynamically stop and remove a running channel.
"""
with self._lock:
if channel_name not in self._channels:
logger.warning(f"[ChannelManager] Channel '{channel_name}' not found, nothing to remove")
return
logger.info(f"[ChannelManager] Removing channel '{channel_name}'...")
self.stop(channel_name)
logger.info(f"[ChannelManager] Channel '{channel_name}' removed successfully")
def _clear_singleton_cache(channel_name: str):
"""
Clear the singleton cache for the channel class so that
a new instance can be created with updated config.
"""
cls_map = {
"web": "channel.web.web_channel.WebChannel",
"wechatmp": "channel.wechatmp.wechatmp_channel.WechatMPChannel",
"wechatmp_service": "channel.wechatmp.wechatmp_channel.WechatMPChannel",
"wechatcom_app": "channel.wechatcom.wechatcomapp_channel.WechatComAppChannel",
const.WECHAT_KF: "channel.wechat_kf.wechat_kf_channel.WechatKfChannel",
const.FEISHU: "channel.feishu.feishu_channel.FeiShuChanel",
const.DINGTALK: "channel.dingtalk.dingtalk_channel.DingTalkChanel",
const.WECOM_BOT: "channel.wecom_bot.wecom_bot_channel.WecomBotChannel",
const.QQ: "channel.qq.qq_channel.QQChannel",
const.WEIXIN: "channel.weixin.weixin_channel.WeixinChannel",
"wx": "channel.weixin.weixin_channel.WeixinChannel",
}
module_path = cls_map.get(channel_name)
if not module_path:
return
try:
parts = module_path.rsplit(".", 1)
module_name, class_name = parts[0], parts[1]
import importlib
module = importlib.import_module(module_name)
wrapper = getattr(module, class_name, None)
if wrapper and hasattr(wrapper, '__closure__') and wrapper.__closure__:
for cell in wrapper.__closure__:
try:
cell_contents = cell.cell_contents
if isinstance(cell_contents, dict):
cell_contents.clear()
logger.debug(f"[ChannelManager] Cleared singleton cache for {class_name}")
break
except ValueError:
pass
except Exception as e:
logger.warning(f"[ChannelManager] Failed to clear singleton cache: {e}")
def sigterm_handler_wrap(_signo):
old_handler = signal.getsignal(_signo)
def func(_signo, _stack_frame):
logger.info("signal {} received, exiting...".format(_signo))
conf().save_user_datas()
if callable(old_handler): # check old_handler
return old_handler(_signo, _stack_frame)
sys.exit(0)
signal.signal(_signo, func)
def _warmup_mcp_tools():
"""
Kick off MCP server loading at process startup so subprocesses
(npx / uvx etc.) finish initializing before the first user message
arrives. Returns immediately — the actual work happens on a daemon
thread inside ToolManager. Safe to call when MCP is not configured.
"""
try:
from agent.tools import ToolManager
ToolManager()._load_mcp_tools()
except Exception as e:
logger.warning(f"[App] MCP warmup failed (non-fatal): {e}")
def _warmup_scheduler():
"""Eager-init AgentBridge so the scheduler thread starts at process
boot rather than waiting for the first user message."""
try:
from bridge.bridge import Bridge
Bridge().get_agent_bridge()
except Exception as e:
logger.warning(f"[App] Scheduler warmup failed: {e}")
def _sync_builtin_skills():
"""Sync builtin skills from project skills/ to workspace skills/ on startup."""
import shutil
try:
workspace = conf().get("agent_workspace", "~/cow")
workspace = os.path.expanduser(workspace)
project_root = os.path.dirname(os.path.abspath(__file__))
builtin_dir = os.path.join(project_root, "skills")
custom_dir = os.path.join(workspace, "skills")
if not os.path.isdir(builtin_dir):
return
os.makedirs(custom_dir, exist_ok=True)
synced = 0
for name in os.listdir(builtin_dir):
src = os.path.join(builtin_dir, name)
if not os.path.isdir(src) or not os.path.isfile(os.path.join(src, "SKILL.md")):
continue
dst = os.path.join(custom_dir, name)
try:
if os.path.isdir(dst):
shutil.rmtree(dst)
shutil.copytree(src, dst)
synced += 1
except Exception as e:
logger.warning(f"[App] Failed to sync builtin skill '{name}': {e}")
if synced:
logger.info(f"[App] Synced {synced} builtin skill(s) to workspace")
except Exception as e:
logger.warning(f"[App] Builtin skills sync failed: {e}")
def run():
global _channel_mgr
try:
# load config
load_config()
@@ -27,25 +342,43 @@ def run():
# kill signal
sigterm_handler_wrap(signal.SIGTERM)
# create channel
channel_name=conf().get('channel_type', 'wx')
# Parse channel_type into a list
raw_channel = conf().get("channel_type", "web")
if "--cmd" in sys.argv:
channel_name = 'terminal'
channel_names = ["terminal"]
else:
channel_names = _parse_channel_type(raw_channel)
if not channel_names:
channel_names = ["web"]
if channel_name == 'wxy':
os.environ['WECHATY_LOG']="warn"
# os.environ['WECHATY_PUPPET_SERVICE_ENDPOINT'] = '127.0.0.1:9001'
# Auto-start web console unless explicitly disabled
web_console_enabled = conf().get("web_console", True)
if web_console_enabled and "web" not in channel_names:
channel_names.append("web")
channel = channel_factory.create_channel(channel_name)
if channel_name in ['wx','wxy','terminal','wechatmp','wechatmp_service']:
PluginManager().load_plugins()
# Sync builtin skills to workspace before channels start
_sync_builtin_skills()
# startup channel
channel.startup()
# Kick off MCP server loading in the background so first-message
# latency isn't dominated by npx package downloads.
_warmup_mcp_tools()
_warmup_scheduler()
logger.info(f"[App] Starting channels: {channel_names}")
_channel_mgr = ChannelManager()
_channel_mgr.start(channel_names, first_start=True)
while True:
time.sleep(1)
except KeyboardInterrupt:
pass
except Exception as e:
logger.error("App startup failed!")
logger.exception(e)
if __name__ == '__main__':
run()
if __name__ == "__main__":
run()
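
The closure walk in _clear_singleton_cache above can be illustrated in isolation. This is a minimal sketch, assuming the channel classes are wrapped by a decorator that keeps its instances in a closure dict; the `singleton` decorator, `clear_singleton_cache` helper and `DemoChannel` class below are hypothetical stand-ins, not the project's own code.

```python
def singleton(cls):
    """Hypothetical decorator: caches one instance per class in a closure dict."""
    instances = {}

    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return get_instance


def clear_singleton_cache(wrapper) -> bool:
    """Clear the first dict found in the wrapper's closure cells."""
    for cell in getattr(wrapper, "__closure__", None) or ():
        try:
            contents = cell.cell_contents
        except ValueError:  # raised when the cell is empty
            continue
        if isinstance(contents, dict):
            contents.clear()
            return True
    return False


@singleton
class DemoChannel:
    def __init__(self, token="old"):
        self.token = token


first = DemoChannel(token="old")
same = DemoChannel(token="new")      # served from the cache, token stays "old"
cleared = clear_singleton_cache(DemoChannel)
fresh = DemoChannel(token="new")     # cache was emptied, so a new instance is built
```

Clearing the dict rather than replacing it matters here: the wrapper holds a reference to that exact dict, so only an in-place clear is visible to later calls.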


@@ -1,28 +0,0 @@
# encoding:utf-8
import requests
from bot.bot import Bot
from bridge.reply import Reply, ReplyType
# Baidu Unit dialogue API (usable, but limited in capability)
class BaiduUnitBot(Bot):
def reply(self, query, context=None):
token = self.get_token()
url = 'https://aip.baidubce.com/rpc/2.0/unit/service/v3/chat?access_token=' + token
post_data = "{\"version\":\"3.0\",\"service_id\":\"S73177\",\"session_id\":\"\",\"log_id\":\"7758521\",\"skill_ids\":[\"1221886\"],\"request\":{\"terminal_id\":\"88888\",\"query\":\"" + query + "\", \"hyper_params\": {\"chat_custom_bot_profile\": 1}}}"
print(post_data)
headers = {'content-type': 'application/x-www-form-urlencoded'}
response = requests.post(url, data=post_data.encode(), headers=headers)
if response:
reply = Reply(ReplyType.TEXT, response.json()['result']['context']['SYS_PRESUMED_HIST'][1])
return reply
def get_token(self):
access_key = 'YOUR_ACCESS_KEY'
secret_key = 'YOUR_SECRET_KEY'
host = 'https://aip.baidubce.com/oauth/2.0/token?grant_type=client_credentials&client_id=' + access_key + '&client_secret=' + secret_key
response = requests.get(host)
if response:
print(response.json())
return response.json()['access_token']


@@ -1,17 +0,0 @@
"""
Auto-reply chat robot abstract class
"""
from bridge.context import Context
from bridge.reply import Reply
class Bot(object):
def reply(self, query, context : Context =None) -> Reply:
"""
bot auto-reply content
:param query: received message
:return: reply content
"""
raise NotImplementedError


@@ -1,32 +0,0 @@
"""
bot factory
"""
from common import const
def create_bot(bot_type):
"""
create a bot_type instance
:param bot_type: bot type code
:return: bot instance
"""
if bot_type == const.BAIDU:
# Baidu Unit dialogue API
from bot.baidu.baidu_unit_bot import BaiduUnitBot
return BaiduUnitBot()
elif bot_type == const.CHATGPT:
# ChatGPT web interface
from bot.chatgpt.chat_gpt_bot import ChatGPTBot
return ChatGPTBot()
elif bot_type == const.OPEN_AI:
# Official OpenAI chat model API
from bot.openai.open_ai_bot import OpenAIBot
return OpenAIBot()
elif bot_type == const.CHATGPTONAZURE:
# Azure chatgpt service https://azure.microsoft.com/en-in/products/cognitive-services/openai-service/
from bot.chatgpt.chat_gpt_bot import AzureChatGPTBot
return AzureChatGPTBot()
raise RuntimeError


@@ -1,148 +0,0 @@
# encoding:utf-8
from bot.bot import Bot
from bot.chatgpt.chat_gpt_session import ChatGPTSession
from bot.openai.open_ai_image import OpenAIImage
from bot.session_manager import SessionManager
from bridge.context import ContextType
from bridge.reply import Reply, ReplyType
from config import conf, load_config
from common.log import logger
from common.token_bucket import TokenBucket
import openai
import openai.error
import time
# OpenAI chat model API (usable)
class ChatGPTBot(Bot,OpenAIImage):
def __init__(self):
super().__init__()
# set the default api_key
openai.api_key = conf().get('open_ai_api_key')
if conf().get('open_ai_api_base'):
openai.api_base = conf().get('open_ai_api_base')
proxy = conf().get('proxy')
if proxy:
openai.proxy = proxy
if conf().get('rate_limit_chatgpt'):
self.tb4chatgpt = TokenBucket(conf().get('rate_limit_chatgpt', 20))
self.sessions = SessionManager(ChatGPTSession, model= conf().get("model") or "gpt-3.5-turbo")
self.args ={
"model": conf().get("model") or "gpt-3.5-turbo", # name of the chat model
"temperature":conf().get('temperature', 0.9), # value in [0,1]; larger values make replies less deterministic
# "max_tokens":4096, # maximum number of tokens in the reply
"top_p":1,
"frequency_penalty":conf().get('frequency_penalty', 0.0), # in [-2,2]; larger values favor more varied content
"presence_penalty":conf().get('presence_penalty', 0.0), # in [-2,2]; larger values favor more varied content
"request_timeout": conf().get('request_timeout', None), # request timeout; the openai API defaults to 600, since hard questions generally take longer
"timeout": conf().get('request_timeout', None), # retry timeout; requests are retried automatically within this window
}
def reply(self, query, context=None):
# acquire reply content
if context.type == ContextType.TEXT:
logger.info("[CHATGPT] query={}".format(query))
session_id = context['session_id']
reply = None
clear_memory_commands = conf().get('clear_memory_commands', ['#清除记忆'])
if query in clear_memory_commands:
self.sessions.clear_session(session_id)
reply = Reply(ReplyType.INFO, '记忆已清除')
elif query == '#清除所有':
self.sessions.clear_all_session()
reply = Reply(ReplyType.INFO, '所有人记忆已清除')
elif query == '#更新配置':
load_config()
reply = Reply(ReplyType.INFO, '配置已更新')
if reply:
return reply
session = self.sessions.session_query(query, session_id)
logger.debug("[CHATGPT] session query={}".format(session.messages))
api_key = context.get('openai_api_key')
# if context.get('stream'):
# # reply in stream
# return self.reply_text_stream(query, new_query, session_id)
reply_content = self.reply_text(session, api_key)
logger.debug("[CHATGPT] new_query={}, session_id={}, reply_cont={}, completion_tokens={}".format(session.messages, session_id, reply_content["content"], reply_content["completion_tokens"]))
if reply_content['completion_tokens'] == 0 and len(reply_content['content']) > 0:
reply = Reply(ReplyType.ERROR, reply_content['content'])
elif reply_content["completion_tokens"] > 0:
self.sessions.session_reply(reply_content["content"], session_id, reply_content["total_tokens"])
reply = Reply(ReplyType.TEXT, reply_content["content"])
else:
reply = Reply(ReplyType.ERROR, reply_content['content'])
logger.debug("[CHATGPT] reply {} used 0 tokens.".format(reply_content))
return reply
elif context.type == ContextType.IMAGE_CREATE:
ok, retstring = self.create_img(query, 0)
reply = None
if ok:
reply = Reply(ReplyType.IMAGE_URL, retstring)
else:
reply = Reply(ReplyType.ERROR, retstring)
return reply
else:
reply = Reply(ReplyType.ERROR, 'Bot不支持处理{}类型的消息'.format(context.type))
return reply
def reply_text(self, session:ChatGPTSession, api_key=None, retry_count=0) -> dict:
'''
call openai's ChatCompletion to get the answer
:param session: a conversation session
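:param api_key: optional API key; if None, the default openai.api_key is used
:param retry_count: retry count
:return: dict with "completion_tokens" and "content", plus "total_tokens" on success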
'''
try:
if conf().get('rate_limit_chatgpt') and not self.tb4chatgpt.get_token():
raise openai.error.RateLimitError("RateLimitError: rate limit exceeded")
# if api_key == None, the default openai.api_key will be used
response = openai.ChatCompletion.create(
api_key=api_key, messages=session.messages, **self.args
)
# logger.info("[ChatGPT] reply={}, total_tokens={}".format(response.choices[0]['message']['content'], response["usage"]["total_tokens"]))
return {"total_tokens": response["usage"]["total_tokens"],
"completion_tokens": response["usage"]["completion_tokens"],
"content": response.choices[0]['message']['content']}
except Exception as e:
need_retry = retry_count < 2
result = {"completion_tokens": 0, "content": "我现在有点累了,等会再来吧"}
if isinstance(e, openai.error.RateLimitError):
logger.warn("[CHATGPT] RateLimitError: {}".format(e))
result['content'] = "提问太快啦,请休息一下再问我吧"
if need_retry:
time.sleep(5)
elif isinstance(e, openai.error.Timeout):
logger.warn("[CHATGPT] Timeout: {}".format(e))
result['content'] = "我没有收到你的消息"
if need_retry:
time.sleep(5)
elif isinstance(e, openai.error.APIConnectionError):
logger.warn("[CHATGPT] APIConnectionError: {}".format(e))
need_retry = False
result['content'] = "我连接不到你的网络"
else:
logger.warn("[CHATGPT] Exception: {}".format(e))
need_retry = False
self.sessions.clear_session(session.session_id)
if need_retry:
logger.warn("[CHATGPT] 第{}次重试".format(retry_count+1))
return self.reply_text(session, api_key, retry_count+1)
else:
return result
class AzureChatGPTBot(ChatGPTBot):
def __init__(self):
super().__init__()
openai.api_type = "azure"
openai.api_version = "2023-03-15-preview"
self.args["deployment_id"] = conf().get("azure_deployment_id")


@@ -1,113 +0,0 @@
# encoding:utf-8
from bot.bot import Bot
from bot.openai.open_ai_image import OpenAIImage
from bot.openai.open_ai_session import OpenAISession
from bot.session_manager import SessionManager
from bridge.context import ContextType
from bridge.reply import Reply, ReplyType
from config import conf
from common.log import logger
import openai
import openai.error
import time
user_session = dict()
# OpenAI chat model API (usable)
class OpenAIBot(Bot, OpenAIImage):
def __init__(self):
super().__init__()
openai.api_key = conf().get('open_ai_api_key')
if conf().get('open_ai_api_base'):
openai.api_base = conf().get('open_ai_api_base')
proxy = conf().get('proxy')
if proxy:
openai.proxy = proxy
self.sessions = SessionManager(OpenAISession, model= conf().get("model") or "text-davinci-003")
self.args = {
"model": conf().get("model") or "text-davinci-003", # name of the chat model
"temperature":conf().get('temperature', 0.9), # value in [0,1]; larger values make replies less deterministic
"max_tokens":1200, # maximum number of tokens in the reply
"top_p":1,
"frequency_penalty":conf().get('frequency_penalty', 0.0), # in [-2,2]; larger values favor more varied content
"presence_penalty":conf().get('presence_penalty', 0.0), # in [-2,2]; larger values favor more varied content
"request_timeout": conf().get('request_timeout', None), # request timeout; the openai API defaults to 600, since hard questions generally take longer
"timeout": conf().get('request_timeout', None), # retry timeout; requests are retried automatically within this window
"stop":["\n\n\n"]
}
def reply(self, query, context=None):
# acquire reply content
if context and context.type:
if context.type == ContextType.TEXT:
logger.info("[OPEN_AI] query={}".format(query))
session_id = context['session_id']
reply = None
if query == '#清除记忆':
self.sessions.clear_session(session_id)
reply = Reply(ReplyType.INFO, '记忆已清除')
elif query == '#清除所有':
self.sessions.clear_all_session()
reply = Reply(ReplyType.INFO, '所有人记忆已清除')
else:
session = self.sessions.session_query(query, session_id)
result = self.reply_text(session)
total_tokens, completion_tokens, reply_content = result['total_tokens'], result['completion_tokens'], result['content']
logger.debug("[OPEN_AI] new_query={}, session_id={}, reply_cont={}, completion_tokens={}".format(str(session), session_id, reply_content, completion_tokens))
if total_tokens == 0 :
reply = Reply(ReplyType.ERROR, reply_content)
else:
self.sessions.session_reply(reply_content, session_id, total_tokens)
reply = Reply(ReplyType.TEXT, reply_content)
return reply
elif context.type == ContextType.IMAGE_CREATE:
ok, retstring = self.create_img(query, 0)
reply = None
if ok:
reply = Reply(ReplyType.IMAGE_URL, retstring)
else:
reply = Reply(ReplyType.ERROR, retstring)
return reply
def reply_text(self, session:OpenAISession, retry_count=0):
try:
response = openai.Completion.create(
prompt=str(session), **self.args
)
res_content = response.choices[0]['text'].strip().replace('<|endoftext|>', '')
total_tokens = response["usage"]["total_tokens"]
completion_tokens = response["usage"]["completion_tokens"]
logger.info("[OPEN_AI] reply={}".format(res_content))
return {"total_tokens": total_tokens,
"completion_tokens": completion_tokens,
"content": res_content}
except Exception as e:
need_retry = retry_count < 2
result = {"completion_tokens": 0, "content": "我现在有点累了,等会再来吧"}
if isinstance(e, openai.error.RateLimitError):
logger.warn("[OPEN_AI] RateLimitError: {}".format(e))
result['content'] = "提问太快啦,请休息一下再问我吧"
if need_retry:
time.sleep(5)
elif isinstance(e, openai.error.Timeout):
logger.warn("[OPEN_AI] Timeout: {}".format(e))
result['content'] = "我没有收到你的消息"
if need_retry:
time.sleep(5)
elif isinstance(e, openai.error.APIConnectionError):
logger.warn("[OPEN_AI] APIConnectionError: {}".format(e))
need_retry = False
result['content'] = "我连接不到你的网络"
else:
logger.warn("[OPEN_AI] Exception: {}".format(e))
need_retry = False
self.sessions.clear_session(session.session_id)
if need_retry:
logger.warn("[OPEN_AI] 第{}次重试".format(retry_count+1))
return self.reply_text(session, retry_count+1)
else:
return result


@@ -1,38 +0,0 @@
import time
import openai
import openai.error
from common.token_bucket import TokenBucket
from common.log import logger
from config import conf
# Image generation API provided by OpenAI
class OpenAIImage(object):
def __init__(self):
openai.api_key = conf().get('open_ai_api_key')
if conf().get('rate_limit_dalle'):
self.tb4dalle = TokenBucket(conf().get('rate_limit_dalle', 50))
def create_img(self, query, retry_count=0):
try:
if conf().get('rate_limit_dalle') and not self.tb4dalle.get_token():
return False, "请求太快了,请休息一下再问我吧"
logger.info("[OPEN_AI] image_query={}".format(query))
response = openai.Image.create(
prompt=query, # image description
n=1, # number of images generated per request
size="256x256" # image size; options: 256x256, 512x512, 1024x1024
)
image_url = response['data'][0]['url']
logger.info("[OPEN_AI] image_url={}".format(image_url))
return True, image_url
except openai.error.RateLimitError as e:
logger.warn(e)
if retry_count < 1:
time.sleep(5)
logger.warn("[OPEN_AI] ImgCreate RateLimit exceed, 第{}次重试".format(retry_count+1))
return self.create_img(query, retry_count+1)
else:
return False, "提问太快啦,请休息一下再问我吧"
except Exception as e:
logger.exception(e)
return False, str(e)

1030
bridge/agent_bridge.py Normal file

File diff suppressed because it is too large


@@ -0,0 +1,125 @@
"""
Agent Event Handler - Handles agent events and thinking process output
"""
from common import const
from common.log import logger
# Cap intermediate thinking messages on weixin to stay within send quota.
WEIXIN_THINKING_INSTANT_MAX = 7
class AgentEventHandler:
"""
Handles agent events and optionally sends intermediate messages to channel
"""
def __init__(self, context=None, original_callback=None):
self.context = context
self.original_callback = original_callback
self.channel = None
if context:
self.channel = context.kwargs.get("channel") if hasattr(context, "kwargs") else None
self.current_content = ""
self.turn_number = 0
channel_type = ""
if context and hasattr(context, "kwargs"):
channel_type = context.kwargs.get("channel_type", "") or ""
self._is_weixin = channel_type == const.WEIXIN
self._thinking_sent_count = 0
self._merged_buf: list[str] = []
def handle_event(self, event):
event_type = event.get("type")
data = event.get("data", {})
if event_type == "turn_start":
self._handle_turn_start(data)
elif event_type == "message_update":
self._handle_message_update(data)
elif event_type == "message_end":
self._handle_message_end(data)
elif event_type == "reasoning_update":
pass
elif event_type == "tool_execution_start":
self._handle_tool_execution_start(data)
elif event_type == "tool_execution_end":
self._handle_tool_execution_end(data)
elif event_type == "agent_end":
self._handle_agent_end(data)
if self.original_callback:
self.original_callback(event)
def _handle_turn_start(self, data):
self.turn_number = data.get("turn", 0)
self.current_content = ""
def _handle_message_update(self, data):
delta = data.get("delta", "")
self.current_content += delta
def _handle_message_end(self, data):
tool_calls = data.get("tool_calls", [])
if tool_calls:
if self.current_content.strip():
logger.info(f"💭 {self.current_content.strip()[:200]}{'...' if len(self.current_content) > 200 else ''}")
self._send_to_channel(self.current_content.strip())
else:
if self.current_content.strip():
logger.debug(f"💬 {self.current_content.strip()[:200]}{'...' if len(self.current_content) > 200 else ''}")
# Drain weixin buffer before final reply leaves chat_channel
self._flush_merged_now()
self.current_content = ""
def _handle_agent_end(self, data):
self._flush_merged_now()
def _handle_tool_execution_start(self, data):
pass
def _handle_tool_execution_end(self, data):
pass
def _send_to_channel(self, message):
if self.context and self.context.get("on_event"):
return
if not self.channel:
return
if not self._is_weixin:
self._do_send(message)
return
if self._thinking_sent_count < WEIXIN_THINKING_INSTANT_MAX:
self._do_send(message)
self._thinking_sent_count += 1
return
self._merged_buf.append(message)
def _flush_merged_now(self):
if not self._merged_buf:
return
merged = "\n\n".join(self._merged_buf)
count = len(self._merged_buf)
self._merged_buf = []
logger.debug(f"[AgentEventHandler] Flushing {count} merged thinking msgs, len={len(merged)}")
self._do_send(merged)
self._thinking_sent_count += 1
def _do_send(self, message):
try:
from bridge.reply import Reply, ReplyType
reply = Reply(ReplyType.TEXT, message)
self.channel._send(reply, self.context)
except Exception as e:
logger.debug(f"[AgentEventHandler] Failed to send to channel: {e}")
def log_summary(self):
pass
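
The weixin throttling in _send_to_channel and _flush_merged_now above can be sketched on its own: the first few intermediate messages go out immediately, later ones are buffered and sent as one merged message on flush. The `ThinkingThrottle` class and the `send` callback below are hypothetical names used only for illustration; the cap of 7 mirrors WEIXIN_THINKING_INSTANT_MAX.

```python
from typing import Callable, List

INSTANT_MAX = 7  # mirrors WEIXIN_THINKING_INSTANT_MAX in the diff


class ThinkingThrottle:
    """Hypothetical helper: send the first N messages at once, merge the rest."""

    def __init__(self, send: Callable[[str], None], instant_max: int = INSTANT_MAX):
        self._send = send
        self._instant_max = instant_max
        self._sent = 0
        self._buf: List[str] = []

    def push(self, message: str) -> None:
        # Below the cap: deliver immediately. At or above it: hold for the flush.
        if self._sent < self._instant_max:
            self._send(message)
            self._sent += 1
        else:
            self._buf.append(message)

    def flush(self) -> None:
        # Deliver everything held back as a single message, separated by blank lines.
        if not self._buf:
            return
        merged = "\n\n".join(self._buf)
        self._buf = []
        self._send(merged)
        self._sent += 1


outbox: List[str] = []
throttle = ThinkingThrottle(outbox.append, instant_max=2)
for text in ["a", "b", "c", "d"]:
    throttle.push(text)
throttle.flush()
```

With a cap of 2, four pushes followed by one flush produce three outgoing messages, so the number of sends stays bounded by the cap plus one per flush.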

823
bridge/agent_initializer.py Normal file

@@ -0,0 +1,823 @@
"""
Agent Initializer - Handles agent initialization logic
"""
import os
import asyncio
import datetime
import threading
import time
from typing import Optional, List
from agent.protocol import Agent
from agent.tools import ToolManager
from common.log import logger
from common.utils import expand_path
# Module-level lock to serialize scheduler init across concurrent sessions
_scheduler_init_lock = threading.Lock()
# Track whether the embedding model log has been printed in this process,
# so we avoid spamming it once per session.
_embedding_logged: bool = False
class AgentInitializer:
"""
Handles agent initialization including:
- Workspace setup
- Memory system initialization
- Tool loading
- System prompt building
"""
def __init__(self, bridge, agent_bridge):
"""
Initialize agent initializer
Args:
bridge: COW bridge instance
agent_bridge: AgentBridge instance (for create_agent method)
"""
self.bridge = bridge
self.agent_bridge = agent_bridge
def initialize_agent(self, session_id: Optional[str] = None) -> Agent:
"""
Initialize agent for a session
Args:
session_id: Session ID (None for default agent)
Returns:
Initialized agent instance
"""
from config import conf
# Get workspace from config
workspace_root = expand_path(conf().get("agent_workspace", "~/cow"))
# Migrate API keys
self._migrate_config_to_env(workspace_root)
# Load environment variables
self._load_env_file()
# Initialize workspace
from agent.prompt import ensure_workspace, load_context_files, PromptBuilder
workspace_files = ensure_workspace(workspace_root, create_templates=True)
if session_id is None:
logger.info(f"[AgentInitializer] Workspace initialized at: {workspace_root}")
# Setup memory system
memory_manager, memory_tools = self._setup_memory_system(workspace_root, session_id)
# Load tools
tools = self._load_tools(workspace_root, memory_manager, memory_tools, session_id)
# Initialize scheduler if needed
self._initialize_scheduler(tools, session_id)
# Load context files
context_files = load_context_files(workspace_root)
# Initialize skill manager
skill_manager = self._initialize_skill_manager(workspace_root, session_id)
# Build system prompt
prompt_builder = PromptBuilder(workspace_dir=workspace_root, language="zh")
runtime_info = self._get_runtime_info(workspace_root)
system_prompt = prompt_builder.build(
tools=tools,
context_files=context_files,
skill_manager=skill_manager,
memory_manager=memory_manager,
runtime_info=runtime_info,
)
# Get cost control parameters
from config import conf
max_steps = conf().get("agent_max_steps", 20)
max_context_tokens = conf().get("agent_max_context_tokens", 50000)
# Create agent
agent = self.agent_bridge.create_agent(
system_prompt=system_prompt,
tools=tools,
max_steps=max_steps,
output_mode="logger",
workspace_dir=workspace_root,
skill_manager=skill_manager,
enable_skills=True,
max_context_tokens=max_context_tokens,
runtime_info=runtime_info # Pass runtime_info for dynamic time updates
)
# Attach memory manager and share LLM model for summarization
if memory_manager:
agent.memory_manager = memory_manager
if hasattr(agent, 'model') and agent.model:
memory_manager.flush_manager.llm_model = agent.model
# Restore persisted conversation history for this session
if session_id:
self._restore_conversation_history(agent, session_id)
# Start daily memory flush timer (once, on first agent init regardless of session)
self._start_daily_flush_timer()
return agent
def _restore_conversation_history(self, agent, session_id: str) -> None:
"""
Load persisted conversation messages from SQLite and inject them
into the agent's in-memory message list.
Only user text and assistant text are restored. Tool call chains
(tool_use / tool_result) are stripped out because:
1. They are intermediate steps; their value is already captured in the
final assistant text reply.
2. They consume a large share of context tokens (often 80%+ of history).
3. Different models use incompatible tool message formats, so
restoring tool chains across model switches causes 400 errors.
4. Dropping them eliminates the entire class of tool_use/tool_result
pairing bugs.
"""
from config import conf
if not conf().get("conversation_persistence", True):
return
try:
from agent.memory import get_conversation_store
store = get_conversation_store()
max_turns = conf().get("agent_max_context_turns", 20)
# Scheduler tasks run on a stable isolated session per task and
# can fire many times a day; a smaller restore window keeps prompt
# cost bounded while still letting the agent see "last few" runs
# for trend / dedup style logic. Regular chat sessions keep the
# original heuristic so user dialogues feel continuous.
if session_id.startswith("scheduler_"):
restore_turns = max(1, max_turns // 5)
else:
restore_turns = max(3, max_turns // 6)
saved = store.load_messages(session_id, max_turns=restore_turns)
if saved:
filtered = self._filter_text_only_messages(saved)
if filtered:
with agent.messages_lock:
agent.messages = filtered
logger.debug(
f"[AgentInitializer] Restored {len(filtered)} text messages "
f"(from {len(saved)} total, {restore_turns} turns cap) "
f"for session={session_id}"
)
except Exception as e:
logger.warning(
f"[AgentInitializer] Failed to restore conversation history for "
f"session={session_id}: {e}"
)
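
The restore-window arithmetic in _restore_conversation_history above is easy to check by hand. The sketch below isolates it; `restore_window` is a hypothetical helper name, and the default of 20 matches the agent_max_context_turns fallback in the diff.

```python
def restore_window(session_id: str, max_turns: int = 20) -> int:
    """Number of turns to restore, following the heuristic in the diff."""
    if session_id.startswith("scheduler_"):
        # Scheduler sessions fire often, so they get a fifth of the budget (at least 1).
        return max(1, max_turns // 5)
    # Regular chat sessions get a sixth of the budget, but never fewer than 3 turns.
    return max(3, max_turns // 6)


default_scheduler = restore_window("scheduler_daily_report")    # 20 // 5 = 4
default_chat = restore_window("user_123")                       # max(3, 20 // 6) = 3
small_scheduler = restore_window("scheduler_x", max_turns=3)    # max(1, 0) = 1
large_chat = restore_window("user_123", max_turns=60)           # 60 // 6 = 10
```

With the default of 20 turns, the floor of 3 is what actually applies to chat sessions, since 20 // 6 is also 3; the division only starts to matter above 23 turns.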
@staticmethod
def _filter_text_only_messages(messages: list) -> list:
"""
Extract clean user/assistant turn pairs from raw message history.
Groups messages into turns (each starting with a real user query),
then keeps only:
- The first user text in each turn (the actual user input)
- The last assistant text in each turn (the final answer)
All tool_use, tool_result, intermediate assistant thoughts, and
internal hint messages injected by the agent loop are discarded.
"""
def _extract_text(content) -> str:
if isinstance(content, str):
return content.strip()
if isinstance(content, list):
parts = [
b.get("text", "")
for b in content
if isinstance(b, dict) and b.get("type") == "text"
]
return "\n".join(p for p in parts if p).strip()
return ""
def _is_real_user_msg(msg: dict) -> bool:
"""True for actual user input, False for tool_result or internal hints."""
if msg.get("role") != "user":
return False
content = msg.get("content")
if isinstance(content, list):
has_tool_result = any(
isinstance(b, dict) and b.get("type") == "tool_result"
for b in content
)
if has_tool_result:
return False
text = _extract_text(content)
return bool(text)
# Group into turns: each turn starts with a real user message
turns = []
current_turn = None
for msg in messages:
if _is_real_user_msg(msg):
if current_turn is not None:
turns.append(current_turn)
current_turn = {"user": msg, "assistants": []}
elif current_turn is not None and msg.get("role") == "assistant":
text = _extract_text(msg.get("content"))
if text:
current_turn["assistants"].append(text)
if current_turn is not None:
turns.append(current_turn)
# Build result: one user msg + one assistant msg per turn
filtered = []
for turn in turns:
user_text = _extract_text(turn["user"].get("content"))
if not user_text:
continue
filtered.append({
"role": "user",
"content": [{"type": "text", "text": user_text}]
})
if turn["assistants"]:
final_reply = turn["assistants"][-1]
filtered.append({
"role": "assistant",
"content": [{"type": "text", "text": final_reply}]
})
return filtered
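
The effect of _filter_text_only_messages above is clearest on a small history. The following is a condensed sketch of the same grouping idea, not the method itself; the function names and the sample messages (including the tool name and ids) are made up for illustration.

```python
def extract_text(content) -> str:
    """Join the text blocks of a message; tool blocks are ignored."""
    if isinstance(content, str):
        return content.strip()
    if isinstance(content, list):
        parts = [b.get("text", "") for b in content
                 if isinstance(b, dict) and b.get("type") == "text"]
        return "\n".join(p for p in parts if p).strip()
    return ""


def is_real_user(msg: dict) -> bool:
    """True for genuine user input, False for tool_result carriers or empty text."""
    if msg.get("role") != "user":
        return False
    content = msg.get("content")
    if isinstance(content, list) and any(
        isinstance(b, dict) and b.get("type") == "tool_result" for b in content
    ):
        return False
    return bool(extract_text(content))


def keep_first_user_last_assistant(messages: list) -> list:
    turns = []
    for msg in messages:
        if is_real_user(msg):
            turns.append({"user": extract_text(msg["content"]), "assistants": []})
        elif turns and msg.get("role") == "assistant":
            text = extract_text(msg.get("content"))
            if text:
                turns[-1]["assistants"].append(text)
    result = []
    for turn in turns:
        result.append({"role": "user", "content": [{"type": "text", "text": turn["user"]}]})
        if turn["assistants"]:
            result.append({"role": "assistant",
                           "content": [{"type": "text", "text": turn["assistants"][-1]}]})
    return result


history = [
    {"role": "user", "content": "What's the weather?"},
    {"role": "assistant", "content": [
        {"type": "text", "text": "Let me check."},
        {"type": "tool_use", "id": "t1", "name": "weather", "input": {}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "t1", "content": "sunny"},
    ]},
    {"role": "assistant", "content": "It is sunny."},
]
restored = keep_first_user_last_assistant(history)
```

Four stored messages collapse to two: the user question and the final answer. The intermediate "Let me check." thought and the whole tool exchange are dropped.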
def _load_env_file(self):
"""Load environment variables from .env file"""
env_file = expand_path("~/.cow/.env")
if os.path.exists(env_file):
try:
from dotenv import load_dotenv
load_dotenv(env_file, override=True)
except ImportError:
logger.warning("[AgentInitializer] python-dotenv not installed")
except Exception as e:
logger.warning(f"[AgentInitializer] Failed to load .env file: {e}")
def _setup_memory_system(self, workspace_root: str, session_id: Optional[str] = None):
"""
Setup memory system
Returns:
(memory_manager, memory_tools) tuple
"""
memory_manager = None
memory_tools = []
try:
from agent.memory import MemoryManager, MemoryConfig
from agent.tools import MemorySearchTool, MemoryGetTool
from config import conf
memory_config = MemoryConfig(workspace_root=workspace_root)
embedding_provider = self._init_embedding_provider(
memory_config, session_id=session_id
)
memory_manager = MemoryManager(memory_config, embedding_provider=embedding_provider)
self._sync_memory(memory_manager, session_id)
memory_tools = [
MemorySearchTool(memory_manager),
MemoryGetTool(memory_manager)
]
if session_id is None:
logger.info("[AgentInitializer] Memory system initialized")
except Exception as e:
logger.warning(f"[AgentInitializer] Memory system not available: {e}")
return memory_manager, memory_tools
def _init_embedding_provider(self, memory_config, session_id: Optional[str] = None):
"""
Initialize the embedding provider for memory.
Two paths:
A. Default (no `embedding_provider` in config.json):
Auto-init OpenAI -> LinkAI fallback. Existing 1536-dim indices
keep working.
B. Explicit (`embedding_provider` is set):
Initialize the requested vendor with unified dim (default 1024).
If the index was built with a different dim, vector search will
quietly return no results (cosine returns 0) and keyword search
takes over until the user runs /memory rebuild-index.
"""
from agent.memory import create_embedding_provider
from config import conf
explicit_provider = (conf().get("embedding_provider") or "").strip().lower()
if not explicit_provider:
return self._init_embedding_provider_legacy(session_id=session_id)
return self._init_embedding_provider_explicit(
memory_config, explicit_provider, session_id=session_id,
)
def _init_embedding_provider_legacy(self, session_id: Optional[str] = None):
"""Legacy auto-init path: OpenAI -> LinkAI. Preserved verbatim for compat."""
from agent.memory import create_embedding_provider
from config import conf
embedding_provider = None
embedding_model = None
openai_api_key = conf().get("open_ai_api_key", "")
openai_api_base = conf().get("open_ai_api_base", "")
if openai_api_key and openai_api_key not in ["", "YOUR API KEY", "YOUR_API_KEY"]:
try:
model = "text-embedding-3-small"
embedding_provider = create_embedding_provider(
provider="openai",
model=model,
api_key=openai_api_key,
api_base=openai_api_base or "https://api.openai.com/v1"
)
embedding_model = f"openai/{model}"
except Exception as e:
logger.warning(f"[AgentInitializer] OpenAI embedding failed: {e}")
if embedding_provider is None:
linkai_api_key = conf().get("linkai_api_key", "") or os.environ.get("LINKAI_API_KEY", "")
linkai_api_base = conf().get("linkai_api_base", "https://api.link-ai.tech")
if linkai_api_key and linkai_api_key not in ["", "YOUR API KEY", "YOUR_API_KEY"]:
try:
model = "text-embedding-3-small"
embedding_provider = create_embedding_provider(
provider="linkai",
model=model,
api_key=linkai_api_key,
api_base=f"{linkai_api_base}/v1"
)
embedding_model = f"linkai/{model}"
except Exception as e:
logger.warning(f"[AgentInitializer] LinkAI embedding failed: {e}")
if embedding_provider is not None and embedding_model:
global _embedding_logged
if not _embedding_logged:
logger.info(
f"[AgentInitializer] Embedding model in use: {embedding_model} "
f"(dim={embedding_provider.dimensions})"
)
_embedding_logged = True
return embedding_provider
def _init_embedding_provider_explicit(
self,
memory_config,
provider_key: str,
session_id: Optional[str] = None,
):
"""Explicit-provider path: build the configured vendor.
If the index was built with a different dim, vector search will
silently return no results (cosine returns 0 for mismatched dims)
and keyword search takes over. Users switch vendors by running
/memory rebuild-index — see docs.
"""
from agent.memory import create_embedding_provider
from agent.memory.embedding import EMBEDDING_VENDORS
from config import conf
meta = EMBEDDING_VENDORS.get(provider_key)
if meta is None:
logger.error(
f"[AgentInitializer] Unknown embedding_provider '{provider_key}'. "
f"Supported: {sorted(EMBEDDING_VENDORS.keys())}. "
f"Memory will run in keyword-only mode."
)
return None
api_key = self._resolve_embedding_api_key(provider_key)
api_base = self._resolve_embedding_api_base(provider_key, meta["default_base_url"])
if not api_key:
logger.error(
f"[AgentInitializer] embedding_provider='{provider_key}' is set but its "
f"API key is missing. Memory will run in keyword-only mode."
)
return None
model = (conf().get("embedding_model") or "").strip() or meta["default_model"]
try:
cfg_dim = int(conf().get("embedding_dimensions") or 0)
except (TypeError, ValueError):
cfg_dim = 0
dim = cfg_dim if cfg_dim > 0 else meta["default_dimensions"]
try:
provider = create_embedding_provider(
provider=provider_key,
model=model,
api_key=api_key,
api_base=api_base,
dimensions=dim,
)
except Exception as e:
logger.error(
f"[AgentInitializer] Failed to init embedding provider "
f"'{provider_key}/{model}': {e}"
)
return None
global _embedding_logged
if not _embedding_logged:
logger.info(
f"[AgentInitializer] Embedding model in use: "
f"{provider_key}/{model} (dim={provider.dimensions})"
)
_embedding_logged = True
return provider
@staticmethod
def _resolve_embedding_api_key(provider_key: str) -> str:
"""Pick the API key for an explicit embedding provider from config."""
from config import conf
key_map = {
"openai": "open_ai_api_key",
"linkai": "linkai_api_key",
"dashscope": "dashscope_api_key",
"doubao": "ark_api_key",
"zhipu": "zhipu_ai_api_key",
}
field = key_map.get(provider_key)
if not field:
return ""
value = conf().get(field, "") or ""
if value in ["", "YOUR API KEY", "YOUR_API_KEY"]:
return ""
return value
@staticmethod
def _resolve_embedding_api_base(provider_key: str, default_base: str) -> str:
"""Pick the API base for an explicit embedding provider from config."""
from config import conf
base_map = {
"openai": "open_ai_api_base",
"linkai": "linkai_api_base",
"doubao": "ark_base_url",
"zhipu": "zhipu_ai_api_base",
}
field = base_map.get(provider_key)
if not field:
return default_base
value = (conf().get(field) or "").strip()
if not value:
return default_base
if provider_key == "linkai" and not value.rstrip("/").endswith("/v1"):
return f"{value.rstrip('/')}/v1"
return value
def _sync_memory(self, memory_manager, session_id: Optional[str] = None):
"""Sync memory database"""
try:
loop = asyncio.get_event_loop()
if loop.is_closed():
raise RuntimeError("Event loop is closed")
except RuntimeError:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
if loop.is_running():
asyncio.create_task(memory_manager.sync())
else:
loop.run_until_complete(memory_manager.sync())
except Exception as e:
logger.warning(f"[AgentInitializer] Memory sync failed: {e}")
def _load_tools(self, workspace_root: str, memory_manager, memory_tools: List, session_id: Optional[str] = None):
"""Load all tools"""
tool_manager = ToolManager()
tool_manager.load_tools()
tools = []
file_config = {
"cwd": workspace_root,
"memory_manager": memory_manager
} if memory_manager else {"cwd": workspace_root}
for tool_name in tool_manager.tool_classes.keys():
try:
# Skip web_search if no API key is available
if tool_name == "web_search":
from agent.tools.web_search.web_search import WebSearch
if not WebSearch.is_available():
logger.debug("[AgentInitializer] WebSearch skipped - no search provider configured")
continue
# Special handling for EnvConfig tool
if tool_name == "env_config":
from agent.tools import EnvConfig
tool = EnvConfig({"agent_bridge": self.agent_bridge})
else:
tool = tool_manager.create_tool(tool_name)
if tool:
# Apply workspace config to file operation tools.
# Merge into the existing tool.config (set by ToolManager from
# config.json's `tools.<name>` section) instead of replacing
# it, otherwise per-tool user configs (e.g. browser.cdp_endpoint)
# would be silently dropped.
if tool_name in ['read', 'write', 'edit', 'bash', 'grep', 'find', 'ls', 'web_fetch', 'send', 'browser']:
merged_config = dict(getattr(tool, 'config', None) or {})
merged_config.update(file_config)
tool.config = merged_config
tool.cwd = merged_config.get("cwd", getattr(tool, 'cwd', None))
if 'memory_manager' in merged_config:
tool.memory_manager = merged_config['memory_manager']
tools.append(tool)
except Exception as e:
logger.warning(f"[AgentInitializer] Failed to load tool {tool_name}: {e}")
# Add MCP tools (snapshot to avoid races with the background loader)
mcp_tools_snapshot = list(tool_manager._mcp_tool_instances.items())
if mcp_tools_snapshot:
for _, mcp_tool in mcp_tools_snapshot:
tools.append(mcp_tool)
if session_id is None:
names = [name for name, _ in mcp_tools_snapshot]
logger.info(
f"[AgentInitializer] Added {len(names)} MCP tool(s): {names}"
)
# Add memory tools
if memory_tools:
tools.extend(memory_tools)
if session_id is None:
logger.info(f"[AgentInitializer] Added {len(memory_tools)} memory tools")
if session_id is None:
logger.info(f"[AgentInitializer] Loaded {len(tools)} tools: {[t.name for t in tools]}")
return tools
def _initialize_scheduler(self, tools: List, session_id: Optional[str] = None):
"""Initialize scheduler service if needed.
Serialize the check-and-set under a module-level lock so concurrent
first-time session inits cannot each create a new SchedulerService
(which would leak background scanning threads).
"""
if not self.agent_bridge.scheduler_initialized:
with _scheduler_init_lock:
if not self.agent_bridge.scheduler_initialized:
try:
from agent.tools.scheduler.integration import init_scheduler
if init_scheduler(self.agent_bridge):
self.agent_bridge.scheduler_initialized = True
if session_id is None:
logger.info("[AgentInitializer] Scheduler service initialized")
except Exception as e:
logger.warning(f"[AgentInitializer] Failed to initialize scheduler: {e}")
# Inject scheduler dependencies
if self.agent_bridge.scheduler_initialized:
try:
from agent.tools.scheduler.integration import get_task_store, get_scheduler_service
from agent.tools import SchedulerTool
from config import conf
task_store = get_task_store()
scheduler_service = get_scheduler_service()
for tool in tools:
if isinstance(tool, SchedulerTool):
tool.task_store = task_store
tool.scheduler_service = scheduler_service
if not tool.config:
tool.config = {}
raw_ct = conf().get("channel_type", "unknown")
if isinstance(raw_ct, list):
ct = raw_ct[0] if raw_ct else "unknown"
elif isinstance(raw_ct, str) and "," in raw_ct:
ct = raw_ct.split(",")[0].strip()
else:
ct = raw_ct
tool.config["channel_type"] = ct
except Exception as e:
logger.warning(f"[AgentInitializer] Failed to inject scheduler dependencies: {e}")
def _initialize_skill_manager(self, workspace_root: str, session_id: Optional[str] = None):
"""Initialize skill manager"""
try:
from agent.skills import SkillManager
skill_manager = SkillManager(custom_dir=os.path.join(workspace_root, "skills"))
return skill_manager
except Exception as e:
logger.warning(f"[AgentInitializer] Failed to initialize SkillManager: {e}")
return None
def _get_runtime_info(self, workspace_root: str):
"""Get runtime information with dynamic time support"""
from config import conf
def get_current_time():
"""Get current time dynamically - called each time system prompt is accessed"""
now = datetime.datetime.now()
# Get timezone info
try:
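# time.daylight only says a DST zone is defined; tm_isdst says whether DST is in effect now
offset = -time.altzone if time.localtime().tm_isdst > 0 else -time.timezone
# Split the absolute offset so negative half-hour zones (e.g. UTC-03:30) format correctly
sign = "-" if offset < 0 else "+"
hours, remainder = divmod(abs(offset), 3600)
minutes = remainder // 60
timezone_name = f"UTC{sign}{hours:02d}:{minutes:02d}" if minutes else f"UTC{sign}{hours:02d}"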
except Exception:
timezone_name = "UTC"
# Weekday: English name in en, Chinese mapping otherwise
weekday_en = now.strftime("%A")
try:
from common import i18n
is_en = i18n.get_language() == "en"
except Exception:
is_en = False
if is_en:
weekday = weekday_en
else:
weekday_map = {
'Monday': '星期一', 'Tuesday': '星期二', 'Wednesday': '星期三',
'Thursday': '星期四', 'Friday': '星期五', 'Saturday': '星期六', 'Sunday': '星期日'
}
weekday = weekday_map.get(weekday_en, weekday_en)
return {
'time': now.strftime("%Y-%m-%d %H:%M:%S"),
'weekday': weekday,
'timezone': timezone_name
}
def get_model():
"""Get current model name dynamically from config"""
return conf().get("model", "unknown")
return {
"_get_model": get_model,
"workspace": workspace_root,
"channel": ", ".join(conf().get("channel_type")) if isinstance(conf().get("channel_type"), list) else conf().get("channel_type", "unknown"),
"_get_current_time": get_current_time # Dynamic time function
}
def _migrate_config_to_env(self, workspace_root: str):
"""Migrate API keys from config.json to .env file"""
from config import conf
key_mapping = {
"open_ai_api_key": "OPENAI_API_KEY",
"open_ai_api_base": "OPENAI_API_BASE",
"gemini_api_key": "GEMINI_API_KEY",
"claude_api_key": "CLAUDE_API_KEY",
"linkai_api_key": "LINKAI_API_KEY",
}
env_file = expand_path("~/.cow/.env")
# Read existing env vars (key -> value)
existing_env_vars = {}
if os.path.exists(env_file):
try:
with open(env_file, 'r', encoding='utf-8') as f:
for line in f:
line = line.strip()
if line and not line.startswith('#') and '=' in line:
key, val = line.split('=', 1)
existing_env_vars[key.strip()] = val.strip()
except Exception as e:
logger.warning(f"[AgentInitializer] Failed to read .env file: {e}")
# Sync config.json values into .env (add/update/remove)
updated = False
for config_key, env_key in key_mapping.items():
raw = conf().get(config_key, "")
value = raw.strip() if raw else ""
old_value = existing_env_vars.get(env_key)
if value:
if old_value == value:
continue
existing_env_vars[env_key] = value
os.environ[env_key] = value
updated = True
else:
if old_value is None:
continue
existing_env_vars.pop(env_key, None)
os.environ.pop(env_key, None)
updated = True
if updated:
try:
env_dir = os.path.dirname(env_file)
os.makedirs(env_dir, exist_ok=True)
# Rewrite the entire .env file to ensure consistency
with open(env_file, 'w', encoding='utf-8') as f:
f.write('# Environment variables for agent\n')
f.write('# Auto-managed - synced from config.json on startup\n\n')
for key, value in sorted(existing_env_vars.items()):
f.write(f'{key}={value}\n')
logger.info("[AgentInitializer] Synced API keys from config.json to .env")
except Exception as e:
logger.warning(f"[AgentInitializer] Failed to sync API keys: {e}")
def _start_daily_flush_timer(self):
"""Start a background thread that flushes all agents' memory once a day, at a jittered time between 23:50 and 23:55."""
if getattr(self.agent_bridge, '_daily_flush_started', False):
return
self.agent_bridge._daily_flush_started = True
import threading
def _daily_flush_loop():
import random
last_run_date = None # Track last successful run date to prevent same-day re-trigger
while True:
try:
now = datetime.datetime.now()
jitter_min = random.randint(50, 55)
jitter_sec = random.randint(0, 59)
target = now.replace(hour=23, minute=jitter_min, second=jitter_sec, microsecond=0)
# Always schedule for tomorrow if we already ran today, or if target time has passed
if target <= now or (last_run_date == now.date()):
target += datetime.timedelta(days=1)
wait_seconds = (target - now).total_seconds()
logger.info(f"[DailyFlush] Next flush at {target.strftime('%Y-%m-%d %H:%M:%S')} (in {wait_seconds/3600:.1f}h)")
time.sleep(wait_seconds)
self._flush_all_agents()
last_run_date = datetime.datetime.now().date()
except Exception as e:
logger.warning(f"[DailyFlush] Error in daily flush loop: {e}")
time.sleep(3600)
t = threading.Thread(target=_daily_flush_loop, daemon=True)
t.start()
def _flush_all_agents(self):
"""Flush memory for all active agent sessions, then run Deep Dream."""
agents = []
if self.agent_bridge.default_agent:
agents.append(("default", self.agent_bridge.default_agent))
for sid, agent in self.agent_bridge.agents.items():
agents.append((sid, agent))
if not agents:
return
# Phase 1: flush daily summaries
flushed = 0
flush_threads = []
dream_candidate = None
for label, agent in agents:
try:
if not agent.memory_manager:
continue
with agent.messages_lock:
messages = list(agent.messages)
if not messages:
continue
result = agent.memory_manager.flush_manager.create_daily_summary(messages)
if result:
flushed += 1
t = agent.memory_manager.flush_manager._last_flush_thread
if t:
flush_threads.append(t)
if dream_candidate is None:
dream_candidate = agent.memory_manager.flush_manager
except Exception as e:
logger.warning(f"[DailyFlush] Failed for session {label}: {e}")
if flushed:
logger.info(f"[DailyFlush] Flushed {flushed}/{len(agents)} agent session(s)")
# Wait for all flush threads to finish before dreaming
for t in flush_threads:
t.join(timeout=60)
# Phase 2: Deep Dream — distill daily memories → MEMORY.md + dream diary
if dream_candidate:
try:
result = dream_candidate.deep_dream()
if result:
logger.info("[DeepDream] Memory distillation completed successfully")
except Exception as e:
logger.warning(f"[DeepDream] Failed: {e}")
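The scheduling rule inside `_daily_flush_loop` earlier in this file can be isolated as a pure function, which makes the "already ran today" and "slot has passed" cases easy to check. This is a minimal sketch, not code from the commit: `next_flush_time` and its `rng` parameter are hypothetical names introduced here so the jitter can be seeded.

```python
import datetime
import random
from typing import Optional


def next_flush_time(
    now: datetime.datetime,
    last_run_date: Optional[datetime.date] = None,
    rng: Optional[random.Random] = None,
) -> datetime.datetime:
    """Return the next flush moment, jittered within 23:50:00-23:55:59.

    Hypothetical helper mirroring the rule in _daily_flush_loop.
    """
    rng = rng or random.Random()
    target = now.replace(
        hour=23,
        minute=rng.randint(50, 55),
        second=rng.randint(0, 59),
        microsecond=0,
    )
    # Move to tomorrow if today's slot has passed or a flush already ran today.
    if target <= now or last_run_date == now.date():
        target += datetime.timedelta(days=1)
    return target
```

Passing `last_run_date` explicitly, as the loop does with its local variable, is what should prevent a second flush on the same day when the jittered target happens to land later than the run that just finished.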


@@ -1,50 +1,197 @@
from models.bot_factory import create_bot
from bridge.context import Context
from bridge.reply import Reply
from common.log import logger
from bot import bot_factory
from common.singleton import singleton
from voice import voice_factory
from config import conf
from common import const
from common.log import logger
from common.singleton import singleton
from config import conf
from translate.factory import create_translator
from voice.factory import create_voice
@singleton
class Bridge(object):
def __init__(self):
self.btype={
"chat": const.CHATGPT,
"voice_to_text": conf().get("voice_to_text", "openai"),
"text_to_voice": conf().get("text_to_voice", "google")
self.btype = {
"chat": const.OPENAI,
# Empty `voice_to_text` (the default in new configs) triggers
# the auto-pick below — see _auto_pick_voice_to_text for order.
"voice_to_text": conf().get("voice_to_text") or self._auto_pick_voice_to_text(),
"text_to_voice": conf().get("text_to_voice", "google"),
"translate": conf().get("translate", "baidu"),
}
model_type = conf().get("model")
if model_type in ["text-davinci-003"]:
self.btype['chat'] = const.OPEN_AI
if conf().get("use_azure_chatgpt", False):
self.btype['chat'] = const.CHATGPTONAZURE
self.bots={}
# Use the configured model here
bot_type = conf().get("bot_type")
if bot_type:
self.btype["chat"] = bot_type
else:
model_type = conf().get("model") or const.GPT_41_MINI
# Ensure model_type is string to prevent AttributeError when using startswith()
# This handles cases where numeric model names (e.g., "1") are parsed as integers from YAML
if not isinstance(model_type, str):
logger.warning(f"[Bridge] model_type is not a string: {model_type} (type: {type(model_type).__name__}), converting to string")
model_type = str(model_type)
if model_type in ["text-davinci-003"]:
self.btype["chat"] = const.OPEN_AI
if conf().get("use_azure_chatgpt", False):
self.btype["chat"] = const.CHATGPTONAZURE
if model_type in ["wenxin", "wenxin-4"]:
self.btype["chat"] = const.BAIDU
if model_type in ["xunfei"]:
self.btype["chat"] = const.XUNFEI
if model_type in [const.QWEN, const.QWEN_TURBO, const.QWEN_PLUS, const.QWEN_MAX]:
self.btype["chat"] = const.QWEN_DASHSCOPE
if model_type and (model_type.startswith("qwen") or model_type.startswith("qwq") or model_type.startswith("qvq")):
self.btype["chat"] = const.QWEN_DASHSCOPE
if model_type and model_type.startswith("gemini"):
self.btype["chat"] = const.GEMINI
if model_type and model_type.startswith("glm"):
self.btype["chat"] = const.ZHIPU_AI
if model_type and model_type.startswith("claude"):
self.btype["chat"] = const.CLAUDEAPI
def get_bot(self,typename):
if model_type in [const.MOONSHOT, "moonshot-v1-8k", "moonshot-v1-32k", "moonshot-v1-128k"]:
self.btype["chat"] = const.MOONSHOT
if model_type and model_type.startswith("kimi"):
self.btype["chat"] = const.MOONSHOT
if model_type and model_type.startswith("doubao"):
self.btype["chat"] = const.DOUBAO
if model_type and model_type.startswith("deepseek"):
self.btype["chat"] = const.DEEPSEEK
# Xiaomi MiMo series models, all prefixed with mimo-
if model_type and model_type.startswith("mimo-"):
self.btype["chat"] = const.MIMO
if model_type and isinstance(model_type, str):
lowered_model_type = model_type.lower()
if lowered_model_type == const.QIANFAN or lowered_model_type.startswith("ernie"):
self.btype["chat"] = const.QIANFAN
if model_type in [const.MODELSCOPE]:
self.btype["chat"] = const.MODELSCOPE
# MiniMax models
if model_type and (model_type in ["abab6.5-chat", "abab6.5"] or model_type.lower().startswith("minimax")):
self.btype["chat"] = const.MiniMax
if conf().get("use_linkai") and conf().get("linkai_api_key"):
self.btype["chat"] = const.LINKAI
if not conf().get("voice_to_text") or conf().get("voice_to_text") in ["openai"]:
self.btype["voice_to_text"] = const.LINKAI
if not conf().get("text_to_voice") or conf().get("text_to_voice") in ["openai", const.TTS_1, const.TTS_1_HD]:
self.btype["text_to_voice"] = const.LINKAI
self.bots = {}
self.chat_bots = {}
self._agent_bridge = None
def refresh_voice(self):
"""Re-read voice_to_text / text_to_voice from config and drop the
cached voice bots so the next call picks up the new provider.
Used by the web console after the user edits voice settings.
Does NOT touch the agent_bridge / agent state.
"""
new_v2t = conf().get("voice_to_text") or self._auto_pick_voice_to_text()
new_t2v = conf().get("text_to_voice", "google")
if conf().get("use_linkai") and conf().get("linkai_api_key"):
if not conf().get("voice_to_text") or conf().get("voice_to_text") in ["openai"]:
new_v2t = const.LINKAI
if not conf().get("text_to_voice") or conf().get("text_to_voice") in ["openai", const.TTS_1, const.TTS_1_HD]:
new_t2v = const.LINKAI
self.btype["voice_to_text"] = new_v2t
self.btype["text_to_voice"] = new_t2v
self.bots.pop("voice_to_text", None)
self.bots.pop("text_to_voice", None)
logger.info(f"[Bridge] voice refreshed: voice_to_text={new_v2t}, text_to_voice={new_t2v}")
@staticmethod
def _auto_pick_voice_to_text() -> str:
"""Pick an ASR provider by configured api keys when voice_to_text is
unset. Order matches the web console: openai → dashscope → zhipu →
linkai. Falls back to 'openai' when nothing is configured so the
original "missing key" error is preserved.
"""
def has(k: str) -> bool:
v = (conf().get(k) or "").strip()
return v != "" and v not in ("YOUR API KEY", "YOUR_API_KEY")
for key, provider in (
("open_ai_api_key", "openai"),
("dashscope_api_key", "dashscope"),
("zhipu_ai_api_key", "zhipu"),
("linkai_api_key", "linkai"),
):
if has(key):
return provider
return "openai"
# Interface (bot) that corresponds to the model
def get_bot(self, typename):
if self.bots.get(typename) is None:
logger.info("create bot {} for {}".format(self.btype[typename],typename))
logger.info("create bot {} for {}".format(self.btype[typename], typename))
if typename == "text_to_voice":
self.bots[typename] = voice_factory.create_voice(self.btype[typename])
self.bots[typename] = create_voice(self.btype[typename])
elif typename == "voice_to_text":
self.bots[typename] = voice_factory.create_voice(self.btype[typename])
self.bots[typename] = create_voice(self.btype[typename])
elif typename == "chat":
self.bots[typename] = bot_factory.create_bot(self.btype[typename])
self.bots[typename] = create_bot(self.btype[typename])
elif typename == "translate":
self.bots[typename] = create_translator(self.btype[typename])
return self.bots[typename]
def get_bot_type(self,typename):
def get_bot_type(self, typename):
return self.btype[typename]
def fetch_reply_content(self, query, context : Context) -> Reply:
def fetch_reply_content(self, query, context: Context) -> Reply:
return self.get_bot("chat").reply(query, context)
def fetch_voice_to_text(self, voiceFile) -> Reply:
return self.get_bot("voice_to_text").voiceToText(voiceFile)
def fetch_text_to_voice(self, text) -> Reply:
return self.get_bot("text_to_voice").textToVoice(text)
def fetch_translate(self, text, from_lang="", to_lang="en") -> Reply:
return self.get_bot("translate").translate(text, from_lang, to_lang)
def find_chat_bot(self, bot_type: str):
if self.chat_bots.get(bot_type) is None:
self.chat_bots[bot_type] = create_bot(bot_type)
return self.chat_bots.get(bot_type)
def reset_bot(self):
"""
Reset bot routing
"""
self.__init__()
def get_agent_bridge(self):
"""
Get agent bridge for agent-based conversations
"""
if self._agent_bridge is None:
from bridge.agent_bridge import AgentBridge
self._agent_bridge = AgentBridge(self)
return self._agent_bridge
def fetch_agent_reply(self, query: str, context: Context = None,
on_event=None, clear_history: bool = False) -> Reply:
"""
Use super agent to handle the query
Args:
query: User query
context: Context object
on_event: Event callback for streaming
clear_history: Whether to clear conversation history
Returns:
Reply object
"""
agent_bridge = self.get_agent_bridge()
return agent_bridge.agent_reply(query, context, on_event, clear_history)
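The provider order described in the `_auto_pick_voice_to_text` docstring (openai → dashscope → zhipu → linkai, falling back to `openai`) can be exercised without the project's `conf()` singleton. The sketch below assumes a plain mapping stands in for the config; `pick_asr_provider`, `ASR_PRIORITY` and `PLACEHOLDERS` are hypothetical names, not part of the commit.

```python
from typing import Mapping

# Placeholder values that the source treats as "no key configured".
PLACEHOLDERS = ("YOUR API KEY", "YOUR_API_KEY")

# (config key, provider) pairs in the priority order given in the docstring.
ASR_PRIORITY = (
    ("open_ai_api_key", "openai"),
    ("dashscope_api_key", "dashscope"),
    ("zhipu_ai_api_key", "zhipu"),
    ("linkai_api_key", "linkai"),
)


def pick_asr_provider(config: Mapping[str, object]) -> str:
    """Return the first provider whose API key is set and not a placeholder."""
    for key, provider in ASR_PRIORITY:
        value = str(config.get(key) or "").strip()
        if value and value not in PLACEHOLDERS:
            return provider
    # Nothing configured: fall back to "openai", as the original method does.
    return "openai"
```

Keeping the priority list as data rather than branching code means a reordering, for example to match a change in the web console, would touch a single tuple.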

Some files were not shown because too many files have changed in this diff