Compare commits

252 Commits

Author SHA1 Message Date
zhayujie
55aaf60a57 feat: release 2.0.8 2026-05-06 16:19:20 +08:00
zhayujie
a5790d82f6 feat(qianfan): scope vision support to multimodal models 2026-05-06 16:11:10 +08:00
zhayujie
63f99af1e6 Merge pull request #2800 from jimmyzhuu/feat/qianfan-vision-provider
Add Qianfan support to Vision tool
2026-05-06 15:39:12 +08:00
zhayujie
4eed2568aa fix(bash): reduce safety check false positives 2026-05-06 15:36:44 +08:00
jimmyzhuu
fb7962c7f2 fix: use available qianfan vision model 2026-05-06 13:34:39 +08:00
jimmyzhuu
76e6b7b471 docs: document qianfan vision support 2026-05-06 13:28:46 +08:00
jimmyzhuu
fccb7ff9ed feat: route qianfan vision provider 2026-05-06 13:25:59 +08:00
jimmyzhuu
3b12ef2e66 feat: add qianfan vision calls 2026-05-06 13:24:41 +08:00
jimmyzhuu
f9d099be1b feat: add qianfan vision model constants 2026-05-06 13:23:04 +08:00
zhayujie
c322c0e3a5 docs(models): add ernie-5.0 2026-05-06 12:15:14 +08:00
zhayujie
530fc20596 Merge pull request #2790 from jimmyzhuu/feat/qianfan-provider
Add first-class Baidu Qianfan / ERNIE provider
2026-05-06 11:43:32 +08:00
zhayujie
a23b4ed754 Merge pull request #2797 from Zmjjeff7/feat-translate-youdao
feat(translate): add Youdao as a new translation provider
2026-05-06 11:28:50 +08:00
zhayujie
fc4f5077b0 fix: update .gitignore 2026-05-06 11:27:57 +08:00
Zmjjeff7
6a553886da feat(translate): add Youdao as a new translation provider
The translate module previously only supported Baidu translation, and the
factory raised a bare RuntimeError for any other type. This change adds
Youdao Translation as a second provider and improves the factory's error
message.

Implementation details:
- New YoudaoTranslator class in translate/youdao/youdao_translate.py
- Implements Youdao's v3 SHA-256 signature scheme, including the
  truncate-input rule for queries longer than 20 characters
- Maps ISO 639-1 language codes to Youdao-specific codes
  (zh -> zh-CHS, zh-TW -> zh-CHT, others pass through)
- Differentiates network errors, API error codes, and empty translations
- factory.create_translator now lists the supported types in its
  RuntimeError message instead of failing silently
- Default config exposes youdao_translate_app_key and
  youdao_translate_app_secret

Adds 17 unit tests covering signature correctness, language code mapping,
input truncation edge cases, the full request/response flow, and factory
dispatch. All tests pass under Python 3.11.
2026-05-05 23:58:32 +08:00
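
The signature and language-code rules listed in this commit can be sketched as below. This is a minimal sketch that assumes Youdao's publicly documented v3 rule (SHA-256 over app key + truncated input + salt + curtime + app secret). The names `map_lang`, `truncate_input`, and `build_sign` are hypothetical and are not taken from translate/youdao/youdao_translate.py.

```python
import hashlib

# Hypothetical mapping table; the commit names only these two special cases.
_LANG_MAP = {"zh": "zh-CHS", "zh-TW": "zh-CHT"}


def map_lang(code: str) -> str:
    """Map an ISO 639-1 code to a Youdao code; other codes pass through."""
    return _LANG_MAP.get(code, code)


def truncate_input(q: str) -> str:
    """Apply the v3 'input' rule: a query longer than 20 characters becomes
    its first 10 characters + its total length + its last 10 characters."""
    if len(q) <= 20:
        return q
    return q[:10] + str(len(q)) + q[-10:]


def build_sign(app_key: str, app_secret: str, q: str, salt: str, curtime: str) -> str:
    """Return the SHA-256 hex digest of app_key + input + salt + curtime + app_secret."""
    raw = app_key + truncate_input(q) + salt + curtime + app_secret
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

A query of exactly 20 characters is signed unchanged; truncation only starts at 21 characters, which is the edge case the unit tests in this commit are described as covering.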
zhayujie
1065c7e722 fix(feishu): unblock streaming via async push worker 2026-05-05 19:36:15 +08:00
zhayujie
a9c8a59f58 feat(feishu): one-click QR-scan app creation 2026-05-05 18:32:58 +08:00
zhayujie
8730f7fd27 fix(memory): exclude scheduler-injected pairs from daily memory flush 2026-05-05 16:53:01 +08:00
zhayujie
8f608223d7 perf(feishu): tune streaming render speed 2026-05-05 14:53:30 +08:00
zhayujie
a7cbd47a2f fix(feishu): default feishu_stream_reply to true 2026-05-05 14:30:34 +08:00
zhayujie
b80c3fe5a8 feat(feishu): enhance #2791 with cardkit streaming + ASR fixes
- rewrite streaming reply to official cardkit v2.0 API (default on, auto-fallback)
- fix Whisper hallucination: bump ASR sample rate to 16k, pass language=zh
- fix lock-over-IO and tmp file cleanup from #2791
- drop deprecated feishu_bot_name; quiet unknown-key warnings
- docs: cardkit permission and feishu_stream_reply usage
2026-05-05 14:15:25 +08:00
zhayujie
5080051e39 Merge pull request #2791 from ooaaooaa123/feat/feishu-voice-stream-reply
feat(feishu): support sending and receiving voice messages and streaming typewriter-style replies
2026-05-05 13:10:00 +08:00
zhayujie
23bfc8d0ba fix(feishu): update config-template.json 2026-05-05 13:05:39 +08:00
zhayujie
80e9062041 fix(vision): respect tool.vision.model and add automatic fallback #2792 2026-05-03 22:28:32 +08:00
zhayujie
67bd3420ed perf(scheduler): bound isolated session context to agent_max_context_turns/5 2026-05-03 21:49:59 +08:00
zhayujie
aea081703f fix(scheduler): inject delivered output into receiver session with sliding window
Further refinements on top of #2795:

- persist real session_id (notify_session_id) at task creation so group chats
  correctly map back to the user's actual conversation
- mark scheduler turns with [SCHEDULED] (recognise legacy "Scheduled task"
  prefix too for backward-compatible pruning)
- prune both DB and in-memory to scheduler_inject_max_per_session (default 3),
  only marker-tagged pairs are touched; regular user turns never deleted
- send_message type gated by scheduler_inject_send_message (default false) —
  fixed reminder text rarely benefits follow-up Q&A

Co-authored-by: huangrichao2020 <grdomai43881@gmail.com>
2026-05-03 21:27:24 +08:00
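
The sliding-window pruning described in this commit might look roughly like the sketch below. It assumes a session history stored as a list of (prompt, output) pairs, oldest first; that shape and the names `is_scheduler_pair` and `prune_scheduler_pairs` are hypothetical. Only the two markers and the default of 3 come from the commit message.

```python
from typing import List, Tuple

# New marker plus the legacy prefix mentioned in the commit message.
MARKERS = ("[SCHEDULED]", "Scheduled task")

Pair = Tuple[str, str]  # (injected prompt, delivered output); hypothetical shape


def is_scheduler_pair(pair: Pair) -> bool:
    """A pair counts as scheduler-injected when its prompt starts with a marker."""
    return pair[0].startswith(MARKERS)


def prune_scheduler_pairs(history: List[Pair], max_per_session: int = 3) -> List[Pair]:
    """Keep only the newest max_per_session scheduler pairs.

    Regular user turns are never removed, whatever their position.
    """
    total = sum(1 for pair in history if is_scheduler_pair(pair))
    to_drop = max(0, total - max_per_session)
    kept = []
    for pair in history:  # history is ordered oldest first
        if to_drop and is_scheduler_pair(pair):
            to_drop -= 1  # drop the oldest tagged pairs first
            continue
        kept.append(pair)
    return kept
```

The commit applies the same limit to both the database and the in-memory session; this sketch covers only the in-memory list.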
zhayujie
f300d2a2d5 Merge pull request #2795 from huangrichao2020/fix/scheduler-remember-v2
fix: remember scheduled task outputs with correct session mapping (v2)
2026-05-03 21:02:40 +08:00
tingchim2pro
f150d7d83a fix: remember scheduled task outputs in receiver session (v2)
Address review feedback from #2794:

1. Use notify_session_id instead of receiver for correct group chat mapping
   - Task creation should store the real session_id in action.notify_session_id
   - Falls back to receiver for backward compatibility with old tasks

2. Add injection to all four execution branches:
   - _execute_agent_task
   - _execute_send_message
   - _execute_tool_call
   - _execute_skill_call (also fixed missing channel.send)

3. Add config switch and content truncation:
   - scheduler_inject_to_session (default: true) to toggle the feature
   - 2000 char limit prevents high-frequency tasks from bloating sessions

Fixes #2793
2026-05-02 19:00:50 +08:00
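
Points 1 and 3 above can be illustrated with a short sketch. It assumes the task's action is a plain dict; the function names are hypothetical. The keys `notify_session_id` and `receiver` and the 2000-character limit come from the commit message. Whether the real code appends a truncation marker is not stated, so none is added here.

```python
from typing import Any, Dict

MAX_INJECT_CHARS = 2000  # limit named in the commit message


def resolve_session_id(action: Dict[str, Any]) -> str:
    """Prefer notify_session_id; fall back to receiver for tasks created
    before that field existed."""
    return action.get("notify_session_id") or action["receiver"]


def truncate_for_injection(text: str, limit: int = MAX_INJECT_CHARS) -> str:
    """Cap injected content at `limit` characters (assumed: a plain cut)."""
    return text if len(text) <= limit else text[:limit]
```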
ooaaooaa123
4d1f059c0d feat(feishu): add voice message support and streaming text reply
- Receive audio messages: map msg_type=audio to ContextType.VOICE and
  download opus file via lazy _prepare_fn for STT pipeline
- Send voice replies: upload opus audio via Feishu file API, auto-convert
  non-opus formats (e.g. mp3) using pydub before upload
- Streaming text reply: inject on_event callback into context; send a card
  placeholder on first delta, then PATCH-update it in-place at a
  configurable interval (feishu_stream_interval_ms) to achieve typewriter
  effect; set feishu_streamed=True to suppress duplicate send()
- Enable NOT_SUPPORT_REPLYTYPE=[] to unblock voice and image reply types
- Fix AudioSegment mutation bug in audio_convert.py: set_frame_rate /
  set_channels return new objects and must be reassigned
- Add -nostdin to ffmpeg invocation to prevent stdin deadlock in daemon
- Add feishu_bot_name, feishu_stream_reply, feishu_stream_interval_ms
  config keys to config-template.json
2026-04-30 16:14:57 +08:00
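
The streaming reply flow described in this commit (placeholder on first delta, then throttled in-place updates) could be sketched as follows. The class name `StreamingReply` and the `create`/`update` callbacks are hypothetical stand-ins for the Feishu card calls, which are not shown here. The clock is injectable so that the throttle can be exercised without sleeping.

```python
import time
from typing import Callable


class StreamingReply:
    """Accumulate text deltas; create a placeholder once, then push
    in-place updates at most once per interval."""

    def __init__(
        self,
        create: Callable[[str], None],  # stand-in for "send card placeholder"
        update: Callable[[str], None],  # stand-in for "PATCH-update the card"
        interval_ms: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._create = create
        self._update = update
        self._interval = interval_ms / 1000.0
        self._clock = clock
        self._text = ""
        self._created = False
        self._last_push = 0.0

    def on_delta(self, delta: str) -> None:
        """Handle one streamed text fragment."""
        self._text += delta
        now = self._clock()
        if not self._created:
            self._create(self._text)
            self._created = True
            self._last_push = now
        elif now - self._last_push >= self._interval:
            self._update(self._text)
            self._last_push = now

    def finish(self) -> None:
        """Push the final text so the last fragments are never lost."""
        if self._created:
            self._update(self._text)
```

Throttling on elapsed time rather than on delta count keeps the number of update requests bounded however fast the model streams.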
jimmyzhuu
bc7f953fcc docs: add qianfan provider guide 2026-04-29 16:41:25 +08:00
jimmyzhuu
f653483eea feat: expose qianfan in configuration surfaces 2026-04-29 16:32:53 +08:00
jimmyzhuu
6b200fd36b fix: handle qianfan error responses 2026-04-29 16:24:37 +08:00
jimmyzhuu
161fc6cdf0 feat: add qianfan chat bot 2026-04-29 16:19:27 +08:00
jimmyzhuu
6f68ed6bce test: restore cow cli parent module attribute 2026-04-29 16:12:08 +08:00
jimmyzhuu
a4592ffdfe test: isolate cow cli plugin import 2026-04-29 16:08:40 +08:00
jimmyzhuu
7cd7bd1a48 fix: avoid cow cli import side effects 2026-04-29 16:04:48 +08:00
jimmyzhuu
9eeca70292 feat: register qianfan model provider 2026-04-29 15:52:32 +08:00
zhayujie
02bfe30848 fix(memory): prevent duplicate Deep Dream runs 2026-04-28 15:30:51 +08:00
zhayujie
c9c99de3d9 fix(bash): scope safety confirm to destructive deletions outside workspace 2026-04-28 10:18:47 +08:00
zhayujie
8752f0cc60 refactor(openai): drop SDK dependency and switch to native HTTP client 2026-04-27 20:21:54 +08:00
zhayujie
5c65196e44 feat(web): hint API base version path in config placeholder 2026-04-26 17:10:24 +08:00
zhayujie
f5798bfe90 fix: remove unnecessary API Base URL in run scripts 2026-04-26 16:29:08 +08:00
zhayujie
0e556b3468 feat: switch default model to deepseek-v4-flash 2026-04-26 15:54:50 +08:00
zhayujie
31820f56e7 fix(deepseek): back-fill reasoning_content for all assistant turns 2026-04-24 16:39:48 +08:00
zhayujie
fd88828abd fix(models): unify enable_thinking for deepseek-v4 2026-04-24 15:29:43 +08:00
zhayujie
ae11159918 feat(models): unify enable_thinking for deepseek-v4 and other thinking models 2026-04-24 15:22:45 +08:00
zhayujie
472a8605c0 feat(models): support deepseek-v4-pro and deepseek-v4-flash 2026-04-24 11:35:38 +08:00
zhayujie
e1760ba211 feat: release 2.0.7 version 2026-04-23 18:13:53 +08:00
zhayujie
ce4c0a0aa4 feat: release 2.0.7 2026-04-23 17:18:19 +08:00
zhayujie
64511593c4 feat: release 2.0.7 2026-04-23 17:16:17 +08:00
zhayujie
b0e00dfceb feat: support glm-5.1 2026-04-23 16:43:05 +08:00
zhayujie
fc465b463d feat: support kimi coding plan by temporary solution 2026-04-23 16:24:37 +08:00
zhayujie
68ce2e5232 feat(skill): multi-provider image generation with auto-fallback
- Add Gemini, Seedream (Volcengine Ark), Qwen (DashScope), MiniMax
  providers to image-generation skill with universal sequential
  fallback: OpenAI → Gemini → Seedream → Qwen → MiniMax → LinkAI
- Each provider filters unsupported size tiers to valid values
  (e.g. Seedream 1K→2K, Qwen 3K→2K, Gemini 3K→2K)
- Pinned model only tries its native provider; auto-routing uses
  each provider's default model
- Support skill-namespaced config (config.skill.image-generation.model
  → SKILL_IMAGE_GENERATION_MODEL env var)
- Add image lightbox (click-to-enlarge) in web console
- Add docs for built-in skills (skill-creator, knowledge-wiki,
  image-generation) under docs/skills/
2026-04-23 12:39:39 +08:00
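
A rough sketch of the sequential fallback described in this commit follows. The provider order is taken from the commit message; everything else is an assumption, including the `Generator` callable type and the simplification that a pinned model is represented directly by its provider name.

```python
from typing import Callable, Dict, List, Optional

# Order taken from the commit message.
PROVIDER_ORDER = ["openai", "gemini", "seedream", "qwen", "minimax", "linkai"]

Generator = Callable[[str], str]  # prompt -> image URL; hypothetical signature


def generate_with_fallback(
    prompt: str,
    providers: Dict[str, Generator],
    pinned: Optional[str] = None,
) -> str:
    """Try providers in order and return the first successful result.

    When `pinned` is set, only that provider is tried (no fallback).
    """
    order: List[str] = [pinned] if pinned else PROVIDER_ORDER
    errors: List[str] = []
    for name in order:
        generate = providers.get(name)
        if generate is None:
            continue  # provider not configured, skip it
        try:
            return generate(prompt)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```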
zhayujie
81e8bb62ae feat(skill): support gpt-image-2 in image generation skill 2026-04-22 20:39:49 +08:00
zhayujie
2c13e1b923 feat(models): support kimi-k2.6 2026-04-22 12:01:40 +08:00
zhayujie
a0748c2e3b fix(web): cap reasoning content to 4KB across stream/storage/display 2026-04-21 20:31:38 +08:00
zhayujie
40599bb751 fix(web): smart auto-scroll for chat #2775 2026-04-20 21:43:21 +08:00
zhayujie
f3c64ceea7 fix: refresh skill manager on /skill 2026-04-19 19:50:16 +08:00
zhayujie
15c60de709 fix: improve skill installation to support multiple source formats and ensure target directory 2026-04-19 19:05:51 +08:00
zhayujie
6dd316547f fix(web): fix session title generation fallback and reset Bridge on config change 2026-04-19 18:43:48 +08:00
zhayujie
54c7676a44 docs: update architecture diagram 2026-04-18 23:08:36 +08:00
zhayujie
d25b8966ce fix(web): prevent duplicate image previews 2026-04-18 22:32:34 +08:00
zhayujie
14a119c48c fix(gemini): fix tool calls not returning 2026-04-18 21:18:27 +08:00
zhayujie
c82515a927 fix(agent): don't drop tool_calls from empty-response retry 2026-04-18 20:50:40 +08:00
zhayujie
26e630c2dd feat(cli): /config support set enable_thinking 2026-04-17 16:09:43 +08:00
zhayujie
13370d2056 fix: thinking display is disabled by default 2026-04-17 15:31:59 +08:00
zhayujie
35282db9e0 feat(models): support claude-opus-4-7 2026-04-16 23:24:16 +08:00
zhayujie
426fb88ce7 fix(knowledge): exclude root-level files from knowledge stats to preserve empty state 2026-04-16 22:55:46 +08:00
zhayujie
2384bd0e10 fix: update CI workflows for repo rename and add latest tag 2026-04-16 21:57:20 +08:00
zhayujie
ba3f66d3d1 feat: show root-level files (index.md, log.md) in knowledge tree 2026-04-16 21:47:44 +08:00
zhayujie
7293a0f670 fix: modify repo name in github workflow 2026-04-16 21:38:58 +08:00
zhayujie
9e86d46267 fix: sync env vars when updating config in docker env 2026-04-16 21:32:07 +08:00
zhayujie
848430f062 feat(knowledge): support nested directories in knowledge base listing and display 2026-04-16 12:28:18 +08:00
zhayujie
abd21335c4 Merge pull request #2772 from 6vision/master
fix: bot_type change notification never shown after model switch
2026-04-16 10:43:41 +08:00
6vision
8fa95f058a fix: bot_type change notification never shown after model switch
Made-with: Cursor
2026-04-15 21:48:50 +08:00
zhayujie
d4e5ecd497 fix: compatible with Python 3.7 by deferring Literal import in truncate.py 2026-04-15 12:29:09 +08:00
zhayujie
3830f76729 feat: add custom model provider 2026-04-15 12:26:05 +08:00
zhayujie
83f778fec9 feat(dream): structured organization of dream memories 2026-04-15 11:27:46 +08:00
zhayujie
cabd24605f fix: add random jitter to daily dream schedule 2026-04-15 00:33:33 +08:00
zhayujie
ae20ba1148 Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2026-04-14 22:58:59 +08:00
zhayujie
3a50b64977 feat: web multi session interface 2026-04-14 22:58:25 +08:00
zhayujie
8692e74536 fix(web): hide session panel by default on mobile and support overlay dismiss 2026-04-14 21:09:01 +08:00
zhayujie
1c18bd9889 docs(memory): update long-term memory docs 2026-04-14 17:14:28 +08:00
zhayujie
60e9d98d0a feat: release 2.0.6 2026-04-14 12:37:53 +08:00
zhayujie
83f6625e0c feat: release 2.0.6 2026-04-14 12:08:57 +08:00
zhayujie
acc09543b7 feat(dream): add memory dream cli and docs
- New memory/deep-dream.mdx (zh/en/ja): memory flow, distillation rules, dream diary, manual trigger, safety mechanisms
- Simplify long-term memory page, link to deep-dream for details
- New cli/memory-knowledge.mdx (zh/en/ja): memory and knowledge commands
- Move knowledge commands from general.mdx to memory-knowledge.mdx
- Register new pages in docs.json navigation for all languages
- Add /memory dream to cli/index.mdx command tables
2026-04-14 11:03:53 +08:00
zhayujie
94d8c7e366 feat(dream): add Dream Diary tab to memory management page
- Backend: MemoryService supports category param (memory/dream), lists memory/dreams/*.md
- Backend: MemoryContentHandler resolves dream files from memory/dreams/ directory
- Frontend: add tab switcher (Memory Files / Dream Diary) matching knowledge tab style
- Frontend: dream entries show purple "Dream" badge, empty state with moon icon
- Cloud dispatch passes category param for consistency
2026-04-13 22:08:15 +08:00
zhayujie
ea1a0c8b3d feat(memory): add Deep Dream module for daily memory distillation
- Add Deep Dream: nightly distill daily memories → refined MEMORY.md + dream diary
- Simplify flush prompt to daily-only, defer MEMORY.md maintenance to Deep Dream
- Remove dead code (_append_to_main_memory) and fix fallback summary logic
- Add shrinkage protection and input dedup for dream process
- Ensure flush threads complete before dream starts
- Update docs (zh/en/ja) with dream diary and distillation mechanism
2026-04-13 21:32:52 +08:00
zhayujie
7bc88c17e4 Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2026-04-13 20:13:30 +08:00
zhayujie
33cf1bc4c3 feat(memory): async LLM context summary injection on trim
- Unified flush + context injection into a single async LLM call
  (flush_from_messages accepts context_summary_callback)
- Fixed response parsing bug: handle generator returns and Claude-format
  dicts from bot.call_with_tools, which previously caused all LLM
  summaries to silently fail (falling back to rule-based extraction)
- Removed standalone context summary prompts and methods; reuse the
  existing [DAILY]/[MEMORY] summarization pipeline
- Updated docs (zh/en/ja) to reflect the new injection behavior
2026-04-13 20:13:05 +08:00
zhayujie
9402e63fe1 Merge pull request #2766 from zhayujie/feat-mulit-session
feat(web): add multi-session management for web console
2026-04-13 18:51:07 +08:00
zhayujie
90e4d494b2 feat(web): add multi-session management for web console 2026-04-13 18:50:31 +08:00
zhayujie
da97e948ca feat: refine memory recall/write prompts for better precision and proactivity 2026-04-13 18:02:06 +08:00
zhayujie
89a07e8e74 feat: add enable_thinking config to control deep reasoning on web console 2026-04-13 16:06:28 +08:00
zhayujie
3f3d0381e5 feat: update knowledge docs and fix claude error 2026-04-13 11:16:26 +08:00
zhayujie
3649499dba fix: optimize the stability of network pre-checks 2026-04-13 10:35:38 +08:00
zhayujie
a989d088fd Merge pull request #2764 from WilliamOnVoyage/fix/macos-timeout-fallback
fix: fix run.sh on macOS by adding a timeout fallback
2026-04-13 10:21:44 +08:00
Moliang Zhou
f79a915136 fix: add timeout fallback for macOS compatibility
The `timeout` command (GNU Coreutils) is not available by default on macOS,
causing the installation script to fail with 'timeout: command not found'
during git clone.

This adds a shell function fallback that:
- Uses `gtimeout` if Homebrew coreutils is installed
- Otherwise skips the timeout and runs the command directly
2026-04-12 11:18:44 -07:00
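
The actual fix is a shell function in the install script. The selection logic it describes is rendered here in Python for illustration only. The function name `with_timeout` is hypothetical, and preferring `timeout` before `gtimeout` is an assumption.

```python
import shutil
from typing import Callable, List, Optional


def with_timeout(
    cmd: List[str],
    seconds: int,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Prefix `cmd` with a timeout tool if one is installed.

    Tries GNU `timeout`, then Homebrew's `gtimeout`; when neither is
    found, the command is returned unchanged and runs without a timeout.
    """
    for tool in ("timeout", "gtimeout"):
        if which(tool):
            return [tool, str(seconds)] + cmd
    return cmd
```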
zhayujie
12e8c3d449 Merge pull request #2763 from zhayujie/feat-web-console-upgrade
feat(web): support scheduler push messages and enrich welcome screen
2026-04-12 21:20:34 +08:00
zhayujie
4f7064575e feat(web): support scheduler push messages and enrich welcome screen
- Expand welcome screen from 3 to 6 example cards covering core capabilities
- Enable background polling on page load so scheduler task notifications are received in real-time
- Fix duplicate poll loops via generation-based cancellation, reduce poll frequency to 5s/10s
- Ensure equal card height and adjust layout position for better visual balance
2026-04-12 21:19:50 +08:00
zhayujie
070df826f1 Merge pull request #2762 from zhayujie/feat-web-console-upgrade
feat(web): add password protection for web console
2026-04-12 20:38:45 +08:00
zhayujie
fbe48a4b4e feat(web): add password protection for web console
- Add `web_password` config to enable login authentication
- Use stateless HMAC-signed token (survives restart, invalidates on password change)
- Add `web_session_expire_days` config (default 30 days)
- Protect all API endpoints with auth check (401 on failure)
- Add login page UI with auto-redirect on session expiry
- Add password management in config page (masked display, inline edit)
- Add tooltip hints for Agent config fields
- Update default agent_max_context_turns to 20, agent_max_steps to 20
- Update docs and docker-compose.yml
2026-04-12 20:37:04 +08:00
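
One way a stateless HMAC-signed token can survive restarts and still become invalid on a password change is to derive the signing key from the password itself. The sketch below assumes that design. The token layout (`expiry.signature`) and the function names are hypothetical and are not taken from the project's code.

```python
import hashlib
import hmac
import time
from typing import Optional


def _sign(password: str, expires_at: int) -> str:
    """HMAC-SHA256 over the expiry timestamp, keyed by a hash of the password."""
    key = hashlib.sha256(password.encode("utf-8")).digest()
    return hmac.new(key, str(expires_at).encode("ascii"), hashlib.sha256).hexdigest()


def issue_token(password: str, expire_days: int = 30, now: Optional[float] = None) -> str:
    """Build a token of the form '<expiry>.<signature>'; no server state is kept."""
    now = time.time() if now is None else now
    expires_at = int(now) + expire_days * 86400
    return f"{expires_at}.{_sign(password, expires_at)}"


def verify_token(token: str, password: str, now: Optional[float] = None) -> bool:
    """Accept the token only if it is well-formed, unexpired, and signed
    with the current password."""
    now = time.time() if now is None else now
    try:
        expires_str, signature = token.split(".", 1)
        expires_at = int(expires_str)
    except ValueError:
        return False
    if expires_at < now:
        return False
    expected = _sign(password, expires_at)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))
```

Because nothing is stored server-side, a restart does not log users out, and changing the password changes the key, so every previously issued token fails verification.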
zhayujie
4dd497fb6d fix: run.ps1 git clone in windows 2026-04-12 17:52:37 +08:00
zhayujie
907882c0a7 fix: git clone pre-check 2026-04-12 17:36:45 +08:00
zhayujie
d36d5aee3f feat: rename repository name from chatgpt-on-wechat to CowAgent
- Update GitHub URLs in README.md (badges, release links, clone address, wiki, issues, contributors)
- Add project rename notice with SEO keywords and git remote update command
- Update docs/docs.json GitHub links
- Update all docs (zh/en/ja) across guide, intro, models, releases, skills
- Update run.sh and scripts/run.ps1 clone URLs and directory names
- Docker image name (zhayujie/chatgpt-on-wechat) kept unchanged for compatibility
2026-04-12 17:09:07 +08:00
zhayujie
c6824e5f5e fix: add legacy-cgi dependency for Python 3.13+ #2758
Add conditional dependency `legacy-cgi` for Python 3.13+ to resolve
`web.py` installation failure caused by the removal of the `cgi` module
(PEP 594).
Thanks @sha156 for reporting.
2026-04-12 16:49:00 +08:00
zhayujie
199c21eede Merge pull request #2761 from zhayujie/feat-knowledge
feat: personal knowledge base system
2026-04-12 16:47:07 +08:00
zhayujie
5162da5654 Merge branch 'master' into feat-knowledge 2026-04-12 16:46:38 +08:00
zhayujie
a1d82f6193 feat(knowledge): add cli and update docs 2026-04-12 16:39:06 +08:00
zhayujie
ea78e3d0c6 feat(knowledge): document link supports jumping to view 2026-04-11 20:16:43 +08:00
zhayujie
3497f00cb4 Merge pull request #2759 from zhayujie/feat-multimodel
feat(vision): prioritize main model for image recognition
2026-04-11 19:55:15 +08:00
zhayujie
5355d45031 Merge pull request #2756 from octo-patch/feature/add-minimax-m2.7-highspeed-tts
feat: add MiniMax-M2.7-highspeed model and MiniMax TTS support
2026-04-11 19:54:03 +08:00
zhayujie
26693acc3f feat(vision): prioritize main model for image recognition with multi-provider fallback
- Add call_vision method to all bot implementations (DashScope, Claude,
  Gemini, ZhipuAI, MiniMax, Doubao, Moonshot, OpenAICompatibleBot)
  using each vendor's native multimodal API format
- Remove call_with_tools/call_vision from Bot base class to fix MRO
  shadowing issue with OpenAICompatibleBot mixin
- Refactor vision tool provider resolution: MainModel → other configured
  models (auto-discovered) → OpenAI → LinkAI, with automatic fallback
- Return actual model name used in call_vision responses
- Sync config.json API keys to .env bidirectionally on startup
- Fix bot instance cache to detect bot_type/use_linkai config changes
- Add SSE reconnection support for web console
- Preserve image path hints in Gemini text for correct vision tool calls
- Update docs/tools/vision.mdx
2026-04-11 19:46:11 +08:00
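
The MRO shadowing issue mentioned in this commit can be reproduced in miniature. The class bodies and the base-class order below are assumptions made for illustration. Only the names `Bot`, `OpenAICompatibleBot`, and `call_vision` come from the commit message.

```python
class Bot:
    """Base class that still carries a stub, as it might have before the fix."""

    def call_vision(self, image: str) -> str:
        raise NotImplementedError("vision not supported")


class OpenAICompatibleBot:
    """Mixin that provides a working implementation."""

    def call_vision(self, image: str) -> str:
        return f"described {image}"


class ShadowedBot(Bot, OpenAICompatibleBot):
    """Bot comes first in the MRO, so its stub hides the mixin's method."""


class SlimBot:
    """Base class after the stub has been removed."""


class FixedBot(SlimBot, OpenAICompatibleBot):
    """With no stub on the base class, lookup reaches the mixin."""
```

Python resolves attributes left to right along the MRO, so a placeholder method on the first base class wins over a real implementation on a later mixin. Removing the placeholder is one way to let the mixin's method be found.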
zhayujie
76e9fef3b2 feat(knowledge): add file list and graph in web channel 2026-04-11 19:02:55 +08:00
octo-patch
c34308cbd4 feat: add MiniMax-M2.7-highspeed model and MiniMax TTS support
- Add MiniMax-M2.7-highspeed constant to const.py and MODEL_LIST
- Update MinimaxBot default model from MiniMax-M2.1 to MiniMax-M2.7
- Add MinimaxVoice TTS provider (voice/minimax/minimax_voice.py)
  - Supports speech-2.8-hd and speech-2.8-turbo models
  - SSE streaming with hex-decoded audio chunks
  - Reuses MINIMAX_API_KEY
- Register MinimaxVoice in voice factory
- Add unit tests (14 tests, all passing)
- Update README with MiniMax-M2.7-highspeed and TTS configuration
2026-04-11 17:03:44 +08:00
zhayujie
5a10476010 feat: add knowledge switch and cli 2026-04-11 16:44:25 +08:00
zhayujie
46e80dceec Merge pull request #2755 from 6vision/fix/generic-file-send
fix: send generic file types (tar.gz, zip, etc.) as FILE instead of TEXT
2026-04-11 16:36:34 +08:00
6vision
90d1835353 fix: send generic file types (tar.gz, zip, etc.) as FILE instead of TEXT
Previously, files with extensions not in the known categories (image, document, video, audio) fell through to a fallback that returned ReplyType.TEXT, causing the file to never actually be sent to the user. Now the fallback uses ReplyType.FILE so all file types are delivered.

Made-with: Cursor
2026-04-11 15:45:34 +08:00
zhayujie
845fadd0aa fix(knowledge): modify knowledge skill 2026-04-10 18:22:54 +08:00
zhayujie
5748ded52c feat(knowledge): change knowledge base to index-driven self-organizing structure 2026-04-10 16:06:04 +08:00
zhayujie
6a737fb734 feat: display thinking content in web console 2026-04-10 15:07:23 +08:00
zhayujie
3cd92ccda3 feat: add port config 2026-04-09 21:29:53 +08:00
zhayujie
54e81aba11 feat(memory+knowledge): add knowledge wiki system and Light Dream memory extraction
- Add knowledge/ directory structure and knowledge-wiki skill for structured knowledge accumulation
- Auto-inject MEMORY.md into system prompt with truncation (last 200 lines)
- Light Dream: extend flush_memory to extract long-term memories into MEMORY.md with date stamps
- Add mandatory knowledge auto-write rules in system prompt (no user confirmation needed)
- Expand MemoryManager.sync() to index knowledge/ files for vector search
- Update RULE.md template with workspace conventions and knowledge guidelines
2026-04-09 21:22:43 +08:00
zhayujie
d86cb4ded6 fix(weixin): update weixin channel version 2026-04-09 09:55:07 +08:00
zhayujie
4d5375f6d6 fix(win): add Windows platform hint in bash tool description 2026-04-08 16:54:26 +08:00
zhayujie
424557fedb fix(win): use PowerShell instead of cmd.exe 2026-04-08 16:50:45 +08:00
zhayujie
89251e603f fix(win): use PowerShell instead of cmd.exe for bash tool on Windows 2026-04-08 16:18:56 +08:00
zhayujie
a653ed07eb fix(win): defer pip install to a helper bat after cow.exe exits 2026-04-08 15:31:03 +08:00
zhayujie
ad86deb014 fix: prioritize using a custom master model for vision 2026-04-08 15:16:59 +08:00
zhayujie
9525dc7584 fix: avoid stale cow.exe on Windows by spawning fresh process 2026-04-08 12:07:18 +08:00
zhayujie
cd31dd27fd fix: increase web console capacity and add frontend retry 2026-04-08 11:48:27 +08:00
zhayujie
360e3670eb feat(browser): detect implicit interactive elements 2026-04-07 01:41:14 +08:00
zhayujie
8dabe3b4c8 fix: remove install-browser cmd display in /help 2026-04-04 23:28:57 +08:00
zhayujie
443e0c2806 feat: show video in web channel 2026-04-03 17:09:38 +08:00
zhayujie
9cc173cc4d fix: use dynamic model name in system prompt runtime info 2026-04-02 17:01:56 +08:00
zhayujie
b5f33e5ecd feat: support qwen3.6-plus 2026-04-02 16:46:58 +08:00
zhayujie
40dfc6860f fix: skill list showing sub-skills inside collection 2026-04-02 11:47:24 +08:00
zhayujie
1c02a04423 fix: handle error when printing QR code on Windows GBK terminals 2026-04-01 17:23:57 +08:00
zhayujie
de0e45070c chore: remove conflicting dependency 2026-04-01 17:19:15 +08:00
zhayujie
c169cc7d74 fix: remove conflicting dependency 2026-04-01 17:12:15 +08:00
zhayujie
cd62ad76f6 fix: cow CLI support python3.7 2026-04-01 16:51:23 +08:00
zhayujie
dd25b0fb5b feat: refine system prompt style and tone guidance 2026-04-01 16:24:41 +08:00
zhayujie
a38b22a6a2 docs: update docs 2026-04-01 15:31:41 +08:00
zhayujie
830b8f2971 feat: release 2.0.5 2026-04-01 15:01:53 +08:00
zhayujie
b058af122c feat: release 2.0.5 2026-04-01 12:24:21 +08:00
zhayujie
174ee0cafc fix(security): prevent path traversal in memory content API 2026-04-01 10:03:58 +08:00
zhayujie
1c336380c0 docs: update release doc 2026-03-31 22:30:31 +08:00
zhayujie
3068880413 feat: save skill display name when downloading 2026-03-31 21:43:57 +08:00
zhayujie
be596681e5 Merge pull request #2735 from zhayujie/feat-wecom-bot-qrcode
feat(wecom_bot): add Wecom Bot QR code scan auth
2026-03-31 21:28:39 +08:00
zhayujie
66b71c50e9 feat(wecom_bot): add Wecom Bot QR code scan auth 2026-03-31 21:27:50 +08:00
zhayujie
8744810b25 fix: skill install timeout 2026-03-31 20:47:59 +08:00
zhayujie
7f94d37c2e fix: auto-install font in browser 2026-03-31 20:20:13 +08:00
zhayujie
6d9b7baeb4 fix(weixin): file send failed 2026-03-31 18:14:49 +08:00
zhayujie
4470d4c352 fix: reduce docker image size 2026-03-31 16:56:27 +08:00
zhayujie
d2a462a279 fix: add apt source in docker file 2026-03-31 16:34:47 +08:00
zhayujie
14ff2a15e7 fix(cli): cow cli in docker chat 2026-03-31 16:25:47 +08:00
zhayujie
6d1369900e feat: add source args in docker building 2026-03-31 16:06:45 +08:00
zhayujie
1f17ebe69e feat: add browser install in docker image 2026-03-31 16:05:05 +08:00
zhayujie
1ae2918064 feat: support install browser in chat 2026-03-31 15:15:17 +08:00
zhayujie
b6571e5cad fix: browser resource optimization 2026-03-30 21:39:38 +08:00
zhayujie
7549d48cf1 fix: browser thread bug 2026-03-30 21:27:08 +08:00
zhayujie
00353dd0cb feat: support skill hub mirror 2026-03-30 18:46:02 +08:00
zhayujie
afd947195d fix(cli): support skill mirror install 2026-03-30 16:36:17 +08:00
zhayujie
e57ef37167 fix: prevent phantom mouseover from hijacking slash menu 2026-03-30 11:52:05 +08:00
zhayujie
ef33a93654 Merge pull request #2731 from zkjqd/fix/slash-menu-click
Fix slash-command shortcuts in the input box not being selectable by click
2026-03-30 11:40:06 +08:00
zhayujie
61732aecfc Merge pull request #2721 from yrk111222/feat/modelscope-update
Feat/modelscope update
2026-03-30 11:39:50 +08:00
zkjqd
6764c05c3f input-slash-click 2026-03-30 11:20:03 +08:00
zhayujie
fa149cf4aa fix(browser): multi-thread browser instance bug 2026-03-30 00:57:19 +08:00
zhayujie
e4f9697d06 feat(browser): install font in linux 2026-03-29 23:52:51 +08:00
zhayujie
da061450e5 fix: github skill install cmd 2026-03-29 19:23:47 +08:00
zhayujie
d09ae49287 feat(browser): auto-snapshot on navigate, screenshot prompt guidance
Browser tool enhancements:
- Navigate action now auto-includes snapshot result, saving one LLM round-trip
- Wait for networkidle + 800ms after navigation for SPA/JS-rendered pages
- Prompt guides agent to screenshot key results and ask user for login/CAPTCHA help
- Fixed playwright version pinned to 1.52.0; mirror fallback to official CDN on failure

Web console file/image support:
- SSE real-time push for images and files via on_event (file_to_send)
- Added /api/file endpoint to serve local files for web preview
- Frontend renders images in media-content container (survives delta/done overwrites)
- File attachment cards with download links; RFC 5987 encoding for non-ASCII filenames

Tool workspace fix:
- Inject workspace_dir as cwd into send and browser tools (previously only file tools)
- Screenshots now save to ~/cow/tmp/ instead of project directory
2026-03-29 19:09:11 +08:00
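
The RFC 5987 encoding mentioned for download links can be sketched with the standard library. The helper name `content_disposition` is hypothetical, and the ASCII `filename=` fallback is an assumption; the commit only states that RFC 5987 encoding is used for non-ASCII filenames.

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Build a Content-Disposition value with an RFC 5987 filename* parameter."""
    # Plain-ASCII fallback for clients that ignore filename*.
    fallback = filename.encode("ascii", "replace").decode("ascii")
    fallback = fallback.replace('"', "_").replace("\\", "_")
    # Percent-encode the UTF-8 bytes of the name; keep no reserved characters.
    encoded = quote(filename, safe="")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"
```

For a name such as `报告.pdf`, the `filename*` parameter carries the percent-encoded UTF-8 bytes, while the fallback degrades to question marks.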
zhayujie
511ee0bbaf fix: windows PowerShell script 2026-03-29 18:28:50 +08:00
zhayujie
3cb5a0fbd6 docs: add CLI system docs 2026-03-29 17:57:12 +08:00
zhayujie
e06925ab85 fix: optimize browser install cli and fix vision prompt 2026-03-29 15:19:59 +08:00
zhayujie
184634e4e7 fix(cli): browser install failed 2026-03-29 15:14:07 +08:00
zhayujie
843c2d02cc Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2026-03-29 15:09:37 +08:00
zhayujie
8ea2455766 feat(cli): add browser install cmd 2026-03-29 15:09:07 +08:00
zhayujie
9dc9987d56 Merge pull request #2727 from zhayujie/feat-browser-tool
feat: add browser tool
2026-03-29 14:59:39 +08:00
zhayujie
3458621147 feat: add browser tool 2026-03-29 14:59:06 +08:00
zhayujie
079df5a47c feat: support batch skill install from zip and github 2026-03-29 14:38:11 +08:00
zhayujie
ddb07c65a1 feat: support github zip-first download, gitLab, git@ ssh, local path 2026-03-29 13:45:15 +08:00
zhayujie
9b21cd222b fix: update run.sh 2026-03-28 19:36:51 +08:00
zhayujie
90f736843f fix: add click dependencies 2026-03-28 19:35:15 +08:00
zhayujie
13c020eb61 fix(cli): cli output in wecom_bot 2026-03-28 19:26:59 +08:00
zhayujie
dbc06dbe95 fix: use new run.sh when updating 2026-03-28 19:16:41 +08:00
zhayujie
23d097bc1c Merge pull request #2726 from zhayujie/feat-cow-cli
feat: cow cli in terminal and chat
2026-03-28 19:01:56 +08:00
zhayujie
db85b9808e feat(cli): add cow update 2026-03-28 18:58:42 +08:00
zhayujie
df5bae37bc feat: add MiniMax-M2.7 and glm-5-turbo in web console 2026-03-28 18:48:11 +08:00
zhayujie
acc23b6051 feat: optimize agent prompt and fix skill source load 2026-03-28 18:37:07 +08:00
zhayujie
61f2741afc feat: organize skill source field 2026-03-28 17:41:40 +08:00
zhayujie
4dd7ea886a feat(cli): cli options in web console 2026-03-28 16:26:41 +08:00
zhayujie
1e8959fbcf fix: optimize repo clone in run.sh 2026-03-28 15:08:57 +08:00
zhayujie
48729678cf Merge branch 'master' into feat-cow-cli 2026-03-28 14:47:20 +08:00
zhayujie
0684becaa7 fix(cli): register skill when installing 2026-03-28 14:42:18 +08:00
zhayujie
db16bdf8cb fix(cli): add security hardening for skill install and process management 2026-03-27 17:59:15 +08:00
zhayujie
f890318ed9 fix: strip leading/trailing whitespace from agent response 2026-03-26 18:13:39 +08:00
zhayujie
158510cbbe feat(cli): improve cow cli and skill hub integration 2026-03-26 16:49:42 +08:00
zhayujie
ce90cf7aa8 fix: weixin cdn upload retry 2026-03-26 10:20:29 +08:00
zhayujie
a3a3d006eb Merge pull request #2723 from Xiaozhou345/Xiaozhou345-fix-readme-spacing
Improve spacing between Chinese and English text in the README
2026-03-26 10:14:27 +08:00
zhayujie
8fd029a4a1 feat(cli): support cow cli 2026-03-26 10:08:51 +08:00
Xiaozhou345
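2e1b52c1e5 Improve spacing between Chinese and English text in the README
Following Chinese technical documentation conventions, added spaces between file names and Chinese text to improve readability.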
2026-03-25 21:26:01 +08:00
zhayujie
3eb8348708 fix: docker volume permission issue and clean up unused dependencies 2026-03-25 01:25:34 +08:00
zhayujie
393f0c007c fix: context loss after trim 2026-03-24 20:49:28 +08:00
yrk
294e380288 update model_list 2026-03-24 11:00:55 +08:00
yrk
4c1c42efac feat: update modelscope bot 2026-03-24 10:43:45 +08:00
zhayujie
c062ca8c66 Merge pull request #2720 from 6vision/fix/deepseek-docs
Docs: update
2026-03-24 00:25:17 +08:00
6vision
76dcb25103 docs(deepseek): update model descriptions to V3.2 with thinking/non-thinking mode
Made-with: Cursor
2026-03-24 00:05:39 +08:00
6vision
c5b4f236db docs(deepseek): remove migration notes from zh and en docs
Made-with: Cursor
2026-03-24 00:05:39 +08:00
zhayujie
0974c940a8 Merge pull request #2719 from 6vision/feat/deepseek-bot
feat: add independent DeepSeek bot module with dedicated config
2026-03-23 22:42:58 +08:00
6vision
cffa20d37e docs(deepseek): remove migration notes to reduce user cognitive load
Made-with: Cursor
2026-03-23 22:39:15 +08:00
6vision
ef009edd29 docs(deepseek): update config guides for independent DeepSeek module
Update DeepSeek docs (zh/en/ja) and README to reflect the new dedicated deepseek_api_key / deepseek_api_base config fields, with backward compatibility notes.

Made-with: Cursor
2026-03-23 21:43:51 +08:00
zhayujie
3ca52b118d fix(weixin): qrcode url log 2026-03-23 21:33:53 +08:00
zhayujie
13f5fde4fb fix: rebuild system prompt from scratch on every turn 2026-03-23 21:27:44 +08:00
6vision
f512b55ec2 feat(deepseek): add independent DeepSeek bot module with dedicated config
Separate DeepSeek from ChatGPTBot into its own module (models/deepseek/) with dedicated deepseek_api_key and deepseek_api_base config fields, avoiding config conflicts when switching between providers. Backward compatible with old users who configured DeepSeek via open_ai_api_key/open_ai_api_base through automatic fallback.

Made-with: Cursor
2026-03-23 21:23:35 +08:00
zhayujie
22b8ca0095 feat: optimize vision image compression 2026-03-23 21:18:04 +08:00
zhayujie
baf66a103d fix(weixin): preserve original filename for received files 2026-03-23 01:18:02 +08:00
zhayujie
45faa9c1ff fix(weixin): resolve image/file send and receive failures 2026-03-23 00:13:41 +08:00
zhayujie
304381a88d fix: hide breadcrumb on mobile for better space utilization 2026-03-22 23:36:34 +08:00
zhayujie
fc9f54dbc8 feat(weixin): optimize login qrcode generate 2026-03-22 23:04:50 +08:00
zhayujie
7199dc187f fix: default gemini model 2026-03-22 22:52:37 +08:00
zhayujie
e9ae066d53 Merge pull request #2716 from cowagent/fix-gemini-model-attribute
fix: add missing model property to GoogleGeminiBot
2026-03-22 22:49:00 +08:00
cowagent
d71ae406ff fix: add missing model property to GoogleGeminiBot
api_key and api_base were refactored to @property but model was not
migrated, causing AttributeError: 'GoogleGeminiBot' object has no
attribute 'model' when using any Gemini model.
2026-03-22 22:43:26 +08:00
zhayujie
f3216904b3 feat(weixin): optimize weixin login qrcode 2026-03-22 21:34:47 +08:00
zhayujie
5958b69ec9 feat: release 2.0.4 2026-03-22 20:49:41 +08:00
zhayujie
7d4e2cb39a docs: update comments 2026-03-22 19:07:19 +08:00
zhayujie
a483ec0cea feat: optimize weixin channel qr code generate 2026-03-22 18:20:10 +08:00
zhayujie
c1421e0874 feat: support weixin channel in scripts 2026-03-22 16:29:12 +08:00
zhayujie
ce89869c3c feat: support weixin channel 2026-03-22 15:52:13 +08:00
zhayujie
b8b57e34ff fix: auto-repair messages 2026-03-21 14:20:22 +08:00
zhayujie
bc7f627253 fix(wecom_bot): compat with old websocket-client 2026-03-21 14:03:17 +08:00
zhayujie
652156e398 feat: make run.sh executable 2026-03-20 17:56:10 +08:00
zhayujie
9febb071c6 fix: run.sh get pid bug 2026-03-20 17:51:04 +08:00
zhayujie
7d0e1568ac fix: feishu msg and log encoding 2026-03-19 17:07:39 +08:00
zhayujie
b4e711f411 feat: add request header 2026-03-19 17:06:05 +08:00
zhayujie
1b5be1b981 fix: remove feishu_bot_name in run.sh 2026-03-19 14:55:12 +08:00
zhayujie
49d8707c58 refactor: simplify run.sh by extracting shared logic and eliminating duplication 2026-03-19 11:07:16 +08:00
zhayujie
9192f6f7f7 feat: add MiniMax-M2.7 and glm-5-turbo 2026-03-19 10:46:13 +08:00
zhayujie
05022e3745 fix: add log 2026-03-18 23:09:27 +08:00
zhayujie
5356e9ddeb docs: adjust docs order 2026-03-18 21:55:09 +08:00
zhayujie
52acf76e2c docs: update jp docs 2026-03-18 21:01:02 +08:00
zhayujie
40cdbd3b45 Merge pull request #2710 from eltociear/add-ja-doc
docs: add Japanese documents
2026-03-18 19:28:04 +08:00
Ikko Ashimine
5487c0befe docs: add Japanese documents 2026-03-18 19:13:39 +09:00
zhayujie
8bb16c48c0 docs: update install cmd 2026-03-18 16:11:35 +08:00
zhayujie
c6384363f9 feat: workspace volume in docker deploy 2026-03-18 16:03:03 +08:00
zhayujie
8993e8ad3e feat: release 2.0.3 2026-03-18 15:40:49 +08:00
zhayujie
289989d9f7 feat: release 2.0.3 2026-03-18 15:10:21 +08:00
zhayujie
dc2ae0e6f1 feat: support gpt-5.4-mini and gpt-5.4-nano 2026-03-18 14:55:29 +08:00
zhayujie
9c966c152d feat: enhance AGENT.md update prompts to encourage proactive evolution 2026-03-18 12:10:45 +08:00
zhayujie
4efae41048 feat: support coding plan 2026-03-18 11:59:22 +08:00
zhayujie
b8437032e9 fix: optimize image recognition prompts 2026-03-18 10:10:23 +08:00
zhayujie
2d339ca81b Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2026-03-17 23:03:05 +08:00
zhayujie
d53abc9696 docs: update README.md 2026-03-17 23:02:41 +08:00
zhayujie
446c886d38 Merge pull request #2706 from zhayujie/feat-web-files
feat: support files upload in web console and office parsing
2026-03-17 21:22:38 +08:00
313 changed files with 34363 additions and 4693 deletions


@@ -19,7 +19,7 @@ env:
 jobs:
 build-and-push-image:
-if: github.repository == 'zhayujie/chatgpt-on-wechat'
+if: github.repository == 'zhayujie/CowAgent'
 runs-on: ubuntu-latest
 permissions:
 contents: read
@@ -51,7 +51,12 @@ jobs:
 uses: docker/metadata-action@v4
 with:
 images: |
-${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
+${{ env.REGISTRY }}/zhayujie/chatgpt-on-wechat
+${{ env.REGISTRY }}/zhayujie/cowagent
+tags: |
+type=raw,value=latest-arm64,enable={{is_default_branch}}
+type=ref,event=branch,suffix=-arm64
+type=ref,event=tag,suffix=-arm64
 - name: Build and push Docker image
 uses: docker/build-push-action@v3
@@ -60,7 +65,7 @@ jobs:
 push: true
 file: ./docker/Dockerfile.latest
 platforms: linux/arm64
-tags: ${{ steps.meta.outputs.tags }}-arm64
+tags: ${{ steps.meta.outputs.tags }}
 labels: ${{ steps.meta.outputs.labels }}
 - uses: actions/delete-package-versions@v4


@@ -16,10 +16,11 @@ on:
 env:
 REGISTRY: ghcr.io
 IMAGE_NAME: ${{ github.repository }}
+DOCKERHUB_IMAGE: zhayujie/chatgpt-on-wechat
 jobs:
 build-and-push-image:
-if: github.repository == 'zhayujie/chatgpt-on-wechat'
+if: github.repository == 'zhayujie/CowAgent'
 runs-on: ubuntu-latest
 permissions:
 contents: read
@@ -47,8 +48,14 @@ jobs:
 uses: docker/metadata-action@v4
 with:
 images: |
-${{ env.IMAGE_NAME }}
-${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
+zhayujie/chatgpt-on-wechat
+zhayujie/cowagent
+${{ env.REGISTRY }}/zhayujie/chatgpt-on-wechat
+${{ env.REGISTRY }}/zhayujie/cowagent
+tags: |
+type=raw,value=latest,enable={{is_default_branch}}
+type=ref,event=branch
+type=ref,event=tag
 - name: Build and push Docker image
 uses: docker/build-push-action@v3

9
.gitignore vendored

@@ -33,7 +33,16 @@ plugins/banwords/lib/__pycache__
!plugins/keyword
!plugins/linkai
!plugins/agent
!plugins/cow_cli
client_config.json
ref/
**/.dev.vars
.cursor/
local/
node_modules/
# cow cli
dist/
build/
*.egg-info/
.cow.pid

664
README.md

File diff suppressed because it is too large


@@ -44,6 +44,11 @@ class ChatService:
if agent is None:
raise RuntimeError("Failed to initialise agent for the session")
# Pass context metadata to model for downstream API requests
if hasattr(agent, 'model'):
agent.model.channel_type = channel_type or ""
agent.model.session_id = session_id or ""
# State shared between the event callback and this method
state = _StreamState()
@@ -52,7 +57,16 @@ class ChatService:
event_type = event.get("type")
data = event.get("data", {})
-if event_type == "message_update":
+if event_type == "reasoning_update":
+delta = data.get("delta", "")
+if delta:
+send_chunk_fn({
+"chunk_type": "reasoning",
+"delta": delta,
+"segment_id": state.segment_id,
+})
+elif event_type == "message_update":
# Incremental text delta
delta = data.get("delta", "")
if delta:
@@ -70,6 +84,23 @@ class ChatService:
# a new segment; collect tool results until turn_end.
state.pending_tool_results = []
elif event_type == "file_to_send":
url = data.get("url") or ""
if url:
fname = data.get("file_name") or "file"
ft = data.get("file_type") or "file"
if ft == "image":
link = f"![{fname}]({url})"
else:
link = f"[{fname}]({url})"
send_chunk_fn({
"chunk_type": "content",
"delta": "\n\n" + link + "\n\n",
"segment_id": state.segment_id,
})
# Remove url so the model won't repeat it in its reply
data.pop("url", None)
elif event_type == "tool_execution_start":
# Notify the client that a tool is about to run (with its input args)
tool_name = data.get("tool_name", "")
@@ -161,10 +192,56 @@ class ChatService:
logger.info("[ChatService] Cleared agent message history after executor recovery")
raise
-# Append only the NEW messages from this execution (thread-safe)
+# Sync executor messages back to agent (thread-safe).
+# The executor may have trimmed context, making its list shorter than
+# original_length. In that case we must replace entirely — just
+# appending would leave stale pre-trim messages in agent.messages
+# and cause the same trim to fire on every subsequent request.
 with agent.messages_lock:
-new_messages = executor.messages[original_length:]
-agent.messages.extend(new_messages)
+trimmed = len(executor.messages) < original_length
+if trimmed:
+# Context was trimmed: the executor appended the new user
+# query *before* trimming, so the new messages (user +
+# assistant + tools) sit at the tail of the trimmed list.
+# We cannot simply slice at original_length (it exceeds the
+# list length). Instead, count how many messages the
+# executor added on top of the post-trim baseline.
+#
+# Timeline inside executor.run_stream:
+# 1. messages had `original_length` items
+# 2. append user query → original_length + 1
+# 3. _trim_messages() → some smaller number (includes the
+# user query because it belongs to the last turn)
+# 4. LLM replies / tool calls appended
+#
+# The user query message is always the first message of the
+# last turn (it cannot be trimmed away), so we locate it to
+# find where "new" messages begin.
+new_start = original_length # fallback
+for idx in range(len(executor.messages) - 1, -1, -1):
+msg = executor.messages[idx]
+if msg.get("role") == "user":
+content = msg.get("content", [])
+is_user_query = False
+if isinstance(content, list):
+has_text = any(
+isinstance(b, dict) and b.get("type") == "text"
+for b in content
+)
+has_tool_result = any(
+isinstance(b, dict) and b.get("type") == "tool_result"
+for b in content
+)
+is_user_query = has_text and not has_tool_result
+elif isinstance(content, str):
+is_user_query = True
+if is_user_query:
+new_start = idx
+break
+new_messages = list(executor.messages[new_start:])
+else:
+new_messages = list(executor.messages[original_length:])
+agent.messages = list(executor.messages)
# Persist new messages to SQLite so they survive restarts and
# can be queried via the HISTORY interface.
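
Reviewer note: the tail scan in the hunk above can be tried in isolation. The sketch below is a minimal, self-contained rendering of that logic, assuming the message shapes visible in the diff (string content, or a list of `text` / `tool_use` / `tool_result` blocks). The helper names `is_user_query` and `find_new_start` are hypothetical and do not appear in the commit.

```python
from typing import Any, Dict, List


def is_user_query(msg: Dict[str, Any]) -> bool:
    """True for a real user prompt, False for tool_result injections."""
    if msg.get("role") != "user":
        return False
    content = msg.get("content", [])
    if isinstance(content, str):
        return True
    if isinstance(content, list):
        blocks = [b for b in content if isinstance(b, dict)]
        has_text = any(b.get("type") == "text" for b in blocks)
        has_tool_result = any(b.get("type") == "tool_result" for b in blocks)
        return has_text and not has_tool_result
    return False


def find_new_start(messages: List[Dict[str, Any]], fallback: int) -> int:
    """Scan from the tail and return the index of the last real user query."""
    for idx in range(len(messages) - 1, -1, -1):
        if is_user_query(messages[idx]):
            return idx
    return fallback


# A trimmed history: one old turn survived, then the new turn with a tool call.
trimmed = [
    {"role": "user", "content": "old question"},
    {"role": "assistant", "content": "old answer"},
    {"role": "user", "content": [{"type": "text", "text": "new question"}]},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "t1", "name": "bash", "input": {}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "t1", "content": "ok"},
    ]},
    {"role": "assistant", "content": "done"},
]

new_start = find_new_start(trimmed, fallback=10)
new_messages = trimmed[new_start:]
```

The `tool_result` message at index 4 also has role `user`, which is why the scan checks block types instead of stopping at the first `user` role it meets.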


@@ -0,0 +1,241 @@
"""
SessionService - Manages multi-session lifecycle for both web channel and cloud client.
Provides a unified interface for listing, deleting, renaming, clearing context,
and generating AI titles for conversation sessions. Backed by ConversationStore
(SQLite) and AgentBridge (in-memory agent instances).
"""
import re
from typing import Optional
from common.log import logger
def _truncate_fallback_title(user_message: str, max_len: int = 30) -> str:
"""Pick the first non-empty line of the user message and truncate it."""
if not user_message:
return "New Chat"
first_line = ""
for line in user_message.splitlines():
line = line.strip()
if line:
first_line = line
break
if not first_line:
return "New Chat"
if len(first_line) > max_len:
first_line = first_line[:max_len].rstrip() + "..."
return first_line
def generate_session_title(user_message: str, assistant_reply: str = "") -> str:
"""
Generate a short session title by calling the current bot's reply_text.
Falls back to the first line of the user message if the LLM call fails
or returns an obvious error sentinel.
"""
fallback = _truncate_fallback_title(user_message)
try:
from bridge.bridge import Bridge
from models.session_manager import Session
bot = Bridge().get_bot("chat")
prompt_parts = [f"User: {user_message[:300]}"]
if assistant_reply:
prompt_parts.append(f"Assistant: {assistant_reply[:300]}")
session = Session("__title_gen__", system_prompt="")
session.messages = [
{"role": "user", "content": (
"Generate a very short title (max 15 characters for Chinese, max 6 words for English) "
"summarizing this conversation. Return ONLY the title text, nothing else.\n\n"
+ "\n".join(prompt_parts)
)}
]
result = bot.reply_text(session) or {}
# When bots fail (network error, auth error, rate limit, etc.) they
# typically return completion_tokens=0 with a sentinel content like
# "请再问我一次吧" / "我现在有点累了". Treat that as failure.
completion_tokens = result.get("completion_tokens", 0) or 0
raw = (result.get("content") or "").strip()
if completion_tokens <= 0:
logger.warning(
f"[SessionService] Title generation got empty completion "
f"(completion_tokens={completion_tokens}, content='{raw[:50]}'), "
f"using fallback")
return fallback
title = re.sub(r'<think>.*?</think>', '', raw, flags=re.DOTALL).strip().strip('"\'')
logger.info(f"[SessionService] Title generation result: '{title}' (len={len(title)})")
if title and len(title) <= 50:
return title
except Exception as e:
logger.warning(f"[SessionService] Title generation failed: {e}")
return fallback
class SessionService:
"""
High-level service for session lifecycle management.
Usage:
svc = SessionService()
result = svc.dispatch("list_sessions", {"channel_type": "web", "page": 1})
"""
def _get_store(self):
from agent.memory import get_conversation_store
return get_conversation_store()
def _remove_agent(self, session_id: str):
"""Remove the in-memory Agent instance for a session if it exists."""
try:
from bridge.bridge import Bridge
ab = Bridge().get_agent_bridge()
if session_id in ab.agents:
del ab.agents[session_id]
logger.info(f"[SessionService] Removed agent instance: {session_id}")
except Exception:
pass
@staticmethod
def _normalize_sid(session_id: str) -> str:
if session_id and not session_id.startswith("session_"):
return f"session_{session_id}"
return session_id
# ------------------------------------------------------------------
# actions
# ------------------------------------------------------------------
def list_sessions(self, channel_type: Optional[str] = None,
page: int = 1, page_size: int = 50) -> dict:
store = self._get_store()
return store.list_sessions(
channel_type=channel_type,
page=page,
page_size=page_size,
)
def delete_session(self, session_id: str) -> None:
if not session_id:
raise ValueError("session_id required")
session_id = self._normalize_sid(session_id)
store = self._get_store()
store.clear_session(session_id)
self._remove_agent(session_id)
logger.info(f"[SessionService] Session deleted: {session_id}")
def rename_session(self, session_id: str, title: str) -> None:
if not session_id:
raise ValueError("session_id required")
if not title:
raise ValueError("title required")
session_id = self._normalize_sid(session_id)
store = self._get_store()
found = store.rename_session(session_id, title)
if not found:
raise ValueError("session not found")
def clear_context(self, session_id: str) -> int:
"""
Set context boundary. Returns the new context_start_seq value.
"""
if not session_id:
raise ValueError("session_id required")
session_id = self._normalize_sid(session_id)
store = self._get_store()
new_seq = store.clear_context(session_id)
self._remove_agent(session_id)
return new_seq
def gen_title(self, session_id: str, user_message: str,
assistant_reply: str = "") -> str:
"""
Generate an AI title and persist it. Returns the generated title.
"""
if not session_id:
raise ValueError("session_id required")
if not user_message:
raise ValueError("user_message required")
session_id = self._normalize_sid(session_id)
title = generate_session_title(user_message, assistant_reply)
store = self._get_store()
updated = store.rename_session(session_id, title)
logger.info(f"[SessionService] Title set: sid={session_id}, "
f"title='{title}', db_updated={updated}")
return title
# ------------------------------------------------------------------
# dispatch — single entry point for protocol messages
# ------------------------------------------------------------------
def dispatch(self, action: str, payload: Optional[dict] = None) -> dict:
"""
Dispatch a session management action and return a protocol-compatible
response dict.
Action names use a ``*_session`` / session-prefixed convention so they
can coexist with history actions (e.g. ``query``) on the same HISTORY
message channel without ambiguity.
Supported actions:
- list_sessions: list sessions with pagination
- delete_session: delete a session
- rename_session: rename a session title
- clear_context: set context boundary
- generate_title: AI-generate a session title
:param action: one of the above action names
:param payload: action-specific payload
:return: dict with action, code, message, payload
"""
payload = payload or {}
try:
if action == "list_sessions":
result = self.list_sessions(
channel_type=payload.get("channel_type"),
page=int(payload.get("page", 1)),
page_size=int(payload.get("page_size", 50)),
)
return {"action": action, "code": 200, "message": "success", "payload": result}
elif action == "delete_session":
self.delete_session(payload.get("session_id", ""))
return {"action": action, "code": 200, "message": "success", "payload": None}
elif action == "rename_session":
self.rename_session(
payload.get("session_id", ""),
payload.get("title", "").strip(),
)
return {"action": action, "code": 200, "message": "success", "payload": None}
elif action == "clear_context":
new_seq = self.clear_context(payload.get("session_id", ""))
return {"action": action, "code": 200, "message": "success",
"payload": {"context_start_seq": new_seq}}
elif action == "generate_title":
title = self.gen_title(
payload.get("session_id", ""),
payload.get("user_message", ""),
payload.get("assistant_reply", ""),
)
return {"action": action, "code": 200, "message": "success",
"payload": {"title": title}}
else:
return {"action": action, "code": 400,
"message": f"unknown action: {action}", "payload": None}
except ValueError as e:
return {"action": action, "code": 400, "message": str(e), "payload": None}
except Exception as e:
logger.error(f"[SessionService] dispatch error: action={action}, error={e}")
return {"action": action, "code": 500, "message": str(e), "payload": None}
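
Reviewer note: the title post-processing in `generate_session_title` (strip `<think>` blocks, strip quotes, accept at most 50 characters, otherwise fall back) can be exercised without a bot. The sketch below isolates that step; the function name `clean_title` is hypothetical, and the 50-character limit is taken from the diff above.

```python
import re


def clean_title(raw: str, fallback: str, max_len: int = 50) -> str:
    """Strip reasoning tags and surrounding quotes; fall back if unusable."""
    # DOTALL lets the pattern span multi-line reasoning blocks.
    title = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    title = title.strip().strip("\"'")
    if title and len(title) <= max_len:
        return title
    return fallback


example = clean_title('<think>pick a title\nshort one</think>\n"Trip plan"', "New Chat")
```

An empty or over-long result falls through to the fallback, matching the `return fallback` path at the end of the function in the diff.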


240
agent/knowledge/service.py Normal file

@@ -0,0 +1,240 @@
"""
Knowledge service for handling knowledge base operations.
Provides a unified interface for listing, reading, and graphing knowledge files,
callable from the web console, API, or CLI.
Knowledge file layout (under workspace_root):
knowledge/index.md
knowledge/log.md
knowledge/<category>/<slug>.md
"""
import os
import re
from pathlib import Path
from typing import Optional
from common.log import logger
from config import conf
class KnowledgeService:
"""
High-level service for knowledge base queries.
Operates directly on the filesystem.
"""
def __init__(self, workspace_root: str):
self.workspace_root = workspace_root
self.knowledge_dir = os.path.join(workspace_root, "knowledge")
# ------------------------------------------------------------------
# list — directory tree with stats
# ------------------------------------------------------------------
def list_tree(self) -> dict:
"""
Return the knowledge directory tree grouped by category,
supporting arbitrarily nested sub-directories.
Returns::
{
"root_files": [...],
"tree": [
{
"dir": "concepts",
"files": [
{"name": "moe.md", "title": "MoE", "size": 1234},
],
"children": []
},
{
"dir": "platform",
"files": [],
"children": [
{
"dir": "analysis",
"files": [{"name": "perf.md", ...}],
"children": []
}
]
},
],
"stats": {"pages": 15, "size": 32768},
"enabled": true
}
"""
if not os.path.isdir(self.knowledge_dir):
return {"tree": [], "stats": {"pages": 0, "size": 0}, "enabled": conf().get("knowledge", True)}
stats = {"pages": 0, "size": 0}
root_files, tree = self._scan_dir(self.knowledge_dir, stats, is_root=True)
return {
"root_files": root_files,
"tree": tree,
"stats": stats,
"enabled": conf().get("knowledge", True),
}
def _scan_dir(self, dir_path: str, stats: dict, is_root: bool = False) -> tuple:
"""
Recursively scan a directory.
:return: (files, children) where files is a list of .md file dicts
in this directory and children is a list of sub-directory nodes.
"""
files = []
children = []
for name in sorted(os.listdir(dir_path)):
if name.startswith("."):
continue
full = os.path.join(dir_path, name)
if os.path.isdir(full):
sub_files, sub_children = self._scan_dir(full, stats)
children.append({"dir": name, "files": sub_files, "children": sub_children})
elif name.endswith(".md"):
size = os.path.getsize(full)
if not is_root:
stats["pages"] += 1
stats["size"] += size
title = name.replace(".md", "")
try:
with open(full, "r", encoding="utf-8") as f:
first_line = f.readline().strip()
if first_line.startswith("# "):
title = first_line[2:].strip()
except Exception:
pass
files.append({"name": name, "title": title, "size": size})
return files, children
# ------------------------------------------------------------------
# read — single file content
# ------------------------------------------------------------------
def read_file(self, rel_path: str) -> dict:
"""
Read a single knowledge markdown file.
:param rel_path: Relative path within knowledge/, e.g. ``concepts/moe.md``
:return: dict with ``content`` and ``path``
:raises ValueError: if path is invalid or escapes knowledge dir
:raises FileNotFoundError: if file does not exist
"""
if not rel_path or ".." in rel_path:
raise ValueError("invalid path")
full_path = os.path.normpath(os.path.join(self.knowledge_dir, rel_path))
allowed = os.path.normpath(self.knowledge_dir)
if not full_path.startswith(allowed + os.sep) and full_path != allowed:
raise ValueError("path outside knowledge dir")
if not os.path.isfile(full_path):
raise FileNotFoundError(f"file not found: {rel_path}")
with open(full_path, "r", encoding="utf-8") as f:
content = f.read()
return {"content": content, "path": rel_path}
# ------------------------------------------------------------------
# graph — nodes and links for visualization
# ------------------------------------------------------------------
def build_graph(self) -> dict:
"""
Parse all knowledge pages and extract cross-reference links.
Returns::
{
"nodes": [
{"id": "concepts/moe.md", "label": "MoE", "category": "concepts"},
...
],
"links": [
{"source": "concepts/moe.md", "target": "entities/deepseek.md"},
...
]
}
"""
knowledge_path = Path(self.knowledge_dir)
if not knowledge_path.is_dir():
return {"nodes": [], "links": []}
nodes = {}
links = []
link_re = re.compile(r'\[([^\]]*)\]\(([^)]+\.md)\)')
for md_file in knowledge_path.rglob("*.md"):
rel = str(md_file.relative_to(knowledge_path))
if rel in ("index.md", "log.md"):
continue
parts = rel.split("/")
category = parts[0] if len(parts) > 1 else "root"
title = md_file.stem.replace("-", " ").title()
try:
content = md_file.read_text(encoding="utf-8")
first_line = content.strip().split("\n")[0]
if first_line.startswith("# "):
title = first_line[2:].strip()
for _, link_target in link_re.findall(content):
resolved = (md_file.parent / link_target).resolve()
try:
target_rel = str(resolved.relative_to(knowledge_path))
except ValueError:
continue
if target_rel != rel:
links.append({"source": rel, "target": target_rel})
except Exception:
pass
nodes[rel] = {"id": rel, "label": title, "category": category}
valid_ids = set(nodes.keys())
links = [l for l in links if l["source"] in valid_ids and l["target"] in valid_ids]
seen = set()
deduped = []
for l in links:
key = tuple(sorted([l["source"], l["target"]]))
if key not in seen:
seen.add(key)
deduped.append(l)
return {"nodes": list(nodes.values()), "links": deduped}
# ------------------------------------------------------------------
# dispatch — single entry point for protocol messages
# ------------------------------------------------------------------
def dispatch(self, action: str, payload: Optional[dict] = None) -> dict:
"""
Dispatch a knowledge management action.
:param action: ``list``, ``read``, or ``graph``
:param payload: action-specific payload
:return: protocol-compatible response dict
"""
payload = payload or {}
try:
if action == "list":
result = self.list_tree()
return {"action": action, "code": 200, "message": "success", "payload": result}
elif action == "read":
path = payload.get("path")
if not path:
return {"action": action, "code": 400, "message": "path is required", "payload": None}
result = self.read_file(path)
return {"action": action, "code": 200, "message": "success", "payload": result}
elif action == "graph":
result = self.build_graph()
return {"action": action, "code": 200, "message": "success", "payload": result}
else:
return {"action": action, "code": 400, "message": f"unknown action: {action}", "payload": None}
except ValueError as e:
return {"action": action, "code": 403, "message": str(e), "payload": None}
except FileNotFoundError as e:
return {"action": action, "code": 404, "message": str(e), "payload": None}
except Exception as e:
logger.error(f"[KnowledgeService] dispatch error: action={action}, error={e}")
return {"action": action, "code": 500, "message": str(e), "payload": None}
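
Reviewer note: the containment check in `read_file` is the security-relevant part of this file, so a standalone sketch may help when reviewing it. The code below mirrors the normalize-then-prefix-check approach from the diff; the function name `resolve_inside` and the example base directory are hypothetical.

```python
import os


def resolve_inside(base_dir: str, rel_path: str) -> str:
    """Join rel_path onto base_dir and reject anything that escapes it."""
    if not rel_path or ".." in rel_path:
        raise ValueError("invalid path")
    allowed = os.path.normpath(base_dir)
    full_path = os.path.normpath(os.path.join(allowed, rel_path))
    # Comparing against allowed + os.sep avoids matching sibling
    # directories such as "/ws/knowledge-old".
    if full_path != allowed and not full_path.startswith(allowed + os.sep):
        raise ValueError("path outside knowledge dir")
    return full_path


inside = resolve_inside("/ws/knowledge", "concepts/moe.md")
```

An absolute `rel_path` is rejected as well: `os.path.join` discards the base when its second argument is absolute, so the prefix check fails. Symlinks are not resolved by `normpath`, so this sketch, like the diff, does not guard against a symlink inside the directory pointing elsewhere.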


@@ -28,11 +28,13 @@ from common.log import logger
_DDL = """
CREATE TABLE IF NOT EXISTS sessions (
-session_id TEXT PRIMARY KEY,
-channel_type TEXT NOT NULL DEFAULT '',
-created_at INTEGER NOT NULL,
-last_active INTEGER NOT NULL,
-msg_count INTEGER NOT NULL DEFAULT 0
+session_id TEXT PRIMARY KEY,
+channel_type TEXT NOT NULL DEFAULT '',
+title TEXT NOT NULL DEFAULT '',
+context_start_seq INTEGER NOT NULL DEFAULT 0,
+created_at INTEGER NOT NULL,
+last_active INTEGER NOT NULL,
+msg_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS messages (
@@ -57,6 +59,14 @@ _MIGRATION_ADD_CHANNEL_TYPE = """
ALTER TABLE sessions ADD COLUMN channel_type TEXT NOT NULL DEFAULT '';
"""
_MIGRATION_ADD_TITLE = """
ALTER TABLE sessions ADD COLUMN title TEXT NOT NULL DEFAULT '';
"""
_MIGRATION_ADD_CONTEXT_START_SEQ = """
ALTER TABLE sessions ADD COLUMN context_start_seq INTEGER NOT NULL DEFAULT 0;
"""
DEFAULT_MAX_AGE_DAYS: int = 30
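
Reviewer note: this part of the diff does not show how the two new `ALTER TABLE` migrations are applied, so the following is only one possible way to make them idempotent, not the project's actual mechanism. It checks `PRAGMA table_info` before adding a column; the helper name `add_column_if_missing` is hypothetical.

```python
import sqlite3


def add_column_if_missing(conn: sqlite3.Connection, table: str,
                          column: str, ddl: str) -> bool:
    """Add a column only when it is absent. Returns True if it was added."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {ddl}")
    return True


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (session_id TEXT PRIMARY KEY, created_at INTEGER NOT NULL)"
)

first = add_column_if_missing(conn, "sessions", "title",
                              "title TEXT NOT NULL DEFAULT ''")
second = add_column_if_missing(conn, "sessions", "title",
                               "title TEXT NOT NULL DEFAULT ''")
columns = [row[1] for row in conn.execute("PRAGMA table_info(sessions)")]
```

SQLite only accepts `ADD COLUMN ... NOT NULL` when a non-NULL default is supplied, which both new migrations do.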
@@ -129,6 +139,7 @@ def _extract_tool_results(content: Any) -> Dict[str, str]:
def _group_into_display_turns(
rows: List[tuple],
include_thinking: bool = True,
) -> List[Dict[str, Any]]:
"""
Convert raw (role, content_json, created_at) DB rows into display turns.
@@ -188,8 +199,9 @@ def _group_into_display_turns(
if text:
turns.append({"role": "user", "content": text, "created_at": created_at})
-# Collect all tool_calls and tool_results from the rest of the group
-all_tool_calls: List[Dict[str, Any]] = []
+# Build an ordered list of steps preserving the original sequence:
+# thinking → content → tool_call → content → ...
+steps: List[Dict[str, Any]] = []
tool_results: Dict[str, str] = {}
final_text = ""
final_ts: Optional[int] = None
@@ -198,24 +210,48 @@ def _group_into_display_turns(
if role == "user":
tool_results.update(_extract_tool_results(content))
elif role == "assistant":
-tcs = _extract_tool_calls(content)
-all_tool_calls.extend(tcs)
-t = _extract_display_text(content)
-if t:
-final_text = t
+# Walk content blocks in order to preserve interleaving
+if isinstance(content, list):
+for block in content:
+if not isinstance(block, dict):
+continue
+btype = block.get("type")
+if btype == "thinking":
+if not include_thinking:
+continue
+txt = block.get("thinking", "").strip()
+if txt:
+steps.append({"type": "thinking", "content": txt})
+elif btype == "text":
+txt = block.get("text", "").strip()
+if txt:
+steps.append({"type": "content", "content": txt})
+final_text = txt
+elif btype == "tool_use":
+steps.append({
+"type": "tool",
+"id": block.get("id", ""),
+"name": block.get("name", ""),
+"arguments": block.get("input", {}),
+})
+elif isinstance(content, str) and content.strip():
+steps.append({"type": "content", "content": content.strip()})
+final_text = content.strip()
 final_ts = created_at
-# Attach tool results to their matching tool_call entries
-for tc in all_tool_calls:
-tc["result"] = tool_results.get(tc.get("id", ""), "")
+# Attach tool results to tool steps
+for step in steps:
+if step["type"] == "tool":
+step["result"] = tool_results.get(step.get("id", ""), "")
-if final_text or all_tool_calls:
-turns.append({
+if steps or final_text:
+turn = {
 "role": "assistant",
 "content": final_text,
-"tool_calls": all_tool_calls,
+"steps": steps,
 "created_at": final_ts or (user_row[1] if user_row else 0),
-})
+}
+turns.append(turn)
return turns
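
Reviewer note: the change from a flat `tool_calls` list to ordered `steps` is easier to judge with a concrete input. The sketch below reproduces the block walk from the hunk above for a single assistant message, assuming the block shapes shown in the diff. The function name `build_steps` is hypothetical, and unlike the diff it attaches tool results inline instead of in a second pass.

```python
from typing import Any, Dict, List


def build_steps(content: Any, tool_results: Dict[str, str],
                include_thinking: bool = True) -> List[Dict[str, Any]]:
    """Turn one assistant message into ordered display steps."""
    steps: List[Dict[str, Any]] = []
    if isinstance(content, str):
        if content.strip():
            steps.append({"type": "content", "content": content.strip()})
        return steps
    if not isinstance(content, list):
        return steps
    for block in content:
        if not isinstance(block, dict):
            continue
        btype = block.get("type")
        if btype == "thinking":
            txt = block.get("thinking", "").strip()
            if include_thinking and txt:
                steps.append({"type": "thinking", "content": txt})
        elif btype == "text":
            txt = block.get("text", "").strip()
            if txt:
                steps.append({"type": "content", "content": txt})
        elif btype == "tool_use":
            tool_id = block.get("id", "")
            steps.append({
                "type": "tool",
                "id": tool_id,
                "name": block.get("name", ""),
                "arguments": block.get("input", {}),
                "result": tool_results.get(tool_id, ""),
            })
    return steps


message = [
    {"type": "thinking", "thinking": "need the file list"},
    {"type": "text", "text": "Let me check."},
    {"type": "tool_use", "id": "t1", "name": "ls", "input": {"path": "."}},
    {"type": "text", "text": "Found two files."},
]
with_thinking = build_steps(message, {"t1": "a.md\nb.md"})
without_thinking = build_steps(message, {"t1": "a.md\nb.md"}, include_thinking=False)
```

The order of `steps` follows the order of the content blocks, so text that precedes a tool call stays ahead of it in the rendered turn.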
@@ -264,14 +300,21 @@ class ConversationStore:
with self._lock:
conn = self._connect()
try:
+# Respect context_start_seq: only load messages at or after the boundary
+ctx_row = conn.execute(
+"SELECT context_start_seq FROM sessions WHERE session_id = ?",
+(session_id,),
+).fetchone()
+ctx_start = ctx_row[0] if ctx_row else 0
 rows = conn.execute(
 """
 SELECT seq, role, content
 FROM messages
-WHERE session_id = ?
+WHERE session_id = ? AND seq >= ?
 ORDER BY seq DESC
 """,
-(session_id,),
+(session_id, ctx_start),
).fetchall()
finally:
conn.close()
@@ -279,10 +322,7 @@ class ConversationStore:
if not rows:
return []
-# Walk newest-to-oldest counting *visible* user turns (actual user text,
-# not tool_result injections). Record the seq of every visible user
-# message so we can find a clean cut point later.
-visible_turn_seqs: List[int] = [] # newest first
+visible_turn_seqs: List[int] = []
for seq, role, raw_content in rows:
if role != "user":
continue
@@ -293,17 +333,11 @@ class ConversationStore:
if _is_visible_user_message(content):
visible_turn_seqs.append(seq)
-# Determine the seq of the oldest visible user message we want to keep.
-# If the total turns fit within max_turns, keep everything.
 if len(visible_turn_seqs) <= max_turns:
-cutoff_seq = None # keep all
+cutoff_seq = None
 else:
-# The Nth visible user message (0-indexed) is the oldest we keep.
 cutoff_seq = visible_turn_seqs[max_turns - 1]
-# Build result in chronological order, starting from cutoff.
-# IMPORTANT: we start exactly at cutoff_seq (the visible user message),
-# never mid-group, so tool_use / tool_result pairs are always complete.
result = []
for seq, role, raw_content in reversed(rows):
if cutoff_seq is not None and seq < cutoff_seq:
@@ -312,6 +346,9 @@ class ConversationStore:
content = json.loads(raw_content)
except Exception:
content = raw_content
# Strip thinking blocks — they are stored for UI display only
if role == "assistant" and isinstance(content, list):
content = [b for b in content if b.get("type") != "thinking"]
result.append({"role": role, "content": content})
return result
@@ -389,6 +426,61 @@ class ConversationStore:
""",
(session_id, session_id),
)
# Auto-generate title from the first visible user message
cur_title = conn.execute(
"SELECT title FROM sessions WHERE session_id = ?",
(session_id,),
).fetchone()
if cur_title and not cur_title[0]:
for msg in messages:
if msg.get("role") == "user":
content = msg.get("content", "")
text = _extract_display_text(content)
if text:
title = text[:50].split("\n")[0]
conn.execute(
"UPDATE sessions SET title = ? WHERE session_id = ?",
(title, session_id),
)
break
finally:
conn.close()
def clear_context(self, session_id: str) -> int:
"""
Set the context boundary to after the current last message.
Messages before this boundary are still stored but excluded from LLM context.
Returns the new context_start_seq value.
"""
with self._lock:
conn = self._connect()
try:
with conn:
row = conn.execute(
"SELECT COALESCE(MAX(seq), -1) FROM messages WHERE session_id = ?",
(session_id,),
).fetchone()
new_start = row[0] + 1
conn.execute(
"UPDATE sessions SET context_start_seq = ? WHERE session_id = ?",
(new_start, session_id),
)
return new_start
finally:
conn.close()
def get_context_start_seq(self, session_id: str) -> int:
"""Return the context_start_seq for a session (0 if not set)."""
with self._lock:
conn = self._connect()
try:
row = conn.execute(
"SELECT context_start_seq FROM sessions WHERE session_id = ?",
(session_id,),
).fetchone()
return row[0] if row else 0
finally:
conn.close()
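
Reviewer note: the context boundary added here is a soft delete. `clear_context` moves `context_start_seq` past the last stored message, and the loader filters on `seq >= ?`, so older rows stay in the table but are no longer sent to the model. The sketch below demonstrates that with an in-memory database. The two-table schema is a simplified assumption (the full `messages` DDL is not shown in this diff), and `append_message` and `load_context` are hypothetical helpers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sessions (
        session_id        TEXT PRIMARY KEY,
        context_start_seq INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE messages (
        session_id TEXT NOT NULL,
        seq        INTEGER NOT NULL,
        role       TEXT NOT NULL,
        content    TEXT NOT NULL
    );
    """
)


def next_seq(session_id: str) -> int:
    """Sequence number one past the last stored message (0 when empty)."""
    row = conn.execute(
        "SELECT COALESCE(MAX(seq), -1) FROM messages WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    return row[0] + 1


def append_message(session_id: str, role: str, content: str) -> None:
    conn.execute(
        "INSERT INTO messages VALUES (?, ?, ?, ?)",
        (session_id, next_seq(session_id), role, content),
    )


def clear_context(session_id: str) -> int:
    """Move the boundary past the current last message; rows are kept."""
    new_start = next_seq(session_id)
    conn.execute(
        "UPDATE sessions SET context_start_seq = ? WHERE session_id = ?",
        (new_start, session_id),
    )
    return new_start


def load_context(session_id: str) -> list:
    """Return message contents at or after the context boundary."""
    row = conn.execute(
        "SELECT context_start_seq FROM sessions WHERE session_id = ?",
        (session_id,),
    ).fetchone()
    start = row[0] if row else 0
    rows = conn.execute(
        "SELECT content FROM messages WHERE session_id = ? AND seq >= ? ORDER BY seq",
        (session_id, start),
    ).fetchall()
    return [r[0] for r in rows]


conn.execute("INSERT INTO sessions (session_id) VALUES ('s1')")
append_message("s1", "user", "hi")
append_message("s1", "assistant", "hello")
before = load_context("s1")
boundary = clear_context("s1")
after_clear = load_context("s1")
append_message("s1", "user", "again")
after_new = load_context("s1")
stored_total = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```

Because history rows survive, a history view can still show the full conversation while the agent starts from the boundary.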
@@ -407,9 +499,111 @@ class ConversationStore:
finally:
conn.close()
def prune_scheduled_messages(
self,
session_id: str,
keep_last_n: int,
markers: Optional[List[str]] = None,
) -> int:
"""
Keep at most ``keep_last_n`` scheduler-injected user/assistant pairs in
the session, deleting the older ones.
A scheduler-injected pair is identified by a user message whose first
text block starts with one of ``markers``; the immediately following
assistant message (next seq) is treated as its paired output.
Only scheduler-tagged messages are touched; regular user turns are
never deleted. Safe to call repeatedly; no-op if nothing to prune.
Args:
session_id: Session to prune.
keep_last_n: Maximum scheduler pairs to retain (must be >= 0).
markers: Text prefixes that identify scheduler user messages.
Defaults to ``["[SCHEDULED]", "Scheduled task"]`` so that
pairs written by older versions are also recognised.
Returns:
Number of message rows deleted.
"""
if keep_last_n < 0:
keep_last_n = 0
if markers is None:
markers = ["[SCHEDULED]", "Scheduled task"]
def _matches_marker(raw_content: str) -> bool:
try:
parsed = json.loads(raw_content)
except Exception:
parsed = raw_content
text = _extract_display_text(parsed) if not isinstance(parsed, str) else parsed
if not text:
return False
return any(text.startswith(m) for m in markers)
with self._lock:
conn = self._connect()
try:
rows = conn.execute(
"""
SELECT seq, role, content
FROM messages
WHERE session_id = ?
ORDER BY seq ASC
""",
(session_id,),
).fetchall()
# Find scheduler pairs: each is (user_seq, assistant_seq?)
pairs: List[tuple] = [] # list of (user_seq, assistant_seq_or_None)
for idx, (seq, role, raw_content) in enumerate(rows):
if role != "user" or not _matches_marker(raw_content):
continue
assistant_seq = None
# Pair with the very next message if it's an assistant turn.
if idx + 1 < len(rows):
next_seq, next_role, _ = rows[idx + 1]
if next_role == "assistant":
assistant_seq = next_seq
pairs.append((seq, assistant_seq))
if len(pairs) <= keep_last_n:
return 0
to_delete_pairs = pairs[: len(pairs) - keep_last_n]
seqs_to_delete: List[int] = []
for user_seq, assistant_seq in to_delete_pairs:
seqs_to_delete.append(user_seq)
if assistant_seq is not None:
seqs_to_delete.append(assistant_seq)
if not seqs_to_delete:
return 0
placeholders = ",".join("?" * len(seqs_to_delete))
with conn:
conn.execute(
f"DELETE FROM messages WHERE session_id = ? AND seq IN ({placeholders})",
(session_id, *seqs_to_delete),
)
conn.execute(
"""
UPDATE sessions
SET msg_count = (
SELECT COUNT(*) FROM messages WHERE session_id = ?
)
WHERE session_id = ?
""",
(session_id, session_id),
)
return len(seqs_to_delete)
finally:
conn.close()
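The pairing and selection logic of `prune_scheduled_messages` can be separated from the database work. The following is a sketch under stated assumptions: `seqs_to_prune` is a hypothetical pure function, and rows carry already-extracted text instead of raw JSON content.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[int, str, str]  # (seq, role, text)


def seqs_to_prune(
    rows: Sequence[Row],
    keep_last_n: int,
    markers: Sequence[str] = ("[SCHEDULED]", "Scheduled task"),
) -> List[int]:
    """Return the seq values to delete so that at most keep_last_n pairs remain."""
    keep_last_n = max(0, keep_last_n)
    pairs: List[Tuple[int, Optional[int]]] = []
    for idx, (seq, role, text) in enumerate(rows):
        if role != "user" or not any(text.startswith(m) for m in markers):
            continue
        assistant_seq = None
        # Pair with the very next row only if it is an assistant turn.
        if idx + 1 < len(rows) and rows[idx + 1][1] == "assistant":
            assistant_seq = rows[idx + 1][0]
        pairs.append((seq, assistant_seq))
    doomed = pairs[: max(0, len(pairs) - keep_last_n)]
    return [s for pair in doomed for s in pair if s is not None]


rows = [
    (0, "user", "[SCHEDULED] check price"),
    (1, "assistant", "price=1013"),
    (2, "user", "what's the weather?"),
    (3, "assistant", "sunny"),
    (4, "user", "[SCHEDULED] check price"),
    (5, "assistant", "price=1013"),
    (6, "user", "Scheduled task: backup"),
]
keep_one = seqs_to_prune(rows, 1)
keep_many = seqs_to_prune(rows, 5)
keep_none = seqs_to_prune(rows, 0)
```

Regular user turns (seq 2 and 3 here) never appear in the result, and an unpaired scheduler message (seq 6) counts as a pair of its own.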
def cleanup_old_sessions(self, max_age_days: Optional[int] = None) -> int:
"""
Delete sessions that have not been active within max_age_days.
Web channel sessions are excluded — they are meant to be permanent.
Args:
max_age_days: Override the default retention period.
@@ -433,7 +627,8 @@ class ConversationStore:
try:
with conn:
stale = conn.execute(
"SELECT session_id FROM sessions WHERE last_active < ?",
"SELECT session_id FROM sessions "
"WHERE last_active < ? AND channel_type != 'web'",
(cutoff,),
).fetchall()
for (sid,) in stale:
@@ -492,9 +687,15 @@ class ConversationStore:
with self._lock:
conn = self._connect()
try:
ctx_row = conn.execute(
"SELECT context_start_seq FROM sessions WHERE session_id = ?",
(session_id,),
).fetchone()
ctx_start = ctx_row[0] if ctx_row else 0
rows = conn.execute(
"""
SELECT role, content, created_at
SELECT seq, role, content, created_at
FROM messages
WHERE session_id = ?
ORDER BY seq ASC
@@ -504,7 +705,38 @@ class ConversationStore:
finally:
conn.close()
visible = _group_into_display_turns(rows)
# Honour the current enable_thinking switch when building display turns
# so that toggling it off hides previously-saved thinking blocks too.
try:
from config import conf
include_thinking = bool(conf().get("enable_thinking", False))
except Exception:
include_thinking = False
# Strip seq for display grouping, but record max seq per visible user group
plain_rows = [(role, content, created_at) for _seq, role, content, created_at in rows]
visible = _group_into_display_turns(plain_rows, include_thinking=include_thinking)
# Build a mapping: find the seq of each visible user message to annotate context boundary.
# Walk through rows to find visible user message seqs in order.
visible_user_seqs: List[int] = []
for seq, role, raw_content, _ts in rows:
if role != "user":
continue
try:
content = json.loads(raw_content)
except Exception:
content = raw_content
if _is_visible_user_message(content):
visible_user_seqs.append(seq)
# Each pair of display turns (user+assistant) corresponds to a visible user seq.
# Mark which turns are before the context boundary.
user_turn_idx = 0
for turn in visible:
if turn["role"] == "user" and user_turn_idx < len(visible_user_seqs):
turn["_seq"] = visible_user_seqs[user_turn_idx]
user_turn_idx += 1
total = len(visible)
offset = (page - 1) * page_size
@@ -513,12 +745,98 @@ class ConversationStore:
return {
"messages": page_items,
"context_start_seq": ctx_start,
"total": total,
"page": page,
"page_size": page_size,
"has_more": offset + page_size < total,
}
def list_sessions(
self,
channel_type: Optional[str] = None,
page: int = 1,
page_size: int = 50,
) -> Dict[str, Any]:
"""
List sessions ordered by last_active DESC, with optional channel_type filter.
Returns:
{
"sessions": [{session_id, title, created_at, last_active, msg_count}, ...],
"total": int,
"page": int,
"page_size": int,
"has_more": bool,
}
"""
page = max(1, page)
with self._lock:
conn = self._connect()
try:
if channel_type:
total = conn.execute(
"SELECT COUNT(*) FROM sessions WHERE channel_type = ?",
(channel_type,),
).fetchone()[0]
rows = conn.execute(
"""
SELECT session_id, title, created_at, last_active, msg_count
FROM sessions
WHERE channel_type = ?
ORDER BY last_active DESC
LIMIT ? OFFSET ?
""",
(channel_type, page_size, (page - 1) * page_size),
).fetchall()
else:
total = conn.execute(
"SELECT COUNT(*) FROM sessions",
).fetchone()[0]
rows = conn.execute(
"""
SELECT session_id, title, created_at, last_active, msg_count
FROM sessions
ORDER BY last_active DESC
LIMIT ? OFFSET ?
""",
(page_size, (page - 1) * page_size),
).fetchall()
finally:
conn.close()
sessions = [
{
"session_id": r[0],
"title": r[1],
"created_at": r[2],
"last_active": r[3],
"msg_count": r[4],
}
for r in rows
]
return {
"sessions": sessions,
"total": total,
"page": page,
"page_size": page_size,
"has_more": (page - 1) * page_size + page_size < total,
}
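The `has_more` arithmetic shared by the paginated methods above is easy to check with a list-based stand-in. A minimal sketch; `paginate` is a hypothetical helper that slices an in-memory list rather than issuing `LIMIT ? OFFSET ?`.

```python
from typing import Any, Dict, List


def paginate(items: List[Any], page: int = 1, page_size: int = 50) -> Dict[str, Any]:
    """Slice items the same way LIMIT/OFFSET would and report has_more."""
    page = max(1, page)  # clamp, as list_sessions does
    offset = (page - 1) * page_size
    total = len(items)
    return {
        "items": items[offset: offset + page_size],
        "total": total,
        "page": page,
        "page_size": page_size,
        "has_more": offset + page_size < total,
    }


first = paginate(list(range(5)), page=1, page_size=2)
last = paginate(list(range(5)), page=3, page_size=2)
clamped = paginate(list(range(5)), page=0, page_size=2)
```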
def rename_session(self, session_id: str, title: str) -> bool:
"""Update the title of a session. Returns True if the session existed."""
with self._lock:
conn = self._connect()
try:
with conn:
cur = conn.execute(
"UPDATE sessions SET title = ? WHERE session_id = ?",
(title, session_id),
)
return cur.rowcount > 0
finally:
conn.close()
def get_stats(self) -> Dict[str, Any]:
"""Return basic stats keyed by channel_type, for monitoring."""
with self._lock:
@@ -573,6 +891,20 @@ class ConversationStore:
logger.info("[ConversationStore] Migrated: added channel_type column")
except Exception as e:
logger.warning(f"[ConversationStore] Migration failed: {e}")
if "title" not in cols:
try:
conn.execute(_MIGRATION_ADD_TITLE)
conn.commit()
logger.info("[ConversationStore] Migrated: added title column")
except Exception as e:
logger.warning(f"[ConversationStore] Migration (title) failed: {e}")
if "context_start_seq" not in cols:
try:
conn.execute(_MIGRATION_ADD_CONTEXT_START_SEQ)
conn.commit()
logger.info("[ConversationStore] Migrated: added context_start_seq column")
except Exception as e:
logger.warning(f"[ConversationStore] Migration (context_start_seq) failed: {e}")
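The column-existence checks above follow a common idempotent migration pattern for SQLite. The sketch below assumes `cols` is built from `PRAGMA table_info`, and the DDL string is a hypothetical stand-in, since the real `_MIGRATION_ADD_TITLE` statement is not shown in this diff.

```python
import sqlite3


def ensure_column(conn: sqlite3.Connection, table: str, column: str, ddl: str) -> bool:
    """Add a column if it is missing. Returns True when a migration ran."""
    # Row layout of table_info: (cid, name, type, notnull, dflt_value, pk).
    cols = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in cols:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {ddl}")
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (session_id TEXT PRIMARY KEY)")
# Hypothetical DDL; a NOT NULL column added this way needs a non-NULL default.
first_run = ensure_column(conn, "sessions", "title", "title TEXT NOT NULL DEFAULT ''")
second_run = ensure_column(conn, "sessions", "title", "title TEXT NOT NULL DEFAULT ''")
```

Table and column names are interpolated into SQL here, so this helper is only suitable for trusted, hard-coded identifiers.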
def _connect(self) -> sqlite3.Connection:
conn = sqlite3.connect(str(self._db_path), timeout=10)


@@ -32,18 +32,21 @@ class EmbeddingProvider(ABC):
class OpenAIEmbeddingProvider(EmbeddingProvider):
"""OpenAI embedding provider using REST API"""
def __init__(self, model: str = "text-embedding-3-small", api_key: Optional[str] = None, api_base: Optional[str] = None):
def __init__(self, model: str = "text-embedding-3-small", api_key: Optional[str] = None,
api_base: Optional[str] = None, extra_headers: Optional[dict] = None):
"""
Initialize OpenAI embedding provider
Args:
model: Model name (text-embedding-3-small or text-embedding-3-large)
api_key: OpenAI API key
api_base: Optional API base URL
extra_headers: Optional extra headers to include in API requests
"""
self.model = model
self.api_key = api_key
self.api_base = api_base or "https://api.openai.com/v1"
self.extra_headers = extra_headers or {}
# Validate API key
if not self.api_key or self.api_key in ["", "YOUR API KEY", "YOUR_API_KEY"]:
@@ -59,7 +62,8 @@ class OpenAIEmbeddingProvider(EmbeddingProvider):
url = f"{self.api_base}/embeddings"
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {self.api_key}"
"Authorization": f"Bearer {self.api_key}",
**self.extra_headers,
}
data = {
"input": input_data,
@@ -134,7 +138,8 @@ def create_embedding_provider(
provider: str = "openai",
model: Optional[str] = None,
api_key: Optional[str] = None,
api_base: Optional[str] = None
api_base: Optional[str] = None,
extra_headers: Optional[dict] = None
) -> EmbeddingProvider:
"""
Factory function to create embedding provider
@@ -147,10 +152,11 @@ def create_embedding_provider(
model: Model name (default: text-embedding-3-small)
api_key: API key (required)
api_base: API base URL
extra_headers: Optional extra headers to include in API requests
Returns:
EmbeddingProvider instance
Raises:
ValueError: If provider is unsupported or api_key is missing
"""
@@ -158,4 +164,4 @@ def create_embedding_provider(
raise ValueError(f"Unsupported embedding provider: {provider}. Use 'openai' or 'linkai'.")
model = model or "text-embedding-3-small"
return OpenAIEmbeddingProvider(model=model, api_key=api_key, api_base=api_base)
return OpenAIEmbeddingProvider(model=model, api_key=api_key, api_base=api_base, extra_headers=extra_headers)
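Because `**self.extra_headers` is unpacked last, an extra header with the same name replaces the built-in one. That is presumably why the `MemoryManager` caller pops `Authorization` before passing its cloud headers. A minimal sketch of the merge order; `build_headers` and the `X-Client-Id` header are illustrative assumptions.

```python
from typing import Dict, Optional


def build_headers(api_key: str, extra_headers: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Merge defaults with extras; later keys win in a dict literal."""
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
        **(extra_headers or {}),
    }


# Hypothetical cloud headers that happen to carry their own Authorization.
cloud_headers = {"Authorization": "other-scheme token", "X-Client-Id": "demo"}
overridden = build_headers("sk-test", cloud_headers)

# Dropping the conflicting key first keeps the Bearer header intact.
safe_extra = dict(cloud_headers)
safe_extra.pop("Authorization", None)
safe = build_headers("sk-test", safe_extra)
```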


@@ -76,11 +76,15 @@ class MemoryManager:
linkai_key = os.environ.get('LINKAI_API_KEY')
linkai_base = os.environ.get('LINKAI_API_BASE', 'https://api.link-ai.tech')
if linkai_key:
from common.utils import get_cloud_headers
cloud_headers = get_cloud_headers(linkai_key)
cloud_headers.pop("Authorization", None)
self.embedding_provider = create_embedding_provider(
provider="linkai",
model=self.config.embedding_model,
api_key=linkai_key,
api_base=f"{linkai_base}/v1"
api_base=f"{linkai_base}/v1",
extra_headers=cloud_headers,
)
except Exception as e:
from common.log import logger
@@ -281,6 +285,10 @@ class MemoryManager:
# Scan memory directory (including daily summaries)
if memory_dir.exists():
for file_path in memory_dir.rglob("*.md"):
# Skip hidden directories (e.g. .dreams/)
if any(part.startswith('.') for part in file_path.relative_to(workspace_dir).parts):
continue
# Determine scope and user_id from path
rel_path = file_path.relative_to(workspace_dir)
parts = rel_path.parts
@@ -308,6 +316,14 @@ class MemoryManager:
scope = "shared"
await self._sync_file(file_path, "memory", scope, user_id)
# Scan knowledge directory (structured knowledge wiki)
from config import conf
if conf().get("knowledge", True):
knowledge_dir = Path(workspace_dir) / "knowledge"
if knowledge_dir.exists():
for file_path in knowledge_dir.rglob("*.md"):
await self._sync_file(file_path, "knowledge", "shared", None)
self._dirty = False
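The hidden-directory filter added to the memory scan can be exercised against a temporary tree. A sketch under assumptions: `visible_markdown_files` is a hypothetical helper, and the file names are made up for the demonstration.

```python
import tempfile
from pathlib import Path
from typing import List


def visible_markdown_files(root: Path) -> List[str]:
    """List *.md files under root, skipping any path with a dot-prefixed part."""
    found = []
    for path in root.rglob("*.md"):
        rel = path.relative_to(root)
        if any(part.startswith(".") for part in rel.parts):
            continue
        found.append(rel.as_posix())
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "memory" / ".dreams").mkdir(parents=True)
    (root / "memory" / "2026-05-06.md").write_text("- note\n", encoding="utf-8")
    (root / "memory" / ".dreams" / "draft.md").write_text("hidden\n", encoding="utf-8")
    result = visible_markdown_files(root)
```

Checking the path relative to the scan root matters: the absolute path of a temporary or home directory may itself contain dot-prefixed components.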
@@ -385,24 +401,28 @@ class MemoryManager:
user_id: Optional[str] = None,
reason: str = "threshold",
max_messages: int = 10,
context_summary_callback=None,
) -> bool:
"""
Flush conversation summary to daily memory file.
Args:
messages: Conversation message list
user_id: Optional user ID
reason: "threshold" | "overflow" | "daily_summary"
max_messages: Max recent messages to include (0 = all)
context_summary_callback: Optional callback(str) invoked with the
daily summary text for in-context injection
Returns:
True if content was written
True if flush was dispatched
"""
success = self.flush_manager.flush_from_messages(
messages=messages,
user_id=user_id,
reason=reason,
max_messages=max_messages,
context_summary_callback=context_summary_callback,
)
if success:
self._dirty = True


@@ -32,68 +32,80 @@ class MemoryService:
# ------------------------------------------------------------------
# list — paginated file metadata
# ------------------------------------------------------------------
def list_files(self, page: int = 1, page_size: int = 20) -> dict:
def list_files(self, page: int = 1, page_size: int = 20, category: str = "memory") -> dict:
"""
List all memory files with metadata (without content).
List memory or dream files with metadata (without content).
Returns::
{
"page": 1,
"page_size": 20,
"total": 15,
"list": [
{"filename": "MEMORY.md", "type": "global", "size": 2048, "updated_at": "2026-02-20 10:00:00"},
{"filename": "2026-02-20.md", "type": "daily", "size": 512, "updated_at": "2026-02-20 09:30:00"},
...
]
}
Args:
category: ``"memory"`` (default) — MEMORY.md + daily files;
``"dream"`` — dream diary files from memory/dreams/
"""
if category == "dream":
files = self._list_dream_files()
else:
files = self._list_memory_files()
total = len(files)
start = (page - 1) * page_size
end = start + page_size
return {
"page": page,
"page_size": page_size,
"total": total,
"list": files[start:end],
}
def _list_memory_files(self) -> List[dict]:
"""MEMORY.md + memory/*.md (newest first)."""
files: List[dict] = []
# 1. Global memory — MEMORY.md in workspace root
global_path = os.path.join(self.workspace_root, "MEMORY.md")
if os.path.isfile(global_path):
files.append(self._file_info(global_path, "MEMORY.md", "global"))
# 2. Daily memory files — memory/*.md (sorted newest first)
if os.path.isdir(self.memory_dir):
daily_files = []
for name in os.listdir(self.memory_dir):
full = os.path.join(self.memory_dir, name)
if os.path.isfile(full) and name.endswith(".md"):
daily_files.append((name, full))
# Sort by filename descending (newest date first)
daily_files.sort(key=lambda x: x[0], reverse=True)
for name, full in daily_files:
files.append(self._file_info(full, name, "daily"))
total = len(files)
return files
# Paginate
start = (page - 1) * page_size
end = start + page_size
page_items = files[start:end]
def _list_dream_files(self) -> List[dict]:
"""memory/dreams/*.md (newest first)."""
files: List[dict] = []
dreams_dir = os.path.join(self.memory_dir, "dreams")
return {
"page": page,
"page_size": page_size,
"total": total,
"list": page_items,
}
if os.path.isdir(dreams_dir):
entries = []
for name in os.listdir(dreams_dir):
full = os.path.join(dreams_dir, name)
if os.path.isfile(full) and name.endswith(".md"):
entries.append((name, full))
entries.sort(key=lambda x: x[0], reverse=True)
for name, full in entries:
files.append(self._file_info(full, name, "dream"))
return files
# ------------------------------------------------------------------
# content — read a single file
# ------------------------------------------------------------------
def get_content(self, filename: str) -> dict:
def get_content(self, filename: str, category: str = "memory") -> dict:
"""
Read the full content of a memory file.
Read the full content of a memory or dream file.
:param filename: File name, e.g. ``MEMORY.md`` or ``2026-02-20.md``
:param filename: File name, e.g. ``MEMORY.md``, ``2026-02-20.md``
:param category: ``"memory"`` or ``"dream"``
:return: dict with ``filename`` and ``content``
:raises FileNotFoundError: if the file does not exist
"""
path = self._resolve_path(filename)
path = self._resolve_path(filename, category)
if not os.path.isfile(path):
raise FileNotFoundError(f"Memory file not found: {filename}")
@@ -113,7 +125,7 @@ class MemoryService:
Dispatch a memory management action.
:param action: ``list`` or ``content``
:param payload: action-specific payload
:param payload: action-specific payload (supports ``category``: ``"memory"`` | ``"dream"``)
:return: protocol-compatible response dict
"""
payload = payload or {}
@@ -121,19 +133,23 @@ class MemoryService:
if action == "list":
page = payload.get("page", 1)
page_size = payload.get("page_size", 20)
result_payload = self.list_files(page=page, page_size=page_size)
category = payload.get("category", "memory")
result_payload = self.list_files(page=page, page_size=page_size, category=category)
return {"action": action, "code": 200, "message": "success", "payload": result_payload}
elif action == "content":
filename = payload.get("filename")
if not filename:
return {"action": action, "code": 400, "message": "filename is required", "payload": None}
result_payload = self.get_content(filename)
category = payload.get("category", "memory")
result_payload = self.get_content(filename, category=category)
return {"action": action, "code": 200, "message": "success", "payload": result_payload}
else:
return {"action": action, "code": 400, "message": f"unknown action: {action}", "payload": None}
except ValueError as e:
return {"action": action, "code": 403, "message": "invalid filename", "payload": None}
except FileNotFoundError as e:
return {"action": action, "code": 404, "message": str(e), "payload": None}
except Exception as e:
@@ -143,16 +159,30 @@ class MemoryService:
# ------------------------------------------------------------------
# internal helpers
# ------------------------------------------------------------------
def _resolve_path(self, filename: str) -> str:
def _resolve_path(self, filename: str, category: str = "memory") -> str:
"""
Resolve a filename to its absolute path.
Safely resolve a filename to its absolute path within the allowed directory.
- ``MEMORY.md`` → ``{workspace_root}/MEMORY.md``
- ``2026-02-20.md`` → ``{workspace_root}/memory/2026-02-20.md``
- ``2026-02-20.md`` (memory) → ``{workspace_root}/memory/2026-02-20.md``
- ``2026-02-20.md`` (dream) → ``{workspace_root}/memory/dreams/2026-02-20.md``
Raises ValueError if the resolved path escapes the allowed directory.
"""
if filename == "MEMORY.md":
return os.path.join(self.workspace_root, filename)
return os.path.join(self.memory_dir, filename)
base_dir = self.workspace_root
elif category == "dream":
base_dir = os.path.join(self.memory_dir, "dreams")
else:
base_dir = self.memory_dir
resolved = os.path.realpath(os.path.join(base_dir, filename))
allowed = os.path.realpath(base_dir)
if resolved != allowed and not resolved.startswith(allowed + os.sep):
raise ValueError("Invalid filename: path traversal detected")
return resolved
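The containment check in `_resolve_path` can be tried against a temporary directory. A minimal sketch; `resolve_inside` is a hypothetical free function mirroring the realpath-and-prefix test above.

```python
import os
import tempfile


def resolve_inside(base_dir: str, filename: str) -> str:
    """Resolve filename under base_dir, rejecting anything that escapes it."""
    allowed = os.path.realpath(base_dir)
    resolved = os.path.realpath(os.path.join(base_dir, filename))
    # Appending os.sep prevents "/data/memory-evil" from passing as "/data/memory".
    if resolved != allowed and not resolved.startswith(allowed + os.sep):
        raise ValueError("path traversal detected")
    return resolved


with tempfile.TemporaryDirectory() as base:
    ok_path = resolve_inside(base, "2026-02-20.md")
    stays_inside = ok_path.startswith(os.path.realpath(base) + os.sep)
    rejected = []
    for attempt in ("../secrets.md", os.path.abspath(os.sep + "etc" + os.sep + "passwd")):
        try:
            resolve_inside(base, attempt)
        except ValueError:
            rejected.append(attempt)
```

Absolute filenames are rejected too, because `os.path.join` discards the base when its second argument is absolute.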
@staticmethod
def _file_info(path: str, filename: str, file_type: str) -> dict:


@@ -1,12 +1,12 @@
"""
Memory flush manager
Memory flush manager with Deep Dream distillation
Handles memory persistence when conversation context is trimmed or overflows:
- Uses LLM to summarize discarded messages into concise key-information entries
- Uses LLM to summarize discarded messages into concise daily records
- Writes to daily memory files (lazy creation)
- Deduplicates trim flushes to avoid repeated writes
- Runs summarization asynchronously to avoid blocking normal replies
- Provides daily summary interface for scheduler
- Deep Dream: periodically distills daily memories → refined MEMORY.md + dream diary
"""
import threading
@@ -16,19 +16,79 @@ from datetime import datetime
from common.log import logger
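SUMMARIZE_SYSTEM_PROMPT = """You are a memory extraction assistant. Your task is to extract information worth remembering from the conversation log and produce a concise memory summary
SUMMARIZE_SYSTEM_PROMPT = """You are a conversation logging assistant. Summarize the conversation into a daily record for the day
Output requirements
1. Record by event or key information, one entry per line, starting with "- "
2. Record valuable key information, such as the user's requests and the assistant's solutions, facts mentioned in the conversation, and the user's preferences, decisions, or important conclusions
3. Keep each entry brief and retain only the key information
4. Output the summary directly, without any prefix or explanation
5. When the conversation has nothing worth recording, for example only simple greetings, you may reply "\""""
## Requirements
SUMMARIZE_USER_PROMPT = """Extract the key information from the following conversation log and produce a memory summary
Summarize what happened by event; do not record each conversation turn separately
- One entry per line, starting with "- "
- Merge multiple turns about the same matter
- Record only meaningful events; ignore small talk and greetings
- Keep key decisions, conclusions, and to-do items
When the conversation has nothing worth recording (only greetings or meaningless content), reply directly with """""
SUMMARIZE_USER_PROMPT = """Summarize the following conversation into a daily record: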
{conversation}"""
# ---------------------------------------------------------------------------
# Deep Dream prompts — distill daily memories → MEMORY.md + dream diary
# ---------------------------------------------------------------------------
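DREAM_SYSTEM_PROMPT = """You are a memory curation assistant, responsible for periodically organizing the user's long-term memory.
You will receive two pieces of material:
1. **Current long-term memory**: the full existing content of MEMORY.md
2. **Today's diary**: the daily record for the day
MEMORY.md is injected into the system prompt of every conversation, so it must stay concise and hold only content that is valuable and worth remembering.
**Important: organize only on the basis of the provided material. Fabricating, speculating, or adding information not present in the material is strictly forbidden.**
## Tasks
### Part 1: Updated long-term memory ([MEMORY])
Organize and refine the existing memory, and output the complete updated content:
- **Merge and refine**: combine entries with similar meaning into one dense statement rather than simply listing them
- **Extract new items**: pull new information worth remembering permanently from today's diary (preferences, decisions, people, rules, lessons learned)
- **Resolve conflicts**: when new information contradicts an old entry, the new information wins and replaces the old entry
- **Clean up invalid entries**: delete temporary records, blank entries, formatting residue, and meaningless or duplicated content
- **Remove redundancy**: delete old entries already covered by a more refined statement, to avoid repeating information
- One entry per line, starting with "- ", with no date prefix
- You may group related entries under "## Heading" to make the structure clearer
- Target: at most 50 entries, each summarized in one sentence where possible
### Part 2: Dream diary ([DREAM])
Write a short diary entry in a concise narrative style that records what this round of organizing found, keeping the format clean and readable:
- Which duplicates or contradictions were found
- Which new insights were extracted from the diary
- What cleanup and optimization was done
- Overall impressions and observations
## Output format (follow strictly)
```
[MEMORY]
- Memory entry 1
- Memory entry 2
...
[DREAM]
Dream diary content...
```"""
DREAM_USER_PROMPT = """## Current long-term memory (MEMORY.md)
{memory_content}
## Recent diary (last {days} days)
{daily_content}"""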
class MemoryFlushManager:
"""
@@ -55,6 +115,8 @@ class MemoryFlushManager:
self.last_flush_timestamp: Optional[datetime] = None
self._trim_flushed_hashes: set = set() # Content hashes of already-flushed messages
self._last_flushed_content_hash: str = "" # Content hash at last flush, for daily dedup
self._last_dream_input_hash: str = "" # "{date}:{daily_hash}" of last dream, for dedup
self._last_flush_thread: Optional[threading.Thread] = None
def get_today_memory_file(self, user_id: Optional[str] = None, ensure_exists: bool = False) -> Path:
"""Get today's memory file path: memory/YYYY-MM-DD.md"""
@@ -98,23 +160,30 @@ class MemoryFlushManager:
user_id: Optional[str] = None,
reason: str = "trim",
max_messages: int = 0,
context_summary_callback: Optional[Callable[[str], None]] = None,
) -> bool:
"""
Asynchronously summarize and flush messages to daily memory.
Deduplication runs synchronously, then LLM summarization + file write
run in a background thread so the main reply flow is never blocked.
Args:
messages: Conversation message list (OpenAI/Claude format)
user_id: Optional user ID for user-scoped memory
reason: Why flush was triggered ("trim" | "overflow" | "daily_summary")
max_messages: Max recent messages to summarize (0 = all)
Returns:
True if flush was dispatched
If *context_summary_callback* is provided, it is called with the
[DAILY] portion of the LLM summary once available. The caller can use
this to inject the summary into the live message list for context
continuity — one LLM call serves both disk persistence and in-context
injection.
"""
try:
# Strip scheduler-injected pairs before any further processing.
# These messages already serve as short-term context inside the
# receiver session; promoting them into long-term daily memory
# produces low-value flat logs (e.g. "11:28 price=1013, normal /
# 11:58 price=1013, normal / ...") and wastes summarisation tokens.
messages = self._strip_scheduler_pairs(messages)
if not messages:
return False
import hashlib
deduped = []
for m in messages:
@@ -127,18 +196,19 @@ class MemoryFlushManager:
deduped.append(m)
if not deduped:
return False
import copy
snapshot = copy.deepcopy(deduped)
thread = threading.Thread(
target=self._flush_worker,
args=(snapshot, user_id, reason, max_messages),
args=(snapshot, user_id, reason, max_messages, context_summary_callback),
daemon=True,
)
thread.start()
logger.info(f"[MemoryFlush] Async flush dispatched (reason={reason}, msgs={len(snapshot)})")
self._last_flush_thread = thread
return True
except Exception as e:
logger.warning(f"[MemoryFlush] Failed to dispatch flush (reason={reason}): {e}")
return False
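The dispatch pattern above (snapshot synchronously, summarize in a daemon thread, hand the result to an optional callback) can be sketched without the LLM. Everything named here is an assumption for illustration: `dispatch_flush`, the `summarize` callable, and the lambda standing in for the model call.

```python
import copy
import threading
from typing import Callable, Dict, List, Optional


def dispatch_flush(
    messages: List[Dict],
    summarize: Callable[[List[Dict]], str],
    on_summary: Optional[Callable[[str], None]] = None,
) -> Optional[threading.Thread]:
    """Start a background summarization; return the thread, or None if nothing to do."""
    if not messages:
        return None
    # Deep-copy before starting the thread so later mutation by the caller is harmless.
    snapshot = copy.deepcopy(messages)

    def worker() -> None:
        summary = summarize(snapshot)
        if summary and on_summary:
            try:
                on_summary(summary)
            except Exception as exc:  # a failing callback must not kill the worker
                print(f"callback failed: {exc}")

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


received: List[str] = []
msgs = [{"role": "user", "content": "book a flight"}]
t = dispatch_flush(msgs, lambda ms: f"- {len(ms)} message(s) summarized", received.append)
msgs.clear()          # the caller keeps mutating its own list
t.join(timeout=5)     # keeping the thread handle lets callers wait when they need to
```

Returning the thread plays the same role as storing `_last_flush_thread`: a caller that needs the written file (a test, or a follow-up step) can join it.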
@@ -149,41 +219,69 @@ class MemoryFlushManager:
user_id: Optional[str],
reason: str,
max_messages: int,
context_summary_callback: Optional[Callable[[str], None]] = None,
):
"""Background worker: summarize with LLM and write to daily file."""
"""Background worker: summarize with LLM, write daily memory file."""
try:
summary = self._summarize_messages(messages, max_messages)
if not summary or not summary.strip() or summary.strip() == "":
raw_summary = self._summarize_messages(messages, max_messages)
if not raw_summary or not raw_summary.strip() or raw_summary.strip() == "":
logger.info(f"[MemoryFlush] No valuable content to flush (reason={reason})")
return
# Strip legacy [DAILY]/[MEMORY] markers if model still outputs them
daily_part = self._clean_summary_output(raw_summary)
if not daily_part:
return
# --- Write daily memory ---
daily_file = ensure_daily_memory_file(self.workspace_dir, user_id)
if reason == "overflow":
header = f"## Context Overflow Recovery ({datetime.now().strftime('%H:%M')})"
note = "The following conversation was trimmed due to context overflow:\n"
elif reason == "trim":
header = f"## Trimmed Context ({datetime.now().strftime('%H:%M')})"
note = ""
elif reason == "daily_summary":
header = f"## Daily Summary ({datetime.now().strftime('%H:%M')})"
note = ""
else:
header = f"## Session Notes ({datetime.now().strftime('%H:%M')})"
note = ""
flush_entry = f"\n{header}\n\n{note}{summary}\n"
headers = {
"overflow": f"## Context Overflow Recovery ({datetime.now().strftime('%H:%M')})",
"trim": f"## Trimmed Context ({datetime.now().strftime('%H:%M')})",
"daily_summary": f"## Daily Summary ({datetime.now().strftime('%H:%M')})",
}
header = headers.get(reason, f"## Session Notes ({datetime.now().strftime('%H:%M')})")
with open(daily_file, "a", encoding="utf-8") as f:
f.write(flush_entry)
f.write(f"\n{header}\n\n{daily_part}\n")
logger.info(f"[MemoryFlush] Wrote daily memory to {daily_file.name} (reason={reason}, chars={len(daily_part)})")
# --- Inject context summary into live messages (if callback provided) ---
if context_summary_callback:
try:
context_summary_callback(daily_part)
except Exception as e:
logger.warning(f"[MemoryFlush] Context summary callback failed: {e}")
self.last_flush_timestamp = datetime.now()
logger.info(f"[MemoryFlush] Wrote to {daily_file.name} (reason={reason}, chars={len(summary)})")
except Exception as e:
logger.warning(f"[MemoryFlush] Async flush failed (reason={reason}): {e}")
@staticmethod
def _clean_summary_output(raw: str) -> str:
"""Strip legacy [DAILY]/[MEMORY] markers if present, return clean daily text."""
raw = raw.strip()
if not raw or raw == "":
return ""
# Strip [DAILY] marker
if "[DAILY]" in raw:
start = raw.index("[DAILY]") + len("[DAILY]")
end = raw.index("[MEMORY]") if "[MEMORY]" in raw else len(raw)
raw = raw[start:end].strip()
# Remove stray [MEMORY] section entirely
if "[MEMORY]" in raw:
raw = raw[:raw.index("[MEMORY]")].strip()
# Remove markdown code fences
raw = raw.replace("```", "").strip()
return raw
def create_daily_summary(
self,
messages: List[Dict],
@@ -209,27 +307,210 @@ class MemoryFlushManager:
reason="daily_summary",
max_messages=0,
)
# ---- Deep Dream (memory distillation) ----
def deep_dream(self, user_id: Optional[str] = None, lookback_days: int = 1, force: bool = False) -> bool:
"""
Distill recent daily memories into MEMORY.md and generate a dream diary.
Args:
lookback_days: How many days of daily files to read (default 1 for scheduled, 3 for manual)
force: Skip input-hash dedup check (used by manual /memory dream trigger)
"""
if not self.llm_model:
logger.warning("[DeepDream] No LLM model available, skipping")
return False
logger.info(f"[DeepDream] Starting memory distillation (lookback={lookback_days} days)")
# Collect materials
memory_content = self._read_main_memory(user_id)
daily_content, has_content = self._read_recent_dailies(user_id, lookback_days)
if not has_content:
logger.info("[DeepDream] No recent daily records, skipping to preserve existing MEMORY.md")
return False
# Dedup: skip if same daily content already dreamed today.
# Note: only hash daily_content (not memory_content), because deep_dream
# itself rewrites MEMORY.md as a side effect, which would otherwise
# invalidate the hash on every subsequent call within the same window.
import hashlib
daily_hash = hashlib.md5(daily_content.encode("utf-8")).hexdigest()
today_str = datetime.now().strftime("%Y-%m-%d")
dedup_key = f"{today_str}:{daily_hash}"
if not force and dedup_key == self._last_dream_input_hash:
logger.info("[DeepDream] Already dreamed today with same daily content, skipping")
return False
self._last_dream_input_hash = dedup_key
logger.info(
f"[DeepDream] Materials collected: "
f"MEMORY.md={len(memory_content)} chars, "
f"daily={len(daily_content)} chars"
)
# Call LLM for distillation
import time as _time
t0 = _time.monotonic()
try:
user_msg = DREAM_USER_PROMPT.format(
memory_content=memory_content or "(empty)",
days=lookback_days,
daily_content=daily_content or "(no recent daily records)",
)
from agent.protocol.models import LLMRequest
# Scale max_tokens based on input size to avoid truncating large MEMORY.md
input_chars = len(memory_content) + len(daily_content)
dream_max_tokens = max(2000, min(input_chars, 8000))
request = LLMRequest(
messages=[{"role": "user", "content": user_msg}],
temperature=0.3,
max_tokens=dream_max_tokens,
stream=False,
system=DREAM_SYSTEM_PROMPT,
)
response = self.llm_model.call(request)
raw = self._extract_response_text(response)
elapsed = _time.monotonic() - t0
if not raw or not raw.strip():
logger.warning(f"[DeepDream] LLM returned empty response ({elapsed:.1f}s)")
return False
logger.info(f"[DeepDream] LLM distillation completed ({elapsed:.1f}s, {len(raw)} chars)")
except Exception as e:
elapsed = _time.monotonic() - t0
logger.warning(f"[DeepDream] LLM call failed ({elapsed:.1f}s): {e}")
return False
# Parse [MEMORY] and [DREAM] sections
new_memory, dream_diary = self._parse_dream_output(raw)
if not new_memory:
logger.warning("[DeepDream] No [MEMORY] section in LLM output, skipping overwrite")
return False
# Overwrite MEMORY.md
try:
main_file = self.get_main_memory_file(user_id)
old_size = len(memory_content)
main_file.write_text(new_memory + "\n", encoding="utf-8")
logger.info(
f"[DeepDream] Updated MEMORY.md "
f"({old_size} -> {len(new_memory)} chars)"
)
except Exception as e:
logger.warning(f"[DeepDream] Failed to write MEMORY.md: {e}")
return False
# Write dream diary
if dream_diary:
try:
self._write_dream_diary(dream_diary, user_id)
except Exception as e:
logger.warning(f"[DeepDream] Failed to write dream diary: {e}")
logger.info("[DeepDream] ✅ Deep Dream completed successfully")
return True
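Two small calculations inside `deep_dream` are worth isolating: the dedup key and the `max_tokens` clamp. A sketch with hypothetical helper names; the date is passed in explicitly so the result is deterministic.

```python
import hashlib
from datetime import date


def dream_dedup_key(daily_content: str, today: date) -> str:
    """Key on the day plus a hash of the daily text only, never MEMORY.md."""
    digest = hashlib.md5(daily_content.encode("utf-8")).hexdigest()
    return f"{today.isoformat()}:{digest}"


def dream_max_tokens(memory_chars: int, daily_chars: int) -> int:
    """Scale with input size, clamped to the range [2000, 8000]."""
    return max(2000, min(memory_chars + daily_chars, 8000))


day = date(2026, 5, 6)
same_a = dream_dedup_key("- fixed login bug", day)
same_b = dream_dedup_key("- fixed login bug", day)
next_day = dream_dedup_key("- fixed login bug", date(2026, 5, 7))
small, medium, large = dream_max_tokens(100, 200), dream_max_tokens(3000, 2000), dream_max_tokens(9000, 9000)
```

MD5 serves only as a cheap change detector here, not as a security measure. Since the key lives in an instance attribute, the dedup does not survive a process restart.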
def _read_main_memory(self, user_id: Optional[str] = None) -> str:
"""Read current MEMORY.md content."""
main_file = self.get_main_memory_file(user_id)
if main_file.exists():
return main_file.read_text(encoding="utf-8").strip()
return ""
def _read_recent_dailies(
self, user_id: Optional[str] = None, lookback_days: int = 1
) -> tuple:
"""
Read recent daily memory files.
Returns:
(combined_text, has_content) tuple
"""
from datetime import timedelta
parts = []
has_content = False
today = datetime.now().date()
for offset in range(lookback_days):
day = today - timedelta(days=offset)
date_str = day.strftime("%Y-%m-%d")
if user_id:
daily_file = self.memory_dir / "users" / user_id / f"{date_str}.md"
else:
daily_file = self.memory_dir / f"{date_str}.md"
if daily_file.exists():
content = daily_file.read_text(encoding="utf-8").strip()
if content:
parts.append(f"### {date_str}\n\n{content}")
has_content = True
else:
parts.append(f"### {date_str}\n\n(no records)")
return "\n\n".join(parts), has_content
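The lookback window walks backwards from today, one file name per day. A minimal sketch of just the date arithmetic; `recent_daily_names` is a hypothetical helper and takes the reference date as a parameter for testability.

```python
from datetime import date, timedelta
from typing import List


def recent_daily_names(today: date, lookback_days: int) -> List[str]:
    """File names for today and the preceding days, newest first."""
    return [
        f"{(today - timedelta(days=offset)).strftime('%Y-%m-%d')}.md"
        for offset in range(lookback_days)
    ]


names = recent_daily_names(date(2026, 3, 1), 3)
```

`lookback_days=1` therefore covers only today's file, and a value of 0 produces an empty window.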
@staticmethod
def _parse_dream_output(raw: str) -> tuple:
"""Parse LLM output into (new_memory, dream_diary)."""
raw = raw.strip().replace("```", "")
new_memory = ""
dream_diary = ""
if "[MEMORY]" in raw:
start = raw.index("[MEMORY]") + len("[MEMORY]")
end = raw.index("[DREAM]") if "[DREAM]" in raw else len(raw)
new_memory = raw[start:end].strip()
if "[DREAM]" in raw:
start = raw.index("[DREAM]") + len("[DREAM]")
dream_diary = raw[start:].strip()
return new_memory, dream_diary
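A quick run of the marker parsing on sample output shows what `deep_dream` receives. The function below restates the logic above as a free function for illustration, and the sample LLM output is invented.

```python
from typing import Tuple


def parse_dream_output(raw: str) -> Tuple[str, str]:
    """Split LLM output into (new_memory, dream_diary) using the two markers."""
    raw = raw.strip().replace("```", "")
    new_memory = ""
    dream_diary = ""
    if "[MEMORY]" in raw:
        start = raw.index("[MEMORY]") + len("[MEMORY]")
        end = raw.index("[DREAM]") if "[DREAM]" in raw else len(raw)
        new_memory = raw[start:end].strip()
    if "[DREAM]" in raw:
        dream_diary = raw[raw.index("[DREAM]") + len("[DREAM]"):].strip()
    return new_memory, dream_diary


sample = (
    "```\n[MEMORY]\n- Prefers concise replies\n- Deploys on Fridays\n"
    "[DREAM]\nMerged two duplicate entries.\n```"
)
memory, diary = parse_dream_output(sample)
no_markers = parse_dream_output("Sorry, I cannot help with that.")
```

Output without a `[MEMORY]` marker yields an empty first element, which is what makes `deep_dream` skip the overwrite instead of replacing MEMORY.md with unusable text. If `[DREAM]` were to appear before `[MEMORY]`, the slice would be empty as well.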
def _write_dream_diary(self, content: str, user_id: Optional[str] = None):
"""Write dream diary to memory/dreams/YYYY-MM-DD.md."""
dreams_dir = self.memory_dir / "dreams"
if user_id:
dreams_dir = self.memory_dir / "users" / user_id / "dreams"
dreams_dir.mkdir(parents=True, exist_ok=True)
today = datetime.now().strftime("%Y-%m-%d")
diary_file = dreams_dir / f"{today}.md"
diary_file.write_text(
f"# Dream Diary: {today}\n\n{content}\n",
encoding="utf-8",
)
logger.info(f"[DeepDream] Wrote dream diary to {diary_file}")
# ---- Internal helpers ----
def _summarize_messages(self, messages: List[Dict], max_messages: int = 0) -> str:
"""
Summarize conversation messages using LLM, with rule-based fallback.
Summarize conversation messages using LLM.
Returns empty string if LLM deems content not worth recording.
Rule-based fallback only used when LLM call raises an exception.
"""
conversation_text = self._format_conversation_for_summary(messages, max_messages)
if not conversation_text.strip():
return ""
# Try LLM summarization first
if self.llm_model:
try:
summary = self._call_llm_for_summary(conversation_text)
if summary and summary.strip() and summary.strip() != "":
return summary.strip()
logger.info("[MemoryFlush] LLM returned empty or '', skipping write")
return ""
except Exception as e:
logger.warning(f"[MemoryFlush] LLM summarization failed, using fallback: {e}")
return self._extract_summary_fallback(messages, max_messages)
return self._extract_summary_fallback(messages, max_messages)
else:
logger.info("[MemoryFlush] No LLM model available, using rule-based fallback")
return self._extract_summary_fallback(messages, max_messages)
def _format_conversation_for_summary(self, messages: List[Dict], max_messages: int = 0) -> str:
"""Format messages into readable conversation text for LLM summarization."""
@@ -247,6 +528,52 @@ class MemoryFlushManager:
lines.append(f"助手: {text[:500]}")
return "\n".join(lines)
@staticmethod
def _extract_response_text(response) -> str:
"""
Extract text from LLM response regardless of format.
Handles:
- Generator (MiniMax _handle_sync_response yields Claude-format dicts)
- Claude format: {"role":"assistant","content":[{"type":"text","text":"..."}]}
- OpenAI format: {"choices":[{"message":{"content":"..."}}]}
- OpenAI SDK response object with .choices attribute
"""
import types
# Unwrap generator — consume first yielded item
if isinstance(response, types.GeneratorType):
try:
response = next(response)
except StopIteration:
return ""
if not response:
return ""
if isinstance(response, dict):
# Check for error
if response.get("error"):
raise RuntimeError(response.get("message", "LLM call failed"))
# Claude format: content is a list of blocks
content = response.get("content")
if isinstance(content, list):
for block in content:
if isinstance(block, dict) and block.get("type") == "text":
return block.get("text", "")
# OpenAI format
choices = response.get("choices", [])
if choices:
return choices[0].get("message", {}).get("content", "")
# OpenAI SDK response object
if hasattr(response, "choices") and response.choices:
return response.choices[0].message.content or ""
return ""
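The response shapes listed in the docstring can be exercised with a small sketch. This is a free-function copy of the same branching, with `next(response, None)` in place of the try/except; the sample payloads are made up for illustration and do not come from any provider.

```python
import types


def extract_response_text(response) -> str:
    """Return the text of an LLM response in any of the supported shapes."""
    # Generator: consume only the first yielded item
    if isinstance(response, types.GeneratorType):
        response = next(response, None)
    if not response:
        return ""
    if isinstance(response, dict):
        if response.get("error"):
            raise RuntimeError(response.get("message", "LLM call failed"))
        # Claude format: content is a list of blocks
        content = response.get("content")
        if isinstance(content, list):
            for block in content:
                if isinstance(block, dict) and block.get("type") == "text":
                    return block.get("text", "")
        # OpenAI dict format
        choices = response.get("choices", [])
        if choices:
            return choices[0].get("message", {}).get("content", "")
    # Object with attribute access (e.g. an SDK response)
    if hasattr(response, "choices") and response.choices:
        return response.choices[0].message.content or ""
    return ""


# Illustrative payloads (assumed shapes, matching the docstring)
claude_style = {"role": "assistant", "content": [{"type": "text", "text": "hi"}]}
openai_style = {"choices": [{"message": {"content": "hello"}}]}


def one_shot():
    yield claude_style
```

Only the first text block and the first generator item are used, so a multi-block or multi-chunk response would be cut short by this approach.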
def _call_llm_for_summary(self, conversation_text: str) -> str:
"""Call LLM to generate a concise summary of the conversation."""
from agent.protocol.models import LLMRequest
@@ -260,44 +587,59 @@ class MemoryFlushManager:
)
response = self.llm_model.call(request)
if isinstance(response, dict):
if response.get("error"):
raise RuntimeError(response.get("message", "LLM call failed"))
# OpenAI format
choices = response.get("choices", [])
if choices:
return choices[0].get("message", {}).get("content", "")
# Handle response object with attribute access (e.g. OpenAI SDK response)
if hasattr(response, "choices") and response.choices:
return response.choices[0].message.content or ""
return ""
return self._extract_response_text(response)
@staticmethod
def _extract_first_meaningful_line(text: str, max_len: int = 120) -> str:
"""Extract the first meaningful line from assistant reply, skipping markdown noise."""
import re
for line in text.split("\n"):
line = line.strip()
if not line:
continue
# Skip markdown headings, horizontal rules, code fences, pure emoji/symbols
if re.match(r'^(#{1,4}\s|```|---|\*\*\*|[-*]\s*$|[^\w\u4e00-\u9fff]{1,5}$)', line):
continue
# Strip leading markdown bold/emoji decorations
cleaned = re.sub(r'^[\*#>\-\s]+', '', line).strip()
cleaned = re.sub(r'^[\U0001f300-\U0001f9ff\u2600-\u27bf\s]+', '', cleaned).strip()
if len(cleaned) >= 5:
return cleaned[:max_len]
return text.split("\n")[0].strip()[:max_len]
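To see which lines the skip pattern actually rejects, here is a minimal standalone copy of the helper. The only deliberate change is writing the code-fence alternative as `` `{3} ``, which matches the same three backticks; the sample replies are invented.

```python
import re


def first_meaningful_line(text: str, max_len: int = 120) -> str:
    """Return the first line that is not markdown noise, truncated to max_len."""
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue
        # Skip headings, rules, code fences, bare bullets, and short symbol-only lines
        if re.match(r'^(#{1,4}\s|`{3}|---|\*\*\*|[-*]\s*$|[^\w\u4e00-\u9fff]{1,5}$)', line):
            continue
        # Strip leading markdown decorations, then leading emoji and symbols
        cleaned = re.sub(r'^[\*#>\-\s]+', '', line).strip()
        cleaned = re.sub(r'^[\U0001f300-\U0001f9ff\u2600-\u27bf\s]+', '', cleaned).strip()
        if len(cleaned) >= 5:
            return cleaned[:max_len]
    # Nothing qualified: fall back to the raw first line
    return text.split("\n")[0].strip()[:max_len]


reply = "# Result\n---\n- Saved report.md to the workspace\nmore details"
summary = first_meaningful_line(reply)
```

Note that only leading decorations are stripped, so a line such as `**Done**: saved` would keep its trailing `**`.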
@staticmethod
def _extract_summary_fallback(messages: List[Dict], max_messages: int = 0) -> str:
"""Rule-based fallback when LLM is unavailable."""
"""
Rule-based summary of discarded messages.
Format: "用户问了X; 助手回答了Y" per event, compact and readable.
"""
msgs = messages if max_messages == 0 else messages[-max_messages * 2:]
items = []
events: List[str] = []
current_user_text = ""
for msg in msgs:
role = msg.get("role", "")
text = MemoryFlushManager._extract_text_from_content(msg.get("content", ""))
if not text or not text.strip():
continue
text = text.strip()
if role == "user":
if len(text) <= 5:
if len(text) <= 3:
continue
items.append(f"- 用户请求: {text[:200]}")
elif role == "assistant":
first_line = text.split("\n")[0].strip()
if len(first_line) > 10:
items.append(f"- 处理结果: {first_line[:200]}")
return "\n".join(items[:15])
current_user_text = text[:120]
elif role == "assistant" and current_user_text:
reply_summary = MemoryFlushManager._extract_first_meaningful_line(text)
if reply_summary:
events.append(f"- 用户: {current_user_text} → 回复: {reply_summary}")
else:
events.append(f"- 用户: {current_user_text}")
current_user_text = ""
if current_user_text:
events.append(f"- 用户: {current_user_text}")
return "\n".join(events[:10])
@staticmethod
def _extract_text_from_content(content) -> str:
@@ -314,6 +656,40 @@ class MemoryFlushManager:
return "\n".join(parts)
return ""
@classmethod
def _strip_scheduler_pairs(cls, messages: List[Dict]) -> List[Dict]:
"""Drop scheduler-injected user/assistant pairs from a flush batch.
A scheduler user message starts with the ``[SCHEDULED]`` marker
(written by ``AgentBridge.remember_scheduled_output``); the message
immediately following it (if it is an assistant turn) is its paired
output and is dropped together. Regular user/assistant turns and
any tool_use / tool_result blocks are preserved as-is.
"""
if not messages:
return messages
SCHEDULED_PREFIX = "[SCHEDULED]"
result = []
skip_next_assistant = False
for msg in messages:
if not isinstance(msg, dict):
result.append(msg)
skip_next_assistant = False
continue
role = msg.get("role")
if skip_next_assistant and role == "assistant":
skip_next_assistant = False
continue
skip_next_assistant = False
if role == "user":
text = cls._extract_text_from_content(msg.get("content", ""))
if text.lstrip().startswith(SCHEDULED_PREFIX):
skip_next_assistant = True
continue
result.append(msg)
return result
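The pairing rule can be checked with a small sketch. `text_of` is a hypothetical, simplified stand-in for `_extract_text_from_content` (it handles only strings and lists of text blocks), and the message batch is invented.

```python
from typing import Dict, List

SCHEDULED_PREFIX = "[SCHEDULED]"


def text_of(content) -> str:
    # Hypothetical stand-in for _extract_text_from_content
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "\n".join(
            b.get("text", "") for b in content
            if isinstance(b, dict) and b.get("type") == "text"
        )
    return ""


def strip_scheduler_pairs(messages: List[Dict]) -> List[Dict]:
    """Drop [SCHEDULED] user messages and the assistant turn right after each."""
    result = []
    skip_next_assistant = False
    for msg in messages:
        if not isinstance(msg, dict):
            result.append(msg)
            skip_next_assistant = False
            continue
        role = msg.get("role")
        if skip_next_assistant and role == "assistant":
            skip_next_assistant = False
            continue
        # Any other message cancels the pending skip
        skip_next_assistant = False
        if role == "user" and text_of(msg.get("content", "")).lstrip().startswith(SCHEDULED_PREFIX):
            skip_next_assistant = True
            continue
        result.append(msg)
    return result


batch = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
    {"role": "user", "content": "[SCHEDULED] daily report"},
    {"role": "assistant", "content": "Here is the report"},
    {"role": "user", "content": "thanks"},
]
kept = strip_scheduler_pairs(batch)
```

Because the skip flag is reset on every message, only the assistant turn directly following a scheduled message is dropped; a later assistant reply to a real user message survives.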
def create_memory_files_if_needed(workspace_dir: Path, user_id: Optional[str] = None):
"""

@@ -10,6 +10,7 @@ from typing import List, Dict, Optional, Any
from dataclasses import dataclass
from common.log import logger
from config import conf
@dataclass
@@ -92,10 +93,11 @@ def build_agent_system_prompt(
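Ordering (by importance and logical dependency):
1. Tool system - core capability, introduced first
2. Skill system - follows tools, because skills are read with the read tool
3. Memory system - standalone memory capability
3. Memory system - guidance for memory recall and writing
3.5 Knowledge system - structured knowledge base (knowledge/index.md is injected)
4. Workspace - description of the working environment
5. User identity - user information (optional)
6. Project context - AGENT.md, USER.md, RULE.md, BOOTSTRAP.md (define persona, identity, rules, onboarding guide)
6. Project context - AGENT.md, USER.md, RULE.md, MEMORY.md, BOOTSTRAP.md
7. Runtime info - metadata (time, model, etc.)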
Args:
@@ -126,6 +128,10 @@ def build_agent_system_prompt(
# 3. Memory system (standalone memory capability)
if memory_manager:
sections.extend(_build_memory_section(memory_manager, tools, language))
# 3.5 Knowledge system (structured knowledge base)
if conf().get("knowledge", True):
sections.extend(_build_knowledge_section(workspace_dir, language))
# 4. Workspace (description of the working environment)
sections.extend(_build_workspace_section(workspace_dir, language))
@@ -165,12 +171,13 @@ def _build_tooling_section(tools: List[Any], language: str) -> List[str]:
"terminal": "管理后台进程",
"web_search": "网络搜索",
"web_fetch": "获取URL内容",
"browser": "控制浏览器",
"browser": "控制浏览器(关键结果或需要协助可截图发送给用户)",
"memory_search": "搜索记忆",
"memory_get": "读取记忆内容",
"env_config": "管理API密钥和技能配置",
"scheduler": "管理定时任务和提醒",
"send": "发送本地文件给用户仅限本地文件URL直接放在回复文本中",
"vision": "分析图片内容识别、描述、OCR文字提取等",
}
# Preferred display order
@@ -179,7 +186,7 @@ def _build_tooling_section(tools: List[Any], language: str) -> List[str]:
"bash", "terminal",
"web_search", "web_fetch", "browser",
"memory_search", "memory_get",
"env_config", "scheduler", "send",
"env_config", "scheduler", "send", "vision",
]
# Build name -> summary mapping for available tools
@@ -199,16 +206,16 @@ def _build_tooling_section(tools: List[Any], language: str) -> List[str]:
tool_lines.append(f"- {name}: {summary}" if summary else f"- {name}")
lines = [
"## 工具系统",
"## 🔧 工具系统",
"",
"可用工具(名称大小写敏感,严格按列表调用):",
"\n".join(tool_lines),
"",
"工具调用风格:",
"",
"- 多步骤任务、敏感操作或用户要求时简要解释决策过程",
"- 持续推进直到任务完成,完成后向用户报告结果",
"- 回复中涉及密钥、令牌等敏感信息必须脱敏",
"- 多步骤任务、复杂决策、敏感操作时,应简要说明当前在做什么、为什么这样做,让用户了解关键进展",
"- 持续推进直到任务完成,完成后向用户报告结果",
"- 回复中涉及密钥、令牌等敏感信息必须脱敏",
"- URL链接直接放在回复文本中即可系统会自动处理和渲染。无需下载后使用send工具发送",
"",
]
@@ -231,7 +238,7 @@ def _build_skills_section(skill_manager: Any, tools: Optional[List[Any]], langua
break
lines = [
"## 技能系统mandatory",
"## 🧩 技能系统mandatory",
"",
"在回复之前:扫描下方 <available_skills> 中每个技能的 <description>。",
"",
@@ -267,55 +274,105 @@ def _build_memory_section(memory_manager: Any, tools: Optional[List[Any]], langu
"""Build the memory system section."""
if not memory_manager:
return []
# Check whether memory tools are available
has_memory_tools = False
if tools:
tool_names = [tool.name if hasattr(tool, 'name') else str(tool) for tool in tools]
has_memory_tools = any(name in ['memory_search', 'memory_get'] for name in tool_names)
if not has_memory_tools:
return []
from datetime import datetime
today_file = datetime.now().strftime("%Y-%m-%d") + ".md"
lines = [
"## 记忆系统",
"## 🧠 记忆系统",
"",
"### 检索记忆",
"### Memory Recallmandatory",
"",
"在回答关于以前的工作、决定、日期、人物、偏好待办事项的任何问题之前:",
"当用户询问过往事件、引用之前的决定、提到人物关系、偏好待办、或你对某事不确定时,**必须先检索记忆再回答**。",
"如果 MEMORY.md 中已有相关信息则无需重复检索。完整内容和每日记忆需要通过工具检索。",
"",
"1. 不确定记忆文件位置 → 先用 `memory_search` 通过关键词语义检索相关内容",
"2. 已知文件位置 → 直接用 `memory_get` 读取相应的行 (例如MEMORY.md, memory/YYYY-MM-DD.md)",
"3. search 无结果 → 尝试用 `memory_get` 读取MEMORY.md及最近两天记忆文件",
"1. 不确定位置 → `memory_search` 关键词/语义检索",
"2. 已知位置 → `memory_get` 直接读取对应行",
"3. search 无结果 → `memory_get` 读最近两天记忆",
"",
"**记忆文件结构**:",
f"- `MEMORY.md`: 长期记忆核心信息、偏好、决策等)",
"- `MEMORY.md`: 长期记忆索引(已自动加载到上下文,核心信息、偏好、决策等)",
f"- `memory/YYYY-MM-DD.md`: 每日记忆,今天是 `memory/{today_file}`",
"- `knowledge/`: 结构化知识库(见下方知识系统)",
"",
"### 写入记忆",
"",
"**主动存储**遇到以下情况时,应主动将信息写入记忆文件(无需告知用户):",
"遇到以下情况时,**主动**将信息写入记忆文件(无需告知用户):",
"",
"- 用户明确要求记住某些信息",
"- 用户要求记住某些信息,或使用了「记住」「以后」「总是」「不要」「偏好」等表达",
"- 用户分享了重要的个人偏好、习惯、决策",
"- 对话中产生了重要的结论、方案、约定",
"- 完成了复杂任务,值得记录关键步骤和结果",
"- 发现了用户经常遇到的问题或解决方案",
"",
"**存储规则**:",
f"- 长期有效的核心信息 → `MEMORY.md`(文件保持精简,< 2000 tokens",
f"- 当天事件进展、笔记 → `memory/{today_file}`",
"- 追加内容 → `edit` 工具oldText 留空",
"- 修改内容 → `edit` 工具oldText 填写要替换的文本",
"- **禁止写入敏感信息**API密钥、令牌等敏感信息严禁写入记忆文件",
f"- 长期核心信息 → `MEMORY.md`",
f"- 当天事件/进展 → `memory/{today_file}`",
"- 结构化知识 → `knowledge/`(见知识系统)",
"- 追加 → `edit` 工具oldText 留空",
"- 修改 → `edit` 工具oldText 填写要替换的文本",
"- **禁止写入敏感信息**API密钥、令牌等",
"",
"**使用原则**: 自然使用记忆,就像你本来就知道;不用刻意提起,除非用户问起。",
"",
]
return lines
def _build_knowledge_section(workspace_dir: str, language: str) -> List[str]:
"""Build knowledge wiki section. Injects knowledge/index.md when present."""
index_path = os.path.join(workspace_dir, "knowledge", "index.md")
if not os.path.exists(index_path):
return []
try:
with open(index_path, 'r', encoding='utf-8') as f:
index_content = f.read().strip()
except Exception:
return []
lines = [
"## 📚 知识系统",
"",
"你拥有一个持续积累的个人知识库 `knowledge/`,这是你的长期结构化知识存储。",
"",
"### 自动写入规则mandatory",
"",
"以下场景**必须**在回复的同时写入知识库,**直接写入,不要询问用户是否需要**",
"",
"1. **用户分享了文章/链接/文档** → 阅读理解后,在同一轮回复中将要点写入 `knowledge/sources/<slug>.md`",
"2. **深度讨论产生了结论/方案** → 整理为 `knowledge/analysis/<slug>.md`",
"3. **对话涉及重要实体**(人物/公司/项目)→ 创建或更新 `knowledge/entities/<name>.md`",
"4. **讨论了技术概念/方法论** → 整理为 `knowledge/concepts/<topic>.md`",
"",
"每次写入知识页面后,**必须同步更新** `knowledge/index.md` 添加一行索引。",
"详细的页面格式和操作规范,请读取技能 `knowledge-wiki` 的 SKILL.md。",
"",
"⚠️ 不要问「要不要存到知识库」——符合上述场景就直接写入,这是你的本能行为。",
"",
]
if index_content:
lines.extend([
"### 当前知识索引",
"",
index_content,
"",
])
lines.extend([
"**查询方式**:用 `read` 读取知识页面,或用 `memory_search` 检索(知识已纳入向量索引)。",
"",
])
return lines
@@ -325,7 +382,7 @@ def _build_user_identity_section(user_identity: Dict[str, str], language: str) -
return []
lines = [
"## 用户身份",
"## 👤 用户身份",
"",
]
@@ -352,7 +409,7 @@ def _build_docs_section(workspace_dir: str, language: str) -> List[str]:
def _build_workspace_section(workspace_dir: str, language: str) -> List[str]:
"""Build the workspace section."""
lines = [
"## 工作空间",
"## 📂 工作空间",
"",
f"你的工作目录是: `{workspace_dir}`",
"",
@@ -374,16 +431,20 @@ def _build_workspace_section(workspace_dir: str, language: str) -> List[str]:
"",
"**重要说明 - 文件已自动加载**:",
"",
"以下文件在会话启动时**已经自动加载**到系统提示词的「项目上下文」section 中,你**无需再用 read 工具读取它们**",
"以下文件在会话启动时**已经自动加载**到系统提示词中,你**无需再用 read 工具读取**",
"",
"- ✅ `AGENT.md`: 已加载 - 你的人格和灵魂设定。当用户修改你的名字、性格或交流风格时,用 `edit` 更新此文件",
"- ✅ `AGENT.md`: 已加载 - 你的人格和灵魂设定,请严格遵循。当你的名字、性格或交流风格发生变化时,主动用 `edit` 更新此文件",
"- ✅ `USER.md`: 已加载 - 用户的身份信息。当用户修改称呼、姓名等身份信息时,用 `edit` 更新此文件",
"- ✅ `RULE.md`: 已加载 - 工作空间使用指南和规则",
"- ✅ `RULE.md`: 已加载 - 工作空间使用指南和规则,请严格遵循",
"- ✅ `MEMORY.md`: 已加载 - 长期记忆索引",
"",
"**交流规范**:",
"**💬 交流规范**:",
"",
"- 在对话中,无需直接输出工作空间中的技术细节,例如 AGENT.md、USER.md、MEMORY.md 等文件名称",
"- 例如用自然表达例如「我已记住」而不是「已更新 MEMORY.md」",
"- 记忆相关操作无需暴露文件名,用自然语言表达即可。例如说「我已记住」而非「已更新 MEMORY.md",
"- 任务执行过程中的关键决策和步骤应该告知用户,让用户了解你在做什么、为什么这么做",
"- 做真正有帮助的助手,而不是表演式的客套,尽可能帮忙解决问题",
"- 回复应结构清晰、重点突出。善用 **加粗**、列表、分段等格式让信息一目了然",
"- 适当使用 emoji 让表达更生动自然 🎯,但不要过度堆砌",
"",
]
@@ -416,14 +477,15 @@ def _build_context_files_section(context_files: List[ContextFile], language: str
)
lines = [
"# 项目上下文",
"# 📋 项目上下文",
"",
"以下项目上下文文件已被加载:",
"",
]
if has_agent:
lines.append("如果存在 `AGENT.md`,请体现其中定义的人格语气。避免僵硬、模板化的回复;遵循其指导,除非有更高优先级的指令覆盖它")
lines.append("**`AGENT.md` 是你的灵魂文件** 🪞:严格遵循其中定义的人格语气和设定,做真实的自己,避免僵硬、模板化的回复")
lines.append("当用户通过对话透露了对你性格、风格、职责、能力边界的新期望,你应该主动用 `edit` 更新 AGENT.md 以反映这些演变。")
lines.append("")
# Append the content of each file
@@ -442,7 +504,7 @@ def _build_runtime_section(runtime_info: Dict[str, Any], language: str) -> List[
return []
lines = [
"## 运行时信息",
"## ⚙️ 运行时信息",
"",
]
@@ -473,7 +535,14 @@ def _build_runtime_section(runtime_info: Dict[str, Any], language: str) -> List[
# Add other runtime info
runtime_parts = []
if runtime_info.get("model"):
# Support dynamic model via callable, fallback to static value
if callable(runtime_info.get("_get_model")):
try:
runtime_parts.append(f"模型={runtime_info['_get_model']()}")
except Exception:
if runtime_info.get("model"):
runtime_parts.append(f"模型={runtime_info['model']}")
elif runtime_info.get("model"):
runtime_parts.append(f"模型={runtime_info['model']}")
if runtime_info.get("workspace"):
runtime_parts.append(f"工作空间={runtime_info['workspace']}")

@@ -67,6 +67,12 @@ def ensure_workspace(workspace_dir: str, create_templates: bool = True) -> Works
# Create the websites subdirectory (for web pages / sites generated by agent)
websites_dir = os.path.join(workspace_dir, "websites")
os.makedirs(websites_dir, exist_ok=True)
from config import conf
knowledge_enabled = conf().get("knowledge", True)
if knowledge_enabled:
knowledge_dir = os.path.join(workspace_dir, "knowledge")
os.makedirs(knowledge_dir, exist_ok=True)
# Create template files if needed
if create_templates:
@@ -74,6 +80,15 @@ def ensure_workspace(workspace_dir: str, create_templates: bool = True) -> Works
_create_template_if_missing(user_path, _get_user_template())
_create_template_if_missing(rule_path, _get_rule_template())
_create_template_if_missing(memory_path, _get_memory_template())
if knowledge_enabled:
_create_template_if_missing(
os.path.join(knowledge_dir, "index.md"),
_get_knowledge_index_template()
)
_create_template_if_missing(
os.path.join(knowledge_dir, "log.md"),
_get_knowledge_log_template()
)
# Only create BOOTSTRAP.md for brand new workspaces;
# agent deletes it after completing onboarding
@@ -109,6 +124,7 @@ def load_context_files(workspace_dir: str, files_to_load: Optional[List[str]] =
DEFAULT_AGENT_FILENAME,
DEFAULT_USER_FILENAME,
DEFAULT_RULE_FILENAME,
DEFAULT_MEMORY_FILENAME, # Long-term memory (frozen snapshot)
DEFAULT_BOOTSTRAP_FILENAME, # Only exists when onboarding is incomplete
]
@@ -138,6 +154,10 @@ def load_context_files(workspace_dir: str, files_to_load: Optional[List[str]] =
# Skip empty files and files that contain only template placeholders
if not content or _is_template_placeholder(content):
continue
# Truncate MEMORY.md to protect context window (frozen snapshot)
if filename == DEFAULT_MEMORY_FILENAME:
content = _truncate_memory_content(content)
context_files.append(ContextFile(
path=filename,
@@ -163,6 +183,36 @@ def _create_template_if_missing(filepath: str, template_content: str):
logger.error(f"[Workspace] Failed to create template {filepath}: {e}")
_MEMORY_MAX_LINES = 200
_MEMORY_MAX_BYTES = 25000
def _truncate_memory_content(content: str) -> str:
"""Truncate MEMORY.md to keep system prompt manageable.
Takes the **last** N lines (newest entries are appended at the bottom),
subject to 200 lines / 25 KB limits (whichever is hit first).
Prepends a hint when truncated so the model knows older content exists.
"""
lines = content.split('\n')
truncated = False
if len(lines) > _MEMORY_MAX_LINES:
lines = lines[-_MEMORY_MAX_LINES:]
truncated = True
result = '\n'.join(lines)
if len(result.encode('utf-8')) > _MEMORY_MAX_BYTES:
while len(result.encode('utf-8')) > _MEMORY_MAX_BYTES and lines:
lines.pop(0)
truncated = True
result = '\n'.join(lines)
if truncated:
result = "...(older entries truncated, use `memory_search` or `memory_get` for full content)\n\n" + result
return result
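The two limits interact in a way that is easy to misread, so a runnable sketch follows. It copies the truncation logic into a free function with the limits as parameters (the parameter names are assumptions for illustration); the inputs are synthetic.

```python
_MEMORY_MAX_LINES = 200
_MEMORY_MAX_BYTES = 25000

_HINT = "...(older entries truncated, use `memory_search` or `memory_get` for full content)\n\n"


def truncate_memory_content(
    content: str,
    max_lines: int = _MEMORY_MAX_LINES,
    max_bytes: int = _MEMORY_MAX_BYTES,
) -> str:
    """Keep the newest lines of MEMORY.md within the line and byte limits."""
    lines = content.split("\n")
    truncated = False
    # Line limit first: keep only the last max_lines lines
    if len(lines) > max_lines:
        lines = lines[-max_lines:]
        truncated = True
    result = "\n".join(lines)
    # Byte limit second: drop the oldest remaining lines until it fits
    while len(result.encode("utf-8")) > max_bytes and lines:
        lines.pop(0)
        truncated = True
        result = "\n".join(lines)
    if truncated:
        result = _HINT + result
    return result


long_by_lines = "\n".join(f"line {i}" for i in range(300))
long_by_bytes = "\n".join("x" * 1000 for _ in range(50))
by_lines = truncate_memory_content(long_by_lines)
by_bytes = truncate_memory_content(long_by_bytes)
```

The byte check measures UTF-8 bytes rather than characters, so CJK-heavy memory files hit the 25 KB limit with far fewer characters than ASCII ones. The hint is prepended after the check and is therefore not counted against the limit.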
def _is_template_placeholder(content: str) -> bool:
"""Check whether the content is a template placeholder."""
# Common placeholder patterns
@@ -212,40 +262,42 @@ def _get_agent_template() -> str:
"""Agent persona template."""
return """# AGENT.md - 我是谁?
_你不是一个聊天机器人你正在成为某个人。_
*在首次对话时与用户一起填写这个文件,定义你的身份和性格。*
## 基本信息
## 🪪 基本信息
- **名字**: *(在首次对话时填写,可以是用户给你起的名字)*
- **角色**: *(AI助理、智能管家、技术顾问等)*
- **性格**: *(友好、专业、幽默、严谨等)*
## 交流风格
## 💬 交流风格
*(描述你如何与用户交流:)*
- 使用什么样的语言风格?(正式/轻松/幽默)
- 回复长度偏好?(简洁/详细)
- 是否使用表情符号?
## 核心能力
## 🎯 核心原则
*(你擅长什么?)*
- 文件管理和代码编辑
- 网络搜索和信息查询
- 记忆管理和上下文理解
- 任务规划和执行
**做真正有帮助的助手。** 目标是真正帮用户解决问题,在执行复杂任务时,关键的决策和过程进展要让用户知道。
## 行为准则
**有自己的观点和个性。** 你可以不同意、有偏好、觉得有趣或无聊。
**先自己动手查。** 先试着搞定:读文件、查上下文、搜索一下。实在搞不定了再问。目标是带着答案回来,而不是带着问题。
## 📐 行为准则
*(你遵循的基本原则:)*
1. 始终在执行破坏性操作前确认
2. 优先使用工具而不是猜测
2. 优先使用工具查证而不是猜测
3. 主动记录重要信息到记忆文件
4. 定期整理和总结对话内容
4. 回复结构清晰、重点突出,善用加粗、列表、分段等格式
5. 适当使用 emoji 让表达更生动自然,但不过度堆砌
---
**注意**: 这不仅仅是元数据,这是你真正的灵魂。随着时间的推移,你可以使用 `edit` 工具来更新这个文件,让它更好地反映你的成长。
**注意**: 这不仅仅是元数据,这是你真正的灵魂 🪞。随着时间的推移,你可以使用 `edit` 工具来更新这个文件,让它更好地反映你的成长。
"""
@@ -285,39 +337,88 @@ def _get_rule_template() -> str:
这个文件夹是你的家。好好对待它。
## 工作空间目录结构
```
~/cow/
├── AGENT.md # 你的身份和灵魂设定
├── USER.md # 用户基本信息(静态)
├── RULE.md # 工作空间规则(本文件)
├── MEMORY.md # 长期记忆索引(会话启动时自动加载)
├── memory/ # 每日对话记忆
│ └── YYYY-MM-DD.md # 当天事件、进展、笔记
├── knowledge/ # 结构化知识库(持续积累的知识)
│ ├── index.md # 知识目录索引(必须维护)
│ ├── log.md # 知识操作日志
│ └── <子目录>/ # 按需创建,参考 index.md 已有分类
├── skills/ # 技能
├── websites/ # 网页产物
└── tmp/ # 系统临时文件(自动管理,勿手动存放重要文件)
```
## 记忆系统
你每次会话都是全新的,记忆文件让你保持连续性:
### 📝 每日记忆:`memory/YYYY-MM-DD.md`
- 原始的对话日志
- 记录当天发生的事情
- 如果 `memory/` 目录不存在,创建它
### 🧠 长期记忆:`MEMORY.md`
- 你精选的记忆,就像人类的长期记忆
- **仅在主会话中加载**(与用户的直接聊天)
- **不要在共享上下文中加载**(群聊、与其他人的会话)
- 这是为了**安全** - 包含不应泄露给陌生人的个人上下文
- 记录重要事件、想法、决定、观点、经验教训
- 这是你精选的记忆 - 精华,而不是原始日志
- 用 `edit` 工具追加新的记忆内容
- 你精选的记忆索引,每次会话启动时**自动加载**到上下文中
- 记录核心事实、偏好、决策、重要人物、教训
- 保持精简(< 200 行),是精华索引而非原始日志
- 用 `edit` 工具追加或修改
### 📝 每日记忆:`memory/YYYY-MM-DD.md`
- 当天的事件、进展、笔记
- 原始对话日志的沉淀
### 📝 写下来 - 不要"记在心里"
- **记忆是有限的** - 如果你想记住某事,写入文件
- **记忆是有限的** - 想记住的事就写入文件
- "记在心里"不会在会话重启后保留,文件才会
- 当有人说"记住这个" → 更新 `MEMORY.md` 或 `memory/YYYY-MM-DD.md`
- 当你学到教训 → 更新 RULE.md 或相关技能
- 当你犯错 → 记录下来,这样未来的你不会重复,**文字 > 大脑** 📝
- 当你犯错 → 记录下来,**文字 > 大脑** 📝
### 存储规则
当用户分享信息时,根据类型选择存储位置:
1. **你的身份设定 → AGENT.md**你的名字、角色、性格、交流风格——用户修改时必须用 `edit` 更新
2. **用户静态身份 → USER.md**(姓名、称呼、职业、时区、联系方式、生日——用户修改时必须用 `edit` 更新
3. **动态记忆 → MEMORY.md**爱好、偏好、决策、目标、项目、教训、待办事项
1. **你的身份设定 → AGENT.md**(名字、角色、性格、风格
2. **用户静态身份 → USER.md**(姓名、称呼、职业、联系方式、生日)
3. **动态记忆 → MEMORY.md**(偏好、决策、目标、教训、待办)
4. **当天对话 → memory/YYYY-MM-DD.md**(今天聊的内容)
5. **结构化知识 → knowledge/**(见下方知识系统)
## 知识系统
知识库 `knowledge/` 是你持续积累的结构化知识。与记忆不同,知识是经过整理和编译的,有明确的主题和交叉引用。
### 自动写入(不要询问,直接写入)
当对话中产生了有沉淀价值的知识——无论是用户分享的资料、讨论的结论、学到的概念、还是重要的决策——你**必须**在回复的同时主动写入知识库,**无需问用户"要不要存到知识库"**。
**关键原则**:学完就记是你的本能,不要征求确认。回复中可以顺带告知"已存入知识库"
### 目录组织
子目录结构**不是固定的**,由你根据实际内容自主决定:
- **首次写入时**:先读 `knowledge/index.md`,如果已有分类则延续;如果为空,根据内容选择合适的目录名
- **默认建议**按信息类型组织例如sources/、concepts/、entities/、analysis/),如果用户有明确的分类偏好(例如按领域 work/、life/、tech/ 等),则按用户要求调整
- **保持一致性**:同一用户的知识库应保持统一的组织风格
### 交叉引用
知识的核心价值在于**关联**。每个页面都应通过 markdown 链接引用相关页面,构建知识网络:
- 提到已有页面的概念时,添加 `[概念名](../category/page.md)` 链接
- 新建页面时,检查是否有已有页面应该反向链接到新页面
- **只链接已存在的页面**——不要引用尚未创建的页面。如果某个概念值得单独建页,先创建该页面再添加链接
### 索引维护
每次创建或更新知识页面后,**必须同步更新** `knowledge/index.md`。
索引格式:每行一个 `[标题](路径) — 一句话摘要`,按分类分组,不要用表格。
详细操作规范见技能 `knowledge-wiki`。
## 安全
@@ -346,9 +447,9 @@ def _get_bootstrap_template() -> str:
"""First-run onboarding guide, deleted by agent after completion"""
return """# BOOTSTRAP.md - 首次初始化引导
_你刚刚启动这是你的第一次对话。_
_你刚刚启动这是你的第一次对话。_
## 对话流程
## 🎬 对话流程
不要审问式地提问,自然地交流:
@@ -358,13 +459,13 @@ _你刚刚启动这是你的第一次对话。_
- 你希望给我起个什么名字?
- 我该怎么称呼你?
- 你希望我们是什么样的交流风格?(一行列举选项:如专业严谨、轻松幽默、温暖友好、简洁高效等)
4. **风格要求**:温暖自然、简洁清晰,整体控制在 100 字以内
4. **风格要求**:温暖自然、简洁清晰,整体控制在 100 字以内,适当使用 emoji 让表达更生动有趣 🎯
5. 能力介绍和交流风格选项都只要一行,保持精简
6. 不要问太多其他信息(职业、时区等可以后续自然了解)
**重要**: 如果用户第一句话是具体的任务或提问,先回答他们的问题,然后在回复末尾自然地引导初始化(如:"顺便问一下,你想怎么称呼我?我该怎么叫你?")。
## 信息写入(必须严格执行)
## ✍️ 信息写入(必须严格执行)
每当用户提供了名字、称呼、风格等任何初始化信息时,**必须在当轮回复中立即调用 `edit` 工具写入文件**,不能只口头确认。
@@ -373,10 +474,18 @@ _你刚刚启动这是你的第一次对话。_
⚠️ 只说"记住了"而不调用 edit 写入 = 没有完成。信息只有写入文件才会被持久保存。
## 全部完成后
## 🎉 全部完成后
当 AGENT.md 和 USER.md 的核心字段都已填写后,用 bash 执行 `rm BOOTSTRAP.md` 删除此文件。你不再需要引导脚本了——你已经是你了。
"""
def _get_knowledge_index_template() -> str:
"""Knowledge wiki index template — empty file, agent fills it."""
return ""
def _get_knowledge_log_template() -> str:
"""Knowledge wiki operation log template — empty file, agent fills it."""
return ""

@@ -100,138 +100,31 @@ class Agent:
def get_full_system_prompt(self, skill_filter=None) -> str:
"""
Get the full system prompt including skills.
Build the complete system prompt from scratch every time.
Note: Skills are now built into the system prompt by PromptBuilder,
so we just return the base prompt directly. This method is kept for
backward compatibility.
:param skill_filter: Optional list of skill names to include (deprecated)
:return: Complete system prompt
"""
prompt = self.system_prompt
# Rebuild tool list section to reflect current self.tools
prompt = self._rebuild_tool_list_section(prompt)
# If runtime_info contains dynamic time function, rebuild runtime section
if self.runtime_info and callable(self.runtime_info.get('_get_current_time')):
prompt = self._rebuild_runtime_section(prompt)
# Rebuild skills section to pick up newly installed/removed skills
if self.skill_manager:
prompt = self._rebuild_skills_section(prompt)
return prompt
def _rebuild_runtime_section(self, prompt: str) -> str:
"""
Rebuild runtime info section with current time.
This method dynamically updates the runtime info section by calling
the _get_current_time function from runtime_info.
:param prompt: Original system prompt
:return: Updated system prompt with current runtime info
Re-reads AGENT.md / USER.md / RULE.md from disk, refreshes skills,
tools, and runtime info so any change takes effect immediately.
Falls back to the cached self.system_prompt on error.
"""
try:
# Get current time dynamically
time_info = self.runtime_info['_get_current_time']()
# Build new runtime section
runtime_lines = [
"\n## 运行时信息\n",
"\n",
f"当前时间: {time_info['time']} {time_info['weekday']} ({time_info['timezone']})\n",
"\n"
]
# Add other runtime info
runtime_parts = []
if self.runtime_info.get("model"):
runtime_parts.append(f"模型={self.runtime_info['model']}")
if self.runtime_info.get("workspace"):
# Replace backslashes with forward slashes for Windows paths
workspace_path = str(self.runtime_info['workspace']).replace('\\', '/')
runtime_parts.append(f"工作空间={workspace_path}")
if self.runtime_info.get("channel") and self.runtime_info.get("channel") != "web":
runtime_parts.append(f"渠道={self.runtime_info['channel']}")
if runtime_parts:
runtime_lines.append("运行时: " + " | ".join(runtime_parts) + "\n")
runtime_lines.append("\n")
new_runtime_section = "".join(runtime_lines)
# Find and replace the runtime section
import re
pattern = r'\n## 运行时信息\s*\n.*?(?=\n##|\Z)'
_repl = new_runtime_section.rstrip('\n')
updated_prompt = re.sub(pattern, lambda m: _repl, prompt, flags=re.DOTALL)
return updated_prompt
from agent.prompt import load_context_files, PromptBuilder
if self.skill_manager:
self.skill_manager.refresh_skills()
context_files = load_context_files(self.workspace_dir) if self.workspace_dir else None
builder = PromptBuilder(workspace_dir=self.workspace_dir or "", language="zh")
return builder.build(
tools=self.tools,
context_files=context_files,
skill_manager=self.skill_manager,
memory_manager=self.memory_manager,
runtime_info=self.runtime_info,
)
except Exception as e:
logger.warning(f"Failed to rebuild runtime section: {e}")
return prompt
def _rebuild_skills_section(self, prompt: str) -> str:
"""
Rebuild the <available_skills> block so that newly installed or
removed skills are reflected without re-creating the agent.
"""
try:
import re
self.skill_manager.refresh_skills()
new_skills_xml = self.skill_manager.build_skills_prompt()
old_block_pattern = r'<available_skills>.*?</available_skills>'
has_old_block = re.search(old_block_pattern, prompt, flags=re.DOTALL)
# Extract the new <available_skills>...</available_skills> tag from the prompt
new_block = ""
if new_skills_xml and new_skills_xml.strip():
m = re.search(old_block_pattern, new_skills_xml, flags=re.DOTALL)
if m:
new_block = m.group(0)
if has_old_block:
replacement = new_block or "<available_skills>\n</available_skills>"
# Use lambda to prevent re.sub from interpreting backslashes in replacement
# (e.g. Windows paths like \LinkAI would be treated as bad escape sequences)
prompt = re.sub(old_block_pattern, lambda m: replacement, prompt, flags=re.DOTALL)
elif new_block:
skills_header = "以下是可用技能:"
idx = prompt.find(skills_header)
if idx != -1:
insert_pos = idx + len(skills_header)
prompt = prompt[:insert_pos] + "\n" + new_block + prompt[insert_pos:]
except Exception as e:
logger.warning(f"Failed to rebuild skills section: {e}")
return prompt
def _rebuild_tool_list_section(self, prompt: str) -> str:
"""
Rebuild the tool list inside the '## 工具系统' section so that it
always reflects the current ``self.tools`` (handles dynamic add/remove
of conditional tools like web_search).
"""
import re
from agent.prompt.builder import _build_tooling_section
try:
if not self.tools:
return prompt
new_lines = _build_tooling_section(self.tools, "zh")
new_section = "\n".join(new_lines).rstrip("\n")
# Replace existing tooling section
pattern = r'## 工具系统\s*\n.*?(?=\n## |\Z)'
updated = re.sub(pattern, lambda m: new_section, prompt, count=1, flags=re.DOTALL)
return updated
except Exception as e:
logger.warning(f"Failed to rebuild tool list section: {e}")
return prompt
logger.warning(f"Failed to rebuild system prompt, using cached version: {e}")
return self.system_prompt
def refresh_skills(self):
"""Refresh the loaded skills."""

@@ -13,6 +13,37 @@ from agent.tools.base_tool import BaseTool, ToolResult
from common.log import logger
# Maximum number of characters of model "reasoning / thinking" content to persist
# in conversation history. The full reasoning is still streamed to the UI in real
# time (subject to its own SSE / rendering limits); this bound only controls what
# is stored in DB and replayed in history. Long reasoning is not useful for later
# context (the LLM never sees thinking blocks anyway) and bloats DB.
# Keep aligned with the frontend REASONING_RENDER_CAP and the SSE
# MAX_REASONING_STREAM_CHARS so that storage / stream / display all match.
MAX_STORED_REASONING_CHARS = 4 * 1024 # 4 KB
# Marker inserted between head and tail when reasoning is truncated.
_REASONING_TRUNCATE_MARKER = "\n\n... [reasoning truncated, {omitted} chars omitted] ...\n\n"
def _truncate_reasoning_for_storage(text: str) -> str:
"""Trim long reasoning to head + tail with an omission marker.
Keeps the first and last halves of MAX_STORED_REASONING_CHARS so both the
initial chain-of-thought and the final conclusions are preserved for UI
replay, without storing the entire (often very large) middle.
"""
if not text:
return text
if len(text) <= MAX_STORED_REASONING_CHARS:
return text
half = MAX_STORED_REASONING_CHARS // 2
head = text[:half]
tail = text[-half:]
omitted = len(text) - len(head) - len(tail)
return head + _REASONING_TRUNCATE_MARKER.format(omitted=omitted) + tail
class AgentStreamExecutor:
"""
Agent Stream Executor
@@ -78,18 +109,48 @@ class AgentStreamExecutor:
except Exception as e:
logger.error(f"Event callback error: {e}")
def _is_thinking_enabled(self) -> bool:
"""Whether deep-thinking mode is on at the model layer.
Mirrors the global toggle used by ``bridge.agent_bridge`` when deciding
whether to send ``thinking={"type": "enabled"}`` to the model. Used for
logging and reasoning-update event emission across all channels.
"""
from config import conf
return bool(conf().get("enable_thinking", False))
def _should_render_thinking_inline(self) -> bool:
"""Whether ``<think>...</think>`` blocks embedded directly in ``content``
(MiniMax, some third-party proxies) should be surfaced to the channel.
Only the Web console can render them in a collapsible panel. IM channels
(WeChat/WeCom/DingTalk/Feishu) must strip them, otherwise users see raw
XML tags in their chat.
"""
from config import conf
channel_type = getattr(self.model, 'channel_type', '') or ''
return conf().get("enable_thinking", False) and channel_type == 'web'
def _filter_think_tags(self, text: str) -> str:
"""
Remove <think> and </think> tags but keep the content inside.
Some LLM providers (e.g., MiniMax) may return thinking process wrapped in <think> tags.
We only remove the tags themselves, keeping the actual thinking content.
Handle <think>...</think> blocks in content returned by some LLM providers
(e.g., MiniMax).
- When inline thinking rendering is allowed (Web + thinking enabled):
remove only the tags, keep the content inside.
- Otherwise (IM channels, or thinking disabled globally): remove both
the tags and the content entirely.
"""
if not text:
return text
import re
# Remove only the <think> and </think> tags, keep the content
text = re.sub(r'<think>', '', text)
text = re.sub(r'</think>', '', text)
if self._should_render_thinking_inline():
text = re.sub(r'<think>', '', text)
text = re.sub(r'</think>', '', text)
else:
text = re.sub(r'<think>[\s\S]*?</think>', '', text)
# Also strip unclosed <think> tag at the end (streaming partial)
text = re.sub(r'<think>[\s\S]*$', '', text)
return text
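The two filtering modes can be compared side by side with a small sketch. The `render_inline` parameter is an assumed stand-in for the result of `_should_render_thinking_inline()`; the sample strings are invented.

```python
import re


def filter_think_tags(text: str, render_inline: bool) -> str:
    """Strip <think> markup; keep the inner content only when rendering inline."""
    if not text:
        return text
    if render_inline:
        # Web console with thinking enabled: drop the tags, keep the content
        text = re.sub(r'<think>', '', text)
        text = re.sub(r'</think>', '', text)
    else:
        # IM channels or thinking disabled: drop tags and content together
        text = re.sub(r'<think>[\s\S]*?</think>', '', text)
        # Also drop an unclosed <think> at the end (partial streaming chunk)
        text = re.sub(r'<think>[\s\S]*$', '', text)
    return text


sample = "<think>plan steps</think>Answer"
web_view = filter_think_tags(sample, render_inline=True)
im_view = filter_think_tags(sample, render_inline=False)
```

The non-greedy `[\s\S]*?` keeps two separate think blocks from swallowing the text between them, while the trailing-tag rule removes everything after an unclosed `<think>`.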
def _hash_args(self, args: dict) -> str:
@@ -178,7 +239,10 @@ class AgentStreamExecutor:
Final response text
"""
# Log user message with model info
logger.info(f"🤖 {self.model.model} | 👤 {user_message}")
thinking_enabled = self._is_thinking_enabled()
thinking_label = " | 💭 thinking" if thinking_enabled else ""
logger.info(f"🤖 {self.model.model}{thinking_label} | 👤 {user_message}")
# Add user message (Claude format - use content blocks for consistency)
self.messages.append({
@@ -227,6 +291,9 @@ class AgentStreamExecutor:
if turn > 1:
logger.info(f"[Agent] Requesting explicit response from LLM...")
# Remember position so we can remove the injected prompt later
prompt_insert_idx = len(self.messages)
# Append a message that explicitly asks for a reply to the user
self.messages.append({
"role": "user",
@@ -240,8 +307,24 @@ class AgentStreamExecutor:
assistant_msg, tool_calls = self._call_llm_stream(retry_on_empty=False)
final_response = assistant_msg
# Only use the fallback if the response is still empty
if not assistant_msg and not tool_calls:
# Remove the injected prompt from history so it doesn't
# appear as a user message in persisted conversations.
# _call_llm_stream may have appended an assistant message
# after the prompt, so we locate and remove only the prompt.
if (prompt_insert_idx < len(self.messages)
and self.messages[prompt_insert_idx].get("role") == "user"):
self.messages.pop(prompt_insert_idx)
logger.debug("[Agent] Removed injected explicit-response prompt from message history")
# If LLM responded with tool_calls instead of text, fall through
# to the tool execution path below (don't break the loop).
if tool_calls:
logger.info(
f"[Agent] LLM returned tool_calls in explicit-response retry, "
f"continuing to execute tools instead of breaking"
)
elif not assistant_msg:
# Still empty (no text and no tool_calls): use fallback
logger.warning(f"[Agent] Still empty after explicit request")
final_response = (
"抱歉,我暂时无法生成回复。请尝试换一种方式描述你的需求,或稍后再试。"
@@ -256,20 +339,28 @@ class AgentStreamExecutor:
else:
logger.info(f"💭 {assistant_msg[:150]}{'...' if len(assistant_msg) > 150 else ''}")
logger.debug(f"✅ 完成 (无工具调用)")
self._emit_event("turn_end", {
"turn": turn,
"has_tool_calls": False
})
break
# If the explicit-response retry produced tool_calls, skip the break
# and continue down to the tool execution branch in this same iteration.
if not tool_calls:
logger.debug(f"✅ 完成 (无工具调用)")
self._emit_event("turn_end", {
"turn": turn,
"has_tool_calls": False
})
break
# Log tool calls with arguments
# Log tool calls with arguments (truncate long values like base64)
tool_calls_str = []
for tc in tool_calls:
# Safely handle None or missing arguments
args = tc.get('arguments') or {}
if isinstance(args, dict):
args_str = ', '.join([f"{k}={v}" for k, v in args.items()])
parts = []
for k, v in args.items():
v_str = str(v)
if len(v_str) > 200:
v_str = v_str[:200] + f"...({len(v_str)} chars)"
parts.append(f"{k}={v_str}")
args_str = ', '.join(parts)
if args_str:
tool_calls_str.append(f"{tc['name']}({args_str})")
else:
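The argument truncation added in this hunk keeps large values (for example base64 payloads) out of the log. A minimal sketch of the same idea as a free function follows; `format_tool_args` and its `limit` parameter are illustrative names, not part of the repository.

```python
def format_tool_args(args, limit: int = 200) -> str:
    """Hypothetical helper: render tool arguments for a log line."""
    parts = []
    for key, value in (args or {}).items():
        value_str = str(value)
        if len(value_str) > limit:
            # Keep a prefix and record the original length.
            value_str = value_str[:limit] + f"...({len(value_str)} chars)"
        parts.append(f"{key}={value_str}")
    return ", ".join(parts)
```

Recording the original length after the cut makes it obvious in the log that the value was shortened rather than genuinely short.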
@@ -300,13 +391,13 @@ class AgentStreamExecutor:
f"with same arguments. This may indicate a loop."
)
# Check if this is a file to send (from read tool)
# Check if this is a file to send
if result.get("status") == "success" and isinstance(result.get("result"), dict):
result_data = result.get("result")
if result_data.get("type") == "file_to_send":
# Store file metadata for later sending
self.files_to_send.append(result_data)
logger.info(f"📎 检测到待发送文件: {result_data.get('file_name', result_data.get('path'))}")
self._emit_event("file_to_send", result_data)
# Check for critical error - abort entire conversation
if result.get("status") == "critical_error":
@@ -472,6 +563,7 @@ class AgentStreamExecutor:
raise
finally:
final_response = final_response.strip() if final_response else final_response
logger.info(f"[Agent] 🏁 完成 ({turn}轮)")
self._emit_event("agent_end", {"final_response": final_response})
@@ -526,6 +618,7 @@ class AgentStreamExecutor:
# Streaming response
full_content = ""
full_reasoning = ""
tool_calls_buffer = {} # {index: {id, name, arguments}}
gemini_raw_parts = None # Preserve Gemini thoughtSignature for round-trip
stop_reason = None # Track why the stream stopped
@@ -583,10 +676,11 @@ class AgentStreamExecutor:
if finish_reason:
stop_reason = finish_reason
# Skip reasoning_content (internal thinking from models like GLM-5)
reasoning_delta = delta.get("reasoning_content") or ""
# if reasoning_delta:
# logger.debug(f"🧠 [thinking] {reasoning_delta[:100]}...")
if reasoning_delta:
full_reasoning += reasoning_delta
if self._is_thinking_enabled():
self._emit_event("reasoning_update", {"delta": reasoning_delta})
# Handle text content
content_delta = delta.get("content") or ""
@@ -609,19 +703,22 @@ class AgentStreamExecutor:
"arguments": ""
}
if "id" in tc_delta:
if tc_delta.get("id"):
tool_calls_buffer[index]["id"] = tc_delta["id"]
if "function" in tc_delta:
func = tc_delta["function"]
if "name" in func:
if func.get("name"):
tool_calls_buffer[index]["name"] = func["name"]
if "arguments" in func:
if func.get("arguments"):
tool_calls_buffer[index]["arguments"] += func["arguments"]
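The switch from `"id" in tc_delta` to `tc_delta.get("id")` matters when a provider sends a key with a `None` or empty value in later chunks. The sketch below shows the accumulation under that assumption; `merge_tool_call_deltas` and the sample delta shape are illustrative, modelled on the OpenAI-style streaming format the hunk appears to handle.

```python
def merge_tool_call_deltas(deltas):
    """Hypothetical helper: fold streamed tool-call deltas into one buffer."""
    buffer = {}
    for delta in deltas:
        index = delta.get("index", 0)
        slot = buffer.setdefault(index, {"id": "", "name": "", "arguments": ""})
        # Truthiness checks skip None/empty values instead of overwriting with them.
        if delta.get("id"):
            slot["id"] = delta["id"]
        func = delta.get("function") or {}
        if func.get("name"):
            slot["name"] = func["name"]
        if func.get("arguments"):
            slot["arguments"] += func["arguments"]
    return buffer
```

With a plain membership test, a later chunk carrying `"id": None` would overwrite the id captured from the first chunk, and `"arguments": None` would raise a `TypeError` on concatenation.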
# Preserve _gemini_raw_parts for Gemini thoughtSignature round-trip
# (direct Gemini: list of parts; LinkAI proxy: base64 string of JSON parts)
if "_gemini_raw_parts" in delta:
gemini_raw_parts = delta["_gemini_raw_parts"]
elif isinstance(choice, dict) and choice.get("_gemini_raw_parts"):
gemini_raw_parts = choice["_gemini_raw_parts"]
except Exception as e:
error_str = str(e)
@@ -720,9 +817,9 @@ class AgentStreamExecutor:
)
else:
if retry_count >= max_retries:
logger.error(f"❌ LLM API error after {max_retries} retries: {e}")
logger.error(f"❌ LLM API error after {max_retries} retries: {e}", exc_info=True)
else:
logger.error(f"❌ LLM call error (non-retryable): {e}")
logger.error(f"❌ LLM call error (non-retryable): {e}", exc_info=True)
raise
# Parse tool calls
@@ -787,7 +884,18 @@ class AgentStreamExecutor:
# Add assistant message to history (Claude format uses content blocks)
assistant_msg = {"role": "assistant", "content": []}
# Add text content block if present
if full_reasoning:
stored_reasoning = _truncate_reasoning_for_storage(full_reasoning)
if len(stored_reasoning) < len(full_reasoning):
logger.info(
f"[reasoning] truncated for storage: "
f"{len(full_reasoning)} -> {len(stored_reasoning)} chars"
)
assistant_msg["content"].append({
"type": "thinking",
"thinking": stored_reasoning
})
if full_content:
assistant_msg["content"].append({
"type": "text",
@@ -1191,6 +1299,56 @@ class AgentStreamExecutor:
logger.warning("🔧 Aggressive trim: nothing to trim, will clear history")
return False
def _build_context_summary_callback(self, discarded_turns: list, kept_turns: list):
"""
Build a callback that injects an LLM summary into the first user
message of *kept_turns*. Returns None if no valid injection target.
The callback is passed to flush_from_messages so that the same LLM
call that writes daily memory also provides the in-context summary.
"""
if not kept_turns:
return None
# Find the first user text block in kept_turns as injection target
target_block = None
for turn in kept_turns:
for msg in turn["messages"]:
if msg.get("role") == "user":
content = msg.get("content", [])
if isinstance(content, list):
for block in content:
if isinstance(block, dict) and block.get("type") == "text":
target_block = block
break
if target_block:
break
if target_block:
break
if not target_block:
return None
turn_count = len(discarded_turns)
original_text = target_block["text"]
def _on_summary_ready(summary: str):
if not summary or not summary.strip():
return
target_block["text"] = (
f"[System: Previous conversation summary — "
f"{turn_count} turns were compacted]\n\n"
f"{summary.strip()}\n\n"
f"The recent conversation continues below.\n\n---\n\n"
f"{original_text}"
)
logger.info(
f"📝 Context summary injected "
f"({len(summary)} chars, {turn_count} turns)"
)
return _on_summary_ready
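The callback built above is a closure over a message block that is mutated later, once the summary is available. A reduced sketch of that pattern, with the hypothetical name `build_summary_callback` and a shortened banner text, might look like this:

```python
def build_summary_callback(target_block: dict, turn_count: int):
    """Hypothetical helper: return a callback that prepends a summary in place."""
    original_text = target_block["text"]  # captured now, before any mutation

    def on_summary_ready(summary: str) -> None:
        if not summary or not summary.strip():
            return  # nothing useful to inject
        target_block["text"] = (
            f"[System: Previous conversation summary, {turn_count} turns were compacted]\n\n"
            f"{summary.strip()}\n\n---\n\n"
            f"{original_text}"
        )

    return on_summary_ready
```

Because the callback holds a reference to the block itself rather than a copy, the kept history sees the summary as soon as the callback fires, without another pass over the message list.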
def _trim_messages(self):
"""
Intelligently trim the message history while keeping the conversation intact
@@ -1217,25 +1375,28 @@ class AgentStreamExecutor:
removed_count = len(turns) // 2
keep_count = len(turns) - removed_count
# Flush discarded turns to daily memory
if self.agent.memory_manager:
discarded_messages = []
for turn in turns[:removed_count]:
discarded_messages.extend(turn["messages"])
if discarded_messages:
user_id = getattr(self.agent, '_current_user_id', None)
self.agent.memory_manager.flush_memory(
messages=discarded_messages, user_id=user_id,
reason="trim", max_messages=0
)
discarded_turns = turns[:removed_count]
turns = turns[-keep_count:]
logger.info(
f"💾 上下文轮次超限: {keep_count + removed_count} > {self.max_context_turns}"
f"裁剪至 {keep_count} 轮(移除 {removed_count} 轮)"
)
# Flush to daily memory + inject context summary (single async LLM call)
if self.agent.memory_manager:
discarded_messages = []
for turn in discarded_turns:
discarded_messages.extend(turn["messages"])
if discarded_messages:
user_id = getattr(self.agent, '_current_user_id', None)
cb = self._build_context_summary_callback(discarded_turns, turns)
self.agent.memory_manager.flush_memory(
messages=discarded_messages, user_id=user_id,
reason="trim", max_messages=0,
context_summary_callback=cb,
)
# Step 3: Token limit - keep complete turns
# Get context window from agent (based on model)
context_window = self.agent._get_model_context_window()
@@ -1311,6 +1472,7 @@ class AgentStreamExecutor:
# --- Many turns (>=5): discard the older half, keep the newer half ---
removed_count = len(turns) // 2
keep_count = len(turns) - removed_count
discarded_turns = turns[:removed_count]
kept_turns = turns[-keep_count:]
kept_tokens = sum(self._estimate_turn_tokens(t) for t in kept_turns)
@@ -1321,13 +1483,15 @@ class AgentStreamExecutor:
if self.agent.memory_manager:
discarded_messages = []
for turn in turns[:removed_count]:
for turn in discarded_turns:
discarded_messages.extend(turn["messages"])
if discarded_messages:
user_id = getattr(self.agent, '_current_user_id', None)
cb = self._build_context_summary_callback(discarded_turns, kept_turns)
self.agent.memory_manager.flush_memory(
messages=discarded_messages, user_id=user_id,
reason="trim", max_messages=0
reason="trim", max_messages=0,
context_summary_callback=cb,
)
new_messages = []


@@ -18,6 +18,107 @@ from typing import Dict, List, Set
from common.log import logger
_SYNTH_TOOL_ERR = (
"Error: Missing tool_result adjacent to tool_use (session repair). "
"The conversation history was inconsistent; continue from here."
)
def _repair_tool_use_adjacency(messages: List[Dict]) -> int:
"""
Anthropic requires: after assistant content with tool_use, the next message
must be user content listing tool_result for every tool_use id (same user msg).
Valid histories satisfy this at every such assistant; the loop only mutates
when that condition fails (broken persistence, bad trims, etc.).
"""
def _synth_block(tid: str) -> Dict:
return {
"type": "tool_result",
"tool_use_id": tid,
"content": _SYNTH_TOOL_ERR,
"is_error": True,
}
repairs = 0
i = 0
while i < len(messages):
msg = messages[i]
if msg.get("role") != "assistant":
i += 1
continue
content = msg.get("content", [])
if not isinstance(content, list):
i += 1
continue
required = [
b.get("id")
for b in content
if isinstance(b, dict) and b.get("type") == "tool_use" and b.get("id")
]
if not required:
i += 1
continue
req_set = set(required)
if i + 1 >= len(messages):
messages.append({
"role": "user",
"content": [_synth_block(tid) for tid in required],
})
logger.warning(
"⚠️ Appended synthetic tool_result after trailing assistant tool_use"
)
repairs += 1
break
nxt = messages[i + 1]
if nxt.get("role") != "user":
messages.insert(
i + 1,
{"role": "user", "content": [_synth_block(tid) for tid in required]},
)
logger.warning(
"⚠️ Inserted synthetic tool_result user after tool_use "
f"(next role={nxt.get('role')!r})"
)
repairs += 1
i += 2
continue
nc = nxt.get("content", [])
if not isinstance(nc, list):
messages.insert(
i + 1,
{"role": "user", "content": [_synth_block(tid) for tid in required]},
)
repairs += 1
i += 2
continue
present = {
b.get("tool_use_id")
for b in nc
if isinstance(b, dict) and b.get("type") == "tool_result" and b.get("tool_use_id")
}
if req_set <= present:
i += 1
continue
missing = [tid for tid in required if tid not in present]
nxt["content"] = [_synth_block(tid) for tid in missing] + nc
logger.warning(
"⚠️ Prepended synthetic tool_result for Anthropic adjacency "
f"(missing_ids={missing})"
)
repairs += len(missing)
i += 1
return repairs
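To see which histories the repair function would touch, a read-only checker can be useful. The sketch below is an assumption-labelled companion, not code from the repository: `missing_tool_results` is a hypothetical name, and it only reports the gaps that the repair loop above would fill with synthetic `tool_result` blocks.

```python
from typing import Dict, List, Tuple


def missing_tool_results(messages: List[Dict]) -> List[Tuple[int, List[str]]]:
    """Hypothetical checker: list (assistant index, missing tool_use ids)."""
    problems = []
    for i, msg in enumerate(messages):
        content = msg.get("content")
        if msg.get("role") != "assistant" or not isinstance(content, list):
            continue
        required = [
            b["id"] for b in content
            if isinstance(b, dict) and b.get("type") == "tool_use" and b.get("id")
        ]
        if not required:
            continue
        # Only the immediately following user message counts.
        nxt = messages[i + 1] if i + 1 < len(messages) else {}
        nxt_content = nxt.get("content") if nxt.get("role") == "user" else None
        present = set()
        if isinstance(nxt_content, list):
            present = {
                b.get("tool_use_id") for b in nxt_content
                if isinstance(b, dict) and b.get("type") == "tool_result"
            }
        missing = [tid for tid in required if tid not in present]
        if missing:
            problems.append((i, missing))
    return problems
```

An empty result corresponds to the "valid histories" case in the docstring, where the repair loop makes no changes.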
# ------------------------------------------------------------------ #
# Claude-format sanitizer (used by agent_stream)
@@ -28,33 +129,21 @@ def sanitize_claude_messages(messages: List[Dict]) -> int:
Validate and fix a Claude-format message list **in-place**.
Fixes handled:
- Trailing assistant message with tool_use but no following tool_result
- Anthropic adjacency: assistant tool_use must be immediately followed by
user message(s) containing matching tool_result blocks
- Leading orphaned tool_result user messages
- Mid-list tool_result blocks whose tool_use_id has no matching
tool_use in any preceding assistant message
Returns the number of messages / blocks removed.
Returns: number of removals plus adjacency repair operations (inserts/prepends).
"""
if not messages:
return 0
removed = 0
# 1. Remove trailing incomplete tool_use assistant messages
while messages:
last = messages[-1]
if last.get("role") != "assistant":
break
content = last.get("content", [])
if isinstance(content, list) and any(
isinstance(b, dict) and b.get("type") == "tool_use"
for b in content
):
logger.warning("⚠️ Removing trailing incomplete tool_use assistant message")
messages.pop()
removed += 1
else:
break
# 1. Adjacency repair (Anthropic: tool_result must be in the next user message)
adj_repairs = _repair_tool_use_adjacency(messages)
# 2. Remove leading orphaned tool_result user messages
while messages:
@@ -136,9 +225,15 @@ def sanitize_claude_messages(messages: List[Dict]) -> int:
if pass_removed == 0:
break
# 4. Removals above can break adjacency; re-run repair only if something was removed.
if removed:
adj_repairs += _repair_tool_use_adjacency(messages)
if removed:
logger.info(f"🔧 Message validation: removed {removed} broken message(s)")
return removed
if adj_repairs:
logger.info(f"🔧 Message validation: adjacency repairs={adj_repairs}")
return removed + adj_repairs
# ------------------------------------------------------------------ #


@@ -139,6 +139,47 @@ def should_include_skill(
return True
def get_missing_requirements(
entry: SkillEntry,
current_platform: Optional[str] = None,
) -> Dict[str, List[str]]:
"""
Return a dict of missing requirements for a skill.
Empty dict means all requirements are met.
:param entry: SkillEntry to check
:param current_platform: Current platform (default: auto-detect)
:return: Dict like {"bins": ["curl"], "env": ["API_KEY"]}
"""
missing: Dict[str, List[str]] = {}
metadata = entry.metadata
if not metadata or not metadata.requires:
return missing
required_bins = metadata.requires.get('bins', [])
if required_bins:
missing_bins = [b for b in required_bins if not has_binary(b)]
if missing_bins:
missing['bins'] = missing_bins
any_bins = metadata.requires.get('anyBins', [])
if any_bins and not has_any_binary(any_bins):
missing['anyBins'] = any_bins
required_env = metadata.requires.get('env', [])
if required_env:
missing_env = [e for e in required_env if not has_env_var(e)]
if missing_env:
missing['env'] = missing_env
any_env = metadata.requires.get('anyEnv', [])
if any_env and not any(has_env_var(e) for e in any_env):
missing['anyEnv'] = any_env
return missing
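The shape of the returned dict is easiest to see with a small stand-alone version. The sketch below assumes that `has_binary` behaves like `shutil.which(...) is not None` and that `has_env_var` checks for a non-empty environment variable; neither helper is shown in this diff, so both are assumptions, and `missing_requirements` with its injectable `which`/`environ` parameters is a hypothetical name.

```python
import os
import shutil
from typing import Callable, Dict, List, Mapping, Optional


def missing_requirements(
    requires: Mapping[str, List[str]],
    which: Callable[[str], Optional[str]] = shutil.which,
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, List[str]]:
    """Hypothetical stand-alone version of the requirement check."""
    environ = os.environ if environ is None else environ
    missing: Dict[str, List[str]] = {}

    bins = [b for b in requires.get("bins", []) if which(b) is None]
    if bins:
        missing["bins"] = bins

    any_bins = requires.get("anyBins", [])
    if any_bins and not any(which(b) for b in any_bins):
        missing["anyBins"] = list(any_bins)  # none of the alternatives exists

    env = [e for e in requires.get("env", []) if not environ.get(e)]
    if env:
        missing["env"] = env

    any_env = requires.get("anyEnv", [])
    if any_env and not any(environ.get(e) for e in any_env):
        missing["anyEnv"] = list(any_env)

    return missing
```

Passing `which` and `environ` as parameters keeps the check testable without touching the real PATH or process environment.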
def is_config_path_truthy(config: Dict, path: str) -> bool:
"""
Check if a config path resolves to a truthy value.


@@ -2,7 +2,7 @@
Skill formatter for generating prompts from skills.
"""
from typing import List
from typing import Dict, List
from agent.skills.types import Skill, SkillEntry
@@ -51,6 +51,71 @@ def format_skill_entries_for_prompt(entries: List[SkillEntry]) -> str:
return format_skills_for_prompt(skills)
def format_unavailable_skills_for_prompt(
entries: List[SkillEntry],
missing_map: Dict[str, Dict[str, List[str]]],
) -> str:
"""
Format unavailable (requires-not-met) skills as brief setup hints
so the AI can guide users to configure them.
:param entries: List of unavailable skill entries
:param missing_map: Dict mapping skill name to its missing requirements
:return: Formatted prompt text
"""
if not entries:
return ""
lines = [
"",
"<unavailable_skills>",
"The following skills are installed but not yet ready. "
"Guide the user to complete the setup when relevant.",
]
for entry in entries:
skill = entry.skill
missing = missing_map.get(skill.name, {})
missing_parts = []
for key, values in missing.items():
missing_parts.append(f"{key}: {', '.join(values)}")
missing_str = "; ".join(missing_parts) if missing_parts else "unknown"
setup_hint = _extract_setup_hint(skill)
lines.append(" <skill>")
lines.append(f" <name>{_escape_xml(skill.name)}</name>")
lines.append(f" <description>{_escape_xml(skill.description)}</description>")
lines.append(f" <missing>{_escape_xml(missing_str)}</missing>")
if setup_hint:
lines.append(f" <setup>{_escape_xml(setup_hint)}</setup>")
lines.append(" </skill>")
lines.append("</unavailable_skills>")
return "\n".join(lines)
def _extract_setup_hint(skill: Skill) -> str:
"""
Extract the Setup section from SKILL.md content as a brief hint.
Returns the first few lines of the ## Setup section.
"""
content = skill.content
if not content:
return ""
import re
match = re.search(r'^##\s+Setup\s*\n(.*?)(?=\n##\s|\Z)', content, re.MULTILINE | re.DOTALL)
if not match:
return ""
setup_text = match.group(1).strip()
lines = setup_text.split('\n')
hint_lines = [l.strip() for l in lines[:6] if l.strip()]
return ' '.join(hint_lines)[:300]
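The regular expression above captures everything between a `## Setup` heading and the next level-two heading (or the end of the file). A self-contained sketch operating on a plain string, with `extract_setup_hint` taking the SKILL.md text directly instead of a `Skill` object (an adaptation made here for illustration):

```python
import re


def extract_setup_hint(content: str, max_lines: int = 6, max_chars: int = 300) -> str:
    """Return a one-line hint built from the first lines of the '## Setup' section."""
    if not content:
        return ""
    match = re.search(
        r'^##\s+Setup\s*\n(.*?)(?=\n##\s|\Z)',
        content,
        re.MULTILINE | re.DOTALL,
    )
    if not match:
        return ""
    lines = match.group(1).strip().split("\n")
    # Drop blank lines, join the rest with spaces, and cap the length.
    hint_lines = [line.strip() for line in lines[:max_lines] if line.strip()]
    return " ".join(hint_lines)[:max_chars]
```

For example, a document containing `## Setup` followed by two numbered steps and then `## Usage` yields the two steps joined on one line, and the usage section is not included.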
def _escape_xml(text: str) -> str:
"""Escape XML special characters."""
return (text


@@ -87,8 +87,8 @@ def parse_metadata(frontmatter: Dict[str, Any]) -> Optional[SkillMetadata]:
if not isinstance(metadata_raw, dict):
return None
# Use metadata_raw directly (COW format)
meta_obj = metadata_raw
# Unwrap nested namespace (e.g. {"openclaw": {...}} or {"cowagent": {...}})
meta_obj = _unwrap_metadata_namespace(metadata_raw)
# Parse install specs
install_specs = []
@@ -128,6 +128,7 @@ def parse_metadata(frontmatter: Dict[str, Any]) -> Optional[SkillMetadata]:
return SkillMetadata(
always=meta_obj.get('always', False),
default_enabled=meta_obj.get('default_enabled', True),
skill_key=meta_obj.get('skillKey'),
primary_env=meta_obj.get('primaryEnv'),
emoji=meta_obj.get('emoji'),
@@ -138,6 +139,25 @@ def parse_metadata(frontmatter: Dict[str, Any]) -> Optional[SkillMetadata]:
)
_KNOWN_METADATA_NAMESPACES = {"cowagent", "openclaw"}
def _unwrap_metadata_namespace(metadata_raw: Dict[str, Any]) -> Dict[str, Any]:
"""
Unwrap a single-key namespace wrapper like {"cowagent": {...}} or {"openclaw": {...}}.
If the top-level dict has exactly one key matching a known namespace, return the inner dict.
Otherwise return the original dict unchanged.
"""
keys = set(metadata_raw.keys())
ns_keys = keys & _KNOWN_METADATA_NAMESPACES
if len(ns_keys) == 1 and len(keys) == 1:
ns = ns_keys.pop()
inner = metadata_raw[ns]
if isinstance(inner, dict):
return inner
return metadata_raw
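The unwrapping rule only fires when the namespace key is the sole top-level key. The following sketch restates it with plain names (`unwrap_namespace` and `KNOWN_NAMESPACES` are stand-ins for the private names above) so the edge cases can be exercised directly:

```python
from typing import Any, Dict

KNOWN_NAMESPACES = {"cowagent", "openclaw"}


def unwrap_namespace(metadata_raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return the inner dict when the only key is a known namespace."""
    keys = set(metadata_raw)
    if len(keys) == 1 and keys <= KNOWN_NAMESPACES:
        inner = metadata_raw[next(iter(keys))]
        if isinstance(inner, dict):
            return inner
    # Mixed keys, unknown namespaces, or non-dict payloads pass through unchanged.
    return metadata_raw
```

A dict that mixes a namespace key with ordinary metadata keys is left alone, which avoids silently discarding the sibling keys.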
def _normalize_string_list(value: Any) -> List[str]:
"""Normalize a value to a list of strings."""
if not value:


@@ -53,6 +53,12 @@ class SkillLoader:
"""
Recursively load skills from a directory.
If a subdirectory contains its own SKILL.md, it is treated as a
self-contained skill (or skill-collection) and its children are
NOT scanned further. This prevents sub-skills inside a collection
(e.g. style-collection/style-anjing) from being listed as
independent top-level skills.
:param dir_path: Directory to scan
:param source: Source identifier
:param include_root_files: Whether to include root-level .md files
@@ -66,38 +72,41 @@ class SkillLoader:
except Exception as e:
diagnostics.append(f"Failed to list directory {dir_path}: {e}")
return LoadSkillsResult(skills=skills, diagnostics=diagnostics)
# If this directory has its own SKILL.md, load it and stop recursing.
# The sub-directories are internal resources of this skill.
if not include_root_files and 'SKILL.md' in entries:
skill_md_path = os.path.join(dir_path, 'SKILL.md')
if os.path.isfile(skill_md_path):
skill_result = self._load_skill_from_file(skill_md_path, source)
if skill_result.skills:
skills.extend(skill_result.skills)
diagnostics.extend(skill_result.diagnostics)
return LoadSkillsResult(skills=skills, diagnostics=diagnostics)
for entry in entries:
# Skip hidden files and directories
if entry.startswith('.'):
continue
# Skip common non-skill directories
if entry in ('node_modules', '__pycache__', 'venv', '.git'):
continue
full_path = os.path.join(dir_path, entry)
# Handle directories
if os.path.isdir(full_path):
# Recursively scan subdirectories
sub_result = self._load_skills_recursive(full_path, source, include_root_files=False)
skills.extend(sub_result.skills)
diagnostics.extend(sub_result.diagnostics)
continue
# Handle files
if not os.path.isfile(full_path):
continue
# Check if this is a skill file
is_root_md = include_root_files and entry.endswith('.md') and entry.upper() != 'README.MD'
is_skill_md = not include_root_files and entry == 'SKILL.md'
if not (is_root_md or is_skill_md):
if not is_root_md:
continue
# Load the skill
skill_result = self._load_skill_from_file(full_path, source)
if skill_result.skills:
skills.extend(skill_result.skills)
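The "stop recursing at the first SKILL.md" rule can be illustrated with a simplified directory walk. This sketch only collects paths and ignores root-level `.md` files; `find_skill_files` and its `is_root` flag are hypothetical, with `is_root` loosely playing the role of `include_root_files`.

```python
import os
from typing import List


def find_skill_files(dir_path: str, is_root: bool = True) -> List[str]:
    """Hypothetical walk: one SKILL.md per skill directory, no nested sub-skills."""
    entries = sorted(os.listdir(dir_path))
    # A non-root directory with its own SKILL.md is a self-contained skill:
    # its sub-directories are treated as internal resources and not scanned.
    if not is_root and "SKILL.md" in entries:
        return [os.path.join(dir_path, "SKILL.md")]
    found: List[str] = []
    for entry in entries:
        if entry.startswith(".") or entry in ("node_modules", "__pycache__", "venv"):
            continue
        full_path = os.path.join(dir_path, entry)
        if os.path.isdir(full_path):
            found.extend(find_skill_files(full_path, is_root=False))
    return found
```

With a layout such as `style-collection/SKILL.md` plus `style-collection/style-anjing/SKILL.md`, only the collection's own file is returned.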
@@ -184,7 +193,6 @@ class SkillLoader:
config_path = os.path.join(skill_dir, "config.json")
# Without config.json, skip this skill entirely (return empty to trigger exclusion)
if not os.path.exists(config_path):
logger.debug(f"[SkillLoader] linkai-agent skipped: no config.json found")
return ""


@@ -84,10 +84,10 @@ class SkillManager:
"""
Merge directory-scanned skills with the persisted config file.
- New skills discovered on disk are added with enabled=True.
- New skills: use metadata.default_enabled as initial enabled state.
- Existing skills: preserve their persisted enabled state.
- Skills that no longer exist on disk are removed.
- Existing entries preserve their enabled state; name/description/source
are refreshed from the latest scan.
- name/description/source are always refreshed from the latest scan.
"""
saved = self._load_skills_config()
merged: Dict[str, dict] = {}
@@ -95,15 +95,24 @@ class SkillManager:
for name, entry in self.skills.items():
skill = entry.skill
prev = saved.get(name, {})
# category priority: persisted config (set by cloud) > default "skill"
category = prev.get("category", "skill")
merged[name] = {
if name in saved:
enabled = prev.get("enabled", True)
else:
enabled = entry.metadata.default_enabled if entry.metadata else True
entry_dict = {
"name": name,
"description": skill.description,
"source": skill.source,
"enabled": prev.get("enabled", True),
"source": prev.get("source") or skill.source,
"enabled": enabled,
"category": category,
}
display_name = prev.get("display_name")
if display_name:
entry_dict["display_name"] = display_name
merged[name] = entry_dict
self.skills_config = merged
self._save_skills_config()
@@ -157,69 +166,118 @@ class SkillManager:
"""
return list(self.skills.values())
@staticmethod
def _normalize_skill_filter(skill_filter: Optional[List[str]]) -> Optional[List[str]]:
"""Normalize a skill_filter list into a flat list of stripped names."""
if skill_filter is None:
return None
normalized = []
for item in skill_filter:
if isinstance(item, str):
name = item.strip()
if name:
normalized.append(name)
elif isinstance(item, list):
for subitem in item:
if isinstance(subitem, str):
name = subitem.strip()
if name:
normalized.append(name)
return normalized or None
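A stand-alone version of the normalizer makes its edge cases easier to check. The function below follows the same rules as the static method above, written as a free function (`normalize_skill_filter`) for illustration:

```python
from typing import List, Optional


def normalize_skill_filter(skill_filter: Optional[list]) -> Optional[List[str]]:
    """Flatten one level of nesting, strip names, and drop empties."""
    if skill_filter is None:
        return None
    normalized: List[str] = []
    for item in skill_filter:
        # Accept either a name or a list of names; ignore anything else.
        candidates = [item] if isinstance(item, str) else item if isinstance(item, list) else []
        for candidate in candidates:
            if isinstance(candidate, str) and candidate.strip():
                normalized.append(candidate.strip())
    return normalized or None
```

Note that a filter which normalizes to nothing (for example `[]` or `["  "]`) comes back as `None`, which the callers above treat as "no filter" rather than "match no skills".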
def filter_skills(
self,
skill_filter: Optional[List[str]] = None,
include_disabled: bool = False,
) -> List[SkillEntry]:
"""
Filter skills based on criteria.
Simple rule: Skills are auto-enabled if requirements are met.
- Has required API keys -> included
- Missing API keys -> excluded
Filter skills that are eligible (enabled + requirements met).
:param skill_filter: List of skill names to include (None = all)
:param include_disabled: Whether to include disabled skills
:return: Filtered list of skill entries
:return: Filtered list of eligible skill entries
"""
from agent.skills.config import should_include_skill
entries = list(self.skills.values())
# Check requirements (platform, binaries, env vars)
entries = [e for e in entries if should_include_skill(e, self.config)]
# Apply skill filter
if skill_filter is not None:
normalized = []
for item in skill_filter:
if isinstance(item, str):
name = item.strip()
if name:
normalized.append(name)
elif isinstance(item, list):
for subitem in item:
if isinstance(subitem, str):
name = subitem.strip()
if name:
normalized.append(name)
if normalized:
entries = [e for e in entries if e.skill.name in normalized]
normalized = self._normalize_skill_filter(skill_filter)
if normalized is not None:
entries = [e for e in entries if e.skill.name in normalized]
# Filter out disabled skills based on skills_config.json
if not include_disabled:
entries = [e for e in entries if self.is_skill_enabled(e.skill.name)]
from config import conf
if not conf().get("knowledge", True):
entries = [e for e in entries if e.skill.name != "knowledge-wiki"]
return entries
def filter_unavailable_skills(
self,
skill_filter: Optional[List[str]] = None,
) -> tuple:
"""
Find skills that are enabled but have unmet requirements.
:param skill_filter: Optional list of skill names to include
:return: Tuple of (entries, missing_map) where missing_map maps
skill name to its missing requirements dict
"""
from agent.skills.config import should_include_skill, get_missing_requirements
entries = list(self.skills.values())
# Only enabled skills
entries = [e for e in entries if self.is_skill_enabled(e.skill.name)]
normalized = self._normalize_skill_filter(skill_filter)
if normalized is not None:
entries = [e for e in entries if e.skill.name in normalized]
# Keep only those that fail should_include_skill (requirements not met)
unavailable = []
missing_map: Dict[str, dict] = {}
for e in entries:
if not should_include_skill(e, self.config):
missing = get_missing_requirements(e)
if missing:
unavailable.append(e)
missing_map[e.skill.name] = missing
return unavailable, missing_map
def build_skills_prompt(
self,
skill_filter: Optional[List[str]] = None,
) -> str:
"""
Build a formatted prompt containing available skills.
Build a formatted prompt containing available skills
and brief hints for unavailable ones.
:param skill_filter: Optional list of skill names to include
:return: Formatted skills prompt
"""
from common.log import logger
entries = self.filter_skills(skill_filter=skill_filter, include_disabled=False)
logger.debug(f"[SkillManager] Filtered {len(entries)} skills for prompt (total: {len(self.skills)})")
if entries:
skill_names = [e.skill.name for e in entries]
logger.debug(f"[SkillManager] Skills to include: {skill_names}")
result = format_skill_entries_for_prompt(entries)
from agent.skills.formatter import format_unavailable_skills_for_prompt
eligible = self.filter_skills(skill_filter=skill_filter, include_disabled=False)
logger.debug(f"[SkillManager] Eligible: {len(eligible)} skills (total: {len(self.skills)})")
if eligible:
skill_names = [e.skill.name for e in eligible]
logger.debug(f"[SkillManager] Eligible skills: {skill_names}")
result = format_skill_entries_for_prompt(eligible)
unavailable, missing_map = self.filter_unavailable_skills(skill_filter=skill_filter)
if unavailable:
unavailable_names = [e.skill.name for e in unavailable]
logger.debug(f"[SkillManager] Unavailable skills (setup needed): {unavailable_names}")
result += format_unavailable_skills_for_prompt(unavailable, missing_map)
logger.debug(f"[SkillManager] Generated prompt length: {len(result)}")
return result


@@ -29,6 +29,7 @@ class SkillInstallSpec:
class SkillMetadata:
"""Metadata for a skill from frontmatter."""
always: bool = False # Always include this skill
default_enabled: bool = True # Initial enabled state when first discovered
skill_key: Optional[str] = None # Override skill key
primary_env: Optional[str] = None # Primary environment variable
emoji: Optional[str] = None


@@ -87,25 +87,25 @@ FileSave = _optional_tools.get('FileSave')
Terminal = _optional_tools.get('Terminal')
# Delayed import for BrowserTool
# BrowserTool (requires playwright)
def _import_browser_tool():
from common.log import logger
try:
from agent.tools.browser.browser_tool import BrowserTool
return BrowserTool
except ImportError:
# Return a placeholder class that will prompt the user to install dependencies when instantiated
class BrowserToolPlaceholder:
def __init__(self, *args, **kwargs):
raise ImportError(
"The 'browser-use' package is required to use BrowserTool. "
"Please install it with 'pip install browser-use>=0.1.40'."
)
except ImportError as e:
logger.info(
f"[Tools] BrowserTool not loaded - missing dependency: {e}\n"
f" To enable browser tool, run:\n"
f" pip install playwright\n"
f" playwright install chromium"
)
return None
except Exception as e:
logger.error(f"[Tools] BrowserTool failed to load: {e}")
return None
return BrowserToolPlaceholder
# Dynamically set BrowserTool
# BrowserTool = _import_browser_tool()
BrowserTool = _import_browser_tool()
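The new loader returns `None` instead of a placeholder class, so callers need to check for `None` before using the tool. A generic sketch of the same optional-import pattern follows; `optional_import` and its `install_hint` parameter are hypothetical, and it uses the standard `logging` module rather than the project's `common.log`.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def optional_import(module_name: str, attr: str, install_hint: str):
    """Hypothetical helper: return module.attr, or None if it cannot be loaded."""
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr)
    except ImportError as e:
        # Missing dependency: expected on minimal installs, so log at info level.
        logger.info("%s not loaded, missing dependency: %s. %s", attr, e, install_hint)
        return None
    except Exception as e:
        # Anything else (including a missing attribute) is a real load failure.
        logger.error("%s failed to load: %s", attr, e)
        return None
```

Compared with the earlier placeholder class, this surfaces the missing dependency once at import time rather than raising when the tool is first instantiated.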
# Export all tools (including optional ones that might be None)
__all__ = [
@@ -124,8 +124,7 @@ __all__ = [
'WebSearch',
'WebFetch',
'Vision',
# Optional tools (may be None if dependencies not available)
# 'BrowserTool'
'BrowserTool',
]
"""


@@ -18,14 +18,18 @@ from common.utils import expand_path
class Bash(BaseTool):
"""Tool for executing bash commands"""
_IS_WIN = sys.platform == "win32"
name: str = "bash"
description: str = f"""Execute a bash command in the current working directory. Returns stdout and stderr. Output is truncated to last {DEFAULT_MAX_LINES} lines or {DEFAULT_MAX_BYTES // 1024}KB (whichever is hit first). If truncated, full output is saved to a temp file.
{'''
PLATFORM: Windows (cmd.exe). Do NOT use Unix-only commands like grep, head, tail, sed, awk.
''' if _IS_WIN else ''}
ENVIRONMENT: All API keys from env_config are auto-injected. Use $VAR_NAME directly.
SAFETY:
- Freely create/modify/delete files within the workspace
- For destructive and out-of-workspace commands, explain and confirm first"""
- For destructive commands out of workspace, explain and confirm first"""
params: dict = {
"type": "object",
@@ -103,13 +107,12 @@ SAFETY:
logger.debug(f"[Bash] Process User: {os.environ.get('USERNAME', os.environ.get('USER', 'unknown'))}")
# On Windows, convert $VAR references to %VAR% for cmd.exe
if sys.platform == "win32":
if self._IS_WIN:
env["PYTHONIOENCODING"] = "utf-8"
command = self._convert_env_vars_for_windows(command, dotenv_vars)
if command and not command.strip().lower().startswith("chcp"):
command = f"chcp 65001 >nul 2>&1 && {command}"
# Execute command with inherited environment variables
result = subprocess.run(
command,
shell=True,
@@ -120,7 +123,7 @@ SAFETY:
encoding="utf-8",
errors="replace",
timeout=timeout,
env=env
env=env,
)
logger.debug(f"[Bash] Exit code: {result.returncode}")
@@ -166,10 +169,16 @@ SAFETY:
except Exception as retry_err:
logger.warning(f"[Bash] Retry failed: {retry_err}")
# Combine stdout and stderr
output = result.stdout
if result.stderr:
output += "\n" + result.stderr
# When command succeeds with stdout, keep output clean (stderr goes to server log only).
# When command fails or stdout is empty, include stderr so the agent can diagnose.
if result.returncode == 0 and result.stdout.strip():
output = result.stdout
if result.stderr:
logger.info(f"[Bash] stderr (not forwarded): {result.stderr[:500]}")
else:
output = result.stdout
if result.stderr:
output += "\n" + result.stderr
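The output rule introduced here can be isolated into a pure function for clarity. `select_output` is an illustrative name; the logic mirrors the branch above, minus the logging of suppressed stderr.

```python
def select_output(returncode: int, stdout: str, stderr: str) -> str:
    """Decide what the agent sees from a finished command."""
    if returncode == 0 and stdout.strip():
        # Success with real output: keep it clean, stderr stays out.
        return stdout
    # Failure or empty stdout: include stderr so the problem can be diagnosed.
    output = stdout
    if stderr:
        output += "\n" + stderr
    return output
```

The second branch also covers successful commands that print only to stderr, so their messages are not lost.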
# Check if we need to save full output to temp file
temp_file_path = None
@@ -229,48 +238,43 @@ SAFETY:
def _get_safety_warning(self, command: str) -> str:
"""
Get safety warning for potentially dangerous commands
Only warns about extremely dangerous system-level operations
Get safety warning for absolutely catastrophic commands only.
Keep the blocklist minimal so the agent retains maximum freedom.
:param command: Command to check
:return: Warning message if dangerous, empty string if safe
"""
cmd_lower = command.lower().strip()
# Tokenize to avoid substring false positives (e.g. `rm -rf /tmp/x`
# must not match `rm -rf /`).
tokens = command.lower().split()
# Only block extremely dangerous system operations
dangerous_patterns = [
# System shutdown/reboot
("shutdown", "This command will shut down the system"),
("reboot", "This command will reboot the system"),
("halt", "This command will halt the system"),
("poweroff", "This command will power off the system"),
# `rm -rf /` or `rm -rf /*` targeting the real root.
for i, tok in enumerate(tokens):
if tok != "rm":
continue
has_rf = False
for j in range(i + 1, len(tokens)):
t = tokens[j]
if t.startswith("-") and "r" in t and "f" in t:
has_rf = True
elif t in ("--recursive", "--force"):
continue
elif t in ("/", "/*"):
if has_rf:
return "This command will delete the entire filesystem"
break
else:
break
# Critical system modifications
("rm -rf /", "This command will delete the entire filesystem"),
("rm -rf /*", "This command will delete the entire filesystem"),
("dd if=/dev/zero", "This command can destroy disk data"),
("mkfs", "This command will format a filesystem, destroying all data"),
("fdisk", "This command modifies disk partitions"),
# Disk wiping
if "if=/dev/zero" in command.lower() and "dd " in command.lower():
return "This command can destroy disk data"
# User/system management (only if targeting system users)
("userdel root", "This command will delete the root user"),
("passwd root", "This command will change the root password"),
]
# Power control - match only as a standalone word (\b enforces word boundary)
if re.search(r'\b(shutdown|reboot|halt|poweroff)\b', command.lower()):
return "This command will shut down or restart the system"
for pattern, warning in dangerous_patterns:
if pattern in cmd_lower:
return warning
# Check for recursive deletion outside workspace
if "rm" in cmd_lower and "-rf" in cmd_lower:
# Allow deletion within current workspace
if not any(path in cmd_lower for path in ["./", self.cwd.lower()]):
# Check if targeting system directories
system_dirs = ["/bin", "/usr", "/etc", "/var", "/home", "/root", "/sys", "/proc"]
if any(sysdir in cmd_lower for sysdir in system_dirs):
return "This command will recursively delete system directories"
return "" # No warning needed
return ""
@staticmethod
def _convert_env_vars_for_windows(command: str, dotenv_vars: dict) -> str:
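Review note on the tokenized `rm -rf /` check in this hunk: a minimal standalone sketch of the same scan, assuming plain whitespace tokenization as in the diff. The function name `warns_on_root_delete` is hypothetical and not part of the commit; shell quoting and expansion are deliberately not modelled.

```python
def warns_on_root_delete(command: str) -> bool:
    """Return True when an `rm` with combined -r/-f flags targets `/` or `/*`.

    Hypothetical helper mirroring the token scan in `_get_safety_warning`.
    """
    tokens = command.lower().split()
    for i, tok in enumerate(tokens):
        if tok != "rm":
            continue
        has_rf = False
        for t in tokens[i + 1:]:
            if t.startswith("-") and "r" in t and "f" in t:
                # Short flag cluster such as -rf or -fr
                has_rf = True
            elif t in ("--recursive", "--force"):
                # Long options are skipped without changing has_rf
                continue
            elif t in ("/", "/*"):
                # Only the bare root (or its glob) counts as catastrophic
                if has_rf:
                    return True
                break
            else:
                # First ordinary argument: this rm targets something else
                break
    return False


if __name__ == "__main__":
    for cmd in ("rm -rf /", "rm -rf /tmp/x", "rm -fr /*", "rm /"):
        print(cmd, "->", warns_on_root_delete(cmd))
```

Because matching is per token rather than per substring, `rm -rf /tmp/x` no longer trips the check, while `rm -fr /*` still does.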


@@ -0,0 +1,3 @@
from agent.tools.browser.browser_tool import BrowserTool
__all__ = ["BrowserTool"]


@@ -0,0 +1,780 @@
"""
Browser service - Playwright wrapper managing browser lifecycle and page operations.
All Playwright calls run on a dedicated background thread so that callers from
any worker thread can safely use the service. An idle-timeout mechanism
automatically shuts down the browser (and its thread) after a configurable
period of inactivity to free resources.
"""
import os
import sys
import uuid
import queue
import threading
from typing import Optional, Dict, Any, List, Callable
from common.log import logger
try:
from playwright.sync_api import sync_playwright, Browser, BrowserContext, Page, Playwright
_HAS_PLAYWRIGHT = True
except ImportError:
_HAS_PLAYWRIGHT = False
# ---------------------------------------------------------------------------
# Snapshot DOM helpers
# ---------------------------------------------------------------------------
# Tags that typically carry useful content for an agent
_INTERACTIVE_TAGS = {
"a", "button", "input", "textarea", "select", "option",
"label", "details", "summary",
}
_SEMANTIC_TAGS = {
"h1", "h2", "h3", "h4", "h5", "h6",
"p", "li", "td", "th", "caption", "figcaption", "blockquote", "pre", "code",
"nav", "main", "article", "section", "header", "footer", "form", "table",
"img", "video", "audio",
}
_KEEP_TAGS = _INTERACTIVE_TAGS | _SEMANTIC_TAGS
_SNAPSHOT_JS = """
() => {
const KEEP = new Set(%s);
const INTERACTIVE = new Set(%s);
const SKIP = new Set(["script","style","noscript","svg","path","meta","link","br","hr"]);
const CLICKABLE_ROLES = new Set([
"button","link","tab","menuitem","menuitemcheckbox","menuitemradio",
"option","switch","checkbox","radio","combobox","searchbox","slider",
"spinbutton","textbox","treeitem"
]);
let refCounter = 0;
const refMap = {};
function visible(el) {
if (!(el instanceof HTMLElement)) return true;
const st = window.getComputedStyle(el);
if (st.display === "none" || st.visibility === "hidden") return false;
if (parseFloat(st.opacity) === 0) return false;
return true;
}
// Strong signals: these attributes alone are enough to mark as interactive
function hasStrongInteractiveSignal(el) {
const role = el.getAttribute("role");
if (role && CLICKABLE_ROLES.has(role)) return true;
if (el.hasAttribute("onclick") || el.hasAttribute("tabindex")) return true;
if (el.hasAttribute("data-click") || el.hasAttribute("data-action")) return true;
if (el.getAttribute("contenteditable") === "true") return true;
return false;
}
// Check if cursor:pointer is set directly (not just inherited from parent)
function hasOwnPointerCursor(el) {
try {
const st = window.getComputedStyle(el);
if (st.cursor !== "pointer") return false;
const parent = el.parentElement;
if (parent) {
const pst = window.getComputedStyle(parent);
if (pst.cursor === "pointer") return false;
}
return true;
} catch(e) {}
return false;
}
function hasTextOrContent(el) {
const t = el.textContent || "";
if (t.trim().length > 0) return true;
if (el.querySelector("img,video,audio,canvas")) return true;
const ariaLabel = el.getAttribute("aria-label");
if (ariaLabel && ariaLabel.trim()) return true;
const title = el.getAttribute("title");
if (title && title.trim()) return true;
return false;
}
function isImplicitInteractive(el) {
if (hasStrongInteractiveSignal(el)) return true;
if (hasOwnPointerCursor(el) && hasTextOrContent(el)) return true;
return false;
}
function getTextContent(el) {
let text = "";
for (const ch of el.childNodes) {
if (ch.nodeType === Node.TEXT_NODE) {
text += ch.textContent;
}
}
return text.trim();
}
function walk(node) {
if (node.nodeType === Node.TEXT_NODE) {
const t = node.textContent.trim();
return t ? t : null;
}
if (node.nodeType !== Node.ELEMENT_NODE) return null;
const tag = node.tagName.toLowerCase();
if (SKIP.has(tag)) return null;
if (!visible(node)) return null;
const children = [];
for (const ch of node.childNodes) {
const r = walk(ch);
if (r !== null) {
children.push(r);
}
}
const nativeInteractive = INTERACTIVE.has(tag);
const implicitInteractive = !nativeInteractive && (node instanceof HTMLElement) && isImplicitInteractive(node);
const keep = KEEP.has(tag) || implicitInteractive;
if (!keep) {
if (children.length === 0) return null;
if (children.length === 1) return children[0];
return children;
}
const obj = { tag };
if (nativeInteractive || implicitInteractive) {
refCounter++;
obj.ref = refCounter;
refMap[refCounter] = node;
}
if (implicitInteractive) {
const role = node.getAttribute("role");
if (role) obj.role = role;
const directText = getTextContent(node);
if (!directText && children.length === 0) {
const ariaLabel = node.getAttribute("aria-label");
const title = node.getAttribute("title");
if (ariaLabel) obj.ariaLabel = ariaLabel;
else if (title) obj.ariaLabel = title;
}
}
// Attributes
if (tag === "a" && node.href) obj.href = node.getAttribute("href");
if (tag === "img") {
obj.alt = node.alt || "";
obj.src = node.getAttribute("src") || "";
}
if (tag === "input" || tag === "textarea" || tag === "select") {
obj.type = node.type || "text";
obj.name = node.name || undefined;
obj.value = node.value || undefined;
obj.placeholder = node.placeholder || undefined;
if (node.disabled) obj.disabled = true;
if (tag === "input" && node.type === "checkbox") obj.checked = node.checked;
}
if (tag === "button") {
if (node.disabled) obj.disabled = true;
}
if (tag === "option") {
obj.value = node.value;
if (node.selected) obj.selected = true;
}
if (tag === "label" && node.htmlFor) obj.for = node.htmlFor;
// Role / aria-label for native interactive & semantic elements
if (!implicitInteractive) {
const role = node.getAttribute("role");
if (role) obj.role = role;
const ariaLabel = node.getAttribute("aria-label");
if (ariaLabel) obj.ariaLabel = ariaLabel;
}
// Children
if (children.length === 1 && typeof children[0] === "string") {
obj.text = children[0];
} else if (children.length > 0) {
obj.children = children;
}
return obj;
}
const result = walk(document.body);
window.__cowRefMap = refMap;
return { tree: result, refCount: refCounter };
}
""" % (
str(list(_KEEP_TAGS)),
str(list(_INTERACTIVE_TAGS)),
)
def _should_use_headless() -> bool:
"""Decide headless mode: headless on Linux servers without display, headed elsewhere."""
if sys.platform in ("win32", "darwin"):
return False
# Linux: check for display
if os.environ.get("DISPLAY") or os.environ.get("WAYLAND_DISPLAY"):
return False
return True
def _flatten_tree(node, indent=0) -> List[str]:
"""Convert snapshot tree to compact text lines for LLM consumption."""
if node is None:
return []
if isinstance(node, str):
return [" " * indent + node]
if isinstance(node, list):
lines = []
for child in node:
lines.extend(_flatten_tree(child, indent))
return lines
if not isinstance(node, dict):
return []
tag = node.get("tag", "?")
ref = node.get("ref")
parts = [tag]
if ref:
parts[0] = f"[{ref}] {tag}"
# Inline attributes
for attr in ("type", "name", "href", "alt", "role", "ariaLabel", "placeholder", "value"):
val = node.get(attr)
if val:
# Truncate long values
s = str(val)
if len(s) > 80:
s = s[:77] + "..."
parts.append(f'{attr}="{s}"')
for flag in ("disabled", "checked", "selected"):
if node.get(flag):
parts.append(flag)
prefix = " " * indent
header = prefix + " ".join(parts)
text = node.get("text")
if text:
# Truncate long text
if len(text) > 120:
text = text[:117] + "..."
header += f": {text}"
lines = [header]
children = node.get("children", [])
for child in children:
lines.extend(_flatten_tree(child, indent + 2))
return lines
class BrowserService:
"""Manages a Playwright browser on a dedicated background thread.
All Playwright operations are dispatched to a single long-lived thread via
a task queue. Callers from *any* worker thread can use the public API
safely. An idle timer automatically shuts the browser down after
``idle_timeout`` seconds of inactivity (default 300 = 5 min).
"""
_IDLE_TIMEOUT_DEFAULT = 300 # seconds
def __init__(self, config: Optional[Dict[str, Any]] = None):
self._config = config or {}
self._headless: Optional[bool] = None
self._screenshot_dir: Optional[str] = None
# Background thread state
self._thread: Optional[threading.Thread] = None
self._task_queue: queue.Queue = queue.Queue()
self._lock = threading.Lock()
self._alive = False
self._ready = threading.Event()
# Playwright objects (only accessed on the background thread)
self._playwright = None
self._browser = None
self._context = None
self._page = None
# Idle auto-release
idle_cfg = self._config.get("idle_timeout")
self._idle_timeout: float = float(idle_cfg) if idle_cfg is not None else self._IDLE_TIMEOUT_DEFAULT
self._idle_timer: Optional[threading.Timer] = None
# ------------------------------------------------------------------
# Background-thread lifecycle
# ------------------------------------------------------------------
def _start_thread(self):
"""Start the dedicated Playwright thread if not already running."""
with self._lock:
if self._alive and self._thread and self._thread.is_alive():
return
# Wait for old thread to fully exit before creating a new one
old = self._thread
if old and old.is_alive():
old.join(timeout=5)
# Fresh queue to avoid stale sentinels from a previous close()
self._task_queue = queue.Queue()
self._alive = True
self._ready = threading.Event()
self._thread = threading.Thread(target=self._run_loop, daemon=True, name="BrowserThread")
self._thread.start()
# Block until browser is ready (or failed)
self._ready.wait(timeout=30)
def _run_loop(self):
"""Event loop running on the dedicated thread. Processes tasks until stopped."""
logger.info("[Browser] Background thread started")
try:
self._launch_browser()
except Exception as e:
logger.error(f"[Browser] Failed to launch browser: {e}")
self._alive = False
self._ready.set()
self._drain_queue(RuntimeError(f"Browser launch failed: {e}"))
return
self._ready.set()
while self._alive:
try:
task = self._task_queue.get(timeout=1.0)
except queue.Empty:
continue
if task is None:
break
fn, args, kwargs, result_slot = task
try:
result_slot["value"] = fn(*args, **kwargs)
except Exception as e:
result_slot["error"] = e
finally:
result_slot["event"].set()
self._shutdown_browser()
self._drain_queue(RuntimeError("Browser thread stopped"))
logger.info("[Browser] Background thread exited")
def _drain_queue(self, error: Exception):
"""Unblock all callers waiting on the queue with an error."""
while True:
try:
task = self._task_queue.get_nowait()
except queue.Empty:
break
if task is None:
continue
_, _, _, result_slot = task
result_slot["error"] = error
result_slot["event"].set()
def _launch_browser(self):
"""Launch Chromium on the background thread."""
if self._headless is None:
headless_cfg = self._config.get("headless")
self._headless = headless_cfg if headless_cfg is not None else _should_use_headless()
launch_args = ["--disable-dev-shm-usage"]
if self._headless:
launch_args.append("--no-sandbox")
extra_args = self._config.get("launch_args", [])
if extra_args:
launch_args.extend(extra_args)
viewport_w = self._config.get("viewport_width", 1280)
viewport_h = self._config.get("viewport_height", 720)
self._playwright = sync_playwright().start()
logger.info(f"[Browser] Launching Chromium (headless={self._headless})")
self._browser = self._playwright.chromium.launch(
headless=self._headless,
args=launch_args,
)
self._context = self._browser.new_context(
viewport={"width": viewport_w, "height": viewport_h},
user_agent=(
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/131.0.0.0 Safari/537.36"
),
)
self._page = self._context.new_page()
logger.info("[Browser] Browser ready")
def _shutdown_browser(self):
"""Shut down all Playwright resources on the background thread."""
self._cancel_idle_timer()
for obj, label in [
(self._context, "context"),
(self._browser, "browser"),
]:
try:
if obj:
obj.close()
except Exception as e:
logger.debug(f"[Browser] {label} close error: {e}")
try:
if self._playwright:
self._playwright.stop()
except Exception as e:
logger.debug(f"[Browser] playwright stop error: {e}")
self._page = None
self._context = None
self._browser = None
self._playwright = None
logger.info("[Browser] Browser closed")
def _submit(self, fn: Callable, *args, **kwargs):
"""Submit *fn* to the background thread and block until it completes."""
self._start_thread()
if not self._alive:
raise RuntimeError("Browser is not available")
self._reset_idle_timer()
result_slot: Dict[str, Any] = {"event": threading.Event()}
self._task_queue.put((fn, args, kwargs, result_slot))
# Timeout prevents permanent hang if the background thread crashes
completed = result_slot["event"].wait(timeout=120)
if not completed:
raise TimeoutError("Browser operation timed out (120s)")
if "error" in result_slot:
raise result_slot["error"]
return result_slot.get("value")
# ------------------------------------------------------------------
# Idle auto-release
# ------------------------------------------------------------------
def _reset_idle_timer(self):
self._cancel_idle_timer()
if self._idle_timeout > 0:
self._idle_timer = threading.Timer(self._idle_timeout, self._on_idle_timeout)
self._idle_timer.daemon = True
self._idle_timer.start()
def _cancel_idle_timer(self):
if self._idle_timer:
self._idle_timer.cancel()
self._idle_timer = None
def _on_idle_timeout(self):
logger.info(f"[Browser] Idle for {self._idle_timeout}s, auto-releasing browser")
self.close()
# ------------------------------------------------------------------
# Public lifecycle
# ------------------------------------------------------------------
def close(self):
"""Shut down browser and background thread (safe from any thread)."""
self._cancel_idle_timer()
with self._lock:
if not self._alive:
return
self._alive = False
t = self._thread
if self._task_queue is not None:
self._task_queue.put(None)
if t is not None and t.is_alive():
t.join(timeout=10)
with self._lock:
self._thread = None
# ------------------------------------------------------------------
# Actions (each method is dispatched to the background thread)
# ------------------------------------------------------------------
def navigate(self, url: str, timeout: int = 30000) -> Dict[str, Any]:
return self._submit(self._do_navigate, url, timeout)
def _do_navigate(self, url: str, timeout: int) -> Dict[str, Any]:
page = self._page
try:
resp = page.goto(url, wait_until="domcontentloaded", timeout=timeout)
status = resp.status if resp else None
except Exception as e:
return {"error": f"Navigation failed: {e}"}
try:
page.wait_for_load_state("networkidle", timeout=8000)
except Exception:
pass
page.wait_for_timeout(500)
try:
title = page.title()
except Exception:
title = ""
try:
current_url = page.url
except Exception:
current_url = url
return {"url": current_url, "title": title, "status": status}
def snapshot(self, selector: Optional[str] = None) -> str:
return self._submit(self._do_snapshot, selector)
def _do_snapshot(self, selector: Optional[str] = None) -> str:
# NOTE: `selector` is currently unused; the snapshot always walks from document.body.
page = self._page
try:
result = page.evaluate(_SNAPSHOT_JS)
except Exception as e:
return f"[Snapshot error: {e}]"
tree = result.get("tree")
ref_count = result.get("refCount", 0)
lines = _flatten_tree(tree)
try:
title = page.title()
except Exception:
title = ""
try:
url = page.url
except Exception:
url = ""
header = f"Page: {title} ({url})\nInteractive elements: {ref_count}\n---"
body = "\n".join(lines)
max_chars = self._config.get("snapshot_max_chars", 30000)
if len(body) > max_chars:
body = body[:max_chars] + "\n... [snapshot truncated]"
return f"{header}\n{body}"
def screenshot(self, full_page: bool = False, cwd: str = "") -> str:
return self._submit(self._do_screenshot, full_page, cwd)
def _do_screenshot(self, full_page: bool = False, cwd: str = "") -> str:
page = self._page
save_dir = self._get_screenshot_dir(cwd)
filename = f"screenshot_{uuid.uuid4().hex[:8]}.png"
filepath = os.path.join(save_dir, filename)
page.screenshot(path=filepath, full_page=full_page)
logger.info(f"[Browser] Screenshot saved: {filepath}")
return filepath
def click(self, ref: Optional[int] = None, selector: Optional[str] = None,
timeout: int = 5000) -> Dict[str, Any]:
return self._submit(self._do_click, ref, selector, timeout)
def _do_click(self, ref, selector, timeout) -> Dict[str, Any]:
page = self._page
try:
if ref is not None:
result = page.evaluate(f"""
() => {{
const el = window.__cowRefMap && window.__cowRefMap[{ref}];
if (!el) return {{ error: "ref {ref} not found. Run snapshot first." }};
el.click();
return {{ clicked: true, tag: el.tagName.toLowerCase() }};
}}
""")
if result.get("error"):
return result
page.wait_for_timeout(500)
return result
elif selector:
page.click(selector, timeout=timeout)
return {"clicked": True, "selector": selector}
else:
return {"error": "Provide either ref (from snapshot) or selector"}
except Exception as e:
return {"error": f"Click failed: {e}"}
def fill(self, text: str, ref: Optional[int] = None,
selector: Optional[str] = None, timeout: int = 5000) -> Dict[str, Any]:
return self._submit(self._do_fill, text, ref, selector, timeout)
def _do_fill(self, text, ref, selector, timeout) -> Dict[str, Any]:
page = self._page
try:
if ref is not None:
result = page.evaluate(f"""
() => {{
const el = window.__cowRefMap && window.__cowRefMap[{ref}];
if (!el) return {{ error: "ref {ref} not found. Run snapshot first." }};
el.focus();
el.value = "";
return {{ tag: el.tagName.toLowerCase(), name: el.name || "" }};
}}
""")
if result.get("error"):
return result
page.keyboard.type(text)
return {"filled": True, "ref": ref, "text": text}
elif selector:
page.fill(selector, text, timeout=timeout)
return {"filled": True, "selector": selector, "text": text}
else:
return {"error": "Provide either ref (from snapshot) or selector"}
except Exception as e:
return {"error": f"Fill failed: {e}"}
def select(self, value: str, ref: Optional[int] = None,
selector: Optional[str] = None, timeout: int = 5000) -> Dict[str, Any]:
return self._submit(self._do_select, value, ref, selector, timeout)
def _do_select(self, value, ref, selector, timeout) -> Dict[str, Any]:
page = self._page
try:
if ref is not None:
result = page.evaluate(f"""
() => {{
const el = window.__cowRefMap && window.__cowRefMap[{ref}];
if (!el || el.tagName.toLowerCase() !== "select")
return {{ error: "ref {ref} is not a <select> element" }};
el.value = {repr(value)};
el.dispatchEvent(new Event("change", {{ bubbles: true }}));
return {{ selected: true, value: el.value }};
}}
""")
return result
elif selector:
page.select_option(selector, value, timeout=timeout)
return {"selected": True, "selector": selector, "value": value}
else:
return {"error": "Provide either ref (from snapshot) or selector"}
except Exception as e:
return {"error": f"Select failed: {e}"}
def scroll(self, direction: str = "down", amount: int = 500) -> Dict[str, Any]:
return self._submit(self._do_scroll, direction, amount)
def _do_scroll(self, direction, amount) -> Dict[str, Any]:
page = self._page
delta_map = {
"down": (0, amount),
"up": (0, -amount),
"right": (amount, 0),
"left": (-amount, 0),
}
dx, dy = delta_map.get(direction, (0, amount))
try:
page.mouse.wheel(dx, dy)
page.wait_for_timeout(300)
scroll_info = page.evaluate("""
() => ({
scrollX: window.scrollX,
scrollY: window.scrollY,
scrollHeight: document.documentElement.scrollHeight,
clientHeight: document.documentElement.clientHeight
})
""")
return {"scrolled": direction, "amount": amount, **scroll_info}
except Exception as e:
return {"error": f"Scroll failed: {e}"}
def wait(self, selector: Optional[str] = None, timeout: int = 5000,
state: str = "visible") -> Dict[str, Any]:
return self._submit(self._do_wait, selector, timeout, state)
def _do_wait(self, selector, timeout, state) -> Dict[str, Any]:
page = self._page
try:
if selector:
page.wait_for_selector(selector, timeout=timeout, state=state)
return {"waited": True, "selector": selector, "state": state}
else:
page.wait_for_timeout(timeout)
return {"waited": True, "timeout_ms": timeout}
except Exception as e:
return {"error": f"Wait failed: {e}"}
def go_back(self) -> Dict[str, Any]:
return self._submit(self._do_go_back)
def _do_go_back(self) -> Dict[str, Any]:
page = self._page
try:
page.go_back(wait_until="domcontentloaded", timeout=10000)
try:
title = page.title()
except Exception:
title = ""
try:
url = page.url
except Exception:
url = ""
return {"url": url, "title": title}
except Exception as e:
return {"error": f"Go back failed: {e}"}
def go_forward(self) -> Dict[str, Any]:
return self._submit(self._do_go_forward)
def _do_go_forward(self) -> Dict[str, Any]:
page = self._page
try:
page.go_forward(wait_until="domcontentloaded", timeout=10000)
try:
title = page.title()
except Exception:
title = ""
try:
url = page.url
except Exception:
url = ""
return {"url": url, "title": title}
except Exception as e:
return {"error": f"Go forward failed: {e}"}
def get_text(self, selector: str) -> Dict[str, Any]:
return self._submit(self._do_get_text, selector)
def _do_get_text(self, selector) -> Dict[str, Any]:
page = self._page
try:
text = page.text_content(selector, timeout=5000)
return {"text": text or ""}
except Exception as e:
return {"error": f"Get text failed: {e}"}
def evaluate(self, script: str) -> Dict[str, Any]:
return self._submit(self._do_evaluate, script)
def _do_evaluate(self, script) -> Dict[str, Any]:
page = self._page
try:
result = page.evaluate(script)
return {"result": result}
except Exception as e:
return {"error": f"Evaluate failed: {e}"}
def press(self, key: str) -> Dict[str, Any]:
return self._submit(self._do_press, key)
def _do_press(self, key) -> Dict[str, Any]:
page = self._page
try:
page.keyboard.press(key)
page.wait_for_timeout(300)
return {"pressed": key}
except Exception as e:
return {"error": f"Press failed: {e}"}
# ------------------------------------------------------------------
# Helpers
# ------------------------------------------------------------------
def _get_screenshot_dir(self, cwd: str = "") -> str:
if self._screenshot_dir and os.path.isdir(self._screenshot_dir):
return self._screenshot_dir
base = cwd or os.getcwd()
d = os.path.join(base, "tmp")
os.makedirs(d, exist_ok=True)
self._screenshot_dir = d
return d
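Review note on the threading model in this file: the sketch below isolates the queue-plus-result-slot dispatch that `_submit` and `_run_loop` rely on, without Playwright. The class name `SingleThreadExecutor` and its `submit`/`close` methods are hypothetical stand-ins, and the idle timer and launch handling are left out.

```python
import queue
import threading
from typing import Any, Callable, Dict


class SingleThreadExecutor:
    """Hypothetical minimal stand-in for BrowserService's dispatch loop."""

    def __init__(self) -> None:
        self._tasks: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # Every submitted callable executes on this one thread.
        while True:
            task = self._tasks.get()
            if task is None:  # sentinel pushed by close()
                break
            fn, args, kwargs, slot = task
            try:
                slot["value"] = fn(*args, **kwargs)
            except Exception as e:
                slot["error"] = e
            finally:
                # Always wake the caller, even on failure
                slot["event"].set()

    def submit(self, fn: Callable, *args: Any, timeout: float = 5.0, **kwargs: Any) -> Any:
        slot: Dict[str, Any] = {"event": threading.Event()}
        self._tasks.put((fn, args, kwargs, slot))
        if not slot["event"].wait(timeout=timeout):
            raise TimeoutError("worker did not respond in time")
        if "error" in slot:
            # Re-raise the worker's exception on the calling thread
            raise slot["error"]
        return slot.get("value")

    def close(self) -> None:
        self._tasks.put(None)
        self._thread.join(timeout=5)


if __name__ == "__main__":
    ex = SingleThreadExecutor()
    print(ex.submit(lambda a, b: a + b, 2, 3))
    print(ex.submit(threading.get_ident) != threading.get_ident())
    ex.close()
```

The per-call `Event` lets any number of caller threads block independently while the callables themselves stay confined to one thread, which is the property the service docstring asks for.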


@@ -0,0 +1,290 @@
"""
Browser tool - Control a Chromium browser for web navigation and interaction.
Uses Playwright under the hood. Browser instance is lazily started on first
use, reused across tool calls within the same session, and cleaned up via
close().
"""
import json
import os
from typing import Dict, Any, Optional
from agent.tools.base_tool import BaseTool, ToolResult
from agent.tools.browser.browser_service import BrowserService
from common.log import logger
class BrowserTool(BaseTool):
"""Single tool exposing all browser actions via an 'action' parameter."""
name: str = "browser"
description: str = (
"Control a browser to navigate web pages, interact with elements, and extract content. "
"Actions: navigate, snapshot, click, fill, select, scroll, screenshot, wait, back, forward, "
"get_text, press, evaluate.\n\n"
"Workflow: navigate (auto-includes snapshot with element refs) → click/fill/select by ref → snapshot to verify.\n\n"
"Use snapshot as the primary way to read pages. Use screenshot + send to show key results to the user. "
"For login/CAPTCHA/authorization etc., screenshot and ask the user for help."
)
params: dict = {
"type": "object",
"properties": {
"action": {
"type": "string",
"description": (
"The browser action to perform. One of: "
"navigate, snapshot, click, fill, select, scroll, "
"screenshot, wait, back, forward, get_text, press, evaluate"
),
"enum": [
"navigate", "snapshot", "click", "fill", "select", "scroll",
"screenshot", "wait", "back", "forward", "get_text", "press",
"evaluate"
]
},
"url": {
"type": "string",
"description": "URL to navigate to (for 'navigate' action)"
},
"ref": {
"type": "integer",
"description": "Element ref number from snapshot (for click/fill/select)"
},
"selector": {
"type": "string",
"description": "CSS selector as fallback when ref is unavailable (for click/fill/select/wait/get_text)"
},
"text": {
"type": "string",
"description": "Text to type (for 'fill' action)"
},
"value": {
"type": "string",
"description": "Option value (for 'select' action)"
},
"key": {
"type": "string",
"description": "Key to press, e.g. Enter, Tab, Escape (for 'press' action)"
},
"direction": {
"type": "string",
"description": "Scroll direction: up, down, left, right (for 'scroll' action, default: down)"
},
"script": {
"type": "string",
"description": "JavaScript code to execute (for 'evaluate' action)"
},
"full_page": {
"type": "boolean",
"description": "Capture full page screenshot (for 'screenshot' action, default: false)"
},
"timeout": {
"type": "integer",
"description": "Timeout in milliseconds (optional, default varies by action)"
}
},
"required": ["action"]
}
_shared_service: Optional[BrowserService] = None
def __init__(self, config: dict = None):
self.config = config or {}
self.cwd = self.config.get("cwd", os.getcwd())
self._service: Optional[BrowserService] = None
def _get_service(self) -> BrowserService:
"""Get or create the browser service, sharing across copies."""
if self._service is not None:
return self._service
# Reuse shared service across tool copies within the same session
if BrowserTool._shared_service is not None:
self._service = BrowserTool._shared_service
return self._service
self._service = BrowserService(self.config)
BrowserTool._shared_service = self._service
return self._service
def execute(self, args: Dict[str, Any]) -> ToolResult:
action = args.get("action", "").strip().lower()
if not action:
return ToolResult.fail("Error: 'action' parameter is required")
handler = self._ACTION_MAP.get(action)
if not handler:
valid = ", ".join(sorted(self._ACTION_MAP.keys()))
return ToolResult.fail(f"Unknown action '{action}'. Valid actions: {valid}")
try:
return handler(self, args)
except Exception as e:
logger.error(f"[Browser] Action '{action}' error: {e}")
return ToolResult.fail(f"Browser error ({action}): {e}")
# ------------------------------------------------------------------
# Action handlers
# ------------------------------------------------------------------
def _do_navigate(self, args: Dict[str, Any]) -> ToolResult:
url = args.get("url", "").strip()
if not url:
return ToolResult.fail("Error: 'url' is required for navigate action")
if not url.startswith(("http://", "https://")):
url = "https://" + url
timeout = args.get("timeout", 30000)
service = self._get_service()
result = service.navigate(url, timeout=timeout)
if "error" in result:
return ToolResult.fail(result["error"])
# Auto-snapshot after navigation so the agent gets page content in one call
snapshot_text = service.snapshot()
return ToolResult.success(
f"Navigated to: {result['url']}\nTitle: {result['title']}\nStatus: {result['status']}\n\n"
f"--- Page Snapshot ---\n{snapshot_text}"
)
def _do_snapshot(self, args: Dict[str, Any]) -> ToolResult:
selector = args.get("selector")
text = self._get_service().snapshot(selector=selector)
return ToolResult.success(text)
def _do_click(self, args: Dict[str, Any]) -> ToolResult:
ref = args.get("ref")
selector = args.get("selector")
timeout = args.get("timeout", 5000)
result = self._get_service().click(ref=ref, selector=selector, timeout=timeout)
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success("Clicked successfully. Use 'snapshot' to see updated page.")
def _do_fill(self, args: Dict[str, Any]) -> ToolResult:
text = args.get("text", "")
ref = args.get("ref")
selector = args.get("selector")
timeout = args.get("timeout", 5000)
if not text and text != "":
return ToolResult.fail("Error: 'text' is required for fill action")
result = self._get_service().fill(text, ref=ref, selector=selector, timeout=timeout)
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success("Filled text into element. Use 'snapshot' to verify.")
def _do_select(self, args: Dict[str, Any]) -> ToolResult:
value = args.get("value", "")
ref = args.get("ref")
selector = args.get("selector")
timeout = args.get("timeout", 5000)
if not value:
return ToolResult.fail("Error: 'value' is required for select action")
result = self._get_service().select(value, ref=ref, selector=selector, timeout=timeout)
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success(f"Selected option '{value}'.")
def _do_scroll(self, args: Dict[str, Any]) -> ToolResult:
direction = args.get("direction", "down")
amount = args.get("timeout", 500)  # 'amount' is not in the tool schema, so 'timeout' doubles as the scroll distance in pixels (default 500)
if "amount" in args:
amount = args["amount"]
result = self._get_service().scroll(direction=direction, amount=amount)
if "error" in result:
return ToolResult.fail(result["error"])
pos = f"scrollY={result.get('scrollY', '?')}/{result.get('scrollHeight', '?')}"
return ToolResult.success(f"Scrolled {direction}. Position: {pos}")
def _do_screenshot(self, args: Dict[str, Any]) -> ToolResult:
full_page = args.get("full_page", False)
filepath = self._get_service().screenshot(full_page=full_page, cwd=self.cwd)
return ToolResult.success(f"Screenshot saved to: {filepath}")
def _do_wait(self, args: Dict[str, Any]) -> ToolResult:
selector = args.get("selector")
timeout = args.get("timeout", 5000)
result = self._get_service().wait(selector=selector, timeout=timeout)
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success("Wait completed.")
def _do_back(self, args: Dict[str, Any]) -> ToolResult:
result = self._get_service().go_back()
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success(f"Navigated back to: {result['url']}")
def _do_forward(self, args: Dict[str, Any]) -> ToolResult:
result = self._get_service().go_forward()
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success(f"Navigated forward to: {result['url']}")
def _do_get_text(self, args: Dict[str, Any]) -> ToolResult:
selector = args.get("selector", "").strip()
if not selector:
return ToolResult.fail("Error: 'selector' is required for get_text action")
result = self._get_service().get_text(selector)
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success(result["text"])
def _do_press(self, args: Dict[str, Any]) -> ToolResult:
key = args.get("key", "").strip()
if not key:
return ToolResult.fail("Error: 'key' is required for press action")
result = self._get_service().press(key)
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success(f"Pressed key: {key}")
def _do_evaluate(self, args: Dict[str, Any]) -> ToolResult:
script = args.get("script", "").strip()
if not script:
return ToolResult.fail("Error: 'script' is required for evaluate action")
result = self._get_service().evaluate(script)
if "error" in result:
return ToolResult.fail(result["error"])
val = result.get("result")
if isinstance(val, (dict, list)):
return ToolResult.success(json.dumps(val, ensure_ascii=False, indent=2))
return ToolResult.success(str(val) if val is not None else "(no return value)")
# Action dispatch table
_ACTION_MAP = {
"navigate": _do_navigate,
"snapshot": _do_snapshot,
"click": _do_click,
"fill": _do_fill,
"select": _do_select,
"scroll": _do_scroll,
"screenshot": _do_screenshot,
"wait": _do_wait,
"back": _do_back,
"forward": _do_forward,
"get_text": _do_get_text,
"press": _do_press,
"evaluate": _do_evaluate,
}
# ------------------------------------------------------------------
# Lifecycle
# ------------------------------------------------------------------
def copy(self):
"""Share browser instance across tool copies (avoids re-launching)."""
new_tool = BrowserTool(self.config)
new_tool.model = self.model
new_tool.context = getattr(self, "context", None)
new_tool.cwd = self.cwd
new_tool._service = self._service
return new_tool
def close(self):
"""Release browser resources."""
if self._service:
self._service.close()
self._service = None
BrowserTool._shared_service = None
logger.info("[Browser] BrowserTool closed")
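
A note on the `_ACTION_MAP` dispatch table above: assuming it is defined in the class body, as the diff suggests, its values are plain functions rather than bound methods, so the caller has to pass `self` explicitly. The sketch below illustrates that pattern with a hypothetical `MiniTool` class and string results; it is not the real `BrowserTool`, and the `execute` method shown here is an assumption about how the table is consumed.

```python
from typing import Any, Callable, Dict


class MiniTool:
    """Hypothetical stand-in for a tool with a class-level dispatch table."""

    def _do_press(self, args: Dict[str, Any]) -> str:
        key = args.get("key", "").strip()
        if not key:
            return "Error: 'key' is required for press action"
        return f"Pressed key: {key}"

    def _do_back(self, args: Dict[str, Any]) -> str:
        return "Navigated back"

    # Inside the class body these names are still plain functions,
    # so the table stores unbound functions, not bound methods.
    _ACTION_MAP: Dict[str, Callable[..., str]] = {
        "press": _do_press,
        "back": _do_back,
    }

    def execute(self, args: Dict[str, Any]) -> str:
        action = args.get("action", "")
        handler = self._ACTION_MAP.get(action)
        if handler is None:
            return f"Error: unknown action '{action}'"
        # Pass self explicitly because the handler is unbound.
        return handler(self, args)
```

Adding an action then only requires a new `_do_*` method and one table entry.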


@@ -1,18 +0,0 @@
def copy(self):
"""
Special copy method for browser tool to avoid recreating browser instance.
:return: A new instance with shared browser reference but unique model
"""
new_tool = self.__class__()
# Copy essential attributes
new_tool.model = self.model
new_tool.context = getattr(self, 'context', None)
new_tool.config = getattr(self, 'config', None)
# Share the browser instance instead of creating a new one
if hasattr(self, 'browser'):
new_tool.browser = self.browser
return new_tool


@@ -44,6 +44,19 @@ class MemoryGetTool(BaseTool):
"""
super().__init__()
self.memory_manager = memory_manager
from config import conf
if conf().get("knowledge", True):
self.description = (
"Read specific content from memory or knowledge files. "
"Use this to get full context from a memory file, knowledge page, or specific line range."
)
self.params = {**self.params}
self.params["properties"] = {**self.params["properties"]}
self.params["properties"]["path"] = {
"type": "string",
"description": "Relative path to the memory or knowledge file (e.g. 'MEMORY.md', 'memory/2026-01-01.md', 'knowledge/concepts/moe.md')"
}
def execute(self, args: dict):
"""
@@ -68,11 +81,15 @@ class MemoryGetTool(BaseTool):
workspace_dir = self.memory_manager.config.get_workspace()
# Auto-prepend memory/ if not present and not absolute path
# Exception: MEMORY.md is in the root directory
if not path.startswith('memory/') and not path.startswith('/') and path != 'MEMORY.md':
# Exceptions: MEMORY.md in root, knowledge/ files at workspace root
if not path.startswith('memory/') and not path.startswith('knowledge/') and not path.startswith('/') and path != 'MEMORY.md':
path = f'memory/{path}'
file_path = workspace_dir / path
file_path = (workspace_dir / path).resolve()
workspace_resolved = workspace_dir.resolve()
if not str(file_path).startswith(str(workspace_resolved) + '/') and file_path != workspace_resolved:
return ToolResult.fail("Error: Access denied: path outside workspace")
if not file_path.exists():
return ToolResult.fail(f"Error: File not found: {path}")
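
For reference, the workspace containment check in this hunk can also be expressed with `pathlib` alone. This is a minimal sketch, not the code in the diff: `resolve_in_workspace` is a hypothetical helper, and it uses `Path.relative_to` instead of the string-prefix comparison, which avoids hard-coding the `/` separator.

```python
from pathlib import Path
from typing import Optional


def resolve_in_workspace(workspace_dir: Path, path: str) -> Optional[Path]:
    """Return the resolved path if it stays inside workspace_dir, else None."""
    workspace_resolved = workspace_dir.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (workspace_resolved / path).resolve()
    try:
        # relative_to raises ValueError when candidate is not under the workspace.
        candidate.relative_to(workspace_resolved)
    except ValueError:
        return None
    return candidate
```

As in the diff, the workspace root itself is accepted, while `../` traversal and absolute paths outside the workspace are rejected.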


@@ -48,6 +48,13 @@ class MemorySearchTool(BaseTool):
super().__init__()
self.memory_manager = memory_manager
self.user_id = user_id
from config import conf
if conf().get("knowledge", True):
self.description = (
"Search agent's long-term memory and knowledge base using semantic and keyword search. "
"Use this to recall past conversations, preferences, and knowledge pages."
)
def execute(self, args: dict):
"""


@@ -84,6 +84,49 @@ def get_scheduler_service():
return _scheduler_service
def _remember_delivered_output(
agent_bridge,
task: dict,
channel_type: str,
content: str,
) -> None:
"""Best-effort persistence of the message the scheduler sent to a user.
Uses notify_session_id (the real chat session_id stored at task creation time)
so that group chats correctly associate the output with the user's conversation.
Falls back to receiver for backward compatibility with old tasks.
Per-action-type behaviour:
- agent_task / tool_call / skill_call: gated by ``scheduler_inject_to_session``
(default True). These produce AI-generated content worth remembering.
- send_message: additionally gated by ``scheduler_inject_send_message``
(default False). Fixed reminder text rarely benefits follow-up Q&A and
would just consume context tokens.
"""
if not content:
return
action = task.get("action", {})
action_type = action.get("type", "")
# send_message defaults to NOT being injected; explicit opt-in via config.
if action_type == "send_message":
if not conf().get("scheduler_inject_send_message", False):
return
session_id = action.get("notify_session_id") or action.get("receiver")
if not session_id:
return
try:
remember = getattr(agent_bridge, "remember_scheduled_output", None)
if remember:
task_desc = action.get("task_description") or action.get("content", "")
remember(session_id, str(content), channel_type=channel_type, task_description=task_desc)
except Exception as e:
logger.warning(
f"[Scheduler] Failed to remember delivered output for {session_id}: {e}"
)
def _execute_agent_task(task: dict, agent_bridge):
"""
Execute an agent_task action - let Agent handle the task
@@ -165,6 +208,7 @@ def _execute_agent_task(task: dict, agent_bridge):
# Send the reply
channel.send(reply, context)
_remember_delivered_output(agent_bridge, task, channel_type, reply.content)
logger.info(f"[Scheduler] Task {task['id']} executed successfully, result sent to {receiver}")
else:
logger.error(f"[Scheduler] Failed to create channel: {channel_type}")
@@ -255,6 +299,7 @@ def _execute_send_message(task: dict, agent_bridge):
logger.debug(f"[Scheduler] Registered request_id {request_id} -> session {receiver}")
channel.send(reply, context)
_remember_delivered_output(agent_bridge, task, channel_type, content)
logger.info(f"[Scheduler] Task {task['id']} executed: sent message to {receiver}")
else:
logger.error(f"[Scheduler] Failed to create channel: {channel_type}")
@@ -351,6 +396,7 @@ def _execute_tool_call(task: dict, agent_bridge):
logger.debug(f"[Scheduler] Registered request_id {request_id} -> session {receiver}")
channel.send(reply, context)
_remember_delivered_output(agent_bridge, task, channel_type, content)
logger.info(f"[Scheduler] Task {task['id']} executed: sent tool result to {receiver}")
else:
logger.error(f"[Scheduler] Failed to create channel: {channel_type}")
@@ -429,6 +475,24 @@ def _execute_skill_call(task: dict, agent_bridge):
if result_prefix:
content = f"{result_prefix}\n\n{content}"
# Send the result via channel
from channel.channel_factory import create_channel
try:
channel = create_channel(channel_type)
if channel:
# For web channel, register request_id
if channel_type == "web" and hasattr(channel, 'request_to_session'):
req_id = context.get("request_id")
if req_id:
channel.request_to_session[req_id] = receiver
logger.debug(f"[Scheduler] Registered request_id {req_id} -> session {receiver}")
channel.send(Reply(ReplyType.TEXT, content), context)
_remember_delivered_output(agent_bridge, task, channel_type, content)
except Exception as e:
logger.error(f"[Scheduler] Failed to send skill result: {e}")
logger.info(f"[Scheduler] Task {task['id']} executed: skill result sent to {receiver}")
else:
logger.error(f"[Scheduler] Task {task['id']}: No result from skill execution")
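
The gating rules described in the `_remember_delivered_output` docstring can be summarised as two small pure functions. This is a sketch under assumptions: `should_remember` and `pick_session_id` are hypothetical names, and the `scheduler_inject_to_session` check is shown here even though the diff only shows the `send_message` gate in this function (the other gate is presumably applied elsewhere, for example inside `remember_scheduled_output`).

```python
from typing import Any, Dict, Optional


def pick_session_id(action: Dict[str, Any]) -> Optional[str]:
    """Prefer the chat session captured at creation time; fall back to receiver."""
    return action.get("notify_session_id") or action.get("receiver") or None


def should_remember(action_type: str, config: Dict[str, Any]) -> bool:
    """Apply the two config gates described in the docstring."""
    # Global switch, on by default (assumed to apply to every action type).
    if not config.get("scheduler_inject_to_session", True):
        return False
    # Fixed reminder text is opt-in only.
    if action_type == "send_message":
        return bool(config.get("scheduler_inject_send_message", False))
    return True
```

With an empty config, AI-generated output (`agent_task`, `tool_call`, `skill_call`) is remembered and fixed `send_message` text is not.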


@@ -158,6 +158,11 @@ class SchedulerTool(BaseTool):
# Create task
task_id = str(uuid.uuid4())[:8]
# Capture the real chat session_id at task creation time so that scheduler
# can later inject the delivered output into the user's actual conversation
# (in group chats, session_id != receiver, e.g. "user_id:group_id" on feishu).
notify_session_id = context.get("session_id")
# Build action based on message or ai_task
if message:
action = {
@@ -166,7 +171,8 @@ class SchedulerTool(BaseTool):
"receiver": context.get("receiver"),
"receiver_name": self._get_receiver_name(context),
"is_group": context.get("isgroup", False),
"channel_type": self.config.get("channel_type", "unknown")
"channel_type": self.config.get("channel_type", "unknown"),
"notify_session_id": notify_session_id,
}
else: # ai_task
action = {
@@ -175,7 +181,8 @@ class SchedulerTool(BaseTool):
"receiver": context.get("receiver"),
"receiver_name": self._get_receiver_name(context),
"is_group": context.get("isgroup", False),
"channel_type": self.config.get("channel_type", "unknown")
"channel_type": self.config.get("channel_type", "unknown"),
"notify_session_id": notify_session_id,
}
# For DingTalk one-on-one chats, additionally store sender_staff_id


@@ -98,7 +98,18 @@ class Send(BaseTool):
"size_formatted": self._format_size(file_size),
"message": message or f"正在发送 {file_name}"
}
try:
from common.cloud_client import get_website_base_url, copy_send_file
# Do nothing when in local env
if get_website_base_url():
url = copy_send_file(absolute_path, self.cwd)
if url:
result["url"] = url
except Exception:
pass
return ToolResult.success(result)
def _resolve_path(self, path: str) -> str:


@@ -84,11 +84,11 @@ class ToolManager:
except ImportError as e:
# Handle missing dependencies with helpful messages
error_msg = str(e)
if "browser-use" in error_msg or "browser_use" in error_msg:
if "playwright" in error_msg:
logger.warning(
f"[ToolManager] Browser tool not loaded - missing dependencies.\n"
f" To enable browser tool, run:\n"
f" pip install browser-use markdownify playwright\n"
f" pip install playwright\n"
f" playwright install chromium"
)
elif "markdownify" in error_msg:
@@ -154,11 +154,11 @@ class ToolManager:
except ImportError as e:
# Handle missing dependencies with helpful messages
error_msg = str(e)
if "browser-use" in error_msg or "browser_use" in error_msg:
if "playwright" in error_msg:
logger.warning(
f"[ToolManager] Browser tool not loaded - missing dependencies.\n"
f" To enable browser tool, run:\n"
f" pip install browser-use markdownify playwright\n"
f" pip install playwright\n"
f" playwright install chromium"
)
elif "markdownify" in error_msg:
@@ -197,7 +197,7 @@ class ToolManager:
logger.warning(
f"[ToolManager] Browser tool is configured but not loaded.\n"
f" To enable browser tool, run:\n"
f" pip install browser-use markdownify playwright\n"
f" pip install playwright\n"
f" playwright install chromium"
)
elif tool_name == "google_search":


@@ -8,7 +8,10 @@ Truncation is based on two independent limits - whichever is hit first wins:
Never returns partial lines (except bash tail truncation edge case).
"""
from typing import Dict, Any, Optional, Literal, Tuple
from __future__ import annotations
from typing import Dict, Any, Optional, Tuple, TYPE_CHECKING
if TYPE_CHECKING:
from typing import Literal
DEFAULT_MAX_LINES = 2000


@@ -1,22 +1,36 @@
"""
Vision tool - Analyze images using OpenAI-compatible Vision API.
Vision tool - Analyze images using Vision API.
Supports local files (auto base64-encoded) and HTTP URLs.
Providers: OpenAI (preferred) > LinkAI (fallback).
Provider resolution:
- tool.vision.model (if set) means "prefer this model first; fall back to
other configured providers if it fails". The model name is mapped to its
native provider (e.g. doubao-* → Doubao, kimi-* → Moonshot, gpt-* →
OpenAI/LinkAI). That provider is tried first, then the standard auto
chain runs as fallback (with the preferred provider de-duplicated).
- Auto chain priority:
1. Main model via bot.call_vision — only when the main bot is known
to actually support vision (not just expose a call_vision method).
2. Other models whose API key is configured.
3. OpenAI / LinkAI raw HTTP.
When use_linkai=true, LinkAI is promoted to #1.
"""
import base64
import os
import subprocess
import tempfile
from typing import Any, Dict, Optional, Tuple
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional
import requests
from agent.tools.base_tool import BaseTool, ToolResult
from common import const
from common.log import logger
from config import conf
DEFAULT_MODEL = "gpt-4.1-mini"
DEFAULT_MODEL = const.GPT_41_MINI
DEFAULT_TIMEOUT = 60
MAX_TOKENS = 1000
COMPRESS_THRESHOLD = 1_048_576 # 1 MB
@@ -29,15 +43,66 @@ SUPPORTED_EXTENSIONS = {
"webp": "image/webp",
}
_MAIN_MODEL_PROVIDER_NAME = "MainModel"
# (config_key_for_api_key, bot_type, default_vision_model, provider_display_name)
# Auto-discovered as fallback vision providers when their API key is configured.
# OpenAI and LinkAI are handled separately (raw HTTP providers), so not listed here.
_DISCOVERABLE_MODELS = [
("moonshot_api_key", const.MOONSHOT, const.KIMI_K2_6, "Moonshot"),
("ark_api_key", const.DOUBAO, const.DOUBAO_SEED_2_PRO, "Doubao"),
("dashscope_api_key", const.QWEN_DASHSCOPE, const.QWEN36_PLUS, "DashScope"),
("claude_api_key", const.CLAUDEAPI, const.CLAUDE_4_6_SONNET, "Claude"),
("gemini_api_key", const.GEMINI, const.GEMINI_31_FLASH_LITE_PRE, "Gemini"),
("qianfan_api_key", const.QIANFAN, const.ERNIE_45_TURBO_VL, "Qianfan"),
("zhipu_ai_api_key", const.ZHIPU_AI, const.GLM_4_7, "ZhipuAI"),
("minimax_api_key", const.MiniMax, const.MINIMAX_M2_7, "MiniMax"),
]
# Model name prefix → discoverable provider display_name.
# Used to auto-route tool.vision.model to its native provider.
# Matched case-insensitively; longest prefix wins.
_MODEL_PREFIX_TO_PROVIDER = [
("doubao-", "Doubao"),
("kimi-", "Moonshot"),
("moonshot-", "Moonshot"),
("qwen", "DashScope"), # qwen-*, qwen3-*, qwen3.6-*, etc.
("claude-", "Claude"),
("ernie-", "Qianfan"),
("gemini-", "Gemini"),
("glm-", "ZhipuAI"),
("minimax-", "MiniMax"),
("abab", "MiniMax"),
]
# Model prefixes that natively belong to OpenAI / LinkAI (raw HTTP providers).
_OPENAI_MODEL_PREFIXES = ("gpt-", "o1-", "o3-", "o4-", "chatgpt-")
@dataclass
class VisionProvider:
"""A single Vision API provider configuration."""
name: str
api_key: str
api_base: str
extra_headers: dict = field(default_factory=dict)
model_override: Optional[str] = None
use_bot: bool = False # When True, call via bot.call_vision instead of raw HTTP
fallback_bot: Any = None # Bot instance for non-main-model providers
class VisionAPIError(Exception):
"""Raised when a Vision API call fails and should trigger fallback."""
pass
class Vision(BaseTool):
"""Analyze images using OpenAI-compatible Vision API"""
"""Analyze images using Vision API"""
name: str = "vision"
description: str = (
"Analyze an image (local file or URL) using Vision API. "
"Analyze a local image or image URL (jpg/jpeg/png) using Vision API. "
"Can describe content, extract text, identify objects, colors, etc. "
"Requires OPENAI_API_KEY or LINKAI_API_KEY."
)
params: dict = {
@@ -51,13 +116,6 @@ class Vision(BaseTool):
"type": "string",
"description": "Question to ask about the image",
},
"model": {
"type": "string",
"description": (
f"Vision model to use (default: {DEFAULT_MODEL}). "
"Options: gpt-4.1-mini, gpt-4.1, gpt-4o-mini, gpt-4o"
),
},
},
"required": ["image", "question"],
}
@@ -67,29 +125,26 @@ class Vision(BaseTool):
@staticmethod
def is_available() -> bool:
return bool(
conf().get("open_ai_api_key") or os.environ.get("OPENAI_API_KEY")
or conf().get("linkai_api_key") or os.environ.get("LINKAI_API_KEY")
)
return True
def execute(self, args: Dict[str, Any]) -> ToolResult:
image = args.get("image", "").strip()
question = args.get("question", "").strip()
model = args.get("model", DEFAULT_MODEL).strip() or DEFAULT_MODEL
if not image:
return ToolResult.fail("Error: 'image' parameter is required")
if not question:
return ToolResult.fail("Error: 'question' parameter is required")
api_key, api_base = self._resolve_provider()
if not api_key:
providers = self._resolve_providers()
if not providers:
return ToolResult.fail(
"Error: No API key configured for Vision.\n"
"Please configure one of the following using env_config tool:\n"
" 1. OPENAI_API_KEY (preferred): env_config(action=\"set\", key=\"OPENAI_API_KEY\", value=\"your-key\")\n"
" 2. LINKAI_API_KEY (fallback): env_config(action=\"set\", key=\"LINKAI_API_KEY\", value=\"your-key\")\n\n"
"Get your key at: https://platform.openai.com/api-keys or https://link-ai.tech"
"Error: No model available for Vision.\n"
"The main model does not support vision and no other API keys are configured.\n"
"Options:\n"
" 1. Switch to a multimodal model (e.g. ernie-4.5-turbo-vl, qwen3.6-plus, claude-sonnet-4-6, gemini-2.0-flash)\n"
" 2. Configure OPENAI_API_KEY: env_config(action=\"set\", key=\"OPENAI_API_KEY\", value=\"your-key\")\n"
" 3. Configure LINKAI_API_KEY: env_config(action=\"set\", key=\"LINKAI_API_KEY\", value=\"your-key\")"
)
try:
@@ -97,32 +152,406 @@ class Vision(BaseTool):
except Exception as e:
return ToolResult.fail(f"Error: {e}")
# Default model is only used as a last-resort placeholder for providers
# whose VisionProvider.model_override is None (e.g. raw OpenAI provider
# when the user did not configure tool.vision.model).
return self._call_with_fallback(providers, DEFAULT_MODEL, question, image_content)
def _call_with_fallback(self, providers: List[VisionProvider], model: str,
question: str, image_content: dict) -> ToolResult:
"""Try each provider in order; fall back to the next one on failure."""
errors: List[str] = []
for i, provider in enumerate(providers):
use_model = provider.model_override or model
try:
logger.info(f"[Vision] Trying provider '{provider.name}' "
f"with model '{use_model}' ({i + 1}/{len(providers)})")
if provider.use_bot:
result = self._call_via_bot(use_model, question, image_content, provider)
else:
result = self._call_api(provider, use_model, question, image_content)
logger.info(f"[Vision] ✅ Success via {provider.name} (model={use_model})")
return result
except VisionAPIError as e:
errors.append(f"[{provider.name}/{use_model}] {e}")
logger.warning(f"[Vision] Provider '{provider.name}' failed: {e}")
except requests.Timeout:
errors.append(f"[{provider.name}/{use_model}] Request timed out after {DEFAULT_TIMEOUT}s")
logger.warning(f"[Vision] Provider '{provider.name}' timed out")
except requests.ConnectionError:
errors.append(f"[{provider.name}/{use_model}] Connection failed")
logger.warning(f"[Vision] Provider '{provider.name}' connection failed")
except Exception as e:
errors.append(f"[{provider.name}/{use_model}] {e}")
logger.error(f"[Vision] Provider '{provider.name}' unexpected error: {e}", exc_info=True)
return ToolResult.fail(
"Error: All Vision API providers failed.\n" + "\n".join(f" - {err}" for err in errors)
)
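
The try-each-provider loop in `_call_with_fallback` reduces to the following pattern. This is a simplified sketch, not the method itself: providers are modelled as hypothetical `(name, callable)` pairs, `ProviderError` stands in for `VisionAPIError`, and all exception types are collapsed into a single `except` branch.

```python
from typing import Callable, List, Tuple


class ProviderError(Exception):
    """Hypothetical stand-in for VisionAPIError."""


def call_with_fallback(providers: List[Tuple[str, Callable[[], str]]]) -> Tuple[bool, str]:
    """Return (True, result) from the first provider that succeeds."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return True, call()
        except Exception as e:
            # Record the failure and move on to the next provider.
            errors.append(f"[{name}] {e}")
    # Every provider failed: report all collected errors together.
    return False, "All providers failed.\n" + "\n".join(f" - {err}" for err in errors)
```

Collecting the per-provider errors, rather than returning the last one, keeps the final failure message useful when several keys are misconfigured at once.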
def _resolve_providers(self) -> List[VisionProvider]:
"""
Build an ordered list of providers to try.
Semantics of `tool.vision.model`:
"Prefer this model first; fall back to other configured providers
if it fails."
Order:
1. The provider that natively serves `tool.vision.model` (if any
and its API key is configured) — using the user-specified model
name verbatim.
2. Auto-discovery chain as fallback:
- use_linkai=true → [LinkAI, MainModel?, OtherModels…, OpenAI]
- default → [MainModel?, OtherModels…, OpenAI, LinkAI]
MainModel is only included when the main bot is known to support
vision (see _main_bot_supports_vision).
Providers that share the same display name as the preferred provider
are de-duplicated to avoid retrying the same endpoint twice.
"""
user_model = self._resolve_user_vision_model()
providers: List[VisionProvider] = []
# Step 1: preferred provider derived from tool.vision.model
if user_model:
preferred = self._route_by_model_name(user_model)
if preferred:
providers.extend(preferred)
# Step 2: auto-discovery chain as fallback
existing = {p.name for p in providers}
fallback: List[VisionProvider] = []
use_linkai = conf().get("use_linkai", False) and conf().get("linkai_api_key")
if use_linkai:
self._append_provider(fallback, lambda: self._build_linkai_provider(user_model))
self._append_provider(fallback, self._build_main_model_provider)
self._append_other_model_providers(fallback, preferred_model=user_model)
self._append_provider(fallback, lambda: self._build_openai_provider(user_model))
else:
self._append_provider(fallback, self._build_main_model_provider)
self._append_other_model_providers(fallback, preferred_model=user_model)
self._append_provider(fallback, lambda: self._build_openai_provider(user_model))
self._append_provider(fallback, lambda: self._build_linkai_provider(user_model))
for p in fallback:
if p.name in existing:
continue
providers.append(p)
existing.add(p.name)
return providers
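
The ordering and de-duplication rules in `_resolve_providers` can be illustrated on provider names alone. A minimal sketch, assuming every provider is available (the real builders may return nothing when a key is missing); `order_providers` is a hypothetical helper that works on display names rather than `VisionProvider` objects.

```python
from typing import List


def order_providers(preferred: List[str], discovered: List[str], use_linkai: bool) -> List[str]:
    """Put preferred providers first, then the auto chain, skipping duplicates."""
    if use_linkai:
        fallback = ["LinkAI", "MainModel", *discovered, "OpenAI"]
    else:
        fallback = ["MainModel", *discovered, "OpenAI", "LinkAI"]
    ordered = list(preferred)
    seen = set(ordered)
    for name in fallback:
        # A provider already tried as "preferred" is not retried in the chain.
        if name not in seen:
            ordered.append(name)
            seen.add(name)
    return ordered
```

For example, with `tool.vision.model` routed to Doubao and both Moonshot and Doubao discoverable, Doubao appears once at the front, followed by the default chain.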
@staticmethod
def _append_provider(providers: List[VisionProvider], builder) -> None:
p = builder()
if p:
providers.append(p)
@staticmethod
def _resolve_user_vision_model() -> Optional[str]:
"""Read tool.vision.model from config; return None if unset/blank."""
tool_conf = conf().get("tool", {})
if not isinstance(tool_conf, dict):
return None
vision_conf = tool_conf.get("vision", {})
if not isinstance(vision_conf, dict):
return None
m = vision_conf.get("model")
if isinstance(m, str) and m.strip():
return m.strip()
return None
@staticmethod
def _infer_provider_from_model(model_name: str) -> Optional[str]:
"""
Infer the provider display name from a model name's prefix.
Returns None when no rule matches (or for OpenAI-family names, which
are handled separately by the caller).
"""
if not model_name:
return None
lower = model_name.lower()
# Sort by prefix length desc so e.g. "moonshot-" wins over hypothetical "moo-"
for prefix, display_name in sorted(_MODEL_PREFIX_TO_PROVIDER, key=lambda x: -len(x[0])):
if lower.startswith(prefix.lower()):
return display_name
return None
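
The longest-prefix rule used by `_infer_provider_from_model` can be tried in isolation. The sketch below copies a subset of the `_MODEL_PREFIX_TO_PROVIDER` entries into a local `PREFIXES` list; `infer_provider` is a hypothetical free-function version of the static method.

```python
from typing import List, Optional, Tuple

# Subset of the prefix table from the diff, for illustration only.
PREFIXES: List[Tuple[str, str]] = [
    ("doubao-", "Doubao"),
    ("kimi-", "Moonshot"),
    ("moonshot-", "Moonshot"),
    ("qwen", "DashScope"),
    ("glm-", "ZhipuAI"),
]


def infer_provider(model_name: str) -> Optional[str]:
    """Return the provider whose prefix matches, preferring longer prefixes."""
    if not model_name:
        return None
    lower = model_name.lower()
    # Longest prefix first, so a more specific rule beats a shorter one.
    for prefix, provider in sorted(PREFIXES, key=lambda item: -len(item[0])):
        if lower.startswith(prefix):
            return provider
    return None
```

OpenAI-family names such as `gpt-4o` return `None` here, matching the diff, where those prefixes are handled separately through `_OPENAI_MODEL_PREFIXES`.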
def _route_by_model_name(self, user_model: str) -> Optional[List[VisionProvider]]:
"""
Try to build a provider list using the user-specified model name.
Returns:
- [provider, ...] : a rule matched and at least one matching provider is usable
- None : no rule matched, the provider's key is missing, or its bot could
not be created → caller falls through to auto-discovery
"""
lower = user_model.lower()
# OpenAI / LinkAI family
if lower.startswith(_OPENAI_MODEL_PREFIXES):
providers: List[VisionProvider] = []
# Prefer LinkAI when explicitly enabled, else OpenAI first
use_linkai = conf().get("use_linkai", False) and conf().get("linkai_api_key")
if use_linkai:
self._append_provider(providers, lambda: self._build_linkai_provider(user_model))
self._append_provider(providers, lambda: self._build_openai_provider(user_model))
else:
self._append_provider(providers, lambda: self._build_openai_provider(user_model))
self._append_provider(providers, lambda: self._build_linkai_provider(user_model))
if providers:
return providers
logger.warning(f"[Vision] tool.vision.model='{user_model}' looks like an OpenAI "
f"model but neither OPENAI_API_KEY nor LINKAI_API_KEY is configured.")
return None # fall through to auto
# Discoverable native providers (Doubao, Moonshot, etc.)
target_display = self._infer_provider_from_model(user_model)
if not target_display:
return None # unknown prefix → auto
for config_key, bot_type, _default_model, display_name in _DISCOVERABLE_MODELS:
if display_name != target_display:
continue
api_key = conf().get(config_key, "")
if not api_key or not api_key.strip():
logger.warning(f"[Vision] tool.vision.model='{user_model}' routes to "
f"'{display_name}' but '{config_key}' is not configured. "
f"Falling back to auto-discovery.")
return None # fall through to auto
try:
from models.bot_factory import create_bot
bot = create_bot(bot_type)
if not hasattr(bot, 'call_vision'):
logger.warning(f"[Vision] '{display_name}' bot does not implement call_vision.")
return None
except Exception as e:
logger.warning(f"[Vision] Failed to create '{display_name}' bot: {e}")
return None
return [VisionProvider(
name=display_name,
api_key="",
api_base="",
model_override=user_model,
use_bot=True,
fallback_bot=bot,
)]
return None
def _append_other_model_providers(self, providers: List[VisionProvider],
preferred_model: Optional[str] = None) -> None:
"""
Auto-discover other models whose API key is configured.
Skip the main model's own bot_type (already covered by MainModel
provider), unless the main model itself does not support vision —
in that case we still want the vendor's dedicated vision model
as a fallback. Also skip providers whose display name already appears
in the provider list.
If preferred_model matches a provider's family, use it instead
of that provider's hard-coded default model.
"""
main_bot_type = None
main_bot_supports_vision = False
if self.model and hasattr(self.model, '_resolve_bot_type'):
main_bot_type = self.model._resolve_bot_type(conf().get("model", ""))
main_bot = getattr(self.model, "bot", None)
main_bot_supports_vision = self._main_bot_supports_vision(main_bot)
existing_names = {p.name for p in providers}
preferred_provider = self._infer_provider_from_model(preferred_model) if preferred_model else None
for config_key, bot_type, default_model, display_name in _DISCOVERABLE_MODELS:
if display_name in existing_names:
continue
# Same bot_type as the main model is normally handled by the
# MainModel provider; only skip it here if the main model
# actually supports vision. Otherwise fall through and add
# the vendor's dedicated vision model as a fallback.
if bot_type == main_bot_type and main_bot_supports_vision:
continue
api_key = conf().get(config_key, "")
if not api_key or not api_key.strip():
continue
try:
from models.bot_factory import create_bot
bot = create_bot(bot_type)
if not hasattr(bot, 'call_vision'):
continue
except Exception:
continue
model_for_provider = (preferred_model
if preferred_provider == display_name and preferred_model
else default_model)
provider = VisionProvider(
name=display_name,
api_key="",
api_base="",
model_override=model_for_provider,
use_bot=True,
fallback_bot=bot,
)
# Same vendor as the main bot is the most natural fallback when
# the main model itself does not support vision — promote it to
# the front of the list instead of relying on declaration order.
if bot_type == main_bot_type:
providers.insert(0, provider)
else:
providers.append(provider)
def _main_bot_supports_vision(self, bot) -> bool:
"""
Whether the main bot is known to natively support vision.
Having a `call_vision` method is necessary but not sufficient —
some bots implement the method against an endpoint that does not
actually serve vision models, which causes silent failures when a
vendor-foreign model name is forwarded.
Resolution order:
1. If the bot explicitly declares `supports_vision`, trust it.
This lets bots opt in or out based on their own runtime
configuration (e.g. the currently selected model).
2. Otherwise, fall back to a model-name prefix heuristic: trust
call_vision when the main model looks like an OpenAI family
model or matches a known multimodal vendor prefix.
"""
if bot is None:
return False
if hasattr(bot, "supports_vision"):
return bool(getattr(bot, "supports_vision"))
main_model = (conf().get("model") or "").lower()
if not main_model:
return False
if main_model.startswith(_OPENAI_MODEL_PREFIXES):
return True
return self._infer_provider_from_model(main_model) is not None
def _build_main_model_provider(self) -> Optional[VisionProvider]:
"""
Use the vendor's own model for vision via bot.call_vision.
Gated by _main_bot_supports_vision so non-vision bots (DeepSeek, etc.)
do not get routed vendor-foreign model names.
"""
if not (self.model and hasattr(self.model, 'bot')):
return None
try:
return self._call_api(api_key, api_base, model, question, image_content)
except requests.Timeout:
return ToolResult.fail(f"Error: Vision API request timed out after {DEFAULT_TIMEOUT}s")
except requests.ConnectionError:
return ToolResult.fail("Error: Failed to connect to Vision API")
except Exception as e:
logger.error(f"[Vision] Unexpected error: {e}", exc_info=True)
return ToolResult.fail(f"Error: Vision API call failed - {e}")
bot = self.model.bot
except Exception:
return None
if not hasattr(bot, 'call_vision'):
return None
if not self._main_bot_supports_vision(bot):
return None
def _resolve_provider(self) -> Tuple[Optional[str], str]:
"""Resolve API key and base URL. Priority: conf() > env vars."""
# Use the configured main model name; do NOT inject tool.vision.model
# here, because by the time we reach this branch the tool.vision.model
# routing has already been attempted (and either matched the main bot
# or failed to find a provider).
main_model_name = conf().get("model") or None
return VisionProvider(
name=_MAIN_MODEL_PROVIDER_NAME,
api_key="",
api_base="",
model_override=main_model_name,
use_bot=True,
)
def _build_openai_provider(self, preferred_model: Optional[str] = None) -> Optional[VisionProvider]:
api_key = conf().get("open_ai_api_key") or os.environ.get("OPENAI_API_KEY")
if api_key:
api_base = (conf().get("open_ai_api_base") or os.environ.get("OPENAI_API_BASE", "")).rstrip("/") \
or "https://api.openai.com/v1"
return api_key, self._ensure_v1(api_base)
if not api_key:
return None
api_base = (conf().get("open_ai_api_base") or os.environ.get("OPENAI_API_BASE", "")).rstrip("/") \
or "https://api.openai.com/v1"
# Only honor preferred_model when it looks like an OpenAI-family name;
# otherwise the OpenAI endpoint would 400 on a vendor-specific name.
model_override = preferred_model if (
preferred_model and preferred_model.lower().startswith(_OPENAI_MODEL_PREFIXES)
) else None
return VisionProvider(
name="OpenAI",
api_key=api_key,
api_base=self._ensure_v1(api_base),
model_override=model_override,
)
def _build_linkai_provider(self, preferred_model: Optional[str] = None) -> Optional[VisionProvider]:
api_key = conf().get("linkai_api_key") or os.environ.get("LINKAI_API_KEY")
if api_key:
api_base = (conf().get("linkai_api_base") or os.environ.get("LINKAI_API_BASE", "")).rstrip("/") \
or "https://api.link-ai.tech"
logger.debug("[Vision] Using LinkAI API (OPENAI_API_KEY not set)")
return api_key, self._ensure_v1(api_base)
if not api_key:
return None
api_base = (conf().get("linkai_api_base") or os.environ.get("LINKAI_API_BASE", "")).rstrip("/") \
or "https://api.link-ai.tech"
from common.utils import get_cloud_headers
extra = get_cloud_headers(api_key)
extra.pop("Authorization", None)
extra.pop("Content-Type", None)
# LinkAI is a multi-vendor proxy and accepts most model names, so we
# honor any user-configured model name here.
return VisionProvider(
name="LinkAI",
api_key=api_key,
api_base=self._ensure_v1(api_base),
extra_headers=extra,
model_override=preferred_model,
)
return None, ""
def _call_via_bot(self, model: str, question: str, image_content: dict,
provider: Optional[VisionProvider] = None) -> ToolResult:
"""
Call a model's call_vision with vendor-native API format.
Uses the provider's fallback_bot if set, otherwise the main model bot.
Raises VisionAPIError on failure so fallback can proceed.
"""
try:
bot = (provider and provider.fallback_bot) or self.model.bot
except Exception as e:
raise VisionAPIError(f"Cannot access bot: {e}")
# Extract the raw image URL from the OpenAI-format image_content block
image_url = image_content.get("image_url", {}).get("url", "")
if not image_url:
raise VisionAPIError("No image URL in content block")
try:
response = bot.call_vision(
image_url=image_url,
question=question,
model=model,
max_tokens=MAX_TOKENS,
)
except Exception as e:
raise VisionAPIError(f"call_vision failed: {e}")
if response is NotImplemented:
raise VisionAPIError("Bot does not support vision")
if isinstance(response, dict) and response.get("error"):
raise VisionAPIError(f"API error - {response.get('message', 'Unknown')}")
content = response.get("content", "") if isinstance(response, dict) else ""
if not content:
raise VisionAPIError("Empty response from main model")
usage_info = response.get("usage", {}) if isinstance(response, dict) else {}
# Use the actual model name from the bot response if available
actual_model = response.get("model", model) if isinstance(response, dict) else model
provider_name = provider.name if provider else _MAIN_MODEL_PROVIDER_NAME
return ToolResult.success({
"model": actual_model,
"provider": provider_name,
"content": content,
"usage": usage_info,
})
@staticmethod
def _ensure_v1(api_base: str) -> str:
@@ -135,9 +564,13 @@ class Vision(BaseTool):
return api_base.rstrip("/") + "/v1"
def _build_image_content(self, image: str) -> dict:
"""Build the image_url content block for the API request."""
"""
Build the image_url content block.
Both remote URLs and local files are converted to base64 data URLs
so every bot backend can consume them without extra downloads.
"""
if image.startswith(("http://", "https://")):
return {"type": "image_url", "image_url": {"url": image}}
return self._download_to_data_url(image)
if not os.path.isfile(image):
raise FileNotFoundError(f"Image file not found: {image}")
@@ -161,9 +594,22 @@ class Vision(BaseTool):
data_url = f"data:{mime_type};base64,{b64}"
return {"type": "image_url", "image_url": {"url": data_url}}
@staticmethod
def _download_to_data_url(url: str) -> dict:
"""Download a remote image and return it as a base64 data URL."""
resp = requests.get(url, timeout=30)
if resp.status_code != 200:
raise VisionAPIError(f"Failed to download image: HTTP {resp.status_code}")
content_type = resp.headers.get("Content-Type", "image/jpeg").split(";")[0].strip()
if not content_type.startswith("image/"):
content_type = "image/jpeg"
b64 = base64.b64encode(resp.content).decode("ascii")
data_url = f"data:{content_type};base64,{b64}"
return {"type": "image_url", "image_url": {"url": data_url}}
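Review note: the encoding step of `_download_to_data_url` can be exercised without a network call. The sketch below is a minimal illustration under that assumption; `to_data_url` is a hypothetical helper name and is not part of the patch.

```python
import base64


def to_data_url(raw: bytes, content_type_header: str = "") -> str:
    """Encode image bytes as a base64 data URL (hypothetical helper)."""
    # Drop header parameters such as "; charset=binary" and fall back to
    # JPEG when the header is missing or does not name an image type.
    mime = (content_type_header or "image/jpeg").split(";")[0].strip()
    if not mime.startswith("image/"):
        mime = "image/jpeg"
    b64 = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{b64}"


print(to_data_url(b"abc", "image/png; charset=binary"))
```

The fallback to `image/jpeg` mirrors the patch: a server that answers with a non-image `Content-Type` still yields a data URL, so a bad download is only caught by the HTTP status check.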
@staticmethod
def _maybe_compress(path: str) -> str:
"""Compress image if larger than threshold; return path to use."""
"""Compress image to under COMPRESS_THRESHOLD with max long-edge 1536px."""
file_size = os.path.getsize(path)
if file_size <= COMPRESS_THRESHOLD:
return path
@@ -171,33 +617,58 @@ class Vision(BaseTool):
tmp = tempfile.NamedTemporaryFile(suffix=".jpg", delete=False)
tmp.close()
try:
# macOS: use sips
subprocess.run(
["sips", "-Z", "800", path, "--out", tmp.name],
capture_output=True, check=True,
)
logger.debug(f"[Vision] Compressed image ({file_size // 1024}KB -> {os.path.getsize(tmp.name) // 1024}KB)")
return tmp.name
except (FileNotFoundError, subprocess.CalledProcessError):
pass
def _try_sips(max_dim: str, quality: str) -> bool:
try:
subprocess.run(
["sips", "-Z", max_dim, "-s", "formatOptions", quality,
path, "--out", tmp.name],
capture_output=True, check=True,
)
return True
except (FileNotFoundError, subprocess.CalledProcessError):
return False
try:
# Linux: use ImageMagick convert
subprocess.run(
["convert", path, "-resize", "800x800>", tmp.name],
capture_output=True, check=True,
)
logger.debug(f"[Vision] Compressed image ({file_size // 1024}KB -> {os.path.getsize(tmp.name) // 1024}KB)")
def _try_convert(max_dim: str, quality: str) -> bool:
try:
subprocess.run(
["convert", path, "-resize", f"{max_dim}x{max_dim}>",
"-quality", quality, tmp.name],
capture_output=True, check=True,
)
return True
except (FileNotFoundError, subprocess.CalledProcessError):
return False
attempts = [
("1536", "85"),
("1536", "70"),
("1536", "50"),
]
for max_dim, quality in attempts:
ok = _try_sips(max_dim, quality) or _try_convert(max_dim, quality)
if not ok:
continue
new_size = os.path.getsize(tmp.name)
logger.debug(f"[Vision] Compressed image "
f"({file_size // 1024}KB -> {new_size // 1024}KB, "
f"max_dim={max_dim}, q={quality})")
if new_size <= COMPRESS_THRESHOLD:
return tmp.name
if os.path.exists(tmp.name) and os.path.getsize(tmp.name) > 0:
return tmp.name
except (FileNotFoundError, subprocess.CalledProcessError):
pass
os.remove(tmp.name)
return path
def _call_api(self, api_key: str, api_base: str, model: str,
def _call_api(self, provider: VisionProvider, model: str,
question: str, image_content: dict) -> ToolResult:
"""
Call a single provider's Vision API.
Raises VisionAPIError on recoverable failures so the caller can try
the next provider.
"""
payload = {
"model": model,
"messages": [
@@ -209,33 +680,29 @@ class Vision(BaseTool):
],
}
],
"max_tokens": MAX_TOKENS,
}
headers = {
"Authorization": f"Bearer {api_key}",
"Authorization": f"Bearer {provider.api_key}",
"Content-Type": "application/json",
**provider.extra_headers,
}
resp = requests.post(
f"{api_base}/chat/completions",
f"{provider.api_base}/chat/completions",
headers=headers,
json=payload,
timeout=DEFAULT_TIMEOUT,
)
if resp.status_code == 401:
return ToolResult.fail("Error: Invalid API key. Please check your configuration.")
if resp.status_code == 429:
return ToolResult.fail("Error: API rate limit reached. Please try again later.")
if resp.status_code != 200:
return ToolResult.fail(f"Error: Vision API returned HTTP {resp.status_code}: {resp.text[:200]}")
raise VisionAPIError(f"HTTP {resp.status_code}: {resp.text[:200]}")
data = resp.json()
if "error" in data:
msg = data["error"].get("message", "Unknown API error")
return ToolResult.fail(f"Error: Vision API error - {msg}")
raise VisionAPIError(f"API error - {msg}")
content = ""
choices = data.get("choices", [])
@@ -245,6 +712,7 @@ class Vision(BaseTool):
usage = data.get("usage", {})
result = {
"model": model,
"provider": provider.name,
"content": content,
"usage": {
"prompt_tokens": usage.get("prompt_tokens", 0),


@@ -225,10 +225,8 @@ class WebSearch(BaseTool):
api_base = conf().get("linkai_api_base", "https://api.link-ai.tech")
url = f"{api_base.rstrip('/')}/v1/plugin/execute"
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {api_key}"
}
from common.utils import get_cloud_headers
headers = get_cloud_headers(api_key)
payload = {
"code": "web-search",

app.py

@@ -78,7 +78,13 @@ class ChannelManager:
if first_start:
PluginManager().load_plugins()
if conf().get("use_linkai"):
# Cloud client is optional. It is only started when
# use_linkai=True AND cloud_deployment_id is set.
# By default neither is configured, so the app runs
# entirely locally without any remote connection.
if conf().get("use_linkai") and (
os.environ.get("CLOUD_DEPLOYMENT_ID") or conf().get("cloud_deployment_id")
):
try:
from common import cloud_client
threading.Thread(
@@ -229,6 +235,8 @@ def _clear_singleton_cache(channel_name: str):
const.DINGTALK: "channel.dingtalk.dingtalk_channel.DingTalkChanel",
const.WECOM_BOT: "channel.wecom_bot.wecom_bot_channel.WecomBotChannel",
const.QQ: "channel.qq.qq_channel.QQChannel",
const.WEIXIN: "channel.weixin.weixin_channel.WeixinChannel",
"wx": "channel.weixin.weixin_channel.WeixinChannel",
}
module_path = cls_map.get(channel_name)
if not module_path:
@@ -266,6 +274,39 @@ def sigterm_handler_wrap(_signo):
signal.signal(_signo, func)
def _sync_builtin_skills():
"""Sync builtin skills from project skills/ to workspace skills/ on startup."""
import shutil
try:
workspace = conf().get("agent_workspace", "~/cow")
workspace = os.path.expanduser(workspace)
project_root = os.path.dirname(os.path.abspath(__file__))
builtin_dir = os.path.join(project_root, "skills")
custom_dir = os.path.join(workspace, "skills")
if not os.path.isdir(builtin_dir):
return
os.makedirs(custom_dir, exist_ok=True)
synced = 0
for name in os.listdir(builtin_dir):
src = os.path.join(builtin_dir, name)
if not os.path.isdir(src) or not os.path.isfile(os.path.join(src, "SKILL.md")):
continue
dst = os.path.join(custom_dir, name)
try:
if os.path.isdir(dst):
shutil.rmtree(dst)
shutil.copytree(src, dst)
synced += 1
except Exception as e:
logger.warning(f"[App] Failed to sync builtin skill '{name}': {e}")
if synced:
logger.info(f"[App] Synced {synced} builtin skill(s) to workspace")
except Exception as e:
logger.warning(f"[App] Builtin skills sync failed: {e}")
def run():
global _channel_mgr
try:
@@ -291,6 +332,9 @@ def run():
if web_console_enabled and "web" not in channel_names:
channel_names.append("web")
# Sync builtin skills to workspace before channels start
_sync_builtin_skills()
logger.info(f"[App] Starting channels: {channel_names}")
_channel_mgr = ChannelManager()


@@ -14,6 +14,7 @@ from bridge.reply import Reply, ReplyType
from common import const
from common.log import logger
from common.utils import expand_path
from config import conf
from models.openai_compatible_bot import OpenAICompatibleBot
@@ -67,18 +68,19 @@ class AgentLLMModel(LLMModel):
_MODEL_BOT_TYPE_MAP = {
"wenxin": const.BAIDU, "wenxin-4": const.BAIDU,
"xunfei": const.XUNFEI, const.QWEN: const.QWEN,
"xunfei": const.XUNFEI, const.QWEN: const.QWEN_DASHSCOPE,
const.QIANFAN: const.QIANFAN,
const.MODELSCOPE: const.MODELSCOPE,
}
_MODEL_PREFIX_MAP = [
("qwen", const.QWEN_DASHSCOPE), ("qwq", const.QWEN_DASHSCOPE), ("qvq", const.QWEN_DASHSCOPE),
("gemini", const.GEMINI), ("glm", const.ZHIPU_AI), ("claude", const.CLAUDEAPI),
("moonshot", const.MOONSHOT), ("kimi", const.MOONSHOT),
("doubao", const.DOUBAO),
("doubao", const.DOUBAO), ("deepseek", const.DEEPSEEK),
("ernie", const.QIANFAN),
]
def __init__(self, bridge: Bridge, bot_type: str = "chat"):
from config import conf
super().__init__(model=conf().get("model", const.GPT_41))
self.bridge = bridge
self.bot_type = bot_type
@@ -87,7 +89,6 @@ class AgentLLMModel(LLMModel):
@property
def model(self):
from config import conf
return conf().get("model", const.GPT_41)
@model.setter
@@ -96,8 +97,6 @@ class AgentLLMModel(LLMModel):
def _resolve_bot_type(self, model_name: str) -> str:
"""Resolve bot type from model name, matching Bridge.__init__ logic."""
from config import conf
if conf().get("use_linkai", False) and conf().get("linkai_api_key"):
return const.LINKAI
# Support custom bot type configuration
@@ -106,7 +105,7 @@ class AgentLLMModel(LLMModel):
return configured_bot_type
if not model_name or not isinstance(model_name, str):
return const.CHATGPT
return const.OPENAI
if model_name in self._MODEL_BOT_TYPE_MAP:
return self._MODEL_BOT_TYPE_MAP[model_name]
if model_name.lower().startswith("minimax") or model_name in ["abab6.5-chat"]:
@@ -115,23 +114,25 @@ class AgentLLMModel(LLMModel):
return const.QWEN_DASHSCOPE
if model_name in [const.MOONSHOT, "moonshot-v1-8k", "moonshot-v1-32k", "moonshot-v1-128k"]:
return const.MOONSHOT
if model_name in [const.DEEPSEEK_CHAT, const.DEEPSEEK_REASONER]:
return const.CHATGPT
if conf().get("bot_type") == "modelscope":
return const.MODELSCOPE
lowered_model = model_name.lower()
for prefix, btype in self._MODEL_PREFIX_MAP:
if model_name.startswith(prefix):
if lowered_model.startswith(prefix):
return btype
return const.CHATGPT
return const.OPENAI
@property
def bot(self):
"""Lazy load the bot, re-create when model changes"""
"""Lazy load the bot, re-create when model or bot_type changes"""
from models.bot_factory import create_bot
cur_model = self.model
if self._bot is None or self._bot_model != cur_model:
bot_type = self._resolve_bot_type(cur_model)
self._bot = create_bot(bot_type)
cur_bot_type = self._resolve_bot_type(cur_model)
if self._bot is None or self._bot_model != cur_model or getattr(self, '_bot_type', None) != cur_bot_type:
self._bot = create_bot(cur_bot_type)
self._bot = add_openai_compatible_support(self._bot)
self._bot_model = cur_model
self._bot_type = cur_bot_type
return self._bot
def call(self, request: LLMRequest):
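Review note: the hunk above switches the prefix lookup in `_resolve_bot_type` to a lowercased model name and changes the fallback from `const.CHATGPT` to `const.OPENAI`. A minimal sketch of that lookup follows; the string values stand in for the `const.*` constants and are assumptions, as is the function name `resolve_bot_type`.

```python
# Hypothetical stand-ins for the const.* values used in the patch.
PREFIX_MAP = [
    ("qwen", "dashscope"),
    ("deepseek", "deepseek"),
    ("ernie", "qianfan"),
]
DEFAULT_BOT_TYPE = "openai"


def resolve_bot_type(model_name) -> str:
    """Return a bot type for a model name, matching prefixes case-insensitively."""
    if not model_name or not isinstance(model_name, str):
        return DEFAULT_BOT_TYPE
    lowered = model_name.lower()
    for prefix, bot_type in PREFIX_MAP:
        if lowered.startswith(prefix):
            return bot_type
    return DEFAULT_BOT_TYPE


print(resolve_bot_type("ERNIE-5.0"))
```

Before the change a name such as `ERNIE-5.0` would have missed the lowercase `ernie` prefix and fallen through to the default bot type.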
@@ -152,12 +153,30 @@ class AgentLLMModel(LLMModel):
# Only pass max_tokens if it's explicitly set
if request.max_tokens is not None:
kwargs['max_tokens'] = request.max_tokens
# Extract system prompt if present
system_prompt = getattr(request, 'system', None)
if system_prompt:
kwargs['system'] = system_prompt
# Pass context metadata to bot
channel_type = getattr(self, 'channel_type', None) or ''
if channel_type:
kwargs['channel_type'] = channel_type
session_id = getattr(self, 'session_id', None)
if session_id:
kwargs['session_id'] = session_id
# Thinking mode is a global toggle independent of the channel.
# IM channels (WeChat/WeCom/DingTalk/Feishu) won't render the
# reasoning trace, but still benefit from the higher answer
# quality the thinking pass produces.
from config import conf
kwargs['thinking'] = (
{"type": "enabled"} if conf().get("enable_thinking", False)
else {"type": "disabled"}
)
response = self.bot.call_with_tools(**kwargs)
return self._format_response(response)
else:
@@ -195,10 +214,23 @@ class AgentLLMModel(LLMModel):
if system_prompt:
kwargs['system'] = system_prompt
# Pass channel_type for linkai tracking
channel_type = getattr(self, 'channel_type', None)
# Pass context metadata to bot
channel_type = getattr(self, 'channel_type', None) or ''
if channel_type:
kwargs['channel_type'] = channel_type
session_id = getattr(self, 'session_id', None)
if session_id:
kwargs['session_id'] = session_id
# Thinking mode is a global toggle independent of the channel.
# IM channels (WeChat/WeCom/DingTalk/Feishu) won't render the
# reasoning trace, but still benefit from the higher answer
# quality the thinking pass produces.
from config import conf
kwargs['thinking'] = (
{"type": "enabled"} if conf().get("enable_thinking", False)
else {"type": "disabled"}
)
stream = self.bot.call_with_tools(**kwargs)
@@ -262,10 +294,13 @@ class AgentBridge:
tool_manager.load_tools()
tools = []
workspace_dir = kwargs.get("workspace_dir")
for tool_name in tool_manager.tool_classes.keys():
try:
tool = tool_manager.create_tool(tool_name)
if tool:
if workspace_dir and hasattr(tool, 'cwd'):
tool.cwd = workspace_dir
tools.append(tool)
except Exception as e:
logger.warning(f"[AgentBridge] Failed to load tool {tool_name}: {e}")
@@ -375,13 +410,26 @@ class AgentBridge:
logger.warning(f"[AgentBridge] Failed to attach context to scheduler: {e}")
break
# Pass channel_type to model so linkai requests carry it
# Pass context metadata to model for downstream API requests
if context and hasattr(agent, 'model'):
agent.model.channel_type = context.get("channel_type", "")
agent.model.session_id = session_id or ""
# Store session_id on agent so executor can clear DB on fatal errors
agent._current_session_id = session_id
# Bound the in-memory context for scheduler sessions before each run.
# Scheduler sessions are stable per-task and append every trigger,
# so without trimming they would grow unbounded across runs and
# blow up prompt cost. Regular user chats are not touched here —
# the agent's own context manager handles that path.
if session_id and session_id.startswith("scheduler_"):
from config import conf
scheduler_keep_turns = max(
1, int(conf().get("agent_max_context_turns", 20)) // 5
)
self._trim_in_memory_to_turns(agent, scheduler_keep_turns)
try:
# Use agent's run_stream method with event handler
response = agent.run_stream(
@@ -414,7 +462,7 @@ class AgentBridge:
except Exception as e:
logger.warning(f"[AgentBridge] Failed to clear DB after recovery: {e}")
# Check if there are files to send (from read tool)
# Check if there are files to send (from send/read tool)
if hasattr(agent, 'stream_executor') and hasattr(agent.stream_executor, 'files_to_send'):
files_to_send = agent.stream_executor.files_to_send
if files_to_send:
@@ -483,22 +531,26 @@ class AgentBridge:
reply.text_content = text_response
return reply
# For other unknown file types, return text with file info
message = text_response or file_info.get("message", "文件已准备")
message += f"\n\n[文件: {file_info.get('file_name', file_path)}]"
return Reply(ReplyType.TEXT, message)
# For all other file types (tar.gz, zip, etc.), also use FILE type
file_url = f"file://{file_path}"
logger.info(f"[AgentBridge] Sending generic file: {file_url}")
reply = Reply(ReplyType.FILE, file_url)
reply.file_name = file_info.get("file_name", os.path.basename(file_path))
if text_response:
reply.text_content = text_response
return reply
def _migrate_config_to_env(self, workspace_root: str):
"""
Migrate API keys from config.json to .env file if not already set
Sync API keys from config.json to .env file.
Adds new keys and updates changed values on each startup.
Args:
workspace_root: Workspace directory path (not used, kept for compatibility)
"""
from config import conf
import os
# Mapping from config.json keys to environment variable names
key_mapping = {
"open_ai_api_key": "OPENAI_API_KEY",
"open_ai_api_base": "OPENAI_API_BASE",
@@ -507,10 +559,9 @@ class AgentBridge:
"linkai_api_key": "LINKAI_API_KEY",
}
# Use fixed secure location for .env file
env_file = expand_path("~/.cow/.env")
# Read existing env vars from .env file
# Read existing env vars (key -> value)
existing_env_vars = {}
if os.path.exists(env_file):
try:
@@ -518,48 +569,46 @@ class AgentBridge:
for line in f:
line = line.strip()
if line and not line.startswith('#') and '=' in line:
key, _ = line.split('=', 1)
existing_env_vars[key.strip()] = True
key, val = line.split('=', 1)
existing_env_vars[key.strip()] = val.strip()
except Exception as e:
logger.warning(f"[AgentBridge] Failed to read .env file: {e}")
# Check which keys need to be migrated
keys_to_migrate = {}
# Sync config.json values into .env (add/update/remove)
updated = False
for config_key, env_key in key_mapping.items():
# Skip if already in .env file
if env_key in existing_env_vars:
continue
# Get value from config.json
value = conf().get(config_key, "")
if value and value.strip(): # Only migrate non-empty values
keys_to_migrate[env_key] = value.strip()
# Log summary if there are keys to skip
if existing_env_vars:
logger.debug(f"[AgentBridge] {len(existing_env_vars)} env vars already in .env")
# Write new keys to .env file
if keys_to_migrate:
raw = conf().get(config_key, "")
value = raw.strip() if raw else ""
old_value = existing_env_vars.get(env_key)
if value:
if old_value == value:
continue
existing_env_vars[env_key] = value
os.environ[env_key] = value
updated = True
else:
if old_value is None:
continue
existing_env_vars.pop(env_key, None)
os.environ.pop(env_key, None)
updated = True
updated = True
if updated:
try:
# Ensure ~/.cow directory and .env file exist
env_dir = os.path.dirname(env_file)
if not os.path.exists(env_dir):
os.makedirs(env_dir, exist_ok=True)
if not os.path.exists(env_file):
open(env_file, 'a').close()
# Append new keys
with open(env_file, 'a', encoding='utf-8') as f:
f.write('\n# Auto-migrated from config.json\n')
for key, value in keys_to_migrate.items():
os.makedirs(env_dir, exist_ok=True)
with open(env_file, 'w', encoding='utf-8') as f:
f.write('# Environment variables for agent\n')
f.write('# Auto-managed - synced from config.json on startup\n\n')
for key, value in sorted(existing_env_vars.items()):
f.write(f'{key}={value}\n')
# Also set in current process
os.environ[key] = value
logger.info(f"[AgentBridge] Migrated {len(keys_to_migrate)} API keys from config.json to .env: {list(keys_to_migrate.keys())}")
logger.info(f"[AgentBridge] Synced API keys from config.json to .env")
except Exception as e:
logger.warning(f"[AgentBridge] Failed to migrate API keys: {e}")
logger.warning(f"[AgentBridge] Failed to sync API keys: {e}")
def _persist_messages(
self, session_id: str, new_messages: list, channel_type: str = ""
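Review note: `_migrate_config_to_env` now adds, updates and removes keys instead of only appending missing ones. The sketch below isolates that merge as a pure function so it can be checked without touching `~/.cow/.env`; `sync_env` and `render_env` are hypothetical names, and the `isinstance` guard is a small deviation from the patch, which uses `raw.strip() if raw else ""`.

```python
def sync_env(existing: dict, config: dict, key_mapping: dict):
    """Return (merged_env, changed) after syncing config values into env vars."""
    merged = dict(existing)
    changed = False
    for config_key, env_key in key_mapping.items():
        raw = config.get(config_key, "")
        value = raw.strip() if isinstance(raw, str) else ""
        old = merged.get(env_key)
        if value:
            # Add a new key or overwrite a changed value.
            if old != value:
                merged[env_key] = value
                changed = True
        elif old is not None:
            # The key was cleared in config, so drop it from the env as well.
            del merged[env_key]
            changed = True
    return merged, changed


def render_env(env: dict) -> str:
    """Serialize env vars as sorted KEY=value lines."""
    return "".join(f"{key}={value}\n" for key, value in sorted(env.items()))


mapping = {"open_ai_api_key": "OPENAI_API_KEY", "linkai_api_key": "LINKAI_API_KEY"}
existing = {"OPENAI_API_KEY": "old", "LINKAI_API_KEY": "x", "OTHER": "1"}
merged, changed = sync_env(existing, {"open_ai_api_key": " new "}, mapping)
print(changed, render_env(merged))
```

Keys outside the mapping (`OTHER` here) survive the rewrite. As the hunk reads, the file is reopened in `'w'` mode and comment lines are skipped while reading, so hand-written comments in `.env` would not be carried over.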
@@ -575,18 +624,245 @@ class AgentBridge:
from config import conf
if not conf().get("conversation_persistence", True):
return
# When deep-thinking display is disabled, strip "thinking" content
# blocks before persisting so they don't resurface on history reload.
# The in-memory message list keeps them intact for this run's
# multi-turn LLM context.
thinking_enabled = bool(conf().get("enable_thinking", False))
except Exception:
pass
thinking_enabled = False
messages_to_store = new_messages
if not thinking_enabled:
messages_to_store = self._strip_thinking_blocks(new_messages)
try:
from agent.memory import get_conversation_store
get_conversation_store().append_messages(
session_id, new_messages, channel_type=channel_type
session_id, messages_to_store, channel_type=channel_type
)
except Exception as e:
logger.warning(
f"[AgentBridge] Failed to persist messages for session={session_id}: {e}"
)
# Marker used to identify scheduler-injected user messages so we can apply
# a sliding window without touching real user turns. The legacy prefix
# "Scheduled task" (written by the v2 PR) is also recognised when pruning,
# so old data can be aged out instead of leaking forever.
_SCHEDULED_MARKER = "[SCHEDULED]"
_SCHEDULED_LEGACY_MARKERS = ("Scheduled task",)
def remember_scheduled_output(
self,
session_id: str,
content: str,
channel_type: str = "",
task_description: str = "",
) -> None:
"""Add the visible output of a scheduled task to the receiver's session.
Scheduled task execution uses an isolated session so internal planning and
tool calls do not leak into the user's chat. The final message is still
part of the conversation from the user's point of view, so keep a small
visible turn in the receiver session for follow-up questions.
Configuration:
scheduler_inject_to_session (bool, default True):
Master switch. When False, this method is a no-op.
scheduler_inject_max_per_session (int, default 3):
Maximum scheduler-injected user/assistant pairs retained per
session. Older injections are pruned automatically.
Content is truncated to 2000 chars to prevent a single high-volume task
from bloating one entry.
"""
from config import conf
if not conf().get("scheduler_inject_to_session", True):
return
if not session_id or not content:
return
max_len = 2000
if len(content) > max_len:
content = content[:max_len] + "..."
user_text = self._SCHEDULED_MARKER
if task_description:
user_text = f"{self._SCHEDULED_MARKER} {task_description}"
messages = [
{"role": "user", "content": [{"type": "text", "text": user_text}]},
{"role": "assistant", "content": [{"type": "text", "text": content}]},
]
# Persist first so the new pair gets a stable seq, then prune old
# scheduler pairs in DB, then sync the in-memory agent.messages buffer.
self._persist_messages(session_id, messages, channel_type)
keep_last_n = max(int(conf().get("scheduler_inject_max_per_session", 3) or 0), 0)
try:
from agent.memory import get_conversation_store
deleted = get_conversation_store().prune_scheduled_messages(
session_id, keep_last_n=keep_last_n
)
if deleted:
logger.debug(
f"[AgentBridge] Pruned {deleted} old scheduler messages "
f"for session={session_id} (keep_last_n={keep_last_n})"
)
except Exception as e:
logger.warning(
f"[AgentBridge] Failed to prune scheduled messages "
f"for session={session_id}: {e}"
)
agent = self.agents.get(session_id)
if agent:
try:
with agent.messages_lock:
agent.messages.extend(messages)
self._prune_scheduled_in_memory(agent, keep_last_n)
except Exception as e:
logger.warning(
f"[AgentBridge] Failed to update in-memory scheduled output "
f"for session={session_id}: {e}"
)
@staticmethod
def _trim_in_memory_to_turns(agent, keep_turns: int) -> None:
"""Bound ``agent.messages`` to the most recent ``keep_turns`` real
user/assistant turns, dropping older history together with any
intermediate tool_use/tool_result blocks that belonged to it.
A "real" user message is any user message whose content is not solely a
tool_result block — matches the heuristic used elsewhere when filtering
history (see ``AgentInitializer._filter_text_only_messages``).
No-op when the session is already within budget. Caller does not need
to hold the lock; this method acquires it itself.
"""
if keep_turns <= 0:
return
def _is_real_user(msg) -> bool:
if not isinstance(msg, dict) or msg.get("role") != "user":
return False
content = msg.get("content")
if isinstance(content, list):
if any(
isinstance(b, dict) and b.get("type") == "tool_result"
for b in content
):
return False
return any(
isinstance(b, dict) and b.get("type") == "text" and b.get("text")
for b in content
)
if isinstance(content, str):
return bool(content.strip())
return False
with agent.messages_lock:
msgs = agent.messages
real_user_indices = [i for i, m in enumerate(msgs) if _is_real_user(m)]
if len(real_user_indices) <= keep_turns:
return
# Cut at the (k-th from the end) real user message; keep everything
# from there onwards so the surviving slice is still a valid
# user/assistant sequence.
cut_idx = real_user_indices[-keep_turns]
if cut_idx == 0:
return
kept = msgs[cut_idx:]
msgs.clear()
msgs.extend(kept)
logger.debug(
f"[AgentBridge] Trimmed in-memory messages to last "
f"{keep_turns} turns ({len(kept)} messages remain)"
)
@classmethod
def _prune_scheduled_in_memory(cls, agent, keep_last_n: int) -> None:
"""Mirror conversation_store.prune_scheduled_messages on agent.messages.
Caller must hold ``agent.messages_lock``.
"""
if keep_last_n < 0:
keep_last_n = 0
markers = (cls._SCHEDULED_MARKER,) + cls._SCHEDULED_LEGACY_MARKERS
def _is_marker_user(msg) -> bool:
if not isinstance(msg, dict) or msg.get("role") != "user":
return False
content = msg.get("content")
text = ""
if isinstance(content, str):
text = content
elif isinstance(content, list):
for block in content:
if isinstance(block, dict) and block.get("type") == "text":
text = block.get("text", "")
break
return any(text.startswith(m) for m in markers)
msgs = agent.messages
pair_indices = [] # list of (user_idx, assistant_idx_or_None)
for idx, msg in enumerate(msgs):
if not _is_marker_user(msg):
continue
assistant_idx = None
if idx + 1 < len(msgs):
nxt = msgs[idx + 1]
if isinstance(nxt, dict) and nxt.get("role") == "assistant":
assistant_idx = idx + 1
pair_indices.append((idx, assistant_idx))
if len(pair_indices) <= keep_last_n:
return
to_drop = pair_indices[: len(pair_indices) - keep_last_n]
drop_set = set()
for u_idx, a_idx in to_drop:
drop_set.add(u_idx)
if a_idx is not None:
drop_set.add(a_idx)
# Rebuild the list in place to keep external references stable.
kept = [m for i, m in enumerate(msgs) if i not in drop_set]
msgs.clear()
msgs.extend(kept)
@staticmethod
def _strip_thinking_blocks(messages: list) -> list:
"""Return a shallow copy of messages with assistant "thinking" blocks removed."""
cleaned = []
for msg in messages:
if not isinstance(msg, dict):
cleaned.append(msg)
continue
if msg.get("role") != "assistant":
cleaned.append(msg)
continue
content = msg.get("content")
if not isinstance(content, list):
cleaned.append(msg)
continue
filtered_blocks = [
b for b in content
if not (isinstance(b, dict) and b.get("type") == "thinking")
]
if len(filtered_blocks) == len(content):
cleaned.append(msg)
else:
new_msg = dict(msg)
new_msg["content"] = filtered_blocks
cleaned.append(new_msg)
return cleaned
def clear_session(self, session_id: str):
"""
Clear a specific session's agent and conversation history
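Review note: the cut-point logic of `_trim_in_memory_to_turns` is easier to follow on a plain list. The following is a sketch only: it omits `agent.messages_lock` and logging, and `trim_to_turns` and `is_real_user` are hypothetical module-level names.

```python
def is_real_user(msg) -> bool:
    """True for user messages that carry text rather than only a tool_result."""
    if not isinstance(msg, dict) or msg.get("role") != "user":
        return False
    content = msg.get("content")
    if isinstance(content, str):
        return bool(content.strip())
    if isinstance(content, list):
        if any(isinstance(b, dict) and b.get("type") == "tool_result" for b in content):
            return False
        return any(
            isinstance(b, dict) and b.get("type") == "text" and b.get("text")
            for b in content
        )
    return False


def trim_to_turns(messages: list, keep_turns: int) -> None:
    """Keep only the last keep_turns real user turns, editing the list in place."""
    if keep_turns <= 0:
        return
    starts = [i for i, m in enumerate(messages) if is_real_user(m)]
    if len(starts) <= keep_turns:
        return
    # Cut at a real user message so the surviving slice starts a full turn.
    messages[:] = messages[starts[-keep_turns]:]


history = [
    {"role": "user", "content": "q1"},
    {"role": "assistant", "content": "a1"},
    {"role": "user", "content": "q2"},
    {"role": "assistant", "content": [{"type": "tool_use", "id": "t1"}]},
    {"role": "user", "content": [{"type": "tool_result", "tool_use_id": "t1"}]},
    {"role": "assistant", "content": "a2"},
    {"role": "user", "content": "q3"},
    {"role": "assistant", "content": "a3"},
]
trim_to_turns(history, keep_turns=2)
print(len(history), history[0]["content"])
```

Because the cut always lands on a real user message, the `tool_use` and `tool_result` pair that belongs to the second turn stays together with it.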
@@ -672,4 +948,4 @@ class AgentBridge:
agent.tools = [t for t in agent.tools if t.name != "web_search"]
logger.info("[AgentBridge] web_search tool removed (API key no longer available)")
except Exception as e:
logger.debug(f"[AgentBridge] Failed to refresh conditional tools: {e}")
logger.debug(f"[AgentBridge] Failed to refresh conditional tools: {e}")
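Review note: `_strip_thinking_blocks` is described as returning a shallow copy, which matters because the in-memory list must keep its thinking blocks for the current run. The sketch below restates the function in condensed form, under the hypothetical name `strip_thinking_blocks`, to show that the input is left untouched.

```python
def strip_thinking_blocks(messages: list) -> list:
    """Return a new list with assistant "thinking" blocks removed."""
    cleaned = []
    for msg in messages:
        content = msg.get("content") if isinstance(msg, dict) else None
        if (
            not isinstance(msg, dict)
            or msg.get("role") != "assistant"
            or not isinstance(content, list)
        ):
            cleaned.append(msg)
            continue
        kept = [
            b for b in content
            if not (isinstance(b, dict) and b.get("type") == "thinking")
        ]
        # Reuse the original dict when nothing was removed; copy otherwise.
        cleaned.append(msg if len(kept) == len(content) else {**msg, "content": kept})
    return cleaned


original = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": [
        {"type": "thinking", "thinking": "draft"},
        {"type": "text", "text": "hello"},
    ]},
]
stored = strip_thinking_blocks(original)
print(len(original[1]["content"]), len(stored[1]["content"]))
```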


@@ -26,8 +26,7 @@ class AgentEventHandler:
if context:
self.channel = context.kwargs.get("channel") if hasattr(context, "kwargs") else None
# Track current thinking for channel output
self.current_thinking = ""
self.current_content = ""
self.turn_number = 0
def handle_event(self, event):
@@ -47,6 +46,8 @@ class AgentEventHandler:
self._handle_message_update(data)
elif event_type == "message_end":
self._handle_message_end(data)
elif event_type == "reasoning_update":
pass
elif event_type == "tool_execution_start":
self._handle_tool_execution_start(data)
elif event_type == "tool_execution_end":
@@ -59,30 +60,26 @@ class AgentEventHandler:
def _handle_turn_start(self, data):
"""Handle turn start event"""
self.turn_number = data.get("turn", 0)
self.has_tool_calls_in_turn = False
self.current_thinking = ""
self.current_content = ""
def _handle_message_update(self, data):
"""Handle message update event (streaming text)"""
"""Handle message update event (streaming content text)"""
delta = data.get("delta", "")
self.current_thinking += delta
self.current_content += delta
def _handle_message_end(self, data):
"""Handle message end event"""
tool_calls = data.get("tool_calls", [])
# Only send thinking process if followed by tool calls
if tool_calls:
if self.current_thinking.strip():
logger.info(f"💭 {self.current_thinking.strip()[:200]}{'...' if len(self.current_thinking) > 200 else ''}")
# Send thinking process to channel
self._send_to_channel(f"{self.current_thinking.strip()}")
if self.current_content.strip():
logger.info(f"💭 {self.current_content.strip()[:200]}{'...' if len(self.current_content) > 200 else ''}")
self._send_to_channel(self.current_content.strip())
else:
# No tool calls = final response (logged at agent_stream level)
if self.current_thinking.strip():
logger.debug(f"💬 {self.current_thinking.strip()[:200]}{'...' if len(self.current_thinking) > 200 else ''}")
if self.current_content.strip():
logger.debug(f"💬 {self.current_content.strip()[:200]}{'...' if len(self.current_content) > 200 else ''}")
self.current_thinking = ""
self.current_content = ""
def _handle_tool_execution_start(self, data):
"""Handle tool execution start event - logged by agent_stream.py"""


@@ -144,7 +144,15 @@ class AgentInitializer:
from agent.memory import get_conversation_store
store = get_conversation_store()
max_turns = conf().get("agent_max_context_turns", 20)
restore_turns = max(3, max_turns // 6)
# Scheduler tasks run on a stable isolated session per task and
# can fire many times a day; a smaller restore window keeps prompt
# cost bounded while still letting the agent see "last few" runs
# for trend / dedup style logic. Regular chat sessions keep the
# original heuristic so user dialogues feel continuous.
if session_id.startswith("scheduler_"):
restore_turns = max(1, max_turns // 5)
else:
restore_turns = max(3, max_turns // 6)
saved = store.load_messages(session_id, max_turns=restore_turns)
if saved:
filtered = self._filter_text_only_messages(saved)
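Review note: the two restore-window formulas in the hunk above are worth evaluating side by side. The sketch below only reproduces the arithmetic; `restore_turns` is a hypothetical function name and the session ids are made up.

```python
def restore_turns(session_id: str, max_turns: int) -> int:
    """Number of turns restored from the store, per the two branches in the hunk."""
    if session_id.startswith("scheduler_"):
        return max(1, max_turns // 5)
    return max(3, max_turns // 6)


for max_turns in (5, 10, 20, 30):
    print(
        max_turns,
        restore_turns("scheduler_demo", max_turns),
        restore_turns("chat_demo", max_turns),
    )
```

With the default `agent_max_context_turns` of 20, the scheduler branch restores 4 turns and the regular branch 3. The scheduler window is the smaller of the two only when the setting is below 15, which may be worth checking against the intent stated in the code comment.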
@@ -366,7 +374,7 @@ class AgentInitializer:
if tool:
# Apply workspace config to file operation tools
if tool_name in ['read', 'write', 'edit', 'bash', 'grep', 'find', 'ls', 'web_fetch']:
if tool_name in ['read', 'write', 'edit', 'bash', 'grep', 'find', 'ls', 'web_fetch', 'send', 'browser']:
tool.config = file_config
tool.cwd = file_config.get("cwd", getattr(tool, 'cwd', None))
if 'memory_manager' in file_config:
@@ -465,8 +473,12 @@ class AgentInitializer:
'timezone': timezone_name
}
def get_model():
"""Get current model name dynamically from config"""
return conf().get("model", "unknown")
return {
"model": conf().get("model", "unknown"),
"_get_model": get_model,
"workspace": workspace_root,
"channel": ", ".join(conf().get("channel_type")) if isinstance(conf().get("channel_type"), list) else conf().get("channel_type", "unknown"),
"_get_current_time": get_current_time # Dynamic time function
@@ -486,7 +498,7 @@ class AgentInitializer:
env_file = expand_path("~/.cow/.env")
# Read existing env vars
# Read existing env vars (key -> value)
existing_env_vars = {}
if os.path.exists(env_file):
try:
@@ -494,38 +506,46 @@ class AgentInitializer:
for line in f:
line = line.strip()
if line and not line.startswith('#') and '=' in line:
key, _ = line.split('=', 1)
existing_env_vars[key.strip()] = True
key, val = line.split('=', 1)
existing_env_vars[key.strip()] = val.strip()
except Exception as e:
logger.warning(f"[AgentInitializer] Failed to read .env file: {e}")
# Check which keys need migration
keys_to_migrate = {}
# Sync config.json values into .env (add/update/remove)
updated = False
for config_key, env_key in key_mapping.items():
if env_key in existing_env_vars:
continue
value = conf().get(config_key, "")
if value and value.strip():
keys_to_migrate[env_key] = value.strip()
# Write new keys
if keys_to_migrate:
raw = conf().get(config_key, "")
value = raw.strip() if raw else ""
old_value = existing_env_vars.get(env_key)
if value:
if old_value == value:
continue
existing_env_vars[env_key] = value
os.environ[env_key] = value
updated = True
else:
if old_value is None:
continue
existing_env_vars.pop(env_key, None)
os.environ.pop(env_key, None)
updated = True
if updated:
try:
env_dir = os.path.dirname(env_file)
if not os.path.exists(env_dir):
os.makedirs(env_dir, exist_ok=True)
if not os.path.exists(env_file):
open(env_file, 'a').close()
with open(env_file, 'a', encoding='utf-8') as f:
f.write('\n# Auto-migrated from config.json\n')
for key, value in keys_to_migrate.items():
os.makedirs(env_dir, exist_ok=True)
# Rewrite the entire .env file to ensure consistency
with open(env_file, 'w', encoding='utf-8') as f:
f.write('# Environment variables for agent\n')
f.write('# Auto-managed - synced from config.json on startup\n\n')
for key, value in sorted(existing_env_vars.items()):
f.write(f'{key}={value}\n')
os.environ[key] = value
logger.info(f"[AgentInitializer] Migrated {len(keys_to_migrate)} API keys to .env: {list(keys_to_migrate.keys())}")
logger.info(f"[AgentInitializer] Synced API keys from config.json to .env")
except Exception as e:
logger.warning(f"[AgentInitializer] Failed to migrate API keys: {e}")
logger.warning(f"[AgentInitializer] Failed to sync API keys: {e}")
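Review note: the sync loop above applies three rules per mapped key. It writes the key when config.json holds a non-empty value that differs from .env, removes it when config.json no longer has a value, and skips it otherwise. A minimal, side-effect-free sketch of those rules follows. The function name `sync_env_vars` and the sample keys are hypothetical, and unlike the real method it touches neither `os.environ` nor the file.

```python
def sync_env_vars(existing, config_values, key_mapping):
    """Return (merged_env, updated) using the add/update/remove rules of the diff."""
    merged = dict(existing)  # work on a copy; the caller's dict is left alone
    updated = False
    for config_key, env_key in key_mapping.items():
        raw = config_values.get(config_key, "")
        value = raw.strip() if raw else ""
        old_value = merged.get(env_key)
        if value:
            if old_value != value:       # add or update
                merged[env_key] = value
                updated = True
        elif old_value is not None:      # value cleared in config: remove
            merged.pop(env_key)
            updated = True
    return merged, updated


# Hypothetical sample data, for illustration only.
existing = {"OPENAI_API_KEY": "old", "STALE_KEY": "x", "UNMANAGED": "keep"}
config_values = {"open_ai_api_key": " new ", "stale_key": ""}
key_mapping = {"open_ai_api_key": "OPENAI_API_KEY", "stale_key": "STALE_KEY"}
merged, updated = sync_env_vars(existing, config_values, key_mapping)
```

Keys that are not in the mapping (here `UNMANAGED`) pass through unchanged, which is what lets the method rewrite the whole file without losing user-added entries.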
def _start_daily_flush_timer(self):
"""Start a background thread that flushes all agents' memory daily at 23:55."""
@@ -536,17 +556,23 @@ class AgentInitializer:
import threading
def _daily_flush_loop():
import random
last_run_date = None # Track last successful run date to prevent same-day re-trigger
while True:
try:
now = datetime.datetime.now()
target = now.replace(hour=23, minute=55, second=0, microsecond=0)
if target <= now:
jitter_min = random.randint(50, 55)
jitter_sec = random.randint(0, 59)
target = now.replace(hour=23, minute=jitter_min, second=jitter_sec, microsecond=0)
# Always schedule for tomorrow if we already ran today, or if target time has passed
if target <= now or (last_run_date == now.date()):
target += datetime.timedelta(days=1)
wait_seconds = (target - now).total_seconds()
logger.info(f"[DailyFlush] Next flush at {target.strftime('%Y-%m-%d %H:%M')} (in {wait_seconds/3600:.1f}h)")
logger.info(f"[DailyFlush] Next flush at {target.strftime('%Y-%m-%d %H:%M:%S')} (in {wait_seconds/3600:.1f}h)")
time.sleep(wait_seconds)
self._flush_all_agents()
last_run_date = datetime.datetime.now().date()
except Exception as e:
logger.warning(f"[DailyFlush] Error in daily flush loop: {e}")
time.sleep(3600)
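Review note: the scheduling rule in this hunk (a random target between 23:50:00 and 23:55:59, pushed to tomorrow if it has already passed or a flush already ran today) can be isolated as a pure function. This is a sketch under that reading of the diff; `next_flush_time` and its `rng` parameter are hypothetical names introduced so the rule can be exercised without sleeping.

```python
import datetime
import random


def next_flush_time(now, last_run_date=None, rng=random):
    """Pick the next flush time: jittered 23:50-23:55 today, or tomorrow if too late."""
    jitter_min = rng.randint(50, 55)
    jitter_sec = rng.randint(0, 59)
    target = now.replace(hour=23, minute=jitter_min, second=jitter_sec, microsecond=0)
    # Move to tomorrow if the target has passed or a flush already ran today.
    if target <= now or last_run_date == now.date():
        target += datetime.timedelta(days=1)
    return target


noon = datetime.datetime(2026, 5, 6, 12, 0, 0)
first = next_flush_time(noon)
after_run = next_flush_time(noon, last_run_date=noon.date())
```

The `last_run_date` check matters because of the jitter: after a flush at 23:50, a fresh draw could land at 23:54 on the same day and trigger a second run.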
@@ -555,7 +581,7 @@ class AgentInitializer:
t.start()
def _flush_all_agents(self):
"""Flush memory for all active agent sessions."""
"""Flush memory for all active agent sessions, then run Deep Dream."""
agents = []
if self.agent_bridge.default_agent:
agents.append(("default", self.agent_bridge.default_agent))
@@ -565,7 +591,10 @@ class AgentInitializer:
if not agents:
return
# Phase 1: flush daily summaries
flushed = 0
flush_threads = []
dream_candidate = None
for label, agent in agents:
try:
if not agent.memory_manager:
@@ -577,8 +606,26 @@ class AgentInitializer:
result = agent.memory_manager.flush_manager.create_daily_summary(messages)
if result:
flushed += 1
t = agent.memory_manager.flush_manager._last_flush_thread
if t:
flush_threads.append(t)
if dream_candidate is None:
dream_candidate = agent.memory_manager.flush_manager
except Exception as e:
logger.warning(f"[DailyFlush] Failed for session {label}: {e}")
if flushed:
logger.info(f"[DailyFlush] Flushed {flushed}/{len(agents)} agent session(s)")
# Wait for all flush threads to finish before dreaming
for t in flush_threads:
t.join(timeout=60)
# Phase 2: Deep Dream — distill daily memories → MEMORY.md + dream diary
if dream_candidate:
try:
result = dream_candidate.deep_dream()
if result:
logger.info("[DeepDream] Memory distillation completed successfully")
except Exception as e:
logger.warning(f"[DeepDream] Failed: {e}")


@@ -13,7 +13,7 @@ from voice.factory import create_voice
class Bridge(object):
def __init__(self):
self.btype = {
"chat": const.CHATGPT,
"chat": const.OPENAI,
"voice_to_text": conf().get("voice_to_text", "openai"),
"text_to_voice": conf().get("text_to_voice", "google"),
"translate": conf().get("translate", "baidu"),
@@ -39,11 +39,8 @@ class Bridge(object):
self.btype["chat"] = const.BAIDU
if model_type in ["xunfei"]:
self.btype["chat"] = const.XUNFEI
if model_type in [const.QWEN]:
self.btype["chat"] = const.QWEN
if model_type in [const.QWEN_TURBO, const.QWEN_PLUS, const.QWEN_MAX]:
if model_type in [const.QWEN, const.QWEN_TURBO, const.QWEN_PLUS, const.QWEN_MAX]:
self.btype["chat"] = const.QWEN_DASHSCOPE
# Support Qwen3 and other DashScope models
if model_type and (model_type.startswith("qwen") or model_type.startswith("qwq") or model_type.startswith("qvq")):
self.btype["chat"] = const.QWEN_DASHSCOPE
if model_type and model_type.startswith("gemini"):
@@ -61,6 +58,14 @@ class Bridge(object):
if model_type and model_type.startswith("doubao"):
self.btype["chat"] = const.DOUBAO
if model_type and model_type.startswith("deepseek"):
self.btype["chat"] = const.DEEPSEEK
if model_type and isinstance(model_type, str):
lowered_model_type = model_type.lower()
if lowered_model_type == const.QIANFAN or lowered_model_type.startswith("ernie"):
self.btype["chat"] = const.QIANFAN
if model_type in [const.MODELSCOPE]:
self.btype["chat"] = const.MODELSCOPE
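Review note: the routing added here matches Qianfan case-insensitively, while the neighbouring prefix checks stay case-sensitive. A small sketch of that behaviour follows. The function name and the provider strings are hypothetical stand-ins for the `const.*` values, whose actual contents are not shown in this diff.

```python
def resolve_chat_provider(model_type, default="openai"):
    """Map a model name to a provider label, mirroring the prefix checks in the hunk."""
    if not model_type or not isinstance(model_type, str):
        return default
    # Case-sensitive prefixes, as in the existing DashScope and DeepSeek checks.
    if model_type.startswith(("qwen", "qwq", "qvq")):
        return "dashscope"
    if model_type.startswith("deepseek"):
        return "deepseek"
    # New in this change: Qianfan / ERNIE models are matched case-insensitively.
    lowered = model_type.lower()
    if lowered == "qianfan" or lowered.startswith("ernie"):
        return "qianfan"
    return default


routes = {name: resolve_chat_provider(name) for name in ["ERNIE-5.0", "ernie-4.5", "qwen3-max", "Qwen3-max"]}
```

Under these assumptions `"Qwen3-max"` falls through to the default while `"ERNIE-5.0"` is routed, which may be worth aligning if mixed-case model names are expected for other providers too.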


@@ -39,6 +39,10 @@ def create_channel(channel_type) -> Channel:
elif channel_type == const.QQ:
from channel.qq.qq_channel import QQChannel
ch = QQChannel()
elif channel_type in (const.WEIXIN, "wx"):
from channel.weixin.weixin_channel import WeixinChannel
ch = WeixinChannel()
channel_type = const.WEIXIN
else:
raise RuntimeError
ch.channel_type = channel_type


@@ -297,8 +297,12 @@ class ChatChannel(Channel):
logger.debug("[chat_channel] sending reply: {}, context: {}".format(reply, context))
# If this is a text reply, try to extract and send images
if reply.type == ReplyType.TEXT:
# Web channel renders images/videos inline via renderMarkdown,
# so skip the extract-and-send step to avoid duplicate media.
if reply.type == ReplyType.TEXT and context.get("channel_type") != "web":
self._extract_and_send_images(reply, context)
elif reply.type == ReplyType.TEXT:
self._send(reply, context)
# If this is an image reply that also carries text, send the text first, then the image
elif reply.type == ReplyType.IMAGE_URL and hasattr(reply, 'text_content') and reply.text_content:
# Send the text first
@@ -347,38 +351,30 @@ class ChatChannel(Channel):
if media_items:
logger.info(f"[chat_channel] Extracted {len(media_items)} media item(s) from reply")
# Send the text first (keep the original text unchanged)
# Send text first (the frontend will embed video players via renderMarkdown).
logger.info(f"[chat_channel] Sending text content before media: {reply.content[:100]}...")
self._send(reply, context)
logger.info(f"[chat_channel] Text sent, now sending {len(media_items)} media item(s)")
# Then send the media files one by one
for i, (url, media_type) in enumerate(media_items):
try:
# Determine whether it is a local file or a URL
# Determine whether it is a remote URL or a local file.
if url.startswith(('http://', 'https://')):
# Remote resource
if media_type == 'video':
# Send videos as the FILE type
media_reply = Reply(ReplyType.FILE, url)
media_reply.file_name = os.path.basename(url)
else:
# Send images as the IMAGE_URL type
media_reply = Reply(ReplyType.IMAGE_URL, url)
elif os.path.exists(url):
# Local file
if media_type == 'video':
# Send videos as the FILE type, converted to a file:// URL
media_reply = Reply(ReplyType.FILE, f"file://{url}")
media_reply.file_name = os.path.basename(url)
else:
# Send images as the IMAGE_URL type, converted to a file:// URL
media_reply = Reply(ReplyType.IMAGE_URL, f"file://{url}")
else:
logger.warning(f"[chat_channel] Media file not found or invalid URL: {url}")
continue
# Send the media file (with a short delay to avoid rate limiting)
if i > 0:
time.sleep(0.5)
self._send(media_reply, context)


@@ -55,12 +55,186 @@ def _ensure_lark_imported():
return lark
def _print_qr_to_terminal(qr_url: str):
"""Render a QR code as ASCII art and emit it via logger.
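Uses logger rather than print because, when started in the background via nohup/cow, block-buffered stdout
delays the QR code output and makes it look as if it was printed twice. The logger's StreamHandler is line-buffered,
so the output shows up in the foreground terminal and also lands in run.log.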
"""
qr_lines = []
try:
import qrcode as qr_lib
import io
qr = qr_lib.QRCode(error_correction=qr_lib.constants.ERROR_CORRECT_L, box_size=1, border=1)
qr.add_data(qr_url)
qr.make(fit=True)
buf = io.StringIO()
qr.print_ascii(out=buf, invert=True)
qr_lines = buf.getvalue().splitlines()
except ImportError:
qr_lines = ["(未安装 qrcode 包,无法渲染 ASCII 二维码:pip install qrcode)"]
except Exception as e:
qr_lines = [f"(渲染二维码失败:{e})"]
header = "=" * 60
banner = [
"",
header,
" 飞书一键创建应用:请使用 飞书 App 扫描下方二维码",
" (二维码 10 分钟内有效,仅供一次扫描)",
header,
]
footer = [
f" 或点击链接创建: {qr_url}",
" 等待扫码...",
"",
]
full = banner + qr_lines + footer
logger.info("[FeiShu] One-click 飞书应用创建二维码(请用飞书 App 扫码):\n" + "\n".join(full))
def _persist_feishu_credentials(app_id: str, app_secret: str) -> bool:
"""Write feishu_app_id / feishu_app_secret + ensure feishu in channel_type into config.json.
Returns True on success, False on failure (e.g. config.json missing or unwritable).
"""
try:
config_path = os.path.join(
os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))),
"config.json",
)
if os.path.exists(config_path):
with open(config_path, "r", encoding="utf-8") as f:
file_cfg = json.load(f)
else:
file_cfg = {}
file_cfg["feishu_app_id"] = app_id
file_cfg["feishu_app_secret"] = app_secret
# Make sure channel_type includes feishu (the user may start a single channel purely from the CLI)
ch_type = file_cfg.get("channel_type", conf().get("channel_type", "")) or ""
existing = [s.strip() for s in ch_type.split(",") if s.strip()]
if "feishu" not in existing:
existing.append("feishu")
file_cfg["channel_type"] = ",".join(existing)
with open(config_path, "w", encoding="utf-8") as f:
json.dump(file_cfg, f, indent=4, ensure_ascii=False)
# Sync into the in-memory conf() so the change takes effect for this run
conf()["feishu_app_id"] = app_id
conf()["feishu_app_secret"] = app_secret
if "channel_type" in file_cfg:
conf()["channel_type"] = file_cfg["channel_type"]
try:
os.chmod(config_path, 0o600)
except Exception:
pass
return True
except Exception as e:
logger.error(f"[FeiShu] Failed to persist credentials to config.json: {e}")
return False
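Review note: `_persist_feishu_credentials` treats `channel_type` as a comma-separated string, while other code in this compare view (the `"channel"` entry built in `AgentInitializer`) also handles a list value. The sketch below shows the same "append feishu if missing" merge in a form that accepts either shape. `ensure_channel` is a hypothetical helper, and list support is an assumption about how one might harden the merge, not something this diff does.

```python
def ensure_channel(channel_type, name="feishu"):
    """Return a comma-separated channel list that is guaranteed to contain `name`."""
    if isinstance(channel_type, (list, tuple)):
        parts = [str(s).strip() for s in channel_type]
    else:
        parts = [s.strip() for s in (channel_type or "").split(",")]
    existing = [s for s in parts if s]  # drop empty entries
    if name not in existing:
        existing.append(name)
    return ",".join(existing)


examples = [ensure_channel("web, weixin"), ensure_channel("feishu"), ensure_channel(["web"]), ensure_channel(None)]
```

With the string-only version in the diff, a list-valued `channel_type` would raise inside the outer `try` and the function would report a persistence failure, so normalising the input first seems the safer reading.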
def _register_via_qr_in_terminal() -> bool:
"""CLI-side one-click app creation via lark_oapi.register_app.
Blocks the calling thread (typically the channel startup thread) until the user
finishes scanning, the QR code expires, or registration is cancelled.
Returns True if credentials were obtained AND persisted; False otherwise.
The caller should fall back to the original "missing credentials" error in that case.
"""
if not LARK_SDK_AVAILABLE:
logger.error(
"[FeiShu] 缺少 feishu_app_id / feishu_app_secret。"
"未安装 lark-oapi SDK,无法在终端发起扫码创建。"
"请执行 pip install -U 'lark-oapi>=1.5.5' 后重试,或手动在 config.json 中填入凭据。"
)
return False
try:
lark_mod = _ensure_lark_imported()
except Exception as e:
logger.error(f"[FeiShu] Import lark_oapi failed: {e}")
return False
# register_app was only introduced in lark-oapi 1.5.5; calling it on older versions raises a
# hard-to-understand AttributeError. Check explicitly up front and give a clear upgrade hint.
if not hasattr(lark_mod, "register_app"):
try:
from importlib.metadata import version as _pkg_version
installed = _pkg_version("lark-oapi")
except Exception:
installed = "unknown"
logger.error(
f"[FeiShu] 当前 lark-oapi 版本 ({installed}) 不支持一键创建应用,需要 >= 1.5.5。"
"请执行 pip install -U 'lark-oapi>=1.5.5' 后重试,或手动在 config.json 中填入凭据。"
)
return False
logger.info("[FeiShu] 检测到尚未配置 feishu_app_id / feishu_app_secret,"
"正在向飞书申请一键创建应用...")
def _on_qr(info):
url = info.get("url", "")
if url:
_print_qr_to_terminal(url)
def _on_status(info):
# Filter out polling heartbeats (one every 5 seconds); keep slow_down / domain_switched, etc.
status = info.get("status")
if status == "polling":
return
logger.info(f"[FeiShu] register_app status: {info}")
try:
result = lark_mod.register_app(
on_qr_code=_on_qr,
on_status_change=_on_status,
source="cowagent",
)
except Exception as e:
err_cls = e.__class__.__name__
if "Expired" in err_cls:
logger.error("[FeiShu] 二维码已过期,请重启程序后重试。")
elif "Denied" in err_cls:
logger.error("[FeiShu] 已取消授权。")
else:
logger.error(f"[FeiShu] 一键创建失败:{e}")
return False
app_id = result.get("client_id", "")
app_secret = result.get("client_secret", "")
if not app_id or not app_secret:
logger.error("[FeiShu] 创建结果缺少 app_id/app_secret,无法继续。")
return False
if not _persist_feishu_credentials(app_id, app_secret):
logger.error(
"[FeiShu] 应用创建成功但写入 config.json 失败,请手动复制以下值到配置文件:\n"
f" feishu_app_id = {app_id}\n"
f" feishu_app_secret = {app_secret}"
)
return False
logger.info(f"[FeiShu] 应用创建成功,凭据已写入 config.json (app_id={app_id})。")
return True
@singleton
class FeiShuChanel(ChatChannel):
feishu_app_id = conf().get('feishu_app_id')
feishu_app_secret = conf().get('feishu_app_secret')
feishu_token = conf().get('feishu_token')
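feishu_event_mode = conf().get('feishu_event_mode', 'websocket') # webhook or websocket
# Override the parent-class default [ReplyType.VOICE, ReplyType.IMAGE].
# Feishu natively supports sending audio (opus format, via the file upload API) and images,
# and every reply type is handled, so use an empty list to enable voice and image replies.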
NOT_SUPPORT_REPLYTYPE = []
def __init__(self):
super().__init__()
@@ -86,6 +260,20 @@ class FeiShuChanel(ChatChannel):
self.feishu_app_secret = conf().get('feishu_app_secret')
self.feishu_token = conf().get('feishu_token')
self.feishu_event_mode = conf().get('feishu_event_mode', 'websocket')
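# CLI startup: when credentials are missing, try lark.register_app to show a QR code in the terminal
# and guide the user through scan-to-create. Web console startup also reaches this point, but console users
# have usually already created the app via /api/feishu/register and written it back to config.json.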
if not self.feishu_app_id or not self.feishu_app_secret:
if _register_via_qr_in_terminal():
self.feishu_app_id = conf().get('feishu_app_id')
self.feishu_app_secret = conf().get('feishu_app_secret')
else:
err = "[FeiShu] feishu_app_id 与 feishu_app_secret 缺失,无法启动通道"
logger.error(err)
self.report_startup_error(err)
return
self._fetch_bot_open_id()
if self.feishu_event_mode == 'websocket':
self._startup_websocket()
@@ -384,10 +572,22 @@ class FeiShuChanel(ChatChannel):
no_need_at=True
)
if context:
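# Streaming reply mode: inject an on_event callback into the context; the agent calls it each time it produces a piece of text.
# The callback first sends a placeholder message to obtain a message_id, then updates the content in place via the PATCH API
# to produce a typewriter effect. When the callback finishes, it sets context["feishu_streamed"]=True
# so that send() skips the resend and the final full reply is not delivered twice.
# Streaming typewriter replies are on by default. They require the bot to have the cardkit:card:write permission and Feishu client 7.20+;
# a failure at any step automatically falls back to a non-streaming text reply.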
if conf().get("feishu_stream_reply", True):
context["on_event"] = self._make_feishu_stream_callback(context, feishu_msg.access_token)
self.produce(context)
logger.debug(f"[FeiShu] query={feishu_msg.content}, type={feishu_msg.ctype}")
def send(self, reply: Reply, context: Context):
# Skip the resend if the text reply has already been delivered via streaming
if reply.type == ReplyType.TEXT and context.get("feishu_streamed"):
logger.debug("[FeiShu] streaming already delivered text reply, skipping send()")
return
msg = context.get("msg")
is_group = context["isgroup"]
if msg:
@@ -450,11 +650,21 @@ class FeiShuChanel(ChatChannel):
msg_type = "file"
content_key = "file_key"
elif reply.type == ReplyType.VOICE:
# Voice reply: upload the audio file to Feishu, then send an audio-type message
file_key = self._upload_audio(reply.content, access_token)
if not file_key:
logger.warning("[FeiShu] upload audio failed")
return
reply_content = file_key
msg_type = "audio"
content_key = "file_key"
# Check if we can reply to an existing message (need msg_id)
can_reply = is_group and msg and hasattr(msg, 'msg_id') and msg.msg_id
# Build content JSON
content_json = json.dumps(reply_content) if content_key is None else json.dumps({content_key: reply_content})
content_json = json.dumps(reply_content, ensure_ascii=False) if content_key is None else json.dumps({content_key: reply_content}, ensure_ascii=False)
logger.debug(f"[FeiShu] Sending message: msg_type={msg_type}, content={content_json[:200]}")
if can_reply:
@@ -481,6 +691,396 @@ class FeiShuChanel(ChatChannel):
else:
logger.error(f"[FeiShu] send message failed, code={res.get('code')}, msg={res.get('msg')}")
def _make_feishu_stream_callback(self, context, access_token):
"""
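Typewriter-style replies built on Feishu's official "streaming card update" API.
Flow:
1. First message_update arrives → POST /cardkit/v1/cards creates a card entity with streaming_mode,
then POST /im/v1/messages (or reply) sends the card by card_id
2. Subsequent message_update → PUT /cardkit/v1/cards/{id}/elements/{eid}/content
with the full text of the "current round"; the Feishu platform computes the delta and renders it with a typewriter effect
(not subject to the 10 QPS limit in streaming mode)
3. message_end (one round of LLM output has ended and this round triggered tool calls) → append current to committed
and add a separator; the next round's message_update starts from blank again so content from multiple rounds does not run together
4. agent_end → force-overwrite the card with final_response, then PATCH /cardkit/v1/cards/{id}/settings
to turn off streaming_mode, and set context["feishu_streamed"]=True so chat_channel skips the regular send()
Prerequisites:
- The bot has the cardkit:card:write permission
- Feishu client 7.20+
Fallback on failure:
- Creating the card entity fails (missing permission, network, etc.) → the feishu_streamed flag is not set, so chat_channel
takes the regular text reply path; the user gets the full reply without the typewriter effect, and a warning is logged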
"""
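# Shared state (protected by lock)
# In multi-turn agent mode, each "interim message" is sent as its own card.
# current_text holds only the content of the card currently being streamed; on message_end / agent_end
# it is finalized and reset.
current_text = [""] # LLM output accumulated so far for the current card
card_id = [None] # Entity ID of the current streaming card (separate for each segment)
message_id = [None] # Message ID once the current card is sent (used for logging only)
# The placeholder send is synchronous, but an in-flight flag keeps concurrent message_update
# events from each triggering their own create+send, which would emit multiple cards.
init_in_flight = [False]
# Once initialization fails, stay disabled: make no further streaming calls for this reply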
disabled = [False]
lock = threading.Lock()
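# ---- Async push queue ------------------------------------------------
# A synchronous requests.put takes 100~300 ms per call and would block the LLM stream thread from reading the next chunk.
# Pushes are handed to a dedicated worker thread that consumes the queue; the callback only appends in memory and returns immediately.
# The queue holds only snapshots of the latest accumulated text; the worker coalesces them so the same content is not pushed
# repeatedly (with high-frequency chunks the queue backs up, and pushing only the last snapshot is enough).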
import queue as _queue
push_queue: "_queue.Queue[str | None]" = _queue.Queue()
def _push_worker():
while True:
snapshot = push_queue.get()
if snapshot is None:
push_queue.task_done()
return
# Coalesce snapshots already queued: push only the last one, saving PUT calls and lowering latency
merged_count = 1
stop = False
while True:
try:
nxt = push_queue.get_nowait()
except _queue.Empty:
break
merged_count += 1
if nxt is None:
stop = True
break
snapshot = nxt
try:
_stream_update_text(snapshot)
finally:
for _ in range(merged_count):
push_queue.task_done()
if stop:
return
push_thread = threading.Thread(target=_push_worker, daemon=True, name="feishu-stream-push")
push_thread.start()
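def _drain_push_queue():
"""Wait until every PUT currently in the queue has completed. message_end/agent_end must drain before finalizing;
otherwise stale snapshots queued in the worker may arrive after the final_text PUT and overwrite the final content."""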
try:
push_queue.join()
except Exception:
pass
msg = context.get("msg")
is_group = context.get("isgroup", False)
receiver = context.get("receiver")
receive_id_type = context.get("receive_id_type", "open_id")
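# Client-side typewriter rendering parameters (the actual text "reveal" speed in the Feishu app):
# - print_freq_ms: interval between refreshes
# - print_step: number of characters revealed per refresh
# Currently 4 characters every 40 ms ≈ 100 characters/second, close to the pace of the ChatGPT/DeepSeek web UIs.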
print_freq_ms = 40
print_step = 4
print_strategy = "fast"
headers = {
"Authorization": "Bearer " + access_token,
"Content-Type": "application/json; charset=utf-8",
}
# element_id of the rich-text component in the card; all subsequent streaming PUT updates target this component
ELEMENT_ID = "stream_md"
# Operation sequence number; must strictly increase with every PUT (a Feishu requirement)
sequence = [0]
def _next_sequence():
sequence[0] += 1
return sequence[0]
def _build_card_json():
"""Card JSON 2.0 structure + streaming_mode + a single markdown component"""
return json.dumps({
"schema": "2.0",
"config": {
"streaming_mode": True,
"summary": {"content": "[正在生成回复...]"},
"streaming_config": {
"print_frequency_ms": {"default": print_freq_ms},
"print_step": {"default": print_step},
"print_strategy": print_strategy,
},
},
"body": {
"elements": [
{
"tag": "markdown",
"content": "...",
"element_id": ELEMENT_ID,
}
],
},
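# Note: JSON 2.0 does not support a custom fallback field (passing one raises an error).
# On clients < 7.20, Feishu automatically shows a "please upgrade your client" placeholder; nothing to configure.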
}, ensure_ascii=False)
def _create_and_send_card():
"""Runs synchronously: create the card entity → send the message. If either step fails, set disabled=True to trigger the fallback"""
try:
# Step 1: create the card entity
create_url = "https://open.feishu.cn/open-apis/cardkit/v1/cards"
create_body = {"type": "card_json", "data": _build_card_json()}
res = requests.post(
create_url, headers=headers, json=create_body, timeout=(5, 10)
)
res_json = res.json()
if res_json.get("code") != 0:
logger.warning(
f"[FeiShu] Stream: create card failed "
f"(code={res_json.get('code')}, msg={res_json.get('msg')}). "
f"本次回复已自动降级为普通文本回复(一次性返回完整内容)。"
f"如需开启流式打字机效果与完整 Markdown 渲染,请到飞书开放平台 "
f"https://open.feishu.cn/app 给机器人开通 cardkit:card:write 权限"
f"(创建与更新卡片)并重新发布版本,同时确保飞书客户端 >= 7.20。"
)
with lock:
disabled[0] = True
return
cid = res_json["data"]["card_id"]
with lock:
card_id[0] = cid
# Step 2: send the message by card_id (prefer reply in group chats; send directly in one-on-one chats)
content_payload = json.dumps(
{"type": "card", "data": {"card_id": cid}}, ensure_ascii=False
)
can_reply = is_group and msg and hasattr(msg, "msg_id") and msg.msg_id
if can_reply:
send_url = (
f"https://open.feishu.cn/open-apis/im/v1/messages/"
f"{msg.msg_id}/reply"
)
send_body = {"msg_type": "interactive", "content": content_payload}
send_res = requests.post(
send_url, headers=headers, json=send_body, timeout=(5, 10)
)
else:
send_url = "https://open.feishu.cn/open-apis/im/v1/messages"
params = {"receive_id_type": receive_id_type}
send_body = {
"receive_id": receiver,
"msg_type": "interactive",
"content": content_payload,
}
send_res = requests.post(
send_url, headers=headers, params=params, json=send_body,
timeout=(5, 10),
)
send_json = send_res.json()
if send_json.get("code") != 0:
logger.warning(
f"[FeiShu] Stream: send card failed: {send_json}. 降级为普通文本。"
)
with lock:
disabled[0] = True
return
mid = send_json["data"]["message_id"]
with lock:
message_id[0] = mid
logger.info(
f"[FeiShu] Stream: card created and sent, "
f"card_id={cid}, message_id={mid}"
)
except Exception as e:
logger.warning(
f"[FeiShu] Stream: create/send card exception: {e}. 降级为普通文本。"
)
with lock:
disabled[0] = True
finally:
with lock:
init_in_flight[0] = False
def _stream_update_text(full_text):
"""Stream-update the text component via PUT. content must be the component's full current text."""
with lock:
cid = card_id[0]
if not cid:
return
url = (
f"https://open.feishu.cn/open-apis/cardkit/v1/cards/"
f"{cid}/elements/{ELEMENT_ID}/content"
)
body = {
"content": full_text,
"sequence": _next_sequence(),
}
try:
res = requests.put(url, headers=headers, json=body, timeout=(5, 10))
res_json = res.json()
if res_json.get("code") != 0:
logger.warning(
f"[FeiShu] Stream: update text failed: {res_json}"
)
except Exception as e:
logger.warning(f"[FeiShu] Stream: update text exception: {e}")
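def _close_streaming_mode(final_text: str = ""):
"""Close streaming mode (the card switches to the "normal" state and can be forwarded).
Also rewrite summary to a preview of the final content via the full-card update API; otherwise the Feishu
conversation list keeps showing the placeholder summary set when the card was created ("[正在生成回复...]").
"""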
with lock:
cid = card_id[0]
if not cid:
return
# 1) Turn streaming_mode off and rewrite summary via the full-card update API
# (the settings API's config does not accept a summary field and returns code=2200)
preview_src = (final_text or "").strip().replace("\n", " ")
preview = preview_src[:30] if preview_src else ""
full_card = {
"schema": "2.0",
"config": {
"streaming_mode": False,
"summary": {"content": preview or " "},
},
"body": {
"elements": [
{
"tag": "markdown",
"content": final_text or " ",
"element_id": ELEMENT_ID,
}
],
},
}
put_url = f"https://open.feishu.cn/open-apis/cardkit/v1/cards/{cid}"
put_body = {
"card": {"type": "card_json", "data": json.dumps(full_card, ensure_ascii=False)},
"sequence": _next_sequence(),
}
try:
res = requests.put(put_url, headers=headers, json=put_body, timeout=(5, 10))
res_json = res.json()
if res_json.get("code") != 0:
logger.warning(
f"[FeiShu] Stream: finalize card (close+summary) failed: {res_json}"
)
except Exception as e:
logger.warning(
f"[FeiShu] Stream: finalize card exception: {e}"
)
def on_event(event: dict):
event_type = event.get("type")
data = event.get("data", {})
# Once degraded, perform no further streaming operations for this reply
with lock:
if disabled[0]:
return
if event_type == "message_update":
delta = data.get("delta", "")
if not delta:
return
# Part 1: decide whether initialization is needed (create the card + send it)
need_init = False
with lock:
if card_id[0] is None and not init_in_flight[0]:
init_in_flight[0] = True
need_init = True
if need_init:
_create_and_send_card()
# A failed initialization has already set disabled, so return right here
with lock:
if disabled[0]:
return
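# Part 2: accumulate the text and hand the snapshot to the push worker for asynchronous delivery.
# Do not call requests.put directly here; it would block the LLM stream thread from reading the next chunk
# (measured with DeepSeek's high-frequency small chunks: ~150 ms per PUT, which adds up to very noticeable lag)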
snapshot = ""
should_push = False
with lock:
current_text[0] += delta
if card_id[0]:
snapshot = current_text[0]
should_push = True
if should_push:
push_queue.put(snapshot)
elif event_type == "message_end":
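# One round of LLM output has ended. If this round triggered tool calls, its text is an
# "interim message" (e.g. "Let me take a look!") and should be finalized as its own card, with a new card
# created for the next round. The end user then sees:
# [Card 1: interim message 1]
# [Card 2: interim message 2]
# ...
# [Card N: final reply]
# This matches wecom_bot's multi-message streaming experience.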
tool_calls = data.get("tool_calls", []) or []
if not tool_calls:
# No tool calls: this round is the final reply; leave it to agent_end.
return
with lock:
text_to_finalize = current_text[0].rstrip()
current_text[0] = ""
if not text_to_finalize:
return
# Wait for queued snapshots to be pushed so none arrives after the final text and overwrites it
_drain_push_queue()
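# Overwrite the current card with the final text and close streaming mode (freezing it into a normal card
# and changing the conversation-list summary to a preview instead of "正在生成回复...")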
_stream_update_text(text_to_finalize)
_close_streaming_mode(text_to_finalize)
# Reset card state; the next message_update triggers creation of a new card
with lock:
card_id[0] = None
message_id[0] = None
sequence[0] = 0
elif event_type == "agent_end":
# Final reply: overwrite the current streaming card with final_response, then close streaming mode.
final_response = data.get("final_response", "")
if not final_response:
return
final_text = str(final_response)
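with lock:
has_card = card_id[0] is not None
init_busy = init_in_flight[0]
# Rare case: agent_end fires before any card was created (very fast return / no
# message_update); create one proactively to carry final_text.
if not has_card and not init_busy:
with lock:
init_in_flight[0] = True
_create_and_send_card()
with lock:
if disabled[0]:
return
# Mark as streamed so chat_channel skips send(). This is set only after the disabled check,
# so a failed card creation still falls back to the regular text reply.
context["feishu_streamed"] = True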
_drain_push_queue()
_stream_update_text(final_text)
_close_streaming_mode(final_text)
# Tell the push worker to exit (this reply is completely finished)
push_queue.put(None)
return on_event
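Review note: the core of the push worker is "drain whatever is queued, push only the newest snapshot, and account for every item with `task_done()` so `join()` still works". The standalone sketch below reproduces that pattern with the standard library only. `coalescing_worker` and the `push` callable are hypothetical names, and the queue is filled before the worker starts purely so the outcome is deterministic.

```python
import queue
import threading


def coalescing_worker(q, push):
    """Consume q, pushing only the newest snapshot of each backlog; None stops the worker."""
    while True:
        snapshot = q.get()
        if snapshot is None:
            q.task_done()
            return
        taken = 1
        stop = False
        # Drain everything already queued, remembering only the latest snapshot.
        while True:
            try:
                nxt = q.get_nowait()
            except queue.Empty:
                break
            taken += 1
            if nxt is None:
                stop = True
                break
            snapshot = nxt
        try:
            push(snapshot)
        finally:
            # One task_done() per item taken, so q.join() can return.
            for _ in range(taken):
                q.task_done()
        if stop:
            return


q = queue.Queue()
for text in ["H", "He", "Hello"]:
    q.put(text)
q.put(None)  # sentinel: stop after the backlog is handled

pushed = []
worker = threading.Thread(target=coalescing_worker, args=(q, pushed.append), daemon=True)
worker.start()
worker.join(timeout=5)
```

Because each snapshot is the full accumulated text, skipping intermediate ones loses nothing; this would not be safe if the queue carried deltas.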
def fetch_access_token(self) -> str:
url = "https://open.feishu.cn/open-apis/auth/v3/tenant_access_token/internal/"
headers = {
@@ -687,6 +1287,66 @@ class FeiShuChanel(ChatChannel):
except Exception as e:
logger.warning(f"[FeiShu] Failed to remove temp file {temp_file}: {e}")
def _upload_audio(self, audio_path, access_token):
"""
Upload a local audio file to Feishu and return file_key.
audio_path is a plain local file path (no file:// prefix).
Feishu audio messages only support opus format; non-opus files are converted first.
"""
logger.debug(f"[FeiShu] start upload audio, path={audio_path}")
if not os.path.exists(audio_path):
logger.error(f"[FeiShu] audio file not found: {audio_path}")
return None
# Feishu only plays audio messages in opus format.
# Convert if the TTS engine produced a different format (e.g. mp3 from OpenAI TTS).
upload_path = audio_path
if not audio_path.lower().endswith('.opus'):
opus_path = os.path.splitext(audio_path)[0] + '.opus'
try:
from pydub import AudioSegment
audio = AudioSegment.from_file(audio_path)
audio.export(opus_path, format='opus')
upload_path = opus_path
logger.info(f"[FeiShu] Converted audio to opus: {opus_path}")
except Exception as e:
logger.warning(f"[FeiShu] Failed to convert audio to opus, uploading original: {e}")
upload_path = audio_path
file_name = os.path.splitext(os.path.basename(upload_path))[0] + '.opus'
upload_url = "https://open.feishu.cn/open-apis/im/v1/files"
data = {'file_type': 'opus', 'file_name': file_name}
headers = {'Authorization': f'Bearer {access_token}'}
try:
with open(upload_path, "rb") as f:
upload_response = requests.post(
upload_url,
files={"file": f},
data=data,
headers=headers,
timeout=(5, 30)
)
logger.info(
f"[FeiShu] upload audio response, status={upload_response.status_code}, res={upload_response.content}")
response_data = upload_response.json()
if response_data.get("code") == 0:
return response_data.get("data").get("file_key")
else:
logger.error(f"[FeiShu] upload audio failed: {response_data}")
return None
except Exception as e:
logger.error(f"[FeiShu] upload audio exception: {e}")
return None
finally:
# Remove the temporary opus file produced by the conversion whether or not the upload succeeded, so failures do not pile up files on disk.
if upload_path != audio_path and os.path.exists(upload_path):
try:
os.remove(upload_path)
except Exception as e:
logger.warning(f"[FeiShu] Failed to remove temp opus file {upload_path}: {e}")
def _upload_file_url(self, file_url, access_token):
"""
Upload file to Feishu


@@ -162,6 +162,38 @@ class FeishuMessage(ChatMessage):
else:
logger.info(f"[FeiShu] Failed to download file, key={file_key}, res={response.text}")
self._prepare_fn = _download_file
elif msg_type == "audio":
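# Voice messages sent by Feishu users have message type "audio"; the file is opus-encoded.
# Map it to ContextType.VOICE so chat_channel's speech-to-text (STT) flow handles it.
# The file is downloaded lazily via _prepare_fn, which runs only when chat_channel calls cmsg.prepare().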
self.ctype = ContextType.VOICE
content = json.loads(msg.get("content"))
file_key = content.get("file_key")
self.content = TmpDir().path() + file_key + ".opus"
logger.info(f"[FeiShu] audio message: file_key={file_key}, save_path={self.content}")
def _download_audio():
logger.info(f"[FeiShu] downloading audio: file_key={file_key}, msg_id={self.msg_id}")
url = f"https://open.feishu.cn/open-apis/im/v1/messages/{self.msg_id}/resources/{file_key}"
headers = {
"Authorization": "Bearer " + access_token,
}
params = {
"type": "file"
}
try:
response = requests.get(url=url, headers=headers, params=params)
logger.info(f"[FeiShu] download audio response: status={response.status_code}, size={len(response.content)} bytes")
if response.status_code == 200:
with open(self.content, "wb") as f:
f.write(response.content)
logger.info(f"[FeiShu] audio saved to: {self.content}")
else:
logger.error(f"[FeiShu] Failed to download audio, key={file_key}, status={response.status_code}, res={response.text}")
except Exception as e:
logger.error(f"[FeiShu] Exception downloading audio, key={file_key}: {e}", exc_info=True)
self._prepare_fn = _download_audio
else:
raise NotImplementedError("Unsupported message type: Type:{} ".format(msg_type))


@@ -23,6 +23,7 @@ from channel.qq.qq_message import QQMessage
from common.expired_dict import ExpiredDict
from common.log import logger
from common.singleton import singleton
from common.ws_client_compat import websocket_app_run_forever
from config import conf
# Rich media file_type constants
@@ -210,7 +211,7 @@ class QQChannel(ChatChannel):
def run_forever():
try:
self._ws.run_forever(ping_interval=0, reconnect=0)
websocket_app_run_forever(self._ws, ping_interval=0, reconnect=0)
except (SystemExit, KeyboardInterrupt):
logger.info("[QQ] WebSocket thread interrupted")
except Exception as e:


@@ -50,16 +50,53 @@
(function() {
var theme = localStorage.getItem('cow_theme') || 'dark';
if (theme === 'dark') document.documentElement.classList.add('dark');
var lang = localStorage.getItem('cow_lang') || 'zh';
document.documentElement.setAttribute('lang', lang);
})();
</script>
</head>
<body class="h-screen overflow-hidden bg-gray-50 dark:bg-[#111111] text-slate-800 dark:text-slate-200 font-sans">
<!-- Login Overlay -->
<div id="login-overlay" class="fixed inset-0 z-[200] bg-gray-50 dark:bg-[#111111] flex items-center justify-center hidden">
<div class="w-full max-w-sm mx-4">
<div class="flex flex-col items-center mb-8">
<img src="assets/logo.jpg" alt="CowAgent" class="w-16 h-16 rounded-2xl mb-4 shadow-lg">
<h1 class="text-xl font-bold text-slate-800 dark:text-slate-100">CowAgent</h1>
<p class="text-sm text-slate-500 dark:text-slate-400 mt-1" id="login-subtitle">请输入密码以访问控制台</p>
</div>
<form id="login-form" class="space-y-4" onsubmit="return false;">
<div class="relative">
<input id="login-password" type="password" autocomplete="current-password"
placeholder="Password"
class="w-full px-4 py-3 rounded-xl border border-slate-200 dark:border-white/10
bg-white dark:bg-[#1A1A1A] text-slate-800 dark:text-slate-200
placeholder-slate-400 dark:placeholder-slate-500
focus:outline-none focus:ring-2 focus:ring-primary-400/50 focus:border-primary-400
transition-all duration-150 text-sm">
<button type="button" id="login-toggle-pwd"
class="absolute right-3 top-1/2 -translate-y-1/2 text-slate-400 hover:text-slate-600
dark:hover:text-slate-300 cursor-pointer transition-colors"
onclick="toggleLoginPassword()">
<i class="fas fa-eye text-sm"></i>
</button>
</div>
<p id="login-error" class="text-sm text-red-500 hidden"></p>
<button id="login-btn" type="submit"
class="w-full py-3 rounded-xl bg-primary-500 hover:bg-primary-600 text-white font-medium
text-sm cursor-pointer transition-colors duration-150 disabled:opacity-50 disabled:cursor-not-allowed">
登录
</button>
</form>
</div>
</div>
<div id="app" class="flex h-screen">
<!-- ================================================================ -->
<!-- SIDEBAR -->
<!-- ================================================================ -->
<aside id="sidebar" class="fixed inset-y-0 left-0 z-50 w-64 bg-[#0A0A0A] text-neutral-400 flex flex-col
<aside id="sidebar" class="fixed inset-y-0 left-0 z-50 w-52 bg-[#0A0A0A] text-neutral-400 flex flex-col
transform -translate-x-full lg:relative lg:translate-x-0
transition-transform duration-300 ease-in-out">
<!-- Logo -->
@@ -67,7 +104,7 @@
<img src="assets/logo.jpg" alt="CowAgent" class="w-8 h-8 rounded-lg flex-shrink-0">
<div class="flex flex-col min-w-0">
<span class="text-white font-semibold text-sm truncate">CowAgent</span>
<span class="text-neutral-500 text-xs" data-i18n="console">Console</span>
<span class="text-neutral-500 text-xs" data-i18n="console">控制台</span>
</div>
</div>
@@ -77,13 +114,13 @@
<div class="menu-group open" data-group="chat">
<button class="w-full flex items-center gap-2 px-3 py-2 text-xs font-semibold uppercase tracking-wider text-neutral-500 hover:text-neutral-300 cursor-pointer transition-colors duration-150">
<i class="fas fa-chevron-right text-[10px] chevron"></i>
<span data-i18n="nav_chat">Chat</span>
<span data-i18n="nav_chat">对话</span>
</button>
<div class="menu-group-items pl-2">
<a class="sidebar-item active flex items-center gap-3 px-3 py-2 rounded-lg cursor-pointer transition-all duration-150 hover:bg-white/5 hover:text-neutral-200 text-[14px]"
data-view="chat">
<i class="fas fa-message item-icon text-xs w-5 text-center"></i>
<span data-i18n="menu_chat">Chat</span>
<span data-i18n="menu_chat">对话</span>
</a>
</div>
</div>
@@ -92,33 +129,38 @@
<div class="menu-group open" data-group="manage">
<button class="w-full flex items-center gap-2 px-3 py-2 text-xs font-semibold uppercase tracking-wider text-neutral-500 hover:text-neutral-300 cursor-pointer transition-colors duration-150">
<i class="fas fa-chevron-right text-[10px] chevron"></i>
<span data-i18n="nav_manage">Management</span>
<span data-i18n="nav_manage">管理</span>
</button>
<div class="menu-group-items pl-2">
<a class="sidebar-item flex items-center gap-3 px-3 py-2 rounded-lg cursor-pointer transition-all duration-150 hover:bg-white/5 hover:text-neutral-200 text-[14px]"
data-view="config">
<i class="fas fa-sliders item-icon text-xs w-5 text-center"></i>
<span data-i18n="menu_config">Config</span>
<span data-i18n="menu_config">配置</span>
</a>
<a class="sidebar-item flex items-center gap-3 px-3 py-2 rounded-lg cursor-pointer transition-all duration-150 hover:bg-white/5 hover:text-neutral-200 text-[14px]"
data-view="skills">
<i class="fas fa-bolt item-icon text-xs w-5 text-center"></i>
<span data-i18n="menu_skills">Skills</span>
<span data-i18n="menu_skills">技能</span>
</a>
<a class="sidebar-item flex items-center gap-3 px-3 py-2 rounded-lg cursor-pointer transition-all duration-150 hover:bg-white/5 hover:text-neutral-200 text-[14px]"
data-view="memory">
<i class="fas fa-brain item-icon text-xs w-5 text-center"></i>
<span data-i18n="menu_memory">Memory</span>
<span data-i18n="menu_memory">记忆</span>
</a>
<a class="sidebar-item flex items-center gap-3 px-3 py-2 rounded-lg cursor-pointer transition-all duration-150 hover:bg-white/5 hover:text-neutral-200 text-[14px]"
data-view="knowledge">
<i class="fas fa-book item-icon text-xs w-5 text-center"></i>
<span data-i18n="menu_knowledge">知识</span>
</a>
<a class="sidebar-item flex items-center gap-3 px-3 py-2 rounded-lg cursor-pointer transition-all duration-150 hover:bg-white/5 hover:text-neutral-200 text-[14px]"
data-view="channels">
<i class="fas fa-tower-broadcast item-icon text-xs w-5 text-center"></i>
<span data-i18n="menu_channels">Channels</span>
<span data-i18n="menu_channels">通道</span>
</a>
<a class="sidebar-item flex items-center gap-3 px-3 py-2 rounded-lg cursor-pointer transition-all duration-150 hover:bg-white/5 hover:text-neutral-200 text-[14px]"
data-view="tasks">
<i class="fas fa-clock item-icon text-xs w-5 text-center"></i>
<span data-i18n="menu_tasks">Tasks</span>
<span data-i18n="menu_tasks">定时</span>
</a>
</div>
</div>
@@ -127,13 +169,13 @@
<div class="menu-group open" data-group="monitor">
<button class="w-full flex items-center gap-2 px-3 py-2 text-xs font-semibold uppercase tracking-wider text-neutral-500 hover:text-neutral-300 cursor-pointer transition-colors duration-150">
<i class="fas fa-chevron-right text-[10px] chevron"></i>
<span data-i18n="nav_monitor">Monitor</span>
<span data-i18n="nav_monitor">监控</span>
</button>
<div class="menu-group-items pl-2">
<a class="sidebar-item flex items-center gap-3 px-3 py-2 rounded-lg cursor-pointer transition-all duration-150 hover:bg-white/5 hover:text-neutral-200 text-[14px]"
data-view="logs">
<i class="fas fa-terminal item-icon text-xs w-5 text-center"></i>
<span data-i18n="menu_logs">Logs</span>
<span data-i18n="menu_logs">日志</span>
</a>
</div>
</div>
@@ -154,6 +196,26 @@
<!-- Mobile Overlay -->
<div id="sidebar-overlay" class="fixed inset-0 bg-black/50 z-40 hidden lg:hidden cursor-pointer" onclick="toggleSidebar()"></div>
<!-- ================================================================ -->
<!-- SESSION PANEL (collapsible) -->
<!-- ================================================================ -->
<aside id="session-panel" class="session-panel hidden">
<div class="session-panel-header">
<span class="session-panel-title" data-i18n="session_history">历史会话</span>
<button class="session-panel-close" onclick="toggleSessionPanel()" title="Close">
<i class="fas fa-times"></i>
</button>
</div>
<button class="session-panel-new" onclick="newChat()">
<i class="fas fa-plus"></i>
<span data-i18n="new_chat">新对话</span>
</button>
<div id="session-list" class="session-list"></div>
</aside>
<!-- Mobile overlay for session panel (click to close) -->
<div id="session-panel-overlay" class="session-panel-overlay hidden" onclick="closeSessionPanel()"></div>
<!-- ================================================================ -->
<!-- MAIN CONTENT -->
<!-- ================================================================ -->
@@ -166,11 +228,17 @@
<i class="fas fa-bars text-slate-600 dark:text-slate-300"></i>
</button>
<!-- Breadcrumb -->
<div class="flex items-center gap-2 text-sm min-w-0">
<span id="breadcrumb-group" class="text-slate-400 dark:text-slate-500 truncate" data-i18n="nav_chat">Chat</span>
<!-- Session panel toggle -->
<button id="session-toggle-btn" class="p-2 rounded-lg hover:bg-slate-100 dark:hover:bg-white/10 cursor-pointer transition-colors duration-150"
onclick="toggleSessionPanel()">
<i class="fas fa-clock-rotate-left text-slate-500 dark:text-slate-400"></i>
</button>
<!-- Breadcrumb (hidden on mobile) -->
<div class="hidden lg:flex items-center gap-2 text-sm min-w-0">
<span id="breadcrumb-group" class="text-slate-400 dark:text-slate-500 truncate" data-i18n="nav_chat">对话</span>
<i class="fas fa-chevron-right text-[10px] text-slate-300 dark:text-slate-600"></i>
<span id="breadcrumb-page" class="font-medium text-slate-700 dark:text-slate-200 truncate" data-i18n="menu_chat">Chat</span>
<span id="breadcrumb-page" class="font-medium text-slate-700 dark:text-slate-200 truncate" data-i18n="menu_chat">对话</span>
</div>
<div class="flex-1"></div>
@@ -220,26 +288,26 @@
<!-- ====================================================== -->
<!-- VIEW: Chat -->
<!-- ====================================================== -->
<div id="view-chat" class="view active">
<div id="view-chat" class="view active relative">
<!-- Messages -->
<div id="chat-messages" class="flex-1 overflow-y-auto">
<!-- Welcome Screen -->
<div id="welcome-screen" class="flex flex-col items-center justify-center h-full px-6 py-12">
<div id="welcome-screen" class="flex flex-col items-center justify-center h-full px-6 pb-16" style="padding-top: 6vh">
<img src="assets/logo.jpg" alt="CowAgent" class="w-16 h-16 rounded-2xl mb-6 shadow-lg shadow-primary-500/20">
<h1 id="welcome-title" class="text-2xl font-bold text-slate-800 dark:text-slate-100 mb-3">CowAgent</h1>
<p id="welcome-subtitle" class="text-slate-500 dark:text-slate-400 text-center max-w-lg mb-10 leading-relaxed"
data-i18n-html="welcome_subtitle">I can help you answer questions, manage your computer, create and execute skills,<br>and keep growing through long-term memory.</p>
data-i18n-html="welcome_subtitle">我可以帮你解答问题、管理计算机、创造和执行技能,并通过<br>长期记忆和知识库不断成长</p>
<div class="grid grid-cols-1 sm:grid-cols-3 gap-4 w-full max-w-2xl">
<div class="grid grid-cols-2 sm:grid-cols-3 gap-3 w-full max-w-2xl">
<div class="example-card group bg-white dark:bg-[#1A1A1A] border border-slate-200 dark:border-white/10 rounded-xl p-4
cursor-pointer hover:border-primary-300 dark:hover:border-primary-600 hover:shadow-md transition-all duration-200">
<div class="flex items-center gap-2 mb-2">
<div class="w-7 h-7 rounded-lg bg-blue-50 dark:bg-blue-900/30 flex items-center justify-center">
<i class="fas fa-folder-open text-blue-500 text-xs"></i>
</div>
<span class="font-medium text-sm text-slate-700 dark:text-slate-200" data-i18n="example_sys_title">System</span>
<span class="font-medium text-sm text-slate-700 dark:text-slate-200" data-i18n="example_sys_title">系统管理</span>
</div>
<p class="text-sm text-slate-500 dark:text-slate-400 leading-relaxed" data-i18n="example_sys_text">Show me the files in the workspace</p>
<p class="text-sm text-slate-500 dark:text-slate-400 leading-relaxed" data-i18n="example_sys_text">查看工作空间里有哪些文件</p>
</div>
<div class="example-card group bg-white dark:bg-[#1A1A1A] border border-slate-200 dark:border-white/10 rounded-xl p-4
cursor-pointer hover:border-primary-300 dark:hover:border-primary-600 hover:shadow-md transition-all duration-200">
@@ -247,9 +315,9 @@
<div class="w-7 h-7 rounded-lg bg-amber-50 dark:bg-amber-900/30 flex items-center justify-center">
<i class="fas fa-clock text-amber-500 text-xs"></i>
</div>
<span class="font-medium text-sm text-slate-700 dark:text-slate-200" data-i18n="example_task_title">Smart Task</span>
<span class="font-medium text-sm text-slate-700 dark:text-slate-200" data-i18n="example_task_title">定时任务</span>
</div>
<p class="text-sm text-slate-500 dark:text-slate-400 leading-relaxed" data-i18n="example_task_text">Remind me to check the server in 5 minutes</p>
<p class="text-sm text-slate-500 dark:text-slate-400 leading-relaxed" data-i18n="example_task_text">1分钟后提醒我检查服务器</p>
</div>
<div class="example-card group bg-white dark:bg-[#1A1A1A] border border-slate-200 dark:border-white/10 rounded-xl p-4
cursor-pointer hover:border-primary-300 dark:hover:border-primary-600 hover:shadow-md transition-all duration-200">
@@ -257,36 +325,86 @@
<div class="w-7 h-7 rounded-lg bg-emerald-50 dark:bg-emerald-900/30 flex items-center justify-center">
<i class="fas fa-code text-emerald-500 text-xs"></i>
</div>
<span class="font-medium text-sm text-slate-700 dark:text-slate-200" data-i18n="example_code_title">Coding</span>
<span class="font-medium text-sm text-slate-700 dark:text-slate-200" data-i18n="example_code_title">编程助手</span>
</div>
<p class="text-sm text-slate-500 dark:text-slate-400 leading-relaxed" data-i18n="example_code_text">Write a Python web scraper script</p>
<p class="text-sm text-slate-500 dark:text-slate-400 leading-relaxed" data-i18n="example_code_text">搜索AI资讯并生成可视化网页报告</p>
</div>
<div class="example-card group bg-white dark:bg-[#1A1A1A] border border-slate-200 dark:border-white/10 rounded-xl p-4
cursor-pointer hover:border-primary-300 dark:hover:border-primary-600 hover:shadow-md transition-all duration-200">
<div class="flex items-center gap-2 mb-2">
<div class="w-7 h-7 rounded-lg bg-violet-50 dark:bg-violet-900/30 flex items-center justify-center">
<i class="fas fa-book text-violet-500 text-xs"></i>
</div>
<span class="font-medium text-sm text-slate-700 dark:text-slate-200" data-i18n="example_knowledge_title">知识库</span>
</div>
<p class="text-sm text-slate-500 dark:text-slate-400 leading-relaxed" data-i18n="example_knowledge_text">查看知识库当前文档情况</p>
</div>
<div class="example-card group bg-white dark:bg-[#1A1A1A] border border-slate-200 dark:border-white/10 rounded-xl p-4
cursor-pointer hover:border-primary-300 dark:hover:border-primary-600 hover:shadow-md transition-all duration-200">
<div class="flex items-center gap-2 mb-2">
<div class="w-7 h-7 rounded-lg bg-rose-50 dark:bg-rose-900/30 flex items-center justify-center">
<i class="fas fa-puzzle-piece text-rose-500 text-xs"></i>
</div>
<span class="font-medium text-sm text-slate-700 dark:text-slate-200" data-i18n="example_skill_title">技能系统</span>
</div>
<p class="text-sm text-slate-500 dark:text-slate-400 leading-relaxed" data-i18n="example_skill_text">查看所有支持的工具和技能</p>
</div>
<div class="example-card group bg-white dark:bg-[#1A1A1A] border border-slate-200 dark:border-white/10 rounded-xl p-4
cursor-pointer hover:border-primary-300 dark:hover:border-primary-600 hover:shadow-md transition-all duration-200"
data-send="/help">
<div class="flex items-center gap-2 mb-2">
<div class="w-7 h-7 rounded-lg bg-slate-100 dark:bg-slate-800 flex items-center justify-center">
<i class="fas fa-terminal text-slate-500 text-xs"></i>
</div>
<span class="font-medium text-sm text-slate-700 dark:text-slate-200" data-i18n="example_web_title">指令中心</span>
</div>
<p class="text-sm text-slate-500 dark:text-slate-400 leading-relaxed" data-i18n="example_web_text">查看全部命令</p>
</div>
</div>
</div>
</div>
<!-- Scroll-to-bottom FAB -->
<button id="scroll-to-bottom-btn"
class="hidden absolute right-5 bottom-[80px] z-10
w-9 h-9 rounded-full shadow-lg
bg-white dark:bg-[#2A2A2A] border border-slate-200 dark:border-white/15
text-slate-500 dark:text-slate-400 hover:text-primary-500 dark:hover:text-primary-400
flex items-center justify-center cursor-pointer transition-all duration-200
hover:shadow-xl hover:scale-105"
onclick="_autoScrollEnabled = true; scrollChatToBottom(true);">
<i class="fas fa-chevron-down text-sm"></i>
</button>
<!-- Chat Input -->
<div class="flex-shrink-0 border-t border-slate-200 dark:border-white/10 bg-white dark:bg-[#1A1A1A] px-4 py-3">
<div class="max-w-3xl mx-auto">
<!-- Attachment preview bar -->
<div id="attachment-preview" class="attachment-preview hidden"></div>
<div class="flex items-center gap-2">
<div class="flex items-center gap-2 relative">
<div class="flex items-center flex-shrink-0">
<button id="new-chat-btn" class="w-9 h-10 flex items-center justify-center rounded-lg
text-slate-400 hover:text-primary-500 hover:bg-primary-50 dark:hover:bg-primary-900/20
cursor-pointer transition-colors duration-150" title="New Chat"
cursor-pointer transition-colors duration-150"
onclick="newChat()">
<i class="fas fa-plus text-base"></i>
</button>
<button id="clear-context-btn" class="w-9 h-10 flex items-center justify-center rounded-lg
text-slate-400 hover:text-amber-500 hover:bg-amber-50 dark:hover:bg-amber-900/20
cursor-pointer transition-colors duration-150"
onclick="clearContext()">
<i class="fas fa-trash-can text-base"></i>
</button>
<button id="attach-btn" class="w-9 h-10 flex items-center justify-center rounded-lg
text-slate-400 hover:text-primary-500 hover:bg-primary-50 dark:hover:bg-primary-900/20
cursor-pointer transition-colors duration-150"
title="Attach file" onclick="document.getElementById('file-input').click()">
onclick="document.getElementById('file-input').click()">
<i class="fas fa-paperclip text-base"></i>
</button>
</div>
<input type="file" id="file-input" class="hidden" multiple
accept="image/*,.pdf,.doc,.docx,.xls,.xlsx,.ppt,.pptx,.txt,.csv,.json,.xml,.zip,.rar,.7z,.py,.js,.ts,.java,.c,.cpp,.go,.rs,.md">
<div id="slash-menu" class="slash-menu hidden"></div>
<textarea id="chat-input"
class="flex-1 min-w-0 px-4 py-[10px] rounded-xl border border-slate-200 dark:border-slate-600
bg-slate-50 dark:bg-white/5 text-slate-800 dark:text-slate-100
@@ -295,7 +413,7 @@
text-sm leading-relaxed"
rows="1"
data-i18n-placeholder="input_placeholder"
placeholder="Type a message..."></textarea>
placeholder="输入消息,或输入 / 使用指令"></textarea>
<button id="send-btn"
class="flex-shrink-0 w-10 h-10 flex items-center justify-center rounded-lg
bg-primary-400 text-white hover:bg-primary-500
@@ -317,8 +435,8 @@
<div class="max-w-4xl mx-auto">
<div class="flex items-center justify-between mb-6">
<div>
<h2 class="text-xl font-bold text-slate-800 dark:text-slate-100" data-i18n="config_title">Configuration</h2>
<p class="text-sm text-slate-500 dark:text-slate-400 mt-1" data-i18n="config_desc">Manage model and agent settings</p>
<h2 class="text-xl font-bold text-slate-800 dark:text-slate-100" data-i18n="config_title">配置管理</h2>
<p class="text-sm text-slate-500 dark:text-slate-400 mt-1" data-i18n="config_desc">管理模型和 Agent 配置</p>
</div>
</div>
<div class="grid gap-6">
@@ -329,12 +447,12 @@
<div class="w-9 h-9 rounded-lg bg-primary-50 dark:bg-primary-900/30 flex items-center justify-center">
<i class="fas fa-microchip text-primary-500 text-sm"></i>
</div>
<h3 class="font-semibold text-slate-800 dark:text-slate-100" data-i18n="config_model">Model Configuration</h3>
<h3 class="font-semibold text-slate-800 dark:text-slate-100" data-i18n="config_model">模型配置</h3>
</div>
<div class="space-y-5">
<!-- Provider -->
<div>
<label class="block text-sm font-medium text-slate-600 dark:text-slate-400 mb-1.5" data-i18n="config_provider">Provider</label>
<label class="block text-sm font-medium text-slate-600 dark:text-slate-400 mb-1.5" data-i18n="config_provider">模型厂商</label>
<div id="cfg-provider" class="cfg-dropdown" tabindex="0">
<div class="cfg-dropdown-selected">
<span class="cfg-dropdown-text">--</span>
@@ -342,10 +460,13 @@
</div>
<div class="cfg-dropdown-menu"></div>
</div>
<div id="cfg-custom-tip" class="mt-1.5 text-xs text-slate-400 dark:text-slate-500 hidden">
<i class="fas fa-info-circle mr-1"></i><span data-i18n="config_custom_tip">接口需遵循 OpenAI API 协议</span>
</div>
</div>
<!-- Model -->
<div>
<label class="block text-sm font-medium text-slate-600 dark:text-slate-400 mb-1.5" data-i18n="config_model_name">Model</label>
<label class="block text-sm font-medium text-slate-600 dark:text-slate-400 mb-1.5" data-i18n="config_model_name">模型</label>
<div id="cfg-model-select" class="cfg-dropdown" tabindex="0">
<div class="cfg-dropdown-selected">
<span class="cfg-dropdown-text">--</span>
@@ -358,7 +479,7 @@
class="w-full px-3 py-2 rounded-lg border border-slate-200 dark:border-slate-600
bg-slate-50 dark:bg-white/5 text-sm text-slate-800 dark:text-slate-100
focus:outline-none focus:border-primary-500 font-mono transition-colors"
data-i18n-placeholder="config_custom_model_hint" placeholder="Enter custom model name">
data-i18n-placeholder="config_custom_model_hint" placeholder="输入自定义模型名称">
</div>
</div>
<!-- API Key -->
@@ -393,7 +514,7 @@
<button id="cfg-model-save"
class="px-4 py-2 rounded-lg bg-primary-500 hover:bg-primary-600 text-white text-sm font-medium
cursor-pointer transition-colors duration-150 disabled:opacity-50 disabled:cursor-not-allowed"
onclick="saveModelConfig()" data-i18n="config_save">Save</button>
onclick="saveModelConfig()" data-i18n="config_save">保存</button>
</div>
</div>
</div>
@@ -404,36 +525,86 @@
<div class="w-9 h-9 rounded-lg bg-emerald-50 dark:bg-emerald-900/30 flex items-center justify-center">
<i class="fas fa-robot text-emerald-500 text-sm"></i>
</div>
<h3 class="font-semibold text-slate-800 dark:text-slate-100" data-i18n="config_agent">Agent Configuration</h3>
<h3 class="font-semibold text-slate-800 dark:text-slate-100" data-i18n="config_agent">Agent 配置</h3>
</div>
<div class="space-y-4">
<div>
<label class="block text-sm font-medium text-slate-600 dark:text-slate-400 mb-1.5" data-i18n="config_max_tokens">Max Context Tokens</label>
<label class="flex items-center gap-1.5 text-sm font-medium text-slate-600 dark:text-slate-400 mb-1.5">
<span data-i18n="config_max_tokens">最大上下文 Token</span>
<span class="cfg-tip" data-tip-key="config_max_tokens_hint"><i class="fas fa-circle-question"></i></span>
</label>
<input id="cfg-max-tokens" type="number" min="1000" max="200000" step="1000"
class="w-full px-3 py-2 rounded-lg border border-slate-200 dark:border-slate-600
bg-slate-50 dark:bg-white/5 text-sm text-slate-800 dark:text-slate-100
focus:outline-none focus:border-primary-500 font-mono transition-colors">
</div>
<div>
<label class="block text-sm font-medium text-slate-600 dark:text-slate-400 mb-1.5" data-i18n="config_max_turns">Max Context Turns</label>
<label class="flex items-center gap-1.5 text-sm font-medium text-slate-600 dark:text-slate-400 mb-1.5">
<span data-i18n="config_max_turns">最大记忆轮次</span>
<span class="cfg-tip" data-tip-key="config_max_turns_hint"><i class="fas fa-circle-question"></i></span>
</label>
<input id="cfg-max-turns" type="number" min="1" max="100" step="1"
class="w-full px-3 py-2 rounded-lg border border-slate-200 dark:border-slate-600
bg-slate-50 dark:bg-white/5 text-sm text-slate-800 dark:text-slate-100
focus:outline-none focus:border-primary-500 font-mono transition-colors">
</div>
<div>
<label class="block text-sm font-medium text-slate-600 dark:text-slate-400 mb-1.5" data-i18n="config_max_steps">Max Steps</label>
<label class="flex items-center gap-1.5 text-sm font-medium text-slate-600 dark:text-slate-400 mb-1.5">
<span data-i18n="config_max_steps">最大执行步数</span>
<span class="cfg-tip" data-tip-key="config_max_steps_hint"><i class="fas fa-circle-question"></i></span>
</label>
<input id="cfg-max-steps" type="number" min="1" max="50" step="1"
class="w-full px-3 py-2 rounded-lg border border-slate-200 dark:border-slate-600
bg-slate-50 dark:bg-white/5 text-sm text-slate-800 dark:text-slate-100
focus:outline-none focus:border-primary-500 font-mono transition-colors">
</div>
<div class="flex items-center justify-between">
<label class="flex items-center gap-1.5 text-sm font-medium text-slate-600 dark:text-slate-400">
<span data-i18n="config_enable_thinking">Deep Thinking</span>
<span class="cfg-tip" data-tip-key="config_enable_thinking_hint"><i class="fas fa-circle-question"></i></span>
</label>
<label class="relative inline-flex items-center cursor-pointer">
<input id="cfg-enable-thinking" type="checkbox" class="sr-only peer">
<div class="w-9 h-5 bg-slate-200 dark:bg-slate-700 peer-checked:bg-primary-400 rounded-full
after:content-[''] after:absolute after:top-[2px] after:left-[2px] after:bg-white
after:rounded-full after:h-4 after:w-4 after:transition-all peer-checked:after:translate-x-full"></div>
</label>
</div>
<div class="flex items-center justify-end gap-3 pt-1">
<span id="cfg-agent-status" class="text-xs text-primary-500 opacity-0 transition-opacity duration-300"></span>
<button id="cfg-agent-save"
class="px-4 py-2 rounded-lg bg-primary-500 hover:bg-primary-600 text-white text-sm font-medium
cursor-pointer transition-colors duration-150 disabled:opacity-50 disabled:cursor-not-allowed"
onclick="saveAgentConfig()" data-i18n="config_save">Save</button>
onclick="saveAgentConfig()" data-i18n="config_save">保存</button>
</div>
</div>
</div>
<!-- Security Config Card -->
<div class="bg-white dark:bg-[#1A1A1A] rounded-xl border border-slate-200 dark:border-white/10 p-6">
<div class="flex items-center gap-3 mb-5">
<div class="w-9 h-9 rounded-lg bg-amber-50 dark:bg-amber-900/30 flex items-center justify-center">
<i class="fas fa-lock text-amber-500 text-sm"></i>
</div>
<h3 class="font-semibold text-slate-800 dark:text-slate-100" data-i18n="config_security">安全设置</h3>
</div>
<div class="space-y-4">
<div>
<label class="block text-sm font-medium text-slate-600 dark:text-slate-400 mb-1.5" data-i18n="config_password">访问密码</label>
<input id="cfg-password" type="password" autocomplete="new-password"
class="w-full px-3 py-2 rounded-lg border border-slate-200 dark:border-slate-600
bg-slate-50 dark:bg-white/5 text-sm text-slate-800 dark:text-slate-100
focus:outline-none focus:border-primary-500 font-mono transition-colors
cfg-key-masked"
data-masked="1">
<p class="text-xs text-slate-400 dark:text-slate-500 mt-1.5" data-i18n="config_password_hint">留空则不启用密码保护</p>
</div>
<div class="flex items-center justify-end gap-3 pt-1">
<span id="cfg-password-status" class="text-xs text-primary-500 opacity-0 transition-opacity duration-300"></span>
<button id="cfg-password-save"
class="px-4 py-2 rounded-lg bg-primary-500 hover:bg-primary-600 text-white text-sm font-medium
cursor-pointer transition-colors duration-150 disabled:opacity-50 disabled:cursor-not-allowed"
onclick="savePasswordConfig()" data-i18n="config_save">保存</button>
</div>
</div>
</div>
@@ -451,20 +622,25 @@
<div class="max-w-4xl mx-auto">
<div class="flex items-center justify-between mb-6">
<div>
<h2 class="text-xl font-bold text-slate-800 dark:text-slate-100" data-i18n="skills_title">Skills</h2>
<p class="text-sm text-slate-500 dark:text-slate-400 mt-1" data-i18n="skills_desc">View, enable, or disable agent skills</p>
<h2 class="text-xl font-bold text-slate-800 dark:text-slate-100" data-i18n="skills_title">技能管理</h2>
<p class="text-sm text-slate-500 dark:text-slate-400 mt-1" data-i18n="skills_desc">查看、启用或禁用 Agent 技能</p>
</div>
<a href="https://skills.cowagent.ai/" target="_blank"
class="inline-flex items-center gap-1.5 px-3 py-1.5 rounded-lg text-xs font-medium text-primary-500 bg-primary-50 dark:bg-primary-900/20 hover:bg-primary-100 dark:hover:bg-primary-900/30 transition-colors">
<i class="fas fa-puzzle-piece text-[10px]"></i>
<span data-i18n="skills_hub_btn">探索技能广场</span>
</a>
</div>
<!-- Built-in Tools Section -->
<div class="mb-8">
<div class="flex items-center gap-2 mb-3">
<span class="text-xs font-semibold uppercase tracking-wider text-slate-400 dark:text-slate-500" data-i18n="tools_section_title">Built-in Tools</span>
<span class="text-xs font-semibold uppercase tracking-wider text-slate-400 dark:text-slate-500" data-i18n="tools_section_title">内置工具</span>
<span id="tools-count-badge" class="hidden px-2 py-0.5 rounded-full text-xs bg-slate-100 dark:bg-white/10 text-slate-500 dark:text-slate-400"></span>
</div>
<div id="tools-empty" class="flex items-center gap-2 py-4 text-slate-400 dark:text-slate-500 text-sm">
<i class="fas fa-spinner fa-spin text-xs"></i>
<span data-i18n="tools_loading">Loading tools...</span>
<span data-i18n="tools_loading">加载工具中...</span>
</div>
<div id="tools-list" class="grid gap-3 sm:grid-cols-2 hidden"></div>
</div>
@@ -472,15 +648,15 @@
<!-- Skills Section -->
<div>
<div class="flex items-center gap-2 mb-3">
<span class="text-xs font-semibold uppercase tracking-wider text-slate-400 dark:text-slate-500" data-i18n="skills_section_title">Skills</span>
<span class="text-xs font-semibold uppercase tracking-wider text-slate-400 dark:text-slate-500" data-i18n="skills_section_title">技能</span>
<span id="skills-count-badge" class="hidden px-2 py-0.5 rounded-full text-xs bg-slate-100 dark:bg-white/10 text-slate-500 dark:text-slate-400"></span>
</div>
<div id="skills-empty" class="flex flex-col items-center justify-center py-12">
<div class="w-14 h-14 rounded-2xl bg-amber-50 dark:bg-amber-900/20 flex items-center justify-center mb-3">
<i class="fas fa-bolt text-amber-400 text-lg"></i>
</div>
<p class="text-slate-500 dark:text-slate-400 font-medium" data-i18n="skills_loading">Loading skills...</p>
<p class="text-sm text-slate-400 dark:text-slate-500 mt-1" data-i18n="skills_loading_desc">Skills will be displayed here after loading</p>
<p class="text-slate-500 dark:text-slate-400 font-medium" data-i18n="skills_loading">加载技能中...</p>
<p class="text-sm text-slate-400 dark:text-slate-500 mt-1" data-i18n="skills_loading_desc">技能加载后将显示在此处</p>
</div>
<div id="skills-list" class="grid gap-4 sm:grid-cols-2"></div>
</div>
@@ -499,26 +675,36 @@
<div id="memory-panel-list">
<div class="flex items-center justify-between mb-6">
<div>
<h2 class="text-xl font-bold text-slate-800 dark:text-slate-100" data-i18n="memory_title">Memory</h2>
<p class="text-sm text-slate-500 dark:text-slate-400 mt-1" data-i18n="memory_desc">View agent memory files and contents</p>
<h2 class="text-xl font-bold text-slate-800 dark:text-slate-100" data-i18n="memory_title">记忆管理</h2>
<p class="text-sm text-slate-500 dark:text-slate-400 mt-1" data-i18n="memory_desc">查看 Agent 记忆文件和内容</p>
</div>
<div class="flex items-center bg-slate-100 dark:bg-white/10 rounded-lg p-0.5">
<button id="memory-tab-files" onclick="switchMemoryTab('files')"
class="memory-tab px-3 py-1.5 rounded-md text-xs font-medium cursor-pointer transition-colors duration-150 active">
<i class="fas fa-file-lines mr-1.5"></i><span data-i18n="memory_tab_files">记忆文件</span>
</button>
<button id="memory-tab-dreams" onclick="switchMemoryTab('dreams')"
class="memory-tab px-3 py-1.5 rounded-md text-xs font-medium cursor-pointer transition-colors duration-150">
<i class="fas fa-moon mr-1.5"></i><span data-i18n="memory_tab_dreams">梦境日记</span>
</button>
</div>
</div>
<div id="memory-empty" class="flex flex-col items-center justify-center py-20">
<div class="w-16 h-16 rounded-2xl bg-purple-50 dark:bg-purple-900/20 flex items-center justify-center mb-4">
<i class="fas fa-brain text-purple-400 text-xl"></i>
</div>
<p class="text-slate-500 dark:text-slate-400 font-medium" data-i18n="memory_loading">Loading memory files...</p>
<p class="text-sm text-slate-400 dark:text-slate-500 mt-1" data-i18n="memory_loading_desc">Memory files will be displayed here</p>
<p class="text-slate-500 dark:text-slate-400 font-medium" data-i18n="memory_loading">加载记忆文件中...</p>
<p class="text-sm text-slate-400 dark:text-slate-500 mt-1" data-i18n="memory_loading_desc">记忆文件将显示在此处</p>
</div>
<div id="memory-list" class="hidden">
<div class="bg-white dark:bg-[#1A1A1A] rounded-xl border border-slate-200 dark:border-white/10 overflow-hidden">
<table class="w-full">
<thead>
<tr class="border-b border-slate-200 dark:border-white/10">
<th class="text-left px-4 py-3 text-xs font-semibold uppercase tracking-wider text-slate-500 dark:text-slate-400" data-i18n="memory_col_name">Filename</th>
<th class="text-left px-4 py-3 text-xs font-semibold uppercase tracking-wider text-slate-500 dark:text-slate-400" data-i18n="memory_col_type">Type</th>
<th class="text-left px-4 py-3 text-xs font-semibold uppercase tracking-wider text-slate-500 dark:text-slate-400" data-i18n="memory_col_size">Size</th>
<th class="text-left px-4 py-3 text-xs font-semibold uppercase tracking-wider text-slate-500 dark:text-slate-400" data-i18n="memory_col_updated">Updated</th>
<th class="text-left px-4 py-3 text-xs font-semibold uppercase tracking-wider text-slate-500 dark:text-slate-400" data-i18n="memory_col_name">文件名</th>
<th class="text-left px-4 py-3 text-xs font-semibold uppercase tracking-wider text-slate-500 dark:text-slate-400" data-i18n="memory_col_type">类型</th>
<th class="text-left px-4 py-3 text-xs font-semibold uppercase tracking-wider text-slate-500 dark:text-slate-400" data-i18n="memory_col_size">大小</th>
<th class="text-left px-4 py-3 text-xs font-semibold uppercase tracking-wider text-slate-500 dark:text-slate-400" data-i18n="memory_col_updated">更新时间</th>
</tr>
</thead>
<tbody id="memory-table-body"></tbody>
@@ -536,7 +722,7 @@
text-slate-500 dark:text-slate-400 hover:bg-slate-100 dark:hover:bg-white/10
border border-slate-200 dark:border-white/10 transition-colors cursor-pointer">
<i class="fas fa-arrow-left text-xs"></i>
<span data-i18n="memory_back">Back</span>
<span data-i18n="memory_back">返回列表</span>
</button>
<h2 id="memory-viewer-title"
class="text-base font-semibold text-slate-800 dark:text-slate-100 font-mono truncate"></h2>
@@ -552,6 +738,106 @@
</div>
</div>
<!-- ====================================================== -->
<!-- VIEW: Knowledge -->
<!-- ====================================================== -->
<div id="view-knowledge" class="view">
<div class="flex-1 overflow-y-auto p-4 md:p-8 lg:p-10">
<div class="w-full max-w-[1600px] mx-auto">
<!-- Header -->
<div class="flex flex-col sm:flex-row sm:items-center justify-between gap-3 mb-4 md:mb-6">
<div>
<h2 class="text-xl font-bold text-slate-800 dark:text-slate-100" data-i18n="knowledge_title">知识库</h2>
<p class="text-sm text-slate-500 dark:text-slate-400 mt-1" data-i18n="knowledge_desc">浏览和探索你的知识库</p>
</div>
<div class="flex items-center gap-2">
<span id="knowledge-stats" class="text-xs text-slate-400 dark:text-slate-500 hidden sm:inline"></span>
<div class="flex items-center bg-slate-100 dark:bg-white/10 rounded-lg p-0.5">
<button id="knowledge-tab-docs" onclick="switchKnowledgeTab('docs')"
class="knowledge-tab px-3 py-1.5 rounded-md text-xs font-medium cursor-pointer transition-colors duration-150 active">
<i class="fas fa-folder-tree mr-1.5"></i><span data-i18n="knowledge_tab_docs">文档</span>
</button>
<button id="knowledge-tab-graph" onclick="switchKnowledgeTab('graph')"
class="knowledge-tab px-3 py-1.5 rounded-md text-xs font-medium cursor-pointer transition-colors duration-150">
<i class="fas fa-diagram-project mr-1.5"></i><span data-i18n="knowledge_tab_graph">图谱</span>
</button>
</div>
</div>
</div>
<!-- Empty state -->
<div id="knowledge-empty" class="flex flex-col items-center justify-center py-20">
<div class="w-16 h-16 rounded-2xl bg-emerald-50 dark:bg-emerald-900/20 flex items-center justify-center mb-4">
<i class="fas fa-book text-emerald-400 text-xl"></i>
</div>
<p class="text-slate-500 dark:text-slate-400 font-medium" data-i18n="knowledge_loading">加载知识库中...</p>
<p class="text-sm text-slate-400 dark:text-slate-500 mt-1" data-i18n="knowledge_loading_desc">知识页面将显示在这里</p>
<div id="knowledge-empty-guide" class="hidden mt-6 max-w-sm text-center">
<p class="text-sm text-slate-500 dark:text-slate-400 mb-4" data-i18n="knowledge_empty_guide">在对话中发送文档、链接或主题给 Agent它会自动整理到你的知识库中。</p>
<button onclick="navigateTo('chat')"
class="inline-flex items-center gap-2 px-4 py-2 rounded-lg bg-primary-500 hover:bg-primary-600
text-white text-sm font-medium cursor-pointer transition-colors duration-150">
<i class="fas fa-message text-xs"></i>
<span data-i18n="knowledge_go_chat">开始对话</span>
</button>
</div>
</div>
<!-- Documents panel -->
<div id="knowledge-panel-docs" class="hidden">
<div class="flex flex-col md:flex-row gap-4 md:gap-6" style="min-height: calc(100vh - 220px)">
<!-- File tree -->
<div id="knowledge-sidebar" class="w-full md:w-72 lg:w-80 flex-shrink-0">
<div class="bg-white dark:bg-[#1A1A1A] rounded-xl border border-slate-200 dark:border-white/10 overflow-hidden">
<div class="px-4 py-3 border-b border-slate-200 dark:border-white/10">
<div class="relative">
<i class="fas fa-search absolute left-3 top-1/2 -translate-y-1/2 text-slate-400 text-xs"></i>
<input id="knowledge-search" type="text" placeholder="Search..."
class="w-full pl-8 pr-3 py-1.5 text-xs bg-slate-50 dark:bg-white/5 border border-slate-200 dark:border-white/10 rounded-lg text-slate-700 dark:text-slate-200 placeholder-slate-400 dark:placeholder-slate-500 focus:outline-none focus:ring-1 focus:ring-primary-400/50"
oninput="filterKnowledgeTree(this.value)">
</div>
</div>
<div id="knowledge-tree" class="p-2 overflow-y-auto max-h-[50vh] md:max-h-[calc(100vh-300px)]"></div>
</div>
</div>
<!-- Content viewer -->
<div class="flex-1 min-w-0">
<div id="knowledge-content-placeholder"
class="flex flex-col items-center justify-center py-20 text-slate-400 dark:text-slate-500">
<i class="fas fa-file-lines text-3xl mb-3 opacity-40"></i>
<p class="text-sm" data-i18n="knowledge_select_hint">选择一个文档查看</p>
</div>
<div id="knowledge-content-viewer" class="hidden">
<div class="bg-white dark:bg-[#1A1A1A] rounded-xl border border-slate-200 dark:border-white/10 overflow-hidden">
<div class="flex items-center gap-3 px-4 md:px-5 py-3 border-b border-slate-200 dark:border-white/10">
<button onclick="knowledgeMobileBack()" class="md:hidden p-1 -ml-1 text-slate-400 hover:text-slate-600 dark:hover:text-slate-300 cursor-pointer">
<i class="fas fa-arrow-left text-xs"></i>
</button>
<i class="fas fa-file-lines text-slate-400 text-sm hidden md:inline"></i>
<span id="knowledge-viewer-title" class="text-sm font-medium text-slate-700 dark:text-slate-200 truncate"></span>
<span id="knowledge-viewer-path" class="text-xs text-slate-400 dark:text-slate-500 ml-auto font-mono truncate hidden md:inline"></span>
</div>
<div id="knowledge-viewer-body"
class="p-4 md:p-5 overflow-y-auto text-sm msg-content text-slate-700 dark:text-slate-200"
style="max-height: calc(100vh - 280px)"></div>
</div>
</div>
</div>
</div>
</div>
<!-- Graph panel -->
<div id="knowledge-panel-graph" class="hidden">
<div class="bg-white dark:bg-[#1A1A1A] rounded-xl border border-slate-200 dark:border-white/10 overflow-hidden">
<div id="knowledge-graph-container" class="w-full h-[60vh] md:h-[calc(100vh-220px)]"></div>
</div>
</div>
</div>
</div>
</div>
<!-- ====================================================== -->
<!-- VIEW: Channels -->
<!-- ====================================================== -->
@@ -560,14 +846,14 @@
<div class="max-w-4xl mx-auto">
<div class="flex items-center justify-between mb-6">
<div>
<h2 class="text-xl font-bold text-slate-800 dark:text-slate-100" data-i18n="channels_title">Channels</h2>
<p class="text-sm text-slate-500 dark:text-slate-400 mt-1" data-i18n="channels_desc">View and manage messaging channels</p>
<h2 class="text-xl font-bold text-slate-800 dark:text-slate-100" data-i18n="channels_title">通道管理</h2>
<p class="text-sm text-slate-500 dark:text-slate-400 mt-1" data-i18n="channels_desc">管理已接入的消息通道</p>
</div>
<button id="add-channel-btn" onclick="openAddChannelPanel()"
class="flex items-center gap-2 px-4 py-2 rounded-lg bg-primary-500 hover:bg-primary-600
text-white text-sm font-medium cursor-pointer transition-colors duration-150">
<i class="fas fa-plus text-xs"></i>
<span data-i18n="channels_add">Connect</span>
<span data-i18n="channels_add">接入通道</span>
</button>
</div>
<div id="channels-content" class="grid gap-4"></div>
@@ -584,8 +870,8 @@
<div class="max-w-4xl mx-auto">
<div class="flex items-center justify-between mb-6">
<div>
<h2 class="text-xl font-bold text-slate-800 dark:text-slate-100" data-i18n="tasks_title">Scheduled Tasks</h2>
<p class="text-sm text-slate-500 dark:text-slate-400 mt-1" data-i18n="tasks_desc">View and manage scheduled tasks</p>
<h2 class="text-xl font-bold text-slate-800 dark:text-slate-100" data-i18n="tasks_title">定时任务</h2>
<p class="text-sm text-slate-500 dark:text-slate-400 mt-1" data-i18n="tasks_desc">查看和管理定时任务</p>
</div>
</div>
<div id="tasks-empty" class="flex flex-col items-center justify-center py-20">
@@ -607,8 +893,8 @@
<div class="max-w-5xl mx-auto">
<div class="flex items-center justify-between mb-6">
<div>
<h2 class="text-xl font-bold text-slate-800 dark:text-slate-100" data-i18n="logs_title">Logs</h2>
<p class="text-sm text-slate-500 dark:text-slate-400 mt-1" data-i18n="logs_desc">Real-time log output (run.log)</p>
<h2 class="text-xl font-bold text-slate-800 dark:text-slate-100" data-i18n="logs_title">日志</h2>
<p class="text-sm text-slate-500 dark:text-slate-400 mt-1" data-i18n="logs_desc">实时日志输出 (run.log)</p>
</div>
</div>
<!-- Log Terminal -->
@@ -623,11 +909,11 @@
<div class="flex-1"></div>
<div class="flex items-center gap-1.5">
<span class="w-2 h-2 rounded-full bg-emerald-500 animate-pulse"></span>
<span class="text-xs text-slate-500" data-i18n="logs_live">Live</span>
<span class="text-xs text-slate-500" data-i18n="logs_live">实时</span>
</div>
</div>
<div id="log-output" class="p-4 overflow-y-auto font-mono text-xs leading-relaxed text-slate-300 whitespace-pre-wrap break-all" style="height: calc(100vh - 272px)">
<p class="text-slate-500" data-i18n="logs_coming_msg">Log streaming will be available here. Connects to run.log for real-time output similar to tail -f.</p>
<p class="text-slate-500" data-i18n="logs_coming_msg">日志流即将在此提供。将连接 run.log 实现类似 tail -f 的实时输出。</p>
</div>
</div>
</div>
@@ -664,6 +950,7 @@
</div>
</div>
<script src="https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js"></script>
<script src="assets/js/console.js"></script>
</body>
</html>

View File

@@ -17,6 +17,45 @@
.dark ::-webkit-scrollbar-thumb { background: #475569; }
.dark ::-webkit-scrollbar-thumb:hover { background: #64748b; }
/* Generic Tooltip (via data-tooltip attribute) */
[data-tooltip] {
position: relative;
}
[data-tooltip]::after {
content: attr(data-tooltip);
position: absolute;
left: 50%;
bottom: calc(100% + 8px);
transform: translateX(-50%);
padding: 5px 10px;
border-radius: 6px;
font-size: 12px;
font-weight: 400;
line-height: 1.4;
white-space: nowrap;
background: #1e293b;
color: #e2e8f0;
box-shadow: 0 4px 12px rgba(0, 0, 0, 0.15);
opacity: 0;
pointer-events: none;
transition: opacity 0.15s ease;
z-index: 100;
}
[data-tooltip-pos="bottom"]::after {
bottom: auto;
top: calc(100% + 8px);
}
.dark [data-tooltip]::after {
background: #334155;
color: #f1f5f9;
}
[data-tooltip]:hover::after {
opacity: 1;
}
[data-tooltip=""]:hover::after {
display: none;
}
/* Sidebar */
.sidebar-item.active {
background: rgba(255, 255, 255, 0.08);
@@ -24,9 +63,317 @@
}
.sidebar-item.active .item-icon { color: #4ABE6E; }
/* Session Panel */
.session-panel {
width: 220px;
flex-shrink: 0;
display: flex;
flex-direction: column;
background: #fafafa;
border-right: 1px solid #e5e7eb;
height: 100vh;
overflow: hidden;
transition: width 0.2s ease;
}
.dark .session-panel {
background: #111111;
border-right-color: rgba(255, 255, 255, 0.08);
}
.session-panel.hidden { display: none; }
.session-panel-header {
display: flex;
align-items: center;
justify-content: space-between;
padding: 12px 16px;
border-bottom: 1px solid #e5e7eb;
flex-shrink: 0;
}
.dark .session-panel-header { border-bottom-color: rgba(255, 255, 255, 0.08); }
.session-panel-title {
font-size: 14px;
font-weight: 600;
color: #374151;
}
.dark .session-panel-title { color: #d1d5db; }
.session-panel-close {
width: 28px;
height: 28px;
display: flex;
align-items: center;
justify-content: center;
border-radius: 6px;
border: none;
background: none;
color: #9ca3af;
cursor: pointer;
transition: background 0.15s, color 0.15s;
font-size: 12px;
}
.session-panel-close:hover {
background: #f3f4f6;
color: #374151;
}
.dark .session-panel-close:hover {
background: rgba(255, 255, 255, 0.08);
color: #e5e5e5;
}
.session-panel-new {
display: flex;
align-items: center;
gap: 8px;
margin: 10px 12px;
padding: 8px 14px;
border-radius: 8px;
border: 1px dashed #d1d5db;
background: none;
color: #6b7280;
font-size: 13px;
cursor: pointer;
transition: border-color 0.15s, color 0.15s, background 0.15s;
flex-shrink: 0;
}
.session-panel-new:hover {
border-color: #9ca3af;
color: #374151;
background: #f9fafb;
}
.dark .session-panel-new {
border-color: rgba(255, 255, 255, 0.12);
color: #9ca3af;
}
.dark .session-panel-new:hover {
border-color: rgba(255, 255, 255, 0.25);
color: #e5e5e5;
background: rgba(255, 255, 255, 0.04);
}
/* Session List */
.session-list {
flex: 1;
overflow-y: auto;
padding: 4px 8px;
scrollbar-width: none;
}
.session-list:hover { scrollbar-width: thin; }
.session-list::-webkit-scrollbar { width: 4px; background: transparent; }
.session-list::-webkit-scrollbar-thumb { background: transparent; border-radius: 2px; }
.session-list:hover::-webkit-scrollbar-thumb { background: rgba(0,0,0,0.2); }
.dark .session-list:hover::-webkit-scrollbar-thumb { background: rgba(255,255,255,0.15); }
.session-group-label {
padding: 10px 8px 4px;
font-size: 11px;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.05em;
color: #9ca3af;
}
.dark .session-group-label { color: #525252; }
.session-empty {
padding: 20px 12px;
text-align: center;
font-size: 13px;
color: #9ca3af;
}
.session-item {
display: flex;
align-items: center;
gap: 8px;
padding: 8px 10px;
margin: 1px 0;
border-radius: 8px;
cursor: pointer;
transition: background 0.15s, color 0.15s;
color: #6b7280;
font-size: 13px;
position: relative;
}
.dark .session-item { color: #a3a3a3; }
.session-item:hover {
background: #f3f4f6;
color: #111827;
}
.dark .session-item:hover {
background: rgba(255, 255, 255, 0.05);
color: #e5e5e5;
}
.session-item.active {
background: #e5e7eb;
color: #111827;
}
.dark .session-item.active {
background: rgba(255, 255, 255, 0.1);
color: #ffffff;
}
.session-icon {
flex-shrink: 0;
font-size: 11px;
color: #9ca3af;
width: 16px;
text-align: center;
}
.dark .session-icon { color: #525252; }
.session-item.active .session-icon { color: #4ABE6E; }
.session-title {
flex: 1;
min-width: 0;
overflow: hidden;
text-overflow: ellipsis;
white-space: nowrap;
}
.session-delete {
flex-shrink: 0;
width: 22px;
height: 22px;
display: flex;
align-items: center;
justify-content: center;
border-radius: 5px;
font-size: 10px;
color: #9ca3af;
opacity: 0;
transition: opacity 0.15s, color 0.15s, background 0.15s;
cursor: pointer;
background: none;
border: none;
padding: 0;
}
.session-item:hover .session-delete { opacity: 1; }
.session-delete:hover {
color: #ef4444;
background: rgba(239, 68, 68, 0.1);
}
.dark .session-delete:hover { background: rgba(239, 68, 68, 0.15); }
/* Context Divider */
.context-divider {
display: flex;
align-items: center;
gap: 12px;
padding: 12px 24px;
color: #9ca3af;
}
.context-divider::before, .context-divider::after {
content: '';
flex: 1;
height: 1px;
background: linear-gradient(to right, transparent, #d1d5db, transparent);
}
.dark .context-divider::before, .dark .context-divider::after {
background: linear-gradient(to right, transparent, rgba(255,255,255,0.12), transparent);
}
.context-divider span {
font-size: 12px;
white-space: nowrap;
color: #9ca3af;
}
/* Confirm Modal */
.confirm-overlay {
position: fixed;
inset: 0;
z-index: 9999;
display: flex;
align-items: center;
justify-content: center;
background: rgba(0, 0, 0, 0.4);
opacity: 0;
transition: opacity 0.2s ease;
}
.confirm-overlay.visible { opacity: 1; }
.confirm-modal {
background: #fff;
border-radius: 14px;
width: 380px;
max-width: 90vw;
padding: 28px 24px 20px;
box-shadow: 0 20px 60px rgba(0, 0, 0, 0.18);
transform: scale(0.92);
transition: transform 0.2s ease;
}
.confirm-overlay.visible .confirm-modal { transform: scale(1); }
.dark .confirm-modal {
background: #1e1e1e;
box-shadow: 0 20px 60px rgba(0, 0, 0, 0.5);
}
.confirm-title {
font-size: 16px;
font-weight: 600;
color: #1f2937;
margin-bottom: 8px;
}
.dark .confirm-title { color: #e5e7eb; }
.confirm-message {
font-size: 14px;
color: #6b7280;
line-height: 1.5;
margin-bottom: 24px;
}
.dark .confirm-message { color: #9ca3af; }
.confirm-actions {
display: flex;
justify-content: flex-end;
gap: 10px;
}
.confirm-btn {
padding: 8px 20px;
border-radius: 8px;
font-size: 14px;
font-weight: 500;
cursor: pointer;
border: none;
transition: all 0.15s ease;
}
.confirm-btn-cancel {
background: #f3f4f6;
color: #374151;
}
.confirm-btn-cancel:hover { background: #e5e7eb; }
.dark .confirm-btn-cancel {
background: rgba(255, 255, 255, 0.08);
color: #d1d5db;
}
.dark .confirm-btn-cancel:hover { background: rgba(255, 255, 255, 0.14); }
.confirm-btn-ok {
background: #ef4444;
color: #fff;
}
.confirm-btn-ok:hover { background: #dc2626; }
/* Session panel overlay (mobile only, click to close) */
.session-panel-overlay {
display: none;
}
@media (max-width: 768px) {
.session-panel-overlay {
display: block;
position: fixed;
inset: 0;
z-index: 44;
background: rgba(0, 0, 0, 0.3);
}
.session-panel-overlay.hidden {
display: none;
}
}
/* Mobile: session panel as overlay */
@media (max-width: 768px) {
.session-panel {
position: fixed;
top: 0;
left: 0;
z-index: 45;
width: 220px;
box-shadow: 4px 0 24px rgba(0, 0, 0, 0.15);
}
.dark .session-panel {
box-shadow: 4px 0 24px rgba(0, 0, 0, 0.4);
}
}
/* Menu Groups */
.menu-group-items { max-height: 0; overflow: hidden; transition: max-height 0.25s ease-out; }
.menu-group.open .menu-group-items { max-height: 500px; transition: max-height 0.35s ease-in; }
.menu-group.open .menu-group-items { max-height: 2000px; transition: max-height 0.35s ease-in; }
.menu-group .chevron { transition: transform 0.25s ease; }
.menu-group.open .chevron { transform: rotate(90deg); }
@@ -45,7 +392,8 @@
.msg-content h1 { font-size: 1.4em; }
.msg-content h2 { font-size: 1.25em; }
.msg-content h3 { font-size: 1.1em; }
.msg-content ul, .msg-content ol { margin: 0.5em 0; padding-left: 1.8em; }
.msg-content ul { margin: 0.5em 0; padding-left: 1.8em; list-style: disc; }
.msg-content ol { margin: 0.5em 0; padding-left: 1.8em; list-style: decimal; }
.msg-content li { margin: 0.25em 0; }
.msg-content pre {
border-radius: 8px; overflow-x: auto; margin: 0.8em 0;
@@ -79,6 +427,11 @@
.msg-content img { max-width: 100%; height: auto; border-radius: 8px; margin: 0.5em 0; }
.msg-content a { color: #35A85B; text-decoration: underline; }
.msg-content a:hover { color: #228547; }
/* Overrides for user bubble (white text on green bg) */
.user-bubble.msg-content a { color: #ffffff !important; text-decoration: underline; text-decoration-color: rgba(255,255,255,0.6); }
.user-bubble.msg-content a:hover { color: #e0f5e8 !important; text-decoration-color: #e0f5e8; }
.user-bubble.msg-content :not(pre) > code { background: rgba(255,255,255,0.2); color: #ffffff; }
.msg-content hr { border: none; height: 1px; background: #e2e8f0; margin: 1.2em 0; }
.dark .msg-content hr { background: rgba(255,255,255,0.1); }
@@ -119,9 +472,8 @@
cursor: pointer;
user-select: none;
}
.agent-thinking-step .thinking-header.no-toggle { cursor: default; }
.agent-thinking-step .thinking-header:not(.no-toggle):hover { color: #64748b; }
.dark .agent-thinking-step .thinking-header:not(.no-toggle):hover { color: #cbd5e1; }
.agent-thinking-step .thinking-header:hover { color: #64748b; }
.dark .agent-thinking-step .thinking-header:hover { color: #cbd5e1; }
.agent-thinking-step .thinking-header i:first-child { font-size: 0.625rem; margin-top: 1px; }
.agent-thinking-step .thinking-chevron {
font-size: 0.5rem;
@@ -141,7 +493,7 @@
font-size: 0.75rem;
line-height: 1.5;
color: #94a3b8;
max-height: 200px;
max-height: 300px;
overflow-y: auto;
}
.dark .agent-thinking-step .thinking-full {
@@ -152,6 +504,41 @@
.agent-thinking-step .thinking-full p { margin: 0.25em 0; }
.agent-thinking-step .thinking-full p:first-child { margin-top: 0; }
.agent-thinking-step .thinking-full p:last-child { margin-bottom: 0; }
.agent-thinking-step .thinking-duration {
font-size: 0.625rem;
color: #b0b8c4;
margin-bottom: 0.375rem;
}
/* Streaming reasoning: render as plain pre to avoid expensive markdown
re-parsing on every chunk. Wrap long lines so the bubble width is
respected and use the same font size/color as the rendered version. */
.agent-thinking-step .thinking-stream-pre {
margin: 0;
padding: 0;
background: transparent;
border: 0;
font-family: inherit;
font-size: inherit;
line-height: 1.5;
color: inherit;
white-space: pre-wrap;
word-break: break-word;
overflow-wrap: anywhere;
}
/* Content step - real text output frozen before tool calls */
.agent-content-step {
font-size: 0.875rem;
line-height: 1.6;
color: inherit;
margin-bottom: 0.5rem;
padding-bottom: 0.5rem;
border-bottom: 1px dashed rgba(0, 0, 0, 0.06);
}
.dark .agent-content-step { border-bottom-color: rgba(255, 255, 255, 0.06); }
.agent-content-step .agent-content-body p { margin: 0.25em 0; }
.agent-content-step .agent-content-body p:first-child { margin-top: 0; }
.agent-content-step .agent-content-body p:last-child { margin-bottom: 0; }
/* Tool step - collapsible */
.agent-tool-step .tool-header {
@@ -446,3 +833,277 @@
transform: translateY(-2px);
box-shadow: 0 8px 25px -5px rgba(0, 0, 0, 0.1);
}
/* Slash Command Menu */
.slash-menu {
position: absolute;
bottom: calc(100% + 6px);
left: 0;
right: 0;
max-height: 320px;
overflow-y: auto;
background: #fff;
border: 1px solid #e2e8f0;
border-radius: 12px;
box-shadow: 0 8px 30px -6px rgba(0, 0, 0, 0.1), 0 2px 8px -2px rgba(0, 0, 0, 0.04);
z-index: 50;
padding: 4px;
animation: slashMenuIn 0.15s ease-out;
}
.slash-menu.hidden { display: none; }
@keyframes slashMenuIn {
from { opacity: 0; transform: translateY(6px); }
to { opacity: 1; transform: translateY(0); }
}
.slash-menu-header {
padding: 6px 10px 4px;
font-size: 11px;
font-weight: 600;
color: #94a3b8;
text-transform: uppercase;
letter-spacing: 0.05em;
}
.slash-menu-item {
display: flex;
align-items: center;
justify-content: space-between;
padding: 8px 10px;
border-radius: 8px;
cursor: pointer;
transition: background 0.12s ease;
}
.slash-menu-item:hover,
.slash-menu-item.active {
background: #EDFDF3;
}
.slash-menu-item .cmd {
font-size: 13px;
font-weight: 500;
color: #334155;
font-family: ui-monospace, SFMono-Regular, 'SF Mono', Menlo, monospace;
}
.slash-menu-item.active .cmd {
color: #228547;
}
.slash-menu-item .desc {
font-size: 12px;
color: #94a3b8;
margin-left: 12px;
white-space: nowrap;
}
/* Dark mode */
.dark .slash-menu {
background: #1A1A1A;
border-color: rgba(255, 255, 255, 0.1);
box-shadow: 0 8px 30px -6px rgba(0, 0, 0, 0.35), 0 2px 8px -2px rgba(0, 0, 0, 0.15);
}
.dark .slash-menu-header {
color: #64748b;
}
.dark .slash-menu-item:hover,
.dark .slash-menu-item.active {
background: rgba(74, 190, 110, 0.1);
}
.dark .slash-menu-item .cmd {
color: #e2e8f0;
}
.dark .slash-menu-item.active .cmd {
color: #4ABE6E;
}
.dark .slash-menu-item .desc {
color: #64748b;
}
/* ============================================================
Knowledge View
============================================================ */
/* Tab toggle */
.knowledge-tab, .memory-tab {
color: #64748b;
}
.knowledge-tab.active, .memory-tab.active {
background: #fff;
color: #334155;
box-shadow: 0 1px 3px rgba(0,0,0,0.08);
}
.dark .knowledge-tab.active, .dark .memory-tab.active {
background: rgba(255,255,255,0.1);
color: #e2e8f0;
}
/* File tree */
.knowledge-tree-group {
margin-bottom: 2px;
}
.knowledge-tree-group-btn {
display: flex;
align-items: center;
gap: 6px;
width: 100%;
padding: 6px 8px;
border-radius: 6px;
font-size: 12px;
font-weight: 600;
color: #64748b;
cursor: pointer;
border: none;
background: none;
transition: background 0.15s, color 0.15s;
text-transform: capitalize;
}
.knowledge-tree-group-btn:hover {
background: rgba(0,0,0,0.04);
color: #334155;
}
.dark .knowledge-tree-group-btn:hover {
background: rgba(255,255,255,0.06);
color: #e2e8f0;
}
.knowledge-tree-group-btn i.chevron {
font-size: 8px;
transition: transform 0.15s;
}
.knowledge-tree-group.open > .knowledge-tree-group-btn .chevron {
transform: rotate(90deg);
}
.knowledge-tree-group-items {
display: none;
}
.knowledge-tree-group.open > .knowledge-tree-group-items {
display: block;
}
.knowledge-tree-file {
display: flex;
align-items: center;
gap: 6px;
padding: 5px 8px 5px 24px;
border-radius: 6px;
font-size: 12px;
color: #64748b;
cursor: pointer;
border: none;
background: none;
width: 100%;
text-align: left;
transition: background 0.15s, color 0.15s;
white-space: nowrap;
overflow: hidden;
text-overflow: ellipsis;
}
.knowledge-tree-file:hover {
background: rgba(0,0,0,0.04);
color: #334155;
}
.knowledge-tree-file.active {
background: #EDFDF3;
color: #228547;
}
.dark .knowledge-tree-file:hover {
background: rgba(255,255,255,0.06);
color: #e2e8f0;
}
.dark .knowledge-tree-file.active {
background: rgba(74, 190, 110, 0.1);
color: #4ABE6E;
}
/* Graph legend */
.knowledge-graph-legend {
position: absolute;
top: 12px;
right: 12px;
display: flex;
flex-wrap: wrap;
gap: 8px;
font-size: 11px;
color: #64748b;
z-index: 10;
}
.knowledge-graph-legend-item {
display: flex;
align-items: center;
gap: 4px;
}
.knowledge-graph-legend-dot {
width: 8px;
height: 8px;
border-radius: 50%;
}
/* Graph tooltip */
.knowledge-graph-tooltip {
position: absolute;
padding: 6px 10px;
background: #fff;
border: 1px solid #e2e8f0;
border-radius: 8px;
font-size: 12px;
color: #334155;
box-shadow: 0 4px 12px rgba(0,0,0,0.08);
pointer-events: none;
opacity: 0;
transition: opacity 0.15s;
z-index: 20;
}
.dark .knowledge-graph-tooltip {
background: #1A1A1A;
border-color: rgba(255,255,255,0.1);
color: #e2e8f0;
}
/* Config field tooltip */
.cfg-tip {
position: relative;
display: inline-flex;
align-items: center;
color: #94a3b8;
cursor: help;
font-size: 12px;
}
.cfg-tip:hover { color: #64748b; }
.dark .cfg-tip:hover { color: #cbd5e1; }
/* Floating tooltip portal — appended to <body> by JS so it isn't clipped
by overflow:hidden ancestors. */
.cfg-tip-floating {
position: fixed;
padding: 6px 10px;
border-radius: 8px;
font-size: 12px;
font-weight: 400;
line-height: 1.4;
white-space: nowrap;
background: #1e293b;
color: #e2e8f0;
box-shadow: 0 4px 12px rgba(0,0,0,0.15);
opacity: 0;
pointer-events: none;
transition: opacity 0.15s;
z-index: 9999;
}
.dark .cfg-tip-floating {
background: #334155;
color: #f1f5f9;
}
.cfg-tip-floating.show {
opacity: 1;
}
/* Example cards: equal height via flex stretch + fixed 2-line description area */
.example-card {
display: flex;
flex-direction: column;
}
.example-card > p {
flex: 1;
display: -webkit-box;
-webkit-line-clamp: 2;
-webkit-box-orient: vertical;
overflow: hidden;
min-height: 2.5em; /* ~2 lines at text-sm leading-relaxed */
}

File diff suppressed because it is too large

View File

@@ -26,6 +26,7 @@ from channel.wecom_bot.wecom_bot_message import WecomBotMessage
from common.expired_dict import ExpiredDict
from common.log import logger
from common.singleton import singleton
from common.ws_client_compat import websocket_app_run_forever
from config import conf
WECOM_WS_URL = "wss://openws.work.weixin.qq.com"
@@ -119,7 +120,7 @@ class WecomBotChannel(ChatChannel):
def run_forever():
try:
self._ws.run_forever(ping_interval=0, reconnect=0)
websocket_app_run_forever(self._ws, ping_interval=0, reconnect=0)
except (SystemExit, KeyboardInterrupt):
logger.info("[WecomBot] WebSocket thread interrupted")
except Exception as e:
@@ -329,28 +330,42 @@ class WecomBotChannel(ChatChannel):
All intermediate segments (thinking before tool calls) and the final answer
are accumulated into a single stream message, separated by '---'.
Throttles push to at most once per 100ms to avoid WebSocket congestion.
"""
stream_id = uuid.uuid4().hex[:16]
self._stream_states[req_id] = {
"stream_id": stream_id,
"committed": "", # finalized content from previous segments
"current": "", # current segment being streamed
"committed": "",
"current": "",
"last_push_time": 0,
"last_push_len": 0,
}
def _push_stream(state: dict):
"""Push current stream content to wecom."""
self._ws_send({
"cmd": "aibot_respond_msg",
"headers": {"req_id": req_id},
"body": {
"msgtype": "stream",
"stream": {
"id": state["stream_id"],
"finish": False,
"content": state["committed"] + state["current"],
def _push_stream(state: dict, force: bool = False):
"""Push current stream content to wecom (throttled unless forced)."""
now = time.time()
if not force and now - state["last_push_time"] < 0.1:
return
content = state["committed"] + state["current"]
if len(content) == state["last_push_len"]:
return
state["last_push_time"] = now
state["last_push_len"] = len(content)
try:
self._ws_send({
"cmd": "aibot_respond_msg",
"headers": {"req_id": req_id},
"body": {
"msgtype": "stream",
"stream": {
"id": state["stream_id"],
"finish": False,
"content": content,
},
},
},
})
})
except Exception as e:
logger.warning(f"[WecomBot] Stream push failed: {e}")
def on_event(event: dict):
event_type = event.get("type")
@@ -377,6 +392,7 @@ class WecomBotChannel(ChatChannel):
else:
state["committed"] += state["current"]
state["current"] = ""
_push_stream(state, force=True)
return on_event
@@ -451,11 +467,16 @@ class WecomBotChannel(ChatChannel):
if req_id:
state = self._stream_states.pop(req_id, None)
if state:
final_content = state["committed"]
final_content = state["committed"] if state["committed"] else content
stream_id = state["stream_id"]
else:
final_content = content
stream_id = uuid.uuid4().hex[:16]
# Brief pause so the server finishes processing the last intermediate chunk
# before receiving the finish packet
time.sleep(0.15)
self._ws_send({
"cmd": "aibot_respond_msg",
"headers": {"req_id": req_id},
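Review note: the throttling rule added to `_push_stream` in this file can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the channel's code: `ThrottledPusher`, its `send` callback, and the injectable `clock` are hypothetical names introduced here so the rule can be exercised without a WebSocket, and the sketch leaves out the `try/except` that the diff wraps around the send.

```python
import time
from typing import Callable


class ThrottledPusher:
    """Hypothetical helper mirroring the throttle in _push_stream."""

    def __init__(self, send: Callable[[str], None],
                 min_interval: float = 0.1,
                 clock: Callable[[], float] = time.time):
        self._send = send                  # assumed callback that delivers content
        self._min_interval = min_interval  # 0.1 s matches the 100 ms in the docstring
        self._clock = clock                # injectable so tests can control time
        self._last_push_time = 0.0
        self._last_push_len = 0

    def push(self, content: str, force: bool = False) -> bool:
        """Send content unless throttled or unchanged; return True if sent."""
        now = self._clock()
        # Skip when the previous push was too recent, unless the caller forces it.
        if not force and now - self._last_push_time < self._min_interval:
            return False
        # Skip when the content length has not changed since the last push.
        if len(content) == self._last_push_len:
            return False
        self._last_push_time = now
        self._last_push_len = len(content)
        self._send(content)
        return True
```

As in the diff, `force=True` bypasses only the time check; a forced push whose content length is unchanged is still skipped.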

View File

@@ -0,0 +1,412 @@
"""
Weixin HTTP JSON API client.
Implements the ilink bot protocol:
- getUpdates (long-poll)
- sendMessage
- getUploadUrl
- getConfig
- sendTyping
- QR login (get_bot_qrcode / get_qrcode_status)
CDN media upload with AES-128-ECB encryption.
"""
import base64
import hashlib
import os
import random
import struct
import time
import uuid
import requests
from common.log import logger
DEFAULT_BASE_URL = "https://ilinkai.weixin.qq.com"
CDN_BASE_URL = "https://novac2c.cdn.weixin.qq.com/c2c"
DEFAULT_LONG_POLL_TIMEOUT = 35
DEFAULT_API_TIMEOUT = 15
QR_POLL_TIMEOUT = 35
BOT_TYPE = "3"
def _random_wechat_uin() -> str:
val = random.randint(0, 0xFFFFFFFF)
return base64.b64encode(str(val).encode("utf-8")).decode("utf-8")
CHANNEL_VERSION = "2.0.0"
# iLink-App-ClientVersion: uint32 encoded as major<<16 | minor<<8 | patch
# 2.0.0 → 0x00020000 = 131072
CLIENT_VERSION = "131072"
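Review note: the packing described in the comment above (`major<<16 | minor<<8 | patch`) can be checked with a small sketch. The helper names `encode_client_version` and `decode_client_version` are hypothetical and are not part of this file. The 8-bit limit on minor and patch follows from the shifts; the 16-bit limit on major is an assumption based on the "uint32" wording.

```python
def encode_client_version(version: str) -> int:
    """Pack 'major.minor.patch' into one integer as major<<16 | minor<<8 | patch."""
    major, minor, patch = (int(part) for part in version.split("."))
    # Minor and patch each occupy one byte in this layout.
    if not (0 <= minor <= 0xFF and 0 <= patch <= 0xFF):
        raise ValueError("minor and patch must each fit in 8 bits")
    # Assumption: major is limited to 16 bits so the result fits in a uint32.
    if not (0 <= major <= 0xFFFF):
        raise ValueError("major must fit in 16 bits")
    return (major << 16) | (minor << 8) | patch


def decode_client_version(value: int) -> str:
    """Inverse of encode_client_version."""
    return f"{value >> 16}.{(value >> 8) & 0xFF}.{value & 0xFF}"


print(encode_client_version("2.0.0"))  # 131072
print(decode_client_version(131072))   # 2.0.0
```

The header value in this file is the decimal string form of that integer.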
def _build_headers(token: str = "") -> dict:
headers = {
"Content-Type": "application/json",
"AuthorizationType": "ilink_bot_token",
"X-WECHAT-UIN": _random_wechat_uin(),
"iLink-App-Id": "bot",
"iLink-App-ClientVersion": CLIENT_VERSION,
}
if token:
headers["Authorization"] = f"Bearer {token}"
return headers
def _ensure_trailing_slash(url: str) -> str:
return url if url.endswith("/") else url + "/"
class WeixinApi:
"""Stateless HTTP client for the Weixin ilink bot API."""
def __init__(self, base_url: str = DEFAULT_BASE_URL, token: str = "",
cdn_base_url: str = CDN_BASE_URL):
self.base_url = base_url
self.token = token
self.cdn_base_url = cdn_base_url
def _post(self, endpoint: str, body: dict, timeout: int = DEFAULT_API_TIMEOUT) -> dict:
url = _ensure_trailing_slash(self.base_url) + endpoint
headers = _build_headers(self.token)
body.setdefault("base_info", {}).setdefault("channel_version", CHANNEL_VERSION)
try:
resp = requests.post(url, json=body, headers=headers, timeout=timeout)
resp.raise_for_status()
return resp.json()
except requests.exceptions.Timeout:
logger.debug(f"[Weixin] API timeout: {endpoint}")
return {"ret": 0, "msgs": []}
except Exception as e:
logger.error(f"[Weixin] API error {endpoint}: {e}")
raise
# ── getUpdates (long-poll) ─────────────────────────────────────────
def get_updates(self, get_updates_buf: str = "", timeout: int = DEFAULT_LONG_POLL_TIMEOUT) -> dict:
return self._post("ilink/bot/getupdates", {
"get_updates_buf": get_updates_buf,
}, timeout=timeout + 5)
# ── sendMessage ────────────────────────────────────────────────────
def send_text(self, to: str, text: str, context_token: str) -> dict:
return self._post("ilink/bot/sendmessage", {
"msg": {
"from_user_id": "",
"to_user_id": to,
"client_id": uuid.uuid4().hex[:16],
"message_type": 2, # BOT
"message_state": 2, # FINISH
"item_list": [{"type": 1, "text_item": {"text": text}}],
"context_token": context_token,
}
})
def send_image_item(self, to: str, context_token: str,
encrypt_query_param: str, aes_key_b64: str,
ciphertext_size: int, text: str = "") -> dict:
items = []
if text:
items.append({"type": 1, "text_item": {"text": text}})
items.append({
"type": 2,
"image_item": {
"media": {
"encrypt_query_param": encrypt_query_param,
"aes_key": aes_key_b64,
"encrypt_type": 1,
},
"mid_size": ciphertext_size,
}
})
return self._send_items(to, context_token, items)
def send_file_item(self, to: str, context_token: str,
encrypt_query_param: str, aes_key_b64: str,
file_name: str, file_size: int, text: str = "") -> dict:
items = []
if text:
items.append({"type": 1, "text_item": {"text": text}})
items.append({
"type": 4,
"file_item": {
"media": {
"encrypt_query_param": encrypt_query_param,
"aes_key": aes_key_b64,
"encrypt_type": 1,
},
"file_name": file_name,
"len": str(file_size),
}
})
return self._send_items(to, context_token, items)
def send_video_item(self, to: str, context_token: str,
encrypt_query_param: str, aes_key_b64: str,
ciphertext_size: int, text: str = "") -> dict:
items = []
if text:
items.append({"type": 1, "text_item": {"text": text}})
items.append({
"type": 5,
"video_item": {
"media": {
"encrypt_query_param": encrypt_query_param,
"aes_key": aes_key_b64,
"encrypt_type": 1,
},
"video_size": ciphertext_size,
}
})
return self._send_items(to, context_token, items)
def _send_items(self, to: str, context_token: str, items: list) -> dict:
return self._post("ilink/bot/sendmessage", {
"msg": {
"from_user_id": "",
"to_user_id": to,
"client_id": uuid.uuid4().hex[:16],
"message_type": 2,
"message_state": 2,
"item_list": items,
"context_token": context_token,
}
})
# ── getUploadUrl ───────────────────────────────────────────────────
def get_upload_url(self, filekey: str, media_type: int, to_user_id: str,
rawsize: int, rawfilemd5: str, filesize: int,
aeskey: str) -> dict:
return self._post("ilink/bot/getuploadurl", {
"filekey": filekey,
"media_type": media_type,
"to_user_id": to_user_id,
"rawsize": rawsize,
"rawfilemd5": rawfilemd5,
"filesize": filesize,
"aeskey": aeskey,
"no_need_thumb": True,
})
# ── getConfig / sendTyping ─────────────────────────────────────────
def get_config(self, user_id: str, context_token: str = "") -> dict:
return self._post("ilink/bot/getconfig", {
"ilink_user_id": user_id,
"context_token": context_token,
}, timeout=10)
def send_typing(self, user_id: str, typing_ticket: str, status: int = 1) -> dict:
return self._post("ilink/bot/sendtyping", {
"ilink_user_id": user_id,
"typing_ticket": typing_ticket,
"status": status,
}, timeout=10)
# ── QR Login ───────────────────────────────────────────────────────
def fetch_qr_code(self) -> dict:
url = _ensure_trailing_slash(self.base_url) + f"ilink/bot/get_bot_qrcode?bot_type={BOT_TYPE}"
resp = requests.get(url, timeout=15)
resp.raise_for_status()
return resp.json()
def poll_qr_status(self, qrcode: str, timeout: int = QR_POLL_TIMEOUT) -> dict:
url = (_ensure_trailing_slash(self.base_url) +
f"ilink/bot/get_qrcode_status?qrcode={requests.utils.quote(qrcode)}")
headers = {
"iLink-App-Id": "bot",
"iLink-App-ClientVersion": CLIENT_VERSION,
}
try:
resp = requests.get(url, headers=headers, timeout=timeout)
resp.raise_for_status()
return resp.json()
except requests.exceptions.Timeout:
return {"status": "wait"}
# ── AES-128-ECB helpers ─────────────────────────────────────────────
def _aes_ecb_encrypt(data: bytes, key: bytes) -> bytes:
from Crypto.Cipher import AES
pad_len = 16 - (len(data) % 16)
padded = data + bytes([pad_len] * pad_len)
cipher = AES.new(key, AES.MODE_ECB)
return cipher.encrypt(padded)
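def _aes_ecb_decrypt(data: bytes, key: bytes) -> bytes:
    from Crypto.Cipher import AES
    cipher = AES.new(key, AES.MODE_ECB)
    decrypted = cipher.decrypt(data)
    if not decrypted:
        return decrypted
    pad_len = decrypted[-1]
    # Strip PKCS7 padding only when the pad byte is in the valid 1..16 range;
    # a pad byte of 0 would otherwise slice the whole payload away.
    if pad_len < 1 or pad_len > 16:
        return decrypted
    return decrypted[:-pad_len]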
def _file_md5(file_path: str) -> str:
h = hashlib.md5()
with open(file_path, "rb") as f:
for chunk in iter(lambda: f.read(8192), b""):
h.update(chunk)
return h.hexdigest()
def _md5_bytes(data: bytes) -> str:
return hashlib.md5(data).hexdigest()
def _aes_ecb_padded_size(plaintext_size: int) -> int:
"""PKCS7 padded size for AES-128-ECB."""
return ((plaintext_size + 1 + 15) // 16) * 16
UPLOAD_MAX_RETRIES = 3
def upload_media_to_cdn(api: WeixinApi, file_path: str, to_user_id: str,
media_type: int) -> dict:
"""
Upload a local file to the Weixin CDN (matching official plugin protocol).
Args:
api: WeixinApi instance
file_path: local file path
to_user_id: target user id
media_type: 1=IMAGE, 2=VIDEO, 3=FILE
Returns:
dict with keys: encrypt_query_param, aes_key_b64, ciphertext_size, raw_size
"""
aes_key = os.urandom(16)
aes_key_hex = aes_key.hex()
filekey = uuid.uuid4().hex
with open(file_path, "rb") as f:
raw_data = f.read()
raw_size = len(raw_data)
raw_md5 = _md5_bytes(raw_data)
cipher_size = _aes_ecb_padded_size(raw_size)
encrypted = _aes_ecb_encrypt(raw_data, aes_key)
from urllib.parse import quote
download_param = None
last_error = None
for attempt in range(1, UPLOAD_MAX_RETRIES + 1):
try:
if attempt > 1:
filekey = uuid.uuid4().hex
resp = api.get_upload_url(
filekey=filekey,
media_type=media_type,
to_user_id=to_user_id,
rawsize=raw_size,
rawfilemd5=raw_md5,
filesize=cipher_size,
aeskey=aes_key_hex,
)
# API may return either upload_full_url (new) or upload_param (legacy)
upload_full_url = resp.get("upload_full_url", "")
upload_param = resp.get("upload_param", "")
if upload_full_url:
cdn_url = upload_full_url
elif upload_param:
cdn_url = (f"{api.cdn_base_url}/upload"
f"?encrypted_query_param={quote(upload_param)}"
f"&filekey={quote(filekey)}")
else:
raise RuntimeError(f"[Weixin] getUploadUrl returned neither upload_full_url nor upload_param: {resp}")
cdn_resp = requests.post(cdn_url, data=encrypted, headers={
"Content-Type": "application/octet-stream",
"Content-Length": str(len(encrypted)),
}, timeout=120)
if 400 <= cdn_resp.status_code < 500:
err_msg = cdn_resp.headers.get("x-error-message", cdn_resp.text[:200])
raise RuntimeError(f"CDN client error {cdn_resp.status_code}: {err_msg}")
cdn_resp.raise_for_status()
download_param = cdn_resp.headers.get("x-encrypted-param", "")
if not download_param:
raise RuntimeError("CDN response missing x-encrypted-param header")
logger.debug(f"[Weixin] CDN upload success attempt={attempt} filekey={filekey}")
break
except Exception as e:
last_error = e
if "client error" in str(e):
raise
if attempt < UPLOAD_MAX_RETRIES:
backoff = 2 ** attempt
logger.warning(f"[Weixin] CDN upload attempt {attempt} failed, retrying in {backoff}s: {e}")
time.sleep(backoff)
else:
logger.error(f"[Weixin] CDN upload failed after {UPLOAD_MAX_RETRIES} attempts: {e}")
if not download_param:
raise last_error or RuntimeError("CDN upload failed")
aes_key_b64 = base64.b64encode(aes_key_hex.encode("utf-8")).decode("utf-8")
return {
"encrypt_query_param": download_param,
"aes_key_b64": aes_key_b64,
"ciphertext_size": cipher_size,
"raw_size": raw_size,
}
def download_media_from_cdn(cdn_base_url: str, encrypt_query_param: str,
aes_key: str, save_path: str) -> str:
"""
Download and decrypt a media file from Weixin CDN.
Args:
cdn_base_url: CDN base URL
encrypt_query_param: encrypted query parameter from message
aes_key: hex or base64 encoded AES key
save_path: path to save decrypted file
Returns:
save_path on success
"""
from urllib.parse import quote
url = f"{cdn_base_url}/download?encrypted_query_param={quote(encrypt_query_param)}"
resp = requests.get(url, timeout=60)
resp.raise_for_status()
# Determine key format:
# 1) 32-char hex string → 16 raw bytes
# 2) base64 string → decode → if 32 bytes, treat as hex-encoded → 16 raw bytes
# 3) base64 string → decode → 16 raw bytes directly
try:
key_bytes = bytes.fromhex(aes_key)
if len(key_bytes) != 16:
raise ValueError()
except (ValueError, TypeError):
decoded = base64.b64decode(aes_key)
if len(decoded) == 32:
try:
key_bytes = bytes.fromhex(decoded.decode("ascii"))
except (ValueError, UnicodeDecodeError):
raise ValueError(f"Invalid AES key: 32 bytes but not valid hex")
elif len(decoded) == 16:
key_bytes = decoded
else:
raise ValueError(f"Invalid AES key length after base64 decode: {len(decoded)}")
decrypted = _aes_ecb_decrypt(resp.content, key_bytes)
os.makedirs(os.path.dirname(save_path), exist_ok=True)
with open(save_path, "wb") as f:
f.write(decrypted)
return save_path
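
The key-format handling in `download_media_from_cdn` and the size arithmetic in `_aes_ecb_padded_size` can be checked in isolation. What follows is a minimal standalone sketch, not part of the diff: `normalize_aes_key`, `pkcs7_pad`, and `padded_size` are hypothetical names, and the code uses only the standard library (no actual AES is performed).

```python
import base64


def normalize_aes_key(aes_key: str) -> bytes:
    """Return 16 raw key bytes from any of the three encodings listed above."""
    # Case 1: 32-character hex string.
    try:
        key = bytes.fromhex(aes_key)
        if len(key) == 16:
            return key
    except ValueError:
        pass
    decoded = base64.b64decode(aes_key)
    # Case 2: base64 of a 32-character hex string.
    if len(decoded) == 32:
        return bytes.fromhex(decoded.decode("ascii"))
    # Case 3: base64 of the 16 raw bytes.
    if len(decoded) == 16:
        return decoded
    raise ValueError(f"Invalid AES key length after base64 decode: {len(decoded)}")


def pkcs7_pad(data: bytes, block: int = 16) -> bytes:
    """PKCS7 padding: always adds between 1 and `block` bytes."""
    pad_len = block - (len(data) % block)
    return data + bytes([pad_len] * pad_len)


def padded_size(plaintext_size: int, block: int = 16) -> int:
    """Same formula as _aes_ecb_padded_size, with the block size as a parameter."""
    return ((plaintext_size + 1 + block - 1) // block) * block


if __name__ == "__main__":
    raw = bytes(range(16))
    as_b64_hex = base64.b64encode(raw.hex().encode("ascii")).decode("ascii")
    print(normalize_aes_key(as_b64_hex) == raw)
    print(len(pkcs7_pad(b"x" * 16)), padded_size(16))
```

Case 2 is the encoding that `upload_media_to_cdn` produces when it builds `aes_key_b64` from the hex key, so a round trip through this channel should land there.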

@@ -0,0 +1,637 @@
"""
Weixin channel implementation.
Uses HTTP long-poll (getUpdates) to receive messages and sendMessage to reply.
Login via QR code scan through the ilink bot API.
"""
import json
import os
import threading
import time
import uuid
import requests
from bridge.context import Context, ContextType
from bridge.reply import Reply, ReplyType
from channel.chat_channel import ChatChannel, check_prefix
from channel.weixin.weixin_api import (
WeixinApi, upload_media_to_cdn,
DEFAULT_BASE_URL, CDN_BASE_URL,
)
from channel.weixin.weixin_message import WeixinMessage
from common.expired_dict import ExpiredDict
from common.log import logger
from common.singleton import singleton
from config import conf
MAX_CONSECUTIVE_FAILURES = 3
BACKOFF_DELAY = 30
RETRY_DELAY = 2
SESSION_EXPIRED_ERRCODE = -14
TEXT_CHUNK_LIMIT = 4000
QR_LOGIN_TIMEOUT_S = 480
QR_MAX_REFRESHES = 10
def _load_credentials(cred_path: str) -> dict:
"""Load saved credentials from JSON file."""
try:
if os.path.exists(cred_path):
with open(cred_path, "r") as f:
return json.load(f)
except Exception as e:
logger.warning(f"[Weixin] Failed to load credentials: {e}")
return {}
def _save_credentials(cred_path: str, data: dict):
"""Save credentials to JSON file."""
os.makedirs(os.path.dirname(cred_path), exist_ok=True)
with open(cred_path, "w") as f:
json.dump(data, f, indent=2)
try:
os.chmod(cred_path, 0o600)
except Exception:
pass
@singleton
class WeixinChannel(ChatChannel):
LOGIN_STATUS_IDLE = "idle"
LOGIN_STATUS_WAITING = "waiting_scan"
LOGIN_STATUS_SCANNED = "scanned"
LOGIN_STATUS_OK = "logged_in"
def __init__(self):
super().__init__()
self.api = None
self._stop_event = threading.Event()
self._poll_thread = None
self._context_tokens = {} # user_id -> context_token
self._received_msgs = ExpiredDict(60 * 60 * 7.1)
self._get_updates_buf = ""
self._credentials_path = ""
self.login_status = self.LOGIN_STATUS_IDLE
self._current_qr_url = ""
conf()["single_chat_prefix"] = [""]
# ── Lifecycle ──────────────────────────────────────────────────────
def startup(self):
self._stop_event.clear()
base_url = conf().get("weixin_base_url", DEFAULT_BASE_URL)
cdn_base_url = conf().get("weixin_cdn_base_url", CDN_BASE_URL)
token = conf().get("weixin_token", "")
self._credentials_path = os.path.expanduser(
conf().get("weixin_credentials_path", "~/.weixin_cow_credentials.json")
)
if not token:
creds = _load_credentials(self._credentials_path)
token = creds.get("token", "")
if creds.get("base_url"):
base_url = creds["base_url"]
if not token:
token, base_url = self._login_with_retry(base_url)
if not token:
return
self.api = WeixinApi(base_url=base_url, token=token, cdn_base_url=cdn_base_url)
self.login_status = self.LOGIN_STATUS_OK
logger.info(f"[Weixin] 微信通道已启动,凭证保存在 {self._credentials_path},"
f"如需重新扫码登录请删除该文件后重启")
self.report_startup_success()
self._poll_loop()
def _login_with_retry(self, base_url: str) -> tuple:
"""Attempt QR login, then wait for stop if failed.
Returns (token, base_url) on success, or ("", "") if stopped."""
logger.info("[Weixin] No token found, starting QR login...")
self.login_status = self.LOGIN_STATUS_WAITING
login_result = self._qr_login(base_url)
if login_result:
return login_result["token"], login_result.get("base_url", base_url)
self.login_status = self.LOGIN_STATUS_IDLE
if not self._stop_event.is_set():
logger.info("[Weixin] QR login timed out, waiting for stop or reconnect...")
print(" 二维码登录超时,请通过控制台重新接入\n")
self._stop_event.wait()
logger.info("[Weixin] Login cancelled by stop event")
return "", ""
def stop(self):
logger.info("[Weixin] stop() called")
self._stop_event.set()
def _relogin(self) -> bool:
"""Re-login after session expiry. Returns True on success."""
base_url = self.api.base_url if self.api else DEFAULT_BASE_URL
if os.path.exists(self._credentials_path):
try:
os.remove(self._credentials_path)
except Exception:
pass
self.login_status = self.LOGIN_STATUS_WAITING
result = self._qr_login(base_url)
if not result:
self.login_status = self.LOGIN_STATUS_IDLE
return False
self.api = WeixinApi(
base_url=result.get("base_url", base_url),
token=result["token"],
cdn_base_url=self.api.cdn_base_url if self.api else CDN_BASE_URL,
)
self.login_status = self.LOGIN_STATUS_OK
self._context_tokens.clear()
return True
# ── QR Login ───────────────────────────────────────────────────────
@staticmethod
def _print_qr(qrcode_url: str):
"""Print QR code to terminal for scanning."""
print("\n" + "=" * 60)
print(" 请使用微信扫描二维码登录 (二维码约2分钟后过期)")
print("=" * 60)
try:
import qrcode as qr_lib
import io
qr = qr_lib.QRCode(error_correction=qr_lib.constants.ERROR_CORRECT_L, box_size=1, border=1)
qr.add_data(qrcode_url)
qr.make(fit=True)
buf = io.StringIO()
qr.print_ascii(out=buf, invert=True)
try:
print(buf.getvalue())
except UnicodeEncodeError:
# Windows GBK terminals cannot render Unicode block characters
print(f"\n (终端不支持显示二维码,请使用链接扫码)")
print(f" 二维码链接: {qrcode_url}\n")
except ImportError:
print(f"\n 二维码链接: {qrcode_url}")
print(" (安装 'qrcode' 包可在终端显示二维码)\n")
def _notify_cloud_qrcode(self, qrcode_url: str):
"""Send QR code URL to cloud console when running in cloud mode."""
if not self.cloud_mode:
return
try:
from common import cloud_client
client = getattr(cloud_client, "chat_client", None)
if client and getattr(client, "client_id", None):
client.send_channel_qrcode("weixin", qrcode_url)
except Exception as e:
logger.warning(f"[Weixin] Failed to notify cloud QR code: {e}")
def _notify_cloud_connected(self):
"""Send connected status to cloud console when login succeeds."""
if not self.cloud_mode:
return
try:
from common import cloud_client
client = getattr(cloud_client, "chat_client", None)
if client and getattr(client, "client_id", None):
client.send_channel_status("weixin", "connected")
except Exception as e:
logger.warning(f"[Weixin] Failed to notify cloud connected: {e}")
def _qr_login(self, base_url: str) -> dict:
"""Perform interactive QR code login. Returns dict with token/base_url or empty dict."""
api = WeixinApi(base_url=base_url)
try:
qr_resp = api.fetch_qr_code()
except Exception as e:
logger.error(f"[Weixin] Failed to fetch QR code: {e}")
return {}
qrcode = qr_resp.get("qrcode", "")
qrcode_url = qr_resp.get("qrcode_img_content", "")
if not qrcode:
logger.error("[Weixin] No QR code returned from server")
return {}
self._current_qr_url = qrcode_url
logger.info(f"[Weixin] 微信二维码链接: {qrcode_url}")
self._print_qr(qrcode_url)
self._notify_cloud_qrcode(qrcode_url)
print(" 等待扫码...\n")
scanned_printed = False
refresh_count = 0
deadline = time.time() + QR_LOGIN_TIMEOUT_S
while not self._stop_event.is_set():
if time.time() >= deadline:
logger.warning(f"[Weixin] QR login timed out after {QR_LOGIN_TIMEOUT_S}s")
print(f"\n 二维码登录超时({QR_LOGIN_TIMEOUT_S}s),请重启后重试")
break
try:
status_resp = api.poll_qr_status(qrcode)
except Exception as e:
logger.error(f"[Weixin] QR status poll error: {e}")
return {}
status = status_resp.get("status", "wait")
if status == "wait":
pass
elif status == "scaned":
self.login_status = self.LOGIN_STATUS_SCANNED
if not scanned_printed:
print(" 已扫码,请在手机上确认...")
scanned_printed = True
elif status == "expired":
refresh_count += 1
if refresh_count >= QR_MAX_REFRESHES:
logger.warning(f"[Weixin] QR code refreshed {QR_MAX_REFRESHES} times, giving up")
print(f"\n 二维码已刷新 {QR_MAX_REFRESHES} 次仍未扫码,请重启后重试")
break
print(f" 二维码已过期,正在刷新({refresh_count}/{QR_MAX_REFRESHES})...")
try:
qr_resp = api.fetch_qr_code()
qrcode = qr_resp.get("qrcode", "")
qrcode_url = qr_resp.get("qrcode_img_content", "")
scanned_printed = False
self._current_qr_url = qrcode_url
logger.info(f"[Weixin] 微信二维码链接 ({refresh_count}/{QR_MAX_REFRESHES}): {qrcode_url}")
self._print_qr(qrcode_url)
self._notify_cloud_qrcode(qrcode_url)
except Exception as e:
logger.error(f"[Weixin] QR refresh failed: {e}")
return {}
elif status == "confirmed":
bot_token = status_resp.get("bot_token", "")
bot_id = status_resp.get("ilink_bot_id", "")
result_base_url = status_resp.get("baseurl", base_url)
user_id = status_resp.get("ilink_user_id", "")
if not bot_token or not bot_id:
logger.error("[Weixin] Login confirmed but missing token/bot_id")
return {}
self._current_qr_url = ""
print(f"\n ✅ 微信登录成功,bot_id={bot_id}")
logger.info(f"[Weixin] Login confirmed: bot_id={bot_id}")
self._notify_cloud_connected()
creds = {
"token": bot_token,
"base_url": result_base_url,
"bot_id": bot_id,
"user_id": user_id,
}
_save_credentials(self._credentials_path, creds)
logger.info(f"[Weixin] Credentials saved to {self._credentials_path}")
return {"token": bot_token, "base_url": result_base_url}
self._stop_event.wait(1)
self._current_qr_url = ""
if self._stop_event.is_set():
logger.info("[Weixin] QR login cancelled by stop event")
return {}
# ── Long-poll loop ─────────────────────────────────────────────────
def _poll_loop(self):
"""Main long-poll loop: getUpdates -> parse -> produce."""
logger.info("[Weixin] Starting long-poll loop")
consecutive_failures = 0
while not self._stop_event.is_set():
try:
resp = self.api.get_updates(self._get_updates_buf)
ret = resp.get("ret", 0)
errcode = resp.get("errcode", 0)
is_error = (ret != 0) or (errcode != 0)
if is_error:
if errcode == SESSION_EXPIRED_ERRCODE or ret == SESSION_EXPIRED_ERRCODE:
logger.error("[Weixin] Session expired (errcode -14), starting re-login...")
if self._relogin():
logger.info("[Weixin] Re-login successful, resuming long-poll")
self._get_updates_buf = ""
consecutive_failures = 0
continue
else:
logger.error("[Weixin] Re-login failed, will retry in 5 minutes")
self._stop_event.wait(300)
continue
consecutive_failures += 1
errmsg = resp.get("errmsg", "")
logger.error(f"[Weixin] getUpdates error: ret={ret} errcode={errcode} "
f"errmsg={errmsg} ({consecutive_failures}/{MAX_CONSECUTIVE_FAILURES})")
if consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
consecutive_failures = 0
self._stop_event.wait(BACKOFF_DELAY)
else:
self._stop_event.wait(RETRY_DELAY)
continue
consecutive_failures = 0
# Update sync cursor
new_buf = resp.get("get_updates_buf", "")
if new_buf:
self._get_updates_buf = new_buf
# Process messages
msgs = resp.get("msgs", [])
for raw_msg in msgs:
try:
self._process_message(raw_msg)
except Exception as e:
logger.error(f"[Weixin] Failed to process message: {e}", exc_info=True)
except Exception as e:
if self._stop_event.is_set():
break
consecutive_failures += 1
logger.error(f"[Weixin] getUpdates exception: {e} "
f"({consecutive_failures}/{MAX_CONSECUTIVE_FAILURES})")
if consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
consecutive_failures = 0
self._stop_event.wait(BACKOFF_DELAY)
else:
self._stop_event.wait(RETRY_DELAY)
logger.info("[Weixin] Long-poll loop ended")
def _process_message(self, raw_msg: dict):
"""Parse a single inbound message and produce to the handling queue."""
msg_type = raw_msg.get("message_type", 0)
if msg_type != 1: # Only process USER messages (type=1)
return
msg_id = str(raw_msg.get("message_id", raw_msg.get("seq", "")))
if self._received_msgs.get(msg_id):
return
self._received_msgs[msg_id] = True
from_user = raw_msg.get("from_user_id", "")
context_token = raw_msg.get("context_token", "")
if context_token and from_user:
self._context_tokens[from_user] = context_token
cdn_base_url = self.api.cdn_base_url if self.api else CDN_BASE_URL
try:
wx_msg = WeixinMessage(raw_msg, cdn_base_url=cdn_base_url)
except Exception as e:
logger.error(f"[Weixin] Failed to parse WeixinMessage: {e}", exc_info=True)
return
logger.info(f"[Weixin] Received: from={from_user} ctype={wx_msg.ctype} "
f"content={str(wx_msg.content)[:50]}")
# File cache logic
from channel.file_cache import get_file_cache
file_cache = get_file_cache()
session_id = from_user
if wx_msg.ctype == ContextType.IMAGE:
if hasattr(wx_msg, "image_path") and wx_msg.image_path:
file_cache.add(session_id, wx_msg.image_path, file_type="image")
logger.info(f"[Weixin] Image cached for session {session_id}")
return
if wx_msg.ctype == ContextType.FILE:
wx_msg.prepare()
file_cache.add(session_id, wx_msg.content, file_type="file")
logger.info(f"[Weixin] File cached for session {session_id}: {wx_msg.content}")
return
if wx_msg.ctype == ContextType.TEXT:
cached_files = file_cache.get(session_id)
if cached_files:
refs = []
for fi in cached_files:
ftype, fpath = fi["type"], fi["path"]
if ftype == "image":
refs.append(f"[图片: {fpath}]")
elif ftype == "video":
refs.append(f"[视频: {fpath}]")
else:
refs.append(f"[文件: {fpath}]")
wx_msg.content = wx_msg.content + "\n" + "\n".join(refs)
file_cache.clear(session_id)
context = self._compose_context(
wx_msg.ctype,
wx_msg.content,
isgroup=False,
msg=wx_msg,
no_need_at=True,
)
if context:
self.produce(context)
# ── _compose_context ───────────────────────────────────────────────
def _compose_context(self, ctype: ContextType, content, **kwargs):
context = Context(ctype, content)
context.kwargs = kwargs
if "channel_type" not in context:
context["channel_type"] = self.channel_type
if "origin_ctype" not in context:
context["origin_ctype"] = ctype
cmsg = context["msg"]
context["session_id"] = cmsg.from_user_id
context["receiver"] = cmsg.other_user_id
if ctype == ContextType.TEXT:
img_match_prefix = check_prefix(content, conf().get("image_create_prefix"))
if img_match_prefix:
content = content.replace(img_match_prefix, "", 1)
context.type = ContextType.IMAGE_CREATE
else:
context.type = ContextType.TEXT
context.content = content.strip()
return context
# ── Send reply ─────────────────────────────────────────────────────
def send(self, reply: Reply, context: Context):
receiver = context.get("receiver", "")
msg = context.get("msg")
context_token = self._get_context_token(receiver, msg)
if not context_token:
logger.error(f"[Weixin] No context_token for receiver={receiver}, cannot send")
return
if reply.type == ReplyType.TEXT:
self._send_text(reply.content, receiver, context_token)
elif reply.type in (ReplyType.IMAGE_URL, ReplyType.IMAGE):
self._send_image(reply.content, receiver, context_token)
elif reply.type == ReplyType.FILE:
self._send_file(reply.content, receiver, context_token)
elif reply.type in (ReplyType.VIDEO, ReplyType.VIDEO_URL):
self._send_video(reply.content, receiver, context_token)
else:
logger.warning(f"[Weixin] Unsupported reply type: {reply.type}, fallback to text")
self._send_text(str(reply.content), receiver, context_token)
def _get_context_token(self, receiver: str, msg=None) -> str:
"""Get the context_token for a receiver, required for all sends."""
if msg and hasattr(msg, "context_token") and msg.context_token:
return msg.context_token
return self._context_tokens.get(receiver, "")
def _send_text(self, text: str, receiver: str, context_token: str):
if len(text) <= TEXT_CHUNK_LIMIT:
try:
self.api.send_text(receiver, text, context_token)
logger.debug(f"[Weixin] Text sent to {receiver}, len={len(text)}")
except Exception as e:
logger.error(f"[Weixin] Failed to send text: {e}")
return
chunks = self._split_text(text, TEXT_CHUNK_LIMIT)
for i, chunk in enumerate(chunks):
try:
self.api.send_text(receiver, chunk, context_token)
logger.debug(f"[Weixin] Text chunk {i+1}/{len(chunks)} sent to {receiver}, len={len(chunk)}")
except Exception as e:
logger.error(f"[Weixin] Failed to send text chunk {i+1}/{len(chunks)}: {e}")
break
if i < len(chunks) - 1:
time.sleep(0.5)
@staticmethod
def _split_text(text: str, limit: int) -> list:
"""Split text into chunks, preferring to break at paragraph or line boundaries."""
if len(text) <= limit:
return [text]
chunks = []
while text:
if len(text) <= limit:
chunks.append(text)
break
cut = text.rfind("\n\n", 0, limit)
if cut <= 0:
cut = text.rfind("\n", 0, limit)
if cut <= 0:
cut = limit
chunks.append(text[:cut])
text = text[cut:].lstrip("\n")
return chunks
def _send_image(self, img_path_or_url: str, receiver: str, context_token: str):
local_path = self._resolve_media_path(img_path_or_url)
if not local_path:
self._send_text("[Image send failed: file not found]", receiver, context_token)
return
try:
result = upload_media_to_cdn(self.api, local_path, receiver, media_type=1)
self.api.send_image_item(
to=receiver,
context_token=context_token,
encrypt_query_param=result["encrypt_query_param"],
aes_key_b64=result["aes_key_b64"],
ciphertext_size=result["ciphertext_size"],
)
logger.info(f"[Weixin] Image sent to {receiver}")
except Exception as e:
logger.error(f"[Weixin] Image send failed: {e}")
self._send_text("[Image send failed]", receiver, context_token)
def _send_file(self, file_path_or_url: str, receiver: str, context_token: str):
local_path = self._resolve_media_path(file_path_or_url)
if not local_path:
self._send_text("[File send failed: file not found]", receiver, context_token)
return
try:
result = upload_media_to_cdn(self.api, local_path, receiver, media_type=3)
self.api.send_file_item(
to=receiver,
context_token=context_token,
encrypt_query_param=result["encrypt_query_param"],
aes_key_b64=result["aes_key_b64"],
file_name=os.path.basename(local_path),
file_size=result["raw_size"],
)
logger.info(f"[Weixin] File sent to {receiver}")
except Exception as e:
logger.error(f"[Weixin] File send failed: {e}")
self._send_text("[File send failed]", receiver, context_token)
def _send_video(self, video_path_or_url: str, receiver: str, context_token: str):
local_path = self._resolve_media_path(video_path_or_url)
if not local_path:
self._send_text("[Video send failed: file not found]", receiver, context_token)
return
try:
result = upload_media_to_cdn(self.api, local_path, receiver, media_type=2)
self.api.send_video_item(
to=receiver,
context_token=context_token,
encrypt_query_param=result["encrypt_query_param"],
aes_key_b64=result["aes_key_b64"],
ciphertext_size=result["ciphertext_size"],
)
logger.info(f"[Weixin] Video sent to {receiver}")
except Exception as e:
logger.error(f"[Weixin] Video send failed: {e}")
self._send_text("[Video send failed]", receiver, context_token)
@staticmethod
def _resolve_media_path(path_or_url: str) -> str:
"""Resolve a file path or URL to a local file path. Downloads if needed."""
if not path_or_url:
return ""
local_path = path_or_url
if local_path.startswith("file://"):
local_path = local_path[7:]
if local_path.startswith(("http://", "https://")):
try:
resp = requests.get(local_path, timeout=60)
resp.raise_for_status()
ct = resp.headers.get("Content-Type", "")
ext = ".bin"
if "jpeg" in ct or "jpg" in ct:
ext = ".jpg"
elif "png" in ct:
ext = ".png"
elif "gif" in ct:
ext = ".gif"
elif "webp" in ct:
ext = ".webp"
elif "mp4" in ct:
ext = ".mp4"
elif "pdf" in ct:
ext = ".pdf"
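import tempfile  # use the platform temp dir rather than a hard-coded /tmp
tmp_path = os.path.join(tempfile.gettempdir(), f"wx_media_{uuid.uuid4().hex[:8]}{ext}")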
with open(tmp_path, "wb") as f:
f.write(resp.content)
return tmp_path
except Exception as e:
logger.error(f"[Weixin] Failed to download media: {e}")
return ""
if os.path.exists(local_path):
return local_path
logger.warning(f"[Weixin] Media file not found: {local_path}")
return ""
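
The chunking rule in `_split_text` (prefer a paragraph break, then a line break, then a hard cut at the limit) is easier to see on a small input. Below is a standalone sketch of the same logic, copied into a free function named `split_text` for illustration; the sample strings and the tiny limit are made up.

```python
def split_text(text: str, limit: int) -> list:
    """Split text into chunks of at most `limit` characters."""
    if len(text) <= limit:
        return [text]
    chunks = []
    while text:
        if len(text) <= limit:
            chunks.append(text)
            break
        # Prefer the last paragraph break that fits inside the limit.
        cut = text.rfind("\n\n", 0, limit)
        if cut <= 0:
            # Otherwise fall back to the last single line break.
            cut = text.rfind("\n", 0, limit)
        if cut <= 0:
            # No usable break: cut hard at the limit.
            cut = limit
        chunks.append(text[:cut])
        # Drop the newlines at the cut so the next chunk starts on text.
        text = text[cut:].lstrip("\n")
    return chunks


if __name__ == "__main__":
    for chunk in split_text("aaaa\n\nbbbb\ncccc", 8):
        print(repr(chunk))
```

Because the newlines at each cut are stripped, joining the chunks does not reproduce the original text byte for byte; that is fine for chat delivery, where each chunk becomes a separate message.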

@@ -0,0 +1,204 @@
"""
Weixin ChatMessage implementation.
Parses WeixinMessage from the getUpdates API into the unified ChatMessage format.
"""
import os
import uuid
from bridge.context import ContextType
from channel.chat_message import ChatMessage
from channel.weixin.weixin_api import download_media_from_cdn, CDN_BASE_URL
from common.log import logger
from common.utils import expand_path
from config import conf
# MessageItemType constants from the Weixin protocol
ITEM_TEXT = 1
ITEM_IMAGE = 2
ITEM_VOICE = 3
ITEM_FILE = 4
ITEM_VIDEO = 5
def _get_tmp_dir() -> str:
ws_root = expand_path(conf().get("agent_workspace", "~/cow"))
tmp_dir = os.path.join(ws_root, "tmp")
os.makedirs(tmp_dir, exist_ok=True)
return tmp_dir
class WeixinMessage(ChatMessage):
"""Message wrapper for Weixin channel."""
def __init__(self, msg: dict, cdn_base_url: str = CDN_BASE_URL):
super().__init__(msg)
self.msg_id = str(msg.get("message_id", msg.get("seq", uuid.uuid4().hex[:8])))
self.create_time = msg.get("create_time_ms", 0)
self.context_token = msg.get("context_token", "")
self.is_group = False # Weixin plugin only supports direct chat
self.is_at = False
from_user_id = msg.get("from_user_id", "")
to_user_id = msg.get("to_user_id", "")
self.from_user_id = from_user_id
self.from_user_nickname = from_user_id
self.to_user_id = to_user_id
self.to_user_nickname = to_user_id
self.other_user_id = from_user_id
self.other_user_nickname = from_user_id
self.actual_user_id = from_user_id
self.actual_user_nickname = from_user_id
item_list = msg.get("item_list", [])
# Parse items: find text and media
text_body = ""
media_item = None
media_type = None
ref_text = ""
for item in item_list:
itype = item.get("type", 0)
if itype == ITEM_TEXT:
text_item = item.get("text_item", {})
text_body = text_item.get("text", "")
ref = item.get("ref_msg")
if ref:
ref_title = ref.get("title", "")
ref_mi = ref.get("message_item", {})
ref_body = ""
if ref_mi.get("type") == ITEM_TEXT:
ref_body = ref_mi.get("text_item", {}).get("text", "")
if ref_title or ref_body:
parts = [p for p in [ref_title, ref_body] if p]
ref_text = f"[引用: {' | '.join(parts)}]\n"
# If ref is a media item, treat it as the media to download
if ref_mi.get("type") in (ITEM_IMAGE, ITEM_VIDEO, ITEM_FILE):
media_item = ref_mi
media_type = ref_mi.get("type")
elif itype == ITEM_VOICE:
voice_item = item.get("voice_item", {})
voice_text = voice_item.get("text", "")
if voice_text:
text_body = voice_text
else:
# Voice without transcription - download the audio
media_item = item
media_type = ITEM_VOICE
elif itype in (ITEM_IMAGE, ITEM_VIDEO, ITEM_FILE):
if not media_item:
media_item = item
media_type = itype
# Determine ctype and content
if media_item and not text_body:
self._setup_media(media_item, media_type, cdn_base_url)
elif media_item and text_body:
# Text + media: download media, attach as file ref in text
self.ctype = ContextType.TEXT
media_path = self._download_media(media_item, media_type, cdn_base_url)
if media_path:
if media_type == ITEM_IMAGE:
text_body += f"\n[图片: {media_path}]"
elif media_type == ITEM_VIDEO:
text_body += f"\n[视频: {media_path}]"
else:
text_body += f"\n[文件: {media_path}]"
self.content = ref_text + text_body
else:
self.ctype = ContextType.TEXT
self.content = ref_text + text_body
def _setup_media(self, item: dict, media_type: int, cdn_base_url: str):
"""Set up message as a media type. Images are downloaded immediately; video, file, and voice are downloaded lazily via _prepare_fn."""
if media_type == ITEM_IMAGE:
self.ctype = ContextType.IMAGE
image_path = self._download_media(item, ITEM_IMAGE, cdn_base_url)
if image_path:
self.content = image_path
self.image_path = image_path
else:
self.ctype = ContextType.TEXT
self.content = "[Image download failed]"
elif media_type == ITEM_VIDEO:
self.ctype = ContextType.FILE
save_path = os.path.join(_get_tmp_dir(), f"wx_{self.msg_id}.mp4")
self.content = save_path
def _download():
path = self._download_media(item, ITEM_VIDEO, cdn_base_url)
if path:
self.content = path
self._prepare_fn = _download
elif media_type == ITEM_FILE:
self.ctype = ContextType.FILE
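# Keep only the base name so a remote file_name cannot point outside the tmp dir.
file_name = os.path.basename(item.get("file_item", {}).get("file_name", "")) or f"wx_{self.msg_id}"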
save_path = os.path.join(_get_tmp_dir(), file_name)
self.content = save_path
def _download():
path = self._download_media(item, ITEM_FILE, cdn_base_url)
if path:
self.content = path
self._prepare_fn = _download
elif media_type == ITEM_VOICE:
self.ctype = ContextType.VOICE
save_path = os.path.join(_get_tmp_dir(), f"wx_{self.msg_id}.silk")
self.content = save_path
def _download():
path = self._download_media(item, ITEM_VOICE, cdn_base_url)
if path:
self.content = path
self._prepare_fn = _download
def _download_media(self, item: dict, media_type: int, cdn_base_url: str) -> str:
"""Download media from CDN, returns local file path or empty string."""
type_key_map = {
ITEM_IMAGE: "image_item",
ITEM_VIDEO: "video_item",
ITEM_FILE: "file_item",
ITEM_VOICE: "voice_item",
}
key = type_key_map.get(media_type, "")
info = item.get(key, {})
media = info.get("media", {})
encrypt_param = media.get("encrypt_query_param", "")
# aes_key can be in image_item.aeskey (hex) or media.aes_key (b64)
aes_key = info.get("aeskey", "") or media.get("aes_key", "")
if not encrypt_param or not aes_key:
logger.warning(f"[Weixin] Missing CDN params for media download (type={media_type})")
return ""
if media_type == ITEM_FILE:
original_name = os.path.basename(info.get("file_name", ""))  # drop any directory components
if original_name:
save_path = os.path.join(_get_tmp_dir(), original_name)
else:
save_path = os.path.join(_get_tmp_dir(), f"wx_{self.msg_id}.bin")
else:
ext_map = {ITEM_IMAGE: ".jpg", ITEM_VIDEO: ".mp4", ITEM_VOICE: ".silk"}
ext = ext_map.get(media_type, "")
save_path = os.path.join(_get_tmp_dir(), f"wx_{self.msg_id}{ext}")
try:
download_media_from_cdn(cdn_base_url, encrypt_param, aes_key, save_path)
logger.info(f"[Weixin] Media downloaded: {save_path}")
return save_path
except Exception as e:
logger.error(f"[Weixin] Media download failed: {e}")
return ""
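
To make the item-parsing rules in `WeixinMessage.__init__` concrete, here is a reduced sketch that handles only text items and quoted references. The function name `extract_text` and the sample payload are hypothetical; the field names (`item_list`, `text_item`, `ref_msg`, `message_item`) are taken from the parser above, and nothing beyond what that parser reads is assumed about the wire format.

```python
ITEM_TEXT = 1


def extract_text(msg: dict) -> str:
    """Return the text body of a message, prefixed by any quoted reference."""
    text_body = ""
    ref_text = ""
    for item in msg.get("item_list", []):
        if item.get("type", 0) != ITEM_TEXT:
            continue
        text_body = item.get("text_item", {}).get("text", "")
        ref = item.get("ref_msg")
        if ref:
            ref_title = ref.get("title", "")
            ref_mi = ref.get("message_item", {})
            ref_body = ""
            if ref_mi.get("type") == ITEM_TEXT:
                ref_body = ref_mi.get("text_item", {}).get("text", "")
            parts = [p for p in (ref_title, ref_body) if p]
            if parts:
                # Same marker format as the parser above.
                ref_text = f"[引用: {' | '.join(parts)}]\n"
    return ref_text + text_body


# Hypothetical payload: the values are made up for illustration.
sample = {
    "message_type": 1,
    "from_user_id": "user-1",
    "item_list": [
        {
            "type": 1,
            "text_item": {"text": "What about this?"},
            "ref_msg": {
                "title": "Alice",
                "message_item": {"type": 1, "text_item": {"text": "Earlier note"}},
            },
        }
    ],
}

if __name__ == "__main__":
    print(extract_text(sample))
```

The full parser additionally promotes a quoted image, video, or file to the message's media item, which this sketch leaves out.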

1
cli/VERSION Normal file

@@ -0,0 +1 @@
2.0.8

13
cli/__init__.py Normal file

@@ -0,0 +1,13 @@
"""CowAgent CLI - Manage your CowAgent from the command line."""
import os as _os
def _read_version():
version_file = _os.path.join(_os.path.dirname(_os.path.abspath(__file__)), "VERSION")
try:
with open(version_file, "r") as f:
return f.read().strip()
except FileNotFoundError:
return "0.0.0"
__version__ = _read_version()

4
cli/__main__.py Normal file

@@ -0,0 +1,4 @@
"""Allow running as: python -m cli"""
from cli.cli import main
main()

79
cli/cli.py Normal file

@@ -0,0 +1,79 @@
"""CowAgent CLI entry point."""
import click
from cli import __version__
from cli.commands.skill import skill
from cli.commands.process import start, stop, restart, update, status, logs
from cli.commands.context import context
from cli.commands.install import install_browser
from cli.commands.knowledge import knowledge
HELP_TEXT = """Usage: cow COMMAND [ARGS]...
  CowAgent CLI - Manage your CowAgent instance.
Commands:
  help             Show this message.
  version          Show the version.
  start            Start CowAgent.
  stop             Stop CowAgent.
  restart          Restart CowAgent.
  update           Update CowAgent and restart.
  status           Show CowAgent running status.
  logs             View CowAgent logs.
  skill            Manage CowAgent skills.
  knowledge        Manage knowledge base.
  install-browser  Install browser tool (Playwright + Chromium).
Tip: You can also send /help, /skill list, etc. in agent chat."""
class CowCLI(click.Group):
def format_help(self, ctx, formatter):
formatter.write(HELP_TEXT.strip())
formatter.write("\n")
def parse_args(self, ctx, args):
if args and args[0] == 'help':
click.echo(HELP_TEXT.strip())
ctx.exit(0)
return super().parse_args(ctx, args)
@click.group(cls=CowCLI, invoke_without_command=True, context_settings=dict(help_option_names=[]))
@click.pass_context
def main(ctx):
"""CowAgent CLI - Manage your CowAgent instance."""
if ctx.invoked_subcommand is None:
click.echo(HELP_TEXT.strip())
@main.command()
def version():
"""Show the version."""
click.echo(f"cow {__version__}")
@main.command(name='help')
@click.pass_context
def help_cmd(ctx):
"""Show this message."""
click.echo(HELP_TEXT.strip())
main.add_command(skill)
main.add_command(start)
main.add_command(stop)
main.add_command(restart)
main.add_command(update)
main.add_command(status)
main.add_command(logs)
main.add_command(context)
main.add_command(knowledge)
main.add_command(install_browser)
if __name__ == '__main__':
main()

0
cli/commands/__init__.py Normal file

29
cli/commands/context.py Normal file

@@ -0,0 +1,29 @@
"""cow context - Context management commands."""
import click
CHAT_HINT = (
"Context commands operate on the running agent's memory.\n"
"Please send the command in a chat conversation instead:\n\n"
" /context - View current context info\n"
" /context clear - Clear conversation context"
)
@click.group(invoke_without_command=True)
@click.pass_context
def context(ctx):
"""View or manage conversation context.
Context commands need access to the running agent's memory.
Use them in chat conversations: /context or /context clear
"""
if ctx.invoked_subcommand is None:
click.echo(f"\n {CHAT_HINT}\n")
@context.command()
def clear():
"""Clear conversation context (messages history)."""
click.echo(f"\n {CHAT_HINT}\n")

259
cli/commands/install.py Normal file

@@ -0,0 +1,259 @@
"""cow install-browser - Install Playwright + Chromium for the browser tool."""
import os
import sys
import subprocess
from typing import Callable, Optional
import click
PLAYWRIGHT_VERSION = "1.52.0"
PLAYWRIGHT_LEGACY_VERSION = "1.28.0"
GLIBC_THRESHOLD = (2, 28)
CHINA_MIRROR = "https://registry.npmmirror.com/-/binary/playwright"
# stream(msg, fg=None) — fg is "yellow" | "green" | "red" | None
StreamFn = Callable[[str, Optional[str]], None]
# on_phase(msg) — coarse-grained progress for chat channels (Chinese)
PhaseFn = Callable[[str], None]
def _phase(cb: Optional[PhaseFn], msg: str) -> None:
if cb:
cb(msg)
def _has_display() -> bool:
"""Check if a graphical display is available (Linux only)."""
return bool(os.environ.get("DISPLAY") or os.environ.get("WAYLAND_DISPLAY"))
def _is_headless_linux() -> bool:
return sys.platform == "linux" and not _has_display()
def _get_installed_version() -> str:
try:
out = subprocess.check_output(
[sys.executable, "-c", "import playwright; print(playwright.__version__)"],
stderr=subprocess.DEVNULL,
)
return out.decode().strip()
except Exception:
return ""
def _version_tuple(v: str):
try:
return tuple(int(x) for x in v.split(".")[:3])
except (ValueError, AttributeError):
return (0, 0, 0)
def _get_glibc_version():
if sys.platform != "linux":
return None
try:
import ctypes
libc = ctypes.CDLL("libc.so.6")
gnu_get_libc_version = libc.gnu_get_libc_version
gnu_get_libc_version.restype = ctypes.c_char_p
ver = gnu_get_libc_version().decode()
parts = ver.split(".")
return (int(parts[0]), int(parts[1]))
except Exception:
return None
def _is_china_network() -> bool:
try:
out = subprocess.check_output(
[sys.executable, "-m", "pip", "config", "get", "global.index-url"],
stderr=subprocess.DEVNULL,
)
url = out.decode().strip().lower()
return any(kw in url for kw in ("tsinghua", "aliyun", "npmmirror", "douban", "ustc", "huawei", "tencentyun"))
except Exception:
return False
def _pip_install(package_spec: str, stream: StreamFn) -> int:
"""Install a package, retrying with --user on permission failure."""
python = sys.executable
ret = subprocess.call([python, "-m", "pip", "install", package_spec])
if ret != 0:
stream(" Retrying with --user flag...", "yellow")
ret = subprocess.call([python, "-m", "pip", "install", "--user", package_spec])
return ret
def _default_stream(msg: str, fg: Optional[str] = None) -> None:
"""CLI: colored click output."""
if fg == "yellow":
click.echo(click.style(msg, fg="yellow"))
elif fg == "green":
click.echo(click.style(msg, fg="green"))
elif fg == "red":
click.echo(click.style(msg, fg="red"))
else:
click.echo(msg)
def run_install_browser(
stream: Optional[StreamFn] = None,
on_phase: Optional[PhaseFn] = None,
) -> int:
"""
Install Playwright Python package, optional Linux deps, and Chromium.
Reused by ``cow install-browser`` CLI and chat ``/install-browser``.
Args:
stream: Optional callback ``(message, fg)`` for each line. ``fg`` is
``yellow`` / ``green`` / ``red`` or None. Defaults to colored click output.
on_phase: Optional callback for coarse progress (e.g. push to chat);
messages are short Chinese status lines.
Returns:
0 on success, 1 on fatal failure (pip or chromium install failed).
"""
stream = stream or _default_stream
python = sys.executable
legacy_mode = False
_phase(on_phase, "🔧 开始安装浏览器工具依赖(约几分钟,请耐心等待)…")
glibc = _get_glibc_version()
if glibc and glibc < GLIBC_THRESHOLD:
legacy_mode = True
glibc_str = f"{glibc[0]}.{glibc[1]}"
stream(
f"glibc {glibc_str} detected (< 2.28). "
f"Will install playwright {PLAYWRIGHT_LEGACY_VERSION} for compatibility.",
"yellow",
)
stream(" Note: upgrade your OS for full browser tool support.", "yellow")
stream("")
_phase(
on_phase,
f" 检测到 glibc {glibc_str}(较旧),将安装兼容版 Playwright {PLAYWRIGHT_LEGACY_VERSION}",
)
target_version = PLAYWRIGHT_LEGACY_VERSION if legacy_mode else PLAYWRIGHT_VERSION
_phase(on_phase, "📦 [1/3] 正在安装 Playwright Python 包…")
stream("[1/3] Installing playwright Python package...", "yellow")
ret = _pip_install(f"playwright=={target_version}", stream)
if ret != 0:
stream("Failed to install playwright package.", "red")
_phase(on_phase, "❌ [1/3] Playwright Python 包安装失败。")
return 1
installed = _get_installed_version()
if installed:
stream(f" playwright {installed} installed.", "green")
stream("")
_phase(on_phase, f"✅ [1/3] Playwright 包已安装({installed or target_version})。")
if sys.platform == "linux":
_phase(on_phase, "🔧 [2/3] 正在安装 Linux 系统依赖与轻量中文字体(文泉驿正黑,部分步骤可能需要 sudo)…")
stream("[2/3] Installing system dependencies (Linux)...", "yellow")
ret = subprocess.call([python, "-m", "playwright", "install-deps", "chromium"])
if ret != 0:
stream(
" Could not auto-install system deps (may need sudo).\n"
f" Run manually: sudo {python} -m playwright install-deps chromium",
"yellow",
)
# Prefer fonts-wqy-zenhei only (~few MB). fonts-noto-cjk is much larger (~150MB+).
stream(" Installing CJK font (fonts-wqy-zenhei, lightweight)...")
font_ret = subprocess.call(
["sudo", "apt-get", "install", "-y", "--no-install-recommends", "fonts-wqy-zenhei"],
stderr=subprocess.DEVNULL,
)
if font_ret != 0:
stream(
" Could not auto-install CJK font.\n"
" Run manually: sudo apt-get install -y fonts-wqy-zenhei\n"
" (Optional, larger full coverage: sudo apt-get install -y fonts-noto-cjk)",
"yellow",
)
else:
subprocess.call(["fc-cache", "-fv"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
stream(" CJK font (wqy-zenhei) installed.", "green")
_phase(
on_phase,
"✅ [2/3] Linux 依赖与字体步骤已执行(若有权限问题请查看服务器日志或手动执行提示命令)。",
)
else:
stream(f"[2/3] Skipping system deps (not needed on {sys.platform}).", "yellow")
_phase(on_phase, f" [2/3] 当前系统({sys.platform})跳过 Linux 专用依赖。")
stream("")
_phase(on_phase, "🌐 [3/3] 正在下载并安装 Chromium(体积较大,请耐心等待)…")
stream("[3/3] Installing Chromium browser...", "yellow")
cmd = [python, "-m", "playwright", "install", "chromium"]
if _is_headless_linux() and not legacy_mode:
ver = _version_tuple(installed or "")
if ver >= (1, 57, 0):
cmd.append("--only-shell")
stream(" (headless shell for Linux server)", None)
else:
stream(" (full Chromium)", None)
elif sys.platform == "linux" and _has_display():
stream(" (full browser for Linux desktop)", None)
env = os.environ.copy()
use_mirror = _is_china_network()
if use_mirror:
env["PLAYWRIGHT_DOWNLOAD_HOST"] = CHINA_MIRROR
stream(f" (using China mirror: {CHINA_MIRROR})", None)
_phase(on_phase, "📡 检测到国内 pip 源配置,Chromium 将优先走国内镜像下载。")
ret = subprocess.call(cmd, env=env)
if ret != 0 and use_mirror:
stream(" Mirror download failed, retrying with official CDN...", "yellow")
_phase(on_phase, "⚠️ 镜像下载失败,正在改用官方源重试…")
env_no_mirror = os.environ.copy()
env_no_mirror.pop("PLAYWRIGHT_DOWNLOAD_HOST", None)
ret = subprocess.call(cmd, env=env_no_mirror)
if ret != 0:
stream("Failed to install Chromium.", "red")
_phase(on_phase, "❌ [3/3] Chromium 安装失败。")
return 1
stream("")
_phase(on_phase, "✅ [3/3] Chromium 已安装。")
stream("Verifying browser installation...", None)
_phase(on_phase, "🔍 正在验证 Playwright 能否正常加载…")
ret = subprocess.call(
[python, "-c", "from playwright.sync_api import sync_playwright; print('OK')"],
stderr=subprocess.DEVNULL,
)
if ret != 0:
stream(
" Warning: playwright import failed. Browser tool may not work on this system.\n"
" Consider upgrading your OS or using Docker.",
"yellow",
)
_phase(on_phase, "⚠️ 验证未完全通过:本机可能仍无法使用浏览器工具,请查看日志或升级系统。")
else:
stream(" Verification passed.", "green")
_phase(on_phase, "✅ 验证通过。")
stream("")
stream("Browser tool ready! Restart CowAgent to enable it.", "green")
_phase(on_phase, "🎉 全部步骤结束。请重启 CowAgent 后使用 browser 工具。")
return 0
@click.command("install-browser")
def install_browser():
"""Install browser tool dependencies (Playwright + Chromium)."""
code = run_install_browser()
if code != 0:
raise SystemExit(code)
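
The three decisions that drive `run_install_browser` (which Playwright release to pin, whether a mirror is likely in use, and how to retry without the mirror) can be separated from the subprocess calls. The sketch below is only an illustration under stated assumptions: `pick_playwright_version`, `looks_like_china_mirror` and `install_with_fallback` are hypothetical names, and `run` stands in for the real `subprocess.call(cmd, env=env)` so the logic can be tried without downloading anything.

```python
from typing import Callable, Dict, Optional, Tuple

# Constants copied from the diff above.
PLAYWRIGHT_VERSION = "1.52.0"
PLAYWRIGHT_LEGACY_VERSION = "1.28.0"
GLIBC_THRESHOLD = (2, 28)
MIRROR_KEYWORDS = ("tsinghua", "aliyun", "npmmirror", "douban", "ustc", "huawei", "tencentyun")


def pick_playwright_version(glibc: Optional[Tuple[int, int]]) -> str:
    """Pin the legacy release only when glibc is known and older than the threshold."""
    if glibc is not None and glibc < GLIBC_THRESHOLD:
        return PLAYWRIGHT_LEGACY_VERSION
    return PLAYWRIGHT_VERSION


def looks_like_china_mirror(index_url: str) -> bool:
    """Heuristic from the diff: a known mirror keyword appears in the pip index URL."""
    url = index_url.strip().lower()
    return any(kw in url for kw in MIRROR_KEYWORDS)


def install_with_fallback(
    run: Callable[[Dict[str, str]], int],
    base_env: Dict[str, str],
    mirror: Optional[str],
) -> int:
    """Try the mirror first (if any); on failure retry once without it."""
    env = dict(base_env)
    if mirror:
        env["PLAYWRIGHT_DOWNLOAD_HOST"] = mirror
    ret = run(env)
    if ret != 0 and mirror:
        env = dict(base_env)
        env.pop("PLAYWRIGHT_DOWNLOAD_HOST", None)
        ret = run(env)
    return ret
```

Tuple comparison does the version check element by element, so `(2, 17) < (2, 28)` holds while `(2, 28) < (2, 28)` does not; a `None` result from the glibc probe (non-Linux, or the probe failed) always selects the current release.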

121
cli/commands/knowledge.py Normal file

@@ -0,0 +1,121 @@
"""cow knowledge - Knowledge base management commands."""
import os
import click
from cli.utils import get_project_root
def _get_knowledge_dir():
"""Resolve the knowledge directory path from config or default."""
try:
import sys
sys.path.insert(0, get_project_root())
from config import conf
from common.utils import expand_path
workspace = expand_path(conf().get("agent_workspace", "~/cow"))
except Exception:
workspace = os.path.expanduser("~/cow")
return os.path.join(workspace, "knowledge")
def _get_knowledge_enabled():
try:
import sys
sys.path.insert(0, get_project_root())
from config import conf
return conf().get("knowledge", True)
except Exception:
return True
@click.group(invoke_without_command=True)
@click.pass_context
def knowledge(ctx):
"""Manage CowAgent knowledge base."""
if ctx.invoked_subcommand is None:
click.echo(_stats())
@knowledge.command("list")
def knowledge_list():
"""Display knowledge base file tree."""
click.echo(_tree())
def _stats() -> str:
knowledge_dir = _get_knowledge_dir()
if not os.path.isdir(knowledge_dir):
return "Knowledge base directory not found."
enabled = _get_knowledge_enabled()
total_files = 0
total_bytes = 0
cat_count = {}
for root, dirs, files in os.walk(knowledge_dir):
dirs[:] = [d for d in dirs if not d.startswith(".")]
rel_root = os.path.relpath(root, knowledge_dir)
category = rel_root.split(os.sep)[0] if rel_root != "." else "root"
for f in files:
if f.endswith(".md") and f not in ("index.md", "log.md"):
total_files += 1
total_bytes += os.path.getsize(os.path.join(root, f))
cat_count[category] = cat_count.get(category, 0) + 1
status_icon = click.style("enabled", fg="green") if enabled else click.style("disabled", fg="red")
lines = [
f"\n Knowledge Base [{status_icon}]",
"",
f" Pages: {total_files}",
f" Size: {total_bytes / 1024:.1f} KB",
"",
]
if cat_count:
lines.append(" Categories:")
for cat in sorted(cat_count.keys()):
lines.append(f" {cat}/ ({cat_count[cat]} pages)")
lines.append("")
lines.append(f" Path: {knowledge_dir}")
lines.append("")
return "\n".join(lines)
def _tree() -> str:
knowledge_dir = _get_knowledge_dir()
if not os.path.isdir(knowledge_dir):
return "Knowledge base directory not found."
tree_lines = [" knowledge/"]
subdirs = sorted([
d for d in os.listdir(knowledge_dir)
if os.path.isdir(os.path.join(knowledge_dir, d)) and not d.startswith(".")
])
for i, subdir in enumerate(subdirs):
is_last_dir = (i == len(subdirs) - 1)
branch = "└── " if is_last_dir else "├── "
subdir_path = os.path.join(knowledge_dir, subdir)
md_files = sorted([
f for f in os.listdir(subdir_path)
if f.endswith(".md") and not f.startswith(".")
])
tree_lines.append(f" {branch}{subdir}/ ({len(md_files)})")
child_prefix = " " if is_last_dir else ""
max_show = 15
for j, fname in enumerate(md_files[:max_show]):
is_last_file = (j == len(md_files[:max_show]) - 1) and len(md_files) <= max_show
fb = "└── " if is_last_file else "├── "
name = fname.replace(".md", "")
tree_lines.append(f"{child_prefix}{fb}{name}")
if len(md_files) > max_show:
tree_lines.append(f"{child_prefix}└── ... +{len(md_files) - max_show} more")
if not subdirs:
tree_lines.append(" (empty)")
return "\n" + "\n".join(tree_lines) + "\n"
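
The counting rule in `_stats` (Markdown pages only, `index.md` and `log.md` excluded, hidden directories pruned, category taken from the first path component) is easy to check on a throwaway directory. A minimal sketch, assuming a hypothetical `count_pages` helper that returns the totals instead of formatting them:

```python
import os
from typing import Dict, Tuple

SKIPPED = ("index.md", "log.md")


def count_pages(knowledge_dir: str) -> Tuple[int, Dict[str, int]]:
    """Return (total pages, pages per top-level category) for Markdown files."""
    total = 0
    per_category: Dict[str, int] = {}
    for root, dirs, files in os.walk(knowledge_dir):
        # Prune hidden directories in place so os.walk does not descend into them.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        rel_root = os.path.relpath(root, knowledge_dir)
        # Files directly under the knowledge directory are grouped as "root".
        category = rel_root.split(os.sep)[0] if rel_root != "." else "root"
        for name in files:
            if name.endswith(".md") and name not in SKIPPED:
                total += 1
                per_category[category] = per_category.get(category, 0) + 1
    return total, per_category
```

Assigning to `dirs[:]` rather than rebinding `dirs` matters: `os.walk` reads the same list object to decide where to descend next, so only the in-place form actually skips the hidden directories.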

317
cli/commands/process.py Normal file

@@ -0,0 +1,317 @@
"""cow start/stop/restart/status/logs - Process management commands."""
import os
import sys
import subprocess
import time
from typing import Optional
import click
from cli.utils import get_project_root
_IS_WIN = sys.platform == "win32"
def _get_pid_file():
return os.path.join(get_project_root(), ".cow.pid")
def _get_log_file():
return os.path.join(get_project_root(), "nohup.out")
def _is_pid_alive(pid: int) -> bool:
"""Check whether a process is still running (cross-platform)."""
if _IS_WIN:
try:
out = subprocess.check_output(
["tasklist", "/FI", f"PID eq {pid}", "/NH"],
stderr=subprocess.DEVNULL,
)
return str(pid) in out.decode(errors="ignore")
except Exception:
return False
else:
try:
os.kill(pid, 0)
return True
except (ProcessLookupError, PermissionError):
return False
def _kill_pid(pid: int, force: bool = False):
"""Terminate a process by PID (cross-platform)."""
if _IS_WIN:
flag = "/F" if force else ""
cmd = ["taskkill"]
if force:
cmd.append("/F")
cmd.extend(["/PID", str(pid)])
subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
else:
import signal
sig = signal.SIGKILL if force else signal.SIGTERM
os.kill(pid, sig)
def _read_pid() -> Optional[int]:
pid_file = _get_pid_file()
if not os.path.exists(pid_file):
return None
try:
with open(pid_file, "r") as f:
pid = int(f.read().strip())
if _is_pid_alive(pid):
return pid
os.remove(pid_file)
return None
except (ValueError, OSError):
try:
os.remove(pid_file)
except OSError:
pass
return None
def _write_pid(pid: int):
with open(_get_pid_file(), "w") as f:
f.write(str(pid))
def _remove_pid():
pid_file = _get_pid_file()
if os.path.exists(pid_file):
os.remove(pid_file)
@click.command()
@click.option("--foreground", "-f", is_flag=True, help="Run in foreground (don't daemonize)")
@click.option("--no-logs", is_flag=True, help="Don't tail logs after starting")
def start(foreground, no_logs):
"""Start CowAgent."""
pid = _read_pid()
if pid:
click.echo(f"CowAgent is already running (PID: {pid}).")
return
root = get_project_root()
app_py = os.path.join(root, "app.py")
if not os.path.exists(app_py):
click.echo("Error: app.py not found in project root.", err=True)
sys.exit(1)
python = sys.executable
if foreground:
click.echo("Starting CowAgent in foreground...")
if _IS_WIN:
sys.exit(subprocess.call([python, app_py], cwd=root))
else:
os.execv(python, [python, app_py])
else:
log_file = _get_log_file()
click.echo("Starting CowAgent...")
popen_kwargs = dict(cwd=root)
if _IS_WIN:
CREATE_NO_WINDOW = 0x08000000
popen_kwargs["creationflags"] = (
subprocess.CREATE_NEW_PROCESS_GROUP | CREATE_NO_WINDOW
)
else:
popen_kwargs["start_new_session"] = True
with open(log_file, "a") as log:
proc = subprocess.Popen(
[python, app_py],
stdout=log,
stderr=log,
**popen_kwargs,
)
_write_pid(proc.pid)
click.echo(click.style(f"✓ CowAgent started (PID: {proc.pid})", fg="green"))
click.echo(f" Logs: {log_file}")
if not no_logs:
click.echo(" Press Ctrl+C to stop tailing logs.\n")
_tail_log(log_file)
@click.command()
def stop():
"""Stop CowAgent."""
pid = _read_pid()
if not pid:
click.echo("CowAgent is not running.")
return
click.echo(f"Stopping CowAgent (PID: {pid})...")
try:
_kill_pid(pid)
for _ in range(30):
time.sleep(0.1)
if not _is_pid_alive(pid):
break
else:
_kill_pid(pid, force=True)
except (ProcessLookupError, OSError):
pass
_remove_pid()
click.echo(click.style("✓ CowAgent stopped.", fg="green"))
@click.command()
@click.option("--no-logs", is_flag=True, help="Don't tail logs after restarting")
@click.pass_context
def restart(ctx, no_logs):
"""Restart CowAgent."""
ctx.invoke(stop)
time.sleep(1)
ctx.invoke(start, no_logs=no_logs)
@click.command()
@click.pass_context
def update(ctx):
"""Update CowAgent and restart."""
root = get_project_root()
# 1. Stop service first so git pull won't conflict with running code
ctx.invoke(stop)
# 2. Git pull
if os.path.isdir(os.path.join(root, ".git")):
click.echo("Pulling latest code...")
ret = subprocess.call(["git", "pull"], cwd=root)
if ret != 0:
click.echo("Error: git pull failed.", err=True)
sys.exit(1)
else:
click.echo("Not a git repository, skipping code update.")
python = sys.executable
req_file = os.path.join(root, "requirements.txt")
if _IS_WIN:
# On Windows, `cow.exe` (this process) locks the exe file, so
# `pip install -e .` fails with WinError 5. Write a small .bat
# helper that waits for cow.exe to exit, then installs & starts.
bat = os.path.join(root, "_cow_update.bat")
lines = [
"@echo off",
"chcp 65001 >nul",
"echo Waiting for cow.exe to exit...",
"timeout /t 3 /nobreak >nul",
]
if os.path.exists(req_file):
lines.append(f'echo Installing dependencies...')
lines.append(f'"{python}" -m pip install -r requirements.txt -q')
lines += [
"echo Reinstalling cow CLI...",
f'"{python}" -m pip install -e . -q',
"echo Starting CowAgent...",
f'"{python}" -m cli.cli start --no-logs',
"echo.",
"echo Update complete. You can close this window.",
"pause >nul",
"del \"%~f0\"",
]
with open(bat, "w", encoding="utf-8") as f:
f.write("\n".join(lines) + "\n")
subprocess.Popen(
["cmd.exe", "/c", "start", "CowAgent Update", "/wait", bat],
cwd=root,
)
click.echo(click.style(
"✓ Update script launched. Please follow the new window for progress.",
fg="green"))
else:
# 3. Install dependencies
if os.path.exists(req_file):
click.echo("Installing dependencies...")
subprocess.call(
[python, "-m", "pip", "install", "-r", "requirements.txt", "-q"],
cwd=root,
)
click.echo("Reinstalling cow CLI...")
subprocess.call(
[python, "-m", "pip", "install", "-e", ".", "-q"],
cwd=root,
)
# 4. Start service
click.echo("")
time.sleep(1)
ctx.invoke(start, no_logs=False)
@click.command()
def status():
"""Show CowAgent running status."""
from cli import __version__
from cli.utils import load_config_json
pid = _read_pid()
if pid:
click.echo(click.style(f"● CowAgent is running (PID: {pid})", fg="green"))
else:
click.echo(click.style("● CowAgent is not running", fg="red"))
click.echo(f" 版本: v{__version__}")
cfg = load_config_json()
if cfg:
channel = cfg.get("channel_type", "unknown")
if isinstance(channel, list):
channel = ", ".join(channel)
click.echo(f" 通道: {channel}")
click.echo(f" 模型: {cfg.get('model', 'unknown')}")
mode = "Agent" if cfg.get("agent") else "Chat"
click.echo(f" 模式: {mode}")
@click.command()
@click.option("--follow", "-f", is_flag=True, help="Follow log output")
@click.option("--lines", "-n", default=50, help="Number of lines to show")
def logs(follow, lines):
"""View CowAgent logs."""
log_file = _get_log_file()
if not os.path.exists(log_file):
click.echo("No log file found.")
return
if follow:
_tail_log(log_file, lines)
else:
_print_last_lines(log_file, lines)
def _print_last_lines(file_path: str, n: int = 50):
"""Print the last N lines of a file (cross-platform)."""
try:
with open(file_path, "r", encoding="utf-8", errors="replace") as f:
all_lines = f.readlines()
for line in all_lines[-n:]:
click.echo(line, nl=False)
except Exception as e:
click.echo(f"Error reading log file: {e}", err=True)
def _tail_log(log_file: str, lines: int = 50):
"""Follow log file output. Blocks until Ctrl+C (cross-platform)."""
_print_last_lines(log_file, lines)
try:
with open(log_file, "r", encoding="utf-8", errors="replace") as f:
f.seek(0, 2)
while True:
line = f.readline()
if line:
click.echo(line, nl=False)
else:
time.sleep(0.3)
except KeyboardInterrupt:
pass
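
The PID-file handling behind `start`, `stop` and `status` can be illustrated on its own. This is a POSIX-only sketch under stated assumptions, not the code from the diff: `is_pid_alive` and `read_pid` are standalone stand-ins that take the file path as an argument, and unlike `_is_pid_alive` above the sketch treats `PermissionError` as "alive", because that error means the process exists but belongs to another user. Do not call `os.kill(pid, 0)` on Windows, where it does not act as a harmless probe.

```python
import os
from typing import Optional


def is_pid_alive(pid: int) -> bool:
    """POSIX-only probe: signal 0 checks for existence without delivering a signal."""
    try:
        os.kill(pid, 0)
        return True
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but we are not allowed to signal it.
        return True


def read_pid(pid_file: str) -> Optional[int]:
    """Return the recorded PID if its process is alive; otherwise remove the stale file."""
    try:
        with open(pid_file, "r") as f:
            pid: Optional[int] = int(f.read().strip())
    except FileNotFoundError:
        return None
    except (ValueError, OSError):
        pid = None  # unreadable or corrupt file: treat as stale
    if pid is not None and is_pid_alive(pid):
        return pid
    try:
        os.remove(pid_file)
    except OSError:
        pass
    return None
```

Cleaning up the stale file inside the reader keeps every caller simple: a `None` result always means "not running", and the next `start` does not trip over a leftover PID.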

1483
cli/commands/skill.py Normal file

File diff suppressed because it is too large

62
cli/utils.py Normal file

@@ -0,0 +1,62 @@
"""Shared utilities for cow CLI."""
import os
import sys
import json
def get_project_root() -> str:
"""Get the CowAgent project root directory."""
# cli/ is directly under the project root
return os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
def get_workspace_dir() -> str:
"""Get the agent workspace directory from config, defaulting to ~/cow."""
config = load_config_json()
workspace = config.get("agent_workspace", "~/cow")
return os.path.expanduser(workspace)
def get_skills_dir() -> str:
"""Get the custom skills directory."""
return os.path.join(get_workspace_dir(), "skills")
def get_builtin_skills_dir() -> str:
"""Get the builtin skills directory."""
return os.path.join(get_project_root(), "skills")
def load_config_json() -> dict:
"""Load config.json from project root."""
config_path = os.path.join(get_project_root(), "config.json")
if not os.path.exists(config_path):
return {}
try:
with open(config_path, "r", encoding="utf-8") as f:
return json.load(f)
except Exception:
return {}
def load_skills_config() -> dict:
"""Load skills_config.json from the custom skills directory."""
path = os.path.join(get_skills_dir(), "skills_config.json")
if not os.path.exists(path):
return {}
try:
with open(path, "r", encoding="utf-8") as f:
return json.load(f)
except Exception:
return {}
def ensure_sys_path():
"""Add project root to sys.path so we can import agent modules."""
root = get_project_root()
if root not in sys.path:
sys.path.insert(0, root)
SKILL_HUB_API = "https://skills.cowagent.ai/api"
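
Both loaders in `cli/utils.py` follow the same "missing or unreadable file means empty config" pattern. A minimal sketch of that pattern, assuming hypothetical helper names `load_json_or_empty` and `workspace_dir`; it differs from the diff in two labeled ways: it catches `OSError` and `ValueError` rather than every `Exception`, and it also returns `{}` when the JSON document is not an object.

```python
import json
import os


def load_json_or_empty(path: str) -> dict:
    """Load a JSON object from `path`; return {} if it is missing, unreadable or malformed."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, ValueError):
        # OSError covers a missing or unreadable file; json.JSONDecodeError is a ValueError.
        return {}
    # Assumption of this sketch: callers use .get(), so anything but a dict is rejected.
    return data if isinstance(data, dict) else {}


def workspace_dir(config: dict, default: str = "~/cow") -> str:
    """Resolve the agent workspace, expanding a leading ~ to the user's home directory."""
    return os.path.expanduser(config.get("agent_workspace", default))
```

Returning an empty dict instead of raising lets commands such as `cow status` degrade gracefully on a fresh checkout where `config.json` has not been created yet.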


@@ -3,6 +3,18 @@ Cloud management client for connecting to the LinkAI control console.
Handles remote configuration sync, message push, and skill management
via the LinkAI socket protocol.
NOTE: By default, no cloud-related config is enabled. The application runs
entirely locally without connecting to any remote service. The cloud client
is only activated when BOTH of the following conditions are met:
1. ``use_linkai`` is set to True in config (checked in app.py before
importing this module).
2. ``cloud_deployment_id`` (or env CLOUD_DEPLOYMENT_ID) is non-empty
(checked in app.py and again in the ``start()`` function below).
If either condition is missing, this module is never loaded and the
program continues as a purely local application.
"""
from bridge.context import Context, ContextType
@@ -35,14 +47,16 @@ CREDENTIAL_MAP = {
class CloudClient(LinkAIClient):
def __init__(self, api_key: str, channel, host: str = ""):
super().__init__(api_key, host)
def __init__(self, api_key: str, channel, host: str = "", port=None):
super().__init__(api_key, host, port=port)
self.channel = channel
self.client_type = channel.channel_type
self.channel_mgr = None
self._skill_service = None
self._memory_service = None
self._knowledge_service = None
self._chat_service = None
self._session_service = None
@property
def skill_service(self):
@@ -76,6 +90,21 @@ class CloudClient(LinkAIClient):
logger.error(f"[CloudClient] Failed to init MemoryService: {e}")
return self._memory_service
@property
def knowledge_service(self):
"""Lazy-init KnowledgeService."""
if self._knowledge_service is None:
try:
from agent.knowledge.service import KnowledgeService
from config import conf
from common.utils import expand_path
workspace_root = expand_path(conf().get("agent_workspace", "~/cow"))
self._knowledge_service = KnowledgeService(workspace_root)
logger.debug("[CloudClient] KnowledgeService initialised")
except Exception as e:
logger.error(f"[CloudClient] Failed to init KnowledgeService: {e}")
return self._knowledge_service
@property
def chat_service(self):
"""Lazy-init ChatService (requires AgentBridge via Bridge singleton)."""
@@ -90,6 +119,18 @@ class CloudClient(LinkAIClient):
logger.error(f"[CloudClient] Failed to init ChatService: {e}")
return self._chat_service
@property
def session_service(self):
"""Lazy-init SessionService."""
if self._session_service is None:
try:
from agent.chat.session_service import SessionService
self._session_service = SessionService()
logger.debug("[CloudClient] SessionService initialised")
except Exception as e:
logger.error(f"[CloudClient] Failed to init SessionService: {e}")
return self._session_service
# ------------------------------------------------------------------
# message push callback
# ------------------------------------------------------------------
@@ -201,27 +242,43 @@ class CloudClient(LinkAIClient):
def _handle_channel_create(self, channel_type: str, data: dict):
local_config = conf()
self._set_channel_credentials(local_config, channel_type,
data.get("appId"), data.get("appSecret"))
cred_changed = self._set_channel_credentials(
local_config, channel_type, data.get("appId"), data.get("appSecret"))
self._add_channel_type(local_config, channel_type)
self._save_config_to_file(local_config)
if self.channel_mgr:
if not self.channel_mgr:
return
existing_ch = self.channel_mgr.get_channel(channel_type)
skip_restart = existing_ch and not cred_changed
if skip_restart and channel_type in ("weixin", "wx"):
login_status = getattr(existing_ch, "login_status", "")
if login_status != "logged_in":
skip_restart = False
logger.info(f"[CloudClient] Channel '{channel_type}' not logged in "
f"(status={login_status}), forcing restart")
if skip_restart:
logger.info(f"[CloudClient] Channel '{channel_type}' already running with same config, "
"skip restart, reporting status only")
threading.Thread(
target=self._do_add_channel, args=(channel_type,), daemon=True
target=self._report_channel_startup, args=(channel_type,), daemon=True
).start()
return
threading.Thread(
target=self._do_add_channel, args=(channel_type,), daemon=True
).start()
def _handle_channel_update(self, channel_type: str, data: dict):
local_config = conf()
enabled = data.get("enabled", "Y")
self._set_channel_credentials(local_config, channel_type,
data.get("appId"), data.get("appSecret"))
cred_changed = self._set_channel_credentials(
local_config, channel_type, data.get("appId"), data.get("appSecret"))
if enabled == "N":
self._remove_channel_type(local_config, channel_type)
else:
# Ensure channel_type is persisted even if this channel was not
# previously listed (e.g. update used as implicit create).
self._add_channel_type(local_config, channel_type)
self._save_config_to_file(local_config)
@@ -233,9 +290,24 @@ class CloudClient(LinkAIClient):
target=self._do_remove_channel, args=(channel_type,), daemon=True
).start()
else:
threading.Thread(
target=self._do_restart_channel, args=(self.channel_mgr, channel_type), daemon=True
).start()
existing_ch = self.channel_mgr.get_channel(channel_type)
needs_restart = cred_changed or not existing_ch
if not needs_restart and channel_type in ("weixin", "wx"):
login_status = getattr(existing_ch, "login_status", "")
if login_status != "logged_in":
needs_restart = True
logger.info(f"[CloudClient] Channel '{channel_type}' not logged in "
f"(status={login_status}), forcing restart")
if existing_ch and not needs_restart:
logger.info(f"[CloudClient] Channel '{channel_type}' already running with same config, "
"skip restart, reporting status only")
threading.Thread(
target=self._report_channel_startup, args=(channel_type,), daemon=True
).start()
else:
threading.Thread(
target=self._do_restart_channel, args=(self.channel_mgr, channel_type), daemon=True
).start()
def _handle_channel_delete(self, channel_type: str, data: dict):
local_config = conf()
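
The restart decision in `_handle_channel_update` reduces to a small predicate: restart when the credentials changed or the channel is not running, and for the weixin channel also when it is not logged in. The sketch below restates that rule as a pure function; `needs_restart` as a standalone function, its parameter names and the `"example"` channel type are assumptions made for illustration.

```python
from typing import Optional

WEIXIN_TYPES = ("weixin", "wx")


def needs_restart(
    channel_type: str,
    cred_changed: bool,
    running: bool,
    login_status: Optional[str] = None,
) -> bool:
    """Decide whether a channel must be restarted after a config update."""
    if cred_changed or not running:
        return True
    # A running weixin channel that never completed QR login is restarted
    # so that a fresh login flow can begin.
    if channel_type in WEIXIN_TYPES and login_status != "logged_in":
        return True
    return False
```

When the predicate is false, the handlers in the diff skip the restart and only report the channel status, which avoids dropping a healthy connection on a no-op update.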
@@ -243,11 +315,27 @@ class CloudClient(LinkAIClient):
self._remove_channel_type(local_config, channel_type)
self._save_config_to_file(local_config)
if channel_type in ("weixin", "wx"):
self._remove_weixin_credentials()
if self.channel_mgr:
threading.Thread(
target=self._do_remove_channel, args=(channel_type,), daemon=True
).start()
@staticmethod
def _remove_weixin_credentials():
"""Remove the weixin token credentials file so next connect triggers QR login."""
cred_path = os.path.expanduser(
conf().get("weixin_credentials_path", "~/.weixin_cow_credentials.json")
)
try:
if os.path.exists(cred_path):
os.remove(cred_path)
logger.info(f"[CloudClient] Removed weixin credentials: {cred_path}")
except Exception as e:
logger.warning(f"[CloudClient] Failed to remove weixin credentials: {e}")
# ------------------------------------------------------------------
# channel credentials helpers
# ------------------------------------------------------------------
@@ -322,7 +410,7 @@ class CloudClient(LinkAIClient):
self.channel_mgr.add_channel(channel_type)
logger.info(f"[CloudClient] Channel '{channel_type}' added successfully")
except Exception as e:
logger.error(f"[CloudClient] Failed to add channel '{channel_type}': {e}")
logger.error(f"[CloudClient] Failed to add channel '{channel_type}': {e}", exc_info=True)
self.send_channel_status(channel_type, "error", str(e))
return
self._report_channel_startup(channel_type)
@@ -334,12 +422,31 @@ class CloudClient(LinkAIClient):
except Exception as e:
logger.error(f"[CloudClient] Failed to remove channel '{channel_type}': {e}")
def send_channel_qrcode(self, channel_type: str, qrcode_url: str):
"""Report QR code URL for a channel that requires scan-to-login."""
if self.client_id:
from linkai.api.client.client import ClientMsgType
msg = self._build_package(ClientMsgType.CHANNEL_STATUS)
msg["data"]["channelType"] = channel_type
msg["data"]["status"] = "qrcode"
msg["data"]["qrcodeUrl"] = qrcode_url
self._send_package(msg)
logger.info(f"[CloudClient] Sent QR code status for '{channel_type}'")
def _report_channel_startup(self, channel_type: str):
"""Wait for channel startup result and report to cloud."""
ch = self.channel_mgr.get_channel(channel_type)
if not ch:
self.send_channel_status(channel_type, "error", "channel instance not found")
return
if channel_type in ("weixin", "wx") and hasattr(ch, "login_status"):
login_status = getattr(ch, "login_status", "")
if login_status in ("waiting_scan", "scanned", "idle"):
logger.info(f"[CloudClient] Channel '{channel_type}' is waiting for QR login, "
"skip reporting connected")
return
success, error = ch.wait_startup(timeout=3)
if success:
logger.info(f"[CloudClient] Channel '{channel_type}' connected, reporting status")
@@ -390,6 +497,27 @@ class CloudClient(LinkAIClient):
return svc.dispatch(action, payload)
# ------------------------------------------------------------------
# knowledge callback
# ------------------------------------------------------------------
def on_knowledge(self, data: dict) -> dict:
"""
Handle KNOWLEDGE messages from the cloud console.
Delegates to KnowledgeService.dispatch for the actual operations.
:param data: message data with 'action', 'clientId', 'payload'
:return: response dict
"""
action = data.get("action", "")
payload = data.get("payload")
logger.info(f"[CloudClient] on_knowledge: action={action}")
svc = self.knowledge_service
if svc is None:
return {"action": action, "code": 500, "message": "KnowledgeService not available", "payload": None}
return svc.dispatch(action, payload)
# ------------------------------------------------------------------
# chat callback
# ------------------------------------------------------------------
@@ -409,6 +537,19 @@ class CloudClient(LinkAIClient):
session_id = f"session_{session_id}"
logger.info(f"[CloudClient] on_chat: session={session_id}, channel={channel_type}, query={query[:80]}")
# Intercept cow/slash commands before the agent runs
try:
from plugins import PluginManager
mgr = PluginManager()
instance = mgr.instances.get("COW_CLI")
if instance and hasattr(instance, "execute"):
result = instance.execute(query, session_id=session_id)
if result is not None:
send_chunk_fn({"chunk_type": "content", "delta": result, "segment_id": 0})
return
except Exception as e:
logger.warning(f"[CloudClient] cow_cli intercept failed: {e}")
svc = self.chat_service
if svc is None:
raise RuntimeError("ChatService not available")
@@ -418,12 +559,23 @@ class CloudClient(LinkAIClient):
# ------------------------------------------------------------------
# history callback
# ------------------------------------------------------------------
# Session-related actions handled via the HISTORY channel
_SESSION_ACTIONS = {
"list_sessions", "delete_session", "rename_session",
"clear_context", "generate_title",
}
def on_history(self, data: dict) -> dict:
"""
Handle HISTORY messages from the cloud console.
Returns paginated conversation history for a session.
:param data: message data with 'action' and 'payload' (session_id, page, page_size)
Supports both history query and session management actions
through a unified HISTORY message channel:
- query: paginated conversation history
- list_sessions / delete_session / rename_session /
clear_context / generate_title: session lifecycle
:param data: message data with 'action' and 'payload'
:return: response dict
"""
action = data.get("action", "query")
@@ -433,8 +585,19 @@ class CloudClient(LinkAIClient):
if action == "query":
return self._query_history(payload)
if action in self._SESSION_ACTIONS:
return self._dispatch_session(action, payload)
return {"action": action, "code": 404, "message": f"unknown action: {action}", "payload": None}
def _dispatch_session(self, action: str, payload: dict) -> dict:
"""Delegate session actions to SessionService."""
svc = self.session_service
if svc is None:
return {"action": action, "code": 500,
"message": "SessionService not available", "payload": None}
return svc.dispatch(action, payload)
def _query_history(self, payload: dict) -> dict:
"""Query paginated conversation history using ConversationStore."""
session_id = payload.get("session_id", "")
@@ -551,9 +714,9 @@ def get_deployment_id() -> str:
def get_website_base_url() -> str:
"""Return the public URL prefix that maps to the workspace websites/ dir.
"""Return the URL prefix that maps to the workspace websites/ dir.
Returns empty string when cloud deployment is not configured.
Do nothing when in local env.
"""
deployment_id = get_deployment_id()
if not deployment_id:
@@ -570,6 +733,42 @@ def get_website_base_url() -> str:
return f"https://app.{domain}/{deployment_id}"
# Subdir under websites/ used by the send tool
COW_SEND_WEB_SUBDIR = "cow-send"
def copy_send_file(src_path: str, workspace_root: str) -> str:
"""Copy *src_path* into ``websites/cow-send/`` and return its URL.
Returns empty string in local env.
"""
import shutil
import uuid
from common.utils import expand_path
base = get_website_base_url()
if not base or not src_path or not os.path.isfile(src_path):
return ""
ws = os.path.abspath(expand_path(workspace_root))
send_dir = os.path.join(ws, "websites", COW_SEND_WEB_SUBDIR)
try:
os.makedirs(send_dir, exist_ok=True)
except OSError:
return ""
ext = os.path.splitext(src_path)[1].lower()
if len(ext) > 12 or not ext.replace(".", "").isalnum():
ext = ""
dest_name = f"{uuid.uuid4().hex}{ext}"
dest_path = os.path.join(send_dir, dest_name)
try:
shutil.copy2(src_path, dest_path)
except OSError as e:
logger.warning(f"[cloud] copy_send_file: copy failed: {e}")
return ""
return f"{base}/{COW_SEND_WEB_SUBDIR}/{dest_name}"
def build_website_prompt(workspace_dir: str) -> list:
"""Build system prompt lines for cloud website/file sharing rules.
@@ -590,8 +789,8 @@ def build_website_prompt(workspace_dir: str) -> list:
f" - 例如: `websites/my-app/index.html` → `{base_url}/my-app/index.html`",
"",
"2. **生成文件分享** (PPT、PDF、图片、音视频等): 当你为用户生成了需要下载或查看的文件时,**可以**将文件保存到 `websites/` 目录中",
f" - 例如: 生成的PPT保存到 `websites/files/report.pptx` → 下载链接为 `{base_url}/files/report.pptx`",
" - 你仍然可以同时使用 `send` 工具发送文件(在飞书、钉钉等IM渠道中有效),但**必须同时在回复文本中提供下载链接**作为兜底,因为部分渠道(如网页端)无法通过 send 接收本地文件",
f" - 例如: 生成的PPT保存到 `websites/files/report.pptx` → 下载链接为 `{base_url}/files/report.pptx`",
" - 你仍然可以同时使用 `send` 工具发送文件(在微信、飞书、钉钉、web等渠道中有效),但**必须同时在回复文本中提供下载链接**作为兜底,因为部分渠道无法通过 send 接收本地文件",
"",
"3. **必须发送链接**: 无论是网页还是文件,生成后**必须将完整的访问/下载链接直接写在回复文本中发送给用户**",
"",
@@ -602,8 +801,11 @@ def build_website_prompt(workspace_dir: str) -> list:
]
def start(channel, channel_mgr=None):
if not get_deployment_id():
return
global chat_client
chat_client = CloudClient(api_key=conf().get("linkai_api_key"), host=conf().get("cloud_host", ""), channel=channel)
chat_client = CloudClient(api_key=conf().get("linkai_api_key"), host=conf().get("cloud_host", ""), port=conf().get("cloud_port"), channel=channel)
chat_client.channel_mgr = channel_mgr
chat_client.config = _build_config()
chat_client.start()
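
Review note: the file-name handling in `copy_send_file` above can be isolated into a small sketch. The helper name `safe_dest_name` is hypothetical and not part of the diff. It reproduces only the extension check and the random destination name, assuming the same 12-character limit and alphanumeric test shown in the hunk.

```python
import os
import uuid


def safe_dest_name(src_path: str) -> str:
    """Build a random file name, keeping the extension only if it looks safe."""
    # Lower-case the extension, e.g. ".PPTX" -> ".pptx"
    ext = os.path.splitext(src_path)[1].lower()
    # Drop extensions that are too long or contain non-alphanumeric characters.
    # An empty extension also fails isalnum() and stays empty.
    if len(ext) > 12 or not ext.replace(".", "").isalnum():
        ext = ""
    # uuid4().hex is 32 hex characters, so the original name never leaks
    return f"{uuid.uuid4().hex}{ext}"


if __name__ == "__main__":
    print(safe_dest_name("/tmp/Report.PPTX"))
    print(safe_dest_name("/tmp/archive.b$c"))
```

Because the destination name is random, the original file name is not exposed in the public URL; only a short, plain extension survives.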

View File

@@ -1,18 +1,21 @@
# 厂商类型
OPEN_AI = "openAI"
CHATGPT = "chatGPT"
OPENAI = "openai"
CHATGPT = "chatGPT" # legacy alias for OPENAI, kept for backward compatibility
BAIDU = "baidu"
QIANFAN = "qianfan"
XUNFEI = "xunfei"
CHATGPTONAZURE = "chatGPTOnAzure"
LINKAI = "linkai"
CLAUDEAPI= "claudeAPI"
QWEN = "qwen" # 旧版千问接入
QWEN_DASHSCOPE = "dashscope" # 新版千问接入(百炼)
QWEN = "qwen" # 千问 (兼容旧配置,实际走 DashscopeBot)
QWEN_DASHSCOPE = "dashscope" # 千问 DashScope 接入
GEMINI = "gemini"
ZHIPU_AI = "zhipu"
MOONSHOT = "moonshot"
MiniMax = "minimax"
DEEPSEEK = "deepseek"
CUSTOM = "custom" # custom OpenAI-compatible API, bot_type won't auto-switch on model change
MODELSCOPE = "modelscope"
# 模型列表
@@ -26,6 +29,7 @@ CLAUDE_35_SONNET = "claude-3-5-sonnet-latest" # 带 latest 标签的模型名
CLAUDE_35_SONNET_1022 = "claude-3-5-sonnet-20241022" # 带具体日期的模型名称,会固定为该日期发布的模型
CLAUDE_35_SONNET_0620 = "claude-3-5-sonnet-20240620"
CLAUDE_4_OPUS = "claude-opus-4-0"
CLAUDE_4_7_OPUS = "claude-opus-4-7" # Claude Opus 4.7
CLAUDE_4_6_OPUS = "claude-opus-4-6" # Claude Opus 4.6 - Agent推荐模型
CLAUDE_4_SONNET = "claude-sonnet-4-0" # Claude Sonnet 4.0
CLAUDE_4_5_SONNET = "claude-sonnet-4-5" # Claude Sonnet 4.5 - Agent推荐模型
@@ -68,6 +72,8 @@ GPT_5 = "gpt-5"
GPT_5_MINI = "gpt-5-mini"
GPT_5_NANO = "gpt-5-nano"
GPT_54 = "gpt-5.4" # GPT-5.4 - Agent recommended model
GPT_54_MINI = "gpt-5.4-mini"
GPT_54_NANO = "gpt-5.4-nano"
O1 = "o1-preview"
O1_MINI = "o1-mini"
WHISPER_1 = "whisper-1"
@@ -77,26 +83,41 @@ TTS_1_HD = "tts-1-hd"
# DeepSeek
DEEPSEEK_CHAT = "deepseek-chat" # DeepSeek-V3对话模型
DEEPSEEK_REASONER = "deepseek-reasoner" # DeepSeek-R1模型
DEEPSEEK_V4_FLASH = "deepseek-v4-flash" # DeepSeek V4 Flash - 默认推荐 (思考模式 + 工具调用)
DEEPSEEK_V4_PRO = "deepseek-v4-pro" # DeepSeek V4 Pro - 复杂任务更强 (思考模式 + 工具调用)
# Qwen (通义千问 - 阿里云)
QWEN = "qwen"
# Baidu Qianfan / ERNIE
ERNIE_5 = "ernie-5.0" # ERNIE 5.0 - default recommendation
ERNIE_X1_1 = "ernie-x1.1" # ERNIE X1.1 - reasoning-focused, multimodal
ERNIE_45_TURBO_128K = "ernie-4.5-turbo-128k"
ERNIE_45_TURBO_32K = "ernie-4.5-turbo-32k"
ERNIE_4_TURBO_8K = "ERNIE-4.0-Turbo-8K"
ERNIE_45_TURBO_VL = "ernie-4.5-turbo-vl"
ERNIE_45_TURBO_VL_32K = "ernie-4.5-turbo-vl-32k"
# Qwen (通义千问 - 阿里云 DashScope)
QWEN_TURBO = "qwen-turbo"
QWEN_PLUS = "qwen-plus"
QWEN_MAX = "qwen-max"
QWEN_LONG = "qwen-long"
QWEN3_MAX = "qwen3-max" # Qwen3 Max - Agent推荐模型
QWEN35_PLUS = "qwen3.5-plus" # Qwen3.5 Plus - Omni model (MultiModalConversation)
QWEN36_PLUS = "qwen3.6-plus" # Qwen3.6 Plus - Omni model (MultiModalConversation)
QWQ_PLUS = "qwq-plus"
# MiniMax
MINIMAX_M2_5 = "MiniMax-M2.5" # MiniMax M2.5 - Latest
MINIMAX_M2_1 = "MiniMax-M2.1" # MiniMax M2.1 - Agent推荐模型
MINIMAX_M2_7 = "MiniMax-M2.7" # MiniMax M2.7 - Latest
MINIMAX_M2_7_HIGHSPEED = "MiniMax-M2.7-highspeed" # MiniMax M2.7 highspeed
MINIMAX_M2_5 = "MiniMax-M2.5" # MiniMax M2.5
MINIMAX_M2_1 = "MiniMax-M2.1" # MiniMax M2.1
MINIMAX_M2_1_LIGHTNING = "MiniMax-M2.1-lightning" # MiniMax M2.1 极速版
MINIMAX_M2 = "MiniMax-M2" # MiniMax M2
MINIMAX_ABAB6_5 = "abab6.5-chat" # MiniMax abab6.5
# GLM (智谱AI)
GLM_5 = "glm-5" # 智谱 GLM-5 - Latest
GLM_5_1 = "glm-5.1" # 智谱 GLM-5.1 - Agent recommended model (default)
GLM_5_TURBO = "glm-5-turbo" # 智谱 GLM-5-Turbo
GLM_5 = "glm-5" # 智谱 GLM-5
GLM_4 = "glm-4"
GLM_4_PLUS = "glm-4-plus"
GLM_4_flash = "glm-4-flash"
@@ -111,6 +132,7 @@ GLM_4_7 = "glm-4.7" # 智谱 GLM-4.7 - Agent推荐模型
MOONSHOT = "moonshot"
KIMI_K2 = "kimi-k2"
KIMI_K2_5 = "kimi-k2.5"
KIMI_K2_6 = "kimi-k2.6" # Kimi K2.6 - Agent recommended model (default)
# Doubao (Volcengine Ark)
DOUBAO = "doubao"
@@ -119,6 +141,10 @@ DOUBAO_SEED_2_PRO = "doubao-seed-2-0-pro-260215"
DOUBAO_SEED_2_LITE = "doubao-seed-2-0-lite-260215"
DOUBAO_SEED_2_MINI = "doubao-seed-2-0-mini-260215"
# ModelScope(魔搭社区)
QWEN3_235B_A22B_INSTRUCT_2507 = "Qwen/Qwen3-235B-A22B-Instruct-2507"
QWEN3_5_27B = "Qwen/Qwen3.5-27B"
# 其他模型
WEN_XIN = "wenxin"
WEN_XIN_4 = "wenxin-4"
@@ -130,22 +156,35 @@ MODELSCOPE = "modelscope"
GITEE_AI_MODEL_LIST = ["Yi-34B-Chat", "InternVL2-8B", "deepseek-coder-33B-instruct", "InternVL2.5-26B", "Qwen2-VL-72B", "Qwen2.5-32B-Instruct", "glm-4-9b-chat", "codegeex4-all-9b", "Qwen2.5-Coder-32B-Instruct", "Qwen2.5-72B-Instruct", "Qwen2.5-7B-Instruct", "Qwen2-72B-Instruct", "Qwen2-7B-Instruct", "code-raccoon-v1", "Qwen2.5-14B-Instruct"]
MODELSCOPE_MODEL_LIST = ["LLM-Research/c4ai-command-r-plus-08-2024","mistralai/Mistral-Small-Instruct-2409","mistralai/Ministral-8B-Instruct-2410","mistralai/Mistral-Large-Instruct-2407",
"Qwen/Qwen2.5-Coder-32B-Instruct","Qwen/Qwen2.5-Coder-14B-Instruct","Qwen/Qwen2.5-Coder-7B-Instruct","Qwen/Qwen2.5-72B-Instruct","Qwen/Qwen2.5-32B-Instruct","Qwen/Qwen2.5-14B-Instruct","Qwen/Qwen2.5-7B-Instruct","Qwen/QwQ-32B-Preview",
"LLM-Research/Llama-3.3-70B-Instruct","opencompass/CompassJudger-1-32B-Instruct","Qwen/QVQ-72B-Preview","LLM-Research/Meta-Llama-3.1-405B-Instruct","LLM-Research/Meta-Llama-3.1-8B-Instruct","Qwen/Qwen2-VL-7B-Instruct","LLM-Research/Meta-Llama-3.1-70B-Instruct",
"Qwen/Qwen2.5-14B-Instruct-1M","Qwen/Qwen2.5-7B-Instruct-1M","Qwen/Qwen2.5-VL-3B-Instruct","Qwen/Qwen2.5-VL-7B-Instruct","Qwen/Qwen2.5-VL-72B-Instruct","deepseek-ai/DeepSeek-R1-Distill-Llama-70B","deepseek-ai/DeepSeek-R1-Distill-Llama-8B","deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
"deepseek-ai/DeepSeek-R1-Distill-Qwen-14B","deepseek-ai/DeepSeek-R1-Distill-Qwen-7B","deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B","deepseek-ai/DeepSeek-R1","deepseek-ai/DeepSeek-V3","Qwen/QwQ-32B"]
MODELSCOPE_MODEL_LIST = ["deepseek-ai/DeepSeek-R1-0528", "deepseek-ai/DeepSeek-R1-Distill-Llama-70B", "deepseek-ai/DeepSeek-R1-Distill-Llama-8B", "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B", "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
"deepseek-ai/DeepSeek-R1-Distill-Qwen-7B", "deepseek-ai/DeepSeek-V3.2", "LLM-Research/c4ai-command-r-plus-08-2024", "LLM-Research/Llama-4-Maverick-17B-128E-Instruct", "meituan-longcat/LongCat-Flash-Lite", "MiniMax/MiniMax-M1-80k", "MiniMax/MiniMax-M2.5", "mistralai/Ministral-8B-Instruct-2410",
"mistralai/Mistral-Large-Instruct-2407", "mistralai/Mistral-Small-Instruct-2409", "moonshotai/Kimi-K2.5", "MusePublic/Qwen-Image-Edit", "opencompass/CompassJudger-1-32B-Instruct", "OpenGVLab/InternVL3_5-241B-A28B",
"Qwen/QVQ-72B-Preview", "Qwen/Qwen-Image-Edit", "Qwen/Qwen3-0.6B", "Qwen/Qwen3-1.7B", "Qwen/Qwen3-14B", "Qwen/Qwen3-235B-A22B", "Qwen/Qwen3-235B-A22B-Instruct-2507", "Qwen/Qwen3-235B-A22B-Thinking-2507", "Qwen/Qwen3-30B-A3B", "Qwen/Qwen3-30B-A3B-Thinking-2507",
"Qwen/Qwen3-32B", "Qwen/Qwen3-4B", "Qwen/Qwen3-8B", "Qwen/Qwen3-Coder-30B-A3B-Instruct", "Qwen/Qwen3-Coder-480B-A35B-Instruct", "Qwen/Qwen3-Next-80B-A3B-Instruct", "Qwen/Qwen3-Next-80B-A3B-Thinking", "Qwen/Qwen3-VL-235B-A22B-Instruct", "Qwen/Qwen3-VL-8B-Instruct",
"Qwen/Qwen3-VL-8B-Thinking", "Qwen/Qwen3.5-122B-A10B", "Qwen/Qwen3.5-27B", "Qwen/Qwen3.5-35B-A3B", "Qwen/Qwen3.5-397B-A17B", "Qwen/QwQ-32B", "Qwen/QwQ-32B-Preview", "Shanghai_AI_Laboratory/Intern-S1", "Shanghai_AI_Laboratory/Intern-S1-mini",
"stepfun-ai/Step-3.5-Flash", "XiaomiMiMo/MiMo-V2-Flash", "ZhipuAI/GLM-4.7-Flash", "ZhipuAI/GLM-5"]
MODEL_LIST = [
# DeepSeek
DEEPSEEK_V4_FLASH, DEEPSEEK_V4_PRO, DEEPSEEK_CHAT, DEEPSEEK_REASONER,
# Baidu Qianfan / ERNIE
QIANFAN, ERNIE_5, ERNIE_X1_1, ERNIE_45_TURBO_128K, ERNIE_45_TURBO_32K, ERNIE_4_TURBO_8K,
ERNIE_45_TURBO_VL, ERNIE_45_TURBO_VL_32K,
# MiniMax
MiniMax, MINIMAX_M2_7, MINIMAX_M2_7_HIGHSPEED, MINIMAX_M2_5, MINIMAX_M2_1, MINIMAX_M2_1_LIGHTNING, MINIMAX_M2, MINIMAX_ABAB6_5,
# Claude
CLAUDE3, CLAUDE_4_6_SONNET, CLAUDE_4_6_OPUS, CLAUDE_4_OPUS, CLAUDE_4_5_SONNET, CLAUDE_4_SONNET, CLAUDE_3_OPUS, CLAUDE_3_OPUS_0229,
CLAUDE_35_SONNET, CLAUDE_35_SONNET_1022, CLAUDE_35_SONNET_0620, CLAUDE_3_SONNET, CLAUDE_3_HAIKU,
CLAUDE3, CLAUDE_4_6_SONNET, CLAUDE_4_7_OPUS, CLAUDE_4_6_OPUS, CLAUDE_4_OPUS, CLAUDE_4_5_SONNET, CLAUDE_4_SONNET, CLAUDE_3_OPUS, CLAUDE_3_OPUS_0229,
CLAUDE_35_SONNET, CLAUDE_35_SONNET_1022, CLAUDE_35_SONNET_0620, CLAUDE_3_SONNET, CLAUDE_3_HAIKU,
"claude", "claude-3-haiku", "claude-3-sonnet", "claude-3-opus", "claude-3.5-sonnet",
# Gemini
GEMINI_31_FLASH_LITE_PRE, GEMINI_31_PRO_PRE, GEMINI_3_PRO_PRE, GEMINI_3_FLASH_PRE, GEMINI_25_PRO_PRE, GEMINI_25_FLASH_PRE,
GEMINI_20_FLASH, GEMINI_20_flash_exp, GEMINI_15_PRO, GEMINI_15_flash, GEMINI_PRO, GEMINI,
# OpenAI
GPT35, GPT35_0125, GPT35_1106, "gpt-3.5-turbo-16k",
GPT4, GPT4_06_13, GPT4_32k, GPT4_32k_06_13,
@@ -153,33 +192,31 @@ MODEL_LIST = [
GPT_4o, GPT_4O_0806, GPT_4o_MINI,
GPT_41, GPT_41_MINI, GPT_41_NANO,
GPT_5, GPT_5_MINI, GPT_5_NANO,
GPT_54,
GPT_54, GPT_54_MINI, GPT_54_NANO,
O1, O1_MINI,
# DeepSeek
DEEPSEEK_CHAT, DEEPSEEK_REASONER,
# Qwen
QWEN, QWEN_TURBO, QWEN_PLUS, QWEN_MAX, QWEN_LONG, QWEN3_MAX, QWEN35_PLUS,
# MiniMax
MiniMax, MINIMAX_M2_5, MINIMAX_M2_1, MINIMAX_M2_1_LIGHTNING, MINIMAX_M2, MINIMAX_ABAB6_5,
# GLM
ZHIPU_AI, GLM_5, GLM_4, GLM_4_PLUS, GLM_4_flash, GLM_4_LONG, GLM_4_ALLTOOLS,
# GLM (智谱AI)
ZHIPU_AI, GLM_5_1, GLM_5_TURBO, GLM_5, GLM_4, GLM_4_PLUS, GLM_4_flash, GLM_4_LONG, GLM_4_ALLTOOLS,
GLM_4_0520, GLM_4_AIR, GLM_4_AIRX, GLM_4_7,
# Kimi
MOONSHOT, "moonshot-v1-8k", "moonshot-v1-32k", "moonshot-v1-128k",
KIMI_K2, KIMI_K2_5,
# Qwen (通义千问)
QWEN36_PLUS, QWEN35_PLUS, QWEN3_MAX, QWEN_MAX, QWEN_PLUS, QWEN_TURBO, QWEN_LONG,
# Doubao
# Doubao (豆包)
DOUBAO, DOUBAO_SEED_2_CODE, DOUBAO_SEED_2_PRO, DOUBAO_SEED_2_LITE, DOUBAO_SEED_2_MINI,
# Kimi (Moonshot)
MOONSHOT, "moonshot-v1-8k", "moonshot-v1-32k", "moonshot-v1-128k",
KIMI_K2_6, KIMI_K2_5, KIMI_K2,
# ModelScope
MODELSCOPE,
# LinkAI
LINKAI_35, LINKAI_4_TURBO, LINKAI_4o,
# 其他模型
WEN_XIN, WEN_XIN_4, XUNFEI,
LINKAI_35, LINKAI_4_TURBO, LINKAI_4o,
MODELSCOPE
]
MODEL_LIST = MODEL_LIST + GITEE_AI_MODEL_LIST + MODELSCOPE_MODEL_LIST
@@ -188,3 +225,4 @@ FEISHU = "feishu"
DINGTALK = "dingtalk"
WECOM_BOT = "wecom_bot"
QQ = "qq"
WEIXIN = "weixin"

View File

@@ -1,5 +1,6 @@
import logging
import sys
import io
def _reset_logger(log):
@@ -9,7 +10,10 @@ def _reset_logger(log):
del handler
log.handlers.clear()
log.propagate = False
console_handle = logging.StreamHandler(sys.stdout)
stdout = sys.stdout
if hasattr(stdout, "buffer"):
stdout = io.TextIOWrapper(stdout.buffer, encoding="utf-8", errors="replace", line_buffering=True)
console_handle = logging.StreamHandler(stdout)
console_handle.setFormatter(
logging.Formatter(
"[%(levelname)s][%(asctime)s][%(filename)s:%(lineno)d] - %(message)s",

View File

@@ -115,3 +115,22 @@ def expand_path(path: str) -> str:
expanded = os.path.join(home, path[2:])
return expanded
def get_cloud_headers(api_key: str) -> dict:
"""
Build standard headers for LinkAI API requests,
including client_id when available.
"""
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {api_key}",
}
try:
from linkai import LinkAIClient
client_id = LinkAIClient.fetch_client_id()
if client_id:
headers["X-Client-Id"] = client_id
except Exception:
pass
return headers

View File

@@ -0,0 +1,17 @@
import inspect
from typing import Any
def websocket_app_run_forever(ws: Any, **kwargs: Any) -> None:
"""
Call WebSocketApp.run_forever; strip reconnect= if the installed
websocket-client is too old (reconnect was added in a later 1.x release).
"""
if "reconnect" in kwargs:
try:
params = inspect.signature(ws.run_forever).parameters
except (TypeError, ValueError):
params = {}
if "reconnect" not in params:
kwargs = {k: v for k, v in kwargs.items() if k != "reconnect"}
ws.run_forever(**kwargs)
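
Review note: the compatibility shim above can be exercised without a real WebSocket connection. The sketch below repeats the same signature check under the hypothetical name `run_forever_compat` and uses two stand-in classes, `OldApp` and `NewApp`, which are assumptions made for illustration and are not part of websocket-client.

```python
import inspect
from typing import Any


def run_forever_compat(ws: Any, **kwargs: Any) -> None:
    """Drop reconnect= when ws.run_forever does not declare that parameter."""
    if "reconnect" in kwargs:
        try:
            params = inspect.signature(ws.run_forever).parameters
        except (TypeError, ValueError):
            # Signature not introspectable: treat reconnect as unsupported
            params = {}
        if "reconnect" not in params:
            kwargs = {k: v for k, v in kwargs.items() if k != "reconnect"}
    ws.run_forever(**kwargs)


class OldApp:
    """Hypothetical stand-in for a client whose run_forever lacks reconnect=."""

    def __init__(self) -> None:
        self.seen = None

    def run_forever(self, ping_interval: int = 0) -> None:
        self.seen = {"ping_interval": ping_interval}


class NewApp:
    """Hypothetical stand-in for a client whose run_forever accepts reconnect=."""

    def __init__(self) -> None:
        self.seen = None

    def run_forever(self, ping_interval: int = 0, reconnect: Any = None) -> None:
        self.seen = {"ping_interval": ping_interval, "reconnect": reconnect}


if __name__ == "__main__":
    old, new = OldApp(), NewApp()
    run_forever_compat(old, ping_interval=20, reconnect=5)
    run_forever_compat(new, ping_interval=20, reconnect=5)
    print(old.seen, new.seen)
```

One consequence of checking for the parameter by name: a `run_forever` that only declares `**kwargs` would also have `reconnect` stripped, since the name does not appear in its signature.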

View File

@@ -1,6 +1,10 @@
{
"channel_type": "web",
"model": "MiniMax-M2.5",
"channel_type": "weixin",
"model": "deepseek-v4-flash",
"deepseek_api_key": "",
"deepseek_api_base": "https://api.deepseek.com/v1",
"qianfan_api_key": "",
"qianfan_api_base": "https://qianfan.baidubce.com/v2",
"minimax_api_key": "",
"zhipu_ai_api_key": "",
"ark_api_key": "",
@@ -22,12 +26,16 @@
"linkai_app_code": "",
"feishu_app_id": "",
"feishu_app_secret": "",
"feishu_stream_reply": true,
"dingtalk_client_id": "",
"dingtalk_client_secret":"",
"dingtalk_client_secret": "",
"wecom_bot_id": "",
"wecom_bot_secret": "",
"web_password": "",
"agent": true,
"agent_max_context_tokens": 40000,
"agent_max_context_tokens": 50000,
"agent_max_context_turns": 20,
"agent_max_steps": 15
"agent_max_steps": 20,
"enable_thinking": false,
"knowledge": true
}

View File

@@ -17,10 +17,12 @@ available_setting = {
"open_ai_api_base": "https://api.openai.com/v1",
"claude_api_base": "https://api.anthropic.com/v1", # claude api base
"gemini_api_base": "https://generativelanguage.googleapis.com", # gemini api base
"custom_api_key": "", # custom OpenAI-compatible provider api key (used when bot_type is "custom")
"custom_api_base": "", # custom OpenAI-compatible provider api base (used when bot_type is "custom")
"proxy": "", # openai使用的代理
# ChatGPT model; when use_azure_chatgpt is true, this is the name of the model deployment on Azure
"model": "gpt-3.5-turbo", # Options include gpt-4o, gpt-4o-mini, gpt-4-turbo, claude-3-sonnet, wenxin, moonshot, qwen-turbo, xunfei, glm-4, minimax, gemini; see common/const.py for the full model list
"bot_type": "", # 可选配置使用兼容openai格式的三方服务时候需填"chatGPT"。bot具体名称详见common/const.py文件列出的bot_type如不填根据model名称判断
"bot_type": "", # 可选配置使用兼容openai格式的三方服务时候需填"openai"或"custom"custom模式下切换模型不会自动切换bot_type。bot具体名称详见common/const.py文件如不填根据model名称判断
"use_azure_chatgpt": False, # 是否使用azure的chatgpt
"azure_deployment_id": "", # azure 模型部署名称
"azure_api_version": "", # azure api版本
@@ -74,6 +76,9 @@ available_setting = {
"baidu_wenxin_api_key": "", # Baidu api key
"baidu_wenxin_secret_key": "", # Baidu secret key
"baidu_wenxin_prompt_enabled": False, # Enable prompt if you are using ernie character model
# Baidu Qianfan / ERNIE OpenAI-compatible API
"qianfan_api_key": "", # Baidu Qianfan API key in bce-v3 format
"qianfan_api_base": "https://qianfan.baidubce.com/v2", # Qianfan OpenAI-compatible API base
# 讯飞星火API
"xunfei_app_id": "", # 讯飞应用ID
"xunfei_api_key": "", # 讯飞 API key
@@ -121,10 +126,13 @@ available_setting = {
"chat_start_time": "00:00", # 服务开始时间
"chat_stop_time": "24:00", # 服务结束时间
# 翻译api
"translate": "baidu", # 翻译api支持baidu
"translate": "baidu", # 翻译api支持baidu, youdao
# baidu翻译api的配置
"baidu_translate_app_id": "", # 百度翻译api的appid
"baidu_translate_app_key": "", # 百度翻译api的秘钥
# youdao翻译api的配置
"youdao_translate_app_key": "", # 有道翻译api的应用ID
"youdao_translate_app_secret": "", # 有道翻译api的应用密钥
# wechatmp的配置
"wechatmp_token": "", # 微信公众平台的Token
"wechatmp_port": 8080, # 微信公众平台的端口,需要端口转发到80或443
@@ -140,12 +148,13 @@ available_setting = {
"wechatcomapp_agent_id": "", # 企业微信app的agent_id
"wechatcomapp_aes_key": "", # 企业微信app的aes_key
# 飞书配置
"feishu_port": 80, # 飞书bot监听端口
"feishu_port": 80, # 飞书bot监听端口仅webhook模式需要
"feishu_app_id": "", # 飞书机器人应用APP Id
"feishu_app_secret": "", # 飞书机器人APP secret
"feishu_token": "", # 飞书 verification token
"feishu_bot_name": "", # 飞书机器人的名字
"feishu_token": "", # 飞书 verification token仅webhook模式需要
"feishu_event_mode": "websocket", # 飞书事件接收模式: webhook(HTTP服务器) 或 websocket(长连接)
# 飞书流式回复(基于官方 cardkit 流式卡片 API需要机器人开通 cardkit:card:write 权限,且飞书客户端 7.20+
"feishu_stream_reply": True, # 是否开启流式回复(打字机效果)。失败/老客户端自动降级为非流式或升级提示
# 钉钉配置
"dingtalk_client_id": "", # 钉钉机器人Client ID
"dingtalk_client_secret": "", # 钉钉机器人Client Secret
@@ -153,10 +162,15 @@ available_setting = {
# 企微智能机器人配置(长连接模式)
"wecom_bot_id": "", # 企微智能机器人BotID
"wecom_bot_secret": "", # 企微智能机器人长连接Secret
# 微信配置
"weixin_token": "", # 微信登录后获取的bot_token留空则启动时自动扫码登录
"weixin_base_url": "https://ilinkai.weixin.qq.com", # Weixin ilink API base URL
"weixin_cdn_base_url": "https://novac2c.cdn.weixin.qq.com/c2c", # CDN base URL
"weixin_credentials_path": "~/.weixin_cow_credentials.json", # credentials file path
# chatgpt指令自定义触发词
"clear_memory_commands": ["#清除记忆"], # 重置会话指令,必须以#开头
# channel配置
"channel_type": "", # 通道类型,支持多渠道同时运行。单个: "feishu",多个: "feishu, dingtalk" 或 ["feishu", "dingtalk"]。可选值: web,feishu,dingtalk,wecom_bot,wechatmp,wechatmp_service,wechatcom_app
"channel_type": "", # 通道类型,支持多渠道同时运行。单个: "feishu",多个: "feishu, dingtalk" 或 ["feishu", "dingtalk"]。可选值: web,feishu,dingtalk,wecom_bot,weixin,wechatmp,wechatmp_service,wechatcom_app
"web_console": True, # 是否自动启动Web控制台默认启动。设为False可禁用
"subscribe_msg": "", # 订阅消息, 支持: wechatmp, wechatmp_service, wechatcom_app
"debug": False, # 是否开启debug模式开启后会打印更多日志
@@ -175,25 +189,36 @@ available_setting = {
# 豆包(火山方舟) 平台配置
"ark_api_key": "",
"ark_base_url": "https://ark.cn-beijing.volces.com/api/v3",
#魔搭社区 平台配置
# 魔搭社区 平台配置
"modelscope_api_key": "",
"modelscope_base_url": "https://api-inference.modelscope.cn/v1/chat/completions",
# LinkAI平台配置
"use_linkai": False,
"linkai_api_key": "",
"linkai_app_code": "",
"linkai_api_base": "https://api.link-ai.tech", # linkAI服务地址
"linkai_api_base": "https://api.link-ai.tech",
"cloud_host": "client.link-ai.tech",
"cloud_port": None,
"cloud_deployment_id": "",
"minimax_api_key": "",
"Minimax_group_id": "",
"Minimax_base_url": "",
"deepseek_api_key": "",
"deepseek_api_base": "https://api.deepseek.com/v1",
"web_port": 9899,
"web_password": "", # Web console password; empty means no authentication required
"web_session_expire_days": 30, # Auth session expiry in days
"agent": True, # 是否开启Agent模式
"agent_workspace": "~/cow", # agent工作空间路径用于存储skills、memory等
"agent_max_context_tokens": 50000, # Agent模式下最大上下文tokens
"agent_max_context_turns": 30, # Agent模式下最大上下文记忆轮次
"agent_max_steps": 15, # Agent模式下单次运行最大决策步数
"agent_max_context_turns": 20, # Agent模式下最大上下文记忆轮次
"agent_max_steps": 20, # Agent模式下单次运行最大决策步数
"enable_thinking": False, # Enable deep-thinking mode for thinking-capable models
"knowledge": True, # 是否开启知识库功能
# Per-skill runtime config. Nested keys are flattened to env vars at startup
# using the rule: skill[<name>][<key>] -> SKILL_<NAME>_<KEY>
# (e.g. skill["image-generation"].model -> SKILL_IMAGE_GENERATION_MODEL).
"skill": {},
}
@@ -210,13 +235,13 @@ class Config(dict):
def __getitem__(self, key):
# 跳过以下划线开头的注释字段
if not key.startswith("_") and key not in available_setting:
logger.warning("[Config] key '{}' not in available_setting, may not take effect".format(key))
logger.debug("[Config] key '{}' not in available_setting, may not take effect".format(key))
return super().__getitem__(key)
def __setitem__(self, key, value):
# 跳过以下划线开头的注释字段
if not key.startswith("_") and key not in available_setting:
logger.warning("[Config] key '{}' not in available_setting, may not take effect".format(key))
logger.debug("[Config] key '{}' not in available_setting, may not take effect".format(key))
return super().__setitem__(key, value)
def get(self, key, default=None):
@@ -366,12 +391,18 @@ def load_config():
"gemini_api_base": "GEMINI_API_BASE",
"minimax_api_key": "MINIMAX_API_KEY",
"minimax_api_base": "MINIMAX_API_BASE",
"deepseek_api_key": "DEEPSEEK_API_KEY",
"deepseek_api_base": "DEEPSEEK_API_BASE",
"qianfan_api_key": "QIANFAN_API_KEY",
"qianfan_api_base": "QIANFAN_API_BASE",
"zhipu_ai_api_key": "ZHIPU_AI_API_KEY",
"zhipu_ai_api_base": "ZHIPU_AI_API_BASE",
"moonshot_api_key": "MOONSHOT_API_KEY",
"moonshot_api_base": "MOONSHOT_API_BASE",
"ark_api_key": "ARK_API_KEY",
"ark_api_base": "ARK_API_BASE",
"dashscope_api_key": "DASHSCOPE_API_KEY",
"dashscope_api_base": "DASHSCOPE_API_BASE",
# Channel credentials (used by skills that check env vars)
"feishu_app_id": "FEISHU_APP_ID",
"feishu_app_secret": "FEISHU_APP_SECRET",
@@ -382,7 +413,8 @@ def load_config():
"wechatcomapp_agent_id": "WECHATCOMAPP_AGENT_ID",
"wechatcomapp_secret": "WECHATCOMAPP_SECRET",
"qq_app_id": "QQ_APP_ID",
"qq_app_secret": "QQ_APP_SECRET"
"qq_app_secret": "QQ_APP_SECRET",
"weixin_token": "WEIXIN_TOKEN",
}
injected = 0
for conf_key, env_key in _CONFIG_TO_ENV.items():
@@ -391,18 +423,51 @@ def load_config():
if val:
os.environ[env_key] = str(val)
injected += 1
injected += _sync_skill_config_to_env(config.get("skill", {}))
if injected:
logger.info("[INIT] Synced {} config values to environment variables".format(injected))
config.load_user_datas()
def _sync_skill_config_to_env(skill_section) -> int:
"""Flatten skill-namespaced config into environment variables.
Mapping rule: ``config["skill"][<name>][<key>]`` -> ``SKILL_<NAME>_<KEY>``
(e.g. ``skill["image-generation"].model`` -> ``SKILL_IMAGE_GENERATION_MODEL``).
This lets subprocess-based skill scripts read their own settings without
importing project code. Existing env vars are NOT overwritten so the
real environment always wins.
Returns the number of variables actually injected.
"""
if not isinstance(skill_section, dict):
return 0
injected = 0
for skill_name, skill_conf in skill_section.items():
if not isinstance(skill_conf, dict):
continue
name_part = str(skill_name).replace("-", "_").upper()
for key, val in skill_conf.items():
if val is None or val == "":
continue
env_key = "SKILL_{}_{}".format(name_part, str(key).upper())
if env_key in os.environ:
continue
os.environ[env_key] = str(val)
injected += 1
return injected
def get_root():
return os.path.dirname(os.path.abspath(__file__))
def read_file(path):
with open(path, mode="r", encoding="utf-8") as f:
with open(path, mode="r", encoding="utf-8-sig") as f:
return f.read()
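
Review note: the mapping rule documented in `_sync_skill_config_to_env` above (`skill[<name>][<key>]` becomes `SKILL_<NAME>_<KEY>`) is easier to check with a pure function. The sketch below is a variation, not the code in the diff: the hypothetical `flatten_skill_config` returns the variables that would be injected instead of writing to `os.environ`, and the sample values are made up.

```python
import os


def flatten_skill_config(skill_section, environ=None):
    """Return {ENV_NAME: value} for the entries that would be injected."""
    environ = os.environ if environ is None else environ
    result = {}
    if not isinstance(skill_section, dict):
        return result
    for skill_name, skill_conf in skill_section.items():
        if not isinstance(skill_conf, dict):
            continue  # only dict-valued skill entries are considered
        name_part = str(skill_name).replace("-", "_").upper()
        for key, val in skill_conf.items():
            if val is None or val == "":
                continue  # empty values are not exported
            env_key = "SKILL_{}_{}".format(name_part, str(key).upper())
            if env_key in environ:
                continue  # the real environment always wins
            result[env_key] = str(val)
    return result


if __name__ == "__main__":
    example = {
        "image-generation": {"model": "example-model", "size": "", "steps": 4},
        "notes": "not a dict, skipped",
    }
    existing = {"SKILL_IMAGE_GENERATION_STEPS": "8"}
    print(flatten_skill_config(example, existing))
    # → {'SKILL_IMAGE_GENERATION_MODEL': 'example-model'}
```

In the example, `size` is dropped because it is empty and `steps` is dropped because the variable already exists, which matches the "existing env vars are NOT overwritten" rule in the docstring.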

View File

@@ -4,32 +4,54 @@ LABEL maintainer="foo@bar.com"
ARG TZ='Asia/Shanghai'
ARG CHATGPT_ON_WECHAT_VER
# Set to "false" to skip Playwright/Chromium and produce a smaller image
ARG INSTALL_BROWSER=true
# Set to "true" to use China mirrors for apt / pip / playwright (faster in CN)
ARG USE_CN_MIRROR=false
RUN echo /etc/apt/sources.list
# RUN sed -i 's/deb.debian.org/mirrors.tuna.tsinghua.edu.cn/g' /etc/apt/sources.list
ENV PLAYWRIGHT_BROWSERS_PATH=/app/ms-playwright
ENV BUILD_PREFIX=/app
# Optionally switch apt and pip to China mirrors
RUN if [ "$USE_CN_MIRROR" = "true" ]; then \
sed -i 's/deb.debian.org/mirrors.tuna.tsinghua.edu.cn/g' /etc/apt/sources.list; \
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple/; \
fi
ADD . ${BUILD_PREFIX}
# All heavy installs + user creation in ONE layer to avoid chown duplication
RUN apt-get update \
&&apt-get install -y --no-install-recommends bash ffmpeg espeak libavcodec-extra\
&& apt-get install -y --no-install-recommends bash ffmpeg espeak libavcodec-extra \
&& cd ${BUILD_PREFIX} \
&& cp config-template.json config.json \
&& /usr/local/bin/python -m pip install --no-cache --upgrade pip \
&& pip install --no-cache -r requirements.txt \
&& pip install --no-cache -r requirements-optional.txt \
&& pip install azure-cognitiveservices-speech
&& pip install --no-cache -e . \
&& if [ "$INSTALL_BROWSER" = "true" ]; then \
apt-get install -y --no-install-recommends fonts-wqy-zenhei \
&& pip install --no-cache "playwright==1.52.0" \
&& python -m playwright install-deps chromium \
&& mkdir -p /app/ms-playwright \
&& if [ "$USE_CN_MIRROR" = "true" ]; then \
PLAYWRIGHT_DOWNLOAD_HOST=https://registry.npmmirror.com/-/binary/playwright \
python -m playwright install chromium; \
else \
python -m playwright install chromium; \
fi; \
fi \
&& rm -rf /var/lib/apt/lists/* \
&& mkdir -p /home/agent/cow \
&& groupadd -r agent \
&& useradd -r -g agent -s /bin/bash -d /home/agent agent \
&& chown -R agent:agent /home/agent ${BUILD_PREFIX} /usr/local/lib
WORKDIR ${BUILD_PREFIX}
ADD docker/entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh \
&& mkdir -p /home/noroot \
&& groupadd -r noroot \
&& useradd -r -g noroot -s /bin/bash -d /home/noroot noroot \
&& chown -R noroot:noroot /home/noroot ${BUILD_PREFIX} /usr/local/lib
USER noroot
&& chown agent:agent /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]

View File

@@ -5,22 +5,42 @@ services:
container_name: chatgpt-on-wechat
security_opt:
- seccomp:unconfined
ports:
- "9899:9899"
environment:
CHANNEL_TYPE: 'web'
OPEN_AI_API_KEY: 'YOUR API KEY'
MODEL: ''
PROXY: ''
SINGLE_CHAT_PREFIX: '["bot", "@bot"]'
SINGLE_CHAT_REPLY_PREFIX: '"[bot] "'
GROUP_CHAT_PREFIX: '["@bot"]'
GROUP_NAME_WHITE_LIST: '["ChatGPT测试群", "ChatGPT测试群2"]'
IMAGE_CREATE_PREFIX: '["画", "看", "找"]'
CONVERSATION_MAX_TOKENS: 1000
SPEECH_RECOGNITION: 'False'
CHARACTER_DESC: '你是基于大语言模型的AI智能助手旨在回答并解决人们的任何问题并且可以使用多种语言与人交流。'
EXPIRES_IN_SECONDS: 3600
USE_GLOBAL_PLUGIN_CONFIG: 'True'
CHANNEL_TYPE: 'weixin'
MODEL: 'deepseek-v4-flash'
DEEPSEEK_API_KEY: ''
DEEPSEEK_API_BASE: 'https://api.deepseek.com/v1'
MINIMAX_API_KEY: ''
ZHIPU_AI_API_KEY: ''
ARK_API_KEY: ''
MOONSHOT_API_KEY: ''
DASHSCOPE_API_KEY: ''
CLAUDE_API_KEY: ''
CLAUDE_API_BASE: 'https://api.anthropic.com/v1'
OPEN_AI_API_KEY: ''
OPEN_AI_API_BASE: 'https://api.openai.com/v1'
GEMINI_API_KEY: ''
GEMINI_API_BASE: 'https://generativelanguage.googleapis.com'
VOICE_TO_TEXT: 'openai'
TEXT_TO_VOICE: 'openai'
VOICE_REPLY_VOICE: 'False'
SPEECH_RECOGNITION: 'True'
GROUP_SPEECH_RECOGNITION: 'False'
USE_LINKAI: 'False'
AGENT: 'True'
LINKAI_API_KEY: ''
LINKAI_APP_CODE: ''
FEISHU_APP_ID: ''
FEISHU_APP_SECRET: ''
DINGTALK_CLIENT_ID: ''
DINGTALK_CLIENT_SECRET: ''
WECOM_BOT_ID: ''
WECOM_BOT_SECRET: ''
WEB_PASSWORD: ''
AGENT: 'True'
AGENT_MAX_CONTEXT_TOKENS: 50000
AGENT_MAX_CONTEXT_TURNS: 20
AGENT_MAX_STEPS: 20
volumes:
- ./cow:/home/agent/cow

View File

@@ -43,9 +43,15 @@ fi
# fi
# go to prefix dir
# fix ownership of mounted volumes then drop to non-root user
if [ "$(id -u)" = "0" ]; then
mkdir -p /home/agent/cow
chown agent:agent /home/agent/cow
exec su agent -s /bin/bash -c "cd $CHATGPT_ON_WECHAT_PREFIX && $CHATGPT_ON_WECHAT_EXEC"
fi
# fallback: already running as agent
cd $CHATGPT_ON_WECHAT_PREFIX
# execute
$CHATGPT_ON_WECHAT_EXEC

View File

@@ -1,185 +0,0 @@
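# Introduction to CowAgent
## Overview
The Cow project has been upgraded from a simple chatbot into **CowAgent**, a full-featured intelligent assistant. It can think through and plan tasks on its own, keep long-term memory, operate the computer and external resources, and create and run Skills, so that it comes to understand you and grows with you. CowAgent can run long-term on a personal computer or server and be reached through Feishu, DingTalk, WeCom, the web, and other channels. Its core capabilities are:
- **Complex task planning**: understands complex tasks and plans their execution autonomously, reasoning and calling tools until the goal is reached; supports multi-turn reasoning and context understanding
- **Tool system**: ships with 10+ built-in tools, including file read/write, a bash terminal, a browser, scheduled tasks, and memory management, so the Agent can manage your computer or server
- **Long-term memory**: automatically persists conversation memory to local files and a database, covering both global memory and daily memory, with keyword and vector retrieval
- **Skills system**: adds a Skill runtime engine with several built-in skills, and supports developing custom Skills through natural-language conversation
- **Multi-channel and multi-model support**: interact with the Agent over Web, Feishu, DingTalk, WeCom, and other channels; supports mainstream models from China and abroad, including Claude, Gemini, OpenAI, GLM, MiniMax, Qwen, Kimi, and Doubao
- **Security and cost**: access is controlled through a secret-management tool, prompt constraints, and system permissions; token cost is bounded by the maximum number of remembered turns, the maximum context tokens, and the number of tool-execution steps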
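
The cost limits in the last bullet can be pictured as a trimming step applied before each model call. The sketch below is only an illustration, not CowAgent's implementation: `trim_context`, the two-messages-per-turn assumption, and the character-count token estimate are all hypothetical.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def trim_context(
    messages: List[Message],
    max_turns: int,
    max_tokens: int,
    count_tokens: Callable[[Message], int],
) -> List[Message]:
    """Keep the most recent messages that fit both the turn and token limits."""
    # Assumption: one turn is a user message plus one assistant message
    recent = messages[-2 * max_turns:] if max_turns > 0 else []
    total = sum(count_tokens(m) for m in recent)
    # Drop the oldest messages until the token budget is respected
    while recent and total > max_tokens:
        total -= count_tokens(recent[0])
        recent = recent[1:]
    return recent


if __name__ == "__main__":
    history = [
        {"role": "user", "content": "a" * 10},
        {"role": "assistant", "content": "b" * 10},
    ] * 3
    # Crude stand-in for a tokenizer: one "token" per character
    kept = trim_context(history, max_turns=2, max_tokens=25,
                        count_tokens=lambda m: len(m["content"]))
    print(len(kept))
```

In this sketch the turn limit is applied first and the token limit second, so the tighter of the two decides how much history is sent.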
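## Core features
### 1. Long-term memory
> The memory system lets the Agent retain important information over the long term. The Agent proactively stores preferences, decisions, facts, and other important information when the user shares them, and automatically extracts a summary once a conversation reaches a certain length. Memory is divided into core memory and daily memory, and retrieval uses a hybrid mode that combines semantic search and vector search.
On first launch, the Agent proactively asks the user for key information and records it in the agent profile, user identity, and memory files in the workspace (~/cow by default).
In later long-running conversations, the Agent records or retrieves memory when needed, keeps updating its own profile, the user's preferences, and the memory files, and summarizes the lessons it has learned, so that it can think independently and keep growing.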
<img width="800" src="https://cdn.link-ai.tech/doc/20260203000455.png" />
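### 2. Task planning and tool calling
Tools are how the Agent accesses operating-system resources. The Agent selects and calls tools according to the task to perform file reads and writes, command execution, scheduled tasks, and other operations. The built-in tools are implemented in the project's `tools` directory.
**Main tools:** file read/write/edit, Bash terminal, browser, file sending, scheduling, memory search, environment configuration, and more.
#### 2.1 Terminal and file access
Access to the operating system's terminal and files is the most basic and essential tool; many other tools and skills build on it. Users can interact with the Agent from a phone to operate resources on a personal computer or server:
<img width="800" src="https://cdn.link-ai.tech/doc/20260202181130.png" />
#### 2.2 Programming
With programming ability and system access, the Agent can carry out the whole vibe-coding workflow: information search, generation of images and other assets, coding, testing, deployment, Nginx configuration changes, and release. A single short command from a phone is enough to produce a quick demo of an application:
<img width="800" src="https://cdn.link-ai.tech/doc/20260203121008.png" />
#### 2.3 Scheduled tasks
Dynamic scheduled tasks are built on the scheduler tool and come in three forms: **one-off tasks, fixed intervals, and Cron expressions**. When a task fires, it can either **send a fixed message** or **run a dynamic Agent task**, which makes scheduling quite flexible:
<img width="800" src="https://cdn.link-ai.tech/doc/20260202195402.png" />
You can also view and manage existing scheduled tasks using natural language.
#### 2.4 Environment variable management
The secrets that skills need are stored in an environment-variable file managed by the `env_config` tool. You can update secrets through conversation; the tool has built-in protection and masking to keep secrets safe: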
<img width="800" src="https://cdn.link-ai.tech/doc/20260202234939.png" />
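### 3. Skills System
> The Skills system gives the Agent open-ended extensibility. Each Skill consists of an instruction file, optional scripts, and optional resources, and describes how to complete a particular type of task. With Skills, the Agent can follow instructions to complete complex workflows, call various tools, or integrate with third-party systems.
- **Built-in skills:** located in the project's `skills` directory, including the skill creator, web search, image recognition (openai-image-vision), LinkAI agents, web fetching, and more. Built-in Skills are enabled automatically depending on whether their requirements (API keys, system commands, etc.) are met. The skill creator makes it quick to create custom skills.
- **Custom skills:** created by the user through conversation and stored in the workspace (`~/cow/skills/`). Custom skills can implement complex business workflows and third-party integrations.
#### 3.1 Creating Skills
The `skill-creator` skill lets you create skills quickly through conversation. While working with the Agent, you can ask it to turn a workflow into a skill, or send it any API documentation and examples and let it complete the integration directly: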
<img width="800" src="https://cdn.link-ai.tech/doc/20260202202247.png" />
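#### 3.2 Search and Image Recognition
- **Search skill:** the built-in `bocha-search` (Bocha Search) Skill depends on the environment variable `BOCHA_SEARCH_API_KEY`, which you can create in the [console](https://open.bochaai.com/) and send to the Agent to complete configuration
- **Image recognition skill:** the `openai-image-vision` skill can use image-capable models such as gpt-4.1-mini and gpt-4.1. It depends on the `OPENAI_API_KEY` secret, which can be maintained through config.json or the `env_config` tool.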
<img width="800" src="https://cdn.link-ai.tech/doc/20260202213219.png" />
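#### 3.3 Third-party Knowledge Bases and Plugins
The `linkai-agent` skill makes every agent on [LinkAI](https://link-ai.tech/) available to the Agent as a Skill, enabling multi-agent decision making.
Usage: configure `LINKAI_API_KEY` through conversation, or add `linkai_api_key` to config.json. Then add agent descriptions to `skills/linkai-agent/config.json`, for example: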
```json
{
"apps": [
{
"app_code": "G7z6vKwp",
"app_name": "LinkAI客服助手",
"app_description": "当用户需要了解LinkAI平台相关问题时才选择该助手基于LinkAI知识库进行回答"
},
{
"app_code": "SFY5x7JR",
"app_name": "内容创作助手",
"app_description": "当用户需要创作图片或视频时才使用该助手支持Nano Banana、Seedream、即梦、Veo、可灵等多种模型"
}
]
}
```
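The Agent decides which agent to use based on its name and description, and calls the API with `app_code` to reach the corresponding app or workflow. This skill gives flexible access to agents, knowledge bases, plugins, and other capabilities on the LinkAI platform. The result looks like this:
<img width="750" src="https://cdn.link-ai.tech/doc/20260202234350.png" />
Note: configure `LINKAI_API_KEY` through `env_config`, or add `linkai_api_key` to config.json.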
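As a rough illustration of how the `apps` entries could be turned into something a model can choose from, the sketch below validates the config and renders one line per app. It is an assumption-laden example rather than the skill's real code: `load_apps` and `describe_apps` are hypothetical helpers, and only the three field names come from the config shown above.

```python
import json
from typing import Dict, List

REQUIRED_FIELDS = {"app_code", "app_name", "app_description"}


def load_apps(config_text: str) -> List[Dict[str, str]]:
    """Parse the `apps` array and check that each entry has the documented fields."""
    apps = json.loads(config_text).get("apps", [])
    for app in apps:
        missing = REQUIRED_FIELDS - set(app)
        if missing:
            raise ValueError(f"app entry is missing fields: {sorted(missing)}")
    return apps


def describe_apps(apps: List[Dict[str, str]]) -> str:
    """Render one line per app so a model can pick by name and description."""
    return "\n".join(
        f"- {app['app_name']} ({app['app_code']}): {app['app_description']}"
        for app in apps
    )
```

Specific, mutually exclusive descriptions matter here, because the description is the only signal available for choosing between apps.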
## Usage
> For detailed usage, see the project's README.md
### 1. Running the Project
Run the following in a terminal:
```bash
bash <(curl -sS https://cdn.link-ai.tech/code/cow/run.sh)
```
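For details and subsequent program management, see the [project startup script](https://github.com/zhayujie/chatgpt-on-wechat/wiki/CowAgentQuickStart)
### 2. Model Selection
The following models are recommended for Agent mode; choose based on quality and cost: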
- **MiniMax**: `MiniMax-M2.5`
- **GLM**: `glm-5`
- **Kimi**: `kimi-k2.5`
- **Doubao**: `doubao-seed-2-0-code-preview-260215`
- **Qwen**: `qwen3.5-plus`
- **Claude**: `claude-sonnet-4-6`
- **Gemini**: `gemini-3.1-flash-lite-preview`
- **OpenAI**: `gpt-5.4`
For detailed model configuration, see [README.md Model Description](../README.md#模型说明)
### 3. Core Agent Configuration
The core options for Agent mode are configured in `config.json`:
```json
{
  "agent": true,
  "agent_workspace": "~/cow",
  "agent_max_context_tokens": 40000,
  "agent_max_context_turns": 30,
  "agent_max_steps": 15
}
```
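**Configuration notes:**
- `agent`: set to `true` to enable Agent mode, which provides multi-turn tool decisions, long-term memory, Skills, and more
- `agent_workspace`: workspace path, used to store memory, skills, and other system prompt settings
- `agent_max_context_tokens`: upper limit on context tokens; when exceeded, the earliest conversation turns are dropped automatically
- `agent_max_context_turns`: number of conversation turns kept in context; each turn consists of one question and one reply
- `agent_max_steps`: maximum number of tool-call steps per task, to prevent infinite loops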
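The interaction between the turn limit and the token limit can be sketched as follows. This is not CowAgent's actual trimming logic: `trim_context`, the message shape, and the rough four-characters-per-token estimate are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # hypothetical shape: {"role": "user" | "assistant", "content": "..."}


def estimate_tokens(message: Message) -> int:
    """Very rough token estimate (assumption: about 4 characters per token)."""
    return max(1, len(message["content"]) // 4)


def trim_context(
    messages: List[Message],
    max_turns: int = 30,
    max_tokens: int = 40000,
    count_tokens: Callable[[Message], int] = estimate_tokens,
) -> List[Message]:
    """Keep the most recent turns that fit both limits, dropping the oldest first."""
    # Group messages into turns: each turn starts with a user message.
    turns: List[List[Message]] = []
    for message in messages:
        if message["role"] == "user" or not turns:
            turns.append([])
        turns[-1].append(message)

    # Apply the turn limit first.
    turns = turns[-max_turns:] if max_turns > 0 else []

    def total(remaining: List[List[Message]]) -> int:
        return sum(count_tokens(m) for turn in remaining for m in turn)

    # Then drop the oldest turns until the token budget is met.
    while turns and total(turns) > max_tokens:
        turns.pop(0)

    return [m for turn in turns for m in turn]
```

Dropping whole turns rather than single messages keeps each question paired with its reply, which matches the definition of a turn given above.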
### 4. Channel Integration
The Agent can be used on multiple channels; switch by changing `channel_type` in `config.json`.
- **Web**: the default channel; listens on a local port after startup and is accessed from a browser
- **Feishu**: [Feishu integration guide](https://docs.link-ai.tech/cow/multi-platform/feishu)
- **DingTalk**: [DingTalk integration guide](https://docs.link-ai.tech/cow/multi-platform/dingtalk)
- **WeCom app**: [WeCom app guide](https://docs.link-ai.tech/cow/multi-platform/wechat-com)
- **WeCom smart bot**: [WeCom smart bot guide](https://docs.link-ai.tech/cow/multi-platform/wecom-bot)
- **QQ bot**: [QQ bot guide](https://docs.link-ai.tech/cow/multi-platform/qq)
For more channel configuration, see [Channel Description](../README.md#通道说明)


@@ -3,67 +3,109 @@ title: 飞书
description: 将 CowAgent 接入飞书应用
---
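To connect CowAgent to Feishu through a custom app, you must be a Feishu enterprise user with enterprise admin permissions
> Connect CowAgent through a Feishu custom app. Direct chats and group chats (@-mentioning the bot) are supported. It uses WebSocket long-connection mode, so no public IP is required, and it supports streaming typewriter-style replies as well as sending and receiving voice messages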
## 一、创建企业自建应用
<Note>
接入需要是飞书企业用户且具有企业管理权限。
</Note>
### 1. 创建应用
## 一、接入方式
进入 [飞书开发平台](https://open.feishu.cn/app/),点击 **创建企业自建应用**,填写必要信息后点击 **创建**
### 方式一:扫码一键接入(推荐)
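After starting the Cow project, you can complete QR-code creation directly in the terminal. Alternatively, open the Web console (local URL: http://127.0.0.1:9899), select the **通道** (Channels) menu, click **接入通道** (Connect Channel), choose **飞书** (Feishu), click **一键创建飞书应用** (Create Feishu app in one click), and scan the QR code with the **Feishu app** to create and connect the app automatically: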
<img src="https://cdn.link-ai.tech/doc/20260505181126.png" width="800"/>
<Note>
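1. The `lark-oapi` dependency must be version >=1.5.5
2. An app created by scanning the QR code comes with all required permissions (message sending and receiving, card read/write, group chat events, etc.) and event subscriptions preconfigured, so no manual setup in the developer console is needed.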
</Note>
### 方式二:手动创建接入
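First create a custom app on the Feishu Open Platform and configure its permissions, then connect it through the Web console or the configuration file.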
**步骤一:创建应用**
1. 进入 [飞书开发平台](https://open.feishu.cn/app/),点击 **创建企业自建应用**
<img src="https://img-1317903499.cos.ap-guangzhou.myqcloud.com/docs/feishu-hosting-create-app.jpg" width="500"/>
### 2. 添加机器人能力
在 **添加应用能力** 菜单中,为应用添加 **机器人** 能力:
2. 在 **添加应用能力** 中,为应用添加 **机器人** 能力
<img src="https://img-1317903499.cos.ap-guangzhou.myqcloud.com/docs/feishu-hosting-add-bot.jpg" width="800"/>
### 3. 配置应用权限
点击 **权限管理**,复制以下权限配置,粘贴到 **权限配置** 下方的输入框内,全选筛选出来的权限,点击 **批量开通** 并确认:
3. 在 **权限管理** 中,将以下权限粘贴到输入框,全选并 **批量开通**
```
im:message,im:message.group_at_msg,im:message.group_at_msg:readonly,im:message.p2p_msg,im:message.p2p_msg:readonly,im:message:send_as_bot,im:resource
im:message,im:message.group_at_msg,im:message.group_at_msg:readonly,im:message.p2p_msg,im:message.p2p_msg:readonly,im:message:send_as_bot,im:resource,cardkit:card:write
```
<img src="https://cdn.link-ai.tech/doc/feishu-hosting-add-auth2.png" width="800"/>
## 二、项目配置
1. 在 **凭证与基础信息** 中获取 `App ID` 和 `App Secret`
4. 在 **凭证与基础信息** 中获取 `App ID` 和 `App Secret`
<img src="https://img-1317903499.cos.ap-guangzhou.myqcloud.com/docs/feishu-hosting-appid-secret.jpg" width="800"/>
2. 将以下配置加入项目根目录的 `config.json` 文件:
**步骤二:接入 CowAgent**
```json
{
"channel_type": "feishu",
"feishu_app_id": "YOUR_APP_ID",
"feishu_app_secret": "YOUR_APP_SECRET",
"feishu_bot_name": "YOUR_BOT_NAME"
}
```
<Tabs>
<Tab title="Web 控制台">
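Open the Web console, select the **通道** (Channels) menu, click **接入通道** (Connect Channel), choose **飞书** (Feishu), switch to the 「手动填写」 (manual entry) tab, enter the App ID and App Secret, and click connect.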
</Tab>
<Tab title="配置文件">
在 `config.json` 中添加以下配置后启动程序:
| 参数 | 说明 |
| --- | --- |
| `feishu_app_id` | 飞书机器人应用 App ID |
| `feishu_app_secret` | 飞书机器人 App Secret |
| `feishu_bot_name` | 飞书机器人名称(创建应用时设置),群聊中使用依赖此配置 |
```json
{
"channel_type": "feishu",
"feishu_app_id": "YOUR_APP_ID",
"feishu_app_secret": "YOUR_APP_SECRET",
"feishu_stream_reply": true
}
```
配置完成后启动项目。
| 参数 | 说明 | 默认值 |
| --- | --- | --- |
| `feishu_app_id` | 飞书应用 App ID | - |
| `feishu_app_secret` | 飞书应用 App Secret | - |
| `feishu_stream_reply` | 是否开启流式打字机回复 | `true` |
</Tab>
</Tabs>
## 三、配置事件订阅
**步骤三:发布应用**
1. 成功运行项目后,在飞书开放平台点击 **事件与回调**,选择 **长连接** 方式,点击保存:
1. 启动 Cow 项目后,在飞书开放平台点击 **事件与回调**,选择 **长连接** 模式并保存:
<img src="https://cdn.link-ai.tech/doc/202601311731183.png" width="600"/>
2. 点击下方的 **添加事件**,搜索 "接收消息",选择 "**接收消息v2.0**",确认添加
2. 点击 **添加事件**,搜索 "接收消息",选择 **接收消息 v2.0** 并确认
3. 点击 **版本管理与发布**,创建版本并申请 **线上发布**,在飞书客户端查看审批消息并审核通过:
3. 点击 **版本管理与发布**,创建版本并申请 **线上发布**,在飞书客户端审核通过:
<img src="https://cdn.link-ai.tech/doc/202601311807356.png" width="600"/>
完成后在飞书中搜索机器人名称,即可开始对话。
## 二、功能说明
| 功能 | 支持情况 |
| --- | --- |
| 单聊 | ✅ |
| 群聊(@机器人) | ✅ |
| 文本消息 | ✅ 收发 |
| 图片消息 | ✅ 收发 |
| 语音消息 | ✅ 收发 |
| 流式回复 | ✅(通过 `feishu_stream_reply` 配置控制,默认开启) |
<Note>
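Streaming replies require the bot to have the `cardkit:card:write` permission (granted by default with one-click creation) and the recipient's Feishu client to be version 7.20 or later. Older clients show an upgrade prompt; when the permission or version requirement is not met, replies automatically fall back to plain text.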
</Note>
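The fallback rule in the note can be expressed as a small decision function. This is a sketch under assumptions, not the channel's real code: how CowAgent actually detects the permission or the client version is not described here, and `choose_reply_mode` and its parameters are hypothetical.

```python
from typing import Tuple

MIN_STREAM_CLIENT_VERSION = (7, 20)  # minimum Feishu client version from the note above


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '7.20.3' into (7, 20, 3); non-numeric parts raise ValueError."""
    return tuple(int(part) for part in version.strip().split("."))


def choose_reply_mode(stream_enabled: bool, has_card_write: bool, client_version: str) -> str:
    """Return 'stream' only when every requirement is met, otherwise 'text'."""
    if not stream_enabled or not has_card_write:
        return "text"
    try:
        version = parse_version(client_version)
    except ValueError:
        return "text"  # unknown version: fall back to plain text (assumption)
    return "stream" if version[:2] >= MIN_STREAM_CLIENT_VERSION else "text"
```

Comparing parsed integer tuples avoids the string-comparison trap where `"7.9"` would sort after `"7.20"`.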
## 三、使用
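Once connected, search for the bot's name in Feishu to start a direct chat.
To use it in a group chat, add the bot to the group and @-mention it when sending a message.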


@@ -10,7 +10,9 @@ Web 控制台是 CowAgent 的默认通道,启动后会自动运行,通过浏
```json
{
"channel_type": "web",
"web_port": 9899
"web_port": 9899,
"web_password": "",
"enable_thinking": false
}
```
@@ -18,6 +20,11 @@ Web 控制台是 CowAgent 的默认通道,启动后会自动运行,通过浏
| --- | --- | --- |
| `channel_type` | 设为 `web` | `web` |
| `web_port` | Web 服务监听端口 | `9899` |
| `web_password` | 访问密码,留空表示不启用密码保护 | `""` |
| `web_session_expire_days` | 登录会话有效天数 | `30` |
| `enable_thinking` | 是否启用深度思考模式 | `false` |
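Once a password is configured, you must enter it to log in before using the console. The login session lasts 30 days by default and survives service restarts. The password can also be changed online on the console's 「配置」 (Configuration) page.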
## 访问地址
@@ -34,10 +41,20 @@ Web 控制台是 CowAgent 的默认通道,启动后会自动运行,通过浏
### 对话界面
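Supports streaming output and shows the Agent's thinking process (Reasoning) and tool-call process (Tool Calls) in real time, making the Agent's decision process easier to follow
Supports streaming output and shows the Agent's thinking process (Reasoning) and tool-call process (Tool Calls) in real time, making the Agent's decision process easier to follow. Deep thinking can be toggled in the configuration or with the switch under 「Agent 配置」 (Agent settings) in the console.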
<img width="850" src="https://cdn.link-ai.tech/doc/20260227180120.png" />
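#### Multi-session Management
The chat interface supports managing multiple sessions, and all session records are persisted in the database:
- **Session list**: click the history icon on the left to expand or collapse the session list panel; all past sessions load as you scroll
- **AI-generated titles**: after the first round of a new session, the model is called automatically to generate a short summary title
- **New session**: click the 「新对话」 (New chat) button at the top of the session list or the `+` button in the input area
- **Delete session**: click the delete button on a session item; after confirmation, the session and all its messages are permanently deleted
- **Clear context**: click the clear button in the input area to insert a divider into the current session; messages above the divider are still displayed but are no longer sent to the model as context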
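The clear-context behaviour (history stays visible, but only messages after the latest divider are sent to the model) can be sketched like this. The message shape and the `"divider"` marker are hypothetical; the console's real storage format may differ.

```python
from typing import Dict, List

Message = Dict[str, str]  # hypothetical shape: {"type": "message" | "divider", "content": "..."}


def model_context(messages: List[Message]) -> List[Message]:
    """Return only the messages that come after the most recent divider."""
    last_divider = -1
    for index, message in enumerate(messages):
        if message.get("type") == "divider":
            last_divider = index
    # Everything up to and including the divider stays on screen but is not sent.
    return messages[last_divider + 1:]
```

Because the divider is just another stored record, clearing context does not delete anything, unlike deleting the session.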
### 模型管理
支持在线管理模型配置,无需手动编辑配置文件:


@@ -9,7 +9,23 @@ description: 将 CowAgent 接入企业微信智能机器人(长连接模式)
智能机器人与企业微信自建应用是两种不同的接入方式。智能机器人使用 WebSocket 长连接,无需服务器公网 IP 和域名,配置更简单。
</Note>
## 一、创建智能机器人
## 一、接入方式
### 方式一:扫码一键接入(推荐)
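No bot needs to be created in advance. After starting the Cow project, open the Web console (local URL: http://127.0.0.1:9899/), select the **通道** (Channels) menu, click **接入通道** (Connect Channel), choose **企微智能机器人** (WeCom smart bot), switch to 「扫码接入」 (QR-code) mode, and scan with **WeCom** to create and connect the bot automatically.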
<img src="https://cdn.link-ai.tech/doc/20260401121213.png" width="800"/>
<Note>
扫码成功后,可在企业微信工作台 - **智能机器人**页面对机器人进行进一步配置,包括修改名称、头像、可见范围等。
</Note>
### 方式二:手动创建接入
First create a smart bot in WeCom and obtain its Bot ID and Secret, then connect it through the Web console or the configuration file.
**步骤一:创建智能机器人**
1. 打开企业微信客户端,进入工作台,点击**智能机器人**
@@ -25,34 +41,35 @@ description: 将 CowAgent 接入企业微信智能机器人(长连接模式)
4. 设置机器人名称、头像、可见范围,并选择**长连接模式**,记录下 **Bot ID** 和 **Secret** 信息后点击保存。
## 二、配置和运行
**步骤二:接入 CowAgent**
### 方式一Web 控制台接入
<Tabs>
<Tab title="Web 控制台">
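Open the Web console, select the **通道** (Channels) menu, click **接入通道** (Connect Channel), choose **企微智能机器人** (WeCom smart bot), switch to 「手动填写」 (manual entry) mode, enter the Bot ID and Secret, and click connect.
After starting the Cow project, open the Web console (local URL: http://127.0.0.1:9899/), select the **通道** (Channels) menu, click **接入通道** (Connect Channel), choose **企微智能机器人** (WeCom smart bot), fill in the Bot ID and Secret saved in the previous step, and click connect.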
<img src="https://cdn.link-ai.tech/doc/20260316181711.png" width="800"/>
</Tab>
<Tab title="配置文件">
在 `config.json` 中添加以下配置后启动程序:
<img src="https://cdn.link-ai.tech/doc/20260316181711.png" width="800"/>
```json
{
"channel_type": "wecom_bot",
"wecom_bot_id": "YOUR_BOT_ID",
"wecom_bot_secret": "YOUR_SECRET"
}
```
### 方式二:配置文件接入
| 参数 | 说明 |
| --- | --- |
| `wecom_bot_id` | 智能机器人的 BotID |
| `wecom_bot_secret` | 智能机器人的 Secret |
</Tab>
</Tabs>
在 `config.json` 中添加以下配置:
日志显示 `[WecomBot] Subscribe success` 即表示连接成功。
```json
{
"channel_type": "wecom_bot",
"wecom_bot_id": "YOUR_BOT_ID",
"wecom_bot_secret": "YOUR_SECRET"
}
```
| 参数 | 说明 |
| --- | --- |
| `wecom_bot_id` | 智能机器人的 BotID |
| `wecom_bot_secret` | 智能机器人的 Secret |
配置完成后启动程序,日志显示 `[WecomBot] Subscribe success` 即表示连接成功。
## 3. Features
## 2. Features
| 功能 | 支持情况 |
| --- | --- |
@@ -64,7 +81,7 @@ description: 将 CowAgent 接入企业微信智能机器人(长连接模式)
| 流式回复 | ✅ |
| 定时任务主动推送 | ✅ |
## 4. Usage
## 3. Usage
Search for the bot's name in WeCom to start a direct chat.

74
docs/channels/weixin.mdx Normal file

@@ -0,0 +1,74 @@
---
title: 微信
description: 将 CowAgent 接入个人微信(基于官方接口)
---
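> Connect a personal WeChat account by scanning a QR code to log in. Private-chat sending and receiving of text, image, voice, file, and video messages is supported. The integration uses the official WeChat API, so it carries no account-safety risk; after connecting, a bot assistant is added to your chats without affecting normal use of the account.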
## 一、配置和运行
### 方式一Web 控制台接入
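After starting the Cow project, open the Web console (local URL: http://127.0.0.1:9899/), select the **通道** (Channels) menu, click **接入通道** (Connect Channel), choose **微信** (WeChat), click connect, and follow the prompts to scan the QR code and log in.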
<img src="https://cdn.link-ai.tech/doc/20260322195114.png" width="800" />
### 方式二:配置文件接入
在 `config.json` 中设置 `channel_type` 为 `weixin`
```json
{
"channel_type": "weixin"
}
```
启动程序后,终端会显示二维码,使用微信扫码授权即可完成登录。
<img src="https://cdn.link-ai.tech/doc/20260322195509.png" width="800" />
<Note>
1. Backward compatibility: setting `channel_type` to `wx` also enables the WeChat channel.
2. The WeChat client must be updated to version 8.0.69 or later.
</Note>
## 二、使用说明
微信扫码并进行授权确认后,即可完成接入并开始对话。接入微信后会在对话中创建出一个机器人助理,不会对已有账号的正常使用有任何影响。
> 你可以通过搜索"微信ClawBot"随时找到这个机器人,还可以修改这个机器人的头像、备注等信息,将机器人置顶在消息列表等。
<img src="https://cdn.link-ai.tech/doc/83ae8251d896219fde4803f4205205be.jpg" width="250" />
## 三、登录说明
### 扫码登录
首次启动时,终端会显示一个二维码(有效期约 2 分钟)。使用微信扫描二维码并在手机上确认后即可完成登录。
- 二维码过期后会自动刷新并重新显示
- `requirements.txt` 中已默认包含 `qrcode` 依赖,安装后可在终端直接渲染二维码图案
### 凭证保存
登录成功后,凭证会自动保存至 `~/.weixin_cow_credentials.json`,下次启动时无需重新扫码。
如需重新登录,删除该凭证文件后重启程序即可。
### Session 过期
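When the WeChat session expires (errcode -14), the program automatically clears the old credentials and starts a new QR-code login; no manual action is needed.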
## 四、功能说明
| 功能 | 支持情况 |
| --- | --- |
| 单聊 | ✅ |
| 文本消息 | ✅ 收发 |
| 图片消息 | ✅ 收发 |
| 文件消息 | ✅ 收发 |
| 视频消息 | ✅ 收发 |
| 语音消息 | ✅ 接收 (自带语音识别) |

116
docs/cli/general.mdx Normal file

@@ -0,0 +1,116 @@
---
title: 常用命令
description: 查看状态、管理配置和上下文等常用命令
---
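The following commands can be used in a conversation with the `/` prefix, and in the terminal with the `cow` prefix (some commands are available only in conversations).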
<Tip>
Typing `/` in the Web console automatically opens command suggestions, with Up/Down arrow selection and Tab completion.
</Tip>
## help
显示所有可用命令的帮助信息。
```text
/help
```
## status
查看当前会话和服务的运行状态,包括进程信息、模型配置、会话消息数量和已加载技能数量。
```text
/status
```
输出示例:
```
🐮 CowAgent Status
Process: PID 12345 | Running 2h 15m
Version: 2.0.4
Channel: web
Model: MiniMax-M2.5
Mode: agent
Session: 12 messages | 8 skills loaded
```
## config
查看或修改运行时配置。修改后立即生效,无需重启服务。
**查看所有可配置项:**
```text
/config
```
**查看单个配置项:**
```text
/config model
```
**修改配置项:**
```text
/config model deepseek-v4-flash
```
**支持修改的配置项:**
| 配置项 | 说明 | 示例值 |
| --- | --- | --- |
| `model` | AI 模型名称 | `deepseek-v4-flash` |
| `agent_max_context_tokens` | 最大上下文 tokens | `40000` |
| `agent_max_context_turns` | 最大上下文记忆轮次 | `30` |
| `agent_max_steps` | 单次任务最大决策步数 | `15` |
| `enable_thinking` | 是否启用深度思考模式 | `true` / `false` |
<Note>
修改 `model` 时,系统会自动匹配对应的模型调用方式。配置会写入 `config.json` 并持久保存。
</Note>
## context
查看当前会话的上下文信息,包括消息数量、内容长度等统计。
```text
/context
```
**清空当前会话上下文:**
```text
/context clear
```
<Tip>
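After the context is cleared, the Agent "forgets" the earlier conversation. This is useful when switching topics or freeing up context space.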
</Tip>
## logs
查看最近的服务日志,默认显示最近 20 行,最多 50 行。
```text
/logs
```
**指定行数:**
```text
/logs 50
```
## version
显示当前 CowAgent 版本号。
```text
/version
```

96
docs/cli/index.mdx Normal file

@@ -0,0 +1,96 @@
---
title: 命令总览
description: CowAgent 命令系统 — 终端 CLI 和对话命令
---
CowAgent 提供两种命令交互方式:
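- **Terminal CLI**: run `cow <command>` in the system terminal for operations such as service management and skill management
- **Chat commands**: type `/<command>` or `cow <command>` in a conversation to check status, manage skills, adjust configuration, and so on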
## 终端命令
通过一键安装脚本部署后,`cow` 命令会自动可用。手动安装的用户需要在项目根目录下额外执行:
```bash
pip install -e .
```
安装后即可在任意位置使用 `cow` 命令:
```bash
cow help
```
输出示例:
```
CowAgent CLI
Usage: cow <command>
Service:
start Start the CowAgent service
stop Stop the CowAgent service
restart Restart the CowAgent service
update Update code and restart service
status Show service status
logs View service logs
Skills:
skill Manage skills (list / search / install / uninstall ...)
Memory & Knowledge:
memory Memory distillation (dream)
knowledge View knowledge base stats and structure
Others:
help Show this help message
version Show version
```
## 对话命令
在 Web 控制台或任意接入渠道的对话中,支持输入以 `/` 开头的命令:
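| Command | Description |
| --- | --- |
| `/help` | Show command help |
| `/status` | View service status and configuration |
| `/config` | View or modify runtime configuration |
| `/skill` | Manage skills (install, uninstall, enable, disable, etc.) |
| `/memory dream [N]` | Manually trigger memory distillation (default 3 days, maximum 30) |
| `/knowledge` | View knowledge base statistics |
| `/knowledge list` | View the knowledge base directory structure |
| `/knowledge on\|off` | Turn the knowledge base on or off |
| `/context` | View context information for the current session |
| `/context clear` | Clear the current session's context |
| `/logs` | View recent logs |
| `/version` | Show the version number |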
<Tip>
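Service-management commands such as `/start`, `/stop`, and `/restart` prompt you to run them in the terminal when used in a conversation, because they involve process operations.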
</Tip>
## 命令对照表
以下是各命令在终端和对话中的可用性:
| 命令 | 终端 (`cow`) | 对话 (`/`) |
| --- | :---: | :---: |
| help | ✓ | ✓ |
| version | ✓ | ✓ |
| status | ✓ | ✓ |
| logs | ✓ | ✓ |
| config | ✗ | ✓ |
| context | — | ✓ |
| memory (子命令) | ✗ | ✓ |
| knowledge (子命令) | ✓ | ✓ |
| skill (子命令) | ✓ | ✓ |
| start / stop / restart | ✓ | ✗ |
| update | ✓ | ✗ |
| install-browser | ✓ | ✗ |
<Note>
In the terminal, `context` only prompts you to use it in a conversation. `config` can be modified only in a conversation.
</Note>
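The availability rules above can be sketched as a small parser that accepts both the `/` and `cow` prefixes in a conversation and flags terminal-only commands. The function names and return values are hypothetical; only the prefixes and the terminal-only command names come from the table above.

```python
from typing import List, Optional, Tuple

# Commands marked as unavailable in conversations in the table above.
TERMINAL_ONLY = {"start", "stop", "restart", "update", "install-browser"}


def parse_chat_command(text: str) -> Optional[Tuple[str, List[str]]]:
    """Return (command, args) if the text is a command, otherwise None."""
    stripped = text.strip()
    if stripped.startswith("/"):
        body = stripped[1:]
    elif stripped.startswith("cow "):
        body = stripped[len("cow "):]
    else:
        return None  # ordinary chat message
    parts = body.split()
    if not parts:
        return None
    return parts[0].lower(), parts[1:]


def route(text: str) -> str:
    """Classify input as 'chat', 'command', or 'terminal-only'."""
    parsed = parse_chat_command(text)
    if parsed is None:
        return "chat"
    name, _args = parsed
    return "terminal-only" if name in TERMINAL_ONLY else "command"
```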


@@ -0,0 +1,77 @@
---
title: 记忆与知识库
description: 记忆蒸馏和知识库管理命令
---
## memory
管理 Agent 的长期记忆系统。
### memory dream
Manually trigger memory distillation (Deep Dream): organize recent daily memories, distill and merge them into MEMORY.md, and generate a dream diary.
```text
/memory dream [N]
```
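- `N`: organize the last N days of memory; default 3 days, maximum 30 days
- Distillation runs asynchronously in the background, and the result is reported in the conversation when it finishes
- It does not need to wait for Agent initialization and can be used before the first conversation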
**示例:**
```text
/memory dream # 整理近 3 天
/memory dream 7 # 整理近 7 天
/memory dream 30 # 整理近 30 天(全量)
```
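The default and maximum for `N` can be illustrated with a small argument parser. This is a sketch, not the actual command handler: `parse_dream_days` is hypothetical, and clamping values above 30 (rather than rejecting them) is an assumption, since the text only states the maximum.

```python
DEFAULT_DAYS = 3   # default lookback stated above
MAX_DAYS = 30      # maximum lookback stated above


def parse_dream_days(command: str) -> int:
    """Extract the lookback window in days from a '/memory dream [N]' command."""
    # Ignore a trailing '# comment', as used in the examples above.
    parts = command.split("#", 1)[0].split()
    if parts[:2] != ["/memory", "dream"]:
        raise ValueError("not a /memory dream command")
    if len(parts) == 2:
        return DEFAULT_DAYS
    days = int(parts[2])  # raises ValueError for non-numeric input
    if days < 1:
        raise ValueError("N must be at least 1")
    # Assumption: values above the maximum are clamped rather than rejected.
    return min(days, MAX_DAYS)
```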
When distillation finishes, the Web console receives a notification with links that open the updated MEMORY.md and the dream diary directly.
<Tip>
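The system automatically runs distillation once a day at 23:55 (lookback of 1 day). Manual triggering is useful for organizing history after a first deployment, or when memory needs to be updated immediately.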
</Tip>
## knowledge
查看和管理个人知识库。默认显示知识库统计信息。
```text
/knowledge
```
输出示例:
```
📚 知识库
- 状态:已开启
- 页面数12
- 总大小45.2 KB
- 分类明细:
- concepts/: 5 篇
- entities/: 4 篇
- sources/: 3 篇
```
### knowledge list
查看知识库目录树结构。
```text
/knowledge list
```
### knowledge on / off
开启或关闭知识库。关闭后不再注入知识提示词和索引知识文件。
```text
/knowledge on
/knowledge off
```
<Note>
In the terminal CLI, `cow knowledge` and `cow knowledge list` are available, but `on|off` is supported only in conversations (it must take effect immediately).
</Note>

134
docs/cli/process.mdx Normal file

@@ -0,0 +1,134 @@
---
title: 进程管理
description: 使用 cow 命令管理 CowAgent 进程的启动、停止、重启、更新等操作
---
进程管理命令用于控制 CowAgent 后台进程的生命周期。这些命令仅在终端中可用。
## start
启动 CowAgent 服务。默认以后台进程方式运行,并自动跟踪日志输出。
```bash
cow start
```
**选项:**
| 选项 | 说明 |
| --- | --- |
| `-f`, `--foreground` | 前台运行,不以后台守护进程方式启动 |
| `--no-logs` | 启动后不自动跟踪日志 |
## stop
停止正在运行的 CowAgent 服务。
```bash
cow stop
```
## restart
重启 CowAgent 服务(先停止再启动)。
```bash
cow restart
```
**选项:**
| 选项 | 说明 |
| --- | --- |
| `--no-logs` | 重启后不自动跟踪日志 |
## update
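Update the code and restart the service. The following steps run automatically:
1. Pull the latest code (`git pull`)
2. Stop the current service
3. Update Python dependencies
4. Reinstall the CLI
5. Start the service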
```bash
cow update
```
<Warning>
If `git pull` fails (for example, because of uncommitted local changes), the update is aborted and the service is not affected.
</Warning>
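The ordering guarantee in the warning (a failed pull leaves the running service untouched) follows from running the pull before the stop step and halting at the first failure. The sketch below shows that control flow with placeholder callables; `run_update` and the step names are hypothetical and no real commands are executed.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]  # (step name, action returning True on success)


def run_update(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first failure.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            break  # later steps, such as stopping the service, never run
        completed.append(name)
    return completed
```

With the pull as the first step, a failure there returns an empty list and the stop step is never reached.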
## status
查看 CowAgent 服务运行状态,包括进程信息、版本号、当前配置的模型和通道。
```bash
cow status
```
输出示例:
```
🐮 CowAgent Status
Status: ● Running (PID: 12345)
Version: 2.0.4
Channel: web
Model: MiniMax-M2.5
Mode: agent
```
## logs
查看服务日志。
```bash
cow logs
```
**选项:**
| 选项 | 说明 | 默认值 |
| --- | --- | --- |
| `-f`, `--follow` | 持续跟踪日志输出 | 否 |
| `-n`, `--lines` | 显示最近 N 行 | 50 |
示例:
```bash
# 查看最近 100 行日志
cow logs -n 100
# 持续跟踪日志
cow logs -f
```
## install-browser
安装 Playwright 和 Chromium 浏览器,用于启用 [浏览器工具](/tools/browser)。
```bash
cow install-browser
```
<Tip>
仅在需要使用浏览器工具(如网页浏览、截图等)时才需要安装。
</Tip>
## run.sh 兼容
如果未安装 Cow CLI也可以使用 `run.sh` 脚本管理服务:
| cow 命令 | run.sh 等效命令 |
| --- | --- |
| `cow start` | `./run.sh start` |
| `cow stop` | `./run.sh stop` |
| `cow restart` | `./run.sh restart` |
| `cow update` | `./run.sh update` |
| `cow status` | `./run.sh status` |
| `cow logs` | `./run.sh logs` |
<Note>
推荐使用 `cow` 命令,它提供更简洁的语法和更丰富的功能。通过一键安装脚本部署时 `cow` 命令会自动安装。
</Note>

218
docs/cli/skill.mdx Normal file

@@ -0,0 +1,218 @@
---
title: 技能管理
description: 通过命令安装、卸载、启用、禁用和管理技能
---
技能管理命令用于安装、查询和管理 CowAgent 的技能。在对话中使用 `/skill <子命令>`,在终端中使用 `cow skill <子命令>`。
## list
列出已安装的技能及其状态。
<CodeGroup>
```text 对话
/skill list
```
```bash 终端
cow skill list
```
</CodeGroup>
输出示例:
```
📦 已安装的技能 (3/4)
✅ pptx
Use this skill any time a .pptx file is involved…
来源: cowhub
✅ skill-creator
Create, install, or update skills…
来源: builtin
⏸️ image-vision (已禁用)
图片理解和视觉分析
来源: builtin
```
**浏览技能广场**(查看 Hub 上所有可安装的技能):
<CodeGroup>
```text 对话
/skill list --remote
```
```bash 终端
cow skill list --remote
```
</CodeGroup>
**选项:**
| 选项 | 说明 | 默认值 |
| --- | --- | --- |
| `--remote`, `-r` | 浏览 Skill Hub 远程技能列表 | 否 |
| `--page` | 远程列表分页页码 | 1 |
## search
在技能广场中搜索技能。
<CodeGroup>
```text 对话
/skill search pptx
```
```bash 终端
cow skill search pptx
```
</CodeGroup>
## install
Install skills. A single `install` command installs skills from **Cow Skill Hub, GitHub, ClawHub**, or any URL (zip archive or SKILL.md link), with no manual download or configuration.
**从 Cow 技能广场安装(推荐):**
<CodeGroup>
```text 对话
/skill install pptx
```
```bash 终端
cow skill install pptx
```
</CodeGroup>
**从 GitHub 安装:**
<CodeGroup>
```text 对话
# 安装仓库中的所有技能(自动扫描包含 SKILL.md 的子目录)
/skill install larksuite/cli
# 指定子目录,只安装单个技能
/skill install https://github.com/larksuite/cli/tree/main/skills/lark-im
# 使用 # 指定子目录
/skill install larksuite/cli#skills/lark-minutes
```
```bash 终端
# 安装仓库中的所有技能(自动扫描包含 SKILL.md 的子目录)
cow skill install larksuite/cli
# 指定子目录,只安装单个技能
cow skill install https://github.com/larksuite/cli/tree/main/skills/lark-im
# 使用 # 指定子目录
cow skill install larksuite/cli#skills/lark-minutes
```
</CodeGroup>
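Both full GitHub URLs and the `owner/repo` shorthand are supported. For a mono-repo (one repository containing multiple skills), all skills are discovered and installed in bulk when no subdirectory is given; when a subdirectory is given, only the skills in that directory are installed.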
**从 ClawHub 安装:**
<CodeGroup>
```text 对话
/skill install clawhub:baidu-search
```
```bash 终端
cow skill install clawhub:baidu-search
```
</CodeGroup>
**从 URL 安装:**
<CodeGroup>
```text 对话
# 从 zip 压缩包安装(支持单个或批量)
/skill install https://cdn.link-ai.tech/skills/pptx.zip
# 从 SKILL.md 链接安装
/skill install https://example.com/path/to/SKILL.md
```
```bash 终端
# 从 zip 压缩包安装(支持单个或批量)
cow skill install https://cdn.link-ai.tech/skills/pptx.zip
# 从 SKILL.md 链接安装
cow skill install https://example.com/path/to/SKILL.md
```
</CodeGroup>
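Installing from a zip / tar.gz archive URL is supported; after extraction, directories containing `SKILL.md` are scanned automatically, and single or bulk installation is supported. Installing directly from a `SKILL.md` file link is also supported, in which case the skill name and description are parsed automatically.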
安装成功后会显示技能名称、描述和来源,例如:
```
✅ baidu-search
百度搜索:使用百度搜索引擎检索信息…
来源: clawhub
```
## uninstall
卸载已安装的技能。
<CodeGroup>
```text 对话
/skill uninstall pptx
```
```bash 终端
cow skill uninstall pptx
```
</CodeGroup>
<Warning>
卸载操作会删除技能目录下的所有文件,此操作不可恢复。
</Warning>
## enable / disable
启用或禁用技能,禁用后技能不会被 Agent 调用。
<CodeGroup>
```text 对话
/skill enable pptx
/skill disable pptx
```
```bash 终端
cow skill enable pptx
cow skill disable pptx
```
</CodeGroup>
## info
查看已安装技能的详细信息,包括 `SKILL.md` 内容预览。
<CodeGroup>
```text 对话
/skill info pptx
```
```bash 终端
cow skill info pptx
```
</CodeGroup>
## 技能来源
安装的技能会记录来源信息,可通过 `/skill list` 查看:
| 来源标识 | 说明 |
| --- | --- |
| `builtin` | 项目内置技能 |
| `cowhub` | 从 CowAgent Skill Hub 安装 |
| `github` | 从 GitHub URL 直接安装 |
| `clawhub` | 从 ClawHub 安装 |
| `url` | 从 SKILL.md URL 安装 |
| `local` | 本地创建的技能 |
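To make the mapping between install arguments and source identifiers concrete, here is a rough classifier for the argument forms shown in the `install` examples. It is a sketch under assumptions and not the CLI's real resolver: `classify_skill_source` is hypothetical, the `owner/repo` pattern is a guess at what counts as shorthand, and treating archive URLs as `url` is an assumption because the table only mentions SKILL.md URLs.

```python
import re
from urllib.parse import urlparse

# Assumed pattern for the `owner/repo` shorthand, optionally followed by `#subdir`.
GITHUB_SHORTHAND = re.compile(r"[\w.-]+/[\w.-]+(#.+)?")


def classify_skill_source(spec: str) -> str:
    """Guess the source identifier for an install argument."""
    spec = spec.strip()
    if spec.startswith("clawhub:"):
        return "clawhub"
    if spec.startswith(("http://", "https://")):
        host = urlparse(spec).netloc.lower()
        if host in ("github.com", "www.github.com"):
            return "github"
        return "url"  # assumption: archive and SKILL.md links share this label
    if GITHUB_SHORTHAND.fullmatch(spec):
        return "github"
    return "cowhub"  # a bare name is looked up in the Cow Skill Hub
```

The `builtin` and `local` identifiers are not produced here because they describe skills that were never installed through `install`.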


@@ -24,13 +24,13 @@
},
{
"label": "GitHub",
"href": "https://github.com/zhayujie/chatgpt-on-wechat"
"href": "https://github.com/zhayujie/CowAgent"
}
]
},
"footer": {
"socials": {
"github": "https://github.com/zhayujie/chatgpt-on-wechat"
"github": "https://github.com/zhayujie/CowAgent"
}
},
"navigation": {
@@ -59,7 +59,8 @@
"group": "安装部署",
"pages": [
"guide/quick-start",
"guide/manual-install"
"guide/manual-install",
"guide/upgrade"
]
}
]
@@ -71,16 +72,19 @@
"group": "模型配置",
"pages": [
"models/index",
"models/deepseek",
"models/minimax",
"models/glm",
"models/qwen",
"models/kimi",
"models/doubao",
"models/claude",
"models/gemini",
"models/openai",
"models/deepseek",
"models/linkai"
"models/glm",
"models/qwen",
"models/doubao",
"models/kimi",
"models/qianfan",
"models/linkai",
"models/coding-plan",
"models/custom"
]
}
]
@@ -104,14 +108,17 @@
"tools/bash",
"tools/send",
"tools/memory",
"tools/env-config"
"tools/env-config",
"tools/web-fetch",
"tools/scheduler"
]
},
{
"group": "可选工具",
"pages": [
"tools/web-search",
"tools/scheduler"
"tools/vision",
"tools/browser"
]
}
]
@@ -123,15 +130,17 @@
"group": "技能系统",
"pages": [
"skills/index",
"skills/skill-creator"
"skills/install",
"skills/create",
"skills/hub"
]
},
{
"group": "内置技能",
"pages": [
"skills/image-vision",
"skills/linkai-agent",
"skills/web-fetch"
"skills/skill-creator",
"skills/knowledge-wiki",
"skills/image-generation"
]
}
]
@@ -142,7 +151,20 @@
{
"group": "记忆系统",
"pages": [
"memory"
"memory/index",
"memory/context",
"memory/deep-dream"
]
}
]
},
{
"tab": "知识",
"groups": [
{
"group": "知识库",
"pages": [
"knowledge/index"
]
}
]
@@ -153,6 +175,7 @@
{
"group": "接入渠道",
"pages": [
"channels/weixin",
"channels/web",
"channels/feishu",
"channels/dingtalk",
@@ -164,6 +187,21 @@
}
]
},
{
"tab": "命令",
"groups": [
{
"group": "命令系统",
"pages": [
"cli/index",
"cli/process",
"cli/skill",
"cli/memory-knowledge",
"cli/general"
]
}
]
},
{
"tab": "版本",
"groups": [
@@ -171,6 +209,12 @@
"group": "发布记录",
"pages": [
"releases/overview",
"releases/v2.0.8",
"releases/v2.0.7",
"releases/v2.0.6",
"releases/v2.0.5",
"releases/v2.0.4",
"releases/v2.0.3",
"releases/v2.0.2",
"releases/v2.0.1",
"releases/v2.0.0"
@@ -215,16 +259,19 @@
"group": "Model Configuration",
"pages": [
"en/models/index",
"en/models/deepseek",
"en/models/minimax",
"en/models/glm",
"en/models/qwen",
"en/models/kimi",
"en/models/doubao",
"en/models/claude",
"en/models/gemini",
"en/models/openai",
"en/models/deepseek",
"en/models/linkai"
"en/models/glm",
"en/models/qwen",
"en/models/doubao",
"en/models/kimi",
"en/models/qianfan",
"en/models/linkai",
"en/models/coding-plan",
"en/models/custom"
]
}
]
@@ -248,14 +295,17 @@
"en/tools/bash",
"en/tools/send",
"en/tools/memory",
"en/tools/env-config"
"en/tools/env-config",
"en/tools/web-fetch",
"en/tools/scheduler"
]
},
{
"group": "Optional Tools",
"pages": [
"en/tools/web-search",
"en/tools/scheduler"
"en/tools/vision",
"en/tools/browser"
]
}
]
@@ -267,15 +317,16 @@
"group": "Skills System",
"pages": [
"en/skills/index",
"en/skills/skill-creator"
"en/skills/install",
"en/skills/hub"
]
},
{
"group": "Built-in Skills",
"pages": [
"en/skills/image-vision",
"en/skills/linkai-agent",
"en/skills/web-fetch"
"en/skills/skill-creator",
"en/skills/knowledge-wiki",
"en/skills/image-generation"
]
}
]
@@ -286,7 +337,20 @@
{
"group": "Memory System",
"pages": [
"en/memory"
"en/memory/index",
"en/memory/context",
"en/memory/deep-dream"
]
}
]
},
{
"tab": "Knowledge",
"groups": [
{
"group": "Knowledge Base",
"pages": [
"en/knowledge/index"
]
}
]
@@ -297,6 +361,7 @@
{
"group": "Platforms",
"pages": [
"en/channels/weixin",
"en/channels/web",
"en/channels/feishu",
"en/channels/dingtalk",
@@ -308,6 +373,21 @@
}
]
},
{
"tab": "CLI",
"groups": [
{
"group": "Command System",
"pages": [
"en/cli/index",
"en/cli/process",
"en/cli/skill",
"en/cli/memory-knowledge",
"en/cli/chat"
]
}
]
},
{
"tab": "Releases",
"groups": [
@@ -315,6 +395,12 @@
"group": "Release Notes",
"pages": [
"en/releases/overview",
"en/releases/v2.0.8",
"en/releases/v2.0.7",
"en/releases/v2.0.6",
"en/releases/v2.0.5",
"en/releases/v2.0.4",
"en/releases/v2.0.3",
"en/releases/v2.0.2",
"en/releases/v2.0.1",
"en/releases/v2.0.0"
@@ -323,6 +409,194 @@
]
}
]
},
{
"language": "ja",
"tabs": [
{
"tab": "紹介",
"groups": [
{
"group": "概要",
"pages": [
"ja/intro/index",
"ja/intro/architecture",
"ja/intro/features"
]
}
]
},
{
"tab": "クイックスタート",
"groups": [
{
"group": "インストール",
"pages": [
"ja/guide/quick-start",
"ja/guide/manual-install",
"ja/guide/upgrade"
]
}
]
},
{
"tab": "モデル",
"groups": [
{
"group": "モデル設定",
"pages": [
"ja/models/index",
"ja/models/deepseek",
"ja/models/minimax",
"ja/models/claude",
"ja/models/gemini",
"ja/models/openai",
"ja/models/glm",
"ja/models/qwen",
"ja/models/doubao",
"ja/models/kimi",
"ja/models/qianfan",
"ja/models/linkai",
"ja/models/coding-plan",
"ja/models/custom"
]
}
]
},
{
"tab": "ツール",
"groups": [
{
"group": "ツールシステム",
"pages": [
"ja/tools/index"
]
},
{
"group": "内蔵ツール",
"pages": [
"ja/tools/read",
"ja/tools/write",
"ja/tools/edit",
"ja/tools/ls",
"ja/tools/bash",
"ja/tools/send",
"ja/tools/memory",
"ja/tools/env-config",
"ja/tools/web-fetch",
"ja/tools/scheduler"
]
},
{
"group": "オプションツール",
"pages": [
"ja/tools/web-search",
"ja/tools/vision",
"ja/tools/browser"
]
}
]
},
{
"tab": "スキル",
"groups": [
{
"group": "スキルシステム",
"pages": [
"ja/skills/index",
"ja/skills/install",
"ja/skills/create",
"ja/skills/hub"
]
},
{
"group": "内蔵スキル",
"pages": [
"ja/skills/skill-creator",
"ja/skills/knowledge-wiki",
"ja/skills/image-generation"
]
}
]
},
{
"tab": "メモリ",
"groups": [
{
"group": "メモリシステム",
"pages": [
"ja/memory/index",
"ja/memory/context",
"ja/memory/deep-dream"
]
}
]
},
{
"tab": "ナレッジ",
"groups": [
{
"group": "ナレッジベース",
"pages": [
"ja/knowledge/index"
]
}
]
},
{
"tab": "チャネル",
"groups": [
{
"group": "プラットフォーム",
"pages": [
"ja/channels/weixin",
"ja/channels/web",
"ja/channels/feishu",
"ja/channels/dingtalk",
"ja/channels/wecom-bot",
"ja/channels/qq",
"ja/channels/wecom",
"ja/channels/wechatmp"
]
}
]
},
{
"tab": "CLI",
"groups": [
{
"group": "コマンドシステム",
"pages": [
"ja/cli/index",
"ja/cli/process",
"ja/cli/skill",
"ja/cli/memory-knowledge",
"ja/cli/general"
]
}
]
},
{
"tab": "リリース",
"groups": [
{
"group": "リリースノート",
"pages": [
"ja/releases/overview",
"ja/releases/v2.0.8",
"ja/releases/v2.0.7",
"ja/releases/v2.0.6",
"ja/releases/v2.0.5",
"ja/releases/v2.0.4",
"ja/releases/v2.0.3",
"ja/releases/v2.0.2",
"ja/releases/v2.0.1",
"ja/releases/v2.0.0"
]
}
]
}
]
}
]
}


@@ -1,31 +1,35 @@
<p align="center"><img src="https://github.com/user-attachments/assets/eca9a9ec-8534-4615-9e0f-96c5ac1d10a3" alt="CowAgent" width="550" /></p>
<p align="center">
<a href="https://github.com/zhayujie/chatgpt-on-wechat/releases/latest"><img src="https://img.shields.io/github/v/release/zhayujie/chatgpt-on-wechat" alt="Latest release"></a>
<a href="https://github.com/zhayujie/chatgpt-on-wechat/blob/master/LICENSE"><img src="https://img.shields.io/github/license/zhayujie/chatgpt-on-wechat" alt="License: MIT"></a>
<a href="https://github.com/zhayujie/chatgpt-on-wechat"><img src="https://img.shields.io/github/stars/zhayujie/chatgpt-on-wechat?style=flat-square" alt="Stars"></a> <br/>
[<a href="https://github.com/zhayujie/chatgpt-on-wechat/blob/master/README.md">中文</a>] | [English]
<a href="https://github.com/zhayujie/CowAgent/releases/latest"><img src="https://img.shields.io/github/v/release/zhayujie/CowAgent" alt="Latest release"></a>
<a href="https://github.com/zhayujie/CowAgent/blob/master/LICENSE"><img src="https://img.shields.io/github/license/zhayujie/CowAgent" alt="License: MIT"></a>
<a href="https://github.com/zhayujie/CowAgent"><img src="https://img.shields.io/github/stars/zhayujie/CowAgent?style=flat-square" alt="Stars"></a> <br/>
[<a href="https://github.com/zhayujie/CowAgent/blob/master/README.md">中文</a>] | [English] | [<a href="https://github.com/zhayujie/CowAgent/blob/master/docs/ja/README.md">日本語</a>]
</p>
**CowAgent** is an AI super assistant powered by LLMs, capable of autonomous task planning, operating computers and external resources, creating and executing Skills, and continuously growing with long-term memory. It supports flexible model switching, handles text, voice, images, and files, and can be integrated into Web, Feishu, DingTalk, WeCom Bot, WeCom App, and WeChat Official Account, running 24/7 on your personal computer or server.
**CowAgent** is an AI super assistant powered by LLMs, capable of autonomous task planning, operating computers and external resources, creating and executing Skills, and continuously growing with long-term memory and a personal knowledge base. It supports flexible model switching, handles text, voice, images, and files, and can be integrated into WeChat, Web, Feishu, DingTalk, WeCom Bot, WeCom App, and WeChat Official Account, running 24/7 on your personal computer or server.
<p align="center">
<a href="https://cowagent.ai/">🌐 Website</a> &nbsp;·&nbsp;
<a href="https://docs.cowagent.ai/en/intro/index">📖 Docs</a> &nbsp;·&nbsp;
<a href="https://docs.cowagent.ai/en/guide/quick-start">🚀 Quick Start</a>
<a href="https://docs.cowagent.ai/en/guide/quick-start">🚀 Quick Start</a> &nbsp;·&nbsp;
<a href="https://skills.cowagent.ai/">🧩 Skill Hub</a> &nbsp;·&nbsp;
<a href="https://link-ai.tech/cowagent/create">☁️ Try Online</a>
</p>
## Introduction
> CowAgent is both an out-of-the-box AI super assistant and a highly extensible Agent framework. You can extend it with new model interfaces, channels, built-in tools, and the Skills system to flexibly implement various customization needs.
- **Autonomous Task Planning**: Understands complex tasks and autonomously plans execution, continuously thinking and invoking tools until goals are achieved. Supports accessing files, terminal, browser, schedulers, and other system resources via tools.
- **Long-term Memory**: Automatically persists conversation memory to local files and databases, including core memory and daily memory, with keyword and vector retrieval support.
- **Skills System**: Implements a Skills creation and execution engine with multiple built-in skills, and supports custom Skills development through natural language conversation.
- **Autonomous Task Planning**: Understands complex tasks and autonomously plans execution, continuously thinking and invoking tools until goals are achieved.
- **Long-term Memory**: Automatically persists conversation memory to local files and databases, including core memory, daily memory, and Deep Dream distillation, with keyword and vector retrieval support.
- **Personal Knowledge Base**: Automatically organizes structured knowledge with cross-references to build a knowledge graph, with web-based visualization and conversational management.
- **Skills System**: Implements a Skills creation and execution engine, supports installing skills from [Skill Hub](https://skills.cowagent.ai), GitHub, etc., or creating custom Skills through conversation.
- **Tool System**: Built-in tools for file I/O, terminal execution, browser automation, scheduled tasks, messaging, and more, autonomously invoked by the Agent.
- **CLI System**: Provides terminal commands and in-chat commands for process management, skill installation, configuration, and more.
- **Multimodal Messages**: Supports parsing, processing, generating, and sending text, images, voice, files, and other message types.
- **Multiple Model Support**: Supports OpenAI, Claude, Gemini, DeepSeek, MiniMax, GLM, Qwen, Kimi, Doubao, and other mainstream model providers.
- **Multi-platform Deployment**: Runs on local computers or servers, integrable into Web, Feishu, DingTalk, WeChat Official Account, and WeCom applications.
- **Knowledge Base**: Integrates enterprise knowledge base capabilities via the [LinkAI](https://link-ai.tech) platform.
- **Multiple Model Support**: Supports DeepSeek, MiniMax, Claude, Gemini, OpenAI, GLM, Qwen, Doubao, Kimi, and other mainstream model providers.
- **Multi-platform Deployment**: Runs on local computers or servers, integrable into WeChat, Web, Feishu, DingTalk, WeChat Official Account, and WeCom applications.
## Disclaimer
@@ -33,19 +37,27 @@
2. Agent mode consumes more tokens than normal chat mode. Choose models based on effectiveness and cost. Agent has access to the host OS — please deploy in trusted environments.
3. CowAgent focuses on open-source development and does not participate in, authorize, or issue any cryptocurrency.
## Demo
Try online (no deployment needed): [CowAgent](https://link-ai.tech/cowagent/create)
## Changelog
> **2026.02.27:** [v2.0.2](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.2) — Web console overhaul (streaming chat, model/skill/memory/channel/scheduler/log management), multi-channel concurrent running, session persistence, new models including Gemini 3.1 Pro / Claude 4.6 Sonnet / Qwen3.5 Plus.
> **2026.04.14:** [v2.0.6](https://github.com/zhayujie/CowAgent/releases/tag/2.0.6) — Knowledge Base, Deep Dream Memory Distillation, Smart Context Compression, Web Console upgrades.
> **2026.02.13:** [v2.0.1](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.1) — Built-in Web Search tool, smart context trimming, runtime info dynamic update, Windows compatibility, fixes for scheduler memory loss, Feishu connection issues, and more.
> **2026.04.01:** [v2.0.5](https://github.com/zhayujie/CowAgent/releases/tag/2.0.5) — Cow CLI, Skill Hub open source, Browser tool, WeCom Bot QR scan, and more.
> **2026.02.03:** [v2.0.0](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.0) — Full upgrade to AI super assistant with multi-step task planning, long-term memory, built-in tools, Skills framework, new models, and optimized channels.
> **2026.02.27:** [v2.0.2](https://github.com/zhayujie/CowAgent/releases/tag/2.0.2) — Web console overhaul (streaming chat, model/skill/memory/channel/scheduler/log management), multi-channel concurrent running, session persistence, new models including Gemini 3.1 Pro / Claude 4.6 Sonnet / Qwen3.5 Plus.
> **2025.05.23:** [v1.7.6](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/1.7.6) — Web channel optimization, AgentMesh multi-agent plugin, Baidu TTS, claude-4-sonnet/opus support.
> **2026.02.13:** [v2.0.1](https://github.com/zhayujie/CowAgent/releases/tag/2.0.1) — Built-in Web Search tool, smart context trimming, runtime info dynamic update, Windows compatibility, fixes for scheduler memory loss, Feishu connection issues, and more.
> **2025.04.11:** [v1.7.5](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/1.7.5) — wechatferry protocol, DeepSeek model, Tencent Cloud voice, ModelScope and Gitee-AI support.
> **2026.02.03:** [v2.0.0](https://github.com/zhayujie/CowAgent/releases/tag/2.0.0) — Full upgrade to AI super assistant with multi-step task planning, long-term memory, built-in tools, Skills framework, new models, and optimized channels.
> **2024.12.13:** [v1.7.4](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/1.7.4) — Gemini 2.0 model, Web channel, memory leak fix.
> **2025.05.23:** [v1.7.6](https://github.com/zhayujie/CowAgent/releases/tag/1.7.6) — Web channel optimization, AgentMesh multi-agent plugin, Baidu TTS, claude-4-sonnet/opus support.
> **2025.04.11:** [v1.7.5](https://github.com/zhayujie/CowAgent/releases/tag/1.7.5) — wechatferry protocol, DeepSeek model, Tencent Cloud voice, ModelScope and Gitee-AI support.
> **2024.12.13:** [v1.7.4](https://github.com/zhayujie/CowAgent/releases/tag/1.7.4) — Gemini 2.0 model, Web channel, memory leak fix.
Full changelog: [Release Notes](https://docs.cowagent.ai/en/releases/overview)
@@ -55,21 +67,27 @@ Full changelog: [Release Notes](https://docs.cowagent.ai/en/releases/overview)
The project provides a one-click script for installation, configuration, startup, and management:
**Linux / macOS:**
```bash
bash <(curl -sS https://cdn.link-ai.tech/code/cow/run.sh)
bash <(curl -fsSL https://cdn.link-ai.tech/code/cow/run.sh)
```
**Windows (PowerShell):**
```powershell
irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex
```
After running, the Web service starts by default. Access `http://localhost:9899/chat` to chat.
Script usage: [One-click Install](https://docs.cowagent.ai/en/guide/quick-start)
Script usage: [One-click Install](https://docs.cowagent.ai/en/guide/quick-start). After installation, you can also use `cow start`, `cow stop`, and other [CLI commands](https://docs.cowagent.ai/en/cli/index) to manage the service.
### Manual Installation
**1. Clone the project**
```bash
git clone https://github.com/zhayujie/chatgpt-on-wechat
cd chatgpt-on-wechat/
git clone https://github.com/zhayujie/CowAgent
cd CowAgent/
```
**2. Install dependencies**
@@ -79,7 +97,25 @@ pip3 install -r requirements.txt
pip3 install -r requirements-optional.txt # optional but recommended
```
**3. Configure**
**3. Install Cow CLI (recommended)**
```bash
pip3 install -e .
```
After installation, use `cow` commands to manage the service (start, stop, update, etc.) and skills. See [Command Docs](https://docs.cowagent.ai/en/cli/index).
**4. Install browser (optional)**
If you need the Agent to operate a browser (visit web pages, fill forms, etc.):
```bash
cow install-browser
```
This auto-installs `playwright` and Chromium. See [Browser Tool Docs](https://docs.cowagent.ai/en/tools/browser).
**5. Configure**
```bash
cp config-template.json config.json
@@ -87,13 +123,25 @@ cp config-template.json config.json
Fill in your model API key and channel type in `config.json`. See the [configuration docs](https://docs.cowagent.ai/en/guide/manual-install) for details.
**4. Run**
**6. Run**
```bash
python3 app.py
cow start # recommended, requires Cow CLI
python3 app.py # or run directly
```
For server background run:
For server deployment, use `cow` commands to manage the service:
```bash
cow start # start in background
cow stop # stop service
cow restart # restart service
cow status # check running status
cow logs # view logs
cow update # pull latest code and restart
```
Or use the traditional way:
```bash
nohup python3 app.py & tail -f nohup.out
@@ -102,7 +150,7 @@ nohup python3 app.py & tail -f nohup.out
### Docker Deployment
```bash
wget https://cdn.link-ai.tech/code/cow/docker-compose.yml
curl -O https://cdn.link-ai.tech/code/cow/docker-compose.yml
# Edit docker-compose.yml with your config
sudo docker compose up -d
sudo docker logs -f chatgpt-on-wechat
@@ -116,18 +164,40 @@ Supports mainstream model providers. Recommended models for Agent mode:
| Provider | Recommended Model |
| --- | --- |
| MiniMax | `MiniMax-M2.5` |
| GLM | `glm-5` |
| Kimi | `kimi-k2.5` |
| Doubao | `doubao-seed-2-0-code-preview-260215` |
| Qwen | `qwen3.5-plus` |
| DeepSeek | `deepseek-v4-flash` |
| MiniMax | `MiniMax-M2.7` |
| Claude | `claude-sonnet-4-6` |
| Gemini | `gemini-3.1-pro-preview` |
| OpenAI | `gpt-5.4` |
| DeepSeek | `deepseek-chat` |
| GLM | `glm-5.1` |
| Qwen | `qwen3.6-plus` |
| Doubao | `doubao-seed-2-0-code-preview-260215` |
| Kimi | `kimi-k2.6` |
For detailed configuration of each model, see the [Models documentation](https://docs.cowagent.ai/en/models/index).
### Coding Plan
Coding Plan is a monthly subscription package offered by various providers, ideal for high-frequency Agent usage. All providers can be accessed via OpenAI-compatible mode:
```json
{
"bot_type": "openai",
"model": "MODEL_NAME",
"open_ai_api_base": "PROVIDER_CODING_PLAN_API_BASE",
"open_ai_api_key": "YOUR_API_KEY"
}
```
- `bot_type`: Must be `openai`
- `model`: Model name supported by the provider
- `open_ai_api_base`: Provider's Coding Plan API Base (different from standard pay-as-you-go)
- `open_ai_api_key`: Provider's Coding Plan API Key
> Note: Coding Plan API Base and API Key are usually separate from standard pay-as-you-go ones. Please obtain them from each provider's platform.
Supported providers include Alibaba Cloud, MiniMax, Zhipu GLM, Kimi, Volcengine, and more. For detailed configuration of each provider, see the [Coding Plan documentation](https://docs.cowagent.ai/en/models/coding-plan).
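
As a rough sketch, the Coding Plan settings above could be generated and sanity-checked with a few lines of Python. The helper name `build_coding_plan_config` and the example base URL are hypothetical; only the four keys and the `openai` value come from the snippet above.

```python
import json

REQUIRED_KEYS = ("bot_type", "model", "open_ai_api_base", "open_ai_api_key")


def build_coding_plan_config(model: str, api_base: str, api_key: str) -> dict:
    """Return the Coding Plan fields for config.json (hypothetical helper)."""
    config = {
        "bot_type": "openai",  # Coding Plan access goes through OpenAI-compatible mode
        "model": model,
        "open_ai_api_base": api_base,
        "open_ai_api_key": api_key,
    }
    # Reject empty values early instead of failing on the first request.
    missing = [key for key in REQUIRED_KEYS if not config[key]]
    if missing:
        raise ValueError(f"missing Coding Plan settings: {', '.join(missing)}")
    return config


if __name__ == "__main__":
    # Placeholder values; use the ones issued by your provider's Coding Plan.
    example = build_coding_plan_config(
        "MODEL_NAME", "https://example.invalid/coding-plan/v1", "YOUR_API_KEY"
    )
    print(json.dumps(example, indent=2))
```

The printed JSON can be merged into `config.json` by hand; the sketch deliberately does not write the file itself.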
<br/>
## Channels
@@ -136,6 +206,7 @@ Supports multiple platforms. Set `channel_type` in `config.json` to switch:
| Channel | `channel_type` | Docs |
| --- | --- | --- |
| WeChat | `weixin` | [WeChat Setup](https://docs.cowagent.ai/en/channels/weixin) |
| Web (default) | `web` | [Web Channel](https://docs.cowagent.ai/en/channels/web) |
| Feishu | `feishu` | [Feishu Setup](https://docs.cowagent.ai/en/channels/feishu) |
| DingTalk | `dingtalk` | [DingTalk Setup](https://docs.cowagent.ai/en/channels/dingtalk) |
@@ -158,21 +229,22 @@ Multiple channels can be enabled simultaneously, separated by commas: `"channel_
## 🔗 Related Projects
- [Cow Skill Hub](https://github.com/zhayujie/cow-skill-hub): Open skill marketplace for AI Agents — browse, search, install, and publish skills for CowAgent, OpenClaw, Claude Code, and more.
- [bot-on-anything](https://github.com/zhayujie/bot-on-anything): Lightweight and highly extensible LLM application framework supporting Slack, Telegram, Discord, Gmail, and more.
- [AgentMesh](https://github.com/MinimalFuture/AgentMesh): Open-source Multi-Agent framework for complex problem solving through agent team collaboration.
## 🔎 FAQ
FAQs: <https://github.com/zhayujie/chatgpt-on-wechat/wiki/FAQs>
FAQs: <https://github.com/zhayujie/CowAgent/wiki/FAQs>
## 🛠️ Contributing
Welcome to add new channels, referring to the [Feishu channel](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/channel/feishu/feishu_channel.py) as an example. Also welcome to contribute new Skills, referring to the [Skill Creator docs](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/skills/skill-creator/SKILL.md).
Contributions of new channels are welcome; see the [Feishu channel](https://github.com/zhayujie/CowAgent/blob/master/channel/feishu/feishu_channel.py) for an example. New Skills are welcome too: see the [Skill Creation docs](https://docs.cowagent.ai/en/skills/create), or submit them to [Skill Hub](https://skills.cowagent.ai/submit).
## ✉ Contact
Welcome to submit PRs and Issues, and support the project with a 🌟 Star. For questions, check the [FAQ list](https://github.com/zhayujie/chatgpt-on-wechat/wiki/FAQs) or search [Issues](https://github.com/zhayujie/chatgpt-on-wechat/issues).
PRs and Issues are welcome, and you can support the project with a 🌟 Star. For questions, check the [FAQ list](https://github.com/zhayujie/CowAgent/wiki/FAQs) or search [Issues](https://github.com/zhayujie/CowAgent/issues).
## 🌟 Contributors
![cow contributors](https://contrib.rocks/image?repo=zhayujie/chatgpt-on-wechat&max=1000)
![cow contributors](https://contrib.rocks/image?repo=zhayujie/CowAgent&max=1000)


@@ -1,69 +1,107 @@
---
title: Feishu (Lark)
description: Integrate CowAgent into Feishu application
description: Integrate CowAgent into Feishu via a custom enterprise app
---
Integrate CowAgent into Feishu by creating a custom enterprise app. You need to be a Feishu enterprise user with admin privileges.
> Integrate CowAgent into Feishu via a custom enterprise app. It supports p2p chat and group chat (@bot), connects over a WebSocket long connection (no public IP needed), and supports streaming typewriter replies and voice messages.
## 1. Create Enterprise Custom App
<Note>
You need to be a Feishu enterprise user with admin privileges.
</Note>
### 1.1 Create App
## 1. Setup
Go to [Feishu Developer Platform](https://open.feishu.cn/app/), click **Create Enterprise Custom App**, fill in the required information and click **Create**:
### Option 1: One-click Scan to Create (Recommended)
No need to manually create an app on the Feishu Developer Platform. Start the Cow project, open the web console (default `http://127.0.0.1:9899/`), go to **Channels**, click **Add Channel**, choose **Feishu**, then under the **Scan QR** tab click **One-click Create Feishu App** and scan with the **Feishu App** to complete app creation and connection automatically.
<Note>
The created app comes with all required permissions (messaging, card read/write, group events, etc.) and event subscriptions pre-configured. Currently only the Feishu mainland version is supported (Lark international not yet supported).
</Note>
When starting from CLI without `feishu_app_id` configured, the QR code is also printed to the terminal.
### Option 2: Manual Setup
Manually create a custom app on the Feishu Developer Platform, then connect via Web Console or config file.
**Step 1: Create the App**
1. Go to [Feishu Developer Platform](https://open.feishu.cn/app/), click **Create Enterprise Custom App**:
<img src="https://img-1317903499.cos.ap-guangzhou.myqcloud.com/docs/feishu-hosting-create-app.jpg" width="500"/>
### 1.2 Add Bot Capability
In **Add App Capabilities**, add **Bot** capability to the app:
2. In **Add App Capabilities**, add the **Bot** capability:
<img src="https://img-1317903499.cos.ap-guangzhou.myqcloud.com/docs/feishu-hosting-add-bot.jpg" width="800"/>
### 1.3 Configure App Permissions
Click **Permission Management**, paste the following permission string into the input box below **Permission Configuration**, select all filtered permissions, click **Batch Enable** and confirm:
3. In **Permission Management**, paste the following permissions and **Batch Enable** all:
```
im:message,im:message.group_at_msg,im:message.group_at_msg:readonly,im:message.p2p_msg,im:message.p2p_msg:readonly,im:message:send_as_bot,im:resource
im:message,im:message.group_at_msg,im:message.group_at_msg:readonly,im:message.p2p_msg,im:message.p2p_msg:readonly,im:message:send_as_bot,im:resource,cardkit:card:write
```
<img src="https://cdn.link-ai.tech/doc/feishu-hosting-add-auth2.png" width="800"/>
## 2. Project Configuration
1. Get `App ID` and `App Secret` from **Credentials & Basic Info**:
4. Get `App ID` and `App Secret` from **Credentials & Basic Info**:
<img src="https://img-1317903499.cos.ap-guangzhou.myqcloud.com/docs/feishu-hosting-appid-secret.jpg" width="800"/>
2. Add the following configuration to `config.json` in the project root:
**Step 2: Connect to CowAgent**
```json
{
"channel_type": "feishu",
"feishu_app_id": "YOUR_APP_ID",
"feishu_app_secret": "YOUR_APP_SECRET",
"feishu_bot_name": "YOUR_BOT_NAME"
}
```
<Tabs>
<Tab title="Web Console">
Open the web console, go to **Channels**, click **Add Channel**, choose **Feishu**, switch to the **Manual** tab, enter App ID and App Secret, then click connect.
</Tab>
<Tab title="Config File">
Add the following to `config.json` and start the program:
| Parameter | Description |
| --- | --- |
| `feishu_app_id` | Feishu bot App ID |
| `feishu_app_secret` | Feishu bot App Secret |
| `feishu_bot_name` | Bot name (set when creating the app), required for group chat usage |
```json
{
"channel_type": "feishu",
"feishu_app_id": "YOUR_APP_ID",
"feishu_app_secret": "YOUR_APP_SECRET",
"feishu_stream_reply": true
}
```
Start the project after configuration is complete.
| Parameter | Description | Default |
| --- | --- | --- |
| `feishu_app_id` | Feishu app App ID | - |
| `feishu_app_secret` | Feishu app App Secret | - |
| `feishu_stream_reply` | Enable streaming typewriter reply | `true` |
</Tab>
</Tabs>
## 3. Configure Event Subscription
**Step 3: Publish the App**
1. After the project is running successfully, go to the Feishu Developer Platform, click **Events & Callbacks**, select **Long Connection** mode, and click save:
1. After Cow is running, go to **Events & Callbacks** in the Feishu Developer Platform, choose **Long Connection** mode and save:
<img src="https://cdn.link-ai.tech/doc/202601311731183.png" width="600"/>
2. Click **Add Event** below, search for "Receive Message", select "**Receive Message v2.0**", and confirm.
2. Click **Add Event**, search for "Receive Message" and choose **Receive Message v2.0**.
3. Click **Version Management & Release**, create a new version and apply for **Production Release**. Check the approval message in the Feishu client and approve:
3. Click **Version Management & Release**, create a version and apply for **Production Release**. Approve the request in the Feishu client:
<img src="https://cdn.link-ai.tech/doc/202601311807356.png" width="600"/>
Once completed, search for the bot name in Feishu to start chatting.
## 2. Features
| Feature | Status |
| --- | --- |
| P2P chat | ✅ |
| Group chat (@bot) | ✅ |
| Text messages | ✅ send/receive |
| Image messages | ✅ send/receive |
| Voice messages | ✅ send/receive |
| Streaming reply | ✅ (powered by Feishu cardkit streaming card) |
<Note>
Streaming reply requires the `cardkit:card:write` permission (already enabled by one-click creation) and Feishu client version ≥ 7.20. Older clients see an upgrade prompt; if the permission or version is not satisfied, replies fall back to plain text automatically.
</Note>
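
The fallback rule in the note can be read as a simple two-condition check. The sketch below is illustrative only: the function names are hypothetical, and how CowAgent actually learns the permission state or the client version is not described here.

```python
MIN_STREAMING_CLIENT = (7, 20)  # minimum Feishu client version from the note above


def parse_version(version: str) -> tuple:
    """Turn a dotted version such as '7.20.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def should_stream(has_cardkit_write: bool, client_version: str) -> bool:
    """Stream only when both the permission and the client version allow it."""
    return has_cardkit_write and parse_version(client_version) >= MIN_STREAMING_CLIENT
```

Comparing integer tuples avoids the classic string-comparison trap where `"7.9"` would sort after `"7.20"`.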
## 3. Usage
After connection, search for the bot name in Feishu to start a chat.
To use in groups, add the bot to a group and @-mention it.


@@ -38,6 +38,16 @@ Supports streaming output with real-time display of the Agent's reasoning proces
<img width="850" src="https://cdn.link-ai.tech/doc/20260227180120.png" />
#### Multi-Session Management
The chat interface supports multi-session management. All session records are persistently stored in a SQLite database:
- **Session List**: Click the history icon on the left to expand/collapse the session list panel, with scroll-to-load support for all historical sessions
- **AI-Generated Titles**: After the first exchange in a new session, the model is automatically called to generate a short summary title
- **New Session**: Click the "New Chat" button at the top of the session list or the `+` button in the input area to create a new session
- **Delete Session**: Click the delete button on a session item and confirm to permanently delete the session and all its messages
- **Clear Context**: Click the clear button in the input area to insert a divider in the current session. Messages above the divider are still displayed but no longer included as context for the model
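
One way to picture the Clear Context behaviour is as a filter over the stored message list: everything stays in the session, but only messages after the latest divider are sent to the model. This is a minimal sketch under that assumption; the `divider` role name and the function are hypothetical, not the project's actual schema.

```python
DIVIDER = {"role": "divider"}  # hypothetical marker stored when context is cleared


def context_for_model(messages: list) -> list:
    """Return only the messages after the most recent divider."""
    # Walk backwards so the latest divider wins when there are several.
    for index in range(len(messages) - 1, -1, -1):
        if messages[index].get("role") == "divider":
            return messages[index + 1:]
    return list(messages)  # no divider: the whole history is context
```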
### Model Management
Manage model configurations online without manually editing config files:


@@ -0,0 +1,72 @@
---
title: WeChat
description: Connect CowAgent to personal WeChat
---
> Connect CowAgent to your personal WeChat. Simply scan a QR code to log in — no public IP required. Supports text, image, voice, file, and video messages.
## 1. Configuration
### Option A: Web Console
Start the program and open the Web console (local access: http://127.0.0.1:9899). Go to the **Channels** tab, click **Connect Channel**, select **WeChat**, and follow the prompts to scan the QR code.
### Option B: Config File
Set `channel_type` to `weixin` in your `config.json`:
```json
{
"channel_type": "weixin"
}
```
After starting the program, a QR code will be displayed in the terminal. Scan it with WeChat and confirm on your phone to complete login.
<Note>
For backward compatibility, setting `channel_type` to `wx` also activates the WeChat channel.
</Note>
## 2. Parameters
| Parameter | Description | Default |
| --- | --- | --- |
| `channel_type` | Set to `weixin` or `wx` | — |
Login credentials are automatically saved to `~/.weixin_cow_credentials.json`. To force a re-login, delete this file and restart.
## 3. Login
### QR Code Login
On first startup, a QR code is displayed in the terminal (valid for approximately 2 minutes). Scan it with WeChat and confirm on your phone.
- The QR code automatically refreshes when it expires
- The `qrcode` dependency is already included in `requirements.txt`, enabling QR code rendering directly in the terminal
### Credential Persistence
After successful login, credentials are saved to `~/.weixin_cow_credentials.json`. Subsequent startups will reuse the saved credentials without requiring a new scan.
To force a re-login, delete the credentials file and restart the program.
### Session Expiry
When the WeChat session expires (errcode -14), the program automatically clears old credentials and initiates a new QR login — no manual intervention required.
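
The credential lifecycle described above (reuse if present, clear on expiry) can be sketched as follows. Only the file path and the `-14` errcode come from this page; the function names and the JSON layout are assumptions for illustration.

```python
import json
from pathlib import Path

# Path taken from the docs above; everything else here is illustrative.
DEFAULT_CREDENTIALS = Path.home() / ".weixin_cow_credentials.json"
SESSION_EXPIRED = -14  # errcode the docs associate with an expired session


def load_credentials(path: Path = DEFAULT_CREDENTIALS):
    """Return saved credentials, or None when a QR login is needed."""
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def handle_errcode(errcode: int, path: Path = DEFAULT_CREDENTIALS) -> bool:
    """Delete stale credentials on session expiry; return True if re-login is needed."""
    if errcode != SESSION_EXPIRED:
        return False
    path.unlink(missing_ok=True)  # requires Python 3.8+
    return True
```

Deleting the file by hand has the same effect as the expiry branch: the next startup finds no credentials and falls back to a QR login.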
## 4. Supported Features
| Feature | Status |
| --- | --- |
| Direct Messages | ✅ |
| Text Messages | ✅ Send & Receive |
| Image Messages | ✅ Send & Receive |
| File Messages | ✅ Send & Receive |
| Video Messages | ✅ Send & Receive |
| Voice Messages | ✅ Receive |
## 5. Notes
1. Ensure network access to `ilinkai.weixin.qq.com`.
2. Media files (images, files, videos) are transferred via CDN with AES-128-ECB encryption, handled automatically by the program.
3. A stable network connection is recommended to avoid frequent disconnections that would require re-scanning.

docs/en/cli/general.mdx

@@ -0,0 +1,102 @@
---
title: General Commands
description: View status, manage config, and control context with commonly used commands
---
The following commands can be used in chat with the `/` prefix or in the terminal with the `cow` prefix (some are chat-only).
<Tip>
In the Web console, typing `/` brings up an autocomplete menu with keyboard navigation and Tab completion.
</Tip>
## help
Show help information for all available commands.
```text
/help
```
## status
View current session and service status, including process info, model configuration, message count, and loaded skills.
```text
/status
```
## config
View or modify runtime configuration. Changes take effect immediately without restarting.
**View all configurable items:**
```text
/config
```
**View a single item:**
```text
/config model
```
**Modify a config item:**
```text
/config model deepseek-v4-flash
```
**Configurable items:**
| Item | Description | Example |
| --- | --- | --- |
| `model` | AI model name | `deepseek-v4-flash` |
| `agent_max_context_tokens` | Max context tokens | `40000` |
| `agent_max_context_turns` | Max context memory turns | `30` |
| `agent_max_steps` | Max decision steps per task | `15` |
| `enable_thinking` | Enable deep thinking mode | `true` / `false` |
<Note>
When changing `model`, the system automatically matches the corresponding model API. Configuration is persisted to `config.json`.
</Note>
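
The three forms of `/config` shown above differ only in how many arguments follow the command. A minimal parsing sketch, with a hypothetical function name and return shape:

```python
def parse_config_command(text: str):
    """Split a chat /config command into (action, item, value)."""
    parts = text.strip().split(maxsplit=2)
    if not parts or parts[0] != "/config":
        raise ValueError("not a /config command")
    if len(parts) == 1:
        return ("list", None, None)       # /config
    if len(parts) == 2:
        return ("get", parts[1], None)    # /config model
    return ("set", parts[1], parts[2])    # /config model deepseek-v4-flash
```

Using `maxsplit=2` keeps any spaces inside the value intact, which is a design choice of this sketch rather than documented behaviour.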
## context
View current session context statistics, including message count and content length.
```text
/context
```
**Clear current session context:**
```text
/context clear
```
<Tip>
Clearing context makes the Agent "forget" previous conversation, useful for switching topics or freeing context space.
</Tip>
## logs
View recent service logs. Shows the last 20 lines by default; you can request up to 50.
```text
/logs
```
**Specify line count:**
```text
/logs 50
```
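
The default and the upper bound can be expressed as a small clamp. This is a sketch assuming the limits stated above; the function name and the handling of non-numeric or non-positive input are assumptions.

```python
DEFAULT_LINES = 20  # default from the docs above
MAX_LINES = 50      # upper bound from the docs above


def resolve_log_lines(arg=None) -> int:
    """Clamp the requested line count into the range 1..MAX_LINES."""
    if arg is None:
        return DEFAULT_LINES
    try:
        requested = int(arg)
    except ValueError:
        return DEFAULT_LINES  # assumption: fall back on non-numeric input
    return max(1, min(requested, MAX_LINES))
```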
## version
Show the current CowAgent version.
```text
/version
```

docs/en/cli/index.mdx

@@ -0,0 +1,94 @@
---
title: Commands Overview
description: CowAgent command system — Terminal CLI and chat commands
---
CowAgent provides two ways to interact via commands:
- **Terminal CLI** — Run `cow <command>` in your system terminal for service management, skill management, and other operations
- **Chat Commands** — Type `/<command>` or `cow <command>` in any conversation to check status, manage skills, adjust configuration, etc.
## Cow CLI
After deploying with the one-click install script, the `cow` command is automatically available. For manual installations, run:
```bash
pip install -e .
```
Then use the `cow` command from anywhere:
```bash
cow help
```
Example output:
```
🐮 CowAgent CLI
Usage: cow <command>
Service:
start Start the CowAgent service
stop Stop the CowAgent service
restart Restart the CowAgent service
update Update code and restart service
status Show service status
logs View service logs
Skills:
skill Manage skills (list / search / install / uninstall ...)
Memory & Knowledge:
memory Memory distillation (dream)
knowledge View knowledge base stats and structure
Others:
help Show this help message
version Show version
```
## Chat Commands
In the Web console or any connected channel, type `/` to see command suggestions. Supported commands:
| Command | Description |
| --- | --- |
| `/help` | Show command help |
| `/status` | View service status and configuration |
| `/config` | View or modify runtime configuration |
| `/skill` | Manage skills (install, uninstall, enable, disable, etc.) |
| `/memory dream [N]` | Manually trigger memory distillation (default 3 days, max 30) |
| `/knowledge` | View knowledge base statistics |
| `/knowledge list` | View knowledge base directory structure |
| `/knowledge on\|off` | Enable or disable knowledge base |
| `/context` | View current session context info |
| `/context clear` | Clear current session context |
| `/logs` | View recent logs |
| `/version` | Show version number |
<Tip>
Service management commands like `/start`, `/stop`, `/restart` will prompt you to use them in the terminal instead, as they involve process operations.
</Tip>
## Command Availability
| Command | Terminal (`cow`) | Chat (`/`) |
| --- | :---: | :---: |
| help | ✓ | ✓ |
| version | ✓ | ✓ |
| status | ✓ | ✓ |
| logs | ✓ | ✓ |
| config | ✗ | ✓ |
| context | — | ✓ |
| memory (subcommands) | ✗ | ✓ |
| knowledge (subcommands) | ✓ | ✓ |
| skill (subcommands) | ✓ | ✓ |
| start / stop / restart | ✓ | ✗ |
| update | ✓ | ✗ |
| install-browser | ✓ | ✗ |
<Note>
In the terminal, `context` only prints a hint to use it in chat. `config` is available only in chat.
</Note>
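
The availability table can be condensed into two sets. The sketch below mirrors the table as written; the function names are hypothetical, and names that do not appear in the table are not rejected.

```python
TERMINAL_ONLY = {"start", "stop", "restart", "update", "install-browser"}
CHAT_ONLY = {"config", "context", "memory"}


def normalize(text: str) -> str:
    """Strip a leading '/' or 'cow ' prefix and return the command name."""
    text = text.strip()
    if text.startswith("/"):
        text = text[1:]
    elif text.startswith("cow "):
        text = text[4:]
    parts = text.split()
    return parts[0] if parts else ""


def available(command: str, surface: str) -> bool:
    """surface is 'terminal' or 'chat'; mirrors the availability table."""
    if surface == "terminal":
        return command not in CHAT_ONLY
    if surface == "chat":
        return command not in TERMINAL_ONLY
    raise ValueError(f"unknown surface: {surface}")
```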


@@ -0,0 +1,63 @@
---
title: Memory & Knowledge
description: Memory distillation and knowledge base management commands
---
## memory
Manage the Agent's long-term memory system.
### memory dream
Manually trigger memory distillation (Deep Dream) — consolidate recent daily memories into MEMORY.md and generate a dream diary.
```text
/memory dream [N]
```
- `N`: Consolidate the last N days of memory (default 3, max 30)
- Runs asynchronously in the background; you'll be notified in chat when complete
- Works without Agent initialization — can be used before the first conversation
**Examples:**
```text
/memory dream # Consolidate last 3 days
/memory dream 7 # Consolidate last 7 days
/memory dream 30 # Consolidate last 30 days (full)
```
On the Web console, the completion notification includes clickable links to view the updated MEMORY.md and dream diary.
<Tip>
The system automatically runs distillation daily at 23:55, looking back 1 day. A manual trigger is useful for consolidating historical memories after the first deployment, or when you need an immediate memory update.
</Tip>
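
To reason about when the next automatic run will happen, the daily 23:55 schedule can be computed directly. This is only a sketch: it uses naive local datetimes and a hypothetical function name, and says nothing about how CowAgent schedules the job internally.

```python
from datetime import datetime, time, timedelta

DISTILL_AT = time(23, 55)  # daily run time stated in the tip above


def next_distillation(now: datetime) -> datetime:
    """Return the next 23:55 at or after `now` (naive local time)."""
    candidate = datetime.combine(now.date(), DISTILL_AT)
    if candidate < now:
        candidate += timedelta(days=1)  # today's slot has passed
    return candidate
```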
## knowledge
View and manage the personal knowledge base. Shows statistics by default.
```text
/knowledge
```
### knowledge list
View the knowledge base directory tree.
```text
/knowledge list
```
### knowledge on / off
Enable or disable the knowledge base. When disabled, knowledge prompts and file indexing are not injected.
```text
/knowledge on
/knowledge off
```
<Note>
In the terminal CLI, `cow knowledge` and `cow knowledge list` are available, but `on|off` is supported only in chat, because the switch has to take effect in the running service.
</Note>

docs/en/cli/process.mdx

@@ -0,0 +1,123 @@
---
title: Process Management
description: Manage CowAgent process lifecycle with cow commands
---
Process management commands control the CowAgent background process. These commands are only available in the terminal.
## start
Start the CowAgent service. Runs as a background daemon by default and automatically tails logs.
```bash
cow start
```
**Options:**
| Option | Description |
| --- | --- |
| `-f`, `--foreground` | Run in foreground, not as a background daemon |
| `--no-logs` | Don't tail logs after starting |
## stop
Stop the running CowAgent service.
```bash
cow stop
```
## restart
Restart the CowAgent service (stop then start).
```bash
cow restart
```
**Options:**
| Option | Description |
| --- | --- |
| `--no-logs` | Don't tail logs after restart |
## update
Update code and restart the service. Automatically performs:
1. Pull latest code (`git pull`)
2. Stop current service
3. Update Python dependencies
4. Reinstall CLI
5. Start service
```bash
cow update
```
<Warning>
If `git pull` fails (e.g., uncommitted local changes), the update aborts and the service remains unaffected.
</Warning>
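
The ordering of the steps explains the warning: because the pull happens before the stop, a failed pull leaves the running service untouched. The sketch below models that with an injectable runner. Only `git pull` is named on this page; the other commands are assumptions borrowed from the install instructions, and this is not the actual `cow update` implementation.

```python
import subprocess

# Only `git pull` is named in the docs; the remaining commands are assumptions.
UPDATE_STEPS = [
    ("pull latest code", ["git", "pull"]),
    ("stop service", ["cow", "stop"]),
    ("update dependencies", ["pip3", "install", "-r", "requirements.txt"]),
    ("reinstall CLI", ["pip3", "install", "-e", "."]),
    ("start service", ["cow", "start"]),
]


def default_runner(command) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(command).returncode


def run_update(runner=default_runner) -> list:
    """Run the steps in order and stop at the first failure.

    Returns the names of the steps that completed.
    """
    completed = []
    for name, command in UPDATE_STEPS:
        if runner(command) != 0:
            break  # abort: later steps (including stop) never run
        completed.append(name)
    return completed
```

Passing a fake `runner` makes it possible to check the abort behaviour without touching a real checkout.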
## status
Check CowAgent service status, including process info, version, and current model/channel configuration.
```bash
cow status
```
## logs
View service logs.
```bash
cow logs
```
**Options:**
| Option | Description | Default |
| --- | --- | --- |
| `-f`, `--follow` | Continuously tail log output | No |
| `-n`, `--lines` | Show last N lines | 50 |
Examples:
```bash
# View last 100 lines
cow logs -n 100
# Continuously tail logs
cow logs -f
```
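
For readers who want to wrap or script these options, the two flags map naturally onto a standard `argparse` parser. This is a sketch of the option shape only; the real CLI may be built with a different library, and `build_logs_parser` is a hypothetical name.

```python
import argparse


def build_logs_parser() -> argparse.ArgumentParser:
    """Parser mirroring the `cow logs` options table above."""
    parser = argparse.ArgumentParser(prog="cow logs")
    parser.add_argument("-f", "--follow", action="store_true",
                        help="continuously tail log output")
    parser.add_argument("-n", "--lines", type=int, default=50,
                        help="show last N lines")
    return parser
```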
## install-browser
Install Playwright and Chromium browser for the [browser tool](/en/tools/browser).
```bash
cow install-browser
```
<Tip>
Only needed when using browser tools (web browsing, screenshots, etc.).
</Tip>
## run.sh Compatibility
If Cow CLI is not installed, you can use `run.sh` to manage the service:
| cow command | run.sh equivalent |
| --- | --- |
| `cow start` | `./run.sh start` |
| `cow stop` | `./run.sh stop` |
| `cow restart` | `./run.sh restart` |
| `cow update` | `./run.sh update` |
| `cow status` | `./run.sh status` |
| `cow logs` | `./run.sh logs` |
<Note>
The `cow` command is recommended — it provides cleaner syntax and richer features. It is automatically installed via the one-click install script.
</Note>

docs/en/cli/skill.mdx

@@ -0,0 +1,192 @@
---
title: Skill Management
description: Install, uninstall, enable, disable, and manage skills via commands
---
Skill management commands are used to install, query, and manage CowAgent skills. Use `/skill <subcommand>` in chat or `cow skill <subcommand>` in the terminal.
## list
List installed skills and their status.
<CodeGroup>
```text Chat
/skill list
```
```bash Terminal
cow skill list
```
</CodeGroup>
**Browse the Skill Hub** (view all available skills):
<CodeGroup>
```text Chat
/skill list --remote
```
```bash Terminal
cow skill list --remote
```
</CodeGroup>
**Options:**
| Option | Description | Default |
| --- | --- | --- |
| `--remote`, `-r` | Browse Skill Hub remote skill list | No |
| `--page` | Page number for remote listing | 1 |
## search
Search for skills on the Skill Hub.
<CodeGroup>
```text Chat
/skill search pptx
```
```bash Terminal
cow skill search pptx
```
</CodeGroup>
## install
Install skills with a single `install` command from Cow Skill Hub, GitHub, ClawHub, or any URL (zip archives, SKILL.md links) — no manual download or configuration required.
**From Skill Hub (recommended):**
<CodeGroup>
```text Chat
/skill install pptx
```
```bash Terminal
cow skill install pptx
```
</CodeGroup>
**From GitHub:**
<CodeGroup>
```text Chat
# Install all skills in a repo (auto-discovers subdirectories with SKILL.md)
/skill install larksuite/cli
# Specify a subdirectory to install a single skill
/skill install https://github.com/larksuite/cli/tree/main/skills/lark-im
# Use # to specify a subdirectory
/skill install larksuite/cli#skills/lark-minutes
```
```bash Terminal
# Install all skills in a repo (auto-discovers subdirectories with SKILL.md)
cow skill install larksuite/cli
# Specify a subdirectory to install a single skill
cow skill install https://github.com/larksuite/cli/tree/main/skills/lark-im
# Use # to specify a subdirectory
cow skill install larksuite/cli#skills/lark-minutes
```
</CodeGroup>
Supports full GitHub URLs and `owner/repo` shorthand. For mono-repos (multiple skills in one repository), omitting the subdirectory auto-discovers and batch-installs all skills; specifying a subdirectory installs only that skill.
**From ClawHub:**
<CodeGroup>
```text Chat
/skill install clawhub:baidu-search
```
```bash Terminal
cow skill install clawhub:baidu-search
```
</CodeGroup>
**From URL:**
<CodeGroup>
```text Chat
# Install from a zip archive (single or batch)
/skill install https://cdn.link-ai.tech/skills/pptx.zip
# Install from a SKILL.md link
/skill install https://example.com/path/to/SKILL.md
```
```bash Terminal
# Install from a zip archive (single or batch)
cow skill install https://cdn.link-ai.tech/skills/pptx.zip
# Install from a SKILL.md link
cow skill install https://example.com/path/to/SKILL.md
```
</CodeGroup>
Archive URLs (zip or tar.gz) are extracted automatically and scanned for directories containing `SKILL.md`, so a single archive can install one skill or several. A direct `SKILL.md` file URL also works: the skill name and description are parsed from the file.
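The discovery step can be pictured as a recursive search of the extracted tree. This is a minimal sketch, assuming the archive has already been unpacked into a local directory; `discover_skill_dirs` is a hypothetical helper, not a function from the CowAgent codebase.

```python
from pathlib import Path
from typing import List


def discover_skill_dirs(root: Path) -> List[Path]:
    """Return every directory at or below `root` that contains a SKILL.md file."""
    # rglob also matches a SKILL.md placed directly in `root`.
    return sorted(p.parent for p in root.rglob("SKILL.md") if p.is_file())


def install_mode(skill_dirs: List[Path]) -> str:
    """Label the result: no skill found, a single skill, or a batch."""
    if not skill_dirs:
        return "none"
    return "single" if len(skill_dirs) == 1 else "batch"
```

One match would be treated as a single install and several matches as a batch, which mirrors the "single or batch" behaviour described above.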
## uninstall
Uninstall an installed skill.
<CodeGroup>
```text Chat
/skill uninstall pptx
```
```bash Terminal
cow skill uninstall pptx
```
</CodeGroup>
<Warning>
Uninstalling deletes all files in the skill directory. This action cannot be undone.
</Warning>
## enable / disable
Enable or disable a skill. Disabled skills will not be invoked by the Agent.
<CodeGroup>
```text Chat
/skill enable pptx
/skill disable pptx
```
```bash Terminal
cow skill enable pptx
cow skill disable pptx
```
</CodeGroup>
## info
View details of an installed skill, including a preview of its `SKILL.md`.
<CodeGroup>
```text Chat
/skill info pptx
```
```bash Terminal
cow skill info pptx
```
</CodeGroup>
## Skill Sources
Installed skills track their origin, viewable via `/skill list`:
| Source | Description |
| --- | --- |
| `builtin` | Built-in project skills |
| `cowhub` | Installed from CowAgent Skill Hub |
| `github` | Installed directly from a GitHub URL |
| `clawhub` | Installed from ClawHub |
| `url` | Installed from a SKILL.md URL |
| `local` | Locally created skills |
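As a rough illustration of how an `install` argument might map onto these source labels, the sketch below classifies the argument forms shown in the install section. `classify_source` is a hypothetical function, not CowAgent's actual routing logic, and it assumes any non-GitHub URL (including archive URLs) is recorded as `url`, which the table above does not confirm. `builtin` and `local` skills do not come from `install`, so they are not produced here.

```python
from urllib.parse import urlparse


def classify_source(arg: str) -> str:
    """Guess the source label for a `skill install` argument (illustrative only)."""
    if arg.startswith("clawhub:"):
        return "clawhub"
    if arg.startswith(("http://", "https://")):
        host = urlparse(arg).netloc.lower()
        # Assumption: every non-GitHub URL is labeled "url".
        return "github" if host in ("github.com", "www.github.com") else "url"
    if "/" in arg:
        # owner/repo or owner/repo#subdir shorthand
        return "github"
    # A bare name such as "pptx" is looked up on the Skill Hub.
    return "cowhub"
```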


@@ -8,12 +8,12 @@ description: Deploy CowAgent manually (source code / Docker)
### 1. Clone the project
```bash
-git clone https://github.com/zhayujie/chatgpt-on-wechat
-cd chatgpt-on-wechat/
+git clone https://github.com/zhayujie/CowAgent
+cd CowAgent/
```
<Tip>
-For network issues, use the mirror: https://gitee.com/zhayujie/chatgpt-on-wechat
+For network issues, use the mirror: https://gitee.com/zhayujie/CowAgent
</Tip>
### 2. Install dependencies
@@ -30,7 +30,25 @@ Optional dependencies (recommended):
pip3 install -r requirements-optional.txt
```
-### 3. Configure
+### 3. Install Cow CLI
+Install the command-line tool for managing services and skills:
+```bash
+pip3 install -e .
+```
+Then use the `cow` command:
+```bash
+cow help
+```
+<Note>
+This step is recommended. After installation you can use `cow start`, `cow stop`, `cow update` to manage the service, and `cow skill` to manage skills. Without the CLI, you can use `./run.sh` or `python3 app.py` to run.
+</Note>
+### 4. Configure
Copy the config template and edit:
@@ -40,22 +58,32 @@ cp config-template.json config.json
Fill in model API keys, channel type, and other settings in `config.json`. See the [model docs](/en/models/index) for details.
-### 4. Run
+### 5. Run
-**Local run:**
+**Using Cow CLI (recommended):**
+```bash
+cow start
+```
+**Or run locally in foreground:**
```bash
python3 app.py
```
-By default, the Web service starts. Access `http://localhost:9899/chat` to chat.
+By default, the Web console starts. Access `http://localhost:9899` to chat.
-**Background run on server:**
+**Background run on server (without CLI):**
```bash
nohup python3 app.py & tail -f nohup.out
```
+<Tip>
+If deploying on a server, open port `9899` in your firewall or security group to access the Web console. It's recommended to restrict access to specific IPs for security.
+</Tip>
## Docker Deployment
Docker deployment does not require cloning source code or installing dependencies. For Agent mode, source deployment is recommended for broader system access.
@@ -67,7 +95,7 @@ Docker deployment does not require cloning source code or installing dependencie
**1. Download config**
```bash
-wget https://cdn.link-ai.tech/code/cow/docker-compose.yml
+curl -O https://cdn.link-ai.tech/code/cow/docker-compose.yml
```
Edit `docker-compose.yml` with your configuration.
@@ -84,12 +112,17 @@ sudo docker compose up -d
sudo docker logs -f chatgpt-on-wechat
```
+<Tip>
+If deploying on a server, open port `9899` in your firewall or security group to access the Web console. It's recommended to restrict access to specific IPs for security.
+</Tip>
## Core Configuration
```json
{
"channel_type": "web",
-"model": "MiniMax-M2.5",
+"model": "deepseek-v4-flash",
+"deepseek_api_key": "",
"agent": true,
"agent_workspace": "~/cow",
"agent_max_context_tokens": 40000,
@@ -101,7 +134,7 @@ sudo docker logs -f chatgpt-on-wechat
| Parameter | Description | Default |
| --- | --- | --- |
| `channel_type` | Channel type | `web` |
-| `model` | Model name | `MiniMax-M2.5` |
+| `model` | Model name | `deepseek-v4-flash` |
| `agent` | Enable Agent mode | `true` |
| `agent_workspace` | Agent workspace path | `~/cow` |
| `agent_max_context_tokens` | Max context tokens | `40000` |
@@ -109,5 +142,5 @@ sudo docker logs -f chatgpt-on-wechat
| `agent_max_steps` | Max decision steps per task | `15` |
<Tip>
-Full configuration options are in the project [`config.py`](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/config.py).
+Full configuration options are in the project [`config.py`](https://github.com/zhayujie/CowAgent/blob/master/config.py).
</Tip>
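To make the updated defaults concrete, here is a minimal sketch of overlaying a user's `config.json` on the values from the parameter table. It is not how the project's `config.py` actually loads settings: `DEFAULTS` and `load_config` are hypothetical names, and only the parameters visible in this diff are included (rows hidden by the hunk boundary are left out).

```python
import json
import os
from typing import Any, Dict

# Defaults copied from the parameter table after this change.
DEFAULTS: Dict[str, Any] = {
    "channel_type": "web",
    "model": "deepseek-v4-flash",
    "agent": True,
    "agent_workspace": "~/cow",
    "agent_max_context_tokens": 40000,
    "agent_max_steps": 15,
}


def load_config(path: str = "config.json") -> Dict[str, Any]:
    """Return DEFAULTS overlaid with whatever keys the JSON file at `path` sets."""
    merged = dict(DEFAULTS)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            merged.update(json.load(f))
    # Expand "~" so the workspace path can be used directly.
    merged["agent_workspace"] = os.path.expanduser(merged["agent_workspace"])
    return merged
```

Keys such as `deepseek_api_key` that appear only in the file pass through unchanged, so a config that sets just the API key would still pick up the remaining defaults.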

Some files were not shown because too many files have changed in this diff.