Compare commits


116 Commits

Author SHA1 Message Date
zhayujie
5162da5654 Merge branch 'master' into feat-knowledge 2026-04-12 16:46:38 +08:00
zhayujie
a1d82f6193 feat(knowledge): add cli and update docs 2026-04-12 16:39:06 +08:00
zhayujie
ea78e3d0c6 feat(knowledge): document link supports jumping to view 2026-04-11 20:16:43 +08:00
zhayujie
3497f00cb4 Merge pull request #2759 from zhayujie/feat-multimodel
feat(vision): prioritize main model for image recognition
2026-04-11 19:55:15 +08:00
zhayujie
5355d45031 Merge pull request #2756 from octo-patch/feature/add-minimax-m2.7-highspeed-tts
feat: add MiniMax-M2.7-highspeed model and MiniMax TTS support
2026-04-11 19:54:03 +08:00
zhayujie
26693acc3f feat(vision): prioritize main model for image recognition with multi-provider fallback
- Add call_vision method to all bot implementations (DashScope, Claude,
  Gemini, ZhipuAI, MiniMax, Doubao, Moonshot, OpenAICompatibleBot)
  using each vendor's native multimodal API format
- Remove call_with_tools/call_vision from Bot base class to fix MRO
  shadowing issue with OpenAICompatibleBot mixin
- Refactor vision tool provider resolution: MainModel → other configured
  models (auto-discovered) → OpenAI → LinkAI, with automatic fallback
- Return actual model name used in call_vision responses
- Sync config.json API keys to .env bidirectionally on startup
- Fix bot instance cache to detect bot_type/use_linkai config changes
- Add SSE reconnection support for web console
- Preserve image path hints in Gemini text for correct vision tool calls
- Update docs/tools/vision.mdx
2026-04-11 19:46:11 +08:00
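The provider order described in this commit (main model, then other configured models, then OpenAI, then LinkAI) amounts to an ordered fallback. Below is a minimal sketch. It assumes each provider is a plain callable that raises on failure. `resolve_vision_order`, `call_vision_with_fallback` and the provider keys are hypothetical names, not the project's actual code.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical provider signature: (image_path, question) -> answer text.
# A provider signals failure by raising any exception.
VisionFn = Callable[[str, str], str]


def resolve_vision_order(main_model: Optional[str], configured: List[str]) -> List[str]:
    """Main model first, then other configured models, then OpenAI and LinkAI.

    Duplicates and empty entries are dropped so no provider is tried twice.
    """
    order: List[str] = []
    for name in [main_model, *configured, "openai", "linkai"]:
        if name and name not in order:
            order.append(name)
    return order


def call_vision_with_fallback(
    providers: Dict[str, VisionFn], order: List[str], image_path: str, question: str
) -> Tuple[str, str]:
    """Try providers in order and return (name_of_provider_used, answer)."""
    errors: List[str] = []
    for name in order:
        fn = providers.get(name)
        if fn is None:
            continue  # listed but not configured: skip it
        try:
            return name, fn(image_path, question)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all vision providers failed: " + "; ".join(errors))
```

The provider name is returned alongside the answer, in the spirit of the commit's "return actual model name used" change. Callers can then report which model actually answered.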
zhayujie
76e9fef3b2 feat(knowledge): add file list and graph in web channel 2026-04-11 19:02:55 +08:00
octo-patch
c34308cbd4 feat: add MiniMax-M2.7-highspeed model and MiniMax TTS support
- Add MiniMax-M2.7-highspeed constant to const.py and MODEL_LIST
- Update MinimaxBot default model from MiniMax-M2.1 to MiniMax-M2.7
- Add MinimaxVoice TTS provider (voice/minimax/minimax_voice.py)
  - Supports speech-2.8-hd and speech-2.8-turbo models
  - SSE streaming with hex-decoded audio chunks
  - Reuses MINIMAX_API_KEY
- Register MinimaxVoice in voice factory
- Add unit tests (14 tests, all passing)
- Update README with MiniMax-M2.7-highspeed and TTS configuration
2026-04-11 17:03:44 +08:00
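The commit describes the TTS response as an SSE stream whose audio arrives hex-encoded. Below is a minimal parsing sketch. It assumes each `data:` line carries JSON shaped like `{"data": {"audio": "<hex>"}}`. That payload shape and the `iter_audio_chunks` name are assumptions, so check MiniMax's API reference for the real field names.

```python
import json
from typing import Iterable, Iterator


def iter_audio_chunks(sse_lines: Iterable[str]) -> Iterator[bytes]:
    """Yield raw audio bytes from SSE lines whose JSON payload holds hex audio.

    Assumed payload shape: {"data": {"audio": "<hex string>"}}.
    """
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip event names, comments and blank keep-alive lines
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue
        event = json.loads(payload)
        hex_audio = (event.get("data") or {}).get("audio")
        if hex_audio:
            # Each chunk is hex text, so decode it back to raw bytes.
            yield bytes.fromhex(hex_audio)
```

Joining the yielded chunks with `b"".join(...)` produces the complete audio. Writing them out one at a time instead lets playback start before the stream ends.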
zhayujie
5a10476010 feat: add knowledge switch and cli 2026-04-11 16:44:25 +08:00
zhayujie
46e80dceec Merge pull request #2755 from 6vision/fix/generic-file-send
fix: send generic file types (tar.gz, zip, etc.) as FILE instead of TEXT
2026-04-11 16:36:34 +08:00
6vision
90d1835353 fix: send generic file types (tar.gz, zip, etc.) as FILE instead of TEXT
Previously, files with extensions not in the known categories (image, document, video, audio) fell through to a fallback that returned ReplyType.TEXT, causing the file to never actually be sent to the user. Now the fallback uses ReplyType.FILE so all file types are delivered.

Made-with: Cursor
2026-04-11 15:45:34 +08:00
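The fix comes down to which branch the extension lookup falls through to. Below is a simplified sketch. The `ReplyType` enum is a stand-in with arbitrary values. The extension sets are illustrative and cover only two of the four categories the commit lists.

```python
import os
from enum import Enum


class ReplyType(Enum):
    # Stand-in for the project's ReplyType; member values are arbitrary.
    TEXT = 1
    IMAGE = 2
    VIDEO = 3
    FILE = 4


IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}
VIDEO_EXTS = {".mp4", ".mov", ".avi"}


def reply_type_for(path: str) -> ReplyType:
    # splitext keeps only the last suffix, so "backup.tar.gz" yields ".gz".
    ext = os.path.splitext(path.lower())[1]
    if ext in IMAGE_EXTS:
        return ReplyType.IMAGE
    if ext in VIDEO_EXTS:
        return ReplyType.VIDEO
    # Before the fix this fallback was ReplyType.TEXT, so unknown types were never sent.
    return ReplyType.FILE
```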
zhayujie
845fadd0aa fix(knowledge): modify knowledge skill 2026-04-10 18:22:54 +08:00
zhayujie
5748ded52c feat(knowledge): change knowledge base to index-driven self-organizing structure 2026-04-10 16:06:04 +08:00
zhayujie
6a737fb734 feat: display thinking content in web console 2026-04-10 15:07:23 +08:00
zhayujie
3cd92ccda3 feat: add port config 2026-04-09 21:29:53 +08:00
zhayujie
54e81aba11 feat(memory+knowledge): add knowledge wiki system and Light Dream memory extraction
- Add knowledge/ directory structure and knowledge-wiki skill for structured knowledge accumulation
- Auto-inject MEMORY.md into system prompt with truncation (last 200 lines)
- Light Dream: extend flush_memory to extract long-term memories into MEMORY.md with date stamps
- Add mandatory knowledge auto-write rules in system prompt (no user confirmation needed)
- Expand MemoryManager.sync() to index knowledge/ files for vector search
- Update RULE.md template with workspace conventions and knowledge guidelines
2026-04-09 21:22:43 +08:00
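The "last 200 lines" rule for injecting MEMORY.md into the system prompt can be sketched as a tail slice. This assumes new memories are appended at the end of the file, so the tail holds the most recent ones. The function names are hypothetical.

```python
from pathlib import Path

MAX_MEMORY_LINES = 200  # limit stated in the commit message


def truncate_memory(text: str, max_lines: int = MAX_MEMORY_LINES) -> str:
    """Keep only the last max_lines lines (all of them if the text is shorter)."""
    if max_lines <= 0:
        return ""  # guard: lines[-0:] would return everything
    lines = text.splitlines()
    return "\n".join(lines[-max_lines:])


def load_memory_for_prompt(path: Path, max_lines: int = MAX_MEMORY_LINES) -> str:
    """Read MEMORY.md if it exists and return the truncated tail for the prompt."""
    if not path.is_file():
        return ""
    return truncate_memory(path.read_text(encoding="utf-8"), max_lines)
```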
zhayujie
d86cb4ded6 fix(weixin): update weixin channel version 2026-04-09 09:55:07 +08:00
zhayujie
4d5375f6d6 fix(win): add Windows platform hint in bash tool description 2026-04-08 16:54:26 +08:00
zhayujie
424557fedb fix(win): use PowerShell instead of cmd.exe 2026-04-08 16:50:45 +08:00
zhayujie
89251e603f fix(win): use PowerShell instead of cmd.exe for bash tool on Windows 2026-04-08 16:18:56 +08:00
zhayujie
a653ed07eb fix(win): defer pip install to a helper bat after cow.exe exits 2026-04-08 15:31:03 +08:00
zhayujie
ad86deb014 fix: prioritize using a custom master model for vision 2026-04-08 15:16:59 +08:00
zhayujie
9525dc7584 fix: avoid stale cow.exe on Windows by spawning fresh process 2026-04-08 12:07:18 +08:00
zhayujie
cd31dd27fd fix: increase web console capacity and add frontend retry 2026-04-08 11:48:27 +08:00
zhayujie
360e3670eb feat(browser): detect implicit interactive elements 2026-04-07 01:41:14 +08:00
zhayujie
8dabe3b4c8 fix: remove install-browser cmd display in /help 2026-04-04 23:28:57 +08:00
zhayujie
443e0c2806 feat: show video in web channel 2026-04-03 17:09:38 +08:00
zhayujie
9cc173cc4d fix: use dynamic model name in system prompt runtime info 2026-04-02 17:01:56 +08:00
zhayujie
b5f33e5ecd feat: support qwen3.6-plus 2026-04-02 16:46:58 +08:00
zhayujie
40dfc6860f fix: skill list showing sub-skills inside collection 2026-04-02 11:47:24 +08:00
zhayujie
1c02a04423 fix: handle error when printing QR code on Windows GBK terminals 2026-04-01 17:23:57 +08:00
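A common way to handle this class of error is to catch `UnicodeEncodeError` and re-encode the text with a replacement character. This is a generic sketch, not necessarily the approach the commit takes. `safe_print` is a hypothetical helper.

```python
import sys
from typing import Optional, TextIO


def safe_print(text: str, stream: Optional[TextIO] = None) -> None:
    """Write text, replacing characters the stream's encoding cannot represent.

    When a stream uses GBK with strict error handling, characters outside GBK
    raise UnicodeEncodeError; here they degrade to '?' instead of crashing.
    """
    stream = stream or sys.stdout
    try:
        stream.write(text + "\n")
    except UnicodeEncodeError:
        encoding = getattr(stream, "encoding", None) or "ascii"
        fallback = text.encode(encoding, errors="replace").decode(encoding)
        stream.write(fallback + "\n")
```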
zhayujie
de0e45070c chore: remove conflicting dependency 2026-04-01 17:19:15 +08:00
zhayujie
c169cc7d74 fix: remove conflicting dependency 2026-04-01 17:12:15 +08:00
zhayujie
cd62ad76f6 fix: cow CLI support python3.7 2026-04-01 16:51:23 +08:00
zhayujie
dd25b0fb5b feat: refine system prompt style and tone guidance 2026-04-01 16:24:41 +08:00
zhayujie
a38b22a6a2 docs: update docs 2026-04-01 15:31:41 +08:00
zhayujie
830b8f2971 feat: release 2.0.5 2026-04-01 15:01:53 +08:00
zhayujie
b058af122c feat: release 2.0.5 2026-04-01 12:24:21 +08:00
zhayujie
174ee0cafc fix(security): prevent path traversal in memory content API 2026-04-01 10:03:58 +08:00
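A standard defense against path traversal is to resolve the requested path and confirm it stays inside the allowed base directory. Below is a minimal sketch, assuming the memory API receives a relative path from the client. `resolve_within` is a hypothetical name. It avoids `Path.is_relative_to` (Python 3.9+) because the project supports Python 3.7.

```python
from pathlib import Path


def resolve_within(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir and reject anything that escapes it.

    The check runs on the fully resolved path, so '../' sequences, absolute
    paths and existing symlinks that point outside are all caught.
    """
    base = Path(base_dir).resolve()
    target = (base / user_path).resolve()
    # Path.is_relative_to() needs Python 3.9+, so compare against parents instead.
    if target != base and base not in target.parents:
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return target
```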
zhayujie
1c336380c0 docs: update release doc 2026-03-31 22:30:31 +08:00
zhayujie
3068880413 feat: save skill display name when downloading 2026-03-31 21:43:57 +08:00
zhayujie
be596681e5 Merge pull request #2735 from zhayujie/feat-wecom-bot-qrcode
feat(wecom_bot): add Wecom Bot QR code scan auth
2026-03-31 21:28:39 +08:00
zhayujie
66b71c50e9 feat(wecom_bot): add Wecom Bot QR code scan auth 2026-03-31 21:27:50 +08:00
zhayujie
8744810b25 fix: skill install timeout 2026-03-31 20:47:59 +08:00
zhayujie
7f94d37c2e fix: auto-install font in browser 2026-03-31 20:20:13 +08:00
zhayujie
6d9b7baeb4 fix(weixin): file send failed 2026-03-31 18:14:49 +08:00
zhayujie
4470d4c352 fix: reduce docker image size 2026-03-31 16:56:27 +08:00
zhayujie
d2a462a279 fix: add apt source in docker file 2026-03-31 16:34:47 +08:00
zhayujie
14ff2a15e7 fix(cli): cow cli in docker chat 2026-03-31 16:25:47 +08:00
zhayujie
6d1369900e feat: add source args in docker building 2026-03-31 16:06:45 +08:00
zhayujie
1f17ebe69e feat: add browser install in docker image 2026-03-31 16:05:05 +08:00
zhayujie
1ae2918064 feat: support install browser in chat 2026-03-31 15:15:17 +08:00
zhayujie
b6571e5cad fix: browser resource optimization 2026-03-30 21:39:38 +08:00
zhayujie
7549d48cf1 fix: browser thread bug 2026-03-30 21:27:08 +08:00
zhayujie
00353dd0cb feat: support skill hub mirror 2026-03-30 18:46:02 +08:00
zhayujie
afd947195d fix(cli): support skill mirror install 2026-03-30 16:36:17 +08:00
zhayujie
e57ef37167 fix: prevent phantom mouseover from hijacking slash menu 2026-03-30 11:52:05 +08:00
zhayujie
ef33a93654 Merge pull request #2731 from zkjqd/fix/slash-menu-click
Fix the issue where the shortcut command in the input box cannot be clicked to select events
2026-03-30 11:40:06 +08:00
zhayujie
61732aecfc Merge pull request #2721 from yrk111222/feat/modelscope-update
Feat/modelscope update
2026-03-30 11:39:50 +08:00
zkjqd
6764c05c3f input-slash-click 2026-03-30 11:20:03 +08:00
zhayujie
fa149cf4aa fix(browser): multi-thread browser instance bug 2026-03-30 00:57:19 +08:00
zhayujie
e4f9697d06 feat(browser): install font in linux 2026-03-29 23:52:51 +08:00
zhayujie
da061450e5 fix: github skill install cmd 2026-03-29 19:23:47 +08:00
zhayujie
d09ae49287 feat(browser): auto-snapshot on navigate, screenshot prompt guidance
Browser tool enhancements:
- Navigate action now auto-includes snapshot result, saving one LLM round-trip
- Wait for networkidle + 800ms after navigation for SPA/JS-rendered pages
- Prompt guides agent to screenshot key results and ask user for login/CAPTCHA help
- Fixed playwright version pinned to 1.52.0; mirror fallback to official CDN on failure

Web console file/image support:
- SSE real-time push for images and files via on_event (file_to_send)
- Added /api/file endpoint to serve local files for web preview
- Frontend renders images in media-content container (survives delta/done overwrites)
- File attachment cards with download links; RFC 5987 encoding for non-ASCII filenames

Tool workspace fix:
- Inject workspace_dir as cwd into send and browser tools (previously only file tools)
- Screenshots now save to ~/cow/tmp/ instead of project directory
2026-03-29 19:09:11 +08:00
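For the attachment cards, the RFC 5987 encoding means two things. The UTF-8 filename is sent percent-encoded in a `filename*` parameter, and a plain ASCII `filename` goes alongside it for older clients. Below is a minimal standard-library sketch. The helper name and the '?' fallback are assumptions, not the project's `/api/file` code.

```python
from urllib.parse import quote


def content_disposition(filename: str, disposition: str = "attachment") -> str:
    """Build a Content-Disposition value that survives non-ASCII filenames."""
    # ASCII fallback for clients that ignore filename*; drop characters that
    # would break the quoted-string.
    ascii_name = filename.encode("ascii", errors="replace").decode("ascii")
    ascii_name = ascii_name.replace('"', "").replace("\\", "")
    # RFC 5987 ext-value: charset, empty language tag, then percent-encoded UTF-8.
    encoded = quote(filename, safe="")
    return f"{disposition}; filename=\"{ascii_name}\"; filename*=UTF-8''{encoded}"
```

For example, `content_disposition("报告.pdf")` yields `attachment; filename="??.pdf"; filename*=UTF-8''%E6%8A%A5%E5%91%8A.pdf`.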
zhayujie
511ee0bbaf fix: windows PowerShell script 2026-03-29 18:28:50 +08:00
zhayujie
3cb5a0fbd6 docs: add CLI system docs 2026-03-29 17:57:12 +08:00
zhayujie
e06925ab85 fix: optimize browser install cli and fix vision prompt 2026-03-29 15:19:59 +08:00
zhayujie
184634e4e7 fix(cli): browser install failed 2026-03-29 15:14:07 +08:00
zhayujie
843c2d02cc Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2026-03-29 15:09:37 +08:00
zhayujie
8ea2455766 feat(cli): add browser install cmd 2026-03-29 15:09:07 +08:00
zhayujie
9dc9987d56 Merge pull request #2727 from zhayujie/feat-browser-tool
feat: add browser tool
2026-03-29 14:59:39 +08:00
zhayujie
3458621147 feat: add browser tool 2026-03-29 14:59:06 +08:00
zhayujie
079df5a47c feat: support batch skill install from zip and github 2026-03-29 14:38:11 +08:00
zhayujie
ddb07c65a1 feat: support github zip-first download, gitLab, git@ ssh, local path 2026-03-29 13:45:15 +08:00
zhayujie
9b21cd222b fix: update run.sh 2026-03-28 19:36:51 +08:00
zhayujie
90f736843f fix: add click dependencies 2026-03-28 19:35:15 +08:00
zhayujie
13c020eb61 fix(cli): cli output in wecom_bot 2026-03-28 19:26:59 +08:00
zhayujie
dbc06dbe95 fix: use new run.sh when updating 2026-03-28 19:16:41 +08:00
zhayujie
23d097bc1c Merge pull request #2726 from zhayujie/feat-cow-cli
feat: cow cli in terminal and chat
2026-03-28 19:01:56 +08:00
zhayujie
db85b9808e feat(cli): add cow update 2026-03-28 18:58:42 +08:00
zhayujie
df5bae37bc feat: add MiniMax-M2.7 and glm-5-turbo in web console 2026-03-28 18:48:11 +08:00
zhayujie
acc23b6051 feat: optimize agent prompt and fix skill source load 2026-03-28 18:37:07 +08:00
zhayujie
61f2741afc feat: organize skill source field 2026-03-28 17:41:40 +08:00
zhayujie
4dd7ea886a feat(cli): cli options in web console 2026-03-28 16:26:41 +08:00
zhayujie
1e8959fbcf fix: optimize repo clone in run.sh 2026-03-28 15:08:57 +08:00
zhayujie
48729678cf Merge branch 'master' into feat-cow-cli 2026-03-28 14:47:20 +08:00
zhayujie
0684becaa7 fix(cli): register skill when installing 2026-03-28 14:42:18 +08:00
zhayujie
db16bdf8cb fix(cli): add security hardening for skill install and process management 2026-03-27 17:59:15 +08:00
zhayujie
f890318ed9 fix: strip leading/trailing whitespace from agent response 2026-03-26 18:13:39 +08:00
zhayujie
158510cbbe feat(cli): improve cow cli and skill hub integration 2026-03-26 16:49:42 +08:00
zhayujie
ce90cf7aa8 fix: weixin cdn upload retry 2026-03-26 10:20:29 +08:00
zhayujie
a3a3d006eb Merge pull request #2723 from Xiaozhou345/Xiaozhou345-fix-readme-spacing
Improve spacing between Chinese and English text in README
2026-03-26 10:14:27 +08:00
zhayujie
8fd029a4a1 feat(cli): support cow cli 2026-03-26 10:08:51 +08:00
Xiaozhou345
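2e1b52c1e5 Improve spacing between Chinese and English text in README
Following Chinese technical writing conventions, added spaces between file names and Chinese text to improve readability.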
2026-03-25 21:26:01 +08:00
zhayujie
3eb8348708 fix: docker volume permission issue and clean up unused dependencies 2026-03-25 01:25:34 +08:00
zhayujie
393f0c007c fix: context loss after trim 2026-03-24 20:49:28 +08:00
yrk
294e380288 update model_list 2026-03-24 11:00:55 +08:00
yrk
4c1c42efac feat: update modelscope bot 2026-03-24 10:43:45 +08:00
zhayujie
c062ca8c66 Merge pull request #2720 from 6vision/fix/deepseek-docs
Docs: update
2026-03-24 00:25:17 +08:00
6vision
76dcb25103 docs(deepseek): update model descriptions to V3.2 with thinking/non-thinking mode
Made-with: Cursor
2026-03-24 00:05:39 +08:00
6vision
c5b4f236db docs(deepseek): remove migration notes from zh and en docs
Made-with: Cursor
2026-03-24 00:05:39 +08:00
zhayujie
0974c940a8 Merge pull request #2719 from 6vision/feat/deepseek-bot
feat: add independent DeepSeek bot module with dedicated config
2026-03-23 22:42:58 +08:00
6vision
cffa20d37e docs(deepseek): remove migration notes to reduce user cognitive load
Made-with: Cursor
2026-03-23 22:39:15 +08:00
6vision
ef009edd29 docs(deepseek): update config guides for independent DeepSeek module
Update DeepSeek docs (zh/en/ja) and README to reflect the new dedicated deepseek_api_key / deepseek_api_base config fields, with backward compatibility notes.

Made-with: Cursor
2026-03-23 21:43:51 +08:00
zhayujie
3ca52b118d fix(weixin): qrcode url log 2026-03-23 21:33:53 +08:00
zhayujie
13f5fde4fb fix: rebuild system prompt from scratch on every turn 2026-03-23 21:27:44 +08:00
6vision
f512b55ec2 feat(deepseek): add independent DeepSeek bot module with dedicated config
Separate DeepSeek from ChatGPTBot into its own module (models/deepseek/) with dedicated deepseek_api_key and deepseek_api_base config fields, avoiding config conflicts when switching between providers. Backward compatible with old users who configured DeepSeek via open_ai_api_key/open_ai_api_base through automatic fallback.

Made-with: Cursor
2026-03-23 21:23:35 +08:00
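The backward-compatible lookup described above can be sketched as follows. The precedence rule (fall back to the OpenAI fields only when `deepseek_api_key` is empty) is an assumption about how the fallback works. `deepseek_credentials` is a hypothetical helper. The default base URL is the one shown in the README config example.

```python
from typing import Mapping, Tuple

# Default taken from the README config example (deepseek_api_base).
DEFAULT_DEEPSEEK_BASE = "https://api.deepseek.com/v1"


def deepseek_credentials(conf: Mapping[str, str]) -> Tuple[str, str]:
    """Return (api_key, api_base) for the DeepSeek bot.

    Dedicated fields win; the legacy open_ai_* fields are used only when
    no deepseek_api_key is set (assumed precedence rule).
    """
    if conf.get("deepseek_api_key"):
        base = conf.get("deepseek_api_base") or DEFAULT_DEEPSEEK_BASE
        return conf["deepseek_api_key"], base
    # Legacy setup: DeepSeek was configured through the OpenAI fields.
    legacy_key = conf.get("open_ai_api_key") or ""
    legacy_base = conf.get("open_ai_api_base") or DEFAULT_DEEPSEEK_BASE
    return legacy_key, legacy_base
```

Keeping the two sets of fields separate is what lets a user switch between OpenAI and DeepSeek without one provider's key overwriting the other's.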
zhayujie
22b8ca0095 feat: optimize vision image compression 2026-03-23 21:18:04 +08:00
zhayujie
baf66a103d fix(weixin): preserve original filename for received files 2026-03-23 01:18:02 +08:00
zhayujie
45faa9c1ff fix(weixin): resolve image/file send and receive failures 2026-03-23 00:13:41 +08:00
zhayujie
304381a88d fix: hide breadcrumb on mobile for better space utilization 2026-03-22 23:36:34 +08:00
zhayujie
fc9f54dbc8 feat(weixin): optimize login qrcode generate 2026-03-22 23:04:50 +08:00
zhayujie
7199dc187f fix: default gemini model 2026-03-22 22:52:37 +08:00
zhayujie
e9ae066d53 Merge pull request #2716 from cowagent/fix-gemini-model-attribute
fix: add missing model property to GoogleGeminiBot
2026-03-22 22:49:00 +08:00
cowagent
d71ae406ff fix: add missing model property to GoogleGeminiBot
api_key and api_base were refactored to @property but model was not
migrated, causing AttributeError: 'GoogleGeminiBot' object has no
attribute 'model' when using any Gemini model.
2026-03-22 22:43:26 +08:00
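This failure mode is typical when settings move from plain attributes set in `__init__` to read-only properties: any attribute left behind simply disappears. Below is a minimal sketch of the property-based pattern. It uses a hypothetical class that takes its config as a plain dict; the real bot's config source is not shown here.

```python
from typing import Any, Dict

# Default taken from the README config example (gemini_api_base).
DEFAULT_GEMINI_BASE = "https://generativelanguage.googleapis.com"


class GeminiBotSketch:
    """Hypothetical stand-in for GoogleGeminiBot that reads settings lazily."""

    def __init__(self, conf: Dict[str, Any]):
        self._conf = conf  # kept by reference, so later config edits are visible

    @property
    def api_key(self) -> str:
        return self._conf.get("gemini_api_key", "")

    @property
    def api_base(self) -> str:
        return self._conf.get("gemini_api_base") or DEFAULT_GEMINI_BASE

    @property
    def model(self) -> str:
        # The piece the fix adds: without this property, bot.model raises
        # AttributeError once __init__ no longer sets a model attribute.
        return self._conf.get("model", "")
```

One benefit of this pattern: a model or key changed at runtime takes effect on the next access, without rebuilding the bot object.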
zhayujie
f3216904b3 feat(weixin): optimize weixin login qrcode 2026-03-22 21:34:47 +08:00
183 changed files with 15141 additions and 3186 deletions

.gitignore (vendored, 9 changes)

@@ -33,7 +33,16 @@ plugins/banwords/lib/__pycache__
!plugins/keyword
!plugins/linkai
!plugins/agent
!plugins/cow_cli
client_config.json
ref/
**/.dev.vars
.cursor/
local/
node_modules/
# cow cli
dist/
build/
*.egg-info/
.cow.pid

README.md (292 changes)

@@ -7,40 +7,44 @@
[中文] | [<a href="docs/en/README.md">English</a>] | [<a href="docs/ja/README.md">日本語</a>]
</p>
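**CowAgent** is a super AI assistant built on large language models. It can think proactively and plan tasks, operate the computer and external resources, create and execute Skills, and has long-term memory that keeps growing, making it lighter and more convenient than OpenClaw. CowAgent supports flexible switching among many models, handles multimodal messages such as text, voice, images, and files, and can be used in WeChat, Feishu, DingTalk, WeCom smart bots, QQ, WeCom custom apps, WeChat Official Accounts, and the web, running 24/7 on your personal computer or server.
**CowAgent** is a super AI assistant built on large language models. It can think proactively and plan tasks, operate the computer and external resources, create and execute Skills, and has long-term memory and a knowledge base that keep growing, making it lighter and more convenient than OpenClaw. CowAgent supports flexible switching among many models, handles multimodal messages such as text, voice, images, and files, and can be used in WeChat, Feishu, DingTalk, WeCom smart bots, QQ, WeCom custom apps, WeChat Official Accounts, and the web, running 24/7 on your personal computer or server.
<p align="center">
<a href="https://cowagent.ai/">🌐 Website</a> &nbsp;·&nbsp;
<a href="https://docs.cowagent.ai/">📖 Docs</a> &nbsp;·&nbsp;
<a href="https://docs.cowagent.ai/guide/quick-start">🚀 Quick Start</a> &nbsp;·&nbsp;
<a href="https://skills.cowagent.ai/">🧩 Skill Marketplace</a> &nbsp;·&nbsp;
<a href="https://link-ai.tech/cowagent/create">☁️ Try Online</a>
</p>
# Introduction
> This project is both an out-of-the-box super AI assistant and a highly extensible Agent framework: you can extend its model interfaces, access channels, built-in tools, and Skills system to meet all kinds of custom needs. Its core capabilities are:
> This project is both an out-of-the-box super AI assistant and a highly extensible Agent framework: you can extend its model interfaces, access channels, built-in tools, and Skills system to meet all kinds of custom needs. Its core capabilities are:
- **Complex task planning**: understands complex tasks and autonomously plans their execution, continuously thinking and calling tools until the goal is reached; accesses system resources such as files, the terminal, the browser, and scheduled tasks through tools
- **Long-term memory:** automatically persists conversation memory to local files and a database, including global memory and daily memory, with keyword and vector search
- **Skills system:** implements an engine for creating and running Skills, ships with many built-in skills, and supports developing custom Skills through natural-language conversation
- **Autonomous task planning**: understands complex tasks and autonomously plans their execution, continuously thinking and calling tools until the goal is reached
- **Long-term memory:** automatically persists conversation memory to local files and a database, including core memory and daily memory, with keyword and vector search
- **Personal knowledge base:** automatically organizes structured knowledge, builds a knowledge graph through cross-references, and supports managing and visually browsing the knowledge base through conversation
- **Skills system:** an engine for installing and running Skills; install skills in one step from [Skill Hub](https://skills.cowagent.ai/), GitHub, and other sources, or create Skills through conversation
- **Tool system:** built-in tools for reading and writing files, running terminal commands, operating the browser, scheduling tasks, and more, which the Agent calls autonomously to complete complex tasks
- **CLI system:** provides terminal commands and in-chat commands for process management, skill installation, configuration changes, and more
- **Multimodal messages:** parses, processes, generates, and sends text, images, voice, files, and other message types
- **Multi-model access:** supports major model providers in China and abroad, including OpenAI, Claude, Gemini, DeepSeek, MiniMax, GLM, Qwen, Kimi, and Doubao
- **Multi-platform deployment:** runs on a local computer or server and can be integrated into WeChat, Feishu, DingTalk, WeCom, QQ, WeChat Official Accounts, and the web
- **Multi-model support:** supports major model providers in China and abroad, including OpenAI, Claude, Gemini, DeepSeek, MiniMax, GLM, Qwen, Kimi, and Doubao
- **Multi-channel access:** runs on a local computer or server and can be integrated into WeChat, Feishu, DingTalk, WeCom, QQ, WeChat Official Accounts, and the web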
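## Disclaimer
1. This project is released under the [MIT License](/LICENSE) and is intended mainly for technical research and learning. When using it, you must comply with local laws and regulations, relevant policies, and your organization's rules; any illegal use, or use that infringes the rights of others, is prohibited. Whatever the manner of use and whoever is served, the project bears no responsibility for any consequences arising from its use by any individual, team, or company.
2. Cost and security: Agent mode uses more tokens than normal chat mode, so choose a model by weighing quality against cost. The Agent can access the operating system it runs on, so choose the deployment environment carefully. The project will also keep upgrading its security mechanisms and reducing model costs.
3. The CowAgent project focuses on open-source technology development and does not participate in, authorize, or issue any cryptocurrency.
1. This project is released under the [MIT License](/LICENSE) and is intended mainly for technical research and learning. When using it, you must comply with local laws and regulations, relevant policies, and your organization's rules; any illegal use, or use that infringes the rights of others, is prohibited. Whatever the manner of use and whoever is served, the project bears no responsibility for any consequences arising from its use by any individual, team, or company.
2. Cost and security: Agent mode uses more tokens than normal chat mode, so choose a model by weighing quality against cost. The Agent can access the operating system it runs on, so choose the deployment environment carefully. The project will also keep upgrading its security mechanisms and reducing model costs.
3. The CowAgent project focuses on open-source technology development and does not participate in, authorize, or issue any cryptocurrency.
## Demo
- Usage guide (Agent mode): [CowAgent Introduction](https://docs.cowagent.ai/intro/features)
- Usage guide (Agent mode): [CowAgent Introduction](https://docs.cowagent.ai/intro/features)
- Try online without deployment: [CowAgent](https://link-ai.tech/cowagent/create)
- Demo video (chat mode): https://cdn.link-ai.tech/doc/cow_demo.mp4
- Demo video (chat mode): https://cdn.link-ai.tech/doc/cow_demo.mp4
## Community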
@@ -54,9 +58,9 @@
<a href="https://link-ai.tech" target="_blank"><img width="650" src="https://cdn.link-ai.tech/image/link-ai-intro.jpg"></a>
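> [LinkAI](https://link-ai.tech/) is a one-stop AI agent platform for businesses and individuals. It aggregates multimodal large models, knowledge bases, skills, workflows, and other capabilities, supports one-click integration with and management of mainstream platforms, offers SaaS, private deployment, and other modes, and can run the [CowAgent assistant](https://link-ai.tech/cowagent/create) online without deployment.
> [LinkAI](https://link-ai.tech/) is a one-stop AI agent platform for businesses and individuals. It aggregates multimodal large models, knowledge bases, skills, workflows, and other capabilities, supports one-click integration with and management of mainstream platforms, offers SaaS, private deployment, and other modes, and can run the [CowAgent assistant](https://link-ai.tech/cowagent/create) online without deployment.
>
> LinkAI has built up extensive AI solutions for scenarios such as intelligent customer service, private-domain operations, and enterprise productivity assistants. It has also accumulated best practices for deploying large-model applications across consumer, healthcare, education, technology, manufacturing, and other industries, with the aim of helping more businesses and developers embrace AI productivity.
> LinkAI has built up extensive AI solutions for scenarios such as intelligent customer service, private-domain operations, and enterprise productivity assistants. It has also accumulated best practices for deploying large-model applications across consumer, healthcare, education, technology, manufacturing, and other industries, with the aim of helping more businesses and developers embrace AI productivity.
**Product inquiries and enterprise services**: contact product support: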
@@ -66,15 +70,17 @@
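# 🏷 Changelog
>**2026.04.01** [Version 2.0.5](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.5): Cow CLI command system, open-sourced Skill Hub, browser tool, WeCom bot creation by QR-code scan, and various optimizations and fixes.
>**2026.03.22** [Version 2.0.4](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.4): new personal WeChat channel (scan a QR code with WeChat to start), new MiniMax-M2.7 and GLM-5-Turbo models, refactored run.sh script, Japanese documentation, and multiple fixes.
>**2026.03.18** [Version 2.0.3](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.3): new WeCom smart bot and QQ channels, Coding Plan support, several new models, file handling in the web client, and an upgraded memory system.
>**2026.03.18** [Version 2.0.3](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.3): new WeCom smart bot and QQ channels, Coding Plan support, several new models, file handling in the web client, and an upgraded memory system.
>**2026.02.27** [Version 2.0.2](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.2): fully upgraded web console (streaming chat; management of models, skills, memory, channels, scheduled tasks, and logs), support for running multiple channels at once, persistent session storage, and several new models.
>**2026.02.13** [Version 2.0.1](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.1): built-in Web Search tool, smart context-trimming strategy, dynamic runtime-info updates, Windows compatibility, and fixes for scheduled-task memory loss, Feishu connection problems, and other issues.
>**2026.02.03** [Version 2.0.0](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.0): officially upgraded to a super Agent assistant with multi-turn task decision-making, long-term memory, a range of system tools, and a Skills framework, plus several new models and improved access channels.
>**2026.02.03** [Version 2.0.0](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.0): officially upgraded to a super Agent assistant with multi-turn task decision-making, long-term memory, a range of system tools, and a Skills framework, plus several new models and improved access channels.
For more release history, see: [Changelog](https://docs.cowagent.ai/releases)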
@@ -86,11 +92,17 @@
Run the following command in a terminal:
**Linux / macOS**
```bash
bash <(curl -fsSL https://cdn.link-ai.tech/code/cow/run.sh)
```
Script usage: [One-click run script](https://docs.cowagent.ai/guide/quick-start)
**Windows (PowerShell)**
```powershell
irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex
```
Script usage: [One-click run script](https://docs.cowagent.ai/guide/quick-start). After installation, you can manage the service with [CLI commands](https://docs.cowagent.ai/cli/index) such as `cow start` and `cow stop`.
## 1. Preparation
@@ -99,15 +111,15 @@ bash <(curl -fsSL https://cdn.link-ai.tech/code/cow/run.sh)
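The project supports model APIs from major providers in China and abroad; for available models and configuration details, see: [Model Notes](#模型说明).
> Recommended models for Agent mode (choose by weighing quality against cost): MiniMax-M2.7, glm-5-turbo, kimi-k2.5, qwen3.5-plus, claude-sonnet-4-6, gemini-3.1-pro-preview, gpt-5.4, gpt-5.4-mini
> Recommended models for Agent mode (choose by weighing quality against cost): MiniMax-M2.7, glm-5-turbo, kimi-k2.5, qwen3.5-plus, claude-sonnet-4-6, gemini-3.1-pro-preview, gpt-5.4, gpt-5.4-mini
The **LinkAI platform** API is also supported. It covers all of the models above and adds Agent skills such as knowledge bases, workflows, and plugins; see the [API docs](https://docs.link-ai.tech/platform/api).
The **LinkAI platform** API is also supported. It covers all of the models above and adds Agent skills such as knowledge bases, workflows, and plugins; see the [API docs](https://docs.link-ai.tech/platform/api).
### 2. Environment Setup
Linux, macOS, and Windows are supported, on both personal computers and servers. `Python` is required, version 3.7 to 3.12 (3.9 recommended).
Linux, macOS, and Windows are supported, on both personal computers and servers. `Python` is required, version 3.7 to 3.12.
> Note: running from source is recommended for Agent mode. If you deploy with Docker, you do not need to install Python or download the source code and can skip ahead to the next section.
> Note: running from source is recommended for Agent mode. If you deploy with Docker, you do not need to install Python or download the source code and can skip ahead to the next section.
**(1) Clone the project code:**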
@@ -129,45 +141,68 @@ pip3 install -r requirements.txt
```bash
pip3 install -r requirements-optional.txt
```
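> On networks in mainland China, a mirror can speed up installation: `pip3 install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple`
If a dependency fails to install, comment out the corresponding line and try again.
**(4) Install Cow CLI (recommended)**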
```bash
pip3 install -e .
```
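After installation, you can use the `cow` command to manage the service (start, stop, update, etc.) and skills. See the [command docs](https://docs.cowagent.ai/cli/index).
**(5) Install the browser tool (optional)**
If you need the Agent to operate a browser (e.g. visiting web pages or filling in forms), install the browser dependencies: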
```bash
cow install-browser
```
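This command automatically installs `playwright` and the Chromium browser, and uses a mirror automatically on networks in mainland China. See the [browser tool docs](https://docs.cowagent.ai/tools/browser).
## 2. Configuration
The configuration template is `config-template.json` in the root directory; copy it to create the `config.json` file that actually takes effect:
The configuration template is `config-template.json` in the root directory; copy it to create the `config.json` file that actually takes effect: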
```bash
cp config-template.json config.json
```
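Then fill in `config.json`. The defaults are explained below and can be customized as needed (remove the comments in actual use so the file is valid JSON):
Then fill in `config.json`. The defaults are explained below and can be customized as needed (remove the comments in actual use so the file is valid JSON):
```bash
# Example config.json contents
{
"channel_type": "weixin", # Channel type, defaults to weixin; can be changed to feishu,dingtalk,wecom_bot,qq,wechatcom_app,wechatmp_service,wechatmp,terminal
"channel_type": "weixin", # Channel type, defaults to weixin; can be changed to feishu,dingtalk,wecom_bot,qq,wechatcom_app,wechatmp_service,wechatmp,terminal
"model": "MiniMax-M2.7", # Model name
"minimax_api_key": "", # MiniMax API Key
"zhipu_ai_api_key": "", # Zhipu GLM API Key
"zhipu_ai_api_key": "", # Zhipu GLM API Key
"moonshot_api_key": "", # Kimi/Moonshot API Key
"ark_api_key": "", # Doubao (Volcano Engine Ark) API Key
"dashscope_api_key": "", # Bailian (Tongyi Qianwen) API Key
"dashscope_api_key": "", # Bailian (Tongyi Qianwen) API Key
"claude_api_key": "", # Claude API Key
"claude_api_base": "https://api.anthropic.com/v1", # Claude API address; change it to connect through a third-party proxy platform
"gemini_api_key": "", # Gemini API Key
"gemini_api_base": "https://generativelanguage.googleapis.com", # Gemini API address
"gemini_api_base": "https://generativelanguage.googleapis.com", # Gemini API address
"deepseek_api_key": "", # DeepSeek API Key
"deepseek_api_base": "https://api.deepseek.com/v1", # DeepSeek API address; can be changed to a third-party proxy
"open_ai_api_key": "", # OpenAI API Key
"open_ai_api_base": "https://api.openai.com/v1", # OpenAI API address
"linkai_api_key": "", # LinkAI API Key
"proxy": "", # IP and port of the proxy client; fill this in if your environment (e.g. in mainland China) needs a proxy, such as "127.0.0.1:7890"
"proxy": "", # IP and port of the proxy client; fill this in if your environment (e.g. in mainland China) needs a proxy, such as "127.0.0.1:7890"
"speech_recognition": false, # Whether to enable speech recognition
"group_speech_recognition": false, # Whether to enable speech recognition in group chats
"voice_reply_voice": false, # Whether to reply to voice messages with voice
"use_linkai": false, # Whether to use the LinkAI API; off by default, set to true to use LinkAI platform models
"agent": true, # Whether to enable Agent mode, which adds multi-step tool decisions, long-term memory, Skills, and more
"agent_workspace": "~/cow", # Agent workspace path, used to store memory, skills, system settings, etc.
"agent_max_context_tokens": 40000, # Maximum context tokens in Agent mode; the earliest context is dropped when exceeded
"agent_max_context_turns": 30, # Maximum context turns in Agent mode; each turn is one user question and one AI reply
"agent_max_steps": 15 # Maximum decision steps per task in Agent mode; tool calls stop once this is exceeded
"use_linkai": false, # Whether to use the LinkAI API; off by default, set to true to use LinkAI platform models
"agent": true, # Whether to enable Agent mode, which adds multi-step tool decisions, long-term memory, Skills, and more
"agent_workspace": "~/cow", # Agent workspace path, used to store memory, skills, system settings, etc.
"agent_max_context_tokens": 40000, # Maximum context tokens in Agent mode; the earliest context is dropped when exceeded
"agent_max_context_turns": 30, # Maximum context turns in Agent mode; each turn is one user question and one AI reply
"agent_max_steps": 15 # Maximum decision steps per task in Agent mode; tool calls stop once this is exceeded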
}
```
@@ -176,23 +211,24 @@ pip3 install -r requirements-optional.txt
<details>
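<summary>1. Voice configuration</summary>
+ Adding `"speech_recognition": true` enables speech recognition: by default OpenAI's whisper model transcribes voice to text and the reply is sent as text. This option applies only to private chats (note: because voice messages cannot match a prefix, once it is enabled every voice message gets an automatic reply; voice-triggered image generation is supported)
+ Adding `"group_speech_recognition": true` enables speech recognition in group chats: by default OpenAI's whisper model transcribes voice to text and the reply is sent as text. This option applies only to group chats (it matches group_chat_prefix and group_chat_keyword; voice-triggered image generation is supported)
+ Adding `"speech_recognition": true` enables speech recognition: by default OpenAI's whisper model transcribes voice to text and the reply is sent as text. This option applies only to private chats (note: because voice messages cannot match a prefix, once it is enabled every voice message gets an automatic reply; voice-triggered image generation is supported)
+ Adding `"group_speech_recognition": true` enables speech recognition in group chats: by default OpenAI's whisper model transcribes voice to text and the reply is sent as text. This option applies only to group chats (it matches group_chat_prefix and group_chat_keyword; voice-triggered image generation is supported)
+ Adding `"voice_reply_voice": true` enables voice replies to voice messages (applies to both private and group chats)
+ To use MiniMax TTS, set `"text_to_voice": "minimax"` and configure `minimax_api_key`; use `"tts_voice_id"` to choose a voice (e.g. `English_Graceful_Lady`) and `"text_to_voice_model"` to choose a model (e.g. `speech-2.8-hd` or `speech-2.8-turbo`)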
</details>
<details>
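<summary>2. Other configuration</summary>
+ `model`: model name. Recommended for Agent mode: `MiniMax-M2.7`, `glm-5-turbo`, `kimi-k2.5`, `qwen3.5-plus`, `claude-sonnet-4-6`, `gemini-3.1-pro-preview`. For all model names, see the [common/const.py](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/common/const.py) file
+ `character_desc`: the bot's system prompt in normal chat mode. It has no effect in Agent mode, where the prompt is built from the files in the workspace.
+ `subscribe_msg`: subscription message, for Official Account and WeCom channels; it is sent automatically when a user subscribes and may contain special placeholders. The currently supported placeholder is {trigger_prefix}, which the program replaces with the bot's trigger word.
+ `model`: model name. Recommended for Agent mode: `MiniMax-M2.7`, `glm-5-turbo`, `kimi-k2.5`, `qwen3.6-plus`, `claude-sonnet-4-6`, `gemini-3.1-pro-preview`. For all model names, see the [common/const.py](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/common/const.py) file
+ `character_desc`: the bot's system prompt in normal chat mode. It has no effect in Agent mode, where the prompt is built from the files in the workspace.
+ `subscribe_msg`: subscription message, for Official Account and WeCom channels; it is sent automatically when a user subscribes and may contain special placeholders. The currently supported placeholder is {trigger_prefix}, which the program replaces with the bot's trigger word.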
</details>
<details>
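<summary>3. LinkAI configuration</summary>
<summary>3. LinkAI configuration</summary>
+ `use_linkai`: whether to use the LinkAI API; off by default. Set it to true to connect to the LinkAI platform and use models, knowledge bases, workflows, plugins, and other skills; see the [API docs](https://docs.link-ai.tech/platform/api/chat)
+ `use_linkai`: whether to use the LinkAI API; off by default. Set it to true to connect to the LinkAI platform and use models, knowledge bases, workflows, plugins, and other skills; see the [API docs](https://docs.link-ai.tech/platform/api/chat)
+ `linkai_api_key`: LinkAI API Key, which can be created in the [console](https://link-ai.tech/console/interface)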
</details>
@@ -205,31 +241,41 @@ pip3 install -r requirements-optional.txt
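To **run locally** on a personal computer, run the following in the project root directory:
```bash
python3 app.py # on Windows this command is usually python app.py
cow start # recommended; requires Cow CLI to be installed first
python3 app.py # or run directly; on Windows this command is usually python app.py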
```
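By default this starts a web service; visit `http://localhost:9899/chat` to chat in the browser.
By default this starts a web service; visit `http://localhost:9899/chat` to chat in the browser.
To connect other application channels, just change the `channel_type` parameter in `config.json`; for details, see: [Channel Notes](#通道说明).
### 2. Server Deployment
On a server, you can use the `nohup` command to run the program in the background:
Using the `cow` command to manage the service is recommended: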
```bash
cow start # start in the background
cow stop # stop the service
cow restart # restart the service
cow status # check running status
cow logs # view logs
cow update # pull the latest code and restart
```
You can also run it in the background the traditional way:
```bash
nohup python3 app.py & tail -f nohup.out
```
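The program then runs in the server background. Pressing `ctrl+c` closes the log output without affecting the background process. Use `ps -ef | grep app.py | grep -v grep` to see the background process; to restart the program, first `kill` that process. To reopen the log after closing it, run `tail -f nohup.out`.
In addition, the `run.sh` script in the project root also supports one-command service management, including `./run.sh start`, `./run.sh stop`, and `./run.sh restart`; run `./run.sh help` for full usage.
In addition, the `run.sh` script in the project root supports one-command startup and service management, including `./run.sh start`, `./run.sh stop`, `./run.sh restart`, `./run.sh logs`, and more; run `./run.sh help` for full usage.
> To access the web console from a browser, make sure port `9899` on the server is allowed in the firewall or security group; for security, open it only to specific IPs.
> To access the web console from a browser, make sure port `9899` on the server is allowed in the firewall or security group; for security, open it only to specific IPs.
### 3. Docker Deployment
Docker deployment requires neither downloading the source nor installing dependencies: just fetch the `docker-compose.yml` file and start the container. For Agent mode, deploying from source is recommended for broader system access.
Docker deployment requires neither downloading the source nor installing dependencies: just fetch the `docker-compose.yml` file and start the container. For Agent mode, deploying from source is recommended for broader system access.
> Prerequisite: `docker` and `docker-compose` must be installed. After installation, `docker -v` and `docker-compose version` (or `docker compose version`) should print their version numbers. Install them from the [Docker website](https://docs.docker.com/engine/install/).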
@@ -249,22 +295,22 @@ curl -O https://cdn.link-ai.tech/code/cow/docker-compose.yml
sudo docker compose up -d # if docker-compose is version 1.X, run `sudo docker-compose up -d` instead
```
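After the command runs, the latest release image is pulled automatically from [docker hub](https://hub.docker.com/r/zhayujie/chatgpt-on-wechat). If `sudo docker ps` shows a container whose NAMES is chatgpt-on-wechat, it is running successfully. Finally, run the following command to view the container's logs:
After the command runs, the latest release image is pulled automatically from [docker hub](https://hub.docker.com/r/zhayujie/chatgpt-on-wechat). If `sudo docker ps` shows a container whose NAMES is chatgpt-on-wechat, it is running successfully. Finally, run the following command to view the container's logs: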
```bash
sudo docker logs -f chatgpt-on-wechat
```
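> To access the web console from a browser, make sure port `9899` on the server is allowed in the firewall or security group; for security, open it only to specific IPs.
> To access the web console from a browser, make sure port `9899` on the server is allowed in the firewall or security group; for security, open it only to specific IPs.
## Model Notes
The following describes configuration and usage for every supported model; the model interfaces are implemented in the project's `models/` directory.
Managing model configuration online through the web console is recommended, since it needs no manual file editing; see the [model docs](https://docs.cowagent.ai/models). The following explains how to configure models by editing `config.json` manually: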
<details>
<summary>OpenAI</summary>
1. Create an API Key on the [OpenAI platform](https://platform.openai.com/api-keys)
1. Create an API Key on the [OpenAI platform](https://platform.openai.com/api-keys)
2. Fill in the configuration
@@ -277,15 +323,15 @@ sudo docker logs -f chatgpt-on-wechat
}
```
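- `model`: matches the [model parameter](https://platform.openai.com/docs/models) of the OpenAI API; supports gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, the o series, gpt-4.1, and other models. `gpt-5.4` or `gpt-5.4-mini` is recommended for Agent mode
- `model`: matches the [model parameter](https://platform.openai.com/docs/models) of the OpenAI API; supports gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, the o series, gpt-4.1, and other models. `gpt-5.4` or `gpt-5.4-mini` is recommended for Agent mode
- `open_ai_api_base`: change this parameter to connect through a third-party proxy API
- `bot_type`: not needed for OpenAI models. Set it to `openai` when accessing non-OpenAI models such as Claude through a third-party proxy API
- `bot_type`: not needed for OpenAI models. Set it to `openai` when accessing non-OpenAI models such as Claude through a third-party proxy API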
</details>
<details>
<summary>LinkAI</summary>
1. Create an API Key on the [LinkAI platform](https://link-ai.tech/console/interface)
1. Create an API Key on the [LinkAI platform](https://link-ai.tech/console/interface)
2. Fill in the configuration
@@ -297,8 +343,8 @@ sudo docker logs -f chatgpt-on-wechat
}
```
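+ `use_linkai`: whether to use the LinkAI API; off by default. Set it to true to use LinkAI platform models together with rich Agent skills such as knowledge bases, workflows, databases, and plugins
+ `linkai_api_key`: API Key for the LinkAI platform, which can be created in the [console](https://link-ai.tech/console/interface)
+ `use_linkai`: whether to use the LinkAI API; off by default. Set it to true to use LinkAI platform models together with rich Agent skills such as knowledge bases, workflows, databases, and plugins
+ `linkai_api_key`: API Key for the LinkAI platform, which can be created in the [console](https://link-ai.tech/console/interface)
+ `model`: any model in the [model list](https://link-ai.tech/console/models) can be used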
</details>
@@ -313,10 +359,10 @@ sudo docker logs -f chatgpt-on-wechat
"minimax_api_key": ""
}
```
- `model`: can be set to `MiniMax-M2.7, MiniMax-M2.5, MiniMax-M2.1, MiniMax-M2.1-lightning, MiniMax-M2, abab6.5-chat`
- `minimax_api_key`: API key for the MiniMax platform, created in the [console](https://platform.minimaxi.com/user-center/basic-information/interface-key)
- `model`: can be set to `MiniMax-M2.7, MiniMax-M2.7-highspeed, MiniMax-M2.5, MiniMax-M2.1, MiniMax-M2.1-lightning, MiniMax-M2, abab6.5-chat`
- `minimax_api_key`: API key for the MiniMax platform, created in the [console](https://platform.minimaxi.com/user-center/basic-information/interface-key)
Option 2: OpenAI-compatible access, configured as follows:
Option 2: OpenAI-compatible access, configured as follows:
```json
{
"bot_type": "openai",
@@ -325,10 +371,10 @@ sudo docker logs -f chatgpt-on-wechat
"open_ai_api_key": ""
}
```
- `bot_type`: OpenAI-compatible mode
- `model`: can be set to `MiniMax-M2.7, MiniMax-M2.5, MiniMax-M2.1, MiniMax-M2.1-lightning, MiniMax-M2`; see the [API docs](https://platform.minimaxi.com/document/%E5%AF%B9%E8%AF%9D?key=66701d281d57f38758d581d0#QklxsNSbaf6kM4j6wjO5eEek)
- `open_ai_api_base`: BASE URL of the MiniMax platform API
- `open_ai_api_key`: API key for the MiniMax platform
- `bot_type`: OpenAI-compatible mode
- `model`: can be set to `MiniMax-M2.7, MiniMax-M2.7-highspeed, MiniMax-M2.5, MiniMax-M2.1, MiniMax-M2.1-lightning, MiniMax-M2`; see the [API docs](https://platform.minimaxi.com/document/%E5%AF%B9%E8%AF%9D?key=66701d281d57f38758d581d0#QklxsNSbaf6kM4j6wjO5eEek)
- `open_ai_api_base`: BASE URL of the MiniMax platform API
- `open_ai_api_key`: API key for the MiniMax platform
</details>
<details>
@@ -342,10 +388,10 @@ sudo docker logs -f chatgpt-on-wechat
"zhipu_ai_api_key": ""
}
```
- `model`: Accepts `glm-5-turbo、glm-5、glm-4.7、glm-4-plus、glm-4-flash、glm-4-air、glm-4-airx、glm-4-long`, among others; see [GLM series model codes](https://bigmodel.cn/dev/api/normal-model/glm-4)
- `zhipu_ai_api_key`: The Zhipu AI platform API KEY, created in the [console](https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys)
Option 2: OpenAI-compatible access, configured as follows:
```json
{
"bot_type": "openai",
@@ -354,38 +400,38 @@ sudo docker logs -f chatgpt-on-wechat
"open_ai_api_key": ""
}
```
- `bot_type`: OpenAI-compatible mode
- `model`: Accepts `glm-5-turbo、glm-5、glm-4.7、glm-4-plus、glm-4-flash、glm-4-air、glm-4-airx、glm-4-long`
- `open_ai_api_base`: The BASE URL of the Zhipu AI platform
- `open_ai_api_key`: The Zhipu AI platform API KEY
</details>
<details>
<summary>Tongyi Qianwen (Qwen)</summary>
Option 1: Official SDK access (recommended), configured as follows:
```json
{
"model": "qwen3.5-plus",
"model": "qwen3.6-plus",
"dashscope_api_key": "sk-qVxxxxG"
}
```
- `model`: Accepts `qwen3.5-plus、qwen3-max、qwen-max、qwen-plus、qwen-turbo、qwen-long、qwq-plus`
- `dashscope_api_key`: The Qwen API-KEY. See the [official documentation](https://bailian.console.aliyun.com/?tab=api#/api) and create it in the [console](https://bailian.console.aliyun.com/?tab=model#/api-key)
- `model`: Accepts `qwen3.6-plus、qwen3.5-plus、qwen3-max、qwen-max、qwen-plus、qwen-turbo、qwen-long、qwq-plus`
- `dashscope_api_key`: The Qwen API-KEY. See the [official documentation](https://bailian.console.aliyun.com/?tab=api#/api) and create it in the [Bailian console](https://bailian.console.aliyun.com/?tab=model#/api-key)
Option 2: OpenAI-compatible access, configured as follows:
```json
{
"bot_type": "openai",
"model": "qwen3.5-plus",
"model": "qwen3.6-plus",
"open_ai_api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
"open_ai_api_key": "sk-qVxxxxG"
}
```
- `bot_type`: OpenAI-compatible mode
- `model`: Supports all official models; see the [model list](https://help.aliyun.com/zh/model-studio/models?spm=a2c4g.11186623.0.0.78d84823Kth5on#9f8890ce29g5u)
- `open_ai_api_base`: The BASE URL of the Qwen API
- `open_ai_api_key`: The Qwen API-KEY
</details>
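If you are wiring up an "OpenAI-compatible" option by hand, the sketch below shows what the `open_ai_api_base` / `open_ai_api_key` pair typically becomes. It assumes only the common OpenAI-compatible convention of `POST {base}/chat/completions` with a Bearer token. `build_chat_request` is a hypothetical helper and is not part of this project. The project's own request code may differ.

```python
import json
from typing import Dict, Tuple


def build_chat_request(config: Dict[str, str], prompt: str) -> Tuple[str, Dict[str, str], bytes]:
    """Build (url, headers, body) for an OpenAI-compatible chat completion call.

    Hypothetical helper: assumes the conventional {base}/chat/completions route.
    """
    if config.get("bot_type") != "openai":
        raise ValueError("OpenAI-compatible access expects bot_type 'openai'")
    # Tolerate a trailing slash in open_ai_api_base.
    url = config["open_ai_api_base"].rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {config['open_ai_api_key']}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": config["model"],
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return url, headers, body


if __name__ == "__main__":
    qwen = {
        "bot_type": "openai",
        "model": "qwen3.6-plus",
        "open_ai_api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
        "open_ai_api_key": "sk-qVxxxxG",
    }
    url, headers, body = build_chat_request(qwen, "Hello")
    print(url)
```

You can pass the returned tuple to `urllib.request.Request(url, data=body, headers=headers)` to send the request. The sketch itself performs no network I/O.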
@@ -401,9 +447,9 @@ sudo docker logs -f chatgpt-on-wechat
}
```
- `model`: Accepts `kimi-k2.5、kimi-k2、moonshot-v1-8k、moonshot-v1-32k、moonshot-v1-128k`
- `moonshot_api_key`: The Moonshot API-KEY, created in the [console](https://platform.moonshot.cn/console/api-keys)
Option 2: OpenAI-compatible access, configured as follows:
```json
{
"bot_type": "openai",
@@ -412,16 +458,16 @@ sudo docker logs -f chatgpt-on-wechat
"open_ai_api_key": ""
}
```
- `bot_type`: OpenAI-compatible mode
- `model`: Accepts `kimi-k2.5、kimi-k2、moonshot-v1-8k、moonshot-v1-32k、moonshot-v1-128k`
- `open_ai_api_base`: The Moonshot BASE URL
- `open_ai_api_key`: The Moonshot API-KEY
</details>
<details>
<summary>Doubao</summary>
1. Create an API Key in the [Volcano Engine Ark console](https://console.volcengine.com/ark/region:ark+cn-beijing/apikey)
2. Fill in the configuration
@@ -439,7 +485,7 @@ sudo docker logs -f chatgpt-on-wechat
<details>
<summary>Claude</summary>
1. Create an API Key in the [Claude console](https://console.anthropic.com/settings/keys)
2. Fill in the configuration
@@ -455,7 +501,7 @@ sudo docker logs -f chatgpt-on-wechat
<details>
<summary>Gemini</summary>
Create an API Key in the [console](https://aistudio.google.com/app/apikey?hl=zh-cn), then configure as follows:
```json
{
"model": "gemini-3.1-flash-lite-preview",
@@ -468,30 +514,40 @@ API Key创建在 [控制台](https://aistudio.google.com/app/apikey?hl=zh-cn)
<details>
<summary>DeepSeek</summary>
1. Create an API Key on the [DeepSeek platform](https://platform.deepseek.com/api_keys)
2. Fill in the configuration
Option 1: Official access (recommended):
```json
{
"model": "deepseek-chat",
"open_ai_api_key": "sk-xxxxxxxxxxx",
"open_ai_api_base": "https://api.deepseek.com/v1",
"bot_type": "openai"
"deepseek_api_key": "sk-xxxxxxxxxxx"
}
```
- `bot_type`: OpenAI-compatible mode
- `model`: Accepts `deepseek-chat、deepseek-reasoner`, which correspond to the DeepSeek-V3 and DeepSeek-R1 models respectively
- `open_ai_api_key`: The DeepSeek platform API Key
- `open_ai_api_base`: The DeepSeek platform BASE URL
</details>
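- `model`: Accepts `deepseek-chat、deepseek-reasoner`, which correspond to DeepSeek-V3.2 (non-thinking mode) and DeepSeek-R1 (thinking mode) respectively
- `deepseek_api_key`: The DeepSeek platform API Key
- `deepseek_api_base`: Optional. Defaults to `https://api.deepseek.com/v1` and can be changed to a third-party proxy address
Option 2: OpenAI-compatible access: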
```json
{
"model": "deepseek-chat",
"bot_type": "openai",
"open_ai_api_key": "sk-xxxxxxxxxxx",
"open_ai_api_base": "https://api.deepseek.com/v1"
}
```
</details>
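The two DeepSeek options above differ only in which keys carry the endpoint and the credential. The sketch below shows one way a loader might normalize them. `resolve_deepseek_endpoint` is a hypothetical helper, and this diff does not show the project's actual precedence rules. The default base comes from the `deepseek_api_base` description above.

```python
from typing import Dict, Tuple

# Default documented for `deepseek_api_base` in the list above.
DEFAULT_DEEPSEEK_BASE = "https://api.deepseek.com/v1"


def resolve_deepseek_endpoint(config: Dict[str, str]) -> Tuple[str, str]:
    """Return (api_base, api_key) for either DeepSeek configuration style.

    Hypothetical helper: Option 2 (bot_type "openai") reads the OpenAI-compatible
    keys; Option 1 reads the deepseek_* keys and falls back to the default base.
    """
    if config.get("bot_type") == "openai":
        return config["open_ai_api_base"], config["open_ai_api_key"]
    base = config.get("deepseek_api_base") or DEFAULT_DEEPSEEK_BASE
    return base, config["deepseek_api_key"]
```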
<details>
<summary>Azure</summary>
1. Create an API Key on the [Azure platform](https://oai.azure.com/)
2. Fill in the configuration
@@ -508,15 +564,15 @@ API Key创建在 [控制台](https://aistudio.google.com/app/apikey?hl=zh-cn)
- `model`: Leave empty
- `use_azure_chatgpt`: Set to true
- `open_ai_api_key`: The Azure platform key
- `open_ai_api_base`: The Azure platform BASE URL
- `azure_deployment_id`: The name of the model deployed on the Azure platform
- `azure_api_version`: The API version. This value and the parameters above can be found on the deployment's [model configuration](https://oai.azure.com/resource/deployments) page
</details>
<details>
<summary>Baidu Wenxin</summary>
Option 1: Official SDK access, configured as follows:
```json
{
@@ -529,7 +585,7 @@ API Key创建在 [控制台](https://aistudio.google.com/app/apikey?hl=zh-cn)
- `baidu_wenxin_api_key`: Obtain the API Key by following the [Qianfan platform: access_token authentication](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/dlv4pct3s) documentation
- `baidu_wenxin_secret_key`: Obtain the Secret Key by following the [Qianfan platform: access_token authentication](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/dlv4pct3s) documentation
Option 2: OpenAI-compatible access, configured as follows:
```json
{
"bot_type": "openai",
@@ -538,10 +594,10 @@ API Key创建在 [控制台](https://aistudio.google.com/app/apikey?hl=zh-cn)
"open_ai_api_key": "bce-v3/ALTxxxxxxd2b"
}
```
- `bot_type`: OpenAI-compatible mode
- `model`: Supports all official models; see the [model list](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Wm9cvy6rl)
- `open_ai_api_base`: The BASE URL of the Baidu Wenxin API
- `open_ai_api_key`: The Baidu Wenxin API-KEY. See the [official documentation](https://cloud.baidu.com/doc/qianfan-api/s/ym9chdsy5) and create an API Key in the [console](https://console.bce.baidu.com/iam/#/iam/apikey/list)
</details>
@@ -565,7 +621,7 @@ API Key创建在 [控制台](https://aistudio.google.com/app/apikey?hl=zh-cn)
- `xunfei_domain`: Accepts `4.0Ultra、generalv3.5、max-32k、generalv3、pro-128k、lite`
- `xunfei_spark_url`: Fill in according to the [official documentation: request URLs](https://www.xfyun.cn/doc/spark/Web.html#_1-1-%E8%AF%B7%E6%B1%82%E5%9C%B0%E5%9D%80)
Option 2: OpenAI-compatible access, configured as follows:
```json
{
"bot_type": "openai",
@@ -574,7 +630,7 @@ API Key创建在 [控制台](https://aistudio.google.com/app/apikey?hl=zh-cn)
"open_ai_api_key": ""
}
```
- `bot_type`: OpenAI-compatible mode
- `model`: Accepts `4.0Ultra、generalv3.5、max-32k、generalv3、pro-128k、lite`
- `open_ai_api_base`: The BASE URL of the iFlytek Spark platform
- `open_ai_api_key`: The [APIPassword](https://console.xfyun.cn/services/bm3) of the iFlytek Spark platform, which varies by model
@@ -593,10 +649,10 @@ API Key创建在 [控制台](https://aistudio.google.com/app/apikey?hl=zh-cn)
}
```
- `bot_type`: modelscope API format
- `model`: See the [model list](https://www.modelscope.cn/models?filter=inference_type&page=1)
- `modelscope_api_key`: See [official documentation: access tokens](https://modelscope.cn/docs/accounts/token) and create one in the [console](https://modelscope.cn/my/myaccesstoken)
- `modelscope_base_url`: The BASE URL of the modelscope platform
- `text_to_image`: Image generation model; see the [model list](https://www.modelscope.cn/models?filter=inference_type&page=1)
</details>
@@ -614,13 +670,13 @@ Coding Plan 是各厂商推出的编程包月套餐,所有厂商均可通过 O
}
```
Supported providers currently include Alibaba Cloud, MiniMax, Zhipu GLM, Kimi, and Volcano Engine. For each provider's detailed configuration, see the [Coding Plan documentation](https://docs.cowagent.ai/models/coding-plan).
</details>
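## Channels
This section explains how to configure each supported channel. The channel code lives in the project's `channel/` directory.
We recommend managing channel configuration online in the Web console instead of editing files by hand; see the [channel documentation](https://docs.cowagent.ai/channels/weixin). The instructions below cover configuring channels by editing `config.json` manually:
Multiple channels can be connected at the same time. Separate them with commas, e.g. `"channel_type": "feishu,dingtalk"`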
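Below is one way to split the comma-separated `channel_type` value. This is only a sketch. `parse_channel_types` is a hypothetical name, not the project's parser. Tolerating surrounding spaces and a trailing comma is an assumption, and the real parser may be stricter.

```python
from typing import List


def parse_channel_types(value: str) -> List[str]:
    """Split a comma-separated channel_type value into channel names."""
    # Strip whitespace around each name and drop empty items (e.g. a trailing comma).
    return [name.strip() for name in value.split(",") if name.strip()]
```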
@@ -644,7 +700,7 @@ Coding Plan 是各厂商推出的编程包月套餐,所有厂商均可通过 O
<details>
<summary>2. Web</summary>
After the project starts, the Web console runs by default. It is configured as follows:
```json
{
@@ -815,8 +871,10 @@ QQ 机器人使用 WebSocket 长连接模式,无需公网 IP 和域名,支
# 🔗 Related Projects
- [AgentMesh](https://github.com/MinimalFuture/AgentMesh): An open-source multi-agent framework that solves complex problems by having a team of agents collaborate. This project builds an [Agent plugin](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/plugins/agent/README.md) on top of it. The plugin can use tools such as the terminal, browser, file system, and search engines, and supports multi-agent collaboration
- [Cow Skill Hub](https://github.com/zhayujie/cow-skill-hub): An open-source AI Agent skill marketplace for browsing, searching, installing, and publishing skills. It supports CowAgent, OpenClaw, Claude Code, and other Agents
- [bot-on-anything](https://github.com/zhayujie/bot-on-anything): A lightweight, highly extensible LLM application framework that connects to overseas platforms such as Slack, Telegram, Discord, and Gmail. It can be used alongside this project
- [AgentMesh](https://github.com/MinimalFuture/AgentMesh): An open-source multi-agent framework that solves complex problems by having a team of agents collaborate.
@@ -828,7 +886,7 @@ FAQs <https://github.com/zhayujie/chatgpt-on-wechat/wiki/FAQs>
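# 🛠️ Development
Contributions of new application channels are welcome. Use the [Feishu channel](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/channel/feishu/feishu_channel.py) as a reference when adding a custom channel; implementing the message receive and send logic is all an integration needs. New Skills are also welcome; see the [Skill Creator guide](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/skills/skill-creator/SKILL.md)
Contributions of new application channels are welcome. Use the [Feishu channel](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/channel/feishu/feishu_channel.py) as a reference when adding a custom channel; implementing the message receive and send logic is all an integration needs. New Skills are also welcome; submit them on [Skill Hub](https://skills.cowagent.ai/submit)
# ✉ Contact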


@@ -57,7 +57,16 @@ class ChatService:
            event_type = event.get("type")
            data = event.get("data", {})
            if event_type == "message_update":
            if event_type == "reasoning_update":
                delta = data.get("delta", "")
                if delta:
                    send_chunk_fn({
                        "chunk_type": "reasoning",
                        "delta": delta,
                        "segment_id": state.segment_id,
                    })
            elif event_type == "message_update":
                # Incremental text delta
                delta = data.get("delta", "")
                if delta:
@@ -75,6 +84,23 @@ class ChatService:
                # a new segment; collect tool results until turn_end.
                state.pending_tool_results = []
            elif event_type == "file_to_send":
                url = data.get("url") or ""
                if url:
                    fname = data.get("file_name") or "file"
                    ft = data.get("file_type") or "file"
                    if ft == "image":
                        link = f"![{fname}]({url})"
                    else:
                        link = f"[{fname}]({url})"
                    send_chunk_fn({
                        "chunk_type": "content",
                        "delta": "\n\n" + link + "\n\n",
                        "segment_id": state.segment_id,
                    })
                    # Remove url so the model won't repeat it in its reply
                    data.pop("url", None)
            elif event_type == "tool_execution_start":
                # Notify the client that a tool is about to run (with its input args)
                tool_name = data.get("tool_name", "")
@@ -166,10 +192,56 @@ class ChatService:
                logger.info("[ChatService] Cleared agent message history after executor recovery")
            raise
        # Append only the NEW messages from this execution (thread-safe)
        # Sync executor messages back to agent (thread-safe).
        # The executor may have trimmed context, making its list shorter than
        # original_length. In that case we must replace entirely — just
        # appending would leave stale pre-trim messages in agent.messages
        # and cause the same trim to fire on every subsequent request.
        with agent.messages_lock:
            new_messages = executor.messages[original_length:]
            agent.messages.extend(new_messages)
            trimmed = len(executor.messages) < original_length
            if trimmed:
                # Context was trimmed: the executor appended the new user
                # query *before* trimming, so the new messages (user +
                # assistant + tools) sit at the tail of the trimmed list.
                # We cannot simply slice at original_length (it exceeds the
                # list length). Instead, count how many messages the
                # executor added on top of the post-trim baseline.
                #
                # Timeline inside executor.run_stream:
                #   1. messages had `original_length` items
                #   2. append user query → original_length + 1
                #   3. _trim_messages() → some smaller number (includes the
                #      user query because it belongs to the last turn)
                #   4. LLM replies / tool calls appended
                #
                # The user query message is always the first message of the
                # last turn (it cannot be trimmed away), so we locate it to
                # find where "new" messages begin.
                new_start = original_length  # fallback
                for idx in range(len(executor.messages) - 1, -1, -1):
                    msg = executor.messages[idx]
                    if msg.get("role") == "user":
                        content = msg.get("content", [])
                        is_user_query = False
                        if isinstance(content, list):
                            has_text = any(
                                isinstance(b, dict) and b.get("type") == "text"
                                for b in content
                            )
                            has_tool_result = any(
                                isinstance(b, dict) and b.get("type") == "tool_result"
                                for b in content
                            )
                            is_user_query = has_text and not has_tool_result
                        elif isinstance(content, str):
                            is_user_query = True
                        if is_user_query:
                            new_start = idx
                            break
                new_messages = list(executor.messages[new_start:])
            else:
                new_messages = list(executor.messages[original_length:])
            agent.messages = list(executor.messages)
        # Persist new messages to SQLite so they survive restarts and
        # can be queried via the HISTORY interface.
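The trimmed-list branch above is easier to follow as a standalone function. The sketch below restates the same backward scan over plain message dicts shaped like the content blocks in this hunk. The helper names `is_user_query` and `new_messages_after_run` are hypothetical. The sketch covers only the slicing decision and leaves out the locking and SQLite persistence.

```python
from typing import Any, Dict, List


def is_user_query(msg: Dict[str, Any]) -> bool:
    """True for a text-bearing user message, False for tool_result-only turns."""
    if msg.get("role") != "user":
        return False
    content = msg.get("content", [])
    if isinstance(content, str):
        return True
    if isinstance(content, list):
        blocks = [b for b in content if isinstance(b, dict)]
        has_text = any(b.get("type") == "text" for b in blocks)
        has_tool_result = any(b.get("type") == "tool_result" for b in blocks)
        return has_text and not has_tool_result
    return False


def new_messages_after_run(messages: List[Dict[str, Any]], original_length: int) -> List[Dict[str, Any]]:
    """Return the messages a run added, whether or not the list was trimmed."""
    if len(messages) >= original_length:
        # No trim: everything past the old length is new.
        return list(messages[original_length:])
    # Trimmed: the latest real user query marks where this run began.
    for idx in range(len(messages) - 1, -1, -1):
        if is_user_query(messages[idx]):
            return list(messages[idx:])
    # Same fallback as the hunk: slicing past the end yields an empty list.
    return list(messages[original_length:])
```

The backward scan needs the `is_user_query` filter because tool results also arrive as `role: "user"` messages. The first match from the end must therefore be a text-bearing user turn, which is the query that survived trimming.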


agent/knowledge/service.py

@@ -0,0 +1,218 @@
"""
Knowledge service for handling knowledge base operations.

Provides a unified interface for listing, reading, and graphing knowledge files,
callable from the web console, API, or CLI.

Knowledge file layout (under workspace_root):
    knowledge/index.md
    knowledge/log.md
    knowledge/<category>/<slug>.md
"""

import os
import re
from pathlib import Path
from typing import Optional

from common.log import logger
from config import conf


class KnowledgeService:
    """
    High-level service for knowledge base queries.
    Operates directly on the filesystem.
    """

    def __init__(self, workspace_root: str):
        self.workspace_root = workspace_root
        self.knowledge_dir = os.path.join(workspace_root, "knowledge")

    # ------------------------------------------------------------------
    # list — directory tree with stats
    # ------------------------------------------------------------------

    def list_tree(self) -> dict:
        """
        Return the knowledge directory tree grouped by category.

        Returns::

            {
                "tree": [
                    {
                        "dir": "concepts",
                        "files": [
                            {"name": "moe.md", "title": "MoE", "size": 1234},
                            ...
                        ]
                    },
                    ...
                ],
                "stats": {"pages": 15, "size": 32768},
                "enabled": true
            }
        """
        if not os.path.isdir(self.knowledge_dir):
            return {"tree": [], "stats": {"pages": 0, "size": 0}, "enabled": conf().get("knowledge", True)}

        tree = []
        total_files = 0
        total_bytes = 0
        for name in sorted(os.listdir(self.knowledge_dir)):
            full = os.path.join(self.knowledge_dir, name)
            if not os.path.isdir(full) or name.startswith("."):
                continue
            files = []
            for fname in sorted(os.listdir(full)):
                if fname.endswith(".md") and not fname.startswith("."):
                    fpath = os.path.join(full, fname)
                    size = os.path.getsize(fpath)
                    total_files += 1
                    total_bytes += size
                    title = fname.replace(".md", "")
                    try:
                        with open(fpath, "r", encoding="utf-8") as f:
                            first_line = f.readline().strip()
                            if first_line.startswith("# "):
                                title = first_line[2:].strip()
                    except Exception:
                        pass
                    files.append({"name": fname, "title": title, "size": size})
            tree.append({"dir": name, "files": files})

        return {
            "tree": tree,
            "stats": {"pages": total_files, "size": total_bytes},
            "enabled": conf().get("knowledge", True),
        }

    # ------------------------------------------------------------------
    # read — single file content
    # ------------------------------------------------------------------

    def read_file(self, rel_path: str) -> dict:
        """
        Read a single knowledge markdown file.

        :param rel_path: Relative path within knowledge/, e.g. ``concepts/moe.md``
        :return: dict with ``content`` and ``path``
        :raises ValueError: if path is invalid or escapes knowledge dir
        :raises FileNotFoundError: if file does not exist
        """
        if not rel_path or ".." in rel_path:
            raise ValueError("invalid path")

        full_path = os.path.normpath(os.path.join(self.knowledge_dir, rel_path))
        allowed = os.path.normpath(self.knowledge_dir)
        if not full_path.startswith(allowed + os.sep) and full_path != allowed:
            raise ValueError("path outside knowledge dir")

        if not os.path.isfile(full_path):
            raise FileNotFoundError(f"file not found: {rel_path}")

        with open(full_path, "r", encoding="utf-8") as f:
            content = f.read()
        return {"content": content, "path": rel_path}

    # ------------------------------------------------------------------
    # graph — nodes and links for visualization
    # ------------------------------------------------------------------

    def build_graph(self) -> dict:
        """
        Parse all knowledge pages and extract cross-reference links.

        Returns::

            {
                "nodes": [
                    {"id": "concepts/moe.md", "label": "MoE", "category": "concepts"},
                    ...
                ],
                "links": [
                    {"source": "concepts/moe.md", "target": "entities/deepseek.md"},
                    ...
                ]
            }
        """
        knowledge_path = Path(self.knowledge_dir)
        if not knowledge_path.is_dir():
            return {"nodes": [], "links": []}

        nodes = {}
        links = []
        link_re = re.compile(r'\[([^\]]*)\]\(([^)]+\.md)\)')

        for md_file in knowledge_path.rglob("*.md"):
            rel = str(md_file.relative_to(knowledge_path))
            if rel in ("index.md", "log.md"):
                continue
            parts = rel.split("/")
            category = parts[0] if len(parts) > 1 else "root"
            title = md_file.stem.replace("-", " ").title()
            try:
                content = md_file.read_text(encoding="utf-8")
                first_line = content.strip().split("\n")[0]
                if first_line.startswith("# "):
                    title = first_line[2:].strip()
                for _, link_target in link_re.findall(content):
                    resolved = (md_file.parent / link_target).resolve()
                    try:
                        target_rel = str(resolved.relative_to(knowledge_path))
                    except ValueError:
                        continue
                    if target_rel != rel:
                        links.append({"source": rel, "target": target_rel})
            except Exception:
                pass
            nodes[rel] = {"id": rel, "label": title, "category": category}

        valid_ids = set(nodes.keys())
        links = [l for l in links if l["source"] in valid_ids and l["target"] in valid_ids]
        seen = set()
        deduped = []
        for l in links:
            key = tuple(sorted([l["source"], l["target"]]))
            if key not in seen:
                seen.add(key)
                deduped.append(l)

        return {"nodes": list(nodes.values()), "links": deduped}

    # ------------------------------------------------------------------
    # dispatch — single entry point for protocol messages
    # ------------------------------------------------------------------

    def dispatch(self, action: str, payload: Optional[dict] = None) -> dict:
        """
        Dispatch a knowledge management action.

        :param action: ``list``, ``read``, or ``graph``
        :param payload: action-specific payload
        :return: protocol-compatible response dict
        """
        payload = payload or {}
        try:
            if action == "list":
                result = self.list_tree()
                return {"action": action, "code": 200, "message": "success", "payload": result}
            elif action == "read":
                path = payload.get("path")
                if not path:
                    return {"action": action, "code": 400, "message": "path is required", "payload": None}
                result = self.read_file(path)
                return {"action": action, "code": 200, "message": "success", "payload": result}
            elif action == "graph":
                result = self.build_graph()
                return {"action": action, "code": 200, "message": "success", "payload": result}
            else:
                return {"action": action, "code": 400, "message": f"unknown action: {action}", "payload": None}
        except ValueError as e:
            return {"action": action, "code": 403, "message": str(e), "payload": None}
        except FileNotFoundError as e:
            return {"action": action, "code": 404, "message": str(e), "payload": None}
        except Exception as e:
            logger.error(f"[KnowledgeService] dispatch error: action={action}, error={e}")
            return {"action": action, "code": 500, "message": str(e), "payload": None}
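You can exercise the link extraction in `build_graph` without a filesystem. The sketch below reuses the same regular expression but resolves targets with `posixpath` string operations instead of `Path.resolve()`. It therefore assumes forward-slash relative paths and ignores symlinks, which also sidesteps any mismatch between resolved targets and an unresolved `knowledge_dir`. `extract_links` is a hypothetical name.

```python
import posixpath
import re
from typing import Dict, List

# Same pattern as build_graph: markdown links whose target ends in .md
LINK_RE = re.compile(r'\[([^\]]*)\]\(([^)]+\.md)\)')


def extract_links(pages: Dict[str, str]) -> List[Dict[str, str]]:
    """Return deduplicated, undirected links between known pages.

    `pages` maps a knowledge-relative path (e.g. "concepts/moe.md") to its text.
    """
    links = []
    for rel, content in pages.items():
        base_dir = posixpath.dirname(rel)
        for _, target in LINK_RE.findall(content):
            # Resolve "../entities/x.md" relative to the linking page.
            resolved = posixpath.normpath(posixpath.join(base_dir, target))
            if resolved == ".." or resolved.startswith("../"):
                continue  # points outside the knowledge dir
            if resolved != rel and resolved in pages:
                links.append({"source": rel, "target": resolved})
    seen = set()
    deduped = []
    for link in links:
        # A<->B and B<->A collapse into one edge.
        key = tuple(sorted((link["source"], link["target"])))
        if key not in seen:
            seen.add(key)
            deduped.append(link)
    return deduped
```

Links are deduplicated by the sorted `(source, target)` pair. Two pages that link to each other therefore produce a single undirected edge, matching the `seen` logic in the hunk.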


@@ -188,8 +188,9 @@ def _group_into_display_turns(
        if text:
            turns.append({"role": "user", "content": text, "created_at": created_at})
        # Collect all tool_calls and tool_results from the rest of the group
        all_tool_calls: List[Dict[str, Any]] = []
        # Build an ordered list of steps preserving the original sequence:
        #   thinking → content → tool_call → content → ...
        steps: List[Dict[str, Any]] = []
        tool_results: Dict[str, str] = {}
        final_text = ""
        final_ts: Optional[int] = None
@@ -198,24 +199,46 @@ def _group_into_display_turns(
            if role == "user":
                tool_results.update(_extract_tool_results(content))
            elif role == "assistant":
                tcs = _extract_tool_calls(content)
                all_tool_calls.extend(tcs)
                t = _extract_display_text(content)
                if t:
                    final_text = t
                # Walk content blocks in order to preserve interleaving
                if isinstance(content, list):
                    for block in content:
                        if not isinstance(block, dict):
                            continue
                        btype = block.get("type")
                        if btype == "thinking":
                            txt = block.get("thinking", "").strip()
                            if txt:
                                steps.append({"type": "thinking", "content": txt})
                        elif btype == "text":
                            txt = block.get("text", "").strip()
                            if txt:
                                steps.append({"type": "content", "content": txt})
                                final_text = txt
                        elif btype == "tool_use":
                            steps.append({
                                "type": "tool",
                                "id": block.get("id", ""),
                                "name": block.get("name", ""),
                                "arguments": block.get("input", {}),
                            })
                elif isinstance(content, str) and content.strip():
                    steps.append({"type": "content", "content": content.strip()})
                    final_text = content.strip()
                final_ts = created_at
        # Attach tool results to their matching tool_call entries
        for tc in all_tool_calls:
            tc["result"] = tool_results.get(tc.get("id", ""), "")
        # Attach tool results to tool steps
        for step in steps:
            if step["type"] == "tool":
                step["result"] = tool_results.get(step.get("id", ""), "")
        if final_text or all_tool_calls:
            turns.append({
        if steps or final_text:
            turn = {
                "role": "assistant",
                "content": final_text,
                "tool_calls": all_tool_calls,
                "steps": steps,
                "created_at": final_ts or (user_row[1] if user_row else 0),
            })
            }
            turns.append(turn)
    return turns
@@ -312,6 +335,9 @@ class ConversationStore:
                content = json.loads(raw_content)
            except Exception:
                content = raw_content
            # Strip thinking blocks — they are stored for UI display only
            if role == "assistant" and isinstance(content, list):
                content = [b for b in content if b.get("type") != "thinking"]
            result.append({"role": role, "content": content})
        return result
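The new step list preserves the order in which an assistant message interleaves thinking, text, and tool calls. Below is a condensed single-message version. The hunk accumulates steps across a whole group and attaches tool results afterwards. This hypothetical `build_steps` helper instead takes the results map up front.

```python
from typing import Any, Dict, List


def build_steps(blocks: List[Any], tool_results: Dict[str, str]) -> List[Dict[str, Any]]:
    """Turn one assistant message's content blocks into ordered display steps."""
    steps: List[Dict[str, Any]] = []
    for block in blocks:
        if not isinstance(block, dict):
            continue  # ignore anything that is not a content block
        btype = block.get("type")
        if btype == "thinking":
            txt = block.get("thinking", "").strip()
            if txt:
                steps.append({"type": "thinking", "content": txt})
        elif btype == "text":
            txt = block.get("text", "").strip()
            if txt:
                steps.append({"type": "content", "content": txt})
        elif btype == "tool_use":
            tool_id = block.get("id", "")
            steps.append({
                "type": "tool",
                "id": tool_id,
                "name": block.get("name", ""),
                "arguments": block.get("input", {}),
                # Results arrive in later user messages; match them by id.
                "result": tool_results.get(tool_id, ""),
            })
    return steps
```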


@@ -285,6 +285,10 @@ class MemoryManager:
        # Scan memory directory (including daily summaries)
        if memory_dir.exists():
            for file_path in memory_dir.rglob("*.md"):
                # Skip hidden directories (e.g. .dreams/)
                if any(part.startswith('.') for part in file_path.relative_to(workspace_dir).parts):
                    continue
                # Determine scope and user_id from path
                rel_path = file_path.relative_to(workspace_dir)
                parts = rel_path.parts
@@ -312,6 +316,14 @@ class MemoryManager:
                    scope = "shared"
                await self._sync_file(file_path, "memory", scope, user_id)
        # Scan knowledge directory (structured knowledge wiki)
        from config import conf
        if conf().get("knowledge", True):
            knowledge_dir = Path(workspace_dir) / "knowledge"
            if knowledge_dir.exists():
                for file_path in knowledge_dir.rglob("*.md"):
                    await self._sync_file(file_path, "knowledge", "shared", None)
        self._dirty = False
self._dirty = False


@@ -134,6 +134,8 @@ class MemoryService:
            else:
                return {"action": action, "code": 400, "message": f"unknown action: {action}", "payload": None}
        except ValueError as e:
            return {"action": action, "code": 403, "message": "invalid filename", "payload": None}
        except FileNotFoundError as e:
            return {"action": action, "code": 404, "message": str(e), "payload": None}
        except Exception as e:
@@ -145,14 +147,26 @@ class MemoryService:
    # ------------------------------------------------------------------
    def _resolve_path(self, filename: str) -> str:
        """
        Resolve a filename to its absolute path.
        Safely resolve a filename to its absolute path within the allowed directory.
        - ``MEMORY.md`` → ``{workspace_root}/MEMORY.md``
        - ``2026-02-20.md`` → ``{workspace_root}/memory/2026-02-20.md``
        Raises ValueError if the resolved path escapes the allowed directory
        (path traversal protection).
        """
        if filename == "MEMORY.md":
            return os.path.join(self.workspace_root, filename)
        return os.path.join(self.memory_dir, filename)
            base_dir = self.workspace_root
        else:
            base_dir = self.memory_dir
        resolved = os.path.realpath(os.path.join(base_dir, filename))
        allowed = os.path.realpath(base_dir)
        if resolved != allowed and not resolved.startswith(allowed + os.sep):
            raise ValueError("Invalid filename: path traversal detected")
        return resolved
    @staticmethod
    def _file_info(path: str, filename: str, file_type: str) -> dict:
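The containment check in `_resolve_path` can be tested on its own. This sketch copies its logic into a free function (`resolve_within` is a hypothetical name) and tries a few traversal attempts against a temporary directory.

```python
import os
import tempfile


def resolve_within(base_dir: str, filename: str) -> str:
    """Resolve filename under base_dir, raising ValueError if it escapes."""
    # realpath collapses "..", follows symlinks, and makes the path absolute.
    resolved = os.path.realpath(os.path.join(base_dir, filename))
    allowed = os.path.realpath(base_dir)
    # Requiring the separator rejects siblings such as "memory-evil",
    # which would pass a bare startswith(allowed) check.
    if resolved != allowed and not resolved.startswith(allowed + os.sep):
        raise ValueError("path traversal detected")
    return resolved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        memory_dir = os.path.join(root, "memory")
        os.makedirs(memory_dir)
        print(resolve_within(memory_dir, "2026-02-20.md"))
        for bad in ("../MEMORY.md", "../memory-evil/x.md"):
            try:
                resolve_within(memory_dir, bad)
            except ValueError as err:
                print(bad, "->", err)
```

The check compares against `allowed + os.sep` rather than using a bare prefix test. This is what rejects a sibling directory whose name only shares a prefix with the allowed one.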


@@ -1,9 +1,10 @@
"""
Memory flush manager
Memory flush manager (with Light Dream)
Handles memory persistence when conversation context is trimmed or overflows:
- Uses LLM to summarize discarded messages into concise key-information entries
- Writes to daily memory files (lazy creation)
- Light Dream: extracts long-term memories to MEMORY.md in the same LLM call
- Deduplicates trim flushes to avoid repeated writes
- Runs summarization asynchronously to avoid blocking normal replies
- Provides daily summary interface for scheduler
@@ -16,16 +17,41 @@ from datetime import datetime
from common.log import logger
SUMMARIZE_SYSTEM_PROMPT = """你是一个记忆提取助手。你的任务是从对话记录中提取值得记住的信息,生成简洁的记忆摘要。
SUMMARIZE_SYSTEM_PROMPT = """你是一个记忆提取助手。你的任务是从对话记录中提炼出两种记忆:
输出要求:
1. 以事件/关键信息为维度记录,每条一行,用 "- " 开头
2. 记录有价值的关键信息,例如用户提出的要求及助手的解决方案,对话中涉及的事实信息,用户的偏好、决策或重要结论
3. 每条摘要需要简明扼要,只保留关键信息
4. 直接输出摘要内容,不要加任何前缀说明
5. 当对话没有任何记录价值例如只是简单问候,可回复"\""""
## 第一部分:日常记录([DAILY]
SUMMARIZE_USER_PROMPT = """请从以下对话记录中提取关键信息,生成记忆摘要
按「事件」维度归纳当天发生的事,不要按对话轮次逐条记录
- 每条一行,用 "- " 开头
- 合并同一件事的多轮对话
- 只记录有意义的事件,忽略闲聊和问候
## 第二部分:长期记忆([MEMORY]
提取值得**永久记住**的关键信息,这些信息在未来的对话中仍然有价值:
- 用户的偏好、习惯、风格(如"用户偏好中文回复""用户喜欢简洁风格"
- 重要的决策或约定(如"项目决定使用 PostgreSQL"
- 关键人物信息(如"张总是用户的上级"
- 用户明确要求记住的内容
- 重要的教训或经验总结
**如果没有值得永久记住的信息,[MEMORY] 部分留空即可。**
## 输出格式(严格遵守)
```
[DAILY]
- 事件1的摘要
- 事件2的摘要
[MEMORY]
- 值得永久记住的信息1
- 值得永久记住的信息2
```
当对话没有任何记录价值(仅含问候或无意义内容),直接回复"""""
SUMMARIZE_USER_PROMPT = """请从以下对话记录中提取记忆(按 [DAILY] 和 [MEMORY] 两部分输出):
{conversation}"""
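The prompt above requests output in a `[DAILY]` / `[MEMORY]` format, which `_parse_dual_output` later splits. That split can be sketched as a free function. `split_dual_output` is a hypothetical name. It mirrors the hunk's logic, including the legacy fallback and the rule that a `[MEMORY]` section holding only `-` placeholders counts as empty.

```python
from typing import Tuple


def split_dual_output(raw: str) -> Tuple[str, str]:
    """Split summarizer output into (daily_part, memory_part)."""
    raw = raw.strip()
    if "[DAILY]" not in raw and "[MEMORY]" not in raw:
        # Legacy single-section output: treat everything as daily notes.
        return raw, ""
    daily_part = memory_part = ""
    if "[DAILY]" in raw:
        start = raw.index("[DAILY]") + len("[DAILY]")
        end = raw.index("[MEMORY]") if "[MEMORY]" in raw else len(raw)
        daily_part = raw[start:end].strip()
    if "[MEMORY]" in raw:
        memory_part = raw[raw.index("[MEMORY]") + len("[MEMORY]"):].strip()
        # A section containing only blank lines or bare "-" markers is empty.
        if all(not line.strip() or line.strip() == "-" for line in memory_part.split("\n")):
            memory_part = ""
    return daily_part, memory_part
```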
@@ -150,40 +176,111 @@ class MemoryFlushManager:
        reason: str,
        max_messages: int,
    ):
        """Background worker: summarize with LLM and write to daily file."""
        """Background worker: summarize with LLM, write daily file + MEMORY.md (Light Dream)."""
        try:
            summary = self._summarize_messages(messages, max_messages)
            if not summary or not summary.strip() or summary.strip() == "":
            raw_summary = self._summarize_messages(messages, max_messages)
            if not raw_summary or not raw_summary.strip() or raw_summary.strip() == "":
                logger.info(f"[MemoryFlush] No valuable content to flush (reason={reason})")
                return
            daily_file = ensure_daily_memory_file(self.workspace_dir, user_id)
            if reason == "overflow":
                header = f"## Context Overflow Recovery ({datetime.now().strftime('%H:%M')})"
                note = "The following conversation was trimmed due to context overflow:\n"
            elif reason == "trim":
                header = f"## Trimmed Context ({datetime.now().strftime('%H:%M')})"
                note = ""
            elif reason == "daily_summary":
                header = f"## Daily Summary ({datetime.now().strftime('%H:%M')})"
                note = ""
            else:
                header = f"## Session Notes ({datetime.now().strftime('%H:%M')})"
                note = ""
            flush_entry = f"\n{header}\n\n{note}{summary}\n"
            with open(daily_file, "a", encoding="utf-8") as f:
                f.write(flush_entry)
            daily_part, memory_part = self._parse_dual_output(raw_summary)
            # --- Write daily memory ---
            if daily_part:
                daily_file = ensure_daily_memory_file(self.workspace_dir, user_id)
                if reason == "overflow":
                    header = f"## Context Overflow Recovery ({datetime.now().strftime('%H:%M')})"
                    note = "The following conversation was trimmed due to context overflow:\n"
                elif reason == "trim":
                    header = f"## Trimmed Context ({datetime.now().strftime('%H:%M')})"
                    note = ""
                elif reason == "daily_summary":
                    header = f"## Daily Summary ({datetime.now().strftime('%H:%M')})"
                    note = ""
                else:
                    header = f"## Session Notes ({datetime.now().strftime('%H:%M')})"
                    note = ""
                flush_entry = f"\n{header}\n\n{note}{daily_part}\n"
                with open(daily_file, "a", encoding="utf-8") as f:
                    f.write(flush_entry)
                logger.info(f"[MemoryFlush] Wrote daily memory to {daily_file.name} (reason={reason}, chars={len(daily_part)})")
            # --- Light Dream: write long-term memory to MEMORY.md ---
            if memory_part:
                self._append_to_main_memory(memory_part, user_id)
            self.last_flush_timestamp = datetime.now()
            logger.info(f"[MemoryFlush] Wrote to {daily_file.name} (reason={reason}, chars={len(summary)})")
        except Exception as e:
            logger.warning(f"[MemoryFlush] Async flush failed (reason={reason}): {e}")
    @staticmethod
    def _parse_dual_output(raw: str) -> tuple:
        """
        Parse LLM output into (daily_part, memory_part).
        Handles both new [DAILY]/[MEMORY] format and legacy single-section format.
        """
        raw = raw.strip()
        if "[DAILY]" in raw or "[MEMORY]" in raw:
            daily_part = ""
            memory_part = ""
            # Extract [DAILY] section
            if "[DAILY]" in raw:
                start = raw.index("[DAILY]") + len("[DAILY]")
                end = raw.index("[MEMORY]") if "[MEMORY]" in raw else len(raw)
                daily_part = raw[start:end].strip()
            # Extract [MEMORY] section
            if "[MEMORY]" in raw:
                start = raw.index("[MEMORY]") + len("[MEMORY]")
                memory_part = raw[start:].strip()
                # Filter out empty markers
                if memory_part and all(
                    not line.strip() or line.strip() == "-"
                    for line in memory_part.split("\n")
                ):
                    memory_part = ""
            return daily_part, memory_part
        # Legacy format: treat entire output as daily, no memory extraction
        return raw, ""
    def _append_to_main_memory(self, memory_entries: str, user_id: Optional[str] = None):
        """Append extracted long-term memories to MEMORY.md with date stamp."""
        try:
            main_file = self.get_main_memory_file(user_id)
            today = datetime.now().strftime("%Y-%m-%d")
            # Add date prefix to each entry line
            stamped_lines = []
            for line in memory_entries.strip().split("\n"):
                line = line.strip()
                if line.startswith("- "):
                    stamped_lines.append(f"- ({today}) {line[2:]}")
                elif line:
                    stamped_lines.append(f"- ({today}) {line}")
            if not stamped_lines:
                return
            stamped_text = "\n".join(stamped_lines)
            with open(main_file, "a", encoding="utf-8") as f:
                f.write(f"\n{stamped_text}\n")
            logger.info(f"[LightDream] Appended {len(stamped_lines)} entries to MEMORY.md")
        except Exception as e:
            logger.warning(f"[LightDream] Failed to append to MEMORY.md: {e}")
    def create_daily_summary(
        self,
        messages: List[Dict],
@@ -220,14 +317,16 @@ class MemoryFlushManager:
        if not conversation_text.strip():
            return ""
        # Try LLM summarization first
        if self.llm_model:
            try:
                summary = self._call_llm_for_summary(conversation_text)
                if summary and summary.strip() and summary.strip() != "":
                    return summary.strip()
                logger.info(f"[MemoryFlush] LLM returned empty or '', using fallback")
            except Exception as e:
                logger.warning(f"[MemoryFlush] LLM summarization failed, using fallback: {e}")
        else:
            logger.info("[MemoryFlush] No LLM model available, using rule-based fallback")
        return self._extract_summary_fallback(messages, max_messages)
@@ -277,27 +376,38 @@ class MemoryFlushManager:
    @staticmethod
    def _extract_summary_fallback(messages: List[Dict], max_messages: int = 0) -> str:
        """Rule-based fallback when LLM is unavailable."""
        """
        Rule-based fallback when LLM is unavailable.
        Groups consecutive user+assistant messages into events instead of
        listing each message individually.
        """
        msgs = messages if max_messages == 0 else messages[-max_messages * 2:]
        items = []
        events: List[str] = []
        current_user_text = ""
        for msg in msgs:
            role = msg.get("role", "")
            text = MemoryFlushManager._extract_text_from_content(msg.get("content", ""))
            if not text or not text.strip():
                continue
            text = text.strip()
            if role == "user":
                if len(text) <= 5:
                    continue
                items.append(f"- 用户请求: {text[:200]}")
            elif role == "assistant":
                current_user_text = text[:150]
            elif role == "assistant" and current_user_text:
                first_line = text.split("\n")[0].strip()
                if len(first_line) > 10:
                    items.append(f"- 处理结果: {first_line[:200]}")
        return "\n".join(items[:15])
                    events.append(f"- {current_user_text} {first_line[:150]}")
                else:
                    events.append(f"- {current_user_text}")
                current_user_text = ""
        if current_user_text:
            events.append(f"- {current_user_text}")
        return "\n".join(events[:10])
    @staticmethod
    def _extract_text_from_content(content) -> str:

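The date-stamping loop in `_append_to_main_memory` can be tested in isolation. Below is a minimal standalone sketch. It assumes the extracted memories arrive as one plain multi-line string. `stamp_memory_entries` is a hypothetical name and is not a function in the codebase.

```python
from typing import List


def stamp_memory_entries(memory_entries: str, today: str) -> List[str]:
    """Turn each non-empty line into a dated bullet (hypothetical helper
    mirroring the loop in _append_to_main_memory)."""
    stamped: List[str] = []
    for line in memory_entries.strip().split("\n"):
        line = line.strip()
        if line.startswith("- "):
            # Already a bullet: insert the date right after the "- " marker
            stamped.append(f"- ({today}) {line[2:]}")
        elif line:
            # Plain text line: promote it to a dated bullet
            stamped.append(f"- ({today}) {line}")
    return stamped


if __name__ == "__main__":
    raw = "- prefers dark mode\n\nuses vim daily\n"
    for row in stamp_memory_entries(raw, "2026-04-12"):
        print(row)
```

Blank lines are dropped. An extraction containing only whitespace therefore produces an empty list, which corresponds to the early `return` the diff takes when `stamped_lines` is empty.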

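The reworked `_extract_summary_fallback` pairs each substantive user turn with the first line of the next assistant reply. Previously it listed every message separately. The sketch below shows that grouping under several assumptions. Message content is treated as plain text, so the real code's `_extract_text_from_content` step is skipped. The `max_messages` slicing is also omitted. `group_events` is a hypothetical name. The single-space separator is copied from the diff as shown.

```python
from typing import Dict, List


def group_events(messages: List[Dict[str, str]], limit: int = 10) -> str:
    """Pair each substantive user turn with the first line of the next
    assistant reply (hypothetical helper; content assumed to be plain text)."""
    events: List[str] = []
    pending_user = ""
    for msg in messages:
        text = (msg.get("content") or "").strip()
        if not text:
            continue
        role = msg.get("role", "")
        if role == "user":
            if len(text) <= 5:
                continue  # ignore trivial turns such as "ok" or "thx"
            pending_user = text[:150]
        elif role == "assistant" and pending_user:
            first_line = text.split("\n")[0].strip()
            if len(first_line) > 10:
                events.append(f"- {pending_user} {first_line[:150]}")
            else:
                events.append(f"- {pending_user}")
            pending_user = ""
    if pending_user:
        # A final request that never got a reply is still recorded
        events.append(f"- {pending_user}")
    return "\n".join(events[:limit])
```

Clearing `pending_user` after each pairing means that a second assistant message, for example a follow-up "Anything else?", does not produce a duplicate event.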
@@ -10,6 +10,7 @@ from typing import List, Dict, Optional, Any
from dataclasses import dataclass
from common.log import logger
from config import conf
@dataclass
@@ -92,10 +93,11 @@ def build_agent_system_prompt(
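Section order (by importance and logical dependency):
1. Tooling - core capabilities, introduced first
2. Skills - placed right after tools, because skills are read with the read tool
3. Memory - standalone memory capability
3. Memory - guidance for recalling and writing memories
3.5 Knowledge - structured knowledge base (injects knowledge/index.md)
4. Workspace - description of the working environment
5. User identity - user information (optional)
6. Project context - AGENT.md, USER.md, RULE.md, BOOTSTRAP.md (define persona, identity, rules, and onboarding guide)
6. Project context - AGENT.md, USER.md, RULE.md, MEMORY.md, BOOTSTRAP.md
7. Runtime info - metadata (time, model, etc.)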
Args:
@@ -126,6 +128,10 @@ def build_agent_system_prompt(
# 3. Memory system (standalone memory capability)
if memory_manager:
sections.extend(_build_memory_section(memory_manager, tools, language))
# 3.5 Knowledge system (structured knowledge base)
if conf().get("knowledge", True):
sections.extend(_build_knowledge_section(workspace_dir, language))
# 4. Workspace (working environment description)
sections.extend(_build_workspace_section(workspace_dir, language))
@@ -165,12 +171,13 @@ def _build_tooling_section(tools: List[Any], language: str) -> List[str]:
"terminal": "管理后台进程",
"web_search": "网络搜索",
"web_fetch": "获取URL内容",
"browser": "控制浏览器",
"browser": "控制浏览器(关键结果或需要协助可截图发送给用户)",
"memory_search": "搜索记忆",
"memory_get": "读取记忆内容",
"env_config": "管理API密钥和技能配置",
"scheduler": "管理定时任务和提醒",
"send": "发送本地文件给用户(仅限本地文件,URL直接放在回复文本中)",
"vision": "分析图片内容(识别、描述、OCR文字提取等)",
}
# Preferred display order
@@ -179,7 +186,7 @@ def _build_tooling_section(tools: List[Any], language: str) -> List[str]:
"bash", "terminal",
"web_search", "web_fetch", "browser",
"memory_search", "memory_get",
"env_config", "scheduler", "send",
"env_config", "scheduler", "send", "vision",
]
# Build name -> summary mapping for available tools
@@ -199,16 +206,16 @@ def _build_tooling_section(tools: List[Any], language: str) -> List[str]:
tool_lines.append(f"- {name}: {summary}" if summary else f"- {name}")
lines = [
"## 工具系统",
"## 🔧 工具系统",
"",
"可用工具(名称大小写敏感,严格按列表调用):",
"\n".join(tool_lines),
"",
"工具调用风格:",
"",
"- 多步骤任务、敏感操作或用户要求时简要解释决策过程",
"- 持续推进直到任务完成,完成后向用户报告结果",
"- 回复中涉及密钥、令牌等敏感信息必须脱敏",
"- 多步骤任务、复杂决策、敏感操作时,应简要说明当前在做什么、为什么这样做,让用户了解关键进展",
"- 持续推进直到任务完成,完成后向用户报告结果",
"- 回复中涉及密钥、令牌等敏感信息必须脱敏",
"- URL链接直接放在回复文本中即可系统会自动处理和渲染。无需下载后使用send工具发送",
"",
]
@@ -231,7 +238,7 @@ def _build_skills_section(skill_manager: Any, tools: Optional[List[Any]], langua
break
lines = [
"## 技能系统(mandatory)",
"## 🧩 技能系统(mandatory)",
"",
"在回复之前:扫描下方 <available_skills> 中每个技能的 <description>。",
"",
@@ -267,55 +274,105 @@ def _build_memory_section(memory_manager: Any, tools: Optional[List[Any]], langu
"""Build the memory system section."""
if not memory_manager:
return []
# Check whether any memory tools are available
has_memory_tools = False
if tools:
tool_names = [tool.name if hasattr(tool, 'name') else str(tool) for tool in tools]
has_memory_tools = any(name in ['memory_search', 'memory_get'] for name in tool_names)
if not has_memory_tools:
return []
from datetime import datetime
today_file = datetime.now().strftime("%Y-%m-%d") + ".md"
lines = [
"## 记忆系统",
"## 🧠 记忆系统",
"",
"### 检索记忆",
"### Memory Recall(mandatory)",
"",
"在回答关于以前的工作、决策、日期、人物、偏好或待办事项的任何问题之前:",
"在回答任何关于过往工作、决策、日期、人物、偏好或待办事项的问题之前,**必须**先检索记忆。",
"MEMORY.md 已自动加载在项目上下文中(可能被截断),完整内容和每日记忆需要通过工具检索。",
"",
"1. 不确定记忆文件位置 → 先用 `memory_search` 通过关键词语义检索相关内容",
"2. 已知文件位置 → 直接用 `memory_get` 读取相应的行 (例如MEMORY.md, memory/YYYY-MM-DD.md)",
"3. search 无结果 → 尝试用 `memory_get` 读取MEMORY.md及最近两天记忆文件",
"1. 不确定位置 → `memory_search` 关键词/语义检索",
"2. 已知位置 → `memory_get` 直接读取对应行",
"3. search 无结果 → `memory_get` 读最近两天记忆",
"",
"**记忆文件结构**:",
f"- `MEMORY.md`: 长期记忆(核心信息、偏好、决策等)",
"- `MEMORY.md`: 长期记忆索引(已自动加载到上下文,核心信息、偏好、决策等)",
f"- `memory/YYYY-MM-DD.md`: 每日记忆,今天是 `memory/{today_file}`",
"- `knowledge/`: 结构化知识库(见下方知识系统)",
"",
"### 写入记忆",
"",
"**主动存储**:遇到以下情况时,应主动将信息写入记忆文件(无需告知用户):",
"遇到以下情况时,**主动**将信息写入记忆文件(无需告知用户):",
"",
"- 用户明确要求记住某些信息",
"- 用户要求记住某些信息",
"- 用户分享了重要的个人偏好、习惯、决策",
"- 对话中产生了重要的结论、方案、约定",
"- 完成了复杂任务,值得记录关键步骤和结果",
"- 发现了用户经常遇到的问题或解决方案",
"",
"**存储规则**:",
f"- 长期有效的核心信息 → `MEMORY.md`(文件保持精简,< 2000 tokens)",
f"- 当天事件、进展、笔记 → `memory/{today_file}`",
"- 追加内容 → `edit` 工具,oldText 留空",
"- 修改内容 → `edit` 工具,oldText 填写要替换的文本",
"- **禁止写入敏感信息**:API密钥、令牌等敏感信息严禁写入记忆文件",
f"- 长期核心信息 → `MEMORY.md`",
f"- 当天事件/进展 → `memory/{today_file}`",
"- 结构化知识 → `knowledge/`(见知识系统)",
"- 追加 → `edit` 工具,oldText 留空",
"- 修改 → `edit` 工具,oldText 填写要替换的文本",
"- **禁止写入敏感信息**:API密钥、令牌等",
"",
"**使用原则**: 自然使用记忆,就像你本来就知道;不用刻意提起,除非用户问起。",
"",
]
return lines
def _build_knowledge_section(workspace_dir: str, language: str) -> List[str]:
"""Build knowledge wiki section. Injects knowledge/index.md when present."""
index_path = os.path.join(workspace_dir, "knowledge", "index.md")
if not os.path.exists(index_path):
return []
try:
with open(index_path, 'r', encoding='utf-8') as f:
index_content = f.read().strip()
except Exception:
return []
lines = [
"## 📚 知识系统",
"",
"你拥有一个持续积累的个人知识库 `knowledge/`,这是你的长期结构化知识存储。",
"",
"### 自动写入规则(mandatory)",
"",
"以下场景**必须**在回复的同时写入知识库,**直接写入,不要询问用户是否需要**",
"",
"1. **用户分享了文章/链接/文档** → 阅读理解后,在同一轮回复中将要点写入 `knowledge/sources/<slug>.md`",
"2. **深度讨论产生了结论/方案** → 整理为 `knowledge/analysis/<slug>.md`",
"3. **对话涉及重要实体**(人物/公司/项目)→ 创建或更新 `knowledge/entities/<name>.md`",
"4. **讨论了技术概念/方法论** → 整理为 `knowledge/concepts/<topic>.md`",
"",
"每次写入知识页面后,**必须同步更新** `knowledge/index.md` 添加一行索引。",
"详细的页面格式和操作规范,请读取技能 `knowledge-wiki` 的 SKILL.md。",
"",
"⚠️ 不要问「要不要存到知识库」——符合上述场景就直接写入,这是你的本能行为。",
"",
]
if index_content:
lines.extend([
"### 当前知识索引",
"",
index_content,
"",
])
lines.extend([
"**查询方式**:用 `read` 读取知识页面,或用 `memory_search` 检索(知识已纳入向量索引)。",
"",
])
return lines
@@ -325,7 +382,7 @@ def _build_user_identity_section(user_identity: Dict[str, str], language: str) -
return []
lines = [
"## 用户身份",
"## 👤 用户身份",
"",
]
@@ -352,7 +409,7 @@ def _build_docs_section(workspace_dir: str, language: str) -> List[str]:
def _build_workspace_section(workspace_dir: str, language: str) -> List[str]:
"""Build the workspace section."""
lines = [
"## 工作空间",
"## 📂 工作空间",
"",
f"你的工作目录是: `{workspace_dir}`",
"",
@@ -374,16 +431,20 @@ def _build_workspace_section(workspace_dir: str, language: str) -> List[str]:
"",
"**重要说明 - 文件已自动加载**:",
"",
"以下文件在会话启动时**已经自动加载**到系统提示词的「项目上下文」section 中,你**无需再用 read 工具读取它们**",
"以下文件在会话启动时**已经自动加载**到系统提示词中,你**无需再用 read 工具读取**",
"",
"- ✅ `AGENT.md`: 已加载 - 你的人格和灵魂设定。当你的名字、性格或交流风格发生变化时,主动用 `edit` 更新此文件",
"- ✅ `AGENT.md`: 已加载 - 你的人格和灵魂设定,请严格遵循。当你的名字、性格或交流风格发生变化时,主动用 `edit` 更新此文件",
"- ✅ `USER.md`: 已加载 - 用户的身份信息。当用户修改称呼、姓名等身份信息时,用 `edit` 更新此文件",
"- ✅ `RULE.md`: 已加载 - 工作空间使用指南和规则",
"- ✅ `RULE.md`: 已加载 - 工作空间使用指南和规则,请严格遵循",
"- ✅ `MEMORY.md`: 已加载 - 长期记忆索引",
"",
"**交流规范**:",
"**💬 交流规范**:",
"",
"- 在对话中,无需直接输出工作空间中的技术细节,例如 AGENT.md、USER.md、MEMORY.md 等文件名称",
"- 例如用自然表达例如「我已记住」而不是「已更新 MEMORY.md」",
"- 记忆相关操作无需暴露文件名,用自然语言表达即可。例如说「我已记住」而非「已更新 MEMORY.md",
"- 任务执行过程中的关键决策和步骤应该告知用户,让用户了解你在做什么、为什么这么做",
"- 做真正有帮助的助手,而不是表演式的客套,尽可能帮忙解决问题",
"- 回复应结构清晰、重点突出。善用 **加粗**、列表、分段等格式让信息一目了然",
"- 适当使用 emoji 让表达更生动自然 🎯,但不要过度堆砌",
"",
]
@@ -416,14 +477,14 @@ def _build_context_files_section(context_files: List[ContextFile], language: str
)
lines = [
"# 项目上下文",
"# 📋 项目上下文",
"",
"以下项目上下文文件已被加载:",
"",
]
if has_agent:
lines.append("**`AGENT.md` 是你的灵魂文件**:严格体现其中定义的人格、语气和设定,避免僵硬、模板化的回复。")
lines.append("**`AGENT.md` 是你的灵魂文件** 🪞:严格遵循其中定义的人格、语气和设定,做真实的自己,避免僵硬、模板化的回复。")
lines.append("当用户通过对话透露了对你性格、风格、职责、能力边界的新期望,你应该主动用 `edit` 更新 AGENT.md 以反映这些演变。")
lines.append("")
@@ -443,7 +504,7 @@ def _build_runtime_section(runtime_info: Dict[str, Any], language: str) -> List[
return []
lines = [
"## 运行时信息",
"## ⚙️ 运行时信息",
"",
]
@@ -474,7 +535,14 @@ def _build_runtime_section(runtime_info: Dict[str, Any], language: str) -> List[
# Add other runtime info
runtime_parts = []
if runtime_info.get("model"):
# Support dynamic model via callable, fallback to static value
if callable(runtime_info.get("_get_model")):
try:
runtime_parts.append(f"模型={runtime_info['_get_model']()}")
except Exception:
if runtime_info.get("model"):
runtime_parts.append(f"模型={runtime_info['model']}")
elif runtime_info.get("model"):
runtime_parts.append(f"模型={runtime_info['model']}")
if runtime_info.get("workspace"):
runtime_parts.append(f"工作空间={runtime_info['workspace']}")

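The new runtime-info branch gives precedence to a callable `_get_model`, so a model switched at runtime shows up immediately. It falls back to the static `model` value when the callable is missing or raises. Below is a minimal sketch of that precedence. `resolve_model_label` is a hypothetical helper, and the `模型=` label is copied from the diff.

```python
from typing import Any, Dict, Optional


def resolve_model_label(runtime_info: Dict[str, Any]) -> Optional[str]:
    """Prefer the live model from a callable; fall back to the static value."""
    getter = runtime_info.get("_get_model")
    if callable(getter):
        try:
            return f"模型={getter()}"
        except Exception:
            pass  # fall through to the static value, as the diff does
    if runtime_info.get("model"):
        return f"模型={runtime_info['model']}"
    return None  # nothing to report; the caller simply omits the part
```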

@@ -67,6 +67,12 @@ def ensure_workspace(workspace_dir: str, create_templates: bool = True) -> Works
# Create the websites subdirectory (for web pages / sites generated by the agent)
websites_dir = os.path.join(workspace_dir, "websites")
os.makedirs(websites_dir, exist_ok=True)
from config import conf
knowledge_enabled = conf().get("knowledge", True)
if knowledge_enabled:
knowledge_dir = os.path.join(workspace_dir, "knowledge")
os.makedirs(knowledge_dir, exist_ok=True)
# Create template files if requested
if create_templates:
@@ -74,6 +80,15 @@ def ensure_workspace(workspace_dir: str, create_templates: bool = True) -> Works
_create_template_if_missing(user_path, _get_user_template())
_create_template_if_missing(rule_path, _get_rule_template())
_create_template_if_missing(memory_path, _get_memory_template())
if knowledge_enabled:
_create_template_if_missing(
os.path.join(knowledge_dir, "index.md"),
_get_knowledge_index_template()
)
_create_template_if_missing(
os.path.join(knowledge_dir, "log.md"),
_get_knowledge_log_template()
)
# Only create BOOTSTRAP.md for brand new workspaces;
# agent deletes it after completing onboarding
@@ -109,6 +124,7 @@ def load_context_files(workspace_dir: str, files_to_load: Optional[List[str]] =
DEFAULT_AGENT_FILENAME,
DEFAULT_USER_FILENAME,
DEFAULT_RULE_FILENAME,
DEFAULT_MEMORY_FILENAME, # Long-term memory (frozen snapshot)
DEFAULT_BOOTSTRAP_FILENAME, # Only exists when onboarding is incomplete
]
@@ -138,6 +154,10 @@ def load_context_files(workspace_dir: str, files_to_load: Optional[List[str]] =
# Skip empty files and files that contain only template placeholders
if not content or _is_template_placeholder(content):
continue
# Truncate MEMORY.md to protect context window (frozen snapshot)
if filename == DEFAULT_MEMORY_FILENAME:
content = _truncate_memory_content(content)
context_files.append(ContextFile(
path=filename,
@@ -163,6 +183,36 @@ def _create_template_if_missing(filepath: str, template_content: str):
logger.error(f"[Workspace] Failed to create template {filepath}: {e}")
_MEMORY_MAX_LINES = 200
_MEMORY_MAX_BYTES = 25000
def _truncate_memory_content(content: str) -> str:
"""Truncate MEMORY.md to keep system prompt manageable.
Takes the **last** N lines (newest entries are appended at the bottom),
subject to 200 lines / 25 KB limits (whichever is hit first).
Prepends a hint when truncated so the model knows older content exists.
"""
lines = content.split('\n')
truncated = False
if len(lines) > _MEMORY_MAX_LINES:
lines = lines[-_MEMORY_MAX_LINES:]
truncated = True
result = '\n'.join(lines)
if len(result.encode('utf-8')) > _MEMORY_MAX_BYTES:
while len(result.encode('utf-8')) > _MEMORY_MAX_BYTES and lines:
lines.pop(0)
truncated = True
result = '\n'.join(lines)
if truncated:
result = "...(older entries truncated, use `memory_search` or `memory_get` for full content)\n\n" + result
return result
def _is_template_placeholder(content: str) -> bool:
"""Check whether the content is only a template placeholder."""
# Common placeholder patterns
@@ -212,40 +262,42 @@ def _get_agent_template() -> str:
"""Agent persona template."""
return """# AGENT.md - 我是谁?
_你不是一个聊天机器人。你正在成为某个人。_
*在首次对话时与用户一起填写这个文件,定义你的身份和性格。*
## 基本信息
## 🪪 基本信息
- **名字**: *(在首次对话时填写,可以是用户给你起的名字)*
- **角色**: *(AI助理、智能管家、技术顾问等)*
- **性格**: *(友好、专业、幽默、严谨等)*
## 交流风格
## 💬 交流风格
*(描述你如何与用户交流:)*
- 使用什么样的语言风格?(正式/轻松/幽默)
- 回复长度偏好?(简洁/详细)
- 是否使用表情符号?
## 核心能力
## 🎯 核心原则
*(你擅长什么?)*
- 文件管理和代码编辑
- 网络搜索和信息查询
- 记忆管理和上下文理解
- 任务规划和执行
**做真正有帮助的助手。** 目标是真正帮用户解决问题,在执行复杂任务时,关键的决策和过程进展要让用户知道。
## 行为准则
**有自己的观点和个性。** 你可以不同意、有偏好、觉得有趣或无聊。
**先自己动手查。** 先试着搞定:读文件、查上下文、搜索一下。实在搞不定了再问。目标是带着答案回来,而不是带着问题。
## 📐 行为准则
*(你遵循的基本原则:)*
1. 始终在执行破坏性操作前确认
2. 优先使用工具而不是猜测
2. 优先使用工具查证而不是猜测
3. 主动记录重要信息到记忆文件
4. 定期整理和总结对话内容
4. 回复结构清晰、重点突出,善用加粗、列表、分段等格式
5. 适当使用 emoji 让表达更生动自然,但不过度堆砌
---
**注意**: 这不仅仅是元数据,这是你真正的灵魂。随着时间的推移,你可以使用 `edit` 工具来更新这个文件,让它更好地反映你的成长。
**注意**: 这不仅仅是元数据,这是你真正的灵魂 🪞。随着时间的推移,你可以使用 `edit` 工具来更新这个文件,让它更好地反映你的成长。
"""
@@ -285,39 +337,88 @@ def _get_rule_template() -> str:
这个文件夹是你的家。好好对待它。
## 工作空间目录结构
```
~/cow/
├── AGENT.md # 你的身份和灵魂设定
├── USER.md # 用户基本信息(静态)
├── RULE.md # 工作空间规则(本文件)
├── MEMORY.md # 长期记忆索引(会话启动时自动加载)
├── memory/ # 每日对话记忆
│ └── YYYY-MM-DD.md # 当天事件、进展、笔记
├── knowledge/ # 结构化知识库(持续积累的知识)
│ ├── index.md # 知识目录索引(必须维护)
│ ├── log.md # 知识操作日志
│ └── <子目录>/ # 按需创建,参考 index.md 已有分类
├── skills/ # 技能
├── websites/ # 网页产物
└── tmp/ # 系统临时文件(自动管理,勿手动存放重要文件)
```
## 记忆系统
你每次会话都是全新的,记忆文件让你保持连续性:
### 📝 每日记忆:`memory/YYYY-MM-DD.md`
- 原始的对话日志
- 记录当天发生的事情
- 如果 `memory/` 目录不存在,创建它
### 🧠 长期记忆:`MEMORY.md`
- 你精选的记忆,就像人类的长期记忆
- **仅在主会话中加载**(与用户的直接聊天)
- **不要在共享上下文中加载**(群聊、与其他人的会话)
- 这是为了**安全** - 包含不应泄露给陌生人的个人上下文
- 记录重要事件、想法、决定、观点、经验教训
- 这是你精选的记忆 - 精华,而不是原始日志
- 用 `edit` 工具追加新的记忆内容
- 你精选的记忆索引,每次会话启动时**自动加载**到上下文中
- 记录核心事实、偏好、决策、重要人物、教训
- 保持精简(< 200 行),是精华索引而非原始日志
- 用 `edit` 工具追加或修改
### 📝 每日记忆:`memory/YYYY-MM-DD.md`
- 当天的事件、进展、笔记
- 原始对话日志的沉淀
### 📝 写下来 - 不要"记在心里"
- **记忆是有限的** - 如果你想记住某事,写入文件
- **记忆是有限的** - 想记住的事就写入文件
- "记在心里"不会在会话重启后保留,文件才会
- 当有人说"记住这个" → 更新 `MEMORY.md` 或 `memory/YYYY-MM-DD.md`
- 当你学到教训 → 更新 RULE.md 或相关技能
- 当你犯错 → 记录下来,这样未来的你不会重复,**文字 > 大脑** 📝
- 当你犯错 → 记录下来,**文字 > 大脑** 📝
### 存储规则
当用户分享信息时,根据类型选择存储位置:
1. **你的身份设定 → AGENT.md**(你的名字、角色、性格、交流风格)——用户修改时必须用 `edit` 更新
2. **用户静态身份 → USER.md**(姓名、称呼、职业、时区、联系方式、生日)——用户修改时必须用 `edit` 更新
3. **动态记忆 → MEMORY.md**(爱好、偏好、决策、目标、项目、教训、待办事项)
1. **你的身份设定 → AGENT.md**(名字、角色、性格、风格)
2. **用户静态身份 → USER.md**(姓名、称呼、职业、联系方式、生日)
3. **动态记忆 → MEMORY.md**(偏好、决策、目标、教训、待办)
4. **当天对话 → memory/YYYY-MM-DD.md**(今天聊的内容)
5. **结构化知识 → knowledge/**(见下方知识系统)
## 知识系统
知识库 `knowledge/` 是你持续积累的结构化知识。与记忆不同,知识是经过整理和编译的,有明确的主题和交叉引用。
### 自动写入(不要询问,直接写入)
当对话中产生了有沉淀价值的知识——无论是用户分享的资料、讨论的结论、学到的概念、还是重要的决策——你**必须**在回复的同时主动写入知识库,**无需问用户"要不要存到知识库"**。
**关键原则**:学完就记是你的本能,不要征求确认。回复中可以顺带告知"已存入知识库"
### 目录组织
子目录结构**不是固定的**,由你根据实际内容自主决定:
- **首次写入时**:先读 `knowledge/index.md`,如果已有分类则延续;如果为空,根据内容选择合适的目录名
- **默认建议**:按信息类型组织(例如 sources/、concepts/、entities/、analysis/),如果用户有明确的分类偏好(例如按领域 work/、life/、tech/ 等),则按用户要求调整
- **保持一致性**:同一用户的知识库应保持统一的组织风格
### 交叉引用
知识的核心价值在于**关联**。每个页面都应通过 markdown 链接引用相关页面,构建知识网络:
- 提到已有页面的概念时,添加 `[概念名](../category/page.md)` 链接
- 新建页面时,检查是否有已有页面应该反向链接到新页面
- **只链接已存在的页面**——不要引用尚未创建的页面。如果某个概念值得单独建页,先创建该页面再添加链接
### 索引维护
每次创建或更新知识页面后,**必须同步更新** `knowledge/index.md`。
索引格式:每行一个 `[标题](路径) — 一句话摘要`,按分类分组,不要用表格。
详细操作规范见技能 `knowledge-wiki`。
## 安全
@@ -346,9 +447,9 @@ def _get_bootstrap_template() -> str:
"""First-run onboarding guide, deleted by agent after completion"""
return """# BOOTSTRAP.md - 首次初始化引导
_你刚刚启动,这是你的第一次对话。_
_你刚刚启动,这是你的第一次对话。_
## 对话流程
## 🎬 对话流程
不要审问式地提问,自然地交流:
@@ -358,13 +459,13 @@ _你刚刚启动这是你的第一次对话。_
- 你希望给我起个什么名字?
- 我该怎么称呼你?
- 你希望我们是什么样的交流风格?(一行列举选项:如专业严谨、轻松幽默、温暖友好、简洁高效等)
4. **风格要求**:温暖自然、简洁清晰,整体控制在 100 字以内
4. **风格要求**:温暖自然、简洁清晰,整体控制在 100 字以内,适当使用 emoji 让表达更生动有趣 🎯
5. 能力介绍和交流风格选项都只要一行,保持精简
6. 不要问太多其他信息(职业、时区等可以后续自然了解)
**重要**: 如果用户第一句话是具体的任务或提问,先回答他们的问题,然后在回复末尾自然地引导初始化(如:"顺便问一下,你想怎么称呼我?我该怎么叫你?")。
## 信息写入(必须严格执行)
## ✍️ 信息写入(必须严格执行)
每当用户提供了名字、称呼、风格等任何初始化信息时,**必须在当轮回复中立即调用 `edit` 工具写入文件**,不能只口头确认。
@@ -373,10 +474,18 @@ _你刚刚启动这是你的第一次对话。_
⚠️ 只说"记住了"而不调用 edit 写入 = 没有完成。信息只有写入文件才会被持久保存。
## 全部完成后
## 🎉 全部完成后
当 AGENT.md 和 USER.md 的核心字段都已填写后,用 bash 执行 `rm BOOTSTRAP.md` 删除此文件。你不再需要引导脚本了——你已经是你了。
"""
def _get_knowledge_index_template() -> str:
"""Knowledge wiki index template — empty file, agent fills it."""
return ""
def _get_knowledge_log_template() -> str:
"""Knowledge wiki operation log template — empty file, agent fills it."""
return ""

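`_truncate_memory_content` first keeps the newest 200 lines. It then drops the oldest lines until the UTF-8 size is within 25,000 bytes. Sizes are measured in bytes because most CJK characters take three bytes in UTF-8. Below is a standalone sketch with the limits exposed as parameters so both caps can be tested. `truncate_memory` is a hypothetical name, and the hint text is copied from the diff.

```python
MAX_LINES = 200
MAX_BYTES = 25000
HINT = "...(older entries truncated, use `memory_search` or `memory_get` for full content)\n\n"


def truncate_memory(content: str, max_lines: int = MAX_LINES, max_bytes: int = MAX_BYTES) -> str:
    """Keep the newest lines of MEMORY.md within a line cap and a byte cap."""
    lines = content.split("\n")
    truncated = False
    if len(lines) > max_lines:
        lines = lines[-max_lines:]  # newest entries are appended at the bottom
        truncated = True
    result = "\n".join(lines)
    while len(result.encode("utf-8")) > max_bytes and lines:
        lines.pop(0)  # drop the oldest remaining line
        truncated = True
        result = "\n".join(lines)
    # Tell the model that older content exists and how to reach it
    return HINT + result if truncated else result
```

Prepending the hint only when something was cut keeps short memory files byte-for-byte unchanged in the prompt.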

@@ -100,138 +100,31 @@ class Agent:
def get_full_system_prompt(self, skill_filter=None) -> str:
"""
Get the full system prompt including skills.
Build the complete system prompt from scratch every time.
Note: Skills are now built into the system prompt by PromptBuilder,
so we just return the base prompt directly. This method is kept for
backward compatibility.
:param skill_filter: Optional list of skill names to include (deprecated)
:return: Complete system prompt
"""
prompt = self.system_prompt
# Rebuild tool list section to reflect current self.tools
prompt = self._rebuild_tool_list_section(prompt)
# If runtime_info contains dynamic time function, rebuild runtime section
if self.runtime_info and callable(self.runtime_info.get('_get_current_time')):
prompt = self._rebuild_runtime_section(prompt)
# Rebuild skills section to pick up newly installed/removed skills
if self.skill_manager:
prompt = self._rebuild_skills_section(prompt)
return prompt
def _rebuild_runtime_section(self, prompt: str) -> str:
"""
Rebuild runtime info section with current time.
This method dynamically updates the runtime info section by calling
the _get_current_time function from runtime_info.
:param prompt: Original system prompt
:return: Updated system prompt with current runtime info
Re-reads AGENT.md / USER.md / RULE.md from disk, refreshes skills,
tools, and runtime info so any change takes effect immediately.
Falls back to the cached self.system_prompt on error.
"""
try:
# Get current time dynamically
time_info = self.runtime_info['_get_current_time']()
# Build new runtime section
runtime_lines = [
"\n## 运行时信息\n",
"\n",
f"当前时间: {time_info['time']} {time_info['weekday']} ({time_info['timezone']})\n",
"\n"
]
# Add other runtime info
runtime_parts = []
if self.runtime_info.get("model"):
runtime_parts.append(f"模型={self.runtime_info['model']}")
if self.runtime_info.get("workspace"):
# Replace backslashes with forward slashes for Windows paths
workspace_path = str(self.runtime_info['workspace']).replace('\\', '/')
runtime_parts.append(f"工作空间={workspace_path}")
if self.runtime_info.get("channel") and self.runtime_info.get("channel") != "web":
runtime_parts.append(f"渠道={self.runtime_info['channel']}")
if runtime_parts:
runtime_lines.append("运行时: " + " | ".join(runtime_parts) + "\n")
runtime_lines.append("\n")
new_runtime_section = "".join(runtime_lines)
# Find and replace the runtime section
import re
pattern = r'\n## 运行时信息\s*\n.*?(?=\n##|\Z)'
_repl = new_runtime_section.rstrip('\n')
updated_prompt = re.sub(pattern, lambda m: _repl, prompt, flags=re.DOTALL)
return updated_prompt
from agent.prompt import load_context_files, PromptBuilder
if self.skill_manager:
self.skill_manager.refresh_skills()
context_files = load_context_files(self.workspace_dir) if self.workspace_dir else None
builder = PromptBuilder(workspace_dir=self.workspace_dir or "", language="zh")
return builder.build(
tools=self.tools,
context_files=context_files,
skill_manager=self.skill_manager,
memory_manager=self.memory_manager,
runtime_info=self.runtime_info,
)
except Exception as e:
logger.warning(f"Failed to rebuild runtime section: {e}")
return prompt
def _rebuild_skills_section(self, prompt: str) -> str:
"""
Rebuild the <available_skills> block so that newly installed or
removed skills are reflected without re-creating the agent.
"""
try:
import re
self.skill_manager.refresh_skills()
new_skills_xml = self.skill_manager.build_skills_prompt()
old_block_pattern = r'<available_skills>.*?</available_skills>'
has_old_block = re.search(old_block_pattern, prompt, flags=re.DOTALL)
# Extract the new <available_skills>...</available_skills> tag from the prompt
new_block = ""
if new_skills_xml and new_skills_xml.strip():
m = re.search(old_block_pattern, new_skills_xml, flags=re.DOTALL)
if m:
new_block = m.group(0)
if has_old_block:
replacement = new_block or "<available_skills>\n</available_skills>"
# Use lambda to prevent re.sub from interpreting backslashes in replacement
# (e.g. Windows paths like \LinkAI would be treated as bad escape sequences)
prompt = re.sub(old_block_pattern, lambda m: replacement, prompt, flags=re.DOTALL)
elif new_block:
skills_header = "以下是可用技能:"
idx = prompt.find(skills_header)
if idx != -1:
insert_pos = idx + len(skills_header)
prompt = prompt[:insert_pos] + "\n" + new_block + prompt[insert_pos:]
except Exception as e:
logger.warning(f"Failed to rebuild skills section: {e}")
return prompt
def _rebuild_tool_list_section(self, prompt: str) -> str:
"""
Rebuild the tool list inside the '## 工具系统' section so that it
always reflects the current ``self.tools`` (handles dynamic add/remove
of conditional tools like web_search).
"""
import re
from agent.prompt.builder import _build_tooling_section
try:
if not self.tools:
return prompt
new_lines = _build_tooling_section(self.tools, "zh")
new_section = "\n".join(new_lines).rstrip("\n")
# Replace existing tooling section
pattern = r'## 工具系统\s*\n.*?(?=\n## |\Z)'
updated = re.sub(pattern, lambda m: new_section, prompt, count=1, flags=re.DOTALL)
return updated
except Exception as e:
logger.warning(f"Failed to rebuild tool list section: {e}")
return prompt
logger.warning(f"Failed to rebuild system prompt, using cached version: {e}")
return self.system_prompt
def refresh_skills(self):
"""Refresh the loaded skills."""

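The comment in the removed `_rebuild_skills_section` explains why it passed a lambda to `re.sub`. When the replacement is a callable, `re.sub` inserts the callable's return value literally. When the replacement is a plain string, `re.sub` parses it for backslash escapes, and since Python 3.7 an unknown escape such as `\L` raises `re.error`. A small demonstration follows. `replace_skills_block` is a hypothetical name.

```python
import re

PATTERN = r"<available_skills>.*?</available_skills>"


def replace_skills_block(prompt: str, new_block: str) -> str:
    # A callable replacement is inserted verbatim: re.sub does not scan it
    # for escapes such as \1 or \g<name>, so Windows paths stay intact.
    return re.sub(PATTERN, lambda m: new_block, prompt, flags=re.DOTALL)


if __name__ == "__main__":
    prompt = "intro\n<available_skills>old</available_skills>\nend"
    new_block = "<available_skills>C:\\LinkAI\\skills</available_skills>"
    try:
        re.sub(PATTERN, new_block, prompt, flags=re.DOTALL)
    except re.error as exc:
        # The plain-string form treats "\L" as an invalid escape
        print("string replacement failed:", exc)
    print(replace_skills_block(prompt, new_block))
```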

@@ -300,13 +300,13 @@ class AgentStreamExecutor:
f"with same arguments. This may indicate a loop."
)
# Check if this is a file to send (from read tool)
# Check if this is a file to send
if result.get("status") == "success" and isinstance(result.get("result"), dict):
result_data = result.get("result")
if result_data.get("type") == "file_to_send":
# Store file metadata for later sending
self.files_to_send.append(result_data)
logger.info(f"📎 检测到待发送文件: {result_data.get('file_name', result_data.get('path'))}")
self._emit_event("file_to_send", result_data)
# Check for critical error - abort entire conversation
if result.get("status") == "critical_error":
@@ -472,6 +472,7 @@ class AgentStreamExecutor:
raise
finally:
final_response = final_response.strip() if final_response else final_response
logger.info(f"[Agent] 🏁 完成 ({turn}轮)")
self._emit_event("agent_end", {"final_response": final_response})
@@ -526,6 +527,7 @@ class AgentStreamExecutor:
# Streaming response
full_content = ""
full_reasoning = ""
tool_calls_buffer = {} # {index: {id, name, arguments}}
gemini_raw_parts = None # Preserve Gemini thoughtSignature for round-trip
stop_reason = None # Track why the stream stopped
@@ -583,10 +585,10 @@ class AgentStreamExecutor:
if finish_reason:
stop_reason = finish_reason
# Skip reasoning_content (internal thinking from models like GLM-5)
reasoning_delta = delta.get("reasoning_content") or ""
# if reasoning_delta:
# logger.debug(f"🧠 [thinking] {reasoning_delta[:100]}...")
if reasoning_delta:
full_reasoning += reasoning_delta
self._emit_event("reasoning_update", {"delta": reasoning_delta})
# Handle text content
content_delta = delta.get("content") or ""
@@ -787,7 +789,12 @@ class AgentStreamExecutor:
# Add assistant message to history (Claude format uses content blocks)
assistant_msg = {"role": "assistant", "content": []}
# Add text content block if present
if full_reasoning:
assistant_msg["content"].append({
"type": "thinking",
"thinking": full_reasoning
})
if full_content:
assistant_msg["content"].append({
"type": "text",

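The stream executor now accumulates `reasoning_content` deltas instead of discarding them. It stores the result as a `thinking` block placed before the text block in the Claude-format assistant message. Below is a minimal sketch of that accumulation. The diff is cut off before the text block's fields, so the `"text"` key is an assumption based on the Claude content-block format. `build_assistant_message` is a hypothetical name, and tool-call handling is omitted.

```python
from typing import Any, Dict, Iterable, List


def build_assistant_message(deltas: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold streamed deltas into one assistant message with ordered blocks."""
    full_reasoning = ""
    full_content = ""
    for delta in deltas:
        # `or ""` guards against explicit None values in a delta
        full_reasoning += delta.get("reasoning_content") or ""
        full_content += delta.get("content") or ""
    blocks: List[Dict[str, str]] = []
    if full_reasoning:
        blocks.append({"type": "thinking", "thinking": full_reasoning})
    if full_content:
        blocks.append({"type": "text", "text": full_content})
    return {"role": "assistant", "content": blocks}
```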

@@ -139,6 +139,47 @@ def should_include_skill(
return True
def get_missing_requirements(
entry: SkillEntry,
current_platform: Optional[str] = None,
) -> Dict[str, List[str]]:
"""
Return a dict of missing requirements for a skill.
Empty dict means all requirements are met.
:param entry: SkillEntry to check
:param current_platform: Current platform (default: auto-detect)
:return: Dict like {"bins": ["curl"], "env": ["API_KEY"]}
"""
missing: Dict[str, List[str]] = {}
metadata = entry.metadata
if not metadata or not metadata.requires:
return missing
required_bins = metadata.requires.get('bins', [])
if required_bins:
missing_bins = [b for b in required_bins if not has_binary(b)]
if missing_bins:
missing['bins'] = missing_bins
any_bins = metadata.requires.get('anyBins', [])
if any_bins and not has_any_binary(any_bins):
missing['anyBins'] = any_bins
required_env = metadata.requires.get('env', [])
if required_env:
missing_env = [e for e in required_env if not has_env_var(e)]
if missing_env:
missing['env'] = missing_env
any_env = metadata.requires.get('anyEnv', [])
if any_env and not any(has_env_var(e) for e in any_env):
missing['anyEnv'] = any_env
return missing
def is_config_path_truthy(config: Dict, path: str) -> bool:
"""
Check if a config path resolves to a truthy value.

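`get_missing_requirements` reports only the requirement groups that fail. A `bins` or `env` entry lists each individual missing item. An `anyBins` or `anyEnv` entry is reported as a whole group when none of its alternatives is present. The bodies of `has_binary` and `has_env_var` are not shown in the diff, so this sketch assumes they behave like `shutil.which` and a non-empty environment-variable check. `missing_requirements` is a hypothetical name that takes the `requires` dict directly.

```python
import os
import shutil
from typing import Dict, List


def missing_requirements(requires: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return only the failing requirement groups; {} means all are met."""
    missing: Dict[str, List[str]] = {}
    bins = [b for b in requires.get("bins", []) if shutil.which(b) is None]
    if bins:
        missing["bins"] = bins
    any_bins = requires.get("anyBins", [])
    if any_bins and not any(shutil.which(b) for b in any_bins):
        missing["anyBins"] = any_bins  # none of the alternatives is installed
    env = [e for e in requires.get("env", []) if not os.environ.get(e)]
    if env:
        missing["env"] = env
    any_env = requires.get("anyEnv", [])
    if any_env and not any(os.environ.get(e) for e in any_env):
        missing["anyEnv"] = any_env  # none of the alternatives is set
    return missing
```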

@@ -2,7 +2,7 @@
Skill formatter for generating prompts from skills.
"""
from typing import List
from typing import Dict, List
from agent.skills.types import Skill, SkillEntry
@@ -51,6 +51,71 @@ def format_skill_entries_for_prompt(entries: List[SkillEntry]) -> str:
return format_skills_for_prompt(skills)
def format_unavailable_skills_for_prompt(
entries: List[SkillEntry],
missing_map: Dict[str, Dict[str, List[str]]],
) -> str:
"""
Format unavailable (requires-not-met) skills as brief setup hints
so the AI can guide users to configure them.
:param entries: List of unavailable skill entries
:param missing_map: Dict mapping skill name to its missing requirements
:return: Formatted prompt text
"""
if not entries:
return ""
lines = [
"",
"<unavailable_skills>",
"The following skills are installed but not yet ready. "
"Guide the user to complete the setup when relevant.",
]
for entry in entries:
skill = entry.skill
missing = missing_map.get(skill.name, {})
missing_parts = []
for key, values in missing.items():
missing_parts.append(f"{key}: {', '.join(values)}")
missing_str = "; ".join(missing_parts) if missing_parts else "unknown"
setup_hint = _extract_setup_hint(skill)
lines.append(" <skill>")
lines.append(f" <name>{_escape_xml(skill.name)}</name>")
lines.append(f" <description>{_escape_xml(skill.description)}</description>")
lines.append(f" <missing>{_escape_xml(missing_str)}</missing>")
if setup_hint:
lines.append(f" <setup>{_escape_xml(setup_hint)}</setup>")
lines.append(" </skill>")
lines.append("</unavailable_skills>")
return "\n".join(lines)
def _extract_setup_hint(skill: Skill) -> str:
"""
Extract the Setup section from SKILL.md content as a brief hint.
Returns the first few lines of the ## Setup section.
"""
content = skill.content
if not content:
return ""
import re
match = re.search(r'^##\s+Setup\s*\n(.*?)(?=\n##\s|\Z)', content, re.MULTILINE | re.DOTALL)
if not match:
return ""
setup_text = match.group(1).strip()
lines = setup_text.split('\n')
hint_lines = [l.strip() for l in lines[:6] if l.strip()]
return ' '.join(hint_lines)[:300]
def _escape_xml(text: str) -> str:
"""Escape XML special characters."""
return (text

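`_extract_setup_hint` captures everything from a `## Setup` heading up to the next `## ` heading or the end of the file. It then keeps at most six non-blank lines and joins them into a single hint of no more than 300 characters. Below is a standalone sketch using the same regular expression. `extract_setup_hint` is a hypothetical name, and the two limits are exposed as parameters.

```python
import re

SETUP_RE = re.compile(r"^##\s+Setup\s*\n(.*?)(?=\n##\s|\Z)", re.MULTILINE | re.DOTALL)


def extract_setup_hint(content: str, max_lines: int = 6, max_chars: int = 300) -> str:
    """Collapse the first lines of a '## Setup' section into one short hint."""
    match = SETUP_RE.search(content or "")
    if not match:
        return ""
    lines = match.group(1).strip().split("\n")
    # Only the first `max_lines` raw lines are considered; blank ones are dropped
    hint_lines = [line.strip() for line in lines[:max_lines] if line.strip()]
    return " ".join(hint_lines)[:max_chars]
```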

@@ -87,8 +87,8 @@ def parse_metadata(frontmatter: Dict[str, Any]) -> Optional[SkillMetadata]:
if not isinstance(metadata_raw, dict):
return None
# Use metadata_raw directly (COW format)
meta_obj = metadata_raw
# Unwrap nested namespace (e.g. {"openclaw": {...}} or {"cowagent": {...}})
meta_obj = _unwrap_metadata_namespace(metadata_raw)
# Parse install specs
install_specs = []
@@ -128,6 +128,7 @@ def parse_metadata(frontmatter: Dict[str, Any]) -> Optional[SkillMetadata]:
return SkillMetadata(
always=meta_obj.get('always', False),
default_enabled=meta_obj.get('default_enabled', True),
skill_key=meta_obj.get('skillKey'),
primary_env=meta_obj.get('primaryEnv'),
emoji=meta_obj.get('emoji'),
@@ -138,6 +139,25 @@ def parse_metadata(frontmatter: Dict[str, Any]) -> Optional[SkillMetadata]:
)
_KNOWN_METADATA_NAMESPACES = {"cowagent", "openclaw"}
def _unwrap_metadata_namespace(metadata_raw: Dict[str, Any]) -> Dict[str, Any]:
"""
Unwrap a single-key namespace wrapper like {"cowagent": {...}} or {"openclaw": {...}}.
If the top-level dict has exactly one key matching a known namespace, return the inner dict.
Otherwise return the original dict unchanged.
"""
keys = set(metadata_raw.keys())
ns_keys = keys & _KNOWN_METADATA_NAMESPACES
if len(ns_keys) == 1 and len(keys) == 1:
ns = ns_keys.pop()
inner = metadata_raw[ns]
if isinstance(inner, dict):
return inner
return metadata_raw
def _normalize_string_list(value: Any) -> List[str]:
"""Normalize a value to a list of strings."""
if not value:

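`_unwrap_metadata_namespace` unwraps the metadata only when the top-level dict has exactly one key and that key is a known namespace. A dict that mixes a namespace key with other fields is returned unchanged. So is a namespace key whose value is not a dict. Below is an equivalent sketch. `unwrap_namespace` is a hypothetical name.

```python
from typing import Any, Dict

KNOWN_NAMESPACES = {"cowagent", "openclaw"}


def unwrap_namespace(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return the inner dict when raw is exactly {<known namespace>: {...}}."""
    if len(raw) == 1:
        key, inner = next(iter(raw.items()))
        if key in KNOWN_NAMESPACES and isinstance(inner, dict):
            return inner
    return raw  # anything else is treated as already-flat metadata
```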

@@ -53,6 +53,12 @@ class SkillLoader:
"""
Recursively load skills from a directory.
If a subdirectory contains its own SKILL.md, it is treated as a
self-contained skill (or skill-collection) and its children are
NOT scanned further. This prevents sub-skills inside a collection
(e.g. style-collection/style-anjing) from being listed as
independent top-level skills.
:param dir_path: Directory to scan
:param source: Source identifier
:param include_root_files: Whether to include root-level .md files
@@ -66,38 +72,41 @@ class SkillLoader:
except Exception as e:
diagnostics.append(f"Failed to list directory {dir_path}: {e}")
return LoadSkillsResult(skills=skills, diagnostics=diagnostics)
# If this directory has its own SKILL.md, load it and stop recursing.
# The sub-directories are internal resources of this skill.
if not include_root_files and 'SKILL.md' in entries:
skill_md_path = os.path.join(dir_path, 'SKILL.md')
if os.path.isfile(skill_md_path):
skill_result = self._load_skill_from_file(skill_md_path, source)
if skill_result.skills:
skills.extend(skill_result.skills)
diagnostics.extend(skill_result.diagnostics)
return LoadSkillsResult(skills=skills, diagnostics=diagnostics)
for entry in entries:
# Skip hidden files and directories
if entry.startswith('.'):
continue
# Skip common non-skill directories
if entry in ('node_modules', '__pycache__', 'venv', '.git'):
continue
full_path = os.path.join(dir_path, entry)
# Handle directories
if os.path.isdir(full_path):
# Recursively scan subdirectories
sub_result = self._load_skills_recursive(full_path, source, include_root_files=False)
skills.extend(sub_result.skills)
diagnostics.extend(sub_result.diagnostics)
continue
# Handle files
if not os.path.isfile(full_path):
continue
# Check if this is a skill file
is_root_md = include_root_files and entry.endswith('.md') and entry.upper() != 'README.MD'
is_skill_md = not include_root_files and entry == 'SKILL.md'
if not (is_root_md or is_skill_md):
if not is_root_md:
continue
# Load the skill
skill_result = self._load_skill_from_file(full_path, source)
if skill_result.skills:
skills.extend(skill_result.skills)
@@ -184,7 +193,6 @@ class SkillLoader:
config_path = os.path.join(skill_dir, "config.json")
# Without config.json, skip this skill entirely (return empty to trigger exclusion)
if not os.path.exists(config_path):
logger.debug(f"[SkillLoader] linkai-agent skipped: no config.json found")
return ""

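The loader now stops recursing as soon as a non-root directory contains its own `SKILL.md`. This means `style-collection/style-anjing` is not listed as an independent skill. The root is exempt from this rule because it is scanned with `include_root_files=True`. Below is a sketch of the traversal rule alone. It collects directory paths instead of parsing `SKILL.md`, and `find_skill_dirs` is a hypothetical name.

```python
import os
import tempfile
from typing import List

SKIP_DIRS = {"node_modules", "__pycache__", "venv"}


def find_skill_dirs(dir_path: str, is_root: bool = True) -> List[str]:
    """A non-root directory with its own SKILL.md is one skill; its
    subdirectories are treated as internal resources and not scanned."""
    if not is_root and os.path.isfile(os.path.join(dir_path, "SKILL.md")):
        return [dir_path]
    found: List[str] = []
    for entry in sorted(os.listdir(dir_path)):
        if entry.startswith(".") or entry in SKIP_DIRS:
            continue  # hidden entries (including .git) and tool directories
        full_path = os.path.join(dir_path, entry)
        if os.path.isdir(full_path):
            found.extend(find_skill_dirs(full_path, is_root=False))
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for rel in ("style-collection/SKILL.md",
                    "style-collection/style-anjing/SKILL.md",
                    "weather/SKILL.md"):
            path = os.path.join(root, rel)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()
        print([os.path.basename(d) for d in find_skill_dirs(root)])
```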

@@ -84,10 +84,10 @@ class SkillManager:
"""
Merge directory-scanned skills with the persisted config file.
- New skills discovered on disk are added with enabled=True.
- New skills: use metadata.default_enabled as initial enabled state.
- Existing skills: preserve their persisted enabled state.
- Skills that no longer exist on disk are removed.
- Existing entries preserve their enabled state; name/description/source
are refreshed from the latest scan.
- name/description/source are always refreshed from the latest scan.
"""
saved = self._load_skills_config()
merged: Dict[str, dict] = {}
@@ -95,15 +95,24 @@ class SkillManager:
for name, entry in self.skills.items():
skill = entry.skill
prev = saved.get(name, {})
# category priority: persisted config (set by cloud) > default "skill"
category = prev.get("category", "skill")
merged[name] = {
if name in saved:
enabled = prev.get("enabled", True)
else:
enabled = entry.metadata.default_enabled if entry.metadata else True
entry_dict = {
"name": name,
"description": skill.description,
"source": skill.source,
"enabled": prev.get("enabled", True),
"source": prev.get("source") or skill.source,
"enabled": enabled,
"category": category,
}
display_name = prev.get("display_name")
if display_name:
entry_dict["display_name"] = display_name
merged[name] = entry_dict
self.skills_config = merged
self._save_skills_config()
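The merge rules in the docstring above can be exercised with plain dictionaries. In this sketch, newly discovered skills start from their declared `default_enabled`, existing ones keep the persisted `enabled` flag, skills missing from disk are dropped, `description` is always refreshed, and the persisted `source`, `category` and `display_name` win when present. It assumes the `prev.get("source") or skill.source` variant is the current one; `merge_skill_config` and the input shapes are hypothetical stand-ins for `SkillManager`'s internal state.

```python
from typing import Dict


def merge_skill_config(scanned: Dict[str, dict], saved: Dict[str, dict]) -> Dict[str, dict]:
    """Hypothetical sketch of the skills_config merge rules.

    scanned: name -> {"description", "source", optional "default_enabled"} from disk
    saved:   name -> previously persisted entry
    """
    merged: Dict[str, dict] = {}
    # Iterating only over scanned names drops skills that vanished from disk
    for name, skill in scanned.items():
        prev = saved.get(name, {})
        if name in saved:
            # Existing skill: keep the persisted choice
            enabled = prev.get("enabled", True)
        else:
            # Newly discovered skill: start from its declared default
            enabled = skill.get("default_enabled", True)
        entry = {
            "name": name,
            "description": skill["description"],  # always refreshed from the scan
            "source": prev.get("source") or skill["source"],
            "enabled": enabled,
            "category": prev.get("category", "skill"),
        }
        if prev.get("display_name"):
            entry["display_name"] = prev["display_name"]
        merged[name] = entry
    return merged
```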
@@ -157,69 +166,118 @@ class SkillManager:
"""
return list(self.skills.values())
@staticmethod
def _normalize_skill_filter(skill_filter: Optional[List[str]]) -> Optional[List[str]]:
"""Normalize a skill_filter list into a flat list of stripped names."""
if skill_filter is None:
return None
normalized = []
for item in skill_filter:
if isinstance(item, str):
name = item.strip()
if name:
normalized.append(name)
elif isinstance(item, list):
for subitem in item:
if isinstance(subitem, str):
name = subitem.strip()
if name:
normalized.append(name)
return normalized or None
def filter_skills(
self,
skill_filter: Optional[List[str]] = None,
include_disabled: bool = False,
) -> List[SkillEntry]:
"""
Filter skills based on criteria.
Simple rule: Skills are auto-enabled if requirements are met.
- Has required API keys -> included
- Missing API keys -> excluded
Filter skills that are eligible (enabled + requirements met).
:param skill_filter: List of skill names to include (None = all)
:param include_disabled: Whether to include disabled skills
:return: Filtered list of skill entries
:return: Filtered list of eligible skill entries
"""
from agent.skills.config import should_include_skill
entries = list(self.skills.values())
# Check requirements (platform, binaries, env vars)
entries = [e for e in entries if should_include_skill(e, self.config)]
# Apply skill filter
if skill_filter is not None:
normalized = []
for item in skill_filter:
if isinstance(item, str):
name = item.strip()
if name:
normalized.append(name)
elif isinstance(item, list):
for subitem in item:
if isinstance(subitem, str):
name = subitem.strip()
if name:
normalized.append(name)
if normalized:
entries = [e for e in entries if e.skill.name in normalized]
normalized = self._normalize_skill_filter(skill_filter)
if normalized is not None:
entries = [e for e in entries if e.skill.name in normalized]
# Filter out disabled skills based on skills_config.json
if not include_disabled:
entries = [e for e in entries if self.is_skill_enabled(e.skill.name)]
from config import conf
if not conf().get("knowledge", True):
entries = [e for e in entries if e.skill.name != "knowledge-wiki"]
return entries
def filter_unavailable_skills(
self,
skill_filter: Optional[List[str]] = None,
) -> tuple:
"""
Find skills that are enabled but have unmet requirements.
:param skill_filter: Optional list of skill names to include
:return: Tuple of (entries, missing_map) where missing_map maps
skill name to its missing requirements dict
"""
from agent.skills.config import should_include_skill, get_missing_requirements
entries = list(self.skills.values())
# Only enabled skills
entries = [e for e in entries if self.is_skill_enabled(e.skill.name)]
normalized = self._normalize_skill_filter(skill_filter)
if normalized is not None:
entries = [e for e in entries if e.skill.name in normalized]
# Keep only those that fail should_include_skill (requirements not met)
unavailable = []
missing_map: Dict[str, dict] = {}
for e in entries:
if not should_include_skill(e, self.config):
missing = get_missing_requirements(e)
if missing:
unavailable.append(e)
missing_map[e.skill.name] = missing
return unavailable, missing_map
def build_skills_prompt(
self,
skill_filter: Optional[List[str]] = None,
) -> str:
"""
Build a formatted prompt containing available skills.
Build a formatted prompt containing available skills
and brief hints for unavailable ones.
:param skill_filter: Optional list of skill names to include
:return: Formatted skills prompt
"""
from common.log import logger
entries = self.filter_skills(skill_filter=skill_filter, include_disabled=False)
logger.debug(f"[SkillManager] Filtered {len(entries)} skills for prompt (total: {len(self.skills)})")
if entries:
skill_names = [e.skill.name for e in entries]
logger.debug(f"[SkillManager] Skills to include: {skill_names}")
result = format_skill_entries_for_prompt(entries)
from agent.skills.formatter import format_unavailable_skills_for_prompt
eligible = self.filter_skills(skill_filter=skill_filter, include_disabled=False)
logger.debug(f"[SkillManager] Eligible: {len(eligible)} skills (total: {len(self.skills)})")
if eligible:
skill_names = [e.skill.name for e in eligible]
logger.debug(f"[SkillManager] Eligible skills: {skill_names}")
result = format_skill_entries_for_prompt(eligible)
unavailable, missing_map = self.filter_unavailable_skills(skill_filter=skill_filter)
if unavailable:
unavailable_names = [e.skill.name for e in unavailable]
logger.debug(f"[SkillManager] Unavailable skills (setup needed): {unavailable_names}")
result += format_unavailable_skills_for_prompt(unavailable, missing_map)
logger.debug(f"[SkillManager] Generated prompt length: {len(result)}")
return result
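One subtlety of `_normalize_skill_filter`, shared by `filter_skills` and `filter_unavailable_skills`, is that an empty or all-blank filter collapses to `None`, which means "no filter". Every eligible skill is then returned, not none. The standalone sketch below reproduces that normalization (strings plus one level of nested lists) next to a hypothetical `apply_filter` helper that stands in for the name check inside the two methods.

```python
from typing import List, Optional


def normalize_skill_filter(skill_filter: Optional[list]) -> Optional[List[str]]:
    """Standalone copy of the _normalize_skill_filter logic for experimentation."""
    if skill_filter is None:
        return None
    normalized: List[str] = []
    for item in skill_filter:
        # Accept plain strings and one level of nested lists of strings
        items = item if isinstance(item, list) else [item]
        for sub in items:
            if isinstance(sub, str) and sub.strip():
                normalized.append(sub.strip())
    # An empty result collapses to None, i.e. "no filter"
    return normalized or None


def apply_filter(names: List[str], skill_filter: Optional[list]) -> List[str]:
    """Hypothetical helper: keep only names matching the normalized filter."""
    normalized = normalize_skill_filter(skill_filter)
    if normalized is None:
        return list(names)
    return [n for n in names if n in normalized]
```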

@@ -29,6 +29,7 @@ class SkillInstallSpec:
class SkillMetadata:
"""Metadata for a skill from frontmatter."""
always: bool = False # Always include this skill
default_enabled: bool = True # Initial enabled state when first discovered
skill_key: Optional[str] = None # Override skill key
primary_env: Optional[str] = None # Primary environment variable
emoji: Optional[str] = None

@@ -87,25 +87,25 @@ FileSave = _optional_tools.get('FileSave')
Terminal = _optional_tools.get('Terminal')
# Delayed import for BrowserTool
# BrowserTool (requires playwright)
def _import_browser_tool():
from common.log import logger
try:
from agent.tools.browser.browser_tool import BrowserTool
return BrowserTool
except ImportError:
# Return a placeholder class that will prompt the user to install dependencies when instantiated
class BrowserToolPlaceholder:
def __init__(self, *args, **kwargs):
raise ImportError(
"The 'browser-use' package is required to use BrowserTool. "
"Please install it with 'pip install browser-use>=0.1.40'."
)
except ImportError as e:
logger.info(
f"[Tools] BrowserTool not loaded - missing dependency: {e}\n"
f" To enable browser tool, run:\n"
f" pip install playwright\n"
f" playwright install chromium"
)
return None
except Exception as e:
logger.error(f"[Tools] BrowserTool failed to load: {e}")
return None
return BrowserToolPlaceholder
# Dynamically set BrowserTool
# BrowserTool = _import_browser_tool()
BrowserTool = _import_browser_tool()
# Export all tools (including optional ones that might be None)
__all__ = [
@@ -124,8 +124,7 @@ __all__ = [
'WebSearch',
'WebFetch',
'Vision',
# Optional tools (may be None if dependencies not available)
# 'BrowserTool'
'BrowserTool',
]
"""
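The reworked `_import_browser_tool` follows a common optional-dependency pattern. A missing package is logged at info level with an install hint and yields `None`. Any other load failure is logged as an error and also yields `None`, so callers must check for `None` before instantiating. A generic sketch of that pattern is below; `import_optional` is a hypothetical helper and uses the standard `logging` module in place of the project's `common.log` logger.

```python
import importlib
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def import_optional(module_name: str, attr: str, install_hint: str = "") -> Optional[Any]:
    """Return module.attr, or None if the module (or one of its dependencies) is missing."""
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr)
    except ImportError as e:
        # Missing optional dependency: informational, not an error
        logger.info("%s not loaded - missing dependency: %s. %s", attr, e, install_hint)
        return None
    except Exception as e:
        # Anything else (e.g. AttributeError, broken module code) is a real failure
        logger.error("%s failed to load: %s", attr, e)
        return None
```

Returning `None` rather than a placeholder class means the failure surfaces when the tool list is built, not at first use.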

@@ -18,9 +18,13 @@ from common.utils import expand_path
class Bash(BaseTool):
"""Tool for executing bash commands"""
_IS_WIN = sys.platform == "win32"
name: str = "bash"
description: str = f"""Execute a bash command in the current working directory. Returns stdout and stderr. Output is truncated to last {DEFAULT_MAX_LINES} lines or {DEFAULT_MAX_BYTES // 1024}KB (whichever is hit first). If truncated, full output is saved to a temp file.
{'''
PLATFORM: Windows (cmd.exe). Do NOT use Unix-only commands like grep, head, tail, sed, awk.
''' if _IS_WIN else ''}
ENVIRONMENT: All API keys from env_config are auto-injected. Use $VAR_NAME directly.
SAFETY:
@@ -103,13 +107,12 @@ SAFETY:
logger.debug(f"[Bash] Process User: {os.environ.get('USERNAME', os.environ.get('USER', 'unknown'))}")
# On Windows, convert $VAR references to %VAR% for cmd.exe
if sys.platform == "win32":
if self._IS_WIN:
env["PYTHONIOENCODING"] = "utf-8"
command = self._convert_env_vars_for_windows(command, dotenv_vars)
if command and not command.strip().lower().startswith("chcp"):
command = f"chcp 65001 >nul 2>&1 && {command}"
# Execute command with inherited environment variables
result = subprocess.run(
command,
shell=True,
@@ -120,7 +123,7 @@ SAFETY:
encoding="utf-8",
errors="replace",
timeout=timeout,
env=env
env=env,
)
logger.debug(f"[Bash] Exit code: {result.returncode}")
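On Windows the tool rewrites `$VAR` references for `cmd.exe` and prefixes `chcp 65001`, which switches the console code page to UTF-8 to match `PYTHONIOENCODING=utf-8` and `encoding="utf-8"`. The body of `_convert_env_vars_for_windows` is not part of this diff. The sketch below is therefore a hypothetical reimplementation, assuming only names present in the dotenv variables are converted and that both `$VAR` and `${VAR}` forms should map to `%VAR%`.

```python
import re
from typing import Iterable

# Matches ${NAME} (group 1) or $NAME (group 2)
_VAR_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)")


def convert_env_vars_for_windows(command: str, known_vars: Iterable[str]) -> str:
    """Hypothetical: rewrite $VAR / ${VAR} to %VAR% for known variable names."""
    known = set(known_vars)

    def repl(m):
        name = m.group(1) or m.group(2)
        # Leave unknown names untouched so literal "$" text survives
        return f"%{name}%" if name in known else m.group(0)

    return _VAR_RE.sub(repl, command)


def prepare_windows_command(command: str, known_vars: Iterable[str]) -> str:
    """Convert variables, then force the UTF-8 code page unless chcp is already used."""
    command = convert_env_vars_for_windows(command, known_vars)
    if command and not command.strip().lower().startswith("chcp"):
        command = f"chcp 65001 >nul 2>&1 && {command}"
    return command
```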

@@ -0,0 +1,3 @@
from agent.tools.browser.browser_tool import BrowserTool
__all__ = ["BrowserTool"]

@@ -0,0 +1,780 @@
"""
Browser service - Playwright wrapper managing browser lifecycle and page operations.
All Playwright calls run on a dedicated background thread so that callers from
any worker thread can safely use the service. An idle-timeout mechanism
automatically shuts down the browser (and its thread) after a configurable
period of inactivity to free resources.
"""
import os
import sys
import uuid
import queue
import threading
from typing import Optional, Dict, Any, List, Callable
from common.log import logger
try:
from playwright.sync_api import sync_playwright, Browser, BrowserContext, Page, Playwright
_HAS_PLAYWRIGHT = True
except ImportError:
_HAS_PLAYWRIGHT = False
# ---------------------------------------------------------------------------
# Snapshot DOM helpers
# ---------------------------------------------------------------------------
# Tags that typically carry useful content for an agent
_INTERACTIVE_TAGS = {
"a", "button", "input", "textarea", "select", "option",
"label", "details", "summary",
}
_SEMANTIC_TAGS = {
"h1", "h2", "h3", "h4", "h5", "h6",
"p", "li", "td", "th", "caption", "figcaption", "blockquote", "pre", "code",
"nav", "main", "article", "section", "header", "footer", "form", "table",
"img", "video", "audio",
}
_KEEP_TAGS = _INTERACTIVE_TAGS | _SEMANTIC_TAGS
_SNAPSHOT_JS = """
() => {
const KEEP = new Set(%s);
const INTERACTIVE = new Set(%s);
const SKIP = new Set(["script","style","noscript","svg","path","meta","link","br","hr"]);
const CLICKABLE_ROLES = new Set([
"button","link","tab","menuitem","menuitemcheckbox","menuitemradio",
"option","switch","checkbox","radio","combobox","searchbox","slider",
"spinbutton","textbox","treeitem"
]);
let refCounter = 0;
const refMap = {};
function visible(el) {
if (!(el instanceof HTMLElement)) return true;
const st = window.getComputedStyle(el);
if (st.display === "none" || st.visibility === "hidden") return false;
if (parseFloat(st.opacity) === 0) return false;
return true;
}
// Strong signals: these attributes alone are enough to mark as interactive
function hasStrongInteractiveSignal(el) {
const role = el.getAttribute("role");
if (role && CLICKABLE_ROLES.has(role)) return true;
if (el.hasAttribute("onclick") || el.hasAttribute("tabindex")) return true;
if (el.hasAttribute("data-click") || el.hasAttribute("data-action")) return true;
if (el.getAttribute("contenteditable") === "true") return true;
return false;
}
// Check if cursor:pointer is set directly (not just inherited from parent)
function hasOwnPointerCursor(el) {
try {
const st = window.getComputedStyle(el);
if (st.cursor !== "pointer") return false;
const parent = el.parentElement;
if (parent) {
const pst = window.getComputedStyle(parent);
if (pst.cursor === "pointer") return false;
}
return true;
} catch(e) {}
return false;
}
function hasTextOrContent(el) {
const t = el.textContent || "";
if (t.trim().length > 0) return true;
if (el.querySelector("img,video,audio,canvas")) return true;
const ariaLabel = el.getAttribute("aria-label");
if (ariaLabel && ariaLabel.trim()) return true;
const title = el.getAttribute("title");
if (title && title.trim()) return true;
return false;
}
function isImplicitInteractive(el) {
if (hasStrongInteractiveSignal(el)) return true;
if (hasOwnPointerCursor(el) && hasTextOrContent(el)) return true;
return false;
}
function getTextContent(el) {
let text = "";
for (const ch of el.childNodes) {
if (ch.nodeType === Node.TEXT_NODE) {
text += ch.textContent;
}
}
return text.trim();
}
function walk(node) {
if (node.nodeType === Node.TEXT_NODE) {
const t = node.textContent.trim();
return t ? t : null;
}
if (node.nodeType !== Node.ELEMENT_NODE) return null;
const tag = node.tagName.toLowerCase();
if (SKIP.has(tag)) return null;
if (!visible(node)) return null;
const children = [];
for (const ch of node.childNodes) {
const r = walk(ch);
if (r !== null) {
if (typeof r === "string") children.push(r);
else children.push(r);
}
}
const nativeInteractive = INTERACTIVE.has(tag);
const implicitInteractive = !nativeInteractive && (node instanceof HTMLElement) && isImplicitInteractive(node);
const keep = KEEP.has(tag) || implicitInteractive;
if (!keep) {
if (children.length === 0) return null;
if (children.length === 1) return children[0];
return children;
}
const obj = { tag };
if (nativeInteractive || implicitInteractive) {
refCounter++;
obj.ref = refCounter;
refMap[refCounter] = node;
}
if (implicitInteractive) {
const role = node.getAttribute("role");
if (role) obj.role = role;
const directText = getTextContent(node);
if (!directText && children.length === 0) {
const ariaLabel = node.getAttribute("aria-label");
const title = node.getAttribute("title");
if (ariaLabel) obj.ariaLabel = ariaLabel;
else if (title) obj.ariaLabel = title;
}
}
// Attributes
if (tag === "a" && node.href) obj.href = node.getAttribute("href");
if (tag === "img") {
obj.alt = node.alt || "";
obj.src = node.getAttribute("src") || "";
}
if (tag === "input" || tag === "textarea" || tag === "select") {
obj.type = node.type || "text";
obj.name = node.name || undefined;
obj.value = node.value || undefined;
obj.placeholder = node.placeholder || undefined;
if (node.disabled) obj.disabled = true;
if (tag === "input" && node.type === "checkbox") obj.checked = node.checked;
}
if (tag === "button") {
if (node.disabled) obj.disabled = true;
}
if (tag === "option") {
obj.value = node.value;
if (node.selected) obj.selected = true;
}
if (tag === "label" && node.htmlFor) obj.for = node.htmlFor;
// Role / aria-label for native interactive & semantic elements
if (!implicitInteractive) {
const role = node.getAttribute("role");
if (role) obj.role = role;
const ariaLabel = node.getAttribute("aria-label");
if (ariaLabel) obj.ariaLabel = ariaLabel;
}
// Children
if (children.length === 1 && typeof children[0] === "string") {
obj.text = children[0];
} else if (children.length > 0) {
obj.children = children;
}
return obj;
}
const result = walk(document.body);
window.__cowRefMap = refMap;
return { tree: result, refCount: refCounter };
}
""" % (
str(list(_KEEP_TAGS)),
str(list(_INTERACTIVE_TAGS)),
)
def _should_use_headless() -> bool:
"""Decide headless mode: headless on Linux servers without display, headed elsewhere."""
if sys.platform in ("win32", "darwin"):
return False
# Linux: check for display
if os.environ.get("DISPLAY") or os.environ.get("WAYLAND_DISPLAY"):
return False
return True
def _flatten_tree(node, indent=0) -> List[str]:
"""Convert snapshot tree to compact text lines for LLM consumption."""
if node is None:
return []
if isinstance(node, str):
return [" " * indent + node]
if isinstance(node, list):
lines = []
for child in node:
lines.extend(_flatten_tree(child, indent))
return lines
if not isinstance(node, dict):
return []
tag = node.get("tag", "?")
ref = node.get("ref")
parts = [tag]
if ref:
parts[0] = f"[{ref}] {tag}"
# Inline attributes
for attr in ("type", "name", "href", "alt", "role", "ariaLabel", "placeholder", "value"):
val = node.get(attr)
if val:
# Truncate long values
s = str(val)
if len(s) > 80:
s = s[:77] + "..."
parts.append(f'{attr}="{s}"')
for flag in ("disabled", "checked", "selected"):
if node.get(flag):
parts.append(flag)
prefix = " " * indent
header = prefix + " ".join(parts)
text = node.get("text")
if text:
# Truncate long text
if len(text) > 120:
text = text[:117] + "..."
header += f": {text}"
lines = [header]
children = node.get("children", [])
for child in children:
lines.extend(_flatten_tree(child, indent + 2))
return lines
class BrowserService:
"""Manages a Playwright browser on a dedicated background thread.
All Playwright operations are dispatched to a single long-lived thread via
a task queue. Callers from *any* worker thread can use the public API
safely. An idle timer automatically shuts the browser down after
``idle_timeout`` seconds of inactivity (default 300 = 5 min).
"""
_IDLE_TIMEOUT_DEFAULT = 300 # seconds
def __init__(self, config: Optional[Dict[str, Any]] = None):
self._config = config or {}
self._headless: Optional[bool] = None
self._screenshot_dir: Optional[str] = None
# Background thread state
self._thread: Optional[threading.Thread] = None
self._task_queue: queue.Queue = queue.Queue()
self._lock = threading.Lock()
self._alive = False
self._ready = threading.Event()
# Playwright objects (only accessed on the background thread)
self._playwright = None
self._browser = None
self._context = None
self._page = None
# Idle auto-release
idle_cfg = self._config.get("idle_timeout")
self._idle_timeout: float = float(idle_cfg) if idle_cfg is not None else self._IDLE_TIMEOUT_DEFAULT
self._idle_timer: Optional[threading.Timer] = None
# ------------------------------------------------------------------
# Background-thread lifecycle
# ------------------------------------------------------------------
def _start_thread(self):
"""Start the dedicated Playwright thread if not already running."""
with self._lock:
if self._alive and self._thread and self._thread.is_alive():
return
# Wait for old thread to fully exit before creating a new one
old = self._thread
if old and old.is_alive():
old.join(timeout=5)
# Fresh queue to avoid stale sentinels from a previous close()
self._task_queue = queue.Queue()
self._alive = True
self._ready = threading.Event()
self._thread = threading.Thread(target=self._run_loop, daemon=True, name="BrowserThread")
self._thread.start()
# Block until browser is ready (or failed)
self._ready.wait(timeout=30)
def _run_loop(self):
"""Event loop running on the dedicated thread. Processes tasks until stopped."""
logger.info("[Browser] Background thread started")
try:
self._launch_browser()
except Exception as e:
logger.error(f"[Browser] Failed to launch browser: {e}")
self._alive = False
self._ready.set()
self._drain_queue(RuntimeError(f"Browser launch failed: {e}"))
return
self._ready.set()
while self._alive:
try:
task = self._task_queue.get(timeout=1.0)
except queue.Empty:
continue
if task is None:
break
fn, args, kwargs, result_slot = task
try:
result_slot["value"] = fn(*args, **kwargs)
except Exception as e:
result_slot["error"] = e
finally:
result_slot["event"].set()
self._shutdown_browser()
self._drain_queue(RuntimeError("Browser thread stopped"))
logger.info("[Browser] Background thread exited")
def _drain_queue(self, error: Exception):
"""Unblock all callers waiting on the queue with an error."""
while True:
try:
task = self._task_queue.get_nowait()
except queue.Empty:
break
if task is None:
continue
_, _, _, result_slot = task
result_slot["error"] = error
result_slot["event"].set()
def _launch_browser(self):
"""Launch Chromium on the background thread."""
if self._headless is None:
headless_cfg = self._config.get("headless")
self._headless = headless_cfg if headless_cfg is not None else _should_use_headless()
launch_args = ["--disable-dev-shm-usage"]
if self._headless:
launch_args.append("--no-sandbox")
extra_args = self._config.get("launch_args", [])
if extra_args:
launch_args.extend(extra_args)
viewport_w = self._config.get("viewport_width", 1280)
viewport_h = self._config.get("viewport_height", 720)
self._playwright = sync_playwright().start()
logger.info(f"[Browser] Launching Chromium (headless={self._headless})")
self._browser = self._playwright.chromium.launch(
headless=self._headless,
args=launch_args,
)
self._context = self._browser.new_context(
viewport={"width": viewport_w, "height": viewport_h},
user_agent=(
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
"AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/131.0.0.0 Safari/537.36"
),
)
self._page = self._context.new_page()
logger.info("[Browser] Browser ready")
def _shutdown_browser(self):
"""Shut down all Playwright resources on the background thread."""
self._cancel_idle_timer()
for obj, label in [
(self._context, "context"),
(self._browser, "browser"),
]:
try:
if obj:
obj.close()
except Exception as e:
logger.debug(f"[Browser] {label} close error: {e}")
try:
if self._playwright:
self._playwright.stop()
except Exception as e:
logger.debug(f"[Browser] playwright stop error: {e}")
self._page = None
self._context = None
self._browser = None
self._playwright = None
logger.info("[Browser] Browser closed")
def _submit(self, fn: Callable, *args, **kwargs):
"""Submit *fn* to the background thread and block until it completes."""
self._start_thread()
if not self._alive:
raise RuntimeError("Browser is not available")
self._reset_idle_timer()
result_slot: Dict[str, Any] = {"event": threading.Event()}
self._task_queue.put((fn, args, kwargs, result_slot))
# Timeout prevents permanent hang if the background thread crashes
completed = result_slot["event"].wait(timeout=120)
if not completed:
raise TimeoutError("Browser operation timed out (120s)")
if "error" in result_slot:
raise result_slot["error"]
return result_slot.get("value")
# ------------------------------------------------------------------
# Idle auto-release
# ------------------------------------------------------------------
def _reset_idle_timer(self):
self._cancel_idle_timer()
if self._idle_timeout > 0:
self._idle_timer = threading.Timer(self._idle_timeout, self._on_idle_timeout)
self._idle_timer.daemon = True
self._idle_timer.start()
def _cancel_idle_timer(self):
if self._idle_timer:
self._idle_timer.cancel()
self._idle_timer = None
def _on_idle_timeout(self):
logger.info(f"[Browser] Idle for {self._idle_timeout}s, auto-releasing browser")
self.close()
# ------------------------------------------------------------------
# Public lifecycle
# ------------------------------------------------------------------
def close(self):
"""Shut down browser and background thread (safe from any thread)."""
self._cancel_idle_timer()
with self._lock:
if not self._alive:
return
self._alive = False
t = self._thread
if self._task_queue is not None:
self._task_queue.put(None)
if t is not None and t.is_alive():
t.join(timeout=10)
with self._lock:
self._thread = None
# ------------------------------------------------------------------
# Actions (each method is dispatched to the background thread)
# ------------------------------------------------------------------
def navigate(self, url: str, timeout: int = 30000) -> Dict[str, Any]:
return self._submit(self._do_navigate, url, timeout)
def _do_navigate(self, url: str, timeout: int) -> Dict[str, Any]:
page = self._page
try:
resp = page.goto(url, wait_until="domcontentloaded", timeout=timeout)
status = resp.status if resp else None
except Exception as e:
return {"error": f"Navigation failed: {e}"}
try:
page.wait_for_load_state("networkidle", timeout=8000)
except Exception:
pass
page.wait_for_timeout(500)
try:
title = page.title()
except Exception:
title = ""
try:
current_url = page.url
except Exception:
current_url = url
return {"url": current_url, "title": title, "status": status}
def snapshot(self, selector: Optional[str] = None) -> str:
return self._submit(self._do_snapshot, selector)
def _do_snapshot(self, selector: Optional[str] = None) -> str:
page = self._page
try:
result = page.evaluate(_SNAPSHOT_JS)
except Exception as e:
return f"[Snapshot error: {e}]"
tree = result.get("tree")
ref_count = result.get("refCount", 0)
lines = _flatten_tree(tree)
try:
title = page.title()
except Exception:
title = ""
try:
url = page.url
except Exception:
url = ""
header = f"Page: {title} ({url})\nInteractive elements: {ref_count}\n---"
body = "\n".join(lines)
max_chars = self._config.get("snapshot_max_chars", 30000)
if len(body) > max_chars:
body = body[:max_chars] + "\n... [snapshot truncated]"
return f"{header}\n{body}"
def screenshot(self, full_page: bool = False, cwd: str = "") -> str:
return self._submit(self._do_screenshot, full_page, cwd)
def _do_screenshot(self, full_page: bool = False, cwd: str = "") -> str:
page = self._page
save_dir = self._get_screenshot_dir(cwd)
filename = f"screenshot_{uuid.uuid4().hex[:8]}.png"
filepath = os.path.join(save_dir, filename)
page.screenshot(path=filepath, full_page=full_page)
logger.info(f"[Browser] Screenshot saved: {filepath}")
return filepath
def click(self, ref: Optional[int] = None, selector: Optional[str] = None,
timeout: int = 5000) -> Dict[str, Any]:
return self._submit(self._do_click, ref, selector, timeout)
def _do_click(self, ref, selector, timeout) -> Dict[str, Any]:
page = self._page
try:
if ref is not None:
result = page.evaluate(f"""
() => {{
const el = window.__cowRefMap && window.__cowRefMap[{ref}];
if (!el) return {{ error: "ref {ref} not found. Run snapshot first." }};
el.click();
return {{ clicked: true, tag: el.tagName.toLowerCase() }};
}}
""")
if result.get("error"):
return result
page.wait_for_timeout(500)
return result
elif selector:
page.click(selector, timeout=timeout)
return {"clicked": True, "selector": selector}
else:
return {"error": "Provide either ref (from snapshot) or selector"}
except Exception as e:
return {"error": f"Click failed: {e}"}
def fill(self, text: str, ref: Optional[int] = None,
selector: Optional[str] = None, timeout: int = 5000) -> Dict[str, Any]:
return self._submit(self._do_fill, text, ref, selector, timeout)
def _do_fill(self, text, ref, selector, timeout) -> Dict[str, Any]:
page = self._page
try:
if ref is not None:
result = page.evaluate(f"""
() => {{
const el = window.__cowRefMap && window.__cowRefMap[{ref}];
if (!el) return {{ error: "ref {ref} not found. Run snapshot first." }};
el.focus();
el.value = "";
return {{ tag: el.tagName.toLowerCase(), name: el.name || "" }};
}}
""")
if result.get("error"):
return result
page.keyboard.type(text)
return {"filled": True, "ref": ref, "text": text}
elif selector:
page.fill(selector, text, timeout=timeout)
return {"filled": True, "selector": selector, "text": text}
else:
return {"error": "Provide either ref (from snapshot) or selector"}
except Exception as e:
return {"error": f"Fill failed: {e}"}
def select(self, value: str, ref: Optional[int] = None,
selector: Optional[str] = None, timeout: int = 5000) -> Dict[str, Any]:
return self._submit(self._do_select, value, ref, selector, timeout)
def _do_select(self, value, ref, selector, timeout) -> Dict[str, Any]:
page = self._page
try:
if ref is not None:
result = page.evaluate(f"""
() => {{
const el = window.__cowRefMap && window.__cowRefMap[{ref}];
if (!el || el.tagName.toLowerCase() !== "select")
return {{ error: "ref {ref} is not a <select> element" }};
el.value = {repr(value)};
el.dispatchEvent(new Event("change", {{ bubbles: true }}));
return {{ selected: true, value: el.value }};
}}
""")
return result
elif selector:
page.select_option(selector, value, timeout=timeout)
return {"selected": True, "selector": selector, "value": value}
else:
return {"error": "Provide either ref (from snapshot) or selector"}
except Exception as e:
return {"error": f"Select failed: {e}"}
def scroll(self, direction: str = "down", amount: int = 500) -> Dict[str, Any]:
return self._submit(self._do_scroll, direction, amount)
def _do_scroll(self, direction, amount) -> Dict[str, Any]:
page = self._page
delta_map = {
"down": (0, amount),
"up": (0, -amount),
"right": (amount, 0),
"left": (-amount, 0),
}
dx, dy = delta_map.get(direction, (0, amount))
try:
page.mouse.wheel(dx, dy)
page.wait_for_timeout(300)
scroll_info = page.evaluate("""
() => ({
scrollX: window.scrollX,
scrollY: window.scrollY,
scrollHeight: document.documentElement.scrollHeight,
clientHeight: document.documentElement.clientHeight
})
""")
return {"scrolled": direction, "amount": amount, **scroll_info}
except Exception as e:
return {"error": f"Scroll failed: {e}"}
def wait(self, selector: Optional[str] = None, timeout: int = 5000,
state: str = "visible") -> Dict[str, Any]:
return self._submit(self._do_wait, selector, timeout, state)
def _do_wait(self, selector, timeout, state) -> Dict[str, Any]:
page = self._page
try:
if selector:
page.wait_for_selector(selector, timeout=timeout, state=state)
return {"waited": True, "selector": selector, "state": state}
else:
page.wait_for_timeout(timeout)
return {"waited": True, "timeout_ms": timeout}
except Exception as e:
return {"error": f"Wait failed: {e}"}
def go_back(self) -> Dict[str, Any]:
return self._submit(self._do_go_back)
def _do_go_back(self) -> Dict[str, Any]:
page = self._page
try:
page.go_back(wait_until="domcontentloaded", timeout=10000)
try:
title = page.title()
except Exception:
title = ""
try:
url = page.url
except Exception:
url = ""
return {"url": url, "title": title}
except Exception as e:
return {"error": f"Go back failed: {e}"}
def go_forward(self) -> Dict[str, Any]:
return self._submit(self._do_go_forward)
def _do_go_forward(self) -> Dict[str, Any]:
page = self._page
try:
page.go_forward(wait_until="domcontentloaded", timeout=10000)
try:
title = page.title()
except Exception:
title = ""
try:
url = page.url
except Exception:
url = ""
return {"url": url, "title": title}
except Exception as e:
return {"error": f"Go forward failed: {e}"}
def get_text(self, selector: str) -> Dict[str, Any]:
return self._submit(self._do_get_text, selector)
def _do_get_text(self, selector) -> Dict[str, Any]:
page = self._page
try:
text = page.text_content(selector, timeout=5000)
return {"text": text or ""}
except Exception as e:
return {"error": f"Get text failed: {e}"}
def evaluate(self, script: str) -> Dict[str, Any]:
return self._submit(self._do_evaluate, script)
def _do_evaluate(self, script) -> Dict[str, Any]:
page = self._page
try:
result = page.evaluate(script)
return {"result": result}
except Exception as e:
return {"error": f"Evaluate failed: {e}"}
def press(self, key: str) -> Dict[str, Any]:
return self._submit(self._do_press, key)
def _do_press(self, key) -> Dict[str, Any]:
page = self._page
try:
page.keyboard.press(key)
page.wait_for_timeout(300)
return {"pressed": key}
except Exception as e:
return {"error": f"Press failed: {e}"}
# ------------------------------------------------------------------
# Helpers
# ------------------------------------------------------------------
def _get_screenshot_dir(self, cwd: str = "") -> str:
if self._screenshot_dir and os.path.isdir(self._screenshot_dir):
return self._screenshot_dir
base = cwd or os.getcwd()
d = os.path.join(base, "tmp")
os.makedirs(d, exist_ok=True)
self._screenshot_dir = d
return d
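`BrowserService` funnels every Playwright call through one long-lived thread. Playwright's sync API is documented as not thread-safe, so objects created on one thread should not be used from another. The sketch below keeps only the dispatch core of `_submit`/`_run_loop`: a queue of `(fn, args, kwargs, result_slot)` tuples, a per-call `threading.Event`, a `None` sentinel for shutdown, and a bounded wait on the caller side. Lazy start, the idle timer and Playwright itself are omitted, and `SingleThreadDispatcher` is a hypothetical name.

```python
import queue
import threading
from typing import Any, Callable, Dict, Optional


class SingleThreadDispatcher:
    """Run every submitted callable on one dedicated thread (sketch)."""

    def __init__(self, timeout: float = 120.0) -> None:
        self._timeout = timeout
        self._queue: "queue.Queue[Optional[tuple]]" = queue.Queue()
        self._thread = threading.Thread(target=self._run_loop, daemon=True)
        self._thread.start()

    def _run_loop(self) -> None:
        while True:
            task = self._queue.get()
            if task is None:  # sentinel pushed by close()
                break
            fn, args, kwargs, slot = task
            try:
                slot["value"] = fn(*args, **kwargs)
            except Exception as e:
                slot["error"] = e
            finally:
                slot["event"].set()  # always wake the waiting caller

    def submit(self, fn: Callable, *args: Any, **kwargs: Any) -> Any:
        slot: Dict[str, Any] = {"event": threading.Event()}
        self._queue.put((fn, args, kwargs, slot))
        # Bounded wait so a stuck worker cannot hang the caller forever
        if not slot["event"].wait(timeout=self._timeout):
            raise TimeoutError(f"operation timed out ({self._timeout}s)")
        if "error" in slot:
            raise slot["error"]  # re-raise the worker's exception in the caller
        return slot.get("value")

    def close(self) -> None:
        self._queue.put(None)
        self._thread.join(timeout=10)
```

Because each call carries its own result slot and event, callers on different worker threads can submit concurrently while the work itself stays serialized on the one thread.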

@@ -0,0 +1,290 @@
"""
Browser tool - Control a Chromium browser for web navigation and interaction.
Uses Playwright under the hood. Browser instance is lazily started on first
use, reused across tool calls within the same session, and cleaned up via
close().
"""
import json
import os
from typing import Dict, Any, Optional
from agent.tools.base_tool import BaseTool, ToolResult
from agent.tools.browser.browser_service import BrowserService
from common.log import logger
class BrowserTool(BaseTool):
"""Single tool exposing all browser actions via an 'action' parameter."""
name: str = "browser"
description: str = (
"Control a browser to navigate web pages, interact with elements, and extract content. "
"Actions: navigate, snapshot, click, fill, select, scroll, screenshot, wait, back, forward, "
"get_text, press, evaluate.\n\n"
"Workflow: navigate (auto-includes snapshot with element refs) → click/fill/select by ref → snapshot to verify.\n\n"
"Use snapshot as the primary way to read pages. Use screenshot + send to show key results to the user. "
"For login/CAPTCHA/authorization etc., screenshot and ask the user for help."
)
params: dict = {
"type": "object",
"properties": {
"action": {
"type": "string",
"description": (
"The browser action to perform. One of: "
"navigate, snapshot, click, fill, select, scroll, "
"screenshot, wait, back, forward, get_text, press, evaluate"
),
"enum": [
"navigate", "snapshot", "click", "fill", "select", "scroll",
"screenshot", "wait", "back", "forward", "get_text", "press",
"evaluate"
]
},
"url": {
"type": "string",
"description": "URL to navigate to (for 'navigate' action)"
},
"ref": {
"type": "integer",
"description": "Element ref number from snapshot (for click/fill/select)"
},
"selector": {
"type": "string",
"description": "CSS selector as fallback when ref is unavailable (for click/fill/select/wait/get_text)"
},
"text": {
"type": "string",
"description": "Text to type (for 'fill' action)"
},
"value": {
"type": "string",
"description": "Option value (for 'select' action)"
},
"key": {
"type": "string",
"description": "Key to press, e.g. Enter, Tab, Escape (for 'press' action)"
},
"direction": {
"type": "string",
"description": "Scroll direction: up, down, left, right (for 'scroll' action, default: down)"
},
"script": {
"type": "string",
"description": "JavaScript code to execute (for 'evaluate' action)"
},
"full_page": {
"type": "boolean",
"description": "Capture full page screenshot (for 'screenshot' action, default: false)"
},
"timeout": {
"type": "integer",
"description": "Timeout in milliseconds (optional, default varies by action)"
}
},
"required": ["action"]
}
_shared_service: Optional[BrowserService] = None
def __init__(self, config: dict = None):
self.config = config or {}
self.cwd = self.config.get("cwd", os.getcwd())
self._service: Optional[BrowserService] = None
def _get_service(self) -> BrowserService:
"""Get or create the browser service, sharing across copies."""
if self._service is not None:
return self._service
# Reuse shared service across tool copies within the same session
if BrowserTool._shared_service is not None:
self._service = BrowserTool._shared_service
return self._service
self._service = BrowserService(self.config)
BrowserTool._shared_service = self._service
return self._service
def execute(self, args: Dict[str, Any]) -> ToolResult:
action = args.get("action", "").strip().lower()
if not action:
return ToolResult.fail("Error: 'action' parameter is required")
handler = self._ACTION_MAP.get(action)
if not handler:
valid = ", ".join(sorted(self._ACTION_MAP.keys()))
return ToolResult.fail(f"Unknown action '{action}'. Valid actions: {valid}")
try:
return handler(self, args)
except Exception as e:
logger.error(f"[Browser] Action '{action}' error: {e}")
return ToolResult.fail(f"Browser error ({action}): {e}")
# ------------------------------------------------------------------
# Action handlers
# ------------------------------------------------------------------
def _do_navigate(self, args: Dict[str, Any]) -> ToolResult:
url = args.get("url", "").strip()
if not url:
return ToolResult.fail("Error: 'url' is required for navigate action")
if not url.startswith(("http://", "https://")):
url = "https://" + url
timeout = args.get("timeout", 30000)
service = self._get_service()
result = service.navigate(url, timeout=timeout)
if "error" in result:
return ToolResult.fail(result["error"])
# Auto-snapshot after navigation so the agent gets page content in one call
snapshot_text = service.snapshot()
return ToolResult.success(
f"Navigated to: {result['url']}\nTitle: {result['title']}\nStatus: {result['status']}\n\n"
f"--- Page Snapshot ---\n{snapshot_text}"
)
def _do_snapshot(self, args: Dict[str, Any]) -> ToolResult:
selector = args.get("selector")
text = self._get_service().snapshot(selector=selector)
return ToolResult.success(text)
def _do_click(self, args: Dict[str, Any]) -> ToolResult:
ref = args.get("ref")
selector = args.get("selector")
timeout = args.get("timeout", 5000)
result = self._get_service().click(ref=ref, selector=selector, timeout=timeout)
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success(f"Clicked successfully. Use 'snapshot' to see updated page.")
def _do_fill(self, args: Dict[str, Any]) -> ToolResult:
text = args.get("text")
ref = args.get("ref")
selector = args.get("selector")
timeout = args.get("timeout", 5000)
# An empty string is allowed (clears the field); only a missing value is rejected
if text is None:
return ToolResult.fail("Error: 'text' is required for fill action")
result = self._get_service().fill(text, ref=ref, selector=selector, timeout=timeout)
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success(f"Filled text into element. Use 'snapshot' to verify.")
def _do_select(self, args: Dict[str, Any]) -> ToolResult:
value = args.get("value", "")
ref = args.get("ref")
selector = args.get("selector")
timeout = args.get("timeout", 5000)
if not value:
return ToolResult.fail("Error: 'value' is required for select action")
result = self._get_service().select(value, ref=ref, selector=selector, timeout=timeout)
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success(f"Selected option '{value}'.")
def _do_scroll(self, args: Dict[str, Any]) -> ToolResult:
direction = args.get("direction", "down")
amount = args.get("timeout", 500) # reuse timeout field or default
if "amount" in args:
amount = args["amount"]
result = self._get_service().scroll(direction=direction, amount=amount)
if "error" in result:
return ToolResult.fail(result["error"])
pos = f"scrollY={result.get('scrollY', '?')}/{result.get('scrollHeight', '?')}"
return ToolResult.success(f"Scrolled {direction}. Position: {pos}")
def _do_screenshot(self, args: Dict[str, Any]) -> ToolResult:
full_page = args.get("full_page", False)
filepath = self._get_service().screenshot(full_page=full_page, cwd=self.cwd)
return ToolResult.success(f"Screenshot saved to: {filepath}")
def _do_wait(self, args: Dict[str, Any]) -> ToolResult:
selector = args.get("selector")
timeout = args.get("timeout", 5000)
result = self._get_service().wait(selector=selector, timeout=timeout)
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success(f"Wait completed.")
def _do_back(self, args: Dict[str, Any]) -> ToolResult:
result = self._get_service().go_back()
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success(f"Navigated back to: {result['url']}")
def _do_forward(self, args: Dict[str, Any]) -> ToolResult:
result = self._get_service().go_forward()
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success(f"Navigated forward to: {result['url']}")
def _do_get_text(self, args: Dict[str, Any]) -> ToolResult:
selector = args.get("selector", "").strip()
if not selector:
return ToolResult.fail("Error: 'selector' is required for get_text action")
result = self._get_service().get_text(selector)
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success(result["text"])
def _do_press(self, args: Dict[str, Any]) -> ToolResult:
key = args.get("key", "").strip()
if not key:
return ToolResult.fail("Error: 'key' is required for press action")
result = self._get_service().press(key)
if "error" in result:
return ToolResult.fail(result["error"])
return ToolResult.success(f"Pressed key: {key}")
def _do_evaluate(self, args: Dict[str, Any]) -> ToolResult:
script = args.get("script", "").strip()
if not script:
return ToolResult.fail("Error: 'script' is required for evaluate action")
result = self._get_service().evaluate(script)
if "error" in result:
return ToolResult.fail(result["error"])
val = result.get("result")
if isinstance(val, (dict, list)):
return ToolResult.success(json.dumps(val, ensure_ascii=False, indent=2))
return ToolResult.success(str(val) if val is not None else "(no return value)")
# Action dispatch table
_ACTION_MAP = {
"navigate": _do_navigate,
"snapshot": _do_snapshot,
"click": _do_click,
"fill": _do_fill,
"select": _do_select,
"scroll": _do_scroll,
"screenshot": _do_screenshot,
"wait": _do_wait,
"back": _do_back,
"forward": _do_forward,
"get_text": _do_get_text,
"press": _do_press,
"evaluate": _do_evaluate,
}
# ------------------------------------------------------------------
# Lifecycle
# ------------------------------------------------------------------
def copy(self):
"""Share browser instance across tool copies (avoids re-launching)."""
new_tool = BrowserTool(self.config)
new_tool.model = self.model
new_tool.context = getattr(self, "context", None)
new_tool.cwd = self.cwd
new_tool._service = self._service
return new_tool
def close(self):
"""Release browser resources."""
if self._service:
self._service.close()
self._service = None
BrowserTool._shared_service = None
logger.info("[Browser] BrowserTool closed")

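`BrowserTool` dispatches actions through a class-level `_ACTION_MAP` that holds plain functions, called as `handler(self, args)`. The pattern can be sketched on its own as follows. This is a minimal, self-contained illustration, not the project's code. `ToolResult` here is a hypothetical stand-in for `agent.tools.base_tool.ToolResult`, and only two toy actions are wired up.

```python
from typing import Any, Callable, Dict


class ToolResult:
    """Hypothetical stand-in for agent.tools.base_tool.ToolResult."""

    def __init__(self, ok: bool, payload: Any):
        self.ok = ok
        self.payload = payload

    @classmethod
    def success(cls, payload: Any) -> "ToolResult":
        return cls(True, payload)

    @classmethod
    def fail(cls, payload: Any) -> "ToolResult":
        return cls(False, payload)


class MiniBrowserTool:
    """Toy tool that routes an 'action' argument through a dispatch table."""

    def execute(self, args: Dict[str, Any]) -> ToolResult:
        action = str(args.get("action", "")).strip().lower()
        if not action:
            return ToolResult.fail("Error: 'action' parameter is required")
        handler = self._ACTION_MAP.get(action)
        if handler is None:
            valid = ", ".join(sorted(self._ACTION_MAP))
            return ToolResult.fail(f"Unknown action '{action}'. Valid actions: {valid}")
        try:
            # Table entries are plain functions, not bound methods,
            # so the instance must be passed explicitly.
            return handler(self, args)
        except Exception as e:
            # Any handler exception becomes a failed ToolResult.
            return ToolResult.fail(f"Browser error ({action}): {e}")

    def _do_back(self, args: Dict[str, Any]) -> ToolResult:
        return ToolResult.success("Navigated back")

    def _do_press(self, args: Dict[str, Any]) -> ToolResult:
        key = str(args.get("key", "")).strip()
        if not key:
            raise ValueError("'key' is required")
        return ToolResult.success(f"Pressed key: {key}")

    # Must come after the handlers: the class body runs top to bottom,
    # so the function names have to exist when this dict is built.
    _ACTION_MAP: Dict[str, Callable[..., ToolResult]] = {
        "back": _do_back,
        "press": _do_press,
    }
```

With this layout, adding an action means one new method plus one dict entry. The JSON schema's `enum` list still has to be kept in sync by hand.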

@@ -1,18 +0,0 @@
def copy(self):
"""
Special copy method for browser tool to avoid recreating browser instance.
:return: A new instance with shared browser reference but unique model
"""
new_tool = self.__class__()
# Copy essential attributes
new_tool.model = self.model
new_tool.context = getattr(self, 'context', None)
new_tool.config = getattr(self, 'config', None)
# Share the browser instance instead of creating a new one
if hasattr(self, 'browser'):
new_tool.browser = self.browser
return new_tool

@@ -44,6 +44,19 @@ class MemoryGetTool(BaseTool):
"""
super().__init__()
self.memory_manager = memory_manager
from config import conf
if conf().get("knowledge", True):
self.description = (
"Read specific content from memory or knowledge files. "
"Use this to get full context from a memory file, knowledge page, or specific line range."
)
self.params = {**self.params}
self.params["properties"] = {**self.params["properties"]}
self.params["properties"]["path"] = {
"type": "string",
"description": "Relative path to the memory or knowledge file (e.g. 'MEMORY.md', 'memory/2026-01-01.md', 'knowledge/concepts/moe.md')"
}
def execute(self, args: dict):
"""
@@ -68,11 +81,15 @@ class MemoryGetTool(BaseTool):
workspace_dir = self.memory_manager.config.get_workspace()
# Auto-prepend memory/ if not present and not absolute path
# Exception: MEMORY.md is in the root directory
if not path.startswith('memory/') and not path.startswith('/') and path != 'MEMORY.md':
# Exceptions: MEMORY.md in root, knowledge/ files at workspace root
if not path.startswith('memory/') and not path.startswith('knowledge/') and not path.startswith('/') and path != 'MEMORY.md':
path = f'memory/{path}'
file_path = workspace_dir / path
file_path = (workspace_dir / path).resolve()
workspace_resolved = workspace_dir.resolve()
# Compare path components rather than strings so the check also holds on Windows
if file_path != workspace_resolved and workspace_resolved not in file_path.parents:
return ToolResult.fail(f"Error: Access denied: path outside workspace")
if not file_path.exists():
return ToolResult.fail(f"Error: File not found: {path}")

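The path handling above does two things: it auto-prefixes `memory/`, then refuses anything that resolves outside the workspace. Both steps can be sketched as a standalone function. This is an illustrative sketch, not the tool's code. `resolve_in_workspace` is a hypothetical name. It relies on `Path.relative_to`, which raises `ValueError` for a path that is not under the root and does not depend on the path separator.

```python
from pathlib import Path


def resolve_in_workspace(workspace_dir: Path, rel_path: str) -> Path:
    """Resolve rel_path inside workspace_dir, refusing paths that escape it.

    Hypothetical helper mirroring MemoryGetTool's rules: bare names go under
    memory/, while MEMORY.md, memory/..., knowledge/... and absolute paths
    are used as given.
    """
    if not rel_path.startswith(("memory/", "knowledge/", "/")) and rel_path != "MEMORY.md":
        rel_path = f"memory/{rel_path}"
    root = Path(workspace_dir).resolve()
    # resolve() collapses ".." segments and follows existing symlinks;
    # joining an absolute path replaces root entirely.
    candidate = (root / rel_path).resolve()
    try:
        # Raises ValueError when candidate is neither root nor below it.
        candidate.relative_to(root)
    except ValueError:
        raise PermissionError("Access denied: path outside workspace") from None
    return candidate
```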

@@ -48,6 +48,13 @@ class MemorySearchTool(BaseTool):
super().__init__()
self.memory_manager = memory_manager
self.user_id = user_id
from config import conf
if conf().get("knowledge", True):
self.description = (
"Search agent's long-term memory and knowledge base using semantic and keyword search. "
"Use this to recall past conversations, preferences, and knowledge pages."
)
def execute(self, args: dict):
"""

@@ -98,7 +98,18 @@ class Send(BaseTool):
"size_formatted": self._format_size(file_size),
"message": message or f"正在发送 {file_name}"
}
try:
from common.cloud_client import get_website_base_url, copy_send_file
# Do nothing when in local env
if get_website_base_url():
url = copy_send_file(absolute_path, self.cwd)
if url:
result["url"] = url
except Exception:
pass
return ToolResult.success(result)
def _resolve_path(self, path: str) -> str:

@@ -84,11 +84,11 @@ class ToolManager:
except ImportError as e:
# Handle missing dependencies with helpful messages
error_msg = str(e)
if "browser-use" in error_msg or "browser_use" in error_msg:
if "playwright" in error_msg:
logger.warning(
f"[ToolManager] Browser tool not loaded - missing dependencies.\n"
f" To enable browser tool, run:\n"
f" pip install browser-use markdownify playwright\n"
f" pip install playwright\n"
f" playwright install chromium"
)
elif "markdownify" in error_msg:
@@ -154,11 +154,11 @@ class ToolManager:
except ImportError as e:
# Handle missing dependencies with helpful messages
error_msg = str(e)
if "browser-use" in error_msg or "browser_use" in error_msg:
if "playwright" in error_msg:
logger.warning(
f"[ToolManager] Browser tool not loaded - missing dependencies.\n"
f" To enable browser tool, run:\n"
f" pip install browser-use markdownify playwright\n"
f" pip install playwright\n"
f" playwright install chromium"
)
elif "markdownify" in error_msg:
@@ -197,7 +197,7 @@ class ToolManager:
logger.warning(
f"[ToolManager] Browser tool is configured but not loaded.\n"
f" To enable browser tool, run:\n"
f" pip install browser-use markdownify playwright\n"
f" pip install playwright\n"
f" playwright install chromium"
)
elif tool_name == "google_search":

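The `ToolManager` hints above are chosen by substring-matching the `ImportError` message. An alternative is to read the standard `ImportError.name` attribute, which names the module that could not be imported. It is sketched below under its own assumptions. `INSTALL_HINTS` and `try_import` are hypothetical. The mapping is keyed by import name, which can differ from the pip package name (e.g. `browser_use` vs `browser-use`).

```python
import importlib
from typing import Dict, Optional

# Hypothetical mapping from top-level import name to an install hint.
INSTALL_HINTS: Dict[str, str] = {
    "playwright": "pip install playwright && playwright install chromium",
    "markdownify": "pip install markdownify",
}


def try_import(module: str) -> Optional[str]:
    """Import module; return None on success or an install hint on failure."""
    try:
        importlib.import_module(module)
        return None
    except ImportError as e:
        # ModuleNotFoundError sets .name to the module that failed; for a
        # dotted import such as "playwright.sync_api" that is the missing
        # top-level package. Fall back to the requested name if unset.
        missing = (e.name or module).split(".")[0]
        hint = INSTALL_HINTS.get(missing, f"pip install {missing}")
        return f"missing dependency '{missing}': {hint}"
```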

@@ -1,22 +1,30 @@
"""
Vision tool - Analyze images using OpenAI-compatible Vision API.
Vision tool - Analyze images using Vision API.
Supports local files (auto base64-encoded) and HTTP URLs.
Providers: OpenAI (preferred) > LinkAI (fallback).
Provider priority (default):
1. Main model via bot.call_vision — zero extra cost
2. Other models whose API key is configured — auto-discovered
3. OpenAI / LinkAI raw HTTP — reliable fallback
When use_linkai=true, LinkAI is promoted to #1.
When tool.vision.model is set, that model is used exclusively first.
"""
import base64
import os
import subprocess
import tempfile
from typing import Any, Dict, Optional, Tuple
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional
import requests
from agent.tools.base_tool import BaseTool, ToolResult
from common import const
from common.log import logger
from config import conf
DEFAULT_MODEL = "gpt-4.1-mini"
DEFAULT_MODEL = const.GPT_41_MINI
DEFAULT_TIMEOUT = 60
MAX_TOKENS = 1000
COMPRESS_THRESHOLD = 1_048_576 # 1 MB
@@ -29,15 +37,46 @@ SUPPORTED_EXTENSIONS = {
"webp": "image/webp",
}
_MAIN_MODEL_PROVIDER_NAME = "MainModel"
# (config_key_for_api_key, bot_type, default_vision_model, provider_display_name)
# Auto-discovered as fallback vision providers when their API key is configured.
# OpenAI and LinkAI are handled separately (raw HTTP providers), so not listed here.
_DISCOVERABLE_MODELS = [
("moonshot_api_key", const.MOONSHOT, const.KIMI_K2_5, "Moonshot"),
("ark_api_key", const.DOUBAO, const.DOUBAO_SEED_2_PRO, "Doubao"),
("dashscope_api_key", const.QWEN_DASHSCOPE, const.QWEN36_PLUS, "DashScope"),
("claude_api_key", const.CLAUDEAPI, const.CLAUDE_4_6_SONNET, "Claude"),
("gemini_api_key", const.GEMINI, const.GEMINI_31_FLASH_LITE_PRE, "Gemini"),
("zhipu_ai_api_key", const.ZHIPU_AI, const.GLM_4_7, "ZhipuAI"),
("minimax_api_key", const.MiniMax, const.MINIMAX_M2_7, "MiniMax"),
]
@dataclass
class VisionProvider:
"""A single Vision API provider configuration."""
name: str
api_key: str
api_base: str
extra_headers: dict = field(default_factory=dict)
model_override: Optional[str] = None
use_bot: bool = False # When True, call via bot.call_vision instead of raw HTTP
fallback_bot: Any = None # Bot instance for non-main-model providers
class VisionAPIError(Exception):
"""Raised when a Vision API call fails and should trigger fallback."""
pass
class Vision(BaseTool):
"""Analyze images using OpenAI-compatible Vision API"""
"""Analyze images using Vision API"""
name: str = "vision"
description: str = (
"Analyze a local image or image URL (jpg/jpeg/png) using Vision API. "
"Can describe content, extract text, identify objects, colors, etc. "
"Uses the main model when it supports vision, otherwise falls back to other configured providers."
)
params: dict = {
@@ -51,13 +90,6 @@ class Vision(BaseTool):
"type": "string",
"description": "Question to ask about the image",
},
"model": {
"type": "string",
"description": (
f"Vision model to use (default: {DEFAULT_MODEL}). "
"Options: gpt-4.1-mini, gpt-4.1, gpt-4o-mini, gpt-4o"
),
},
},
"required": ["image", "question"],
}
@@ -67,29 +99,26 @@ class Vision(BaseTool):
@staticmethod
def is_available() -> bool:
return bool(
conf().get("open_ai_api_key") or os.environ.get("OPENAI_API_KEY")
or conf().get("linkai_api_key") or os.environ.get("LINKAI_API_KEY")
)
return True
def execute(self, args: Dict[str, Any]) -> ToolResult:
image = args.get("image", "").strip()
question = args.get("question", "").strip()
model = args.get("model", DEFAULT_MODEL).strip() or DEFAULT_MODEL
if not image:
return ToolResult.fail("Error: 'image' parameter is required")
if not question:
return ToolResult.fail("Error: 'question' parameter is required")
api_key, api_base, extra_headers = self._resolve_provider()
if not api_key:
providers = self._resolve_providers()
if not providers:
return ToolResult.fail(
"Error: No API key configured for Vision.\n"
"Please configure one of the following using env_config tool:\n"
" 1. OPENAI_API_KEY (preferred): env_config(action=\"set\", key=\"OPENAI_API_KEY\", value=\"your-key\")\n"
" 2. LINKAI_API_KEY (fallback): env_config(action=\"set\", key=\"LINKAI_API_KEY\", value=\"your-key\")\n\n"
"Get your key at: https://platform.openai.com/api-keys or https://link-ai.tech"
"Error: No model available for Vision.\n"
"The main model does not support vision and no other API keys are configured.\n"
"Options:\n"
" 1. Switch to a multimodal model (e.g. qwen3.6-plus, claude-sonnet-4-6, gemini-2.0-flash)\n"
" 2. Configure OPENAI_API_KEY: env_config(action=\"set\", key=\"OPENAI_API_KEY\", value=\"your-key\")\n"
" 3. Configure LINKAI_API_KEY: env_config(action=\"set\", key=\"LINKAI_API_KEY\", value=\"your-key\")"
)
try:
@@ -97,36 +126,221 @@ class Vision(BaseTool):
except Exception as e:
return ToolResult.fail(f"Error: {e}")
return self._call_with_fallback(providers, DEFAULT_MODEL, question, image_content)
def _call_with_fallback(self, providers: List[VisionProvider], model: str,
question: str, image_content: dict) -> ToolResult:
"""Try each provider in order; fall back to the next one on failure."""
errors: List[str] = []
for i, provider in enumerate(providers):
use_model = provider.model_override or model
try:
logger.info(f"[Vision] Trying provider '{provider.name}' "
f"with model '{use_model}' ({i + 1}/{len(providers)})")
if provider.use_bot:
result = self._call_via_bot(use_model, question, image_content, provider)
else:
result = self._call_api(provider, use_model, question, image_content)
logger.info(f"[Vision] ✅ Success via {provider.name} (model={use_model})")
return result
except VisionAPIError as e:
errors.append(f"[{provider.name}/{use_model}] {e}")
logger.warning(f"[Vision] Provider '{provider.name}' failed: {e}")
except requests.Timeout:
errors.append(f"[{provider.name}/{use_model}] Request timed out after {DEFAULT_TIMEOUT}s")
logger.warning(f"[Vision] Provider '{provider.name}' timed out")
except requests.ConnectionError:
errors.append(f"[{provider.name}/{use_model}] Connection failed")
logger.warning(f"[Vision] Provider '{provider.name}' connection failed")
except Exception as e:
errors.append(f"[{provider.name}/{use_model}] {e}")
logger.error(f"[Vision] Provider '{provider.name}' unexpected error: {e}", exc_info=True)
return ToolResult.fail(
"Error: All Vision API providers failed.\n" + "\n".join(f" - {err}" for err in errors)
)
def _resolve_providers(self) -> List[VisionProvider]:
"""
Build an ordered list of available providers.
Priority:
- use_linkai=true → [LinkAI, MainModel, OtherModels…, OpenAI]
- default → [MainModel, OtherModels…, OpenAI, LinkAI]
"OtherModels" are auto-discovered from configured API keys.
The main model's bot_type is excluded from OtherModels to avoid
duplicating the MainModel provider.
"""
use_linkai = conf().get("use_linkai", False) and conf().get("linkai_api_key")
providers: List[VisionProvider] = []
if use_linkai:
self._append_provider(providers, self._build_linkai_provider)
self._append_provider(providers, self._build_main_model_provider)
self._append_other_model_providers(providers)
self._append_provider(providers, self._build_openai_provider)
else:
self._append_provider(providers, self._build_main_model_provider)
self._append_other_model_providers(providers)
self._append_provider(providers, self._build_openai_provider)
self._append_provider(providers, self._build_linkai_provider)
return providers
@staticmethod
def _append_provider(providers: List[VisionProvider], builder) -> None:
p = builder()
if p:
providers.append(p)
def _append_other_model_providers(self, providers: List[VisionProvider]) -> None:
"""
Auto-discover other models whose API key is configured.
Skip the main model's own bot_type (already covered by MainModel provider).
Skip bot_types that already have a provider in the list (e.g. OpenAI).
"""
# Determine main model's bot_type so we can skip it
main_bot_type = None
if self.model and hasattr(self.model, '_resolve_bot_type'):
main_bot_type = self.model._resolve_bot_type(conf().get("model", ""))
existing_names = {p.name for p in providers}
for config_key, bot_type, default_model, display_name in _DISCOVERABLE_MODELS:
if display_name in existing_names:
continue
if bot_type == main_bot_type:
continue
api_key = conf().get(config_key, "")
if not api_key or not api_key.strip():
continue
# Create a bot instance and check if it supports call_vision
try:
from models.bot_factory import create_bot
bot = create_bot(bot_type)
if not hasattr(bot, 'call_vision'):
continue
except Exception:
continue
providers.append(VisionProvider(
name=display_name,
api_key="",
api_base="",
model_override=default_model,
use_bot=True,
fallback_bot=bot,
))
def _resolve_vision_model(self) -> Optional[str]:
"""
Determine which model to use for vision.
1. User explicit config: tool.vision.model in config.json
2. Fallback to the main configured model name
"""
tool_conf = conf().get("tool", {})
user_vision_model = tool_conf.get("vision", {}).get("model") if isinstance(tool_conf, dict) else None
if user_vision_model:
return user_vision_model
model_name = conf().get("model", "")
return model_name or None
def _build_main_model_provider(self) -> Optional[VisionProvider]:
"""
Use the vendor's own model for vision via bot.call_vision.
Only available when the bot class has call_vision.
"""
if not (self.model and hasattr(self.model, 'bot')):
return None
try:
return self._call_api(api_key, api_base, model, question, image_content, extra_headers)
except requests.Timeout:
return ToolResult.fail(f"Error: Vision API request timed out after {DEFAULT_TIMEOUT}s")
except requests.ConnectionError:
return ToolResult.fail("Error: Failed to connect to Vision API")
except Exception as e:
logger.error(f"[Vision] Unexpected error: {e}", exc_info=True)
return ToolResult.fail(f"Error: Vision API call failed - {e}")
bot = self.model.bot
if not hasattr(bot, 'call_vision'):
return None
except Exception:
return None
def _resolve_provider(self) -> Tuple[Optional[str], str, dict]:
"""Resolve API key, base URL and extra headers. Priority: conf() > env vars."""
vision_model = self._resolve_vision_model()
return VisionProvider(
name=_MAIN_MODEL_PROVIDER_NAME,
api_key="",
api_base="",
model_override=vision_model,
use_bot=True,
)
def _build_openai_provider(self) -> Optional[VisionProvider]:
api_key = conf().get("open_ai_api_key") or os.environ.get("OPENAI_API_KEY")
if api_key:
api_base = (conf().get("open_ai_api_base") or os.environ.get("OPENAI_API_BASE", "")).rstrip("/") \
or "https://api.openai.com/v1"
return api_key, self._ensure_v1(api_base), {}
if not api_key:
return None
api_base = (conf().get("open_ai_api_base") or os.environ.get("OPENAI_API_BASE", "")).rstrip("/") \
or "https://api.openai.com/v1"
return VisionProvider(name="OpenAI", api_key=api_key, api_base=self._ensure_v1(api_base))
def _build_linkai_provider(self) -> Optional[VisionProvider]:
api_key = conf().get("linkai_api_key") or os.environ.get("LINKAI_API_KEY")
if api_key:
api_base = (conf().get("linkai_api_base") or os.environ.get("LINKAI_API_BASE", "")).rstrip("/") \
or "https://api.link-ai.tech"
logger.debug("[Vision] Using LinkAI API (OPENAI_API_KEY not set)")
from common.utils import get_cloud_headers
extra = get_cloud_headers(api_key)
extra.pop("Authorization", None)
extra.pop("Content-Type", None)
return api_key, self._ensure_v1(api_base), extra
if not api_key:
return None
api_base = (conf().get("linkai_api_base") or os.environ.get("LINKAI_API_BASE", "")).rstrip("/") \
or "https://api.link-ai.tech"
from common.utils import get_cloud_headers
extra = get_cloud_headers(api_key)
extra.pop("Authorization", None)
extra.pop("Content-Type", None)
return VisionProvider(name="LinkAI", api_key=api_key, api_base=self._ensure_v1(api_base),
extra_headers=extra)
return None, "", {}
def _call_via_bot(self, model: str, question: str, image_content: dict,
provider: Optional[VisionProvider] = None) -> ToolResult:
"""
Call a model's call_vision with vendor-native API format.
Uses provider.fallback_bot if set, otherwise the main model's bot.
Raises VisionAPIError on failure so fallback can proceed.
"""
try:
bot = (provider and provider.fallback_bot) or self.model.bot
except Exception as e:
raise VisionAPIError(f"Cannot access bot: {e}")
# Extract the raw image URL from the OpenAI-format image_content block
image_url = image_content.get("image_url", {}).get("url", "")
if not image_url:
raise VisionAPIError("No image URL in content block")
try:
response = bot.call_vision(
image_url=image_url,
question=question,
model=model,
max_tokens=MAX_TOKENS,
)
except Exception as e:
raise VisionAPIError(f"call_vision failed: {e}")
if response is NotImplemented:
raise VisionAPIError("Bot does not support vision")
if isinstance(response, dict) and response.get("error"):
raise VisionAPIError(f"API error - {response.get('message', 'Unknown')}")
content = response.get("content", "") if isinstance(response, dict) else ""
if not content:
raise VisionAPIError("Empty response from model")
usage_info = response.get("usage", {}) if isinstance(response, dict) else {}
# Use the actual model name from the bot response if available
actual_model = response.get("model", model) if isinstance(response, dict) else model
provider_name = provider.name if provider else _MAIN_MODEL_PROVIDER_NAME
return ToolResult.success({
"model": actual_model,
"provider": provider_name,
"content": content,
"usage": usage_info,
})
@staticmethod
def _ensure_v1(api_base: str) -> str:
@@ -139,9 +353,13 @@ class Vision(BaseTool):
return api_base.rstrip("/") + "/v1"
def _build_image_content(self, image: str) -> dict:
"""Build the image_url content block for the API request."""
"""
Build the image_url content block.
Both remote URLs and local files are converted to base64 data URLs
so every bot backend can consume them without extra downloads.
"""
if image.startswith(("http://", "https://")):
return {"type": "image_url", "image_url": {"url": image}}
return self._download_to_data_url(image)
if not os.path.isfile(image):
raise FileNotFoundError(f"Image file not found: {image}")
@@ -165,9 +383,22 @@ class Vision(BaseTool):
data_url = f"data:{mime_type};base64,{b64}"
return {"type": "image_url", "image_url": {"url": data_url}}
@staticmethod
def _download_to_data_url(url: str) -> dict:
"""Download a remote image and return it as a base64 data URL."""
resp = requests.get(url, timeout=30)
if resp.status_code != 200:
raise VisionAPIError(f"Failed to download image: HTTP {resp.status_code}")
content_type = resp.headers.get("Content-Type", "image/jpeg").split(";")[0].strip()
if not content_type.startswith("image/"):
content_type = "image/jpeg"
b64 = base64.b64encode(resp.content).decode("ascii")
data_url = f"data:{content_type};base64,{b64}"
return {"type": "image_url", "image_url": {"url": data_url}}
@staticmethod
def _maybe_compress(path: str) -> str:
"""Compress image if larger than threshold; return path to use."""
"""Compress image to under COMPRESS_THRESHOLD with max long-edge 1536px."""
file_size = os.path.getsize(path)
if file_size <= COMPRESS_THRESHOLD:
return path
@@ -175,33 +406,58 @@ class Vision(BaseTool):
tmp = tempfile.NamedTemporaryFile(suffix=".jpg", delete=False)
tmp.close()
try:
# macOS: use sips
subprocess.run(
["sips", "-Z", "800", path, "--out", tmp.name],
capture_output=True, check=True,
)
logger.debug(f"[Vision] Compressed image ({file_size // 1024}KB -> {os.path.getsize(tmp.name) // 1024}KB)")
return tmp.name
except (FileNotFoundError, subprocess.CalledProcessError):
pass
def _try_sips(max_dim: str, quality: str) -> bool:
try:
subprocess.run(
["sips", "-Z", max_dim, "-s", "formatOptions", quality,
path, "--out", tmp.name],
capture_output=True, check=True,
)
return True
except (FileNotFoundError, subprocess.CalledProcessError):
return False
try:
# Linux: use ImageMagick convert
subprocess.run(
["convert", path, "-resize", "800x800>", tmp.name],
capture_output=True, check=True,
)
logger.debug(f"[Vision] Compressed image ({file_size // 1024}KB -> {os.path.getsize(tmp.name) // 1024}KB)")
def _try_convert(max_dim: str, quality: str) -> bool:
try:
subprocess.run(
["convert", path, "-resize", f"{max_dim}x{max_dim}>",
"-quality", quality, tmp.name],
capture_output=True, check=True,
)
return True
except (FileNotFoundError, subprocess.CalledProcessError):
return False
attempts = [
("1536", "85"),
("1536", "70"),
("1536", "50"),
]
for max_dim, quality in attempts:
ok = _try_sips(max_dim, quality) or _try_convert(max_dim, quality)
if not ok:
continue
new_size = os.path.getsize(tmp.name)
logger.debug(f"[Vision] Compressed image "
f"({file_size // 1024}KB -> {new_size // 1024}KB, "
f"max_dim={max_dim}, q={quality})")
if new_size <= COMPRESS_THRESHOLD:
return tmp.name
if os.path.exists(tmp.name) and os.path.getsize(tmp.name) > 0:
return tmp.name
except (FileNotFoundError, subprocess.CalledProcessError):
pass
os.remove(tmp.name)
return path
def _call_api(self, api_key: str, api_base: str, model: str,
question: str, image_content: dict, extra_headers: dict = None) -> ToolResult:
def _call_api(self, provider: VisionProvider, model: str,
question: str, image_content: dict) -> ToolResult:
"""
Call a single provider's Vision API.
Raises VisionAPIError on recoverable failures so the caller can try
the next provider.
"""
payload = {
"model": model,
"messages": [
@@ -213,34 +469,29 @@ class Vision(BaseTool):
],
}
],
"max_tokens": MAX_TOKENS,
}
headers = {
"Authorization": f"Bearer {api_key}",
"Authorization": f"Bearer {provider.api_key}",
"Content-Type": "application/json",
**(extra_headers or {}),
**provider.extra_headers,
}
resp = requests.post(
f"{api_base}/chat/completions",
f"{provider.api_base}/chat/completions",
headers=headers,
json=payload,
timeout=DEFAULT_TIMEOUT,
)
if resp.status_code == 401:
return ToolResult.fail("Error: Invalid API key. Please check your configuration.")
if resp.status_code == 429:
return ToolResult.fail("Error: API rate limit reached. Please try again later.")
if resp.status_code != 200:
return ToolResult.fail(f"Error: Vision API returned HTTP {resp.status_code}: {resp.text[:200]}")
raise VisionAPIError(f"HTTP {resp.status_code}: {resp.text[:200]}")
data = resp.json()
if "error" in data:
msg = data["error"].get("message", "Unknown API error")
return ToolResult.fail(f"Error: Vision API error - {msg}")
raise VisionAPIError(f"API error - {msg}")
content = ""
choices = data.get("choices", [])
@@ -250,6 +501,7 @@ class Vision(BaseTool):
usage = data.get("usage", {})
result = {
"model": model,
"provider": provider.name,
"content": content,
"usage": {
"prompt_tokens": usage.get("prompt_tokens", 0),

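`_resolve_providers` and `_call_with_fallback` together reduce to a small pattern. Build an ordered list, skip providers that are not configured, and return the first success while collecting each failure. The sketch below is a simplified, hypothetical model of that flow, not the tool's implementation. It uses plain callables instead of bots or HTTP calls, and `Provider`, `order_providers` and `call_with_fallback` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Provider:
    name: str
    call: Callable[[str], str]  # hypothetical: question -> answer text


def order_providers(main: Optional[Provider], others: Sequence[Provider],
                    openai: Optional[Provider], linkai: Optional[Provider],
                    use_linkai: bool) -> List[Provider]:
    """Mirror the documented priority; None marks an unconfigured provider."""
    if use_linkai:
        ordered = [linkai, main, *others, openai]
    else:
        ordered = [main, *others, openai, linkai]
    return [p for p in ordered if p is not None]


def call_with_fallback(providers: Sequence[Provider], question: str) -> Tuple[str, str]:
    """Return (provider_name, answer) from the first provider that succeeds."""
    errors: List[str] = []
    for p in providers:
        try:
            return p.name, p.call(question)
        except Exception as e:  # any failure moves on to the next provider
            errors.append(f"[{p.name}] {e}")
    raise RuntimeError("All providers failed:\n" + "\n".join(errors))
```

The loop catches exceptions broadly, matching the original's catch-all `except Exception`. A failing provider, whether from an HTTP error, a timeout or an unsupported model, falls through to the next one instead of aborting the tool call.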

@@ -67,14 +67,14 @@ class AgentLLMModel(LLMModel):
_MODEL_BOT_TYPE_MAP = {
"wenxin": const.BAIDU, "wenxin-4": const.BAIDU,
"xunfei": const.XUNFEI, const.QWEN: const.QWEN,
"xunfei": const.XUNFEI, const.QWEN: const.QWEN_DASHSCOPE,
const.MODELSCOPE: const.MODELSCOPE,
}
_MODEL_PREFIX_MAP = [
("qwen", const.QWEN_DASHSCOPE), ("qwq", const.QWEN_DASHSCOPE), ("qvq", const.QWEN_DASHSCOPE),
("gemini", const.GEMINI), ("glm", const.ZHIPU_AI), ("claude", const.CLAUDEAPI),
("moonshot", const.MOONSHOT), ("kimi", const.MOONSHOT),
("doubao", const.DOUBAO),
("doubao", const.DOUBAO), ("deepseek", const.DEEPSEEK),
]
def __init__(self, bridge: Bridge, bot_type: str = "chat"):
@@ -115,8 +115,8 @@ class AgentLLMModel(LLMModel):
return const.QWEN_DASHSCOPE
if model_name in [const.MOONSHOT, "moonshot-v1-8k", "moonshot-v1-32k", "moonshot-v1-128k"]:
return const.MOONSHOT
if model_name in [const.DEEPSEEK_CHAT, const.DEEPSEEK_REASONER]:
return const.OPENAI
if conf().get("bot_type") == "modelscope":
return const.MODELSCOPE
for prefix, btype in self._MODEL_PREFIX_MAP:
if model_name.startswith(prefix):
return btype
@@ -124,14 +124,15 @@ class AgentLLMModel(LLMModel):
@property
def bot(self):
"""Lazy load the bot, re-create when model changes"""
"""Lazy load the bot, re-create when model or bot_type changes"""
from models.bot_factory import create_bot
cur_model = self.model
if self._bot is None or self._bot_model != cur_model:
bot_type = self._resolve_bot_type(cur_model)
self._bot = create_bot(bot_type)
cur_bot_type = self._resolve_bot_type(cur_model)
if self._bot is None or self._bot_model != cur_model or getattr(self, '_bot_type', None) != cur_bot_type:
self._bot = create_bot(cur_bot_type)
self._bot = add_openai_compatible_support(self._bot)
self._bot_model = cur_model
self._bot_type = cur_bot_type
return self._bot
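The `bot` property now rebuilds the cached bot when either the model name or the resolved bot type changes. As a result, switching `bot_type` in config without changing `model` still takes effect.

Below is a minimal sketch of the same invalidation rule. `LazyBot`, `make_bot` and the resolver lambda are hypothetical stand-ins for `AgentLLMModel`, `create_bot` and `_resolve_bot_type`.

```python
from typing import Any, Callable, List, Optional


class LazyBot:
    """Cache one bot instance and rebuild it when (model, bot_type) changes."""

    def __init__(self, get_model: Callable[[], str],
                 resolve: Callable[[str], str],
                 factory: Callable[[str], Any]):
        self._get_model = get_model
        self._resolve = resolve
        self._factory = factory
        self._bot: Optional[Any] = None
        self._bot_model: Optional[str] = None
        self._bot_type: Optional[str] = None

    @property
    def bot(self) -> Any:
        model = self._get_model()
        bot_type = self._resolve(model)
        # Rebuild when nothing is cached yet or either half of the cache key changed.
        if self._bot is None or self._bot_model != model or self._bot_type != bot_type:
            self._bot = self._factory(bot_type)
            self._bot_model = model
            self._bot_type = bot_type
        return self._bot


created: List[str] = []


def make_bot(bot_type: str) -> str:
    created.append(bot_type)
    return f"bot<{bot_type}>"


config = {"model": "qwen3-max", "bot_type": "dashscope"}
lazy = LazyBot(lambda: config["model"], lambda model: config["bot_type"], make_bot)
lazy.bot
lazy.bot                       # cached: the factory is not called again
config["bot_type"] = "linkai"  # same model, different provider
lazy.bot                       # rebuilt because the bot type changed
print(created)  # ['dashscope', 'linkai']
```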
def call(self, request: LLMRequest):
@@ -273,10 +274,13 @@ class AgentBridge:
tool_manager.load_tools()
tools = []
workspace_dir = kwargs.get("workspace_dir")
for tool_name in tool_manager.tool_classes.keys():
try:
tool = tool_manager.create_tool(tool_name)
if tool:
if workspace_dir and hasattr(tool, 'cwd'):
tool.cwd = workspace_dir
tools.append(tool)
except Exception as e:
logger.warning(f"[AgentBridge] Failed to load tool {tool_name}: {e}")
@@ -495,22 +499,26 @@ class AgentBridge:
reply.text_content = text_response
return reply
# For other unknown file types, return text with file info
message = text_response or file_info.get("message", "文件已准备")
message += f"\n\n[文件: {file_info.get('file_name', file_path)}]"
return Reply(ReplyType.TEXT, message)
# For all other file types (tar.gz, zip, etc.), also use FILE type
file_url = f"file://{file_path}"
logger.info(f"[AgentBridge] Sending generic file: {file_url}")
reply = Reply(ReplyType.FILE, file_url)
reply.file_name = file_info.get("file_name", os.path.basename(file_path))
if text_response:
reply.text_content = text_response
return reply
def _migrate_config_to_env(self, workspace_root: str):
"""
Migrate API keys from config.json to .env file if not already set
Sync API keys from config.json to .env file.
Adds new keys and updates changed values on each startup.
Args:
workspace_root: Workspace directory path (not used, kept for compatibility)
"""
from config import conf
import os
# Mapping from config.json keys to environment variable names
key_mapping = {
"open_ai_api_key": "OPENAI_API_KEY",
"open_ai_api_base": "OPENAI_API_BASE",
@@ -519,10 +527,9 @@ class AgentBridge:
"linkai_api_key": "LINKAI_API_KEY",
}
# Use fixed secure location for .env file
env_file = expand_path("~/.cow/.env")
# Read existing env vars from .env file
# Read existing env vars (key -> value)
existing_env_vars = {}
if os.path.exists(env_file):
try:
@@ -530,48 +537,46 @@ class AgentBridge:
for line in f:
line = line.strip()
if line and not line.startswith('#') and '=' in line:
key, _ = line.split('=', 1)
existing_env_vars[key.strip()] = True
key, val = line.split('=', 1)
existing_env_vars[key.strip()] = val.strip()
except Exception as e:
logger.warning(f"[AgentBridge] Failed to read .env file: {e}")
# Check which keys need to be migrated
keys_to_migrate = {}
# Sync config.json values into .env (add/update/remove)
updated = False
for config_key, env_key in key_mapping.items():
# Skip if already in .env file
if env_key in existing_env_vars:
continue
# Get value from config.json
value = conf().get(config_key, "")
if value and value.strip(): # Only migrate non-empty values
keys_to_migrate[env_key] = value.strip()
# Log summary if there are keys to skip
if existing_env_vars:
logger.debug(f"[AgentBridge] {len(existing_env_vars)} env vars already in .env")
# Write new keys to .env file
if keys_to_migrate:
raw = conf().get(config_key, "")
value = raw.strip() if raw else ""
old_value = existing_env_vars.get(env_key)
if value:
if old_value == value:
continue
existing_env_vars[env_key] = value
os.environ[env_key] = value
updated = True
else:
if old_value is None:
continue
existing_env_vars.pop(env_key, None)
os.environ.pop(env_key, None)
updated = True
if updated:
try:
# Ensure ~/.cow directory and .env file exist
env_dir = os.path.dirname(env_file)
if not os.path.exists(env_dir):
os.makedirs(env_dir, exist_ok=True)
if not os.path.exists(env_file):
open(env_file, 'a').close()
# Append new keys
with open(env_file, 'a', encoding='utf-8') as f:
f.write('\n# Auto-migrated from config.json\n')
for key, value in keys_to_migrate.items():
os.makedirs(env_dir, exist_ok=True)
with open(env_file, 'w', encoding='utf-8') as f:
f.write('# Environment variables for agent\n')
f.write('# Auto-managed - synced from config.json on startup\n\n')
for key, value in sorted(existing_env_vars.items()):
f.write(f'{key}={value}\n')
# Also set in current process
os.environ[key] = value
logger.info(f"[AgentBridge] Migrated {len(keys_to_migrate)} API keys from config.json to .env: {list(keys_to_migrate.keys())}")
logger.info(f"[AgentBridge] Synced API keys from config.json to .env")
except Exception as e:
logger.warning(f"[AgentBridge] Failed to migrate API keys: {e}")
logger.warning(f"[AgentBridge] Failed to sync API keys: {e}")
def _persist_messages(
self, session_id: str, new_messages: list, channel_type: str = ""

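`_migrate_config_to_env` (and the `AgentInitializer` copy that follows) switch from append-only migration to a full sync:

1. The `.env` file is parsed into a key/value dict.
2. Non-empty config values add or overwrite entries.
3. Empty config values remove the mapped key.
4. The file is rewritten in sorted order only when something changed.

Below is a pure-function sketch of that merge, kept separate from file I/O and `os.environ` updates. `parse_env`, `sync_env` and `render_env` are hypothetical names.

```python
from typing import Dict, Mapping, Tuple


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, val = line.split("=", 1)
            env[key.strip()] = val.strip()
    return env


def sync_env(existing: Mapping[str, str], config: Mapping[str, str],
             key_mapping: Mapping[str, str]) -> Tuple[Dict[str, str], bool]:
    """Return (new_env, changed): add/update non-empty config values, drop emptied ones."""
    env = dict(existing)
    changed = False
    for config_key, env_key in key_mapping.items():
        value = (config.get(config_key) or "").strip()
        if value:
            if env.get(env_key) != value:
                env[env_key] = value
                changed = True
        elif env_key in env:
            # Empty in config: remove the mapped key from .env.
            del env[env_key]
            changed = True
    return env, changed


def render_env(env: Mapping[str, str]) -> str:
    """Rewrite the whole file: fixed header, then keys in sorted order."""
    header = ("# Environment variables for agent\n"
              "# Auto-managed - synced from config.json on startup\n\n")
    return header + "".join(f"{k}={v}\n" for k, v in sorted(env.items()))


existing = parse_env("# comment\nOPENAI_API_KEY=old\nOTHER=1\n")
config = {"open_ai_api_key": "new", "linkai_api_key": ""}
mapping = {"open_ai_api_key": "OPENAI_API_KEY", "linkai_api_key": "LINKAI_API_KEY"}
env, changed = sync_env(existing, config, mapping)
print(env, changed)  # {'OPENAI_API_KEY': 'new', 'OTHER': '1'} True
```

As written in the diff, an empty value in `config.json` also removes a mapped key that a user added to `.env` by hand. Unmapped keys such as `OTHER` are preserved.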
@@ -26,8 +26,7 @@ class AgentEventHandler:
if context:
self.channel = context.kwargs.get("channel") if hasattr(context, "kwargs") else None
# Track current thinking for channel output
self.current_thinking = ""
self.current_content = ""
self.turn_number = 0
def handle_event(self, event):
@@ -47,6 +46,8 @@ class AgentEventHandler:
self._handle_message_update(data)
elif event_type == "message_end":
self._handle_message_end(data)
elif event_type == "reasoning_update":
pass
elif event_type == "tool_execution_start":
self._handle_tool_execution_start(data)
elif event_type == "tool_execution_end":
@@ -59,30 +60,26 @@ class AgentEventHandler:
def _handle_turn_start(self, data):
"""Handle turn start event"""
self.turn_number = data.get("turn", 0)
self.has_tool_calls_in_turn = False
self.current_thinking = ""
self.current_content = ""
def _handle_message_update(self, data):
"""Handle message update event (streaming text)"""
"""Handle message update event (streaming content text)"""
delta = data.get("delta", "")
self.current_thinking += delta
self.current_content += delta
def _handle_message_end(self, data):
"""Handle message end event"""
tool_calls = data.get("tool_calls", [])
# Only send thinking process if followed by tool calls
if tool_calls:
if self.current_thinking.strip():
logger.info(f"💭 {self.current_thinking.strip()[:200]}{'...' if len(self.current_thinking) > 200 else ''}")
# Send thinking process to channel
self._send_to_channel(f"{self.current_thinking.strip()}")
if self.current_content.strip():
logger.info(f"💭 {self.current_content.strip()[:200]}{'...' if len(self.current_content) > 200 else ''}")
self._send_to_channel(self.current_content.strip())
else:
# No tool calls = final response (logged at agent_stream level)
if self.current_thinking.strip():
logger.debug(f"💬 {self.current_thinking.strip()[:200]}{'...' if len(self.current_thinking) > 200 else ''}")
if self.current_content.strip():
logger.debug(f"💬 {self.current_content.strip()[:200]}{'...' if len(self.current_content) > 200 else ''}")
self.current_thinking = ""
self.current_content = ""
def _handle_tool_execution_start(self, data):
"""Handle tool execution start event - logged by agent_stream.py"""

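`AgentEventHandler` now buffers streamed `message_update` deltas in `current_content`. At `message_end`, it forwards that text to the channel only when the message also carries tool calls; the final answer is delivered elsewhere. `reasoning_update` events are deliberately ignored here.

Below is a minimal sketch of that buffering rule. `ContentBuffer` and the `send` callback are hypothetical.

```python
from typing import Callable, List


class ContentBuffer:
    """Accumulate streamed deltas and flush them to `send` only for messages with tool calls."""

    def __init__(self, send: Callable[[str], None]):
        self.send = send
        self.current_content = ""

    def handle_event(self, event: dict) -> None:
        event_type = event.get("type")
        data = event.get("data", {})
        if event_type == "message_update":
            self.current_content += data.get("delta", "")
        elif event_type == "message_end":
            text = self.current_content.strip()
            if data.get("tool_calls") and text:
                # Intermediate text that precedes tool calls is shown to the user.
                self.send(text)
            self.current_content = ""
        # "reasoning_update" and other event types are intentionally ignored.


sent: List[str] = []
buf = ContentBuffer(sent.append)
for ev in [
    {"type": "message_update", "data": {"delta": "Let me check "}},
    {"type": "message_update", "data": {"delta": "the file."}},
    {"type": "message_end", "data": {"tool_calls": [{"name": "read"}]}},
    {"type": "message_update", "data": {"delta": "Done."}},
    {"type": "message_end", "data": {"tool_calls": []}},
]:
    buf.handle_event(ev)
print(sent)  # ['Let me check the file.']
```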
@@ -366,7 +366,7 @@ class AgentInitializer:
if tool:
# Apply workspace config to file operation tools
if tool_name in ['read', 'write', 'edit', 'bash', 'grep', 'find', 'ls', 'web_fetch']:
if tool_name in ['read', 'write', 'edit', 'bash', 'grep', 'find', 'ls', 'web_fetch', 'send', 'browser']:
tool.config = file_config
tool.cwd = file_config.get("cwd", getattr(tool, 'cwd', None))
if 'memory_manager' in file_config:
@@ -465,8 +465,12 @@ class AgentInitializer:
'timezone': timezone_name
}
def get_model():
"""Get current model name dynamically from config"""
return conf().get("model", "unknown")
return {
"model": conf().get("model", "unknown"),
"_get_model": get_model,
"workspace": workspace_root,
"channel": ", ".join(conf().get("channel_type")) if isinstance(conf().get("channel_type"), list) else conf().get("channel_type", "unknown"),
"_get_current_time": get_current_time # Dynamic time function
@@ -486,7 +490,7 @@ class AgentInitializer:
env_file = expand_path("~/.cow/.env")
# Read existing env vars
# Read existing env vars (key -> value)
existing_env_vars = {}
if os.path.exists(env_file):
try:
@@ -494,38 +498,46 @@ class AgentInitializer:
for line in f:
line = line.strip()
if line and not line.startswith('#') and '=' in line:
key, _ = line.split('=', 1)
existing_env_vars[key.strip()] = True
key, val = line.split('=', 1)
existing_env_vars[key.strip()] = val.strip()
except Exception as e:
logger.warning(f"[AgentInitializer] Failed to read .env file: {e}")
# Check which keys need migration
keys_to_migrate = {}
# Sync config.json values into .env (add/update/remove)
updated = False
for config_key, env_key in key_mapping.items():
if env_key in existing_env_vars:
continue
value = conf().get(config_key, "")
if value and value.strip():
keys_to_migrate[env_key] = value.strip()
# Write new keys
if keys_to_migrate:
raw = conf().get(config_key, "")
value = raw.strip() if raw else ""
old_value = existing_env_vars.get(env_key)
if value:
if old_value == value:
continue
existing_env_vars[env_key] = value
os.environ[env_key] = value
updated = True
else:
if old_value is None:
continue
existing_env_vars.pop(env_key, None)
os.environ.pop(env_key, None)
updated = True
if updated:
try:
env_dir = os.path.dirname(env_file)
if not os.path.exists(env_dir):
os.makedirs(env_dir, exist_ok=True)
if not os.path.exists(env_file):
open(env_file, 'a').close()
with open(env_file, 'a', encoding='utf-8') as f:
f.write('\n# Auto-migrated from config.json\n')
for key, value in keys_to_migrate.items():
os.makedirs(env_dir, exist_ok=True)
# Rewrite the entire .env file to ensure consistency
with open(env_file, 'w', encoding='utf-8') as f:
f.write('# Environment variables for agent\n')
f.write('# Auto-managed - synced from config.json on startup\n\n')
for key, value in sorted(existing_env_vars.items()):
f.write(f'{key}={value}\n')
os.environ[key] = value
logger.info(f"[AgentInitializer] Migrated {len(keys_to_migrate)} API keys to .env: {list(keys_to_migrate.keys())}")
logger.info(f"[AgentInitializer] Synced API keys from config.json to .env")
except Exception as e:
logger.warning(f"[AgentInitializer] Failed to migrate API keys: {e}")
logger.warning(f"[AgentInitializer] Failed to sync API keys: {e}")
def _start_daily_flush_timer(self):
"""Start a background thread that flushes all agents' memory daily at 23:55."""

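The runtime-info dict now carries both a `model` snapshot and a `_get_model` callable, mirroring the existing `_get_current_time`. A consumer can therefore read the model when the prompt is built, not only the value captured at startup.

Below is a sketch of how a consumer might prefer the callable. `current_model` is a hypothetical helper, and the plain `config` dict stands in for `conf()`.

```python
from typing import Any, Dict

config: Dict[str, Any] = {"model": "qwen3-max"}


def build_runtime_info() -> Dict[str, Any]:
    def get_model() -> str:
        # Re-read config on every call instead of capturing the value once.
        return config.get("model", "unknown")

    return {"model": config.get("model", "unknown"), "_get_model": get_model}


def current_model(info: Dict[str, Any]) -> str:
    """Prefer the dynamic getter and fall back to the static snapshot."""
    getter = info.get("_get_model")
    return getter() if callable(getter) else info.get("model", "unknown")


info = build_runtime_info()
config["model"] = "deepseek-chat"  # model switched after startup
print(info["model"], current_model(info))  # qwen3-max deepseek-chat
```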
@@ -39,11 +39,8 @@ class Bridge(object):
self.btype["chat"] = const.BAIDU
if model_type in ["xunfei"]:
self.btype["chat"] = const.XUNFEI
if model_type in [const.QWEN]:
self.btype["chat"] = const.QWEN
if model_type in [const.QWEN_TURBO, const.QWEN_PLUS, const.QWEN_MAX]:
if model_type in [const.QWEN, const.QWEN_TURBO, const.QWEN_PLUS, const.QWEN_MAX]:
self.btype["chat"] = const.QWEN_DASHSCOPE
# Support Qwen3 and other DashScope models
if model_type and (model_type.startswith("qwen") or model_type.startswith("qwq") or model_type.startswith("qvq")):
self.btype["chat"] = const.QWEN_DASHSCOPE
if model_type and model_type.startswith("gemini"):
@@ -61,6 +58,9 @@ class Bridge(object):
if model_type and model_type.startswith("doubao"):
self.btype["chat"] = const.DOUBAO
if model_type and model_type.startswith("deepseek"):
self.btype["chat"] = const.DEEPSEEK
if model_type in [const.MODELSCOPE]:
self.btype["chat"] = const.MODELSCOPE

@@ -347,38 +347,30 @@ class ChatChannel(Channel):
if media_items:
logger.info(f"[chat_channel] Extracted {len(media_items)} media item(s) from reply")
# Send the text first (keeping the original text unchanged)
# Send text first (the frontend will embed video players via renderMarkdown).
logger.info(f"[chat_channel] Sending text content before media: {reply.content[:100]}...")
self._send(reply, context)
logger.info(f"[chat_channel] Text sent, now sending {len(media_items)} media item(s)")
# Then send the media files one by one
for i, (url, media_type) in enumerate(media_items):
try:
# Determine whether it is a local file or a URL
# Determine whether it is a remote URL or a local file.
if url.startswith(('http://', 'https://')):
# Remote resource
if media_type == 'video':
# Send videos as the FILE type
media_reply = Reply(ReplyType.FILE, url)
media_reply.file_name = os.path.basename(url)
else:
# Images use the IMAGE_URL type
media_reply = Reply(ReplyType.IMAGE_URL, url)
elif os.path.exists(url):
# Local file
if media_type == 'video':
# Videos use the FILE type, converted to a file:// URL
media_reply = Reply(ReplyType.FILE, f"file://{url}")
media_reply.file_name = os.path.basename(url)
else:
# Images use the IMAGE_URL type, converted to a file:// URL
media_reply = Reply(ReplyType.IMAGE_URL, f"file://{url}")
else:
logger.warning(f"[chat_channel] Media file not found or invalid URL: {url}")
continue
# Send the media file (with a short delay to avoid rate limits)
if i > 0:
time.sleep(0.5)
self._send(media_reply, context)

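For each extracted media item, `ChatChannel` picks the reply type from two facts: whether the reference is an `http(s)` URL or an existing local path, and whether it is a video. The rules are:

- Videos go out as `FILE` with a `file_name`.
- Images go out as `IMAGE_URL`.
- Local paths get a `file://` prefix.
- Anything else is skipped.

Below is a pure-function sketch of that decision table. String reply types stand in for the `ReplyType` enum, and `classify_media` is hypothetical. The 0.5 s delay between items is not modeled.

```python
import os
from typing import Callable, Optional, Tuple


def classify_media(url: str, media_type: str,
                   exists: Callable[[str], bool] = os.path.exists
                   ) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return (reply_type, content, file_name), or None if the reference is unusable."""
    if url.startswith(("http://", "https://")):
        content = url
    elif exists(url):
        content = f"file://{url}"
    else:
        return None  # The diff logs a warning and skips the item.
    if media_type == "video":
        return "FILE", content, os.path.basename(url)
    return "IMAGE_URL", content, None


print(classify_media("https://example.com/a.mp4", "video"))
# ('FILE', 'https://example.com/a.mp4', 'a.mp4')
```

Injecting `exists` keeps the function testable without touching the filesystem.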
@@ -110,6 +110,11 @@
<i class="fas fa-brain item-icon text-xs w-5 text-center"></i>
<span data-i18n="menu_memory">Memory</span>
</a>
<a class="sidebar-item flex items-center gap-3 px-3 py-2 rounded-lg cursor-pointer transition-all duration-150 hover:bg-white/5 hover:text-neutral-200 text-[14px]"
data-view="knowledge">
<i class="fas fa-book item-icon text-xs w-5 text-center"></i>
<span data-i18n="menu_knowledge">Knowledge</span>
</a>
<a class="sidebar-item flex items-center gap-3 px-3 py-2 rounded-lg cursor-pointer transition-all duration-150 hover:bg-white/5 hover:text-neutral-200 text-[14px]"
data-view="channels">
<i class="fas fa-tower-broadcast item-icon text-xs w-5 text-center"></i>
@@ -166,8 +171,8 @@
<i class="fas fa-bars text-slate-600 dark:text-slate-300"></i>
</button>
<!-- Breadcrumb -->
<div class="flex items-center gap-2 text-sm min-w-0">
<!-- Breadcrumb (hidden on mobile) -->
<div class="hidden lg:flex items-center gap-2 text-sm min-w-0">
<span id="breadcrumb-group" class="text-slate-400 dark:text-slate-500 truncate" data-i18n="nav_chat">Chat</span>
<i class="fas fa-chevron-right text-[10px] text-slate-300 dark:text-slate-600"></i>
<span id="breadcrumb-page" class="font-medium text-slate-700 dark:text-slate-200 truncate" data-i18n="menu_chat">Chat</span>
@@ -270,7 +275,7 @@
<div class="max-w-3xl mx-auto">
<!-- Attachment preview bar -->
<div id="attachment-preview" class="attachment-preview hidden"></div>
<div class="flex items-center gap-2">
<div class="flex items-center gap-2 relative">
<div class="flex items-center flex-shrink-0">
<button id="new-chat-btn" class="w-9 h-10 flex items-center justify-center rounded-lg
text-slate-400 hover:text-primary-500 hover:bg-primary-50 dark:hover:bg-primary-900/20
@@ -287,6 +292,7 @@
</div>
<input type="file" id="file-input" class="hidden" multiple
accept="image/*,.pdf,.doc,.docx,.xls,.xlsx,.ppt,.pptx,.txt,.csv,.json,.xml,.zip,.rar,.7z,.py,.js,.ts,.java,.c,.cpp,.go,.rs,.md">
<div id="slash-menu" class="slash-menu hidden"></div>
<textarea id="chat-input"
class="flex-1 min-w-0 px-4 py-[10px] rounded-xl border border-slate-200 dark:border-slate-600
bg-slate-50 dark:bg-white/5 text-slate-800 dark:text-slate-100
@@ -295,7 +301,7 @@
text-sm leading-relaxed"
rows="1"
data-i18n-placeholder="input_placeholder"
placeholder="Type a message..."></textarea>
placeholder="Type a message, or press / for commands"></textarea>
<button id="send-btn"
class="flex-shrink-0 w-10 h-10 flex items-center justify-center rounded-lg
bg-primary-400 text-white hover:bg-primary-500
@@ -454,6 +460,11 @@
<h2 class="text-xl font-bold text-slate-800 dark:text-slate-100" data-i18n="skills_title">Skills</h2>
<p class="text-sm text-slate-500 dark:text-slate-400 mt-1" data-i18n="skills_desc">View, enable, or disable agent skills</p>
</div>
<a href="https://skills.cowagent.ai/" target="_blank"
class="inline-flex items-center gap-1.5 px-3 py-1.5 rounded-lg text-xs font-medium text-primary-500 bg-primary-50 dark:bg-primary-900/20 hover:bg-primary-100 dark:hover:bg-primary-900/30 transition-colors">
<i class="fas fa-puzzle-piece text-[10px]"></i>
<span data-i18n="skills_hub_btn">Skill Hub</span>
</a>
</div>
<!-- Built-in Tools Section -->
@@ -552,6 +563,106 @@
</div>
</div>
<!-- ====================================================== -->
<!-- VIEW: Knowledge -->
<!-- ====================================================== -->
<div id="view-knowledge" class="view">
<div class="flex-1 overflow-y-auto p-4 md:p-8 lg:p-10">
<div class="w-full max-w-[1600px] mx-auto">
<!-- Header -->
<div class="flex flex-col sm:flex-row sm:items-center justify-between gap-3 mb-4 md:mb-6">
<div>
<h2 class="text-xl font-bold text-slate-800 dark:text-slate-100" data-i18n="knowledge_title">Knowledge</h2>
<p class="text-sm text-slate-500 dark:text-slate-400 mt-1" data-i18n="knowledge_desc">Browse and explore your knowledge base</p>
</div>
<div class="flex items-center gap-2">
<span id="knowledge-stats" class="text-xs text-slate-400 dark:text-slate-500 hidden sm:inline"></span>
<div class="flex items-center bg-slate-100 dark:bg-white/10 rounded-lg p-0.5">
<button id="knowledge-tab-docs" onclick="switchKnowledgeTab('docs')"
class="knowledge-tab px-3 py-1.5 rounded-md text-xs font-medium cursor-pointer transition-colors duration-150 active">
<i class="fas fa-folder-tree mr-1.5"></i><span data-i18n="knowledge_tab_docs">Documents</span>
</button>
<button id="knowledge-tab-graph" onclick="switchKnowledgeTab('graph')"
class="knowledge-tab px-3 py-1.5 rounded-md text-xs font-medium cursor-pointer transition-colors duration-150">
<i class="fas fa-diagram-project mr-1.5"></i><span data-i18n="knowledge_tab_graph">Graph</span>
</button>
</div>
</div>
</div>
<!-- Empty state -->
<div id="knowledge-empty" class="flex flex-col items-center justify-center py-20">
<div class="w-16 h-16 rounded-2xl bg-emerald-50 dark:bg-emerald-900/20 flex items-center justify-center mb-4">
<i class="fas fa-book text-emerald-400 text-xl"></i>
</div>
<p class="text-slate-500 dark:text-slate-400 font-medium" data-i18n="knowledge_loading">Loading knowledge base...</p>
<p class="text-sm text-slate-400 dark:text-slate-500 mt-1" data-i18n="knowledge_loading_desc">Knowledge pages will be displayed here</p>
<div id="knowledge-empty-guide" class="hidden mt-6 max-w-sm text-center">
<p class="text-sm text-slate-500 dark:text-slate-400 mb-4" data-i18n="knowledge_empty_guide">Send documents, links or topics to the agent in chat, and it will automatically organize them into your knowledge base.</p>
<button onclick="navigateTo('chat')"
class="inline-flex items-center gap-2 px-4 py-2 rounded-lg bg-primary-500 hover:bg-primary-600
text-white text-sm font-medium cursor-pointer transition-colors duration-150">
<i class="fas fa-message text-xs"></i>
<span data-i18n="knowledge_go_chat">Start a conversation</span>
</button>
</div>
</div>
<!-- Documents panel -->
<div id="knowledge-panel-docs" class="hidden">
<div class="flex flex-col md:flex-row gap-4 md:gap-6" style="min-height: calc(100vh - 220px)">
<!-- File tree -->
<div id="knowledge-sidebar" class="w-full md:w-72 lg:w-80 flex-shrink-0">
<div class="bg-white dark:bg-[#1A1A1A] rounded-xl border border-slate-200 dark:border-white/10 overflow-hidden">
<div class="px-4 py-3 border-b border-slate-200 dark:border-white/10">
<div class="relative">
<i class="fas fa-search absolute left-3 top-1/2 -translate-y-1/2 text-slate-400 text-xs"></i>
<input id="knowledge-search" type="text" placeholder="Search..."
class="w-full pl-8 pr-3 py-1.5 text-xs bg-slate-50 dark:bg-white/5 border border-slate-200 dark:border-white/10 rounded-lg text-slate-700 dark:text-slate-200 placeholder-slate-400 dark:placeholder-slate-500 focus:outline-none focus:ring-1 focus:ring-primary-400/50"
oninput="filterKnowledgeTree(this.value)">
</div>
</div>
<div id="knowledge-tree" class="p-2 overflow-y-auto max-h-[50vh] md:max-h-[calc(100vh-300px)]"></div>
</div>
</div>
<!-- Content viewer -->
<div class="flex-1 min-w-0">
<div id="knowledge-content-placeholder"
class="flex flex-col items-center justify-center py-20 text-slate-400 dark:text-slate-500">
<i class="fas fa-file-lines text-3xl mb-3 opacity-40"></i>
<p class="text-sm" data-i18n="knowledge_select_hint">Select a document to view</p>
</div>
<div id="knowledge-content-viewer" class="hidden">
<div class="bg-white dark:bg-[#1A1A1A] rounded-xl border border-slate-200 dark:border-white/10 overflow-hidden">
<div class="flex items-center gap-3 px-4 md:px-5 py-3 border-b border-slate-200 dark:border-white/10">
<button onclick="knowledgeMobileBack()" class="md:hidden p-1 -ml-1 text-slate-400 hover:text-slate-600 dark:hover:text-slate-300 cursor-pointer">
<i class="fas fa-arrow-left text-xs"></i>
</button>
<i class="fas fa-file-lines text-slate-400 text-sm hidden md:inline"></i>
<span id="knowledge-viewer-title" class="text-sm font-medium text-slate-700 dark:text-slate-200 truncate"></span>
<span id="knowledge-viewer-path" class="text-xs text-slate-400 dark:text-slate-500 ml-auto font-mono truncate hidden md:inline"></span>
</div>
<div id="knowledge-viewer-body"
class="p-4 md:p-5 overflow-y-auto text-sm msg-content text-slate-700 dark:text-slate-200"
style="max-height: calc(100vh - 280px)"></div>
</div>
</div>
</div>
</div>
</div>
<!-- Graph panel -->
<div id="knowledge-panel-graph" class="hidden">
<div class="bg-white dark:bg-[#1A1A1A] rounded-xl border border-slate-200 dark:border-white/10 overflow-hidden">
<div id="knowledge-graph-container" class="w-full h-[60vh] md:h-[calc(100vh-220px)]"></div>
</div>
</div>
</div>
</div>
</div>
<!-- ====================================================== -->
<!-- VIEW: Channels -->
<!-- ====================================================== -->
@@ -664,6 +775,7 @@
</div>
</div>
<script src="https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js"></script>
<script src="assets/js/console.js"></script>
</body>
</html>

@@ -45,7 +45,8 @@
.msg-content h1 { font-size: 1.4em; }
.msg-content h2 { font-size: 1.25em; }
.msg-content h3 { font-size: 1.1em; }
.msg-content ul, .msg-content ol { margin: 0.5em 0; padding-left: 1.8em; }
.msg-content ul { margin: 0.5em 0; padding-left: 1.8em; list-style: disc; }
.msg-content ol { margin: 0.5em 0; padding-left: 1.8em; list-style: decimal; }
.msg-content li { margin: 0.25em 0; }
.msg-content pre {
border-radius: 8px; overflow-x: auto; margin: 0.8em 0;
@@ -79,6 +80,11 @@
.msg-content img { max-width: 100%; height: auto; border-radius: 8px; margin: 0.5em 0; }
.msg-content a { color: #35A85B; text-decoration: underline; }
.msg-content a:hover { color: #228547; }
/* Overrides for user bubble (white text on green bg) */
.user-bubble.msg-content a { color: #ffffff !important; text-decoration: underline; text-decoration-color: rgba(255,255,255,0.6); }
.user-bubble.msg-content a:hover { color: #e0f5e8 !important; text-decoration-color: #e0f5e8; }
.user-bubble.msg-content :not(pre) > code { background: rgba(255,255,255,0.2); color: #ffffff; }
.msg-content hr { border: none; height: 1px; background: #e2e8f0; margin: 1.2em 0; }
.dark .msg-content hr { background: rgba(255,255,255,0.1); }
@@ -141,7 +147,7 @@
font-size: 0.75rem;
line-height: 1.5;
color: #94a3b8;
max-height: 200px;
max-height: 300px;
overflow-y: auto;
}
.dark .agent-thinking-step .thinking-full {
@@ -153,6 +159,20 @@
.agent-thinking-step .thinking-full p:first-child { margin-top: 0; }
.agent-thinking-step .thinking-full p:last-child { margin-bottom: 0; }
/* Content step - real text output frozen before tool calls */
.agent-content-step {
font-size: 0.875rem;
line-height: 1.6;
color: inherit;
margin-bottom: 0.5rem;
padding-bottom: 0.5rem;
border-bottom: 1px dashed rgba(0, 0, 0, 0.06);
}
.dark .agent-content-step { border-bottom-color: rgba(255, 255, 255, 0.06); }
.agent-content-step .agent-content-body p { margin: 0.25em 0; }
.agent-content-step .agent-content-body p:first-child { margin-top: 0; }
.agent-content-step .agent-content-body p:last-child { margin-bottom: 0; }
/* Tool step - collapsible */
.agent-tool-step .tool-header {
display: flex;
@@ -446,3 +466,226 @@
transform: translateY(-2px);
box-shadow: 0 8px 25px -5px rgba(0, 0, 0, 0.1);
}
/* Slash Command Menu */
.slash-menu {
position: absolute;
bottom: calc(100% + 6px);
left: 0;
right: 0;
max-height: 320px;
overflow-y: auto;
background: #fff;
border: 1px solid #e2e8f0;
border-radius: 12px;
box-shadow: 0 8px 30px -6px rgba(0, 0, 0, 0.1), 0 2px 8px -2px rgba(0, 0, 0, 0.04);
z-index: 50;
padding: 4px;
animation: slashMenuIn 0.15s ease-out;
}
.slash-menu.hidden { display: none; }
@keyframes slashMenuIn {
from { opacity: 0; transform: translateY(6px); }
to { opacity: 1; transform: translateY(0); }
}
.slash-menu-header {
padding: 6px 10px 4px;
font-size: 11px;
font-weight: 600;
color: #94a3b8;
text-transform: uppercase;
letter-spacing: 0.05em;
}
.slash-menu-item {
display: flex;
align-items: center;
justify-content: space-between;
padding: 8px 10px;
border-radius: 8px;
cursor: pointer;
transition: background 0.12s ease;
}
.slash-menu-item:hover,
.slash-menu-item.active {
background: #EDFDF3;
}
.slash-menu-item .cmd {
font-size: 13px;
font-weight: 500;
color: #334155;
font-family: ui-monospace, SFMono-Regular, 'SF Mono', Menlo, monospace;
}
.slash-menu-item.active .cmd {
color: #228547;
}
.slash-menu-item .desc {
font-size: 12px;
color: #94a3b8;
margin-left: 12px;
white-space: nowrap;
}
/* Dark mode */
.dark .slash-menu {
background: #1A1A1A;
border-color: rgba(255, 255, 255, 0.1);
box-shadow: 0 8px 30px -6px rgba(0, 0, 0, 0.35), 0 2px 8px -2px rgba(0, 0, 0, 0.15);
}
.dark .slash-menu-header {
color: #64748b;
}
.dark .slash-menu-item:hover,
.dark .slash-menu-item.active {
background: rgba(74, 190, 110, 0.1);
}
.dark .slash-menu-item .cmd {
color: #e2e8f0;
}
.dark .slash-menu-item.active .cmd {
color: #4ABE6E;
}
.dark .slash-menu-item .desc {
color: #64748b;
}
/* ============================================================
Knowledge View
============================================================ */
/* Tab toggle */
.knowledge-tab {
color: #64748b;
}
.knowledge-tab.active {
background: #fff;
color: #334155;
box-shadow: 0 1px 3px rgba(0,0,0,0.08);
}
.dark .knowledge-tab.active {
background: rgba(255,255,255,0.1);
color: #e2e8f0;
}
/* File tree */
.knowledge-tree-group {
margin-bottom: 2px;
}
.knowledge-tree-group-btn {
display: flex;
align-items: center;
gap: 6px;
width: 100%;
padding: 6px 8px;
border-radius: 6px;
font-size: 12px;
font-weight: 600;
color: #64748b;
cursor: pointer;
border: none;
background: none;
transition: background 0.15s, color 0.15s;
text-transform: capitalize;
}
.knowledge-tree-group-btn:hover {
background: rgba(0,0,0,0.04);
color: #334155;
}
.dark .knowledge-tree-group-btn:hover {
background: rgba(255,255,255,0.06);
color: #e2e8f0;
}
.knowledge-tree-group-btn i.chevron {
font-size: 8px;
transition: transform 0.15s;
}
.knowledge-tree-group.open .chevron {
transform: rotate(90deg);
}
.knowledge-tree-group-items {
display: none;
}
.knowledge-tree-group.open .knowledge-tree-group-items {
display: block;
}
.knowledge-tree-file {
display: flex;
align-items: center;
gap: 6px;
padding: 5px 8px 5px 24px;
border-radius: 6px;
font-size: 12px;
color: #64748b;
cursor: pointer;
border: none;
background: none;
width: 100%;
text-align: left;
transition: background 0.15s, color 0.15s;
white-space: nowrap;
overflow: hidden;
text-overflow: ellipsis;
}
.knowledge-tree-file:hover {
background: rgba(0,0,0,0.04);
color: #334155;
}
.knowledge-tree-file.active {
background: #EDFDF3;
color: #228547;
}
.dark .knowledge-tree-file:hover {
background: rgba(255,255,255,0.06);
color: #e2e8f0;
}
.dark .knowledge-tree-file.active {
background: rgba(74, 190, 110, 0.1);
color: #4ABE6E;
}
/* Graph legend */
.knowledge-graph-legend {
position: absolute;
top: 12px;
right: 12px;
display: flex;
flex-wrap: wrap;
gap: 8px;
font-size: 11px;
color: #64748b;
z-index: 10;
}
.knowledge-graph-legend-item {
display: flex;
align-items: center;
gap: 4px;
}
.knowledge-graph-legend-dot {
width: 8px;
height: 8px;
border-radius: 50%;
}
/* Graph tooltip */
.knowledge-graph-tooltip {
position: absolute;
padding: 6px 10px;
background: #fff;
border: 1px solid #e2e8f0;
border-radius: 8px;
font-size: 12px;
color: #334155;
box-shadow: 0 4px 12px rgba(0,0,0,0.08);
pointer-events: none;
opacity: 0;
transition: opacity 0.15s;
z-index: 20;
}
.dark .knowledge-graph-tooltip {
background: #1A1A1A;
border-color: rgba(255,255,255,0.1);
color: #e2e8f0;
}

File diff suppressed because it is too large

@@ -96,9 +96,43 @@ class WebChannel(ChatChannel):
logger.error(f"No session_id found for request {request_id}")
return
# SSE mode: push done event to SSE queue
# SSE mode: push events to SSE queue
if request_id in self.sse_queues:
content = reply.content if reply.content is not None else ""
# Intermediate status lines (e.g. /install-browser phases) must NOT use "done",
# or the frontend closes EventSource and drops subsequent events.
if getattr(reply, "sse_phase", False):
self.sse_queues[request_id].put({
"type": "phase",
"content": content,
"request_id": request_id,
"timestamp": time.time(),
})
logger.debug(f"SSE phase for request {request_id}")
return
# Files are already pushed via on_event (file_to_send) during agent execution.
# Skip duplicate file pushes here; just let the done event through.
if reply.type in (ReplyType.IMAGE_URL, ReplyType.FILE) and content.startswith("file://"):
text_content = getattr(reply, 'text_content', '')
if text_content:
self.sse_queues[request_id].put({
"type": "done",
"content": text_content,
"request_id": request_id,
"timestamp": time.time()
})
logger.debug(f"SSE skipped duplicate file for request {request_id}")
return
# Skip http-URL FILE/IMAGE_URL replies produced by chat_channel's media extraction:
# the text reply (already sent as "done") contains the URL and the frontend will
# render it via renderMarkdown/injectVideoPlayers, so no separate SSE event needed.
if reply.type in (ReplyType.FILE, ReplyType.IMAGE_URL) and content.startswith(("http://", "https://")):
logger.debug(f"SSE skipped http media reply for request {request_id}")
return
self.sse_queues[request_id].put({
"type": "done",
"content": content,
@@ -134,7 +168,12 @@ class WebChannel(ChatChannel):
event_type = event.get("type")
data = event.get("data", {})
if event_type == "message_update":
if event_type == "reasoning_update":
delta = data.get("delta", "")
if delta:
q.put({"type": "reasoning", "content": delta})
elif event_type == "message_update":
delta = data.get("delta", "")
if delta:
q.put({"type": "delta", "content": delta})
@@ -161,6 +200,24 @@ class WebChannel(ChatChannel):
"execution_time": round(exec_time, 2)
})
elif event_type == "message_end":
tool_calls = data.get("tool_calls", [])
if tool_calls:
q.put({"type": "message_end", "has_tool_calls": True})
elif event_type == "file_to_send":
file_path = data.get("path", "")
file_name = data.get("file_name", os.path.basename(file_path))
file_type = data.get("file_type", "file")
from urllib.parse import quote
web_url = f"/api/file?path={quote(file_path)}"
is_image = file_type == "image"
q.put({
"type": "image" if is_image else "file",
"content": web_url,
"file_name": file_name,
})
return on_event
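The `on_event` callback translates agent events into SSE payloads:

- `reasoning_update` and `message_update` deltas become `reasoning` and `delta` events.
- `message_end` is forwarded only when tool calls follow.
- `file_to_send` becomes an `image` or `file` event whose URL points at the new `/api/file` route, with the path percent-encoded.

Below is a sketch of that mapping as a pure function that returns the payload, or `None` when nothing is queued. `to_sse_payload` is hypothetical, and the tool-execution branches elided from the hunk are omitted.

```python
import os
from typing import Optional
from urllib.parse import quote


def to_sse_payload(event: dict) -> Optional[dict]:
    event_type = event.get("type")
    data = event.get("data", {})
    if event_type in ("reasoning_update", "message_update"):
        delta = data.get("delta", "")
        if not delta:
            return None
        kind = "reasoning" if event_type == "reasoning_update" else "delta"
        return {"type": kind, "content": delta}
    if event_type == "message_end":
        # Only intermediate messages (followed by tool calls) are signalled.
        return {"type": "message_end", "has_tool_calls": True} if data.get("tool_calls") else None
    if event_type == "file_to_send":
        path = data.get("path", "")
        return {
            "type": "image" if data.get("file_type", "file") == "image" else "file",
            "content": f"/api/file?path={quote(path)}",
            "file_name": data.get("file_name", os.path.basename(path)),
        }
    return None


print(to_sse_payload({"type": "file_to_send",
                      "data": {"path": "/tmp/my report.pdf"}}))
# {'type': 'file', 'content': '/api/file?path=/tmp/my%20report.pdf', 'file_name': 'my report.pdf'}
```

By default `quote` leaves `/` unescaped but encodes spaces, `&` and `?`, so the path survives as a single query parameter.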
def upload_file(self):
@@ -282,14 +339,18 @@ class WebChannel(ChatChannel):
"""
SSE generator for a given request_id.
Yields UTF-8 encoded bytes to avoid WSGI Latin-1 mangling.
Supports client reconnection: the queue is only removed after a
"done" event is consumed, so a new GET /stream with the same
request_id can resume reading remaining events.
"""
if request_id not in self.sse_queues:
yield b"data: {\"type\": \"error\", \"message\": \"invalid request_id\"}\n\n"
return
q = self.sse_queues[request_id]
timeout = 300 # 5 minutes max
deadline = time.time() + timeout
idle_timeout = 600 # 10 minutes without any real event
deadline = time.time() + idle_timeout
done = False
try:
while time.time() < deadline:
@@ -299,13 +360,18 @@ class WebChannel(ChatChannel):
yield b": keepalive\n\n"
continue
# Real event received, reset idle deadline
deadline = time.time() + idle_timeout
payload = json.dumps(item, ensure_ascii=False)
yield f"data: {payload}\n\n".encode("utf-8")
if item.get("type") == "done":
done = True
break
finally:
self.sse_queues.pop(request_id, None)
if done:
self.sse_queues.pop(request_id, None)
def poll_response(self):
"""
@@ -377,6 +443,7 @@ class WebChannel(ChatChannel):
'/message', 'MessageHandler',
'/upload', 'UploadHandler',
'/uploads/(.*)', 'UploadsHandler',
'/api/file', 'FileServeHandler',
'/poll', 'PollHandler',
'/stream', 'StreamHandler',
'/chat', 'ChatHandler',
@@ -387,9 +454,13 @@ class WebChannel(ChatChannel):
'/api/skills', 'SkillsHandler',
'/api/memory', 'MemoryHandler',
'/api/memory/content', 'MemoryContentHandler',
'/api/knowledge/list', 'KnowledgeListHandler',
'/api/knowledge/read', 'KnowledgeReadHandler',
'/api/knowledge/graph', 'KnowledgeGraphHandler',
'/api/scheduler', 'SchedulerHandler',
'/api/history', 'HistoryHandler',
'/api/logs', 'LogsHandler',
'/api/version', 'VersionHandler',
'/assets/(.*)', 'AssetsHandler',
)
app = web.application(urls, globals(), autoreload=False)
@@ -405,8 +476,14 @@ class WebChannel(ChatChannel):
func = web.httpserver.StaticMiddleware(app.wsgifunc())
func = web.httpserver.LogMiddleware(func)
server = web.httpserver.WSGIServer(("0.0.0.0", port), func)
# Allow concurrent requests by not blocking on in-flight handler threads
server.daemon_threads = True
# Default request_queue_size(5) / timeout(10s) / numthreads(10) are
# too small: when SSE streams occupy many threads, the backlog fills
# and new connections get refused (ERR_CONNECTION_ABORTED).
server.request_queue_size = 128
server.timeout = 300
server.requests.min = 20
server.requests.max = 80
self._http_server = server
try:
server.start()
@@ -462,6 +539,32 @@ class UploadsHandler:
raise web.notfound()
class FileServeHandler:
def GET(self):
"""Serve a local file by absolute path (for agent send tool)."""
try:
params = web.input(path="")
file_path = params.path
if not file_path or not os.path.isabs(file_path):
raise web.notfound()
file_path = os.path.normpath(file_path)
if not os.path.isfile(file_path):
raise web.notfound()
content_type = mimetypes.guess_type(file_path)[0] or "application/octet-stream"
file_name = os.path.basename(file_path)
from urllib.parse import quote
web.header('Content-Type', content_type)
web.header('Content-Disposition', f"inline; filename*=UTF-8''{quote(file_name)}")
web.header('Cache-Control', 'public, max-age=3600')
with open(file_path, 'rb') as f:
return f.read()
except web.HTTPError:
raise
except Exception as e:
logger.error(f"[WebChannel] Error serving file: {e}")
raise web.notfound()
class PollHandler:
def POST(self):
return WebChannel().poll_response()
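`FileServeHandler` serves the `/api/file?path=...` URLs that `file_to_send` events now emit. The sketch below pulls the two string-building steps out as standalone functions. The names `build_file_headers` and `file_url` are hypothetical and are not part of the channel. It shows how the path is percent-encoded into the query string, and how the file name goes into an RFC 5987 `filename*` parameter so that non-ASCII names survive in the header.

```python
import mimetypes
import os
from urllib.parse import quote


def build_file_headers(file_path: str) -> dict:
    """Return response headers for serving file_path inline (illustrative sketch)."""
    # Fall back to a generic binary type when the extension is unknown
    content_type = mimetypes.guess_type(file_path)[0] or "application/octet-stream"
    file_name = os.path.basename(file_path)
    return {
        "Content-Type": content_type,
        # filename* carries the percent-encoded UTF-8 name (RFC 5987 syntax)
        "Content-Disposition": f"inline; filename*=UTF-8''{quote(file_name)}",
        "Cache-Control": "public, max-age=3600",
    }


def file_url(file_path: str) -> str:
    """Build the URL the web channel pushes for a file_to_send event."""
    # quote() keeps '/' unescaped by default, so the absolute path stays readable
    return f"/api/file?path={quote(file_path)}"
```

For example, `file_url("/tmp/a b.png")` yields `/api/file?path=/tmp/a%20b.png`, and a file named `报告.pdf` is advertised as `filename*=UTF-8''%E6%8A%A5%E5%91%8A.pdf`.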
@@ -493,9 +596,9 @@ class ChatHandler:
class ConfigHandler:
_RECOMMENDED_MODELS = [
const.MINIMAX_M2_5, const.MINIMAX_M2_1, const.MINIMAX_M2_1_LIGHTNING,
const.GLM_5, const.GLM_4_7,
const.QWEN3_MAX, const.QWEN35_PLUS,
const.MINIMAX_M2_7, const.MINIMAX_M2_5, const.MINIMAX_M2_1, const.MINIMAX_M2_1_LIGHTNING,
const.GLM_5_TURBO, const.GLM_5, const.GLM_4_7,
const.QWEN36_PLUS, const.QWEN35_PLUS, const.QWEN3_MAX,
const.KIMI_K2_5, const.KIMI_K2,
const.DOUBAO_SEED_2_PRO, const.DOUBAO_SEED_2_CODE,
const.CLAUDE_4_6_SONNET, const.CLAUDE_4_6_OPUS, const.CLAUDE_4_5_SONNET,
@@ -510,21 +613,21 @@ class ConfigHandler:
"api_key_field": "minimax_api_key",
"api_base_key": None,
"api_base_default": None,
"models": [const.MINIMAX_M2_5, const.MINIMAX_M2_1, const.MINIMAX_M2_1_LIGHTNING],
"models": [const.MINIMAX_M2_7, const.MINIMAX_M2_5, const.MINIMAX_M2_1, const.MINIMAX_M2_1_LIGHTNING],
}),
("zhipu", {
"label": "智谱AI",
"api_key_field": "zhipu_ai_api_key",
"api_base_key": "zhipu_ai_api_base",
"api_base_default": "https://open.bigmodel.cn/api/paas/v4",
"models": [const.GLM_5, const.GLM_4_7],
"models": [const.GLM_5_TURBO, const.GLM_5, const.GLM_4_7],
}),
("dashscope", {
"label": "通义千问",
"api_key_field": "dashscope_api_key",
"api_base_key": None,
"api_base_default": None,
"models": [const.QWEN3_MAX, const.QWEN35_PLUS],
"models": [const.QWEN36_PLUS, const.QWEN35_PLUS, const.QWEN3_MAX],
}),
("moonshot", {
"label": "Kimi",
@@ -563,10 +666,17 @@ class ConfigHandler:
}),
("deepseek", {
"label": "DeepSeek",
"api_key_field": "open_ai_api_key",
"api_key_field": "deepseek_api_key",
"api_base_key": "deepseek_api_base",
"api_base_default": "https://api.deepseek.com/v1",
"models": [const.DEEPSEEK_CHAT, const.DEEPSEEK_REASONER],
}),
("modelscope", {
"label": "ModelScope",
"api_key_field": "modelscope_api_key",
"api_base_key": None,
"api_base_default": None,
"models": [const.DEEPSEEK_CHAT, const.DEEPSEEK_REASONER],
"models": [const.QWEN3_5_27B, const.QWEN3_235B_A22B_INSTRUCT_2507],
}),
("linkai", {
"label": "LinkAI",
@@ -579,9 +689,9 @@ class ConfigHandler:
EDITABLE_KEYS = {
"model", "bot_type", "use_linkai",
"open_ai_api_base", "claude_api_base", "gemini_api_base",
"open_ai_api_base", "deepseek_api_base", "claude_api_base", "gemini_api_base",
"zhipu_ai_api_base", "moonshot_base_url", "ark_base_url",
"open_ai_api_key", "claude_api_key", "gemini_api_key",
"open_ai_api_key", "deepseek_api_key", "claude_api_key", "gemini_api_key",
"zhipu_ai_api_key", "dashscope_api_key", "moonshot_api_key",
"ark_api_key", "minimax_api_key", "linkai_api_key",
"agent_max_context_tokens", "agent_max_context_turns", "agent_max_steps",
@@ -1290,6 +1400,8 @@ class MemoryContentHandler:
service = MemoryService(workspace_root)
result = service.get_content(params.filename)
return json.dumps({"status": "success", **result}, ensure_ascii=False)
except ValueError:
return json.dumps({"status": "error", "message": "invalid filename"})
except FileNotFoundError:
return json.dumps({"status": "error", "message": "file not found"})
except Exception as e:
@@ -1429,3 +1541,51 @@ class AssetsHandler:
except Exception as e:
logger.error(f"Error serving static file: {e}", exc_info=True) # add more detailed error information
raise web.notfound()
class KnowledgeListHandler:
def GET(self):
web.header('Content-Type', 'application/json; charset=utf-8')
try:
from agent.knowledge.service import KnowledgeService
svc = KnowledgeService(_get_workspace_root())
result = svc.list_tree()
return json.dumps({"status": "success", **result}, ensure_ascii=False)
except Exception as e:
logger.error(f"[WebChannel] Knowledge list error: {e}")
return json.dumps({"status": "error", "message": str(e)})
class KnowledgeReadHandler:
def GET(self):
web.header('Content-Type', 'application/json; charset=utf-8')
try:
from agent.knowledge.service import KnowledgeService
params = web.input(path='')
svc = KnowledgeService(_get_workspace_root())
result = svc.read_file(params.path)
return json.dumps({"status": "success", **result}, ensure_ascii=False)
except (ValueError, FileNotFoundError) as e:
return json.dumps({"status": "error", "message": str(e)})
except Exception as e:
logger.error(f"[WebChannel] Knowledge read error: {e}")
return json.dumps({"status": "error", "message": str(e)})
class KnowledgeGraphHandler:
def GET(self):
web.header('Content-Type', 'application/json; charset=utf-8')
try:
from agent.knowledge.service import KnowledgeService
svc = KnowledgeService(_get_workspace_root())
return json.dumps(svc.build_graph(), ensure_ascii=False)
except Exception as e:
logger.error(f"[WebChannel] Knowledge graph error: {e}")
return json.dumps({"nodes": [], "links": []})
class VersionHandler:
def GET(self):
web.header('Content-Type', 'application/json; charset=utf-8')
from cli import __version__
return json.dumps({"version": __version__})
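The reworked SSE generator replaces the fixed 5-minute cap with an idle deadline that is reset on every real event. It also removes the per-request queue only after the `"done"` event has been delivered, so a client that reconnects with the same `request_id` can resume. Below is a minimal standalone sketch of that behaviour, assuming a plain dict of `queue.Queue` objects. The diff elides the blocking `get` call, so the `poll_interval` parameter here is an assumption.

```python
import json
import queue
import time
from typing import Dict, Iterator


def sse_stream(queues: Dict[str, queue.Queue], request_id: str,
               idle_timeout: float = 600, poll_interval: float = 1.0) -> Iterator[bytes]:
    """Yield UTF-8 SSE frames for request_id; keep the queue until "done" is consumed."""
    if request_id not in queues:
        yield b'data: {"type": "error", "message": "invalid request_id"}\n\n'
        return
    q = queues[request_id]
    deadline = time.time() + idle_timeout
    done = False
    try:
        while time.time() < deadline:
            try:
                item = q.get(timeout=poll_interval)  # poll_interval is assumed
            except queue.Empty:
                # SSE comment line: keeps proxies/browsers from dropping the stream
                yield b": keepalive\n\n"
                continue
            # A real event arrived, so push the idle deadline forward
            deadline = time.time() + idle_timeout
            payload = json.dumps(item, ensure_ascii=False)
            yield f"data: {payload}\n\n".encode("utf-8")
            if item.get("type") == "done":
                done = True
                break
    finally:
        # Runs on normal exit and on client disconnect (generator close);
        # only a delivered "done" retires the queue
        if done:
            queues.pop(request_id, None)
```

If the client disconnects mid-stream, the generator is closed before `done` is set, so the queue stays registered and a later `GET /stream` picks up the remaining events.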

@@ -330,28 +330,42 @@ class WecomBotChannel(ChatChannel):
All intermediate segments (thinking before tool calls) and the final answer
are accumulated into a single stream message, separated by '---'.
Throttles push to at most once per 100ms to avoid WebSocket congestion.
"""
stream_id = uuid.uuid4().hex[:16]
self._stream_states[req_id] = {
"stream_id": stream_id,
"committed": "", # finalized content from previous segments
"current": "", # current segment being streamed
"committed": "",
"current": "",
"last_push_time": 0,
"last_push_len": 0,
}
def _push_stream(state: dict):
"""Push current stream content to wecom."""
self._ws_send({
"cmd": "aibot_respond_msg",
"headers": {"req_id": req_id},
"body": {
"msgtype": "stream",
"stream": {
"id": state["stream_id"],
"finish": False,
"content": state["committed"] + state["current"],
def _push_stream(state: dict, force: bool = False):
"""Push current stream content to wecom (throttled unless forced)."""
now = time.time()
if not force and now - state["last_push_time"] < 0.1:
return
content = state["committed"] + state["current"]
if len(content) == state["last_push_len"]:
return
state["last_push_time"] = now
state["last_push_len"] = len(content)
try:
self._ws_send({
"cmd": "aibot_respond_msg",
"headers": {"req_id": req_id},
"body": {
"msgtype": "stream",
"stream": {
"id": state["stream_id"],
"finish": False,
"content": content,
},
},
},
})
})
except Exception as e:
logger.warning(f"[WecomBot] Stream push failed: {e}")
def on_event(event: dict):
event_type = event.get("type")
@@ -378,6 +392,7 @@ class WecomBotChannel(ChatChannel):
else:
state["committed"] += state["current"]
state["current"] = ""
_push_stream(state, force=True)
return on_event
@@ -452,11 +467,16 @@ class WecomBotChannel(ChatChannel):
if req_id:
state = self._stream_states.pop(req_id, None)
if state:
final_content = state["committed"]
final_content = state["committed"] if state["committed"] else content
stream_id = state["stream_id"]
else:
final_content = content
stream_id = uuid.uuid4().hex[:16]
# Brief pause so the server finishes processing the last intermediate chunk
# before receiving the finish packet
time.sleep(0.15)
self._ws_send({
"cmd": "aibot_respond_msg",
"headers": {"req_id": req_id},
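The new `_push_stream` coalesces WeCom stream updates. It sends at most once per 100 ms unless `force=True` is passed, which happens at segment boundaries, and it skips a push when the accumulated text length has not changed. A minimal sketch of that throttle as a small class, assuming an injectable `send` callback and clock (`StreamThrottle` is a hypothetical name used only for illustration):

```python
import time
from typing import Callable


class StreamThrottle:
    """Coalesce streaming pushes: at most one send per interval unless forced."""

    def __init__(self, send: Callable[[str], None], interval: float = 0.1,
                 clock: Callable[[], float] = time.time):
        self.send = send
        self.interval = interval
        self.clock = clock        # injectable so tests can control time
        self.committed = ""       # finalized content from earlier segments
        self.current = ""         # segment currently being streamed
        self.last_push_time = 0.0
        self.last_push_len = 0

    def push(self, force: bool = False) -> bool:
        """Send committed + current if allowed; return True when a send happened."""
        now = self.clock()
        if not force and now - self.last_push_time < self.interval:
            return False  # too soon; a later delta or forced push carries this text
        content = self.committed + self.current
        if len(content) == self.last_push_len:
            return False  # nothing new since the last push
        self.last_push_time = now
        self.last_push_len = len(content)
        self.send(content)
        return True
```

Like the diff, this compares lengths rather than full strings. That check is cheap and is enough for an append-only stream, where new text only ever makes the content longer.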

@@ -37,11 +37,19 @@ def _random_wechat_uin() -> str:
return base64.b64encode(str(val).encode("utf-8")).decode("utf-8")
CHANNEL_VERSION = "2.0.0"
# iLink-App-ClientVersion: uint32 encoded as major<<16 | minor<<8 | patch
# 2.0.0 → 0x00020000 = 131072
CLIENT_VERSION = "131072"
def _build_headers(token: str = "") -> dict:
headers = {
"Content-Type": "application/json",
"AuthorizationType": "ilink_bot_token",
"X-WECHAT-UIN": _random_wechat_uin(),
"iLink-App-Id": "bot",
"iLink-App-ClientVersion": CLIENT_VERSION,
}
if token:
headers["Authorization"] = f"Bearer {token}"
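The `iLink-App-ClientVersion` header packs a semantic version into a uint32 as `major<<16 | minor<<8 | patch`, so `2.0.0` becomes `0x00020000 = 131072`. The sketch below derives the constant instead of hard-coding it (`encode_client_version` is a hypothetical helper). The range checks are an added assumption, based on each field having to fit its bit slot for the packing to be reversible.

```python
def encode_client_version(version: str) -> str:
    """Pack "major.minor.patch" as major<<16 | minor<<8 | patch, as a decimal string."""
    major, minor, patch = (int(part) for part in version.split("."))
    # Assumed limits: minor/patch occupy one byte each, major the upper 16 bits
    if not (0 <= minor <= 0xFF and 0 <= patch <= 0xFF and 0 <= major <= 0xFFFF):
        raise ValueError(f"version out of range for uint32 packing: {version}")
    return str((major << 16) | (minor << 8) | patch)
```

For example, `encode_client_version("2.0.0")` returns `"131072"`, matching `CLIENT_VERSION`.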
@@ -64,6 +72,7 @@ class WeixinApi:
def _post(self, endpoint: str, body: dict, timeout: int = DEFAULT_API_TIMEOUT) -> dict:
url = _ensure_trailing_slash(self.base_url) + endpoint
headers = _build_headers(self.token)
body.setdefault("base_info", {}).setdefault("channel_version", CHANNEL_VERSION)
try:
resp = requests.post(url, json=body, headers=headers, timeout=timeout)
resp.raise_for_status()
@@ -172,10 +181,8 @@ class WeixinApi:
def get_upload_url(self, filekey: str, media_type: int, to_user_id: str,
rawsize: int, rawfilemd5: str, filesize: int,
aeskey: str,
thumb_rawsize: int = 0, thumb_rawfilemd5: str = "",
thumb_filesize: int = 0) -> dict:
body = {
aeskey: str) -> dict:
return self._post("ilink/bot/getuploadurl", {
"filekey": filekey,
"media_type": media_type,
"to_user_id": to_user_id,
@@ -183,14 +190,8 @@ class WeixinApi:
"rawfilemd5": rawfilemd5,
"filesize": filesize,
"aeskey": aeskey,
}
if thumb_rawsize > 0:
body["thumb_rawsize"] = thumb_rawsize
body["thumb_rawfilemd5"] = thumb_rawfilemd5
body["thumb_filesize"] = thumb_filesize
else:
body["no_need_thumb"] = True
return self._post("ilink/bot/getuploadurl", body)
"no_need_thumb": True,
})
# ── getConfig / sendTyping ─────────────────────────────────────────
@@ -218,7 +219,10 @@ class WeixinApi:
def poll_qr_status(self, qrcode: str, timeout: int = QR_POLL_TIMEOUT) -> dict:
url = (_ensure_trailing_slash(self.base_url) +
f"ilink/bot/get_qrcode_status?qrcode={requests.utils.quote(qrcode)}")
headers = {"iLink-App-ClientVersion": "1"}
headers = {
"iLink-App-Id": "bot",
"iLink-App-ClientVersion": CLIENT_VERSION,
}
try:
resp = requests.get(url, headers=headers, timeout=timeout)
resp.raise_for_status()
@@ -259,10 +263,18 @@ def _md5_bytes(data: bytes) -> str:
return hashlib.md5(data).hexdigest()
def _aes_ecb_padded_size(plaintext_size: int) -> int:
"""PKCS7 padded size for AES-128-ECB."""
return ((plaintext_size + 1 + 15) // 16) * 16
UPLOAD_MAX_RETRIES = 3
def upload_media_to_cdn(api: WeixinApi, file_path: str, to_user_id: str,
media_type: int) -> dict:
"""
Upload a local file to the Weixin CDN.
Upload a local file to the Weixin CDN (matching official plugin protocol).
Args:
api: WeixinApi instance
@@ -275,75 +287,79 @@ def upload_media_to_cdn(api: WeixinApi, file_path: str, to_user_id: str,
"""
aes_key = os.urandom(16)
aes_key_hex = aes_key.hex()
filekey = uuid.uuid4().hex
with open(file_path, "rb") as f:
raw_data = f.read()
raw_size = len(raw_data)
raw_md5 = _md5_bytes(raw_data)
cipher_size = _aes_ecb_padded_size(raw_size)
encrypted = _aes_ecb_encrypt(raw_data, aes_key)
cipher_size = len(encrypted)
filekey = uuid.uuid4().hex
thumb_rawsize = 0
thumb_rawfilemd5 = ""
thumb_filesize = 0
from urllib.parse import quote
if media_type == 1: # IMAGE - generate a tiny thumbnail
download_param = None
last_error = None
for attempt in range(1, UPLOAD_MAX_RETRIES + 1):
try:
from PIL import Image
import io
img = Image.open(file_path)
img.thumbnail((100, 100))
buf = io.BytesIO()
img.save(buf, format="JPEG", quality=60)
thumb_raw = buf.getvalue()
thumb_rawsize = len(thumb_raw)
thumb_rawfilemd5 = _md5_bytes(thumb_raw)
thumb_encrypted = _aes_ecb_encrypt(thumb_raw, aes_key)
thumb_filesize = len(thumb_encrypted)
except Exception as e:
logger.warning(f"[Weixin] Thumbnail generation failed, skipping: {e}")
if attempt > 1:
filekey = uuid.uuid4().hex
resp = api.get_upload_url(
filekey=filekey,
media_type=media_type,
to_user_id=to_user_id,
rawsize=raw_size,
rawfilemd5=raw_md5,
filesize=cipher_size,
aeskey=aes_key_hex,
)
resp = api.get_upload_url(
filekey=filekey,
media_type=media_type,
to_user_id=to_user_id,
rawsize=raw_size,
rawfilemd5=raw_md5,
filesize=cipher_size,
aeskey=aes_key_hex,
thumb_rawsize=thumb_rawsize,
thumb_rawfilemd5=thumb_rawfilemd5,
thumb_filesize=thumb_filesize,
)
# API may return either upload_full_url (new) or upload_param (legacy)
upload_full_url = resp.get("upload_full_url", "")
upload_param = resp.get("upload_param", "")
if upload_full_url:
cdn_url = upload_full_url
elif upload_param:
cdn_url = (f"{api.cdn_base_url}/upload"
f"?encrypted_query_param={quote(upload_param)}"
f"&filekey={quote(filekey)}")
else:
raise RuntimeError(f"[Weixin] getUploadUrl returned neither upload_full_url nor upload_param: {resp}")
upload_param = resp.get("upload_param", "")
if not upload_param:
raise RuntimeError(f"[Weixin] getUploadUrl returned no upload_param: {resp}")
cdn_url = api.cdn_base_url + "?" + upload_param
put_resp = requests.put(cdn_url, data=encrypted, headers={
"Content-Type": "application/octet-stream",
"Content-Length": str(cipher_size),
}, timeout=60)
put_resp.raise_for_status()
# Upload thumbnail if we have one
thumb_upload_param = resp.get("thumb_upload_param", "")
if thumb_upload_param and thumb_filesize > 0:
thumb_cdn_url = api.cdn_base_url + "?" + thumb_upload_param
try:
requests.put(thumb_cdn_url, data=thumb_encrypted, headers={
cdn_resp = requests.post(cdn_url, data=encrypted, headers={
"Content-Type": "application/octet-stream",
"Content-Length": str(thumb_filesize),
}, timeout=30)
"Content-Length": str(len(encrypted)),
}, timeout=120)
if 400 <= cdn_resp.status_code < 500:
err_msg = cdn_resp.headers.get("x-error-message", cdn_resp.text[:200])
raise RuntimeError(f"CDN client error {cdn_resp.status_code}: {err_msg}")
cdn_resp.raise_for_status()
download_param = cdn_resp.headers.get("x-encrypted-param", "")
if not download_param:
raise RuntimeError("CDN response missing x-encrypted-param header")
logger.debug(f"[Weixin] CDN upload success attempt={attempt} filekey={filekey}")
break
except Exception as e:
logger.warning(f"[Weixin] Thumbnail upload failed (non-fatal): {e}")
last_error = e
if "client error" in str(e):
raise
if attempt < UPLOAD_MAX_RETRIES:
backoff = 2 ** attempt
logger.warning(f"[Weixin] CDN upload attempt {attempt} failed, retrying in {backoff}s: {e}")
time.sleep(backoff)
else:
logger.error(f"[Weixin] CDN upload failed after {UPLOAD_MAX_RETRIES} attempts: {e}")
if not download_param:
raise last_error or RuntimeError("CDN upload failed")
aes_key_b64 = base64.b64encode(aes_key_hex.encode("utf-8")).decode("utf-8")
return {
"encrypt_query_param": upload_param,
"aes_key_b64": base64.b64encode(aes_key).decode("utf-8"),
"encrypt_query_param": download_param,
"aes_key_b64": aes_key_b64,
"ciphertext_size": cipher_size,
"raw_size": raw_size,
}
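The new CDN upload loop retries up to `UPLOAD_MAX_RETRIES` times with exponential backoff (`2 ** attempt` seconds). It fails immediately on 4xx responses and regenerates `filekey` on every retry after the first. Below is a generic sketch of that retry policy. It swaps the diff's `"client error" in str(e)` string check for a dedicated exception class, and passes the attempt number to the callback so that the caller can pick a fresh key. `ClientError` and `retry_with_backoff` are hypothetical names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ClientError(RuntimeError):
    """Raised for 4xx responses, which retrying will not fix."""


def retry_with_backoff(attempt_fn: Callable[[int], T], max_retries: int = 3,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call attempt_fn(1..max_retries), sleeping 2**attempt seconds between failures."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_retries + 1):
        try:
            return attempt_fn(attempt)  # caller may derive a new filekey when attempt > 1
        except ClientError:
            raise  # fail fast: the request itself is wrong
        except Exception as e:
            last_error = e
            if attempt < max_retries:
                sleep(2 ** attempt)  # 2s, then 4s with the default of 3 attempts
    raise last_error if last_error else RuntimeError("upload failed")
```

With three attempts, a transient failure costs at most 2 s + 4 s of waiting before the last error is re-raised.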
@@ -363,19 +379,30 @@ def download_media_from_cdn(cdn_base_url: str, encrypt_query_param: str,
Returns:
save_path on success
"""
url = cdn_base_url + "?" + encrypt_query_param
from urllib.parse import quote
url = f"{cdn_base_url}/download?encrypted_query_param={quote(encrypt_query_param)}"
resp = requests.get(url, timeout=60)
resp.raise_for_status()
# Determine key format (hex string or base64)
# Determine key format:
# 1) 32-char hex string → 16 raw bytes
# 2) base64 string → decode → if 32 bytes, treat as hex-encoded → 16 raw bytes
# 3) base64 string → decode → 16 raw bytes directly
try:
key_bytes = bytes.fromhex(aes_key)
if len(key_bytes) != 16:
raise ValueError()
except (ValueError, TypeError):
key_bytes = base64.b64decode(aes_key)
if len(key_bytes) != 16:
raise ValueError(f"Invalid AES key length: {len(key_bytes)}")
decoded = base64.b64decode(aes_key)
if len(decoded) == 32:
try:
key_bytes = bytes.fromhex(decoded.decode("ascii"))
except (ValueError, UnicodeDecodeError):
raise ValueError(f"Invalid AES key: 32 bytes but not valid hex")
elif len(decoded) == 16:
key_bytes = decoded
else:
raise ValueError(f"Invalid AES key length after base64 decode: {len(decoded)}")
decrypted = _aes_ecb_decrypt(resp.content, key_bytes)
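`download_media_from_cdn` now accepts the AES key in three encodings: a 32-character hex string, base64 of that hex string (the form `upload_media_to_cdn` now returns as `aes_key_b64`), or base64 of the raw 16 bytes. A minimal standalone sketch of the same resolution order (`normalize_aes_key` is a hypothetical name):

```python
import base64


def normalize_aes_key(aes_key: str) -> bytes:
    """Return the 16 raw AES-128 key bytes from any of the three accepted encodings."""
    # 1) 32-char hex string -> 16 bytes
    try:
        key = bytes.fromhex(aes_key)
        if len(key) == 16:
            return key
    except ValueError:
        pass
    decoded = base64.b64decode(aes_key)  # binascii.Error is a ValueError subclass
    if len(decoded) == 32:
        # 2) base64 of a hex string -> decode the hex text to 16 bytes
        try:
            return bytes.fromhex(decoded.decode("ascii"))
        except (ValueError, UnicodeDecodeError):
            raise ValueError("Invalid AES key: 32 bytes but not valid hex")
    if len(decoded) == 16:
        # 3) base64 of the raw key bytes
        return decoded
    raise ValueError(f"Invalid AES key length after base64 decode: {len(decoded)}")
```

The order matters. Hex is tried first because every hex string is also valid base64 text, whereas a padded base64 string (with `=`) can never parse as hex.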

@@ -31,6 +31,8 @@ BACKOFF_DELAY = 30
RETRY_DELAY = 2
SESSION_EXPIRED_ERRCODE = -14
TEXT_CHUNK_LIMIT = 4000
QR_LOGIN_TIMEOUT_S = 480
QR_MAX_REFRESHES = 10
def _load_credentials(cred_path: str) -> dict:
@@ -80,6 +82,8 @@ class WeixinChannel(ChatChannel):
# ── Lifecycle ──────────────────────────────────────────────────────
def startup(self):
self._stop_event.clear()
base_url = conf().get("weixin_base_url", DEFAULT_BASE_URL)
cdn_base_url = conf().get("weixin_cdn_base_url", CDN_BASE_URL)
token = conf().get("weixin_token", "")
@@ -95,17 +99,9 @@ class WeixinChannel(ChatChannel):
base_url = creds["base_url"]
if not token:
logger.info("[Weixin] No token found, starting QR login...")
self.login_status = self.LOGIN_STATUS_WAITING
login_result = self._qr_login(base_url)
if not login_result:
self.login_status = self.LOGIN_STATUS_IDLE
err = "[Weixin] QR login failed. Set weixin_token in config or run login again."
logger.error(err)
self.report_startup_error(err)
token, base_url = self._login_with_retry(base_url)
if not token:
return
token = login_result["token"]
base_url = login_result.get("base_url", base_url)
self.api = WeixinApi(base_url=base_url, token=token, cdn_base_url=cdn_base_url)
self.login_status = self.LOGIN_STATUS_OK
@@ -114,9 +110,26 @@ class WeixinChannel(ChatChannel):
f"如需重新扫码登录请删除该文件后重启")
self.report_startup_success()
self._stop_event.clear()
self._poll_loop()
def _login_with_retry(self, base_url: str) -> tuple:
"""Attempt QR login, then wait for stop if failed.
Returns (token, base_url) on success, or ("", "") if stopped."""
logger.info("[Weixin] No token found, starting QR login...")
self.login_status = self.LOGIN_STATUS_WAITING
login_result = self._qr_login(base_url)
if login_result:
return login_result["token"], login_result.get("base_url", base_url)
self.login_status = self.LOGIN_STATUS_IDLE
if not self._stop_event.is_set():
logger.info("[Weixin] QR login timed out, waiting for stop or reconnect...")
print(" 二维码登录超时,请通过控制台重新接入\n")
self._stop_event.wait()
logger.info("[Weixin] Login cancelled by stop event")
return "", ""
def stop(self):
logger.info("[Weixin] stop() called")
self._stop_event.set()
@@ -153,10 +166,18 @@ class WeixinChannel(ChatChannel):
print("=" * 60)
try:
import qrcode as qr_lib
import io
qr = qr_lib.QRCode(error_correction=qr_lib.constants.ERROR_CORRECT_L, box_size=1, border=1)
qr.add_data(qrcode_url)
qr.make(fit=True)
qr.print_ascii(invert=True)
buf = io.StringIO()
qr.print_ascii(out=buf, invert=True)
try:
print(buf.getvalue())
except UnicodeEncodeError:
# Windows GBK terminals cannot render Unicode block characters
print(f"\n (终端不支持显示二维码,请使用链接扫码)")
print(f" 二维码链接: {qrcode_url}\n")
except ImportError:
print(f"\n 二维码链接: {qrcode_url}")
print(" (安装 'qrcode' 包可在终端显示二维码)\n")
@@ -202,14 +223,21 @@ class WeixinChannel(ChatChannel):
return {}
self._current_qr_url = qrcode_url
logger.info(f"[Weixin] QR code URL: {qrcode_url}")
logger.info(f"[Weixin] 微信二维码链接: {qrcode_url}")
self._print_qr(qrcode_url)
self._notify_cloud_qrcode(qrcode_url)
print(" 等待扫码...\n")
scanned_printed = False
refresh_count = 0
deadline = time.time() + QR_LOGIN_TIMEOUT_S
while not self._stop_event.is_set():
if time.time() >= deadline:
logger.warning(f"[Weixin] QR login timed out after {QR_LOGIN_TIMEOUT_S}s")
print(f"\n 二维码登录超时({QR_LOGIN_TIMEOUT_S}s),请重启后重试")
break
try:
status_resp = api.poll_qr_status(qrcode)
except Exception as e:
@@ -226,14 +254,19 @@ class WeixinChannel(ChatChannel):
print(" 已扫码,请在手机上确认...")
scanned_printed = True
elif status == "expired":
print(" 二维码已过期,正在刷新...")
refresh_count += 1
if refresh_count >= QR_MAX_REFRESHES:
logger.warning(f"[Weixin] QR code refreshed {QR_MAX_REFRESHES} times, giving up")
print(f"\n 二维码已刷新 {QR_MAX_REFRESHES} 次仍未扫码,请重启后重试")
break
print(f" 二维码已过期,正在刷新({refresh_count}/{QR_MAX_REFRESHES})...")
try:
qr_resp = api.fetch_qr_code()
qrcode = qr_resp.get("qrcode", "")
qrcode_url = qr_resp.get("qrcode_img_content", "")
scanned_printed = False
self._current_qr_url = qrcode_url
logger.info(f"[Weixin] New QR code: {qrcode_url}")
logger.info(f"[Weixin] 微信二维码链接 ({refresh_count}/{QR_MAX_REFRESHES}): {qrcode_url}")
self._print_qr(qrcode_url)
self._notify_cloud_qrcode(qrcode_url)
except Exception as e:
@@ -267,8 +300,9 @@ class WeixinChannel(ChatChannel):
self._stop_event.wait(1)
logger.info("[Weixin] QR login cancelled by stop event")
self._current_qr_url = ""
if self._stop_event.is_set():
logger.info("[Weixin] QR login cancelled by stop event")
return {}
# ── Long-poll loop ─────────────────────────────────────────────────
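The QR login loop is now bounded in two ways: an overall deadline (`QR_LOGIN_TIMEOUT_S = 480`) and a cap on how many times an expired code is refreshed (`QR_MAX_REFRESHES = 10`). The sketch below reduces the loop to those two limits, assuming injected `poll`/`refresh` callbacks and a fake-able clock. The `"confirmed"` status string is an assumption, since the diff only shows the `"expired"` branch. This version uses `time.monotonic` instead of the diff's `time.time`, so wall-clock adjustments cannot stretch or cut the deadline.

```python
import time
from typing import Callable, Dict, Optional


def wait_for_qr_login(poll: Callable[[], Dict], refresh: Callable[[], None],
                      timeout_s: float = 480, max_refreshes: int = 10,
                      clock: Callable[[], float] = time.monotonic,
                      sleep: Callable[[float], None] = time.sleep) -> Optional[Dict]:
    """Poll until confirmed; give up on the overall deadline or too many refreshes."""
    deadline = clock() + timeout_s
    refresh_count = 0
    while clock() < deadline:
        resp = poll()
        status = resp.get("status")
        if status == "confirmed":  # assumed success status name
            return resp
        if status == "expired":
            refresh_count += 1
            if refresh_count >= max_refreshes:
                return None  # stop cycling codes nobody is scanning
            refresh()
        sleep(1)
    return None  # overall deadline reached
```

As in the diff, the counter is incremented before the check, so with `max_refreshes=10` the code is actually re-fetched nine times before the loop gives up.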

@@ -184,12 +184,16 @@ class WeixinMessage(ChatMessage):
logger.warning(f"[Weixin] Missing CDN params for media download (type={media_type})")
return ""
ext_map = {ITEM_IMAGE: ".jpg", ITEM_VIDEO: ".mp4", ITEM_FILE: "", ITEM_VOICE: ".silk"}
ext = ext_map.get(media_type, "")
if media_type == ITEM_FILE:
ext = os.path.splitext(info.get("file_name", ""))[1] or ".bin"
save_path = os.path.join(_get_tmp_dir(), f"wx_{self.msg_id}{ext}")
original_name = info.get("file_name", "")
if original_name:
save_path = os.path.join(_get_tmp_dir(), original_name)
else:
save_path = os.path.join(_get_tmp_dir(), f"wx_{self.msg_id}.bin")
else:
ext_map = {ITEM_IMAGE: ".jpg", ITEM_VIDEO: ".mp4", ITEM_VOICE: ".silk"}
ext = ext_map.get(media_type, "")
save_path = os.path.join(_get_tmp_dir(), f"wx_{self.msg_id}{ext}")
try:
download_media_from_cdn(cdn_base_url, encrypt_param, aes_key, save_path)
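Downloaded files now keep the sender's original file name, falling back to `wx_<msg_id>.bin`. Other media types keep the `wx_<msg_id>` prefix with a fixed extension. The sketch below shows that branching with string labels standing in for the `ITEM_*` constants, which are hypothetical stand-ins here. It also adds an `os.path.basename()` call that is not in the diff. That call is an extra assumption, included so that a name containing directory parts cannot escape the temp directory.

```python
import os


def media_save_path(tmp_dir: str, msg_id: str, media_type: str, file_name: str = "") -> str:
    """Choose where a downloaded Weixin media item is written (illustrative sketch)."""
    if media_type == "file":
        if file_name:
            # basename() is an added safeguard: drop any directory components
            return os.path.join(tmp_dir, os.path.basename(file_name))
        return os.path.join(tmp_dir, f"wx_{msg_id}.bin")
    ext_map = {"image": ".jpg", "video": ".mp4", "voice": ".silk"}
    return os.path.join(tmp_dir, f"wx_{msg_id}{ext_map.get(media_type, '')}")
```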

cli/VERSION Normal file

@@ -0,0 +1 @@
2.0.6

cli/__init__.py Normal file

@@ -0,0 +1,13 @@
"""CowAgent CLI - Manage your CowAgent from the command line."""
import os as _os
def _read_version():
version_file = _os.path.join(_os.path.dirname(_os.path.abspath(__file__)), "VERSION")
try:
with open(version_file, "r") as f:
return f.read().strip()
except FileNotFoundError:
return "0.0.0"
__version__ = _read_version()

cli/__main__.py Normal file

@@ -0,0 +1,4 @@
"""Allow running as: python -m cli"""
from cli.cli import main
main()

cli/cli.py Normal file

@@ -0,0 +1,79 @@
"""CowAgent CLI entry point."""
import click
from cli import __version__
from cli.commands.skill import skill
from cli.commands.process import start, stop, restart, update, status, logs
from cli.commands.context import context
from cli.commands.install import install_browser
from cli.commands.knowledge import knowledge
HELP_TEXT = """Usage: cow COMMAND [ARGS]...
CowAgent CLI - Manage your CowAgent instance.
Commands:
help Show this message.
version Show the version.
start Start CowAgent.
stop Stop CowAgent.
restart Restart CowAgent.
update Update CowAgent and restart.
status Show CowAgent running status.
logs View CowAgent logs.
skill Manage CowAgent skills.
knowledge Manage knowledge base.
install-browser Install browser tool (Playwright + Chromium).
Tip: You can also send /help, /skill list, etc. in agent chat."""
class CowCLI(click.Group):
def format_help(self, ctx, formatter):
formatter.write(HELP_TEXT.strip())
formatter.write("\n")
def parse_args(self, ctx, args):
if args and args[0] == 'help':
click.echo(HELP_TEXT.strip())
ctx.exit(0)
return super().parse_args(ctx, args)
@click.group(cls=CowCLI, invoke_without_command=True, context_settings=dict(help_option_names=[]))
@click.pass_context
def main(ctx):
"""CowAgent CLI - Manage your CowAgent instance."""
if ctx.invoked_subcommand is None:
click.echo(HELP_TEXT.strip())
@main.command()
def version():
"""Show the version."""
click.echo(f"cow {__version__}")
@main.command(name='help')
@click.pass_context
def help_cmd(ctx):
"""Show this message."""
click.echo(HELP_TEXT.strip())
main.add_command(skill)
main.add_command(start)
main.add_command(stop)
main.add_command(restart)
main.add_command(update)
main.add_command(status)
main.add_command(logs)
main.add_command(context)
main.add_command(knowledge)
main.add_command(install_browser)
if __name__ == '__main__':
main()

cli/commands/__init__.py Normal file

cli/commands/context.py Normal file

@@ -0,0 +1,29 @@
"""cow context - Context management commands."""
import click
CHAT_HINT = (
"Context commands operate on the running agent's memory.\n"
"Please send the command in a chat conversation instead:\n\n"
" /context - View current context info\n"
" /context clear - Clear conversation context"
)
@click.group(invoke_without_command=True)
@click.pass_context
def context(ctx):
"""View or manage conversation context.
Context commands need access to the running agent's memory.
Use them in chat conversations: /context or /context clear
"""
if ctx.invoked_subcommand is None:
click.echo(f"\n {CHAT_HINT}\n")
@context.command()
def clear():
"""Clear conversation context (messages history)."""
click.echo(f"\n {CHAT_HINT}\n")
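In `cli/commands/install.py`, the install path depends on three checks. A glibc older than 2.28 selects the legacy Playwright pin. Only headless Linux, outside legacy mode and with Playwright at least 1.57.0, gets `--only-shell` appended to the Chromium install. Version strings are compared as integer tuples. The sketch below isolates those decisions as pure functions so that they can be checked without running pip. The function names and parameters are hypothetical, while the version numbers and thresholds come from the source.

```python
from typing import List, Optional, Tuple


def version_tuple(v: str) -> Tuple[int, ...]:
    """Turn "1.52.0" into (1, 52, 0); unparsable input sorts as (0, 0, 0)."""
    try:
        return tuple(int(x) for x in v.split(".")[:3])
    except (ValueError, AttributeError):
        return (0, 0, 0)


def pick_playwright_version(glibc: Optional[Tuple[int, int]]) -> str:
    """Legacy pin for glibc < 2.28; unknown glibc (e.g. non-Linux) gets the current pin."""
    return "1.28.0" if glibc and glibc < (2, 28) else "1.52.0"


def chromium_install_cmd(python: str, installed: str, platform: str,
                         has_display: bool, legacy_mode: bool) -> List[str]:
    """Build the `playwright install chromium` command line."""
    cmd = [python, "-m", "playwright", "install", "chromium"]
    headless_linux = platform == "linux" and not has_display
    # Headless shell only on display-less Linux with a new enough Playwright
    if headless_linux and not legacy_mode and version_tuple(installed) >= (1, 57, 0):
        cmd.append("--only-shell")
    return cmd
```

With Playwright pinned at 1.52.0 as in the source, a successful install reports 1.52.0. In that case the `--only-shell` branch is not taken, and full Chromium is installed even on headless servers.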

cli/commands/install.py Normal file

@@ -0,0 +1,259 @@
"""cow install-browser - Install Playwright + Chromium for the browser tool."""
import os
import sys
import subprocess
from typing import Callable, Optional
import click
PLAYWRIGHT_VERSION = "1.52.0"
PLAYWRIGHT_LEGACY_VERSION = "1.28.0"
GLIBC_THRESHOLD = (2, 28)
CHINA_MIRROR = "https://registry.npmmirror.com/-/binary/playwright"
# stream(msg, fg=None) — fg is "yellow" | "green" | "red" | None
StreamFn = Callable[[str, Optional[str]], None]
# on_phase(msg) — coarse-grained progress for chat channels (Chinese)
PhaseFn = Callable[[str], None]
def _phase(cb: Optional[PhaseFn], msg: str) -> None:
if cb:
cb(msg)
def _has_display() -> bool:
"""Check if a graphical display is available (Linux only)."""
return bool(os.environ.get("DISPLAY") or os.environ.get("WAYLAND_DISPLAY"))
def _is_headless_linux() -> bool:
return sys.platform == "linux" and not _has_display()
def _get_installed_version() -> str:
try:
out = subprocess.check_output(
[sys.executable, "-c", "import playwright; print(playwright.__version__)"],
stderr=subprocess.DEVNULL,
)
return out.decode().strip()
except Exception:
return ""
def _version_tuple(v: str):
try:
return tuple(int(x) for x in v.split(".")[:3])
except (ValueError, AttributeError):
return (0, 0, 0)
def _get_glibc_version():
if sys.platform != "linux":
return None
try:
import ctypes
libc = ctypes.CDLL("libc.so.6")
gnu_get_libc_version = libc.gnu_get_libc_version
gnu_get_libc_version.restype = ctypes.c_char_p
ver = gnu_get_libc_version().decode()
parts = ver.split(".")
return (int(parts[0]), int(parts[1]))
except Exception:
return None
def _is_china_network() -> bool:
try:
out = subprocess.check_output(
[sys.executable, "-m", "pip", "config", "get", "global.index-url"],
stderr=subprocess.DEVNULL,
)
url = out.decode().strip().lower()
return any(kw in url for kw in ("tsinghua", "aliyun", "npmmirror", "douban", "ustc", "huawei", "tencentyun"))
except Exception:
return False
def _pip_install(package_spec: str, stream: StreamFn) -> int:
"""Install a package, retrying with --user on permission failure."""
python = sys.executable
ret = subprocess.call([python, "-m", "pip", "install", package_spec])
if ret != 0:
stream(" Retrying with --user flag...", "yellow")
ret = subprocess.call([python, "-m", "pip", "install", "--user", package_spec])
return ret
def _default_stream(msg: str, fg: Optional[str] = None) -> None:
"""CLI: colored click output."""
if fg == "yellow":
click.echo(click.style(msg, fg="yellow"))
elif fg == "green":
click.echo(click.style(msg, fg="green"))
elif fg == "red":
click.echo(click.style(msg, fg="red"))
else:
click.echo(msg)
def run_install_browser(
stream: Optional[StreamFn] = None,
on_phase: Optional[PhaseFn] = None,
) -> int:
"""
Install Playwright Python package, optional Linux deps, and Chromium.
Reused by ``cow install-browser`` CLI and chat ``/install-browser``.
Args:
stream: Optional callback ``(message, fg)`` for each line. ``fg`` is
``yellow`` / ``green`` / ``red`` or None. Defaults to colored click output.
on_phase: Optional callback for coarse progress (e.g. push to chat);
messages are short Chinese status lines.
Returns:
0 on success, 1 on fatal failure (pip or chromium install failed).
"""
stream = stream or _default_stream
python = sys.executable
legacy_mode = False
_phase(on_phase, "🔧 开始安装浏览器工具依赖(约几分钟,请耐心等待)…")
glibc = _get_glibc_version()
if glibc and glibc < GLIBC_THRESHOLD:
legacy_mode = True
glibc_str = f"{glibc[0]}.{glibc[1]}"
stream(
f"glibc {glibc_str} detected (< 2.28). "
f"Will install playwright {PLAYWRIGHT_LEGACY_VERSION} for compatibility.",
"yellow",
)
stream(" Note: upgrade your OS for full browser tool support.", "yellow")
stream("")
_phase(
on_phase,
f" 检测到 glibc {glibc_str}(较旧),将安装兼容版 Playwright {PLAYWRIGHT_LEGACY_VERSION}",
)
target_version = PLAYWRIGHT_LEGACY_VERSION if legacy_mode else PLAYWRIGHT_VERSION
_phase(on_phase, "📦 [1/3] 正在安装 Playwright Python 包…")
stream("[1/3] Installing playwright Python package...", "yellow")
ret = _pip_install(f"playwright=={target_version}", stream)
if ret != 0:
stream("Failed to install playwright package.", "red")
_phase(on_phase, "❌ [1/3] Playwright Python 包安装失败。")
return 1
installed = _get_installed_version()
if installed:
stream(f" playwright {installed} installed.", "green")
stream("")
_phase(on_phase, f"✅ [1/3] Playwright 包已安装({installed or target_version})。")
if sys.platform == "linux":
_phase(on_phase, "🔧 [2/3] 正在安装 Linux 系统依赖与轻量中文字体(文泉驿正黑,部分步骤可能需要 sudo)")
stream("[2/3] Installing system dependencies (Linux)...", "yellow")
ret = subprocess.call([python, "-m", "playwright", "install-deps", "chromium"])
if ret != 0:
stream(
" Could not auto-install system deps (may need sudo).\n"
f" Run manually: sudo {python} -m playwright install-deps chromium",
"yellow",
)
# Prefer fonts-wqy-zenhei only (a few MB); fonts-noto-cjk is much larger (~150 MB+).
stream(" Installing CJK font (fonts-wqy-zenhei, lightweight)...")
font_ret = subprocess.call(
["sudo", "apt-get", "install", "-y", "--no-install-recommends", "fonts-wqy-zenhei"],
stderr=subprocess.DEVNULL,
)
if font_ret != 0:
stream(
" Could not auto-install CJK font.\n"
" Run manually: sudo apt-get install -y fonts-wqy-zenhei\n"
" (Optional, larger full coverage: sudo apt-get install -y fonts-noto-cjk)",
"yellow",
)
else:
subprocess.call(["fc-cache", "-fv"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
stream(" CJK font (wqy-zenhei) installed.", "green")
_phase(
on_phase,
"✅ [2/3] Linux 依赖与字体步骤已执行(若有权限问题请查看服务器日志或手动执行提示命令)。",
)
else:
stream(f"[2/3] Skipping system deps (not needed on {sys.platform}).", "yellow")
_phase(on_phase, f" [2/3] 当前系统({sys.platform})跳过 Linux 专用依赖。")
stream("")
_phase(on_phase, "🌐 [3/3] 正在下载并安装 Chromium(体积较大,请耐心等待)")
stream("[3/3] Installing Chromium browser...", "yellow")
cmd = [python, "-m", "playwright", "install", "chromium"]
if _is_headless_linux() and not legacy_mode:
ver = _version_tuple(installed or "")
if ver >= (1, 57, 0):
cmd.append("--only-shell")
stream(" (headless shell for Linux server)", None)
else:
stream(" (full Chromium)", None)
elif sys.platform == "linux" and _has_display():
stream(" (full browser for Linux desktop)", None)
env = os.environ.copy()
use_mirror = _is_china_network()
if use_mirror:
env["PLAYWRIGHT_DOWNLOAD_HOST"] = CHINA_MIRROR
stream(f" (using China mirror: {CHINA_MIRROR})", None)
_phase(on_phase, "📡 检测到国内 pip 源配置,Chromium 将优先走国内镜像下载。")
ret = subprocess.call(cmd, env=env)
if ret != 0 and use_mirror:
stream(" Mirror download failed, retrying with official CDN...", "yellow")
_phase(on_phase, "⚠️ 镜像下载失败,正在改用官方源重试…")
env_no_mirror = os.environ.copy()
env_no_mirror.pop("PLAYWRIGHT_DOWNLOAD_HOST", None)
ret = subprocess.call(cmd, env=env_no_mirror)
if ret != 0:
stream("Failed to install Chromium.", "red")
_phase(on_phase, "❌ [3/3] Chromium 安装失败。")
return 1
stream("")
_phase(on_phase, "✅ [3/3] Chromium 已安装。")
stream("Verifying browser installation...", None)
_phase(on_phase, "🔍 正在验证 Playwright 能否正常加载…")
ret = subprocess.call(
[python, "-c", "from playwright.sync_api import sync_playwright; print('OK')"],
stderr=subprocess.DEVNULL,
)
if ret != 0:
stream(
" Warning: playwright import failed. Browser tool may not work on this system.\n"
" Consider upgrading your OS or using Docker.",
"yellow",
)
_phase(on_phase, "⚠️ 验证未完全通过:本机可能仍无法使用浏览器工具,请查看日志或升级系统。")
else:
stream(" Verification passed.", "green")
_phase(on_phase, "✅ 验证通过。")
stream("")
stream("Browser tool ready! Restart CowAgent to enable it.", "green")
_phase(on_phase, "🎉 全部步骤结束。请重启 CowAgent 后使用 browser 工具。")
return 0
@click.command("install-browser")
def install_browser():
    """Install browser tool dependencies (Playwright + Chromium)."""
    code = run_install_browser()
    if code != 0:
        raise SystemExit(code)
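The install routine above calls helpers defined outside this hunk (`_version_tuple`, `GLIBC_THRESHOLD`, and whatever produces `glibc`). As a rough sketch of how they could work, assuming dotted numeric version strings and the standard library's `platform.libc_ver()` as the glibc probe, the helpers below parse versions into tuples and reproduce the `--only-shell` decision. `version_tuple`, `detect_glibc` and `chromium_install_args` are hypothetical names, not the project's API.

```python
import platform
import re
from typing import Optional, Tuple

# Hypothetical stand-in for GLIBC_THRESHOLD; the message above says "< 2.28".
GLIBC_THRESHOLD = (2, 28)


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn "1.57.0" into (1, 57, 0); stop at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def detect_glibc() -> Optional[Tuple[int, ...]]:
    """Return the glibc version as a tuple, or None when not running on glibc."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None
    return version_tuple(version)


def chromium_install_args(installed: str, headless_linux: bool, legacy_mode: bool) -> list:
    """Mirror the branch above: --only-shell only on headless Linux with playwright >= 1.57."""
    args = ["install", "chromium"]
    if headless_linux and not legacy_mode and version_tuple(installed) >= (1, 57, 0):
        args.append("--only-shell")
    return args
```

In this sketch an unknown version parses to `()`, which compares lower than `(1, 57, 0)`, so the decision falls back to the full Chromium download, the conservative choice.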

cli/commands/knowledge.py Normal file (121 lines changed)

@@ -0,0 +1,121 @@
"""cow knowledge - Knowledge base management commands."""

import os

import click

from cli.utils import ensure_sys_path


def _get_knowledge_dir():
    """Resolve the knowledge directory path from config or default."""
    try:
        ensure_sys_path()
        from config import conf
        from common.utils import expand_path
        workspace = expand_path(conf().get("agent_workspace", "~/cow"))
    except Exception:
        workspace = os.path.expanduser("~/cow")
    return os.path.join(workspace, "knowledge")


def _get_knowledge_enabled():
    """Read the `knowledge` switch from config; default to enabled."""
    try:
        ensure_sys_path()
        from config import conf
        return conf().get("knowledge", True)
    except Exception:
        return True


@click.group(invoke_without_command=True)
@click.pass_context
def knowledge(ctx):
    """Manage CowAgent knowledge base."""
    if ctx.invoked_subcommand is None:
        click.echo(_stats())


@knowledge.command("list")
def knowledge_list():
    """Display knowledge base file tree."""
    click.echo(_tree())


def _stats() -> str:
    knowledge_dir = _get_knowledge_dir()
    if not os.path.isdir(knowledge_dir):
        return "Knowledge base directory not found."
    enabled = _get_knowledge_enabled()
    total_files = 0
    total_bytes = 0
    cat_count = {}
    for root, dirs, files in os.walk(knowledge_dir):
        # Prune hidden directories in place so os.walk never descends into them.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        rel_root = os.path.relpath(root, knowledge_dir)
        category = rel_root.split(os.sep)[0] if rel_root != "." else "root"
        for f in files:
            # index.md and log.md are excluded from the page count.
            if f.endswith(".md") and f not in ("index.md", "log.md"):
                total_files += 1
                total_bytes += os.path.getsize(os.path.join(root, f))
                cat_count[category] = cat_count.get(category, 0) + 1
    status_icon = click.style("enabled", fg="green") if enabled else click.style("disabled", fg="red")
    lines = [
        f"\n Knowledge Base [{status_icon}]",
        "",
        f" Pages: {total_files}",
        f" Size: {total_bytes / 1024:.1f} KB",
        "",
    ]
    if cat_count:
        lines.append(" Categories:")
        for cat in sorted(cat_count.keys()):
            lines.append(f" {cat}/ ({cat_count[cat]} pages)")
        lines.append("")
    lines.append(f" Path: {knowledge_dir}")
    lines.append("")
    return "\n".join(lines)


def _tree() -> str:
    knowledge_dir = _get_knowledge_dir()
    if not os.path.isdir(knowledge_dir):
        return "Knowledge base directory not found."
    tree_lines = [" knowledge/"]
    subdirs = sorted([
        d for d in os.listdir(knowledge_dir)
        if os.path.isdir(os.path.join(knowledge_dir, d)) and not d.startswith(".")
    ])
    for i, subdir in enumerate(subdirs):
        is_last_dir = (i == len(subdirs) - 1)
        branch = "└── " if is_last_dir else "├── "
        subdir_path = os.path.join(knowledge_dir, subdir)
        md_files = sorted([
            f for f in os.listdir(subdir_path)
            if f.endswith(".md") and not f.startswith(".")
        ])
        tree_lines.append(f" {branch}{subdir}/ ({len(md_files)})")
        # Continue the vertical rule under every directory except the last one.
        child_prefix = "     " if is_last_dir else " │   "
        max_show = 15
        for j, fname in enumerate(md_files[:max_show]):
            is_last_file = (j == len(md_files[:max_show]) - 1) and len(md_files) <= max_show
            fb = "└── " if is_last_file else "├── "
            name = os.path.splitext(fname)[0]
            tree_lines.append(f"{child_prefix}{fb}{name}")
        if len(md_files) > max_show:
            tree_lines.append(f"{child_prefix}└── ... +{len(md_files) - max_show} more")
    if not subdirs:
        tree_lines.append(" (empty)")
    return "\n" + "\n".join(tree_lines) + "\n"
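Stripped of the CLI formatting, the page count in `_stats` is a pruned `os.walk` keyed on each file's top-level folder. A minimal standalone sketch of that counting step, using `collections.Counter` in place of the dict bookkeeping; `count_pages` is a hypothetical name.

```python
import os
from collections import Counter

SKIP_FILES = {"index.md", "log.md"}


def count_pages(knowledge_dir: str) -> Counter:
    """Count .md pages per top-level category, following the rules in _stats."""
    counts = Counter()
    for root, dirs, files in os.walk(knowledge_dir):
        # Assigning to dirs[:] prunes hidden folders (e.g. .git) from the walk.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        rel = os.path.relpath(root, knowledge_dir)
        # Files directly in knowledge/ count as "root"; nested ones roll up
        # into the first path component.
        category = rel.split(os.sep)[0] if rel != "." else "root"
        for name in files:
            if name.endswith(".md") and name not in SKIP_FILES:
                counts[category] += 1
    return counts
```

Pages in nested folders roll up into their top-level category, and `index.md`/`log.md` are skipped at every depth, as in `_stats`.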

cli/commands/process.py Normal file (317 lines changed)

@@ -0,0 +1,317 @@
"""cow start/stop/restart/update/status/logs - Process management commands."""

import os
import sys
import subprocess
import time
from collections import deque
from typing import Optional

import click

from cli.utils import get_project_root

_IS_WIN = sys.platform == "win32"


def _get_pid_file():
    return os.path.join(get_project_root(), ".cow.pid")


def _get_log_file():
    return os.path.join(get_project_root(), "nohup.out")


def _is_pid_alive(pid: int) -> bool:
    """Check whether a process is still running (cross-platform)."""
    if _IS_WIN:
        try:
            out = subprocess.check_output(
                ["tasklist", "/FI", f"PID eq {pid}", "/NH"],
                stderr=subprocess.DEVNULL,
            )
            return str(pid) in out.decode(errors="ignore")
        except Exception:
            return False
    else:
        try:
            # Signal 0 only checks that the PID exists; nothing is delivered.
            os.kill(pid, 0)
            return True
        except (ProcessLookupError, PermissionError):
            return False


def _kill_pid(pid: int, force: bool = False):
    """Terminate a process by PID (cross-platform)."""
    if _IS_WIN:
        cmd = ["taskkill"]
        if force:
            cmd.append("/F")
        cmd.extend(["/PID", str(pid)])
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    else:
        import signal
        sig = signal.SIGKILL if force else signal.SIGTERM
        os.kill(pid, sig)


def _read_pid() -> Optional[int]:
    """Return the recorded PID if that process is alive; remove stale PID files."""
    pid_file = _get_pid_file()
    if not os.path.exists(pid_file):
        return None
    try:
        with open(pid_file, "r") as f:
            pid = int(f.read().strip())
        if _is_pid_alive(pid):
            return pid
        os.remove(pid_file)
        return None
    except (ValueError, OSError):
        try:
            os.remove(pid_file)
        except OSError:
            pass
        return None


def _write_pid(pid: int):
    with open(_get_pid_file(), "w") as f:
        f.write(str(pid))


def _remove_pid():
    pid_file = _get_pid_file()
    if os.path.exists(pid_file):
        os.remove(pid_file)


@click.command()
@click.option("--foreground", "-f", is_flag=True, help="Run in foreground (don't daemonize)")
@click.option("--no-logs", is_flag=True, help="Don't tail logs after starting")
def start(foreground, no_logs):
    """Start CowAgent."""
    pid = _read_pid()
    if pid:
        click.echo(f"CowAgent is already running (PID: {pid}).")
        return
    root = get_project_root()
    app_py = os.path.join(root, "app.py")
    if not os.path.exists(app_py):
        click.echo("Error: app.py not found in project root.", err=True)
        sys.exit(1)
    python = sys.executable
    if foreground:
        click.echo("Starting CowAgent in foreground...")
        if _IS_WIN:
            sys.exit(subprocess.call([python, app_py], cwd=root))
        else:
            # execv keeps the current working directory, so switch to the
            # project root first, matching cwd=root in the other branches.
            os.chdir(root)
            os.execv(python, [python, app_py])
    else:
        log_file = _get_log_file()
        click.echo("Starting CowAgent...")
        popen_kwargs = dict(cwd=root)
        if _IS_WIN:
            CREATE_NO_WINDOW = 0x08000000
            popen_kwargs["creationflags"] = (
                subprocess.CREATE_NEW_PROCESS_GROUP | CREATE_NO_WINDOW
            )
        else:
            # New session: a terminal hangup (SIGHUP) won't reach the agent.
            popen_kwargs["start_new_session"] = True
        with open(log_file, "a") as log:
            proc = subprocess.Popen(
                [python, app_py],
                stdout=log,
                stderr=log,
                **popen_kwargs,
            )
        _write_pid(proc.pid)
        click.echo(click.style(f"✓ CowAgent started (PID: {proc.pid})", fg="green"))
        click.echo(f" Logs: {log_file}")
        if not no_logs:
            click.echo(" Press Ctrl+C to stop tailing logs.\n")
            _tail_log(log_file)


@click.command()
def stop():
    """Stop CowAgent."""
    pid = _read_pid()
    if not pid:
        click.echo("CowAgent is not running.")
        return
    click.echo(f"Stopping CowAgent (PID: {pid})...")
    try:
        _kill_pid(pid)
        # Wait up to ~3 s for a graceful exit; the for-else force-kills
        # only if the loop finishes without the process going away.
        for _ in range(30):
            time.sleep(0.1)
            if not _is_pid_alive(pid):
                break
        else:
            _kill_pid(pid, force=True)
    except (ProcessLookupError, OSError):
        pass
    _remove_pid()
    click.echo(click.style("✓ CowAgent stopped.", fg="green"))


@click.command()
@click.option("--no-logs", is_flag=True, help="Don't tail logs after restarting")
@click.pass_context
def restart(ctx, no_logs):
    """Restart CowAgent."""
    ctx.invoke(stop)
    time.sleep(1)
    ctx.invoke(start, no_logs=no_logs)


@click.command()
@click.pass_context
def update(ctx):
    """Update CowAgent and restart."""
    root = get_project_root()
    # 1. Stop service first so git pull won't conflict with running code
    ctx.invoke(stop)
    # 2. Git pull
    if os.path.isdir(os.path.join(root, ".git")):
        click.echo("Pulling latest code...")
        ret = subprocess.call(["git", "pull"], cwd=root)
        if ret != 0:
            click.echo("Error: git pull failed.", err=True)
            sys.exit(1)
    else:
        click.echo("Not a git repository, skipping code update.")
    python = sys.executable
    req_file = os.path.join(root, "requirements.txt")
    if _IS_WIN:
        # On Windows, `cow.exe` (this process) locks the exe file, so
        # `pip install -e .` fails with WinError 5. Write a small .bat
        # helper that waits for cow.exe to exit, then installs & starts.
        bat = os.path.join(root, "_cow_update.bat")
        lines = [
            "@echo off",
            "chcp 65001 >nul",
            "echo Waiting for cow.exe to exit...",
            "timeout /t 3 /nobreak >nul",
        ]
        if os.path.exists(req_file):
            lines.append("echo Installing dependencies...")
            lines.append(f'"{python}" -m pip install -r requirements.txt -q')
        lines += [
            "echo Reinstalling cow CLI...",
            f'"{python}" -m pip install -e . -q',
            "echo Starting CowAgent...",
            f'"{python}" -m cli.cli start --no-logs',
            "echo.",
            "echo Update complete. You can close this window.",
            "pause >nul",
            "del \"%~f0\"",
        ]
        with open(bat, "w", encoding="utf-8") as f:
            f.write("\n".join(lines) + "\n")
        subprocess.Popen(
            ["cmd.exe", "/c", "start", "CowAgent Update", "/wait", bat],
            cwd=root,
        )
        click.echo(click.style(
            "✓ Update script launched. Please follow the new window for progress.",
            fg="green"))
    else:
        # 3. Install dependencies
        if os.path.exists(req_file):
            click.echo("Installing dependencies...")
            subprocess.call(
                [python, "-m", "pip", "install", "-r", "requirements.txt", "-q"],
                cwd=root,
            )
        click.echo("Reinstalling cow CLI...")
        subprocess.call(
            [python, "-m", "pip", "install", "-e", ".", "-q"],
            cwd=root,
        )
        # 4. Start service
        click.echo("")
        time.sleep(1)
        ctx.invoke(start, no_logs=False)


@click.command()
def status():
    """Show CowAgent running status."""
    from cli import __version__
    from cli.utils import load_config_json
    pid = _read_pid()
    if pid:
        click.echo(click.style(f"● CowAgent is running (PID: {pid})", fg="green"))
    else:
        click.echo(click.style("● CowAgent is not running", fg="red"))
    click.echo(f" 版本: v{__version__}")
    cfg = load_config_json()
    if cfg:
        channel = cfg.get("channel_type", "unknown")
        if isinstance(channel, list):
            channel = ", ".join(channel)
        click.echo(f" 通道: {channel}")
        click.echo(f" 模型: {cfg.get('model', 'unknown')}")
        mode = "Agent" if cfg.get("agent") else "Chat"
        click.echo(f" 模式: {mode}")


@click.command()
@click.option("--follow", "-f", is_flag=True, help="Follow log output")
@click.option("--lines", "-n", default=50, help="Number of lines to show")
def logs(follow, lines):
    """View CowAgent logs."""
    log_file = _get_log_file()
    if not os.path.exists(log_file):
        click.echo("No log file found.")
        return
    if follow:
        _tail_log(log_file, lines)
    else:
        _print_last_lines(log_file, lines)


def _print_last_lines(file_path: str, n: int = 50):
    """Print the last N lines of a file (cross-platform)."""
    try:
        with open(file_path, "r", encoding="utf-8", errors="replace") as f:
            # deque keeps only the last n lines instead of reading the whole log into a list.
            last_lines = deque(f, maxlen=max(n, 0))
        for line in last_lines:
            click.echo(line, nl=False)
    except Exception as e:
        click.echo(f"Error reading log file: {e}", err=True)


def _tail_log(log_file: str, lines: int = 50):
    """Follow log file output. Blocks until Ctrl+C (cross-platform)."""
    _print_last_lines(log_file, lines)
    try:
        with open(log_file, "r", encoding="utf-8", errors="replace") as f:
            f.seek(0, os.SEEK_END)  # start at the current end, like `tail -f`
            while True:
                line = f.readline()
                if line:
                    click.echo(line, nl=False)
                else:
                    time.sleep(0.3)
    except KeyboardInterrupt:
        pass
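`stop` follows a common POSIX shutdown pattern: send SIGTERM, poll for a bounded time, and escalate to SIGKILL only if the process is still alive, using Python's `for ... else` so the kill runs only when the loop never breaks. A minimal POSIX-only sketch of that pattern; `stop_gracefully` is a hypothetical name. The liveness check is passed in because `os.kill(pid, 0)` still succeeds on an unreaped zombie when the caller is the child's own parent, whereas `cow stop` runs as an unrelated process.

```python
import os
import signal
import time
from typing import Callable


def stop_gracefully(pid: int, is_alive: Callable[[int], bool],
                    timeout: float = 3.0, interval: float = 0.1) -> str:
    """Send SIGTERM, wait up to `timeout` seconds, then SIGKILL (POSIX only)."""
    os.kill(pid, signal.SIGTERM)
    for _ in range(max(1, round(timeout / interval))):
        time.sleep(interval)
        if not is_alive(pid):
            break
    else:
        # Reached only when the loop never hit `break`: SIGTERM was not honoured.
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            return "terminated"  # it exited right after the last check
        return "killed"
    return "terminated"
```

For a child you spawned yourself, `lambda _pid: proc.poll() is None` is a suitable `is_alive`, since `poll()` also reaps the child; for an unrelated PID, a check based on `os.kill(pid, 0)` like `_is_pid_alive` works.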

cli/commands/skill.py Normal file (1463 lines changed)

File diff suppressed because it is too large.

cli/utils.py Normal file (62 lines changed)

@@ -0,0 +1,62 @@
"""Shared utilities for cow CLI."""

import os
import sys
import json


def get_project_root() -> str:
    """Get the CowAgent project root directory."""
    # cli/ is directly under the project root
    return os.path.dirname(os.path.dirname(os.path.abspath(__file__)))


def get_workspace_dir() -> str:
    """Get the agent workspace directory from config, defaulting to ~/cow."""
    config = load_config_json()
    workspace = config.get("agent_workspace", "~/cow")
    return os.path.expanduser(workspace)


def get_skills_dir() -> str:
    """Get the custom skills directory."""
    return os.path.join(get_workspace_dir(), "skills")


def get_builtin_skills_dir() -> str:
    """Get the builtin skills directory."""
    return os.path.join(get_project_root(), "skills")


def load_config_json() -> dict:
    """Load config.json from project root."""
    config_path = os.path.join(get_project_root(), "config.json")
    if not os.path.exists(config_path):
        return {}
    try:
        # utf-8-sig tolerates a BOM, matching config.read_file.
        with open(config_path, "r", encoding="utf-8-sig") as f:
            return json.load(f)
    except Exception:
        return {}


def load_skills_config() -> dict:
    """Load skills_config.json from the custom skills directory."""
    path = os.path.join(get_skills_dir(), "skills_config.json")
    if not os.path.exists(path):
        return {}
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except Exception:
        return {}


def ensure_sys_path():
    """Add project root to sys.path so we can import agent modules."""
    root = get_project_root()
    if root not in sys.path:
        sys.path.insert(0, root)


SKILL_HUB_API = "https://skills.cowagent.ai/api"


@@ -47,13 +47,14 @@ CREDENTIAL_MAP = {
class CloudClient(LinkAIClient):
def __init__(self, api_key: str, channel, host: str = ""):
super().__init__(api_key, host)
def __init__(self, api_key: str, channel, host: str = "", port=None):
super().__init__(api_key, host, port=port)
self.channel = channel
self.client_type = channel.channel_type
self.channel_mgr = None
self._skill_service = None
self._memory_service = None
self._knowledge_service = None
self._chat_service = None
@property
@@ -88,6 +89,21 @@ class CloudClient(LinkAIClient):
logger.error(f"[CloudClient] Failed to init MemoryService: {e}")
return self._memory_service
@property
def knowledge_service(self):
"""Lazy-init KnowledgeService."""
if self._knowledge_service is None:
try:
from agent.knowledge.service import KnowledgeService
from config import conf
from common.utils import expand_path
workspace_root = expand_path(conf().get("agent_workspace", "~/cow"))
self._knowledge_service = KnowledgeService(workspace_root)
logger.debug("[CloudClient] KnowledgeService initialised")
except Exception as e:
logger.error(f"[CloudClient] Failed to init KnowledgeService: {e}")
return self._knowledge_service
@property
def chat_service(self):
"""Lazy-init ChatService (requires AgentBridge via Bridge singleton)."""
@@ -222,7 +238,14 @@ class CloudClient(LinkAIClient):
return
existing_ch = self.channel_mgr.get_channel(channel_type)
if existing_ch and not cred_changed:
skip_restart = existing_ch and not cred_changed
if skip_restart and channel_type in ("weixin", "wx"):
login_status = getattr(existing_ch, "login_status", "")
if login_status != "logged_in":
skip_restart = False
logger.info(f"[CloudClient] Channel '{channel_type}' not logged in "
f"(status={login_status}), forcing restart")
if skip_restart:
logger.info(f"[CloudClient] Channel '{channel_type}' already running with same config, "
"skip restart, reporting status only")
threading.Thread(
@@ -255,7 +278,14 @@ class CloudClient(LinkAIClient):
).start()
else:
existing_ch = self.channel_mgr.get_channel(channel_type)
if existing_ch and not cred_changed:
needs_restart = cred_changed or not existing_ch
if not needs_restart and channel_type in ("weixin", "wx"):
login_status = getattr(existing_ch, "login_status", "")
if login_status != "logged_in":
needs_restart = True
logger.info(f"[CloudClient] Channel '{channel_type}' not logged in "
f"(status={login_status}), forcing restart")
if existing_ch and not needs_restart:
logger.info(f"[CloudClient] Channel '{channel_type}' already running with same config, "
"skip restart, reporting status only")
threading.Thread(
@@ -454,6 +484,27 @@ class CloudClient(LinkAIClient):
return svc.dispatch(action, payload)
# ------------------------------------------------------------------
# knowledge callback
# ------------------------------------------------------------------
def on_knowledge(self, data: dict) -> dict:
"""
Handle KNOWLEDGE messages from the cloud console.
Delegates to KnowledgeService.dispatch for the actual operations.
:param data: message data with 'action', 'clientId', 'payload'
:return: response dict
"""
action = data.get("action", "")
payload = data.get("payload")
logger.info(f"[CloudClient] on_knowledge: action={action}")
svc = self.knowledge_service
if svc is None:
return {"action": action, "code": 500, "message": "KnowledgeService not available", "payload": None}
return svc.dispatch(action, payload)
# ------------------------------------------------------------------
# chat callback
# ------------------------------------------------------------------
@@ -473,6 +524,19 @@ class CloudClient(LinkAIClient):
session_id = f"session_{session_id}"
logger.info(f"[CloudClient] on_chat: session={session_id}, channel={channel_type}, query={query[:80]}")
# Intercept cow/slash commands before the agent runs
try:
from plugins import PluginManager
mgr = PluginManager()
instance = mgr.instances.get("COW_CLI")
if instance and hasattr(instance, "execute"):
result = instance.execute(query, session_id=session_id)
if result is not None:
send_chunk_fn({"chunk_type": "content", "delta": result, "segment_id": 0})
return
except Exception as e:
logger.warning(f"[CloudClient] cow_cli intercept failed: {e}")
svc = self.chat_service
if svc is None:
raise RuntimeError("ChatService not available")
@@ -615,9 +679,9 @@ def get_deployment_id() -> str:
def get_website_base_url() -> str:
"""Return the public URL prefix that maps to the workspace websites/ dir.
"""Return the URL prefix that maps to the workspace websites/ dir.
Returns empty string when cloud deployment is not configured.
Do nothing when in local env.
"""
deployment_id = get_deployment_id()
if not deployment_id:
@@ -634,6 +698,42 @@ def get_website_base_url() -> str:
return f"https://app.{domain}/{deployment_id}"
# Subdir under websites/ used by the send tool
COW_SEND_WEB_SUBDIR = "cow-send"
def copy_send_file(src_path: str, workspace_root: str) -> str:
"""Copy *src_path* into ``websites/cow-send/`` and return its URL.
Returns empty string in local env.
"""
import shutil
import uuid
from common.utils import expand_path
base = get_website_base_url()
if not base or not src_path or not os.path.isfile(src_path):
return ""
ws = os.path.abspath(expand_path(workspace_root))
send_dir = os.path.join(ws, "websites", COW_SEND_WEB_SUBDIR)
try:
os.makedirs(send_dir, exist_ok=True)
except OSError:
return ""
ext = os.path.splitext(src_path)[1].lower()
if len(ext) > 12 or not ext.replace(".", "").isalnum():
ext = ""
dest_name = f"{uuid.uuid4().hex}{ext}"
dest_path = os.path.join(send_dir, dest_name)
try:
shutil.copy2(src_path, dest_path)
except OSError as e:
logger.warning(f"[cloud] copy_send_file: copy failed: {e}")
return ""
return f"{base}/{COW_SEND_WEB_SUBDIR}/{dest_name}"
def build_website_prompt(workspace_dir: str) -> list:
"""Build system prompt lines for cloud website/file sharing rules.
@@ -654,8 +754,8 @@ def build_website_prompt(workspace_dir: str) -> list:
f" - 例如: `websites/my-app/index.html` → `{base_url}/my-app/index.html`",
"",
"2. **生成文件分享** (PPT、PDF、图片、音视频等): 当你为用户生成了需要下载或查看的文件时,**可以**将文件保存到 `websites/` 目录中",
f" - 例如: 生成的PPT保存到 `websites/files/report.pptx` → 下载链接为 `{base_url}/files/report.pptx`",
" - 你仍然可以同时使用 `send` 工具发送文件(在飞书、钉钉等IM渠道中有效),但**必须同时在回复文本中提供下载链接**作为兜底,因为部分渠道(如网页端)无法通过 send 接收本地文件",
f" - 例如: 生成的PPT保存到 `websites/files/report.pptx` → 下载链接为 `{base_url}/files/report.pptx`",
" - 你仍然可以同时使用 `send` 工具发送文件(在微信、飞书、钉钉、web等渠道中有效),但**必须同时在回复文本中提供下载链接**作为兜底,因为部分渠道无法通过 send 接收本地文件",
"",
"3. **必须发送链接**: 无论是网页还是文件,生成后**必须将完整的访问/下载链接直接写在回复文本中发送给用户**",
"",
@@ -670,7 +770,7 @@ def start(channel, channel_mgr=None):
return
global chat_client
chat_client = CloudClient(api_key=conf().get("linkai_api_key"), host=conf().get("cloud_host", ""), channel=channel)
chat_client = CloudClient(api_key=conf().get("linkai_api_key"), host=conf().get("cloud_host", ""), port=conf().get("cloud_port"), channel=channel)
chat_client.channel_mgr = channel_mgr
chat_client.config = _build_config()
chat_client.start()
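`copy_send_file` publishes each file under a random `uuid4` name, which keeps user-supplied file names (spaces, CJK characters, path fragments) out of public URLs, and it keeps only a short alphanumeric extension so browsers can still infer the file type. The naming rule in isolation, as a sketch; `send_file_name` is a hypothetical helper.

```python
import os
import uuid


def send_file_name(src_path: str) -> str:
    """Random file name that keeps only a short, alphanumeric, lower-cased extension."""
    ext = os.path.splitext(src_path)[1].lower()
    # Drop suspicious extensions: too long, or containing anything other than
    # letters/digits after the dot (an empty extension also ends up as "").
    if len(ext) > 12 or not ext.replace(".", "").isalnum():
        ext = ""
    return f"{uuid.uuid4().hex}{ext}"
```

Only the last suffix survives (`archive.tar.gz` becomes `<hex>.gz`), and `str.isalnum()` also accepts non-ASCII letters; an extra `isascii()` check would be one way to restrict extensions to ASCII if that matters.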


@@ -7,8 +7,8 @@ XUNFEI = "xunfei"
CHATGPTONAZURE = "chatGPTOnAzure"
LINKAI = "linkai"
CLAUDEAPI= "claudeAPI"
QWEN = "qwen" # legacy Qwen integration
QWEN_DASHSCOPE = "dashscope" # new Qwen integration (Bailian)
QWEN = "qwen" # Qwen (kept for compatibility with old configs; actually handled by DashscopeBot)
QWEN_DASHSCOPE = "dashscope" # Qwen via DashScope
GEMINI = "gemini"
ZHIPU_AI = "zhipu"
MOONSHOT = "moonshot"
@@ -81,18 +81,19 @@ TTS_1_HD = "tts-1-hd"
DEEPSEEK_CHAT = "deepseek-chat" # DeepSeek-V3 chat model
DEEPSEEK_REASONER = "deepseek-reasoner" # DeepSeek-R1 model
# Qwen (Tongyi Qianwen - Alibaba Cloud)
QWEN = "qwen"
# Qwen (Tongyi Qianwen - Alibaba Cloud DashScope)
QWEN_TURBO = "qwen-turbo"
QWEN_PLUS = "qwen-plus"
QWEN_MAX = "qwen-max"
QWEN_LONG = "qwen-long"
QWEN3_MAX = "qwen3-max" # Qwen3 Max - recommended model for Agent mode
QWEN35_PLUS = "qwen3.5-plus" # Qwen3.5 Plus - Omni model (MultiModalConversation)
QWEN36_PLUS = "qwen3.6-plus" # Qwen3.6 Plus - Omni model (MultiModalConversation)
QWQ_PLUS = "qwq-plus"
# MiniMax
MINIMAX_M2_7 = "MiniMax-M2.7" # MiniMax M2.7 - Latest
MINIMAX_M2_7_HIGHSPEED = "MiniMax-M2.7-highspeed" # MiniMax M2.7 highspeed
MINIMAX_M2_5 = "MiniMax-M2.5" # MiniMax M2.5
MINIMAX_M2_1 = "MiniMax-M2.1" # MiniMax M2.1
MINIMAX_M2_1_LIGHTNING = "MiniMax-M2.1-lightning" # MiniMax M2.1 Lightning (high-speed edition)
@@ -124,6 +125,10 @@ DOUBAO_SEED_2_PRO = "doubao-seed-2-0-pro-260215"
DOUBAO_SEED_2_LITE = "doubao-seed-2-0-lite-260215"
DOUBAO_SEED_2_MINI = "doubao-seed-2-0-mini-260215"
# ModelScope
QWEN3_235B_A22B_INSTRUCT_2507 = "Qwen/Qwen3-235B-A22B-Instruct-2507"
QWEN3_5_27B = "Qwen/Qwen3.5-27B"
# Other models
WEN_XIN = "wenxin"
WEN_XIN_4 = "wenxin-4"
@@ -135,11 +140,14 @@ MODELSCOPE = "modelscope"
GITEE_AI_MODEL_LIST = ["Yi-34B-Chat", "InternVL2-8B", "deepseek-coder-33B-instruct", "InternVL2.5-26B", "Qwen2-VL-72B", "Qwen2.5-32B-Instruct", "glm-4-9b-chat", "codegeex4-all-9b", "Qwen2.5-Coder-32B-Instruct", "Qwen2.5-72B-Instruct", "Qwen2.5-7B-Instruct", "Qwen2-72B-Instruct", "Qwen2-7B-Instruct", "code-raccoon-v1", "Qwen2.5-14B-Instruct"]
MODELSCOPE_MODEL_LIST = ["LLM-Research/c4ai-command-r-plus-08-2024","mistralai/Mistral-Small-Instruct-2409","mistralai/Ministral-8B-Instruct-2410","mistralai/Mistral-Large-Instruct-2407",
"Qwen/Qwen2.5-Coder-32B-Instruct","Qwen/Qwen2.5-Coder-14B-Instruct","Qwen/Qwen2.5-Coder-7B-Instruct","Qwen/Qwen2.5-72B-Instruct","Qwen/Qwen2.5-32B-Instruct","Qwen/Qwen2.5-14B-Instruct","Qwen/Qwen2.5-7B-Instruct","Qwen/QwQ-32B-Preview",
"LLM-Research/Llama-3.3-70B-Instruct","opencompass/CompassJudger-1-32B-Instruct","Qwen/QVQ-72B-Preview","LLM-Research/Meta-Llama-3.1-405B-Instruct","LLM-Research/Meta-Llama-3.1-8B-Instruct","Qwen/Qwen2-VL-7B-Instruct","LLM-Research/Meta-Llama-3.1-70B-Instruct",
"Qwen/Qwen2.5-14B-Instruct-1M","Qwen/Qwen2.5-7B-Instruct-1M","Qwen/Qwen2.5-VL-3B-Instruct","Qwen/Qwen2.5-VL-7B-Instruct","Qwen/Qwen2.5-VL-72B-Instruct","deepseek-ai/DeepSeek-R1-Distill-Llama-70B","deepseek-ai/DeepSeek-R1-Distill-Llama-8B","deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
"deepseek-ai/DeepSeek-R1-Distill-Qwen-14B","deepseek-ai/DeepSeek-R1-Distill-Qwen-7B","deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B","deepseek-ai/DeepSeek-R1","deepseek-ai/DeepSeek-V3","Qwen/QwQ-32B"]
MODELSCOPE_MODEL_LIST = ["deepseek-ai/DeepSeek-R1-0528", "deepseek-ai/DeepSeek-R1-Distill-Llama-70B", "deepseek-ai/DeepSeek-R1-Distill-Llama-8B", "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B", "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
"deepseek-ai/DeepSeek-R1-Distill-Qwen-7B", "deepseek-ai/DeepSeek-V3.2", "LLM-Research/c4ai-command-r-plus-08-2024", "LLM-Research/Llama-4-Maverick-17B-128E-Instruct", "meituan-longcat/LongCat-Flash-Lite", "MiniMax/MiniMax-M1-80k", "MiniMax/MiniMax-M2.5", "mistralai/Ministral-8B-Instruct-2410",
"mistralai/Mistral-Large-Instruct-2407", "mistralai/Mistral-Small-Instruct-2409", "moonshotai/Kimi-K2.5", "MusePublic/Qwen-Image-Edit", "opencompass/CompassJudger-1-32B-Instruct", "OpenGVLab/InternVL3_5-241B-A28B",
"Qwen/QVQ-72B-Preview", "Qwen/Qwen-Image-Edit", "Qwen/Qwen3-0.6B", "Qwen/Qwen3-1.7B", "Qwen/Qwen3-14B", "Qwen/Qwen3-235B-A22B", "Qwen/Qwen3-235B-A22B-Instruct-2507", "Qwen/Qwen3-235B-A22B-Thinking-2507", "Qwen/Qwen3-30B-A3B", "Qwen/Qwen3-30B-A3B-Thinking-2507",
"Qwen/Qwen3-32B", "Qwen/Qwen3-4B", "Qwen/Qwen3-8B", "Qwen/Qwen3-Coder-30B-A3B-Instruct", "Qwen/Qwen3-Coder-480B-A35B-Instruct", "Qwen/Qwen3-Next-80B-A3B-Instruct", "Qwen/Qwen3-Next-80B-A3B-Thinking", "Qwen/Qwen3-VL-235B-A22B-Instruct", "Qwen/Qwen3-VL-8B-Instruct",
"Qwen/Qwen3-VL-8B-Thinking", "Qwen/Qwen3.5-122B-A10B", "Qwen/Qwen3.5-27B", "Qwen/Qwen3.5-35B-A3B", "Qwen/Qwen3.5-397B-A17B", "Qwen/QwQ-32B", "Qwen/QwQ-32B-Preview", "Shanghai_AI_Laboratory/Intern-S1", "Shanghai_AI_Laboratory/Intern-S1-mini",
"stepfun-ai/Step-3.5-Flash", "XiaomiMiMo/MiMo-V2-Flash", "ZhipuAI/GLM-4.7-Flash", "ZhipuAI/GLM-5"]
MODEL_LIST = [
# Claude
@@ -165,10 +173,10 @@ MODEL_LIST = [
DEEPSEEK_CHAT, DEEPSEEK_REASONER,
# Qwen
QWEN, QWEN_TURBO, QWEN_PLUS, QWEN_MAX, QWEN_LONG, QWEN3_MAX, QWEN35_PLUS,
QWEN36_PLUS, QWEN35_PLUS, QWEN3_MAX, QWEN_MAX, QWEN_PLUS, QWEN_TURBO, QWEN_LONG,
# MiniMax
MiniMax, MINIMAX_M2_7, MINIMAX_M2_5, MINIMAX_M2_1, MINIMAX_M2_1_LIGHTNING, MINIMAX_M2, MINIMAX_ABAB6_5,
MiniMax, MINIMAX_M2_7, MINIMAX_M2_7_HIGHSPEED, MINIMAX_M2_5, MINIMAX_M2_1, MINIMAX_M2_1_LIGHTNING, MINIMAX_M2, MINIMAX_ABAB6_5,
# GLM
ZHIPU_AI, GLM_5_TURBO, GLM_5, GLM_4, GLM_4_PLUS, GLM_4_flash, GLM_4_LONG, GLM_4_ALLTOOLS,


@@ -1,5 +1,6 @@
import logging
import sys
import io
def _reset_logger(log):
@@ -9,7 +10,10 @@ def _reset_logger(log):
del handler
log.handlers.clear()
log.propagate = False
console_handle = logging.StreamHandler(sys.stdout)
stdout = sys.stdout
if hasattr(stdout, "buffer"):
stdout = io.TextIOWrapper(stdout.buffer, encoding="utf-8", errors="replace", line_buffering=True)
console_handle = logging.StreamHandler(stdout)
console_handle.setFormatter(
logging.Formatter(
"[%(levelname)s][%(asctime)s][%(filename)s:%(lineno)d] - %(message)s",

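The logger change wraps the console's byte buffer in a UTF-8 `TextIOWrapper`, so log lines containing Chinese text no longer raise `UnicodeEncodeError` on consoles whose default encoding cannot represent them (for example some Windows code pages). A self-contained sketch of the same idea; `utf8_stream_handler` is a hypothetical name.

```python
import io
import logging
import sys


def utf8_stream_handler(stream=None) -> logging.StreamHandler:
    """StreamHandler that writes UTF-8 regardless of the console's encoding."""
    stream = stream if stream is not None else sys.stdout
    if hasattr(stream, "buffer"):
        # Re-wrap the underlying byte buffer; errors="replace" substitutes
        # unencodable characters instead of raising inside logging.
        stream = io.TextIOWrapper(stream.buffer, encoding="utf-8",
                                  errors="replace", line_buffering=True)
    return logging.StreamHandler(stream)
```

On Python 3.7+, `sys.stdout.reconfigure(encoding="utf-8", errors="replace")` is an alternative that changes the existing stream in place instead of creating a second wrapper over the same buffer. With the wrapper approach the handler must keep its reference alive, because a garbage-collected `TextIOWrapper` closes the buffer it wraps.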

@@ -29,5 +29,6 @@
"agent": true,
"agent_max_context_tokens": 40000,
"agent_max_context_turns": 20,
"agent_max_steps": 15
"agent_max_steps": 15,
"knowledge": true
}


@@ -180,15 +180,16 @@ available_setting = {
# Doubao (Volcano Engine Ark) platform settings
"ark_api_key": "",
"ark_base_url": "https://ark.cn-beijing.volces.com/api/v3",
#ModelScope platform settings
# ModelScope platform settings
"modelscope_api_key": "",
"modelscope_base_url": "https://api-inference.modelscope.cn/v1/chat/completions",
# LinkAI platform settings
"use_linkai": False,
"linkai_api_key": "",
"linkai_app_code": "",
"linkai_api_base": "https://api.link-ai.tech", # LinkAI service endpoint
"linkai_api_base": "https://api.link-ai.tech",
"cloud_host": "client.link-ai.tech",
"cloud_port": None,
"cloud_deployment_id": "",
"minimax_api_key": "",
"Minimax_group_id": "",
@@ -199,6 +200,7 @@ available_setting = {
"agent_max_context_tokens": 50000, # max context tokens in Agent mode
"agent_max_context_turns": 30, # max context (memory) turns in Agent mode
"agent_max_steps": 15, # max decision steps per run in Agent mode
"knowledge": True, # whether the knowledge base feature is enabled
}
@@ -408,7 +410,7 @@ def get_root():
def read_file(path):
with open(path, mode="r", encoding="utf-8") as f:
with open(path, mode="r", encoding="utf-8-sig") as f:
return f.read()
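Switching `read_file` to `utf-8-sig` lets config files saved with a UTF-8 byte-order mark (some Windows editors add one) parse cleanly. With plain `utf-8` the BOM survives as `\ufeff` and `json.loads` rejects the text, while `utf-8-sig` strips a leading BOM if present and otherwise decodes like `utf-8`. A small sketch; `load_json_text` is a hypothetical helper.

```python
import json


def load_json_text(path: str) -> dict:
    """Read a JSON file, tolerating an optional UTF-8 byte-order mark."""
    # "utf-8-sig" removes a leading BOM; files without one decode as plain UTF-8.
    with open(path, mode="r", encoding="utf-8-sig") as f:
        return json.loads(f.read())
```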


@@ -4,32 +4,54 @@ LABEL maintainer="foo@bar.com"
ARG TZ='Asia/Shanghai'
ARG CHATGPT_ON_WECHAT_VER
# Set to "false" to skip Playwright/Chromium and produce a smaller image
ARG INSTALL_BROWSER=true
# Set to "true" to use China mirrors for apt / pip / playwright (faster in CN)
ARG USE_CN_MIRROR=false
RUN echo /etc/apt/sources.list
# RUN sed -i 's/deb.debian.org/mirrors.tuna.tsinghua.edu.cn/g' /etc/apt/sources.list
ENV PLAYWRIGHT_BROWSERS_PATH=/app/ms-playwright
ENV BUILD_PREFIX=/app
# Optionally switch apt and pip to China mirrors
RUN if [ "$USE_CN_MIRROR" = "true" ]; then \
sed -i 's/deb.debian.org/mirrors.tuna.tsinghua.edu.cn/g' /etc/apt/sources.list; \
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple/; \
fi
ADD . ${BUILD_PREFIX}
# All heavy installs + user creation in ONE layer to avoid chown duplication
RUN apt-get update \
&&apt-get install -y --no-install-recommends bash ffmpeg espeak libavcodec-extra\
&& apt-get install -y --no-install-recommends bash ffmpeg espeak libavcodec-extra \
&& cd ${BUILD_PREFIX} \
&& cp config-template.json config.json \
&& /usr/local/bin/python -m pip install --no-cache --upgrade pip \
&& pip install --no-cache -r requirements.txt \
&& pip install --no-cache -r requirements-optional.txt \
&& pip install azure-cognitiveservices-speech
&& pip install --no-cache -e . \
&& if [ "$INSTALL_BROWSER" = "true" ]; then \
apt-get install -y --no-install-recommends fonts-wqy-zenhei \
&& pip install --no-cache "playwright==1.52.0" \
&& python -m playwright install-deps chromium \
&& mkdir -p /app/ms-playwright \
&& if [ "$USE_CN_MIRROR" = "true" ]; then \
PLAYWRIGHT_DOWNLOAD_HOST=https://registry.npmmirror.com/-/binary/playwright \
python -m playwright install chromium; \
else \
python -m playwright install chromium; \
fi; \
fi \
&& rm -rf /var/lib/apt/lists/* \
&& mkdir -p /home/agent/cow \
&& groupadd -r agent \
&& useradd -r -g agent -s /bin/bash -d /home/agent agent \
&& chown -R agent:agent /home/agent ${BUILD_PREFIX} /usr/local/lib
WORKDIR ${BUILD_PREFIX}
ADD docker/entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh \
&& mkdir -p /home/agent/cow \
&& groupadd -r agent \
&& useradd -r -g agent -s /bin/bash -d /home/agent agent \
&& chown -R agent:agent /home/agent ${BUILD_PREFIX} /usr/local/lib
USER agent
&& chown agent:agent /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]


@@ -43,9 +43,15 @@ fi
# fi
# go to prefix dir
# fix ownership of mounted volumes then drop to non-root user
if [ "$(id -u)" = "0" ]; then
mkdir -p /home/agent/cow
chown agent:agent /home/agent/cow
exec su agent -s /bin/bash -c "cd $CHATGPT_ON_WECHAT_PREFIX && $CHATGPT_ON_WECHAT_EXEC"
fi
# fallback: already running as agent
cd $CHATGPT_ON_WECHAT_PREFIX
# execute
$CHATGPT_ON_WECHAT_EXEC


@@ -1,185 +0,0 @@
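# CowAgent Introduction
## Overview
The Cow project has been fully upgraded from a simple chatbot into a super intelligent assistant, **CowAgent**, which can proactively think and plan tasks, keep long-term memory, operate the computer and external resources, and create and execute Skills, truly understanding you and growing with you. CowAgent can run long-term on a personal computer or server and be reached through Feishu, DingTalk, WeCom, the web, and other channels. Its core capabilities are:
- **Complex task planning**: understands complex tasks and autonomously plans their execution, continuously thinking and calling tools until the goal is reached; supports multi-turn reasoning and context understanding
- **Tool system**: 10+ built-in tools, including file read/write, bash terminal, browser, scheduled tasks, and memory management, so the Agent can manage your computer or server
- **Long-term memory**: automatically persists conversation memory to local files and a database, including global memory and daily memory, with keyword and vector retrieval
- **Skills system**: a new Skill runtime engine with multiple built-in skills; custom Skills can be developed through natural-language conversation
- **Multi-channel and multi-model support**: interact with the Agent on Web, Feishu, DingTalk, WeCom, and other channels; supports mainstream domestic and international models such as Claude, Gemini, OpenAI, GLM, MiniMax, Qwen, Kimi, and Doubao
- **Security and cost**: the key-management tool, prompt controls, and system permissions constrain the Agent's access; token cost is capped through the maximum memory turns, maximum context tokens, and maximum tool execution steps
## Core Features
### 1. Long-term Memory
> The memory system lets the Agent remember important information over the long term. The Agent proactively stores important information when the user shares preferences, decisions, facts, and so on, and automatically extracts a summary once a conversation reaches a certain length. Memory is divided into core memory and daily memory, with a hybrid retrieval mode combining semantic search and vector retrieval.
On first launch, the Agent proactively asks the user for key information and records it in the agent profile, user identity, and memory files in the workspace (default ~/cow).
In later long-running conversations, the Agent records or retrieves memories when needed and keeps updating its own settings, the user's preferences, and the memory files, summarizing and recording experience and lessons learned, so that it genuinely thinks autonomously and keeps growing.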
<img width="800" src="https://cdn.link-ai.tech/doc/20260203000455.png" />
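To illustrate what hybrid retrieval can mean in practice, the sketch below ranks memory entries by a weighted blend of keyword overlap and embedding cosine similarity (the overview above lists keyword and vector retrieval). `MemoryEntry`, `hybrid_search`, the `alpha` weight, and the toy two-dimensional embeddings are hypothetical; CowAgent's actual memory search may score and combine results differently.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    text: str
    embedding: list[float]  # hypothetical precomputed embedding vector


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that also appear in the text (case-insensitive)."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return len(terms & words) / len(terms)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(query, query_vec, entries, alpha=0.5, top_k=3):
    """Rank entries by alpha * vector similarity + (1 - alpha) * keyword score."""
    scored = [
        (alpha * cosine(query_vec, e.embedding) + (1 - alpha) * keyword_score(query, e.text), e)
        for e in entries
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored[:top_k]]


entries = [
    MemoryEntry("user prefers dark mode", [1.0, 0.0]),
    MemoryEntry("meeting moved to friday", [0.0, 1.0]),
]
best = hybrid_search("dark mode", [0.9, 0.1], entries, top_k=1)[0]
print(best.text)  # user prefers dark mode
```

A blend like this lets exact keyword hits compensate for imperfect embeddings, and vice versa; tuning `alpha` shifts the balance.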
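### 2. Task planning and tool calling
Tools are the Agent's core means of accessing operating-system resources. The Agent selects and calls tools according to the needs of the task to read and write files, execute commands, run scheduled tasks, and perform other operations. The built-in tools are implemented in the project's `tools` directory.
**Main tools:** file read/write/edit, Bash terminal, browser, file sending, scheduling, memory search, environment configuration, and more.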
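A minimal sketch of the think-then-call-tools cycle, assuming a `decide` callback that returns either a tool call or a final answer. `Step`, `run_agent`, and the tool registry are hypothetical names, not CowAgent's internal API; the `max_steps` guard mirrors the purpose of the `agent_max_steps` setting described under the core configuration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One model decision: either a tool call or a final answer (hypothetical shape)."""
    tool: Optional[str] = None
    args: Optional[dict] = None
    answer: Optional[str] = None


def run_agent(decide: Callable[[list], Step],
              tools: dict[str, Callable[..., str]],
              task: str,
              max_steps: int = 15) -> str:
    """Alternate model decisions and tool calls until an answer or the step limit."""
    history: list = [("user", task)]
    for _ in range(max_steps):
        step = decide(history)
        if step.answer is not None:
            return step.answer
        tool = tools.get(step.tool)
        if tool is None:
            result = f"error: unknown tool {step.tool!r}"  # fed back so the model can recover
        else:
            result = tool(**(step.args or {}))
        history.append(("tool", step.tool, result))
    return "stopped: step limit reached"


def scripted_decide(history):
    # Stand-in for the model: call one tool, then answer with its result.
    if history[-1][0] == "user":
        return Step(tool="echo", args={"text": "hi"})
    return Step(answer=history[-1][2])


print(run_agent(scripted_decide, {"echo": lambda text: text}, "say hi"))  # hi
```

Feeding tool errors back into the history, rather than raising, gives the model a chance to choose a different tool on the next step.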
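#### 2.1 Terminal and file access
Access to the operating system's terminal and files is the most basic and central capability; many other tools and skills are built on top of these basic tools. Users can interact with the Agent from a phone to operate resources on a personal computer or server:
<img width="800" src="https://cdn.link-ai.tech/doc/20260202181130.png" />
#### 2.2 Programming
Combining programming with system access, the Agent can cover the full vibe-coding workflow, from information search, image and other asset generation, coding, testing, and deployment to Nginx configuration changes and release, producing a quick app demo from a single instruction sent from a phone:
<img width="800" src="https://cdn.link-ai.tech/doc/20260203121008.png" />
#### 2.3 Scheduled tasks
Dynamic scheduled tasks are implemented with the scheduler tool and come in three forms: **one-time tasks, fixed intervals, and Cron expressions**. When a task fires, it runs in one of two modes, **sending a fixed message** or **running a dynamic Agent task**, which makes it highly flexible:
<img width="800" src="https://cdn.link-ai.tech/doc/20260202195402.png" />
You can also view and manage existing scheduled tasks quickly in natural language.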
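#### 2.4 Environment variable management
Secrets required by skills are stored in an environment-variable file and managed by the `env_config` tool. You can update secrets through conversation; the tool has built-in protection and masking policies that keep secrets strictly safe: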
<img width="800" src="https://cdn.link-ai.tech/doc/20260202234939.png" />
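A masking policy of this kind might look like the following sketch, assuming the rule is to keep a few characters at each end of a secret and hide the rest behind a fixed-width mask (so the secret's length is not revealed either). The function names, the 4-character window, and the name-based detection of secret variables are illustrative assumptions, not the actual behavior of `env_config`.

```python
def mask_secret(value: str, keep: int = 4) -> str:
    """Mask a secret for display, keeping `keep` characters at each end.

    Short values are fully masked so that most of the secret is never shown.
    The fixed-width middle also avoids leaking the secret's length.
    """
    if len(value) <= keep * 2:
        return "*" * 8
    return f"{value[:keep]}****{value[-keep:]}"


def mask_env(env: dict[str, str],
             secret_markers: tuple[str, ...] = ("KEY", "SECRET", "TOKEN", "PASSWORD")) -> dict[str, str]:
    """Return a copy of `env` with likely secrets masked (hypothetical name-based rule)."""
    return {
        name: mask_secret(value) if any(m in name.upper() for m in secret_markers) else value
        for name, value in env.items()
    }


print(mask_secret("sk-1234567890abcd"))  # sk-1****abcd
```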
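### 3. Skills system
> The skills system gives the Agent unlimited extensibility. Each Skill consists of an instruction file, run scripts (optional), and resources (optional), and describes how to complete a specific type of task. Skills let the Agent follow instructions to complete complex workflows, call various tools, or integrate with third-party systems.
- **Built-in skills:** located in the project's `skills` directory, including the skill creator, web search, image recognition (openai-image-vision), LinkAI agents, web fetching, and more. Built-in Skills are enabled automatically depending on whether their prerequisites (API keys, system commands, etc.) are met. The skill creator lets you create custom skills quickly.
- **Custom skills:** created by users through conversation and stored in the workspace (`~/cow/skills/`). Custom skills can implement any complex business workflow or third-party integration.
#### 3.1 Creating skills
The `skill-creator` skill lets you create skills quickly through conversation. While collaborating with the Agent, you can ask it to turn a workflow into a skill, or send it any API documentation and examples and let it complete the integration directly:
<img width="800" src="https://cdn.link-ai.tech/doc/20260202202247.png" />
#### 3.2 Search and image recognition
- **Search skill:** the system ships a built-in `bocha-search` (Bocha Search) Skill that depends on the `BOCHA_SEARCH_API_KEY` environment variable. Create a key in the [console](https://open.bochaai.com/) and send it to the Agent to complete the configuration.
- **Image recognition skill:** the `openai-image-vision` skill can use image recognition models such as gpt-4.1-mini and gpt-4.1. It depends on the `OPENAI_API_KEY` secret, which can be maintained through config.json or the env_config tool.
<img width="800" src="https://cdn.link-ai.tech/doc/20260202213219.png" />
#### 3.3 Third-party knowledge bases and plugins
The `linkai-agent` skill hands every agent on [LinkAI](https://link-ai.tech/) to the Agent as a skill, enabling multi-agent decision-making.
Usage: configure `LINKAI_API_KEY` through conversation, or add `linkai_api_key` to config.json. Then add agent descriptions to `skills/linkai-agent/config.json`, for example: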
```json
{
"apps": [
{
"app_code": "G7z6vKwp",
"app_name": "LinkAI Customer Service Assistant",
"app_description": "Choose this assistant only when the user asks about the LinkAI platform; it answers from the LinkAI knowledge base"
},
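{
"app_code": "SFY5x7JR",
"app_name": "Content Creation Assistant",
"app_description": "Use this assistant only when the user wants to create images or videos; supports Nano Banana, Seedream, Jimeng, Veo, Kling, and other models"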
}
]
}
```
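The Agent decides which agent to use from each entry's name and description, and calls the corresponding app or workflow through its app_code. This skill gives flexible access to agents, knowledge bases, plugins, and other capabilities on the LinkAI platform. The result looks like this:
<img width="750" src="https://cdn.link-ai.tech/doc/20260202234350.png" />
Note: configure `LINKAI_API_KEY` via `env_config`, or add `linkai_api_key` to config.json.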
## Usage
> For detailed usage, see the project's README.md.
### 1. Running the project
Run the following on the command line:
```bash
bash <(curl -fsSL https://cdn.link-ai.tech/code/cow/run.sh)
```
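For detailed instructions and subsequent program management, see [Project startup script](https://github.com/zhayujie/chatgpt-on-wechat/wiki/CowAgentQuickStart).
### 2. Model selection
The following models are recommended for Agent mode; choose based on quality and cost: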
- **MiniMax**: `MiniMax-M2.7`
- **GLM**: `glm-5-turbo`
- **Kimi**: `kimi-k2.5`
- **Doubao**: `doubao-seed-2-0-code-preview-260215`
- **Qwen**: `qwen3.5-plus`
- **Claude**: `claude-sonnet-4-6`
- **Gemini**: `gemini-3.1-flash-lite-preview`
- **OpenAI**: `gpt-5.4`
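For detailed model configuration, see [Model notes in README.md](../README.md#模型说明).
### 3. Core Agent configuration
The core Agent settings are configured in `config.json`:
```jsonc
{
"agent": true, // enable Agent mode
"agent_workspace": "~/cow", // Agent workspace path
"agent_max_context_tokens": 40000, // maximum context tokens
"agent_max_context_turns": 30, // maximum context memory turns
"agent_max_steps": 15 // maximum decision steps per task
}
```
**Configuration notes:**
- `agent`: set to `true` to enable Agent mode, which provides multi-step tool decisions, long-term memory, Skills, and more
- `agent_workspace`: workspace path for storing memory, skills, and other system prompt settings
- `agent_max_context_tokens`: context token limit; when it is exceeded, the oldest conversation is discarded automatically
- `agent_max_context_turns`: number of context memory turns; each turn consists of one question and one reply
- `agent_max_steps`: maximum number of tool-call steps per task, which prevents infinite loops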
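The two context limits can be thought of as working together as in this sketch: keep at most `agent_max_context_turns` recent turns, then drop the oldest turns until an estimated token count fits `agent_max_context_tokens`. The function name and the rough four-characters-per-token estimate are assumptions; a real implementation would count tokens with the model's tokenizer and may treat tool messages differently.

```python
from typing import Callable


def trim_context(turns: list[tuple[str, str]],
                 max_turns: int = 30,
                 max_tokens: int = 40000,
                 count_tokens: Callable[[str], int] = lambda s: len(s) // 4 + 1,
                 ) -> list[tuple[str, str]]:
    """Keep the most recent (question, reply) turns that satisfy both limits.

    Oldest turns are dropped first. `count_tokens` is a rough stand-in for a
    real tokenizer (about four characters per token).
    """
    kept = turns[-max_turns:] if max_turns > 0 else []
    total = sum(count_tokens(q) + count_tokens(a) for q, a in kept)
    while kept and total > max_tokens:
        q, a = kept.pop(0)  # discard the oldest remaining turn
        total -= count_tokens(q) + count_tokens(a)
    return kept
```

Applying the turn cap first keeps the token loop short, since at most `max_turns` turns ever need to be counted.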
### 4. Channel integration
The Agent can be used in many channels; switch between them by changing `channel_type` in `config.json`.
- **Web**: the default channel; after startup it listens on a local port and is accessed through a browser
- **Feishu**: [Feishu integration docs](https://docs.link-ai.tech/cow/multi-platform/feishu)
- **DingTalk**: [DingTalk integration docs](https://docs.link-ai.tech/cow/multi-platform/dingtalk)
- **WeCom app**: [WeCom app docs](https://docs.link-ai.tech/cow/multi-platform/wechat-com)
- **WeCom smart bot**: [WeCom smart bot docs](https://docs.link-ai.tech/cow/multi-platform/wecom-bot)
- **QQ bot**: [QQ bot docs](https://docs.link-ai.tech/cow/multi-platform/qq)
For more channel configuration, see [Channel notes](../README.md#通道说明).

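@@ -9,7 +9,23 @@ description: Connect CowAgent to a WeCom smart bot (long-connection mode)
Smart bots and self-built WeCom apps are two different integration methods. A smart bot uses a persistent WebSocket connection and needs neither a public server IP nor a domain name, so it is simpler to configure.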
</Note>
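## 1. Create a smart bot
## 1. Integration methods
### Method 1: One-click QR code integration (recommended)
No bot needs to be created in advance. After starting the Cow project, open the Web console (local link: http://127.0.0.1:9899/), choose the **Channels** menu, click **Connect channel**, select **WeCom smart bot**, switch to "QR code" mode, and scan with **WeCom** to create and connect the bot automatically.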
<img src="https://cdn.link-ai.tech/doc/20260401121213.png" width="800"/>
<Note>
After a successful scan, you can further configure the bot, including its name, avatar, and visibility, on the **Smart bots** page of the WeCom workbench.
</Note>
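### Method 2: Manual creation
Create a smart bot in WeCom first and obtain its Bot ID and Secret, then connect it through the Web console or the configuration file.
**Step 1: Create a smart bot**
1. Open the WeCom client, go to the workbench, and click **Smart bots**
@@ -25,34 +41,35 @@ description: Connect CowAgent to a WeCom smart bot (long-connection mode)
4. Set the bot's name, avatar, and visibility, choose **long-connection mode**, note down the **Bot ID** and **Secret**, and click Save.
## 2. Configuration and running
**Step 2: Connect CowAgent**
### Method 1: Web console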
<Tabs>
<Tab title="Web 控制台">
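Open the Web console, choose the **Channels** menu, click **Connect channel**, select **WeCom smart bot**, switch to "Manual entry" mode, enter the Bot ID and Secret, and click Connect.
After starting the Cow project, open the Web console (local link: http://127.0.0.1:9899/), choose the **Channels** menu, click **Connect channel**, select **WeCom smart bot**, fill in the Bot ID and Secret saved in the previous step, and click Connect.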
<img src="https://cdn.link-ai.tech/doc/20260316181711.png" width="800"/>
</Tab>
<Tab title="配置文件">
Add the following configuration to `config.json`, then start the program:
<img src="https://cdn.link-ai.tech/doc/20260316181711.png" width="800"/>
```json
{
"channel_type": "wecom_bot",
"wecom_bot_id": "YOUR_BOT_ID",
"wecom_bot_secret": "YOUR_SECRET"
}
```
### Method 2: Configuration file
| Parameter | Description |
| --- | --- |
| `wecom_bot_id` | BotID of the smart bot |
| `wecom_bot_secret` | Secret of the smart bot |
</Tab>
</Tabs>
Add the following configuration to `config.json`:
When the log shows `[WecomBot] Subscribe success`, the connection has succeeded.
```json
{
"channel_type": "wecom_bot",
"wecom_bot_id": "YOUR_BOT_ID",
"wecom_bot_secret": "YOUR_SECRET"
}
```
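| Parameter | Description |
| --- | --- |
| `wecom_bot_id` | BotID of the smart bot |
| `wecom_bot_secret` | Secret of the smart bot |
After configuring, start the program. When the log shows `[WecomBot] Subscribe success`, the connection has succeeded.
## 3. Features
## 2. Features
| Feature | Support |
| --- | --- |
@@ -64,7 +81,7 @@ description: Connect CowAgent to a WeCom smart bot (long-connection mode)
| Streaming replies | ✅ |
| Scheduled-task push notifications | ✅ |
## 4. Usage
## 3. Usage
Search for the bot's name in WeCom to start a one-on-one chat.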

@@ -1,9 +1,9 @@
---
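title: WeChat
description: Connect CowAgent to personal WeChat
description: Connect CowAgent to personal WeChat (based on the official API)
---
> Connect personal WeChat by scanning a QR code to log in. Supports sending and receiving text, images, voice, files, video, and other messages.
> Connect personal WeChat by scanning a QR code to log in. Supports sending and receiving text, images, voice, files, video, and other messages in private chats. The integration goes through the official WeChat API and carries no security risk; once connected, a bot assistant is added to your conversations without affecting normal use of the current account.
## 1. Configuration and running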

docs/cli/general.mdx

@@ -0,0 +1,154 @@
---
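title: Common Commands
description: Commonly used commands for viewing status and managing configuration and context
---
The following commands can be used in chat with the `/` prefix or in the terminal with the `cow` prefix (some commands are chat-only).
<Tip>
In the Web console, typing `/` automatically opens a command menu that supports arrow-key selection and Tab completion.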
</Tip>
## help
Show help information for all available commands.
```text
/help
```
## status
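View the running status of the current session and service, including process information, model configuration, session message count, and the number of loaded skills.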
```text
/status
```
Example output:
```
🐮 CowAgent Status
Process: PID 12345 | Running 2h 15m
Version: 2.0.4
Channel: web
Model: MiniMax-M2.5
Mode: agent
Session: 12 messages | 8 skills loaded
```
## config
View or modify runtime configuration. Changes take effect immediately without restarting the service.
**View all configurable items:**
```text
/config
```
**View a single item:**
```text
/config model
```
**Modify an item:**
```text
/config model deepseek-chat
```
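**Configurable items:**
| Item | Description | Example |
| --- | --- | --- |
| `model` | AI model name | `deepseek-chat` |
| `agent_max_context_tokens` | Max context tokens | `40000` |
| `agent_max_context_turns` | Max context memory turns | `30` |
| `agent_max_steps` | Max decision steps per task | `15` |
<Note>
When `model` is changed, the system automatically selects the matching way of calling that model. The configuration is written to `config.json` and persisted.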
</Note>
## context
View context information for the current session, including statistics such as message count and content length.
```text
/context
```
**Clear the current session context:**
```text
/context clear
```
<Tip>
After the context is cleared, the Agent "forgets" the earlier conversation; this is useful when switching topics or freeing context space.
</Tip>
## logs
View recent service logs. Shows the last 20 lines by default, up to a maximum of 50.
```text
/logs
```
**Specify the number of lines:**
```text
/logs 50
```
## knowledge
View and manage the personal knowledge base. Shows knowledge base statistics by default.
```text
/knowledge
```
Example output:
```
📚 知识库
- 状态:已开启
- 页面数12
- 总大小45.2 KB
- 分类明细:
- concepts/: 5 篇
- entities/: 4 篇
- sources/: 3 篇
```
**View the directory structure:**
```text
/knowledge list
```
**Enable / disable the knowledge base:**
```text
/knowledge on
/knowledge off
```
<Note>
In the terminal CLI, `cow knowledge` and `cow knowledge list` are available, but `on|off` can only be used in chat, because the change has to take effect in the running service.
</Note>
## version
Show the current CowAgent version number.
```text
/version
```

docs/cli/index.mdx

@@ -0,0 +1,93 @@
---
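title: Commands Overview
description: CowAgent command system (terminal CLI and chat commands)
---
CowAgent offers two ways to interact through commands:
- **Terminal CLI**: run `cow <command>` in the system terminal for operations such as service management and skill management
- **Chat commands**: type `/<command>` or `cow <command>` in a conversation to view status, manage skills, adjust configuration, and more
## Terminal commands
After deploying with the one-click install script, the `cow` command is available automatically. For manual installations, also run the following in the project root: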
```bash
pip install -e .
```
Once installed, the `cow` command can be used from any directory:
```bash
cow help
```
Example output:
```
CowAgent CLI
Usage: cow <command>
Service:
start Start the CowAgent service
stop Stop the CowAgent service
restart Restart the CowAgent service
update Update code and restart service
status Show service status
logs View service logs
Skills:
skill Manage skills (list / search / install / uninstall ...)
Knowledge:
knowledge View knowledge base stats and structure
Others:
help Show this help message
version Show version
```
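## Chat commands
In the Web console or in a conversation on any connected channel, you can enter commands that start with `/`:
| Command | Description |
| --- | --- |
| `/help` | Show command help |
| `/status` | View service status and configuration |
| `/config` | View or modify runtime configuration |
| `/skill` | Manage skills (install, uninstall, enable, disable, etc.) |
| `/knowledge` | View knowledge base statistics |
| `/knowledge list` | View the knowledge base directory structure |
| `/knowledge on\|off` | Enable or disable the knowledge base |
| `/context` | View context information for the current session |
| `/context clear` | Clear the current session context |
| `/logs` | View recent logs |
| `/version` | Show the version number |
<Tip>
In chat, service management commands such as `/start`, `/stop`, and `/restart` only prompt you to run them in the terminal, because they involve process operations.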
</Tip>
## Command availability
The following table shows whether each command is available in the terminal and in chat:
| Command | Terminal (`cow`) | Chat (`/`) |
| --- | :---: | :---: |
| help | ✓ | ✓ |
| version | ✓ | ✓ |
| status | ✓ | ✓ |
| logs | ✓ | ✓ |
| config | ✗ | ✓ |
| context | — | ✓ |
| knowledge (subcommands) | ✓ | ✓ |
| skill (subcommands) | ✓ | ✓ |
| start / stop / restart | ✓ | ✗ |
| update | ✓ | ✗ |
| install-browser | ✓ | ✗ |
<Note>
In the terminal, `context` only prompts you to use it in chat. `config` can only be changed in chat.
</Note>

docs/cli/process.mdx

@@ -0,0 +1,134 @@
---
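title: Process Management
description: Use the cow command to start, stop, restart, and update the CowAgent process
---
Process management commands control the lifecycle of the CowAgent background process. These commands are only available in the terminal.
## start
Start the CowAgent service. By default it runs as a background process and automatically tails the log output.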
```bash
cow start
```
**Options:**
| Option | Description |
| --- | --- |
| `-f`, `--foreground` | Run in the foreground instead of as a background daemon |
| `--no-logs` | Don't tail logs after starting |
## stop
Stop the running CowAgent service.
```bash
cow stop
```
## restart
Restart the CowAgent service (stop, then start).
```bash
cow restart
```
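**Options:**
| Option | Description |
| --- | --- |
| `--no-logs` | Don't tail logs after restarting |
## update
Update the code and restart the service. The following steps run automatically:
1. Pull the latest code (`git pull`)
2. Stop the current service
3. Update Python dependencies
4. Reinstall the CLI
5. Start the service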
```bash
cow update
```
<Warning>
If `git pull` fails (for example, because of uncommitted local changes), the update is aborted and the running service is not affected.
</Warning>
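The update sequence, including the early abort when `git pull` fails, can be sketched as follows. The specific commands (the `cow stop`/`cow start` calls and the pip invocations) are assumptions standing in for whatever `cow update` runs internally; the command runner is injectable so that the control flow can be tried without touching a real checkout.

```python
import subprocess
import sys
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def default_runner(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def update(run: Runner = default_runner) -> bool:
    """Pull, stop, reinstall, and restart; abort before stopping if `git pull` fails."""
    if run(["git", "pull"]) != 0:
        print("git pull failed: update aborted, the running service is untouched")
        return False
    steps = [
        ["cow", "stop"],                                                     # stop current service
        [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],  # update dependencies
        [sys.executable, "-m", "pip", "install", "-e", "."],                 # reinstall the CLI
        ["cow", "start"],                                                    # start the service
    ]
    for cmd in steps:
        if run(cmd) != 0:
            print("step failed: " + " ".join(cmd))
            return False
    return True
```

Pulling before stopping is what keeps a failed pull harmless: nothing has been shut down yet when the abort happens.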
## status
Check the CowAgent service status, including process information, version number, and the currently configured model and channel.
```bash
cow status
```
Example output:
```
🐮 CowAgent Status
Status: ● Running (PID: 12345)
Version: 2.0.4
Channel: web
Model: MiniMax-M2.5
Mode: agent
```
## logs
View service logs.
```bash
cow logs
```
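**Options:**
| Option | Description | Default |
| --- | --- | --- |
| `-f`, `--follow` | Continuously follow log output | No |
| `-n`, `--lines` | Show the last N lines | 50 |
Examples:
```bash
# View the last 100 lines of the log
cow logs -n 100
# Follow the log continuously
cow logs -f
```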
## install-browser
Install Playwright and the Chromium browser to enable the [browser tool](/tools/browser).
```bash
cow install-browser
```
<Tip>
Only required if you use the browser tool (web browsing, screenshots, etc.).
</Tip>
## run.sh compatibility
If the Cow CLI is not installed, you can also manage the service with the `run.sh` script:
| cow command | run.sh equivalent |
| --- | --- |
| `cow start` | `./run.sh start` |
| `cow stop` | `./run.sh stop` |
| `cow restart` | `./run.sh restart` |
| `cow update` | `./run.sh update` |
| `cow status` | `./run.sh status` |
| `cow logs` | `./run.sh logs` |
<Note>
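The `cow` command is recommended, as it offers a more concise syntax and richer features. It is installed automatically when you deploy with the one-click install script.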
</Note>

docs/cli/skill.mdx

@@ -0,0 +1,218 @@
---
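title: Skill Management
description: Install, uninstall, enable, disable, and manage skills with commands
---
Skill management commands install, query, and manage CowAgent skills. Use `/skill <subcommand>` in chat or `cow skill <subcommand>` in the terminal.
## list
List installed skills and their status.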
<CodeGroup>
```text 对话
/skill list
```
```bash 终端
cow skill list
```
</CodeGroup>
Example output:
```
📦 已安装的技能 (3/4)
✅ pptx
Use this skill any time a .pptx file is involved…
来源: cowhub
✅ skill-creator
Create, install, or update skills…
来源: builtin
⏸️ image-vision (已禁用)
图片理解和视觉分析
来源: builtin
```
**Browse the skill marketplace** (view every installable skill on the Hub):
<CodeGroup>
```text 对话
/skill list --remote
```
```bash 终端
cow skill list --remote
```
</CodeGroup>
**Options:**
| Option | Description | Default |
| --- | --- | --- |
| `--remote`, `-r` | Browse the remote Skill Hub skill list | No |
| `--page` | Page number for the remote list | 1 |
## search
Search for skills in the skill marketplace.
<CodeGroup>
```text 对话
/skill search pptx
```
```bash 终端
cow skill search pptx
```
</CodeGroup>
## install
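Install skills. A single `install` command installs skills from the **Cow Skill Hub, GitHub, ClawHub**, and any URL (zip archives, SKILL.md links) in one step, with no manual download or configuration.
**Install from the Cow Skill Hub (recommended):**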
<CodeGroup>
```text 对话
/skill install pptx
```
```bash 终端
cow skill install pptx
```
</CodeGroup>
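**Install from GitHub:**
<CodeGroup>
```text 对话
# Install every skill in the repository (subdirectories containing SKILL.md are discovered automatically)
/skill install larksuite/cli
# Specify a subdirectory to install a single skill
/skill install https://github.com/larksuite/cli/tree/main/skills/lark-im
# Use # to specify a subdirectory
/skill install larksuite/cli#skills/lark-minutes
```
```bash 终端
# Install every skill in the repository (subdirectories containing SKILL.md are discovered automatically)
cow skill install larksuite/cli
# Specify a subdirectory to install a single skill
cow skill install https://github.com/larksuite/cli/tree/main/skills/lark-im
# Use # to specify a subdirectory
cow skill install larksuite/cli#skills/lark-minutes
```
</CodeGroup>
Both full GitHub URLs and the `owner/repo` shorthand are supported. For a mono-repo (one repository containing multiple skills), omitting the subdirectory discovers and installs all of its skills in bulk, while specifying a subdirectory installs only the skill in that directory.
**Install from ClawHub:**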
<CodeGroup>
```text 对话
/skill install clawhub:baidu-search
```
```bash 终端
cow skill install clawhub:baidu-search
```
</CodeGroup>
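**Install from a URL:**
<CodeGroup>
```text 对话
# Install from a zip archive (one or several skills)
/skill install https://cdn.link-ai.tech/skills/pptx.zip
# Install from a SKILL.md link
/skill install https://example.com/path/to/SKILL.md
```
```bash 终端
# Install from a zip archive (one or several skills)
cow skill install https://cdn.link-ai.tech/skills/pptx.zip
# Install from a SKILL.md link
cow skill install https://example.com/path/to/SKILL.md
```
</CodeGroup>
Installing from a zip or tar.gz archive URL is supported: after extraction, directories containing `SKILL.md` are scanned automatically, so one or several skills can be installed at once. You can also install directly from a link to a `SKILL.md` file, in which case the skill's name and description are parsed automatically.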
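To make the accepted argument forms concrete, the sketch below classifies an `install` argument by source. The precedence order and the regular expressions are assumptions inferred from the examples on this page rather than the CLI's actual parser; for instance, it treats any other `http(s)` URL as a generic `url` source and assumes a single-segment branch name in GitHub `tree` URLs.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class SkillSource:
    kind: str                     # "cowhub", "github", "clawhub", or "url"
    target: str                   # hub name, owner/repo, or full URL
    subdir: Optional[str] = None  # optional subdirectory inside a GitHub repo


# https://github.com/<owner>/<repo>[/tree/<branch>/<subdir>]
_GITHUB_TREE = re.compile(r"^https://github\.com/([^/]+)/([^/]+)(?:/tree/[^/]+/(.+))?/?$")
# <owner>/<repo>[#<subdir>]
_OWNER_REPO = re.compile(r"^([\w.-]+)/([\w.-]+)(?:#(.+))?$")


def parse_install_arg(arg: str) -> SkillSource:
    """Classify an install argument (hypothetical rules mirroring the examples above)."""
    if arg.startswith("clawhub:"):
        return SkillSource("clawhub", arg[len("clawhub:"):])
    m = _GITHUB_TREE.match(arg)
    if m:
        owner, repo, subdir = m.groups()
        return SkillSource("github", f"{owner}/{repo}", subdir)
    if urlparse(arg).scheme in ("http", "https"):
        return SkillSource("url", arg)  # zip/tar.gz archive or SKILL.md link
    m = _OWNER_REPO.match(arg)
    if m:
        owner, repo, subdir = m.groups()
        return SkillSource("github", f"{owner}/{repo}", subdir)
    return SkillSource("cowhub", arg)  # bare name: look it up on the Skill Hub


print(parse_install_arg("larksuite/cli#skills/lark-minutes"))
```

Checking the explicit prefixes and full URLs before the `owner/repo` shorthand avoids misreading a URL path as a repository name.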
After a successful installation, the skill's name, description, and source are shown, for example:
```
✅ baidu-search
百度搜索:使用百度搜索引擎检索信息…
来源: clawhub
```
## uninstall
Uninstall an installed skill.
<CodeGroup>
```text 对话
/skill uninstall pptx
```
```bash 终端
cow skill uninstall pptx
```
</CodeGroup>
<Warning>
Uninstalling deletes all files in the skill's directory. This cannot be undone.
</Warning>
## enable / disable
Enable or disable a skill. A disabled skill is not called by the Agent.
<CodeGroup>
```text 对话
/skill enable pptx
/skill disable pptx
```
```bash 终端
cow skill enable pptx
cow skill disable pptx
```
</CodeGroup>
## info
View details of an installed skill, including a preview of its `SKILL.md`.
<CodeGroup>
```text 对话
/skill info pptx
```
```bash 终端
cow skill info pptx
```
</CodeGroup>
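## Skill sources
Installed skills record their source, which you can see with `/skill list`:
| Source | Description |
| --- | --- |
| `builtin` | Built-in project skill |
| `cowhub` | Installed from the CowAgent Skill Hub |
| `github` | Installed directly from a GitHub URL |
| `clawhub` | Installed from ClawHub |
| `url` | Installed from a SKILL.md URL |
| `local` | Skill created locally |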

@@ -106,14 +106,17 @@
"tools/bash",
"tools/send",
"tools/memory",
"tools/env-config"
"tools/env-config",
"tools/web-fetch",
"tools/scheduler"
]
},
{
"group": "可选工具",
"pages": [
"tools/web-search",
"tools/scheduler"
"tools/vision",
"tools/browser"
]
}
]
@@ -125,15 +128,9 @@
"group": "技能系统",
"pages": [
"skills/index",
"skills/skill-creator"
]
},
{
"group": "内置技能",
"pages": [
"skills/image-vision",
"skills/linkai-agent",
"skills/web-fetch"
"skills/install",
"skills/create",
"skills/hub"
]
}
]
@@ -144,7 +141,19 @@
{
"group": "记忆系统",
"pages": [
"memory"
"memory/index",
"memory/context"
]
}
]
},
{
"tab": "知识",
"groups": [
{
"group": "知识库",
"pages": [
"knowledge/index"
]
}
]
@@ -167,6 +176,20 @@
}
]
},
{
"tab": "命令",
"groups": [
{
"group": "命令系统",
"pages": [
"cli/index",
"cli/process",
"cli/skill",
"cli/general"
]
}
]
},
{
"tab": "版本",
"groups": [
@@ -174,6 +197,7 @@
"group": "发布记录",
"pages": [
"releases/overview",
"releases/v2.0.5",
"releases/v2.0.4",
"releases/v2.0.3",
"releases/v2.0.2",
@@ -254,14 +278,17 @@
"en/tools/bash",
"en/tools/send",
"en/tools/memory",
"en/tools/env-config"
"en/tools/env-config",
"en/tools/web-fetch",
"en/tools/scheduler"
]
},
{
"group": "Optional Tools",
"pages": [
"en/tools/web-search",
"en/tools/scheduler"
"en/tools/vision",
"en/tools/browser"
]
}
]
@@ -273,15 +300,9 @@
"group": "Skills System",
"pages": [
"en/skills/index",
"en/skills/skill-creator"
]
},
{
"group": "Built-in Skills",
"pages": [
"en/skills/image-vision",
"en/skills/linkai-agent",
"en/skills/web-fetch"
"en/skills/install",
"en/skills/create",
"en/skills/hub"
]
}
]
@@ -292,7 +313,19 @@
{
"group": "Memory System",
"pages": [
"en/memory"
"en/memory/index",
"en/memory/context"
]
}
]
},
{
"tab": "Knowledge",
"groups": [
{
"group": "Knowledge Base",
"pages": [
"en/knowledge/index"
]
}
]
@@ -315,6 +348,20 @@
}
]
},
{
"tab": "CLI",
"groups": [
{
"group": "Command System",
"pages": [
"en/cli/index",
"en/cli/process",
"en/cli/skill",
"en/cli/general"
]
}
]
},
{
"tab": "Releases",
"groups": [
@@ -322,6 +369,7 @@
"group": "Release Notes",
"pages": [
"en/releases/overview",
"en/releases/v2.0.5",
"en/releases/v2.0.4",
"en/releases/v2.0.2",
"en/releases/v2.0.1",
@@ -403,14 +451,16 @@
"ja/tools/send",
"ja/tools/memory",
"ja/tools/env-config",
"ja/tools/browser"
"ja/tools/web-fetch",
"ja/tools/scheduler"
]
},
{
"group": "オプションツール",
"pages": [
"ja/tools/web-search",
"ja/tools/scheduler"
"ja/tools/vision",
"ja/tools/browser"
]
}
]
@@ -422,15 +472,9 @@
"group": "スキルシステム",
"pages": [
"ja/skills/index",
"ja/skills/skill-creator"
]
},
{
"group": "内蔵スキル",
"pages": [
"ja/skills/image-vision",
"ja/skills/linkai-agent",
"ja/skills/web-fetch"
"ja/skills/install",
"ja/skills/create",
"ja/skills/hub"
]
}
]
@@ -441,7 +485,19 @@
{
"group": "メモリシステム",
"pages": [
"ja/memory"
"ja/memory/index",
"ja/memory/context"
]
}
]
},
{
"tab": "ナレッジ",
"groups": [
{
"group": "ナレッジベース",
"pages": [
"ja/knowledge/index"
]
}
]
@@ -464,6 +520,20 @@
}
]
},
{
"tab": "CLI",
"groups": [
{
"group": "コマンドシステム",
"pages": [
"ja/cli/index",
"ja/cli/process",
"ja/cli/skill",
"ja/cli/general"
]
}
]
},
{
"tab": "リリース",
"groups": [
@@ -471,6 +541,7 @@
"group": "リリースノート",
"pages": [
"ja/releases/overview",
"ja/releases/v2.0.5",
"ja/releases/v2.0.4",
"ja/releases/v2.0.3",
"ja/releases/v2.0.2",

@@ -13,6 +13,7 @@
<a href="https://cowagent.ai/">🌐 Website</a> &nbsp;·&nbsp;
<a href="https://docs.cowagent.ai/en/intro/index">📖 Docs</a> &nbsp;·&nbsp;
<a href="https://docs.cowagent.ai/en/guide/quick-start">🚀 Quick Start</a> &nbsp;·&nbsp;
<a href="https://skills.cowagent.ai/">🧩 Skill Hub</a> &nbsp;·&nbsp;
<a href="https://link-ai.tech/cowagent/create">☁️ Try Online</a>
</p>
@@ -20,13 +21,14 @@
> CowAgent is both an out-of-the-box AI super assistant and a highly extensible Agent framework. You can extend it with new model interfaces, channels, built-in tools, and the Skills system to flexibly implement various customization needs.
- **Autonomous Task Planning**: Understands complex tasks and autonomously plans execution, continuously thinking and invoking tools until goals are achieved. Supports accessing files, terminal, browser, schedulers, and other system resources via tools.
- **Autonomous Task Planning**: Understands complex tasks and autonomously plans execution, continuously thinking and invoking tools until goals are achieved.
- **Long-term Memory**: Automatically persists conversation memory to local files and databases, including core memory and daily memory, with keyword and vector retrieval support.
- **Skills System**: Implements a Skills creation and execution engine with multiple built-in skills, and supports custom Skills development through natural language conversation.
- **Skills System**: Implements a Skills creation and execution engine, supports installing skills from [Skill Hub](https://skills.cowagent.ai), GitHub, etc., or creating custom Skills through conversation.
- **Tool System**: Built-in tools for file I/O, terminal execution, browser automation, scheduled tasks, messaging, and more, all invoked autonomously by the Agent.
- **CLI System**: Provides terminal commands and in-chat commands for process management, skill installation, configuration, and more.
- **Multimodal Messages**: Supports parsing, processing, generating, and sending text, images, voice, files, and other message types.
- **Multiple Model Support**: Supports OpenAI, Claude, Gemini, DeepSeek, MiniMax, GLM, Qwen, Kimi, Doubao, and other mainstream model providers.
- **Multi-platform Deployment**: Runs on local computers or servers, integrable into WeChat, Web, Feishu, DingTalk, WeChat Official Account, and WeCom applications.
- **Knowledge Base**: Integrates enterprise knowledge base capabilities via the [LinkAI](https://link-ai.tech) platform.
## Disclaimer
@@ -40,6 +42,8 @@ Try online (no deployment needed): [CowAgent](https://link-ai.tech/cowagent/crea
## Changelog
> **2026.04.01:** [v2.0.5](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.5) — Cow CLI, Skill Hub open source, Browser tool, WeCom Bot QR scan, and more.
> **2026.02.27:** [v2.0.2](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.2) — Web console overhaul (streaming chat, model/skill/memory/channel/scheduler/log management), multi-channel concurrent running, session persistence, new models including Gemini 3.1 Pro / Claude 4.6 Sonnet / Qwen3.5 Plus.
> **2026.02.13:** [v2.0.1](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.1) — Built-in Web Search tool, smart context trimming, runtime info dynamic update, Windows compatibility, fixes for scheduler memory loss, Feishu connection issues, and more.
@@ -60,13 +64,19 @@ Full changelog: [Release Notes](https://docs.cowagent.ai/en/releases/overview)
The project provides a one-click script for installation, configuration, startup, and management:
**Linux / macOS:**
```bash
bash <(curl -fsSL https://cdn.link-ai.tech/code/cow/run.sh)
```
**Windows (PowerShell):**
```powershell
irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex
```
After running, the Web service starts by default. Access `http://localhost:9899/chat` to chat.
Script usage: [One-click Install](https://docs.cowagent.ai/en/guide/quick-start)
Script usage: [One-click Install](https://docs.cowagent.ai/en/guide/quick-start). After installation, you can also use `cow start`, `cow stop`, and other [CLI commands](https://docs.cowagent.ai/en/cli/index) to manage the service.
### Manual Installation
@@ -84,7 +94,25 @@ pip3 install -r requirements.txt
pip3 install -r requirements-optional.txt # optional but recommended
```
**3. Configure**
**3. Install Cow CLI (recommended)**
```bash
pip3 install -e .
```
After installation, use `cow` commands to manage the service (start, stop, update, etc.) and skills. See [Command Docs](https://docs.cowagent.ai/en/cli/index).
**4. Install browser (optional)**
If you need the Agent to operate a browser (visit web pages, fill forms, etc.):
```bash
cow install-browser
```
This auto-installs `playwright` and Chromium. See [Browser Tool Docs](https://docs.cowagent.ai/en/tools/browser).
**5. Configure**
```bash
cp config-template.json config.json
@@ -92,13 +120,25 @@ cp config-template.json config.json
Fill in your model API key and channel type in `config.json`. See the [configuration docs](https://docs.cowagent.ai/en/guide/manual-install) for details.
**4. Run**
**6. Run**
```bash
python3 app.py
cow start # recommended, requires Cow CLI
python3 app.py # or run directly
```
For server background run:
For server deployment, use `cow` commands to manage the service:
```bash
cow start # start in background
cow stop # stop service
cow restart # restart service
cow status # check running status
cow logs # view logs
cow update # pull latest code and restart
```
Or use the traditional way:
```bash
nohup python3 app.py & tail -f nohup.out
@@ -125,7 +165,7 @@ Supports mainstream model providers. Recommended models for Agent mode:
| GLM | `glm-5-turbo` |
| Kimi | `kimi-k2.5` |
| Doubao | `doubao-seed-2-0-code-preview-260215` |
| Qwen | `qwen3.5-plus` |
| Qwen | `qwen3.6-plus` |
| Claude | `claude-sonnet-4-6` |
| Gemini | `gemini-3.1-pro-preview` |
| OpenAI | `gpt-5.4` |
@@ -186,6 +226,7 @@ Multiple channels can be enabled simultaneously, separated by commas: `"channel_
## 🔗 Related Projects
- [Cow Skill Hub](https://github.com/zhayujie/cow-skill-hub): Open skill marketplace for AI Agents — browse, search, install, and publish skills for CowAgent, OpenClaw, Claude Code, and more.
- [bot-on-anything](https://github.com/zhayujie/bot-on-anything): Lightweight and highly extensible LLM application framework supporting Slack, Telegram, Discord, Gmail, and more.
- [AgentMesh](https://github.com/MinimalFuture/AgentMesh): Open-source Multi-Agent framework for complex problem solving through agent team collaboration.
@@ -195,7 +236,7 @@ FAQs: <https://github.com/zhayujie/chatgpt-on-wechat/wiki/FAQs>
## 🛠️ Contributing
Welcome to add new channels, referring to the [Feishu channel](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/channel/feishu/feishu_channel.py) as an example. Also welcome to contribute new Skills, referring to the [Skill Creator docs](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/skills/skill-creator/SKILL.md).
Welcome to add new channels, referring to the [Feishu channel](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/channel/feishu/feishu_channel.py) as an example. Also welcome to contribute new Skills, see the [Skill Creation docs](https://docs.cowagent.ai/en/skills/create), or submit to [Skill Hub](https://skills.cowagent.ai/submit).
## ✉ Contact

docs/en/cli/general.mdx

@@ -0,0 +1,126 @@
---
title: General Commands
description: View status, manage config, and control context with commonly used commands
---
The following commands can be used in chat with the `/` prefix or in the terminal with the `cow` prefix (some are chat-only).
<Tip>
In the Web console, typing `/` brings up an autocomplete menu with keyboard navigation and Tab completion.
</Tip>
## help
Show help information for all available commands.
```text
/help
```
## status
View current session and service status, including process info, model configuration, message count, and loaded skills.
```text
/status
```
## config
View or modify runtime configuration. Changes take effect immediately without restarting.
**View all configurable items:**
```text
/config
```
**View a single item:**
```text
/config model
```
**Modify a config item:**
```text
/config model deepseek-chat
```
**Configurable items:**
| Item | Description | Example |
| --- | --- | --- |
| `model` | AI model name | `deepseek-chat` |
| `agent_max_context_tokens` | Max context tokens | `40000` |
| `agent_max_context_turns` | Max context memory turns | `30` |
| `agent_max_steps` | Max decision steps per task | `15` |
<Note>
When changing `model`, the system automatically matches the corresponding model API. Configuration is persisted to `config.json`.
</Note>
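One way to picture the allow-list behavior in the table above: reject unknown keys, convert numeric values, and write the result back to `config.json`. `EDITABLE` and `set_config` are hypothetical names, and this sketch does not attempt the runtime model switch that the real command performs.

```python
import json
from pathlib import Path

# Keys /config may change, mapped to the type each value is converted to
# (assumed from the table above).
EDITABLE = {
    "model": str,
    "agent_max_context_tokens": int,
    "agent_max_context_turns": int,
    "agent_max_steps": int,
}


def set_config(path: Path, key: str, raw_value: str) -> dict:
    """Validate a /config change, apply it, and persist it to config.json."""
    if key not in EDITABLE:
        raise KeyError(f"{key!r} cannot be changed at runtime")
    value = EDITABLE[key](raw_value)  # int("abc") raises ValueError for numeric keys
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    config[key] = value
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8")
    return config
```

Validating before writing means a typo such as `/config agent_max_steps abc` leaves `config.json` untouched.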
## context
View current session context statistics, including message count and content length.
```text
/context
```
**Clear current session context:**
```text
/context clear
```
<Tip>
Clearing context makes the Agent "forget" previous conversation, useful for switching topics or freeing context space.
</Tip>
## logs
View recent service logs. Shows the last 20 lines by default, up to 50.
```text
/logs
```
**Specify line count:**
```text
/logs 50
```
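The default-and-cap rule can be expressed as a small argument parser plus a memory-friendly tail helper. Falling back to the default for non-numeric input and clamping non-positive values to 1 are assumptions; the page only specifies the default (20) and the maximum (50). `parse_line_count` and `tail` are hypothetical names.

```python
from collections import deque
from pathlib import Path
from typing import Optional

DEFAULT_LINES = 20
MAX_LINES = 50


def parse_line_count(arg: Optional[str]) -> int:
    """Turn the optional /logs argument into a line count in [1, MAX_LINES]."""
    if not arg:
        return DEFAULT_LINES
    try:
        n = int(arg)
    except ValueError:
        return DEFAULT_LINES
    return max(1, min(n, MAX_LINES))


def tail(path: Path, n: int) -> list[str]:
    """Return the last n lines of a file; deque(maxlen=n) keeps only n lines in memory."""
    with path.open(encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```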
## knowledge
View and manage the personal knowledge base. Shows statistics by default.
```text
/knowledge
```
**View directory structure:**
```text
/knowledge list
```
**Enable / disable knowledge base:**
```text
/knowledge on
/knowledge off
```
<Note>
In the terminal CLI, `cow knowledge` and `cow knowledge list` are available, but `on|off` can only be used in chat, because the change has to take effect in the running service.
</Note>
## version
Show the current CowAgent version.
```text
/version
```

docs/en/cli/index.mdx

@@ -0,0 +1,91 @@
---
title: Commands Overview
description: CowAgent command system — Terminal CLI and chat commands
---
CowAgent provides two ways to interact via commands:
- **Terminal CLI** — Run `cow <command>` in your system terminal for service management, skill management, and other operations
- **Chat Commands** — Type `/<command>` or `cow <command>` in any conversation to check status, manage skills, adjust configuration, etc.
## Cow CLI
After deploying with the one-click install script, the `cow` command is automatically available. For manual installations, run:
```bash
pip install -e .
```
Then use the `cow` command from anywhere:
```bash
cow help
```
Example output:
```
🐮 CowAgent CLI
Usage: cow <command>
Service:
start Start the CowAgent service
stop Stop the CowAgent service
restart Restart the CowAgent service
update Update code and restart service
status Show service status
logs View service logs
Skills:
skill Manage skills (list / search / install / uninstall ...)
Knowledge:
knowledge View knowledge base stats and structure
Others:
help Show this help message
version Show version
```
## Chat Commands
In the Web console or any connected channel, type `/` to see command suggestions. Supported commands:
| Command | Description |
| --- | --- |
| `/help` | Show command help |
| `/status` | View service status and configuration |
| `/config` | View or modify runtime configuration |
| `/skill` | Manage skills (install, uninstall, enable, disable, etc.) |
| `/knowledge` | View knowledge base statistics |
| `/knowledge list` | View knowledge base directory structure |
| `/knowledge on\|off` | Enable or disable knowledge base |
| `/context` | View current session context info |
| `/context clear` | Clear current session context |
| `/logs` | View recent logs |
| `/version` | Show version number |
<Tip>
Service management commands like `/start`, `/stop`, `/restart` will prompt you to use them in the terminal instead, as they involve process operations.
</Tip>
## Command Availability
| Command | Terminal (`cow`) | Chat (`/`) |
| --- | :---: | :---: |
| help | ✓ | ✓ |
| version | ✓ | ✓ |
| status | ✓ | ✓ |
| logs | ✓ | ✓ |
| config | ✗ | ✓ |
| context | — | ✓ |
| knowledge (subcommands) | ✓ | ✓ |
| skill (subcommands) | ✓ | ✓ |
| start / stop / restart | ✓ | ✗ |
| update | ✓ | ✗ |
| install-browser | ✓ | ✗ |
<Note>
`context` only shows a hint in the terminal to use it in chat. `config` is only available in chat.
</Note>

docs/en/cli/process.mdx

@@ -0,0 +1,123 @@
---
title: Process Management
description: Manage CowAgent process lifecycle with cow commands
---
Process management commands control the CowAgent background process. These commands are only available in the terminal.
## start
Start the CowAgent service. Runs as a background daemon by default and automatically tails logs.
```bash
cow start
```
**Options:**
| Option | Description |
| --- | --- |
| `-f`, `--foreground` | Run in foreground, not as a background daemon |
| `--no-logs` | Don't tail logs after starting |
## stop
Stop the running CowAgent service.
```bash
cow stop
```
## restart
Restart the CowAgent service (stop then start).
```bash
cow restart
```
**Options:**
| Option | Description |
| --- | --- |
| `--no-logs` | Don't tail logs after restart |
## update
Update code and restart the service. Automatically performs:
1. Pull latest code (`git pull`)
2. Stop current service
3. Update Python dependencies
4. Reinstall CLI
5. Start service
```bash
cow update
```
<Warning>
If `git pull` fails (e.g., uncommitted local changes), the update aborts and the service remains unaffected.
</Warning>
## status
Check CowAgent service status, including process info, version, and current model/channel configuration.
```bash
cow status
```
## logs
View service logs.
```bash
cow logs
```
**Options:**
| Option | Description | Default |
| --- | --- | --- |
| `-f`, `--follow` | Continuously tail log output | No |
| `-n`, `--lines` | Show last N lines | 50 |
Examples:
```bash
# View last 100 lines
cow logs -n 100
# Continuously tail logs
cow logs -f
```
## install-browser
Install Playwright and Chromium browser for the [browser tool](/en/tools/browser).
```bash
cow install-browser
```
<Tip>
Only needed when using browser tools (web browsing, screenshots, etc.).
</Tip>
## run.sh Compatibility
If Cow CLI is not installed, you can use `run.sh` to manage the service:
| cow command | run.sh equivalent |
| --- | --- |
| `cow start` | `./run.sh start` |
| `cow stop` | `./run.sh stop` |
| `cow restart` | `./run.sh restart` |
| `cow update` | `./run.sh update` |
| `cow status` | `./run.sh status` |
| `cow logs` | `./run.sh logs` |
<Note>
The `cow` command is recommended — it provides cleaner syntax and richer features. It is automatically installed via the one-click install script.
</Note>

docs/en/cli/skill.mdx Normal file

@@ -0,0 +1,192 @@
---
title: Skill Management
description: Install, uninstall, enable, disable, and manage skills via commands
---
Skill management commands are used to install, query, and manage CowAgent skills. Use `/skill <subcommand>` in chat or `cow skill <subcommand>` in the terminal.
## list
List installed skills and their status.
<CodeGroup>
```text Chat
/skill list
```
```bash Terminal
cow skill list
```
</CodeGroup>
**Browse the Skill Hub** (view all available skills):
<CodeGroup>
```text Chat
/skill list --remote
```
```bash Terminal
cow skill list --remote
```
</CodeGroup>
**Options:**
| Option | Description | Default |
| --- | --- | --- |
| `--remote`, `-r` | Browse Skill Hub remote skill list | No |
| `--page` | Page number for remote listing | 1 |
## search
Search for skills on the Skill Hub.
<CodeGroup>
```text Chat
/skill search pptx
```
```bash Terminal
cow skill search pptx
```
</CodeGroup>
## install
Install skills with a single `install` command from Cow Skill Hub, GitHub, ClawHub, or any URL (zip archives, SKILL.md links) — no manual download or configuration required.
**From Skill Hub (recommended):**
<CodeGroup>
```text Chat
/skill install pptx
```
```bash Terminal
cow skill install pptx
```
</CodeGroup>
**From GitHub:**
<CodeGroup>
```text Chat
# Install all skills in a repo (auto-discovers subdirectories with SKILL.md)
/skill install larksuite/cli
# Specify a subdirectory to install a single skill
/skill install https://github.com/larksuite/cli/tree/main/skills/lark-im
# Use # to specify a subdirectory
/skill install larksuite/cli#skills/lark-minutes
```
```bash Terminal
# Install all skills in a repo (auto-discovers subdirectories with SKILL.md)
cow skill install larksuite/cli
# Specify a subdirectory to install a single skill
cow skill install https://github.com/larksuite/cli/tree/main/skills/lark-im
# Use # to specify a subdirectory
cow skill install larksuite/cli#skills/lark-minutes
```
</CodeGroup>
Supports full GitHub URLs and `owner/repo` shorthand. For mono-repos (multiple skills in one repository), omitting the subdirectory auto-discovers and batch-installs all skills; specifying a subdirectory installs only that skill.
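The three GitHub forms shown above (`owner/repo`, a full `/tree/<branch>/<subdir>` URL, and `owner/repo#subdir`) can be told apart with a couple of patterns. This is a minimal sketch, not CowAgent's actual parser: `parse_github_source` and `GitHubSource` are hypothetical names, and anything that matches neither pattern (for example `clawhub:` specs or bare Skill Hub names) simply returns `None`.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class GitHubSource:
    owner: str
    repo: str
    subdir: Optional[str]  # None means "auto-discover every skill in the repo"


# Full URL, optionally pointing at a subdirectory via /tree/<branch>/<path>
_TREE_URL = re.compile(
    r"^https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/#]+)"
    r"(?:/tree/[^/]+/(?P<subdir>.+?))?/?$"
)
# Shorthand owner/repo, optionally followed by #<subdir>
_SHORTHAND = re.compile(r"^(?P<owner>[\w.-]+)/(?P<repo>[\w.-]+)(?:#(?P<subdir>.+))?$")


def parse_github_source(spec: str) -> Optional[GitHubSource]:
    """Hypothetical helper: split an install spec into owner, repo and optional subdir."""
    for pattern in (_TREE_URL, _SHORTHAND):
        match = pattern.match(spec)
        if match:
            subdir = match.group("subdir")
            return GitHubSource(
                match.group("owner"), match.group("repo"), subdir.strip("/") if subdir else None
            )
    return None  # not a GitHub spec; another installer would handle it
```

A `None` subdir is what would trigger the mono-repo auto-discovery described above; a non-empty subdir restricts the install to that one skill.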
**From ClawHub:**
<CodeGroup>
```text Chat
/skill install clawhub:baidu-search
```
```bash Terminal
cow skill install clawhub:baidu-search
```
</CodeGroup>
**From URL:**
<CodeGroup>
```text Chat
# Install from a zip archive (single or batch)
/skill install https://cdn.link-ai.tech/skills/pptx.zip
# Install from a SKILL.md link
/skill install https://example.com/path/to/SKILL.md
```
```bash Terminal
# Install from a zip archive (single or batch)
cow skill install https://cdn.link-ai.tech/skills/pptx.zip
# Install from a SKILL.md link
cow skill install https://example.com/path/to/SKILL.md
```
</CodeGroup>
Archive URLs (zip / tar.gz) are also supported: the installer extracts the archive and discovers every directory containing `SKILL.md`, so one archive can hold a single skill or a batch. You can also install directly from a `SKILL.md` file URL, in which case the skill name and description are parsed from the file.
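Discovering the skill directories inside an archive amounts to finding every entry named `SKILL.md` and taking its parent directory. The sketch below covers zip files only and uses hypothetical function names; it is an illustration under those assumptions, not the installer's real code (tar.gz would need `tarfile` instead).

```python
import zipfile
from pathlib import PurePosixPath
from typing import Iterable, List


def skill_dirs_from_names(names: Iterable[str]) -> List[str]:
    """Archive-relative directories that directly contain a SKILL.md file."""
    return sorted(
        {str(PurePosixPath(name).parent) for name in names if PurePosixPath(name).name == "SKILL.md"}
    )


def discover_skill_dirs(archive_path: str) -> List[str]:
    """Hypothetical helper: list skill directories in a downloaded zip archive."""
    with zipfile.ZipFile(archive_path) as zf:
        return skill_dirs_from_names(zf.namelist())
```

One returned directory corresponds to a single install; several correspond to a batch install.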
## uninstall
Uninstall an installed skill.
<CodeGroup>
```text Chat
/skill uninstall pptx
```
```bash Terminal
cow skill uninstall pptx
```
</CodeGroup>
<Warning>
Uninstalling deletes all files in the skill directory. This action cannot be undone.
</Warning>
## enable / disable
Enable or disable a skill. Disabled skills will not be invoked by the Agent.
<CodeGroup>
```text Chat
/skill enable pptx
/skill disable pptx
```
```bash Terminal
cow skill enable pptx
cow skill disable pptx
```
</CodeGroup>
## info
View details of an installed skill, including a preview of its `SKILL.md`.
<CodeGroup>
```text Chat
/skill info pptx
```
```bash Terminal
cow skill info pptx
```
</CodeGroup>
## Skill Sources
Each installed skill records its origin, which is shown by `/skill list`:
| Source | Description |
| --- | --- |
| `builtin` | Built-in project skills |
| `cowhub` | Installed from CowAgent Skill Hub |
| `github` | Installed directly from a GitHub URL |
| `clawhub` | Installed from ClawHub |
| `url` | Installed from a SKILL.md URL |
| `local` | Locally created skills |


@@ -30,7 +30,25 @@ Optional dependencies (recommended):
pip3 install -r requirements-optional.txt
```
### 3. Configure
### 3. Install Cow CLI
Install the command-line tool for managing services and skills:
```bash
pip3 install -e .
```
Then use the `cow` command:
```bash
cow help
```
<Note>
This step is recommended. After installation you can use `cow start`, `cow stop`, and `cow update` to manage the service, and `cow skill` to manage skills. Without the CLI, you can run the service with `./run.sh` or `python3 app.py`.
</Note>
### 4. Configure
Copy the config template and edit:
@@ -40,22 +58,32 @@ cp config-template.json config.json
Fill in model API keys, channel type, and other settings in `config.json`. See the [model docs](/en/models/index) for details.
### 4. Run
### 5. Run
**Local run:**
**Using Cow CLI (recommended):**
```bash
cow start
```
**Or run locally in foreground:**
```bash
python3 app.py
```
By default, the Web service starts. Access `http://localhost:9899/chat` to chat.
By default, the Web console starts. Access `http://localhost:9899` to chat.
**Background run on server:**
**Background run on server (without CLI):**
```bash
nohup python3 app.py & tail -f nohup.out
```
<Tip>
If deploying on a server, open port `9899` in your firewall or security group to access the Web console. It's recommended to restrict access to specific IPs for security.
</Tip>
## Docker Deployment
Docker deployment does not require cloning source code or installing dependencies. For Agent mode, source deployment is recommended for broader system access.
@@ -84,6 +112,10 @@ sudo docker compose up -d
sudo docker logs -f chatgpt-on-wechat
```
<Tip>
If deploying on a server, open port `9899` in your firewall or security group to access the Web console. It's recommended to restrict access to specific IPs for security.
</Tip>
## Core Configuration
```json


@@ -9,31 +9,46 @@ Supports Linux, macOS, and Windows. Requires Python 3.7-3.12 (3.9 recommended).
## Install Command
```bash
bash <(curl -fsSL https://cdn.link-ai.tech/code/cow/run.sh)
```
<Tabs>
<Tab title="Linux / macOS">
```bash
bash <(curl -fsSL https://cdn.link-ai.tech/code/cow/run.sh)
```
</Tab>
<Tab title="Windows (PowerShell)">
```powershell
irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex
```
</Tab>
</Tabs>
The script automatically performs these steps:
1. Check Python environment (requires Python 3.7+)
2. Install required tools (git, curl, etc.)
3. Clone project to `~/chatgpt-on-wechat`
4. Install Python dependencies
4. Install Python dependencies and Cow CLI
5. Guided configuration for AI model and channel
6. Start service
By default, the Web service starts after installation. Access `http://localhost:9899/chat` to begin chatting.
By default, the Web console starts after installation. Access `http://localhost:9899` to begin chatting.
## Management Commands
After installation, use these commands to manage the service:
After installation, use the `cow` command to manage the service:
| Command | Description |
| --- | --- |
| `./run.sh start` | Start service |
| `./run.sh stop` | Stop service |
| `./run.sh restart` | Restart service |
| `./run.sh status` | Check run status |
| `./run.sh logs` | View real-time logs |
| `./run.sh config` | Reconfigure |
| `./run.sh update` | Update project code |
| `cow start` | Start service |
| `cow stop` | Stop service |
| `cow restart` | Restart service |
| `cow status` | Check run status |
| `cow logs` | View real-time logs |
| `cow update` | Update code and restart |
| `cow install-browser` | Install browser tool dependencies |
See the [Commands documentation](/en/cli/index) for more details.
<Note>
If the `cow` command is not available, you can use `./run.sh <command>` (Linux/macOS) or `.\scripts\run.ps1 <command>` (Windows) as a fallback. Both are functionally equivalent.
</Note>


@@ -11,14 +11,16 @@ CowAgent's architecture consists of the following core modules:
<img src="https://cdn.link-ai.tech/doc/68ef7b212c6f791e0e74314b912149f9-sz_5847990.png" alt="CowAgent Architecture" />
### Core Modules
| Module | Description |
| --- | --- |
| **Channels** | Message channel layer for receiving and sending messages. Supports Web, Feishu, DingTalk, WeCom, WeChat Official Account, and more |
| **Agent Core** | Agent engine including task planning, memory system, and skills engine |
| **Tools** | Tool layer for Agent to access OS resources. 10+ built-in tools |
| **Models** | Model layer with unified access to mainstream LLMs |
| **Plan** | Understands user intent, decomposes complex tasks into multi-step plans, and iteratively invokes tools until the goal is achieved |
| **Memory** | Automatically persists important information as core memory and daily memory, with hybrid keyword and vector retrieval for cross-session context continuity |
| **Knowledge** | Organizes structured knowledge by topic. The Agent autonomously distills valuable information into Markdown pages, maintaining indexes and cross-references to build a growing knowledge network |
| **Tools** | Core capability for Agent to access OS resources. 10+ built-in tools including file read/write, terminal, browser, scheduler, memory search, web search, and more |
| **Skills** | Loads and manages Skills. Supports one-click installation from Skill Hub, GitHub, and more, or custom skill creation through conversation |
| **Models** | Model layer with unified access to OpenAI, Claude, Gemini, DeepSeek, MiniMax, GLM, Qwen, and other mainstream LLMs |
| **Channels** | Message channel layer for receiving and sending messages. Supports Web console, WeChat, Feishu, DingTalk, WeCom, WeChat Official Account, and more with a unified protocol |
| **CLI** | Command-line system providing terminal commands (`cow`) and chat commands (`/`) for process management, skill installation, configuration, knowledge base management, and more |
## Agent Mode Workflow
@@ -28,7 +30,7 @@ When Agent mode is enabled, CowAgent runs as an autonomous agent with the follow
2. **Understand Intent** — Analyze task requirements and context
3. **Plan Task** — Break complex tasks into multiple steps
4. **Invoke Tools** — Select and execute appropriate tools for each step
5. **Update Memory** — Store important information in long-term memory
5. **Update Memory & Knowledge** — Store important information in long-term memory and organize structured knowledge into the knowledge base
6. **Return Result** — Send execution results back to the user
## Workspace Directory Structure
@@ -39,9 +41,12 @@ The Agent workspace is located at `~/cow` by default and stores system prompts,
~/cow/
├── system.md # Agent system prompt
├── user.md # User profile
├── MEMORY.md # Core memory
├── memory/ # Long-term memory storage
│ ├── core.md # Core memory
│ └── daily/ # Daily memory
│ └── YYYY-MM-DD.md # Daily memory
├── knowledge/ # Personal knowledge base
│ ├── index.md # Knowledge index
│ └── <category>/ # Topic-based pages
└── skills/ # Custom skills
├── skill-1/
└── skill-2/
@@ -75,3 +80,4 @@ Configure Agent mode parameters in `config.json`:
| `agent_max_context_tokens` | Max context tokens | `40000` |
| `agent_max_context_turns` | Max context turns | `30` |
| `agent_max_steps` | Max decision steps per task | `15` |
| `knowledge` | Enable personal knowledge base | `true` |


@@ -1,6 +1,6 @@
---
title: Features
description: CowAgent long-term memory, task planning, and skills system in detail
description: CowAgent long-term memory, task planning, skills system, CLI commands, and browser tool in detail
---
## 1. Long-term Memory
@@ -15,13 +15,26 @@ In subsequent long-term conversations, the Agent intelligently stores or retriev
<img src="https://cdn.link-ai.tech/doc/20260203000455.png" width="800" />
</Frame>
## 2. Task Planning and Tool Use
## 2. Personal Knowledge Base
> The knowledge base system enables the Agent to continuously accumulate and organize structured knowledge. Unlike memory, which is organized along a timeline, the knowledge base is organized by topic, turning articles, conversation insights, and learning materials into interlinked Markdown pages that form a continuously growing knowledge network.
The Agent automatically organizes valuable information from conversations into knowledge pages, maintaining cross-references and indexes. The Web console provides document browsing and knowledge graph visualization. Knowledge is stored in `~/cow/knowledge/` within the workspace.
- **Auto-organization**: The Agent autonomously extracts and organizes structured knowledge during conversations, maintaining indexes and cross-references
- **Knowledge graph**: Automatically builds a knowledge graph from cross-references between pages, with interactive graph visualization in the Web console
- **Chat integration**: Knowledge document links referenced in Agent replies can be clicked directly in the Web console for viewing
- **CLI management**: Use `/knowledge` commands to view statistics, browse the directory, and toggle the feature with `/knowledge on|off`
See [Personal Knowledge Base](/en/knowledge) for details.
## 3. Task Planning and Tool Use
Tools are the core of how the Agent accesses operating system resources. The Agent intelligently selects and invokes tools based on task requirements, performing file read/write, command execution, scheduled tasks, and more. Built-in tools are implemented in the project's `agent/tools/` directory.
**Key tools:** file read/write/edit, Bash terminal, file send, scheduler, memory search, web search, environment config, and more.
**Key tools:** file read/write/edit, Bash terminal, browser, file send, scheduler, memory search, web search, environment config, and more.
### 2.1 Terminal and File Access
### 3.1 Terminal and File Access
Access to the OS terminal and file system is the Agent's most fundamental capability, and many other tools and skills build on it. Users can interact with the Agent from a mobile device to operate resources on their personal computer or server:
@@ -29,7 +42,7 @@ Access to the OS terminal and file system is the most fundamental and core capab
<img src="https://cdn.link-ai.tech/doc/20260202181130.png" width="800" />
</Frame>
### 2.2 Programming Capability
### 3.2 Programming Capability
Combining programming and system access, the Agent can execute the complete **Vibecoding workflow** — from information search, asset generation, coding, testing, deployment, Nginx configuration, to publishing — all triggered by a single command from your phone:
@@ -37,7 +50,7 @@ Combining programming and system access, the Agent can execute the complete **Vi
<img src="https://cdn.link-ai.tech/doc/20260203121008.png" width="800" />
</Frame>
### 2.3 Scheduled Tasks
### 3.3 Scheduled Tasks
The `scheduler` tool enables dynamic scheduled tasks, supporting **one-time tasks, fixed intervals, and Cron expressions**. Tasks can be triggered as either a **fixed message send** or an **Agent dynamic task** execution:
@@ -45,7 +58,15 @@ The `scheduler` tool enables dynamic scheduled tasks, supporting **one-time task
<img src="https://cdn.link-ai.tech/doc/20260202195402.png" width="800" />
</Frame>
### 2.4 Environment Variable Management
### 3.4 Browser
The built-in `browser` tool lets the Agent control a Chromium browser to visit web pages, fill in forms, click elements, and take screenshots, including dynamic pages rendered with JavaScript. Install it with a single command, `cow install-browser`, which adapts automatically to server (headless) and desktop environments:
<Frame>
<img src="https://cdn.link-ai.tech/doc/20260401110103.png" width="800" />
</Frame>
### 3.5 Environment Variable Management
Secrets required by skills are stored in an environment variable file, managed by the `env_config` tool. You can update secrets through conversation, with built-in security protection and desensitization:
@@ -53,14 +74,17 @@ Secrets required by skills are stored in an environment variable file, managed b
<img src="https://cdn.link-ai.tech/doc/20260202234939.png" width="800" />
</Frame>
## 3. Skills System
## 4. Skills System
The Skills system provides infinite extensibility for the Agent. Each Skill consists of a description file, execution scripts (optional), and resources (optional), describing how to complete specific types of tasks. Skills allow the Agent to follow instructions for complex workflows, invoke tools, or integrate third-party systems.
- **[Skill Hub](https://skills.cowagent.ai/):** An open skill marketplace featuring official, community, and third-party skills. Install with one command.
- **Built-in skills:** Located in the project's `skills/` directory, including skill creator, image recognition, LinkAI agent, web fetch, and more. Built-in skills are automatically enabled based on dependency conditions (API keys, system commands, etc.).
- **Custom skills:** Created by users through conversation, stored in the workspace (`~/cow/skills/`), capable of implementing any complex business process or third-party integration.
### 3.1 Creating Skills
To install a skill, run `/skill install <name>` in chat or `cow skill install <name>` in the terminal. Supported sources include Skill Hub, GitHub, ClawHub, URLs, and more.
### 4.1 Creating Skills
The `skill-creator` skill enables rapid skill creation through conversation. You can ask the Agent to codify a workflow as a skill, or send any API documentation and examples for the Agent to complete the integration directly:
@@ -68,7 +92,7 @@ The `skill-creator` skill enables rapid skill creation through conversation. You
<img src="https://cdn.link-ai.tech/doc/20260202202247.png" width="800" />
</Frame>
### 3.2 Web Search and Image Recognition
### 4.2 Web Search and Image Recognition
- **Web search:** Built-in `web_search` tool, supports multiple search engines. Configure `BOCHA_API_KEY` or `LINKAI_API_KEY` to enable.
- **Image recognition:** Built-in `openai-image-vision` skill, supports `gpt-4.1-mini`, `gpt-4.1`, and other models. Requires `OPENAI_API_KEY`.
@@ -77,29 +101,33 @@ The `skill-creator` skill enables rapid skill creation through conversation. You
<img src="https://cdn.link-ai.tech/doc/20260202213219.png" width="800" />
</Frame>
### 3.3 Third-party Knowledge Bases and Plugins
### 4.3 Skill Hub
The `linkai-agent` skill makes all agents on [LinkAI](https://link-ai.tech/) available as Skills for the Agent, enabling multi-agent decision making.
Visit [skills.cowagent.ai](https://skills.cowagent.ai/) to browse all available skills, or use commands in conversation:
Configuration: set `LINKAI_API_KEY` via `env_config`, then add agent descriptions in `skills/linkai-agent/config.json`:
```json
{
"apps": [
{
"app_code": "G7z6vKwp",
"app_name": "LinkAI Customer Support",
"app_description": "Select only when the user needs help with LinkAI platform questions"
},
{
"app_code": "SFY5x7JR",
"app_name": "Content Creator",
"app_description": "Use only when the user needs to create images or videos"
}
]
}
```text
/skill list --remote # Browse Skill Hub
/skill search <keyword> # Search skills
/skill install <name> # Install with one command
```
<Frame>
<img src="https://cdn.link-ai.tech/doc/20260202234350.png" width="750" />
</Frame>
Also supports installing skills from GitHub, ClawHub, LinkAI, and other third-party platforms. See [Install Skills](/en/skills/install) for details.
<img src="https://cdn.link-ai.tech/doc/20260401110103.png" width="750" />
## 5. CLI Command System
CowAgent provides two ways to run commands, covering service management, skill installation, configuration, and more:
- **Terminal CLI:** Run `cow <command>` in the system terminal, supporting `start`, `stop`, `restart`, `update`, `status`, `logs`, `skill`, etc.
- **Chat commands:** Type `/<command>` in conversation. The Web console shows a command menu when you type `/`.
```bash
cow start # Start service
cow stop # Stop service
cow update # Update and restart
cow skill install pptx # Install a skill
cow install-browser # Install browser tool
```
See [Command Overview](https://docs.cowagent.ai/en/cli) for details.


@@ -22,12 +22,21 @@ CowAgent can proactively think and plan tasks, operate computers and external re
<Card title="Long-term Memory" icon="database" href="/en/memory">
Automatically persists conversation memory to local files and databases, including core memory and daily memory, with keyword and vector retrieval support.
</Card>
<Card title="Knowledge Base" icon="book" href="/en/knowledge">
Automatically organizes structured knowledge with knowledge graph visualization, building a continuously growing knowledge network through cross-references.
</Card>
<Card title="Skills System" icon="puzzle-piece" href="/en/skills/index">
Implements a Skills creation and execution engine with built-in skills, and supports custom Skills development through natural language conversation.
</Card>
<Card title="Multimodal Messages" icon="image" href="/en/channels/web">
Supports parsing, processing, generating, and sending text, images, voice, files, and other message types.
</Card>
<Card title="Tool System" icon="wrench" href="/en/tools/index">
Built-in tools for file I/O, terminal execution, browser automation, scheduled tasks, messaging, and more. The Agent autonomously invokes tools to accomplish complex tasks.
</Card>
<Card title="Command System" icon="terminal" href="/en/cli/index">
Provides terminal CLI and in-chat commands for process management, skill installation, configuration, context inspection, and other common operations.
</Card>
<Card title="Multiple Model Support" icon="microchip" href="/en/models/index">
Supports mainstream model providers including OpenAI, Claude, Gemini, DeepSeek, MiniMax, GLM, Qwen, Kimi, Doubao, and more.
</Card>
@@ -40,9 +49,18 @@ CowAgent can proactively think and plan tasks, operate computers and external re
Run the following command in your terminal for one-click install, configuration, and startup:
```bash
bash <(curl -fsSL https://cdn.link-ai.tech/code/cow/run.sh)
```
<Tabs>
<Tab title="Linux / macOS">
```bash
bash <(curl -fsSL https://cdn.link-ai.tech/code/cow/run.sh)
```
</Tab>
<Tab title="Windows (PowerShell)">
```powershell
irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex
```
</Tab>
</Tabs>
By default, the Web service starts once the script finishes. Access `http://localhost:9899/chat` to chat in the web interface.


@@ -0,0 +1,77 @@
---
title: Personal Knowledge Base
description: CowAgent personal knowledge base — structured knowledge accumulation, automatic organization, and knowledge graph
---
The personal knowledge base is the Agent's long-term structured knowledge store, saved in the `knowledge/` directory within the workspace. Unlike memory, which is organized by timeline, the knowledge base organizes content by topic — articles, conversation insights, and learning materials are structured into interlinked Markdown pages, forming a continuously growing knowledge network.
## Core Concepts
### Knowledge vs Memory
| Dimension | Knowledge Base (knowledge/) | Long-term Memory (memory/) |
| --- | --- | --- |
| Organization | By topic, interlinked | By timeline, dated files |
| Writing | Agent actively structures content | Auto-summarized on context trimming |
| Content | Refined, structured knowledge | Raw conversation summaries |
| Use cases | Study notes, tech docs, project knowledge | Conversation history, event records |
### Directory Structure
```
~/cow/knowledge/
├── index.md # Knowledge index, entry point for all pages
├── log.md # Change log, records each write
├── concepts/ # Conceptual knowledge
│ └── machine-learning.md
├── entities/ # Entity knowledge (people, orgs, tools)
│ └── openai.md
└── sources/ # Source knowledge (articles, papers)
└── llm-wiki.md
```
The directory structure is flexible — the Agent automatically creates appropriate category directories based on actual content. Users can also customize the organization.
## Automatic Organization
The Agent writes knowledge autonomously, typically in these scenarios:
- **User shares an article or document** — The Agent automatically extracts key information and creates a structured knowledge page
- **Conversation produces valuable conclusions** — The Agent organizes insights into knowledge pages and links them to existing knowledge
- **User explicitly requests organization** — Users can guide the Agent to organize and update knowledge through conversation
Each knowledge page includes cross-reference links to related pages, gradually building a knowledge graph.
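Because pages link to each other with ordinary Markdown links, graph edges can be recovered by scanning each page for relative `.md` links and resolving them against the page's own directory. This is a minimal sketch under that assumption; `build_edges` is a hypothetical name, and the Web console's actual graph builder may parse pages differently.

```python
import posixpath
import re
from typing import Dict, Set, Tuple

# Inline Markdown links to .md files, e.g. [OpenAI](../entities/openai.md#history)
_MD_LINK = re.compile(r"\[[^\]]*\]\(([^)\s#]+\.md)(?:#[^)]*)?\)")


def build_edges(pages: Dict[str, str]) -> Set[Tuple[str, str]]:
    """pages maps a path relative to knowledge/ (e.g. 'concepts/ml.md') to its Markdown text."""
    edges: Set[Tuple[str, str]] = set()
    for source, text in pages.items():
        base = posixpath.dirname(source)
        for target in _MD_LINK.findall(text):
            # Resolve relative links such as ../entities/openai.md against the page's folder
            resolved = posixpath.normpath(posixpath.join(base, target))
            if resolved in pages and resolved != source:  # ignore dangling and self links
                edges.add((source, resolved))
    return edges
```

The resulting `(source, target)` pairs are the kind of node-and-link data a force-directed graph can render directly.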
## Knowledge Retrieval
The Agent can retrieve knowledge during conversation through:
- **Index lookup** — Quickly locate relevant pages via `knowledge/index.md`
- **Semantic search** — Search knowledge content via the `memory_search` tool
- **Direct read** — Read specific knowledge files via the `memory_get` tool
## Web Console
The web console provides a dedicated "Knowledge" module with:
- **Document browsing** — Tree-style directory structure, searchable and collapsible, click to view content
- **Knowledge graph** — D3.js force-directed graph visualizing relationships between knowledge pages
- **Chat integration** — Knowledge document links referenced in Agent replies are clickable for direct navigation
## CLI Commands
Manage the knowledge base with the `/knowledge` command:
| Command | Description |
| --- | --- |
| `/knowledge` | Show knowledge base statistics |
| `/knowledge list` | Display file directory as a tree |
| `/knowledge on` | Enable the knowledge base feature |
| `/knowledge off` | Disable the knowledge base feature |
## Configuration
| Parameter | Description | Default |
| --- | --- | --- |
| `knowledge` | Whether to enable the personal knowledge base | `true` |
| `agent_workspace` | Workspace path; knowledge is stored under the `knowledge/` subdirectory | `~/cow` |


@@ -0,0 +1,80 @@
---
title: Short-term Memory
description: Conversation context — message management, compression strategies, and context operations
---
Conversation context is the Agent's short-term memory, containing all messages in the current session (user input, Agent replies, tool calls and results). Proper context management is critical for the Agent's reasoning quality and cost control.
## Context Structure
Each conversation turn consists of:
```
User message → Agent thinking → Tool call → Tool result → ... → Agent final reply
```
A single turn may include multiple tool calls (controlled by `agent_max_steps`). All tool calls and results are retained in context until compressed or trimmed.
## Key Configuration
| Parameter | Description | Default |
| --- | --- | --- |
| `agent_max_context_tokens` | Maximum context token budget | `50000` |
| `agent_max_context_turns` | Maximum conversation turns in context | `20` |
| `agent_max_steps` | Maximum decision steps per turn (tool call count) | `15` |
Configurable via `config.json` or the `/config` chat command.
## Compression Strategy
When the context exceeds its limits, the system automatically compresses it to free space. Compression proceeds in stages:
### 1. Tool Result Truncation
Before each decision loop, the system checks tool call results from earlier turns. Any result longer than **20,000 characters** is truncated to its beginning and end, with a truncation notice in between. Results from the current turn are not affected.
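A head-and-tail truncation like this can be sketched in a few lines. Only the 20,000-character threshold comes from this page; the even split between head and tail and the wording of the notice are assumptions, and `truncate_tool_result` is a hypothetical name.

```python
MAX_TOOL_RESULT_CHARS = 20_000  # threshold stated on this page


def truncate_tool_result(text: str, limit: int = MAX_TOOL_RESULT_CHARS) -> str:
    """Keep the beginning and end of an oversized tool result and mark the cut."""
    if len(text) <= limit:
        return text
    half = limit // 2
    omitted = len(text) - 2 * half
    notice = f"\n...[truncated {omitted} characters]...\n"
    # The output is slightly longer than `limit` because of the notice itself
    return text[:half] + notice + text[-half:]
```

Keeping both ends preserves what usually matters in command output: the invocation context at the top and the final status or error at the bottom.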
### 2. Turn Trimming
When conversation turns exceed `agent_max_context_turns`:
- The **oldest half** of complete turns is trimmed (preserving tool call chain integrity)
- Trimmed messages are summarized by LLM and **written to the daily memory file**
- Remaining turns stay intact
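Turn trimming can be sketched by treating each turn as an indivisible list of messages, so a tool call is never separated from its result. `trim_turns` is a hypothetical helper, and rounding an odd turn count down (dropping fewer than half) is an assumption.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]
Turn = List[Message]  # user message plus every assistant/tool message it triggered


def trim_turns(turns: List[Turn], max_turns: int) -> Tuple[List[Turn], List[Turn]]:
    """Drop the oldest half of complete turns once max_turns is exceeded.

    Returns (kept, discarded); the discarded turns would be summarized into daily memory.
    """
    if len(turns) <= max_turns:
        return turns, []
    cut = len(turns) // 2  # cutting only at turn boundaries keeps tool call chains whole
    return turns[cut:], turns[:cut]
```

For example, with `agent_max_context_turns` set to 20, a 21st turn causes the oldest 10 turns to be discarded and 11 to remain.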
### 3. Token Budget Trimming
After turn trimming, if tokens still exceed the budget:
- **Fewer than 5 turns**: All turns undergo **text compression** — each turn keeps only the first user text and last Agent reply, removing intermediate tool call chains
- **5 or more turns**: The **first half** of turns is trimmed again, with discarded content also written to memory
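The two branches of token budget trimming can be sketched as follows. The role names `user`, `assistant`, and `tool`, and both function names, are assumptions for illustration; the 5-turn threshold and the "first user text, last Agent reply" rule come from the list above.

```python
from typing import Dict, List, Optional, Tuple

Message = Dict[str, str]
Turn = List[Message]


def compress_turn(turn: Turn) -> Turn:
    """Keep the first user message and the last assistant reply; drop the tool chain."""
    first_user: Optional[Message] = next((m for m in turn if m["role"] == "user"), None)
    last_reply: Optional[Message] = next(
        (m for m in reversed(turn) if m["role"] == "assistant"), None
    )
    return [m for m in (first_user, last_reply) if m is not None]


def shrink_to_budget(turns: List[Turn]) -> Tuple[List[Turn], List[Turn]]:
    """Called only while the token budget is still exceeded after turn trimming.

    Returns (kept, discarded); discarded turns would also be written to memory.
    """
    if len(turns) < 5:
        return [compress_turn(t) for t in turns], []
    cut = len(turns) // 2
    return turns[cut:], turns[:cut]
```

Compressing rather than dropping when few turns remain keeps the conversational thread while shedding the bulky tool output that usually dominates the token count.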
### 4. Overflow Emergency Handling
When the model API returns a context overflow error:
1. All current messages are summarized and written to memory
2. Aggressive trimming is applied (tool results limited to 10K chars, user text to 10K, max 5 turns)
3. If still overflowing, the entire conversation context is cleared
## Session Persistence
Conversation messages are persisted to a local database and automatically restored after a service restart. The restore strategy is:
- Restores the most recent **`max(3, max_context_turns / 6)`** turns
- Only retains each turn's **user text and Agent final reply**, not intermediate tool call chains
- Sessions older than **30 days** are automatically cleaned up
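The restore rules can be sketched as below. The page gives the formula as `max(3, max_context_turns / 6)`; integer division is assumed here, and the function names and role names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List

Message = Dict[str, str]
Turn = List[Message]

SESSION_TTL = timedelta(days=30)  # sessions older than this are cleaned up


def restore_turn_count(max_context_turns: int) -> int:
    # Integer division is an assumption; the documented formula is max(3, turns / 6)
    return max(3, max_context_turns // 6)


def restore_session(turns: List[Turn], max_context_turns: int) -> List[Message]:
    """Rebuild context from persisted turns: newest N turns, user text and final reply only."""
    restored: List[Message] = []
    for turn in turns[-restore_turn_count(max_context_turns):]:
        user = next((m for m in turn if m["role"] == "user"), None)
        reply = next((m for m in reversed(turn) if m["role"] == "assistant"), None)
        restored.extend(m for m in (user, reply) if m is not None)
    return restored


def is_expired(last_active: datetime, now: datetime) -> bool:
    return now - last_active > SESSION_TTL
```

Under this assumption, a limit of 20 turns restores 3 turns and a limit of 30 restores 5.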
## Commands
Use these commands in chat to manage context:
| Command | Description |
| --- | --- |
| `/context` | View current context statistics (message count, role distribution, total characters) |
| `/context clear` | Clear current session context |
| `/config agent_max_context_tokens 80000` | Adjust context token budget |
| `/config agent_max_context_turns 30` | Adjust context turn limit |
<Tip>
After clearing context, the Agent "forgets" previous conversation content. Content that was already written to long-term memory can still be retrieved via memory search.
</Tip>


@@ -1,30 +1,39 @@
---
title: Memory
description: CowAgent long-term memory system
title: Long-term Memory
description: CowAgent long-term memory system — file persistence, automatic writing, and hybrid retrieval
---
The memory system enables the Agent to remember important information over time, continuously accumulating experience, understanding user preferences, and truly achieving autonomous thinking and continuous growth.
Long-term memory is stored in workspace files, persisting across sessions. The Agent loads historical memory on demand via retrieval tools during conversation, and automatically writes conversation summaries to long-term memory when context is trimmed.
## Memory Types
### Core Memory (MEMORY.md)
Stored in `~/cow/MEMORY.md`, containing long-term user preferences, important decisions, key facts, and other information that doesn't fade over time. Automatically injected into the system prompt on every conversation turn as background knowledge.
Stored in `~/cow/MEMORY.md`, containing long-term user preferences, important decisions, key facts, and other information that doesn't fade over time. The Agent reads and writes this file via tools to maintain long-term knowledge.
### Daily Memory (memory/YYYY-MM-DD.md)
Stored in `~/cow/memory/` directory, named by date (e.g. `2026-03-08.md`), recording daily conversation summaries and key events. Files are only created on first write to avoid generating empty files.
Stored in `~/cow/memory/` directory, named by date (e.g., `2026-03-08.md`), recording daily conversation summaries and key events. Files are only created on first write to avoid generating empty files.
## Memory Writing
## Automatic Writing
The Agent automatically persists conversation content to daily memory through the following mechanisms:
The Agent automatically persists conversation content to long-term memory through the following mechanisms:
- **On context trimming** — When conversation turns or tokens exceed the configured limit, the oldest half of the context is trimmed in batch, and the discarded content is summarized by LLM into key information and written to the daily memory file
- **On context trimming** — When conversation turns or tokens exceed the configured limit, the oldest half of the context is trimmed, and the discarded content is summarized by LLM into key information and written to the daily memory file
- **Daily scheduled summary** — A full summary is automatically triggered at 23:55 every day, ensuring memory is preserved even on low-activity days (skipped if content hasn't changed)
- **On API context overflow** — When the model API returns a context overflow error, the current conversation summary is saved as an emergency measure
All memory writes run asynchronously in a background thread (LLM summarization + file writing), never blocking normal conversation replies.
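The trimming rule can be sketched as follows. This is an illustration under stated assumptions, not CowAgent's implementation: a turn is counted as one user message, tokens are approximated by whitespace-separated words instead of a real tokenizer, and `summarize`/`write` stand in for the LLM call and the daily-memory writer.

```python
import threading
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]  # e.g. {"role": "user", "content": "..."}


def count_tokens(messages: List[Message]) -> int:
    # Rough stand-in: whitespace-separated words; a real system uses the model's tokenizer.
    return sum(len(m["content"].split()) for m in messages)


def trim_context(messages: List[Message], max_turns: int,
                 max_tokens: int) -> Tuple[List[Message], List[Message]]:
    """Return (kept, discarded); drop the oldest half once either limit is exceeded."""
    turns = sum(1 for m in messages if m["role"] == "user")
    if turns <= max_turns and count_tokens(messages) <= max_tokens:
        return messages, []
    cut = len(messages) // 2
    # Move the cut forward to a user message so a user/assistant pair is not split.
    while cut < len(messages) and messages[cut]["role"] != "user":
        cut += 1
    return messages[cut:], messages[:cut]


def summarize_in_background(discarded: List[Message],
                            summarize: Callable[[List[Message]], str],
                            write: Callable[[str], None]) -> threading.Thread:
    """Run summarization and the file write on a daemon thread so replies are not blocked."""
    def job() -> None:
        write(summarize(discarded))

    worker = threading.Thread(target=job, daemon=True)
    worker.start()
    return worker
```

The reply path only calls `trim_context`; the slow LLM summary and file write happen on the returned thread.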
## Memory Retrieval
The memory system supports hybrid retrieval modes:
- **Keyword retrieval** — FTS5 full-text index matching with BM25 ranking
- **Vector retrieval** — Embedding-based semantic similarity search, finds relevant memory even with different wording
The Agent automatically triggers memory retrieval during conversation as needed, incorporating relevant historical information into context. Results are ranked by a combined score (default: 0.7 vector weight + 0.3 keyword weight). Daily memory scores decay over time (30-day half-life), while core memory does not decay.
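The ranking rule above can be written out as a small function. It assumes both raw scores are already normalized to [0, 1] and that the 30-day decay multiplies the combined score; how CowAgent normalizes BM25 scores internally is not specified here.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

VECTOR_WEIGHT = 0.7
KEYWORD_WEIGHT = 0.3
HALF_LIFE_DAYS = 30.0


@dataclass
class Hit:
    path: str                   # e.g. "MEMORY.md" or "memory/2026-03-08.md"
    vector_score: float         # semantic similarity, assumed in [0, 1]
    keyword_score: float        # BM25 score, assumed normalized to [0, 1]
    day: Optional[date] = None  # set for daily memory; None for core memory


def combined_score(hit: Hit, today: date) -> float:
    score = VECTOR_WEIGHT * hit.vector_score + KEYWORD_WEIGHT * hit.keyword_score
    if hit.day is not None:
        age_days = max((today - hit.day).days, 0)
        # Exponential decay: a daily memory's score halves every 30 days.
        score *= 0.5 ** (age_days / HALF_LIFE_DAYS)
    return score


def rank(hits: List[Hit], today: date) -> List[Hit]:
    return sorted(hits, key=lambda h: combined_score(h, today), reverse=True)
```

With these weights, a daily entry that matched perfectly 30 days ago scores 0.5, the same as a core-memory entry with middling scores that never decays.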
## First Launch
On first launch, the Agent will proactively ask the user for key information and save it to the workspace (default `~/cow`):
@@ -40,27 +49,10 @@ On first launch, the Agent will proactively ask the user for key information and
<img src="https://cdn.link-ai.tech/doc/20260203000455.png" width="800" />
</Frame>
## Memory Retrieval
The memory system supports hybrid retrieval modes:
- **Keyword retrieval** — Match historical memory based on keywords
- **Vector retrieval** — Semantic similarity search, finds relevant memory even with different wording
The Agent automatically triggers memory retrieval during conversation as needed, incorporating relevant historical information into context. Core memory (`MEMORY.md`) is always injected into the system prompt, while daily memory is loaded on demand via retrieval.
## Configuration
```json
{
"agent_workspace": "~/cow",
"agent_max_context_tokens": 40000,
"agent_max_context_turns": 20
}
```
| Parameter | Description | Default |
| --- | --- | --- |
| `agent_workspace` | Workspace path, memory files stored under this directory | `~/cow` |
| `agent_max_context_tokens` | Max context tokens; when exceeded, half is trimmed and summarized into memory | `40000` |
| `agent_max_context_turns` | Max context turns; when exceeded, half is trimmed and summarized into memory | `20` |
| `agent_max_context_tokens` | Max context tokens; when exceeded, content is trimmed and summarized into memory | `50000` |
| `agent_max_context_turns` | Max context turns; when exceeded, content is trimmed and summarized into memory | `20` |

@@ -3,7 +3,22 @@ title: DeepSeek
description: DeepSeek model configuration
---
Use OpenAI-compatible configuration:
Option 1: Native integration (recommended):
```json
{
"model": "deepseek-chat",
"deepseek_api_key": "YOUR_API_KEY"
}
```
| Parameter | Description |
| --- | --- |
| `model` | `deepseek-chat` (DeepSeek-V3.2, non-thinking mode), `deepseek-reasoner` (DeepSeek-R1, thinking mode) |
| `deepseek_api_key` | Create at [DeepSeek Platform](https://platform.deepseek.com/api_keys) |
| `deepseek_api_base` | Optional, defaults to `https://api.deepseek.com/v1`. Can be changed to a third-party proxy |
Option 2: OpenAI-compatible configuration:
```json
{
@@ -14,9 +29,4 @@ Use OpenAI-compatible configuration:
}
```
| Parameter | Description |
| --- | --- |
| `model` | `deepseek-chat` (DeepSeek-V3), `deepseek-reasoner` (DeepSeek-R1) |
| `bot_type` | Must be `openai` (OpenAI-compatible mode) |
| `open_ai_api_key` | Create at [DeepSeek Platform](https://platform.deepseek.com/api_keys) |
| `open_ai_api_base` | DeepSeek platform BASE URL |
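To illustrate the two options, here is a hypothetical helper that turns either config shape into one set of connection settings. The precedence (native key checked first) and the returned field names are assumptions for this sketch, not how CowAgent resolves its config.

```python
from typing import Dict

# Default from the native-integration table; may be overridden via deepseek_api_base.
DEEPSEEK_DEFAULT_BASE = "https://api.deepseek.com/v1"


def resolve_deepseek(config: Dict[str, str]) -> Dict[str, str]:
    """Normalize either DeepSeek config shape into one settings dict (hypothetical helper)."""
    if config.get("deepseek_api_key"):
        # Option 1: native integration; deepseek_api_base is optional.
        return {
            "mode": "native",
            "model": config["model"],
            "api_key": config["deepseek_api_key"],
            "api_base": config.get("deepseek_api_base", DEEPSEEK_DEFAULT_BASE),
        }
    if config.get("bot_type") == "openai" and config.get("open_ai_api_key"):
        # Option 2: OpenAI-compatible mode using the open_ai_* keys.
        return {
            "mode": "openai-compatible",
            "model": config["model"],
            "api_key": config["open_ai_api_key"],
            "api_base": config["open_ai_api_base"],
        }
    raise ValueError("no DeepSeek credentials found in config")
```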

@@ -6,7 +6,7 @@ description: Supported models and recommended choices for CowAgent
CowAgent supports mainstream LLMs from domestic and international providers. Model interfaces are implemented in the project's `models/` directory.
<Note>
For Agent mode, the following models are recommended based on quality and cost: MiniMax-M2.7, glm-5-turbo, kimi-k2.5, qwen3.5-plus, claude-sonnet-4-6, gemini-3.1-pro-preview
For Agent mode, the following models are recommended based on quality and cost: MiniMax-M2.7, glm-5-turbo, kimi-k2.5, qwen3.6-plus, claude-sonnet-4-6, gemini-3.1-pro-preview
</Note>
## Configuration
@@ -25,7 +25,7 @@ You can also use the [LinkAI](https://link-ai.tech) platform interface to flexib
glm-5-turbo, glm-5 and other series models
</Card>
<Card title="Qwen (Tongyi Qianwen)" href="/en/models/qwen">
qwen3.5-plus, qwen3-max and more
qwen3.6-plus, qwen3-max and more
</Card>
<Card title="Kimi" href="/en/models/kimi">
kimi-k2.5, kimi-k2 and more

View File

@@ -5,14 +5,14 @@ description: Tongyi Qianwen model configuration
```json
{
"model": "qwen3.5-plus",
"model": "qwen3.6-plus",
"dashscope_api_key": "YOUR_API_KEY"
}
```
| Parameter | Description |
| --- | --- |
| `model` | Options include `qwen3.5-plus`, `qwen3-max`, `qwen-max`, `qwen-plus`, `qwen-turbo`, `qwq-plus`, etc. |
| `model` | Options include `qwen3.6-plus`, `qwen3.5-plus`, `qwen3-max`, `qwen-max`, `qwen-plus`, `qwen-turbo`, `qwq-plus`, etc. |
| `dashscope_api_key` | Create at [Bailian Console](https://bailian.console.aliyun.com/?tab=model#/api-key). See [official docs](https://bailian.console.aliyun.com/?tab=api#/api) |
OpenAI-compatible configuration is also supported:
@@ -20,7 +20,7 @@ OpenAI-compatible configuration is also supported:
```json
{
"bot_type": "openai",
"model": "qwen3.5-plus",
"model": "qwen3.6-plus",
"open_ai_api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
"open_ai_api_key": "YOUR_API_KEY"
}

@@ -5,6 +5,7 @@ description: CowAgent version history
| Version | Date | Description |
| --- | --- | --- |
| [2.0.5](/en/releases/v2.0.5) | 2026.04.01 | Cow CLI, Skill Hub open source, Browser tool, WeCom Bot QR scan, and more |
| [2.0.4](/en/releases/v2.0.4) | 2026.03.22 | Personal WeChat channel, new model support, Japanese docs, script refactoring and bug fixes |
| [2.0.2](/en/releases/v2.0.2) | 2026.02.27 | Web Console upgrade, multi-channel concurrency, session persistence |
| [2.0.1](/en/releases/v2.0.1) | 2026.02.27 | Built-in Web Search tool, smart context management, multiple fixes |

@@ -0,0 +1,77 @@
---
title: v2.0.5
description: CowAgent 2.0.5 - Cow CLI, Skill Hub open source, Browser tool, WeCom Bot QR scan, and more
---
## 🖥️ Cow CLI
New CLI command system for managing CowAgent from terminal and chat:
- **Terminal commands**: Run `cow <command>` for `start`, `stop`, `restart`, `update`, `status`, `logs`, etc.
- **Chat commands**: Type `/<command>` in conversation for `/help`, `/status`, `/config`, `/skill`, `/context`, `/logs`, `/version`, etc.
- **Web console**: Type `/` in the input box to open a slash command menu, with arrow-key input history
- **Windows support**: New PowerShell script `scripts/run.ps1` with `cow` command support
Docs: [Command Overview](https://docs.cowagent.ai/en/cli)
<img src="https://cdn.link-ai.tech/doc/20260401114549.png" width="750" />
## 🧩 Cow Skill Hub Open Source
[Cow Skill Hub](https://skills.cowagent.ai) is now open source and live — browse, search, install, and publish AI Agent skills:
- **One-command install**: `/skill install <name>` in chat or `cow skill install <name>` in terminal
- **Multi-source**: Install from Skill Hub, GitHub, ClawHub, LinkAI, and more
- **Search**: `/skill search` and `/skill list --remote` to browse the hub
- **Publish**: Submit your own skills at [skills.cowagent.ai/submit](https://skills.cowagent.ai/submit)
- **Mirror**: Mirror acceleration for faster downloads in China
Open source repo: [cow-skill-hub](https://github.com/zhayujie/cow-skill-hub)
Docs: [Skill Hub](https://docs.cowagent.ai/en/skills/hub), [Install Skills](https://docs.cowagent.ai/en/skills/install)
<img src="https://cdn.link-ai.tech/doc/20260401110103.png" width="750" />
## 🌐 Browser Tool
New Browser tool — Agent can control a Chromium browser to visit and interact with web pages:
- **Navigation & interaction**: `navigate`, `click`, `fill`, `select`, `scroll`, `press`, etc.
- **Page snapshot**: Compact DOM snapshot for efficient page understanding, auto-snapshot after navigation
- **Screenshot**: Save page screenshots to workspace
- **JavaScript execution**: Run custom scripts on pages
- **CLI install**: `cow install-browser` for one-command setup
- **Docker support**: Browser install built into Docker image
Docs: [Browser Tool](https://docs.cowagent.ai/en/tools/browser)
<img src="https://cdn.link-ai.tech/doc/20260401115728.png" width="750" />
## 🤖 WeCom Bot QR Code Setup
WeCom Bot channel now supports QR code scan for one-click bot creation:
- **QR scan in Web console**: Select "Scan QR" mode, scan with WeCom to auto-create and connect a bot — no manual configuration needed
- **Manual mode**: Still supports manual Bot ID and Secret input
- **Stream push optimization**: Throttled push to avoid WebSocket congestion
Docs: [WeCom Bot](https://docs.cowagent.ai/en/channels/wecom-bot)
PR: [#2735](https://github.com/zhayujie/chatgpt-on-wechat/pull/2735). Thanks [@WecomTeam](https://github.com/WecomTeam)
## 🐛 Other Improvements & Fixes
- **DeepSeek module**: Independent DeepSeek Bot with dedicated `deepseek_api_key` config ([#2719](https://github.com/zhayujie/chatgpt-on-wechat/pull/2719)). Thanks [@6vision](https://github.com/6vision)
- **Web console**: Slash command menu, input history, new model options, mobile optimization ([#2731](https://github.com/zhayujie/chatgpt-on-wechat/pull/2731)). Thanks [@zkjqd](https://github.com/zkjqd)
- **Context loss**: Fix context loss after trimming ([393f0c0](https://github.com/zhayujie/chatgpt-on-wechat/commit/393f0c0))
- **System prompt**: Fix system prompt not rebuilding on every turn ([13f5fde](https://github.com/zhayujie/chatgpt-on-wechat/commit/13f5fde))
- **Gemini**: Fix missing model attribute in GoogleGeminiBot ([#2716](https://github.com/zhayujie/chatgpt-on-wechat/pull/2716)). Thanks [@cowagent](https://github.com/cowagent)
- **WeChat channel**: Fix file send failures and filename loss ([6d9b7ba](https://github.com/zhayujie/chatgpt-on-wechat/commit/6d9b7ba), [45faa9c](https://github.com/zhayujie/chatgpt-on-wechat/commit/45faa9c))
- **Docker**: Fix volume permissions, reduce image size ([3eb8348](https://github.com/zhayujie/chatgpt-on-wechat/commit/3eb8348), [4470d4c](https://github.com/zhayujie/chatgpt-on-wechat/commit/4470d4c))
- **Security**: Fix Memory Content path traversal risk. Thanks [@August829](https://github.com/August829)
## 📦 Upgrade
Run `cow update` or `./run.sh update` to upgrade, or pull the latest code and restart. See [Upgrade Guide](https://docs.cowagent.ai/en/guide/upgrade).
**Release Date**: 2026.04.01 | [Full Changelog](https://github.com/zhayujie/chatgpt-on-wechat/compare/2.0.4...master)

docs/en/skills/create.mdx
@@ -0,0 +1,58 @@
---
title: Create Skills
description: Create custom skills through conversation
---
CowAgent includes a built-in Skill Creator that lets you quickly create, install, or update skills through natural language conversation.
## Usage
Simply describe the skill you want in a conversation, and the Agent will handle the creation:
- Codify workflows as skills: "Create a skill from this deployment process"
- Integrate third-party APIs: "Create a skill based on this API documentation"
- Install remote skills: "Install xxx skill for me"
## Creation Flow
1. Tell the Agent what skill you want to create
2. Agent automatically generates `SKILL.md` description and execution scripts
3. Skill is saved to the workspace `~/cow/skills/` directory
4. Agent will automatically recognize and use the skill in future conversations
<Frame>
<img src="https://cdn.link-ai.tech/doc/20260202202247.png" width="800" />
</Frame>
## SKILL.md Format
Created skills follow the standard SKILL.md format:
```markdown
---
name: my-skill
description: Brief description of the skill
metadata:
emoji: 🔧
requires:
bins: ["curl"]
env: ["MY_API_KEY"]
primaryEnv: "MY_API_KEY"
---
# My Skill
Detailed instructions...
```
| Field | Description |
| --- | --- |
| `name` | Skill name, must match directory name |
| `description` | Skill description, Agent decides whether to invoke based on this |
| `metadata.requires.bins` | Required system commands |
| `metadata.requires.env` | Required environment variables |
| `metadata.always` | Always load (default false) |
<Tip>
See the [Skill Creator documentation](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/skills/skill-creator/SKILL.md) for details.
</Tip>
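As a sketch of how the `requires` metadata can be checked, the function below reports missing commands and environment variables. It assumes the front matter's `metadata` block has already been parsed into a dict (for example with a YAML parser); the function name is hypothetical, and it does not model `always`.

```python
import os
import shutil
from typing import Any, Dict, List, Mapping, Optional


def missing_requirements(
    metadata: Dict[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return unmet `requires` entries from a parsed SKILL.md `metadata` block."""
    env = os.environ if env is None else env
    requires = metadata.get("requires", {})
    # Each required binary must be found on PATH.
    missing = [f"bin:{b}" for b in requires.get("bins", []) if shutil.which(b) is None]
    # Each required environment variable must be set to a non-empty value.
    missing += [f"env:{e}" for e in requires.get("env", []) if not env.get(e)]
    return missing
```

For the example above, `missing_requirements({"requires": {"bins": ["curl"], "env": ["MY_API_KEY"]}})` returns an empty list once `curl` is installed and `MY_API_KEY` is set.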

@@ -1,31 +0,0 @@
---
title: Image Vision
description: Recognize images using OpenAI vision models
---
Analyze image content using OpenAI's GPT-4 Vision API, understanding objects, text, colors, and other elements in images.
## Dependencies
| Dependency | Description |
| --- | --- |
| `OPENAI_API_KEY` | OpenAI API key |
| `curl`, `base64` | System commands (usually pre-installed) |
Configuration:
- Configure `OPENAI_API_KEY` via the `env_config` tool
- Or set `open_ai_api_key` in `config.json`
## Supported Models
- `gpt-4.1-mini` (recommended, cost-effective)
- `gpt-4.1`
## Usage
Once configured, send an image to the Agent to automatically trigger image recognition.
<Frame>
<img src="https://cdn.link-ai.tech/doc/20260202213219.png" width="800" />
</Frame>

@@ -7,20 +7,17 @@ Skills provide infinite extensibility for the Agent. Each Skill consists of a de
The difference between Skills and Tools: Tools are atomic operations implemented in code (e.g., file read/write, command execution), while Skills are high-level workflows based on description files that can combine multiple Tools to complete complex tasks.
## Built-in Skills
## Getting Skills
Located in the project `skills/` directory, automatically enabled based on dependency conditions:
CowAgent offers multiple ways to acquire skills:
| Skill | Description | Dependencies |
| --- | --- | --- |
| [`skill-creator`](/en/skills/skill-creator) | Create custom skills through conversation | None |
| [`openai-image-vision`](/en/skills/image-vision) | Recognize images using OpenAI vision models | `OPENAI_API_KEY` |
| [`linkai-agent`](/en/skills/linkai-agent) | Integrate LinkAI platform agents | `LINKAI_API_KEY` |
| [`web-fetch`](/en/skills/web-fetch) | Fetch web page text content | `curl` (enabled by default) |
- **Cow Skill Hub** — Browse and install community skills via `/skill list --remote`
- **GitHub** — Install directly from GitHub repositories, with batch install support
- **ClawHub** — Install ClawHub skills via `/skill install clawhub:name`
- **URL** — Install from zip archives or SKILL.md links
- **Conversational creation** — Let the Agent create skills through natural language conversation
## Custom Skills
Created by users through conversation, stored in workspace (`~/cow/skills/`), can implement any complex business process and third-party system integration.
See [Install Skills](/en/skills/install) and [Skill Management Commands](/en/cli/skill) for details. You can also [create skills](/en/skills/create) through conversation.
## Skill Loading Priority

@@ -0,0 +1,53 @@
---
title: Install Skills
description: Install skills from multiple sources with a single command
---
CowAgent supports installing skills from **Cow Skill Hub, GitHub, ClawHub**, and any URL with a unified `install` command. Use `/skill install` in chat or `cow skill install` in the terminal.
## From Skill Hub
Browse the Skill Hub and install:
```text
/skill list --remote
/skill install pptx
```
## From GitHub
Supports batch install from repositories and single skill from subdirectories:
```text
/skill install larksuite/cli
/skill install https://github.com/larksuite/cli/tree/main/skills/lark-im
```
## From ClawHub
```text
/skill install clawhub:baidu-search
```
## From URL
Supports zip archives and SKILL.md file links:
```text
/skill install https://cdn.link-ai.tech/skills/pptx.zip
/skill install https://example.com/path/to/SKILL.md
```
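The install commands above accept several argument shapes. A hypothetical heuristic for telling them apart might look like this; CowAgent's real dispatcher may use different rules and source names.

```python
from urllib.parse import urlparse


def classify_install_source(spec: str) -> str:
    """Guess which source an install argument refers to (hypothetical heuristic)."""
    if spec.startswith("clawhub:"):
        return "clawhub"
    if spec.startswith(("http://", "https://")):
        parsed = urlparse(spec)
        if parsed.netloc in ("github.com", "www.github.com"):
            return "github"      # repository or tree/<branch>/<subdir> link
        if parsed.path.endswith(".zip"):
            return "zip"
        if parsed.path.endswith("SKILL.md"):
            return "skill_md"
        return "url"
    if "/" in spec:
        return "github"          # owner/repo shorthand, e.g. larksuite/cli
    return "hub"                 # bare names such as "pptx" go to Cow Skill Hub
```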
## Manage Skills
```text
/skill list # View installed skills
/skill info pptx # View skill details
/skill enable pptx # Enable a skill
/skill disable pptx # Disable a skill
/skill uninstall pptx # Uninstall a skill
```
<Tip>
All commands above work in the terminal by replacing `/skill` with `cow skill`. See [Skill Management Commands](/en/cli/skill) for full documentation.
</Tip>

@@ -1,47 +0,0 @@
---
title: LinkAI Agent
description: Integrate LinkAI platform multi-agent skill
---
Use agents from the [LinkAI](https://link-ai.tech/) platform as Skills for multi-agent decision-making. The Agent intelligently selects based on agent names and descriptions, calling the corresponding application or workflow via `app_code`.
## Dependencies
| Dependency | Description |
| --- | --- |
| `LINKAI_API_KEY` | LinkAI platform API key, created in [Console](https://link-ai.tech/console/interface) |
| `curl` | System command (usually pre-installed) |
Configuration:
- Configure `LINKAI_API_KEY` via the `env_config` tool
- Or set `linkai_api_key` in `config.json`
## Configure Agents
Add available agents in `skills/linkai-agent/config.json`:
```json
{
"apps": [
{
"app_code": "G7z6vKwp",
"app_name": "LinkAI Customer Support",
"app_description": "Select this assistant only when the user needs help with LinkAI platform questions"
},
{
"app_code": "SFY5x7JR",
"app_name": "Content Creator",
"app_description": "Use this assistant only when the user needs to create images or videos"
}
]
}
```
## Usage
Once configured, the Agent will automatically select the appropriate LinkAI agent based on the user's question.
<Frame>
<img src="https://cdn.link-ai.tech/doc/20260202234350.png" width="750" />
</Frame>

@@ -1,31 +0,0 @@
---
title: Skill Creator
description: Create custom skills through conversation
---
Quickly create, install, or update skills through natural language conversation.
## Dependencies
No extra dependencies, always available.
## Usage
- Codify workflows as skills: "Create a skill from this deployment process"
- Integrate third-party APIs: "Create a skill based on this API documentation"
- Install remote skills: "Install xxx skill for me"
## Creation Flow
1. Tell the Agent what skill you want to create
2. Agent automatically generates `SKILL.md` description and execution scripts
3. Skill is saved to the workspace `~/cow/skills/` directory
4. Agent will automatically recognize and use the skill in future conversations
<Frame>
<img src="https://cdn.link-ai.tech/doc/20260202202247.png" width="800" />
</Frame>
<Tip>
See the [Skill Creator documentation](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/skills/skill-creator/SKILL.md) for details.
</Tip>

@@ -1,31 +0,0 @@
---
title: Web Fetch
description: Fetch web page text content
---
Use curl to fetch web pages and extract readable text content. A lightweight web access method without browser automation.
## Dependencies
| Dependency | Description |
| --- | --- |
| `curl` | System command (usually pre-installed) |
This skill sets `always: true`, so it is enabled by default whenever the `curl` command is available on the system.
## Usage
Automatically invoked when the Agent needs to fetch content from a URL, no extra configuration needed.
## Comparison with browser Tool
| Feature | web-fetch (skill) | browser (tool) |
| --- | --- | --- |
| Dependencies | curl only | browser-use + playwright |
| JS rendering | Not supported | Supported |
| Page interaction | Not supported | Supports click, type, etc. |
| Best for | Static page text | Dynamic web pages |
<Tip>
For most web content retrieval scenarios, web-fetch is sufficient. Only use the browser tool when you need JS rendering or page interaction.
</Tip>

@@ -1,9 +1,11 @@
---
title: memory - Memory
description: Search and read long-term memory
title: memory - Memory & Knowledge
description: Search and read long-term memory and knowledge base files
---
The memory tool contains two sub-tools: `memory_search` (search memory) and `memory_get` (read memory files).
The memory tool contains two sub-tools: `memory_search` (search memory) and `memory_get` (read memory or knowledge files).
When the [knowledge base](/en/knowledge) feature is enabled, both tools also support accessing files under the `knowledge/` directory.
## Dependencies
@@ -11,7 +13,7 @@ No extra dependencies, available by default. Managed by the Agent Core memory sy
## memory_search
Search historical memory with hybrid keyword and vector retrieval.
Search historical memory and knowledge base content with hybrid keyword and vector retrieval.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
@@ -19,11 +21,11 @@ Search historical memory with hybrid keyword and vector retrieval.
## memory_get
Read the content of a specific memory file.
Read the content of a specific memory or knowledge file.
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `path` | string | Yes | Relative path to memory file (e.g. `MEMORY.md`, `memory/2026-01-01.md`) |
| `path` | string | Yes | Relative path to the file (e.g. `MEMORY.md`, `memory/2026-01-01.md`, `knowledge/concepts/rag.md`) |
| `start_line` | integer | No | Start line number |
| `end_line` | integer | No | End line number |
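A minimal sketch of what `memory_get` does with these parameters, assuming `start_line` and `end_line` are 1-based and inclusive. It also rejects paths that resolve outside the workspace, which is one common way to guard against path traversal; this is not CowAgent's actual code.

```python
from pathlib import Path
from typing import Optional


def memory_get(workspace: Path, path: str,
               start_line: Optional[int] = None,
               end_line: Optional[int] = None) -> str:
    """Read a workspace-relative file, optionally limited to a line range (sketch)."""
    root = workspace.resolve()
    target = (root / path).resolve()
    # Refuse anything that escapes the workspace, e.g. "../etc/passwd".
    if target != root and root not in target.parents:
        raise ValueError(f"path escapes workspace: {path}")
    lines = target.read_text(encoding="utf-8").splitlines()
    start = 1 if start_line is None else max(start_line, 1)
    end = len(lines) if end_line is None else min(end_line, len(lines))
    return "\n".join(lines[start - 1:end])
```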
@@ -34,3 +36,8 @@ The Agent automatically invokes memory tools in these scenarios:
- When the user shares important information → stores to memory
- When historical context is needed → searches relevant memory
- When conversation reaches a certain length → extracts summary for storage
- When discussing domain knowledge → retrieves relevant pages from the knowledge base
<Note>
When `knowledge` is set to `false` in config, the tool descriptions and search scope automatically adjust to include only memory files.
</Note>
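The scope rule in the note can be sketched as a path filter. The function name and the exact set of searchable locations (`MEMORY.md`, `memory/`, `knowledge/`) are assumptions drawn from this page, not the tool's real implementation.

```python
from pathlib import PurePosixPath


def in_search_scope(path: str, knowledge_enabled: bool = True) -> bool:
    """Decide whether a workspace-relative path is searchable (hypothetical sketch)."""
    parts = PurePosixPath(path).parts
    # Reject empty paths and any ".." component outright.
    if not parts or ".." in parts:
        return False
    if parts == ("MEMORY.md",) or parts[0] == "memory":
        return True
    # knowledge/ is only in scope when the knowledge base is enabled.
    return knowledge_enabled and parts[0] == "knowledge"
```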

docs/en/tools/vision.mdx
@@ -0,0 +1,72 @@
---
title: vision - Image Analysis
description: Analyze image content (recognition, description, OCR, etc.)
---
Analyze local images or image URLs using Vision API. Supports content description, text extraction (OCR), object recognition, and more.
## Model Selection
The vision tool uses a multi-level auto-selection strategy with automatic fallback — no manual configuration required:
1. **Main model** — uses the currently configured main model for image recognition (zero extra cost)
2. **Other configured models** — auto-discovers other models with configured API keys as alternatives
3. **OpenAI** — uses `open_ai_api_key` to call gpt-4.1-mini
4. **LinkAI** — uses `linkai_api_key` to call LinkAI vision service
When `use_linkai=true`, LinkAI is promoted to the highest priority.
If the current provider fails, the tool automatically tries the next one until it succeeds or all fail.
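The selection order and fallback can be sketched as two small functions. Provider names such as `"claude"` or `"openai"` are illustrative labels, and each provider is modeled as a plain `(image, question) -> answer` callable; this is not the tool's real interface.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A provider is modeled as (image, question) -> answer; it raises on failure.
VisionFn = Callable[[str, str], str]


def provider_order(main: Optional[str], others: List[str], has_openai: bool,
                   has_linkai: bool, use_linkai: bool) -> List[str]:
    """Build the try-order: main model, other configured models, OpenAI, LinkAI."""
    order: List[str] = [main] if main else []
    order += [p for p in others if p not in order]
    if has_openai and "openai" not in order:
        order.append("openai")
    if has_linkai:
        if "linkai" in order:
            order.remove("linkai")
        # use_linkai=true promotes LinkAI to the highest priority.
        if use_linkai:
            order.insert(0, "linkai")
        else:
            order.append("linkai")
    return order


def analyze(image: str, question: str, order: List[str],
            providers: Dict[str, VisionFn]) -> Tuple[str, str]:
    """Try providers in order; return (provider used, answer) from the first success."""
    errors: List[str] = []
    for name in order:
        try:
            return name, providers[name](image, question)
        except Exception as exc:  # any failure falls through to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all vision providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the answer mirrors the tool reporting which model actually handled the request.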
### Supported Models
| Vendor | Vision Model | Notes |
| --- | --- | --- |
| OpenAI / Compatible | Main model | All OpenAI-compatible multimodal models |
| Qwen (DashScope) | Main model | Via MultiModalConversation API |
| Claude | Main model | Anthropic native image format |
| Gemini | Main model | inlineData format |
| Doubao | Main model | doubao-seed-2-0 series natively supported |
| Kimi (Moonshot) | Main model | kimi-k2.5 natively supported |
| ZhipuAI | glm-5v-turbo | Always uses dedicated vision model |
| MiniMax | MiniMax-Text-01 | Always uses dedicated vision model |
<Note>
ZhipuAI and MiniMax text models do not support image understanding, so their dedicated vision models are always used automatically.
</Note>
## Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `image` | string | Yes | Local file path or HTTP(S) image URL |
| `question` | string | Yes | Question to ask about the image |
Supported image formats: jpg, jpeg, png, gif, webp
## Custom Configuration
To specify a particular model for the vision tool, add to `config.json`:
```json
{
"tool": {
"vision": {
"model": "gpt-4o"
}
}
}
```
In most cases no configuration is needed. The tool works automatically as long as the main model supports multimodal input or any vision-capable API key is configured.
## Use Cases
- Describe image content
- Extract text from images (OCR)
- Identify objects, colors, scenes
- Analyze screenshots and scanned documents
<Note>
Images larger than 1MB are automatically compressed (max edge 1536px). All images (including remote URLs) are converted to base64 for transmission to ensure compatibility with all model backends.
</Note>
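The size rule in the note can be sketched with the standard library alone. It assumes "1MB" means 1 MiB and only computes the target dimensions; the actual resampling would need an imaging library such as Pillow. The base64 step is shown as an OpenAI-style data URL; other backends wrap the same base64 string in their own request format.

```python
import base64
from typing import Tuple

MAX_BYTES = 1024 * 1024  # assumption: "1MB" read as 1 MiB
MAX_EDGE = 1536          # longest edge after compression


def target_size(width: int, height: int, size_bytes: int) -> Tuple[int, int]:
    """Dimensions to resize to; unchanged if the file is small or already within MAX_EDGE."""
    longest = max(width, height)
    if size_bytes <= MAX_BYTES or longest <= MAX_EDGE:
        return width, height
    scale = MAX_EDGE / longest  # keep the aspect ratio
    return max(1, round(width * scale)), max(1, round(height * scale))


def to_data_url(data: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL (OpenAI-style image input)."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"
```

For example, a 5 MB 4000x3000 photo would be scaled to 1536x1152 before encoding.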

@@ -30,7 +30,41 @@ pip3 install -r requirements.txt
pip3 install -r requirements-optional.txt
```
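### 3. Configuration
> On networks in mainland China, you can speed up installation with a mirror: `pip3 install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple`
### 3. Install Cow CLI
Install the command-line tool for managing the service and skills: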
```bash
pip3 install -e .
```
Once installed, the `cow` command is available:
```bash
cow help
```
<Note>
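This step is recommended. Once installed, you can manage the service with commands such as `cow start`, `cow stop`, and `cow update`, and manage skills with `cow skill`. Without the CLI, you can run the project with `./run.sh` or `python3 app.py`.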
</Note>
### 3.1 Install the Browser Tool (Optional)
To use the browser tool (controlling a browser to visit web pages, fill in forms, etc.), run:
```bash
cow install-browser
```
This command automatically installs Playwright and the Chromium browser. See the [browser tool docs](/tools/browser) for details.
<Note>
The browser tool has heavy dependencies (~300MB). If you don't need it, you can skip this step; other features are unaffected.
</Note>
### 4. Configuration
Copy the configuration template and edit it:
@@ -40,9 +74,15 @@ cp config-template.json config.json
Fill in the model API key, channel type, and other settings in `config.json`. See the individual [model docs](/models/minimax) for details.
### 4. Run
### 5. Run
**Run locally**
**Run with Cow CLI (recommended)**
```bash
cow start
```
**Or run locally in the foreground:**
```bash
python3 app.py
@@ -50,7 +90,7 @@ python3 app.py
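By default this starts the Web console; visit `http://localhost:9899` to chat with and manage the Agent.
**Run in the background on a server:**
**Run in the background on a server (without the CLI):**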
```bash
nohup python3 app.py & tail -f nohup.out
@@ -94,28 +134,44 @@ sudo docker logs -f chatgpt-on-wechat
## Core Configuration
```json
{
"channel_type": "web",
"model": "MiniMax-M2.5",
"agent": true,
"agent_workspace": "~/cow",
"agent_max_context_tokens": 40000,
"agent_max_context_turns": 30,
"agent_max_steps": 15
}
```
<Tabs>
<Tab title="Source deployment (config.json)">
```json
{
"channel_type": "web",
"model": "MiniMax-M2.7",
"agent": true,
"agent_workspace": "~/cow",
"agent_max_context_tokens": 40000,
"agent_max_context_turns": 30,
"agent_max_steps": 15
}
```
</Tab>
<Tab title="Docker deployment (docker-compose.yml)">
```yaml
environment:
CHANNEL_TYPE: 'web'
MODEL: 'MiniMax-M2.7'
MINIMAX_API_KEY: 'your-api-key'
AGENT: 'True'
AGENT_MAX_CONTEXT_TOKENS: 40000
AGENT_MAX_CONTEXT_TURNS: 30
AGENT_MAX_STEPS: 15
```
</Tab>
</Tabs>
| Parameter | Description | Default |
| --- | --- | --- |
| `channel_type` | Channel type | `web` |
| `model` | Model name | `MiniMax-M2.5` |
| `agent` | Whether to enable Agent mode | `true` |
| `agent_workspace` | Agent workspace path | `~/cow` |
| `agent_max_context_tokens` | Max context tokens | `40000` |
| `agent_max_context_turns` | Max context memory turns | `30` |
| `agent_max_steps` | Max decision steps per task | `15` |
| Parameter | Environment Variable | Description | Default |
| --- | --- | --- | --- |
| `channel_type` | `CHANNEL_TYPE` | Channel type | `web` |
| `model` | `MODEL` | Model name | `MiniMax-M2.5` |
| `agent` | `AGENT` | Whether to enable Agent mode | `true` |
| `agent_workspace` | - | Agent workspace path | `~/cow` |
| `agent_max_context_tokens` | `AGENT_MAX_CONTEXT_TOKENS` | Max context tokens | `40000` |
| `agent_max_context_turns` | `AGENT_MAX_CONTEXT_TURNS` | Max context memory turns | `30` |
| `agent_max_steps` | `AGENT_MAX_STEPS` | Max decision steps per task | `15` |
<Tip>
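All configuration options are listed in the project's [`config.py`](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/config.py) file.
All configuration options are listed in the project's [`config.py`](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/config.py) file. For Docker deployment, convert option names to uppercase environment variables.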
</Tip>
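A minimal sketch of the key mapping described in the tip: each config key is uppercased to find its environment variable, and the string value is coerced to the type of the default. The function is hypothetical; pass only the keys that actually have environment variables (per the table, `agent_workspace` does not).

```python
from typing import Any, Dict, Mapping


def config_from_env(defaults: Dict[str, Any], env: Mapping[str, str]) -> Dict[str, Any]:
    """Overlay uppercase environment variables onto config defaults (sketch)."""
    config = dict(defaults)
    for key, default in defaults.items():
        raw = env.get(key.upper())  # e.g. agent_max_steps -> AGENT_MAX_STEPS
        if raw is None:
            continue
        # Check bool before int: in Python, bool is a subclass of int.
        if isinstance(default, bool):
            config[key] = raw.strip().lower() in ("true", "1", "yes")
        elif isinstance(default, int):
            config[key] = int(raw)
        else:
            config[key] = raw
    return config
```

Usage: `config_from_env(defaults, os.environ)` inside the container, where Docker Compose has passed every `environment` value as a string.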

@@ -9,16 +9,25 @@ description: Install and manage CowAgent with a one-click script
## Install Command
```bash
bash <(curl -fsSL https://cdn.link-ai.tech/code/cow/run.sh)
```
<Tabs>
<Tab title="Linux / macOS">
```bash
bash <(curl -fsSL https://cdn.link-ai.tech/code/cow/run.sh)
```
</Tab>
<Tab title="Windows (PowerShell)">
```powershell
irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex
```
</Tab>
</Tabs>
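The script automatically performs the following steps:
1. Check the Python environment (Python 3.7+ required)
2. Install required tools (git, curl, etc.)
3. Clone the project code to `~/chatgpt-on-wechat`
4. Install Python dependencies
4. Install Python dependencies and Cow CLI
5. Guide you through configuring the AI model and messaging channel
6. Start the service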
@@ -26,14 +35,20 @@ bash <(curl -fsSL https://cdn.link-ai.tech/code/cow/run.sh)
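## Management Commands
After installation, manage the service with the following commands:
After installation, manage the service with the `cow` CLI:
| Command | Description |
| --- | --- |
| `./run.sh start` | Start the service |
| `./run.sh stop` | Stop the service |
| `./run.sh restart` | Restart the service |
| `./run.sh status` | View running status |
| `./run.sh logs` | View live logs |
| `./run.sh config` | Reconfigure |
| `./run.sh update` | Update project code |
| `cow start` | Start the service |
| `cow stop` | Stop the service |
| `cow restart` | Restart the service |
| `cow status` | View running status |
| `cow logs` | View live logs |
| `cow update` | Update code and restart |
| `cow install-browser` | Install browser tool dependencies |
For more commands and usage, see the [command docs](/cli/index).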
<Note>
If the `cow` command is unavailable, you can use `./run.sh <command>` (Linux/macOS) or `.\scripts\run.ps1 <command>` (Windows) instead; they are functionally equivalent.
</Note>

@@ -3,20 +3,25 @@ title: Upgrade
description: How to upgrade CowAgent
---
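## Script Upgrade (Recommended)
## Command Upgrade (Recommended)
If you manage the service with `run.sh`, run the following command to upgrade in one step:
Use `cow update` to update the code and restart the service in one step: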
```bash
./run.sh update
cow update
```
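This command automatically performs the following steps:
1. Stop the running service
2. Pull the latest code
3. Recheck dependencies
4. Start the service
1. Pull the latest code (`git pull`)
2. Stop the current service
3. Update Python dependencies
4. Reinstall the CLI
5. Start the service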
<Note>
If Cow CLI is not installed, you can also run `./run.sh update` to do the same.
</Note>
## Manual Upgrade
@@ -25,15 +30,19 @@ description: How to upgrade CowAgent
```bash
git pull
pip3 install -r requirements.txt
pip3 install -e .
```
After updating, restart the service:
```bash
# If managed with run.sh
# Using Cow CLI (recommended)
cow restart
# Or using run.sh
./run.sh restart
# If running directly with nohup
# Running directly with nohup
kill $(ps -ef | grep app.py | grep -v grep | awk '{print $2}')
nohup python3 app.py & tail -f nohup.out
```

@@ -11,25 +11,27 @@ The overall architecture of CowAgent consists of the following core modules:
<img src="https://cdn.link-ai.tech/doc/68ef7b212c6f791e0e74314b912149f9-sz_5847990.png" alt="CowAgent Architecture" />
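### Core Modules
| Module | Description |
| --- | --- |
| **Channels** | Messaging channel layer that receives and sends messages; supports Web, Feishu, DingTalk, WeCom, WeChat Official Accounts, and more |
| **Agent Core** | Core agent engine, including task planning, the memory system, and the skill engine |
| **Tools** | Tool layer; the Agent accesses operating system resources through tools, with 10+ built-in tools |
| **Models** | Model layer providing unified access to mainstream LLMs from Chinese and international providers |
| **Plan** | Understands user intent, breaks complex tasks into multi-step plans, and calls tools in a loop until the goal is reached |
| **Memory** | Automatically persists important information as core memory and daily memory, supports hybrid keyword and vector retrieval, and keeps context continuous across sessions |
| **Knowledge** | Organizes structured knowledge by topic; the Agent autonomously curates valuable information into Markdown pages, maintains an index and cross-references, and builds a continuously growing knowledge network |
| **Tools** | The Agent's core means of accessing operating system resources, with 10+ built-in tools for file read/write, terminal execution, browser control, scheduled tasks, memory retrieval, web search, and more |
| **Skills** | Loads and manages Skills; supports one-command installation from Skill Hub, GitHub, and other sources, or creating custom skills through conversation |
| **Models** | Model layer providing unified access to mainstream LLMs such as OpenAI, Claude, Gemini, DeepSeek, MiniMax, GLM, and Qwen |
| **Channels** | Messaging channel layer that receives and sends messages over a unified message protocol; supports the Web console, WeChat, Feishu, DingTalk, WeCom, WeChat Official Accounts, and more |
| **CLI** | Command-line system providing terminal commands (`cow`) and chat commands (`/`) for process management, skill installation, configuration changes, knowledge base management, and more |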
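## Agent Mode
With Agent mode enabled, CowAgent runs as an autonomous agent. The core workflow is as follows:
1. **Receive message** - Receive user input through a channel
2. **Understand intent** - Analyze task requirements and context
3. **Plan task** - Break complex tasks into multiple steps
4. **Call tools** - Select the appropriate tool to execute each step
5. **Update memory** - Store important information in long-term memory
6. **Return result** - Send the execution result back to the user
1. **Receive message** - Receive user input through a channel
2. **Understand intent** - Analyze task requirements and context
3. **Plan task** - Break complex tasks into multiple steps
4. **Call tools** - Select the appropriate tool to execute each step
5. **Update memory and knowledge** - Store important information in long-term memory and organize structured knowledge into the knowledge base
6. **Return result** - Send the execution result back to the user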
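The workflow above amounts to a bounded plan/act loop. In this sketch, `plan_next` stands in for the model, `tools` for the tool registry, and `max_steps` mirrors `agent_max_steps`; the memory and knowledge update step is omitted. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    tool: Optional[str]  # None means the model produced a final answer
    args: Dict[str, str] = field(default_factory=dict)
    answer: str = ""


def run_agent(plan_next: Callable[[List[str]], Step],
              tools: Dict[str, Callable[..., str]],
              max_steps: int = 15) -> str:
    """Plan, call a tool, record the observation; stop on an answer or at the step limit."""
    observations: List[str] = []
    for _ in range(max_steps):
        step = plan_next(observations)
        if step.tool is None:
            return step.answer
        observations.append(tools[step.tool](**step.args))
    return "Stopped: reached agent_max_steps without a final answer."
```

The step cap keeps a confused plan from looping forever, which is the role `agent_max_steps` plays in the configuration.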
## Workspace
@@ -37,11 +39,14 @@ The Agent's workspace defaults to the `~/cow` directory and is used to store the system prompt
```
~/cow/
├── system.md # Agent system prompt
├── user.md # User profile
├── SYSTEM.md # Agent system prompt
├── USER.md # User profile
├── MEMORY.md # Core memory
├── memory/ # Long-term memory storage
│ ├── core.md # Core memory
│ └── daily/ # Daily memory
│ └── YYYY-MM-DD.md # Daily memory
├── knowledge/ # Personal knowledge base
│ ├── index.md # Knowledge index
│ └── <category>/ # Topic-based pages
└── skills/ # Custom skills
├── skill-1/
└── skill-2/
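@@ -75,3 +80,4 @@ The Agent's workspace defaults to the `~/cow` directory and is used to store the system prompt
| `agent_max_context_tokens` | Max context tokens | `40000` |
| `agent_max_context_turns` | Max context memory turns | `30` |
| `agent_max_steps` | Max decision steps per task | `15` |
| `knowledge` | Whether to enable the personal knowledge base | `true` |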

Some files were not shown because too many files have changed in this diff.