Compare commits

28 Commits

Author SHA1 Message Date
zhayujie
26693acc3f feat(vision): prioritize main model for image recognition with multi-provider fallback
- Add call_vision method to all bot implementations (DashScope, Claude,
  Gemini, ZhipuAI, MiniMax, Doubao, Moonshot, OpenAICompatibleBot)
  using each vendor's native multimodal API format
- Remove call_with_tools/call_vision from Bot base class to fix MRO
  shadowing issue with OpenAICompatibleBot mixin
- Refactor vision tool provider resolution: MainModel → other configured
  models (auto-discovered) → OpenAI → LinkAI, with automatic fallback
- Return actual model name used in call_vision responses
- Sync config.json API keys to .env bidirectionally on startup
- Fix bot instance cache to detect bot_type/use_linkai config changes
- Add SSE reconnection support for web console
- Preserve image path hints in Gemini text for correct vision tool calls
- Update docs/tools/vision.mdx
2026-04-11 19:46:11 +08:00
zhayujie
3cd92ccda3 feat: add port config 2026-04-09 21:29:53 +08:00
zhayujie
d86cb4ded6 fix(weixin): update weixin channel version 2026-04-09 09:55:07 +08:00
zhayujie
4d5375f6d6 fix(win): add Windows platform hint in bash tool description 2026-04-08 16:54:26 +08:00
zhayujie
424557fedb fix(win): use PowerShell instead of cmd.exe 2026-04-08 16:50:45 +08:00
zhayujie
89251e603f fix(win): use PowerShell instead of cmd.exe for bash tool on Windows 2026-04-08 16:18:56 +08:00
zhayujie
a653ed07eb fix(win): defer pip install to a helper bat after cow.exe exits 2026-04-08 15:31:03 +08:00
zhayujie
ad86deb014 fix: prioritize using a custom master model for vision 2026-04-08 15:16:59 +08:00
zhayujie
9525dc7584 fix: avoid stale cow.exe on Windows by spawning fresh process 2026-04-08 12:07:18 +08:00
zhayujie
cd31dd27fd fix: increase web console capacity and add frontend retry 2026-04-08 11:48:27 +08:00
zhayujie
360e3670eb feat(browser): detect implicit interactive elements 2026-04-07 01:41:14 +08:00
zhayujie
8dabe3b4c8 fix: remove install-browser cmd display in /help 2026-04-04 23:28:57 +08:00
zhayujie
443e0c2806 feat: show video in web channel 2026-04-03 17:09:38 +08:00
zhayujie
9cc173cc4d fix: use dynamic model name in system prompt runtime info 2026-04-02 17:01:56 +08:00
zhayujie
b5f33e5ecd feat: support qwen3.6-plus 2026-04-02 16:46:58 +08:00
zhayujie
40dfc6860f fix: skill list showing sub-skills inside collection 2026-04-02 11:47:24 +08:00
zhayujie
1c02a04423 fix: handle error when printing QR code on Windows GBK terminals 2026-04-01 17:23:57 +08:00
zhayujie
de0e45070c chore: remove conflicting dependency 2026-04-01 17:19:15 +08:00
zhayujie
c169cc7d74 fix: remove conflicting dependency 2026-04-01 17:12:15 +08:00
zhayujie
cd62ad76f6 fix: cow CLI support python3.7 2026-04-01 16:51:23 +08:00
zhayujie
dd25b0fb5b feat: refine system prompt style and tone guidance 2026-04-01 16:24:41 +08:00
zhayujie
a38b22a6a2 docs: update docs 2026-04-01 15:31:41 +08:00
zhayujie
830b8f2971 feat: release 2.0.5 2026-04-01 15:01:53 +08:00
zhayujie
b058af122c feat: release 2.0.5 2026-04-01 12:24:21 +08:00
zhayujie
174ee0cafc fix(security): prevent path traversal in memory content API 2026-04-01 10:03:58 +08:00
zhayujie
1c336380c0 docs: update release doc 2026-03-31 22:30:31 +08:00
zhayujie
3068880413 feat: save skill display name when downloading 2026-03-31 21:43:57 +08:00
zhayujie
be596681e5 Merge pull request #2735 from zhayujie/feat-wecom-bot-qrcode
feat(wecom_bot): add Wecom Bot QR code scan auth
2026-03-31 21:28:39 +08:00
96 changed files with 2330 additions and 1610 deletions

View File

@@ -13,6 +13,7 @@
<a href="https://cowagent.ai/">🌐 官网</a> &nbsp;·&nbsp;
<a href="https://docs.cowagent.ai/">📖 文档中心</a> &nbsp;·&nbsp;
<a href="https://docs.cowagent.ai/guide/quick-start">🚀 快速开始</a> &nbsp;·&nbsp;
<a href="https://skills.cowagent.ai/">🧩 技能广场</a> &nbsp;·&nbsp;
<a href="https://link-ai.tech/cowagent/create">☁️ 在线体验</a>
</p>
@@ -23,7 +24,7 @@
-**自主任务规划**:能够理解复杂任务并自主规划执行,持续思考和调用工具直到完成目标
-**长期记忆:** 自动将对话记忆持久化至本地文件和数据库中,包括核心记忆和日级记忆,支持关键词及向量检索
-**技能系统:** Skills 安装和运行的引擎,支持从 Skill Hub、GitHub 等安装技能,或通过对话创造 Skills
-**技能系统:** Skills 安装和运行的引擎,支持从 [Skill Hub](https://skills.cowagent.ai/)、GitHub 等一键安装技能,或通过对话创造 Skills
-**工具系统:** 内置文件读写、终端执行、浏览器操作、定时任务等工具Agent 自主调用以完成复杂任务
-**CLI系统** 提供终端命令和对话命令,支持进程管理、技能安装、配置修改等操作
-**多模态消息:** 支持对文本、图片、语音、文件等多类型消息进行解析、处理、生成、发送等操作
@@ -68,6 +69,8 @@
# 🏷 更新日志
>**2026.04.01** [2.0.5版本](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.5)Cow CLI 命令系统、Skill Hub 开源、浏览器工具、企微扫码创建、多项优化和修复。
>**2026.03.22** [2.0.4版本](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.4),新增个人微信通道(微信扫码即用)、新增 MiniMax-M2.7 和 GLM-5-Turbo 模型、run.sh 脚本重构、日文文档及多项修复。
>**2026.03.18** [2.0.3版本](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.3),新增企微智能机器人和 QQ 通道、支持 Coding Plan、新增多个模型、Web 端文件处理、记忆系统升级。
@@ -98,7 +101,7 @@ bash <(curl -fsSL https://cdn.link-ai.tech/code/cow/run.sh)
irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex
```
脚本使用说明:[一键运行脚本](https://docs.cowagent.ai/guide/quick-start)。安装后可使用 `cow start``cow stop` 等 [CLI 命令](https://docs.cowagent.ai/commands/index) 管理服务。
脚本使用说明:[一键运行脚本](https://docs.cowagent.ai/guide/quick-start)。安装后可使用 `cow start``cow stop` 等 [CLI 命令](https://docs.cowagent.ai/cli/index) 管理服务。
## 一、准备
@@ -113,7 +116,7 @@ irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex
### 2.环境安装
支持 Linux、MacOS、Windows 操作系统,可在个人计算机及服务器上运行,需安装 `Python`Python 版本需在3.7 ~ 3.12 之间推荐使用3.9版本
支持 Linux、MacOS、Windows 操作系统,可在个人计算机及服务器上运行,需安装 `Python`Python 版本需在3.7 ~ 3.12 之间。
> 注意Agent 模式推荐使用源码运行,若选择 Docker 部署则无需安装 python 环境和下载源码,可直接快进到下一节。
@@ -148,7 +151,7 @@ pip3 install -r requirements-optional.txt
pip3 install -e .
```
安装后可使用 `cow` 命令管理服务(启动、停止、更新等)和技能,详见 [命令文档](https://docs.cowagent.ai/commands/index)。
安装后可使用 `cow` 命令管理服务(启动、停止、更新等)和技能,详见 [命令文档](https://docs.cowagent.ai/cli/index)。
**(5) 安装浏览器工具 (可选)**
@@ -215,7 +218,7 @@ cow install-browser
<details>
<summary>2. 其他配置</summary>
+ `model`: 模型名称Agent 模式下推荐使用 `MiniMax-M2.7``glm-5-turbo``kimi-k2.5``qwen3.5-plus``claude-sonnet-4-6``gemini-3.1-pro-preview`,全部模型名称参考[common/const.py](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/common/const.py)文件
+ `model`: 模型名称Agent 模式下推荐使用 `MiniMax-M2.7``glm-5-turbo``kimi-k2.5``qwen3.6-plus``claude-sonnet-4-6``gemini-3.1-pro-preview`,全部模型名称参考[common/const.py](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/common/const.py)文件
+ `character_desc`:普通对话模式下的机器人系统提示词。在 Agent 模式下该配置不生效,由工作空间中的文件内容构成。
+ `subscribe_msg`:订阅消息,公众号和企业微信 channel 中请填写,当被订阅时会自动回复, 可使用特殊占位符。目前支持的占位符有{trigger_prefix},在程序中它会自动替换成 bot 的触发词。
</details>
@@ -300,7 +303,7 @@ sudo docker logs -f chatgpt-on-wechat
## 模型说明
以下对所有可支持的模型配置和使用方法进行说明,模型接口实现在项目的 `models/` 目录下。
推荐通过 Web 控制台在线管理模型配置,无需手动编辑文件,详见 [模型文档](https://docs.cowagent.ai/models)。以下是手动修改 `config.json` 配置模型的说明:
<details>
<summary>OpenAI</summary>
@@ -408,18 +411,18 @@ sudo docker logs -f chatgpt-on-wechat
```json
{
"model": "qwen3.5-plus",
"model": "qwen3.6-plus",
"dashscope_api_key": "sk-qVxxxxG"
}
```
- `model`: 可填写 `qwen3.5-plus、qwen3-max、qwen-max、qwen-plus、qwen-turbo、qwen-long、qwq-plus`
- `dashscope_api_key`: 通义千问的 API-KEY参考 [官方文档](https://bailian.console.aliyun.com/?tab=api#/api) ,在 [控制台](https://bailian.console.aliyun.com/?tab=model#/api-key) 创建
- `model`: 可填写 `qwen3.6-plus、qwen3.5-plus、qwen3-max、qwen-max、qwen-plus、qwen-turbo、qwen-long、qwq-plus`
- `dashscope_api_key`: 通义千问的 API-KEY参考 [官方文档](https://bailian.console.aliyun.com/?tab=api#/api) ,在 [百炼控制台](https://bailian.console.aliyun.com/?tab=model#/api-key) 创建
方式二OpenAI 兼容方式接入,配置如下:
```json
{
"bot_type": "openai",
"model": "qwen3.5-plus",
"model": "qwen3.6-plus",
"open_ai_api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
"open_ai_api_key": "sk-qVxxxxG"
}
@@ -671,7 +674,7 @@ Coding Plan 是各厂商推出的编程包月套餐,所有厂商均可通过 O
## 通道说明
以下对可接入通道配置方式进行说明,应用通道代码在项目的 `channel/` 目录下。
推荐通过 Web 控制台在线管理通道配置,无需手动编辑文件,详见 [通道文档](https://docs.cowagent.ai/channels/weixin)。以下为手动修改 `config.json` 配置通道的说明:
支持同时可接入多个通道,配置时可通过逗号进行分割,例如 `"channel_type": "feishu,dingtalk"`
@@ -866,8 +869,10 @@ QQ 机器人使用 WebSocket 长连接模式,无需公网 IP 和域名,支
# 🔗 相关项目
- [Cow Skill Hub](https://github.com/zhayujie/cow-skill-hub):开源的 AI Agent 技能广场,浏览、搜索、安装和发布技能,支持 CowAgent、OpenClaw、Claude Code 等多种 Agent。
- [bot-on-anything](https://github.com/zhayujie/bot-on-anything):轻量和高可扩展的大模型应用框架,支持接入 Slack, Telegram, Discord, Gmail 等海外平台,可作为本项目的补充使用。
- [AgentMesh](https://github.com/MinimalFuture/AgentMesh):开源的多智能体( Multi-Agent )框架,可以通过多智能体团队的协同来解决复杂问题。本项目基于该框架实现了[Agent 插件](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/plugins/agent/README.md),可访问终端、浏览器、文件系统、搜索引擎 等各类工具,并实现了多智能体协同。
- [AgentMesh](https://github.com/MinimalFuture/AgentMesh):开源的多智能体( Multi-Agent )框架,可以通过多智能体团队的协同来解决复杂问题。
@@ -879,7 +884,7 @@ FAQs <https://github.com/zhayujie/chatgpt-on-wechat/wiki/FAQs>
# 🛠️ 开发
欢迎接入更多应用通道,参考 [飞书通道](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/channel/feishu/feishu_channel.py) 新增自定义通道,实现接收和发送消息逻辑即可完成接入。同时欢迎贡献新的 Skills参考 [技能创建文档](https://docs.cowagent.ai/skills/create)
欢迎接入更多应用通道,参考 [飞书通道](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/channel/feishu/feishu_channel.py) 新增自定义通道,实现接收和发送消息逻辑即可完成接入。同时欢迎贡献新的 Skills [Skill Hub](https://skills.cowagent.ai/submit) 提交技能
# ✉ 联系

View File

@@ -134,6 +134,8 @@ class MemoryService:
else:
return {"action": action, "code": 400, "message": f"unknown action: {action}", "payload": None}
except ValueError as e:
return {"action": action, "code": 403, "message": "invalid filename", "payload": None}
except FileNotFoundError as e:
return {"action": action, "code": 404, "message": str(e), "payload": None}
except Exception as e:
@@ -145,14 +147,26 @@ class MemoryService:
# ------------------------------------------------------------------
     def _resolve_path(self, filename: str) -> str:
         """
-        Resolve a filename to its absolute path.
+        Safely resolve a filename to its absolute path within the allowed directory.
         - ``MEMORY.md`` → ``{workspace_root}/MEMORY.md``
         - ``2026-02-20.md`` → ``{workspace_root}/memory/2026-02-20.md``
+        Raises ValueError if the resolved path escapes the allowed directory
+        (path traversal protection).
         """
         if filename == "MEMORY.md":
-            return os.path.join(self.workspace_root, filename)
-        return os.path.join(self.memory_dir, filename)
+            base_dir = self.workspace_root
+        else:
+            base_dir = self.memory_dir
+        resolved = os.path.realpath(os.path.join(base_dir, filename))
+        allowed = os.path.realpath(base_dir)
+        if resolved != allowed and not resolved.startswith(allowed + os.sep):
+            raise ValueError(f"Invalid filename: path traversal detected")
+        return resolved
@staticmethod
def _file_info(path: str, filename: str, file_type: str) -> dict:
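
The path-traversal check in `_resolve_path` can be tried in isolation. The sketch below is a standalone version under stated assumptions: `resolve_within` is a hypothetical helper name, and the base directory is a temporary directory created for the demo rather than the real workspace.

```python
import os
import tempfile


def resolve_within(base_dir: str, filename: str) -> str:
    """Return the real path of filename under base_dir, or raise ValueError."""
    allowed = os.path.realpath(base_dir)
    resolved = os.path.realpath(os.path.join(allowed, filename))
    # Appending os.sep keeps a sibling such as "<base>_evil" from matching "<base>".
    if resolved != allowed and not resolved.startswith(allowed + os.sep):
        raise ValueError("path traversal detected")
    return resolved


base = tempfile.mkdtemp()  # demo directory, stands in for memory_dir
inside = resolve_within(base, "2026-02-20.md")

try:
    resolve_within(base, "../../etc/passwd")
    escaped = True
except ValueError:
    escaped = False

print(inside, escaped)
```

Resolving both sides with `os.path.realpath` before comparing means `..` segments and symlinks are collapsed first, so the prefix test runs on canonical paths.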

View File

@@ -16,16 +16,26 @@ from datetime import datetime
from common.log import logger
SUMMARIZE_SYSTEM_PROMPT = """你是一个记忆提取助手。你的任务是从对话记录中提取值得记住的信息,生成简洁的记忆摘要
SUMMARIZE_SYSTEM_PROMPT = """你是一个记忆提取助手。你的任务是从对话记录中提炼出值得长期记住的关键事件和核心信息
核心原则:
- 按「事件」维度归纳,而不是按对话轮次逐条记录
- 多轮对话如果围绕同一件事,合并为一条摘要
- 只记录有长期价值的信息,忽略闲聊、问候、无意义的短消息
输出要求:
1. 以事件/关键信息为维度记录,每条一行,用 "- " 开头
2. 记录有价值的关键信息,例如用户提出的求及助手的解决方案,对话中涉及的事实信息用户的偏好决策或重要结论
3. 每条摘要需要简明扼要,只保留关键信息
4. 直接输出摘要内容,不要加任何前缀说明
5. 当对话没有任何记录价值例如只是简单问候,可回复"\""""
1. 每条一行,用 "- " 开头,格式为:事件/主题 + 关键结论或结果
2. 值得记录的信息类型:用户提出的求及最终解决方案、重要的事实信息用户的偏好决策、关键技术方案或配置变更
3. 不值得记录的信息:简单问候、闲聊、无实质内容的短消息、重复的中间过程
4. 每条摘要应当简明扼要,一句话概括事件的核心内容和结果
5. 直接输出摘要内容,不要加任何前缀说明
6. 当对话没有任何记录价值(仅含问候或无意义内容),回复""
SUMMARIZE_USER_PROMPT = """请从以下对话记录中提取关键信息,生成记忆摘要
示例(仅供参考格式)
- 用户配置了 XX 功能,设置参数为 YY已生效
- 用户反馈了 XX 问题,原因是 YY通过 ZZ 方式解决"""
SUMMARIZE_USER_PROMPT = """请从以下对话记录中,按关键事件维度提炼记忆摘要(合并同一事件的多轮对话,不要逐条列出):
{conversation}"""
@@ -220,14 +230,16 @@ class MemoryFlushManager:
if not conversation_text.strip():
return ""
# Try LLM summarization first
if self.llm_model:
try:
summary = self._call_llm_for_summary(conversation_text)
if summary and summary.strip() and summary.strip() != "":
return summary.strip()
logger.info(f"[MemoryFlush] LLM returned empty or '', using fallback")
except Exception as e:
logger.warning(f"[MemoryFlush] LLM summarization failed, using fallback: {e}")
else:
logger.info("[MemoryFlush] No LLM model available, using rule-based fallback")
return self._extract_summary_fallback(messages, max_messages)
@@ -277,27 +289,38 @@ class MemoryFlushManager:
     @staticmethod
     def _extract_summary_fallback(messages: List[Dict], max_messages: int = 0) -> str:
-        """Rule-based fallback when LLM is unavailable."""
+        """
+        Rule-based fallback when LLM is unavailable.
+        Groups consecutive user+assistant messages into events instead of
+        listing each message individually.
+        """
         msgs = messages if max_messages == 0 else messages[-max_messages * 2:]
-        items = []
+        events: List[str] = []
+        current_user_text = ""
         for msg in msgs:
             role = msg.get("role", "")
             text = MemoryFlushManager._extract_text_from_content(msg.get("content", ""))
             if not text or not text.strip():
                 continue
             text = text.strip()
             if role == "user":
                 if len(text) <= 5:
                     continue
-                items.append(f"- 用户请求: {text[:200]}")
-            elif role == "assistant":
+                current_user_text = text[:150]
+            elif role == "assistant" and current_user_text:
                 first_line = text.split("\n")[0].strip()
                 if len(first_line) > 10:
-                    items.append(f"- 处理结果: {first_line[:200]}")
-        return "\n".join(items[:15])
+                    events.append(f"- {current_user_text} {first_line[:150]}")
+                else:
+                    events.append(f"- {current_user_text}")
+                current_user_text = ""
+        if current_user_text:
+            events.append(f"- {current_user_text}")
+        return "\n".join(events[:10])
@staticmethod
def _extract_text_from_content(content) -> str:
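
The rule-based fallback above pairs each user message with the first line of the assistant reply that follows it. The standalone sketch below reproduces that grouping on plain dictionaries. The function name `summarize_events`, the `" -> "` separator and the sample messages are assumptions made for illustration; message content is taken to be a plain string here.

```python
from typing import Dict, List


def summarize_events(messages: List[Dict[str, str]], limit: int = 10) -> str:
    """Group each user request with the first line of the reply that follows it."""
    events: List[str] = []
    pending_user = ""
    for msg in messages:
        text = (msg.get("content") or "").strip()
        if not text:
            continue
        role = msg.get("role", "")
        if role == "user":
            if len(text) <= 5:  # ignore greetings and other very short messages
                continue
            pending_user = text[:150]
        elif role == "assistant" and pending_user:
            first_line = text.split("\n")[0].strip()
            if len(first_line) > 10:
                events.append(f"- {pending_user} -> {first_line[:150]}")
            else:
                events.append(f"- {pending_user}")
            pending_user = ""
    if pending_user:  # trailing user message that never got a reply
        events.append(f"- {pending_user}")
    return "\n".join(events[:limit])


messages = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "Hello! How can I help?"},
    {"role": "user", "content": "Please set the web console port to 9899"},
    {"role": "assistant", "content": "Done, the port is now 9899.\nRestart to apply."},
    {"role": "user", "content": "Also enable debug logging"},
]
summary = summarize_events(messages)
print(summary)
```

Because an assistant message is recorded only when a user request is pending, replies to skipped greetings do not produce entries of their own.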

View File

@@ -207,9 +207,9 @@ def _build_tooling_section(tools: List[Any], language: str) -> List[str]:
"",
"工具调用风格:",
"",
"- 多步骤任务、敏感操作或用户要求时简要解释决策过程",
"- 持续推进直到任务完成,完成后向用户报告结果",
"- 回复中涉及密钥、令牌等敏感信息必须脱敏",
"- 多步骤任务、复杂决策、敏感操作时,应简要说明当前在做什么、为什么这样做,让用户了解关键进展",
"- 持续推进直到任务完成,完成后向用户报告结果",
"- 回复中涉及密钥、令牌等敏感信息必须脱敏",
"- URL链接直接放在回复文本中即可系统会自动处理和渲染。无需下载后使用send工具发送",
"",
]
@@ -383,7 +383,8 @@ def _build_workspace_section(workspace_dir: str, language: str) -> List[str]:
"",
"**💬 交流规范**:",
"",
"- 对话中不要暴露内部技术细节(文件名、工具名等),用自然语言表达。例如说「我已记住」而非「已更新 MEMORY.md」",
"- 记忆相关操作无需暴露文件名,用自然语言表达即可。例如说「我已记住」而非「已更新 MEMORY.md」",
"- 任务执行过程中的关键决策和步骤应该告知用户,让用户了解你在做什么、为什么这么做",
"- 做真正有帮助的助手,而不是表演式的客套,尽可能帮忙解决问题",
"- 回复应结构清晰、重点突出。善用 **加粗**、列表、分段等格式让信息一目了然",
"- 适当使用 emoji 让表达更生动自然 🎯,但不要过度堆砌",
@@ -477,7 +478,14 @@ def _build_runtime_section(runtime_info: Dict[str, Any], language: str) -> List[
     # Add other runtime info
     runtime_parts = []
-    if runtime_info.get("model"):
+    # Support dynamic model via callable, fallback to static value
+    if callable(runtime_info.get("_get_model")):
+        try:
+            runtime_parts.append(f"模型={runtime_info['_get_model']()}")
+        except Exception:
+            if runtime_info.get("model"):
+                runtime_parts.append(f"模型={runtime_info['model']}")
+    elif runtime_info.get("model"):
         runtime_parts.append(f"模型={runtime_info['model']}")
if runtime_info.get("workspace"):
runtime_parts.append(f"工作空间={runtime_info['workspace']}")
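
The runtime section above prefers a `_get_model` callable and falls back to the static `model` value when the callable is missing or raises. A compact sketch of that lookup order follows, assuming a hypothetical helper `resolve_model_name` that returns the name instead of appending to `runtime_parts`:

```python
from typing import Any, Dict, Optional


def resolve_model_name(runtime_info: Dict[str, Any]) -> Optional[str]:
    """Prefer the dynamic getter; fall back to the static 'model' entry."""
    getter = runtime_info.get("_get_model")
    if callable(getter):
        try:
            return getter()
        except Exception:
            pass  # getter failed: fall through to the static value
    return runtime_info.get("model") or None


def _broken_getter() -> str:
    # Demo getter that always fails (assumption for illustration).
    raise RuntimeError("bot not initialised")


dynamic = resolve_model_name({"model": "static-model", "_get_model": lambda: "live-model"})
fallback = resolve_model_name({"model": "static-model", "_get_model": _broken_getter})
missing = resolve_model_name({})
print(dynamic, fallback, missing)
```

Passing a callable rather than a string lets the system prompt reflect a model switched at runtime without rebuilding the `runtime_info` dictionary.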

View File

@@ -231,9 +231,9 @@ _你不是一个聊天机器人你正在成为某个人。_
## 🎯 核心原则
**做真正有帮助的助手,而不是表演式的客套。** 跳过「好的!」「当然可以!」之类的套话——直接帮忙。行动胜过废话
**做真正有帮助的助手。** 目标是真正帮用户解决问题,在执行复杂任务时,关键的决策和过程进展要让用户知道
**有自己的观点。** 你可以不同意、有偏好、觉得有趣或无聊。一个没有个性的助手只是多了几步操作的搜索引擎。
**有自己的观点和个性。** 你可以不同意、有偏好、觉得有趣或无聊。
**先自己动手查。** 先试着搞定:读文件、查上下文、搜索一下。实在搞不定了再问。目标是带着答案回来,而不是带着问题。

View File

@@ -53,6 +53,12 @@ class SkillLoader:
"""
Recursively load skills from a directory.
If a subdirectory contains its own SKILL.md, it is treated as a
self-contained skill (or skill-collection) and its children are
NOT scanned further. This prevents sub-skills inside a collection
(e.g. style-collection/style-anjing) from being listed as
independent top-level skills.
:param dir_path: Directory to scan
:param source: Source identifier
:param include_root_files: Whether to include root-level .md files
@@ -66,38 +72,41 @@ class SkillLoader:
except Exception as e:
diagnostics.append(f"Failed to list directory {dir_path}: {e}")
return LoadSkillsResult(skills=skills, diagnostics=diagnostics)
# If this directory has its own SKILL.md, load it and stop recursing.
# The sub-directories are internal resources of this skill.
if not include_root_files and 'SKILL.md' in entries:
skill_md_path = os.path.join(dir_path, 'SKILL.md')
if os.path.isfile(skill_md_path):
skill_result = self._load_skill_from_file(skill_md_path, source)
if skill_result.skills:
skills.extend(skill_result.skills)
diagnostics.extend(skill_result.diagnostics)
return LoadSkillsResult(skills=skills, diagnostics=diagnostics)
for entry in entries:
# Skip hidden files and directories
if entry.startswith('.'):
continue
# Skip common non-skill directories
if entry in ('node_modules', '__pycache__', 'venv', '.git'):
continue
full_path = os.path.join(dir_path, entry)
# Handle directories
if os.path.isdir(full_path):
# Recursively scan subdirectories
sub_result = self._load_skills_recursive(full_path, source, include_root_files=False)
skills.extend(sub_result.skills)
diagnostics.extend(sub_result.diagnostics)
continue
# Handle files
if not os.path.isfile(full_path):
continue
# Check if this is a skill file
is_root_md = include_root_files and entry.endswith('.md') and entry.upper() != 'README.MD'
is_skill_md = not include_root_files and entry == 'SKILL.md'
if not (is_root_md or is_skill_md):
if not is_root_md:
continue
# Load the skill
skill_result = self._load_skill_from_file(full_path, source)
if skill_result.skills:
skills.extend(skill_result.skills)
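
The loader change above stops recursing as soon as a non-root directory contains its own `SKILL.md`, so sub-skills inside a collection are no longer listed separately. The sketch below shows only that stop condition on a temporary directory tree. `find_skill_dirs` is a hypothetical helper, and it returns directory paths rather than parsed skill objects.

```python
import os
import tempfile
from typing import List

SKIP_DIRS = ("node_modules", "__pycache__", "venv")


def find_skill_dirs(dir_path: str, is_root: bool = True) -> List[str]:
    """Return directories holding a SKILL.md, without descending into them."""
    entries = sorted(os.listdir(dir_path))
    if not is_root and os.path.isfile(os.path.join(dir_path, "SKILL.md")):
        return [dir_path]  # self-contained skill: children are internal resources
    found: List[str] = []
    for entry in entries:
        if entry.startswith(".") or entry in SKIP_DIRS:
            continue
        full_path = os.path.join(dir_path, entry)
        if os.path.isdir(full_path):
            found.extend(find_skill_dirs(full_path, is_root=False))
    return found


# Build a demo tree: a collection with one nested sub-skill, plus a plain skill.
root = tempfile.mkdtemp()
for rel in ("style-collection", os.path.join("style-collection", "style-anjing"), "weather"):
    os.makedirs(os.path.join(root, rel))
    with open(os.path.join(root, rel, "SKILL.md"), "w") as fh:
        fh.write("# demo skill\n")

names = [os.path.relpath(p, root) for p in find_skill_dirs(root)]
print(names)
```

In this demo tree the nested `style-anjing` directory is never visited, because the walk returns at `style-collection`.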

View File

@@ -18,9 +18,13 @@ from common.utils import expand_path
class Bash(BaseTool):
"""Tool for executing bash commands"""
_IS_WIN = sys.platform == "win32"
name: str = "bash"
description: str = f"""Execute a bash command in the current working directory. Returns stdout and stderr. Output is truncated to last {DEFAULT_MAX_LINES} lines or {DEFAULT_MAX_BYTES // 1024}KB (whichever is hit first). If truncated, full output is saved to a temp file.
{'''
PLATFORM: Windows (cmd.exe). Do NOT use Unix-only commands like grep, head, tail, sed, awk.
''' if _IS_WIN else ''}
ENVIRONMENT: All API keys from env_config are auto-injected. Use $VAR_NAME directly.
SAFETY:
@@ -103,13 +107,12 @@ SAFETY:
logger.debug(f"[Bash] Process User: {os.environ.get('USERNAME', os.environ.get('USER', 'unknown'))}")
# On Windows, convert $VAR references to %VAR% for cmd.exe
if sys.platform == "win32":
if self._IS_WIN:
env["PYTHONIOENCODING"] = "utf-8"
command = self._convert_env_vars_for_windows(command, dotenv_vars)
if command and not command.strip().lower().startswith("chcp"):
command = f"chcp 65001 >nul 2>&1 && {command}"
# Execute command with inherited environment variables
result = subprocess.run(
command,
shell=True,
@@ -120,7 +123,7 @@ SAFETY:
encoding="utf-8",
errors="replace",
timeout=timeout,
env=env
env=env,
)
logger.debug(f"[Bash] Exit code: {result.returncode}")
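
The hunk above calls `_convert_env_vars_for_windows` to turn `$VAR` references into `%VAR%` for cmd.exe, but its body is not part of this diff. The sketch below is one possible way to do such a conversion, not the repository's implementation. The name `to_cmd_env_syntax`, the regular expression and the rule of only rewriting known variable names are all assumptions.

```python
import re
from typing import Iterable

# Matches ${NAME} (group 1) or $NAME (group 2).
_VAR_PATTERN = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def to_cmd_env_syntax(command: str, known_vars: Iterable[str]) -> str:
    """Rewrite $NAME / ${NAME} as %NAME% for names listed in known_vars."""
    known = set(known_vars)

    def _swap(match: "re.Match[str]") -> str:
        name = match.group(1) or match.group(2)
        # Leave unknown references untouched instead of guessing.
        return f"%{name}%" if name in known else match.group(0)

    return _VAR_PATTERN.sub(_swap, command)


converted = to_cmd_env_syntax(
    'curl -H "Authorization: Bearer $API_KEY" ${BASE_URL}/v1 $HOME',
    ["API_KEY", "BASE_URL"],
)
print(converted)
```

Restricting the rewrite to known names is a conservative choice: text that merely contains a dollar sign is passed through unchanged.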

View File

@@ -45,6 +45,11 @@ _SNAPSHOT_JS = """
const KEEP = new Set(%s);
const INTERACTIVE = new Set(%s);
const SKIP = new Set(["script","style","noscript","svg","path","meta","link","br","hr"]);
const CLICKABLE_ROLES = new Set([
"button","link","tab","menuitem","menuitemcheckbox","menuitemradio",
"option","switch","checkbox","radio","combobox","searchbox","slider",
"spinbutton","textbox","treeitem"
]);
let refCounter = 0;
const refMap = {};
@@ -56,6 +61,58 @@ _SNAPSHOT_JS = """
return true;
}
// Strong signals: these attributes alone are enough to mark as interactive
function hasStrongInteractiveSignal(el) {
const role = el.getAttribute("role");
if (role && CLICKABLE_ROLES.has(role)) return true;
if (el.hasAttribute("onclick") || el.hasAttribute("tabindex")) return true;
if (el.hasAttribute("data-click") || el.hasAttribute("data-action")) return true;
if (el.getAttribute("contenteditable") === "true") return true;
return false;
}
// Check if cursor:pointer is set directly (not just inherited from parent)
function hasOwnPointerCursor(el) {
try {
const st = window.getComputedStyle(el);
if (st.cursor !== "pointer") return false;
const parent = el.parentElement;
if (parent) {
const pst = window.getComputedStyle(parent);
if (pst.cursor === "pointer") return false;
}
return true;
} catch(e) {}
return false;
}
function hasTextOrContent(el) {
const t = el.textContent || "";
if (t.trim().length > 0) return true;
if (el.querySelector("img,video,audio,canvas")) return true;
const ariaLabel = el.getAttribute("aria-label");
if (ariaLabel && ariaLabel.trim()) return true;
const title = el.getAttribute("title");
if (title && title.trim()) return true;
return false;
}
function isImplicitInteractive(el) {
if (hasStrongInteractiveSignal(el)) return true;
if (hasOwnPointerCursor(el) && hasTextOrContent(el)) return true;
return false;
}
function getTextContent(el) {
let text = "";
for (const ch of el.childNodes) {
if (ch.nodeType === Node.TEXT_NODE) {
text += ch.textContent;
}
}
return text.trim();
}
function walk(node) {
if (node.nodeType === Node.TEXT_NODE) {
const t = node.textContent.trim();
@@ -75,21 +132,35 @@ _SNAPSHOT_JS = """
}
}
const keep = KEEP.has(tag);
const nativeInteractive = INTERACTIVE.has(tag);
const implicitInteractive = !nativeInteractive && (node instanceof HTMLElement) && isImplicitInteractive(node);
const keep = KEEP.has(tag) || implicitInteractive;
if (!keep) {
// Unwrap: promote children
if (children.length === 0) return null;
if (children.length === 1) return children[0];
return children;
}
const obj = { tag };
if (INTERACTIVE.has(tag)) {
if (nativeInteractive || implicitInteractive) {
refCounter++;
obj.ref = refCounter;
refMap[refCounter] = node;
}
if (implicitInteractive) {
const role = node.getAttribute("role");
if (role) obj.role = role;
const directText = getTextContent(node);
if (!directText && children.length === 0) {
const ariaLabel = node.getAttribute("aria-label");
const title = node.getAttribute("title");
if (ariaLabel) obj.ariaLabel = ariaLabel;
else if (title) obj.ariaLabel = title;
}
}
// Attributes
if (tag === "a" && node.href) obj.href = node.getAttribute("href");
if (tag === "img") {
@@ -113,11 +184,13 @@ _SNAPSHOT_JS = """
}
if (tag === "label" && node.htmlFor) obj.for = node.htmlFor;
// Role / aria-label
const role = node.getAttribute("role");
if (role) obj.role = role;
const ariaLabel = node.getAttribute("aria-label");
if (ariaLabel) obj.ariaLabel = ariaLabel;
// Role / aria-label for native interactive & semantic elements
if (!implicitInteractive) {
const role = node.getAttribute("role");
if (role) obj.role = role;
const ariaLabel = node.getAttribute("aria-label");
if (ariaLabel) obj.ariaLabel = ariaLabel;
}
// Children
if (children.length === 1 && typeof children[0] === "string") {
@@ -129,7 +202,6 @@ _SNAPSHOT_JS = """
return obj;
}
// Store refMap on window for later use by click/fill actions
const result = walk(document.body);
window.__cowRefMap = refMap;
return { tree: result, refCount: refCounter };
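
The snapshot script above marks an element as implicitly interactive when it carries a strong signal (a clickable ARIA role, `onclick`, `tabindex`, `data-click`, `data-action`, or `contenteditable="true"`), or when it sets `cursor: pointer` itself and has visible content. The decision logic can be restated outside the browser as the sketch below. The `Element` record is a hypothetical stand-in for a DOM node, computed styles are passed in as plain strings, and the check for media children is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

CLICKABLE_ROLES = {
    "button", "link", "tab", "menuitem", "menuitemcheckbox", "menuitemradio",
    "option", "switch", "checkbox", "radio", "combobox", "searchbox", "slider",
    "spinbutton", "textbox", "treeitem",
}
STRONG_ATTRS = ("onclick", "tabindex", "data-click", "data-action")


@dataclass
class Element:
    """Hypothetical, simplified view of a DOM node."""
    attrs: Dict[str, str] = field(default_factory=dict)
    cursor: str = "auto"                  # computed cursor of the element
    parent_cursor: Optional[str] = None   # computed cursor of its parent, if any
    text: str = ""


def has_strong_signal(el: Element) -> bool:
    if el.attrs.get("role") in CLICKABLE_ROLES:
        return True
    if any(name in el.attrs for name in STRONG_ATTRS):
        return True
    return el.attrs.get("contenteditable") == "true"


def has_own_pointer_cursor(el: Element) -> bool:
    # A pointer cursor inherited from the parent does not count.
    return el.cursor == "pointer" and el.parent_cursor != "pointer"


def has_text_or_label(el: Element) -> bool:
    labels = (el.text, el.attrs.get("aria-label", ""), el.attrs.get("title", ""))
    return any(value.strip() for value in labels)


def is_implicit_interactive(el: Element) -> bool:
    return has_strong_signal(el) or (has_own_pointer_cursor(el) and has_text_or_label(el))


styled_div = Element(cursor="pointer", parent_cursor="auto", text="Open")
inherited = Element(cursor="pointer", parent_cursor="pointer", text="Open")
print(is_implicit_interactive(styled_div), is_implicit_interactive(inherited))
```

Requiring the pointer cursor to be set on the element itself keeps every descendant of a clickable card from being assigned its own ref.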

View File

@@ -1,22 +1,30 @@
"""
Vision tool - Analyze images using OpenAI-compatible Vision API.
Vision tool - Analyze images using Vision API.
Supports local files (auto base64-encoded) and HTTP URLs.
Providers: OpenAI (preferred) > LinkAI (fallback).
Provider priority (default):
1. Main model via bot.call_vision — zero extra cost
2. Other models whose API key is configured — auto-discovered
3. OpenAI / LinkAI raw HTTP — reliable fallback
When use_linkai=true, LinkAI is promoted to #1.
When tool.vision.model is set, that model is used exclusively first.
"""
import base64
import os
import subprocess
import tempfile
from typing import Any, Dict, Optional, Tuple
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional
import requests
from agent.tools.base_tool import BaseTool, ToolResult
from common import const
from common.log import logger
from config import conf
DEFAULT_MODEL = "gpt-4.1-mini"
DEFAULT_MODEL = const.GPT_41_MINI
DEFAULT_TIMEOUT = 60
MAX_TOKENS = 1000
COMPRESS_THRESHOLD = 1_048_576 # 1 MB
@@ -29,15 +37,46 @@ SUPPORTED_EXTENSIONS = {
"webp": "image/webp",
}
_MAIN_MODEL_PROVIDER_NAME = "MainModel"
# (config_key_for_api_key, bot_type, default_vision_model, provider_display_name)
# Auto-discovered as fallback vision providers when their API key is configured.
# OpenAI and LinkAI are handled separately (raw HTTP providers), so not listed here.
_DISCOVERABLE_MODELS = [
("moonshot_api_key", const.MOONSHOT, const.KIMI_K2_5, "Moonshot"),
("ark_api_key", const.DOUBAO, const.DOUBAO_SEED_2_PRO, "Doubao"),
("dashscope_api_key", const.QWEN_DASHSCOPE, const.QWEN36_PLUS, "DashScope"),
("claude_api_key", const.CLAUDEAPI, const.CLAUDE_4_6_SONNET, "Claude"),
("gemini_api_key", const.GEMINI, const.GEMINI_31_FLASH_LITE_PRE, "Gemini"),
("zhipu_ai_api_key", const.ZHIPU_AI, const.GLM_4_7, "ZhipuAI"),
("minimax_api_key", const.MiniMax, const.MINIMAX_M2_7, "MiniMax"),
]
@dataclass
class VisionProvider:
"""A single Vision API provider configuration."""
name: str
api_key: str
api_base: str
extra_headers: dict = field(default_factory=dict)
model_override: Optional[str] = None
use_bot: bool = False # When True, call via bot.call_vision instead of raw HTTP
fallback_bot: Any = None # Bot instance for non-main-model providers
class VisionAPIError(Exception):
"""Raised when a Vision API call fails and should trigger fallback."""
pass
class Vision(BaseTool):
"""Analyze images using OpenAI-compatible Vision API"""
"""Analyze images using Vision API"""
name: str = "vision"
description: str = (
"Analyze a local image or image URL (jpg/jpeg/png) using Vision API. "
"Can describe content, extract text, identify objects, colors, etc. "
"Requires OPENAI_API_KEY or LINKAI_API_KEY."
)
params: dict = {
@@ -51,13 +90,6 @@ class Vision(BaseTool):
"type": "string",
"description": "Question to ask about the image",
},
"model": {
"type": "string",
"description": (
f"Vision model to use (default: {DEFAULT_MODEL}). "
"Options: gpt-4.1-mini, gpt-4.1, gpt-4o-mini, gpt-4o"
),
},
},
"required": ["image", "question"],
}
@@ -67,29 +99,26 @@ class Vision(BaseTool):
@staticmethod
def is_available() -> bool:
return bool(
conf().get("open_ai_api_key") or os.environ.get("OPENAI_API_KEY")
or conf().get("linkai_api_key") or os.environ.get("LINKAI_API_KEY")
)
return True
def execute(self, args: Dict[str, Any]) -> ToolResult:
image = args.get("image", "").strip()
question = args.get("question", "").strip()
model = args.get("model", DEFAULT_MODEL).strip() or DEFAULT_MODEL
if not image:
return ToolResult.fail("Error: 'image' parameter is required")
if not question:
return ToolResult.fail("Error: 'question' parameter is required")
api_key, api_base, extra_headers = self._resolve_provider()
if not api_key:
providers = self._resolve_providers()
if not providers:
return ToolResult.fail(
"Error: No API key configured for Vision.\n"
"Please configure one of the following using env_config tool:\n"
" 1. OPENAI_API_KEY (preferred): env_config(action=\"set\", key=\"OPENAI_API_KEY\", value=\"your-key\")\n"
" 2. LINKAI_API_KEY (fallback): env_config(action=\"set\", key=\"LINKAI_API_KEY\", value=\"your-key\")\n\n"
"Get your key at: https://platform.openai.com/api-keys or https://link-ai.tech"
"Error: No model available for Vision.\n"
"The main model does not support vision and no other API keys are configured.\n"
"Options:\n"
" 1. Switch to a multimodal model (e.g. qwen3.6-plus, claude-sonnet-4-6, gemini-2.0-flash)\n"
" 2. Configure OPENAI_API_KEY: env_config(action=\"set\", key=\"OPENAI_API_KEY\", value=\"your-key\")\n"
" 3. Configure LINKAI_API_KEY: env_config(action=\"set\", key=\"LINKAI_API_KEY\", value=\"your-key\")"
)
try:
@@ -97,36 +126,221 @@ class Vision(BaseTool):
except Exception as e:
return ToolResult.fail(f"Error: {e}")
return self._call_with_fallback(providers, DEFAULT_MODEL, question, image_content)
def _call_with_fallback(self, providers: List[VisionProvider], model: str,
question: str, image_content: dict) -> ToolResult:
"""Try each provider in order; fall back to the next one on failure."""
errors: List[str] = []
for i, provider in enumerate(providers):
use_model = provider.model_override or model
try:
logger.info(f"[Vision] Trying provider '{provider.name}' "
f"with model '{use_model}' ({i + 1}/{len(providers)})")
if provider.use_bot:
result = self._call_via_bot(use_model, question, image_content, provider)
else:
result = self._call_api(provider, use_model, question, image_content)
logger.info(f"[Vision] ✅ Success via {provider.name} (model={use_model})")
return result
except VisionAPIError as e:
errors.append(f"[{provider.name}/{use_model}] {e}")
logger.warning(f"[Vision] Provider '{provider.name}' failed: {e}")
except requests.Timeout:
errors.append(f"[{provider.name}/{use_model}] Request timed out after {DEFAULT_TIMEOUT}s")
logger.warning(f"[Vision] Provider '{provider.name}' timed out")
except requests.ConnectionError:
errors.append(f"[{provider.name}/{use_model}] Connection failed")
logger.warning(f"[Vision] Provider '{provider.name}' connection failed")
except Exception as e:
errors.append(f"[{provider.name}/{use_model}] {e}")
logger.error(f"[Vision] Provider '{provider.name}' unexpected error: {e}", exc_info=True)
return ToolResult.fail(
"Error: All Vision API providers failed.\n" + "\n".join(f" - {err}" for err in errors)
)
def _resolve_providers(self) -> List[VisionProvider]:
"""
Build an ordered list of available providers.
Priority:
- use_linkai=true → [LinkAI, MainModel, OtherModels…, OpenAI]
- default → [MainModel, OtherModels…, OpenAI, LinkAI]
"OtherModels" are auto-discovered from configured API keys.
The main model's bot_type is excluded from OtherModels to avoid
duplicating the MainModel provider.
"""
use_linkai = conf().get("use_linkai", False) and conf().get("linkai_api_key")
providers: List[VisionProvider] = []
if use_linkai:
self._append_provider(providers, self._build_linkai_provider)
self._append_provider(providers, self._build_main_model_provider)
self._append_other_model_providers(providers)
self._append_provider(providers, self._build_openai_provider)
else:
self._append_provider(providers, self._build_main_model_provider)
self._append_other_model_providers(providers)
self._append_provider(providers, self._build_openai_provider)
self._append_provider(providers, self._build_linkai_provider)
return providers
@staticmethod
def _append_provider(providers: List[VisionProvider], builder) -> None:
p = builder()
if p:
providers.append(p)
def _append_other_model_providers(self, providers: List[VisionProvider]) -> None:
"""
Auto-discover other models whose API key is configured.
Skip the main model's own bot_type (already covered by MainModel provider).
Skip bot_types that already have a provider in the list (e.g. OpenAI).
"""
# Determine main model's bot_type so we can skip it
main_bot_type = None
if self.model and hasattr(self.model, '_resolve_bot_type'):
main_bot_type = self.model._resolve_bot_type(conf().get("model", ""))
existing_names = {p.name for p in providers}
for config_key, bot_type, default_model, display_name in _DISCOVERABLE_MODELS:
if display_name in existing_names:
continue
if bot_type == main_bot_type:
continue
api_key = conf().get(config_key, "")
if not api_key or not api_key.strip():
continue
# Create a bot instance and check if it supports call_vision
try:
from models.bot_factory import create_bot
bot = create_bot(bot_type)
if not hasattr(bot, 'call_vision'):
continue
except Exception:
continue
providers.append(VisionProvider(
name=display_name,
api_key="",
api_base="",
model_override=default_model,
use_bot=True,
fallback_bot=bot,
))
def _resolve_vision_model(self) -> Optional[str]:
"""
Determine which model to use for vision.
1. User explicit config: tool.vision.model in config.json
2. Fallback to the main configured model name
"""
tool_conf = conf().get("tool", {})
user_vision_model = tool_conf.get("vision", {}).get("model") if isinstance(tool_conf, dict) else None
if user_vision_model:
return user_vision_model
model_name = conf().get("model", "")
return model_name or None
def _build_main_model_provider(self) -> Optional[VisionProvider]:
"""
Use the vendor's own model for vision via bot.call_vision.
Only available when the bot class has call_vision.
"""
if not (self.model and hasattr(self.model, 'bot')):
return None
try:
return self._call_api(api_key, api_base, model, question, image_content, extra_headers)
except requests.Timeout:
return ToolResult.fail(f"Error: Vision API request timed out after {DEFAULT_TIMEOUT}s")
except requests.ConnectionError:
return ToolResult.fail("Error: Failed to connect to Vision API")
except Exception as e:
logger.error(f"[Vision] Unexpected error: {e}", exc_info=True)
return ToolResult.fail(f"Error: Vision API call failed - {e}")
bot = self.model.bot
if not hasattr(bot, 'call_vision'):
return None
except Exception:
return None
def _resolve_provider(self) -> Tuple[Optional[str], str, dict]:
"""Resolve API key, base URL and extra headers. Priority: conf() > env vars."""
vision_model = self._resolve_vision_model()
return VisionProvider(
name=_MAIN_MODEL_PROVIDER_NAME,
api_key="",
api_base="",
model_override=vision_model,
use_bot=True,
)
def _build_openai_provider(self) -> Optional[VisionProvider]:
api_key = conf().get("open_ai_api_key") or os.environ.get("OPENAI_API_KEY")
if api_key:
api_base = (conf().get("open_ai_api_base") or os.environ.get("OPENAI_API_BASE", "")).rstrip("/") \
or "https://api.openai.com/v1"
return api_key, self._ensure_v1(api_base), {}
if not api_key:
return None
api_base = (conf().get("open_ai_api_base") or os.environ.get("OPENAI_API_BASE", "")).rstrip("/") \
or "https://api.openai.com/v1"
return VisionProvider(name="OpenAI", api_key=api_key, api_base=self._ensure_v1(api_base))
def _build_linkai_provider(self) -> Optional[VisionProvider]:
api_key = conf().get("linkai_api_key") or os.environ.get("LINKAI_API_KEY")
if api_key:
api_base = (conf().get("linkai_api_base") or os.environ.get("LINKAI_API_BASE", "")).rstrip("/") \
or "https://api.link-ai.tech"
logger.debug("[Vision] Using LinkAI API (OPENAI_API_KEY not set)")
from common.utils import get_cloud_headers
extra = get_cloud_headers(api_key)
extra.pop("Authorization", None)
extra.pop("Content-Type", None)
return api_key, self._ensure_v1(api_base), extra
if not api_key:
return None
api_base = (conf().get("linkai_api_base") or os.environ.get("LINKAI_API_BASE", "")).rstrip("/") \
or "https://api.link-ai.tech"
from common.utils import get_cloud_headers
extra = get_cloud_headers(api_key)
extra.pop("Authorization", None)
extra.pop("Content-Type", None)
return VisionProvider(name="LinkAI", api_key=api_key, api_base=self._ensure_v1(api_base),
extra_headers=extra)
return None, "", {}
def _call_via_bot(self, model: str, question: str, image_content: dict,
provider: Optional[VisionProvider] = None) -> ToolResult:
"""
Call a model's call_vision with vendor-native API format.
Uses the provider's fallback_bot if set, otherwise the main model bot.
Raises VisionAPIError on failure so fallback can proceed.
"""
try:
bot = (provider and provider.fallback_bot) or self.model.bot
except Exception as e:
raise VisionAPIError(f"Cannot access bot: {e}")
# Extract the raw image URL from the OpenAI-format image_content block
image_url = image_content.get("image_url", {}).get("url", "")
if not image_url:
raise VisionAPIError("No image URL in content block")
try:
response = bot.call_vision(
image_url=image_url,
question=question,
model=model,
max_tokens=MAX_TOKENS,
)
except Exception as e:
raise VisionAPIError(f"call_vision failed: {e}")
if response is NotImplemented:
raise VisionAPIError("Bot does not support vision")
if isinstance(response, dict) and response.get("error"):
raise VisionAPIError(f"API error - {response.get('message', 'Unknown')}")
content = response.get("content", "") if isinstance(response, dict) else ""
if not content:
raise VisionAPIError("Empty response from main model")
usage_info = response.get("usage", {}) if isinstance(response, dict) else {}
# Use the actual model name from the bot response if available
actual_model = response.get("model", model) if isinstance(response, dict) else model
provider_name = provider.name if provider else _MAIN_MODEL_PROVIDER_NAME
return ToolResult.success({
"model": actual_model,
"provider": provider_name,
"content": content,
"usage": usage_info,
})
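Because `_call_via_bot` raises `VisionAPIError` instead of returning a failure result, a caller can walk an ordered provider list and stop at the first success. The caller loop is not shown in this part of the diff, so the following is only a sketch of that pattern; `call_with_fallback` and the local `VisionAPIError` stand-in are hypothetical names.

```python
from typing import Callable, List, Tuple


class VisionAPIError(Exception):
    """Stand-in for the recoverable error raised by a provider call."""


def call_with_fallback(providers: List[Tuple[str, Callable[[], str]]]) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return (name, result) of the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call()
        except VisionAPIError as e:
            # Recoverable failure: remember it and move on to the next provider.
            errors.append(f"{name}: {e}")
    raise VisionAPIError("All providers failed: " + "; ".join(errors))
```

Only `VisionAPIError` is caught here, so unexpected exceptions still surface to the caller rather than being silently skipped.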
@staticmethod
def _ensure_v1(api_base: str) -> str:
@@ -139,9 +353,13 @@ class Vision(BaseTool):
return api_base.rstrip("/") + "/v1"
def _build_image_content(self, image: str) -> dict:
"""Build the image_url content block for the API request."""
"""
Build the image_url content block.
Both remote URLs and local files are converted to base64 data URLs
so every bot backend can consume them without extra downloads.
"""
if image.startswith(("http://", "https://")):
return {"type": "image_url", "image_url": {"url": image}}
return self._download_to_data_url(image)
if not os.path.isfile(image):
raise FileNotFoundError(f"Image file not found: {image}")
@@ -165,6 +383,19 @@ class Vision(BaseTool):
data_url = f"data:{mime_type};base64,{b64}"
return {"type": "image_url", "image_url": {"url": data_url}}
@staticmethod
def _download_to_data_url(url: str) -> dict:
"""Download a remote image and return it as a base64 data URL."""
resp = requests.get(url, timeout=30)
if resp.status_code != 200:
raise VisionAPIError(f"Failed to download image: HTTP {resp.status_code}")
content_type = resp.headers.get("Content-Type", "image/jpeg").split(";")[0].strip()
if not content_type.startswith("image/"):
content_type = "image/jpeg"
b64 = base64.b64encode(resp.content).decode("ascii")
data_url = f"data:{content_type};base64,{b64}"
return {"type": "image_url", "image_url": {"url": data_url}}
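The encoding step of `_download_to_data_url` can be tried without any network access. A minimal sketch, assuming the bytes and the `Content-Type` header value are already in hand; `to_data_url` is a hypothetical helper name, not a function from the commit.

```python
import base64


def to_data_url(content: bytes, content_type_header: str = "image/jpeg") -> str:
    """Build a base64 data URL, defaulting to image/jpeg for non-image types."""
    # Drop parameters such as "; charset=..." from the header value.
    mime = content_type_header.split(";")[0].strip()
    if not mime.startswith("image/"):
        mime = "image/jpeg"
    b64 = base64.b64encode(content).decode("ascii")
    return f"data:{mime};base64,{b64}"
```

For example, `to_data_url(b"hi", "image/png; charset=binary")` yields `data:image/png;base64,aGk=`.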
@staticmethod
def _maybe_compress(path: str) -> str:
"""Compress image to under COMPRESS_THRESHOLD with max long-edge 1536px."""
@@ -220,8 +451,13 @@ class Vision(BaseTool):
os.remove(tmp.name)
return path
def _call_api(self, api_key: str, api_base: str, model: str,
question: str, image_content: dict, extra_headers: dict = None) -> ToolResult:
def _call_api(self, provider: VisionProvider, model: str,
question: str, image_content: dict) -> ToolResult:
"""
Call a single provider's Vision API.
Raises VisionAPIError on recoverable failures so the caller can try
the next provider.
"""
payload = {
"model": model,
"messages": [
@@ -233,34 +469,29 @@ class Vision(BaseTool):
],
}
],
"max_tokens": MAX_TOKENS,
}
headers = {
"Authorization": f"Bearer {api_key}",
"Authorization": f"Bearer {provider.api_key}",
"Content-Type": "application/json",
**(extra_headers or {}),
**provider.extra_headers,
}
resp = requests.post(
f"{api_base}/chat/completions",
f"{provider.api_base}/chat/completions",
headers=headers,
json=payload,
timeout=DEFAULT_TIMEOUT,
)
if resp.status_code == 401:
return ToolResult.fail("Error: Invalid API key. Please check your configuration.")
if resp.status_code == 429:
return ToolResult.fail("Error: API rate limit reached. Please try again later.")
if resp.status_code != 200:
return ToolResult.fail(f"Error: Vision API returned HTTP {resp.status_code}: {resp.text[:200]}")
raise VisionAPIError(f"HTTP {resp.status_code}: {resp.text[:200]}")
data = resp.json()
if "error" in data:
msg = data["error"].get("message", "Unknown API error")
return ToolResult.fail(f"Error: Vision API error - {msg}")
raise VisionAPIError(f"API error - {msg}")
content = ""
choices = data.get("choices", [])
@@ -270,6 +501,7 @@ class Vision(BaseTool):
usage = data.get("usage", {})
result = {
"model": model,
"provider": provider.name,
"content": content,
"usage": {
"prompt_tokens": usage.get("prompt_tokens", 0),

@@ -67,7 +67,7 @@ class AgentLLMModel(LLMModel):
_MODEL_BOT_TYPE_MAP = {
"wenxin": const.BAIDU, "wenxin-4": const.BAIDU,
"xunfei": const.XUNFEI, const.QWEN: const.QWEN,
"xunfei": const.XUNFEI, const.QWEN: const.QWEN_DASHSCOPE,
const.MODELSCOPE: const.MODELSCOPE,
}
_MODEL_PREFIX_MAP = [
@@ -124,14 +124,15 @@ class AgentLLMModel(LLMModel):
@property
def bot(self):
"""Lazy load the bot, re-create when model changes"""
"""Lazy load the bot, re-create when model or bot_type changes"""
from models.bot_factory import create_bot
cur_model = self.model
if self._bot is None or self._bot_model != cur_model:
bot_type = self._resolve_bot_type(cur_model)
self._bot = create_bot(bot_type)
cur_bot_type = self._resolve_bot_type(cur_model)
if self._bot is None or self._bot_model != cur_model or getattr(self, '_bot_type', None) != cur_bot_type:
self._bot = create_bot(cur_bot_type)
self._bot = add_openai_compatible_support(self._bot)
self._bot_model = cur_model
self._bot_type = cur_bot_type
return self._bot
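The cache check in the new `bot` property rebuilds the instance when either the model name or the resolved bot type changes. A minimal sketch of that invalidation rule in isolation; `BotCache`, `factory` and `resolve_bot_type` are hypothetical names standing in for the property, `create_bot` and `_resolve_bot_type`.

```python
class BotCache:
    """Cache one bot instance, keyed on both model name and resolved bot type."""

    def __init__(self, factory, resolve_bot_type):
        self._factory = factory
        self._resolve = resolve_bot_type
        self._bot = None
        self._model = None
        self._bot_type = None

    def get(self, model):
        bot_type = self._resolve(model)
        # Re-create when nothing is cached, or when the model or bot type changed.
        if self._bot is None or self._model != model or self._bot_type != bot_type:
            self._bot = self._factory(bot_type)
            self._model = model
            self._bot_type = bot_type
        return self._bot
```

Comparing the bot type as well as the model name matters when the same model name starts resolving to a different backend after a config change.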
def call(self, request: LLMRequest):
@@ -505,15 +506,15 @@ class AgentBridge:
def _migrate_config_to_env(self, workspace_root: str):
"""
Migrate API keys from config.json to .env file if not already set
Sync API keys from config.json to .env file.
Adds new keys and updates changed values on each startup.
Args:
workspace_root: Workspace directory path (not used, kept for compatibility)
"""
from config import conf
import os
# Mapping from config.json keys to environment variable names
key_mapping = {
"open_ai_api_key": "OPENAI_API_KEY",
"open_ai_api_base": "OPENAI_API_BASE",
@@ -522,10 +523,9 @@ class AgentBridge:
"linkai_api_key": "LINKAI_API_KEY",
}
# Use fixed secure location for .env file
env_file = expand_path("~/.cow/.env")
# Read existing env vars from .env file
# Read existing env vars (key -> value)
existing_env_vars = {}
if os.path.exists(env_file):
try:
@@ -533,48 +533,46 @@ class AgentBridge:
for line in f:
line = line.strip()
if line and not line.startswith('#') and '=' in line:
key, _ = line.split('=', 1)
existing_env_vars[key.strip()] = True
key, val = line.split('=', 1)
existing_env_vars[key.strip()] = val.strip()
except Exception as e:
logger.warning(f"[AgentBridge] Failed to read .env file: {e}")
# Check which keys need to be migrated
keys_to_migrate = {}
# Sync config.json values into .env (add/update/remove)
updated = False
for config_key, env_key in key_mapping.items():
# Skip if already in .env file
if env_key in existing_env_vars:
continue
# Get value from config.json
value = conf().get(config_key, "")
if value and value.strip(): # Only migrate non-empty values
keys_to_migrate[env_key] = value.strip()
# Log summary if there are keys to skip
if existing_env_vars:
logger.debug(f"[AgentBridge] {len(existing_env_vars)} env vars already in .env")
# Write new keys to .env file
if keys_to_migrate:
raw = conf().get(config_key, "")
value = raw.strip() if raw else ""
old_value = existing_env_vars.get(env_key)
if value:
if old_value == value:
continue
existing_env_vars[env_key] = value
os.environ[env_key] = value
updated = True
else:
if old_value is None:
continue
existing_env_vars.pop(env_key, None)
os.environ.pop(env_key, None)
updated = True
updated = True
if updated:
try:
# Ensure ~/.cow directory and .env file exist
env_dir = os.path.dirname(env_file)
if not os.path.exists(env_dir):
os.makedirs(env_dir, exist_ok=True)
if not os.path.exists(env_file):
open(env_file, 'a').close()
# Append new keys
with open(env_file, 'a', encoding='utf-8') as f:
f.write('\n# Auto-migrated from config.json\n')
for key, value in keys_to_migrate.items():
os.makedirs(env_dir, exist_ok=True)
with open(env_file, 'w', encoding='utf-8') as f:
f.write('# Environment variables for agent\n')
f.write('# Auto-managed - synced from config.json on startup\n\n')
for key, value in sorted(existing_env_vars.items()):
f.write(f'{key}={value}\n')
# Also set in current process
os.environ[key] = value
logger.info(f"[AgentBridge] Migrated {len(keys_to_migrate)} API keys from config.json to .env: {list(keys_to_migrate.keys())}")
logger.info(f"[AgentBridge] Synced API keys from config.json to .env")
except Exception as e:
logger.warning(f"[AgentBridge] Failed to migrate API keys: {e}")
logger.warning(f"[AgentBridge] Failed to sync API keys: {e}")
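The add/update/remove rule used by the sync can be separated from file and `os.environ` handling. The following is a sketch under that assumption; `sync_env` and `render_env` are hypothetical helpers, and plain dicts stand in for `conf()` and the parsed `.env` contents.

```python
from typing import Dict, Tuple


def sync_env(existing: Dict[str, str], config: Dict[str, str],
             key_mapping: Dict[str, str]) -> Tuple[Dict[str, str], bool]:
    """Return (merged env vars, changed flag) without touching files or os.environ."""
    merged = dict(existing)
    changed = False
    for config_key, env_key in key_mapping.items():
        value = (config.get(config_key) or "").strip()
        old = merged.get(env_key)
        if value:
            # Add a new key or update a changed value.
            if old != value:
                merged[env_key] = value
                changed = True
        elif old is not None:
            # The key was cleared in config.json, so drop it from .env too.
            del merged[env_key]
            changed = True
    return merged, changed


def render_env(env_vars: Dict[str, str]) -> str:
    """Serialize the variables in sorted KEY=value form, one per line."""
    return "".join(f"{k}={v}\n" for k, v in sorted(env_vars.items()))
```

Keys outside `key_mapping` are carried over untouched, which matches rewriting the whole file from the merged dictionary.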
def _persist_messages(
self, session_id: str, new_messages: list, channel_type: str = ""

@@ -465,8 +465,12 @@ class AgentInitializer:
'timezone': timezone_name
}
def get_model():
"""Get current model name dynamically from config"""
return conf().get("model", "unknown")
return {
"model": conf().get("model", "unknown"),
"_get_model": get_model,
"workspace": workspace_root,
"channel": ", ".join(conf().get("channel_type")) if isinstance(conf().get("channel_type"), list) else conf().get("channel_type", "unknown"),
"_get_current_time": get_current_time # Dynamic time function
@@ -486,7 +490,7 @@ class AgentInitializer:
env_file = expand_path("~/.cow/.env")
# Read existing env vars
# Read existing env vars (key -> value)
existing_env_vars = {}
if os.path.exists(env_file):
try:
@@ -494,38 +498,46 @@ class AgentInitializer:
for line in f:
line = line.strip()
if line and not line.startswith('#') and '=' in line:
key, _ = line.split('=', 1)
existing_env_vars[key.strip()] = True
key, val = line.split('=', 1)
existing_env_vars[key.strip()] = val.strip()
except Exception as e:
logger.warning(f"[AgentInitializer] Failed to read .env file: {e}")
# Check which keys need migration
keys_to_migrate = {}
# Sync config.json values into .env (add/update/remove)
updated = False
for config_key, env_key in key_mapping.items():
if env_key in existing_env_vars:
continue
value = conf().get(config_key, "")
if value and value.strip():
keys_to_migrate[env_key] = value.strip()
# Write new keys
if keys_to_migrate:
raw = conf().get(config_key, "")
value = raw.strip() if raw else ""
old_value = existing_env_vars.get(env_key)
if value:
if old_value == value:
continue
existing_env_vars[env_key] = value
os.environ[env_key] = value
updated = True
else:
if old_value is None:
continue
existing_env_vars.pop(env_key, None)
os.environ.pop(env_key, None)
updated = True
if updated:
try:
env_dir = os.path.dirname(env_file)
if not os.path.exists(env_dir):
os.makedirs(env_dir, exist_ok=True)
if not os.path.exists(env_file):
open(env_file, 'a').close()
with open(env_file, 'a', encoding='utf-8') as f:
f.write('\n# Auto-migrated from config.json\n')
for key, value in keys_to_migrate.items():
os.makedirs(env_dir, exist_ok=True)
# Rewrite the entire .env file to ensure consistency
with open(env_file, 'w', encoding='utf-8') as f:
f.write('# Environment variables for agent\n')
f.write('# Auto-managed - synced from config.json on startup\n\n')
for key, value in sorted(existing_env_vars.items()):
f.write(f'{key}={value}\n')
os.environ[key] = value
logger.info(f"[AgentInitializer] Migrated {len(keys_to_migrate)} API keys to .env: {list(keys_to_migrate.keys())}")
logger.info(f"[AgentInitializer] Synced API keys from config.json to .env")
except Exception as e:
logger.warning(f"[AgentInitializer] Failed to migrate API keys: {e}")
logger.warning(f"[AgentInitializer] Failed to sync API keys: {e}")
def _start_daily_flush_timer(self):
"""Start a background thread that flushes all agents' memory daily at 23:55."""

@@ -39,11 +39,8 @@ class Bridge(object):
self.btype["chat"] = const.BAIDU
if model_type in ["xunfei"]:
self.btype["chat"] = const.XUNFEI
if model_type in [const.QWEN]:
self.btype["chat"] = const.QWEN
if model_type in [const.QWEN_TURBO, const.QWEN_PLUS, const.QWEN_MAX]:
if model_type in [const.QWEN, const.QWEN_TURBO, const.QWEN_PLUS, const.QWEN_MAX]:
self.btype["chat"] = const.QWEN_DASHSCOPE
# Support Qwen3 and other DashScope models
if model_type and (model_type.startswith("qwen") or model_type.startswith("qwq") or model_type.startswith("qvq")):
self.btype["chat"] = const.QWEN_DASHSCOPE
if model_type and model_type.startswith("gemini"):

@@ -347,38 +347,30 @@ class ChatChannel(Channel):
if media_items:
logger.info(f"[chat_channel] Extracted {len(media_items)} media item(s) from reply")
# Send the text first (keep the original text unchanged)
# Send text first (the frontend will embed video players via renderMarkdown).
logger.info(f"[chat_channel] Sending text content before media: {reply.content[:100]}...")
self._send(reply, context)
logger.info(f"[chat_channel] Text sent, now sending {len(media_items)} media item(s)")
# Then send the media files one by one
for i, (url, media_type) in enumerate(media_items):
try:
# Determine whether it is a local file or a URL
# Determine whether it is a remote URL or a local file.
if url.startswith(('http://', 'https://')):
# Network resource
if media_type == 'video':
# Send videos using the FILE type
media_reply = Reply(ReplyType.FILE, url)
media_reply.file_name = os.path.basename(url)
else:
# Send images using the IMAGE_URL type
media_reply = Reply(ReplyType.IMAGE_URL, url)
elif os.path.exists(url):
# Local file
if media_type == 'video':
# Videos use the FILE type, converted to a file:// URL
media_reply = Reply(ReplyType.FILE, f"file://{url}")
media_reply.file_name = os.path.basename(url)
else:
# Images use the IMAGE_URL type, converted to a file:// URL
media_reply = Reply(ReplyType.IMAGE_URL, f"file://{url}")
else:
logger.warning(f"[chat_channel] Media file not found or invalid URL: {url}")
continue
# Send the media file (with a small delay to avoid rate limits)
if i > 0:
time.sleep(0.5)
self._send(media_reply, context)

@@ -455,6 +455,11 @@
<h2 class="text-xl font-bold text-slate-800 dark:text-slate-100" data-i18n="skills_title">Skills</h2>
<p class="text-sm text-slate-500 dark:text-slate-400 mt-1" data-i18n="skills_desc">View, enable, or disable agent skills</p>
</div>
<a href="https://skills.cowagent.ai/" target="_blank"
class="inline-flex items-center gap-1.5 px-3 py-1.5 rounded-lg text-xs font-medium text-primary-500 bg-primary-50 dark:bg-primary-900/20 hover:bg-primary-100 dark:hover:bg-primary-900/30 transition-colors">
<i class="fas fa-puzzle-piece text-[10px]"></i>
<span data-i18n="skills_hub_btn">Skill Hub</span>
</a>
</div>
<!-- Built-in Tools Section -->

@@ -33,7 +33,7 @@ const I18N = {
config_save: '保存', config_saved: '已保存',
config_save_error: '保存失败',
config_custom_option: '自定义...',
skills_title: '技能管理', skills_desc: '查看、启用或禁用 Agent 技能',
skills_title: '技能管理', skills_desc: '查看、启用或禁用 Agent 技能', skills_hub_btn: '探索技能广场',
skills_loading: '加载技能中...', skills_loading_desc: '技能加载后将显示在此处',
tools_section_title: '内置工具', tools_loading: '加载工具中...',
skills_section_title: '技能', skill_enable: '启用', skill_disable: '禁用',
@@ -88,7 +88,7 @@ const I18N = {
config_save: 'Save', config_saved: 'Saved',
config_save_error: 'Save failed',
config_custom_option: 'Custom...',
skills_title: 'Skills', skills_desc: 'View, enable, or disable agent skills',
skills_title: 'Skills', skills_desc: 'View, enable, or disable agent skills', skills_hub_btn: 'Skill Hub',
skills_loading: 'Loading skills...', skills_loading_desc: 'Skills will be displayed here after loading',
tools_section_title: 'Built-in Tools', tools_loading: 'Loading tools...',
skills_section_title: 'Skills', skill_enable: 'Enable', skill_disable: 'Disable',
@@ -270,8 +270,42 @@ function createMd() {
const md = createMd();
const VIDEO_EXT_RE = /\.(?:mp4|webm|mov|avi|mkv)$/i; // tested against URL without query string
function _buildVideoHtml(url) {
const fileName = url.split('/').pop().split('?')[0];
return `<div style="margin:10px 0;">` +
`<video controls preload="metadata" ` +
`style="max-width:100%;border-radius:10px;box-shadow:0 2px 8px rgba(0,0,0,0.15);display:block;">` +
`<source src="${url}"></video>` +
`<a href="${url}" target="_blank" ` +
`style="display:inline-flex;align-items:center;gap:4px;margin-top:4px;font-size:12px;color:#8b8fa8;text-decoration:none;">` +
`<i class="fas fa-download"></i> ${escapeHtml(fileName)}</a></div>`;
}
function injectVideoPlayers(html) {
// Step 1: replace markdown-it anchor tags whose href points to a video file.
const step1 = html.replace(
/<a\s+href="(https?:\/\/[^"]+)"[^>]*>[^<]*<\/a>/gi,
(match, url) => VIDEO_EXT_RE.test(url.split('?')[0]) ? _buildVideoHtml(url) : match
);
// Step 2: replace any remaining bare video URLs in text nodes (not inside HTML tags).
// Split on HTML tags to avoid touching src/href attributes already in markup.
return step1.split(/(<[^>]+>)/).map((chunk, idx) => {
// Even indices are text nodes; odd indices are HTML tags — leave them untouched.
if (idx % 2 !== 0) return chunk;
return chunk.replace(/https?:\/\/\S+/gi, (url) => {
const bare = url.replace(/[),.\s]+$/, ''); // strip trailing punctuation
return VIDEO_EXT_RE.test(bare.split('?')[0]) ? _buildVideoHtml(bare) : url;
});
}).join('');
}
function renderMarkdown(text) {
try { return md.render(text); }
try {
const html = md.render(text);
return injectVideoPlayers(html);
}
catch (e) { return text.replace(/\n/g, '<br>'); }
}
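The video check used by `injectVideoPlayers` strips trailing punctuation and the query string before testing the extension. A Python rendering of that check is sketched below for experimentation; `is_video_url` is a hypothetical name, and the patterns are transcribed from the JavaScript above.

```python
import re

# Same extension list as VIDEO_EXT_RE in the JavaScript, case-insensitive.
VIDEO_EXT_RE = re.compile(r"\.(?:mp4|webm|mov|avi|mkv)$", re.IGNORECASE)


def is_video_url(url: str) -> bool:
    """Return True when the URL path (query string excluded) ends in a video extension."""
    bare = re.sub(r"[),.\s]+$", "", url)  # strip trailing punctuation
    return bool(VIDEO_EXT_RE.search(bare.split("?")[0]))
```

Testing against the part before `?` keeps signed or tokenized URLs such as `.../a.mp4?token=1` recognized as videos.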
@@ -729,41 +763,60 @@ function sendMessage() {
}));
}
fetch('/message', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(body)
})
.then(r => r.json())
.then(data => {
if (data.status === 'success') {
if (data.stream) {
startSSE(data.request_id, loadingEl, timestamp);
const MAX_RETRIES = 2;
const RETRY_DELAY_MS = 1000;
function postWithRetry(attempt) {
fetch('/message', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(body)
})
.then(r => r.json())
.then(data => {
if (data.status === 'success') {
if (data.stream) {
startSSE(data.request_id, loadingEl, timestamp);
} else {
loadingContainers[data.request_id] = loadingEl;
if (!isPolling) startPolling();
}
} else {
loadingContainers[data.request_id] = loadingEl;
if (!isPolling) startPolling();
loadingEl.remove();
addBotMessage(t('error_send'), new Date());
}
})
.catch(err => {
if (err.name === 'AbortError') {
loadingEl.remove();
addBotMessage(t('error_timeout'), new Date());
return;
}
if (attempt < MAX_RETRIES) {
console.warn(`[sendMessage] attempt ${attempt + 1} failed, retrying...`, err);
setTimeout(() => postWithRetry(attempt + 1), RETRY_DELAY_MS * (attempt + 1));
return;
}
} else {
loadingEl.remove();
addBotMessage(t('error_send'), new Date());
}
})
.catch(err => {
loadingEl.remove();
addBotMessage(err.name === 'AbortError' ? t('error_timeout') : t('error_send'), new Date());
});
});
}
postWithRetry(0);
}
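The retry schedule in `postWithRetry` waits `RETRY_DELAY_MS * (attempt + 1)` before each new attempt, up to `MAX_RETRIES` retries. The same linear schedule is sketched in Python below; `call_with_retry`, the `send` callable and the injectable `sleep` are assumptions for illustration, and `ConnectionError` stands in for the network failure that the JavaScript retries (the `AbortError` branch, which is not retried, is omitted).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

MAX_RETRIES = 2
RETRY_DELAY_S = 1.0


def call_with_retry(send: Callable[[], T],
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call send(); on ConnectionError retry up to MAX_RETRIES times with linear delay."""
    attempt = 0
    while True:
        try:
            return send()
        except ConnectionError:
            if attempt >= MAX_RETRIES:
                raise  # retries exhausted, surface the last error
            sleep(RETRY_DELAY_S * (attempt + 1))  # 1s, then 2s
            attempt += 1
```

Passing `sleep` as a parameter makes the delay sequence observable in tests without actually waiting.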
function startSSE(requestId, loadingEl, timestamp) {
const es = new EventSource(`/stream?request_id=${encodeURIComponent(requestId)}`);
activeStreams[requestId] = es;
let botEl = null;
let stepsEl = null; // .agent-steps (thinking summaries + tool indicators)
let contentEl = null; // .answer-content (final streaming answer)
let mediaEl = null; // .media-content (images & file attachments)
let accumulatedText = '';
let currentToolEl = null;
let done = false;
const MAX_RECONNECTS = 10;
const RECONNECT_BASE_MS = 1000;
let reconnectCount = 0;
function ensureBotEl() {
if (botEl) return;
@@ -788,162 +841,204 @@ function startSSE(requestId, loadingEl, timestamp) {
mediaEl = botEl.querySelector('.media-content');
}
es.onmessage = function(e) {
let item;
try { item = JSON.parse(e.data); } catch (_) { return; }
function connect() {
const es = new EventSource(`/stream?request_id=${encodeURIComponent(requestId)}`);
activeStreams[requestId] = es;
if (item.type === 'delta') {
ensureBotEl();
accumulatedText += item.content;
contentEl.innerHTML = renderMarkdown(accumulatedText);
scrollChatToBottom();
es.onmessage = function(e) {
let item;
try { item = JSON.parse(e.data); } catch (_) { return; }
} else if (item.type === 'tool_start') {
ensureBotEl();
// Successful data received, reset reconnect counter
reconnectCount = 0;
// Save current thinking as a collapsible step
if (accumulatedText.trim()) {
const fullText = accumulatedText.trim();
const oneLine = fullText.replace(/\n+/g, ' ');
const needsTruncate = oneLine.length > 80;
const stepEl = document.createElement('div');
stepEl.className = 'agent-step agent-thinking-step' + (needsTruncate ? '' : ' no-expand');
if (needsTruncate) {
const truncated = oneLine.substring(0, 80) + '…';
stepEl.innerHTML = `
<div class="thinking-header" onclick="this.parentElement.classList.toggle('expanded')">
<i class="fas fa-lightbulb text-amber-400 flex-shrink-0"></i>
<span class="thinking-summary">${escapeHtml(truncated)}</span>
<i class="fas fa-chevron-right thinking-chevron"></i>
</div>
<div class="thinking-full">${renderMarkdown(fullText)}</div>`;
} else {
stepEl.innerHTML = `
<div class="thinking-header no-toggle">
<i class="fas fa-lightbulb text-amber-400 flex-shrink-0"></i>
<span>${escapeHtml(oneLine)}</span>
</div>`;
if (item.type === 'delta') {
ensureBotEl();
accumulatedText += item.content;
contentEl.innerHTML = renderMarkdown(accumulatedText);
scrollChatToBottom();
} else if (item.type === 'tool_start') {
ensureBotEl();
// Save current thinking as a collapsible step
if (accumulatedText.trim()) {
const fullText = accumulatedText.trim();
const oneLine = fullText.replace(/\n+/g, ' ');
const needsTruncate = oneLine.length > 80;
const stepEl = document.createElement('div');
stepEl.className = 'agent-step agent-thinking-step' + (needsTruncate ? '' : ' no-expand');
if (needsTruncate) {
const truncated = oneLine.substring(0, 80) + '…';
stepEl.innerHTML = `
<div class="thinking-header" onclick="this.parentElement.classList.toggle('expanded')">
<i class="fas fa-lightbulb text-amber-400 flex-shrink-0"></i>
<span class="thinking-summary">${escapeHtml(truncated)}</span>
<i class="fas fa-chevron-right thinking-chevron"></i>
</div>
<div class="thinking-full">${renderMarkdown(fullText)}</div>`;
} else {
stepEl.innerHTML = `
<div class="thinking-header no-toggle">
<i class="fas fa-lightbulb text-amber-400 flex-shrink-0"></i>
<span>${escapeHtml(oneLine)}</span>
</div>`;
}
stepsEl.appendChild(stepEl);
}
stepsEl.appendChild(stepEl);
}
accumulatedText = '';
contentEl.innerHTML = '';
accumulatedText = '';
contentEl.innerHTML = '';
// Add tool execution indicator (collapsible)
currentToolEl = document.createElement('div');
currentToolEl.className = 'agent-step agent-tool-step';
const argsStr = formatToolArgs(item.arguments || {});
currentToolEl.innerHTML = `
<div class="tool-header" onclick="this.parentElement.classList.toggle('expanded')">
<i class="fas fa-cog fa-spin text-primary-400 flex-shrink-0 tool-icon"></i>
<span class="tool-name">${item.tool}</span>
<i class="fas fa-chevron-right tool-chevron"></i>
</div>
<div class="tool-detail">
<div class="tool-detail-section">
<div class="tool-detail-label">Input</div>
<pre class="tool-detail-content">${argsStr}</pre>
// Add tool execution indicator (collapsible)
currentToolEl = document.createElement('div');
currentToolEl.className = 'agent-step agent-tool-step';
const argsStr = formatToolArgs(item.arguments || {});
currentToolEl.innerHTML = `
<div class="tool-header" onclick="this.parentElement.classList.toggle('expanded')">
<i class="fas fa-cog fa-spin text-primary-400 flex-shrink-0 tool-icon"></i>
<span class="tool-name">${item.tool}</span>
<i class="fas fa-chevron-right tool-chevron"></i>
</div>
<div class="tool-detail-section tool-output-section"></div>
</div>`;
stepsEl.appendChild(currentToolEl);
<div class="tool-detail">
<div class="tool-detail-section">
<div class="tool-detail-label">Input</div>
<pre class="tool-detail-content">${argsStr}</pre>
</div>
<div class="tool-detail-section tool-output-section"></div>
</div>`;
stepsEl.appendChild(currentToolEl);
scrollChatToBottom();
scrollChatToBottom();
} else if (item.type === 'tool_end') {
if (currentToolEl) {
const isError = item.status !== 'success';
const icon = currentToolEl.querySelector('.tool-icon');
icon.className = isError
? 'fas fa-times text-red-400 flex-shrink-0 tool-icon'
: 'fas fa-check text-primary-400 flex-shrink-0 tool-icon';
} else if (item.type === 'tool_end') {
if (currentToolEl) {
const isError = item.status !== 'success';
const icon = currentToolEl.querySelector('.tool-icon');
icon.className = isError
? 'fas fa-times text-red-400 flex-shrink-0 tool-icon'
: 'fas fa-check text-primary-400 flex-shrink-0 tool-icon';
// Show execution time
const nameEl = currentToolEl.querySelector('.tool-name');
if (item.execution_time !== undefined) {
nameEl.innerHTML += ` <span class="tool-time">${item.execution_time}s</span>`;
// Show execution time
const nameEl = currentToolEl.querySelector('.tool-name');
if (item.execution_time !== undefined) {
nameEl.innerHTML += ` <span class="tool-time">${item.execution_time}s</span>`;
}
// Fill output section
const outputSection = currentToolEl.querySelector('.tool-output-section');
if (outputSection && item.result) {
outputSection.innerHTML = `
<div class="tool-detail-label">${isError ? 'Error' : 'Output'}</div>
<pre class="tool-detail-content ${isError ? 'tool-error-text' : ''}">${escapeHtml(String(item.result))}</pre>`;
}
if (isError) currentToolEl.classList.add('tool-failed');
currentToolEl = null;
}
// Fill output section
const outputSection = currentToolEl.querySelector('.tool-output-section');
if (outputSection && item.result) {
outputSection.innerHTML = `
<div class="tool-detail-label">${isError ? 'Error' : 'Output'}</div>
<pre class="tool-detail-content ${isError ? 'tool-error-text' : ''}">${escapeHtml(String(item.result))}</pre>`;
}
} else if (item.type === 'image') {
ensureBotEl();
const imgEl = document.createElement('img');
imgEl.src = item.content;
imgEl.alt = 'screenshot';
imgEl.style.cssText = 'max-width:600px;border-radius:8px;margin:8px 0;cursor:pointer;box-shadow:0 1px 4px rgba(0,0,0,0.1);';
imgEl.onclick = () => window.open(item.content, '_blank');
mediaEl.appendChild(imgEl);
scrollChatToBottom();
if (isError) currentToolEl.classList.add('tool-failed');
currentToolEl = null;
} else if (item.type === 'text') {
// Intermediate text sent before media items; display it but keep SSE open.
ensureBotEl();
contentEl.classList.remove('sse-streaming');
const textContent = item.content || accumulatedText;
if (textContent) contentEl.innerHTML = renderMarkdown(textContent);
applyHighlighting(botEl);
scrollChatToBottom();
} else if (item.type === 'video') {
ensureBotEl();
const wrapper = document.createElement('div');
wrapper.innerHTML = _buildVideoHtml(item.content);
mediaEl.appendChild(wrapper.firstElementChild || wrapper);
scrollChatToBottom();
} else if (item.type === 'file') {
ensureBotEl();
const fileName = item.file_name || item.content.split('/').pop();
const fileEl = document.createElement('a');
fileEl.href = item.content;
fileEl.download = fileName;
fileEl.target = '_blank';
fileEl.className = 'file-attachment';
fileEl.style.cssText = 'display:inline-flex;align-items:center;gap:6px;padding:8px 14px;margin:8px 0;border-radius:8px;background:var(--bg-secondary,#f3f4f6);color:var(--text-primary,#374151);text-decoration:none;font-size:14px;border:1px solid var(--border-color,#e5e7eb);';
fileEl.innerHTML = `<i class="fas fa-file-download" style="color:#6b7280;"></i> ${fileName}`;
mediaEl.appendChild(fileEl);
scrollChatToBottom();
} else if (item.type === 'phase') {
// Coarse progress (e.g. cow install-browser); must not close SSE (unlike "done")
ensureBotEl();
const wrap = document.createElement('div');
wrap.className = 'text-xs sm:text-sm text-slate-600 dark:text-slate-400 border-l-2 border-primary-400 pl-2 py-1 my-0.5';
wrap.textContent = String(item.content || '');
stepsEl.appendChild(wrap);
scrollChatToBottom();
} else if (item.type === 'done') {
done = true;
es.close();
delete activeStreams[requestId];
// item.content may be empty when "done" is only a stream-close signal after media.
const finalText = item.content || accumulatedText;
if (!botEl && finalText) {
if (loadingEl) { loadingEl.remove(); loadingEl = null; }
addBotMessage(finalText, new Date((item.timestamp || Date.now() / 1000) * 1000), requestId);
} else if (botEl) {
contentEl.classList.remove('sse-streaming');
// Only update text content when there is something new to show.
if (finalText) contentEl.innerHTML = renderMarkdown(finalText);
applyHighlighting(botEl);
}
scrollChatToBottom();
} else if (item.type === 'error') {
done = true;
es.close();
delete activeStreams[requestId];
if (loadingEl) { loadingEl.remove(); loadingEl = null; }
addBotMessage(t('error_send'), new Date());
}
};
} else if (item.type === 'image') {
ensureBotEl();
const imgEl = document.createElement('img');
imgEl.src = item.content;
imgEl.alt = 'screenshot';
imgEl.style.cssText = 'max-width:360px;border-radius:8px;margin:8px 0;cursor:pointer;box-shadow:0 1px 4px rgba(0,0,0,0.1);';
imgEl.onclick = () => window.open(item.content, '_blank');
mediaEl.appendChild(imgEl);
scrollChatToBottom();
} else if (item.type === 'file') {
ensureBotEl();
const fileName = item.file_name || item.content.split('/').pop();
const fileEl = document.createElement('a');
fileEl.href = item.content;
fileEl.download = fileName;
fileEl.target = '_blank';
fileEl.className = 'file-attachment';
fileEl.style.cssText = 'display:inline-flex;align-items:center;gap:6px;padding:8px 14px;margin:8px 0;border-radius:8px;background:var(--bg-secondary,#f3f4f6);color:var(--text-primary,#374151);text-decoration:none;font-size:14px;border:1px solid var(--border-color,#e5e7eb);';
fileEl.innerHTML = `<i class="fas fa-file-download" style="color:#6b7280;"></i> ${fileName}`;
mediaEl.appendChild(fileEl);
scrollChatToBottom();
} else if (item.type === 'phase') {
// Coarse progress (e.g. cow install-browser); must not close SSE (unlike "done")
ensureBotEl();
const wrap = document.createElement('div');
wrap.className = 'text-xs sm:text-sm text-slate-600 dark:text-slate-400 border-l-2 border-primary-400 pl-2 py-1 my-0.5';
wrap.textContent = String(item.content || '');
stepsEl.appendChild(wrap);
scrollChatToBottom();
} else if (item.type === 'done') {
es.onerror = function() {
es.close();
delete activeStreams[requestId];
const finalText = item.content || accumulatedText;
if (done) return;
if (!botEl && finalText) {
if (loadingEl) { loadingEl.remove(); loadingEl = null; }
addBotMessage(finalText, new Date((item.timestamp || Date.now() / 1000) * 1000), requestId);
} else if (botEl) {
if (reconnectCount < MAX_RECONNECTS) {
reconnectCount++;
const delay = Math.min(RECONNECT_BASE_MS * reconnectCount, 5000);
console.warn(`[SSE] connection lost for ${requestId}, reconnecting in ${delay}ms (attempt ${reconnectCount}/${MAX_RECONNECTS})`);
setTimeout(connect, delay);
return;
}
// Exhausted retries, show whatever we have
if (loadingEl) { loadingEl.remove(); loadingEl = null; }
if (!botEl) {
addBotMessage(t('error_send'), new Date());
} else if (accumulatedText) {
contentEl.classList.remove('sse-streaming');
if (finalText) contentEl.innerHTML = renderMarkdown(finalText);
contentEl.innerHTML = renderMarkdown(accumulatedText);
applyHighlighting(botEl);
}
scrollChatToBottom();
};
}
} else if (item.type === 'error') {
es.close();
delete activeStreams[requestId];
if (loadingEl) { loadingEl.remove(); loadingEl = null; }
addBotMessage(t('error_send'), new Date());
}
};
es.onerror = function() {
es.close();
delete activeStreams[requestId];
if (loadingEl) { loadingEl.remove(); loadingEl = null; }
if (!botEl) {
addBotMessage(t('error_send'), new Date());
} else if (accumulatedText) {
contentEl.classList.remove('sse-streaming');
contentEl.innerHTML = renderMarkdown(accumulatedText);
applyHighlighting(botEl);
}
};
connect();
}
function startPolling() {
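The `es.onerror` handler in the hunk above retries with a linear backoff capped at 5000 ms. A minimal sketch of that delay schedule follows, written in Python to match the backend code in this diff. The values of `RECONNECT_BASE_MS` and `MAX_RECONNECTS` are not visible in this hunk, so the numbers below are assumptions, and `reconnect_delays` is a hypothetical helper used only for illustration.

```python
# Assumed values: the real constants are defined outside the hunk shown above.
RECONNECT_BASE_MS = 1000
MAX_RECONNECTS = 5
MAX_DELAY_MS = 5000  # cap taken from Math.min(..., 5000) in the handler


def reconnect_delays(base_ms=RECONNECT_BASE_MS,
                     max_reconnects=MAX_RECONNECTS,
                     cap_ms=MAX_DELAY_MS):
    """Return the delay in ms before each reconnect attempt (1-indexed, like reconnectCount)."""
    return [min(base_ms * attempt, cap_ms) for attempt in range(1, max_reconnects + 1)]


if __name__ == "__main__":
    for attempt, delay in enumerate(reconnect_delays(), start=1):
        print(f"attempt {attempt}/{MAX_RECONNECTS}: wait {delay} ms")
```

With a linear schedule the total wait before giving up stays small and predictable, which suits a chat UI where the user is watching the stream.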

View File

@@ -126,6 +126,13 @@ class WebChannel(ChatChannel):
logger.debug(f"SSE skipped duplicate file for request {request_id}")
return
# Skip http-URL FILE/IMAGE_URL replies produced by chat_channel's media extraction:
# the text reply (already sent as "done") contains the URL and the frontend will
# render it via renderMarkdown/injectVideoPlayers, so no separate SSE event needed.
if reply.type in (ReplyType.FILE, ReplyType.IMAGE_URL) and content.startswith(("http://", "https://")):
logger.debug(f"SSE skipped http media reply for request {request_id}")
return
self.sse_queues[request_id].put({
"type": "done",
"content": content,
@@ -322,14 +329,18 @@ class WebChannel(ChatChannel):
"""
SSE generator for a given request_id.
Yields UTF-8 encoded bytes to avoid WSGI Latin-1 mangling.
Supports client reconnection: the queue is only removed after a
"done" event is consumed, so a new GET /stream with the same
request_id can resume reading remaining events.
"""
if request_id not in self.sse_queues:
yield b"data: {\"type\": \"error\", \"message\": \"invalid request_id\"}\n\n"
return
q = self.sse_queues[request_id]
timeout = 300 # 5 minutes max
deadline = time.time() + timeout
idle_timeout = 600 # 10 minutes without any real event
deadline = time.time() + idle_timeout
done = False
try:
while time.time() < deadline:
@@ -339,13 +350,18 @@ class WebChannel(ChatChannel):
yield b": keepalive\n\n"
continue
# Real event received, reset idle deadline
deadline = time.time() + idle_timeout
payload = json.dumps(item, ensure_ascii=False)
yield f"data: {payload}\n\n".encode("utf-8")
if item.get("type") == "done":
done = True
break
finally:
self.sse_queues.pop(request_id, None)
if done:
self.sse_queues.pop(request_id, None)
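The resumable stream described in the docstring above can be sketched as a standalone generator. This is a simplified illustration, not the channel's actual method: the queue read between the two hunks is elided in the diff, so the `q.get(timeout=...)` call, the `poll_interval` parameter, and the function name `sse_stream` are assumptions.

```python
import json
import queue
import time


def sse_stream(sse_queues, request_id, idle_timeout=600, poll_interval=1.0):
    """Yield SSE frames for request_id; drop the queue only after "done" is delivered."""
    if request_id not in sse_queues:
        yield b'data: {"type": "error", "message": "invalid request_id"}\n\n'
        return
    q = sse_queues[request_id]
    deadline = time.time() + idle_timeout
    done = False
    try:
        while time.time() < deadline:
            try:
                item = q.get(timeout=poll_interval)  # assumed read; elided in the diff
            except queue.Empty:
                yield b": keepalive\n\n"  # comment frame keeps the connection open
                continue
            # A real event arrived, so the idle deadline starts over.
            deadline = time.time() + idle_timeout
            payload = json.dumps(item, ensure_ascii=False)
            yield f"data: {payload}\n\n".encode("utf-8")
            if item.get("type") == "done":
                done = True
                break
    finally:
        # On client disconnect (generator closed early) the queue is kept,
        # so a new request with the same request_id can resume reading.
        if done:
            sse_queues.pop(request_id, None)
```

One consequence of this design, at least in the sketch, is that a queue whose client never returns stays in `sse_queues` until something else removes it.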
def poll_response(self):
"""
@@ -447,8 +463,14 @@ class WebChannel(ChatChannel):
func = web.httpserver.StaticMiddleware(app.wsgifunc())
func = web.httpserver.LogMiddleware(func)
server = web.httpserver.WSGIServer(("0.0.0.0", port), func)
# Allow concurrent requests by not blocking on in-flight handler threads
server.daemon_threads = True
# Default request_queue_size(5) / timeout(10s) / numthreads(10) are
# too small: when SSE streams occupy many threads, the backlog fills
# and new connections get refused (ERR_CONNECTION_ABORTED).
server.request_queue_size = 128
server.timeout = 300
server.requests.min = 20
server.requests.max = 80
self._http_server = server
try:
server.start()
@@ -563,7 +585,7 @@ class ConfigHandler:
_RECOMMENDED_MODELS = [
const.MINIMAX_M2_7, const.MINIMAX_M2_5, const.MINIMAX_M2_1, const.MINIMAX_M2_1_LIGHTNING,
const.GLM_5_TURBO, const.GLM_5, const.GLM_4_7,
const.QWEN3_MAX, const.QWEN35_PLUS,
const.QWEN36_PLUS, const.QWEN35_PLUS, const.QWEN3_MAX,
const.KIMI_K2_5, const.KIMI_K2,
const.DOUBAO_SEED_2_PRO, const.DOUBAO_SEED_2_CODE,
const.CLAUDE_4_6_SONNET, const.CLAUDE_4_6_OPUS, const.CLAUDE_4_5_SONNET,
@@ -592,7 +614,7 @@ class ConfigHandler:
"api_key_field": "dashscope_api_key",
"api_base_key": None,
"api_base_default": None,
"models": [const.QWEN3_MAX, const.QWEN35_PLUS],
"models": [const.QWEN36_PLUS, const.QWEN35_PLUS, const.QWEN3_MAX],
}),
("moonshot", {
"label": "Kimi",
@@ -1365,6 +1387,8 @@ class MemoryContentHandler:
service = MemoryService(workspace_root)
result = service.get_content(params.filename)
return json.dumps({"status": "success", **result}, ensure_ascii=False)
except ValueError:
return json.dumps({"status": "error", "message": "invalid filename"})
except FileNotFoundError:
return json.dumps({"status": "error", "message": "file not found"})
except Exception as e:

View File

@@ -37,11 +37,19 @@ def _random_wechat_uin() -> str:
return base64.b64encode(str(val).encode("utf-8")).decode("utf-8")
CHANNEL_VERSION = "2.0.0"
# iLink-App-ClientVersion: uint32 encoded as major<<16 | minor<<8 | patch
# 2.0.0 → 0x00020000 = 131072
CLIENT_VERSION = "131072"
def _build_headers(token: str = "") -> dict:
headers = {
"Content-Type": "application/json",
"AuthorizationType": "ilink_bot_token",
"X-WECHAT-UIN": _random_wechat_uin(),
"iLink-App-Id": "bot",
"iLink-App-ClientVersion": CLIENT_VERSION,
}
if token:
headers["Authorization"] = f"Bearer {token}"
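The comment on `CLIENT_VERSION` above describes the value as a uint32 packed as `major<<16 | minor<<8 | patch`. The diff hardcodes the result as a string; the sketch below shows how that string could be derived from a dotted version. `encode_client_version` is a hypothetical helper, and the range checks are an assumption about what a sensible packing would enforce.

```python
def encode_client_version(version: str) -> str:
    """Pack 'major.minor.patch' into major<<16 | minor<<8 | patch, returned as a decimal string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if not 0 <= major <= 0xFFFF:
        raise ValueError("major must fit in 16 bits")
    if not (0 <= minor <= 0xFF and 0 <= patch <= 0xFF):
        raise ValueError("minor and patch must each fit in 8 bits")
    return str((major << 16) | (minor << 8) | patch)


if __name__ == "__main__":
    print(encode_client_version("2.0.0"))  # 0x00020000
```

For `2.0.0` this gives `2 << 16 = 131072`, matching the constant in the diff.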
@@ -64,6 +72,7 @@ class WeixinApi:
def _post(self, endpoint: str, body: dict, timeout: int = DEFAULT_API_TIMEOUT) -> dict:
url = _ensure_trailing_slash(self.base_url) + endpoint
headers = _build_headers(self.token)
body.setdefault("base_info", {}).setdefault("channel_version", CHANNEL_VERSION)
try:
resp = requests.post(url, json=body, headers=headers, timeout=timeout)
resp.raise_for_status()
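The chained `setdefault` call added to `_post` fills in `base_info.channel_version` only when the caller has not supplied one. A small sketch of that behaviour, using a hypothetical wrapper name `with_channel_version`:

```python
CHANNEL_VERSION = "2.0.0"


def with_channel_version(body: dict) -> dict:
    """Ensure body['base_info']['channel_version'] exists without overwriting caller values."""
    # The first setdefault creates base_info if missing; the second adds the version if missing.
    body.setdefault("base_info", {}).setdefault("channel_version", CHANNEL_VERSION)
    return body


if __name__ == "__main__":
    print(with_channel_version({}))
    print(with_channel_version({"base_info": {"channel_version": "1.9.0"}}))
```

Note that the call mutates the dict passed in, so a caller that reuses the same `body` object will see the added key afterwards.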
@@ -210,7 +219,10 @@ class WeixinApi:
def poll_qr_status(self, qrcode: str, timeout: int = QR_POLL_TIMEOUT) -> dict:
url = (_ensure_trailing_slash(self.base_url) +
f"ilink/bot/get_qrcode_status?qrcode={requests.utils.quote(qrcode)}")
headers = {"iLink-App-ClientVersion": "1"}
headers = {
"iLink-App-Id": "bot",
"iLink-App-ClientVersion": CLIENT_VERSION,
}
try:
resp = requests.get(url, headers=headers, timeout=timeout)
resp.raise_for_status()

View File

@@ -166,10 +166,18 @@ class WeixinChannel(ChatChannel):
print("=" * 60)
try:
import qrcode as qr_lib
import io
qr = qr_lib.QRCode(error_correction=qr_lib.constants.ERROR_CORRECT_L, box_size=1, border=1)
qr.add_data(qrcode_url)
qr.make(fit=True)
qr.print_ascii(invert=True)
buf = io.StringIO()
qr.print_ascii(out=buf, invert=True)
try:
print(buf.getvalue())
except UnicodeEncodeError:
# Windows GBK terminals cannot render Unicode block characters
print(f"\n (终端不支持显示二维码,请使用链接扫码)")
print(f" 二维码链接: {qrcode_url}\n")
except ImportError:
print(f"\n 二维码链接: {qrcode_url}")
print(" (安装 'qrcode' 包可在终端显示二维码)\n")
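The change above renders the QR code into a buffer first and prints it in one call, so an encoding failure can be caught and replaced by the plain link. A minimal sketch of that fallback pattern, independent of the `qrcode` package; the function name `print_qr_or_link` and the injectable `write` callable are assumptions made so the behaviour can be exercised without a GBK terminal:

```python
def print_qr_or_link(qr_text: str, url: str, write=print) -> bool:
    """Write the QR art; if the output encoding rejects it, write the link instead.

    Returns True when the QR art was written, False when the fallback was used.
    """
    try:
        write(qr_text)
        return True
    except UnicodeEncodeError:
        # Terminals with a legacy code page may not encode block characters.
        write(f"QR code link: {url}")
        return False


if __name__ == "__main__":
    print_qr_or_link("\u2588\u2580\u2584", "https://example.com/qr")
```

Buffering before printing matters here: the whole string is handed to the stream in a single call, so the fallback message is not preceded by a half-drawn code.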

View File

@@ -1 +1 @@
2.0.4
2.0.5

View File

@@ -178,7 +178,10 @@ def update(ctx):
"""Update CowAgent and restart."""
root = get_project_root()
# 1. Git pull while service is still running
# 1. Stop service first so git pull won't conflict with running code
ctx.invoke(stop)
# 2. Git pull
if os.path.isdir(os.path.join(root, ".git")):
click.echo("Pulling latest code...")
ret = subprocess.call(["git", "pull"], cwd=root)
@@ -188,28 +191,61 @@ def update(ctx):
else:
click.echo("Not a git repository, skipping code update.")
# 2. Stop service
ctx.invoke(stop)
# 3. Install dependencies
python = sys.executable
req_file = os.path.join(root, "requirements.txt")
if os.path.exists(req_file):
click.echo("Installing dependencies...")
subprocess.call(
[python, "-m", "pip", "install", "-r", "requirements.txt", "-q"],
if _IS_WIN:
# On Windows, `cow.exe` (this process) locks the exe file, so
# `pip install -e .` fails with WinError 5. Write a small .bat
# helper that waits for cow.exe to exit, then installs & starts.
bat = os.path.join(root, "_cow_update.bat")
lines = [
"@echo off",
"chcp 65001 >nul",
"echo Waiting for cow.exe to exit...",
"timeout /t 3 /nobreak >nul",
]
if os.path.exists(req_file):
lines.append(f'echo Installing dependencies...')
lines.append(f'"{python}" -m pip install -r requirements.txt -q')
lines += [
"echo Reinstalling cow CLI...",
f'"{python}" -m pip install -e . -q',
"echo Starting CowAgent...",
f'"{python}" -m cli.cli start --no-logs',
"echo.",
"echo Update complete. You can close this window.",
"pause >nul",
"del \"%~f0\"",
]
with open(bat, "w", encoding="utf-8") as f:
f.write("\n".join(lines) + "\n")
subprocess.Popen(
["cmd.exe", "/c", "start", "CowAgent Update", "/wait", bat],
cwd=root,
)
click.echo(click.style(
"✓ Update script launched. Please follow the new window for progress.",
fg="green"))
else:
# 3. Install dependencies
if os.path.exists(req_file):
click.echo("Installing dependencies...")
subprocess.call(
[python, "-m", "pip", "install", "-r", "requirements.txt", "-q"],
cwd=root,
)
click.echo("Reinstalling cow CLI...")
subprocess.call(
[python, "-m", "pip", "install", "-e", ".", "-q"],
cwd=root,
)
click.echo("Reinstalling cow CLI...")
subprocess.call(
[python, "-m", "pip", "install", "-e", ".", "-q"],
cwd=root,
)
# 4. Start service and follow logs
click.echo("")
time.sleep(1)
ctx.invoke(start, no_logs=False)
# 4. Start service
click.echo("")
time.sleep(1)
ctx.invoke(start, no_logs=False)
@click.command()
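On Windows the update path above hands the install and restart steps to a helper batch file, because the running `cow.exe` locks its own executable. The sketch below isolates the script-building step as a pure function so it can be inspected without launching anything; `build_update_script` is a hypothetical name, and the line contents are copied from the diff.

```python
def build_update_script(python: str, has_requirements: bool) -> list:
    """Return the lines of a batch script that installs dependencies and restarts the service."""
    lines = [
        "@echo off",
        "chcp 65001 >nul",                      # switch the console to UTF-8
        "echo Waiting for cow.exe to exit...",
        "timeout /t 3 /nobreak >nul",           # give the parent process time to exit
    ]
    if has_requirements:
        lines.append("echo Installing dependencies...")
        lines.append(f'"{python}" -m pip install -r requirements.txt -q')
    lines += [
        "echo Reinstalling cow CLI...",
        f'"{python}" -m pip install -e . -q',
        "echo Starting CowAgent...",
        f'"{python}" -m cli.cli start --no-logs',
        "echo.",
        "echo Update complete. You can close this window.",
        "pause >nul",
        'del "%~f0"',                           # the script deletes itself when done
    ]
    return lines


if __name__ == "__main__":
    print("\n".join(build_update_script(r"C:\Python312\python.exe", True)))
```

Keeping the script text separate from the `subprocess.Popen` call makes the generated commands easy to review on any platform.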

View File

@@ -263,8 +263,9 @@ def _scan_skills_in_dir(directory: str) -> list:
return found
def _batch_install_skills(discovered, spec, skills_dir, source, result: InstallResult):
def _batch_install_skills(discovered, spec, skills_dir, source, result: InstallResult, display_name: str = ""):
"""Install a list of discovered skills into skills_dir."""
single = len(discovered) == 1
result.messages.append(f"Found {len(discovered)} skill(s) in {spec}:")
for sname, sdir in discovered:
safe_name = re.sub(r'[^a-zA-Z0-9_\-]', '-', sname)[:64]
@@ -275,7 +276,7 @@ def _batch_install_skills(discovered, spec, skills_dir, source, result: InstallR
if os.path.exists(target_dir):
shutil.rmtree(target_dir)
shutil.copytree(sdir, target_dir)
_register_installed_skill(safe_name, source=source)
_register_installed_skill(safe_name, source=source, display_name=display_name if single else "")
result.installed.append(safe_name)
result.messages.append(f" + {safe_name}")
@@ -517,12 +518,16 @@ def _install_targz_bytes(content: bytes, name: str, skills_dir: str, result: Ins
def _print_install_success(name: str, source: str):
"""Print a unified install success message with description and source."""
skills_dir = get_skills_dir()
config = load_skills_config()
display = config.get(name, {}).get("display_name", "")
desc = _read_skill_description(os.path.join(skills_dir, name))
click.echo(click.style(f"{name}", fg="green"))
if display and display != name:
click.echo(f" 名称: {display}")
if desc:
if len(desc) > 60:
desc = desc[:57] + ""
click.echo(f" {desc}")
click.echo(f" 描述: {desc}")
click.echo(f" 来源: {source}")
@@ -748,7 +753,8 @@ def _list_remote(page: int = 1):
nav_parts.append(f"cow skill list --remote --page {page + 1}")
if nav_parts:
click.echo(f" Navigate: {' | '.join(nav_parts)}")
click.echo(f" Install: cow skill install <name>\n")
click.echo(f" Install: cow skill install <name>")
click.echo(f" Browse: https://skills.cowagent.ai\n")
# ------------------------------------------------------------------
@@ -875,6 +881,15 @@ def _route_install(name: str, result: InstallResult):
_install_hub(skill_name, result, provider="clawhub")
return
# --- linkai: prefix (e.g. "linkai:<skill_code>") ---
if name.startswith("linkai:"):
skill_code = name[7:]
# LinkAI codes can be mixed-case alphanumeric; validate loosely
if not re.match(r"^[a-zA-Z0-9_\-]{1,128}$", skill_code):
raise SkillInstallError(f"Invalid LinkAI skill code '{skill_code}'.")
_install_hub(skill_code, result, provider="linkai")
return
# --- owner/repo or owner/repo#subpath shorthand ---
if re.match(r"^[a-zA-Z0-9_\-]+/[a-zA-Z0-9_.\-]+(?:#.+)?$", name):
subpath = None
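The new `linkai:` branch in `_route_install` strips the prefix and validates the remaining code before handing it to the hub installer. A standalone sketch of that parsing step; `parse_linkai_spec` is a hypothetical helper, and `ValueError` stands in for the project's `SkillInstallError`:

```python
import re

# Same pattern as in the diff: 1 to 128 characters of letters, digits, underscore, or hyphen.
_LINKAI_CODE_RE = re.compile(r"^[a-zA-Z0-9_\-]{1,128}$")


def parse_linkai_spec(name: str):
    """Return the skill code for a 'linkai:<code>' spec, or None if the prefix is absent."""
    prefix = "linkai:"
    if not name.startswith(prefix):
        return None
    code = name[len(prefix):]
    if not _LINKAI_CODE_RE.match(code):
        raise ValueError(f"Invalid LinkAI skill code '{code}'.")
    return code


if __name__ == "__main__":
    print(parse_linkai_spec("linkai:G7z6vKwp"))
    print(parse_linkai_spec("owner/repo"))
```

One detail worth knowing: with `re.match`, `$` also matches just before a trailing newline, so `re.fullmatch` would be the stricter choice if codes ever come from untrimmed input.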
@@ -1006,13 +1021,11 @@ def _install_hub(name, result: InstallResult, provider=None):
expected_checksum = mirror_resp.headers.get("X-Checksum-Sha256")
_check_checksum(mirror_resp.content, expected_checksum)
installed_before = len(result.installed)
_install_zip_bytes(mirror_resp.content, name, skills_dir, result=result, source_label="cowhub")
_install_zip_bytes(mirror_resp.content, name, skills_dir, result=result, source_label="cowhub", display_name=hub_display_name)
if len(result.installed) == installed_before:
_register_installed_skill(name, source="cowhub", display_name=hub_display_name)
result.installed.append(name)
result.messages.append(f"Installed '{name}' from mirror.")
elif hub_display_name:
_register_installed_skill(name, display_name=hub_display_name)
return
if source_type == "registry":
@@ -1043,13 +1056,11 @@ def _install_hub(name, result: InstallResult, provider=None):
if dl_err is None:
_check_checksum(dl_resp.content, expected_checksum)
installed_before = len(result.installed)
_install_zip_bytes(dl_resp.content, name, skills_dir, result=result, source_label=src_provider)
_install_zip_bytes(dl_resp.content, name, skills_dir, result=result, source_label=src_provider, display_name=hub_display_name)
if len(result.installed) == installed_before:
_register_installed_skill(name, source=src_provider, display_name=hub_display_name)
result.installed.append(name)
result.messages.append(f"Installed '{name}' from {src_provider}.")
elif hub_display_name:
_register_installed_skill(name, display_name=hub_display_name)
return
# Fallback: download mirror from Skill Hub
@@ -1073,13 +1084,11 @@ def _install_hub(name, result: InstallResult, provider=None):
expected_checksum = mirror_resp.headers.get("X-Checksum-Sha256")
_check_checksum(mirror_resp.content, expected_checksum)
installed_before = len(result.installed)
_install_zip_bytes(mirror_resp.content, name, skills_dir, result=result, source_label="cowhub")
_install_zip_bytes(mirror_resp.content, name, skills_dir, result=result, source_label="cowhub", display_name=hub_display_name)
if len(result.installed) == installed_before:
_register_installed_skill(name, source="cowhub", display_name=hub_display_name)
result.installed.append(name)
result.messages.append(f"Installed '{name}' from mirror.")
elif hub_display_name:
_register_installed_skill(name, display_name=hub_display_name)
else:
raise SkillInstallError("Unsupported registry provider.")
return
@@ -1264,7 +1273,7 @@ def _install_git_clone(git_url: str, result: InstallResult, display_name: str =
shutil.rmtree(tmp_dir, ignore_errors=True)
def _install_zip_bytes(content, name, skills_dir, result: InstallResult = None, source_label: str = "zip"):
def _install_zip_bytes(content, name, skills_dir, result: InstallResult = None, source_label: str = "zip", display_name: str = ""):
"""Extract a zip archive and install skill(s).
Supports three scenarios:
@@ -1289,7 +1298,7 @@ def _install_zip_bytes(content, name, skills_dir, result: InstallResult = None,
discovered = _scan_skills_in_repo(pkg_root) or _scan_skills_in_dir(pkg_root)
if discovered and len(discovered) > 1 and result is not None:
_batch_install_skills(discovered, name, skills_dir, source_label, result)
_batch_install_skills(discovered, name, skills_dir, source_label, result, display_name=display_name)
return
if discovered and len(discovered) == 1:
@@ -1301,7 +1310,7 @@ def _install_zip_bytes(content, name, skills_dir, result: InstallResult = None,
if os.path.exists(target):
shutil.rmtree(target)
shutil.copytree(sdir, target)
_register_installed_skill(safe_name, source=source_label)
_register_installed_skill(safe_name, source=source_label, display_name=display_name)
if result is not None:
result.installed.append(safe_name)
result.messages.append(f"Installed '{safe_name}' from {source_label}.")

View File

@@ -47,8 +47,8 @@ CREDENTIAL_MAP = {
class CloudClient(LinkAIClient):
def __init__(self, api_key: str, channel, host: str = ""):
super().__init__(api_key, host)
def __init__(self, api_key: str, channel, host: str = "", port=None):
super().__init__(api_key, host, port=port)
self.channel = channel
self.client_type = channel.channel_type
self.channel_mgr = None
@@ -733,7 +733,7 @@ def start(channel, channel_mgr=None):
return
global chat_client
chat_client = CloudClient(api_key=conf().get("linkai_api_key"), host=conf().get("cloud_host", ""), channel=channel)
chat_client = CloudClient(api_key=conf().get("linkai_api_key"), host=conf().get("cloud_host", ""), port=conf().get("cloud_port"), channel=channel)
chat_client.channel_mgr = channel_mgr
chat_client.config = _build_config()
chat_client.start()

View File

@@ -7,8 +7,8 @@ XUNFEI = "xunfei"
CHATGPTONAZURE = "chatGPTOnAzure"
LINKAI = "linkai"
CLAUDEAPI= "claudeAPI"
QWEN = "qwen" # Legacy Qwen integration
QWEN_DASHSCOPE = "dashscope" # New Qwen integration (Bailian)
QWEN = "qwen" # Qwen (kept for legacy configs; actually routed to DashscopeBot)
QWEN_DASHSCOPE = "dashscope" # Qwen DashScope integration
GEMINI = "gemini"
ZHIPU_AI = "zhipu"
MOONSHOT = "moonshot"
@@ -81,14 +81,14 @@ TTS_1_HD = "tts-1-hd"
DEEPSEEK_CHAT = "deepseek-chat" # DeepSeek-V3 chat model
DEEPSEEK_REASONER = "deepseek-reasoner" # DeepSeek-R1 model
# Qwen (Tongyi Qianwen - Alibaba Cloud)
QWEN = "qwen"
# Qwen (Tongyi Qianwen - Alibaba Cloud DashScope)
QWEN_TURBO = "qwen-turbo"
QWEN_PLUS = "qwen-plus"
QWEN_MAX = "qwen-max"
QWEN_LONG = "qwen-long"
QWEN3_MAX = "qwen3-max" # Qwen3 Max - recommended model for Agent mode
QWEN35_PLUS = "qwen3.5-plus" # Qwen3.5 Plus - Omni model (MultiModalConversation)
QWEN36_PLUS = "qwen3.6-plus" # Qwen3.6 Plus - Omni model (MultiModalConversation)
QWQ_PLUS = "qwq-plus"
# MiniMax
@@ -172,7 +172,7 @@ MODEL_LIST = [
DEEPSEEK_CHAT, DEEPSEEK_REASONER,
# Qwen
QWEN, QWEN_TURBO, QWEN_PLUS, QWEN_MAX, QWEN_LONG, QWEN3_MAX, QWEN35_PLUS,
QWEN36_PLUS, QWEN35_PLUS, QWEN3_MAX, QWEN_MAX, QWEN_PLUS, QWEN_TURBO, QWEN_LONG,
# MiniMax
MiniMax, MINIMAX_M2_7, MINIMAX_M2_5, MINIMAX_M2_1, MINIMAX_M2_1_LIGHTNING, MINIMAX_M2, MINIMAX_ABAB6_5,

View File

@@ -189,6 +189,7 @@ available_setting = {
"linkai_app_code": "",
"linkai_api_base": "https://api.link-ai.tech", # linkAI服务地址
"cloud_host": "client.link-ai.tech",
"cloud_port": None,
"cloud_deployment_id": "",
"minimax_api_key": "",
"Minimax_group_id": "",

View File

@@ -1,185 +0,0 @@
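# Introduction to CowAgent
## Overview
The Cow project has been upgraded from a simple chatbot into a super intelligent assistant, **CowAgent**. It can proactively think and plan tasks, maintains long-term memory, operates computers and external resources, and creates and executes Skills, so that it truly understands you and grows with you. CowAgent can run long-term on a personal computer or server and be reached through Feishu, DingTalk, WeCom, the web, and other channels. Its core capabilities are:
- **Complex task planning**: Understands complex tasks and plans their execution autonomously, reasoning and calling tools until the goal is reached, with support for multi-turn reasoning and context understanding
- **Tool system**: Ships with 10+ built-in tools, including file read/write, Bash terminal, browser, scheduled tasks, and memory management, so the Agent can manage your computer or server
- **Long-term memory**: Automatically persists conversation memory to local files and a database, covering global memory and daily memory, with keyword and vector retrieval
- **Skills system**: Adds a Skill runtime engine with several built-in skills, and supports building custom Skills through natural-language conversation
- **Multi-channel and multi-model support**: Interact with the Agent via Web, Feishu, DingTalk, WeCom, and other channels; supports Claude, Gemini, OpenAI, GLM, MiniMax, Qwen, Kimi, Doubao, and other mainstream models
- **Security and cost**: Controls the Agent's access through a secret-management tool, prompt controls, and system permissions; limits token cost through maximum memory turns, maximum context tokens, and maximum tool-execution steps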
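## Core Features
### 1. Long-term Memory
> The memory system lets the Agent retain important information over the long term. The Agent proactively stores information when the user shares preferences, decisions, facts, and other key details, and automatically extracts a summary when a conversation reaches a certain length. Memory is divided into core memory and daily memory, and supports a hybrid retrieval mode combining semantic search and vector retrieval.
On first launch, the Agent proactively asks the user for key information and records it in the agent profile, user identity, and memory files in the workspace (default `~/cow`).
In subsequent long-term conversations, the Agent stores or retrieves memory when needed, keeps updating its own profile, user preferences, and memory files, and summarizes lessons learned, so it can think autonomously and keep improving.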
<img width="800" src="https://cdn.link-ai.tech/doc/20260203000455.png" />
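### 2. Task Planning and Tool Calling
Tools are the core of how the Agent accesses operating system resources. The Agent selects and invokes tools based on the task, performing file read/write, command execution, scheduled tasks, and other operations. The built-in tools are implemented in the project's `tools` directory.
**Key tools:** file read/write/edit, Bash terminal, browser, file send, scheduler, memory search, environment config, and more.
#### 2.1 Terminal and File Access
Access to the operating system's terminal and files is the most basic and essential tool capability; many other tools and skills build on it. Users can interact with the Agent from a phone to operate resources on a personal computer or server:
<img width="800" src="https://cdn.link-ai.tech/doc/20260202181130.png" />
#### 2.2 Programming
Building on its programming and system-access capabilities, the Agent can carry out the full vibe-coding workflow, from information search, generation of images and other assets, coding, testing, and deployment to Nginx configuration changes and release. A single command from a phone is enough to produce a quick demo of an application:
<img width="800" src="https://cdn.link-ai.tech/doc/20260203121008.png" />
#### 2.3 Scheduled Tasks
The `scheduler` tool provides dynamic scheduled tasks in three forms: **one-time tasks, fixed intervals, and Cron expressions**. A triggered task can run in one of two modes, **sending a fixed message** or **executing a dynamic Agent task**, which makes it highly flexible:
<img width="800" src="https://cdn.link-ai.tech/doc/20260202195402.png" />
You can also view and manage existing scheduled tasks through natural language.
#### 2.4 Environment Variable Management
Secrets required by skills are stored in an environment variable file managed by the `env_config` tool. You can update secrets through conversation; the tool has built-in security protection and masking to keep secrets safe: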
<img width="800" src="https://cdn.link-ai.tech/doc/20260202234939.png" />
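### 3. Skills System
> The skills system gives the Agent unlimited extensibility. Each Skill consists of an instruction file, an optional run script, and optional resources, and describes how to complete a specific type of task. With Skills, the Agent can follow instructions to complete complex workflows, call tools, or integrate with third-party systems.
- **Built-in skills:** Located in the project's `skills` directory, including the skill creator, web search, image recognition (openai-image-vision), LinkAI agent, web scraping, and more. Built-in Skills are enabled automatically based on their dependencies (API keys, system commands, etc.). The skill creator lets you quickly create custom skills.
- **Custom skills:** Created by the user through conversation and stored in the workspace (`~/cow/skills/`). Custom skills can implement arbitrarily complex business workflows and third-party integrations.
#### 3.1 Creating Skills
The `skill-creator` skill lets you create skills quickly through conversation. While working with the Agent, you can ask it to turn a workflow into a skill, or send it any API documentation and examples and let it complete the integration directly: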
<img width="800" src="https://cdn.link-ai.tech/doc/20260202202247.png" />
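#### 3.2 Search and Image Recognition
- **Search skill:** The built-in `bocha-search` (Bocha Search) Skill depends on the environment variable `BOCHA_SEARCH_API_KEY`, which you can create in the [console](https://open.bochaai.com/) and send to the Agent to complete the configuration
- **Image recognition skill:** Implemented as the `openai-image-vision` plugin, which can use image recognition models such as gpt-4.1-mini and gpt-4.1. It depends on the `OPENAI_API_KEY` secret, which can be maintained through config.json or the `env_config` tool.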
<img width="800" src="https://cdn.link-ai.tech/doc/20260202213219.png" />
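#### 3.3 Third-party Knowledge Bases and Plugins
The `linkai-agent` skill makes every agent on [LinkAI](https://link-ai.tech/) available to the Agent as a Skill, enabling multi-agent decision-making.
Usage: configure `LINKAI_API_KEY` through conversation, or add `linkai_api_key` to config.json. Then add agent descriptions to `skills/linkai-agent/config.json`, for example:
```json
{
"apps": [
{
"app_code": "G7z6vKwp",
"app_name": "LinkAI Support Assistant",
"app_description": "Select this assistant only when the user needs to learn about the LinkAI platform; it answers from the LinkAI knowledge base"
},
{
"app_code": "SFY5x7JR",
"app_name": "Content Creation Assistant",
"app_description": "Use this assistant only when the user needs to create images or videos; it supports Nano Banana, Seedream, Jimeng, Veo, Kling, and other models"
}
]
}
```
The Agent decides based on each agent's name and description and calls the API with its app_code to reach the corresponding app or workflow. This skill gives flexible access to agents, knowledge bases, plugins, and other capabilities on the LinkAI platform. The result looks like this: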
<img width="750" src="https://cdn.link-ai.tech/doc/20260202234350.png" />
Note: `LINKAI_API_KEY` must be configured through `env_config`, or `linkai_api_key` added to config.json.
## Usage
> For detailed usage, see the project's README.md
### 1. Running the Project
Run in a terminal:
```bash
bash <(curl -fsSL https://cdn.link-ai.tech/code/cow/run.sh)
```
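For details and subsequent process management, see: [Project startup script](https://github.com/zhayujie/chatgpt-on-wechat/wiki/CowAgentQuickStart)
### 2. Model Selection
For Agent mode, the following models are recommended; choose based on quality and cost: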
- **MiniMax**: `MiniMax-M2.7`
- **GLM**: `glm-5-turbo`
- **Kimi**: `kimi-k2.5`
- **Doubao**: `doubao-seed-2-0-code-preview-260215`
- **Qwen**: `qwen3.5-plus`
- **Claude**: `claude-sonnet-4-6`
- **Gemini**: `gemini-3.1-flash-lite-preview`
- **OpenAI**: `gpt-5.4`
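For detailed model configuration, see the [README.md model section](../README.md#模型说明)
### 3. Core Agent Configuration
The core Agent-mode options are configured in `config.json`:
```bash
{
"agent": true, # Whether to enable Agent mode
"agent_workspace": "~/cow", # Agent workspace path
"agent_max_context_tokens": 40000, # Maximum context tokens
"agent_max_context_turns": 30, # Maximum context memory turns
"agent_max_steps": 15 # Maximum decision steps per task
}
```
**Configuration notes:**
- `agent`: Set to `true` to enable Agent mode, which provides multi-turn tool decisions, long-term memory, Skills, and more
- `agent_workspace`: Workspace path used to store memory, skills, and other system prompt settings
- `agent_max_context_tokens`: Upper limit on context tokens; when exceeded, the oldest conversation turns are dropped automatically
- `agent_max_context_turns`: Number of context memory turns; each turn is one question and one reply
- `agent_max_steps`: Maximum tool-call steps per task, to prevent infinite loops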
### 4. Channel Integration
The Agent can be used in multiple channels; switch by changing `channel_type` in `config.json`.
- **Web**: The default channel; after startup it listens on a local port and is accessed through a browser
- **Feishu**: [Feishu integration docs](https://docs.link-ai.tech/cow/multi-platform/feishu)
- **DingTalk**: [DingTalk integration docs](https://docs.link-ai.tech/cow/multi-platform/dingtalk)
- **WeCom app**: [WeCom app docs](https://docs.link-ai.tech/cow/multi-platform/wechat-com)
- **WeCom smart bot**: [WeCom smart bot docs](https://docs.link-ai.tech/cow/multi-platform/wecom-bot)
- **QQ bot**: [QQ bot docs](https://docs.link-ai.tech/cow/multi-platform/qq)
For more channel configuration, see: [Channels](../README.md#通道说明)

View File

@@ -9,7 +9,23 @@ description: 将 CowAgent 接入企业微信智能机器人(长连接模式)
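The smart bot and the WeCom self-built app are two different integration methods. The smart bot uses a WebSocket long connection, needs no public server IP or domain, and is simpler to configure.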
</Note>
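## 1. Create a smart bot
## 1. Integration methods
### Option 1: One-step QR code integration (recommended)
No need to create a bot in advance. After starting the Cow project, open the Web console (local URL: http://127.0.0.1:9899/), select the **Channels** menu, click **Connect Channel**, choose **WeCom Smart Bot**, switch to "Scan QR code" mode, and scan with **WeCom** to create and connect the bot automatically.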
<img src="https://cdn.link-ai.tech/doc/20260401121213.png" width="800"/>
<Note>
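After a successful scan, you can further configure the bot on the **Smart Bot** page of the WeCom workbench, including its name, avatar, and visibility scope.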
</Note>
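### Option 2: Manual creation and integration
First create a smart bot in WeCom and obtain its Bot ID and Secret, then connect through the Web console or the configuration file.
**Step 1: Create a smart bot**
1. Open the WeCom client, go to the workbench, and click **Smart Bot**: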
@@ -25,34 +41,35 @@ description: 将 CowAgent 接入企业微信智能机器人(长连接模式)
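4. Set the bot name, avatar, and visibility scope, select **long connection mode**, note the **Bot ID** and **Secret**, then click Save.
## 2. Configure and run
**Step 2: Connect to CowAgent**
### Option 1: Connect via the Web console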
<Tabs>
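<Tab title="Web Console">
Open the Web console, select the **Channels** menu, click **Connect Channel**, choose **WeCom Smart Bot**, switch to "Manual entry" mode, enter the Bot ID and Secret, and click Connect.
After starting the Cow project, open the Web console (local URL: http://127.0.0.1:9899/), select the **Channels** menu, click **Connect Channel**, choose **WeCom Smart Bot**, fill in the Bot ID and Secret saved in the previous step, and click Connect.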
<img src="https://cdn.link-ai.tech/doc/20260316181711.png" width="800"/>
</Tab>
<Tab title="Configuration File">
Add the following to `config.json`, then start the program:
<img src="https://cdn.link-ai.tech/doc/20260316181711.png" width="800"/>
```json
{
"channel_type": "wecom_bot",
"wecom_bot_id": "YOUR_BOT_ID",
"wecom_bot_secret": "YOUR_SECRET"
}
```
### Option 2: Connect via the configuration file
| Parameter | Description |
| --- | --- |
| `wecom_bot_id` | BotID of the smart bot |
| `wecom_bot_secret` | Secret of the smart bot |
</Tab>
</Tabs>
Add the following to `config.json`:
The log line `[WecomBot] Subscribe success` indicates a successful connection.
```json
{
"channel_type": "wecom_bot",
"wecom_bot_id": "YOUR_BOT_ID",
"wecom_bot_secret": "YOUR_SECRET"
}
```
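| Parameter | Description |
| --- | --- |
| `wecom_bot_id` | BotID of the smart bot |
| `wecom_bot_secret` | Secret of the smart bot |
After configuring, start the program; the log line `[WecomBot] Subscribe success` indicates a successful connection.
## 3. Features
## 2. Features
| Feature | Supported |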
| --- | --- |
@@ -64,7 +81,7 @@ description: 将 CowAgent 接入企业微信智能机器人(长连接模式)
| Streaming replies | ✅ |
| Proactive push from scheduled tasks | ✅ |
## 4. Usage
## 3. Usage
Search for the bot's name in WeCom to start a one-on-one chat.

View File

@@ -129,7 +129,8 @@
"pages": [
"skills/index",
"skills/install",
"skills/create"
"skills/create",
"skills/hub"
]
}
]
@@ -170,10 +171,10 @@
{
"group": "命令系统",
"pages": [
"commands/index",
"commands/process",
"commands/skill",
"commands/general"
"cli/index",
"cli/process",
"cli/skill",
"cli/general"
]
}
]
@@ -185,6 +186,7 @@
"group": "发布记录",
"pages": [
"releases/overview",
"releases/v2.0.5",
"releases/v2.0.4",
"releases/v2.0.3",
"releases/v2.0.2",
@@ -288,7 +290,8 @@
"pages": [
"en/skills/index",
"en/skills/install",
"en/skills/skill-creator"
"en/skills/skill-creator",
"en/skills/hub"
]
}
]
@@ -324,15 +327,15 @@
]
},
{
"tab": "Commands",
"tab": "CLI",
"groups": [
{
"group": "Command System",
"pages": [
"en/commands/index",
"en/commands/process",
"en/commands/skill",
"en/commands/chat"
"en/cli/index",
"en/cli/process",
"en/cli/skill",
"en/cli/chat"
]
}
]
@@ -344,6 +347,7 @@
"group": "Release Notes",
"pages": [
"en/releases/overview",
"en/releases/v2.0.5",
"en/releases/v2.0.4",
"en/releases/v2.0.2",
"en/releases/v2.0.1",
@@ -447,7 +451,8 @@
"pages": [
"ja/skills/index",
"ja/skills/install",
"ja/skills/create"
"ja/skills/create",
"ja/skills/hub"
]
}
]
@@ -483,15 +488,15 @@
]
},
{
"tab": "コマンド",
"tab": "CLI",
"groups": [
{
"group": "コマンドシステム",
"pages": [
"ja/commands/index",
"ja/commands/process",
"ja/commands/skill",
"ja/commands/general"
"ja/cli/index",
"ja/cli/process",
"ja/cli/skill",
"ja/cli/general"
]
}
]
@@ -503,6 +508,7 @@
"group": "リリースノート",
"pages": [
"ja/releases/overview",
"ja/releases/v2.0.5",
"ja/releases/v2.0.4",
"ja/releases/v2.0.3",
"ja/releases/v2.0.2",

View File

@@ -13,6 +13,7 @@
<a href="https://cowagent.ai/">🌐 Website</a> &nbsp;·&nbsp;
<a href="https://docs.cowagent.ai/en/intro/index">📖 Docs</a> &nbsp;·&nbsp;
<a href="https://docs.cowagent.ai/en/guide/quick-start">🚀 Quick Start</a> &nbsp;·&nbsp;
<a href="https://skills.cowagent.ai/">🧩 Skill Hub</a> &nbsp;·&nbsp;
<a href="https://link-ai.tech/cowagent/create">☁️ Try Online</a>
</p>
@@ -41,6 +42,8 @@ Try online (no deployment needed): [CowAgent](https://link-ai.tech/cowagent/crea
## Changelog
> **2026.04.01:** [v2.0.5](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.5) — Cow CLI, Skill Hub open source, Browser tool, WeCom Bot QR scan, and more.
> **2026.02.27:** [v2.0.2](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.2) — Web console overhaul (streaming chat, model/skill/memory/channel/scheduler/log management), multi-channel concurrent running, session persistence, new models including Gemini 3.1 Pro / Claude 4.6 Sonnet / Qwen3.5 Plus.
> **2026.02.13:** [v2.0.1](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.1) — Built-in Web Search tool, smart context trimming, runtime info dynamic update, Windows compatibility, fixes for scheduler memory loss, Feishu connection issues, and more.
@@ -73,7 +76,7 @@ irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex
After running, the Web service starts by default. Access `http://localhost:9899/chat` to chat.
Script usage: [One-click Install](https://docs.cowagent.ai/en/guide/quick-start). After installation, you can also use `cow start`, `cow stop`, and other [CLI commands](https://docs.cowagent.ai/en/commands/index) to manage the service.
Script usage: [One-click Install](https://docs.cowagent.ai/en/guide/quick-start). After installation, you can also use `cow start`, `cow stop`, and other [CLI commands](https://docs.cowagent.ai/en/cli/index) to manage the service.
### Manual Installation
@@ -97,7 +100,7 @@ pip3 install -r requirements-optional.txt # optional but recommended
pip3 install -e .
```
After installation, use `cow` commands to manage the service (start, stop, update, etc.) and skills. See [Command Docs](https://docs.cowagent.ai/en/commands/index).
After installation, use `cow` commands to manage the service (start, stop, update, etc.) and skills. See [Command Docs](https://docs.cowagent.ai/en/cli/index).
**4. Install browser (optional)**
@@ -162,7 +165,7 @@ Supports mainstream model providers. Recommended models for Agent mode:
| GLM | `glm-5-turbo` |
| Kimi | `kimi-k2.5` |
| Doubao | `doubao-seed-2-0-code-preview-260215` |
| Qwen | `qwen3.5-plus` |
| Qwen | `qwen3.6-plus` |
| Claude | `claude-sonnet-4-6` |
| Gemini | `gemini-3.1-pro-preview` |
| OpenAI | `gpt-5.4` |
@@ -223,6 +226,7 @@ Multiple channels can be enabled simultaneously, separated by commas: `"channel_
## 🔗 Related Projects
- [Cow Skill Hub](https://github.com/zhayujie/cow-skill-hub): Open skill marketplace for AI Agents — browse, search, install, and publish skills for CowAgent, OpenClaw, Claude Code, and more.
- [bot-on-anything](https://github.com/zhayujie/bot-on-anything): Lightweight and highly extensible LLM application framework supporting Slack, Telegram, Discord, Gmail, and more.
- [AgentMesh](https://github.com/MinimalFuture/AgentMesh): Open-source Multi-Agent framework for complex problem solving through agent team collaboration.
@@ -232,7 +236,7 @@ FAQs: <https://github.com/zhayujie/chatgpt-on-wechat/wiki/FAQs>
## 🛠️ Contributing
Welcome to add new channels, referring to the [Feishu channel](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/channel/feishu/feishu_channel.py) as an example. Also welcome to contribute new Skills, see the [Skill Creation docs](https://docs.cowagent.ai/en/skills/create).
Welcome to add new channels, referring to the [Feishu channel](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/channel/feishu/feishu_channel.py) as an example. Also welcome to contribute new Skills, see the [Skill Creation docs](https://docs.cowagent.ai/en/skills/create), or submit to [Skill Hub](https://skills.cowagent.ai/submit).
## ✉ Contact

View File

@@ -47,7 +47,7 @@ After installation, use the `cow` command to manage the service:
| `cow update` | Update code and restart |
| `cow install-browser` | Install browser tool dependencies |
See the [Commands documentation](/en/commands/index) for more details.
See the [Commands documentation](/en/cli/index) for more details.
<Note>
If the `cow` command is not available, you can use `./run.sh <command>` (Linux/macOS) or `.\scripts\run.ps1 <command>` (Windows) as a fallback. Both are functionally equivalent.


@@ -1,6 +1,6 @@
---
title: Features
description: CowAgent long-term memory, task planning, and skills system in detail
description: CowAgent long-term memory, task planning, skills system, CLI commands, and browser tool in detail
---
## 1. Long-term Memory
@@ -19,7 +19,7 @@ In subsequent long-term conversations, the Agent intelligently stores or retriev
Tools are the core of how the Agent accesses operating system resources. The Agent intelligently selects and invokes tools based on task requirements, performing file read/write, command execution, scheduled tasks, and more. Built-in tools are implemented in the project's `agent/tools/` directory.
**Key tools:** file read/write/edit, Bash terminal, file send, scheduler, memory search, web search, environment config, and more.
**Key tools:** file read/write/edit, Bash terminal, browser, file send, scheduler, memory search, web search, environment config, and more.
### 2.1 Terminal and File Access
@@ -45,7 +45,15 @@ The `scheduler` tool enables dynamic scheduled tasks, supporting **one-time task
<img src="https://cdn.link-ai.tech/doc/20260202195402.png" width="800" />
</Frame>
### 2.4 Environment Variable Management
### 2.4 Browser
The built-in `browser` tool lets the Agent control a Chromium browser to visit web pages, fill forms, click elements, and take screenshots, including on dynamic JS-rendered pages. Run `cow install-browser` for a one-command install that adapts automatically to server (headless) and desktop environments:
<Frame>
<img src="https://cdn.link-ai.tech/doc/20260401115728.png" width="750" />
</Frame>
### 2.5 Environment Variable Management
Secrets required by skills are stored in an environment variable file, managed by the `env_config` tool. You can update secrets through conversation, with built-in security protection and desensitization:
@@ -57,9 +65,12 @@ Secrets required by skills are stored in an environment variable file, managed b
The Skills system provides infinite extensibility for the Agent. Each Skill consists of a description file, execution scripts (optional), and resources (optional), describing how to complete specific types of tasks. Skills allow the Agent to follow instructions for complex workflows, invoke tools, or integrate third-party systems.
- **[Skill Hub](https://skills.cowagent.ai/):** An open skill marketplace featuring official, community, and third-party skills. Install with one command.
- **Built-in skills:** Located in the project's `skills/` directory, including skill creator, image recognition, LinkAI agent, web fetch, and more. Built-in skills are automatically enabled based on dependency conditions (API keys, system commands, etc.).
- **Custom skills:** Created by users through conversation, stored in the workspace (`~/cow/skills/`), capable of implementing any complex business process or third-party integration.
Install skills: `/skill install <name>` or `cow skill install <name>`, supporting Skill Hub, GitHub, ClawHub, URL, and more.
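The dependency-based enabling of built-in skills described above can be pictured with a minimal sketch. The function name `skill_enabled` and its parameters are hypothetical and are not taken from the CowAgent code base; the sketch only assumes that a skill declares the environment variables (API keys) and system commands it needs.

```python
import os
import shutil
from typing import Iterable, Mapping, Optional


def skill_enabled(
    required_env: Iterable[str] = (),
    required_commands: Iterable[str] = (),
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True only if every required API key and system command is present.

    Hypothetical helper for illustration; the real loader may check more conditions.
    """
    env = os.environ if env is None else env
    # An API key counts as configured when the variable exists and is non-empty.
    has_keys = all(env.get(name) for name in required_env)
    # A system command counts as available when it can be found on PATH.
    has_commands = all(shutil.which(cmd) is not None for cmd in required_commands)
    return has_keys and has_commands
```

For example, a skill that needs `LINKAI_API_KEY` would stay disabled until that key is set, for instance through the `env_config` tool.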
### 3.1 Creating Skills
The `skill-creator` skill enables rapid skill creation through conversation. You can ask the Agent to codify a workflow as a skill, or send any API documentation and examples for the Agent to complete the integration directly:
@@ -77,29 +88,33 @@ The `skill-creator` skill enables rapid skill creation through conversation. You
<img src="https://cdn.link-ai.tech/doc/20260202213219.png" width="800" />
</Frame>
### 3.3 Third-party Knowledge Bases and Plugins
### 3.3 Skill Hub
The `linkai-agent` skill makes all agents on [LinkAI](https://link-ai.tech/) available as Skills for the Agent, enabling multi-agent decision making.
Visit [skills.cowagent.ai](https://skills.cowagent.ai/) to browse all available skills, or use commands in conversation:
Configuration: set `LINKAI_API_KEY` via `env_config`, then add agent descriptions in `skills/linkai-agent/config.json`:
```json
{
"apps": [
{
"app_code": "G7z6vKwp",
"app_name": "LinkAI Customer Support",
"app_description": "Select only when the user needs help with LinkAI platform questions"
},
{
"app_code": "SFY5x7JR",
"app_name": "Content Creator",
"app_description": "Use only when the user needs to create images or videos"
}
]
}
```text
/skill list --remote # Browse Skill Hub
/skill search <keyword> # Search skills
/skill install <name> # Install with one command
```
<Frame>
<img src="https://cdn.link-ai.tech/doc/20260202234350.png" width="750" />
</Frame>
Skills can also be installed from GitHub, ClawHub, LinkAI, and other third-party platforms. See [Install Skills](/en/skills/install) for details.
<img src="https://cdn.link-ai.tech/doc/20260401110103.png" width="750" />
## 4. CLI Command System
CowAgent provides two command interfaces, covering service management, skill installation, configuration, and more:
- **Terminal CLI:** Run `cow <command>` in the system terminal, supporting `start`, `stop`, `restart`, `update`, `status`, `logs`, `skill`, etc.
- **Chat commands:** Type `/<command>` in conversation. The Web console shows a command menu when you type `/`.
```bash
cow start # Start service
cow stop # Stop service
cow update # Update and restart
cow skill install pptx # Install a skill
cow install-browser # Install browser tool
```
See [Command Overview](https://docs.cowagent.ai/en/cli) for details.


@@ -31,7 +31,7 @@ CowAgent can proactively think and plan tasks, operate computers and external re
<Card title="Tool System" icon="wrench" href="/en/tools/index">
Built-in tools for file I/O, terminal execution, browser automation, scheduled tasks, messaging, and more. The Agent autonomously invokes tools to accomplish complex tasks.
</Card>
<Card title="Command System" icon="terminal" href="/en/commands/index">
<Card title="Command System" icon="terminal" href="/en/cli/index">
Provides terminal CLI and in-chat commands for process management, skill installation, configuration, context inspection, and other common operations.
</Card>
<Card title="Multiple Model Support" icon="microchip" href="/en/models/index">


@@ -6,7 +6,7 @@ description: Supported models and recommended choices for CowAgent
CowAgent supports mainstream LLMs from domestic and international providers. Model interfaces are implemented in the project's `models/` directory.
<Note>
For Agent mode, the following models are recommended based on quality and cost: MiniMax-M2.7, glm-5-turbo, kimi-k2.5, qwen3.5-plus, claude-sonnet-4-6, gemini-3.1-pro-preview
For Agent mode, the following models are recommended based on quality and cost: MiniMax-M2.7, glm-5-turbo, kimi-k2.5, qwen3.6-plus, claude-sonnet-4-6, gemini-3.1-pro-preview
</Note>
## Configuration
@@ -25,7 +25,7 @@ You can also use the [LinkAI](https://link-ai.tech) platform interface to flexib
glm-5-turbo, glm-5 and other series models
</Card>
<Card title="Qwen (Tongyi Qianwen)" href="/en/models/qwen">
qwen3.5-plus, qwen3-max and more
qwen3.6-plus, qwen3-max and more
</Card>
<Card title="Kimi" href="/en/models/kimi">
kimi-k2.5, kimi-k2 and more


@@ -5,14 +5,14 @@ description: Tongyi Qianwen model configuration
```json
{
"model": "qwen3.5-plus",
"model": "qwen3.6-plus",
"dashscope_api_key": "YOUR_API_KEY"
}
```
| Parameter | Description |
| --- | --- |
| `model` | Options include `qwen3.5-plus`, `qwen3-max`, `qwen-max`, `qwen-plus`, `qwen-turbo`, `qwq-plus`, etc. |
| `model` | Options include `qwen3.6-plus`, `qwen3.5-plus`, `qwen3-max`, `qwen-max`, `qwen-plus`, `qwen-turbo`, `qwq-plus`, etc. |
| `dashscope_api_key` | Create at [Bailian Console](https://bailian.console.aliyun.com/?tab=model#/api-key). See [official docs](https://bailian.console.aliyun.com/?tab=api#/api) |
OpenAI-compatible configuration is also supported:
@@ -20,7 +20,7 @@ OpenAI-compatible configuration is also supported:
```json
{
"bot_type": "openai",
"model": "qwen3.5-plus",
"model": "qwen3.6-plus",
"open_ai_api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
"open_ai_api_key": "YOUR_API_KEY"
}


@@ -5,6 +5,7 @@ description: CowAgent version history
| Version | Date | Description |
| --- | --- | --- |
| [2.0.5](/en/releases/v2.0.5) | 2026.04.01 | Cow CLI, Skill Hub open source, Browser tool, WeCom Bot QR scan, and more |
| [2.0.4](/en/releases/v2.0.4) | 2026.03.22 | Personal WeChat channel, new model support, Japanese docs, script refactoring and bug fixes |
| [2.0.2](/en/releases/v2.0.2) | 2026.02.27 | Web Console upgrade, multi-channel concurrency, session persistence |
| [2.0.1](/en/releases/v2.0.1) | 2026.02.27 | Built-in Web Search tool, smart context management, multiple fixes |


@@ -0,0 +1,77 @@
---
title: v2.0.5
description: CowAgent 2.0.5 - Cow CLI, Skill Hub open source, Browser tool, WeCom Bot QR scan, and more
---
## 🖥️ Cow CLI
New CLI command system for managing CowAgent from terminal and chat:
- **Terminal commands**: Run `cow <command>` for `start`, `stop`, `restart`, `update`, `status`, `logs`, etc.
- **Chat commands**: Type `/<command>` in conversation for `/help`, `/status`, `/config`, `/skill`, `/context`, `/logs`, `/version`, etc.
- **Web console**: Type `/` in the input box to open a slash command menu, with arrow-key input history
- **Windows support**: New PowerShell script `scripts/run.ps1` with `cow` command support
Docs: [Command Overview](https://docs.cowagent.ai/en/cli)
<img src="https://cdn.link-ai.tech/doc/20260401114549.png" width="750" />
## 🧩 Cow Skill Hub Open Source
[Cow Skill Hub](https://skills.cowagent.ai) is now open source and live — browse, search, install, and publish AI Agent skills:
- **One-command install**: `/skill install <name>` in chat or `cow skill install <name>` in terminal
- **Multi-source**: Install from Skill Hub, GitHub, ClawHub, LinkAI, and more
- **Search**: `/skill search` and `/skill list --remote` to browse the hub
- **Publish**: Submit your own skills at [skills.cowagent.ai/submit](https://skills.cowagent.ai/submit)
- **Mirror**: Mirror acceleration for faster downloads in China
Open source repo: [cow-skill-hub](https://github.com/zhayujie/cow-skill-hub)
Docs: [Skill Hub](https://docs.cowagent.ai/en/skills/hub), [Install Skills](https://docs.cowagent.ai/en/skills/install)
<img src="https://cdn.link-ai.tech/doc/20260401110103.png" width="750" />
## 🌐 Browser Tool
New Browser tool — Agent can control a Chromium browser to visit and interact with web pages:
- **Navigation & interaction**: `navigate`, `click`, `fill`, `select`, `scroll`, `press`, etc.
- **Page snapshot**: Compact DOM snapshot for efficient page understanding, auto-snapshot after navigation
- **Screenshot**: Save page screenshots to workspace
- **JavaScript execution**: Run custom scripts on pages
- **CLI install**: `cow install-browser` for one-command setup
- **Docker support**: Browser install built into Docker image
Docs: [Browser Tool](https://docs.cowagent.ai/en/tools/browser)
<img src="https://cdn.link-ai.tech/doc/20260401115728.png" width="750" />
## 🤖 WeCom Bot QR Code Setup
WeCom Bot channel now supports QR code scan for one-click bot creation:
- **QR scan in Web console**: Select "Scan QR" mode, scan with WeCom to auto-create and connect a bot — no manual configuration needed
- **Manual mode**: Still supports manual Bot ID and Secret input
- **Stream push optimization**: Throttled push to avoid WebSocket congestion
Docs: [WeCom Bot](https://docs.cowagent.ai/en/channels/wecom-bot)
PR: [#2735](https://github.com/zhayujie/chatgpt-on-wechat/pull/2735). Thanks [@WecomTeam](https://github.com/WecomTeam)
## 🐛 Other Improvements & Fixes
- **DeepSeek module**: Independent DeepSeek Bot with dedicated `deepseek_api_key` config ([#2719](https://github.com/zhayujie/chatgpt-on-wechat/pull/2719)). Thanks [@6vision](https://github.com/6vision)
- **Web console**: Slash command menu, input history, new model options, mobile optimization ([#2731](https://github.com/zhayujie/chatgpt-on-wechat/pull/2731)). Thanks [@zkjqd](https://github.com/zkjqd)
- **Context loss**: Fix context loss after trimming ([393f0c0](https://github.com/zhayujie/chatgpt-on-wechat/commit/393f0c0))
- **System prompt**: Fix system prompt not rebuilding on every turn ([13f5fde](https://github.com/zhayujie/chatgpt-on-wechat/commit/13f5fde))
- **Gemini**: Fix missing model attribute in GoogleGeminiBot ([#2716](https://github.com/zhayujie/chatgpt-on-wechat/pull/2716)). Thanks [@cowagent](https://github.com/cowagent)
- **WeChat channel**: Fix file send failures and filename loss ([6d9b7ba](https://github.com/zhayujie/chatgpt-on-wechat/commit/6d9b7ba), [45faa9c](https://github.com/zhayujie/chatgpt-on-wechat/commit/45faa9c))
- **Docker**: Fix volume permissions, reduce image size ([3eb8348](https://github.com/zhayujie/chatgpt-on-wechat/commit/3eb8348), [4470d4c](https://github.com/zhayujie/chatgpt-on-wechat/commit/4470d4c))
- **Security**: Fix Memory Content path traversal risk. Thanks [@August829](https://github.com/August829)
## 📦 Upgrade
Run `cow update` or `./run.sh update` to upgrade, or pull the latest code and restart. See [Upgrade Guide](https://docs.cowagent.ai/en/guide/upgrade).
**Release Date**: 2026.04.01 | [Full Changelog](https://github.com/zhayujie/chatgpt-on-wechat/compare/2.0.4...master)


@@ -17,7 +17,7 @@ CowAgent offers multiple ways to acquire skills:
- **URL** — Install from zip archives or SKILL.md links
- **Conversational creation** — Let the Agent create skills through natural language conversation
See [Install Skills](/en/skills/install) and [Skill Management Commands](/en/commands/skill) for details. You can also [create skills](/en/skills/create) through conversation.
See [Install Skills](/en/skills/install) and [Skill Management Commands](/en/cli/skill) for details. You can also [create skills](/en/skills/create) through conversation.
## Skill Loading Priority


@@ -49,5 +49,5 @@ Supports zip archives and SKILL.md file links:
```
<Tip>
All commands above work in the terminal by replacing `/skill` with `cow skill`. See [Skill Management Commands](/en/commands/skill) for full documentation.
All commands above work in the terminal by replacing `/skill` with `cow skill`. See [Skill Management Commands](/en/cli/skill) for full documentation.
</Tip>
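The substitution rule in the tip above is mechanical, so it can be sketched in a few lines. The helper name `chat_to_terminal` is hypothetical; it only illustrates the `/skill` to `cow skill` mapping and is not part of CowAgent.

```python
def chat_to_terminal(command: str) -> str:
    """Convert a chat command such as '/skill install pptx' to its terminal form."""
    command = command.strip()
    # Accept '/skill' alone or '/skill ' followed by arguments, nothing else.
    if command != "/skill" and not command.startswith("/skill "):
        raise ValueError("expected a command starting with /skill")
    return "cow skill" + command[len("/skill"):]
```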

docs/en/tools/vision.mdx (new file)

@@ -0,0 +1,72 @@
---
title: vision - Image Analysis
description: Analyze image content (recognition, description, OCR, etc.)
---
Analyze local images or image URLs using a vision API. Supports content description, text extraction (OCR), object recognition, and more.
## Model Selection
The vision tool uses a multi-level auto-selection strategy with automatic fallback — no manual configuration required:
1. **Main model** — uses the currently configured main model for image recognition (zero extra cost)
2. **Other configured models** — auto-discovers other models with configured API keys as alternatives
3. **OpenAI** — uses `open_ai_api_key` to call gpt-4.1-mini
4. **LinkAI** — uses `linkai_api_key` to call LinkAI vision service
When `use_linkai=true`, LinkAI is promoted to the highest priority.
If the current provider fails, the tool automatically tries the next one until it succeeds or all fail.
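The selection order and fallback described above can be sketched as a simple loop. This is an illustration under stated assumptions, not the project's implementation: the `Provider` tuple, the provider name `"linkai"`, and both function names are hypothetical, and each provider is assumed to be a callable that returns an answer or raises on failure.

```python
from typing import Callable, List, Tuple

# Hypothetical provider shape: (name, callable(image, question) -> answer).
Provider = Tuple[str, Callable[[str, str], str]]


def order_providers(providers: List[Provider], use_linkai: bool) -> List[Provider]:
    """Keep the given priority order, but move "linkai" first when use_linkai is set."""
    if not use_linkai:
        return list(providers)
    linkai = [p for p in providers if p[0] == "linkai"]
    others = [p for p in providers if p[0] != "linkai"]
    return linkai + others


def analyze_with_fallback(
    providers: List[Provider], image: str, question: str, use_linkai: bool = False
) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, answer) from the first success."""
    errors = []
    for name, call in order_providers(providers, use_linkai):
        try:
            return name, call(image, question)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all vision providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the answer mirrors the idea of reporting which model actually handled the request.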
### Supported Models
| Vendor | Vision Model | Notes |
| --- | --- | --- |
| OpenAI / Compatible | Main model | All OpenAI-compatible multimodal models |
| Qwen (DashScope) | Main model | Via MultiModalConversation API |
| Claude | Main model | Anthropic native image format |
| Gemini | Main model | inlineData format |
| Doubao | Main model | doubao-seed-2-0 series natively supported |
| Kimi (Moonshot) | Main model | kimi-k2.5 natively supported |
| ZhipuAI | glm-5v-turbo | Always uses dedicated vision model |
| MiniMax | MiniMax-Text-01 | Always uses dedicated vision model |
<Note>
ZhipuAI and MiniMax text models do not support image understanding, so their dedicated vision models are always used automatically.
</Note>
## Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `image` | string | Yes | Local file path or HTTP(S) image URL |
| `question` | string | Yes | Question to ask about the image |
Supported image formats: jpg, jpeg, png, gif, webp
## Custom Configuration
To specify a particular model for the vision tool, add to `config.json`:
```json
{
"tool": {
"vision": {
"model": "gpt-4o"
}
}
}
```
In most cases no configuration is needed. The tool works automatically as long as the main model supports multimodal input or any vision-capable API key is configured.
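As a rough sketch of how such an override might be read, the snippet below looks up `tool.vision.model` and otherwise falls back to the top-level `model` key. The function name is hypothetical, and treating the top-level `model` as the main model is an assumption based on the model configuration examples, not a confirmed detail of the loader.

```python
import json
from typing import Optional


def resolve_vision_model(config: dict) -> Optional[str]:
    """Return the vision override if set, else the main model (assumed key: "model")."""
    override = config.get("tool", {}).get("vision", {}).get("model")
    return override or config.get("model")


# Example with an explicit override, matching the config.json fragment above.
config = json.loads('{"model": "qwen3.6-plus", "tool": {"vision": {"model": "gpt-4o"}}}')
selected = resolve_vision_model(config)
```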
## Use Cases
- Describe image content
- Extract text from images (OCR)
- Identify objects, colors, scenes
- Analyze screenshots and scanned documents
<Note>
Images larger than 1MB are automatically compressed (max edge 1536px). All images (including remote URLs) are converted to base64 for transmission to ensure compatibility with all model backends.
</Note>
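The size rule in the note above can be illustrated with a small standard-library sketch. It only computes the target dimensions and the base64 payload; actual resizing would need an imaging library and is left out. The helper names are hypothetical, and reading "1MB" as 1024 × 1024 bytes is an assumption.

```python
import base64
from typing import Tuple

MAX_BYTES = 1024 * 1024  # assumed meaning of the 1MB threshold
MAX_EDGE = 1536          # longest edge after compression, per the note


def needs_compression(num_bytes: int) -> bool:
    """Images strictly larger than the threshold are compressed."""
    return num_bytes > MAX_BYTES


def target_size(width: int, height: int, max_edge: int = MAX_EDGE) -> Tuple[int, int]:
    """Scale (width, height) so the longer edge is at most max_edge, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    scale = max_edge / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def to_base64(data: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string for transmission."""
    return base64.b64encode(data).decode("ascii")
```

For instance, a 3072×2048 image would be scaled to 1536×1024, while an 800×600 image keeps its size.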


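@@ -47,7 +47,7 @@ description: Install and manage CowAgent with a one-command script
| `cow update` | Update code and restart |
| `cow install-browser` | Install browser tool dependencies |
For more commands and usage, see the [Commands documentation](/commands/index).
For more commands and usage, see the [Commands documentation](/cli/index).
<Note>
If the `cow` command is not available, you can use `./run.sh <command>` (Linux/macOS) or `.\scripts\run.ps1 <command>` (Windows) instead. Both are functionally equivalent.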


@@ -36,7 +36,7 @@ pip3 install -e .
Restart the service after the update completes:
```bash
# Use Cow CLI
# Use Cow CLI (recommended)
cow restart
# Or use run.sh


@@ -1,6 +1,6 @@
---
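title: Features
description: CowAgent long-term memory, task planning, and skills system in detail
description: CowAgent long-term memory, task planning, skills system, CLI commands, and browser tool in detail
---
## 1. Long-term Memory
@@ -19,7 +19,7 @@ description: CowAgent long-term memory, task planning, and skills system in detail
Tools are the core of how the Agent accesses operating system resources. The Agent intelligently selects and invokes tools based on task requirements, performing file read/write, command execution, scheduled tasks, and more. Built-in tools are implemented in the project's `agent/tools/` directory.
**Key tools:** file read/write/edit, Bash terminal, file send, scheduler, memory search, web search, environment config, and more.
**Key tools:** file read/write/edit, Bash terminal, browser, file send, scheduler, memory search, web search, environment config, and more.
### 2.1 Terminal and File Access
@@ -45,7 +45,15 @@ description: CowAgent long-term memory, task planning, and skills system in detail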
<img src="https://cdn.link-ai.tech/doc/20260202195402.png" width="800" />
</Frame>
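### 2.4 Environment Variable Management
### 2.4 Browser
The built-in `browser` tool lets the Agent control a browser to visit web pages, fill forms, click elements, and take screenshots, including on dynamic JS-rendered pages. Run `cow install-browser` for a one-command install that adapts automatically to server (headless) and desktop environments: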
<Frame>
<img src="https://cdn.link-ai.tech/doc/20260401115728.png" width="750" />
</Frame>
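### 2.5 Environment Variable Management
Secrets required by skills are stored in an environment variable file, managed by the `env_config` tool. You can update secrets through conversation, and the tool has built-in security protection and desensitization:
@@ -57,9 +65,12 @@ description: CowAgent long-term memory, task planning, and skills system in detail
The Skills system provides infinite extensibility for the Agent. Each Skill consists of a description file, execution scripts (optional), and resources (optional), describing how to complete specific types of tasks. Skills allow the Agent to follow instructions for complex workflows, invoke tools, or integrate third-party systems.
- **[Skill Hub](https://skills.cowagent.ai/):** An open skill marketplace featuring official, community, and third-party skills, with one-command install.
- **Built-in skills:** Located in the project's `skills/` directory, including skill creator, image recognition, LinkAI agent, web fetch, and more. Built-in skills are automatically enabled based on dependency conditions (API keys, system commands, etc.).
- **Custom skills:** Created by users through conversation, stored in the workspace (`~/cow/skills/`), capable of implementing any complex business process or third-party integration.
Install skills: `/skill install <name>` or `cow skill install <name>`, with support for Skill Hub, GitHub, ClawHub, URL, and other sources.
### 3.1 Creating Skills
The `skill-creator` skill enables rapid skill creation through conversation. You can ask the Agent to codify a workflow as a skill, or send any API documentation and examples for the Agent to complete the integration directly:
@@ -77,29 +88,36 @@ description: CowAgent long-term memory, task planning, and skills system in detail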
<img src="https://cdn.link-ai.tech/doc/20260202213219.png" width="800" />
</Frame>
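### 3.3 Third-party Knowledge Bases and Plugins
### 3.3 Skill Hub
The `linkai-agent` skill makes all agents on [LinkAI](https://link-ai.tech/) available as Skills for the Agent, enabling multi-agent decision making.
Visit [skills.cowagent.ai](https://skills.cowagent.ai/) to browse all available skills, or run these commands in conversation:
Configuration: set `LINKAI_API_KEY` via `env_config`, then add agent descriptions in `skills/linkai-agent/config.json`: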
```json
{
"apps": [
{
"app_code": "G7z6vKwp",
"app_name": "LinkAI Customer Support",
"app_description": "Select only when the user needs help with LinkAI platform questions"
},
{
"app_code": "SFY5x7JR",
"app_name": "Content Creator",
"app_description": "Use only when the user needs to create images or videos"
}
]
}
```text
/skill list --remote # Browse Skill Hub
/skill search <keyword> # Search skills
/skill install <name> # Install with one command
```
<Frame>
<img src="https://cdn.link-ai.tech/doc/20260202234350.png" width="750" />
</Frame>
Skills from GitHub, ClawHub, LinkAI, and other third-party platforms can also be installed. See [Install Skills](/skills/install) for details.
<img src="https://cdn.link-ai.tech/doc/20260401110103.png" width="750" />
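## 4. CLI Command System
CowAgent provides two command interfaces covering day-to-day operations such as service management, skill installation, and configuration:
- **Terminal CLI:** Run `cow <command>` in the system terminal, supporting `start`, `stop`, `restart`, `update`, `status`, `logs`, `skill`, etc.
- **Chat commands:** Type `/<command>` in conversation. In the Web console, typing `/` opens a command menu for quick selection.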
```bash
cow start # Start service
cow stop # Stop service
cow update # Update and restart
cow skill install pptx # Install a skill
cow install-browser # Install browser tool
```
See [Command Overview](https://docs.cowagent.ai/cli) for the full command reference.
<img src="https://cdn.link-ai.tech/doc/20260401114549.png" width="750" />


@@ -36,7 +36,7 @@ CowAgent 支持灵活切换多种模型,能处理文本、语音、图片、
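<Card title="Tool System" icon="wrench" href="/tools/index">
Built-in tools for file I/O, terminal execution, browser automation, scheduled tasks, messaging, and more. The Agent autonomously invokes tools to accomplish complex tasks.
</Card>
<Card title="Command System" icon="terminal" href="/commands/index">
<Card title="Command System" icon="terminal" href="/cli/index">
Provides terminal CLI and in-chat commands for process management, skill installation, configuration changes, context inspection, and other common operations.
</Card>
<Card title="Multiple Model Support" icon="microchip" href="/models/index">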


@@ -13,6 +13,7 @@
<a href="https://cowagent.ai/">🌐 Website</a> &nbsp;·&nbsp;
<a href="https://docs.cowagent.ai/en/intro/index">📖 Docs</a> &nbsp;·&nbsp;
<a href="https://docs.cowagent.ai/en/guide/quick-start">🚀 Quick Start</a> &nbsp;·&nbsp;
<a href="https://skills.cowagent.ai/">🧩 Skill Hub</a> &nbsp;·&nbsp;
<a href="https://link-ai.tech/cowagent/create">☁️ Try Online</a>
</p>
@@ -41,6 +42,8 @@
## Changelog
> **2026.04.01:** [v2.0.5](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.5): Cow CLI, open-sourced Skill Hub, Browser tool, WeCom Bot QR scan setup, and more.
> **2026.02.27:** [v2.0.2](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.2): Complete Web Console overhaul (streaming chat, model/Skill/memory/channel/scheduler/log management), multi-channel concurrency, session persistence, and new models including Gemini 3.1 Pro, Claude 4.6 Sonnet, and Qwen3.5 Plus.
> **2026.02.13:** [v2.0.1](https://github.com/zhayujie/chatgpt-on-wechat/releases/tag/2.0.1): Built-in Web Search tool, smart context trimming, dynamic runtime info updates, Windows compatibility, and fixes for scheduler memory loss, Feishu connection issues, and more.
@@ -73,7 +76,7 @@ irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex
After the script runs, the Web service starts by default. Visit `http://localhost:9899/chat` to start chatting.
Script usage: [One-click Install](https://docs.cowagent.ai/ja/guide/quick-start). After installation, manage the service with [CLI commands](https://docs.cowagent.ai/ja/commands/index) such as `cow start` and `cow stop`.
Script usage: [One-click Install](https://docs.cowagent.ai/ja/guide/quick-start). After installation, manage the service with [CLI commands](https://docs.cowagent.ai/ja/cli/index) such as `cow start` and `cow stop`.
### Manual Installation
@@ -97,7 +100,7 @@ pip3 install -r requirements-optional.txt # optional but recommended
pip3 install -e .
```
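After installation, use the `cow` command to manage the service (start, stop, update, etc.) and Skills. See the [Commands documentation](https://docs.cowagent.ai/ja/commands/index).
After installation, use the `cow` command to manage the service (start, stop, update, etc.) and Skills. See the [Commands documentation](https://docs.cowagent.ai/ja/cli/index).
**4. Install the browser (optional)**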
@@ -162,7 +165,7 @@ sudo docker logs -f chatgpt-on-wechat
| GLM | `glm-5-turbo` |
| Kimi | `kimi-k2.5` |
| Doubao | `doubao-seed-2-0-code-preview-260215` |
| Qwen | `qwen3.5-plus` |
| Qwen | `qwen3.6-plus` |
| Claude | `claude-sonnet-4-6` |
| Gemini | `gemini-3.1-pro-preview` |
| OpenAI | `gpt-5.4` |
@@ -223,6 +226,7 @@ Coding Planは各プロバイダーが提供する月額サブスクリプショ
## 🔗 Related Projects
- [Cow Skill Hub](https://github.com/zhayujie/cow-skill-hub): Open skill marketplace for AI Agents. Browse, search, install, and publish Skills for CowAgent, OpenClaw, Claude Code, and more.
- [bot-on-anything](https://github.com/zhayujie/bot-on-anything): Lightweight and highly extensible LLM application framework supporting Slack, Telegram, Discord, Gmail, and more.
- [AgentMesh](https://github.com/MinimalFuture/AgentMesh): Open-source multi-agent framework for solving complex problems through agent team collaboration.
@@ -232,7 +236,7 @@ FAQ: <https://github.com/zhayujie/chatgpt-on-wechat/wiki/FAQs>
## 🛠️ Contributing
New channels are welcome; use the [Feishu channel](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/channel/feishu/feishu_channel.py) as a reference. New Skills are welcome too; see the [Skill Creation docs](https://docs.cowagent.ai/ja/skills/create).
New channels are welcome; use the [Feishu channel](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/channel/feishu/feishu_channel.py) as a reference. New Skills are welcome too; see the [Skill Creation docs](https://docs.cowagent.ai/ja/skills/create), or submit them to [Skill Hub](https://skills.cowagent.ai/submit).
## ✉ Contact


@@ -47,7 +47,7 @@ Linux、macOS、Windowsに対応しています。Python 3.7〜3.12が必要で
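| `cow update` | Update code and restart |
| `cow install-browser` | Install browser tool dependencies |
See the [Commands documentation](/ja/commands/index) for more details.
See the [Commands documentation](/ja/cli/index) for more details.
<Note>
If the `cow` command is not available, you can use `./run.sh <command>` (Linux/macOS) or `.\scripts\run.ps1 <command>` (Windows) instead. Both are functionally equivalent.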


@@ -1,6 +1,6 @@
---
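title: Feature Details
description: CowAgent long-term memory, task planning, and Skill system in detail
description: CowAgent long-term memory, task planning, Skill system, CLI commands, and browser tool in detail
---
## 1. Long-term Memory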
@@ -19,7 +19,7 @@ description: CowAgent の長期記憶、タスク計画、Skill システムの
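Tools are the core of how the Agent accesses operating system resources. The Agent intelligently selects and invokes tools based on task requirements, performing file read/write, command execution, scheduled tasks, and more. Built-in tools are implemented in the project's `agent/tools/` directory.
**Key tools:** file read/write/edit, Bash terminal, file send, scheduler, memory search, web search, environment config, and more.
**Key tools:** file read/write/edit, Bash terminal, browser, file send, scheduler, memory search, web search, environment config, and more.
### 2.1 Terminal and File Access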
@@ -45,7 +45,15 @@ OS のターミナルとファイルシステムへのアクセスは、最も
<img src="https://cdn.link-ai.tech/doc/20260202195402.png" width="800" />
</Frame>
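### 2.4 Environment Variable Management
### 2.4 Browser
The built-in `browser` tool lets the Agent control a Chromium browser to visit web pages, fill forms, click elements, and take screenshots, including on dynamic JS-rendered pages. Run `cow install-browser` for a one-command install that adapts automatically to server (headless) and desktop environments: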
<Frame>
<img src="https://cdn.link-ai.tech/doc/20260401115728.png" width="750" />
</Frame>
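### 2.5 Environment Variable Management
Secrets required by Skills are stored in an environment variable file, managed by the `env_config` tool. You can update secrets through conversation, with built-in security protection and masking: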
@@ -57,9 +65,12 @@ Skill が必要とするシークレットキーは環境変数ファイルに
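The Skill system provides infinite extensibility for the Agent. Each Skill consists of a description file, execution scripts (optional), and resources (optional), describing how to complete specific types of tasks. Skills allow the Agent to follow instructions for complex workflows, invoke tools, and integrate third-party systems.
- **[Skill Hub](https://skills.cowagent.ai/):** An open Skill marketplace featuring official, community, and third-party Skills, with one-command install.
- **Built-in Skills:** Located in the project's `skills/` directory, including Skill creator, image recognition, LinkAI Agent, web fetch, and more. Built-in Skills are automatically enabled based on dependency conditions (API keys, system commands, etc.).
- **Custom Skills:** Created by users through conversation and stored in the workspace (`~/cow/skills/`), capable of implementing any complex business process or third-party integration.
Install Skills: `/skill install <name>` or `cow skill install <name>`, with support for Skill Hub, GitHub, ClawHub, URL, and other sources.
### 3.1 Creating Skills
The `skill-creator` Skill enables rapid Skill creation through conversation. You can ask the Agent to codify a workflow as a Skill, or send API documentation and examples for the Agent to complete the integration directly: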
@@ -77,29 +88,33 @@ Skill システムは Agent に無限の拡張性を提供します。各 Skill
<img src="https://cdn.link-ai.tech/doc/20260202213219.png" width="800" />
</Frame>
### 3.3 Third-party Knowledge Bases and Plugins
### 3.3 Skill Hub
The `linkai-agent` Skill makes all Agents on [LinkAI](https://link-ai.tech/) available as Skills, enabling multi-Agent decision making.
Visit [skills.cowagent.ai](https://skills.cowagent.ai/) to browse all available Skills, or run these commands in conversation:
Configuration: set `LINKAI_API_KEY` via `env_config`, then add Agent descriptions in `skills/linkai-agent/config.json`:
```json
{
"apps": [
{
"app_code": "G7z6vKwp",
"app_name": "LinkAI Customer Support",
"app_description": "Select only when the user needs help with LinkAI platform questions"
},
{
"app_code": "SFY5x7JR",
"app_name": "Content Creator",
"app_description": "Use only when the user needs to create images or videos"
}
]
}
```text
/skill list --remote # Browse Skill Hub
/skill search <keyword> # Search Skills
/skill install <name> # Install with one command
```
<Frame>
<img src="https://cdn.link-ai.tech/doc/20260202234350.png" width="750" />
</Frame>
Skills from GitHub, ClawHub, LinkAI, and other third-party platforms can also be installed. See [Install Skills](/ja/skills/install) for details.
<img src="https://cdn.link-ai.tech/doc/20260401110103.png" width="750" />
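## 4. CLI Command System
CowAgent provides two command interfaces, covering service management, Skill installation, configuration changes, and more:
- **Terminal CLI:** Run `cow <command>` in the system terminal, supporting `start`, `stop`, `restart`, `update`, `status`, `logs`, `skill`, etc.
- **Chat commands:** Type `/<command>` in conversation. The Web console shows a command menu when you type `/`.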
```bash
cow start # Start service
cow stop # Stop service
cow update # Update and restart
cow skill install pptx # Install a Skill
cow install-browser # Install browser tool
```
See [Command Overview](https://docs.cowagent.ai/ja/cli) for details.


@@ -31,7 +31,7 @@ CowAgent は自ら思考しタスクを計画し、コンピュータや外部
<Card title="Tool System" icon="wrench" href="/ja/tools/index">
Built-in tools for file I/O, terminal execution, browser automation, scheduled tasks, messaging, and more. The Agent autonomously invokes tools to accomplish complex tasks.
</Card>
<Card title="Command System" icon="terminal" href="/ja/commands/index">
<Card title="Command System" icon="terminal" href="/ja/cli/index">
Provides terminal CLI and in-chat commands for process management, Skill installation, configuration changes, context inspection, and other common operations.
</Card>
<Card title="Multiple Model Support" icon="microchip" href="/ja/models/index">


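@@ -6,7 +6,7 @@ description: Supported models and recommended choices for CowAgent
CowAgent supports mainstream LLMs from domestic and international providers. Model interfaces are implemented in the project's `models/` directory.
<Note>
For Agent mode, the following models are recommended for their balance of quality and cost: MiniMax-M2.7, glm-5-turbo, kimi-k2.5, qwen3.5-plus, claude-sonnet-4-6, gemini-3.1-pro-preview
For Agent mode, the following models are recommended for their balance of quality and cost: MiniMax-M2.7, glm-5-turbo, kimi-k2.5, qwen3.6-plus, claude-sonnet-4-6, gemini-3.1-pro-preview
</Note>
## Configuration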
@@ -25,7 +25,7 @@ CowAgentは国内外の主要なLLMをサポートしています。モデルイ
glm-5-turbo, glm-5 and other series models
</Card>
<Card title="Qwen (通义千问)" href="/ja/models/qwen">
qwen3.5-plus, qwen3-max and more
qwen3.6-plus, qwen3-max and more
</Card>
<Card title="Kimi" href="/ja/models/kimi">
kimi-k2.5, kimi-k2 and more


@@ -1,18 +1,18 @@
---
title: Qwen (通义千问)
description: 通义千问モデルの設定
title: Qwen (通義千問)
description: 通義千問モデルの設定
---
```json
{
"model": "qwen3.5-plus",
"model": "qwen3.6-plus",
"dashscope_api_key": "YOUR_API_KEY"
}
```
| Parameter | Description |
| --- | --- |
| `model` | Options include `qwen3.5-plus`, `qwen3-max`, `qwen-max`, `qwen-plus`, `qwen-turbo`, `qwq-plus`, etc. |
| `model` | Options include `qwen3.6-plus`, `qwen3.5-plus`, `qwen3-max`, `qwen-max`, `qwen-plus`, `qwen-turbo`, `qwq-plus`, etc. |
| `dashscope_api_key` | Create at [Bailian Console](https://bailian.console.aliyun.com/?tab=model#/api-key). See [official docs](https://bailian.console.aliyun.com/?tab=api#/api) |
OpenAI-compatible configuration is also supported:
@@ -20,7 +20,7 @@ OpenAI-compatible configuration is also supported:
```json
{
"bot_type": "openai",
"model": "qwen3.5-plus",
"model": "qwen3.6-plus",
"open_ai_api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
"open_ai_api_key": "YOUR_API_KEY"
}


@@ -5,6 +5,7 @@ description: CowAgent version history
| Version | Date | Description |
| --- | --- | --- |
| [2.0.5](/ja/releases/v2.0.5) | 2026.04.01 | Cow CLI, Skill Hub open source, Browser tool, WeCom QR scan setup, and other improvements |
| [2.0.4](/ja/releases/v2.0.4) | 2026.03.22 | Personal WeChat channel, new model support, Japanese docs, script refactoring and multiple fixes |
| [2.0.2](/ja/releases/v2.0.2) | 2026.02.27 | Web Console upgrade, multi-channel concurrency, session persistence |
| [2.0.1](/en/releases/v2.0.1) | 2026.02.27 | Built-in Web Search tool, smart context management, multiple fixes |


@@ -0,0 +1,77 @@
---
title: v2.0.5
description: CowAgent 2.0.5 - Cow CLI, Skill Hub open source, Browser tool, WeCom QR scan setup, and other improvements
---
## 🖥️ Cow CLI Command System
New CLI command system for managing CowAgent from both the terminal and chat:
- **Terminal commands**: Run `cow <command>` for `start`, `stop`, `restart`, `update`, `status`, `logs`, etc.
- **Chat commands**: Type `/<command>` in conversation for `/help`, `/status`, `/config`, `/skill`, `/context`, `/logs`, `/version`, etc.
- **Web console**: Type `/` in the input box to open the slash command menu; use the arrow keys to browse input history
- **Windows support**: New PowerShell script `scripts/run.ps1` with `cow` command support
Docs: [Command Overview](https://docs.cowagent.ai/ja/cli)
<img src="https://cdn.link-ai.tech/doc/20260401114549.png" width="750" />
## 🧩 Cow Skill Hub Open Source
[Cow Skill Hub](https://skills.cowagent.ai) (skill marketplace) is now open source. Browse, search, install, and publish AI Agent skills:
- **One-command install**: `/skill install <name>` in chat or `cow skill install <name>` in terminal
- **Multi-source**: Install from Skill Hub, GitHub, ClawHub, LinkAI, and more
- **Search**: `/skill search` and `/skill list --remote` to browse and search the hub
- **Publish**: Submit your own skills at [skills.cowagent.ai/submit](https://skills.cowagent.ai/submit)
- **Mirror**: Mirror downloads for faster access in mainland China
Open source repo: [cow-skill-hub](https://github.com/zhayujie/cow-skill-hub)
Docs: [Skill Hub](https://docs.cowagent.ai/ja/skills/hub), [Install Skills](https://docs.cowagent.ai/ja/skills/install)
<img src="https://cdn.link-ai.tech/doc/20260401110103.png" width="750" />
## 🌐 Browser Tool
New Browser tool: the Agent can control a Chromium browser to visit and interact with web pages:
- **Navigation & interaction**: `navigate`, `click`, `fill`, `select`, `scroll`, `press`, etc.
- **Page snapshot**: Compact DOM snapshot that lets the Agent understand page structure efficiently, with an automatic snapshot after navigation
- **Screenshot**: Save page screenshots to the workspace
- **JavaScript execution**: Run custom scripts on pages
- **CLI install**: `cow install-browser` for one-command setup
- **Docker support**: Browser install built into the Docker image
Docs: [Browser Tool](https://docs.cowagent.ai/ja/tools/browser)
<img src="https://cdn.link-ai.tech/doc/20260401115728.png" width="750" />
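## 🤖 WeCom Bot Creation via QR Scan
The WeCom Bot channel now supports one-click creation by scanning a QR code:
- **Scan in the Web console**: select the "Scan to connect" mode and scan with WeCom; the bot is created and connected automatically
- **Manual mode**: entering an existing Bot ID and Secret by hand is still supported
- **Streaming delivery optimization**: throttling added to avoid WebSocket congestion
Docs: [WeCom Bot](https://docs.cowagent.ai/ja/channels/wecom-bot)
PR: [#2735](https://github.com/zhayujie/chatgpt-on-wechat/pull/2735). Thanks [@WecomTeam](https://github.com/WecomTeam)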
## 🐛 Other Improvements and Fixes
- **DeepSeek module**: standalone DeepSeek Bot with a dedicated `deepseek_api_key` setting ([#2719](https://github.com/zhayujie/chatgpt-on-wechat/pull/2719)). Thanks [@6vision](https://github.com/6vision)
- **Web console**: slash-command menu, input history, new model options, mobile optimization ([#2731](https://github.com/zhayujie/chatgpt-on-wechat/pull/2731)). Thanks [@zkjqd](https://github.com/zkjqd)
- **Context**: fixed context loss after trimming ([393f0c0](https://github.com/zhayujie/chatgpt-on-wechat/commit/393f0c0))
- **System prompt**: fixed the system prompt not being rebuilt on every turn ([13f5fde](https://github.com/zhayujie/chatgpt-on-wechat/commit/13f5fde))
- **Gemini**: fixed the missing model attribute on GoogleGeminiBot ([#2716](https://github.com/zhayujie/chatgpt-on-wechat/pull/2716)). Thanks [@cowagent](https://github.com/cowagent)
- **WeChat channel**: fixed file send failures and lost file names ([6d9b7ba](https://github.com/zhayujie/chatgpt-on-wechat/commit/6d9b7ba), [45faa9c](https://github.com/zhayujie/chatgpt-on-wechat/commit/45faa9c))
- **Docker**: fixed volume permissions and reduced image size ([3eb8348](https://github.com/zhayujie/chatgpt-on-wechat/commit/3eb8348), [4470d4c](https://github.com/zhayujie/chatgpt-on-wechat/commit/4470d4c))
- **Security**: fixed a path traversal risk in Memory Content. Thanks [@August829](https://github.com/August829)
## 📦 Upgrade
Upgrade with `cow update` or `./run.sh update`, or pull the code manually and restart. See the [upgrade guide](https://docs.cowagent.ai/ja/guide/upgrade) for details.
**Release date**: 2026.04.01 | [Full Changelog](https://github.com/zhayujie/chatgpt-on-wechat/compare/2.0.4...master)


@@ -17,7 +17,7 @@ CowAgent ではスキルを取得する複数の方法を提供しています
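- **URL**: install from a zip archive or a SKILL.md link
- **Create via chat**: have the Agent create a skill automatically through natural-language conversation
See [Installing skills](/ja/skills/install) and [Skill management commands](/ja/commands/skill) for details. You can also [create skills](/ja/skills/create) through conversation.
See [Installing skills](/ja/skills/install) and [Skill management commands](/ja/cli/skill) for details. You can also [create skills](/ja/skills/create) through conversation.
## Skill loading priority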


@@ -49,5 +49,5 @@ zip アーカイブと SKILL.md ファイルリンクに対応:
```
<Tip>
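All of the commands above can be used in the terminal by replacing `/skill` with `cow skill`. See [Skill management commands](/ja/commands/skill) for the full command documentation.
All of the commands above can be used in the terminal by replacing `/skill` with `cow skill`. See [Skill management commands](/ja/cli/skill) for the full command documentation.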
</Tip>

72
docs/ja/tools/vision.mdx Normal file

@@ -0,0 +1,72 @@
---
title: vision - Image analysis
description: Analyze image content (recognition, description, OCR, etc.)
---
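Analyzes local images or image URLs using a Vision API. Supports content description, text extraction (OCR), object recognition, and more.
## Model selection
The Vision tool uses multi-level automatic selection with automatic fallback, so it works without manual configuration:
1. **Main model**: runs image recognition with the currently configured main model (no additional cost)
2. **Other configured models**: automatically discovers other multimodal models that have an API key configured
3. **OpenAI**: calls gpt-4.1-mini using `open_ai_api_key`
4. **LinkAI**: calls the LinkAI vision service using `linkai_api_key`
When `use_linkai=true`, LinkAI takes top priority.
If the current provider fails, the next provider is tried automatically until one succeeds or all have failed.
### Supported models
| Vendor | Vision model | Notes |
| --- | --- | --- |
| OpenAI / compatible protocol | Main model | Supports every OpenAI-compatible multimodal model |
| Tongyi Qianwen (DashScope) | Main model | Via the MultiModalConversation API |
| Claude | Main model | Anthropic native image format |
| Gemini | Main model | inlineData format |
| Doubao | Main model | Natively supported by the doubao-seed-2-0 series |
| Kimi (Moonshot) | Main model | Natively supported by kimi-k2.5 |
| Zhipu AI | glm-5v-turbo | Always uses the dedicated vision model |
| MiniMax | MiniMax-Text-01 | Always uses the dedicated vision model |
<Note>
The text models from Zhipu AI and MiniMax do not support image understanding, so the corresponding dedicated vision model is used automatically.
</Note>
## Parameters
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `image` | string | Yes | Local file path or HTTP(S) image URL |
| `question` | string | Yes | The question to ask about the image |
Supported image formats: jpg, jpeg, png, gif, webp
## Custom configuration
To specify the model used by the Vision tool, add the following to `config.json`: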
```json
{
"tool": {
"vision": {
"model": "gpt-4o"
}
}
}
```
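In most cases no configuration is needed. The tool works automatically as long as the main model supports multimodal input or a vision-capable API key is configured.
## Use cases
- Describing image content
- Extracting text from images (OCR)
- Identifying objects, colors, and scenes
- Analyzing screenshots and scanned documents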
<Note>
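Images larger than 1 MB are compressed automatically (longest side 1536px). All images (including remote URLs) are converted to base64 before being sent, ensuring compatibility with every model backend.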
</Note>
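The provider order described in the Vision docs above (main model, then other configured models, then OpenAI, then LinkAI) can be illustrated with a minimal sketch. This is not the project's actual resolver: `run_with_fallback`, the provider tuples, and the two stub providers are hypothetical names introduced here, and the only thing borrowed from the changeset is the result shape (`model`/`content`/`usage` on success, `error`/`message` on failure).

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical provider signature: (image_url, question) -> result dict
# shaped like the call_vision results in this changeset.
VisionFn = Callable[[str, str], Dict]


def run_with_fallback(providers: List[Tuple[str, VisionFn]],
                      image_url: str, question: str) -> Dict:
    """Try each provider in order and return the first non-error result."""
    errors = []
    for name, fn in providers:
        try:
            result = fn(image_url, question)
        except Exception as exc:  # a raised exception counts as a failure
            errors.append(f"{name}: {exc}")
            continue
        if not result.get("error"):
            return result
        errors.append(f"{name}: {result.get('message', 'unknown error')}")
    return {"error": True,
            "message": "all providers failed: " + "; ".join(errors)}


# Stub providers, for illustration only.
def main_model(image_url, question):
    return {"error": True, "message": "model is not multimodal"}


def openai_fallback(image_url, question):
    return {"model": "gpt-4.1-mini", "content": "a cat", "usage": {}}


result = run_with_fallback(
    [("main", main_model), ("openai", openai_fallback)],
    "https://example.com/cat.png", "What is in this image?")
print(result["model"])
```

Collecting the per-provider error messages keeps the final failure actionable when every provider has been exhausted.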


@@ -6,19 +6,20 @@ description: CowAgent 支持的模型及推荐选择
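CowAgent supports large language models from mainstream vendors in China and abroad. The model interfaces are implemented in the project's `models/` directory.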
<Note>
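In Agent mode the following models are recommended; choose based on quality and cost: MiniMax-M2.7, glm-5-turbo, kimi-k2.5, qwen3.5-plus, claude-sonnet-4-6, gemini-3.1-pro-preview
In Agent mode the following models are recommended; choose based on quality and cost: MiniMax-M2.7, glm-5-turbo, kimi-k2.5, qwen3.6-plus, claude-sonnet-4-6, gemini-3.1-pro-preview
The [LinkAI](https://link-ai.tech) platform API is also supported, allowing flexible switching between models along with Agent capabilities such as knowledge bases, workflows, and plugins.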
</Note>
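## Configuration
Based on the chosen model, fill in the corresponding model name and API key in `config.json`. Every model can also be accessed in OpenAI-compatible mode: set `bot_type` to `openai` and configure `open_ai_api_base` and `open_ai_api_key`.
The [LinkAI](https://link-ai.tech) platform API is also supported, allowing flexible switching between models along with Agent capabilities such as knowledge bases, workflows, and plugins.
You can also manage model configuration online through the [Web console](/channels/web), without editing the config file by hand:
**Option 1 (recommended):** Manage model configuration online through the [Web console](/channels/web), without editing the config file by hand: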
<img width="850" src="https://cdn.link-ai.tech/doc/20260227173811.png" />
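**Option 2:** Edit `config.json` by hand, filling in the model name and API key for the chosen model. Every model can also be accessed in OpenAI-compatible mode: set `bot_type` to `openai` and configure `open_ai_api_base` and `open_ai_api_key`.
## Supported models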
<CardGroup cols={2}>
@@ -29,7 +30,7 @@ CowAgent 支持国内外主流厂商的大语言模型,模型接口实现在
glm-5-turbo, glm-5, and other models in the series
</Card>
<Card title="通义千问 Qwen" href="/models/qwen">
qwen3.5-plus, qwen3-max, etc.
qwen3.6-plus, qwen3-max, etc.
</Card>
<Card title="Kimi" href="/models/kimi">
kimi-k2.5, kimi-k2, etc.
@@ -54,6 +55,7 @@ CowAgent 支持国内外主流厂商的大语言模型,模型接口实现在
</Card>
</CardGroup>
<Tip>
For the full list of model names, see the project's [`common/const.py`](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/common/const.py) file.
</Tip>


@@ -5,14 +5,14 @@ description: 通义千问模型配置
```json
{
"model": "qwen3.5-plus",
"model": "qwen3.6-plus",
"dashscope_api_key": "YOUR_API_KEY"
}
```
| Parameter | Description |
| --- | --- |
| `model` | Accepts `qwen3.5-plus`, `qwen3-max`, `qwen-max`, `qwen-plus`, `qwen-turbo`, `qwq-plus`, etc. |
| `model` | Accepts `qwen3.6-plus`, `qwen3.5-plus`, `qwen3-max`, `qwen-max`, `qwen-plus`, `qwen-turbo`, `qwq-plus`, etc. |
| `dashscope_api_key` | Create one in the [Bailian console](https://bailian.console.aliyun.com/?tab=model#/api-key); see the [official docs](https://bailian.console.aliyun.com/?tab=api#/api) |
OpenAI-compatible access is also supported:
@@ -20,7 +20,7 @@ description: 通义千问模型配置
```json
{
"bot_type": "openai",
"model": "qwen3.5-plus",
"model": "qwen3.6-plus",
"open_ai_api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
"open_ai_api_key": "YOUR_API_KEY"
}


@@ -5,6 +5,7 @@ description: CowAgent 版本更新历史
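| Version | Date | Description |
| --- | --- | --- |
| [2.0.5](/releases/v2.0.5) | 2026.04.01 | Cow CLI, open-source Skill Hub, browser tool, WeCom bot creation via QR scan, multiple improvements and fixes |
| [2.0.4](/releases/v2.0.4) | 2026.03.22 | New personal WeChat channel, new model support, Japanese documentation, script refactoring, and multiple fixes |
| [2.0.3](/releases/v2.0.3) | 2026.03.18 | New WeCom smart bot and QQ channels, Coding Plan support, several new models, file handling in the Web UI, memory system upgrade |
| [2.0.2](/releases/v2.0.2) | 2026.02.27 | Web console upgrade, multiple channels running simultaneously, session persistence |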

84
docs/releases/v2.0.5.mdx Normal file

@@ -0,0 +1,84 @@
---
title: v2.0.5
description: CowAgent 2.0.5 - Cow CLI, open-source Skill Hub, browser tool, WeCom bot creation via QR scan, standalone DeepSeek module, and multiple improvements
---
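## 🖥️ Cow CLI Command System
A new Cow CLI command system supports running commands in both the terminal and chat, providing full management of CowAgent:
- **Terminal commands**: run `cow <command>` in the system terminal; supports service management operations such as `start`, `stop`, `restart`, `update`, `status`, and `logs`
- **Chat commands**: type `/<command>` or `cow <command>` in a conversation; supports `/help`, `/status`, `/config`, `/skill`, `/context`, `/logs`, `/version`, and more
- **Web console**: typing `/` in the Web console input box opens the command menu, and the arrow keys step back through input history
- **Windows support**: added the one-step PowerShell install script `scripts/run.ps1`, which also supports the `cow` command
Related docs: [Command overview](https://docs.cowagent.ai/cli)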
<img src="https://cdn.link-ai.tech/doc/20260401114549.png" width="750" />
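## 🧩 Cow Skill Hub Open-Sourced
[Cow Skill Hub](https://skills.cowagent.ai) is now officially open source and live. It offers browsing, searching, installing, and publishing of AI Agent skills, bringing together featured skills, community-contributed skills, and third-party skills:
- **One-command install**: `/skill install <name>` in chat, or `cow skill install <name>` in the terminal
- **Multiple sources**: install any skill from Skill Hub, GitHub, ClawHub, or LinkAI, with GitHub batch install and subdirectory selection
- **Skill search**: browse and search the Skill Hub with `/skill search` and `/skill list --remote`
- **Skill publishing**: submit your own skills at [skills.cowagent.ai/submit](https://skills.cowagent.ai/submit)
- **Mirror acceleration**: Skill Hub mirrors are supported for smoother downloads in mainland China
Skill Hub open-source repository: [cow-skill-hub](https://github.com/zhayujie/cow-skill-hub).
Related docs: [Skill Hub](https://docs.cowagent.ai/skills/hub), [Installing skills](https://docs.cowagent.ai/skills/install)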
<img src="https://cdn.link-ai.tech/doc/20260401110103.png" width="750" />
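## 🌐 New Browser Tool
A new Browser tool lets the Agent control a browser to visit and interact with web pages:
- **Navigation and interaction**: supports `navigate`, `click`, `fill`, `select`, `scroll`, `press`, and other operations
- **Page snapshots**: a condensed DOM snapshot lets the Agent understand page structure efficiently; a snapshot is taken automatically after navigation
- **Screenshots**: page screenshots can be saved to the workspace
- **JavaScript execution**: custom scripts can be run in the page
- **CLI install**: `cow install-browser` installs the browser and its dependencies in one step, adapting to the system environment automatically
- **Docker support**: the Docker image has built-in support for browser installation
Related docs: [Browser tool](https://docs.cowagent.ai/tools/browser).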
<img src="https://cdn.link-ai.tech/doc/20260401115728.png" width="750" />
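## 🤖 WeCom Smart Bot Creation via QR Scan
The WeCom smart bot channel adds one-click creation by scanning a QR code:
- **Scan in the Web console**: on the channel page of the Web console, select the "Scan to connect" mode and scan with WeCom to create and connect a smart bot automatically, with no manual setup in the WeCom admin console
- **Manual mode retained**: the "Manual entry" mode is still available for connecting with an existing Bot ID and Secret
- **Streaming push optimization**: push throttling added to avoid WebSocket congestion
Related docs: [Connecting a WeCom smart bot](https://docs.cowagent.ai/channels/wecom-bot).
Related PR: [#2735](https://github.com/zhayujie/chatgpt-on-wechat/pull/2735)
Thanks [@WecomTeam](https://github.com/WecomTeam)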
## 🐛 Other Improvements and Fixes
- **Standalone DeepSeek module**: added a standalone DeepSeek Bot module with a dedicated `deepseek_api_key` setting, so OpenAI-compatible access is no longer required ([#2719](https://github.com/zhayujie/chatgpt-on-wechat/pull/2719)). Thanks [@6vision](https://github.com/6vision)
- **Web console improvements**: added the slash-command menu and input history, added model options, improved mobile layout ([#2731](https://github.com/zhayujie/chatgpt-on-wechat/pull/2731)). Thanks [@zkjqd](https://github.com/zkjqd)
- **Context loss**: fixed context being lost after trimming ([393f0c0](https://github.com/zhayujie/chatgpt-on-wechat/commit/393f0c0))
- **System prompt**: fixed the system prompt not being rebuilt on every turn ([13f5fde](https://github.com/zhayujie/chatgpt-on-wechat/commit/13f5fde))
- **Agent responses**: strip leading and trailing whitespace from Agent responses ([f890318](https://github.com/zhayujie/chatgpt-on-wechat/commit/f890318))
- **Vision compression**: improved the image compression strategy for vision ([22b8ca0](https://github.com/zhayujie/chatgpt-on-wechat/commit/22b8ca0))
- **Gemini model**: fixed the missing model attribute on GoogleGeminiBot ([#2716](https://github.com/zhayujie/chatgpt-on-wechat/pull/2716)). Thanks [@cowagent](https://github.com/cowagent)
- **WeChat channel**: fixed file send failures, lost file names, and related issues ([6d9b7ba](https://github.com/zhayujie/chatgpt-on-wechat/commit/6d9b7ba), [baf66a1](https://github.com/zhayujie/chatgpt-on-wechat/commit/baf66a1), [45faa9c](https://github.com/zhayujie/chatgpt-on-wechat/commit/45faa9c))
- **Docker improvements**: fixed volume permission issues and slimmed down the image ([3eb8348](https://github.com/zhayujie/chatgpt-on-wechat/commit/3eb8348), [4470d4c](https://github.com/zhayujie/chatgpt-on-wechat/commit/4470d4c))
- **README typography**: improved spacing between Chinese and English text ([#2723](https://github.com/zhayujie/chatgpt-on-wechat/pull/2723)). Thanks [@Xiaozhou345](https://github.com/Xiaozhou345)
- **Security fix**: fixed a path traversal risk in Memory Content. Thanks [@August829](https://github.com/August829)
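## 📦 Upgrading
Source deployments can upgrade in one step with `cow update` or `./run.sh update`, or by pulling the code manually and restarting. See the [upgrade documentation](https://docs.cowagent.ai/guide/upgrade) for details.
**Release date**: 2026.04.01 | [Full Changelog](https://github.com/zhayujie/chatgpt-on-wechat/compare/2.0.4...master)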


@@ -1,5 +1,5 @@
---
title: 创技能
title: 创技能
description: Create custom skills through conversation
---

65
docs/skills/hub.mdx Normal file

@@ -0,0 +1,65 @@
---
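title: Skill Hub
description: Browse, search, and install AI Agent skills
---
[Cow Skill Hub](https://skills.cowagent.ai/) is an open-source AI Agent skill marketplace that brings together officially recommended skills, community contributions, and skills from third-party platforms (GitHub, ClawHub, etc.).
Open-source repository: [github.com/zhayujie/cow-skill-hub](https://github.com/zhayujie/cow-skill-hub)
<img src="https://cdn.link-ai.tech/doc/20260401110103.png" width="800" />
## Features
- **Browse skills**: filter by category (recommended / community / third-party) and by tag
- **Search skills**: search by name or description
- **View details**: see a skill's documentation, file contents, install command, and required environment variables
- **One-command install**: copy the install command to use the skill in CowAgent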
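## Installing skills
Run the install command in chat or in the terminal:
<CodeGroup>
```text Chat
/skill install <name>
```
```bash Terminal
cow skill install <name>
```
</CodeGroup>
You can also browse the Skill Hub from chat:
```text
/skill list --remote
/skill search <keyword>
```
Beyond the featured skills shown in the list, you can also install a wide range of third-party skills (**GitHub, ClawHub, LinkAI, URL**, etc.) via **CLI commands + Skill Hub**; see [Installing skills](/skills/install).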
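## Contributing skills
You are welcome to submit your skills to the Skill Hub:
1. Visit [skills.cowagent.ai/submit](https://skills.cowagent.ai/submit)
2. Sign in with a GitHub or Google account
3. Upload a folder or zip archive that contains `SKILL.md`
4. The skill name, display name, and description are parsed automatically and can be edited as needed
5. After submission, the skill is published once it passes a security check and review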
<img src="https://cdn.link-ai.tech/doc/20260401111904.png" width="800" />
Skill file structure:
```
your-skill/
├── SKILL.md # required, must be in the root directory
├── scripts/ # optional, runnable scripts
└── resources/ # optional, other resources
```
<Tip>
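Skills are built on the `SKILL.md` file. You can also download SKILL.md from a skill's detail page and use it with any Agent that supports custom instructions (such as OpenClaw, Cursor, or Claude Code).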
</Tip>
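Before uploading, the folder layout described above can be checked locally. The sketch below is a hypothetical pre-submission check, not part of CowAgent or Skill Hub; `check_skill_dir` and its messages are made up for illustration, and it encodes only what the docs state: `SKILL.md` must sit in the root, while `scripts/` and `resources/` are optional directories.

```python
import tempfile
from pathlib import Path


def check_skill_dir(path: Path) -> list:
    """Return a list of layout problems; an empty list means the layout looks fine."""
    problems = []
    if not (path / "SKILL.md").is_file():
        problems.append("SKILL.md is missing from the root directory")
    for optional in ("scripts", "resources"):
        entry = path / optional
        # Optional entries may be absent, but must be directories if present.
        if entry.exists() and not entry.is_dir():
            problems.append(f"{optional} exists but is not a directory")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "your-skill"
    (skill / "scripts").mkdir(parents=True)
    before = check_skill_dir(skill)   # SKILL.md not written yet
    (skill / "SKILL.md").write_text("# demo skill\n", encoding="utf-8")
    after = check_skill_dir(skill)

print(before, after)
```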


@@ -11,13 +11,14 @@ Skill 与 Tool 的区别Tool 是由代码实现的原子操作(如读写文
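CowAgent offers several ways to obtain skills:
- **Cow Skill Hub**: browse and install community skills via `/skill list --remote`
- **[Cow Skill Hub](https://skills.cowagent.ai/)**: browse all available skills online, or browse and install them in chat via `/skill list --remote`
- **GitHub**: install directly from a GitHub repository, with batch install support
- **ClawHub**: install skills from ClawHub via `/skill install clawhub:<name>`
- **ClawHub**: install skills from ClawHub (40,000+) via `/skill install clawhub:<name>`
- **LinkAI**: install public resources on LinkAI, as well as knowledge bases, databases, workflows, plugins, and other resources you have created, via `/skill install linkai:<code>`
- **URL**: install from a zip archive or a SKILL.md link
- **Create via chat**: have the Agent create a skill automatically through natural-language conversation
For installation details, see [Installing skills](/skills/install) and [Skill management commands](/commands/skill). You can also [create skills](/skills/create) through conversation.
For installation details, see [Installing skills](/skills/install) and [Skill management commands](/cli/skill). You can also [create skills](/skills/create) through conversation, or contribute your skills to the [Skill Hub](https://skills.cowagent.ai/submit).
## Skill loading priority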


@@ -3,11 +3,11 @@ title: 安装技能
description: Install skills from multiple sources with a single command
---
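CowAgent supports installing skills from **Cow Skill Hub, GitHub, ClawHub**, and arbitrary URLs through a single `install` command. Use `/skill install` in chat and `cow skill install` in the terminal.
CowAgent supports installing skills from **[Cow Skill Hub](https://skills.cowagent.ai/), GitHub, ClawHub, LinkAI**, and arbitrary URLs through a single `install` command. Use `/skill install` in chat and `cow skill install` in the terminal.
## Installing from Cow Skill Hub
Browse the Skill Hub and install the skill you want directly:
Visit [skills.cowagent.ai](https://skills.cowagent.ai/) to browse all available skills, then install the one you want directly, for example: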
```text
/skill list --remote
@@ -16,7 +16,7 @@ CowAgent 支持通过统一的 `install` 命令安装来自 **Cow 技能广场
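## Installing from GitHub
Supports repository-level batch install and installing from a specific subdirectory:
> Every skill on GitHub can be installed directly, with support for repository-level batch install and installing from a specific subdirectory, for example: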
```text
/skill install larksuite/cli
@@ -25,10 +25,23 @@ CowAgent 支持通过统一的 `install` 命令安装来自 **Cow 技能广场
## Installing from ClawHub
Every skill on [ClawHub](https://clawhub.ai/) (40,000+) can be installed with one command, for example:
```text
/skill install clawhub:baidu-search
/skill install clawhub:<name>
```
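## Installing from LinkAI
All public resources on [LinkAI](https://link-ai.tech/console) (10,000+ plugins, apps, and workflows), as well as resources you create yourself (apps, workflows, knowledge bases, databases, plugins), can be installed with a single command:
```text
/skill install linkai:<code>
```
> Every app, workflow, knowledge base, database, and plugin created on the LinkAI platform has a unique code. You can find it on each resource page in the [console](https://link-ai.tech/console) and fill it into the command.
## Installing from a URL
Supports zip archives and links to SKILL.md files: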
@@ -49,5 +62,5 @@ CowAgent 支持通过统一的 `install` 命令安装来自 **Cow 技能广场
```
<Tip>
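To use any of the commands above in the terminal, replace `/skill` with `cow skill`. For the full command reference, see [Skill management commands](/commands/skill).
To use any of the commands above in the terminal, replace `/skill` with `cow skill`. For the full command reference, see [Skill management commands](/cli/skill).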
</Tip>
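The install docs above distinguish several source forms: a bare Skill Hub name, a GitHub `owner/repo` path, the `clawhub:` and `linkai:` prefixes, and a plain URL. How the real CLI tells them apart is not shown in this changeset, so the following is only a sketch of one plausible classification; `classify_source` and the returned labels are assumptions made for illustration.

```python
def classify_source(spec: str) -> tuple:
    """Map an install argument to a (source, value) pair.

    Hypothetical rules, inferred from the documented command forms only.
    """
    if spec.startswith(("http://", "https://")):
        return ("url", spec)
    for prefix in ("clawhub", "linkai"):
        if spec.startswith(prefix + ":"):
            return (prefix, spec[len(prefix) + 1:])
    if "/" in spec:
        # e.g. "larksuite/cli": treated here as a GitHub owner/repo path
        return ("github", spec)
    return ("skill-hub", spec)


for arg in ("clawhub:baidu-search", "larksuite/cli",
            "https://example.com/skill.zip", "linkai:abc123", "pptx"):
    print(arg, "->", classify_source(arg))
```

Checking the URL form first matters, because a URL also contains `/` and `:` and would otherwise match the later rules.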


@@ -5,14 +5,49 @@ description: 分析图片内容识别、描述、OCR 等)
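Analyzes local images or image URLs using a Vision API. Supports content description, text extraction (OCR), object recognition, and more.
## Dependencies
## Model selection
At least one API key must be configured (via the `env_config` tool or the workspace `.env` file):
The Vision tool uses multi-level automatic selection with automatic fallback, so it works without manual configuration:
| Backend | Environment variable | Priority |
1. **Main model**: the currently configured main model is preferred for image recognition (it must be a multimodal model)
2. **Other configured models**: other multimodal models with a configured API key are discovered automatically as fallbacks
If the call to the current provider fails, the next one is tried automatically until one succeeds or all have failed.
### Supported models
| Vendor | Vision model | Notes |
| --- | --- | --- |
| OpenAI | `OPENAI_API_KEY` | Preferred |
| LinkAI | `LINKAI_API_KEY` | Fallback |
| OpenAI / compatible protocol | Main model | Supports every OpenAI-compatible multimodal model |
| Tongyi Qianwen (DashScope) | Main model | For example, qwen3.6-plus |
| Claude | Main model | Anthropic native image format |
| Gemini | Main model | inlineData format |
| Doubao | Main model | Natively supported by the doubao-seed-2-0 series |
| Kimi (Moonshot) | Main model | Natively supported by kimi-k2.5 |
| Zhipu AI | glm-5v-turbo | Always uses the dedicated vision model |
| MiniMax | MiniMax-Text-01 | Always uses the dedicated vision model |
<Note>
The text models from Zhipu and MiniMax do not support image understanding, so the corresponding dedicated vision model is always used and does not need to be specified manually.
</Note>
> When `use_linkai=true`, LinkAI's multimodal model is used by default.
## Custom configuration
To specify the model used by Vision, configure it in `config.json`, for example: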
```json
{
"tool": {
"vision": {
"model": "gpt-4o"
}
}
}
```
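In most cases no configuration is needed. The tool works automatically as long as the main model supports multimodal input or any vision-capable API key is configured.
## Parameters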
@@ -20,17 +55,18 @@ description: 分析图片内容识别、描述、OCR 等)
| --- | --- | --- | --- |
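| `image` | string | Yes | Local file path or HTTP(S) image URL |
| `question` | string | Yes | The question to ask about the image |
| `model` | string | No | Model name (default gpt-4.1-mini) |
Supported image formats: jpg, jpeg, png, gif, webp
## Use cases
- Describing the content of an image
- Extracting text from an image (OCR)
- Identifying objects, colors, and scenes
- Analyzing screenshots and scanned documents
- Analyzing screenshots, scanned document images, and similar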
<Note>
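Images larger than 1 MB are compressed automatically before upload. If no Vision API key is configured, the tool is not loaded.
Images larger than 1 MB are compressed automatically before upload. All images (including remote URLs) are converted to base64 for transfer, ensuring compatibility with every model backend.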
</Note>
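The note above says every image is sent as base64. A minimal sketch of that conversion step, using only the standard library, might look like the following. `to_data_url` is a hypothetical helper written for this illustration (the project's own function and its compression step are not shown in this diff), and the `image/jpeg` default is an assumption.

```python
import base64
import mimetypes


def to_data_url(image_bytes: bytes, filename: str) -> str:
    """Encode raw image bytes as a data:<mime>;base64,<data> URL."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        mime = "image/jpeg"  # assumed default when the type cannot be guessed
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


url = to_data_url(b"\x89PNG\r\n", "shot.png")
print(url.split(",", 1)[0])
```

A data URL of this form is what the `call_vision` implementations in this changeset accept in place of a remote image URL.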


@@ -1,214 +0,0 @@
# encoding:utf-8
import json
import time
from typing import List, Tuple
import openai
from models.openai.openai_compat import RateLimitError, Timeout, APIError, APIConnectionError
import broadscope_bailian
from broadscope_bailian import ChatQaMessage
from models.bot import Bot
from models.ali.ali_qwen_session import AliQwenSession
from models.session_manager import SessionManager
from bridge.context import ContextType
from bridge.reply import Reply, ReplyType
from common.log import logger
from common import const
from config import conf, load_config
class AliQwenBot(Bot):
def __init__(self):
super().__init__()
self.api_key_expired_time = self.set_api_key()
self.sessions = SessionManager(AliQwenSession, model=conf().get("model", const.QWEN))
def api_key_client(self):
return broadscope_bailian.AccessTokenClient(access_key_id=self.access_key_id(), access_key_secret=self.access_key_secret())
def access_key_id(self):
return conf().get("qwen_access_key_id")
def access_key_secret(self):
return conf().get("qwen_access_key_secret")
def agent_key(self):
return conf().get("qwen_agent_key")
def app_id(self):
return conf().get("qwen_app_id")
def node_id(self):
return conf().get("qwen_node_id", "")
def temperature(self):
return conf().get("temperature", 0.2 )
def top_p(self):
return conf().get("top_p", 1)
def reply(self, query, context=None):
# acquire reply content
if context.type == ContextType.TEXT:
logger.info("[QWEN] query={}".format(query))
session_id = context["session_id"]
reply = None
clear_memory_commands = conf().get("clear_memory_commands", ["#清除记忆"])
if query in clear_memory_commands:
self.sessions.clear_session(session_id)
reply = Reply(ReplyType.INFO, "记忆已清除")
elif query == "#清除所有":
self.sessions.clear_all_session()
reply = Reply(ReplyType.INFO, "所有人记忆已清除")
elif query == "#更新配置":
load_config()
reply = Reply(ReplyType.INFO, "配置已更新")
if reply:
return reply
session = self.sessions.session_query(query, session_id)
logger.debug("[QWEN] session query={}".format(session.messages))
reply_content = self.reply_text(session)
logger.debug(
"[QWEN] new_query={}, session_id={}, reply_cont={}, completion_tokens={}".format(
session.messages,
session_id,
reply_content["content"],
reply_content["completion_tokens"],
)
)
if reply_content["completion_tokens"] == 0 and len(reply_content["content"]) > 0:
reply = Reply(ReplyType.ERROR, reply_content["content"])
elif reply_content["completion_tokens"] > 0:
self.sessions.session_reply(reply_content["content"], session_id, reply_content["total_tokens"])
reply = Reply(ReplyType.TEXT, reply_content["content"])
else:
reply = Reply(ReplyType.ERROR, reply_content["content"])
logger.debug("[QWEN] reply {} used 0 tokens.".format(reply_content))
return reply
else:
reply = Reply(ReplyType.ERROR, "Bot不支持处理{}类型的消息".format(context.type))
return reply
def reply_text(self, session: AliQwenSession, retry_count=0) -> dict:
"""
call bailian's ChatCompletion to get the answer
:param session: a conversation session
:param retry_count: retry count
:return: {}
"""
try:
prompt, history = self.convert_messages_format(session.messages)
self.update_api_key_if_expired()
# NOTE: Alibaba Bailian's call() function does not expose a temperature parameter. Since temperature and top_p have the same effect, the smaller of the two is passed as top_p. See https://help.aliyun.com/document_detail/2587502.htm
response = broadscope_bailian.Completions().call(app_id=self.app_id(), prompt=prompt, history=history, top_p=min(self.temperature(), self.top_p()))
completion_content = self.get_completion_content(response, self.node_id())
completion_tokens, total_tokens = self.calc_tokens(session.messages, completion_content)
return {
"total_tokens": total_tokens,
"completion_tokens": completion_tokens,
"content": completion_content,
}
except Exception as e:
need_retry = retry_count < 2
result = {"completion_tokens": 0, "content": "我现在有点累了,等会再来吧"}
if isinstance(e, RateLimitError):
logger.warn("[QWEN] RateLimitError: {}".format(e))
result["content"] = "提问太快啦,请休息一下再问我吧"
if need_retry:
time.sleep(20)
elif isinstance(e, Timeout):
logger.warn("[QWEN] Timeout: {}".format(e))
result["content"] = "我没有收到你的消息"
if need_retry:
time.sleep(5)
elif isinstance(e, APIError):
logger.warn("[QWEN] Bad Gateway: {}".format(e))
result["content"] = "请再问我一次"
if need_retry:
time.sleep(10)
elif isinstance(e, APIConnectionError):
logger.warn("[QWEN] APIConnectionError: {}".format(e))
need_retry = False
result["content"] = "我连接不到你的网络"
else:
logger.exception("[QWEN] Exception: {}".format(e))
need_retry = False
self.sessions.clear_session(session.session_id)
if need_retry:
logger.warn("[QWEN] 第{}次重试".format(retry_count + 1))
return self.reply_text(session, retry_count + 1)
else:
return result
def set_api_key(self):
api_key, expired_time = self.api_key_client().create_token(agent_key=self.agent_key())
broadscope_bailian.api_key = api_key
return expired_time
def update_api_key_if_expired(self):
if time.time() > self.api_key_expired_time:
self.api_key_expired_time = self.set_api_key()
def convert_messages_format(self, messages) -> Tuple[str, List[ChatQaMessage]]:
history = []
user_content = ''
assistant_content = ''
system_content = ''
for message in messages:
role = message.get('role')
if role == 'user':
user_content += message.get('content')
elif role == 'assistant':
assistant_content = message.get('content')
history.append(ChatQaMessage(user_content, assistant_content))
user_content = ''
assistant_content = ''
elif role =='system':
system_content += message.get('content')
if user_content == '':
raise Exception('no user message')
if system_content != '':
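# NOTE: Simulates a system message. Testing showed that a persona description starting with "你需要扮演ChatGPT" ("You need to play the role of ChatGPT") works, whereas one starting with "你是ChatGPT" ("You are ChatGPT") is flatly denied by the model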
system_qa = ChatQaMessage(system_content, '好的,我会严格按照你的设定回答问题')
history.insert(0, system_qa)
logger.debug("[QWEN] converted qa messages: {}".format([item.to_dict() for item in history]))
logger.debug("[QWEN] user content as prompt: {}".format(user_content))
return user_content, history
def get_completion_content(self, response, node_id):
if not response['Success']:
return f"[ERROR]\n{response['Code']}:{response['Message']}"
text = response['Data']['Text']
if node_id == '':
return text
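# TODO: When the LLM app is built with workflow orchestration, the response has the structure below and the final result is in ['finalResult'][node_id]['response']['text']; written this way for now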
# {
# 'Success': True,
# 'Code': None,
# 'Message': None,
# 'Data': {
# 'ResponseId': '9822f38dbacf4c9b8daf5ca03a2daf15',
# 'SessionId': 'session_id',
# 'Text': '{"finalResult":{"LLM_T7islK":{"params":{"modelId":"qwen-plus-v1","prompt":"${systemVars.query}${bizVars.Text}"},"response":{"text":"作为一个AI语言模型我没有年龄因为我没有生日。\n我只是一个程序没有生命和身体。"}}}}',
# 'Thoughts': [],
# 'Debug': {},
# 'DocReferences': []
# },
# 'RequestId': '8e11d31551ce4c3f83f49e6e0dd998b0',
# 'Failed': None
# }
text_dict = json.loads(text)
completion_content = text_dict['finalResult'][node_id]['response']['text']
return completion_content
def calc_tokens(self, messages, completion_content):
completion_tokens = len(completion_content)
prompt_tokens = 0
for message in messages:
prompt_tokens += len(message["content"])
return completion_tokens, prompt_tokens + completion_tokens


@@ -1,62 +0,0 @@
from models.session_manager import Session
from common.log import logger
"""
e.g.
[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who won the world series in 2020?"},
{"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
{"role": "user", "content": "Where was it played?"}
]
"""
class AliQwenSession(Session):
def __init__(self, session_id, system_prompt=None, model="qianwen"):
super().__init__(session_id, system_prompt)
self.model = model
self.reset()
def discard_exceeding(self, max_tokens, cur_tokens=None):
precise = True
try:
cur_tokens = self.calc_tokens()
except Exception as e:
precise = False
if cur_tokens is None:
raise e
logger.debug("Exception when counting tokens precisely for query: {}".format(e))
while cur_tokens > max_tokens:
if len(self.messages) > 2:
self.messages.pop(1)
elif len(self.messages) == 2 and self.messages[1]["role"] == "assistant":
self.messages.pop(1)
if precise:
cur_tokens = self.calc_tokens()
else:
cur_tokens = cur_tokens - max_tokens
break
elif len(self.messages) == 2 and self.messages[1]["role"] == "user":
logger.warn("user message exceed max_tokens. total_tokens={}".format(cur_tokens))
break
else:
logger.debug("max_tokens={}, total_tokens={}, len(messages)={}".format(max_tokens, cur_tokens, len(self.messages)))
break
if precise:
cur_tokens = self.calc_tokens()
else:
cur_tokens = cur_tokens - max_tokens
return cur_tokens
def calc_tokens(self):
return num_tokens_from_messages(self.messages, self.model)
def num_tokens_from_messages(messages, model):
"""Returns the number of tokens used by a list of messages."""
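# Official token counting rule: "For Chinese text, 1 token usually corresponds to one Chinese character; for English text, 1 token usually corresponds to 3 to 4 letters or 1 word"
# See the docs for details: https://help.aliyun.com/document_detail/2586397.html
# For now the token count is roughly estimated from string length, which does not affect normal use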
tokens = 0
for msg in messages:
tokens += len(msg["content"])
return tokens


@@ -2,12 +2,27 @@
Auto-reply chat robot abstract class
"""
from bridge.context import Context
from bridge.reply import Reply
class Bot(object):
    """
    Base class for all chat-bot implementations.

    Subclasses may also implement:
        call_with_tools(messages, tools=None, stream=False, **kwargs)
            -> dict | generator (OpenAI-compatible format)
        call_vision(image_url, question, model=None, max_tokens=1000)
            -> dict with keys: model, content, usage (or error/message)

    These are NOT defined here to avoid shadowing concrete implementations
    provided by mixin classes (e.g. OpenAICompatibleBot) in the MRO.
    Use ``hasattr(bot, 'call_vision')`` to detect support at runtime.
    """
    def reply(self, query, context: Context = None) -> Reply:
        """
        bot auto-reply content
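The docstring's reasoning about MRO shadowing can be reproduced in a few lines. The classes below are toy stand-ins invented for this illustration, not the project's `Bot` or `OpenAICompatibleBot`; they only show why a placeholder method on the base class hides a mixin's real implementation when the base is listed first.

```python
class BaseWithStub:
    """Toy base class that declares a placeholder call_vision."""
    def call_vision(self, image_url, question):
        raise NotImplementedError


class BaseClean:
    """Toy base class that leaves call_vision undefined."""


class VisionMixin:
    """Toy mixin that provides the concrete implementation."""
    def call_vision(self, image_url, question):
        return {"content": "ok"}


class Shadowed(BaseWithStub, VisionMixin):
    pass  # MRO: Shadowed -> BaseWithStub -> VisionMixin; the stub wins


class Working(BaseClean, VisionMixin):
    pass  # MRO: Working -> BaseClean -> VisionMixin; the mixin is reached


print(Shadowed.call_vision is BaseWithStub.call_vision)
print(Working().call_vision("img", "what is this?"))
print(hasattr(BaseClean(), "call_vision"))
```

With the stub removed from the base class, `hasattr(bot, 'call_vision')` becomes a meaningful capability check, which is the detection approach the docstring recommends.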


@@ -46,10 +46,7 @@ def create_bot(bot_type):
elif bot_type == const.CLAUDEAPI:
from models.claudeapi.claude_api_bot import ClaudeAPIBot
return ClaudeAPIBot()
elif bot_type == const.QWEN:
from models.ali.ali_qwen_bot import AliQwenBot
return AliQwenBot()
elif bot_type == const.QWEN_DASHSCOPE:
elif bot_type in (const.QWEN, const.QWEN_DASHSCOPE):
from models.dashscope.dashscope_bot import DashscopeBot
return DashscopeBot()
elif bot_type == const.GEMINI:


@@ -1,7 +1,10 @@
# encoding:utf-8
import base64
import json
import re
import time
from typing import Optional
import requests
@@ -224,6 +227,79 @@ class ClaudeAPIBot(Bot, OpenAIImage):
return 64000
return 8192
    @staticmethod
    def _parse_data_url(data_url: str):
        """Parse a data:<mime>;base64,<data> URL into (media_type, base64_data)."""
        m = re.match(r"^data:([^;]+);base64,(.+)$", data_url, re.DOTALL)
        if m:
            return m.group(1), m.group(2)
        return None, None

    def call_vision(self, image_url: str, question: str,
                    model: Optional[str] = None,
                    max_tokens: int = 1000) -> dict:
        """Analyze an image using Claude Messages API (native image blocks)."""
        try:
            actual_model = model or self._model_mapping(conf().get("model"))
            # Build Claude-native image content block
            if image_url.startswith("data:"):
                media_type, b64_data = self._parse_data_url(image_url)
                if not b64_data:
                    return {"error": True, "message": "Invalid base64 data URL"}
                image_block = {
                    "type": "image",
                    "source": {"type": "base64",
                               "media_type": media_type or "image/jpeg",
                               "data": b64_data},
                }
            else:
                image_block = {
                    "type": "image",
                    "source": {"type": "url", "url": image_url},
                }
            data = {
                "model": actual_model,
                "max_tokens": max_tokens,
                "messages": [{
                    "role": "user",
                    "content": [
                        image_block,
                        {"type": "text", "text": question},
                    ],
                }],
            }
            headers = {
                "x-api-key": self.api_key,
                "anthropic-version": "2023-06-01",
                "content-type": "application/json",
            }
            proxies = {"http": self.proxy, "https": self.proxy} if self.proxy else None
            resp = requests.post(f"{self.api_base}/messages",
                                 headers=headers, json=data, proxies=proxies)
            if resp.status_code != 200:
                return {"error": True, "message": f"HTTP {resp.status_code}: {resp.text[:300]}"}
            body = resp.json()
            text_parts = [b.get("text", "") for b in body.get("content", [])
                          if b.get("type") == "text"]
            usage = body.get("usage", {})
            return {
                "model": actual_model,
                "content": "".join(text_parts),
                "usage": {
                    "prompt_tokens": usage.get("input_tokens", 0),
                    "completion_tokens": usage.get("output_tokens", 0),
                    "total_tokens": usage.get("input_tokens", 0) + usage.get("output_tokens", 0),
                },
            }
        except Exception as e:
            logger.error(f"[CLAUDE] call_vision error: {e}")
            return {"error": True, "message": str(e)}
def call_with_tools(self, messages, tools=None, stream=False, **kwargs):
"""
Call Claude API with tool support for agent integration
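The image-block branch of the Claude `call_vision` hunk above can be exercised in isolation. The sketch below restates that logic as a standalone function so it can be run without a bot instance or network access; `build_image_block` is a name introduced here for illustration, and unlike the method in the diff it raises `ValueError` instead of returning an error dict.

```python
import re


def build_image_block(image_url: str) -> dict:
    """Build an Anthropic-style image content block from a data URL or a remote URL."""
    if image_url.startswith("data:"):
        m = re.match(r"^data:([^;]+);base64,(.+)$", image_url, re.DOTALL)
        if not m:
            raise ValueError("Invalid base64 data URL")
        return {
            "type": "image",
            "source": {"type": "base64",
                       "media_type": m.group(1),
                       "data": m.group(2)},
        }
    # Anything else is passed through as a URL source.
    return {"type": "image", "source": {"type": "url", "url": image_url}}


inline = build_image_block("data:image/png;base64,AAAA")
remote = build_image_block("https://example.com/cat.png")
print(inline["source"]["media_type"], remote["source"]["type"])
```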


@@ -1,6 +1,8 @@
# encoding:utf-8
import json
from typing import Optional
from models.bot import Bot
from models.session_manager import SessionManager
from bridge.context import ContextType
@@ -26,15 +28,15 @@ dashscope_models = {
# Model name prefixes that require MultiModalConversation API instead of Generation API.
# Qwen3.5+ series are omni models that only support MultiModalConversation.
MULTIMODAL_MODEL_PREFIXES = ("qwen3.5-",)
MULTIMODAL_MODEL_PREFIXES = ("qwen3.5-", "qwen3.6-")
# Qwen chat model API
class DashscopeBot(Bot):
def __init__(self):
super().__init__()
self.sessions = SessionManager(DashscopeSession, model=conf().get("model") or "qwen-plus")
self.model_name = conf().get("model") or "qwen-plus"
self.sessions = SessionManager(DashscopeSession, model=conf().get("model") or "qwen3.6-plus")
self.model_name = conf().get("model") or "qwen3.6-plus"
self.client = dashscope.Generation
api_key = conf().get("dashscope_api_key")
if api_key:
@@ -153,6 +155,56 @@ class DashscopeBot(Bot):
else:
return result
    def call_vision(self, image_url: str, question: str,
                    model: Optional[str] = None,
                    max_tokens: int = 1000) -> dict:
        """Analyze an image using DashScope MultiModalConversation API."""
        try:
            dashscope.api_key = self.api_key
            vision_model = model or "qwen-vl-max"
            # DashScope multimodal format: {"image": url} + {"text": question}
            messages = [{
                "role": "user",
                "content": [
                    {"image": image_url},
                    {"text": question},
                ],
            }]
            response = MultiModalConversation.call(
                model=vision_model,
                messages=messages,
                max_tokens=max_tokens,
            )
            if response.status_code != HTTPStatus.OK:
                return {
                    "error": True,
                    "message": f"{response.code} - {response.message}",
                }
            resp_dict = self._response_to_dict(response)
            choice = resp_dict["output"]["choices"][0]
            content = choice.get("message", {}).get("content", "")
            if isinstance(content, list):
                content = "".join(
                    item.get("text", "") for item in content if isinstance(item, dict)
                )
            usage = resp_dict.get("usage", {})
            return {
                "model": vision_model,
                "content": content,
                "usage": {
                    "prompt_tokens": usage.get("input_tokens", 0),
                    "completion_tokens": usage.get("output_tokens", 0),
                    "total_tokens": usage.get("total_tokens", 0),
                },
            }
        except Exception as e:
            logger.error(f"[DASHSCOPE] call_vision error: {e}")
            return {"error": True, "message": str(e)}
def call_with_tools(self, messages, tools=None, stream=False, **kwargs):
"""
Call DashScope API with tool support for agent integration
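Each `call_vision` in this changeset maps its vendor's response onto the same `content` and `usage` shape. The two helpers below sketch that normalization on plain dicts; `flatten_content` and `normalize_usage` are hypothetical names, and the field names they read (`input_tokens`/`output_tokens` versus `prompt_tokens`/`completion_tokens`) are taken from the hunks in this diff rather than from any vendor reference.

```python
def flatten_content(content) -> str:
    """Join the text parts of a list-style content value; pass strings through."""
    if isinstance(content, list):
        return "".join(item.get("text", "")
                       for item in content if isinstance(item, dict))
    return content or ""


def normalize_usage(usage: dict) -> dict:
    """Map either naming scheme onto prompt/completion/total token counts."""
    prompt = usage.get("prompt_tokens", usage.get("input_tokens", 0))
    completion = usage.get("completion_tokens", usage.get("output_tokens", 0))
    total = usage.get("total_tokens", prompt + completion)
    return {"prompt_tokens": prompt,
            "completion_tokens": completion,
            "total_tokens": total}


print(flatten_content([{"text": "a "}, {"image": "x"}, {"text": "cat"}]))
print(normalize_usage({"input_tokens": 3, "output_tokens": 4}))
```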


@@ -2,6 +2,7 @@
import json
import time
from typing import Optional
import requests
from models.bot import Bot
@@ -147,6 +148,49 @@ class DoubaoBot(Bot):
else:
return result
    def call_vision(self, image_url: str, question: str,
                    model: Optional[str] = None,
                    max_tokens: int = 1000) -> dict:
        """Analyze an image using Doubao (Volcengine Ark) OpenAI-compatible API."""
        try:
            vision_model = model or self.args.get("model", "doubao-seed-2-0-pro-260215")
            payload = {
                "model": vision_model,
                "max_tokens": max_tokens,
                "messages": [{
                    "role": "user",
                    "content": [
                        {"type": "text", "text": question},
                        {"type": "image_url", "image_url": {"url": image_url}},
                    ],
                }],
            }
            headers = {
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json",
            }
            resp = requests.post(f"{self.base_url}/chat/completions",
                                 headers=headers, json=payload, timeout=60)
            if resp.status_code != 200:
                return {"error": True, "message": f"HTTP {resp.status_code}: {resp.text[:300]}"}
            data = resp.json()
            if "error" in data:
                return {"error": True, "message": data["error"].get("message", str(data["error"]))}
            content = data.get("choices", [{}])[0].get("message", {}).get("content", "")
            usage = data.get("usage", {})
            return {
                "model": vision_model,
                "content": content,
                "usage": {
                    "prompt_tokens": usage.get("prompt_tokens", 0),
                    "completion_tokens": usage.get("completion_tokens", 0),
                    "total_tokens": usage.get("total_tokens", 0),
                },
            }
        except Exception as e:
            logger.error(f"[DOUBAO] call_vision error: {e}")
            return {"error": True, "message": str(e)}
# ==================== Agent mode support ====================
def call_with_tools(self, messages, tools=None, stream: bool = False, **kwargs):
@@ -434,31 +478,37 @@ class DoubaoBot(Bot):
continue
if role == "user":
text_parts = []
tool_results = []
has_tool_result = any(
isinstance(b, dict) and b.get("type") == "tool_result" for b in content
)
if has_tool_result:
text_parts = []
tool_results = []
for block in content:
if not isinstance(block, dict):
continue
if block.get("type") == "text":
text_parts.append(block.get("text", ""))
elif block.get("type") == "tool_result":
tool_call_id = block.get("tool_use_id") or ""
result_content = block.get("content", "")
if not isinstance(result_content, str):
result_content = json.dumps(result_content, ensure_ascii=False)
tool_results.append({
"role": "tool",
"tool_call_id": tool_call_id,
"content": result_content
})
for block in content:
if not isinstance(block, dict):
continue
if block.get("type") == "text":
text_parts.append(block.get("text", ""))
elif block.get("type") == "tool_result":
tool_call_id = block.get("tool_use_id") or ""
result_content = block.get("content", "")
if not isinstance(result_content, str):
result_content = json.dumps(result_content, ensure_ascii=False)
tool_results.append({
"role": "tool",
"tool_call_id": tool_call_id,
"content": result_content
})
# Tool results first (must come right after assistant with tool_calls)
for tr in tool_results:
converted.append(tr)
for tr in tool_results:
converted.append(tr)
if text_parts:
converted.append({"role": "user", "content": "\n".join(text_parts)})
if text_parts:
converted.append({"role": "user", "content": "\n".join(text_parts)})
else:
# Keep as-is for multimodal content (e.g. image_url blocks)
converted.append(msg)
elif role == "assistant":
openai_msg = {"role": "assistant"}
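The user-message branch in this hunk can be read as a small pure function. The sketch below is a standalone approximation, not the project's code: the name `convert_user_message` is hypothetical, and it assumes the newer behavior in the hunk (only split messages that contain a `tool_result` block, and emit tool results before any text).

```python
import json


def convert_user_message(msg: dict) -> list:
    """Convert one Claude-style user message into OpenAI-style messages.

    Hypothetical helper modelled on the hunk above; not the project's API.
    """
    content = msg.get("content")
    if not isinstance(content, list):
        return [msg]

    has_tool_result = any(
        isinstance(b, dict) and b.get("type") == "tool_result" for b in content
    )
    if not has_tool_result:
        # Multimodal content (e.g. image_url blocks) passes through untouched.
        return [msg]

    text_parts, tool_results = [], []
    for block in content:
        if not isinstance(block, dict):
            continue
        if block.get("type") == "text":
            text_parts.append(block.get("text", ""))
        elif block.get("type") == "tool_result":
            result = block.get("content", "")
            if not isinstance(result, str):
                result = json.dumps(result, ensure_ascii=False)
            tool_results.append({
                "role": "tool",
                "tool_call_id": block.get("tool_use_id") or "",
                "content": result,
            })

    # Tool results go first so they directly follow the assistant tool_calls.
    converted = list(tool_results)
    if text_parts:
        converted.append({"role": "user", "content": "\n".join(text_parts)})
    return converted
```

Gating on `has_tool_result` is what lets image blocks survive: without it, a list-valued message with no text and no tool results would be dropped entirely.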

@@ -12,6 +12,8 @@ import mimetypes
import os
import re
import time
from typing import Optional
import requests
from models.bot import Bot
from models.session_manager import SessionManager
@@ -144,7 +146,12 @@ class GoogleGeminiBot(Bot):
return "", []
pattern = r"\[图片:\s*([^\]]+)\]"
image_paths = [m.strip().strip("'\"") for m in re.findall(pattern, content) if m.strip()]
cleaned_text = re.sub(pattern, "", content)
# Replace markers with path-only hints so the model still knows the
# original file location (needed when it calls tools like vision).
def _replace_with_hint(m):
path = m.group(1).strip().strip("'\"")
return f"[attached image: {path}]"
cleaned_text = re.sub(pattern, _replace_with_hint, content)
cleaned_text = re.sub(r"\n{3,}", "\n\n", cleaned_text).strip()
return cleaned_text, image_paths
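The marker-to-hint rewrite in this hunk can be exercised in isolation. The sketch below is a minimal standalone version: the function name `extract_image_hints` is hypothetical, and only the regex and the `[attached image: ...]` hint format are taken from the diff.

```python
import re

# Same marker pattern as in the hunk above: "[图片: <path>]".
IMAGE_MARKER = re.compile(r"\[图片:\s*([^\]]+)\]")


def _clean(raw: str) -> str:
    """Trim whitespace and surrounding quotes from a captured path."""
    return raw.strip().strip("'\"")


def extract_image_hints(content: str):
    """Return (text_with_hints, image_paths) for a message with image markers.

    Hypothetical helper mirroring the diff; not the project's actual method.
    """
    if not content:
        return "", []
    image_paths = [_clean(m) for m in IMAGE_MARKER.findall(content) if m.strip()]
    # Keep a path-only hint in place of each marker instead of deleting it.
    text = IMAGE_MARKER.sub(
        lambda m: f"[attached image: {_clean(m.group(1))}]", content
    )
    text = re.sub(r"\n{3,}", "\n\n", text).strip()
    return text, image_paths
```

Keeping the path in the text means a later tool call (such as the vision tool) can still refer to the original file even though the image bytes are sent separately.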
@@ -225,6 +232,57 @@ class GoogleGeminiBot(Bot):
logger.warning(f"[Gemini] Unsupported image URL format: {image_url[:120]}")
return None
def call_vision(self, image_url: str, question: str,
model: Optional[str] = None,
max_tokens: int = 1000) -> dict:
"""Analyze an image using Gemini REST API."""
try:
model_name = model or self.model or "gemini-2.0-flash"
image_part = self._build_inline_part_from_image_url({"url": image_url})
if not image_part:
return {"error": True, "message": f"Cannot process image URL: {image_url[:120]}"}
payload = {
"contents": [{
"role": "user",
"parts": [image_part, {"text": question}],
}],
"generationConfig": {"maxOutputTokens": max_tokens},
"safetySettings": [
{"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
{"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
{"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
],
}
endpoint = f"{self.api_base}/v1beta/models/{model_name}:generateContent"
headers = {"x-goog-api-key": self.api_key, "Content-Type": "application/json"}
resp = requests.post(endpoint, headers=headers, json=payload, timeout=60)
if resp.status_code != 200:
return {"error": True, "message": f"HTTP {resp.status_code}: {resp.text[:300]}"}
body = resp.json()
candidates = body.get("candidates", [])
text_parts = []
for part in candidates[0].get("content", {}).get("parts", []) if candidates else []:
if "text" in part:
text_parts.append(part["text"])
usage_meta = body.get("usageMetadata", {})
return {
"model": model_name,
"content": "".join(text_parts),
"usage": {
"prompt_tokens": usage_meta.get("promptTokenCount", 0),
"completion_tokens": usage_meta.get("candidatesTokenCount", 0),
"total_tokens": usage_meta.get("totalTokenCount", 0),
},
}
except Exception as e:
logger.error(f"[Gemini] call_vision error: {e}")
return {"error": True, "message": str(e)}
def call_with_tools(self, messages, tools=None, stream=False, **kwargs):
"""
Call Gemini API with tool support using REST API (following official docs)

@@ -2,6 +2,8 @@
import time
import json
from typing import Optional
import requests
from models.bot import Bot
@@ -175,6 +177,51 @@ class MinimaxBot(Bot):
else:
return result
def call_vision(self, image_url: str, question: str,
model: Optional[str] = None,
max_tokens: int = 1000) -> dict:
"""Analyze an image using MiniMax OpenAI-compatible API.
Always uses MiniMax-Text-01 — other MiniMax models do not support vision.
"""
try:
vision_model = "MiniMax-Text-01"
payload = {
"model": vision_model,
"max_tokens": max_tokens,
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": question},
{"type": "image_url", "image_url": {"url": image_url}},
],
}],
}
headers = {
"Authorization": f"Bearer {self.api_key}",
"Content-Type": "application/json",
}
resp = requests.post(f"{self.api_base}/chat/completions",
headers=headers, json=payload, timeout=60)
if resp.status_code != 200:
return {"error": True, "message": f"HTTP {resp.status_code}: {resp.text[:300]}"}
data = resp.json()
if "error" in data:
return {"error": True, "message": data["error"].get("message", str(data["error"]))}
content = data.get("choices", [{}])[0].get("message", {}).get("content", "")
usage = data.get("usage", {})
return {
"model": vision_model,
"content": content,
"usage": {
"prompt_tokens": usage.get("prompt_tokens", 0),
"completion_tokens": usage.get("completion_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
},
}
except Exception as e:
logger.error(f"[MINIMAX] call_vision error: {e}")
return {"error": True, "message": str(e)}
def call_with_tools(self, messages, tools=None, stream=False, **kwargs):
"""
Call MiniMax API with tool support for agent integration
@@ -273,37 +320,41 @@ class MinimaxBot(Bot):
if role == "user":
# Handle user message
if isinstance(content, list):
# Extract text from content blocks
text_parts = []
tool_results = []
has_tool_result = any(
isinstance(b, dict) and b.get("type") == "tool_result" for b in content
)
if has_tool_result:
text_parts = []
tool_results = []
for block in content:
if isinstance(block, dict):
if block.get("type") == "text":
text_parts.append(block.get("text", ""))
elif block.get("type") == "tool_result":
# Tool result should be a separate message with role="tool"
tool_call_id = block.get("tool_use_id") or ""
if not tool_call_id:
logger.warning(f"[MINIMAX] tool_result missing tool_use_id")
result_content = block.get("content", "")
if not isinstance(result_content, str):
result_content = json.dumps(result_content, ensure_ascii=False)
tool_results.append({
"role": "tool",
"tool_call_id": tool_call_id,
"content": result_content
})
for block in content:
if isinstance(block, dict):
if block.get("type") == "text":
text_parts.append(block.get("text", ""))
elif block.get("type") == "tool_result":
tool_call_id = block.get("tool_use_id") or ""
if not tool_call_id:
logger.warning(f"[MINIMAX] tool_result missing tool_use_id")
result_content = block.get("content", "")
if not isinstance(result_content, str):
result_content = json.dumps(result_content, ensure_ascii=False)
tool_results.append({
"role": "tool",
"tool_call_id": tool_call_id,
"content": result_content
})
if text_parts:
converted.append({
"role": "user",
"content": "\n".join(text_parts)
})
if text_parts:
converted.append({
"role": "user",
"content": "\n".join(text_parts)
})
# Add all tool results (not just the last one)
for tool_result in tool_results:
converted.append(tool_result)
for tool_result in tool_results:
converted.append(tool_result)
else:
# Keep as-is for multimodal content (e.g. image_url blocks)
converted.append(msg)
else:
# Simple text content
converted.append({

@@ -2,6 +2,7 @@
import json
import time
from typing import Optional
import requests
from models.bot import Bot
@@ -147,6 +148,49 @@ class MoonshotBot(Bot):
else:
return result
def call_vision(self, image_url: str, question: str,
model: Optional[str] = None,
max_tokens: int = 1000) -> dict:
"""Analyze an image using Moonshot (Kimi) OpenAI-compatible API."""
try:
vision_model = model or self.args.get("model", "kimi-k2.5")
payload = {
"model": vision_model,
"max_tokens": max_tokens,
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": question},
{"type": "image_url", "image_url": {"url": image_url}},
],
}],
}
headers = {
"Authorization": f"Bearer {self.api_key}",
"Content-Type": "application/json",
}
resp = requests.post(f"{self.base_url}/chat/completions",
headers=headers, json=payload, timeout=60)
if resp.status_code != 200:
return {"error": True, "message": f"HTTP {resp.status_code}: {resp.text[:300]}"}
data = resp.json()
if "error" in data:
return {"error": True, "message": data["error"].get("message", str(data["error"]))}
content = data.get("choices", [{}])[0].get("message", {}).get("content", "")
usage = data.get("usage", {})
return {
"model": vision_model,
"content": content,
"usage": {
"prompt_tokens": usage.get("prompt_tokens", 0),
"completion_tokens": usage.get("completion_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
},
}
except Exception as e:
logger.error(f"[MOONSHOT] call_vision error: {e}")
return {"error": True, "message": str(e)}
# ==================== Agent mode support ====================
def call_with_tools(self, messages, tools=None, stream: bool = False, **kwargs):
@@ -435,31 +479,37 @@ class MoonshotBot(Bot):
continue
if role == "user":
text_parts = []
tool_results = []
has_tool_result = any(
isinstance(b, dict) and b.get("type") == "tool_result" for b in content
)
if has_tool_result:
text_parts = []
tool_results = []
for block in content:
if not isinstance(block, dict):
continue
if block.get("type") == "text":
text_parts.append(block.get("text", ""))
elif block.get("type") == "tool_result":
tool_call_id = block.get("tool_use_id") or ""
result_content = block.get("content", "")
if not isinstance(result_content, str):
result_content = json.dumps(result_content, ensure_ascii=False)
tool_results.append({
"role": "tool",
"tool_call_id": tool_call_id,
"content": result_content
})
for block in content:
if not isinstance(block, dict):
continue
if block.get("type") == "text":
text_parts.append(block.get("text", ""))
elif block.get("type") == "tool_result":
tool_call_id = block.get("tool_use_id") or ""
result_content = block.get("content", "")
if not isinstance(result_content, str):
result_content = json.dumps(result_content, ensure_ascii=False)
tool_results.append({
"role": "tool",
"tool_call_id": tool_call_id,
"content": result_content
})
# Tool results first (must come right after assistant with tool_calls)
for tr in tool_results:
converted.append(tr)
for tr in tool_results:
converted.append(tr)
if text_parts:
converted.append({"role": "user", "content": "\n".join(text_parts)})
if text_parts:
converted.append({"role": "user", "content": "\n".join(text_parts)})
else:
# Keep as-is for multimodal content (e.g. image_url blocks)
converted.append(msg)
elif role == "assistant":
openai_msg = {"role": "assistant"}

@@ -9,6 +9,8 @@ This includes: OpenAI, LinkAI, Azure OpenAI, and many third-party providers.
import json
import openai
import requests
from typing import Optional
from common.log import logger
from agent.protocol.message_utils import drop_orphaned_tool_results_openai
@@ -306,3 +308,51 @@ class OpenAICompatibleBot:
openai_messages.append(msg)
return drop_orphaned_tool_results_openai(openai_messages)
def call_vision(self, image_url: str, question: str,
model: Optional[str] = None,
max_tokens: int = 1000) -> dict:
"""Analyze an image using the OpenAI-compatible /chat/completions endpoint."""
try:
api_config = self.get_api_config()
vision_model = model or api_config.get("model", "gpt-4o")
api_key = api_config.get("api_key", "")
api_base = (api_config.get("api_base") or "https://api.openai.com/v1").rstrip("/")
payload = {
"model": vision_model,
"messages": [{
"role": "user",
"content": [
{"type": "text", "text": question},
{"type": "image_url", "image_url": {"url": image_url}},
],
}],
}
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json",
}
resp = requests.post(
f"{api_base}/chat/completions",
headers=headers, json=payload, timeout=60,
)
if resp.status_code != 200:
body = resp.text[:500]
logger.error(f"[{self.__class__.__name__}] call_vision HTTP {resp.status_code}: {body}")
return {"error": True, "message": f"HTTP {resp.status_code}: {body}"}
data = resp.json()
content = data.get("choices", [{}])[0].get("message", {}).get("content", "")
usage = data.get("usage", {})
return {
"model": vision_model,
"content": content,
"usage": {
"prompt_tokens": usage.get("prompt_tokens", 0),
"completion_tokens": usage.get("completion_tokens", 0),
"total_tokens": usage.get("total_tokens", 0),
},
}
except Exception as e:
logger.error(f"[{self.__class__.__name__}] call_vision error: {e}")
return {"error": True, "message": str(e)}
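Every `call_vision` implementation in this diff returns the same shape: either `{"model", "content", "usage"}` or `{"error": True, "message": ...}`. That uniformity is what makes a provider fallback chain simple to write. The sketch below illustrates the idea only; `call_vision_with_fallback` and the provider list are hypothetical, and the actual resolution order lives in the vision tool, which is not shown in this part of the diff.

```python
from typing import Callable, Iterable, Tuple

VisionFn = Callable[[str, str], dict]


def call_vision_with_fallback(
    providers: Iterable[Tuple[str, VisionFn]],
    image_url: str,
    question: str,
) -> dict:
    """Try each (name, call_vision) pair in order; return the first success.

    Hypothetical helper: assumes each callable follows the return
    convention used by the call_vision methods in this diff.
    """
    errors = []
    for name, call_vision in providers:
        try:
            result = call_vision(image_url, question)
        except Exception as exc:  # a provider that raises is treated as failed
            result = {"error": True, "message": str(exc)}
        if not result.get("error"):
            return result
        errors.append(f"{name}: {result.get('message', 'unknown error')}")
    return {
        "error": True,
        "message": "; ".join(errors) or "no vision provider configured",
    }
```

Collecting the per-provider messages, rather than returning only the last one, keeps the final error useful when several providers fail for different reasons.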

@@ -2,6 +2,7 @@
import time
import json
from typing import Optional
from models.bot import Bot
from models.zhipuai.zhipu_ai_session import ZhipuAISession
@@ -149,6 +150,40 @@ class ZHIPUAIBot(Bot, ZhipuAIImage):
else:
return result
def call_vision(self, image_url: str, question: str,
model: Optional[str] = None,
max_tokens: int = 1000) -> dict:
"""Analyze an image using ZhipuAI OpenAI-compatible SDK.
Always uses glm-5v-turbo — the text models (glm-5-turbo etc.) do not support vision.
"""
try:
vision_model = "glm-5v-turbo"
response = self.client.chat.completions.create(
model=vision_model,
max_tokens=max_tokens,
messages=[{
"role": "user",
"content": [
{"type": "text", "text": question},
{"type": "image_url", "image_url": {"url": image_url}},
],
}],
)
content = response.choices[0].message.content or ""
usage = response.usage
return {
"model": vision_model,
"content": content,
"usage": {
"prompt_tokens": getattr(usage, "prompt_tokens", 0),
"completion_tokens": getattr(usage, "completion_tokens", 0),
"total_tokens": getattr(usage, "total_tokens", 0),
},
}
except Exception as e:
logger.error(f"[ZHIPU_AI] call_vision error: {e}")
return {"error": True, "message": str(e)}
def call_with_tools(self, messages, tools=None, stream=False, **kwargs):
"""
Call ZhipuAI API with tool support for agent integration

@@ -157,7 +157,6 @@ class CowCliPlugin(Plugin):
" /config 查看当前配置",
" /config <key> 查看某项配置",
" /config <key> <val> 修改配置",
" /install-browser 安装浏览器工具依赖",
"",
"💡 也可以用 cow <command> 代替 /<command>",
]
@@ -407,7 +406,7 @@ class CowCliPlugin(Plugin):
from common import const
_EXACT = {
"wenxin": const.BAIDU, "wenxin-4": const.BAIDU,
"xunfei": const.XUNFEI, const.QWEN: const.QWEN,
"xunfei": const.XUNFEI, const.QWEN: const.QWEN_DASHSCOPE,
const.MODELSCOPE: const.MODELSCOPE,
const.MOONSHOT: const.MOONSHOT,
"moonshot-v1-8k": const.MOONSHOT, "moonshot-v1-32k": const.MOONSHOT,
@@ -599,7 +598,7 @@ class CowCliPlugin(Plugin):
page = min(page, total_pages)
installed = set(load_skills_config().keys())
lines = [f"🌐 技能广场 (共 {total} 个技能)", ""]
lines = ["🌐 技能广场", ""]
for s in skills:
name = s.get("name", "")
display = s.get("display_name", "") or name
@@ -621,6 +620,7 @@ class CowCliPlugin(Plugin):
lines.append(f"💡 /skill list --remote --page {page - 1} 上一页")
lines.append("💡 /skill install <名称> 安装技能")
lines.append("💡 /skill search <关键词> 搜索技能")
lines.append("🌐 https://skills.cowagent.ai 在线浏览全部技能")
return "\n".join(lines)
def _skill_search(self, query: str) -> str:
@@ -695,11 +695,12 @@ class CowCliPlugin(Plugin):
lines = []
for skill_name in result.installed:
desc = _read_skill_description(os.path.join(skills_dir, skill_name))
display = config.get(skill_name, {}).get("display_name", "") or skill_name
label = f"{display} ({skill_name})" if display != skill_name else skill_name
lines.append(f"✅ 技能安装成功:{label}")
display = config.get(skill_name, {}).get("display_name", "")
lines.append(f"✅ 技能安装成功:{skill_name}")
if display and display != skill_name:
lines.append(f" 名称:{display}")
if desc:
lines.append(f" {desc}")
lines.append(f" 描述:{desc}")
if len(result.installed) > 1:
lines.append(f"\n共安装 {len(result.installed)} 个技能")

@@ -315,7 +315,7 @@ class Godcmd(Plugin):
except Exception as e:
ok, result = False, "你没有设置私有GPT模型"
elif cmd == "reset":
if bottype in [const.OPEN_AI, const.OPENAI, const.CHATGPT, const.CHATGPTONAZURE, const.LINKAI, const.BAIDU, const.XUNFEI, const.QWEN, const.GEMINI, const.ZHIPU_AI, const.CLAUDEAPI]:
if bottype in [const.OPEN_AI, const.OPENAI, const.CHATGPT, const.CHATGPTONAZURE, const.LINKAI, const.BAIDU, const.XUNFEI, const.QWEN, const.QWEN_DASHSCOPE, const.GEMINI, const.ZHIPU_AI, const.CLAUDEAPI]:
bot.sessions.clear_session(session_id)
if Bridge().chat_bots.get(bottype):
Bridge().chat_bots.get(bottype).sessions.clear_session(session_id)
@@ -341,7 +341,7 @@ class Godcmd(Plugin):
ok, result = True, "配置已重载"
elif cmd == "resetall":
if bottype in [const.OPEN_AI, const.OPENAI, const.CHATGPT, const.CHATGPTONAZURE, const.LINKAI,
const.BAIDU, const.XUNFEI, const.QWEN, const.GEMINI, const.ZHIPU_AI, const.MOONSHOT,
const.BAIDU, const.XUNFEI, const.QWEN, const.QWEN_DASHSCOPE, const.GEMINI, const.ZHIPU_AI, const.MOONSHOT,
const.MODELSCOPE]:
channel.cancel_all_session()
bot.sessions.clear_all_session()

@@ -4,9 +4,9 @@ build-backend = "setuptools.build_meta"
[project]
name = "cowagent"
version = "0.0.1"
version = "1.0.0"
description = "CowAgent - AI Agent on WeChat and more"
requires-python = ">=3.9"
requires-python = ">=3.7"
dependencies = [
"click>=8.0",
"requests>=2.28.2",

@@ -4,8 +4,6 @@ requests>=2.28.2
chardet>=5.1.0
Pillow
web.py
linkai>=0.0.6.0
agentmesh-sdk>=0.1.3
python-dotenv>=1.0.0
PyYAML>=6.0
croniter>=2.0.0

run.sh

@@ -271,7 +271,7 @@ select_model() {
echo -e "${YELLOW}2) Zhipu AI (glm-5-turbo, glm-5, etc.)${NC}"
echo -e "${YELLOW}3) Kimi (kimi-k2.5, kimi-k2, etc.)${NC}"
echo -e "${YELLOW}4) Doubao (doubao-seed-2-0-code-preview-260215, etc.)${NC}"
echo -e "${YELLOW}5) Qwen (qwen3.5-plus, qwen3-max, qwq-plus, etc.)${NC}"
echo -e "${YELLOW}5) Qwen (qwen3.6-plus, qwen3.5-plus, qwen3-max, qwq-plus, etc.)${NC}"
echo -e "${YELLOW}6) Claude (claude-sonnet-4-6, claude-opus-4-6, etc.)${NC}"
echo -e "${YELLOW}7) Gemini (gemini-3.1-flash-lite-preview, gemini-3.1-pro-preview, etc.)${NC}"
echo -e "${YELLOW}8) OpenAI GPT (gpt-5.4, gpt-5.2, gpt-4.1, etc.)${NC}"
@@ -318,7 +318,7 @@ configure_model() {
2) read_model_config "Zhipu AI" "glm-5-turbo" "ZHIPU_KEY" ;;
3) read_model_config "Kimi (Moonshot)" "kimi-k2.5" "MOONSHOT_KEY" ;;
4) read_model_config "Doubao (Volcengine Ark)" "doubao-seed-2-0-code-preview-260215" "ARK_KEY" ;;
5) read_model_config "Qwen (DashScope)" "qwen3.5-plus" "DASHSCOPE_KEY" ;;
5) read_model_config "Qwen (DashScope)" "qwen3.6-plus" "DASHSCOPE_KEY" ;;
6)
read_model_config "Claude" "claude-sonnet-4-6" "CLAUDE_KEY"
read_api_base "CLAUDE_BASE" "https://api.anthropic.com/v1"

@@ -154,7 +154,7 @@ $ModelChoices = @{
"2" = @{ Provider = "Zhipu AI"; Default = "glm-5-turbo"; Key = "ZHIPU_KEY" }
"3" = @{ Provider = "Kimi (Moonshot)"; Default = "kimi-k2.5"; Key = "MOONSHOT_KEY" }
"4" = @{ Provider = "Doubao (Volcengine Ark)"; Default = "doubao-seed-2-0-code-preview-260215"; Key = "ARK_KEY" }
"5" = @{ Provider = "Qwen (DashScope)"; Default = "qwen3.5-plus"; Key = "DASHSCOPE_KEY" }
"5" = @{ Provider = "Qwen (DashScope)"; Default = "qwen3.6-plus"; Key = "DASHSCOPE_KEY" }
"6" = @{ Provider = "Claude"; Default = "claude-sonnet-4-6"; Key = "CLAUDE_KEY"; Base = "https://api.anthropic.com/v1" }
"7" = @{ Provider = "Gemini"; Default = "gemini-3.1-pro-preview"; Key = "GEMINI_KEY"; Base = "https://generativelanguage.googleapis.com" }
"8" = @{ Provider = "OpenAI GPT"; Default = "gpt-5.4"; Key = "OPENAI_KEY"; Base = "https://api.openai.com/v1" }
@@ -169,7 +169,7 @@ function Select-Model {
Write-Host "2) Zhipu AI (glm-5-turbo, glm-5, etc.)"
Write-Host "3) Kimi (kimi-k2.5, kimi-k2, etc.)"
Write-Host "4) Doubao (doubao-seed-2-0-code-preview-260215, etc.)"
Write-Host "5) Qwen (qwen3.5-plus, qwen3-max, qwq-plus, etc.)"
Write-Host "5) Qwen (qwen3.6-plus, qwen3.5-plus, qwen3-max, qwq-plus, etc.)"
Write-Host "6) Claude (claude-sonnet-4-6, claude-opus-4-6, etc.)"
Write-Host "7) Gemini (gemini-3.1-flash-lite-preview, gemini-3.1-pro-preview, etc.)"
Write-Host "8) OpenAI GPT (gpt-5.4, gpt-5.2, gpt-4.1, etc.)"
@@ -453,7 +453,11 @@ function Update-Project {
Assert-Python
Install-Dependencies
Start-CowAgent
# Start via python -m cli.cli instead of cow.exe, because the exe may
# still be cached/locked from the previous installation on Windows.
Write-Cow "Starting CowAgent..."
& $PythonCmd -m cli.cli start
}
# ── main ──────────────────────────────────────────────────────────

@@ -1,124 +1,89 @@
# Skills Directory
# Skills
This directory contains skills for the COW agent system. Skills are markdown files that provide specialized instructions for specific tasks.
Skills are reusable instruction sets that extend the agent's capabilities. Each skill is a `SKILL.md` file in its own directory, providing specialized knowledge, workflows, and tool integrations for specific tasks.
## What are Skills?
## Skill Hub
Skills are reusable instruction sets that help the agent perform specific tasks more effectively. Each skill:
Browse, search, and install skills from [Cow Skill Hub](https://skills.cowagent.ai/).
- Provides context-specific guidance
- Documents best practices
- Includes examples and usage patterns
- Can have requirements (binaries, environment variables, etc.)
Open source: [github.com/zhayujie/cow-skill-hub](https://github.com/zhayujie/cow-skill-hub)
## Install Skills
Install skills from multiple sources via chat (`/skill`) or terminal (`cow skill`):
```bash
/skill install <name> # From Skill Hub
/skill install <owner>/<repo> # From GitHub
/skill install clawhub:<name> # From ClawHub
/skill install linkai:<code> # From LinkAI
/skill install <url> # From URL (zip or SKILL.md)
```
List all available remote skills:
```bash
/skill list --remote
```
## Manage Skills
```bash
/skill list # List installed skills
/skill info <name> # View skill details
/skill enable <name> # Enable a skill
/skill disable <name> # Disable a skill
/skill uninstall <name> # Uninstall a skill
```
> In terminal, replace `/skill` with `cow skill`.
## Skill Structure
Each skill is a markdown file (`SKILL.md`) in its own directory with frontmatter:
```
skills/
my-skill/
SKILL.md # Required: skill definition
scripts/ # Optional: bundled scripts
resources/ # Optional: reference files
```
`SKILL.md` uses YAML frontmatter:
```markdown
---
name: skill-name
name: my-skill
description: Brief description of what the skill does
metadata: {"cow":{"emoji":"🎯","requires":{"bins":["tool"]}}}
metadata: {"cow":{"emoji":"🔧","requires":{"bins":["tool"],"env":["API_KEY"]}}}
---
# Skill Name
# My Skill
Detailed instructions and examples...
Instructions, examples, and usage patterns...
```
## Available Skills
- **calculator**: Mathematical calculations and expressions
- **web-search**: Search the web for current information
- **file-operations**: Read, write, and manage files
## Creating Custom Skills
To create a new skill:
1. Create a directory: `skills/my-skill/`
2. Create `SKILL.md` with frontmatter and content
3. Restart the agent to load the new skill
### Frontmatter Fields
- `name`: Skill name (must match directory name)
- `description`: Brief description (required)
- `metadata`: JSON object with additional configuration
- `emoji`: Display emoji
- `always`: Always include this skill (default: false)
- `primaryEnv`: Primary environment variable needed
- `os`: Supported operating systems (e.g., ["darwin", "linux"])
- `requires`: Requirements object
- `bins`: Required binaries
- `env`: Required environment variables
- `config`: Required config paths
- `disable-model-invocation`: If true, skill won't be shown to model (default: false)
- `user-invocable`: If false, users can't invoke directly (default: true)
| Field | Description |
|---|---|
| `name` | Skill name (must match directory name) |
| `description` | Brief description (required) |
| `metadata.cow.emoji` | Display emoji |
| `metadata.cow.always` | Always include this skill (default: false) |
| `metadata.cow.requires.bins` | Required binaries |
| `metadata.cow.requires.env` | Required environment variables |
| `metadata.cow.requires.config` | Required config paths |
| `metadata.cow.os` | Supported OS (e.g., `["darwin", "linux"]`) |
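As a rough illustration of how the `metadata.cow.requires` fields could be read, here is a minimal sketch. It assumes the frontmatter is delimited by `---` lines and that `metadata` is a single-line JSON object, as in the example above; `read_requirements` is a hypothetical name, and the real loader may parse frontmatter differently.

```python
import json


def read_requirements(skill_md: str) -> dict:
    """Extract metadata.cow.requires from the text of a SKILL.md file.

    Hypothetical helper; assumes single-line JSON after "metadata:".
    """
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    for line in lines[1:]:
        if line.strip() == "---":  # end of frontmatter
            break
        key, sep, value = line.partition(":")
        if sep and key.strip() == "metadata":
            try:
                metadata = json.loads(value.strip())
            except json.JSONDecodeError:
                return {}
            if not isinstance(metadata, dict):
                return {}
            return metadata.get("cow", {}).get("requires", {})
    return {}
```

For the example frontmatter above, this would yield the `bins` and `env` lists that a loader could then check against the local environment.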
### Example Skill
## Skill Loading Order
```markdown
---
name: my-tool
description: Use my-tool to process data
metadata: {"cow":{"emoji":"🔧","requires":{"bins":["my-tool"],"env":["MY_TOOL_API_KEY"]}}}
---
Skills are loaded from two locations (higher precedence overrides lower):
# My Tool Skill
1. **Builtin skills** (lower): `<project_root>/skills/` — shipped with the codebase
2. **Custom skills** (higher): `~/cow/skills/` — installed via `cow skill install` or skill creator
Use this skill when you need to process data with my-tool.
Skills with the same name in the custom directory override builtin ones.
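The override rule can be sketched as a dictionary that is filled in precedence order. This is an illustration under assumptions, not the loader's actual code: `discover_skills` is a hypothetical name, and the directories are passed in rather than resolved from the project root or home directory.

```python
import os


def discover_skills(builtin_dir: str, custom_dir: str) -> dict:
    """Map skill name -> directory, letting custom skills override builtin ones.

    Hypothetical helper; a directory counts as a skill if it has a SKILL.md.
    """
    skills = {}
    # Lower precedence first: later directories overwrite earlier entries.
    for root in (builtin_dir, custom_dir):
        if not os.path.isdir(root):
            continue
        for name in sorted(os.listdir(root)):
            skill_dir = os.path.join(root, name)
            if os.path.isfile(os.path.join(skill_dir, "SKILL.md")):
                skills[name] = skill_dir
    return skills
```

Because the custom directory is scanned last, a same-named skill there replaces the builtin entry without any extra comparison logic.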
## Prerequisites
## Create & Contribute
- Install my-tool: `pip install my-tool`
- Set `MY_TOOL_API_KEY` environment variable
## Usage
\`\`\`python
# Example usage
my_tool_command("input data")
\`\`\`
```
## Skill Loading
Skills are loaded from multiple locations with precedence:
1. **Workspace skills** (highest): `workspace/skills/` - Project-specific skills
2. **Managed skills**: `~/.cow/skills/` - User-installed skills
3. **Bundled skills** (lowest): Built-in skills
Skills with the same name in higher-precedence locations override lower ones.
## Skill Requirements
Skills can specify requirements that determine when they're available:
- **OS requirements**: Only load on specific operating systems
- **Binary requirements**: Only load if required binaries are installed
- **Environment variables**: Only load if required env vars are set
- **Config requirements**: Only load if config values are set
## Best Practices
1. **Clear descriptions**: Write clear, concise skill descriptions
2. **Include examples**: Provide practical usage examples
3. **Document prerequisites**: List all requirements clearly
4. **Use appropriate metadata**: Set correct requirements and flags
5. **Keep skills focused**: Each skill should have a single, clear purpose
## Workspace Skills
You can create workspace-specific skills in your agent's workspace:
```
workspace/
skills/
custom-skill/
SKILL.md
```
These skills are only available when working in that specific workspace.
See the [Skill Creation docs](https://docs.cowagent.ai/skills/create) for details, or submit your skill to [Skill Hub](https://skills.cowagent.ai/submit).

@@ -1,258 +0,0 @@
# LinkAI Agent Skill
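This skill lets you call multiple apps and workflows on the LinkAI platform, integrating several agent capabilities through simple configuration.
## Features
- **Multi-app support** - Manage multiple LinkAI apps/workflows in a single config file
- **Dynamic loading** - The app list is read from `config.json` automatically when the skill system loads
- **Automatic skill description** - All configured apps are added to the skill description automatically
- **Model switching** - A different model can be specified for each request
- **Knowledge base integration** - Supports knowledge bases bound to an app
- **Plugin capabilities** - Supports the plugins enabled for an app
- **Workflow execution** - Supports running complex multi-step workflows
## Quick Start
### 1. Configure the API Key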
```bash
env_config(action="set", key="LINKAI_API_KEY", value="your-linkai-api-key")
```
Get an API key: https://link-ai.tech/console/interface
### 2. Configure the app list
Copy `config.json.template` to `config.json`:
```bash
cp config.json.template config.json
```
Edit `config.json` and add your apps/workflows:
```json
{
"apps": [
{
"app_code": "G7z6vKwp",
"app_name": "通用助手",
"app_description": "通用AI助手可以回答各类问题"
},
{
"app_code": "your_kb_app",
"app_name": "产品文档助手",
"app_description": "基于产品文档知识库的问答助手"
},
{
"app_code": "your_workflow",
"app_name": "数据分析工作流",
"app_description": "执行数据清洗、分析和可视化的完整工作流"
}
]
}
```
**Note:** After you modify `config.json`, the Agent automatically reads the new configuration the next time it loads skills.
### 3. Call an app
```bash
bash(command='curl -sS --max-time 120 -X POST "https://api.link-ai.tech/v1/chat/completions" -H "Content-Type: application/json" -H "Authorization: Bearer $LINKAI_API_KEY" -d "{\"app_code\":\"G7z6vKwp\",\"messages\":[{\"role\":\"user\",\"content\":\"What is artificial intelligence?\"}],\"stream\":false}"', timeout=130)
```
## Usage Examples
### Basic call
```bash
# Call the default model (via bash + curl)
bash(command='curl -sS --max-time 120 -X POST "https://api.link-ai.tech/v1/chat/completions" -H "Content-Type: application/json" -H "Authorization: Bearer $LINKAI_API_KEY" -d "{\"app_code\":\"G7z6vKwp\",\"messages\":[{\"role\":\"user\",\"content\":\"解释一下量子计算\"}],\"stream\":false}"', timeout=130)
```
### Specifying a model
Add a `model` field to the JSON body:
```json
{
"app_code": "G7z6vKwp",
"model": "LinkAI-4.1",
"messages": [{"role": "user", "content": "写一篇关于AI的文章"}],
"stream": false
}
```
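### Calling a workflow
A workflow's app_code is obtained from the LinkAI console, and it is called the same way as a regular app.
## ⚠️ Important Notes
### Timeout configuration
LinkAI apps (especially video/image generation and complex workflows) may take a long time to process. Add `--max-time 180` to the curl command and increase the bash tool's `timeout` parameter accordingly.
## Configuration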
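### config.json fields
| Field | Type | Description |
|------|------|------|
| `app_code` | string | Unique identifier of the app or workflow, obtained from the LinkAI console |
| `app_name` | string | App name, shown in the skill description |
| `app_description` | string | Description of what the app does, helping the Agent decide when to use it |
### Getting the app_code
1. Log in to the [LinkAI console](https://link-ai.tech/console)
2. Go to "App Management" or "Workflow Management"
3. Select the app/workflow to integrate
4. Find the `app_code` on the app details page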
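## App Types
### 1. Regular apps
Standard chat apps configured with a system prompt and parameters. They can:
- Define a role and personality
- Bind a knowledge base
- Enable plugins (image recognition, web search, code execution, etc.)
### 2. Knowledge base apps
Q&A apps built on a specific knowledge base, suitable for:
- Internal company knowledge bases
- Product documentation Q&A
- Customer support
### 3. Workflows
Multi-step automated processes. They can:
- Chain multiple processing nodes
- Branch on conditions
- Loop over items
- Call external APIs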
## Response Format
### Successful response
The API returns an OpenAI-compatible format; read the reply from `choices[0].message.content`:
```json
{
"choices": [{
"message": {
"role": "assistant",
"content": "人工智能AI是计算机科学的一个分支..."
}
}],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 150,
"total_tokens": 160
}
}
```
### Error response
```json
{
"error": {
"message": "应用不存在",
"code": "xxx"
}
}
```
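A caller handling both response shapes might look like the sketch below. It is a minimal illustration, assuming only the two JSON layouts shown above; `extract_reply` is a hypothetical name and is not part of the skill or of any LinkAI SDK.

```python
def extract_reply(response: dict) -> str:
    """Return the assistant text from an OpenAI-compatible response dict.

    Hypothetical helper: raises RuntimeError when the body carries an
    "error" object, otherwise reads choices[0].message.content.
    """
    if "error" in response:
        err = response["error"]
        message = err.get("message", str(err)) if isinstance(err, dict) else str(err)
        raise RuntimeError(f"LinkAI error: {message}")
    choices = response.get("choices") or [{}]
    return choices[0].get("message", {}).get("content", "")
```

Checking for `error` before touching `choices` avoids masking an API failure as an empty reply.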
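## Common Errors
### LINKAI_API_KEY environment variable is not set
**Cause:** The API key is not configured
**Fix:** Set LINKAI_API_KEY with the `env_config` tool
### App does not exist (402)
**Cause:** The app_code is incorrect or the app has been deleted
**Fix:** Check the app_code and confirm that the app exists
### No access permission (403)
**Cause:** Attempting to access someone else's private app
**Fix:** Make sure the app is public or that you are its creator
### Insufficient account credits (406)
**Cause:** The LinkAI account balance is too low
**Fix:** Top up in the console
### Content moderation failed (409)
**Cause:** The request or response contains sensitive content
**Fix:** Revise the input to avoid sensitive terms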
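## Technical Implementation
### Automatic skill description generation
When the skill system loads `linkai-agent`, it automatically:
1. Reads the app list from `config.json`
2. Adds each app's name and description to the skill description
3. Exposes the complete app list to the Agent when it loads
This is implemented as special handling in `agent/skills/loader.py`.
### Workflow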
```
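User configures config.json
Agent starts / reloads skills
SkillLoader detects linkai-agent
config.json is read dynamically
A description covering all apps is generated
Agent sees full information on all available apps
User request arrives
Agent picks a suitable app based on the descriptions
LinkAI API is called via bash + curl
LinkAI API processes the request and returns the result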
```
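## Best practices
1. **Clear descriptions** - Write a clear, specific description for each app so the Agent understands what it is for
2. **Sensible division of labor** - Give each app its own domain and avoid overlapping functionality
3. **No restart needed** - After you edit config.json, the Agent picks up the changes the next time it loads skills
4. **Model selection** - Choose a model that matches the complexity of the task
5. **Knowledge base tuning** - Bind relevant knowledge bases to apps that serve specialized domains
## Extended usage
### Using the skill in the Agent system
When the Agent system loads this skill, it automatically reads the app list from `config.json` and generates a description: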
```
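Call LinkAI apps/workflows. General assistant(G7z6vKwp: general-purpose AI assistant that can answer all kinds of questions); Product docs assistant(kb_app_001: Q&A assistant based on the product documentation knowledge base); Data analysis workflow(wf_002: complete workflow that performs data cleaning, analysis, and visualization)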
```
The Agent automatically picks the most suitable app for the user's question and calls it.
## Related links
- LinkAI platform: https://link-ai.tech
- API documentation: https://docs.link-ai.tech
- Console: https://link-ai.tech/console
- Model list: https://link-ai.tech/console/models
- App square: https://link-ai.tech/square
## License
Part of the chatgpt-on-wechat project.

@@ -1,85 +0,0 @@
---
name: linkai-agent
description: Call LinkAI applications and workflows. Use bash with curl to invoke the chat completions API.
homepage: https://link-ai.tech
metadata:
emoji: 🤖
default_enabled: false
requires:
bins: ["curl"]
env: ["LINKAI_API_KEY"]
---
# LinkAI Agent
Call LinkAI applications and workflows through the chat completions API. Available apps are loaded from config.json.
## Setup
This skill requires a LinkAI API key.
1. Get your API key from [LinkAI Console](https://link-ai.tech/console/interface)
2. Set the environment variable: `export LINKAI_API_KEY=Link_xxxxxxxxxxxx` (or use env_config tool)
## Configuration
1. Copy `config.json.template` to `config.json`
2. Add your apps/workflows in config.json. The skill description is auto-generated from this config when loaded.
## Usage
Use the bash tool with curl to call the API. **Prefer curl** to avoid encoding issues on Windows PowerShell.
```bash
curl -X POST "https://api.link-ai.tech/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LINKAI_API_KEY" \
  -d '{
    "app_code": "<app_code>",
    "messages": [{"role": "user", "content": "<question>"}],
    "stream": false
  }'
```
**Optional parameters**:
- Add `--max-time 120` to curl for long-running tasks (video/image generation)
**On Windows cmd**: Use `%LINKAI_API_KEY%` instead of `$LINKAI_API_KEY`.
**Example** (via bash tool):
```bash
bash(command='curl -sS --max-time 120 -X POST "https://api.link-ai.tech/v1/chat/completions" -H "Content-Type: application/json" -H "Authorization: Bearer $LINKAI_API_KEY" -d "{\"app_code\":\"G7z6vKwp\",\"messages\":[{\"role\":\"user\",\"content\":\"What is AI?\"}],\"stream\":false}"', timeout=130)
```
## Response
Success (extract `choices[0].message.content` from JSON):
```json
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "AI stands for Artificial Intelligence..."
    }
  }],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 50,
    "total_tokens": 60
  }
}
```
Error:
```json
{
  "error": {
    "message": "Error description",
    "code": "error_code"
  }
}
```

@@ -1,14 +0,0 @@
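{
  "apps": [
    {
      "app_code": "G7z6vKwp",
      "app_name": "LinkAI customer service assistant",
      "app_description": "Select this assistant only when the user has questions about the LinkAI platform; it answers from the LinkAI knowledge base"
    },
    {
      "app_code": "SFY5x7JR",
      "app_name": "Content creation assistant",
      "app_description": "Use this assistant only when the user needs to create images or videos; it supports multiple models including Nano Banana, Seedream, Jimeng, Veo, and Kling"
    }
  ]
}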