mirror of
https://github.com/zhayujie/chatgpt-on-wechat.git
synced 2026-06-02 18:17:11 +08:00
Compare commits
36 Commits
2.0.5
...
feat-knowl
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
5162da5654 | ||
|
|
a1d82f6193 | ||
|
|
ea78e3d0c6 | ||
|
|
3497f00cb4 | ||
|
|
5355d45031 | ||
|
|
26693acc3f | ||
|
|
76e9fef3b2 | ||
|
|
c34308cbd4 | ||
|
|
5a10476010 | ||
|
|
46e80dceec | ||
|
|
90d1835353 | ||
|
|
845fadd0aa | ||
|
|
5748ded52c | ||
|
|
6a737fb734 | ||
|
|
3cd92ccda3 | ||
|
|
54e81aba11 | ||
|
|
d86cb4ded6 | ||
|
|
4d5375f6d6 | ||
|
|
424557fedb | ||
|
|
89251e603f | ||
|
|
a653ed07eb | ||
|
|
ad86deb014 | ||
|
|
9525dc7584 | ||
|
|
cd31dd27fd | ||
|
|
360e3670eb | ||
|
|
8dabe3b4c8 | ||
|
|
443e0c2806 | ||
|
|
9cc173cc4d | ||
|
|
b5f33e5ecd | ||
|
|
40dfc6860f | ||
|
|
1c02a04423 | ||
|
|
de0e45070c | ||
|
|
c169cc7d74 | ||
|
|
cd62ad76f6 | ||
|
|
dd25b0fb5b | ||
|
|
a38b22a6a2 |
28
README.md
28
README.md
@@ -7,7 +7,7 @@
|
||||
[中文] | [<a href="docs/en/README.md">English</a>] | [<a href="docs/ja/README.md">日本語</a>]
|
||||
</p>
|
||||
|
||||
**CowAgent** 是基于大模型的超级 AI 助理,能够主动思考和任务规划、操作计算机和外部资源、创造和执行 Skills、拥有长期记忆并不断成长,比 OpenClaw 更轻量和便捷。CowAgent 支持灵活切换多种模型,能处理文本、语音、图片、文件等多模态消息,可接入微信、飞书、钉钉、企微智能机器人、QQ、企微自建应用、微信公众号、网页中使用,7*24小时运行于你的个人电脑或服务器中。
|
||||
**CowAgent** 是基于大模型的超级 AI 助理,能够主动思考和任务规划、操作计算机和外部资源、创造和执行 Skills、拥有长期记忆和知识库并不断成长,比 OpenClaw 更轻量和便捷。CowAgent 支持灵活切换多种模型,能处理文本、语音、图片、文件等多模态消息,可接入微信、飞书、钉钉、企微智能机器人、QQ、企微自建应用、微信公众号、网页中使用,7*24小时运行于你的个人电脑或服务器中。
|
||||
|
||||
<p align="center">
|
||||
<a href="https://cowagent.ai/">🌐 官网</a> ·
|
||||
@@ -24,6 +24,7 @@
|
||||
|
||||
- ✅ **自主任务规划**:能够理解复杂任务并自主规划执行,持续思考和调用工具直到完成目标
|
||||
- ✅ **长期记忆:** 自动将对话记忆持久化至本地文件和数据库中,包括核心记忆和日级记忆,支持关键词及向量检索
|
||||
- ✅ **个人知识库:** 自动整理结构化知识,通过交叉引用构建知识图谱,支持通过对话管理和可视化浏览知识库
|
||||
- ✅ **技能系统:** Skills 安装和运行的引擎,支持从 [Skill Hub](https://skills.cowagent.ai/)、GitHub 等一键安装技能,或通过对话创造 Skills
|
||||
- ✅ **工具系统:** 内置文件读写、终端执行、浏览器操作、定时任务等工具,Agent 自主调用以完成复杂任务
|
||||
- ✅ **CLI系统:** 提供终端命令和对话命令,支持进程管理、技能安装、配置修改等操作
|
||||
@@ -101,7 +102,7 @@ bash <(curl -fsSL https://cdn.link-ai.tech/code/cow/run.sh)
|
||||
irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex
|
||||
```
|
||||
|
||||
脚本使用说明:[一键运行脚本](https://docs.cowagent.ai/guide/quick-start)。安装后可使用 `cow start`、`cow stop` 等 [CLI 命令](https://docs.cowagent.ai/commands/index) 管理服务。
|
||||
脚本使用说明:[一键运行脚本](https://docs.cowagent.ai/guide/quick-start)。安装后可使用 `cow start`、`cow stop` 等 [CLI 命令](https://docs.cowagent.ai/cli/index) 管理服务。
|
||||
|
||||
|
||||
## 一、准备
|
||||
@@ -116,7 +117,7 @@ irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex
|
||||
|
||||
### 2.环境安装
|
||||
|
||||
支持 Linux、MacOS、Windows 操作系统,可在个人计算机及服务器上运行,需安装 `Python`,Python 版本需在3.7 ~ 3.12 之间,推荐使用3.9版本。
|
||||
支持 Linux、MacOS、Windows 操作系统,可在个人计算机及服务器上运行,需安装 `Python`,Python 版本需在3.7 ~ 3.12 之间。
|
||||
|
||||
> 注意:Agent 模式推荐使用源码运行,若选择 Docker 部署则无需安装 python 环境和下载源码,可直接快进到下一节。
|
||||
|
||||
@@ -151,7 +152,7 @@ pip3 install -r requirements-optional.txt
|
||||
pip3 install -e .
|
||||
```
|
||||
|
||||
安装后可使用 `cow` 命令管理服务(启动、停止、更新等)和技能,详见 [命令文档](https://docs.cowagent.ai/commands/index)。
|
||||
安装后可使用 `cow` 命令管理服务(启动、停止、更新等)和技能,详见 [命令文档](https://docs.cowagent.ai/cli/index)。
|
||||
|
||||
**(5) 安装浏览器工具 (可选):**
|
||||
|
||||
@@ -213,12 +214,13 @@ cow install-browser
|
||||
+ 添加 `"speech_recognition": true` 将开启语音识别,默认使用 openai 的 whisper 模型识别为文字,同时以文字回复,该参数仅支持私聊 (注意由于语音消息无法匹配前缀,一旦开启将对所有语音自动回复,支持语音触发画图);
|
||||
+ 添加 `"group_speech_recognition": true` 将开启群组语音识别,默认使用 openai 的 whisper 模型识别为文字,同时以文字回复,参数仅支持群聊 (会匹配 group_chat_prefix 和 group_chat_keyword, 支持语音触发画图);
|
||||
+ 添加 `"voice_reply_voice": true` 将开启语音回复语音(同时作用于私聊和群聊)
|
||||
+ 使用 MiniMax TTS:设置 `"text_to_voice": "minimax"`,并配置 `minimax_api_key`;可通过 `"tts_voice_id"` 指定发音人(如 `English_Graceful_Lady`),`"text_to_voice_model"` 指定模型(如 `speech-2.8-hd`、`speech-2.8-turbo`)
|
||||
</details>
|
||||
|
||||
<details>
|
||||
<summary>2. 其他配置</summary>
|
||||
|
||||
+ `model`: 模型名称,Agent 模式下推荐使用 `MiniMax-M2.7`、`glm-5-turbo`、`kimi-k2.5`、`qwen3.5-plus`、`claude-sonnet-4-6`、`gemini-3.1-pro-preview`,全部模型名称参考[common/const.py](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/common/const.py)文件
|
||||
+ `model`: 模型名称,Agent 模式下推荐使用 `MiniMax-M2.7`、`glm-5-turbo`、`kimi-k2.5`、`qwen3.6-plus`、`claude-sonnet-4-6`、`gemini-3.1-pro-preview`,全部模型名称参考[common/const.py](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/common/const.py)文件
|
||||
+ `character_desc`:普通对话模式下的机器人系统提示词。在 Agent 模式下该配置不生效,由工作空间中的文件内容构成。
|
||||
+ `subscribe_msg`:订阅消息,公众号和企业微信 channel 中请填写,当被订阅时会自动回复, 可使用特殊占位符。目前支持的占位符有{trigger_prefix},在程序中它会自动替换成 bot 的触发词。
|
||||
</details>
|
||||
@@ -303,7 +305,7 @@ sudo docker logs -f chatgpt-on-wechat
|
||||
|
||||
## 模型说明
|
||||
|
||||
以下对所有可支持的模型的配置和使用方法进行说明,模型接口实现在项目的 `models/` 目录下。
|
||||
推荐通过 Web 控制台在线管理模型配置,无需手动编辑文件,详见 [模型文档](https://docs.cowagent.ai/models)。以下是手动修改 `config.json` 配置模型的说明:
|
||||
|
||||
<details>
|
||||
<summary>OpenAI</summary>
|
||||
@@ -357,7 +359,7 @@ sudo docker logs -f chatgpt-on-wechat
|
||||
"minimax_api_key": ""
|
||||
}
|
||||
```
|
||||
- `model`: 可填写 `MiniMax-M2.7、MiniMax-M2.5、MiniMax-M2.1、MiniMax-M2.1-lightning、MiniMax-M2、abab6.5-chat` 等
|
||||
- `model`: 可填写 `MiniMax-M2.7、MiniMax-M2.7-highspeed、MiniMax-M2.5、MiniMax-M2.1、MiniMax-M2.1-lightning、MiniMax-M2、abab6.5-chat` 等
|
||||
- `minimax_api_key`:MiniMax 平台的 API-KEY,在 [控制台](https://platform.minimaxi.com/user-center/basic-information/interface-key) 创建
|
||||
|
||||
方式二:OpenAI 兼容方式接入,配置如下:
|
||||
@@ -370,7 +372,7 @@ sudo docker logs -f chatgpt-on-wechat
|
||||
}
|
||||
```
|
||||
- `bot_type`: OpenAI 兼容方式
|
||||
- `model`: 可填 `MiniMax-M2.7、MiniMax-M2.5、MiniMax-M2.1、MiniMax-M2.1-lightning、MiniMax-M2`,参考[API文档](https://platform.minimaxi.com/document/%E5%AF%B9%E8%AF%9D?key=66701d281d57f38758d581d0#QklxsNSbaf6kM4j6wjO5eEek)
|
||||
- `model`: 可填 `MiniMax-M2.7、MiniMax-M2.7-highspeed、MiniMax-M2.5、MiniMax-M2.1、MiniMax-M2.1-lightning、MiniMax-M2`,参考[API文档](https://platform.minimaxi.com/document/%E5%AF%B9%E8%AF%9D?key=66701d281d57f38758d581d0#QklxsNSbaf6kM4j6wjO5eEek)
|
||||
- `open_ai_api_base`: MiniMax 平台 API 的 BASE URL
|
||||
- `open_ai_api_key`: MiniMax 平台的 API-KEY
|
||||
</details>
|
||||
@@ -411,18 +413,18 @@ sudo docker logs -f chatgpt-on-wechat
|
||||
|
||||
```json
|
||||
{
|
||||
"model": "qwen3.5-plus",
|
||||
"model": "qwen3.6-plus",
|
||||
"dashscope_api_key": "sk-qVxxxxG"
|
||||
}
|
||||
```
|
||||
- `model`: 可填写 `qwen3.5-plus、qwen3-max、qwen-max、qwen-plus、qwen-turbo、qwen-long、qwq-plus` 等
|
||||
- `dashscope_api_key`: 通义千问的 API-KEY,参考 [官方文档](https://bailian.console.aliyun.com/?tab=api#/api) ,在 [控制台](https://bailian.console.aliyun.com/?tab=model#/api-key) 创建
|
||||
- `model`: 可填写 `qwen3.6-plus、qwen3.5-plus、qwen3-max、qwen-max、qwen-plus、qwen-turbo、qwen-long、qwq-plus` 等
|
||||
- `dashscope_api_key`: 通义千问的 API-KEY,参考 [官方文档](https://bailian.console.aliyun.com/?tab=api#/api) ,在 [百炼控制台](https://bailian.console.aliyun.com/?tab=model#/api-key) 创建
|
||||
|
||||
方式二:OpenAI 兼容方式接入,配置如下:
|
||||
```json
|
||||
{
|
||||
"bot_type": "openai",
|
||||
"model": "qwen3.5-plus",
|
||||
"model": "qwen3.6-plus",
|
||||
"open_ai_api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
|
||||
"open_ai_api_key": "sk-qVxxxxG"
|
||||
}
|
||||
@@ -674,7 +676,7 @@ Coding Plan 是各厂商推出的编程包月套餐,所有厂商均可通过 O
|
||||
|
||||
## 通道说明
|
||||
|
||||
以下对可接入通道的配置方式进行说明,应用通道代码在项目的 `channel/` 目录下。
|
||||
推荐通过 Web 控制台在线管理通道配置,无需手动编辑文件,详见 [通道文档](https://docs.cowagent.ai/channels/weixin)。以下为手动修改 `config.json` 配置通道的说明:
|
||||
|
||||
支持同时可接入多个通道,配置时可通过逗号进行分割,例如 `"channel_type": "feishu,dingtalk"`。
|
||||
|
||||
|
||||
@@ -57,7 +57,16 @@ class ChatService:
|
||||
event_type = event.get("type")
|
||||
data = event.get("data", {})
|
||||
|
||||
if event_type == "message_update":
|
||||
if event_type == "reasoning_update":
|
||||
delta = data.get("delta", "")
|
||||
if delta:
|
||||
send_chunk_fn({
|
||||
"chunk_type": "reasoning",
|
||||
"delta": delta,
|
||||
"segment_id": state.segment_id,
|
||||
})
|
||||
|
||||
elif event_type == "message_update":
|
||||
# Incremental text delta
|
||||
delta = data.get("delta", "")
|
||||
if delta:
|
||||
|
||||
0
agent/knowledge/__init__.py
Normal file
0
agent/knowledge/__init__.py
Normal file
218
agent/knowledge/service.py
Normal file
218
agent/knowledge/service.py
Normal file
@@ -0,0 +1,218 @@
|
||||
"""
|
||||
Knowledge service for handling knowledge base operations.
|
||||
|
||||
Provides a unified interface for listing, reading, and graphing knowledge files,
|
||||
callable from the web console, API, or CLI.
|
||||
|
||||
Knowledge file layout (under workspace_root):
|
||||
knowledge/index.md
|
||||
knowledge/log.md
|
||||
knowledge/<category>/<slug>.md
|
||||
"""
|
||||
|
||||
import os
|
||||
import re
|
||||
from pathlib import Path
|
||||
from typing import Optional
|
||||
|
||||
from common.log import logger
|
||||
from config import conf
|
||||
|
||||
|
||||
class KnowledgeService:
|
||||
"""
|
||||
High-level service for knowledge base queries.
|
||||
Operates directly on the filesystem.
|
||||
"""
|
||||
|
||||
def __init__(self, workspace_root: str):
|
||||
self.workspace_root = workspace_root
|
||||
self.knowledge_dir = os.path.join(workspace_root, "knowledge")
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# list — directory tree with stats
|
||||
# ------------------------------------------------------------------
|
||||
def list_tree(self) -> dict:
|
||||
"""
|
||||
Return the knowledge directory tree grouped by category.
|
||||
|
||||
Returns::
|
||||
|
||||
{
|
||||
"tree": [
|
||||
{
|
||||
"dir": "concepts",
|
||||
"files": [
|
||||
{"name": "moe.md", "title": "MoE", "size": 1234},
|
||||
...
|
||||
]
|
||||
},
|
||||
...
|
||||
],
|
||||
"stats": {"pages": 15, "size": 32768},
|
||||
"enabled": true
|
||||
}
|
||||
"""
|
||||
if not os.path.isdir(self.knowledge_dir):
|
||||
return {"tree": [], "stats": {"pages": 0, "size": 0}, "enabled": conf().get("knowledge", True)}
|
||||
|
||||
tree = []
|
||||
total_files = 0
|
||||
total_bytes = 0
|
||||
for name in sorted(os.listdir(self.knowledge_dir)):
|
||||
full = os.path.join(self.knowledge_dir, name)
|
||||
if not os.path.isdir(full) or name.startswith("."):
|
||||
continue
|
||||
files = []
|
||||
for fname in sorted(os.listdir(full)):
|
||||
if fname.endswith(".md") and not fname.startswith("."):
|
||||
fpath = os.path.join(full, fname)
|
||||
size = os.path.getsize(fpath)
|
||||
total_files += 1
|
||||
total_bytes += size
|
||||
title = fname.replace(".md", "")
|
||||
try:
|
||||
with open(fpath, "r", encoding="utf-8") as f:
|
||||
first_line = f.readline().strip()
|
||||
if first_line.startswith("# "):
|
||||
title = first_line[2:].strip()
|
||||
except Exception:
|
||||
pass
|
||||
files.append({"name": fname, "title": title, "size": size})
|
||||
tree.append({"dir": name, "files": files})
|
||||
|
||||
return {
|
||||
"tree": tree,
|
||||
"stats": {"pages": total_files, "size": total_bytes},
|
||||
"enabled": conf().get("knowledge", True),
|
||||
}
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# read — single file content
|
||||
# ------------------------------------------------------------------
|
||||
def read_file(self, rel_path: str) -> dict:
|
||||
"""
|
||||
Read a single knowledge markdown file.
|
||||
|
||||
:param rel_path: Relative path within knowledge/, e.g. ``concepts/moe.md``
|
||||
:return: dict with ``content`` and ``path``
|
||||
:raises ValueError: if path is invalid or escapes knowledge dir
|
||||
:raises FileNotFoundError: if file does not exist
|
||||
"""
|
||||
if not rel_path or ".." in rel_path:
|
||||
raise ValueError("invalid path")
|
||||
|
||||
full_path = os.path.normpath(os.path.join(self.knowledge_dir, rel_path))
|
||||
allowed = os.path.normpath(self.knowledge_dir)
|
||||
if not full_path.startswith(allowed + os.sep) and full_path != allowed:
|
||||
raise ValueError("path outside knowledge dir")
|
||||
|
||||
if not os.path.isfile(full_path):
|
||||
raise FileNotFoundError(f"file not found: {rel_path}")
|
||||
|
||||
with open(full_path, "r", encoding="utf-8") as f:
|
||||
content = f.read()
|
||||
return {"content": content, "path": rel_path}
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# graph — nodes and links for visualization
|
||||
# ------------------------------------------------------------------
|
||||
def build_graph(self) -> dict:
|
||||
"""
|
||||
Parse all knowledge pages and extract cross-reference links.
|
||||
|
||||
Returns::
|
||||
|
||||
{
|
||||
"nodes": [
|
||||
{"id": "concepts/moe.md", "label": "MoE", "category": "concepts"},
|
||||
...
|
||||
],
|
||||
"links": [
|
||||
{"source": "concepts/moe.md", "target": "entities/deepseek.md"},
|
||||
...
|
||||
]
|
||||
}
|
||||
"""
|
||||
knowledge_path = Path(self.knowledge_dir)
|
||||
if not knowledge_path.is_dir():
|
||||
return {"nodes": [], "links": []}
|
||||
|
||||
nodes = {}
|
||||
links = []
|
||||
link_re = re.compile(r'\[([^\]]*)\]\(([^)]+\.md)\)')
|
||||
|
||||
for md_file in knowledge_path.rglob("*.md"):
|
||||
rel = str(md_file.relative_to(knowledge_path))
|
||||
if rel in ("index.md", "log.md"):
|
||||
continue
|
||||
parts = rel.split("/")
|
||||
category = parts[0] if len(parts) > 1 else "root"
|
||||
title = md_file.stem.replace("-", " ").title()
|
||||
try:
|
||||
content = md_file.read_text(encoding="utf-8")
|
||||
first_line = content.strip().split("\n")[0]
|
||||
if first_line.startswith("# "):
|
||||
title = first_line[2:].strip()
|
||||
for _, link_target in link_re.findall(content):
|
||||
resolved = (md_file.parent / link_target).resolve()
|
||||
try:
|
||||
target_rel = str(resolved.relative_to(knowledge_path))
|
||||
except ValueError:
|
||||
continue
|
||||
if target_rel != rel:
|
||||
links.append({"source": rel, "target": target_rel})
|
||||
except Exception:
|
||||
pass
|
||||
nodes[rel] = {"id": rel, "label": title, "category": category}
|
||||
|
||||
valid_ids = set(nodes.keys())
|
||||
links = [l for l in links if l["source"] in valid_ids and l["target"] in valid_ids]
|
||||
seen = set()
|
||||
deduped = []
|
||||
for l in links:
|
||||
key = tuple(sorted([l["source"], l["target"]]))
|
||||
if key not in seen:
|
||||
seen.add(key)
|
||||
deduped.append(l)
|
||||
|
||||
return {"nodes": list(nodes.values()), "links": deduped}
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# dispatch — single entry point for protocol messages
|
||||
# ------------------------------------------------------------------
|
||||
def dispatch(self, action: str, payload: Optional[dict] = None) -> dict:
|
||||
"""
|
||||
Dispatch a knowledge management action.
|
||||
|
||||
:param action: ``list``, ``read``, or ``graph``
|
||||
:param payload: action-specific payload
|
||||
:return: protocol-compatible response dict
|
||||
"""
|
||||
payload = payload or {}
|
||||
try:
|
||||
if action == "list":
|
||||
result = self.list_tree()
|
||||
return {"action": action, "code": 200, "message": "success", "payload": result}
|
||||
|
||||
elif action == "read":
|
||||
path = payload.get("path")
|
||||
if not path:
|
||||
return {"action": action, "code": 400, "message": "path is required", "payload": None}
|
||||
result = self.read_file(path)
|
||||
return {"action": action, "code": 200, "message": "success", "payload": result}
|
||||
|
||||
elif action == "graph":
|
||||
result = self.build_graph()
|
||||
return {"action": action, "code": 200, "message": "success", "payload": result}
|
||||
|
||||
else:
|
||||
return {"action": action, "code": 400, "message": f"unknown action: {action}", "payload": None}
|
||||
|
||||
except ValueError as e:
|
||||
return {"action": action, "code": 403, "message": str(e), "payload": None}
|
||||
except FileNotFoundError as e:
|
||||
return {"action": action, "code": 404, "message": str(e), "payload": None}
|
||||
except Exception as e:
|
||||
logger.error(f"[KnowledgeService] dispatch error: action={action}, error={e}")
|
||||
return {"action": action, "code": 500, "message": str(e), "payload": None}
|
||||
@@ -188,8 +188,9 @@ def _group_into_display_turns(
|
||||
if text:
|
||||
turns.append({"role": "user", "content": text, "created_at": created_at})
|
||||
|
||||
# Collect all tool_calls and tool_results from the rest of the group
|
||||
all_tool_calls: List[Dict[str, Any]] = []
|
||||
# Build an ordered list of steps preserving the original sequence:
|
||||
# thinking → content → tool_call → content → ...
|
||||
steps: List[Dict[str, Any]] = []
|
||||
tool_results: Dict[str, str] = {}
|
||||
final_text = ""
|
||||
final_ts: Optional[int] = None
|
||||
@@ -198,24 +199,46 @@ def _group_into_display_turns(
|
||||
if role == "user":
|
||||
tool_results.update(_extract_tool_results(content))
|
||||
elif role == "assistant":
|
||||
tcs = _extract_tool_calls(content)
|
||||
all_tool_calls.extend(tcs)
|
||||
t = _extract_display_text(content)
|
||||
if t:
|
||||
final_text = t
|
||||
# Walk content blocks in order to preserve interleaving
|
||||
if isinstance(content, list):
|
||||
for block in content:
|
||||
if not isinstance(block, dict):
|
||||
continue
|
||||
btype = block.get("type")
|
||||
if btype == "thinking":
|
||||
txt = block.get("thinking", "").strip()
|
||||
if txt:
|
||||
steps.append({"type": "thinking", "content": txt})
|
||||
elif btype == "text":
|
||||
txt = block.get("text", "").strip()
|
||||
if txt:
|
||||
steps.append({"type": "content", "content": txt})
|
||||
final_text = txt
|
||||
elif btype == "tool_use":
|
||||
steps.append({
|
||||
"type": "tool",
|
||||
"id": block.get("id", ""),
|
||||
"name": block.get("name", ""),
|
||||
"arguments": block.get("input", {}),
|
||||
})
|
||||
elif isinstance(content, str) and content.strip():
|
||||
steps.append({"type": "content", "content": content.strip()})
|
||||
final_text = content.strip()
|
||||
final_ts = created_at
|
||||
|
||||
# Attach tool results to their matching tool_call entries
|
||||
for tc in all_tool_calls:
|
||||
tc["result"] = tool_results.get(tc.get("id", ""), "")
|
||||
# Attach tool results to tool steps
|
||||
for step in steps:
|
||||
if step["type"] == "tool":
|
||||
step["result"] = tool_results.get(step.get("id", ""), "")
|
||||
|
||||
if final_text or all_tool_calls:
|
||||
turns.append({
|
||||
if steps or final_text:
|
||||
turn = {
|
||||
"role": "assistant",
|
||||
"content": final_text,
|
||||
"tool_calls": all_tool_calls,
|
||||
"steps": steps,
|
||||
"created_at": final_ts or (user_row[1] if user_row else 0),
|
||||
})
|
||||
}
|
||||
turns.append(turn)
|
||||
|
||||
return turns
|
||||
|
||||
@@ -312,6 +335,9 @@ class ConversationStore:
|
||||
content = json.loads(raw_content)
|
||||
except Exception:
|
||||
content = raw_content
|
||||
# Strip thinking blocks — they are stored for UI display only
|
||||
if role == "assistant" and isinstance(content, list):
|
||||
content = [b for b in content if b.get("type") != "thinking"]
|
||||
result.append({"role": role, "content": content})
|
||||
return result
|
||||
|
||||
|
||||
@@ -285,6 +285,10 @@ class MemoryManager:
|
||||
# Scan memory directory (including daily summaries)
|
||||
if memory_dir.exists():
|
||||
for file_path in memory_dir.rglob("*.md"):
|
||||
# Skip hidden directories (e.g. .dreams/)
|
||||
if any(part.startswith('.') for part in file_path.relative_to(workspace_dir).parts):
|
||||
continue
|
||||
|
||||
# Determine scope and user_id from path
|
||||
rel_path = file_path.relative_to(workspace_dir)
|
||||
parts = rel_path.parts
|
||||
@@ -312,6 +316,14 @@ class MemoryManager:
|
||||
scope = "shared"
|
||||
|
||||
await self._sync_file(file_path, "memory", scope, user_id)
|
||||
|
||||
# Scan knowledge directory (structured knowledge wiki)
|
||||
from config import conf
|
||||
if conf().get("knowledge", True):
|
||||
knowledge_dir = Path(workspace_dir) / "knowledge"
|
||||
if knowledge_dir.exists():
|
||||
for file_path in knowledge_dir.rglob("*.md"):
|
||||
await self._sync_file(file_path, "knowledge", "shared", None)
|
||||
|
||||
self._dirty = False
|
||||
|
||||
|
||||
@@ -1,9 +1,10 @@
|
||||
"""
|
||||
Memory flush manager
|
||||
Memory flush manager (with Light Dream)
|
||||
|
||||
Handles memory persistence when conversation context is trimmed or overflows:
|
||||
- Uses LLM to summarize discarded messages into concise key-information entries
|
||||
- Writes to daily memory files (lazy creation)
|
||||
- Light Dream: extracts long-term memories to MEMORY.md in the same LLM call
|
||||
- Deduplicates trim flushes to avoid repeated writes
|
||||
- Runs summarization asynchronously to avoid blocking normal replies
|
||||
- Provides daily summary interface for scheduler
|
||||
@@ -16,26 +17,41 @@ from datetime import datetime
|
||||
from common.log import logger
|
||||
|
||||
|
||||
SUMMARIZE_SYSTEM_PROMPT = """你是一个记忆提取助手。你的任务是从对话记录中提炼出值得长期记住的关键事件和核心信息。
|
||||
SUMMARIZE_SYSTEM_PROMPT = """你是一个记忆提取助手。你的任务是从对话记录中提炼出两种记忆:
|
||||
|
||||
核心原则:
|
||||
- 按「事件」维度归纳,而不是按对话轮次逐条记录
|
||||
- 多轮对话如果围绕同一件事,合并为一条摘要
|
||||
- 只记录有长期价值的信息,忽略闲聊、问候、无意义的短消息
|
||||
## 第一部分:日常记录([DAILY])
|
||||
|
||||
输出要求:
|
||||
1. 每条一行,用 "- " 开头,格式为:事件/主题 + 关键结论或结果
|
||||
2. 值得记录的信息类型:用户提出的需求及最终解决方案、重要的事实信息、用户的偏好或决策、关键技术方案或配置变更
|
||||
3. 不值得记录的信息:简单问候、闲聊、无实质内容的短消息、重复的中间过程
|
||||
4. 每条摘要应当简明扼要,一句话概括事件的核心内容和结果
|
||||
5. 直接输出摘要内容,不要加任何前缀说明
|
||||
6. 当对话没有任何记录价值(仅含问候或无意义内容),回复"无"
|
||||
按「事件」维度归纳当天发生的事,不要按对话轮次逐条记录:
|
||||
- 每条一行,用 "- " 开头
|
||||
- 合并同一件事的多轮对话
|
||||
- 只记录有意义的事件,忽略闲聊和问候
|
||||
|
||||
示例(仅供参考格式):
|
||||
- 用户配置了 XX 功能,设置参数为 YY,已生效
|
||||
- 用户反馈了 XX 问题,原因是 YY,通过 ZZ 方式解决"""
|
||||
## 第二部分:长期记忆([MEMORY])
|
||||
|
||||
SUMMARIZE_USER_PROMPT = """请从以下对话记录中,按关键事件维度提炼记忆摘要(合并同一事件的多轮对话,不要逐条列出):
|
||||
提取值得**永久记住**的关键信息,这些信息在未来的对话中仍然有价值:
|
||||
- 用户的偏好、习惯、风格(如"用户偏好中文回复"、"用户喜欢简洁风格")
|
||||
- 重要的决策或约定(如"项目决定使用 PostgreSQL")
|
||||
- 关键人物信息(如"张总是用户的上级")
|
||||
- 用户明确要求记住的内容
|
||||
- 重要的教训或经验总结
|
||||
|
||||
**如果没有值得永久记住的信息,[MEMORY] 部分留空即可。**
|
||||
|
||||
## 输出格式(严格遵守)
|
||||
|
||||
```
|
||||
[DAILY]
|
||||
- 事件1的摘要
|
||||
- 事件2的摘要
|
||||
|
||||
[MEMORY]
|
||||
- 值得永久记住的信息1
|
||||
- 值得永久记住的信息2
|
||||
```
|
||||
|
||||
当对话没有任何记录价值(仅含问候或无意义内容),直接回复"无"。"""
|
||||
|
||||
SUMMARIZE_USER_PROMPT = """请从以下对话记录中提取记忆(按 [DAILY] 和 [MEMORY] 两部分输出):
|
||||
|
||||
{conversation}"""
|
||||
|
||||
@@ -160,40 +176,111 @@ class MemoryFlushManager:
|
||||
reason: str,
|
||||
max_messages: int,
|
||||
):
|
||||
"""Background worker: summarize with LLM and write to daily file."""
|
||||
"""Background worker: summarize with LLM, write daily file + MEMORY.md (Light Dream)."""
|
||||
try:
|
||||
summary = self._summarize_messages(messages, max_messages)
|
||||
if not summary or not summary.strip() or summary.strip() == "无":
|
||||
raw_summary = self._summarize_messages(messages, max_messages)
|
||||
if not raw_summary or not raw_summary.strip() or raw_summary.strip() == "无":
|
||||
logger.info(f"[MemoryFlush] No valuable content to flush (reason={reason})")
|
||||
return
|
||||
|
||||
daily_file = ensure_daily_memory_file(self.workspace_dir, user_id)
|
||||
|
||||
if reason == "overflow":
|
||||
header = f"## Context Overflow Recovery ({datetime.now().strftime('%H:%M')})"
|
||||
note = "The following conversation was trimmed due to context overflow:\n"
|
||||
elif reason == "trim":
|
||||
header = f"## Trimmed Context ({datetime.now().strftime('%H:%M')})"
|
||||
note = ""
|
||||
elif reason == "daily_summary":
|
||||
header = f"## Daily Summary ({datetime.now().strftime('%H:%M')})"
|
||||
note = ""
|
||||
else:
|
||||
header = f"## Session Notes ({datetime.now().strftime('%H:%M')})"
|
||||
note = ""
|
||||
|
||||
flush_entry = f"\n{header}\n\n{note}{summary}\n"
|
||||
|
||||
with open(daily_file, "a", encoding="utf-8") as f:
|
||||
f.write(flush_entry)
|
||||
|
||||
|
||||
daily_part, memory_part = self._parse_dual_output(raw_summary)
|
||||
|
||||
# --- Write daily memory ---
|
||||
if daily_part:
|
||||
daily_file = ensure_daily_memory_file(self.workspace_dir, user_id)
|
||||
|
||||
if reason == "overflow":
|
||||
header = f"## Context Overflow Recovery ({datetime.now().strftime('%H:%M')})"
|
||||
note = "The following conversation was trimmed due to context overflow:\n"
|
||||
elif reason == "trim":
|
||||
header = f"## Trimmed Context ({datetime.now().strftime('%H:%M')})"
|
||||
note = ""
|
||||
elif reason == "daily_summary":
|
||||
header = f"## Daily Summary ({datetime.now().strftime('%H:%M')})"
|
||||
note = ""
|
||||
else:
|
||||
header = f"## Session Notes ({datetime.now().strftime('%H:%M')})"
|
||||
note = ""
|
||||
|
||||
flush_entry = f"\n{header}\n\n{note}{daily_part}\n"
|
||||
|
||||
with open(daily_file, "a", encoding="utf-8") as f:
|
||||
f.write(flush_entry)
|
||||
|
||||
logger.info(f"[MemoryFlush] Wrote daily memory to {daily_file.name} (reason={reason}, chars={len(daily_part)})")
|
||||
|
||||
# --- Light Dream: write long-term memory to MEMORY.md ---
|
||||
if memory_part:
|
||||
self._append_to_main_memory(memory_part, user_id)
|
||||
|
||||
self.last_flush_timestamp = datetime.now()
|
||||
|
||||
logger.info(f"[MemoryFlush] Wrote to {daily_file.name} (reason={reason}, chars={len(summary)})")
|
||||
|
||||
|
||||
except Exception as e:
|
||||
logger.warning(f"[MemoryFlush] Async flush failed (reason={reason}): {e}")
|
||||
|
||||
|
||||
@staticmethod
|
||||
def _parse_dual_output(raw: str) -> tuple:
|
||||
"""
|
||||
Parse LLM output into (daily_part, memory_part).
|
||||
Handles both new [DAILY]/[MEMORY] format and legacy single-section format.
|
||||
"""
|
||||
raw = raw.strip()
|
||||
|
||||
if "[DAILY]" in raw or "[MEMORY]" in raw:
|
||||
daily_part = ""
|
||||
memory_part = ""
|
||||
|
||||
# Extract [DAILY] section
|
||||
if "[DAILY]" in raw:
|
||||
start = raw.index("[DAILY]") + len("[DAILY]")
|
||||
end = raw.index("[MEMORY]") if "[MEMORY]" in raw else len(raw)
|
||||
daily_part = raw[start:end].strip()
|
||||
|
||||
# Extract [MEMORY] section
|
||||
if "[MEMORY]" in raw:
|
||||
start = raw.index("[MEMORY]") + len("[MEMORY]")
|
||||
memory_part = raw[start:].strip()
|
||||
|
||||
# Filter out empty markers
|
||||
if memory_part and all(
|
||||
not line.strip() or line.strip() == "-"
|
||||
for line in memory_part.split("\n")
|
||||
):
|
||||
memory_part = ""
|
||||
|
||||
return daily_part, memory_part
|
||||
|
||||
# Legacy format: treat entire output as daily, no memory extraction
|
||||
return raw, ""
|
||||
|
||||
def _append_to_main_memory(self, memory_entries: str, user_id: Optional[str] = None):
|
||||
"""Append extracted long-term memories to MEMORY.md with date stamp."""
|
||||
try:
|
||||
main_file = self.get_main_memory_file(user_id)
|
||||
today = datetime.now().strftime("%Y-%m-%d")
|
||||
|
||||
# Add date prefix to each entry line
|
||||
stamped_lines = []
|
||||
for line in memory_entries.strip().split("\n"):
|
||||
line = line.strip()
|
||||
if line.startswith("- "):
|
||||
stamped_lines.append(f"- ({today}) {line[2:]}")
|
||||
elif line:
|
||||
stamped_lines.append(f"- ({today}) {line}")
|
||||
|
||||
if not stamped_lines:
|
||||
return
|
||||
|
||||
stamped_text = "\n".join(stamped_lines)
|
||||
|
||||
with open(main_file, "a", encoding="utf-8") as f:
|
||||
f.write(f"\n{stamped_text}\n")
|
||||
|
||||
logger.info(f"[LightDream] Appended {len(stamped_lines)} entries to MEMORY.md")
|
||||
|
||||
except Exception as e:
|
||||
logger.warning(f"[LightDream] Failed to append to MEMORY.md: {e}")
|
||||
|
||||
def create_daily_summary(
|
||||
self,
|
||||
messages: List[Dict],
|
||||
|
||||
@@ -10,6 +10,7 @@ from typing import List, Dict, Optional, Any
|
||||
from dataclasses import dataclass
|
||||
|
||||
from common.log import logger
|
||||
from config import conf
|
||||
|
||||
|
||||
@dataclass
|
||||
@@ -92,10 +93,11 @@ def build_agent_system_prompt(
|
||||
顺序说明(按重要性和逻辑关系排列):
|
||||
1. 工具系统 - 核心能力,最先介绍
|
||||
2. 技能系统 - 紧跟工具,因为技能需要用 read 工具读取
|
||||
3. 记忆系统 - 独立的记忆能力
|
||||
3. 记忆系统 - 记忆检索与写入引导
|
||||
3.5 知识系统 - 结构化知识库(knowledge/index.md 注入)
|
||||
4. 工作空间 - 工作环境说明
|
||||
5. 用户身份 - 用户信息(可选)
|
||||
6. 项目上下文 - AGENT.md, USER.md, RULE.md, BOOTSTRAP.md(定义人格、身份、规则、初始化引导)
|
||||
6. 项目上下文 - AGENT.md, USER.md, RULE.md, MEMORY.md, BOOTSTRAP.md
|
||||
7. 运行时信息 - 元信息(时间、模型等)
|
||||
|
||||
Args:
|
||||
@@ -126,6 +128,10 @@ def build_agent_system_prompt(
|
||||
# 3. 记忆系统(独立的记忆能力)
|
||||
if memory_manager:
|
||||
sections.extend(_build_memory_section(memory_manager, tools, language))
|
||||
|
||||
# 3.5 知识系统(结构化知识库)
|
||||
if conf().get("knowledge", True):
|
||||
sections.extend(_build_knowledge_section(workspace_dir, language))
|
||||
|
||||
# 4. 工作空间(工作环境说明)
|
||||
sections.extend(_build_workspace_section(workspace_dir, language))
|
||||
@@ -207,9 +213,9 @@ def _build_tooling_section(tools: List[Any], language: str) -> List[str]:
|
||||
"",
|
||||
"工具调用风格:",
|
||||
"",
|
||||
"- 在多步骤任务、敏感操作或用户要求时简要解释决策过程",
|
||||
"- 持续推进直到任务完成,完成后向用户报告结果。",
|
||||
"- 回复中涉及密钥、令牌等敏感信息必须脱敏。",
|
||||
"- 多步骤任务、复杂决策、敏感操作时,应简要说明当前在做什么、为什么这样做,让用户了解关键进展",
|
||||
"- 持续推进直到任务完成,完成后向用户报告结果",
|
||||
"- 回复中涉及密钥、令牌等敏感信息必须脱敏",
|
||||
"- URL链接直接放在回复文本中即可,系统会自动处理和渲染。无需下载后使用send工具发送",
|
||||
"",
|
||||
]
|
||||
@@ -268,55 +274,105 @@ def _build_memory_section(memory_manager: Any, tools: Optional[List[Any]], langu
|
||||
"""构建记忆系统section"""
|
||||
if not memory_manager:
|
||||
return []
|
||||
|
||||
# 检查是否有memory工具
|
||||
|
||||
has_memory_tools = False
|
||||
if tools:
|
||||
tool_names = [tool.name if hasattr(tool, 'name') else str(tool) for tool in tools]
|
||||
has_memory_tools = any(name in ['memory_search', 'memory_get'] for name in tool_names)
|
||||
|
||||
|
||||
if not has_memory_tools:
|
||||
return []
|
||||
|
||||
|
||||
from datetime import datetime
|
||||
today_file = datetime.now().strftime("%Y-%m-%d") + ".md"
|
||||
|
||||
|
||||
lines = [
|
||||
"## 🧠 记忆系统",
|
||||
"",
|
||||
"### 检索记忆",
|
||||
"### Memory Recall(mandatory)",
|
||||
"",
|
||||
"在回答关于以前的工作、决定、日期、人物、偏好或待办事项的任何问题之前:",
|
||||
"在回答任何关于过往工作、决策、日期、人物、偏好或待办事项的问题之前,**必须**先检索记忆。",
|
||||
"MEMORY.md 已自动加载在项目上下文中(可能被截断),完整内容和每日记忆需要通过工具检索。",
|
||||
"",
|
||||
"1. 不确定记忆文件位置 → 先用 `memory_search` 通过关键词和语义检索相关内容",
|
||||
"2. 已知文件位置 → 直接用 `memory_get` 读取相应的行 (例如:MEMORY.md, memory/YYYY-MM-DD.md)",
|
||||
"3. search 无结果 → 尝试用 `memory_get` 读取MEMORY.md及最近两天记忆文件",
|
||||
"1. 不确定位置 → `memory_search` 关键词/语义检索",
|
||||
"2. 已知位置 → `memory_get` 直接读取对应行",
|
||||
"3. search 无结果 → `memory_get` 读最近两天记忆",
|
||||
"",
|
||||
"**记忆文件结构**:",
|
||||
f"- `MEMORY.md`: 长期记忆(核心信息、偏好、决策等)",
|
||||
"- `MEMORY.md`: 长期记忆索引(已自动加载到上下文,核心信息、偏好、决策等)",
|
||||
f"- `memory/YYYY-MM-DD.md`: 每日记忆,今天是 `memory/{today_file}`",
|
||||
"- `knowledge/`: 结构化知识库(见下方知识系统)",
|
||||
"",
|
||||
"### 写入记忆",
|
||||
"",
|
||||
"**主动存储**:遇到以下情况时,应主动将信息写入记忆文件(无需告知用户):",
|
||||
"遇到以下情况时,**主动**将信息写入记忆文件(无需告知用户):",
|
||||
"",
|
||||
"- 用户明确要求你记住某些信息",
|
||||
"- 用户要求记住某些信息",
|
||||
"- 用户分享了重要的个人偏好、习惯、决策",
|
||||
"- 对话中产生了重要的结论、方案、约定",
|
||||
"- 完成了复杂任务,值得记录关键步骤和结果",
|
||||
"- 发现了用户经常遇到的问题或解决方案",
|
||||
"",
|
||||
"**存储规则**:",
|
||||
f"- 长期有效的核心信息 → `MEMORY.md`(文件保持精简,< 2000 tokens)",
|
||||
f"- 当天的事件、进展、笔记 → `memory/{today_file}`",
|
||||
"- 追加内容 → `edit` 工具,oldText 留空",
|
||||
"- 修改内容 → `edit` 工具,oldText 填写要替换的文本",
|
||||
"- **禁止写入敏感信息**:API密钥、令牌等敏感信息严禁写入记忆文件",
|
||||
f"- 长期核心信息 → `MEMORY.md`",
|
||||
f"- 当天事件/进展 → `memory/{today_file}`",
|
||||
"- 结构化知识 → `knowledge/`(见知识系统)",
|
||||
"- 追加 → `edit` 工具,oldText 留空",
|
||||
"- 修改 → `edit` 工具,oldText 填写要替换的文本",
|
||||
"- **禁止写入敏感信息**(API密钥、令牌等)",
|
||||
"",
|
||||
"**使用原则**: 自然使用记忆,就像你本来就知道;不用刻意提起,除非用户问起。",
|
||||
"",
|
||||
]
|
||||
|
||||
|
||||
return lines
|
||||
|
||||
|
||||
def _build_knowledge_section(workspace_dir: str, language: str) -> List[str]:
|
||||
"""Build knowledge wiki section. Injects knowledge/index.md when present."""
|
||||
index_path = os.path.join(workspace_dir, "knowledge", "index.md")
|
||||
if not os.path.exists(index_path):
|
||||
return []
|
||||
|
||||
try:
|
||||
with open(index_path, 'r', encoding='utf-8') as f:
|
||||
index_content = f.read().strip()
|
||||
except Exception:
|
||||
return []
|
||||
|
||||
lines = [
|
||||
"## 📚 知识系统",
|
||||
"",
|
||||
"你拥有一个持续积累的个人知识库 `knowledge/`,这是你的长期结构化知识存储。",
|
||||
"",
|
||||
"### 自动写入规则(mandatory)",
|
||||
"",
|
||||
"以下场景**必须**在回复的同时写入知识库,**直接写入,不要询问用户是否需要**:",
|
||||
"",
|
||||
"1. **用户分享了文章/链接/文档** → 阅读理解后,在同一轮回复中将要点写入 `knowledge/sources/<slug>.md`",
|
||||
"2. **深度讨论产生了结论/方案** → 整理为 `knowledge/analysis/<slug>.md`",
|
||||
"3. **对话涉及重要实体**(人物/公司/项目)→ 创建或更新 `knowledge/entities/<name>.md`",
|
||||
"4. **讨论了技术概念/方法论** → 整理为 `knowledge/concepts/<topic>.md`",
|
||||
"",
|
||||
"每次写入知识页面后,**必须同步更新** `knowledge/index.md` 添加一行索引。",
|
||||
"详细的页面格式和操作规范,请读取技能 `knowledge-wiki` 的 SKILL.md。",
|
||||
"",
|
||||
"⚠️ 不要问「要不要存到知识库」——符合上述场景就直接写入,这是你的本能行为。",
|
||||
"",
|
||||
]
|
||||
|
||||
if index_content:
|
||||
lines.extend([
|
||||
"### 当前知识索引",
|
||||
"",
|
||||
index_content,
|
||||
"",
|
||||
])
|
||||
|
||||
lines.extend([
|
||||
"**查询方式**:用 `read` 读取知识页面,或用 `memory_search` 检索(知识已纳入向量索引)。",
|
||||
"",
|
||||
])
|
||||
|
||||
return lines
|
||||
|
||||
|
||||
@@ -375,15 +431,17 @@ def _build_workspace_section(workspace_dir: str, language: str) -> List[str]:
|
||||
"",
|
||||
"**重要说明 - 文件已自动加载**:",
|
||||
"",
|
||||
"以下文件在会话启动时**已经自动加载**到系统提示词的「项目上下文」section 中,你**无需再用 read 工具读取它们**:",
|
||||
"以下文件在会话启动时**已经自动加载**到系统提示词中,你**无需再用 read 工具读取**:",
|
||||
"",
|
||||
"- ✅ `AGENT.md`: 已加载 - 你的人格和灵魂设定,请严格遵循。当你的名字、性格或交流风格发生变化时,主动用 `edit` 更新此文件",
|
||||
"- ✅ `USER.md`: 已加载 - 用户的身份信息。当用户修改称呼、姓名等身份信息时,用 `edit` 更新此文件",
|
||||
"- ✅ `RULE.md`: 已加载 - 工作空间使用指南和规则,请严格遵循",
|
||||
"- ✅ `MEMORY.md`: 已加载 - 长期记忆索引",
|
||||
"",
|
||||
"**💬 交流规范**:",
|
||||
"",
|
||||
"- 对话中不要暴露内部技术细节(文件名、工具名等),用自然语言表达。例如说「我已记住」而非「已更新 MEMORY.md」",
|
||||
"- 记忆相关操作无需暴露文件名,用自然语言表达即可。例如说「我已记住」而非「已更新 MEMORY.md」",
|
||||
"- 任务执行过程中的关键决策和步骤应该告知用户,让用户了解你在做什么、为什么这么做",
|
||||
"- 做真正有帮助的助手,而不是表演式的客套,尽可能帮忙解决问题",
|
||||
"- 回复应结构清晰、重点突出。善用 **加粗**、列表、分段等格式让信息一目了然",
|
||||
"- 适当使用 emoji 让表达更生动自然 🎯,但不要过度堆砌",
|
||||
@@ -477,7 +535,14 @@ def _build_runtime_section(runtime_info: Dict[str, Any], language: str) -> List[
|
||||
|
||||
# Add other runtime info
|
||||
runtime_parts = []
|
||||
if runtime_info.get("model"):
|
||||
# Support dynamic model via callable, fallback to static value
|
||||
if callable(runtime_info.get("_get_model")):
|
||||
try:
|
||||
runtime_parts.append(f"模型={runtime_info['_get_model']()}")
|
||||
except Exception:
|
||||
if runtime_info.get("model"):
|
||||
runtime_parts.append(f"模型={runtime_info['model']}")
|
||||
elif runtime_info.get("model"):
|
||||
runtime_parts.append(f"模型={runtime_info['model']}")
|
||||
if runtime_info.get("workspace"):
|
||||
runtime_parts.append(f"工作空间={runtime_info['workspace']}")
|
||||
|
||||
@@ -67,6 +67,12 @@ def ensure_workspace(workspace_dir: str, create_templates: bool = True) -> Works
|
||||
# 创建websites子目录 (for web pages / sites generated by agent)
|
||||
websites_dir = os.path.join(workspace_dir, "websites")
|
||||
os.makedirs(websites_dir, exist_ok=True)
|
||||
|
||||
from config import conf
|
||||
knowledge_enabled = conf().get("knowledge", True)
|
||||
if knowledge_enabled:
|
||||
knowledge_dir = os.path.join(workspace_dir, "knowledge")
|
||||
os.makedirs(knowledge_dir, exist_ok=True)
|
||||
|
||||
# 如果需要,创建模板文件
|
||||
if create_templates:
|
||||
@@ -74,6 +80,15 @@ def ensure_workspace(workspace_dir: str, create_templates: bool = True) -> Works
|
||||
_create_template_if_missing(user_path, _get_user_template())
|
||||
_create_template_if_missing(rule_path, _get_rule_template())
|
||||
_create_template_if_missing(memory_path, _get_memory_template())
|
||||
if knowledge_enabled:
|
||||
_create_template_if_missing(
|
||||
os.path.join(knowledge_dir, "index.md"),
|
||||
_get_knowledge_index_template()
|
||||
)
|
||||
_create_template_if_missing(
|
||||
os.path.join(knowledge_dir, "log.md"),
|
||||
_get_knowledge_log_template()
|
||||
)
|
||||
|
||||
# Only create BOOTSTRAP.md for brand new workspaces;
|
||||
# agent deletes it after completing onboarding
|
||||
@@ -109,6 +124,7 @@ def load_context_files(workspace_dir: str, files_to_load: Optional[List[str]] =
|
||||
DEFAULT_AGENT_FILENAME,
|
||||
DEFAULT_USER_FILENAME,
|
||||
DEFAULT_RULE_FILENAME,
|
||||
DEFAULT_MEMORY_FILENAME, # Long-term memory (frozen snapshot)
|
||||
DEFAULT_BOOTSTRAP_FILENAME, # Only exists when onboarding is incomplete
|
||||
]
|
||||
|
||||
@@ -138,6 +154,10 @@ def load_context_files(workspace_dir: str, files_to_load: Optional[List[str]] =
|
||||
# 跳过空文件或只包含模板占位符的文件
|
||||
if not content or _is_template_placeholder(content):
|
||||
continue
|
||||
|
||||
# Truncate MEMORY.md to protect context window (frozen snapshot)
|
||||
if filename == DEFAULT_MEMORY_FILENAME:
|
||||
content = _truncate_memory_content(content)
|
||||
|
||||
context_files.append(ContextFile(
|
||||
path=filename,
|
||||
@@ -163,6 +183,36 @@ def _create_template_if_missing(filepath: str, template_content: str):
|
||||
logger.error(f"[Workspace] Failed to create template {filepath}: {e}")
|
||||
|
||||
|
||||
_MEMORY_MAX_LINES = 200
|
||||
_MEMORY_MAX_BYTES = 25000
|
||||
|
||||
|
||||
def _truncate_memory_content(content: str) -> str:
|
||||
"""Truncate MEMORY.md to keep system prompt manageable.
|
||||
|
||||
Takes the **last** N lines (newest entries are appended at the bottom),
|
||||
subject to 200 lines / 25 KB limits (whichever is hit first).
|
||||
Prepends a hint when truncated so the model knows older content exists.
|
||||
"""
|
||||
lines = content.split('\n')
|
||||
truncated = False
|
||||
|
||||
if len(lines) > _MEMORY_MAX_LINES:
|
||||
lines = lines[-_MEMORY_MAX_LINES:]
|
||||
truncated = True
|
||||
|
||||
result = '\n'.join(lines)
|
||||
if len(result.encode('utf-8')) > _MEMORY_MAX_BYTES:
|
||||
while len(result.encode('utf-8')) > _MEMORY_MAX_BYTES and lines:
|
||||
lines.pop(0)
|
||||
truncated = True
|
||||
result = '\n'.join(lines)
|
||||
|
||||
if truncated:
|
||||
result = "...(older entries truncated, use `memory_search` or `memory_get` for full content)\n\n" + result
|
||||
return result
|
||||
|
||||
|
||||
def _is_template_placeholder(content: str) -> bool:
|
||||
"""检查内容是否为模板占位符"""
|
||||
# 常见的占位符模式
|
||||
@@ -231,9 +281,9 @@ _你不是一个聊天机器人,你正在成为某个人。_
|
||||
|
||||
## 🎯 核心原则
|
||||
|
||||
**做真正有帮助的助手,而不是表演式的客套。** 跳过「好的!」「当然可以!」之类的套话——直接帮忙。行动胜过废话。
|
||||
**做真正有帮助的助手。** 目标是真正帮用户解决问题,在执行复杂任务时,关键的决策和过程进展要让用户知道。
|
||||
|
||||
**有自己的观点。** 你可以不同意、有偏好、觉得有趣或无聊。一个没有个性的助手只是多了几步操作的搜索引擎。
|
||||
**有自己的观点和个性。** 你可以不同意、有偏好、觉得有趣或无聊。
|
||||
|
||||
**先自己动手查。** 先试着搞定:读文件、查上下文、搜索一下。实在搞不定了再问。目标是带着答案回来,而不是带着问题。
|
||||
|
||||
@@ -287,39 +337,88 @@ def _get_rule_template() -> str:
|
||||
|
||||
这个文件夹是你的家。好好对待它。
|
||||
|
||||
## 工作空间目录结构
|
||||
|
||||
```
|
||||
~/cow/
|
||||
├── AGENT.md # 你的身份和灵魂设定
|
||||
├── USER.md # 用户基本信息(静态)
|
||||
├── RULE.md # 工作空间规则(本文件)
|
||||
├── MEMORY.md # 长期记忆索引(会话启动时自动加载)
|
||||
│
|
||||
├── memory/ # 每日对话记忆
|
||||
│ └── YYYY-MM-DD.md # 当天事件、进展、笔记
|
||||
│
|
||||
├── knowledge/ # 结构化知识库(持续积累的知识)
|
||||
│ ├── index.md # 知识目录索引(必须维护)
|
||||
│ ├── log.md # 知识操作日志
|
||||
│ └── <子目录>/ # 按需创建,参考 index.md 已有分类
|
||||
│
|
||||
├── skills/ # 技能
|
||||
├── websites/ # 网页产物
|
||||
└── tmp/ # 系统临时文件(自动管理,勿手动存放重要文件)
|
||||
```
|
||||
|
||||
## 记忆系统
|
||||
|
||||
你每次会话都是全新的,记忆文件让你保持连续性:
|
||||
|
||||
### 📝 每日记忆:`memory/YYYY-MM-DD.md`
|
||||
- 原始的对话日志
|
||||
- 记录当天发生的事情
|
||||
- 如果 `memory/` 目录不存在,创建它
|
||||
|
||||
### 🧠 长期记忆:`MEMORY.md`
|
||||
- 你精选的记忆,就像人类的长期记忆
|
||||
- **仅在主会话中加载**(与用户的直接聊天)
|
||||
- **不要在共享上下文中加载**(群聊、与其他人的会话)
|
||||
- 这是为了**安全** - 包含不应泄露给陌生人的个人上下文
|
||||
- 记录重要事件、想法、决定、观点、经验教训
|
||||
- 这是你精选的记忆 - 精华,而不是原始日志
|
||||
- 用 `edit` 工具追加新的记忆内容
|
||||
- 你精选的记忆索引,每次会话启动时**自动加载**到上下文中
|
||||
- 记录核心事实、偏好、决策、重要人物、教训
|
||||
- 保持精简(< 200 行),是精华索引而非原始日志
|
||||
- 用 `edit` 工具追加或修改
|
||||
|
||||
### 📝 每日记忆:`memory/YYYY-MM-DD.md`
|
||||
- 当天的事件、进展、笔记
|
||||
- 原始对话日志的沉淀
|
||||
|
||||
### 📝 写下来 - 不要"记在心里"!
|
||||
- **记忆是有限的** - 如果你想记住某事,写入文件
|
||||
- **记忆是有限的** - 想记住的事就写入文件
|
||||
- "记在心里"不会在会话重启后保留,文件才会
|
||||
- 当有人说"记住这个" → 更新 `MEMORY.md` 或 `memory/YYYY-MM-DD.md`
|
||||
- 当你学到教训 → 更新 RULE.md 或相关技能
|
||||
- 当你犯错 → 记录下来,这样未来的你不会重复,**文字 > 大脑** 📝
|
||||
- 当你犯错 → 记录下来,**文字 > 大脑** 📝
|
||||
|
||||
### 存储规则
|
||||
|
||||
当用户分享信息时,根据类型选择存储位置:
|
||||
|
||||
1. **你的身份设定 → AGENT.md**(你的名字、角色、性格、交流风格——用户修改时必须用 `edit` 更新)
|
||||
2. **用户静态身份 → USER.md**(姓名、称呼、职业、时区、联系方式、生日——用户修改时必须用 `edit` 更新)
|
||||
3. **动态记忆 → MEMORY.md**(爱好、偏好、决策、目标、项目、教训、待办事项)
|
||||
1. **你的身份设定 → AGENT.md**(名字、角色、性格、风格)
|
||||
2. **用户静态身份 → USER.md**(姓名、称呼、职业、联系方式、生日)
|
||||
3. **动态记忆 → MEMORY.md**(偏好、决策、目标、教训、待办)
|
||||
4. **当天对话 → memory/YYYY-MM-DD.md**(今天聊的内容)
|
||||
5. **结构化知识 → knowledge/**(见下方知识系统)
|
||||
|
||||
## 知识系统
|
||||
|
||||
知识库 `knowledge/` 是你持续积累的结构化知识。与记忆不同,知识是经过整理和编译的,有明确的主题和交叉引用。
|
||||
|
||||
### 自动写入(不要询问,直接写入)
|
||||
|
||||
当对话中产生了有沉淀价值的知识——无论是用户分享的资料、讨论的结论、学到的概念、还是重要的决策——你**必须**在回复的同时主动写入知识库,**无需问用户"要不要存到知识库"**。
|
||||
|
||||
**关键原则**:学完就记是你的本能,不要征求确认。回复中可以顺带告知"已存入知识库"。
|
||||
|
||||
### 目录组织
|
||||
|
||||
子目录结构**不是固定的**,由你根据实际内容自主决定:
|
||||
- **首次写入时**:先读 `knowledge/index.md`,如果已有分类则延续;如果为空,根据内容选择合适的目录名
|
||||
- **默认建议**:按信息类型组织(例如sources/、concepts/、entities/、analysis/),如果用户有明确的分类偏好(例如按领域 work/、life/、tech/ 等),则按用户要求调整
|
||||
- **保持一致性**:同一用户的知识库应保持统一的组织风格
|
||||
|
||||
### 交叉引用
|
||||
|
||||
知识的核心价值在于**关联**。每个页面都应通过 markdown 链接引用相关页面,构建知识网络:
|
||||
- 提到已有页面的概念时,添加 `[概念名](../category/page.md)` 链接
|
||||
- 新建页面时,检查是否有已有页面应该反向链接到新页面
|
||||
- **只链接已存在的页面**——不要引用尚未创建的页面。如果某个概念值得单独建页,先创建该页面再添加链接
|
||||
|
||||
### 索引维护
|
||||
|
||||
每次创建或更新知识页面后,**必须同步更新** `knowledge/index.md`。
|
||||
索引格式:每行一个 `[标题](路径) — 一句话摘要`,按分类分组,不要用表格。
|
||||
详细操作规范见技能 `knowledge-wiki`。
|
||||
|
||||
## 安全
|
||||
|
||||
@@ -381,4 +480,12 @@ _你刚刚启动,这是你的第一次对话。_ ✨
|
||||
"""
|
||||
|
||||
|
||||
def _get_knowledge_index_template() -> str:
|
||||
"""Knowledge wiki index template — empty file, agent fills it."""
|
||||
return ""
|
||||
|
||||
|
||||
def _get_knowledge_log_template() -> str:
|
||||
"""Knowledge wiki operation log template — empty file, agent fills it."""
|
||||
return ""
|
||||
|
||||
|
||||
@@ -527,6 +527,7 @@ class AgentStreamExecutor:
|
||||
|
||||
# Streaming response
|
||||
full_content = ""
|
||||
full_reasoning = ""
|
||||
tool_calls_buffer = {} # {index: {id, name, arguments}}
|
||||
gemini_raw_parts = None # Preserve Gemini thoughtSignature for round-trip
|
||||
stop_reason = None # Track why the stream stopped
|
||||
@@ -584,10 +585,10 @@ class AgentStreamExecutor:
|
||||
if finish_reason:
|
||||
stop_reason = finish_reason
|
||||
|
||||
# Skip reasoning_content (internal thinking from models like GLM-5)
|
||||
reasoning_delta = delta.get("reasoning_content") or ""
|
||||
# if reasoning_delta:
|
||||
# logger.debug(f"🧠 [thinking] {reasoning_delta[:100]}...")
|
||||
if reasoning_delta:
|
||||
full_reasoning += reasoning_delta
|
||||
self._emit_event("reasoning_update", {"delta": reasoning_delta})
|
||||
|
||||
# Handle text content
|
||||
content_delta = delta.get("content") or ""
|
||||
@@ -788,7 +789,12 @@ class AgentStreamExecutor:
|
||||
# Add assistant message to history (Claude format uses content blocks)
|
||||
assistant_msg = {"role": "assistant", "content": []}
|
||||
|
||||
# Add text content block if present
|
||||
if full_reasoning:
|
||||
assistant_msg["content"].append({
|
||||
"type": "thinking",
|
||||
"thinking": full_reasoning
|
||||
})
|
||||
|
||||
if full_content:
|
||||
assistant_msg["content"].append({
|
||||
"type": "text",
|
||||
|
||||
@@ -53,6 +53,12 @@ class SkillLoader:
|
||||
"""
|
||||
Recursively load skills from a directory.
|
||||
|
||||
If a subdirectory contains its own SKILL.md, it is treated as a
|
||||
self-contained skill (or skill-collection) and its children are
|
||||
NOT scanned further. This prevents sub-skills inside a collection
|
||||
(e.g. style-collection/style-anjing) from being listed as
|
||||
independent top-level skills.
|
||||
|
||||
:param dir_path: Directory to scan
|
||||
:param source: Source identifier
|
||||
:param include_root_files: Whether to include root-level .md files
|
||||
@@ -66,38 +72,41 @@ class SkillLoader:
|
||||
except Exception as e:
|
||||
diagnostics.append(f"Failed to list directory {dir_path}: {e}")
|
||||
return LoadSkillsResult(skills=skills, diagnostics=diagnostics)
|
||||
|
||||
# If this directory has its own SKILL.md, load it and stop recursing.
|
||||
# The sub-directories are internal resources of this skill.
|
||||
if not include_root_files and 'SKILL.md' in entries:
|
||||
skill_md_path = os.path.join(dir_path, 'SKILL.md')
|
||||
if os.path.isfile(skill_md_path):
|
||||
skill_result = self._load_skill_from_file(skill_md_path, source)
|
||||
if skill_result.skills:
|
||||
skills.extend(skill_result.skills)
|
||||
diagnostics.extend(skill_result.diagnostics)
|
||||
return LoadSkillsResult(skills=skills, diagnostics=diagnostics)
|
||||
|
||||
for entry in entries:
|
||||
# Skip hidden files and directories
|
||||
if entry.startswith('.'):
|
||||
continue
|
||||
|
||||
# Skip common non-skill directories
|
||||
if entry in ('node_modules', '__pycache__', 'venv', '.git'):
|
||||
continue
|
||||
|
||||
full_path = os.path.join(dir_path, entry)
|
||||
|
||||
# Handle directories
|
||||
if os.path.isdir(full_path):
|
||||
# Recursively scan subdirectories
|
||||
sub_result = self._load_skills_recursive(full_path, source, include_root_files=False)
|
||||
skills.extend(sub_result.skills)
|
||||
diagnostics.extend(sub_result.diagnostics)
|
||||
continue
|
||||
|
||||
# Handle files
|
||||
if not os.path.isfile(full_path):
|
||||
continue
|
||||
|
||||
# Check if this is a skill file
|
||||
is_root_md = include_root_files and entry.endswith('.md') and entry.upper() != 'README.MD'
|
||||
is_skill_md = not include_root_files and entry == 'SKILL.md'
|
||||
|
||||
if not (is_root_md or is_skill_md):
|
||||
if not is_root_md:
|
||||
continue
|
||||
|
||||
# Load the skill
|
||||
skill_result = self._load_skill_from_file(full_path, source)
|
||||
if skill_result.skills:
|
||||
skills.extend(skill_result.skills)
|
||||
|
||||
@@ -210,6 +210,10 @@ class SkillManager:
|
||||
if not include_disabled:
|
||||
entries = [e for e in entries if self.is_skill_enabled(e.skill.name)]
|
||||
|
||||
from config import conf
|
||||
if not conf().get("knowledge", True):
|
||||
entries = [e for e in entries if e.skill.name != "knowledge-wiki"]
|
||||
|
||||
return entries
|
||||
|
||||
def filter_unavailable_skills(
|
||||
|
||||
@@ -18,9 +18,13 @@ from common.utils import expand_path
|
||||
class Bash(BaseTool):
|
||||
"""Tool for executing bash commands"""
|
||||
|
||||
_IS_WIN = sys.platform == "win32"
|
||||
|
||||
name: str = "bash"
|
||||
description: str = f"""Execute a bash command in the current working directory. Returns stdout and stderr. Output is truncated to last {DEFAULT_MAX_LINES} lines or {DEFAULT_MAX_BYTES // 1024}KB (whichever is hit first). If truncated, full output is saved to a temp file.
|
||||
|
||||
{'''
|
||||
PLATFORM: Windows (cmd.exe). Do NOT use Unix-only commands like grep, head, tail, sed, awk.
|
||||
''' if _IS_WIN else ''}
|
||||
ENVIRONMENT: All API keys from env_config are auto-injected. Use $VAR_NAME directly.
|
||||
|
||||
SAFETY:
|
||||
@@ -103,13 +107,12 @@ SAFETY:
|
||||
logger.debug(f"[Bash] Process User: {os.environ.get('USERNAME', os.environ.get('USER', 'unknown'))}")
|
||||
|
||||
# On Windows, convert $VAR references to %VAR% for cmd.exe
|
||||
if sys.platform == "win32":
|
||||
if self._IS_WIN:
|
||||
env["PYTHONIOENCODING"] = "utf-8"
|
||||
command = self._convert_env_vars_for_windows(command, dotenv_vars)
|
||||
if command and not command.strip().lower().startswith("chcp"):
|
||||
command = f"chcp 65001 >nul 2>&1 && {command}"
|
||||
|
||||
# Execute command with inherited environment variables
|
||||
result = subprocess.run(
|
||||
command,
|
||||
shell=True,
|
||||
@@ -120,7 +123,7 @@ SAFETY:
|
||||
encoding="utf-8",
|
||||
errors="replace",
|
||||
timeout=timeout,
|
||||
env=env
|
||||
env=env,
|
||||
)
|
||||
|
||||
logger.debug(f"[Bash] Exit code: {result.returncode}")
|
||||
|
||||
@@ -45,6 +45,11 @@ _SNAPSHOT_JS = """
|
||||
const KEEP = new Set(%s);
|
||||
const INTERACTIVE = new Set(%s);
|
||||
const SKIP = new Set(["script","style","noscript","svg","path","meta","link","br","hr"]);
|
||||
const CLICKABLE_ROLES = new Set([
|
||||
"button","link","tab","menuitem","menuitemcheckbox","menuitemradio",
|
||||
"option","switch","checkbox","radio","combobox","searchbox","slider",
|
||||
"spinbutton","textbox","treeitem"
|
||||
]);
|
||||
let refCounter = 0;
|
||||
const refMap = {};
|
||||
|
||||
@@ -56,6 +61,58 @@ _SNAPSHOT_JS = """
|
||||
return true;
|
||||
}
|
||||
|
||||
// Strong signals: these attributes alone are enough to mark as interactive
|
||||
function hasStrongInteractiveSignal(el) {
|
||||
const role = el.getAttribute("role");
|
||||
if (role && CLICKABLE_ROLES.has(role)) return true;
|
||||
if (el.hasAttribute("onclick") || el.hasAttribute("tabindex")) return true;
|
||||
if (el.hasAttribute("data-click") || el.hasAttribute("data-action")) return true;
|
||||
if (el.getAttribute("contenteditable") === "true") return true;
|
||||
return false;
|
||||
}
|
||||
|
||||
// Check if cursor:pointer is set directly (not just inherited from parent)
|
||||
function hasOwnPointerCursor(el) {
|
||||
try {
|
||||
const st = window.getComputedStyle(el);
|
||||
if (st.cursor !== "pointer") return false;
|
||||
const parent = el.parentElement;
|
||||
if (parent) {
|
||||
const pst = window.getComputedStyle(parent);
|
||||
if (pst.cursor === "pointer") return false;
|
||||
}
|
||||
return true;
|
||||
} catch(e) {}
|
||||
return false;
|
||||
}
|
||||
|
||||
function hasTextOrContent(el) {
|
||||
const t = el.textContent || "";
|
||||
if (t.trim().length > 0) return true;
|
||||
if (el.querySelector("img,video,audio,canvas")) return true;
|
||||
const ariaLabel = el.getAttribute("aria-label");
|
||||
if (ariaLabel && ariaLabel.trim()) return true;
|
||||
const title = el.getAttribute("title");
|
||||
if (title && title.trim()) return true;
|
||||
return false;
|
||||
}
|
||||
|
||||
function isImplicitInteractive(el) {
|
||||
if (hasStrongInteractiveSignal(el)) return true;
|
||||
if (hasOwnPointerCursor(el) && hasTextOrContent(el)) return true;
|
||||
return false;
|
||||
}
|
||||
|
||||
function getTextContent(el) {
|
||||
let text = "";
|
||||
for (const ch of el.childNodes) {
|
||||
if (ch.nodeType === Node.TEXT_NODE) {
|
||||
text += ch.textContent;
|
||||
}
|
||||
}
|
||||
return text.trim();
|
||||
}
|
||||
|
||||
function walk(node) {
|
||||
if (node.nodeType === Node.TEXT_NODE) {
|
||||
const t = node.textContent.trim();
|
||||
@@ -75,21 +132,35 @@ _SNAPSHOT_JS = """
|
||||
}
|
||||
}
|
||||
|
||||
const keep = KEEP.has(tag);
|
||||
const nativeInteractive = INTERACTIVE.has(tag);
|
||||
const implicitInteractive = !nativeInteractive && (node instanceof HTMLElement) && isImplicitInteractive(node);
|
||||
const keep = KEEP.has(tag) || implicitInteractive;
|
||||
|
||||
if (!keep) {
|
||||
// Unwrap: promote children
|
||||
if (children.length === 0) return null;
|
||||
if (children.length === 1) return children[0];
|
||||
return children;
|
||||
}
|
||||
|
||||
const obj = { tag };
|
||||
if (INTERACTIVE.has(tag)) {
|
||||
if (nativeInteractive || implicitInteractive) {
|
||||
refCounter++;
|
||||
obj.ref = refCounter;
|
||||
refMap[refCounter] = node;
|
||||
}
|
||||
|
||||
if (implicitInteractive) {
|
||||
const role = node.getAttribute("role");
|
||||
if (role) obj.role = role;
|
||||
const directText = getTextContent(node);
|
||||
if (!directText && children.length === 0) {
|
||||
const ariaLabel = node.getAttribute("aria-label");
|
||||
const title = node.getAttribute("title");
|
||||
if (ariaLabel) obj.ariaLabel = ariaLabel;
|
||||
else if (title) obj.ariaLabel = title;
|
||||
}
|
||||
}
|
||||
|
||||
// Attributes
|
||||
if (tag === "a" && node.href) obj.href = node.getAttribute("href");
|
||||
if (tag === "img") {
|
||||
@@ -113,11 +184,13 @@ _SNAPSHOT_JS = """
|
||||
}
|
||||
if (tag === "label" && node.htmlFor) obj.for = node.htmlFor;
|
||||
|
||||
// Role / aria-label
|
||||
const role = node.getAttribute("role");
|
||||
if (role) obj.role = role;
|
||||
const ariaLabel = node.getAttribute("aria-label");
|
||||
if (ariaLabel) obj.ariaLabel = ariaLabel;
|
||||
// Role / aria-label for native interactive & semantic elements
|
||||
if (!implicitInteractive) {
|
||||
const role = node.getAttribute("role");
|
||||
if (role) obj.role = role;
|
||||
const ariaLabel = node.getAttribute("aria-label");
|
||||
if (ariaLabel) obj.ariaLabel = ariaLabel;
|
||||
}
|
||||
|
||||
// Children
|
||||
if (children.length === 1 && typeof children[0] === "string") {
|
||||
@@ -129,7 +202,6 @@ _SNAPSHOT_JS = """
|
||||
return obj;
|
||||
}
|
||||
|
||||
// Store refMap on window for later use by click/fill actions
|
||||
const result = walk(document.body);
|
||||
window.__cowRefMap = refMap;
|
||||
return { tree: result, refCount: refCounter };
|
||||
|
||||
@@ -44,6 +44,19 @@ class MemoryGetTool(BaseTool):
|
||||
"""
|
||||
super().__init__()
|
||||
self.memory_manager = memory_manager
|
||||
|
||||
from config import conf
|
||||
if conf().get("knowledge", True):
|
||||
self.description = (
|
||||
"Read specific content from memory or knowledge files. "
|
||||
"Use this to get full context from a memory file, knowledge page, or specific line range."
|
||||
)
|
||||
self.params = {**self.params}
|
||||
self.params["properties"] = {**self.params["properties"]}
|
||||
self.params["properties"]["path"] = {
|
||||
"type": "string",
|
||||
"description": "Relative path to the memory or knowledge file (e.g. 'MEMORY.md', 'memory/2026-01-01.md', 'knowledge/concepts/moe.md')"
|
||||
}
|
||||
|
||||
def execute(self, args: dict):
|
||||
"""
|
||||
@@ -68,11 +81,15 @@ class MemoryGetTool(BaseTool):
|
||||
workspace_dir = self.memory_manager.config.get_workspace()
|
||||
|
||||
# Auto-prepend memory/ if not present and not absolute path
|
||||
# Exception: MEMORY.md is in the root directory
|
||||
if not path.startswith('memory/') and not path.startswith('/') and path != 'MEMORY.md':
|
||||
# Exceptions: MEMORY.md in root, knowledge/ files at workspace root
|
||||
if not path.startswith('memory/') and not path.startswith('knowledge/') and not path.startswith('/') and path != 'MEMORY.md':
|
||||
path = f'memory/{path}'
|
||||
|
||||
file_path = workspace_dir / path
|
||||
file_path = (workspace_dir / path).resolve()
|
||||
workspace_resolved = workspace_dir.resolve()
|
||||
|
||||
if not str(file_path).startswith(str(workspace_resolved) + '/') and file_path != workspace_resolved:
|
||||
return ToolResult.fail(f"Error: Access denied: path outside workspace")
|
||||
|
||||
if not file_path.exists():
|
||||
return ToolResult.fail(f"Error: File not found: {path}")
|
||||
|
||||
@@ -48,6 +48,13 @@ class MemorySearchTool(BaseTool):
|
||||
super().__init__()
|
||||
self.memory_manager = memory_manager
|
||||
self.user_id = user_id
|
||||
|
||||
from config import conf
|
||||
if conf().get("knowledge", True):
|
||||
self.description = (
|
||||
"Search agent's long-term memory and knowledge base using semantic and keyword search. "
|
||||
"Use this to recall past conversations, preferences, and knowledge pages."
|
||||
)
|
||||
|
||||
def execute(self, args: dict):
|
||||
"""
|
||||
|
||||
@@ -1,22 +1,30 @@
|
||||
"""
|
||||
Vision tool - Analyze images using OpenAI-compatible Vision API.
|
||||
Vision tool - Analyze images using Vision API.
|
||||
Supports local files (auto base64-encoded) and HTTP URLs.
|
||||
Providers: OpenAI (preferred) > LinkAI (fallback).
|
||||
|
||||
Provider priority (default):
|
||||
1. Main model via bot.call_vision — zero extra cost
|
||||
2. Other models whose API key is configured — auto-discovered
|
||||
3. OpenAI / LinkAI raw HTTP — reliable fallback
|
||||
When use_linkai=true, LinkAI is promoted to #1.
|
||||
When tool.vision.model is set, that model is used exclusively first.
|
||||
"""
|
||||
|
||||
import base64
|
||||
import os
|
||||
import subprocess
|
||||
import tempfile
|
||||
from typing import Any, Dict, Optional, Tuple
|
||||
from dataclasses import dataclass, field
|
||||
from typing import Any, Dict, List, Optional
|
||||
|
||||
import requests
|
||||
|
||||
from agent.tools.base_tool import BaseTool, ToolResult
|
||||
from common import const
|
||||
from common.log import logger
|
||||
from config import conf
|
||||
|
||||
DEFAULT_MODEL = "gpt-4.1-mini"
|
||||
DEFAULT_MODEL = const.GPT_41_MINI
|
||||
DEFAULT_TIMEOUT = 60
|
||||
MAX_TOKENS = 1000
|
||||
COMPRESS_THRESHOLD = 1_048_576 # 1 MB
|
||||
@@ -29,15 +37,46 @@ SUPPORTED_EXTENSIONS = {
|
||||
"webp": "image/webp",
|
||||
}
|
||||
|
||||
_MAIN_MODEL_PROVIDER_NAME = "MainModel"
|
||||
|
||||
# (config_key_for_api_key, bot_type, default_vision_model, provider_display_name)
|
||||
# Auto-discovered as fallback vision providers when their API key is configured.
|
||||
# OpenAI and LinkAI are handled separately (raw HTTP providers), so not listed here.
|
||||
_DISCOVERABLE_MODELS = [
|
||||
("moonshot_api_key", const.MOONSHOT, const.KIMI_K2_5, "Moonshot"),
|
||||
("ark_api_key", const.DOUBAO, const.DOUBAO_SEED_2_PRO, "Doubao"),
|
||||
("dashscope_api_key", const.QWEN_DASHSCOPE, const.QWEN36_PLUS, "DashScope"),
|
||||
("claude_api_key", const.CLAUDEAPI, const.CLAUDE_4_6_SONNET, "Claude"),
|
||||
("gemini_api_key", const.GEMINI, const.GEMINI_31_FLASH_LITE_PRE, "Gemini"),
|
||||
("zhipu_ai_api_key", const.ZHIPU_AI, const.GLM_4_7, "ZhipuAI"),
|
||||
("minimax_api_key", const.MiniMax, const.MINIMAX_M2_7, "MiniMax"),
|
||||
]
|
||||
|
||||
|
||||
@dataclass
|
||||
class VisionProvider:
|
||||
"""A single Vision API provider configuration."""
|
||||
name: str
|
||||
api_key: str
|
||||
api_base: str
|
||||
extra_headers: dict = field(default_factory=dict)
|
||||
model_override: Optional[str] = None
|
||||
use_bot: bool = False # When True, call via bot.call_vision instead of raw HTTP
|
||||
fallback_bot: Any = None # Bot instance for non-main-model providers
|
||||
|
||||
|
||||
class VisionAPIError(Exception):
|
||||
"""Raised when a Vision API call fails and should trigger fallback."""
|
||||
pass
|
||||
|
||||
|
||||
class Vision(BaseTool):
|
||||
"""Analyze images using OpenAI-compatible Vision API"""
|
||||
"""Analyze images using Vision API"""
|
||||
|
||||
name: str = "vision"
|
||||
description: str = (
|
||||
"Analyze a local image or image URL (jpg/jpeg/png) using Vision API. "
|
||||
"Can describe content, extract text, identify objects, colors, etc. "
|
||||
"Requires OPENAI_API_KEY or LINKAI_API_KEY."
|
||||
)
|
||||
|
||||
params: dict = {
|
||||
@@ -51,13 +90,6 @@ class Vision(BaseTool):
|
||||
"type": "string",
|
||||
"description": "Question to ask about the image",
|
||||
},
|
||||
"model": {
|
||||
"type": "string",
|
||||
"description": (
|
||||
f"Vision model to use (default: {DEFAULT_MODEL}). "
|
||||
"Options: gpt-4.1-mini, gpt-4.1, gpt-4o-mini, gpt-4o"
|
||||
),
|
||||
},
|
||||
},
|
||||
"required": ["image", "question"],
|
||||
}
|
||||
@@ -67,29 +99,26 @@ class Vision(BaseTool):
|
||||
|
||||
@staticmethod
|
||||
def is_available() -> bool:
|
||||
return bool(
|
||||
conf().get("open_ai_api_key") or os.environ.get("OPENAI_API_KEY")
|
||||
or conf().get("linkai_api_key") or os.environ.get("LINKAI_API_KEY")
|
||||
)
|
||||
return True
|
||||
|
||||
def execute(self, args: Dict[str, Any]) -> ToolResult:
|
||||
image = args.get("image", "").strip()
|
||||
question = args.get("question", "").strip()
|
||||
model = args.get("model", DEFAULT_MODEL).strip() or DEFAULT_MODEL
|
||||
|
||||
if not image:
|
||||
return ToolResult.fail("Error: 'image' parameter is required")
|
||||
if not question:
|
||||
return ToolResult.fail("Error: 'question' parameter is required")
|
||||
|
||||
api_key, api_base, extra_headers = self._resolve_provider()
|
||||
if not api_key:
|
||||
providers = self._resolve_providers()
|
||||
if not providers:
|
||||
return ToolResult.fail(
|
||||
"Error: No API key configured for Vision.\n"
|
||||
"Please configure one of the following using env_config tool:\n"
|
||||
" 1. OPENAI_API_KEY (preferred): env_config(action=\"set\", key=\"OPENAI_API_KEY\", value=\"your-key\")\n"
|
||||
" 2. LINKAI_API_KEY (fallback): env_config(action=\"set\", key=\"LINKAI_API_KEY\", value=\"your-key\")\n\n"
|
||||
"Get your key at: https://platform.openai.com/api-keys or https://link-ai.tech"
|
||||
"Error: No model available for Vision.\n"
|
||||
"The main model does not support vision and no other API keys are configured.\n"
|
||||
"Options:\n"
|
||||
" 1. Switch to a multimodal model (e.g. qwen3.6-plus, claude-sonnet-4-6, gemini-2.0-flash)\n"
|
||||
" 2. Configure OPENAI_API_KEY: env_config(action=\"set\", key=\"OPENAI_API_KEY\", value=\"your-key\")\n"
|
||||
" 3. Configure LINKAI_API_KEY: env_config(action=\"set\", key=\"LINKAI_API_KEY\", value=\"your-key\")"
|
||||
)
|
||||
|
||||
try:
|
||||
@@ -97,36 +126,221 @@ class Vision(BaseTool):
|
||||
except Exception as e:
|
||||
return ToolResult.fail(f"Error: {e}")
|
||||
|
||||
return self._call_with_fallback(providers, DEFAULT_MODEL, question, image_content)
|
||||
|
||||
def _call_with_fallback(self, providers: List[VisionProvider], model: str,
|
||||
question: str, image_content: dict) -> ToolResult:
|
||||
"""Try each provider in order; fall back to the next one on failure."""
|
||||
errors: List[str] = []
|
||||
for i, provider in enumerate(providers):
|
||||
use_model = provider.model_override or model
|
||||
try:
|
||||
logger.info(f"[Vision] Trying provider '{provider.name}' "
|
||||
f"with model '{use_model}' ({i + 1}/{len(providers)})")
|
||||
if provider.use_bot:
|
||||
result = self._call_via_bot(use_model, question, image_content, provider)
|
||||
else:
|
||||
result = self._call_api(provider, use_model, question, image_content)
|
||||
logger.info(f"[Vision] ✅ Success via {provider.name} (model={use_model})")
|
||||
return result
|
||||
except VisionAPIError as e:
|
||||
errors.append(f"[{provider.name}/{use_model}] {e}")
|
||||
logger.warning(f"[Vision] Provider '{provider.name}' failed: {e}")
|
||||
except requests.Timeout:
|
||||
errors.append(f"[{provider.name}/{use_model}] Request timed out after {DEFAULT_TIMEOUT}s")
|
||||
logger.warning(f"[Vision] Provider '{provider.name}' timed out")
|
||||
except requests.ConnectionError:
|
||||
errors.append(f"[{provider.name}/{use_model}] Connection failed")
|
||||
logger.warning(f"[Vision] Provider '{provider.name}' connection failed")
|
||||
except Exception as e:
|
||||
errors.append(f"[{provider.name}/{use_model}] {e}")
|
||||
logger.error(f"[Vision] Provider '{provider.name}' unexpected error: {e}", exc_info=True)
|
||||
|
||||
return ToolResult.fail(
|
||||
"Error: All Vision API providers failed.\n" + "\n".join(f" - {err}" for err in errors)
|
||||
)
|
||||
|
||||
def _resolve_providers(self) -> List[VisionProvider]:
|
||||
"""
|
||||
Build an ordered list of available providers.
|
||||
|
||||
Priority:
|
||||
- use_linkai=true → [LinkAI, MainModel, OtherModels…, OpenAI]
|
||||
- default → [MainModel, OtherModels…, OpenAI, LinkAI]
|
||||
|
||||
"OtherModels" are auto-discovered from configured API keys.
|
||||
The main model's bot_type is excluded from OtherModels to avoid
|
||||
duplicating the MainModel provider.
|
||||
"""
|
||||
use_linkai = conf().get("use_linkai", False) and conf().get("linkai_api_key")
|
||||
providers: List[VisionProvider] = []
|
||||
|
||||
if use_linkai:
|
||||
self._append_provider(providers, self._build_linkai_provider)
|
||||
self._append_provider(providers, self._build_main_model_provider)
|
||||
self._append_other_model_providers(providers)
|
||||
self._append_provider(providers, self._build_openai_provider)
|
||||
else:
|
||||
self._append_provider(providers, self._build_main_model_provider)
|
||||
self._append_other_model_providers(providers)
|
||||
self._append_provider(providers, self._build_openai_provider)
|
||||
self._append_provider(providers, self._build_linkai_provider)
|
||||
|
||||
return providers
|
||||
|
||||
@staticmethod
|
||||
def _append_provider(providers: List[VisionProvider], builder) -> None:
|
||||
p = builder()
|
||||
if p:
|
||||
providers.append(p)
|
||||
|
||||
def _append_other_model_providers(self, providers: List[VisionProvider]) -> None:
|
||||
"""
|
||||
Auto-discover other models whose API key is configured.
|
||||
Skip the main model's own bot_type (already covered by MainModel provider).
|
||||
Skip bot_types that already have a provider in the list (e.g. OpenAI).
|
||||
"""
|
||||
# Determine main model's bot_type so we can skip it
|
||||
main_bot_type = None
|
||||
if self.model and hasattr(self.model, '_resolve_bot_type'):
|
||||
main_bot_type = self.model._resolve_bot_type(conf().get("model", ""))
|
||||
|
||||
existing_names = {p.name for p in providers}
|
||||
|
||||
for config_key, bot_type, default_model, display_name in _DISCOVERABLE_MODELS:
|
||||
if display_name in existing_names:
|
||||
continue
|
||||
if bot_type == main_bot_type:
|
||||
continue
|
||||
api_key = conf().get(config_key, "")
|
||||
if not api_key or not api_key.strip():
|
||||
continue
|
||||
|
||||
# Create a bot instance and check if it supports call_vision
|
||||
try:
|
||||
from models.bot_factory import create_bot
|
||||
bot = create_bot(bot_type)
|
||||
if not hasattr(bot, 'call_vision'):
|
||||
continue
|
||||
except Exception:
|
||||
continue
|
||||
|
||||
providers.append(VisionProvider(
|
||||
name=display_name,
|
||||
api_key="",
|
||||
api_base="",
|
||||
model_override=default_model,
|
||||
use_bot=True,
|
||||
fallback_bot=bot,
|
||||
))
|
||||
|
||||
def _resolve_vision_model(self) -> Optional[str]:
|
||||
"""
|
||||
Determine which model to use for vision.
|
||||
|
||||
1. User explicit config: tool.vision.model in config.json
|
||||
2. Fallback to the main configured model name
|
||||
"""
|
||||
tool_conf = conf().get("tool", {})
|
||||
user_vision_model = tool_conf.get("vision", {}).get("model") if isinstance(tool_conf, dict) else None
|
||||
if user_vision_model:
|
||||
return user_vision_model
|
||||
model_name = conf().get("model", "")
|
||||
return model_name or None
|
||||
|
||||
def _build_main_model_provider(self) -> Optional[VisionProvider]:
|
||||
"""
|
||||
Use the vendor's own model for vision via bot.call_vision.
|
||||
Only available when the bot class has call_vision.
|
||||
"""
|
||||
if not (self.model and hasattr(self.model, 'bot')):
|
||||
return None
|
||||
try:
|
||||
return self._call_api(api_key, api_base, model, question, image_content, extra_headers)
|
||||
except requests.Timeout:
|
||||
return ToolResult.fail(f"Error: Vision API request timed out after {DEFAULT_TIMEOUT}s")
|
||||
except requests.ConnectionError:
|
||||
return ToolResult.fail("Error: Failed to connect to Vision API")
|
||||
except Exception as e:
|
||||
logger.error(f"[Vision] Unexpected error: {e}", exc_info=True)
|
||||
return ToolResult.fail(f"Error: Vision API call failed - {e}")
|
||||
bot = self.model.bot
|
||||
if not hasattr(bot, 'call_vision'):
|
||||
return None
|
||||
except Exception:
|
||||
return None
|
||||
|
||||
def _resolve_provider(self) -> Tuple[Optional[str], str, dict]:
|
||||
"""Resolve API key, base URL and extra headers. Priority: conf() > env vars."""
|
||||
vision_model = self._resolve_vision_model()
|
||||
|
||||
return VisionProvider(
|
||||
name=_MAIN_MODEL_PROVIDER_NAME,
|
||||
api_key="",
|
||||
api_base="",
|
||||
model_override=vision_model,
|
||||
use_bot=True,
|
||||
)
|
||||
|
||||
def _build_openai_provider(self) -> Optional[VisionProvider]:
|
||||
api_key = conf().get("open_ai_api_key") or os.environ.get("OPENAI_API_KEY")
|
||||
if api_key:
|
||||
api_base = (conf().get("open_ai_api_base") or os.environ.get("OPENAI_API_BASE", "")).rstrip("/") \
|
||||
or "https://api.openai.com/v1"
|
||||
return api_key, self._ensure_v1(api_base), {}
|
||||
if not api_key:
|
||||
return None
|
||||
api_base = (conf().get("open_ai_api_base") or os.environ.get("OPENAI_API_BASE", "")).rstrip("/") \
|
||||
or "https://api.openai.com/v1"
|
||||
return VisionProvider(name="OpenAI", api_key=api_key, api_base=self._ensure_v1(api_base))
|
||||
|
||||
def _build_linkai_provider(self) -> Optional[VisionProvider]:
|
||||
api_key = conf().get("linkai_api_key") or os.environ.get("LINKAI_API_KEY")
|
||||
if api_key:
|
||||
api_base = (conf().get("linkai_api_base") or os.environ.get("LINKAI_API_BASE", "")).rstrip("/") \
|
||||
or "https://api.link-ai.tech"
|
||||
logger.debug("[Vision] Using LinkAI API (OPENAI_API_KEY not set)")
|
||||
from common.utils import get_cloud_headers
|
||||
extra = get_cloud_headers(api_key)
|
||||
extra.pop("Authorization", None)
|
||||
extra.pop("Content-Type", None)
|
||||
return api_key, self._ensure_v1(api_base), extra
|
||||
if not api_key:
|
||||
return None
|
||||
api_base = (conf().get("linkai_api_base") or os.environ.get("LINKAI_API_BASE", "")).rstrip("/") \
|
||||
or "https://api.link-ai.tech"
|
||||
from common.utils import get_cloud_headers
|
||||
extra = get_cloud_headers(api_key)
|
||||
extra.pop("Authorization", None)
|
||||
extra.pop("Content-Type", None)
|
||||
return VisionProvider(name="LinkAI", api_key=api_key, api_base=self._ensure_v1(api_base),
|
||||
extra_headers=extra)
|
||||
|
||||
return None, "", {}
|
||||
def _call_via_bot(self, model: str, question: str, image_content: dict,
|
||||
provider: Optional[VisionProvider] = None) -> ToolResult:
|
||||
"""
|
||||
Call a model's call_vision with vendor-native API format.
|
||||
Uses the provider's _fallback_bot if set, otherwise the main model bot.
|
||||
Raises VisionAPIError on failure so fallback can proceed.
|
||||
"""
|
||||
try:
|
||||
bot = (provider and provider.fallback_bot) or self.model.bot
|
||||
except Exception as e:
|
||||
raise VisionAPIError(f"Cannot access bot: {e}")
|
||||
|
||||
# Extract the raw image URL from the OpenAI-format image_content block
|
||||
image_url = image_content.get("image_url", {}).get("url", "")
|
||||
if not image_url:
|
||||
raise VisionAPIError("No image URL in content block")
|
||||
|
||||
try:
|
||||
response = bot.call_vision(
|
||||
image_url=image_url,
|
||||
question=question,
|
||||
model=model,
|
||||
max_tokens=MAX_TOKENS,
|
||||
)
|
||||
except Exception as e:
|
||||
raise VisionAPIError(f"call_vision failed: {e}")
|
||||
|
||||
if response is NotImplemented:
|
||||
raise VisionAPIError("Bot does not support vision")
|
||||
|
||||
if isinstance(response, dict) and response.get("error"):
|
||||
raise VisionAPIError(f"API error - {response.get('message', 'Unknown')}")
|
||||
|
||||
content = response.get("content", "") if isinstance(response, dict) else ""
|
||||
if not content:
|
||||
raise VisionAPIError("Empty response from main model")
|
||||
|
||||
usage_info = response.get("usage", {}) if isinstance(response, dict) else {}
|
||||
|
||||
# Use the actual model name from the bot response if available
|
||||
actual_model = response.get("model", model) if isinstance(response, dict) else model
|
||||
provider_name = provider.name if provider else _MAIN_MODEL_PROVIDER_NAME
|
||||
return ToolResult.success({
|
||||
"model": actual_model,
|
||||
"provider": provider_name,
|
||||
"content": content,
|
||||
"usage": usage_info,
|
||||
})
|
||||
|
||||
@staticmethod
|
||||
def _ensure_v1(api_base: str) -> str:
|
||||
@@ -139,9 +353,13 @@ class Vision(BaseTool):
|
||||
return api_base.rstrip("/") + "/v1"
|
||||
|
||||
def _build_image_content(self, image: str) -> dict:
|
||||
"""Build the image_url content block for the API request."""
|
||||
"""
|
||||
Build the image_url content block.
|
||||
Both remote URLs and local files are converted to base64 data URLs
|
||||
so every bot backend can consume them without extra downloads.
|
||||
"""
|
||||
if image.startswith(("http://", "https://")):
|
||||
return {"type": "image_url", "image_url": {"url": image}}
|
||||
return self._download_to_data_url(image)
|
||||
|
||||
if not os.path.isfile(image):
|
||||
raise FileNotFoundError(f"Image file not found: {image}")
|
||||
@@ -165,6 +383,19 @@ class Vision(BaseTool):
|
||||
data_url = f"data:{mime_type};base64,{b64}"
|
||||
return {"type": "image_url", "image_url": {"url": data_url}}
|
||||
|
||||
@staticmethod
|
||||
def _download_to_data_url(url: str) -> dict:
|
||||
"""Download a remote image and return it as a base64 data URL."""
|
||||
resp = requests.get(url, timeout=30)
|
||||
if resp.status_code != 200:
|
||||
raise VisionAPIError(f"Failed to download image: HTTP {resp.status_code}")
|
||||
content_type = resp.headers.get("Content-Type", "image/jpeg").split(";")[0].strip()
|
||||
if not content_type.startswith("image/"):
|
||||
content_type = "image/jpeg"
|
||||
b64 = base64.b64encode(resp.content).decode("ascii")
|
||||
data_url = f"data:{content_type};base64,{b64}"
|
||||
return {"type": "image_url", "image_url": {"url": data_url}}
|
||||
|
||||
@staticmethod
|
||||
def _maybe_compress(path: str) -> str:
|
||||
"""Compress image to under COMPRESS_THRESHOLD with max long-edge 1536px."""
|
||||
@@ -220,8 +451,13 @@ class Vision(BaseTool):
|
||||
os.remove(tmp.name)
|
||||
return path
|
||||
|
||||
def _call_api(self, api_key: str, api_base: str, model: str,
|
||||
question: str, image_content: dict, extra_headers: dict = None) -> ToolResult:
|
||||
def _call_api(self, provider: VisionProvider, model: str,
|
||||
question: str, image_content: dict) -> ToolResult:
|
||||
"""
|
||||
Call a single provider's Vision API.
|
||||
Raises VisionAPIError on recoverable failures so the caller can try
|
||||
the next provider.
|
||||
"""
|
||||
payload = {
|
||||
"model": model,
|
||||
"messages": [
|
||||
@@ -233,34 +469,29 @@ class Vision(BaseTool):
|
||||
],
|
||||
}
|
||||
],
|
||||
"max_tokens": MAX_TOKENS,
|
||||
}
|
||||
|
||||
headers = {
|
||||
"Authorization": f"Bearer {api_key}",
|
||||
"Authorization": f"Bearer {provider.api_key}",
|
||||
"Content-Type": "application/json",
|
||||
**(extra_headers or {}),
|
||||
**provider.extra_headers,
|
||||
}
|
||||
|
||||
resp = requests.post(
|
||||
f"{api_base}/chat/completions",
|
||||
f"{provider.api_base}/chat/completions",
|
||||
headers=headers,
|
||||
json=payload,
|
||||
timeout=DEFAULT_TIMEOUT,
|
||||
)
|
||||
|
||||
if resp.status_code == 401:
|
||||
return ToolResult.fail("Error: Invalid API key. Please check your configuration.")
|
||||
if resp.status_code == 429:
|
||||
return ToolResult.fail("Error: API rate limit reached. Please try again later.")
|
||||
if resp.status_code != 200:
|
||||
return ToolResult.fail(f"Error: Vision API returned HTTP {resp.status_code}: {resp.text[:200]}")
|
||||
raise VisionAPIError(f"HTTP {resp.status_code}: {resp.text[:200]}")
|
||||
|
||||
data = resp.json()
|
||||
|
||||
if "error" in data:
|
||||
msg = data["error"].get("message", "Unknown API error")
|
||||
return ToolResult.fail(f"Error: Vision API error - {msg}")
|
||||
raise VisionAPIError(f"API error - {msg}")
|
||||
|
||||
content = ""
|
||||
choices = data.get("choices", [])
|
||||
@@ -270,6 +501,7 @@ class Vision(BaseTool):
|
||||
usage = data.get("usage", {})
|
||||
result = {
|
||||
"model": model,
|
||||
"provider": provider.name,
|
||||
"content": content,
|
||||
"usage": {
|
||||
"prompt_tokens": usage.get("prompt_tokens", 0),
|
||||
|
||||
@@ -67,7 +67,7 @@ class AgentLLMModel(LLMModel):
|
||||
|
||||
_MODEL_BOT_TYPE_MAP = {
|
||||
"wenxin": const.BAIDU, "wenxin-4": const.BAIDU,
|
||||
"xunfei": const.XUNFEI, const.QWEN: const.QWEN,
|
||||
"xunfei": const.XUNFEI, const.QWEN: const.QWEN_DASHSCOPE,
|
||||
const.MODELSCOPE: const.MODELSCOPE,
|
||||
}
|
||||
_MODEL_PREFIX_MAP = [
|
||||
@@ -124,14 +124,15 @@ class AgentLLMModel(LLMModel):
|
||||
|
||||
@property
|
||||
def bot(self):
|
||||
"""Lazy load the bot, re-create when model changes"""
|
||||
"""Lazy load the bot, re-create when model or bot_type changes"""
|
||||
from models.bot_factory import create_bot
|
||||
cur_model = self.model
|
||||
if self._bot is None or self._bot_model != cur_model:
|
||||
bot_type = self._resolve_bot_type(cur_model)
|
||||
self._bot = create_bot(bot_type)
|
||||
cur_bot_type = self._resolve_bot_type(cur_model)
|
||||
if self._bot is None or self._bot_model != cur_model or getattr(self, '_bot_type', None) != cur_bot_type:
|
||||
self._bot = create_bot(cur_bot_type)
|
||||
self._bot = add_openai_compatible_support(self._bot)
|
||||
self._bot_model = cur_model
|
||||
self._bot_type = cur_bot_type
|
||||
return self._bot
|
||||
|
||||
def call(self, request: LLMRequest):
|
||||
@@ -498,22 +499,26 @@ class AgentBridge:
|
||||
reply.text_content = text_response
|
||||
return reply
|
||||
|
||||
# For other unknown file types, return text with file info
|
||||
message = text_response or file_info.get("message", "文件已准备")
|
||||
message += f"\n\n[文件: {file_info.get('file_name', file_path)}]"
|
||||
return Reply(ReplyType.TEXT, message)
|
||||
# For all other file types (tar.gz, zip, etc.), also use FILE type
|
||||
file_url = f"file://{file_path}"
|
||||
logger.info(f"[AgentBridge] Sending generic file: {file_url}")
|
||||
reply = Reply(ReplyType.FILE, file_url)
|
||||
reply.file_name = file_info.get("file_name", os.path.basename(file_path))
|
||||
if text_response:
|
||||
reply.text_content = text_response
|
||||
return reply
|
||||
|
||||
def _migrate_config_to_env(self, workspace_root: str):
|
||||
"""
|
||||
Migrate API keys from config.json to .env file if not already set
|
||||
|
||||
Sync API keys from config.json to .env file.
|
||||
Adds new keys and updates changed values on each startup.
|
||||
|
||||
Args:
|
||||
workspace_root: Workspace directory path (not used, kept for compatibility)
|
||||
"""
|
||||
from config import conf
|
||||
import os
|
||||
|
||||
# Mapping from config.json keys to environment variable names
|
||||
key_mapping = {
|
||||
"open_ai_api_key": "OPENAI_API_KEY",
|
||||
"open_ai_api_base": "OPENAI_API_BASE",
|
||||
@@ -522,10 +527,9 @@ class AgentBridge:
|
||||
"linkai_api_key": "LINKAI_API_KEY",
|
||||
}
|
||||
|
||||
# Use fixed secure location for .env file
|
||||
env_file = expand_path("~/.cow/.env")
|
||||
|
||||
# Read existing env vars from .env file
|
||||
# Read existing env vars (key -> value)
|
||||
existing_env_vars = {}
|
||||
if os.path.exists(env_file):
|
||||
try:
|
||||
@@ -533,48 +537,46 @@ class AgentBridge:
|
||||
for line in f:
|
||||
line = line.strip()
|
||||
if line and not line.startswith('#') and '=' in line:
|
||||
key, _ = line.split('=', 1)
|
||||
existing_env_vars[key.strip()] = True
|
||||
key, val = line.split('=', 1)
|
||||
existing_env_vars[key.strip()] = val.strip()
|
||||
except Exception as e:
|
||||
logger.warning(f"[AgentBridge] Failed to read .env file: {e}")
|
||||
|
||||
# Check which keys need to be migrated
|
||||
keys_to_migrate = {}
|
||||
# Sync config.json values into .env (add/update/remove)
|
||||
updated = False
|
||||
for config_key, env_key in key_mapping.items():
|
||||
# Skip if already in .env file
|
||||
if env_key in existing_env_vars:
|
||||
continue
|
||||
|
||||
# Get value from config.json
|
||||
value = conf().get(config_key, "")
|
||||
if value and value.strip(): # Only migrate non-empty values
|
||||
keys_to_migrate[env_key] = value.strip()
|
||||
|
||||
# Log summary if there are keys to skip
|
||||
if existing_env_vars:
|
||||
logger.debug(f"[AgentBridge] {len(existing_env_vars)} env vars already in .env")
|
||||
|
||||
# Write new keys to .env file
|
||||
if keys_to_migrate:
|
||||
raw = conf().get(config_key, "")
|
||||
value = raw.strip() if raw else ""
|
||||
old_value = existing_env_vars.get(env_key)
|
||||
|
||||
if value:
|
||||
if old_value == value:
|
||||
continue
|
||||
existing_env_vars[env_key] = value
|
||||
os.environ[env_key] = value
|
||||
updated = True
|
||||
else:
|
||||
if old_value is None:
|
||||
continue
|
||||
existing_env_vars.pop(env_key, None)
|
||||
os.environ.pop(env_key, None)
|
||||
updated = True
|
||||
updated = True
|
||||
|
||||
if updated:
|
||||
try:
|
||||
# Ensure ~/.cow directory and .env file exist
|
||||
env_dir = os.path.dirname(env_file)
|
||||
if not os.path.exists(env_dir):
|
||||
os.makedirs(env_dir, exist_ok=True)
|
||||
if not os.path.exists(env_file):
|
||||
open(env_file, 'a').close()
|
||||
|
||||
# Append new keys
|
||||
with open(env_file, 'a', encoding='utf-8') as f:
|
||||
f.write('\n# Auto-migrated from config.json\n')
|
||||
for key, value in keys_to_migrate.items():
|
||||
os.makedirs(env_dir, exist_ok=True)
|
||||
|
||||
with open(env_file, 'w', encoding='utf-8') as f:
|
||||
f.write('# Environment variables for agent\n')
|
||||
f.write('# Auto-managed - synced from config.json on startup\n\n')
|
||||
for key, value in sorted(existing_env_vars.items()):
|
||||
f.write(f'{key}={value}\n')
|
||||
# Also set in current process
|
||||
os.environ[key] = value
|
||||
|
||||
logger.info(f"[AgentBridge] Migrated {len(keys_to_migrate)} API keys from config.json to .env: {list(keys_to_migrate.keys())}")
|
||||
|
||||
logger.info(f"[AgentBridge] Synced API keys from config.json to .env")
|
||||
except Exception as e:
|
||||
logger.warning(f"[AgentBridge] Failed to migrate API keys: {e}")
|
||||
logger.warning(f"[AgentBridge] Failed to sync API keys: {e}")
|
||||
|
||||
def _persist_messages(
|
||||
self, session_id: str, new_messages: list, channel_type: str = ""
|
||||
|
||||
@@ -26,8 +26,7 @@ class AgentEventHandler:
|
||||
if context:
|
||||
self.channel = context.kwargs.get("channel") if hasattr(context, "kwargs") else None
|
||||
|
||||
# Track current thinking for channel output
|
||||
self.current_thinking = ""
|
||||
self.current_content = ""
|
||||
self.turn_number = 0
|
||||
|
||||
def handle_event(self, event):
|
||||
@@ -47,6 +46,8 @@ class AgentEventHandler:
|
||||
self._handle_message_update(data)
|
||||
elif event_type == "message_end":
|
||||
self._handle_message_end(data)
|
||||
elif event_type == "reasoning_update":
|
||||
pass
|
||||
elif event_type == "tool_execution_start":
|
||||
self._handle_tool_execution_start(data)
|
||||
elif event_type == "tool_execution_end":
|
||||
@@ -59,30 +60,26 @@ class AgentEventHandler:
|
||||
def _handle_turn_start(self, data):
|
||||
"""Handle turn start event"""
|
||||
self.turn_number = data.get("turn", 0)
|
||||
self.has_tool_calls_in_turn = False
|
||||
self.current_thinking = ""
|
||||
self.current_content = ""
|
||||
|
||||
def _handle_message_update(self, data):
|
||||
"""Handle message update event (streaming text)"""
|
||||
"""Handle message update event (streaming content text)"""
|
||||
delta = data.get("delta", "")
|
||||
self.current_thinking += delta
|
||||
self.current_content += delta
|
||||
|
||||
def _handle_message_end(self, data):
|
||||
"""Handle message end event"""
|
||||
tool_calls = data.get("tool_calls", [])
|
||||
|
||||
# Only send thinking process if followed by tool calls
|
||||
if tool_calls:
|
||||
if self.current_thinking.strip():
|
||||
logger.info(f"💭 {self.current_thinking.strip()[:200]}{'...' if len(self.current_thinking) > 200 else ''}")
|
||||
# Send thinking process to channel
|
||||
self._send_to_channel(f"{self.current_thinking.strip()}")
|
||||
if self.current_content.strip():
|
||||
logger.info(f"💭 {self.current_content.strip()[:200]}{'...' if len(self.current_content) > 200 else ''}")
|
||||
self._send_to_channel(self.current_content.strip())
|
||||
else:
|
||||
# No tool calls = final response (logged at agent_stream level)
|
||||
if self.current_thinking.strip():
|
||||
logger.debug(f"💬 {self.current_thinking.strip()[:200]}{'...' if len(self.current_thinking) > 200 else ''}")
|
||||
if self.current_content.strip():
|
||||
logger.debug(f"💬 {self.current_content.strip()[:200]}{'...' if len(self.current_content) > 200 else ''}")
|
||||
|
||||
self.current_thinking = ""
|
||||
self.current_content = ""
|
||||
|
||||
def _handle_tool_execution_start(self, data):
|
||||
"""Handle tool execution start event - logged by agent_stream.py"""
|
||||
|
||||
@@ -465,8 +465,12 @@ class AgentInitializer:
|
||||
'timezone': timezone_name
|
||||
}
|
||||
|
||||
def get_model():
|
||||
"""Get current model name dynamically from config"""
|
||||
return conf().get("model", "unknown")
|
||||
|
||||
return {
|
||||
"model": conf().get("model", "unknown"),
|
||||
"_get_model": get_model,
|
||||
"workspace": workspace_root,
|
||||
"channel": ", ".join(conf().get("channel_type")) if isinstance(conf().get("channel_type"), list) else conf().get("channel_type", "unknown"),
|
||||
"_get_current_time": get_current_time # Dynamic time function
|
||||
@@ -486,7 +490,7 @@ class AgentInitializer:
|
||||
|
||||
env_file = expand_path("~/.cow/.env")
|
||||
|
||||
# Read existing env vars
|
||||
# Read existing env vars (key -> value)
|
||||
existing_env_vars = {}
|
||||
if os.path.exists(env_file):
|
||||
try:
|
||||
@@ -494,38 +498,46 @@ class AgentInitializer:
|
||||
for line in f:
|
||||
line = line.strip()
|
||||
if line and not line.startswith('#') and '=' in line:
|
||||
key, _ = line.split('=', 1)
|
||||
existing_env_vars[key.strip()] = True
|
||||
key, val = line.split('=', 1)
|
||||
existing_env_vars[key.strip()] = val.strip()
|
||||
except Exception as e:
|
||||
logger.warning(f"[AgentInitializer] Failed to read .env file: {e}")
|
||||
|
||||
# Check which keys need migration
|
||||
keys_to_migrate = {}
|
||||
# Sync config.json values into .env (add/update/remove)
|
||||
updated = False
|
||||
for config_key, env_key in key_mapping.items():
|
||||
if env_key in existing_env_vars:
|
||||
continue
|
||||
value = conf().get(config_key, "")
|
||||
if value and value.strip():
|
||||
keys_to_migrate[env_key] = value.strip()
|
||||
|
||||
# Write new keys
|
||||
if keys_to_migrate:
|
||||
raw = conf().get(config_key, "")
|
||||
value = raw.strip() if raw else ""
|
||||
old_value = existing_env_vars.get(env_key)
|
||||
|
||||
if value:
|
||||
if old_value == value:
|
||||
continue
|
||||
existing_env_vars[env_key] = value
|
||||
os.environ[env_key] = value
|
||||
updated = True
|
||||
else:
|
||||
if old_value is None:
|
||||
continue
|
||||
existing_env_vars.pop(env_key, None)
|
||||
os.environ.pop(env_key, None)
|
||||
updated = True
|
||||
|
||||
if updated:
|
||||
try:
|
||||
env_dir = os.path.dirname(env_file)
|
||||
if not os.path.exists(env_dir):
|
||||
os.makedirs(env_dir, exist_ok=True)
|
||||
if not os.path.exists(env_file):
|
||||
open(env_file, 'a').close()
|
||||
|
||||
with open(env_file, 'a', encoding='utf-8') as f:
|
||||
f.write('\n# Auto-migrated from config.json\n')
|
||||
for key, value in keys_to_migrate.items():
|
||||
os.makedirs(env_dir, exist_ok=True)
|
||||
|
||||
# Rewrite the entire .env file to ensure consistency
|
||||
with open(env_file, 'w', encoding='utf-8') as f:
|
||||
f.write('# Environment variables for agent\n')
|
||||
f.write('# Auto-managed - synced from config.json on startup\n\n')
|
||||
for key, value in sorted(existing_env_vars.items()):
|
||||
f.write(f'{key}={value}\n')
|
||||
os.environ[key] = value
|
||||
|
||||
logger.info(f"[AgentInitializer] Migrated {len(keys_to_migrate)} API keys to .env: {list(keys_to_migrate.keys())}")
|
||||
|
||||
logger.info(f"[AgentInitializer] Synced API keys from config.json to .env")
|
||||
except Exception as e:
|
||||
logger.warning(f"[AgentInitializer] Failed to migrate API keys: {e}")
|
||||
logger.warning(f"[AgentInitializer] Failed to sync API keys: {e}")
|
||||
|
||||
def _start_daily_flush_timer(self):
|
||||
"""Start a background thread that flushes all agents' memory daily at 23:55."""
|
||||
|
||||
@@ -39,11 +39,8 @@ class Bridge(object):
|
||||
self.btype["chat"] = const.BAIDU
|
||||
if model_type in ["xunfei"]:
|
||||
self.btype["chat"] = const.XUNFEI
|
||||
if model_type in [const.QWEN]:
|
||||
self.btype["chat"] = const.QWEN
|
||||
if model_type in [const.QWEN_TURBO, const.QWEN_PLUS, const.QWEN_MAX]:
|
||||
if model_type in [const.QWEN, const.QWEN_TURBO, const.QWEN_PLUS, const.QWEN_MAX]:
|
||||
self.btype["chat"] = const.QWEN_DASHSCOPE
|
||||
# Support Qwen3 and other DashScope models
|
||||
if model_type and (model_type.startswith("qwen") or model_type.startswith("qwq") or model_type.startswith("qvq")):
|
||||
self.btype["chat"] = const.QWEN_DASHSCOPE
|
||||
if model_type and model_type.startswith("gemini"):
|
||||
|
||||
@@ -347,38 +347,30 @@ class ChatChannel(Channel):
|
||||
if media_items:
|
||||
logger.info(f"[chat_channel] Extracted {len(media_items)} media item(s) from reply")
|
||||
|
||||
# 先发送文本(保持原文本不变)
|
||||
# Send text first (the frontend will embed video players via renderMarkdown).
|
||||
logger.info(f"[chat_channel] Sending text content before media: {reply.content[:100]}...")
|
||||
self._send(reply, context)
|
||||
logger.info(f"[chat_channel] Text sent, now sending {len(media_items)} media item(s)")
|
||||
|
||||
# 然后逐个发送媒体文件
|
||||
for i, (url, media_type) in enumerate(media_items):
|
||||
try:
|
||||
# 判断是本地文件还是URL
|
||||
# Determine whether it is a remote URL or a local file.
|
||||
if url.startswith(('http://', 'https://')):
|
||||
# 网络资源
|
||||
if media_type == 'video':
|
||||
# 视频使用 FILE 类型发送
|
||||
media_reply = Reply(ReplyType.FILE, url)
|
||||
media_reply.file_name = os.path.basename(url)
|
||||
else:
|
||||
# 图片使用 IMAGE_URL 类型
|
||||
media_reply = Reply(ReplyType.IMAGE_URL, url)
|
||||
elif os.path.exists(url):
|
||||
# 本地文件
|
||||
if media_type == 'video':
|
||||
# 视频使用 FILE 类型,转换为 file:// URL
|
||||
media_reply = Reply(ReplyType.FILE, f"file://{url}")
|
||||
media_reply.file_name = os.path.basename(url)
|
||||
else:
|
||||
# 图片使用 IMAGE_URL 类型,转换为 file:// URL
|
||||
media_reply = Reply(ReplyType.IMAGE_URL, f"file://{url}")
|
||||
else:
|
||||
logger.warning(f"[chat_channel] Media file not found or invalid URL: {url}")
|
||||
continue
|
||||
|
||||
# 发送媒体文件(添加小延迟避免频率限制)
|
||||
if i > 0:
|
||||
time.sleep(0.5)
|
||||
self._send(media_reply, context)
|
||||
|
||||
@@ -110,6 +110,11 @@
|
||||
<i class="fas fa-brain item-icon text-xs w-5 text-center"></i>
|
||||
<span data-i18n="menu_memory">Memory</span>
|
||||
</a>
|
||||
<a class="sidebar-item flex items-center gap-3 px-3 py-2 rounded-lg cursor-pointer transition-all duration-150 hover:bg-white/5 hover:text-neutral-200 text-[14px]"
|
||||
data-view="knowledge">
|
||||
<i class="fas fa-book item-icon text-xs w-5 text-center"></i>
|
||||
<span data-i18n="menu_knowledge">Knowledge</span>
|
||||
</a>
|
||||
<a class="sidebar-item flex items-center gap-3 px-3 py-2 rounded-lg cursor-pointer transition-all duration-150 hover:bg-white/5 hover:text-neutral-200 text-[14px]"
|
||||
data-view="channels">
|
||||
<i class="fas fa-tower-broadcast item-icon text-xs w-5 text-center"></i>
|
||||
@@ -558,6 +563,106 @@
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- ====================================================== -->
|
||||
<!-- VIEW: Knowledge -->
|
||||
<!-- ====================================================== -->
|
||||
<div id="view-knowledge" class="view">
|
||||
<div class="flex-1 overflow-y-auto p-4 md:p-8 lg:p-10">
|
||||
<div class="w-full max-w-[1600px] mx-auto">
|
||||
|
||||
<!-- Header -->
|
||||
<div class="flex flex-col sm:flex-row sm:items-center justify-between gap-3 mb-4 md:mb-6">
|
||||
<div>
|
||||
<h2 class="text-xl font-bold text-slate-800 dark:text-slate-100" data-i18n="knowledge_title">Knowledge</h2>
|
||||
<p class="text-sm text-slate-500 dark:text-slate-400 mt-1" data-i18n="knowledge_desc">Browse and explore your knowledge base</p>
|
||||
</div>
|
||||
<div class="flex items-center gap-2">
|
||||
<span id="knowledge-stats" class="text-xs text-slate-400 dark:text-slate-500 hidden sm:inline"></span>
|
||||
<div class="flex items-center bg-slate-100 dark:bg-white/10 rounded-lg p-0.5">
|
||||
<button id="knowledge-tab-docs" onclick="switchKnowledgeTab('docs')"
|
||||
class="knowledge-tab px-3 py-1.5 rounded-md text-xs font-medium cursor-pointer transition-colors duration-150 active">
|
||||
<i class="fas fa-folder-tree mr-1.5"></i><span data-i18n="knowledge_tab_docs">Documents</span>
|
||||
</button>
|
||||
<button id="knowledge-tab-graph" onclick="switchKnowledgeTab('graph')"
|
||||
class="knowledge-tab px-3 py-1.5 rounded-md text-xs font-medium cursor-pointer transition-colors duration-150">
|
||||
<i class="fas fa-diagram-project mr-1.5"></i><span data-i18n="knowledge_tab_graph">Graph</span>
|
||||
</button>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- Empty state -->
|
||||
<div id="knowledge-empty" class="flex flex-col items-center justify-center py-20">
|
||||
<div class="w-16 h-16 rounded-2xl bg-emerald-50 dark:bg-emerald-900/20 flex items-center justify-center mb-4">
|
||||
<i class="fas fa-book text-emerald-400 text-xl"></i>
|
||||
</div>
|
||||
<p class="text-slate-500 dark:text-slate-400 font-medium" data-i18n="knowledge_loading">Loading knowledge base...</p>
|
||||
<p class="text-sm text-slate-400 dark:text-slate-500 mt-1" data-i18n="knowledge_loading_desc">Knowledge pages will be displayed here</p>
|
||||
<div id="knowledge-empty-guide" class="hidden mt-6 max-w-sm text-center">
|
||||
<p class="text-sm text-slate-500 dark:text-slate-400 mb-4" data-i18n="knowledge_empty_guide">Send documents, links or topics to the agent in chat, and it will automatically organize them into your knowledge base.</p>
|
||||
<button onclick="navigateTo('chat')"
|
||||
class="inline-flex items-center gap-2 px-4 py-2 rounded-lg bg-primary-500 hover:bg-primary-600
|
||||
text-white text-sm font-medium cursor-pointer transition-colors duration-150">
|
||||
<i class="fas fa-message text-xs"></i>
|
||||
<span data-i18n="knowledge_go_chat">Start a conversation</span>
|
||||
</button>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- Documents panel -->
|
||||
<div id="knowledge-panel-docs" class="hidden">
|
||||
<div class="flex flex-col md:flex-row gap-4 md:gap-6" style="min-height: calc(100vh - 220px)">
|
||||
<!-- File tree -->
|
||||
<div id="knowledge-sidebar" class="w-full md:w-72 lg:w-80 flex-shrink-0">
|
||||
<div class="bg-white dark:bg-[#1A1A1A] rounded-xl border border-slate-200 dark:border-white/10 overflow-hidden">
|
||||
<div class="px-4 py-3 border-b border-slate-200 dark:border-white/10">
|
||||
<div class="relative">
|
||||
<i class="fas fa-search absolute left-3 top-1/2 -translate-y-1/2 text-slate-400 text-xs"></i>
|
||||
<input id="knowledge-search" type="text" placeholder="Search..."
|
||||
class="w-full pl-8 pr-3 py-1.5 text-xs bg-slate-50 dark:bg-white/5 border border-slate-200 dark:border-white/10 rounded-lg text-slate-700 dark:text-slate-200 placeholder-slate-400 dark:placeholder-slate-500 focus:outline-none focus:ring-1 focus:ring-primary-400/50"
|
||||
oninput="filterKnowledgeTree(this.value)">
|
||||
</div>
|
||||
</div>
|
||||
<div id="knowledge-tree" class="p-2 overflow-y-auto max-h-[50vh] md:max-h-[calc(100vh-300px)]"></div>
|
||||
</div>
|
||||
</div>
|
||||
<!-- Content viewer -->
|
||||
<div class="flex-1 min-w-0">
|
||||
<div id="knowledge-content-placeholder"
|
||||
class="flex flex-col items-center justify-center py-20 text-slate-400 dark:text-slate-500"
|
||||
<i class="fas fa-file-lines text-3xl mb-3 opacity-40"></i>
|
||||
<p class="text-sm" data-i18n="knowledge_select_hint">Select a document to view</p>
|
||||
</div>
|
||||
<div id="knowledge-content-viewer" class="hidden">
|
||||
<div class="bg-white dark:bg-[#1A1A1A] rounded-xl border border-slate-200 dark:border-white/10 overflow-hidden">
|
||||
<div class="flex items-center gap-3 px-4 md:px-5 py-3 border-b border-slate-200 dark:border-white/10">
|
||||
<button onclick="knowledgeMobileBack()" class="md:hidden p-1 -ml-1 text-slate-400 hover:text-slate-600 dark:hover:text-slate-300 cursor-pointer">
|
||||
<i class="fas fa-arrow-left text-xs"></i>
|
||||
</button>
|
||||
<i class="fas fa-file-lines text-slate-400 text-sm hidden md:inline"></i>
|
||||
<span id="knowledge-viewer-title" class="text-sm font-medium text-slate-700 dark:text-slate-200 truncate"></span>
|
||||
<span id="knowledge-viewer-path" class="text-xs text-slate-400 dark:text-slate-500 ml-auto font-mono truncate hidden md:inline"></span>
|
||||
</div>
|
||||
<div id="knowledge-viewer-body"
|
||||
class="p-4 md:p-5 overflow-y-auto text-sm msg-content text-slate-700 dark:text-slate-200"
|
||||
style="max-height: calc(100vh - 280px)"></div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- Graph panel -->
|
||||
<div id="knowledge-panel-graph" class="hidden">
|
||||
<div class="bg-white dark:bg-[#1A1A1A] rounded-xl border border-slate-200 dark:border-white/10 overflow-hidden">
|
||||
<div id="knowledge-graph-container" class="w-full h-[60vh] md:h-[calc(100vh-220px)]"></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<!-- ====================================================== -->
|
||||
<!-- VIEW: Channels -->
|
||||
<!-- ====================================================== -->
|
||||
@@ -670,6 +775,7 @@
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<script src="https://cdn.jsdelivr.net/npm/d3@7/dist/d3.min.js"></script>
|
||||
<script src="assets/js/console.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
@@ -45,7 +45,8 @@
|
||||
.msg-content h1 { font-size: 1.4em; }
|
||||
.msg-content h2 { font-size: 1.25em; }
|
||||
.msg-content h3 { font-size: 1.1em; }
|
||||
.msg-content ul, .msg-content ol { margin: 0.5em 0; padding-left: 1.8em; }
|
||||
.msg-content ul { margin: 0.5em 0; padding-left: 1.8em; list-style: disc; }
|
||||
.msg-content ol { margin: 0.5em 0; padding-left: 1.8em; list-style: decimal; }
|
||||
.msg-content li { margin: 0.25em 0; }
|
||||
.msg-content pre {
|
||||
border-radius: 8px; overflow-x: auto; margin: 0.8em 0;
|
||||
@@ -146,7 +147,7 @@
|
||||
font-size: 0.75rem;
|
||||
line-height: 1.5;
|
||||
color: #94a3b8;
|
||||
max-height: 200px;
|
||||
max-height: 300px;
|
||||
overflow-y: auto;
|
||||
}
|
||||
.dark .agent-thinking-step .thinking-full {
|
||||
@@ -158,6 +159,20 @@
|
||||
.agent-thinking-step .thinking-full p:first-child { margin-top: 0; }
|
||||
.agent-thinking-step .thinking-full p:last-child { margin-bottom: 0; }
|
||||
|
||||
/* Content step - real text output frozen before tool calls */
|
||||
.agent-content-step {
|
||||
font-size: 0.875rem;
|
||||
line-height: 1.6;
|
||||
color: inherit;
|
||||
margin-bottom: 0.5rem;
|
||||
padding-bottom: 0.5rem;
|
||||
border-bottom: 1px dashed rgba(0, 0, 0, 0.06);
|
||||
}
|
||||
.dark .agent-content-step { border-bottom-color: rgba(255, 255, 255, 0.06); }
|
||||
.agent-content-step .agent-content-body p { margin: 0.25em 0; }
|
||||
.agent-content-step .agent-content-body p:first-child { margin-top: 0; }
|
||||
.agent-content-step .agent-content-body p:last-child { margin-bottom: 0; }
|
||||
|
||||
/* Tool step - collapsible */
|
||||
.agent-tool-step .tool-header {
|
||||
display: flex;
|
||||
@@ -535,3 +550,142 @@
|
||||
.dark .slash-menu-item .desc {
|
||||
color: #64748b;
|
||||
}
|
||||
|
||||
/* ============================================================
|
||||
Knowledge View
|
||||
============================================================ */
|
||||
|
||||
/* Tab toggle */
|
||||
.knowledge-tab {
|
||||
color: #64748b;
|
||||
}
|
||||
.knowledge-tab.active {
|
||||
background: #fff;
|
||||
color: #334155;
|
||||
box-shadow: 0 1px 3px rgba(0,0,0,0.08);
|
||||
}
|
||||
.dark .knowledge-tab.active {
|
||||
background: rgba(255,255,255,0.1);
|
||||
color: #e2e8f0;
|
||||
}
|
||||
|
||||
/* File tree */
|
||||
.knowledge-tree-group {
|
||||
margin-bottom: 2px;
|
||||
}
|
||||
.knowledge-tree-group-btn {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 6px;
|
||||
width: 100%;
|
||||
padding: 6px 8px;
|
||||
border-radius: 6px;
|
||||
font-size: 12px;
|
||||
font-weight: 600;
|
||||
color: #64748b;
|
||||
cursor: pointer;
|
||||
border: none;
|
||||
background: none;
|
||||
transition: background 0.15s, color 0.15s;
|
||||
text-transform: capitalize;
|
||||
}
|
||||
.knowledge-tree-group-btn:hover {
|
||||
background: rgba(0,0,0,0.04);
|
||||
color: #334155;
|
||||
}
|
||||
.dark .knowledge-tree-group-btn:hover {
|
||||
background: rgba(255,255,255,0.06);
|
||||
color: #e2e8f0;
|
||||
}
|
||||
.knowledge-tree-group-btn i.chevron {
|
||||
font-size: 8px;
|
||||
transition: transform 0.15s;
|
||||
}
|
||||
.knowledge-tree-group.open .chevron {
|
||||
transform: rotate(90deg);
|
||||
}
|
||||
.knowledge-tree-group-items {
|
||||
display: none;
|
||||
}
|
||||
.knowledge-tree-group.open .knowledge-tree-group-items {
|
||||
display: block;
|
||||
}
|
||||
|
||||
.knowledge-tree-file {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 6px;
|
||||
padding: 5px 8px 5px 24px;
|
||||
border-radius: 6px;
|
||||
font-size: 12px;
|
||||
color: #64748b;
|
||||
cursor: pointer;
|
||||
border: none;
|
||||
background: none;
|
||||
width: 100%;
|
||||
text-align: left;
|
||||
transition: background 0.15s, color 0.15s;
|
||||
white-space: nowrap;
|
||||
overflow: hidden;
|
||||
text-overflow: ellipsis;
|
||||
}
|
||||
.knowledge-tree-file:hover {
|
||||
background: rgba(0,0,0,0.04);
|
||||
color: #334155;
|
||||
}
|
||||
.knowledge-tree-file.active {
|
||||
background: #EDFDF3;
|
||||
color: #228547;
|
||||
}
|
||||
.dark .knowledge-tree-file:hover {
|
||||
background: rgba(255,255,255,0.06);
|
||||
color: #e2e8f0;
|
||||
}
|
||||
.dark .knowledge-tree-file.active {
|
||||
background: rgba(74, 190, 110, 0.1);
|
||||
color: #4ABE6E;
|
||||
}
|
||||
|
||||
/* Graph legend */
|
||||
.knowledge-graph-legend {
|
||||
position: absolute;
|
||||
top: 12px;
|
||||
right: 12px;
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 8px;
|
||||
font-size: 11px;
|
||||
color: #64748b;
|
||||
z-index: 10;
|
||||
}
|
||||
.knowledge-graph-legend-item {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
gap: 4px;
|
||||
}
|
||||
.knowledge-graph-legend-dot {
|
||||
width: 8px;
|
||||
height: 8px;
|
||||
border-radius: 50%;
|
||||
}
|
||||
|
||||
/* Graph tooltip */
|
||||
.knowledge-graph-tooltip {
|
||||
position: absolute;
|
||||
padding: 6px 10px;
|
||||
background: #fff;
|
||||
border: 1px solid #e2e8f0;
|
||||
border-radius: 8px;
|
||||
font-size: 12px;
|
||||
color: #334155;
|
||||
box-shadow: 0 4px 12px rgba(0,0,0,0.08);
|
||||
pointer-events: none;
|
||||
opacity: 0;
|
||||
transition: opacity 0.15s;
|
||||
z-index: 20;
|
||||
}
|
||||
.dark .knowledge-graph-tooltip {
|
||||
background: #1A1A1A;
|
||||
border-color: rgba(255,255,255,0.1);
|
||||
color: #e2e8f0;
|
||||
}
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
@@ -126,6 +126,13 @@ class WebChannel(ChatChannel):
|
||||
logger.debug(f"SSE skipped duplicate file for request {request_id}")
|
||||
return
|
||||
|
||||
# Skip http-URL FILE/IMAGE_URL replies produced by chat_channel's media extraction:
|
||||
# the text reply (already sent as "done") contains the URL and the frontend will
|
||||
# render it via renderMarkdown/injectVideoPlayers, so no separate SSE event needed.
|
||||
if reply.type in (ReplyType.FILE, ReplyType.IMAGE_URL) and content.startswith(("http://", "https://")):
|
||||
logger.debug(f"SSE skipped http media reply for request {request_id}")
|
||||
return
|
||||
|
||||
self.sse_queues[request_id].put({
|
||||
"type": "done",
|
||||
"content": content,
|
||||
@@ -161,7 +168,12 @@ class WebChannel(ChatChannel):
|
||||
event_type = event.get("type")
|
||||
data = event.get("data", {})
|
||||
|
||||
if event_type == "message_update":
|
||||
if event_type == "reasoning_update":
|
||||
delta = data.get("delta", "")
|
||||
if delta:
|
||||
q.put({"type": "reasoning", "content": delta})
|
||||
|
||||
elif event_type == "message_update":
|
||||
delta = data.get("delta", "")
|
||||
if delta:
|
||||
q.put({"type": "delta", "content": delta})
|
||||
@@ -188,6 +200,11 @@ class WebChannel(ChatChannel):
|
||||
"execution_time": round(exec_time, 2)
|
||||
})
|
||||
|
||||
elif event_type == "message_end":
|
||||
tool_calls = data.get("tool_calls", [])
|
||||
if tool_calls:
|
||||
q.put({"type": "message_end", "has_tool_calls": True})
|
||||
|
||||
elif event_type == "file_to_send":
|
||||
file_path = data.get("path", "")
|
||||
file_name = data.get("file_name", os.path.basename(file_path))
|
||||
@@ -322,14 +339,18 @@ class WebChannel(ChatChannel):
|
||||
"""
|
||||
SSE generator for a given request_id.
|
||||
Yields UTF-8 encoded bytes to avoid WSGI Latin-1 mangling.
|
||||
Supports client reconnection: the queue is only removed after a
|
||||
"done" event is consumed, so a new GET /stream with the same
|
||||
request_id can resume reading remaining events.
|
||||
"""
|
||||
if request_id not in self.sse_queues:
|
||||
yield b"data: {\"type\": \"error\", \"message\": \"invalid request_id\"}\n\n"
|
||||
return
|
||||
|
||||
q = self.sse_queues[request_id]
|
||||
timeout = 300 # 5 minutes max
|
||||
deadline = time.time() + timeout
|
||||
idle_timeout = 600 # 10 minutes without any real event
|
||||
deadline = time.time() + idle_timeout
|
||||
done = False
|
||||
|
||||
try:
|
||||
while time.time() < deadline:
|
||||
@@ -339,13 +360,18 @@ class WebChannel(ChatChannel):
|
||||
yield b": keepalive\n\n"
|
||||
continue
|
||||
|
||||
# Real event received, reset idle deadline
|
||||
deadline = time.time() + idle_timeout
|
||||
|
||||
payload = json.dumps(item, ensure_ascii=False)
|
||||
yield f"data: {payload}\n\n".encode("utf-8")
|
||||
|
||||
if item.get("type") == "done":
|
||||
done = True
|
||||
break
|
||||
finally:
|
||||
self.sse_queues.pop(request_id, None)
|
||||
if done:
|
||||
self.sse_queues.pop(request_id, None)
|
||||
|
||||
def poll_response(self):
|
||||
"""
|
||||
@@ -428,6 +454,9 @@ class WebChannel(ChatChannel):
|
||||
'/api/skills', 'SkillsHandler',
|
||||
'/api/memory', 'MemoryHandler',
|
||||
'/api/memory/content', 'MemoryContentHandler',
|
||||
'/api/knowledge/list', 'KnowledgeListHandler',
|
||||
'/api/knowledge/read', 'KnowledgeReadHandler',
|
||||
'/api/knowledge/graph', 'KnowledgeGraphHandler',
|
||||
'/api/scheduler', 'SchedulerHandler',
|
||||
'/api/history', 'HistoryHandler',
|
||||
'/api/logs', 'LogsHandler',
|
||||
@@ -447,8 +476,14 @@ class WebChannel(ChatChannel):
|
||||
func = web.httpserver.StaticMiddleware(app.wsgifunc())
|
||||
func = web.httpserver.LogMiddleware(func)
|
||||
server = web.httpserver.WSGIServer(("0.0.0.0", port), func)
|
||||
# Allow concurrent requests by not blocking on in-flight handler threads
|
||||
server.daemon_threads = True
|
||||
# Default request_queue_size(5) / timeout(10s) / numthreads(10) are
|
||||
# too small: when SSE streams occupy many threads, the backlog fills
|
||||
# and new connections get refused (ERR_CONNECTION_ABORTED).
|
||||
server.request_queue_size = 128
|
||||
server.timeout = 300
|
||||
server.requests.min = 20
|
||||
server.requests.max = 80
|
||||
self._http_server = server
|
||||
try:
|
||||
server.start()
|
||||
@@ -563,7 +598,7 @@ class ConfigHandler:
|
||||
_RECOMMENDED_MODELS = [
|
||||
const.MINIMAX_M2_7, const.MINIMAX_M2_5, const.MINIMAX_M2_1, const.MINIMAX_M2_1_LIGHTNING,
|
||||
const.GLM_5_TURBO, const.GLM_5, const.GLM_4_7,
|
||||
const.QWEN3_MAX, const.QWEN35_PLUS,
|
||||
const.QWEN36_PLUS, const.QWEN35_PLUS, const.QWEN3_MAX,
|
||||
const.KIMI_K2_5, const.KIMI_K2,
|
||||
const.DOUBAO_SEED_2_PRO, const.DOUBAO_SEED_2_CODE,
|
||||
const.CLAUDE_4_6_SONNET, const.CLAUDE_4_6_OPUS, const.CLAUDE_4_5_SONNET,
|
||||
@@ -592,7 +627,7 @@ class ConfigHandler:
|
||||
"api_key_field": "dashscope_api_key",
|
||||
"api_base_key": None,
|
||||
"api_base_default": None,
|
||||
"models": [const.QWEN3_MAX, const.QWEN35_PLUS],
|
||||
"models": [const.QWEN36_PLUS, const.QWEN35_PLUS, const.QWEN3_MAX],
|
||||
}),
|
||||
("moonshot", {
|
||||
"label": "Kimi",
|
||||
@@ -1508,6 +1543,47 @@ class AssetsHandler:
|
||||
raise web.notfound()
|
||||
|
||||
|
||||
class KnowledgeListHandler:
|
||||
def GET(self):
|
||||
web.header('Content-Type', 'application/json; charset=utf-8')
|
||||
try:
|
||||
from agent.knowledge.service import KnowledgeService
|
||||
svc = KnowledgeService(_get_workspace_root())
|
||||
result = svc.list_tree()
|
||||
return json.dumps({"status": "success", **result}, ensure_ascii=False)
|
||||
except Exception as e:
|
||||
logger.error(f"[WebChannel] Knowledge list error: {e}")
|
||||
return json.dumps({"status": "error", "message": str(e)})
|
||||
|
||||
|
||||
class KnowledgeReadHandler:
|
||||
def GET(self):
|
||||
web.header('Content-Type', 'application/json; charset=utf-8')
|
||||
try:
|
||||
from agent.knowledge.service import KnowledgeService
|
||||
params = web.input(path='')
|
||||
svc = KnowledgeService(_get_workspace_root())
|
||||
result = svc.read_file(params.path)
|
||||
return json.dumps({"status": "success", **result}, ensure_ascii=False)
|
||||
except (ValueError, FileNotFoundError) as e:
|
||||
return json.dumps({"status": "error", "message": str(e)})
|
||||
except Exception as e:
|
||||
logger.error(f"[WebChannel] Knowledge read error: {e}")
|
||||
return json.dumps({"status": "error", "message": str(e)})
|
||||
|
||||
|
||||
class KnowledgeGraphHandler:
|
||||
def GET(self):
|
||||
web.header('Content-Type', 'application/json; charset=utf-8')
|
||||
try:
|
||||
from agent.knowledge.service import KnowledgeService
|
||||
svc = KnowledgeService(_get_workspace_root())
|
||||
return json.dumps(svc.build_graph(), ensure_ascii=False)
|
||||
except Exception as e:
|
||||
logger.error(f"[WebChannel] Knowledge graph error: {e}")
|
||||
return json.dumps({"nodes": [], "links": []})
|
||||
|
||||
|
||||
class VersionHandler:
|
||||
def GET(self):
|
||||
web.header('Content-Type', 'application/json; charset=utf-8')
|
||||
|
||||
@@ -37,11 +37,19 @@ def _random_wechat_uin() -> str:
|
||||
return base64.b64encode(str(val).encode("utf-8")).decode("utf-8")
|
||||
|
||||
|
||||
CHANNEL_VERSION = "2.0.0"
|
||||
# iLink-App-ClientVersion: uint32 encoded as major<<16 | minor<<8 | patch
|
||||
# 2.0.0 → 0x00020000 = 131072
|
||||
CLIENT_VERSION = "131072"
|
||||
|
||||
|
||||
def _build_headers(token: str = "") -> dict:
|
||||
headers = {
|
||||
"Content-Type": "application/json",
|
||||
"AuthorizationType": "ilink_bot_token",
|
||||
"X-WECHAT-UIN": _random_wechat_uin(),
|
||||
"iLink-App-Id": "bot",
|
||||
"iLink-App-ClientVersion": CLIENT_VERSION,
|
||||
}
|
||||
if token:
|
||||
headers["Authorization"] = f"Bearer {token}"
|
||||
@@ -64,6 +72,7 @@ class WeixinApi:
|
||||
def _post(self, endpoint: str, body: dict, timeout: int = DEFAULT_API_TIMEOUT) -> dict:
|
||||
url = _ensure_trailing_slash(self.base_url) + endpoint
|
||||
headers = _build_headers(self.token)
|
||||
body.setdefault("base_info", {}).setdefault("channel_version", CHANNEL_VERSION)
|
||||
try:
|
||||
resp = requests.post(url, json=body, headers=headers, timeout=timeout)
|
||||
resp.raise_for_status()
|
||||
@@ -210,7 +219,10 @@ class WeixinApi:
|
||||
def poll_qr_status(self, qrcode: str, timeout: int = QR_POLL_TIMEOUT) -> dict:
|
||||
url = (_ensure_trailing_slash(self.base_url) +
|
||||
f"ilink/bot/get_qrcode_status?qrcode={requests.utils.quote(qrcode)}")
|
||||
headers = {"iLink-App-ClientVersion": "1"}
|
||||
headers = {
|
||||
"iLink-App-Id": "bot",
|
||||
"iLink-App-ClientVersion": CLIENT_VERSION,
|
||||
}
|
||||
try:
|
||||
resp = requests.get(url, headers=headers, timeout=timeout)
|
||||
resp.raise_for_status()
|
||||
|
||||
@@ -166,10 +166,18 @@ class WeixinChannel(ChatChannel):
|
||||
print("=" * 60)
|
||||
try:
|
||||
import qrcode as qr_lib
|
||||
import io
|
||||
qr = qr_lib.QRCode(error_correction=qr_lib.constants.ERROR_CORRECT_L, box_size=1, border=1)
|
||||
qr.add_data(qrcode_url)
|
||||
qr.make(fit=True)
|
||||
qr.print_ascii(invert=True)
|
||||
buf = io.StringIO()
|
||||
qr.print_ascii(out=buf, invert=True)
|
||||
try:
|
||||
print(buf.getvalue())
|
||||
except UnicodeEncodeError:
|
||||
# Windows GBK terminals cannot render Unicode block characters
|
||||
print(f"\n (终端不支持显示二维码,请使用链接扫码)")
|
||||
print(f" 二维码链接: {qrcode_url}\n")
|
||||
except ImportError:
|
||||
print(f"\n 二维码链接: {qrcode_url}")
|
||||
print(" (安装 'qrcode' 包可在终端显示二维码)\n")
|
||||
|
||||
@@ -1 +1 @@
|
||||
2.0.5
|
||||
2.0.6
|
||||
|
||||
@@ -6,6 +6,7 @@ from cli.commands.skill import skill
|
||||
from cli.commands.process import start, stop, restart, update, status, logs
|
||||
from cli.commands.context import context
|
||||
from cli.commands.install import install_browser
|
||||
from cli.commands.knowledge import knowledge
|
||||
|
||||
|
||||
HELP_TEXT = """Usage: cow COMMAND [ARGS]...
|
||||
@@ -22,6 +23,7 @@ Commands:
|
||||
status Show CowAgent running status.
|
||||
logs View CowAgent logs.
|
||||
skill Manage CowAgent skills.
|
||||
knowledge Manage knowledge base.
|
||||
install-browser Install browser tool (Playwright + Chromium).
|
||||
|
||||
Tip: You can also send /help, /skill list, etc. in agent chat."""
|
||||
@@ -69,6 +71,7 @@ main.add_command(update)
|
||||
main.add_command(status)
|
||||
main.add_command(logs)
|
||||
main.add_command(context)
|
||||
main.add_command(knowledge)
|
||||
main.add_command(install_browser)
|
||||
|
||||
|
||||
|
||||
121
cli/commands/knowledge.py
Normal file
121
cli/commands/knowledge.py
Normal file
@@ -0,0 +1,121 @@
|
||||
"""cow knowledge - Knowledge base management commands."""
|
||||
|
||||
import os
|
||||
|
||||
import click
|
||||
|
||||
from cli.utils import get_project_root
|
||||
|
||||
|
||||
def _get_knowledge_dir():
|
||||
"""Resolve the knowledge directory path from config or default."""
|
||||
try:
|
||||
import sys
|
||||
sys.path.insert(0, get_project_root())
|
||||
from config import conf
|
||||
from common.utils import expand_path
|
||||
workspace = expand_path(conf().get("agent_workspace", "~/cow"))
|
||||
except Exception:
|
||||
workspace = os.path.expanduser("~/cow")
|
||||
return os.path.join(workspace, "knowledge")
|
||||
|
||||
|
||||
def _get_knowledge_enabled():
|
||||
try:
|
||||
import sys
|
||||
sys.path.insert(0, get_project_root())
|
||||
from config import conf
|
||||
return conf().get("knowledge", True)
|
||||
except Exception:
|
||||
return True
|
||||
|
||||
|
||||
@click.group(invoke_without_command=True)
|
||||
@click.pass_context
|
||||
def knowledge(ctx):
|
||||
"""Manage CowAgent knowledge base."""
|
||||
if ctx.invoked_subcommand is None:
|
||||
click.echo(_stats())
|
||||
|
||||
|
||||
@knowledge.command("list")
|
||||
def knowledge_list():
|
||||
"""Display knowledge base file tree."""
|
||||
click.echo(_tree())
|
||||
|
||||
|
||||
def _stats() -> str:
|
||||
knowledge_dir = _get_knowledge_dir()
|
||||
if not os.path.isdir(knowledge_dir):
|
||||
return "Knowledge base directory not found."
|
||||
|
||||
enabled = _get_knowledge_enabled()
|
||||
total_files = 0
|
||||
total_bytes = 0
|
||||
cat_count = {}
|
||||
|
||||
for root, dirs, files in os.walk(knowledge_dir):
|
||||
dirs[:] = [d for d in dirs if not d.startswith(".")]
|
||||
rel_root = os.path.relpath(root, knowledge_dir)
|
||||
category = rel_root.split(os.sep)[0] if rel_root != "." else "root"
|
||||
for f in files:
|
||||
if f.endswith(".md") and f not in ("index.md", "log.md"):
|
||||
total_files += 1
|
||||
total_bytes += os.path.getsize(os.path.join(root, f))
|
||||
cat_count[category] = cat_count.get(category, 0) + 1
|
||||
|
||||
status_icon = click.style("enabled", fg="green") if enabled else click.style("disabled", fg="red")
|
||||
lines = [
|
||||
f"\n Knowledge Base [{status_icon}]",
|
||||
"",
|
||||
f" Pages: {total_files}",
|
||||
f" Size: {total_bytes / 1024:.1f} KB",
|
||||
"",
|
||||
]
|
||||
if cat_count:
|
||||
lines.append(" Categories:")
|
||||
for cat in sorted(cat_count.keys()):
|
||||
lines.append(f" {cat}/ ({cat_count[cat]} pages)")
|
||||
lines.append("")
|
||||
|
||||
lines.append(f" Path: {knowledge_dir}")
|
||||
lines.append("")
|
||||
return "\n".join(lines)
|
||||
|
||||
|
||||
def _tree() -> str:
|
||||
knowledge_dir = _get_knowledge_dir()
|
||||
if not os.path.isdir(knowledge_dir):
|
||||
return "Knowledge base directory not found."
|
||||
|
||||
tree_lines = [" knowledge/"]
|
||||
|
||||
subdirs = sorted([
|
||||
d for d in os.listdir(knowledge_dir)
|
||||
if os.path.isdir(os.path.join(knowledge_dir, d)) and not d.startswith(".")
|
||||
])
|
||||
|
||||
for i, subdir in enumerate(subdirs):
|
||||
is_last_dir = (i == len(subdirs) - 1)
|
||||
branch = "└── " if is_last_dir else "├── "
|
||||
subdir_path = os.path.join(knowledge_dir, subdir)
|
||||
md_files = sorted([
|
||||
f for f in os.listdir(subdir_path)
|
||||
if f.endswith(".md") and not f.startswith(".")
|
||||
])
|
||||
tree_lines.append(f" {branch}{subdir}/ ({len(md_files)})")
|
||||
|
||||
child_prefix = " " if is_last_dir else " │ "
|
||||
max_show = 15
|
||||
for j, fname in enumerate(md_files[:max_show]):
|
||||
is_last_file = (j == len(md_files[:max_show]) - 1) and len(md_files) <= max_show
|
||||
fb = "└── " if is_last_file else "├── "
|
||||
name = fname.replace(".md", "")
|
||||
tree_lines.append(f"{child_prefix}{fb}{name}")
|
||||
if len(md_files) > max_show:
|
||||
tree_lines.append(f"{child_prefix}└── ... +{len(md_files) - max_show} more")
|
||||
|
||||
if not subdirs:
|
||||
tree_lines.append(" (empty)")
|
||||
|
||||
return "\n" + "\n".join(tree_lines) + "\n"
|
||||
@@ -178,7 +178,10 @@ def update(ctx):
|
||||
"""Update CowAgent and restart."""
|
||||
root = get_project_root()
|
||||
|
||||
# 1. Git pull while service is still running
|
||||
# 1. Stop service first so git pull won't conflict with running code
|
||||
ctx.invoke(stop)
|
||||
|
||||
# 2. Git pull
|
||||
if os.path.isdir(os.path.join(root, ".git")):
|
||||
click.echo("Pulling latest code...")
|
||||
ret = subprocess.call(["git", "pull"], cwd=root)
|
||||
@@ -188,28 +191,61 @@ def update(ctx):
|
||||
else:
|
||||
click.echo("Not a git repository, skipping code update.")
|
||||
|
||||
# 2. Stop service
|
||||
ctx.invoke(stop)
|
||||
|
||||
# 3. Install dependencies
|
||||
python = sys.executable
|
||||
req_file = os.path.join(root, "requirements.txt")
|
||||
if os.path.exists(req_file):
|
||||
click.echo("Installing dependencies...")
|
||||
subprocess.call(
|
||||
[python, "-m", "pip", "install", "-r", "requirements.txt", "-q"],
|
||||
|
||||
if _IS_WIN:
|
||||
# On Windows, `cow.exe` (this process) locks the exe file, so
|
||||
# `pip install -e .` fails with WinError 5. Write a small .bat
|
||||
# helper that waits for cow.exe to exit, then installs & starts.
|
||||
bat = os.path.join(root, "_cow_update.bat")
|
||||
lines = [
|
||||
"@echo off",
|
||||
"chcp 65001 >nul",
|
||||
"echo Waiting for cow.exe to exit...",
|
||||
"timeout /t 3 /nobreak >nul",
|
||||
]
|
||||
if os.path.exists(req_file):
|
||||
lines.append(f'echo Installing dependencies...')
|
||||
lines.append(f'"{python}" -m pip install -r requirements.txt -q')
|
||||
lines += [
|
||||
"echo Reinstalling cow CLI...",
|
||||
f'"{python}" -m pip install -e . -q',
|
||||
"echo Starting CowAgent...",
|
||||
f'"{python}" -m cli.cli start --no-logs',
|
||||
"echo.",
|
||||
"echo Update complete. You can close this window.",
|
||||
"pause >nul",
|
||||
"del \"%~f0\"",
|
||||
]
|
||||
with open(bat, "w", encoding="utf-8") as f:
|
||||
f.write("\n".join(lines) + "\n")
|
||||
|
||||
subprocess.Popen(
|
||||
["cmd.exe", "/c", "start", "CowAgent Update", "/wait", bat],
|
||||
cwd=root,
|
||||
)
|
||||
click.echo(click.style(
|
||||
"✓ Update script launched. Please follow the new window for progress.",
|
||||
fg="green"))
|
||||
else:
|
||||
# 3. Install dependencies
|
||||
if os.path.exists(req_file):
|
||||
click.echo("Installing dependencies...")
|
||||
subprocess.call(
|
||||
[python, "-m", "pip", "install", "-r", "requirements.txt", "-q"],
|
||||
cwd=root,
|
||||
)
|
||||
click.echo("Reinstalling cow CLI...")
|
||||
subprocess.call(
|
||||
[python, "-m", "pip", "install", "-e", ".", "-q"],
|
||||
cwd=root,
|
||||
)
|
||||
click.echo("Reinstalling cow CLI...")
|
||||
subprocess.call(
|
||||
[python, "-m", "pip", "install", "-e", ".", "-q"],
|
||||
cwd=root,
|
||||
)
|
||||
|
||||
# 4. Start service and follow logs
|
||||
click.echo("")
|
||||
time.sleep(1)
|
||||
ctx.invoke(start, no_logs=False)
|
||||
# 4. Start service
|
||||
click.echo("")
|
||||
time.sleep(1)
|
||||
ctx.invoke(start, no_logs=False)
|
||||
|
||||
|
||||
@click.command()
|
||||
|
||||
@@ -47,13 +47,14 @@ CREDENTIAL_MAP = {
|
||||
|
||||
|
||||
class CloudClient(LinkAIClient):
|
||||
def __init__(self, api_key: str, channel, host: str = ""):
|
||||
super().__init__(api_key, host)
|
||||
def __init__(self, api_key: str, channel, host: str = "", port=None):
|
||||
super().__init__(api_key, host, port=port)
|
||||
self.channel = channel
|
||||
self.client_type = channel.channel_type
|
||||
self.channel_mgr = None
|
||||
self._skill_service = None
|
||||
self._memory_service = None
|
||||
self._knowledge_service = None
|
||||
self._chat_service = None
|
||||
|
||||
@property
|
||||
@@ -88,6 +89,21 @@ class CloudClient(LinkAIClient):
|
||||
logger.error(f"[CloudClient] Failed to init MemoryService: {e}")
|
||||
return self._memory_service
|
||||
|
||||
@property
|
||||
def knowledge_service(self):
|
||||
"""Lazy-init KnowledgeService."""
|
||||
if self._knowledge_service is None:
|
||||
try:
|
||||
from agent.knowledge.service import KnowledgeService
|
||||
from config import conf
|
||||
from common.utils import expand_path
|
||||
workspace_root = expand_path(conf().get("agent_workspace", "~/cow"))
|
||||
self._knowledge_service = KnowledgeService(workspace_root)
|
||||
logger.debug("[CloudClient] KnowledgeService initialised")
|
||||
except Exception as e:
|
||||
logger.error(f"[CloudClient] Failed to init KnowledgeService: {e}")
|
||||
return self._knowledge_service
|
||||
|
||||
@property
|
||||
def chat_service(self):
|
||||
"""Lazy-init ChatService (requires AgentBridge via Bridge singleton)."""
|
||||
@@ -468,6 +484,27 @@ class CloudClient(LinkAIClient):
|
||||
|
||||
return svc.dispatch(action, payload)
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# knowledge callback
|
||||
# ------------------------------------------------------------------
|
||||
def on_knowledge(self, data: dict) -> dict:
|
||||
"""
|
||||
Handle KNOWLEDGE messages from the cloud console.
|
||||
Delegates to KnowledgeService.dispatch for the actual operations.
|
||||
|
||||
:param data: message data with 'action', 'clientId', 'payload'
|
||||
:return: response dict
|
||||
"""
|
||||
action = data.get("action", "")
|
||||
payload = data.get("payload")
|
||||
logger.info(f"[CloudClient] on_knowledge: action={action}")
|
||||
|
||||
svc = self.knowledge_service
|
||||
if svc is None:
|
||||
return {"action": action, "code": 500, "message": "KnowledgeService not available", "payload": None}
|
||||
|
||||
return svc.dispatch(action, payload)
|
||||
|
||||
# ------------------------------------------------------------------
|
||||
# chat callback
|
||||
# ------------------------------------------------------------------
|
||||
@@ -733,7 +770,7 @@ def start(channel, channel_mgr=None):
|
||||
return
|
||||
|
||||
global chat_client
|
||||
chat_client = CloudClient(api_key=conf().get("linkai_api_key"), host=conf().get("cloud_host", ""), channel=channel)
|
||||
chat_client = CloudClient(api_key=conf().get("linkai_api_key"), host=conf().get("cloud_host", ""), port=conf().get("cloud_port"), channel=channel)
|
||||
chat_client.channel_mgr = channel_mgr
|
||||
chat_client.config = _build_config()
|
||||
chat_client.start()
|
||||
|
||||
@@ -7,8 +7,8 @@ XUNFEI = "xunfei"
|
||||
CHATGPTONAZURE = "chatGPTOnAzure"
|
||||
LINKAI = "linkai"
|
||||
CLAUDEAPI= "claudeAPI"
|
||||
QWEN = "qwen" # 旧版千问接入
|
||||
QWEN_DASHSCOPE = "dashscope" # 新版千问接入(百炼)
|
||||
QWEN = "qwen" # 千问 (兼容旧配置,实际走 DashscopeBot)
|
||||
QWEN_DASHSCOPE = "dashscope" # 千问 DashScope 接入
|
||||
GEMINI = "gemini"
|
||||
ZHIPU_AI = "zhipu"
|
||||
MOONSHOT = "moonshot"
|
||||
@@ -81,18 +81,19 @@ TTS_1_HD = "tts-1-hd"
|
||||
DEEPSEEK_CHAT = "deepseek-chat" # DeepSeek-V3对话模型
|
||||
DEEPSEEK_REASONER = "deepseek-reasoner" # DeepSeek-R1模型
|
||||
|
||||
# Qwen (通义千问 - 阿里云)
|
||||
QWEN = "qwen"
|
||||
# Qwen (通义千问 - 阿里云 DashScope)
|
||||
QWEN_TURBO = "qwen-turbo"
|
||||
QWEN_PLUS = "qwen-plus"
|
||||
QWEN_MAX = "qwen-max"
|
||||
QWEN_LONG = "qwen-long"
|
||||
QWEN3_MAX = "qwen3-max" # Qwen3 Max - Agent推荐模型
|
||||
QWEN35_PLUS = "qwen3.5-plus" # Qwen3.5 Plus - Omni model (MultiModalConversation)
|
||||
QWEN36_PLUS = "qwen3.6-plus" # Qwen3.6 Plus - Omni model (MultiModalConversation)
|
||||
QWQ_PLUS = "qwq-plus"
|
||||
|
||||
# MiniMax
|
||||
MINIMAX_M2_7 = "MiniMax-M2.7" # MiniMax M2.7 - Latest
|
||||
MINIMAX_M2_7_HIGHSPEED = "MiniMax-M2.7-highspeed" # MiniMax M2.7 highspeed
|
||||
MINIMAX_M2_5 = "MiniMax-M2.5" # MiniMax M2.5
|
||||
MINIMAX_M2_1 = "MiniMax-M2.1" # MiniMax M2.1
|
||||
MINIMAX_M2_1_LIGHTNING = "MiniMax-M2.1-lightning" # MiniMax M2.1 极速版
|
||||
@@ -172,10 +173,10 @@ MODEL_LIST = [
|
||||
DEEPSEEK_CHAT, DEEPSEEK_REASONER,
|
||||
|
||||
# Qwen
|
||||
QWEN, QWEN_TURBO, QWEN_PLUS, QWEN_MAX, QWEN_LONG, QWEN3_MAX, QWEN35_PLUS,
|
||||
QWEN36_PLUS, QWEN35_PLUS, QWEN3_MAX, QWEN_MAX, QWEN_PLUS, QWEN_TURBO, QWEN_LONG,
|
||||
|
||||
# MiniMax
|
||||
MiniMax, MINIMAX_M2_7, MINIMAX_M2_5, MINIMAX_M2_1, MINIMAX_M2_1_LIGHTNING, MINIMAX_M2, MINIMAX_ABAB6_5,
|
||||
MiniMax, MINIMAX_M2_7, MINIMAX_M2_7_HIGHSPEED, MINIMAX_M2_5, MINIMAX_M2_1, MINIMAX_M2_1_LIGHTNING, MINIMAX_M2, MINIMAX_ABAB6_5,
|
||||
|
||||
# GLM
|
||||
ZHIPU_AI, GLM_5_TURBO, GLM_5, GLM_4, GLM_4_PLUS, GLM_4_flash, GLM_4_LONG, GLM_4_ALLTOOLS,
|
||||
|
||||
@@ -29,5 +29,6 @@
|
||||
"agent": true,
|
||||
"agent_max_context_tokens": 40000,
|
||||
"agent_max_context_turns": 20,
|
||||
"agent_max_steps": 15
|
||||
"agent_max_steps": 15,
|
||||
"knowledge": true
|
||||
}
|
||||
|
||||
@@ -180,15 +180,16 @@ available_setting = {
|
||||
# 豆包(火山方舟) 平台配置
|
||||
"ark_api_key": "",
|
||||
"ark_base_url": "https://ark.cn-beijing.volces.com/api/v3",
|
||||
#魔搭社区 平台配置
|
||||
# 魔搭社区 平台配置
|
||||
"modelscope_api_key": "",
|
||||
"modelscope_base_url": "https://api-inference.modelscope.cn/v1/chat/completions",
|
||||
# LinkAI平台配置
|
||||
"use_linkai": False,
|
||||
"linkai_api_key": "",
|
||||
"linkai_app_code": "",
|
||||
"linkai_api_base": "https://api.link-ai.tech", # linkAI服务地址
|
||||
"linkai_api_base": "https://api.link-ai.tech",
|
||||
"cloud_host": "client.link-ai.tech",
|
||||
"cloud_port": None,
|
||||
"cloud_deployment_id": "",
|
||||
"minimax_api_key": "",
|
||||
"Minimax_group_id": "",
|
||||
@@ -199,6 +200,7 @@ available_setting = {
|
||||
"agent_max_context_tokens": 50000, # Agent模式下最大上下文tokens
|
||||
"agent_max_context_turns": 30, # Agent模式下最大上下文记忆轮次
|
||||
"agent_max_steps": 15, # Agent模式下单次运行最大决策步数
|
||||
"knowledge": True, # 是否开启知识库功能
|
||||
}
|
||||
|
||||
|
||||
|
||||
@@ -106,6 +106,45 @@ Session: 12 messages | 8 skills loaded
|
||||
/logs 50
|
||||
```
|
||||
|
||||
## knowledge
|
||||
|
||||
查看和管理个人知识库。默认显示知识库统计信息。
|
||||
|
||||
```text
|
||||
/knowledge
|
||||
```
|
||||
|
||||
输出示例:
|
||||
|
||||
```
|
||||
📚 知识库
|
||||
|
||||
- 状态:已开启
|
||||
- 页面数:12
|
||||
- 总大小:45.2 KB
|
||||
- 分类明细:
|
||||
- concepts/: 5 篇
|
||||
- entities/: 4 篇
|
||||
- sources/: 3 篇
|
||||
```
|
||||
|
||||
**查看目录结构:**
|
||||
|
||||
```text
|
||||
/knowledge list
|
||||
```
|
||||
|
||||
**开启 / 关闭知识库:**
|
||||
|
||||
```text
|
||||
/knowledge on
|
||||
/knowledge off
|
||||
```
|
||||
|
||||
<Note>
|
||||
终端 CLI 中 `cow knowledge` 和 `cow knowledge list` 可用,但 `on|off` 仅支持在对话中使用(需实时生效)。
|
||||
</Note>
|
||||
|
||||
## version
|
||||
|
||||
显示当前 CowAgent 版本号。
|
||||
@@ -40,6 +40,9 @@ Service:
|
||||
Skills:
|
||||
skill Manage skills (list / search / install / uninstall ...)
|
||||
|
||||
Knowledge:
|
||||
knowledge View knowledge base stats and structure
|
||||
|
||||
Others:
|
||||
help Show this help message
|
||||
version Show version
|
||||
@@ -55,6 +58,9 @@ Others:
|
||||
| `/status` | 查看服务状态和配置 |
|
||||
| `/config` | 查看或修改运行时配置 |
|
||||
| `/skill` | 管理技能(安装、卸载、启用、禁用等) |
|
||||
| `/knowledge` | 查看知识库统计信息 |
|
||||
| `/knowledge list` | 查看知识库目录结构 |
|
||||
| `/knowledge on\|off` | 开启或关闭知识库 |
|
||||
| `/context` | 查看当前会话上下文信息 |
|
||||
| `/context clear` | 清空当前会话上下文 |
|
||||
| `/logs` | 查看最近日志 |
|
||||
@@ -76,6 +82,7 @@ Others:
|
||||
| logs | ✓ | ✓ |
|
||||
| config | ✗ | ✓ |
|
||||
| context | — | ✓ |
|
||||
| knowledge (子命令) | ✓ | ✓ |
|
||||
| skill (子命令) | ✓ | ✓ |
|
||||
| start / stop / restart | ✓ | ✗ |
|
||||
| update | ✓ | ✗ |
|
||||
@@ -147,6 +147,17 @@
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"tab": "知识",
|
||||
"groups": [
|
||||
{
|
||||
"group": "知识库",
|
||||
"pages": [
|
||||
"knowledge/index"
|
||||
]
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"tab": "通道",
|
||||
"groups": [
|
||||
@@ -171,10 +182,10 @@
|
||||
{
|
||||
"group": "命令系统",
|
||||
"pages": [
|
||||
"commands/index",
|
||||
"commands/process",
|
||||
"commands/skill",
|
||||
"commands/general"
|
||||
"cli/index",
|
||||
"cli/process",
|
||||
"cli/skill",
|
||||
"cli/general"
|
||||
]
|
||||
}
|
||||
]
|
||||
@@ -308,6 +319,17 @@
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"tab": "Knowledge",
|
||||
"groups": [
|
||||
{
|
||||
"group": "Knowledge Base",
|
||||
"pages": [
|
||||
"en/knowledge/index"
|
||||
]
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"tab": "Channels",
|
||||
"groups": [
|
||||
@@ -327,15 +349,15 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"tab": "Commands",
|
||||
"tab": "CLI",
|
||||
"groups": [
|
||||
{
|
||||
"group": "Command System",
|
||||
"pages": [
|
||||
"en/commands/index",
|
||||
"en/commands/process",
|
||||
"en/commands/skill",
|
||||
"en/commands/chat"
|
||||
"en/cli/index",
|
||||
"en/cli/process",
|
||||
"en/cli/skill",
|
||||
"en/cli/chat"
|
||||
]
|
||||
}
|
||||
]
|
||||
@@ -469,6 +491,17 @@
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"tab": "ナレッジ",
|
||||
"groups": [
|
||||
{
|
||||
"group": "ナレッジベース",
|
||||
"pages": [
|
||||
"ja/knowledge/index"
|
||||
]
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"tab": "チャネル",
|
||||
"groups": [
|
||||
@@ -488,15 +521,15 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"tab": "コマンド",
|
||||
"tab": "CLI",
|
||||
"groups": [
|
||||
{
|
||||
"group": "コマンドシステム",
|
||||
"pages": [
|
||||
"ja/commands/index",
|
||||
"ja/commands/process",
|
||||
"ja/commands/skill",
|
||||
"ja/commands/general"
|
||||
"ja/cli/index",
|
||||
"ja/cli/process",
|
||||
"ja/cli/skill",
|
||||
"ja/cli/general"
|
||||
]
|
||||
}
|
||||
]
|
||||
|
||||
@@ -76,7 +76,7 @@ irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex
|
||||
|
||||
After running, the Web service starts by default. Access `http://localhost:9899/chat` to chat.
|
||||
|
||||
Script usage: [One-click Install](https://docs.cowagent.ai/en/guide/quick-start). After installation, you can also use `cow start`, `cow stop`, and other [CLI commands](https://docs.cowagent.ai/en/commands/index) to manage the service.
|
||||
Script usage: [One-click Install](https://docs.cowagent.ai/en/guide/quick-start). After installation, you can also use `cow start`, `cow stop`, and other [CLI commands](https://docs.cowagent.ai/en/cli/index) to manage the service.
|
||||
|
||||
### Manual Installation
|
||||
|
||||
@@ -100,7 +100,7 @@ pip3 install -r requirements-optional.txt # optional but recommended
|
||||
pip3 install -e .
|
||||
```
|
||||
|
||||
After installation, use `cow` commands to manage the service (start, stop, update, etc.) and skills. See [Command Docs](https://docs.cowagent.ai/en/commands/index).
|
||||
After installation, use `cow` commands to manage the service (start, stop, update, etc.) and skills. See [Command Docs](https://docs.cowagent.ai/en/cli/index).
|
||||
|
||||
**4. Install browser (optional)**
|
||||
|
||||
@@ -165,7 +165,7 @@ Supports mainstream model providers. Recommended models for Agent mode:
|
||||
| GLM | `glm-5-turbo` |
|
||||
| Kimi | `kimi-k2.5` |
|
||||
| Doubao | `doubao-seed-2-0-code-preview-260215` |
|
||||
| Qwen | `qwen3.5-plus` |
|
||||
| Qwen | `qwen3.6-plus` |
|
||||
| Claude | `claude-sonnet-4-6` |
|
||||
| Gemini | `gemini-3.1-pro-preview` |
|
||||
| OpenAI | `gpt-5.4` |
|
||||
|
||||
@@ -92,6 +92,31 @@ View recent service logs. Shows the last 20 lines by default, up to 50.
|
||||
/logs 50
|
||||
```
|
||||
|
||||
## knowledge
|
||||
|
||||
View and manage the personal knowledge base. Shows statistics by default.
|
||||
|
||||
```text
|
||||
/knowledge
|
||||
```
|
||||
|
||||
**View directory structure:**
|
||||
|
||||
```text
|
||||
/knowledge list
|
||||
```
|
||||
|
||||
**Enable / disable knowledge base:**
|
||||
|
||||
```text
|
||||
/knowledge on
|
||||
/knowledge off
|
||||
```
|
||||
|
||||
<Note>
|
||||
In the terminal CLI, `cow knowledge` and `cow knowledge list` are available, but `on|off` is only supported in chat (requires runtime effect).
|
||||
</Note>
|
||||
|
||||
## version
|
||||
|
||||
Show the current CowAgent version.
|
||||
@@ -40,6 +40,9 @@ Service:
|
||||
Skills:
|
||||
skill Manage skills (list / search / install / uninstall ...)
|
||||
|
||||
Knowledge:
|
||||
knowledge View knowledge base stats and structure
|
||||
|
||||
Others:
|
||||
help Show this help message
|
||||
version Show version
|
||||
@@ -55,6 +58,9 @@ In the Web console or any connected channel, type `/` to see command suggestions
|
||||
| `/status` | View service status and configuration |
|
||||
| `/config` | View or modify runtime configuration |
|
||||
| `/skill` | Manage skills (install, uninstall, enable, disable, etc.) |
|
||||
| `/knowledge` | View knowledge base statistics |
|
||||
| `/knowledge list` | View knowledge base directory structure |
|
||||
| `/knowledge on\|off` | Enable or disable knowledge base |
|
||||
| `/context` | View current session context info |
|
||||
| `/context clear` | Clear current session context |
|
||||
| `/logs` | View recent logs |
|
||||
@@ -74,6 +80,7 @@ In the Web console or any connected channel, type `/` to see command suggestions
|
||||
| logs | ✓ | ✓ |
|
||||
| config | ✗ | ✓ |
|
||||
| context | — | ✓ |
|
||||
| knowledge (subcommands) | ✓ | ✓ |
|
||||
| skill (subcommands) | ✓ | ✓ |
|
||||
| start / stop / restart | ✓ | ✗ |
|
||||
| update | ✓ | ✗ |
|
||||
@@ -47,7 +47,7 @@ After installation, use the `cow` command to manage the service:
|
||||
| `cow update` | Update code and restart |
|
||||
| `cow install-browser` | Install browser tool dependencies |
|
||||
|
||||
See the [Commands documentation](/en/commands/index) for more details.
|
||||
See the [Commands documentation](/en/cli/index) for more details.
|
||||
|
||||
<Note>
|
||||
If the `cow` command is not available, you can use `./run.sh <command>` (Linux/macOS) or `.\scripts\run.ps1 <command>` (Windows) as a fallback. Both are functionally equivalent.
|
||||
|
||||
@@ -11,14 +11,16 @@ CowAgent's architecture consists of the following core modules:
|
||||
|
||||
<img src="https://cdn.link-ai.tech/doc/68ef7b212c6f791e0e74314b912149f9-sz_5847990.png" alt="CowAgent Architecture" />
|
||||
|
||||
### Core Modules
|
||||
|
||||
| Module | Description |
|
||||
| --- | --- |
|
||||
| **Channels** | Message channel layer for receiving and sending messages. Supports Web, Feishu, DingTalk, WeCom, WeChat Official Account, and more |
|
||||
| **Agent Core** | Agent engine including task planning, memory system, and skills engine |
|
||||
| **Tools** | Tool layer for Agent to access OS resources. 10+ built-in tools |
|
||||
| **Models** | Model layer with unified access to mainstream LLMs |
|
||||
| **Plan** | Understands user intent, decomposes complex tasks into multi-step plans, and iteratively invokes tools until the goal is achieved |
|
||||
| **Memory** | Automatically persists important information as core memory and daily memory, with hybrid keyword and vector retrieval for cross-session context continuity |
|
||||
| **Knowledge** | Organizes structured knowledge by topic. The Agent autonomously distills valuable information into Markdown pages, maintaining indexes and cross-references to build a growing knowledge network |
|
||||
| **Tools** | Core capability for Agent to access OS resources. 10+ built-in tools including file read/write, terminal, browser, scheduler, memory search, web search, and more |
|
||||
| **Skills** | Loads and manages Skills. Supports one-click installation from Skill Hub, GitHub, and more, or custom skill creation through conversation |
|
||||
| **Models** | Model layer with unified access to OpenAI, Claude, Gemini, DeepSeek, MiniMax, GLM, Qwen, and other mainstream LLMs |
|
||||
| **Channels** | Message channel layer for receiving and sending messages. Supports Web console, WeChat, Feishu, DingTalk, WeCom, WeChat Official Account, and more with a unified protocol |
|
||||
| **CLI** | Command-line system providing terminal commands (`cow`) and chat commands (`/`) for process management, skill installation, configuration, knowledge base management, and more |
|
||||
|
||||
## Agent Mode Workflow
|
||||
|
||||
@@ -28,7 +30,7 @@ When Agent mode is enabled, CowAgent runs as an autonomous agent with the follow
|
||||
2. **Understand Intent** — Analyze task requirements and context
|
||||
3. **Plan Task** — Break complex tasks into multiple steps
|
||||
4. **Invoke Tools** — Select and execute appropriate tools for each step
|
||||
5. **Update Memory** — Store important information in long-term memory
|
||||
5. **Update Memory & Knowledge** — Store important information in long-term memory and organize structured knowledge into the knowledge base
|
||||
6. **Return Result** — Send execution results back to the user
|
||||
|
||||
## Workspace Directory Structure
|
||||
@@ -39,9 +41,12 @@ The Agent workspace is located at `~/cow` by default and stores system prompts,
|
||||
~/cow/
|
||||
├── system.md # Agent system prompt
|
||||
├── user.md # User profile
|
||||
├── MEMORY.md # Core memory
|
||||
├── memory/ # Long-term memory storage
|
||||
│ ├── core.md # Core memory
|
||||
│ └── daily/ # Daily memory
|
||||
│ └── YYYY-MM-DD.md # Daily memory
|
||||
├── knowledge/ # Personal knowledge base
|
||||
│ ├── index.md # Knowledge index
|
||||
│ └── <category>/ # Topic-based pages
|
||||
└── skills/ # Custom skills
|
||||
├── skill-1/
|
||||
└── skill-2/
|
||||
@@ -75,3 +80,4 @@ Configure Agent mode parameters in `config.json`:
|
||||
| `agent_max_context_tokens` | Max context tokens | `40000` |
|
||||
| `agent_max_context_turns` | Max context turns | `30` |
|
||||
| `agent_max_steps` | Max decision steps per task | `15` |
|
||||
| `knowledge` | Enable personal knowledge base | `true` |
|
||||
|
||||
@@ -15,13 +15,26 @@ In subsequent long-term conversations, the Agent intelligently stores or retriev
|
||||
<img src="https://cdn.link-ai.tech/doc/20260203000455.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
## 2. Task Planning and Tool Use
|
||||
## 2. Personal Knowledge Base
|
||||
|
||||
> The knowledge base system enables the Agent to continuously accumulate and organize structured knowledge. Unlike memory which records along a timeline, the knowledge base is organized by topics, transforming articles, conversation insights, and learning materials into interconnected Markdown pages that form a continuously growing knowledge network.
|
||||
|
||||
The Agent automatically organizes valuable information from conversations into knowledge pages, maintaining cross-references and indexes. The Web console provides document browsing and knowledge graph visualization. Knowledge is stored in `~/cow/knowledge/` within the workspace.
|
||||
|
||||
- **Auto-organization**: The Agent autonomously extracts and organizes structured knowledge during conversations, maintaining indexes and cross-references
|
||||
- **Knowledge graph**: Automatically builds a knowledge graph from cross-references between pages, with interactive graph visualization in the Web console
|
||||
- **Chat integration**: Knowledge document links referenced in Agent replies can be clicked directly in the Web console for viewing
|
||||
- **CLI management**: Use `/knowledge` commands to view stats, browse directory, and toggle the feature with `/knowledge on|off`
|
||||
|
||||
See [Personal Knowledge Base](/en/knowledge) for details.
|
||||
|
||||
## 3. Task Planning and Tool Use
|
||||
|
||||
Tools are the core of how the Agent accesses operating system resources. The Agent intelligently selects and invokes tools based on task requirements, performing file read/write, command execution, scheduled tasks, and more. Built-in tools are implemented in the project's `agent/tools/` directory.
|
||||
|
||||
**Key tools:** file read/write/edit, Bash terminal, browser, file send, scheduler, memory search, web search, environment config, and more.
|
||||
|
||||
### 2.1 Terminal and File Access
|
||||
### 3.1 Terminal and File Access
|
||||
|
||||
Access to the OS terminal and file system is the most fundamental and core capability. Many other tools and skills build on top of this. Users can interact with the Agent from a mobile device to operate resources on their personal computer or server:
|
||||
|
||||
@@ -29,7 +42,7 @@ Access to the OS terminal and file system is the most fundamental and core capab
|
||||
<img src="https://cdn.link-ai.tech/doc/20260202181130.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
### 2.2 Programming Capability
|
||||
### 3.2 Programming Capability
|
||||
|
||||
Combining programming and system access, the Agent can execute the complete **Vibecoding workflow** — from information search, asset generation, coding, testing, deployment, Nginx configuration, to publishing — all triggered by a single command from your phone:
|
||||
|
||||
@@ -37,7 +50,7 @@ Combining programming and system access, the Agent can execute the complete **Vi
|
||||
<img src="https://cdn.link-ai.tech/doc/20260203121008.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
### 2.3 Scheduled Tasks
|
||||
### 3.3 Scheduled Tasks
|
||||
|
||||
The `scheduler` tool enables dynamic scheduled tasks, supporting **one-time tasks, fixed intervals, and Cron expressions**. Tasks can be triggered as either a **fixed message send** or an **Agent dynamic task** execution:
|
||||
|
||||
@@ -45,7 +58,7 @@ The `scheduler` tool enables dynamic scheduled tasks, supporting **one-time task
|
||||
<img src="https://cdn.link-ai.tech/doc/20260202195402.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
### 2.4 Browser
|
||||
### 3.4 Browser
|
||||
|
||||
The built-in `browser` tool allows the Agent to control a Chromium browser to visit web pages, fill forms, click elements, and take screenshots, with support for dynamic JS-rendered pages. Run `cow install-browser` to install with one command, automatically adapting to server (headless) and desktop environments:
|
||||
|
||||
@@ -53,7 +66,7 @@ The built-in `browser` tool allows the Agent to control a Chromium browser to vi
|
||||
<img src="https://cdn.link-ai.tech/doc/20260401110103.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
### 2.5 Environment Variable Management
|
||||
### 3.5 Environment Variable Management
|
||||
|
||||
Secrets required by skills are stored in an environment variable file, managed by the `env_config` tool. You can update secrets through conversation, with built-in security protection and desensitization:
|
||||
|
||||
@@ -61,7 +74,7 @@ Secrets required by skills are stored in an environment variable file, managed b
|
||||
<img src="https://cdn.link-ai.tech/doc/20260202234939.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
## 3. Skills System
|
||||
## 4. Skills System
|
||||
|
||||
The Skills system provides infinite extensibility for the Agent. Each Skill consists of a description file, execution scripts (optional), and resources (optional), describing how to complete specific types of tasks. Skills allow the Agent to follow instructions for complex workflows, invoke tools, or integrate third-party systems.
|
||||
|
||||
@@ -71,7 +84,7 @@ The Skills system provides infinite extensibility for the Agent. Each Skill cons
|
||||
|
||||
Install skills: `/skill install <name>` or `cow skill install <name>`, supporting Skill Hub, GitHub, ClawHub, URL, and more.
|
||||
|
||||
### 3.1 Creating Skills
|
||||
### 4.1 Creating Skills
|
||||
|
||||
The `skill-creator` skill enables rapid skill creation through conversation. You can ask the Agent to codify a workflow as a skill, or send any API documentation and examples for the Agent to complete the integration directly:
|
||||
|
||||
@@ -79,7 +92,7 @@ The `skill-creator` skill enables rapid skill creation through conversation. You
|
||||
<img src="https://cdn.link-ai.tech/doc/20260202202247.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
### 3.2 Web Search and Image Recognition
|
||||
### 4.2 Web Search and Image Recognition
|
||||
|
||||
- **Web search:** Built-in `web_search` tool, supports multiple search engines. Configure `BOCHA_API_KEY` or `LINKAI_API_KEY` to enable.
|
||||
- **Image recognition:** Built-in `openai-image-vision` skill, supports `gpt-4.1-mini`, `gpt-4.1`, and other models. Requires `OPENAI_API_KEY`.
|
||||
@@ -88,7 +101,7 @@ The `skill-creator` skill enables rapid skill creation through conversation. You
|
||||
<img src="https://cdn.link-ai.tech/doc/20260202213219.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
### 3.3 Skill Hub
|
||||
### 4.3 Skill Hub
|
||||
|
||||
Visit [skills.cowagent.ai](https://skills.cowagent.ai/) to browse all available skills, or use commands in conversation:
|
||||
|
||||
@@ -102,7 +115,7 @@ Also supports installing skills from GitHub, ClawHub, LinkAI, and other third-pa
|
||||
|
||||
<img src="https://cdn.link-ai.tech/doc/20260401110103.png" width="750" />
|
||||
|
||||
## 4. CLI Command System
|
||||
## 5. CLI Command System
|
||||
|
||||
CowAgent provides two command interaction methods, covering service management, skill installation, configuration, and more:
|
||||
|
||||
@@ -117,4 +130,4 @@ cow skill install pptx # Install a skill
|
||||
cow install-browser # Install browser tool
|
||||
```
|
||||
|
||||
See [Command Overview](https://docs.cowagent.ai/en/commands) for details.
|
||||
See [Command Overview](https://docs.cowagent.ai/en/cli) for details.
|
||||
|
||||
@@ -22,6 +22,9 @@ CowAgent can proactively think and plan tasks, operate computers and external re
|
||||
<Card title="Long-term Memory" icon="database" href="/en/memory">
|
||||
Automatically persists conversation memory to local files and databases, including core memory and daily memory, with keyword and vector retrieval support.
|
||||
</Card>
|
||||
<Card title="Knowledge Base" icon="book" href="/en/knowledge">
|
||||
Automatically organizes structured knowledge with knowledge graph visualization, building a continuously growing knowledge network through cross-references.
|
||||
</Card>
|
||||
<Card title="Skills System" icon="puzzle-piece" href="/en/skills/index">
|
||||
Implements a Skills creation and execution engine with built-in skills, and supports custom Skills development through natural language conversation.
|
||||
</Card>
|
||||
@@ -31,7 +34,7 @@ CowAgent can proactively think and plan tasks, operate computers and external re
|
||||
<Card title="Tool System" icon="wrench" href="/en/tools/index">
|
||||
Built-in tools for file I/O, terminal execution, browser automation, scheduled tasks, messaging, and more. The Agent autonomously invokes tools to accomplish complex tasks.
|
||||
</Card>
|
||||
<Card title="Command System" icon="terminal" href="/en/commands/index">
|
||||
<Card title="Command System" icon="terminal" href="/en/cli/index">
|
||||
Provides terminal CLI and in-chat commands for process management, skill installation, configuration, context inspection, and other common operations.
|
||||
</Card>
|
||||
<Card title="Multiple Model Support" icon="microchip" href="/en/models/index">
|
||||
|
||||
77
docs/en/knowledge/index.mdx
Normal file
77
docs/en/knowledge/index.mdx
Normal file
@@ -0,0 +1,77 @@
|
||||
---
|
||||
title: Personal Knowledge Base
|
||||
description: CowAgent personal knowledge base — structured knowledge accumulation, automatic organization, and knowledge graph
|
||||
---
|
||||
|
||||
The personal knowledge base is the Agent's long-term structured knowledge store, saved in the `knowledge/` directory within the workspace. Unlike memory, which is organized by timeline, the knowledge base organizes content by topic — articles, conversation insights, and learning materials are structured into interlinked Markdown pages, forming a continuously growing knowledge network.
|
||||
|
||||
## Core Concepts
|
||||
|
||||
### Knowledge vs Memory
|
||||
|
||||
| Dimension | Knowledge Base (knowledge/) | Long-term Memory (memory/) |
|
||||
| --- | --- | --- |
|
||||
| Organization | By topic, interlinked | By timeline, dated files |
|
||||
| Writing | Agent actively structures content | Auto-summarized on context trimming |
|
||||
| Content | Refined, structured knowledge | Raw conversation summaries |
|
||||
| Use cases | Study notes, tech docs, project knowledge | Conversation history, event records |
|
||||
|
||||
### Directory Structure
|
||||
|
||||
```
|
||||
~/cow/knowledge/
|
||||
├── index.md # Knowledge index, entry point for all pages
|
||||
├── log.md # Change log, records each write
|
||||
├── concepts/ # Conceptual knowledge
|
||||
│ └── machine-learning.md
|
||||
├── entities/ # Entity knowledge (people, orgs, tools)
|
||||
│ └── openai.md
|
||||
└── sources/ # Source knowledge (articles, papers)
|
||||
└── llm-wiki.md
|
||||
```
|
||||
|
||||
The directory structure is flexible — the Agent automatically creates appropriate category directories based on actual content. Users can also customize the organization.
|
||||
|
||||
## Automatic Organization
|
||||
|
||||
Knowledge writing is an autonomous Agent behavior, triggered in these scenarios:
|
||||
|
||||
- **User shares an article or document** — The Agent automatically extracts key information and creates a structured knowledge page
|
||||
- **Conversation produces valuable conclusions** — The Agent organizes insights into knowledge pages and links them to existing knowledge
|
||||
- **User explicitly requests organization** — Users can guide the Agent to organize and update knowledge through conversation
|
||||
|
||||
Each knowledge page includes cross-reference links to related pages, gradually building a knowledge graph.
|
||||
|
||||
## Knowledge Retrieval
|
||||
|
||||
The Agent can retrieve knowledge during conversation through:
|
||||
|
||||
- **Index lookup** — Quickly locate relevant pages via `knowledge/index.md`
|
||||
- **Semantic search** — Search knowledge content via the `memory_search` tool
|
||||
- **Direct read** — Read specific knowledge files via the `memory_get` tool
|
||||
|
||||
## Web Console
|
||||
|
||||
The web console provides a dedicated "Knowledge" module with:
|
||||
|
||||
- **Document browsing** — Tree-style directory structure, searchable and collapsible, click to view content
|
||||
- **Knowledge graph** — D3.js force-directed graph visualizing relationships between knowledge pages
|
||||
- **Chat integration** — Knowledge document links referenced in Agent replies are clickable for direct navigation
|
||||
|
||||
## CLI Commands
|
||||
|
||||
Manage the knowledge base with the `/knowledge` command:
|
||||
|
||||
| Command | Description |
|
||||
| --- | --- |
|
||||
| `/knowledge` | Show knowledge base statistics |
|
||||
| `/knowledge list` | Display file directory as a tree |
|
||||
| `/knowledge on` | Enable the knowledge base feature |
|
||||
| `/knowledge off` | Disable the knowledge base feature |
|
||||
|
||||
## Configuration
|
||||
|
||||
| Parameter | Description | Default |
|
||||
| --- | --- | --- |
|
||||
| `knowledge` | Whether to enable the personal knowledge base | `true` |
|
||||
| `agent_workspace` | Workspace path; knowledge is stored under the `knowledge/` subdirectory | `~/cow` |
|
||||
@@ -6,7 +6,7 @@ description: Supported models and recommended choices for CowAgent
|
||||
CowAgent supports mainstream LLMs from domestic and international providers. Model interfaces are implemented in the project's `models/` directory.
|
||||
|
||||
<Note>
|
||||
For Agent mode, the following models are recommended based on quality and cost: MiniMax-M2.7, glm-5-turbo, kimi-k2.5, qwen3.5-plus, claude-sonnet-4-6, gemini-3.1-pro-preview
|
||||
For Agent mode, the following models are recommended based on quality and cost: MiniMax-M2.7, glm-5-turbo, kimi-k2.5, qwen3.6-plus, claude-sonnet-4-6, gemini-3.1-pro-preview
|
||||
</Note>
|
||||
|
||||
## Configuration
|
||||
@@ -25,7 +25,7 @@ You can also use the [LinkAI](https://link-ai.tech) platform interface to flexib
|
||||
glm-5-turbo, glm-5 and other series models
|
||||
</Card>
|
||||
<Card title="Qwen (Tongyi Qianwen)" href="/en/models/qwen">
|
||||
qwen3.5-plus, qwen3-max and more
|
||||
qwen3.6-plus, qwen3-max and more
|
||||
</Card>
|
||||
<Card title="Kimi" href="/en/models/kimi">
|
||||
kimi-k2.5, kimi-k2 and more
|
||||
|
||||
@@ -5,14 +5,14 @@ description: Tongyi Qianwen model configuration
|
||||
|
||||
```json
|
||||
{
|
||||
"model": "qwen3.5-plus",
|
||||
"model": "qwen3.6-plus",
|
||||
"dashscope_api_key": "YOUR_API_KEY"
|
||||
}
|
||||
```
|
||||
|
||||
| Parameter | Description |
|
||||
| --- | --- |
|
||||
| `model` | Options include `qwen3.5-plus`, `qwen3-max`, `qwen-max`, `qwen-plus`, `qwen-turbo`, `qwq-plus`, etc. |
|
||||
| `model` | Options include `qwen3.6-plus`, `qwen3.5-plus`, `qwen3-max`, `qwen-max`, `qwen-plus`, `qwen-turbo`, `qwq-plus`, etc. |
|
||||
| `dashscope_api_key` | Create at [Bailian Console](https://bailian.console.aliyun.com/?tab=model#/api-key). See [official docs](https://bailian.console.aliyun.com/?tab=api#/api) |
|
||||
|
||||
OpenAI-compatible configuration is also supported:
|
||||
@@ -20,7 +20,7 @@ OpenAI-compatible configuration is also supported:
|
||||
```json
|
||||
{
|
||||
"bot_type": "openai",
|
||||
"model": "qwen3.5-plus",
|
||||
"model": "qwen3.6-plus",
|
||||
"open_ai_api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
|
||||
"open_ai_api_key": "YOUR_API_KEY"
|
||||
}
|
||||
|
||||
@@ -12,7 +12,7 @@ New CLI command system for managing CowAgent from terminal and chat:
|
||||
- **Web console**: Type `/` in the input box to open a slash command menu, with arrow-key input history
|
||||
- **Windows support**: New PowerShell script `scripts/run.ps1` with `cow` command support
|
||||
|
||||
Docs: [Command Overview](https://docs.cowagent.ai/en/commands)
|
||||
Docs: [Command Overview](https://docs.cowagent.ai/en/cli)
|
||||
|
||||
<img src="https://cdn.link-ai.tech/doc/20260401114549.png" width="750" />
|
||||
|
||||
|
||||
@@ -17,7 +17,7 @@ CowAgent offers multiple ways to acquire skills:
|
||||
- **URL** — Install from zip archives or SKILL.md links
|
||||
- **Conversational creation** — Let the Agent create skills through natural language conversation
|
||||
|
||||
See [Install Skills](/en/skills/install) and [Skill Management Commands](/en/commands/skill) for details. You can also [create skills](/en/skills/create) through conversation.
|
||||
See [Install Skills](/en/skills/install) and [Skill Management Commands](/en/cli/skill) for details. You can also [create skills](/en/skills/create) through conversation.
|
||||
|
||||
## Skill Loading Priority
|
||||
|
||||
|
||||
@@ -49,5 +49,5 @@ Supports zip archives and SKILL.md file links:
|
||||
```
|
||||
|
||||
<Tip>
|
||||
All commands above work in the terminal by replacing `/skill` with `cow skill`. See [Skill Management Commands](/en/commands/skill) for full documentation.
|
||||
All commands above work in the terminal by replacing `/skill` with `cow skill`. See [Skill Management Commands](/en/cli/skill) for full documentation.
|
||||
</Tip>
|
||||
|
||||
@@ -1,9 +1,11 @@
|
||||
---
|
||||
title: memory - Memory
|
||||
description: Search and read long-term memory
|
||||
title: memory - Memory & Knowledge
|
||||
description: Search and read long-term memory and knowledge base files
|
||||
---
|
||||
|
||||
The memory tool contains two sub-tools: `memory_search` (search memory) and `memory_get` (read memory files).
|
||||
The memory tool contains two sub-tools: `memory_search` (search memory) and `memory_get` (read memory or knowledge files).
|
||||
|
||||
When the [knowledge base](/en/knowledge) feature is enabled, both tools also support accessing files under the `knowledge/` directory.
|
||||
|
||||
## Dependencies
|
||||
|
||||
@@ -11,7 +13,7 @@ No extra dependencies, available by default. Managed by the Agent Core memory sy
|
||||
|
||||
## memory_search
|
||||
|
||||
Search historical memory with hybrid keyword and vector retrieval.
|
||||
Search historical memory and knowledge base content with hybrid keyword and vector retrieval.
|
||||
|
||||
| Parameter | Type | Required | Description |
|
||||
| --- | --- | --- | --- |
|
||||
@@ -19,11 +21,11 @@ Search historical memory with hybrid keyword and vector retrieval.
|
||||
|
||||
## memory_get
|
||||
|
||||
Read the content of a specific memory file.
|
||||
Read the content of a specific memory or knowledge file.
|
||||
|
||||
| Parameter | Type | Required | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `path` | string | Yes | Relative path to memory file (e.g. `MEMORY.md`, `memory/2026-01-01.md`) |
|
||||
| `path` | string | Yes | Relative path to the file (e.g. `MEMORY.md`, `memory/2026-01-01.md`, `knowledge/concepts/rag.md`) |
|
||||
| `start_line` | integer | No | Start line number |
|
||||
| `end_line` | integer | No | End line number |
|
||||
|
||||
@@ -34,3 +36,8 @@ The Agent automatically invokes memory tools in these scenarios:
|
||||
- When the user shares important information → stores to memory
|
||||
- When historical context is needed → searches relevant memory
|
||||
- When conversation reaches a certain length → extracts summary for storage
|
||||
- When discussing domain knowledge → retrieves relevant pages from the knowledge base
|
||||
|
||||
<Note>
|
||||
When `knowledge` is set to `false` in config, the tool descriptions and search scope automatically adjust to include only memory files.
|
||||
</Note>
|
||||
|
||||
72
docs/en/tools/vision.mdx
Normal file
72
docs/en/tools/vision.mdx
Normal file
@@ -0,0 +1,72 @@
|
||||
---
|
||||
title: vision - Image Analysis
|
||||
description: Analyze image content (recognition, description, OCR, etc.)
|
||||
---
|
||||
|
||||
Analyze local images or image URLs using Vision API. Supports content description, text extraction (OCR), object recognition, and more.
|
||||
|
||||
## Model Selection
|
||||
|
||||
The vision tool uses a multi-level auto-selection strategy with automatic fallback — no manual configuration required:
|
||||
|
||||
1. **Main model** — uses the currently configured main model for image recognition (zero extra cost)
|
||||
2. **Other configured models** — auto-discovers other models with configured API keys as alternatives
|
||||
3. **OpenAI** — uses `open_ai_api_key` to call gpt-4.1-mini
|
||||
4. **LinkAI** — uses `linkai_api_key` to call LinkAI vision service
|
||||
|
||||
When `use_linkai=true`, LinkAI is promoted to the highest priority.
|
||||
|
||||
If the current provider fails, the tool automatically tries the next one until it succeeds or all fail.
|
||||
|
||||
### Supported Models
|
||||
|
||||
| Vendor | Vision Model | Notes |
|
||||
| --- | --- | --- |
|
||||
| OpenAI / Compatible | Main model | All OpenAI-compatible multimodal models |
|
||||
| Qwen (DashScope) | Main model | Via MultiModalConversation API |
|
||||
| Claude | Main model | Anthropic native image format |
|
||||
| Gemini | Main model | inlineData format |
|
||||
| Doubao | Main model | doubao-seed-2-0 series natively supported |
|
||||
| Kimi (Moonshot) | Main model | kimi-k2.5 natively supported |
|
||||
| ZhipuAI | glm-5v-turbo | Always uses dedicated vision model |
|
||||
| MiniMax | MiniMax-Text-01 | Always uses dedicated vision model |
|
||||
|
||||
<Note>
|
||||
ZhipuAI and MiniMax text models do not support image understanding, so their dedicated vision models are always used automatically.
|
||||
</Note>
|
||||
|
||||
## Parameters
|
||||
|
||||
| Parameter | Type | Required | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `image` | string | Yes | Local file path or HTTP(S) image URL |
|
||||
| `question` | string | Yes | Question to ask about the image |
|
||||
|
||||
Supported image formats: jpg, jpeg, png, gif, webp
|
||||
|
||||
## Custom Configuration
|
||||
|
||||
To specify a particular model for the vision tool, add to `config.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"tool": {
|
||||
"vision": {
|
||||
"model": "gpt-4o"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
In most cases no configuration is needed. The tool works automatically as long as the main model supports multimodal input or any vision-capable API key is configured.
|
||||
|
||||
## Use Cases
|
||||
|
||||
- Describe image content
|
||||
- Extract text from images (OCR)
|
||||
- Identify objects, colors, scenes
|
||||
- Analyze screenshots and scanned documents
|
||||
|
||||
<Note>
|
||||
Images larger than 1MB are automatically compressed (max edge 1536px). All images (including remote URLs) are converted to base64 for transmission to ensure compatibility with all model backends.
|
||||
</Note>
|
||||
@@ -47,7 +47,7 @@ description: 使用脚本一键安装和管理 CowAgent
|
||||
| `cow update` | 更新代码并重启 |
|
||||
| `cow install-browser` | 安装浏览器工具依赖 |
|
||||
|
||||
更多命令和用法参考 [命令文档](/commands/index)。
|
||||
更多命令和用法参考 [命令文档](/cli/index)。
|
||||
|
||||
<Note>
|
||||
如果 `cow` 命令不可用,也可以使用 `./run.sh <命令>`(Linux/macOS)或 `.\scripts\run.ps1 <命令>`(Windows)作为替代,功能等效。
|
||||
|
||||
@@ -36,7 +36,7 @@ pip3 install -e .
|
||||
更新完成后重启服务:
|
||||
|
||||
```bash
|
||||
# 使用 Cow CLI
|
||||
# 使用 Cow CLI (推荐)
|
||||
cow restart
|
||||
|
||||
# 或使用 run.sh
|
||||
|
||||
@@ -11,25 +11,27 @@ CowAgent 的整体架构由以下核心模块组成:
|
||||
|
||||
<img src="https://cdn.link-ai.tech/doc/68ef7b212c6f791e0e74314b912149f9-sz_5847990.png" alt="CowAgent Architecture" />
|
||||
|
||||
### 核心模块说明
|
||||
|
||||
| 模块 | 说明 |
|
||||
| --- | --- |
|
||||
| **Channels** | 消息通道层,负责接收和发送消息,支持 Web、飞书、钉钉、企微、公众号等 |
|
||||
| **Agent Core** | 智能体核心引擎,包括任务规划、记忆系统和技能引擎 |
|
||||
| **Tools** | 工具层,Agent 通过工具访问操作系统资源,内置 10+ 种工具 |
|
||||
| **Models** | 模型层,支持国内外主流大语言模型的统一接入 |
|
||||
| **Plan** | 理解用户意图,将复杂任务分解为多步骤计划,循环调用工具直到完成目标 |
|
||||
| **Memory** | 自动将重要信息持久化为核心记忆和日级记忆,支持关键词和向量混合检索,跨会话保持上下文连续性 |
|
||||
| **Knowledge** | 以主题维度组织结构化知识,Agent 自主整理有价值信息为 Markdown 页面,维护索引和交叉引用,构建持续增长的知识网络 |
|
||||
| **Tools** | Agent 访问操作系统资源的核心能力,内置文件读写、终端执行、浏览器操作、定时调度、记忆检索、联网搜索等 10+ 种工具 |
|
||||
| **Skills** | 加载和管理 Skills,支持从 Skill Hub、GitHub 等一键安装,或通过对话创建自定义技能 |
|
||||
| **Models** | 模型层,统一接入 OpenAI、Claude、Gemini、DeepSeek、MiniMax、GLM、Qwen 等国内外主流大语言模型 |
|
||||
| **Channels** | 消息通道层,负责接收和发送消息,支持 Web 控制台、微信、飞书、钉钉、企微、公众号等,统一消息协议 |
|
||||
| **CLI** | 命令行系统,提供终端命令(`cow`)和对话命令(`/`),支持进程管理、技能安装、配置修改、知识库管理等操作 |
|
||||
|
||||
## Agent 模式
|
||||
|
||||
启用 Agent 模式后,CowAgent 会以自主智能体的方式运行,核心工作流如下:
|
||||
|
||||
1. **接收消息** - 通过通道接收用户输入
|
||||
2. **理解意图** - 分析任务需求和上下文
|
||||
3. **规划任务** - 将复杂任务分解为多个步骤
|
||||
4. **调用工具** - 选择合适的工具执行每个步骤
|
||||
5. **记忆更新** - 将重要信息存入长期记忆
|
||||
6. **返回结果** - 将执行结果发送回用户
|
||||
1. **接收消息** — 通过通道接收用户输入
|
||||
2. **理解意图** — 分析任务需求和上下文
|
||||
3. **规划任务** — 将复杂任务分解为多个步骤
|
||||
4. **调用工具** — 选择合适的工具执行每个步骤
|
||||
5. **记忆与知识更新** — 将重要信息存入长期记忆,将结构化知识整理至知识库
|
||||
6. **返回结果** — 将执行结果发送回用户
|
||||
|
||||
## 工作空间
|
||||
|
||||
@@ -37,11 +39,14 @@ Agent 的工作空间默认位于 `~/cow` 目录,用于存储系统提示词
|
||||
|
||||
```
|
||||
~/cow/
|
||||
├── system.md # Agent system prompt
|
||||
├── user.md # User profile
|
||||
├── SYSTEM.md # Agent system prompt
|
||||
├── USER.md # User profile
|
||||
├── MEMORY.md # Core memory
|
||||
├── memory/ # Long-term memory storage
|
||||
│ ├── core.md # Core memory
|
||||
│ └── daily/ # Daily memory
|
||||
│ └── YYYY-MM-DD.md # Daily memory
|
||||
├── knowledge/ # Personal knowledge base
|
||||
│ ├── index.md # Knowledge index
|
||||
│ └── <category>/ # Topic-based pages
|
||||
└── skills/ # Custom skills
|
||||
├── skill-1/
|
||||
└── skill-2/
|
||||
@@ -75,3 +80,4 @@ Agent 的工作空间默认位于 `~/cow` 目录,用于存储系统提示词
|
||||
| `agent_max_context_tokens` | 最大上下文 token 数 | `40000` |
|
||||
| `agent_max_context_turns` | 最大上下文记忆轮次 | `30` |
|
||||
| `agent_max_steps` | 单次任务最大决策步数 | `15` |
|
||||
| `knowledge` | 是否启用个人知识库 | `true` |
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
---
|
||||
title: 功能介绍
|
||||
description: CowAgent 长期记忆、任务规划、技能系统、CLI 命令、浏览器工具详细说明
|
||||
description: CowAgent 长期记忆、个人知识库、任务规划、技能系统、CLI 命令、浏览器工具详细说明
|
||||
---
|
||||
|
||||
## 1. 长期记忆
|
||||
@@ -15,13 +15,26 @@ description: CowAgent 长期记忆、任务规划、技能系统、CLI 命令、
|
||||
<img src="https://cdn.link-ai.tech/doc/20260203000455.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
## 2. 任务规划和工具调用
|
||||
## 2. 个人知识库
|
||||
|
||||
> 知识库系统让 Agent 能够持续积累和组织结构化知识。与按时间线记录的记忆不同,知识库以主题为维度,将文章、对话洞察、学习材料等整理为互相关联的 Markdown 页面,形成持续增长的知识网络。
|
||||
|
||||
Agent 会在对话中自动将有价值的信息整理为知识页面,维护交叉引用和索引,通过 Web 控制台可浏览文档和查看知识图谱。知识库存储在工作空间的 `~/cow/knowledge/` 目录下。
|
||||
|
||||
- **自动整理**:Agent 在对话中自主提取和整理结构化知识,维护索引和交叉引用
|
||||
- **知识图谱**:基于页面间的交叉引用自动构建知识图谱,Web 控制台提供可视化关系图浏览
|
||||
- **对话联动**:Agent 回复中引用的知识文档链接可在 Web 控制台中直接点击跳转查看
|
||||
- **CLI 管理**:通过 `/knowledge` 命令查看统计、浏览目录,通过 `/knowledge on|off` 开关功能
|
||||
|
||||
详细说明请参考 [个人知识库](/knowledge)。
|
||||
|
||||
## 3. 任务规划和工具调用
|
||||
|
||||
工具是 Agent 访问操作系统资源的核心,Agent 会根据任务需求智能选择和调用工具,完成文件读写、命令执行、定时任务等各类操作。内置工具的实现在项目的 `agent/tools/` 目录下。
|
||||
|
||||
**主要工具:** 文件读写编辑、Bash 终端、浏览器操作、文件发送、定时调度、记忆搜索、联网搜索、环境配置等。
|
||||
|
||||
### 2.1 终端和文件访问
|
||||
### 3.1 终端和文件访问
|
||||
|
||||
针对操作系统的终端和文件的访问能力,是最基础和核心的工具,其他很多工具或技能都是基于此进行扩展。用户可通过手机端与 Agent 交互,操作个人电脑或服务器上的资源:
|
||||
|
||||
@@ -29,7 +42,7 @@ description: CowAgent 长期记忆、任务规划、技能系统、CLI 命令、
|
||||
<img src="https://cdn.link-ai.tech/doc/20260202181130.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
### 2.2 编程能力
|
||||
### 3.2 编程能力
|
||||
|
||||
基于编程能力和系统访问能力,Agent 可以实现从信息搜索、图片等素材生成、编码、测试、部署、Nginx 配置修改、发布的 **Vibecoding 全流程**,通过手机端简单的一句命令完成应用的快速 demo:
|
||||
|
||||
@@ -37,7 +50,7 @@ description: CowAgent 长期记忆、任务规划、技能系统、CLI 命令、
|
||||
<img src="https://cdn.link-ai.tech/doc/20260203121008.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
### 2.3 定时任务
|
||||
### 3.3 定时任务
|
||||
|
||||
基于 `scheduler` 工具实现动态定时任务,支持**一次性任务、固定时间间隔、Cron 表达式**三种形式,任务触发可选择**固定消息发送**或 **Agent 动态任务**执行两种模式:
|
||||
|
||||
@@ -45,7 +58,7 @@ description: CowAgent 长期记忆、任务规划、技能系统、CLI 命令、
|
||||
<img src="https://cdn.link-ai.tech/doc/20260202195402.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
### 2.4 浏览器操作
|
||||
### 3.4 浏览器操作
|
||||
|
||||
内置 `browser` 工具,Agent 可控制浏览器访问网页、填写表单、点击元素、截图,支持动态 JS 渲染页面。运行 `cow install-browser` 一键安装,自动适配服务器(无头模式)和桌面环境:
|
||||
|
||||
@@ -53,7 +66,7 @@ description: CowAgent 长期记忆、任务规划、技能系统、CLI 命令、
|
||||
<img src="https://cdn.link-ai.tech/doc/20260401115728.png" width="750" />
|
||||
</Frame>
|
||||
|
||||
### 2.5 环境变量管理
|
||||
### 3.5 环境变量管理
|
||||
|
||||
技能所需的秘钥存储在环境变量文件中,由 `env_config` 工具进行管理,你可以通过对话的方式更新秘钥,工具内置安全保护和脱敏策略:
|
||||
|
||||
@@ -61,7 +74,7 @@ description: CowAgent 长期记忆、任务规划、技能系统、CLI 命令、
|
||||
<img src="https://cdn.link-ai.tech/doc/20260202234939.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
## 3. 技能系统
|
||||
## 4. 技能系统
|
||||
|
||||
技能系统为 Agent 提供无限的扩展性,每个 Skill 由说明文件、运行脚本(可选)、资源(可选)组成,描述如何完成特定类型的任务。通过 Skill 可以让 Agent 遵循说明完成复杂流程、调用各类工具或对接第三方系统。
|
||||
|
||||
@@ -71,7 +84,7 @@ description: CowAgent 长期记忆、任务规划、技能系统、CLI 命令、
|
||||
|
||||
安装技能:`/skill install <名称>` 或 `cow skill install <名称>`,支持从 Skill Hub、GitHub、ClawHub、URL 等来源安装。
|
||||
|
||||
### 3.1 创建技能
|
||||
### 4.1 创建技能
|
||||
|
||||
通过 `skill-creator` 技能可以通过对话的方式快速创建技能。你可以让 Agent 将某个工作流程固化为技能,或者把任意接口文档和示例发送给 Agent,让他直接完成对接:
|
||||
|
||||
@@ -79,7 +92,7 @@ description: CowAgent 长期记忆、任务规划、技能系统、CLI 命令、
|
||||
<img src="https://cdn.link-ai.tech/doc/20260202202247.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
### 3.2 搜索和图像识别
|
||||
### 4.2 搜索和图像识别
|
||||
|
||||
- **联网搜索:** 内置 `web_search` 工具,支持多种搜索引擎,配置 `BOCHA_API_KEY` 或 `LINKAI_API_KEY` 后启用。
|
||||
- **图像识别:** 内置 `openai-image-vision` 技能,可使用 `gpt-4.1-mini`、`gpt-4.1` 等模型,依赖 `OPENAI_API_KEY`。
|
||||
@@ -88,7 +101,7 @@ description: CowAgent 长期记忆、任务规划、技能系统、CLI 命令、
|
||||
<img src="https://cdn.link-ai.tech/doc/20260202213219.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
### 3.3 技能广场
|
||||
### 4.3 技能广场
|
||||
|
||||
访问 [skills.cowagent.ai](https://skills.cowagent.ai/) 浏览所有可用技能,或在对话中执行:
|
||||
|
||||
@@ -103,7 +116,7 @@ description: CowAgent 长期记忆、任务规划、技能系统、CLI 命令、
|
||||
<img src="https://cdn.link-ai.tech/doc/20260401110103.png" width="750" />
|
||||
|
||||
|
||||
## 4. CLI 命令系统
|
||||
## 5. CLI 命令系统
|
||||
|
||||
CowAgent 提供两种命令交互方式,覆盖服务管理、技能安装、配置调整等日常运维操作:
|
||||
|
||||
@@ -118,6 +131,6 @@ cow skill install pptx # 安装技能
|
||||
cow install-browser # 安装浏览器工具
|
||||
```
|
||||
|
||||
详细命令参考 [命令总览](https://docs.cowagent.ai/commands)。
|
||||
详细命令参考 [命令总览](https://docs.cowagent.ai/cli)。
|
||||
|
||||
<img src="https://cdn.link-ai.tech/doc/20260401114549.png" width="750" />
|
||||
|
||||
@@ -5,7 +5,7 @@ description: CowAgent - 基于大模型的超级AI助理
|
||||
|
||||
<img src="https://cdn.link-ai.tech/doc/78c5dd674e2c828642ecc0406669fed7.png" alt="CowAgent" width="450px"/>
|
||||
|
||||
**CowAgent** 是基于大模型的超级AI助理,能够主动思考和任务规划、操作计算机和外部资源、创造和执行Skills、拥有长期记忆并不断成长。
|
||||
**CowAgent** 是基于大模型的超级AI助理,能够主动思考和任务规划、操作计算机和外部资源、创造和执行Skills、拥有长期记忆和知识库并不断成长。
|
||||
|
||||
CowAgent 支持灵活切换多种模型,能处理文本、语音、图片、文件等多模态消息,可接入微信、飞书、钉钉、企业微信应用、微信公众号、网页中使用,7×24小时运行于你的个人电脑或服务器中。
|
||||
|
||||
@@ -27,16 +27,16 @@ CowAgent 支持灵活切换多种模型,能处理文本、语音、图片、
|
||||
<Card title="长期记忆" icon="database" href="/memory">
|
||||
自动将对话记忆持久化至本地文件和数据库中,包括全局记忆和天级记忆,支持关键词及向量检索。
|
||||
</Card>
|
||||
<Card title="个人知识库" icon="book" href="/knowledge">
|
||||
自动整理结构化知识,支持知识图谱可视化,通过交叉引用构建持续增长的知识网络。
|
||||
</Card>
|
||||
<Card title="技能系统" icon="puzzle-piece" href="/skills/index">
|
||||
实现了Skills创建和运行的引擎,内置多种技能,并支持通过自然语言对话完成自定义Skills开发。
|
||||
</Card>
|
||||
<Card title="多模态消息" icon="image" href="/channels/web">
|
||||
支持对文本、图片、语音、文件等多类型消息进行解析、处理、生成、发送等操作。
|
||||
</Card>
|
||||
<Card title="工具系统" icon="wrench" href="/tools/index">
|
||||
内置文件读写、终端执行、浏览器操作、定时任务、消息发送等工具,Agent 可自主调用工具完成复杂任务。
|
||||
</Card>
|
||||
<Card title="命令系统" icon="terminal" href="/commands/index">
|
||||
<Card title="命令系统" icon="terminal" href="/cli/index">
|
||||
提供终端 CLI 和对话中的命令,支持进程管理、技能安装、配置修改、上下文查看等常用操作。
|
||||
</Card>
|
||||
<Card title="多模型支持" icon="microchip" href="/models/index">
|
||||
|
||||
@@ -76,7 +76,7 @@ irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex
|
||||
|
||||
実行後、デフォルトでWebサービスが起動します。`http://localhost:9899/chat` にアクセスしてチャットを開始できます。
|
||||
|
||||
スクリプトの使い方: [ワンクリックインストール](https://docs.cowagent.ai/ja/guide/quick-start)。インストール後は `cow start`、`cow stop` などの [CLI コマンド](https://docs.cowagent.ai/ja/commands/index)でサービスを管理できます。
|
||||
スクリプトの使い方: [ワンクリックインストール](https://docs.cowagent.ai/ja/guide/quick-start)。インストール後は `cow start`、`cow stop` などの [CLI コマンド](https://docs.cowagent.ai/ja/cli/index)でサービスを管理できます。
|
||||
|
||||
### 手動インストール
|
||||
|
||||
@@ -100,7 +100,7 @@ pip3 install -r requirements-optional.txt # 任意ですが推奨
|
||||
pip3 install -e .
|
||||
```
|
||||
|
||||
インストール後、`cow` コマンドでサービス管理(起動、停止、更新など)やSkill管理ができます。[コマンドドキュメント](https://docs.cowagent.ai/ja/commands/index)を参照してください。
|
||||
インストール後、`cow` コマンドでサービス管理(起動、停止、更新など)やSkill管理ができます。[コマンドドキュメント](https://docs.cowagent.ai/ja/cli/index)を参照してください。
|
||||
|
||||
**4. ブラウザのインストール(任意)**
|
||||
|
||||
@@ -165,7 +165,7 @@ sudo docker logs -f chatgpt-on-wechat
|
||||
| GLM | `glm-5-turbo` |
|
||||
| Kimi | `kimi-k2.5` |
|
||||
| Doubao | `doubao-seed-2-0-code-preview-260215` |
|
||||
| Qwen | `qwen3.5-plus` |
|
||||
| Qwen | `qwen3.6-plus` |
|
||||
| Claude | `claude-sonnet-4-6` |
|
||||
| Gemini | `gemini-3.1-pro-preview` |
|
||||
| OpenAI | `gpt-5.4` |
|
||||
|
||||
@@ -92,6 +92,31 @@ description: ステータスの確認、設定管理、コンテキスト制御
|
||||
/logs 50
|
||||
```
|
||||
|
||||
## knowledge
|
||||
|
||||
パーソナルナレッジベースの表示と管理を行います。デフォルトでは統計情報を表示します。
|
||||
|
||||
```text
|
||||
/knowledge
|
||||
```
|
||||
|
||||
**ディレクトリ構造を表示:**
|
||||
|
||||
```text
|
||||
/knowledge list
|
||||
```
|
||||
|
||||
**ナレッジベースの有効化・無効化:**
|
||||
|
||||
```text
|
||||
/knowledge on
|
||||
/knowledge off
|
||||
```
|
||||
|
||||
<Note>
|
||||
ターミナル CLI では `cow knowledge` と `cow knowledge list` が利用可能ですが、`on|off` はチャットでのみサポートされます(実行時に即座に反映するため)。
|
||||
</Note>
|
||||
|
||||
## version
|
||||
|
||||
現在の CowAgent のバージョンを表示します。
|
||||
@@ -40,6 +40,9 @@ Service:
|
||||
Skills:
|
||||
skill Manage skills (list / search / install / uninstall ...)
|
||||
|
||||
Knowledge:
|
||||
knowledge View knowledge base stats and structure
|
||||
|
||||
Others:
|
||||
help Show this help message
|
||||
version Show version
|
||||
@@ -55,6 +58,9 @@ Web コンソールや接続されたチャネルの会話で `/` を入力す
|
||||
| `/status` | サービスの状態と設定を表示 |
|
||||
| `/config` | 実行時設定の表示・変更 |
|
||||
| `/skill` | スキル管理(インストール、アンインストール、有効化、無効化など) |
|
||||
| `/knowledge` | ナレッジベースの統計情報を表示 |
|
||||
| `/knowledge list` | ナレッジベースのディレクトリ構造を表示 |
|
||||
| `/knowledge on\|off` | ナレッジベースの有効化・無効化 |
|
||||
| `/context` | 現在のセッションのコンテキスト情報を表示 |
|
||||
| `/context clear` | 現在のセッションのコンテキストをクリア |
|
||||
| `/logs` | 最近のログを表示 |
|
||||
@@ -74,6 +80,7 @@ Web コンソールや接続されたチャネルの会話で `/` を入力す
|
||||
| logs | ✓ | ✓ |
|
||||
| config | ✗ | ✓ |
|
||||
| context | — | ✓ |
|
||||
| knowledge(サブコマンド) | ✓ | ✓ |
|
||||
| skill(サブコマンド) | ✓ | ✓ |
|
||||
| start / stop / restart | ✓ | ✗ |
|
||||
| update | ✓ | ✗ |
|
||||
@@ -47,7 +47,7 @@ Linux、macOS、Windowsに対応しています。Python 3.7〜3.12が必要で
|
||||
| `cow update` | コードを更新して再起動 |
|
||||
| `cow install-browser` | ブラウザツールの依存をインストール |
|
||||
|
||||
詳細は[コマンドドキュメント](/ja/commands/index)を参照してください。
|
||||
詳細は[コマンドドキュメント](/ja/cli/index)を参照してください。
|
||||
|
||||
<Note>
|
||||
`cow` コマンドが利用できない場合は、`./run.sh <コマンド>`(Linux/macOS)または `.\scripts\run.ps1 <コマンド>`(Windows)で代替できます。機能は同等です。
|
||||
|
||||
@@ -11,14 +11,16 @@ CowAgent のアーキテクチャは以下のコアモジュールで構成さ
|
||||
|
||||
<img src="https://cdn.link-ai.tech/doc/68ef7b212c6f791e0e74314b912149f9-sz_5847990.png" alt="CowAgent Architecture" />
|
||||
|
||||
### コアモジュール
|
||||
|
||||
| モジュール | 説明 |
|
||||
| --- | --- |
|
||||
| **Channels** | メッセージの受信と送信を行うメッセージチャネル層。Web、Feishu(飛書)、DingTalk(釘釘)、WeCom(企業微信)、WeChat公式アカウントなどをサポート |
|
||||
| **Agent Core** | タスク計画、記憶システム、Skill エンジンを含む Agent エンジン |
|
||||
| **Tools** | Agent が OS リソースにアクセスするためのツール層。10 以上の組み込みツール |
|
||||
| **Models** | 主要な LLM への統一アクセスを提供するモデル層 |
|
||||
| **Plan** | ユーザーの意図を理解し、複雑なタスクをマルチステップの計画に分解、目標達成までツールを反復的に呼び出す |
|
||||
| **Memory** | 重要な情報をコアメモリとデイリーメモリとして自動永続化し、キーワードとベクトルのハイブリッド検索でセッション間の連続性を実現 |
|
||||
| **Knowledge** | トピック別に構造化された知識を整理。Agent が価値ある情報を Markdown ページとして自律的に整理し、インデックスと相互参照で成長するナレッジネットワークを構築 |
|
||||
| **Tools** | Agent が OS リソースにアクセスするための中核能力。ファイル読み書き、ターミナル、ブラウザ、スケジューラ、記憶検索、Web 検索など 10 以上の組み込みツール |
|
||||
| **Skills** | Skill の読み込み・管理。Skill Hub や GitHub からのワンクリックインストール、または会話を通じたカスタム Skill の作成をサポート |
|
||||
| **Models** | モデル層。OpenAI、Claude、Gemini、DeepSeek、MiniMax、GLM、Qwen など主要 LLM への統一アクセスを提供 |
|
||||
| **Channels** | メッセージチャネル層。Web コンソール、WeChat、Feishu、DingTalk、WeCom、公式アカウントなど複数チャネルを統一プロトコルでサポート |
|
||||
| **CLI** | コマンドラインシステム。ターミナルコマンド(`cow`)とチャットコマンド(`/`)で、プロセス管理、Skill インストール、設定変更、ナレッジベース管理などをサポート |
|
||||
|
||||
## Agent モードのワークフロー
|
||||
|
||||
@@ -28,7 +30,7 @@ Agent モードが有効な場合、CowAgent は以下のワークフローで
|
||||
2. **意図の理解** — タスク要件とコンテキストを分析
|
||||
3. **タスク計画** — 複雑なタスクを複数のステップに分解
|
||||
4. **ツール呼び出し** — 各ステップに適切なツールを選択・実行
|
||||
5. **記憶の更新** — 重要な情報を長期記憶に保存
|
||||
5. **記憶・ナレッジの更新** — 重要な情報を長期記憶に保存し、構造化された知識をナレッジベースに整理
|
||||
6. **結果の返却** — 実行結果をユーザーに送信
|
||||
|
||||
## ワークスペースのディレクトリ構成
|
||||
@@ -39,9 +41,12 @@ Agent のワークスペースはデフォルトで `~/cow` にあり、シス
|
||||
~/cow/
|
||||
├── system.md # Agent システムプロンプト
|
||||
├── user.md # ユーザープロフィール
|
||||
├── MEMORY.md # コアメモリ
|
||||
├── memory/ # 長期記憶ストレージ
|
||||
│ ├── core.md # コアメモリ
|
||||
│ └── daily/ # デイリーメモリ
|
||||
│ └── YYYY-MM-DD.md # デイリーメモリ
|
||||
├── knowledge/ # パーソナルナレッジベース
|
||||
│ ├── index.md # ナレッジインデックス
|
||||
│ └── <category>/ # トピック別ページ
|
||||
└── skills/ # カスタム Skill
|
||||
├── skill-1/
|
||||
└── skill-2/
|
||||
@@ -75,3 +80,4 @@ Agent のワークスペースはデフォルトで `~/cow` にあり、シス
|
||||
| `agent_max_context_tokens` | 最大コンテキストトークン数 | `40000` |
|
||||
| `agent_max_context_turns` | 最大コンテキストターン数 | `30` |
|
||||
| `agent_max_steps` | タスクあたりの最大判断ステップ数 | `15` |
|
||||
| `knowledge` | パーソナルナレッジベースの有効化 | `true` |
|
||||
|
||||
@@ -15,13 +15,26 @@ description: CowAgent の長期記憶、タスク計画、Skill システム、C
|
||||
<img src="https://cdn.link-ai.tech/doc/20260203000455.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
## 2. タスク計画とツール活用
|
||||
## 2. パーソナルナレッジベース
|
||||
|
||||
> ナレッジベースシステムにより、Agent は構造化された知識を継続的に蓄積・整理できます。時系列で記録されるメモリとは異なり、ナレッジベースはトピック別に整理され、記事、会話からの洞察、学習資料などを相互にリンクされた Markdown ページとして整理し、継続的に成長するナレッジネットワークを形成します。
|
||||
|
||||
Agent は会話中に価値ある情報を自動的にナレッジページとして整理し、相互参照とインデックスを維持します。Web コンソールではドキュメントの閲覧とナレッジグラフの可視化が可能です。ナレッジはワークスペースの `~/cow/knowledge/` ディレクトリに保存されます。
|
||||
|
||||
- **自動整理**:Agent が会話中に構造化された知識を自律的に抽出・整理し、インデックスと相互参照を維持
|
||||
- **ナレッジグラフ**:ページ間の相互参照から自動的にナレッジグラフを構築し、Web コンソールでインタラクティブな関係図として可視化
|
||||
- **チャット連携**:Agent の回答で参照されるナレッジドキュメントのリンクを Web コンソールで直接クリックして閲覧可能
|
||||
- **CLI 管理**:`/knowledge` コマンドで統計表示、ディレクトリ閲覧、`/knowledge on|off` で機能の切り替えが可能
|
||||
|
||||
詳細は [パーソナルナレッジベース](/ja/knowledge) を参照してください。
|
||||
|
||||
## 3. タスク計画とツール活用
|
||||
|
||||
ツールは Agent がオペレーティングシステムのリソースにアクセスするための中核です。Agent はタスク要件に基づいてインテリジェントにツールを選択・呼び出し、ファイルの読み書き、コマンド実行、スケジュールタスクなどを実行します。組み込みツールはプロジェクトの `agent/tools/` ディレクトリに実装されています。
|
||||
|
||||
**主なツール:** ファイルの読み書き・編集、Bash ターミナル、ブラウザ操作、ファイル送信、スケジューラ、記憶検索、Web 検索、環境設定など。
|
||||
|
||||
### 2.1 ターミナルとファイルアクセス
|
||||
### 3.1 ターミナルとファイルアクセス
|
||||
|
||||
OS のターミナルとファイルシステムへのアクセスは、最も基本的かつ中核的な機能です。多くの他のツールや Skill はこの機能の上に構築されています。ユーザーはモバイルデバイスから Agent とやり取りし、パソコンやサーバーのリソースを操作できます:
|
||||
|
||||
@@ -29,7 +42,7 @@ OS のターミナルとファイルシステムへのアクセスは、最も
|
||||
<img src="https://cdn.link-ai.tech/doc/20260202181130.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
### 2.2 プログラミング能力
|
||||
### 3.2 プログラミング能力
|
||||
|
||||
プログラミングとシステムアクセスを組み合わせることで、Agent は完全な **Vibecoding ワークフロー** を実行できます。情報検索、アセット生成、コーディング、テスト、デプロイ、Nginx 設定、公開まで、すべてスマートフォンからの一つのコマンドで実行可能です:
|
||||
|
||||
@@ -37,7 +50,7 @@ OS のターミナルとファイルシステムへのアクセスは、最も
|
||||
<img src="https://cdn.link-ai.tech/doc/20260203121008.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
### 2.3 スケジュールタスク
|
||||
### 3.3 スケジュールタスク
|
||||
|
||||
`scheduler` ツールにより動的なスケジュールタスクが可能で、**ワンタイムタスク、固定間隔、Cron 式**をサポートしています。タスクは**固定メッセージ送信**または **Agent 動的タスク**実行としてトリガーできます:
|
||||
|
||||
@@ -45,7 +58,7 @@ OS のターミナルとファイルシステムへのアクセスは、最も
|
||||
<img src="https://cdn.link-ai.tech/doc/20260202195402.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
### 2.4 ブラウザ操作
|
||||
### 3.4 ブラウザ操作
|
||||
|
||||
組み込みの `browser` ツールにより、Agent は Chromium ブラウザを制御して Web ページへのアクセス、フォームの入力、要素のクリック、スクリーンショットの撮影が可能です。動的 JS レンダリングページにも対応しています。`cow install-browser` でワンコマンドインストール、サーバー(ヘッドレス)とデスクトップ環境に自動対応します:
|
||||
|
||||
@@ -53,7 +66,7 @@ OS のターミナルとファイルシステムへのアクセスは、最も
|
||||
<img src="https://cdn.link-ai.tech/doc/20260401110103.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
### 2.5 環境変数管理
|
||||
### 3.5 環境変数管理
|
||||
|
||||
Skill が必要とするシークレットキーは環境変数ファイルに保存され、`env_config` ツールによって管理されます。会話を通じてシークレットを更新でき、セキュリティ保護とマスキング機能が組み込まれています:
|
||||
|
||||
@@ -61,7 +74,7 @@ Skill が必要とするシークレットキーは環境変数ファイルに
|
||||
<img src="https://cdn.link-ai.tech/doc/20260202234939.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
## 3. Skill システム
|
||||
## 4. Skill システム
|
||||
|
||||
Skill システムは Agent に無限の拡張性を提供します。各 Skill は説明ファイル、実行スクリプト(任意)、リソース(任意)で構成され、特定のタイプのタスクを完了する方法を記述します。Skill により Agent は複雑なワークフローの指示に従い、ツールを呼び出し、サードパーティシステムと連携できます。
|
||||
|
||||
@@ -71,7 +84,7 @@ Skill システムは Agent に無限の拡張性を提供します。各 Skill
|
||||
|
||||
Skill のインストール:`/skill install <名前>` または `cow skill install <名前>`。Skill Hub、GitHub、ClawHub、URL などからインストール可能。
|
||||
|
||||
### 3.1 Skill の作成
|
||||
### 4.1 Skill の作成
|
||||
|
||||
`skill-creator` Skill により、会話を通じて Skill を素早く作成できます。ワークフローを Skill としてコード化するよう Agent に依頼したり、API ドキュメントやサンプルを送信して Agent に直接連携を完成させることができます:
|
||||
|
||||
@@ -79,7 +92,7 @@ Skill のインストール:`/skill install <名前>` または `cow skill ins
|
||||
<img src="https://cdn.link-ai.tech/doc/20260202202247.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
### 3.2 Web 検索と画像認識
|
||||
### 4.2 Web 検索と画像認識
|
||||
|
||||
- **Web 検索:** 組み込みの `web_search` ツールで、複数の検索エンジンをサポートします。`BOCHA_API_KEY` または `LINKAI_API_KEY` を設定して有効化してください。
|
||||
- **画像認識:** 組み込みの `openai-image-vision` Skill で、`gpt-4.1-mini`、`gpt-4.1` などのモデルをサポートします。`OPENAI_API_KEY` が必要です。
|
||||
@@ -88,7 +101,7 @@ Skill のインストール:`/skill install <名前>` または `cow skill ins
|
||||
<img src="https://cdn.link-ai.tech/doc/20260202213219.png" width="800" />
|
||||
</Frame>
|
||||
|
||||
### 3.3 Skill Hub
|
||||
### 4.3 Skill Hub
|
||||
|
||||
[skills.cowagent.ai](https://skills.cowagent.ai/) で利用可能なすべての Skill を閲覧するか、会話内でコマンドを実行できます:
|
||||
|
||||
@@ -102,7 +115,7 @@ GitHub、ClawHub、LinkAI などサードパーティプラットフォームの
|
||||
|
||||
<img src="https://cdn.link-ai.tech/doc/20260401110103.png" width="750" />
|
||||
|
||||
## 4. CLI コマンドシステム
|
||||
## 5. CLI コマンドシステム
|
||||
|
||||
CowAgent はサービス管理、Skill インストール、設定変更などをカバーする2つのコマンドインターフェースを提供します:
|
||||
|
||||
@@ -117,4 +130,4 @@ cow skill install pptx # Skill をインストール
|
||||
cow install-browser # ブラウザツールをインストール
|
||||
```
|
||||
|
||||
詳細は [コマンド一覧](https://docs.cowagent.ai/ja/commands) を参照してください。
|
||||
詳細は [コマンド一覧](https://docs.cowagent.ai/ja/cli) を参照してください。
|
||||
|
||||
@@ -22,6 +22,9 @@ CowAgent は自ら思考しタスクを計画し、コンピュータや外部
|
||||
<Card title="長期記憶" icon="database" href="/ja/memory">
|
||||
会話の記憶をローカルファイルやデータベースに自動的に永続化します。コアメモリとデイリーメモリを含み、キーワード検索とベクトル検索に対応しています。
|
||||
</Card>
|
||||
<Card title="ナレッジベース" icon="book" href="/ja/knowledge">
|
||||
構造化された知識を自動整理し、ナレッジグラフの可視化をサポート。相互参照により継続的に成長するナレッジネットワークを構築します。
|
||||
</Card>
|
||||
<Card title="Skill システム" icon="puzzle-piece" href="/ja/skills/index">
|
||||
Skill の作成・実行エンジンを実装し、組み込み Skill を搭載。自然言語の会話を通じてカスタム Skill の開発もサポートしています。
|
||||
</Card>
|
||||
@@ -31,7 +34,7 @@ CowAgent は自ら思考しタスクを計画し、コンピュータや外部
|
||||
<Card title="ツールシステム" icon="wrench" href="/ja/tools/index">
|
||||
ファイル読み書き、ターミナル実行、ブラウザ操作、スケジュールタスク、メッセージ送信などの組み込みツールを提供。Agent が自律的にツールを呼び出して複雑なタスクを完了します。
|
||||
</Card>
|
||||
<Card title="コマンドシステム" icon="terminal" href="/ja/commands/index">
|
||||
<Card title="コマンドシステム" icon="terminal" href="/ja/cli/index">
|
||||
ターミナル CLI とチャット内コマンドを提供し、プロセス管理、Skill インストール、設定変更、コンテキスト確認などの一般的な操作をサポートします。
|
||||
</Card>
|
||||
<Card title="複数モデル対応" icon="microchip" href="/ja/models/index">
|
||||
|
||||
77
docs/ja/knowledge/index.mdx
Normal file
77
docs/ja/knowledge/index.mdx
Normal file
@@ -0,0 +1,77 @@
|
||||
---
|
||||
title: パーソナルナレッジベース
|
||||
description: CowAgent のパーソナルナレッジベース — 構造化された知識の蓄積、自動整理、ナレッジグラフ
|
||||
---
|
||||
|
||||
パーソナルナレッジベースは、Agent の長期的な構造化知識ストレージで、ワークスペースの `knowledge/` ディレクトリに保存されます。タイムラインで整理されるメモリとは異なり、ナレッジベースはトピック別にコンテンツを整理します。記事、会話のインサイト、学習資料が相互リンクされた Markdown ページとして構造化され、継続的に成長するナレッジネットワークを形成します。
|
||||
|
||||
## コアコンセプト
|
||||
|
||||
### ナレッジ vs メモリ
|
||||
|
||||
| 次元 | ナレッジベース(knowledge/) | 長期記憶(memory/) |
|
||||
| --- | --- | --- |
|
||||
| 整理方法 | トピック別、相互リンク | タイムライン順、日付ファイル |
|
||||
| 書き込み | Agent が能動的に構造化 | コンテキストトリミング時に自動要約 |
|
||||
| コンテンツ | 精製された構造化知識 | 生の会話要約 |
|
||||
| 用途 | 学習ノート、技術ドキュメント、プロジェクト知識 | 会話履歴、イベント記録 |
|
||||
|
||||
### ディレクトリ構造
|
||||
|
||||
```
|
||||
~/cow/knowledge/
|
||||
├── index.md # ナレッジインデックス、全ページのエントリポイント
|
||||
├── log.md # 変更ログ、各書き込みの記録
|
||||
├── concepts/ # 概念的な知識
|
||||
│ └── machine-learning.md
|
||||
├── entities/ # エンティティ知識(人物、組織、ツール)
|
||||
│ └── openai.md
|
||||
└── sources/ # ソース知識(記事、論文)
|
||||
└── llm-wiki.md
|
||||
```
|
||||
|
||||
ディレクトリ構造は柔軟です — Agent は実際のコンテンツに基づいて適切なカテゴリディレクトリを自動作成します。ユーザーが整理方法をカスタマイズすることも可能です。
|
||||
|
||||
## 自動整理
|
||||
|
||||
ナレッジの書き込みは Agent の自律的な動作で、以下のシナリオでトリガーされます:
|
||||
|
||||
- **ユーザーが記事やドキュメントを共有** — Agent が自動的にキー情報を抽出し、構造化されたナレッジページを作成
|
||||
- **会話から価値ある結論が生まれた場合** — Agent がインサイトをナレッジページに整理し、既存の知識とリンク
|
||||
- **ユーザーが明示的に整理を要求** — ユーザーは会話を通じて Agent にナレッジの整理・更新を指示可能
|
||||
|
||||
各ナレッジページには関連ページへの相互参照リンクが含まれ、ナレッジグラフを段階的に構築します。
|
||||
|
||||
## ナレッジ検索
|
||||
|
||||
Agent は会話中に以下の方法でナレッジを検索できます:
|
||||
|
||||
- **インデックス参照** — `knowledge/index.md` で関連ページを素早く特定
|
||||
- **セマンティック検索** — `memory_search` ツールでナレッジコンテンツをセマンティック検索
|
||||
- **直接読み取り** — `memory_get` ツールで特定のナレッジファイルを読み取り
|
||||
|
||||
## Web コンソール
|
||||
|
||||
Web コンソールには専用の「ナレッジ」モジュールがあり、以下をサポートします:
|
||||
|
||||
- **ドキュメント閲覧** — ツリー形式のディレクトリ構造、検索・折りたたみ可能、クリックでコンテンツ表示
|
||||
- **ナレッジグラフ** — D3.js フォースダイレクテッドグラフによるナレッジ間の関係を可視化
|
||||
- **チャット連携** — Agent の返信で参照されたナレッジドキュメントリンクはクリックで直接ナビゲーション
|
||||
|
||||
## CLI コマンド
|
||||
|
||||
`/knowledge` コマンドでナレッジベースを管理:
|
||||
|
||||
| コマンド | 説明 |
|
||||
| --- | --- |
|
||||
| `/knowledge` | ナレッジベースの統計情報を表示 |
|
||||
| `/knowledge list` | ファイルディレクトリをツリー形式で表示 |
|
||||
| `/knowledge on` | ナレッジベース機能を有効化 |
|
||||
| `/knowledge off` | ナレッジベース機能を無効化 |
|
||||
|
||||
## 設定
|
||||
|
||||
| パラメータ | 説明 | デフォルト |
|
||||
| --- | --- | --- |
|
||||
| `knowledge` | パーソナルナレッジベースの有効/無効 | `true` |
|
||||
| `agent_workspace` | ワークスペースパス、ナレッジは `knowledge/` サブディレクトリに保存されます | `~/cow` |
|
||||
@@ -6,7 +6,7 @@ description: CowAgentがサポートするモデルとおすすめの選択肢
|
||||
CowAgentは国内外の主要なLLMをサポートしています。モデルインターフェースはプロジェクトの`models/`ディレクトリに実装されています。
|
||||
|
||||
<Note>
|
||||
Agent モードでは、品質とコストのバランスから以下のモデルをおすすめします: MiniMax-M2.7、glm-5-turbo、kimi-k2.5、qwen3.5-plus、claude-sonnet-4-6、gemini-3.1-pro-preview
|
||||
Agent モードでは、品質とコストのバランスから以下のモデルをおすすめします: MiniMax-M2.7、glm-5-turbo、kimi-k2.5、qwen3.6-plus、claude-sonnet-4-6、gemini-3.1-pro-preview
|
||||
</Note>
|
||||
|
||||
## 設定
|
||||
@@ -25,7 +25,7 @@ CowAgentは国内外の主要なLLMをサポートしています。モデルイ
|
||||
glm-5-turbo、glm-5およびその他のシリーズモデル
|
||||
</Card>
|
||||
<Card title="Qwen (通义千问)" href="/ja/models/qwen">
|
||||
qwen3.5-plus、qwen3-maxなど
|
||||
qwen3.6-plus、qwen3-maxなど
|
||||
</Card>
|
||||
<Card title="Kimi" href="/ja/models/kimi">
|
||||
kimi-k2.5、kimi-k2など
|
||||
|
||||
@@ -1,18 +1,18 @@
|
||||
---
|
||||
title: Qwen (通义千问)
|
||||
description: 通义千问モデルの設定
|
||||
title: Qwen (通義千問)
|
||||
description: 通義千問モデルの設定
|
||||
---
|
||||
|
||||
```json
|
||||
{
|
||||
"model": "qwen3.5-plus",
|
||||
"model": "qwen3.6-plus",
|
||||
"dashscope_api_key": "YOUR_API_KEY"
|
||||
}
|
||||
```
|
||||
|
||||
| パラメータ | 説明 |
|
||||
| --- | --- |
|
||||
| `model` | `qwen3.5-plus`、`qwen3-max`、`qwen-max`、`qwen-plus`、`qwen-turbo`、`qwq-plus`などから選択可能 |
|
||||
| `model` | `qwen3.6-plus`、`qwen3.5-plus`、`qwen3-max`、`qwen-max`、`qwen-plus`、`qwen-turbo`、`qwq-plus`などから選択可能 |
|
||||
| `dashscope_api_key` | [百炼 Console](https://bailian.console.aliyun.com/?tab=model#/api-key)で作成。[公式ドキュメント](https://bailian.console.aliyun.com/?tab=api#/api)を参照 |
|
||||
|
||||
OpenAI互換の設定もサポートしています:
|
||||
@@ -20,7 +20,7 @@ OpenAI互換の設定もサポートしています:
|
||||
```json
|
||||
{
|
||||
"bot_type": "openai",
|
||||
"model": "qwen3.5-plus",
|
||||
"model": "qwen3.6-plus",
|
||||
"open_ai_api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
|
||||
"open_ai_api_key": "YOUR_API_KEY"
|
||||
}
|
||||
|
||||
@@ -12,7 +12,7 @@ description: CowAgent 2.0.5 - Cow CLI、Skill Hub オープンソース、ブラ
|
||||
- **Web コンソール**:入力欄で `/` を入力するとスラッシュコマンドメニューが表示、矢印キーで入力履歴を辿れる
|
||||
- **Windows サポート**:PowerShell スクリプト `scripts/run.ps1` を追加、`cow` コマンドに対応
|
||||
|
||||
ドキュメント:[コマンド一覧](https://docs.cowagent.ai/ja/commands)
|
||||
ドキュメント:[コマンド一覧](https://docs.cowagent.ai/ja/cli)
|
||||
|
||||
<img src="https://cdn.link-ai.tech/doc/20260401114549.png" width="750" />
|
||||
|
||||
|
||||
@@ -17,7 +17,7 @@ CowAgent ではスキルを取得する複数の方法を提供しています
|
||||
- **URL** — zip アーカイブや SKILL.md リンクからインストール
|
||||
- **会話で作成** — 自然言語の会話を通じて Agent にスキルを自動作成させる
|
||||
|
||||
詳細は[スキルのインストール](/ja/skills/install)と[スキル管理コマンド](/ja/commands/skill)を参照してください。会話を通じて[スキルを作成](/ja/skills/create)することもできます。
|
||||
詳細は[スキルのインストール](/ja/skills/install)と[スキル管理コマンド](/ja/cli/skill)を参照してください。会話を通じて[スキルを作成](/ja/skills/create)することもできます。
|
||||
|
||||
## スキルの読み込み優先順位
|
||||
|
||||
|
||||
@@ -49,5 +49,5 @@ zip アーカイブと SKILL.md ファイルリンクに対応:
|
||||
```
|
||||
|
||||
<Tip>
|
||||
上記のすべてのコマンドは、ターミナルでは `/skill` を `cow skill` に置き換えて使用できます。完全なコマンドドキュメントは[スキル管理コマンド](/ja/commands/skill)を参照してください。
|
||||
上記のすべてのコマンドは、ターミナルでは `/skill` を `cow skill` に置き換えて使用できます。完全なコマンドドキュメントは[スキル管理コマンド](/ja/cli/skill)を参照してください。
|
||||
</Tip>
|
||||
|
||||
@@ -1,9 +1,11 @@
|
||||
---
|
||||
title: memory - メモリ
|
||||
description: 長期メモリの検索と読み取り
|
||||
title: memory - メモリ & ナレッジ
|
||||
description: 長期メモリとナレッジベースファイルの検索・読み取り
|
||||
---
|
||||
|
||||
メモリToolには `memory_search`(メモリ検索)と `memory_get`(メモリファイル読み取り)の2つのサブToolがあります。
|
||||
メモリToolには `memory_search`(メモリ検索)と `memory_get`(メモリまたはナレッジファイル読み取り)の2つのサブToolがあります。
|
||||
|
||||
[ナレッジベース](/ja/knowledge) 機能が有効な場合、両ツールとも `knowledge/` ディレクトリのファイルへのアクセスもサポートします。
|
||||
|
||||
## 依存関係
|
||||
|
||||
@@ -11,7 +13,7 @@ description: 長期メモリの検索と読み取り
|
||||
|
||||
## memory_search
|
||||
|
||||
キーワードとベクトルのハイブリッド検索で過去のメモリを検索します。
|
||||
キーワードとベクトルのハイブリッド検索で過去のメモリとナレッジベースの内容を検索します。
|
||||
|
||||
| パラメータ | 型 | 必須 | 説明 |
|
||||
| --- | --- | --- | --- |
|
||||
@@ -19,11 +21,11 @@ description: 長期メモリの検索と読み取り
|
||||
|
||||
## memory_get
|
||||
|
||||
特定のメモリファイルの内容を読み取ります。
|
||||
特定のメモリファイルまたはナレッジファイルの内容を読み取ります。
|
||||
|
||||
| パラメータ | 型 | 必須 | 説明 |
|
||||
| --- | --- | --- | --- |
|
||||
| `path` | string | はい | メモリファイルの相対パス(例:`MEMORY.md`、`memory/2026-01-01.md`) |
|
||||
| `path` | string | はい | ファイルの相対パス(例:`MEMORY.md`、`memory/2026-01-01.md`、`knowledge/concepts/rag.md`) |
|
||||
| `start_line` | integer | いいえ | 開始行番号 |
|
||||
| `end_line` | integer | いいえ | 終了行番号 |
|
||||
|
||||
@@ -34,3 +36,8 @@ Agentは以下のシナリオでメモリToolを自動的に呼び出します
|
||||
- ユーザーが重要な情報を共有した場合 → メモリに保存
|
||||
- 過去のコンテキストが必要な場合 → 関連するメモリを検索
|
||||
- 会話が一定の長さに達した場合 → 要約を抽出して保存
|
||||
- 専門知識について議論する場合 → ナレッジベースから関連ページを検索
|
||||
|
||||
<Note>
|
||||
設定で `knowledge` が `false` に設定されている場合、ツールの説明と検索範囲は自動的にメモリファイルのみに調整されます。
|
||||
</Note>
|
||||
|
||||
72
docs/ja/tools/vision.mdx
Normal file
72
docs/ja/tools/vision.mdx
Normal file
@@ -0,0 +1,72 @@
|
||||
---
|
||||
title: vision - 画像分析
|
||||
description: 画像コンテンツの分析(認識、説明、OCR など)
|
||||
---
|
||||
|
||||
Vision API を使用してローカル画像や画像 URL を分析します。コンテンツの説明、テキスト抽出(OCR)、オブジェクト認識などに対応しています。
|
||||
|
||||
## モデル選択
|
||||
|
||||
Vision ツールは多段階の自動選択+自動フォールバック戦略を採用しており、手動設定なしで利用可能です:
|
||||
|
||||
1. **メインモデル** — 現在設定されているメインモデルで画像認識を実行(追加コストなし)
|
||||
2. **その他の設定済みモデル** — API キーが設定されている他のマルチモーダルモデルを自動検出
|
||||
3. **OpenAI** — `open_ai_api_key` を使用して gpt-4.1-mini を呼び出し
|
||||
4. **LinkAI** — `linkai_api_key` を使用して LinkAI ビジョンサービスを呼び出し
|
||||
|
||||
`use_linkai=true` の場合、LinkAI が最優先になります。
|
||||
|
||||
現在のプロバイダーが失敗した場合、成功するかすべて失敗するまで自動的に次のプロバイダーを試行します。
|
||||
|
||||
### 対応モデル
|
||||
|
||||
| ベンダー | ビジョンモデル | 説明 |
|
||||
| --- | --- | --- |
|
||||
| OpenAI / 互換プロトコル | メインモデル | すべての OpenAI 互換マルチモーダルモデルに対応 |
|
||||
| 通義千問 (DashScope) | メインモデル | MultiModalConversation API 経由 |
|
||||
| Claude | メインモデル | Anthropic ネイティブ画像形式 |
|
||||
| Gemini | メインモデル | inlineData 形式 |
|
||||
| 豆包 (Doubao) | メインモデル | doubao-seed-2-0 シリーズがネイティブ対応 |
|
||||
| Kimi (Moonshot) | メインモデル | kimi-k2.5 がネイティブ対応 |
|
||||
| 智谱 AI | glm-5v-turbo | 常にビジョン専用モデルを使用 |
|
||||
| MiniMax | MiniMax-Text-01 | 常にビジョン専用モデルを使用 |
|
||||
|
||||
<Note>
|
||||
智谱 AI と MiniMax のテキストモデルは画像理解に対応していないため、対応するビジョン専用モデルが自動的に使用されます。
|
||||
</Note>
|
||||
|
||||
## パラメータ
|
||||
|
||||
| パラメータ | 型 | 必須 | 説明 |
|
||||
| --- | --- | --- | --- |
|
||||
| `image` | string | はい | ローカルファイルパスまたは HTTP(S) 画像 URL |
|
||||
| `question` | string | はい | 画像に対する質問 |
|
||||
|
||||
対応画像形式:jpg、jpeg、png、gif、webp
|
||||
|
||||
## カスタム設定
|
||||
|
||||
Vision ツールで使用するモデルを指定するには、`config.json` に以下を追加します:
|
||||
|
||||
```json
|
||||
{
|
||||
"tool": {
|
||||
"vision": {
|
||||
"model": "gpt-4o"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
ほとんどの場合、設定は不要です。メインモデルがマルチモーダルに対応しているか、ビジョン対応の API キーが設定されていれば自動的に動作します。
|
||||
|
||||
## ユースケース
|
||||
|
||||
- 画像コンテンツの説明
|
||||
- 画像からのテキスト抽出(OCR)
|
||||
- オブジェクト、色、シーンの識別
|
||||
- スクリーンショットやスキャン文書の分析
|
||||
|
||||
<Note>
|
||||
1MB を超える画像は自動的に圧縮されます(最大辺 1536px)。すべての画像(リモート URL を含む)は base64 に変換して送信され、すべてのモデルバックエンドとの互換性を確保します。
|
||||
</Note>
|
||||
77
docs/knowledge/index.mdx
Normal file
77
docs/knowledge/index.mdx
Normal file
@@ -0,0 +1,77 @@
|
||||
---
|
||||
title: 个人知识库
|
||||
description: CowAgent 的个人知识库系统 — 结构化知识沉淀、自动整理与知识图谱
|
||||
---
|
||||
|
||||
个人知识库是 Agent 的长期结构化知识存储,保存在工作空间的 `knowledge/` 目录下。与按时间线组织的记忆不同,知识库以主题为维度,将用户分享的文章、对话中的洞察、学习材料等整理为互相关联的 Markdown 页面,形成可持续增长的知识网络。
|
||||
|
||||
## 核心概念
|
||||
|
||||
### 知识 vs 记忆
|
||||
|
||||
| 维度 | 知识库(knowledge/) | 长期记忆(memory/) |
|
||||
| --- | --- | --- |
|
||||
| 组织方式 | 按主题分类、互相关联 | 按时间线、日期文件 |
|
||||
| 写入方式 | Agent 主动整理结构化内容 | 上下文裁剪时自动摘要 |
|
||||
| 内容特点 | 提炼后的结构化知识 | 原始对话摘要 |
|
||||
| 典型用途 | 学习笔记、技术文档、项目知识 | 对话历史、事件记录 |
|
||||
|
||||
### 目录结构
|
||||
|
||||
```
|
||||
~/cow/knowledge/
|
||||
├── index.md # 知识索引,所有页面的入口
|
||||
├── log.md # 变更日志,记录每次写入
|
||||
├── concepts/ # 概念类知识
|
||||
│ └── machine-learning.md
|
||||
├── entities/ # 实体类知识(人物、组织、工具)
|
||||
│ └── openai.md
|
||||
└── sources/ # 来源类知识(文章、论文)
|
||||
└── llm-wiki.md
|
||||
```
|
||||
|
||||
目录结构是灵活的 — Agent 会根据实际内容自动创建合适的分类目录。用户也可以通过对话自定义目录组织方式。
|
||||
|
||||
## 自动整理
|
||||
|
||||
知识库的写入是 Agent 的自主行为,在以下场景中触发:
|
||||
|
||||
- **用户分享文章或文档** — Agent 自动提取关键信息,创建结构化知识页面
|
||||
- **对话产生有价值的结论** — Agent 将洞察整理为知识页面,并与已有知识建立关联
|
||||
- **用户主动要求整理** — 用户可以通过对话指导 Agent 组织和更新知识
|
||||
|
||||
每个知识页面都包含与其他页面的交叉引用链接,逐步构建起一个知识图谱。
|
||||
|
||||
## 知识检索
|
||||
|
||||
Agent 在对话中可以通过以下方式检索知识:
|
||||
|
||||
- **索引查阅** — 通过 `knowledge/index.md` 快速定位相关知识页面
|
||||
- **语义搜索** — 通过 `memory_search` 工具对知识库内容进行语义检索
|
||||
- **直接读取** — 通过 `memory_get` 工具读取特定知识文件
|
||||
|
||||
## Web 控制台
|
||||
|
||||
Web 控制台提供了专用的「知识」模块,支持:
|
||||
|
||||
- **文档浏览** — 树状目录结构,可搜索、可折叠,点击查看文档内容
|
||||
- **知识图谱** — 基于 D3.js 的力导向图,可视化展示知识之间的关联关系
|
||||
- **对话联动** — Agent 回复中引用的知识文档链接可直接点击跳转查看
|
||||
|
||||
## CLI 命令
|
||||
|
||||
通过 `/knowledge` 命令管理知识库:
|
||||
|
||||
| 命令 | 说明 |
|
||||
| --- | --- |
|
||||
| `/knowledge` | 显示知识库统计信息 |
|
||||
| `/knowledge list` | 以树状结构显示文件目录 |
|
||||
| `/knowledge on` | 开启知识库功能 |
|
||||
| `/knowledge off` | 关闭知识库功能 |
|
||||
|
||||
## 相关配置
|
||||
|
||||
| 参数 | 说明 | 默认值 |
|
||||
| --- | --- | --- |
|
||||
| `knowledge` | 是否启用个人知识库功能 | `true` |
|
||||
| `agent_workspace` | 工作空间路径,知识库存储在此目录的 `knowledge/` 子目录下 | `~/cow` |
|
||||
@@ -6,19 +6,20 @@ description: CowAgent 支持的模型及推荐选择
|
||||
CowAgent 支持国内外主流厂商的大语言模型,模型接口实现在项目的 `models/` 目录下。
|
||||
|
||||
<Note>
|
||||
Agent 模式下推荐使用以下模型,可根据效果及成本综合选择:MiniMax-M2.7、glm-5-turbo、kimi-k2.5、qwen3.5-plus、claude-sonnet-4-6、gemini-3.1-pro-preview
|
||||
Agent 模式下推荐使用以下模型,可根据效果及成本综合选择:MiniMax-M2.7、glm-5-turbo、kimi-k2.5、qwen3.6-plus、claude-sonnet-4-6、gemini-3.1-pro-preview
|
||||
|
||||
同时支持使用 [LinkAI](https://link-ai.tech) 平台接口,可灵活切换多种模型,并支持知识库、工作流、插件等 Agent 能力。
|
||||
</Note>
|
||||
|
||||
## 配置方式
|
||||
|
||||
根据所选模型,在 `config.json` 中填写对应的模型名称和 API Key 即可。每个模型也支持 OpenAI 兼容方式接入,将 `bot_type` 设为 `openai`,配置 `open_ai_api_base` 和 `open_ai_api_key`。
|
||||
|
||||
同时支持使用 [LinkAI](https://link-ai.tech) 平台接口,可灵活切换多种模型,并支持知识库、工作流、插件等 Agent 能力。
|
||||
|
||||
也可以通过 [Web 控制台](/channels/web) 在线管理模型配置,无需手动编辑配置文件:
|
||||
**方式一(推荐):** 通过 [Web 控制台](/channels/web) 在线管理模型配置,无需手动编辑配置文件:
|
||||
|
||||
<img width="850" src="https://cdn.link-ai.tech/doc/20260227173811.png" />
|
||||
|
||||
**方式二:** 手动编辑 `config.json`,根据所选模型填写对应的模型名称和 API Key。每个模型也支持 OpenAI 兼容方式接入,将 `bot_type` 设为 `openai`,配置 `open_ai_api_base` 和 `open_ai_api_key` 即可。
|
||||
|
||||
|
||||
## 支持的模型
|
||||
|
||||
<CardGroup cols={2}>
|
||||
@@ -29,7 +30,7 @@ CowAgent 支持国内外主流厂商的大语言模型,模型接口实现在
|
||||
glm-5-turbo、glm-5 等系列模型
|
||||
</Card>
|
||||
<Card title="通义千问 Qwen" href="/models/qwen">
|
||||
qwen3.5-plus、qwen3-max 等
|
||||
qwen3.6-plus、qwen3-max 等
|
||||
</Card>
|
||||
<Card title="Kimi" href="/models/kimi">
|
||||
kimi-k2.5、kimi-k2 等
|
||||
@@ -54,6 +55,7 @@ CowAgent 支持国内外主流厂商的大语言模型,模型接口实现在
|
||||
</Card>
|
||||
</CardGroup>
|
||||
|
||||
|
||||
<Tip>
|
||||
全部模型名称可参考项目 [`common/const.py`](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/common/const.py) 文件。
|
||||
</Tip>
|
||||
|
||||
@@ -5,14 +5,14 @@ description: 通义千问模型配置
|
||||
|
||||
```json
|
||||
{
|
||||
"model": "qwen3.5-plus",
|
||||
"model": "qwen3.6-plus",
|
||||
"dashscope_api_key": "YOUR_API_KEY"
|
||||
}
|
||||
```
|
||||
|
||||
| 参数 | 说明 |
|
||||
| --- | --- |
|
||||
| `model` | 可填 `qwen3.5-plus`、`qwen3-max`、`qwen-max`、`qwen-plus`、`qwen-turbo`、`qwq-plus` 等 |
|
||||
| `model` | 可填 `qwen3.6-plus`、`qwen3.5-plus`、`qwen3-max`、`qwen-max`、`qwen-plus`、`qwen-turbo`、`qwq-plus` 等 |
|
||||
| `dashscope_api_key` | 在 [百炼控制台](https://bailian.console.aliyun.com/?tab=model#/api-key) 创建,参考 [官方文档](https://bailian.console.aliyun.com/?tab=api#/api) |
|
||||
|
||||
也支持 OpenAI 兼容方式接入:
|
||||
@@ -20,7 +20,7 @@ description: 通义千问模型配置
|
||||
```json
|
||||
{
|
||||
"bot_type": "openai",
|
||||
"model": "qwen3.5-plus",
|
||||
"model": "qwen3.6-plus",
|
||||
"open_ai_api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
|
||||
"open_ai_api_key": "YOUR_API_KEY"
|
||||
}
|
||||
|
||||
@@ -12,7 +12,7 @@ description: CowAgent 2.0.5 - Cow CLI、Skill Hub 开源、浏览器工具、企
|
||||
- **web控制台**:Web 控制台输入框输入 `/` 即可弹出指令菜单,支持方向键回溯历史输入
|
||||
- **Windows 支持**:新增 PowerShell 一键安装脚本 `scripts/run.ps1`,同时支持 `cow` 命令
|
||||
|
||||
相关文档:[命令总览](https://docs.cowagent.ai/commands)
|
||||
相关文档:[命令总览](https://docs.cowagent.ai/cli)
|
||||
|
||||
<img src="https://cdn.link-ai.tech/doc/20260401114549.png" width="750" />
|
||||
|
||||
|
||||
@@ -18,7 +18,7 @@ CowAgent 提供多种方式获取技能:
|
||||
- **URL** — 从 zip 压缩包或 SKILL.md 链接安装
|
||||
- **对话创建** — 通过自然语言对话让 Agent 自动创建技能
|
||||
|
||||
详细安装方式参考 [安装技能](/skills/install) 和 [技能管理命令](/commands/skill)。也可以通过对话 [创建技能](/skills/create),或向 [Skill Hub](https://skills.cowagent.ai/submit) 贡献你的技能。
|
||||
详细安装方式参考 [安装技能](/skills/install) 和 [技能管理命令](/cli/skill)。也可以通过对话 [创建技能](/skills/create),或向 [Skill Hub](https://skills.cowagent.ai/submit) 贡献你的技能。
|
||||
|
||||
## 技能加载优先级
|
||||
|
||||
|
||||
@@ -62,5 +62,5 @@ CowAgent 支持通过统一的 `install` 命令安装来自 **[Cow 技能广场]
|
||||
```
|
||||
|
||||
<Tip>
|
||||
以上所有命令在终端中使用时,将 `/skill` 替换为 `cow skill` 即可。完整命令说明参考 [技能管理命令](/commands/skill)。
|
||||
以上所有命令在终端中使用时,将 `/skill` 替换为 `cow skill` 即可。完整命令说明参考 [技能管理命令](/cli/skill)。
|
||||
</Tip>
|
||||
|
||||
@@ -1,9 +1,11 @@
|
||||
---
|
||||
title: memory - 记忆
|
||||
description: 搜索和读取长期记忆
|
||||
title: memory - 记忆与知识
|
||||
description: 搜索和读取长期记忆及知识库文件
|
||||
---
|
||||
|
||||
记忆工具包含两个子工具:`memory_search`(搜索记忆)和 `memory_get`(读取记忆文件)。
|
||||
记忆工具包含两个子工具:`memory_search`(搜索记忆)和 `memory_get`(读取记忆或知识文件)。
|
||||
|
||||
当 [知识库](/knowledge) 功能开启时,这两个工具同时支持访问 `memory/` 和 `knowledge/` 目录下的文件。
|
||||
|
||||
## 依赖
|
||||
|
||||
@@ -11,7 +13,7 @@ description: 搜索和读取长期记忆
|
||||
|
||||
## memory_search
|
||||
|
||||
搜索历史记忆,支持关键词和向量混合检索。
|
||||
搜索历史记忆和知识库内容,支持关键词和向量混合检索。
|
||||
|
||||
| 参数 | 类型 | 必填 | 说明 |
|
||||
| --- | --- | --- | --- |
|
||||
@@ -19,11 +21,11 @@ description: 搜索和读取长期记忆
|
||||
|
||||
## memory_get
|
||||
|
||||
读取特定记忆文件的内容。
|
||||
读取特定记忆文件或知识库文件的内容。
|
||||
|
||||
| 参数 | 类型 | 必填 | 说明 |
|
||||
| --- | --- | --- | --- |
|
||||
| `path` | string | 是 | 记忆文件的相对路径(如 `MEMORY.md`、`memory/2026-01-01.md`) |
|
||||
| `path` | string | 是 | 文件的相对路径(如 `MEMORY.md`、`memory/2026-01-01.md`、`knowledge/concepts/rag.md`) |
|
||||
| `start_line` | integer | 否 | 起始行号 |
|
||||
| `end_line` | integer | 否 | 结束行号 |
|
||||
|
||||
@@ -34,3 +36,8 @@ Agent 会在以下场景自动调用记忆工具:
|
||||
- 用户分享重要信息时 → 存储到记忆
|
||||
- 需要参考历史信息时 → 搜索相关记忆
|
||||
- 对话达到一定长度时 → 提取摘要存储
|
||||
- 讨论到专业知识时 → 检索知识库中的相关页面
|
||||
|
||||
<Note>
|
||||
当 `knowledge` 配置为 `false` 时,工具的描述和搜索范围会自动调整为仅包含记忆文件。
|
||||
</Note>
|
||||
|
||||
@@ -5,14 +5,49 @@ description: 分析图片内容(识别、描述、OCR 等)
|
||||
|
||||
使用 Vision API 分析本地图片或图片 URL,支持内容描述、文字提取(OCR)、物体识别等。
|
||||
|
||||
## 依赖
|
||||
## 模型选择
|
||||
|
||||
需要配置至少一个 API Key(通过 `env_config` 工具或工作空间 `.env` 文件配置):
|
||||
Vision 工具采用多级自动选择 + 自动兜底策略,无需手动配置即可使用:
|
||||
|
||||
| 后端 | 环境变量 | 优先级 |
|
||||
1. **主模型** — 优先使用当前配置的主模型进行图像识别(需要是多模态模型)
|
||||
2. **其他已配置模型** — 自动发现已配置 API Key 的其他多模态模型作为备选
|
||||
|
||||
如果当前 provider 调用失败,会自动尝试下一个,直到成功或全部失败。
|
||||
|
||||
### 支持的模型
|
||||
|
||||
| 厂商 | 视觉模型 | 说明 |
|
||||
| --- | --- | --- |
|
||||
| OpenAI | `OPENAI_API_KEY` | 优先使用 |
|
||||
| LinkAI | `LINKAI_API_KEY` | 备选 |
|
||||
| OpenAI / 兼容协议 | 使用主模型 | 支持所有 OpenAI 协议兼容的多模态模型 |
|
||||
| 通义千问 (DashScope) | 使用主模型 | 例如 qwen3.6-plus 等 |
|
||||
| Claude | 使用主模型 | Anthropic 原生图像格式 |
|
||||
| Gemini | 使用主模型 | inlineData 格式 |
|
||||
| 豆包 (Doubao) | 使用主模型 | doubao-seed-2-0 系列原生支持 |
|
||||
| Kimi (Moonshot) | 使用主模型 | kimi-k2.5 原生支持 |
|
||||
| 智谱 AI | glm-5v-turbo | 固定使用视觉专用模型 |
|
||||
| MiniMax | MiniMax-Text-01 | 固定使用视觉专用模型 |
|
||||
|
||||
<Note>
|
||||
智谱和 MiniMax 的文本模型不支持图像理解,因此始终使用对应的视觉专用模型,无需手动指定。
|
||||
</Note>
|
||||
|
||||
> 当 `use_linkai=true` 时,默认使用 LinkAI 的多模态模型进行
|
||||
|
||||
## 自定义配置
|
||||
|
||||
如果希望指定 Vision 使用的模型,可在 `config.json` 中配置,例如:
|
||||
|
||||
```json
|
||||
{
|
||||
"tool": {
|
||||
"vision": {
|
||||
"model": "gpt-4o"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
大多数情况下无需配置,主模型支持多模态或配置任意一个支持视觉的 API Key 即可自动工作。
|
||||
|
||||
## 参数
|
||||
|
||||
@@ -20,17 +55,18 @@ description: 分析图片内容(识别、描述、OCR 等)
|
||||
| --- | --- | --- | --- |
|
||||
| `image` | string | 是 | 本地文件路径或 HTTP(S) 图片 URL |
|
||||
| `question` | string | 是 | 对图片提出的问题 |
|
||||
| `model` | string | 否 | 模型名称(默认 gpt-4.1-mini) |
|
||||
|
||||
支持的图片格式:jpg、jpeg、png、gif、webp
|
||||
|
||||
|
||||
|
||||
## 使用场景
|
||||
|
||||
- 描述图片中的内容
|
||||
- 提取图片中的文字(OCR)
|
||||
- 识别物体、颜色、场景
|
||||
- 分析截图、文档扫描件
|
||||
- 分析截图、文档扫描图片等
|
||||
|
||||
<Note>
|
||||
超过 1MB 的图片会自动压缩后上传。如果未配置任何 Vision API Key,该工具不会被加载。
|
||||
超过 1MB 的图片会自动压缩后上传,所有图片(包括远程 URL)会统一转为 base64 传输,确保兼容所有模型后端。
|
||||
</Note>
|
||||
|
||||
@@ -1,214 +0,0 @@
|
||||
# encoding:utf-8
|
||||
|
||||
import json
|
||||
import time
|
||||
from typing import List, Tuple
|
||||
|
||||
import openai
|
||||
from models.openai.openai_compat import RateLimitError, Timeout, APIError, APIConnectionError
|
||||
import broadscope_bailian
|
||||
from broadscope_bailian import ChatQaMessage
|
||||
|
||||
from models.bot import Bot
|
||||
from models.ali.ali_qwen_session import AliQwenSession
|
||||
from models.session_manager import SessionManager
|
||||
from bridge.context import ContextType
|
||||
from bridge.reply import Reply, ReplyType
|
||||
from common.log import logger
|
||||
from common import const
|
||||
from config import conf, load_config
|
||||
|
||||
class AliQwenBot(Bot):
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
self.api_key_expired_time = self.set_api_key()
|
||||
self.sessions = SessionManager(AliQwenSession, model=conf().get("model", const.QWEN))
|
||||
|
||||
def api_key_client(self):
|
||||
return broadscope_bailian.AccessTokenClient(access_key_id=self.access_key_id(), access_key_secret=self.access_key_secret())
|
||||
|
||||
def access_key_id(self):
|
||||
return conf().get("qwen_access_key_id")
|
||||
|
||||
def access_key_secret(self):
|
||||
return conf().get("qwen_access_key_secret")
|
||||
|
||||
def agent_key(self):
|
||||
return conf().get("qwen_agent_key")
|
||||
|
||||
def app_id(self):
|
||||
return conf().get("qwen_app_id")
|
||||
|
||||
def node_id(self):
|
||||
return conf().get("qwen_node_id", "")
|
||||
|
||||
def temperature(self):
|
||||
return conf().get("temperature", 0.2 )
|
||||
|
||||
def top_p(self):
|
||||
return conf().get("top_p", 1)
|
||||
|
||||
def reply(self, query, context=None):
|
||||
# acquire reply content
|
||||
if context.type == ContextType.TEXT:
|
||||
logger.info("[QWEN] query={}".format(query))
|
||||
|
||||
session_id = context["session_id"]
|
||||
reply = None
|
||||
clear_memory_commands = conf().get("clear_memory_commands", ["#清除记忆"])
|
||||
if query in clear_memory_commands:
|
||||
self.sessions.clear_session(session_id)
|
||||
reply = Reply(ReplyType.INFO, "记忆已清除")
|
||||
elif query == "#清除所有":
|
||||
self.sessions.clear_all_session()
|
||||
reply = Reply(ReplyType.INFO, "所有人记忆已清除")
|
||||
elif query == "#更新配置":
|
||||
load_config()
|
||||
reply = Reply(ReplyType.INFO, "配置已更新")
|
||||
if reply:
|
||||
return reply
|
||||
session = self.sessions.session_query(query, session_id)
|
||||
logger.debug("[QWEN] session query={}".format(session.messages))
|
||||
|
||||
reply_content = self.reply_text(session)
|
||||
logger.debug(
|
||||
"[QWEN] new_query={}, session_id={}, reply_cont={}, completion_tokens={}".format(
|
||||
session.messages,
|
||||
session_id,
|
||||
reply_content["content"],
|
||||
reply_content["completion_tokens"],
|
||||
)
|
||||
)
|
||||
if reply_content["completion_tokens"] == 0 and len(reply_content["content"]) > 0:
|
||||
reply = Reply(ReplyType.ERROR, reply_content["content"])
|
||||
elif reply_content["completion_tokens"] > 0:
|
||||
self.sessions.session_reply(reply_content["content"], session_id, reply_content["total_tokens"])
|
||||
reply = Reply(ReplyType.TEXT, reply_content["content"])
|
||||
else:
|
||||
reply = Reply(ReplyType.ERROR, reply_content["content"])
|
||||
logger.debug("[QWEN] reply {} used 0 tokens.".format(reply_content))
|
||||
return reply
|
||||
|
||||
else:
|
||||
reply = Reply(ReplyType.ERROR, "Bot不支持处理{}类型的消息".format(context.type))
|
||||
return reply
|
||||
|
||||
def reply_text(self, session: AliQwenSession, retry_count=0) -> dict:
|
||||
"""
|
||||
call bailian's ChatCompletion to get the answer
|
||||
:param session: a conversation session
|
||||
:param retry_count: retry count
|
||||
:return: {}
|
||||
"""
|
||||
try:
|
||||
prompt, history = self.convert_messages_format(session.messages)
|
||||
self.update_api_key_if_expired()
|
||||
# NOTE 阿里百炼的call()函数未提供temperature参数,考虑到temperature和top_p参数作用相同,取两者较小的值作为top_p参数传入,详情见文档 https://help.aliyun.com/document_detail/2587502.htm
|
||||
response = broadscope_bailian.Completions().call(app_id=self.app_id(), prompt=prompt, history=history, top_p=min(self.temperature(), self.top_p()))
|
||||
completion_content = self.get_completion_content(response, self.node_id())
|
||||
completion_tokens, total_tokens = self.calc_tokens(session.messages, completion_content)
|
||||
return {
|
||||
"total_tokens": total_tokens,
|
||||
"completion_tokens": completion_tokens,
|
||||
"content": completion_content,
|
||||
}
|
||||
except Exception as e:
|
||||
need_retry = retry_count < 2
|
||||
result = {"completion_tokens": 0, "content": "我现在有点累了,等会再来吧"}
|
||||
if isinstance(e, RateLimitError):
|
||||
logger.warn("[QWEN] RateLimitError: {}".format(e))
|
||||
result["content"] = "提问太快啦,请休息一下再问我吧"
|
||||
if need_retry:
|
||||
time.sleep(20)
|
||||
elif isinstance(e, Timeout):
|
||||
logger.warn("[QWEN] Timeout: {}".format(e))
|
||||
result["content"] = "我没有收到你的消息"
|
||||
if need_retry:
|
||||
time.sleep(5)
|
||||
elif isinstance(e, APIError):
|
||||
logger.warn("[QWEN] Bad Gateway: {}".format(e))
|
||||
result["content"] = "请再问我一次"
|
||||
if need_retry:
|
||||
time.sleep(10)
|
||||
elif isinstance(e, APIConnectionError):
|
||||
logger.warn("[QWEN] APIConnectionError: {}".format(e))
|
||||
need_retry = False
|
||||
result["content"] = "我连接不到你的网络"
|
||||
else:
|
||||
logger.exception("[QWEN] Exception: {}".format(e))
|
||||
need_retry = False
|
||||
self.sessions.clear_session(session.session_id)
|
||||
|
||||
if need_retry:
|
||||
logger.warn("[QWEN] 第{}次重试".format(retry_count + 1))
|
||||
return self.reply_text(session, retry_count + 1)
|
||||
else:
|
||||
return result
|
||||
|
||||
def set_api_key(self):
|
||||
api_key, expired_time = self.api_key_client().create_token(agent_key=self.agent_key())
|
||||
broadscope_bailian.api_key = api_key
|
||||
return expired_time
|
||||
|
||||
def update_api_key_if_expired(self):
|
||||
if time.time() > self.api_key_expired_time:
|
||||
self.api_key_expired_time = self.set_api_key()
|
||||
|
||||
def convert_messages_format(self, messages) -> Tuple[str, List[ChatQaMessage]]:
|
||||
history = []
|
||||
user_content = ''
|
||||
assistant_content = ''
|
||||
system_content = ''
|
||||
for message in messages:
|
||||
role = message.get('role')
|
||||
if role == 'user':
|
||||
user_content += message.get('content')
|
||||
elif role == 'assistant':
|
||||
assistant_content = message.get('content')
|
||||
history.append(ChatQaMessage(user_content, assistant_content))
|
||||
user_content = ''
|
||||
assistant_content = ''
|
||||
elif role =='system':
|
||||
system_content += message.get('content')
|
||||
if user_content == '':
|
||||
raise Exception('no user message')
|
||||
if system_content != '':
|
||||
# NOTE 模拟系统消息,测试发现人格描述以"你需要扮演ChatGPT"开头能够起作用,而以"你是ChatGPT"开头模型会直接否认
|
||||
system_qa = ChatQaMessage(system_content, '好的,我会严格按照你的设定回答问题')
|
||||
history.insert(0, system_qa)
|
||||
logger.debug("[QWEN] converted qa messages: {}".format([item.to_dict() for item in history]))
|
||||
logger.debug("[QWEN] user content as prompt: {}".format(user_content))
|
||||
return user_content, history
|
||||
|
||||
def get_completion_content(self, response, node_id):
|
||||
if not response['Success']:
|
||||
return f"[ERROR]\n{response['Code']}:{response['Message']}"
|
||||
text = response['Data']['Text']
|
||||
if node_id == '':
|
||||
return text
|
||||
# TODO: 当使用流程编排创建大模型应用时,响应结构如下,最终结果在['finalResult'][node_id]['response']['text']中,暂时先这么写
|
||||
# {
|
||||
# 'Success': True,
|
||||
# 'Code': None,
|
||||
# 'Message': None,
|
||||
# 'Data': {
|
||||
# 'ResponseId': '9822f38dbacf4c9b8daf5ca03a2daf15',
|
||||
# 'SessionId': 'session_id',
|
||||
# 'Text': '{"finalResult":{"LLM_T7islK":{"params":{"modelId":"qwen-plus-v1","prompt":"${systemVars.query}${bizVars.Text}"},"response":{"text":"作为一个AI语言模型,我没有年龄,因为我没有生日。\n我只是一个程序,没有生命和身体。"}}}}',
|
||||
# 'Thoughts': [],
|
||||
# 'Debug': {},
|
||||
# 'DocReferences': []
|
||||
# },
|
||||
# 'RequestId': '8e11d31551ce4c3f83f49e6e0dd998b0',
|
||||
# 'Failed': None
|
||||
# }
|
||||
text_dict = json.loads(text)
|
||||
completion_content = text_dict['finalResult'][node_id]['response']['text']
|
||||
return completion_content
|
||||
|
||||
def calc_tokens(self, messages, completion_content):
|
||||
completion_tokens = len(completion_content)
|
||||
prompt_tokens = 0
|
||||
for message in messages:
|
||||
prompt_tokens += len(message["content"])
|
||||
return completion_tokens, prompt_tokens + completion_tokens
|
||||
@@ -1,62 +0,0 @@
|
||||
from models.session_manager import Session
|
||||
from common.log import logger
|
||||
|
||||
"""
|
||||
e.g.
|
||||
[
|
||||
{"role": "system", "content": "You are a helpful assistant."},
|
||||
{"role": "user", "content": "Who won the world series in 2020?"},
|
||||
{"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
|
||||
{"role": "user", "content": "Where was it played?"}
|
||||
]
|
||||
"""
|
||||
|
||||
class AliQwenSession(Session):
|
||||
def __init__(self, session_id, system_prompt=None, model="qianwen"):
|
||||
super().__init__(session_id, system_prompt)
|
||||
self.model = model
|
||||
self.reset()
|
||||
|
||||
def discard_exceeding(self, max_tokens, cur_tokens=None):
|
||||
precise = True
|
||||
try:
|
||||
cur_tokens = self.calc_tokens()
|
||||
except Exception as e:
|
||||
precise = False
|
||||
if cur_tokens is None:
|
||||
raise e
|
||||
logger.debug("Exception when counting tokens precisely for query: {}".format(e))
|
||||
while cur_tokens > max_tokens:
|
||||
if len(self.messages) > 2:
|
||||
self.messages.pop(1)
|
||||
elif len(self.messages) == 2 and self.messages[1]["role"] == "assistant":
|
||||
self.messages.pop(1)
|
||||
if precise:
|
||||
cur_tokens = self.calc_tokens()
|
||||
else:
|
||||
cur_tokens = cur_tokens - max_tokens
|
||||
break
|
||||
elif len(self.messages) == 2 and self.messages[1]["role"] == "user":
|
||||
logger.warn("user message exceed max_tokens. total_tokens={}".format(cur_tokens))
|
||||
break
|
||||
else:
|
||||
logger.debug("max_tokens={}, total_tokens={}, len(messages)={}".format(max_tokens, cur_tokens, len(self.messages)))
|
||||
break
|
||||
if precise:
|
||||
cur_tokens = self.calc_tokens()
|
||||
else:
|
||||
cur_tokens = cur_tokens - max_tokens
|
||||
return cur_tokens
|
||||
|
||||
def calc_tokens(self):
|
||||
return num_tokens_from_messages(self.messages, self.model)
|
||||
|
||||
def num_tokens_from_messages(messages, model):
|
||||
"""Returns the number of tokens used by a list of messages."""
|
||||
# 官方token计算规则:"对于中文文本来说,1个token通常对应一个汉字;对于英文文本来说,1个token通常对应3至4个字母或1个单词"
|
||||
# 详情请产看文档:https://help.aliyun.com/document_detail/2586397.html
|
||||
# 目前根据字符串长度粗略估计token数,不影响正常使用
|
||||
tokens = 0
|
||||
for msg in messages:
|
||||
tokens += len(msg["content"])
|
||||
return tokens
|
||||
@@ -2,12 +2,27 @@
|
||||
Auto-replay chat robot abstract class
|
||||
"""
|
||||
|
||||
|
||||
from bridge.context import Context
|
||||
from bridge.reply import Reply
|
||||
|
||||
|
||||
class Bot(object):
|
||||
"""
|
||||
Base class for all chat-bot implementations.
|
||||
|
||||
Subclasses may also implement:
|
||||
|
||||
call_with_tools(messages, tools=None, stream=False, **kwargs)
|
||||
-> dict | generator (OpenAI-compatible format)
|
||||
|
||||
call_vision(image_url, question, model=None, max_tokens=1000)
|
||||
-> dict with keys: model, content, usage (or error/message)
|
||||
|
||||
These are NOT defined here to avoid shadowing concrete implementations
|
||||
provided by mixin classes (e.g. OpenAICompatibleBot) in the MRO.
|
||||
Use ``hasattr(bot, 'call_vision')`` to detect support at runtime.
|
||||
"""
|
||||
|
||||
def reply(self, query, context: Context = None) -> Reply:
|
||||
"""
|
||||
bot auto-reply content
|
||||
|
||||
@@ -46,10 +46,7 @@ def create_bot(bot_type):
|
||||
elif bot_type == const.CLAUDEAPI:
|
||||
from models.claudeapi.claude_api_bot import ClaudeAPIBot
|
||||
return ClaudeAPIBot()
|
||||
elif bot_type == const.QWEN:
|
||||
from models.ali.ali_qwen_bot import AliQwenBot
|
||||
return AliQwenBot()
|
||||
elif bot_type == const.QWEN_DASHSCOPE:
|
||||
elif bot_type in (const.QWEN, const.QWEN_DASHSCOPE):
|
||||
from models.dashscope.dashscope_bot import DashscopeBot
|
||||
return DashscopeBot()
|
||||
elif bot_type == const.GEMINI:
|
||||
|
||||
@@ -1,7 +1,10 @@
|
||||
# encoding:utf-8
|
||||
|
||||
import base64
|
||||
import json
|
||||
import re
|
||||
import time
|
||||
from typing import Optional
|
||||
|
||||
import requests
|
||||
|
||||
@@ -224,6 +227,79 @@ class ClaudeAPIBot(Bot, OpenAIImage):
|
||||
return 64000
|
||||
return 8192
|
||||
|
||||
@staticmethod
|
||||
def _parse_data_url(data_url: str):
|
||||
"""Parse a data:<mime>;base64,<data> URL into (media_type, base64_data)."""
|
||||
m = re.match(r"^data:([^;]+);base64,(.+)$", data_url, re.DOTALL)
|
||||
if m:
|
||||
return m.group(1), m.group(2)
|
||||
return None, None
|
||||
|
||||
def call_vision(self, image_url: str, question: str,
|
||||
model: Optional[str] = None,
|
||||
max_tokens: int = 1000) -> dict:
|
||||
"""Analyze an image using Claude Messages API (native image blocks)."""
|
||||
try:
|
||||
actual_model = model or self._model_mapping(conf().get("model"))
|
||||
|
||||
# Build Claude-native image content block
|
||||
if image_url.startswith("data:"):
|
||||
media_type, b64_data = self._parse_data_url(image_url)
|
||||
if not b64_data:
|
||||
return {"error": True, "message": "Invalid base64 data URL"}
|
||||
image_block = {
|
||||
"type": "image",
|
||||
"source": {"type": "base64",
|
||||
"media_type": media_type or "image/jpeg",
|
||||
"data": b64_data},
|
||||
}
|
||||
else:
|
||||
image_block = {
|
||||
"type": "image",
|
||||
"source": {"type": "url", "url": image_url},
|
||||
}
|
||||
|
||||
data = {
|
||||
"model": actual_model,
|
||||
"max_tokens": max_tokens,
|
||||
"messages": [{
|
||||
"role": "user",
|
||||
"content": [
|
||||
image_block,
|
||||
{"type": "text", "text": question},
|
||||
],
|
||||
}],
|
||||
}
|
||||
|
||||
headers = {
|
||||
"x-api-key": self.api_key,
|
||||
"anthropic-version": "2023-06-01",
|
||||
"content-type": "application/json",
|
||||
}
|
||||
proxies = {"http": self.proxy, "https": self.proxy} if self.proxy else None
|
||||
resp = requests.post(f"{self.api_base}/messages",
|
||||
headers=headers, json=data, proxies=proxies)
|
||||
|
||||
if resp.status_code != 200:
|
||||
return {"error": True, "message": f"HTTP {resp.status_code}: {resp.text[:300]}"}
|
||||
|
||||
body = resp.json()
|
||||
text_parts = [b.get("text", "") for b in body.get("content", [])
|
||||
if b.get("type") == "text"]
|
||||
usage = body.get("usage", {})
|
||||
return {
|
||||
"model": actual_model,
|
||||
"content": "".join(text_parts),
|
||||
"usage": {
|
||||
"prompt_tokens": usage.get("input_tokens", 0),
|
||||
"completion_tokens": usage.get("output_tokens", 0),
|
||||
"total_tokens": usage.get("input_tokens", 0) + usage.get("output_tokens", 0),
|
||||
},
|
||||
}
|
||||
except Exception as e:
|
||||
logger.error(f"[CLAUDE] call_vision error: {e}")
|
||||
return {"error": True, "message": str(e)}
|
||||
|
||||
def call_with_tools(self, messages, tools=None, stream=False, **kwargs):
|
||||
"""
|
||||
Call Claude API with tool support for agent integration
|
||||
@@ -429,8 +505,21 @@ class ClaudeAPIBot(Bot, OpenAIImage):
|
||||
delta = event.get("delta", {})
|
||||
delta_type = delta.get("type")
|
||||
|
||||
if delta_type == "text_delta":
|
||||
# Text content
|
||||
if delta_type == "thinking_delta":
|
||||
thinking_text = delta.get("thinking", "")
|
||||
if thinking_text:
|
||||
yield {
|
||||
"choices": [{
|
||||
"index": 0,
|
||||
"delta": {
|
||||
"role": "assistant",
|
||||
"reasoning_content": thinking_text
|
||||
},
|
||||
"finish_reason": None
|
||||
}]
|
||||
}
|
||||
|
||||
elif delta_type == "text_delta":
|
||||
content = delta.get("text", "")
|
||||
yield {
|
||||
"id": event.get("id", ""),
|
||||
|
||||
@@ -1,6 +1,8 @@
|
||||
# encoding:utf-8
|
||||
|
||||
import json
|
||||
from typing import Optional
|
||||
|
||||
from models.bot import Bot
|
||||
from models.session_manager import SessionManager
|
||||
from bridge.context import ContextType
|
||||
@@ -26,15 +28,15 @@ dashscope_models = {
|
||||
|
||||
# Model name prefixes that require MultiModalConversation API instead of Generation API.
|
||||
# Qwen3.5+ series are omni models that only support MultiModalConversation.
|
||||
MULTIMODAL_MODEL_PREFIXES = ("qwen3.5-",)
|
||||
MULTIMODAL_MODEL_PREFIXES = ("qwen3.5-", "qwen3.6-")
|
||||
|
||||
|
||||
# Qwen对话模型API
|
||||
class DashscopeBot(Bot):
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
self.sessions = SessionManager(DashscopeSession, model=conf().get("model") or "qwen-plus")
|
||||
self.model_name = conf().get("model") or "qwen-plus"
|
||||
self.sessions = SessionManager(DashscopeSession, model=conf().get("model") or "qwen3.6-plus")
|
||||
self.model_name = conf().get("model") or "qwen3.6-plus"
|
||||
self.client = dashscope.Generation
|
||||
api_key = conf().get("dashscope_api_key")
|
||||
if api_key:
|
||||
@@ -153,6 +155,56 @@ class DashscopeBot(Bot):
|
||||
else:
|
||||
return result
|
||||
|
||||
def call_vision(self, image_url: str, question: str,
|
||||
model: Optional[str] = None,
|
||||
max_tokens: int = 1000) -> dict:
|
||||
"""Analyze an image using DashScope MultiModalConversation API."""
|
||||
try:
|
||||
dashscope.api_key = self.api_key
|
||||
vision_model = model or "qwen-vl-max"
|
||||
|
||||
# DashScope multimodal format: {"image": url} + {"text": question}
|
||||
messages = [{
|
||||
"role": "user",
|
||||
"content": [
|
||||
{"image": image_url},
|
||||
{"text": question},
|
||||
],
|
||||
}]
|
||||
|
||||
response = MultiModalConversation.call(
|
||||
model=vision_model,
|
||||
messages=messages,
|
||||
max_tokens=max_tokens,
|
||||
)
|
||||
|
||||
if response.status_code != HTTPStatus.OK:
|
||||
return {
|
||||
"error": True,
|
||||
"message": f"{response.code} - {response.message}",
|
||||
}
|
||||
|
||||
resp_dict = self._response_to_dict(response)
|
||||
choice = resp_dict["output"]["choices"][0]
|
||||
content = choice.get("message", {}).get("content", "")
|
||||
if isinstance(content, list):
|
||||
content = "".join(
|
||||
item.get("text", "") for item in content if isinstance(item, dict)
|
||||
)
|
||||
usage = resp_dict.get("usage", {})
|
||||
return {
|
||||
"model": vision_model,
|
||||
"content": content,
|
||||
"usage": {
|
||||
"prompt_tokens": usage.get("input_tokens", 0),
|
||||
"completion_tokens": usage.get("output_tokens", 0),
|
||||
"total_tokens": usage.get("total_tokens", 0),
|
||||
},
|
||||
}
|
||||
except Exception as e:
|
||||
logger.error(f"[DASHSCOPE] call_vision error: {e}")
|
||||
return {"error": True, "message": str(e)}
|
||||
|
||||
def call_with_tools(self, messages, tools=None, stream=False, **kwargs):
|
||||
"""
|
||||
Call DashScope API with tool support for agent integration
|
||||
|
||||
@@ -2,6 +2,7 @@
|
||||
|
||||
import json
|
||||
import time
|
||||
from typing import Optional
|
||||
|
||||
import requests
|
||||
from models.bot import Bot
|
||||
@@ -147,6 +148,49 @@ class DoubaoBot(Bot):
|
||||
else:
|
||||
return result
|
||||
|
||||
def call_vision(self, image_url: str, question: str,
|
||||
model: Optional[str] = None,
|
||||
max_tokens: int = 1000) -> dict:
|
||||
"""Analyze an image using Doubao (Volcengine Ark) OpenAI-compatible API."""
|
||||
try:
|
||||
vision_model = model or self.args.get("model", "doubao-seed-2-0-pro-260215")
|
||||
payload = {
|
||||
"model": vision_model,
|
||||
"max_tokens": max_tokens,
|
||||
"messages": [{
|
||||
"role": "user",
|
||||
"content": [
|
||||
{"type": "text", "text": question},
|
||||
{"type": "image_url", "image_url": {"url": image_url}},
|
||||
],
|
||||
}],
|
||||
}
|
||||
headers = {
|
||||
"Authorization": f"Bearer {self.api_key}",
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
resp = requests.post(f"{self.base_url}/chat/completions",
|
||||
headers=headers, json=payload, timeout=60)
|
||||
if resp.status_code != 200:
|
||||
return {"error": True, "message": f"HTTP {resp.status_code}: {resp.text[:300]}"}
|
||||
data = resp.json()
|
||||
if "error" in data:
|
||||
return {"error": True, "message": data["error"].get("message", str(data["error"]))}
|
||||
content = data.get("choices", [{}])[0].get("message", {}).get("content", "")
|
||||
usage = data.get("usage", {})
|
||||
return {
|
||||
"model": vision_model,
|
||||
"content": content,
|
||||
"usage": {
|
||||
"prompt_tokens": usage.get("prompt_tokens", 0),
|
||||
"completion_tokens": usage.get("completion_tokens", 0),
|
||||
"total_tokens": usage.get("total_tokens", 0),
|
||||
},
|
||||
}
|
||||
except Exception as e:
|
||||
logger.error(f"[DOUBAO] call_vision error: {e}")
|
||||
return {"error": True, "message": str(e)}
|
||||
|
||||
# ==================== Agent mode support ====================
|
||||
|
||||
def call_with_tools(self, messages, tools=None, stream: bool = False, **kwargs):
|
||||
@@ -434,31 +478,37 @@ class DoubaoBot(Bot):
|
||||
continue
|
||||
|
||||
if role == "user":
|
||||
text_parts = []
|
||||
tool_results = []
|
||||
has_tool_result = any(
|
||||
isinstance(b, dict) and b.get("type") == "tool_result" for b in content
|
||||
)
|
||||
if has_tool_result:
|
||||
text_parts = []
|
||||
tool_results = []
|
||||
|
||||
for block in content:
|
||||
if not isinstance(block, dict):
|
||||
continue
|
||||
if block.get("type") == "text":
|
||||
text_parts.append(block.get("text", ""))
|
||||
elif block.get("type") == "tool_result":
|
||||
tool_call_id = block.get("tool_use_id") or ""
|
||||
result_content = block.get("content", "")
|
||||
if not isinstance(result_content, str):
|
||||
result_content = json.dumps(result_content, ensure_ascii=False)
|
||||
tool_results.append({
|
||||
"role": "tool",
|
||||
"tool_call_id": tool_call_id,
|
||||
"content": result_content
|
||||
})
|
||||
for block in content:
|
||||
if not isinstance(block, dict):
|
||||
continue
|
||||
if block.get("type") == "text":
|
||||
text_parts.append(block.get("text", ""))
|
||||
elif block.get("type") == "tool_result":
|
||||
tool_call_id = block.get("tool_use_id") or ""
|
||||
result_content = block.get("content", "")
|
||||
if not isinstance(result_content, str):
|
||||
result_content = json.dumps(result_content, ensure_ascii=False)
|
||||
tool_results.append({
|
||||
"role": "tool",
|
||||
"tool_call_id": tool_call_id,
|
||||
"content": result_content
|
||||
})
|
||||
|
||||
# Tool results first (must come right after assistant with tool_calls)
|
||||
for tr in tool_results:
|
||||
converted.append(tr)
|
||||
for tr in tool_results:
|
||||
converted.append(tr)
|
||||
|
||||
if text_parts:
|
||||
converted.append({"role": "user", "content": "\n".join(text_parts)})
|
||||
if text_parts:
|
||||
converted.append({"role": "user", "content": "\n".join(text_parts)})
|
||||
else:
|
||||
# Keep as-is for multimodal content (e.g. image_url blocks)
|
||||
converted.append(msg)
|
||||
|
||||
elif role == "assistant":
|
||||
openai_msg = {"role": "assistant"}
|
||||
|
||||
@@ -12,6 +12,8 @@ import mimetypes
|
||||
import os
|
||||
import re
|
||||
import time
|
||||
from typing import Optional
|
||||
|
||||
import requests
|
||||
from models.bot import Bot
|
||||
from models.session_manager import SessionManager
|
||||
@@ -144,7 +146,12 @@ class GoogleGeminiBot(Bot):
|
||||
return "", []
|
||||
pattern = r"\[图片:\s*([^\]]+)\]"
|
||||
image_paths = [m.strip().strip("'\"") for m in re.findall(pattern, content) if m.strip()]
|
||||
cleaned_text = re.sub(pattern, "", content)
|
||||
# Replace markers with path-only hints so the model still knows the
|
||||
# original file location (needed when it calls tools like vision).
|
||||
def _replace_with_hint(m):
|
||||
path = m.group(1).strip().strip("'\"")
|
||||
return f"[attached image: {path}]"
|
||||
cleaned_text = re.sub(pattern, _replace_with_hint, content)
|
||||
cleaned_text = re.sub(r"\n{3,}", "\n\n", cleaned_text).strip()
|
||||
return cleaned_text, image_paths
|
||||
|
||||
@@ -225,6 +232,57 @@ class GoogleGeminiBot(Bot):
|
||||
logger.warning(f"[Gemini] Unsupported image URL format: {image_url[:120]}")
|
||||
return None
|
||||
|
||||
def call_vision(self, image_url: str, question: str,
|
||||
model: Optional[str] = None,
|
||||
max_tokens: int = 1000) -> dict:
|
||||
"""Analyze an image using Gemini REST API."""
|
||||
try:
|
||||
model_name = model or self.model or "gemini-2.0-flash"
|
||||
image_part = self._build_inline_part_from_image_url({"url": image_url})
|
||||
if not image_part:
|
||||
return {"error": True, "message": f"Cannot process image URL: {image_url[:120]}"}
|
||||
|
||||
payload = {
|
||||
"contents": [{
|
||||
"role": "user",
|
||||
"parts": [image_part, {"text": question}],
|
||||
}],
|
||||
"generationConfig": {"maxOutputTokens": max_tokens},
|
||||
"safetySettings": [
|
||||
{"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
|
||||
{"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
|
||||
{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
|
||||
{"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
|
||||
],
|
||||
}
|
||||
endpoint = f"{self.api_base}/v1beta/models/{model_name}:generateContent"
|
||||
headers = {"x-goog-api-key": self.api_key, "Content-Type": "application/json"}
|
||||
resp = requests.post(endpoint, headers=headers, json=payload, timeout=60)
|
||||
|
||||
if resp.status_code != 200:
|
||||
return {"error": True, "message": f"HTTP {resp.status_code}: {resp.text[:300]}"}
|
||||
|
||||
body = resp.json()
|
||||
candidates = body.get("candidates", [])
|
||||
text_parts = []
|
||||
for part in candidates[0].get("content", {}).get("parts", []) if candidates else []:
|
||||
if "text" in part:
|
||||
text_parts.append(part["text"])
|
||||
|
||||
usage_meta = body.get("usageMetadata", {})
|
||||
return {
|
||||
"model": model_name,
|
||||
"content": "".join(text_parts),
|
||||
"usage": {
|
||||
"prompt_tokens": usage_meta.get("promptTokenCount", 0),
|
||||
"completion_tokens": usage_meta.get("candidatesTokenCount", 0),
|
||||
"total_tokens": usage_meta.get("totalTokenCount", 0),
|
||||
},
|
||||
}
|
||||
except Exception as e:
|
||||
logger.error(f"[Gemini] call_vision error: {e}")
|
||||
return {"error": True, "message": str(e)}
|
||||
|
||||
def call_with_tools(self, messages, tools=None, stream=False, **kwargs):
|
||||
"""
|
||||
Call Gemini API with tool support using REST API (following official docs)
|
||||
|
||||
@@ -2,6 +2,8 @@
|
||||
|
||||
import time
|
||||
import json
|
||||
from typing import Optional
|
||||
|
||||
import requests
|
||||
|
||||
from models.bot import Bot
|
||||
@@ -20,7 +22,7 @@ class MinimaxBot(Bot):
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
self.args = {
|
||||
"model": conf().get("model") or "MiniMax-M2.1",
|
||||
"model": conf().get("model") or "MiniMax-M2.7",
|
||||
"temperature": conf().get("temperature", 0.3),
|
||||
"top_p": conf().get("top_p", 0.95),
|
||||
}
|
||||
@@ -175,6 +177,51 @@ class MinimaxBot(Bot):
|
||||
else:
|
||||
return result
|
||||
|
||||
def call_vision(self, image_url: str, question: str,
|
||||
model: Optional[str] = None,
|
||||
max_tokens: int = 1000) -> dict:
|
||||
"""Analyze an image using MiniMax OpenAI-compatible API.
|
||||
Always uses MiniMax-Text-01 — other MiniMax models do not support vision.
|
||||
"""
|
||||
try:
|
||||
vision_model = "MiniMax-Text-01"
|
||||
payload = {
|
||||
"model": vision_model,
|
||||
"max_tokens": max_tokens,
|
||||
"messages": [{
|
||||
"role": "user",
|
||||
"content": [
|
||||
{"type": "text", "text": question},
|
||||
{"type": "image_url", "image_url": {"url": image_url}},
|
||||
],
|
||||
}],
|
||||
}
|
||||
headers = {
|
||||
"Authorization": f"Bearer {self.api_key}",
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
resp = requests.post(f"{self.api_base}/chat/completions",
|
||||
headers=headers, json=payload, timeout=60)
|
||||
if resp.status_code != 200:
|
||||
return {"error": True, "message": f"HTTP {resp.status_code}: {resp.text[:300]}"}
|
||||
data = resp.json()
|
||||
if "error" in data:
|
||||
return {"error": True, "message": data["error"].get("message", str(data["error"]))}
|
||||
content = data.get("choices", [{}])[0].get("message", {}).get("content", "")
|
||||
usage = data.get("usage", {})
|
||||
return {
|
||||
"model": vision_model,
|
||||
"content": content,
|
||||
"usage": {
|
||||
"prompt_tokens": usage.get("prompt_tokens", 0),
|
||||
"completion_tokens": usage.get("completion_tokens", 0),
|
||||
"total_tokens": usage.get("total_tokens", 0),
|
||||
},
|
||||
}
|
||||
except Exception as e:
|
||||
logger.error(f"[MINIMAX] call_vision error: {e}")
|
||||
return {"error": True, "message": str(e)}
|
||||
|
||||
def call_with_tools(self, messages, tools=None, stream=False, **kwargs):
|
||||
"""
|
||||
Call MiniMax API with tool support for agent integration
|
||||
@@ -233,11 +280,8 @@ class MinimaxBot(Bot):
|
||||
|
||||
logger.debug(f"[MINIMAX] API call: model={model}, tools={len(converted_tools) if converted_tools else 0}, stream={stream}")
|
||||
|
||||
# Check if we should show thinking process
|
||||
show_thinking = kwargs.pop("show_thinking", conf().get("minimax_show_thinking", False))
|
||||
|
||||
if stream:
|
||||
return self._handle_stream_response(request_body, show_thinking=show_thinking)
|
||||
return self._handle_stream_response(request_body)
|
||||
else:
|
||||
return self._handle_sync_response(request_body)
|
||||
|
||||
@@ -273,37 +317,41 @@ class MinimaxBot(Bot):
|
||||
if role == "user":
|
||||
# Handle user message
|
||||
if isinstance(content, list):
|
||||
# Extract text from content blocks
|
||||
text_parts = []
|
||||
tool_results = []
|
||||
has_tool_result = any(
|
||||
isinstance(b, dict) and b.get("type") == "tool_result" for b in content
|
||||
)
|
||||
if has_tool_result:
|
||||
text_parts = []
|
||||
tool_results = []
|
||||
|
||||
for block in content:
|
||||
if isinstance(block, dict):
|
||||
if block.get("type") == "text":
|
||||
text_parts.append(block.get("text", ""))
|
||||
elif block.get("type") == "tool_result":
|
||||
# Tool result should be a separate message with role="tool"
|
||||
tool_call_id = block.get("tool_use_id") or ""
|
||||
if not tool_call_id:
|
||||
logger.warning(f"[MINIMAX] tool_result missing tool_use_id")
|
||||
result_content = block.get("content", "")
|
||||
if not isinstance(result_content, str):
|
||||
result_content = json.dumps(result_content, ensure_ascii=False)
|
||||
tool_results.append({
|
||||
"role": "tool",
|
||||
"tool_call_id": tool_call_id,
|
||||
"content": result_content
|
||||
})
|
||||
for block in content:
|
||||
if isinstance(block, dict):
|
||||
if block.get("type") == "text":
|
||||
text_parts.append(block.get("text", ""))
|
||||
elif block.get("type") == "tool_result":
|
||||
tool_call_id = block.get("tool_use_id") or ""
|
||||
if not tool_call_id:
|
||||
logger.warning(f"[MINIMAX] tool_result missing tool_use_id")
|
||||
result_content = block.get("content", "")
|
||||
if not isinstance(result_content, str):
|
||||
result_content = json.dumps(result_content, ensure_ascii=False)
|
||||
tool_results.append({
|
||||
"role": "tool",
|
||||
"tool_call_id": tool_call_id,
|
||||
"content": result_content
|
||||
})
|
||||
|
||||
if text_parts:
|
||||
converted.append({
|
||||
"role": "user",
|
||||
"content": "\n".join(text_parts)
|
||||
})
|
||||
if text_parts:
|
||||
converted.append({
|
||||
"role": "user",
|
||||
"content": "\n".join(text_parts)
|
||||
})
|
||||
|
||||
# Add all tool results (not just the last one)
|
||||
for tool_result in tool_results:
|
||||
converted.append(tool_result)
|
||||
for tool_result in tool_results:
|
||||
converted.append(tool_result)
|
||||
else:
|
||||
# Keep as-is for multimodal content (e.g. image_url blocks)
|
||||
converted.append(msg)
|
||||
else:
|
||||
# Simple text content
|
||||
converted.append({
|
||||
@@ -466,12 +514,11 @@ class MinimaxBot(Bot):
|
||||
logger.error(traceback.format_exc())
|
||||
yield {"error": True, "message": str(e), "status_code": 500}
|
||||
|
||||
def _handle_stream_response(self, request_body, show_thinking=False):
|
||||
def _handle_stream_response(self, request_body):
|
||||
"""Handle streaming API response
|
||||
|
||||
|
||||
Args:
|
||||
request_body: API request parameters
|
||||
show_thinking: Whether to show thinking/reasoning process to users
|
||||
"""
|
||||
try:
|
||||
headers = {
|
||||
@@ -550,19 +597,15 @@ class MinimaxBot(Bot):
|
||||
|
||||
current_reasoning[reasoning_index]["text"] += reasoning_text
|
||||
|
||||
# Optionally yield thinking as visible content
|
||||
if show_thinking:
|
||||
# Yield thinking text as-is (without emoji decoration)
|
||||
# The reasoning text will be displayed to users
|
||||
yield {
|
||||
"choices": [{
|
||||
"index": 0,
|
||||
"delta": {
|
||||
"role": "assistant",
|
||||
"content": reasoning_text
|
||||
}
|
||||
}]
|
||||
}
|
||||
yield {
|
||||
"choices": [{
|
||||
"index": 0,
|
||||
"delta": {
|
||||
"role": "assistant",
|
||||
"reasoning_content": reasoning_text
|
||||
}
|
||||
}]
|
||||
}
|
||||
|
||||
# Handle text content
|
||||
if "content" in delta and delta["content"]:
|
||||
|
||||
@@ -576,6 +576,15 @@ class ModelScopeBot(Bot):
|
||||
continue
|
||||
|
||||
if delta.get("reasoning_content"):
|
||||
yield {
|
||||
"choices": [{
|
||||
"index": 0,
|
||||
"delta": {
|
||||
"role": "assistant",
|
||||
"reasoning_content": delta["reasoning_content"]
|
||||
}
|
||||
}]
|
||||
}
|
||||
continue
|
||||
|
||||
tool_call_chunks = delta.get("tool_calls")
|
||||
|
||||
@@ -2,6 +2,7 @@
|
||||
|
||||
import json
|
||||
import time
|
||||
from typing import Optional
|
||||
|
||||
import requests
|
||||
from models.bot import Bot
|
||||
@@ -147,6 +148,49 @@ class MoonshotBot(Bot):
|
||||
else:
|
||||
return result
|
||||
|
||||
def call_vision(self, image_url: str, question: str,
|
||||
model: Optional[str] = None,
|
||||
max_tokens: int = 1000) -> dict:
|
||||
"""Analyze an image using Moonshot (Kimi) OpenAI-compatible API."""
|
||||
try:
|
||||
vision_model = model or self.args.get("model", "kimi-k2.5")
|
||||
payload = {
|
||||
"model": vision_model,
|
||||
"max_tokens": max_tokens,
|
||||
"messages": [{
|
||||
"role": "user",
|
||||
"content": [
|
||||
{"type": "text", "text": question},
|
||||
{"type": "image_url", "image_url": {"url": image_url}},
|
||||
],
|
||||
}],
|
||||
}
|
||||
headers = {
|
||||
"Authorization": f"Bearer {self.api_key}",
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
resp = requests.post(f"{self.base_url}/chat/completions",
|
||||
headers=headers, json=payload, timeout=60)
|
||||
if resp.status_code != 200:
|
||||
return {"error": True, "message": f"HTTP {resp.status_code}: {resp.text[:300]}"}
|
||||
data = resp.json()
|
||||
if "error" in data:
|
||||
return {"error": True, "message": data["error"].get("message", str(data["error"]))}
|
||||
content = data.get("choices", [{}])[0].get("message", {}).get("content", "")
|
||||
usage = data.get("usage", {})
|
||||
return {
|
||||
"model": vision_model,
|
||||
"content": content,
|
||||
"usage": {
|
||||
"prompt_tokens": usage.get("prompt_tokens", 0),
|
||||
"completion_tokens": usage.get("completion_tokens", 0),
|
||||
"total_tokens": usage.get("total_tokens", 0),
|
||||
},
|
||||
}
|
||||
except Exception as e:
|
||||
logger.error(f"[MOONSHOT] call_vision error: {e}")
|
||||
return {"error": True, "message": str(e)}
|
||||
|
||||
# ==================== Agent mode support ====================
|
||||
|
||||
def call_with_tools(self, messages, tools=None, stream: bool = False, **kwargs):
|
||||
@@ -435,31 +479,37 @@ class MoonshotBot(Bot):
|
||||
continue
|
||||
|
||||
if role == "user":
|
||||
text_parts = []
|
||||
tool_results = []
|
||||
has_tool_result = any(
|
||||
isinstance(b, dict) and b.get("type") == "tool_result" for b in content
|
||||
)
|
||||
if has_tool_result:
|
||||
text_parts = []
|
||||
tool_results = []
|
||||
|
||||
for block in content:
|
||||
if not isinstance(block, dict):
|
||||
continue
|
||||
if block.get("type") == "text":
|
||||
text_parts.append(block.get("text", ""))
|
||||
elif block.get("type") == "tool_result":
|
||||
tool_call_id = block.get("tool_use_id") or ""
|
||||
result_content = block.get("content", "")
|
||||
if not isinstance(result_content, str):
|
||||
result_content = json.dumps(result_content, ensure_ascii=False)
|
||||
tool_results.append({
|
||||
"role": "tool",
|
||||
"tool_call_id": tool_call_id,
|
||||
"content": result_content
|
||||
})
|
||||
for block in content:
|
||||
if not isinstance(block, dict):
|
||||
continue
|
||||
if block.get("type") == "text":
|
||||
text_parts.append(block.get("text", ""))
|
||||
elif block.get("type") == "tool_result":
|
||||
tool_call_id = block.get("tool_use_id") or ""
|
||||
result_content = block.get("content", "")
|
||||
if not isinstance(result_content, str):
|
||||
result_content = json.dumps(result_content, ensure_ascii=False)
|
||||
tool_results.append({
|
||||
"role": "tool",
|
||||
"tool_call_id": tool_call_id,
|
||||
"content": result_content
|
||||
})
|
||||
|
||||
# Tool results first (must come right after assistant with tool_calls)
|
||||
for tr in tool_results:
|
||||
converted.append(tr)
|
||||
for tr in tool_results:
|
||||
converted.append(tr)
|
||||
|
||||
if text_parts:
|
||||
converted.append({"role": "user", "content": "\n".join(text_parts)})
|
||||
if text_parts:
|
||||
converted.append({"role": "user", "content": "\n".join(text_parts)})
|
||||
else:
|
||||
# Keep as-is for multimodal content (e.g. image_url blocks)
|
||||
converted.append(msg)
|
||||
|
||||
elif role == "assistant":
|
||||
openai_msg = {"role": "assistant"}
|
||||
|
||||
@@ -9,6 +9,8 @@ This includes: OpenAI, LinkAI, Azure OpenAI, and many third-party providers.
|
||||
|
||||
import json
|
||||
import openai
|
||||
import requests
|
||||
from typing import Optional
|
||||
from common.log import logger
|
||||
from agent.protocol.message_utils import drop_orphaned_tool_results_openai
|
||||
|
||||
@@ -306,3 +308,51 @@ class OpenAICompatibleBot:
|
||||
openai_messages.append(msg)
|
||||
|
||||
return drop_orphaned_tool_results_openai(openai_messages)
|
||||
|
||||
def call_vision(self, image_url: str, question: str,
|
||||
model: Optional[str] = None,
|
||||
max_tokens: int = 1000) -> dict:
|
||||
"""Analyze an image using the OpenAI-compatible /chat/completions endpoint."""
|
||||
try:
|
||||
api_config = self.get_api_config()
|
||||
vision_model = model or api_config.get("model", "gpt-4o")
|
||||
api_key = api_config.get("api_key", "")
|
||||
api_base = (api_config.get("api_base") or "https://api.openai.com/v1").rstrip("/")
|
||||
|
||||
payload = {
|
||||
"model": vision_model,
|
||||
"messages": [{
|
||||
"role": "user",
|
||||
"content": [
|
||||
{"type": "text", "text": question},
|
||||
{"type": "image_url", "image_url": {"url": image_url}},
|
||||
],
|
||||
}],
|
||||
}
|
||||
headers = {
|
||||
"Authorization": f"Bearer {api_key}",
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
resp = requests.post(
|
||||
f"{api_base}/chat/completions",
|
||||
headers=headers, json=payload, timeout=60,
|
||||
)
|
||||
if resp.status_code != 200:
|
||||
body = resp.text[:500]
|
||||
logger.error(f"[{self.__class__.__name__}] call_vision HTTP {resp.status_code}: {body}")
|
||||
return {"error": True, "message": f"HTTP {resp.status_code}: {body}"}
|
||||
data = resp.json()
|
||||
content = data.get("choices", [{}])[0].get("message", {}).get("content", "")
|
||||
usage = data.get("usage", {})
|
||||
return {
|
||||
"model": vision_model,
|
||||
"content": content,
|
||||
"usage": {
|
||||
"prompt_tokens": usage.get("prompt_tokens", 0),
|
||||
"completion_tokens": usage.get("completion_tokens", 0),
|
||||
"total_tokens": usage.get("total_tokens", 0),
|
||||
},
|
||||
}
|
||||
except Exception as e:
|
||||
logger.error(f"[{self.__class__.__name__}] call_vision error: {e}")
|
||||
return {"error": True, "message": str(e)}
|
||||
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user