Mirror of https://github.com/zhayujie/chatgpt-on-wechat.git, synced 2026-06-02 18:17:11 +08:00

Compare commits: 2.0.5...feat-multi (22 commits)
| Author | SHA1 | Date | |
|---|---|---|---|
| | 26693acc3f | | |
| | 3cd92ccda3 | | |
| | d86cb4ded6 | | |
| | 4d5375f6d6 | | |
| | 424557fedb | | |
| | 89251e603f | | |
| | a653ed07eb | | |
| | ad86deb014 | | |
| | 9525dc7584 | | |
| | cd31dd27fd | | |
| | 360e3670eb | | |
| | 8dabe3b4c8 | | |
| | 443e0c2806 | | |
| | 9cc173cc4d | | |
| | b5f33e5ecd | | |
| | 40dfc6860f | | |
| | 1c02a04423 | | |
| | de0e45070c | | |
| | c169cc7d74 | | |
| | cd62ad76f6 | | |
| | dd25b0fb5b | | |
| | a38b22a6a2 | | |
README.md (20 lines changed)
@@ -101,7 +101,7 @@ bash <(curl -fsSL https://cdn.link-ai.tech/code/cow/run.sh)
 irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex
 ```

-Script usage: see [One-click run script](https://docs.cowagent.ai/guide/quick-start). After installation, you can manage the service with [CLI commands](https://docs.cowagent.ai/commands/index) such as `cow start` and `cow stop`.
+Script usage: see [One-click run script](https://docs.cowagent.ai/guide/quick-start). After installation, you can manage the service with [CLI commands](https://docs.cowagent.ai/cli/index) such as `cow start` and `cow stop`.


 ## 1. Preparation
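@@ -116,7 +116,7 @@ irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex

 ### 2. Environment setup

-Supports Linux, macOS, and Windows, and runs on personal computers and servers. `Python` must be installed, with a version between 3.7 and 3.12; version 3.9 is recommended.
+Supports Linux, macOS, and Windows, and runs on personal computers and servers. `Python` must be installed, with a version between 3.7 and 3.12.

 > Note: running from source is recommended for Agent mode. If you deploy with Docker, you do not need to install Python or download the source code and can skip ahead to the next section.
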
@@ -151,7 +151,7 @@ pip3 install -r requirements-optional.txt
 pip3 install -e .
 ```

-After installation, you can use the `cow` command to manage the service (start, stop, update, etc.) and skills; see the [command documentation](https://docs.cowagent.ai/commands/index).
+After installation, you can use the `cow` command to manage the service (start, stop, update, etc.) and skills; see the [command documentation](https://docs.cowagent.ai/cli/index).

 **(5) Install the browser tool (optional):**

@@ -218,7 +218,7 @@ cow install-browser
 <details>
 <summary>2. Other configuration</summary>

-+ `model`: model name. In Agent mode, `MiniMax-M2.7`, `glm-5-turbo`, `kimi-k2.5`, `qwen3.5-plus`, `claude-sonnet-4-6`, and `gemini-3.1-pro-preview` are recommended. See [common/const.py](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/common/const.py) for all model names.
++ `model`: model name. In Agent mode, `MiniMax-M2.7`, `glm-5-turbo`, `kimi-k2.5`, `qwen3.6-plus`, `claude-sonnet-4-6`, and `gemini-3.1-pro-preview` are recommended. See [common/const.py](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/common/const.py) for all model names.
 + `character_desc`: the bot's system prompt in normal chat mode. This setting has no effect in Agent mode, where the prompt is built from the files in the workspace.
 + `subscribe_msg`: subscription message; fill this in for the Official Account and WeCom channels. It is sent automatically when a user subscribes and may contain special placeholders. The only placeholder currently supported is {trigger_prefix}, which the program replaces with the bot's trigger word.
 </details>
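@@ -303,7 +303,7 @@ sudo docker logs -f chatgpt-on-wechat

 ## Models

-The following describes how to configure and use every supported model. The model interfaces are implemented in the project's `models/` directory.
+Managing model configuration online through the Web console is recommended, with no need to edit files by hand; see the [model documentation](https://docs.cowagent.ai/models). The following describes how to configure models by editing `config.json` manually:

 <details>
 <summary>OpenAI</summary>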
@@ -411,18 +411,18 @@ sudo docker logs -f chatgpt-on-wechat

 ```json
 {
-    "model": "qwen3.5-plus",
+    "model": "qwen3.6-plus",
     "dashscope_api_key": "sk-qVxxxxG"
 }
 ```
-- `model`: accepts `qwen3.5-plus, qwen3-max, qwen-max, qwen-plus, qwen-turbo, qwen-long, qwq-plus`, among others
-- `dashscope_api_key`: the Tongyi Qianwen (Qwen) API key; see the [official documentation](https://bailian.console.aliyun.com/?tab=api#/api) and create one in the [console](https://bailian.console.aliyun.com/?tab=model#/api-key)
+- `model`: accepts `qwen3.6-plus, qwen3.5-plus, qwen3-max, qwen-max, qwen-plus, qwen-turbo, qwen-long, qwq-plus`, among others
+- `dashscope_api_key`: the Tongyi Qianwen (Qwen) API key; see the [official documentation](https://bailian.console.aliyun.com/?tab=api#/api) and create one in the [Bailian console](https://bailian.console.aliyun.com/?tab=model#/api-key)

 Option 2: connect through the OpenAI-compatible interface, configured as follows:
 ```json
 {
     "bot_type": "openai",
-    "model": "qwen3.5-plus",
+    "model": "qwen3.6-plus",
     "open_ai_api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
     "open_ai_api_key": "sk-qVxxxxG"
 }
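@@ -674,7 +674,7 @@ Coding Plan is a monthly coding subscription offered by each vendor; all vendors can be accessed via O

 ## Channels

-The following describes how to configure each supported channel. The channel code lives in the project's `channel/` directory.
+Managing channel configuration online through the Web console is recommended, with no need to edit files by hand; see the [channel documentation](https://docs.cowagent.ai/channels/weixin). The following describes how to configure channels by editing `config.json` manually:

 Multiple channels can be connected at the same time; separate them with commas in the configuration, for example `"channel_type": "feishu,dingtalk"`.
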
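The comma-separated `channel_type` value described above could be split into individual channel names roughly as follows. This is a minimal sketch: `parse_channel_types` is a hypothetical helper written for illustration, not the project's actual parser, and the whitespace and duplicate handling are assumptions.

```python
from typing import List


def parse_channel_types(channel_type: str) -> List[str]:
    """Split a comma-separated channel_type value into a clean list of names."""
    names: List[str] = []
    for part in channel_type.split(","):
        name = part.strip()
        # Assumption: blank entries and duplicates are dropped, order is kept.
        if name and name not in names:
            names.append(name)
    return names


print(parse_channel_types("feishu,dingtalk"))
```

Keeping the value as a single string means a one-channel configuration needs no change, while a multi-channel one only adds commas.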
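@@ -207,9 +207,9 @@ def _build_tooling_section(tools: List[Any], language: str) -> List[str]:
         "",
         "Tool call style:",
         "",
-        "- Briefly explain your decision process for multi-step tasks, sensitive operations, or when the user asks",
-        "- Keep going until the task is complete, then report the result to the user.",
-        "- Sensitive information such as keys and tokens must be masked in replies.",
+        "- For multi-step tasks, complex decisions, and sensitive operations, briefly state what you are doing and why, so the user can follow key progress",
+        "- Keep going until the task is complete, then report the result to the user",
+        "- Sensitive information such as keys and tokens must be masked in replies",
         "- Put URLs directly in the reply text; the system handles and renders them automatically. There is no need to download the content and send it with the send tool",
         "",
     ]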
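@@ -383,7 +383,8 @@ def _build_workspace_section(workspace_dir: str, language: str) -> List[str]:
         "",
         "**💬 Communication guidelines**:",
         "",
-        "- Do not expose internal technical details (file names, tool names, etc.) in conversation; use natural language. For example, say 'I have noted that' rather than 'Updated MEMORY.md'",
+        "- Memory-related operations need not expose file names; natural language is enough. For example, say 'I have noted that' rather than 'Updated MEMORY.md'",
+        "- Tell the user about key decisions and steps during task execution, so they know what you are doing and why",
         "- Be a genuinely helpful assistant rather than performing politeness; do your best to solve the problem",
         "- Replies should be clearly structured with key points highlighted. Use **bold**, lists, and paragraphs to make information easy to scan",
         "- Use emoji in moderation to make replies livelier and more natural 🎯, but do not pile them on",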
@@ -477,7 +478,14 @@ def _build_runtime_section(runtime_info: Dict[str, Any], language: str) -> List[

     # Add other runtime info
     runtime_parts = []
-    if runtime_info.get("model"):
+    # Support dynamic model via callable, fallback to static value
+    if callable(runtime_info.get("_get_model")):
+        try:
+            runtime_parts.append(f"Model={runtime_info['_get_model']()}")
+        except Exception:
+            if runtime_info.get("model"):
+                runtime_parts.append(f"Model={runtime_info['model']}")
+    elif runtime_info.get("model"):
         runtime_parts.append(f"Model={runtime_info['model']}")
     if runtime_info.get("workspace"):
         runtime_parts.append(f"Workspace={runtime_info['workspace']}")
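The callable-with-static-fallback lookup in the hunk above can be isolated into a small function for reading or testing. The sketch below is an illustration only: `resolve_model_name` is a hypothetical name, and returning `None` when nothing is configured is an assumption (the diff simply appends nothing in that case).

```python
from typing import Any, Dict, Optional


def resolve_model_name(runtime_info: Dict[str, Any]) -> Optional[str]:
    """Prefer the dynamic `_get_model` callable, else the static `model` value."""
    getter = runtime_info.get("_get_model")
    if callable(getter):
        try:
            return getter()
        except Exception:
            # The dynamic lookup failed: fall back to the static value, if any.
            return runtime_info.get("model") or None
    return runtime_info.get("model") or None


info = {"_get_model": lambda: "qwen3.6-plus", "model": "gpt-4.1-mini"}
print(resolve_model_name(info))
```

A callable lets the prompt reflect a model switched at runtime, while the static key keeps older callers working.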
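@@ -231,9 +231,9 @@ _You are not a chatbot; you are becoming someone._

 ## 🎯 Core principles

-**Be a genuinely helpful assistant, not a performer of politeness.** Skip filler such as "Sure!" and "Of course!" and just help. Actions beat empty words.
+**Be a genuinely helpful assistant.** The goal is to actually solve the user's problem; when carrying out complex tasks, keep the user informed of key decisions and progress.

-**Have your own opinions.** You may disagree, have preferences, and find things interesting or boring. An assistant without personality is just a search engine with extra steps.
+**Have your own opinions and personality.** You may disagree, have preferences, and find things interesting or boring.

 **Look it up yourself first.** Try to work it out: read files, check the context, run a search. Ask only when you are truly stuck. The goal is to come back with answers, not questions.
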
@@ -53,6 +53,12 @@ class SkillLoader:
         """
         Recursively load skills from a directory.

+        If a subdirectory contains its own SKILL.md, it is treated as a
+        self-contained skill (or skill-collection) and its children are
+        NOT scanned further. This prevents sub-skills inside a collection
+        (e.g. style-collection/style-anjing) from being listed as
+        independent top-level skills.
+
         :param dir_path: Directory to scan
         :param source: Source identifier
         :param include_root_files: Whether to include root-level .md files
@@ -66,38 +72,41 @@ class SkillLoader:
         except Exception as e:
             diagnostics.append(f"Failed to list directory {dir_path}: {e}")
             return LoadSkillsResult(skills=skills, diagnostics=diagnostics)

+        # If this directory has its own SKILL.md, load it and stop recursing.
+        # The sub-directories are internal resources of this skill.
+        if not include_root_files and 'SKILL.md' in entries:
+            skill_md_path = os.path.join(dir_path, 'SKILL.md')
+            if os.path.isfile(skill_md_path):
+                skill_result = self._load_skill_from_file(skill_md_path, source)
+                if skill_result.skills:
+                    skills.extend(skill_result.skills)
+                diagnostics.extend(skill_result.diagnostics)
+                return LoadSkillsResult(skills=skills, diagnostics=diagnostics)
+
         for entry in entries:
             # Skip hidden files and directories
             if entry.startswith('.'):
                 continue

             # Skip common non-skill directories
             if entry in ('node_modules', '__pycache__', 'venv', '.git'):
                 continue

             full_path = os.path.join(dir_path, entry)

             # Handle directories
             if os.path.isdir(full_path):
                 # Recursively scan subdirectories
                 sub_result = self._load_skills_recursive(full_path, source, include_root_files=False)
                 skills.extend(sub_result.skills)
                 diagnostics.extend(sub_result.diagnostics)
                 continue

             # Handle files
             if not os.path.isfile(full_path):
                 continue

             # Check if this is a skill file
             is_root_md = include_root_files and entry.endswith('.md') and entry.upper() != 'README.MD'
-            is_skill_md = not include_root_files and entry == 'SKILL.md'

-            if not (is_root_md or is_skill_md):
+            if not is_root_md:
                 continue

             # Load the skill
             skill_result = self._load_skill_from_file(full_path, source)
             if skill_result.skills:
                 skills.extend(skill_result.skills)
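The "stop at the first SKILL.md" rule from the two hunks above can be tried out on a throwaway directory tree. The sketch below is a simplified illustration, not the loader itself: `find_skill_files` and `_touch` are hypothetical helpers, the function only collects paths instead of parsing skills, and unlike the diff it applies the check at the root as well.

```python
import os
import tempfile
from typing import List


def find_skill_files(dir_path: str) -> List[str]:
    """Return SKILL.md paths, treating any directory that has one as a leaf."""
    skill_md = os.path.join(dir_path, "SKILL.md")
    if os.path.isfile(skill_md):
        return [skill_md]  # self-contained skill: do not descend further
    found: List[str] = []
    for entry in sorted(os.listdir(dir_path)):
        if entry.startswith("."):
            continue  # skip hidden files and directories
        full_path = os.path.join(dir_path, entry)
        if os.path.isdir(full_path):
            found.extend(find_skill_files(full_path))
    return found


def _touch(path: str) -> None:
    """Create an empty-ish file, including parent directories."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("# skill\n")


with tempfile.TemporaryDirectory() as root:
    _touch(os.path.join(root, "style-collection", "SKILL.md"))
    _touch(os.path.join(root, "style-collection", "style-anjing", "SKILL.md"))
    _touch(os.path.join(root, "search", "SKILL.md"))
    relative = [os.path.relpath(p, root) for p in find_skill_files(root)]

print(relative)
```

The nested `style-anjing/SKILL.md` is not reported, because its parent already counts as one skill.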
@@ -18,9 +18,13 @@ from common.utils import expand_path
 class Bash(BaseTool):
     """Tool for executing bash commands"""

+    _IS_WIN = sys.platform == "win32"
+
     name: str = "bash"
     description: str = f"""Execute a bash command in the current working directory. Returns stdout and stderr. Output is truncated to last {DEFAULT_MAX_LINES} lines or {DEFAULT_MAX_BYTES // 1024}KB (whichever is hit first). If truncated, full output is saved to a temp file.

+{'''
+PLATFORM: Windows (cmd.exe). Do NOT use Unix-only commands like grep, head, tail, sed, awk.
+''' if _IS_WIN else ''}
 ENVIRONMENT: All API keys from env_config are auto-injected. Use $VAR_NAME directly.

 SAFETY:
@@ -103,13 +107,12 @@ SAFETY:
             logger.debug(f"[Bash] Process User: {os.environ.get('USERNAME', os.environ.get('USER', 'unknown'))}")

             # On Windows, convert $VAR references to %VAR% for cmd.exe
-            if sys.platform == "win32":
+            if self._IS_WIN:
                 env["PYTHONIOENCODING"] = "utf-8"
                 command = self._convert_env_vars_for_windows(command, dotenv_vars)
                 if command and not command.strip().lower().startswith("chcp"):
                     command = f"chcp 65001 >nul 2>&1 && {command}"

             # Execute command with inherited environment variables
             result = subprocess.run(
                 command,
                 shell=True,
@@ -120,7 +123,7 @@ SAFETY:
                 encoding="utf-8",
                 errors="replace",
                 timeout=timeout,
-                env=env
+                env=env,
             )

             logger.debug(f"[Bash] Exit code: {result.returncode}")
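The diff calls `_convert_env_vars_for_windows(command, dotenv_vars)` but does not show its body. One way such a `$VAR` to `%VAR%` rewrite might look is sketched below; the function name `convert_env_vars_for_cmd`, the regular expression, the `${VAR}` handling, and the rule that only known variable names are rewritten are all assumptions made for illustration.

```python
import re
from typing import Dict

# Matches $NAME or ${NAME}; NAME is captured in group 1 or group 2.
_VAR_PATTERN = re.compile(r"\$(\w+)|\$\{(\w+)\}")


def convert_env_vars_for_cmd(command: str, known_vars: Dict[str, str]) -> str:
    """Rewrite $VAR / ${VAR} as %VAR% for cmd.exe, for known variables only."""

    def _replace(match: "re.Match[str]") -> str:
        name = match.group(1) or match.group(2)
        if name in known_vars:
            return f"%{name}%"
        return match.group(0)  # leave unknown references untouched

    return _VAR_PATTERN.sub(_replace, command)


print(convert_env_vars_for_cmd("curl -H $API_KEY ${HOME_DIR} $OTHER",
                               {"API_KEY": "x", "HOME_DIR": "y"}))
```

Restricting the rewrite to known names avoids touching `$` sequences that belong to the command itself, at the cost of leaving genuinely unset variables unexpanded.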
@@ -45,6 +45,11 @@ _SNAPSHOT_JS = """
     const KEEP = new Set(%s);
     const INTERACTIVE = new Set(%s);
     const SKIP = new Set(["script","style","noscript","svg","path","meta","link","br","hr"]);
+    const CLICKABLE_ROLES = new Set([
+        "button","link","tab","menuitem","menuitemcheckbox","menuitemradio",
+        "option","switch","checkbox","radio","combobox","searchbox","slider",
+        "spinbutton","textbox","treeitem"
+    ]);
     let refCounter = 0;
     const refMap = {};

@@ -56,6 +61,58 @@ _SNAPSHOT_JS = """
         return true;
     }

+    // Strong signals: these attributes alone are enough to mark as interactive
+    function hasStrongInteractiveSignal(el) {
+        const role = el.getAttribute("role");
+        if (role && CLICKABLE_ROLES.has(role)) return true;
+        if (el.hasAttribute("onclick") || el.hasAttribute("tabindex")) return true;
+        if (el.hasAttribute("data-click") || el.hasAttribute("data-action")) return true;
+        if (el.getAttribute("contenteditable") === "true") return true;
+        return false;
+    }
+
+    // Check if cursor:pointer is set directly (not just inherited from parent)
+    function hasOwnPointerCursor(el) {
+        try {
+            const st = window.getComputedStyle(el);
+            if (st.cursor !== "pointer") return false;
+            const parent = el.parentElement;
+            if (parent) {
+                const pst = window.getComputedStyle(parent);
+                if (pst.cursor === "pointer") return false;
+            }
+            return true;
+        } catch(e) {}
+        return false;
+    }
+
+    function hasTextOrContent(el) {
+        const t = el.textContent || "";
+        if (t.trim().length > 0) return true;
+        if (el.querySelector("img,video,audio,canvas")) return true;
+        const ariaLabel = el.getAttribute("aria-label");
+        if (ariaLabel && ariaLabel.trim()) return true;
+        const title = el.getAttribute("title");
+        if (title && title.trim()) return true;
+        return false;
+    }
+
+    function isImplicitInteractive(el) {
+        if (hasStrongInteractiveSignal(el)) return true;
+        if (hasOwnPointerCursor(el) && hasTextOrContent(el)) return true;
+        return false;
+    }
+
+    function getTextContent(el) {
+        let text = "";
+        for (const ch of el.childNodes) {
+            if (ch.nodeType === Node.TEXT_NODE) {
+                text += ch.textContent;
+            }
+        }
+        return text.trim();
+    }
+
     function walk(node) {
         if (node.nodeType === Node.TEXT_NODE) {
             const t = node.textContent.trim();
@@ -75,21 +132,35 @@ _SNAPSHOT_JS = """
             }
         }

-        const keep = KEEP.has(tag);
+        const nativeInteractive = INTERACTIVE.has(tag);
+        const implicitInteractive = !nativeInteractive && (node instanceof HTMLElement) && isImplicitInteractive(node);
+        const keep = KEEP.has(tag) || implicitInteractive;

         if (!keep) {
             // Unwrap: promote children
             if (children.length === 0) return null;
             if (children.length === 1) return children[0];
             return children;
         }

         const obj = { tag };
-        if (INTERACTIVE.has(tag)) {
+        if (nativeInteractive || implicitInteractive) {
             refCounter++;
             obj.ref = refCounter;
             refMap[refCounter] = node;
         }

+        if (implicitInteractive) {
+            const role = node.getAttribute("role");
+            if (role) obj.role = role;
+            const directText = getTextContent(node);
+            if (!directText && children.length === 0) {
+                const ariaLabel = node.getAttribute("aria-label");
+                const title = node.getAttribute("title");
+                if (ariaLabel) obj.ariaLabel = ariaLabel;
+                else if (title) obj.ariaLabel = title;
+            }
+        }
+
         // Attributes
         if (tag === "a" && node.href) obj.href = node.getAttribute("href");
         if (tag === "img") {
@@ -113,11 +184,13 @@ _SNAPSHOT_JS = """
         }
         if (tag === "label" && node.htmlFor) obj.for = node.htmlFor;

-        // Role / aria-label
-        const role = node.getAttribute("role");
-        if (role) obj.role = role;
-        const ariaLabel = node.getAttribute("aria-label");
-        if (ariaLabel) obj.ariaLabel = ariaLabel;
+        // Role / aria-label for native interactive & semantic elements
+        if (!implicitInteractive) {
+            const role = node.getAttribute("role");
+            if (role) obj.role = role;
+            const ariaLabel = node.getAttribute("aria-label");
+            if (ariaLabel) obj.ariaLabel = ariaLabel;
+        }

         // Children
         if (children.length === 1 && typeof children[0] === "string") {
@@ -129,7 +202,6 @@ _SNAPSHOT_JS = """
         return obj;
     }

     // Store refMap on window for later use by click/fill actions
     const result = walk(document.body);
     window.__cowRefMap = refMap;
     return { tree: result, refCount: refCounter };
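The decision rule added to the snapshot script (a strong attribute signal, or an own pointer cursor plus visible content) can be modelled outside the browser. The Python sketch below mirrors that logic under loud assumptions: `FakeElement` is a hypothetical stand-in for a DOM node, computed styles are passed in as plain strings, and the check for embedded media (`img,video,audio,canvas`) is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

CLICKABLE_ROLES = {
    "button", "link", "tab", "menuitem", "menuitemcheckbox", "menuitemradio",
    "option", "switch", "checkbox", "radio", "combobox", "searchbox", "slider",
    "spinbutton", "textbox", "treeitem",
}


@dataclass
class FakeElement:
    """Hypothetical stand-in for a DOM element; not part of the project."""
    attrs: Dict[str, str] = field(default_factory=dict)
    cursor: str = "auto"                 # computed cursor of the element
    parent_cursor: Optional[str] = None  # computed cursor of its parent, if any
    text: str = ""


def has_strong_signal(el: FakeElement) -> bool:
    """Attributes that are enough on their own to mark an element interactive."""
    if el.attrs.get("role") in CLICKABLE_ROLES:
        return True
    if any(name in el.attrs for name in ("onclick", "tabindex", "data-click", "data-action")):
        return True
    return el.attrs.get("contenteditable") == "true"


def has_own_pointer_cursor(el: FakeElement) -> bool:
    """cursor:pointer counts only when it is not inherited from the parent."""
    return el.cursor == "pointer" and el.parent_cursor != "pointer"


def has_text_or_label(el: FakeElement) -> bool:
    candidates = (el.text, el.attrs.get("aria-label", ""), el.attrs.get("title", ""))
    return any(value.strip() for value in candidates)


def is_implicit_interactive(el: FakeElement) -> bool:
    return has_strong_signal(el) or (has_own_pointer_cursor(el) and has_text_or_label(el))


print(is_implicit_interactive(FakeElement(cursor="pointer", parent_cursor="auto", text="Buy")))
```

Requiring the cursor to differ from the parent's keeps every descendant of one clickable card from receiving its own ref.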
@@ -1,22 +1,30 @@
 """
-Vision tool - Analyze images using OpenAI-compatible Vision API.
+Vision tool - Analyze images using Vision API.
 Supports local files (auto base64-encoded) and HTTP URLs.
-Providers: OpenAI (preferred) > LinkAI (fallback).
+
+Provider priority (default):
+  1. Main model via bot.call_vision — zero extra cost
+  2. Other models whose API key is configured — auto-discovered
+  3. OpenAI / LinkAI raw HTTP — reliable fallback
+When use_linkai=true, LinkAI is promoted to #1.
+When tool.vision.model is set, that model is used exclusively first.
 """

 import base64
 import os
 import subprocess
 import tempfile
-from typing import Any, Dict, Optional, Tuple
+from dataclasses import dataclass, field
+from typing import Any, Dict, List, Optional

 import requests

 from agent.tools.base_tool import BaseTool, ToolResult
+from common import const
 from common.log import logger
 from config import conf

-DEFAULT_MODEL = "gpt-4.1-mini"
+DEFAULT_MODEL = const.GPT_41_MINI
 DEFAULT_TIMEOUT = 60
 MAX_TOKENS = 1000
 COMPRESS_THRESHOLD = 1_048_576 # 1 MB
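The provider priority stated in the module docstring above can be written as a small ordering function. This is a sketch under stated assumptions: `order_providers` and its `available` parameter are hypothetical, and provider availability is modelled as a plain set of names rather than by building real provider objects.

```python
from typing import List, Set


def order_providers(use_linkai: bool, other_models: List[str], available: Set[str]) -> List[str]:
    """Return provider names in priority order, keeping only available ones."""
    if use_linkai:
        order = ["LinkAI", "MainModel", *other_models, "OpenAI"]
    else:
        order = ["MainModel", *other_models, "OpenAI", "LinkAI"]
    return [name for name in order if name in available]


everything = {"MainModel", "Moonshot", "OpenAI", "LinkAI"}
print(order_providers(False, ["Moonshot"], everything))
print(order_providers(True, ["Moonshot"], everything))
```

Putting the main model first means an image question usually costs nothing beyond the already configured key; the raw HTTP providers only come into play when that path is missing or fails.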
@@ -29,15 +37,46 @@ SUPPORTED_EXTENSIONS = {
     "webp": "image/webp",
 }

+_MAIN_MODEL_PROVIDER_NAME = "MainModel"
+
+# (config_key_for_api_key, bot_type, default_vision_model, provider_display_name)
+# Auto-discovered as fallback vision providers when their API key is configured.
+# OpenAI and LinkAI are handled separately (raw HTTP providers), so not listed here.
+_DISCOVERABLE_MODELS = [
+    ("moonshot_api_key", const.MOONSHOT, const.KIMI_K2_5, "Moonshot"),
+    ("ark_api_key", const.DOUBAO, const.DOUBAO_SEED_2_PRO, "Doubao"),
+    ("dashscope_api_key", const.QWEN_DASHSCOPE, const.QWEN36_PLUS, "DashScope"),
+    ("claude_api_key", const.CLAUDEAPI, const.CLAUDE_4_6_SONNET, "Claude"),
+    ("gemini_api_key", const.GEMINI, const.GEMINI_31_FLASH_LITE_PRE, "Gemini"),
+    ("zhipu_ai_api_key", const.ZHIPU_AI, const.GLM_4_7, "ZhipuAI"),
+    ("minimax_api_key", const.MiniMax, const.MINIMAX_M2_7, "MiniMax"),
+]
+
+
+@dataclass
+class VisionProvider:
+    """A single Vision API provider configuration."""
+    name: str
+    api_key: str
+    api_base: str
+    extra_headers: dict = field(default_factory=dict)
+    model_override: Optional[str] = None
+    use_bot: bool = False # When True, call via bot.call_vision instead of raw HTTP
+    fallback_bot: Any = None # Bot instance for non-main-model providers
+
+
+class VisionAPIError(Exception):
+    """Raised when a Vision API call fails and should trigger fallback."""
+    pass
+
+
 class Vision(BaseTool):
-    """Analyze images using OpenAI-compatible Vision API"""
+    """Analyze images using Vision API"""

     name: str = "vision"
     description: str = (
         "Analyze a local image or image URL (jpg/jpeg/png) using Vision API. "
         "Can describe content, extract text, identify objects, colors, etc. "
         "Requires OPENAI_API_KEY or LINKAI_API_KEY."
     )

     params: dict = {
@@ -51,13 +90,6 @@ class Vision(BaseTool):
                 "type": "string",
                 "description": "Question to ask about the image",
             },
-            "model": {
-                "type": "string",
-                "description": (
-                    f"Vision model to use (default: {DEFAULT_MODEL}). "
-                    "Options: gpt-4.1-mini, gpt-4.1, gpt-4o-mini, gpt-4o"
-                ),
-            },
         },
         "required": ["image", "question"],
     }
@@ -67,29 +99,26 @@ class Vision(BaseTool):

     @staticmethod
     def is_available() -> bool:
-        return bool(
-            conf().get("open_ai_api_key") or os.environ.get("OPENAI_API_KEY")
-            or conf().get("linkai_api_key") or os.environ.get("LINKAI_API_KEY")
-        )
+        return True

     def execute(self, args: Dict[str, Any]) -> ToolResult:
         image = args.get("image", "").strip()
         question = args.get("question", "").strip()
-        model = args.get("model", DEFAULT_MODEL).strip() or DEFAULT_MODEL

         if not image:
             return ToolResult.fail("Error: 'image' parameter is required")
         if not question:
             return ToolResult.fail("Error: 'question' parameter is required")

-        api_key, api_base, extra_headers = self._resolve_provider()
-        if not api_key:
+        providers = self._resolve_providers()
+        if not providers:
             return ToolResult.fail(
-                "Error: No API key configured for Vision.\n"
-                "Please configure one of the following using env_config tool:\n"
-                "  1. OPENAI_API_KEY (preferred): env_config(action=\"set\", key=\"OPENAI_API_KEY\", value=\"your-key\")\n"
-                "  2. LINKAI_API_KEY (fallback): env_config(action=\"set\", key=\"LINKAI_API_KEY\", value=\"your-key\")\n\n"
-                "Get your key at: https://platform.openai.com/api-keys or https://link-ai.tech"
+                "Error: No model available for Vision.\n"
+                "The main model does not support vision and no other API keys are configured.\n"
+                "Options:\n"
+                "  1. Switch to a multimodal model (e.g. qwen3.6-plus, claude-sonnet-4-6, gemini-2.0-flash)\n"
+                "  2. Configure OPENAI_API_KEY: env_config(action=\"set\", key=\"OPENAI_API_KEY\", value=\"your-key\")\n"
+                "  3. Configure LINKAI_API_KEY: env_config(action=\"set\", key=\"LINKAI_API_KEY\", value=\"your-key\")"
             )

         try:
@@ -97,36 +126,221 @@ class Vision(BaseTool):
         except Exception as e:
             return ToolResult.fail(f"Error: {e}")

+        return self._call_with_fallback(providers, DEFAULT_MODEL, question, image_content)
+
+    def _call_with_fallback(self, providers: List[VisionProvider], model: str,
+                            question: str, image_content: dict) -> ToolResult:
+        """Try each provider in order; fall back to the next one on failure."""
+        errors: List[str] = []
+        for i, provider in enumerate(providers):
+            use_model = provider.model_override or model
+            try:
+                logger.info(f"[Vision] Trying provider '{provider.name}' "
+                            f"with model '{use_model}' ({i + 1}/{len(providers)})")
+                if provider.use_bot:
+                    result = self._call_via_bot(use_model, question, image_content, provider)
+                else:
+                    result = self._call_api(provider, use_model, question, image_content)
+                logger.info(f"[Vision] ✅ Success via {provider.name} (model={use_model})")
+                return result
+            except VisionAPIError as e:
+                errors.append(f"[{provider.name}/{use_model}] {e}")
+                logger.warning(f"[Vision] Provider '{provider.name}' failed: {e}")
+            except requests.Timeout:
+                errors.append(f"[{provider.name}/{use_model}] Request timed out after {DEFAULT_TIMEOUT}s")
+                logger.warning(f"[Vision] Provider '{provider.name}' timed out")
+            except requests.ConnectionError:
+                errors.append(f"[{provider.name}/{use_model}] Connection failed")
+                logger.warning(f"[Vision] Provider '{provider.name}' connection failed")
+            except Exception as e:
+                errors.append(f"[{provider.name}/{use_model}] {e}")
+                logger.error(f"[Vision] Provider '{provider.name}' unexpected error: {e}", exc_info=True)
+
+        return ToolResult.fail(
+            "Error: All Vision API providers failed.\n" + "\n".join(f"  - {err}" for err in errors)
+        )
+
+    def _resolve_providers(self) -> List[VisionProvider]:
+        """
+        Build an ordered list of available providers.
+
+        Priority:
+          - use_linkai=true → [LinkAI, MainModel, OtherModels…, OpenAI]
+          - default → [MainModel, OtherModels…, OpenAI, LinkAI]
+
+        "OtherModels" are auto-discovered from configured API keys.
+        The main model's bot_type is excluded from OtherModels to avoid
+        duplicating the MainModel provider.
+        """
+        use_linkai = conf().get("use_linkai", False) and conf().get("linkai_api_key")
+        providers: List[VisionProvider] = []
+
+        if use_linkai:
+            self._append_provider(providers, self._build_linkai_provider)
+            self._append_provider(providers, self._build_main_model_provider)
+            self._append_other_model_providers(providers)
+            self._append_provider(providers, self._build_openai_provider)
+        else:
+            self._append_provider(providers, self._build_main_model_provider)
+            self._append_other_model_providers(providers)
+            self._append_provider(providers, self._build_openai_provider)
+            self._append_provider(providers, self._build_linkai_provider)
+
+        return providers
+
+    @staticmethod
+    def _append_provider(providers: List[VisionProvider], builder) -> None:
+        p = builder()
+        if p:
+            providers.append(p)
+
+    def _append_other_model_providers(self, providers: List[VisionProvider]) -> None:
+        """
+        Auto-discover other models whose API key is configured.
+        Skip the main model's own bot_type (already covered by MainModel provider).
+        Skip bot_types that already have a provider in the list (e.g. OpenAI).
+        """
+        # Determine main model's bot_type so we can skip it
+        main_bot_type = None
+        if self.model and hasattr(self.model, '_resolve_bot_type'):
+            main_bot_type = self.model._resolve_bot_type(conf().get("model", ""))
+
+        existing_names = {p.name for p in providers}
+
+        for config_key, bot_type, default_model, display_name in _DISCOVERABLE_MODELS:
+            if display_name in existing_names:
+                continue
+            if bot_type == main_bot_type:
+                continue
+            api_key = conf().get(config_key, "")
+            if not api_key or not api_key.strip():
+                continue
+
+            # Create a bot instance and check if it supports call_vision
+            try:
+                from models.bot_factory import create_bot
+                bot = create_bot(bot_type)
+                if not hasattr(bot, 'call_vision'):
+                    continue
+            except Exception:
+                continue
+
+            providers.append(VisionProvider(
+                name=display_name,
+                api_key="",
+                api_base="",
+                model_override=default_model,
+                use_bot=True,
+                fallback_bot=bot,
+            ))
+
+    def _resolve_vision_model(self) -> Optional[str]:
+        """
+        Determine which model to use for vision.
+
+        1. User explicit config: tool.vision.model in config.json
+        2. Fallback to the main configured model name
+        """
+        tool_conf = conf().get("tool", {})
+        user_vision_model = tool_conf.get("vision", {}).get("model") if isinstance(tool_conf, dict) else None
+        if user_vision_model:
+            return user_vision_model
+        model_name = conf().get("model", "")
+        return model_name or None
+
+    def _build_main_model_provider(self) -> Optional[VisionProvider]:
+        """
+        Use the vendor's own model for vision via bot.call_vision.
+        Only available when the bot class has call_vision.
+        """
+        if not (self.model and hasattr(self.model, 'bot')):
+            return None
         try:
-            return self._call_api(api_key, api_base, model, question, image_content, extra_headers)
-        except requests.Timeout:
-            return ToolResult.fail(f"Error: Vision API request timed out after {DEFAULT_TIMEOUT}s")
-        except requests.ConnectionError:
-            return ToolResult.fail("Error: Failed to connect to Vision API")
-        except Exception as e:
-            logger.error(f"[Vision] Unexpected error: {e}", exc_info=True)
-            return ToolResult.fail(f"Error: Vision API call failed - {e}")
+            bot = self.model.bot
+            if not hasattr(bot, 'call_vision'):
+                return None
+        except Exception:
+            return None

-    def _resolve_provider(self) -> Tuple[Optional[str], str, dict]:
-        """Resolve API key, base URL and extra headers. Priority: conf() > env vars."""
+        vision_model = self._resolve_vision_model()
+
+        return VisionProvider(
+            name=_MAIN_MODEL_PROVIDER_NAME,
+            api_key="",
+            api_base="",
+            model_override=vision_model,
+            use_bot=True,
+        )
+
+    def _build_openai_provider(self) -> Optional[VisionProvider]:
         api_key = conf().get("open_ai_api_key") or os.environ.get("OPENAI_API_KEY")
-        if api_key:
-            api_base = (conf().get("open_ai_api_base") or os.environ.get("OPENAI_API_BASE", "")).rstrip("/") \
-                       or "https://api.openai.com/v1"
-            return api_key, self._ensure_v1(api_base), {}
+        if not api_key:
+            return None
+        api_base = (conf().get("open_ai_api_base") or os.environ.get("OPENAI_API_BASE", "")).rstrip("/") \
+                   or "https://api.openai.com/v1"
+        return VisionProvider(name="OpenAI", api_key=api_key, api_base=self._ensure_v1(api_base))

+    def _build_linkai_provider(self) -> Optional[VisionProvider]:
         api_key = conf().get("linkai_api_key") or os.environ.get("LINKAI_API_KEY")
-        if api_key:
-            api_base = (conf().get("linkai_api_base") or os.environ.get("LINKAI_API_BASE", "")).rstrip("/") \
-                       or "https://api.link-ai.tech"
-            logger.debug("[Vision] Using LinkAI API (OPENAI_API_KEY not set)")
-            from common.utils import get_cloud_headers
-            extra = get_cloud_headers(api_key)
-            extra.pop("Authorization", None)
-            extra.pop("Content-Type", None)
-            return api_key, self._ensure_v1(api_base), extra
+        if not api_key:
+            return None
+        api_base = (conf().get("linkai_api_base") or os.environ.get("LINKAI_API_BASE", "")).rstrip("/") \
+                   or "https://api.link-ai.tech"
+        from common.utils import get_cloud_headers
+        extra = get_cloud_headers(api_key)
+        extra.pop("Authorization", None)
+        extra.pop("Content-Type", None)
+        return VisionProvider(name="LinkAI", api_key=api_key, api_base=self._ensure_v1(api_base),
+                              extra_headers=extra)

-        return None, "", {}
+    def _call_via_bot(self, model: str, question: str, image_content: dict,
+                      provider: Optional[VisionProvider] = None) -> ToolResult:
+        """
+        Call a model's call_vision with vendor-native API format.
+        Uses the provider's _fallback_bot if set, otherwise the main model bot.
+        Raises VisionAPIError on failure so fallback can proceed.
+        """
+        try:
+            bot = (provider and provider.fallback_bot) or self.model.bot
+        except Exception as e:
+            raise VisionAPIError(f"Cannot access bot: {e}")
+
+        # Extract the raw image URL from the OpenAI-format image_content block
+        image_url = image_content.get("image_url", {}).get("url", "")
+        if not image_url:
+            raise VisionAPIError("No image URL in content block")
+
+        try:
+            response = bot.call_vision(
+                image_url=image_url,
+                question=question,
+                model=model,
+                max_tokens=MAX_TOKENS,
+            )
+        except Exception as e:
+            raise VisionAPIError(f"call_vision failed: {e}")
+
+        if response is NotImplemented:
+            raise VisionAPIError("Bot does not support vision")
+
+        if isinstance(response, dict) and response.get("error"):
+            raise VisionAPIError(f"API error - {response.get('message', 'Unknown')}")
+
+        content = response.get("content", "") if isinstance(response, dict) else ""
+        if not content:
+            raise VisionAPIError("Empty response from main model")
+
+        usage_info = response.get("usage", {}) if isinstance(response, dict) else {}
+
+        # Use the actual model name from the bot response if available
+        actual_model = response.get("model", model) if isinstance(response, dict) else model
+        provider_name = provider.name if provider else _MAIN_MODEL_PROVIDER_NAME
+        return ToolResult.success({
+            "model": actual_model,
+            "provider": provider_name,
+            "content": content,
+            "usage": usage_info,
+        })

     @staticmethod
     def _ensure_v1(api_base: str) -> str:
@@ -139,9 +353,13 @@ class Vision(BaseTool):
         return api_base.rstrip("/") + "/v1"

     def _build_image_content(self, image: str) -> dict:
-        """Build the image_url content block for the API request."""
+        """
+        Build the image_url content block.
+        Both remote URLs and local files are converted to base64 data URLs
+        so every bot backend can consume them without extra downloads.
+        """
         if image.startswith(("http://", "https://")):
-            return {"type": "image_url", "image_url": {"url": image}}
+            return self._download_to_data_url(image)

         if not os.path.isfile(image):
             raise FileNotFoundError(f"Image file not found: {image}")
@@ -165,6 +383,19 @@ class Vision(BaseTool):
|
||||
data_url = f"data:{mime_type};base64,{b64}"
|
||||
return {"type": "image_url", "image_url": {"url": data_url}}
|
||||
|
||||
@staticmethod
|
||||
def _download_to_data_url(url: str) -> dict:
|
||||
"""Download a remote image and return it as a base64 data URL."""
|
||||
resp = requests.get(url, timeout=30)
|
||||
if resp.status_code != 200:
|
||||
raise VisionAPIError(f"Failed to download image: HTTP {resp.status_code}")
|
||||
content_type = resp.headers.get("Content-Type", "image/jpeg").split(";")[0].strip()
|
||||
if not content_type.startswith("image/"):
|
||||
content_type = "image/jpeg"
|
||||
b64 = base64.b64encode(resp.content).decode("ascii")
|
||||
data_url = f"data:{content_type};base64,{b64}"
|
||||
return {"type": "image_url", "image_url": {"url": data_url}}
|
||||
|
||||
@staticmethod
|
||||
def _maybe_compress(path: str) -> str:
|
||||
"""Compress image to under COMPRESS_THRESHOLD with max long-edge 1536px."""
|
||||
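Note on the `_download_to_data_url` helper added in the hunk above: the encoding step can be checked in isolation, without any network access. The following is a minimal sketch, not the project's code; `to_data_url` is a hypothetical name, and it assumes the image bytes and the `Content-Type` header value have already been fetched.

```python
import base64


def to_data_url(raw: bytes, content_type: str = "image/jpeg") -> dict:
    """Wrap raw image bytes in an OpenAI-style image_url content block.

    Hypothetical helper for illustration; mirrors the header handling
    shown in the diff (strip parameters, fall back to image/jpeg).
    """
    # Drop parameters such as "; charset=binary" from the header value.
    mime = content_type.split(";")[0].strip()
    # Anything that is not an image type is treated as JPEG.
    if not mime.startswith("image/"):
        mime = "image/jpeg"
    b64 = base64.b64encode(raw).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}
```

Keeping the encoding separate from the HTTP call would make this fallback behaviour easy to unit test.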
@@ -220,8 +451,13 @@ class Vision(BaseTool):
|
||||
os.remove(tmp.name)
|
||||
return path
|
||||
|
||||
def _call_api(self, api_key: str, api_base: str, model: str,
|
||||
question: str, image_content: dict, extra_headers: dict = None) -> ToolResult:
|
||||
def _call_api(self, provider: VisionProvider, model: str,
|
||||
question: str, image_content: dict) -> ToolResult:
|
||||
"""
|
||||
Call a single provider's Vision API.
|
||||
Raises VisionAPIError on recoverable failures so the caller can try
|
||||
the next provider.
|
||||
"""
|
||||
payload = {
|
||||
"model": model,
|
||||
"messages": [
|
||||
@@ -233,34 +469,29 @@ class Vision(BaseTool):
|
||||
],
|
||||
}
|
||||
],
|
||||
"max_tokens": MAX_TOKENS,
|
||||
}
|
||||
|
||||
headers = {
|
||||
"Authorization": f"Bearer {api_key}",
|
||||
"Authorization": f"Bearer {provider.api_key}",
|
||||
"Content-Type": "application/json",
|
||||
**(extra_headers or {}),
|
||||
**provider.extra_headers,
|
||||
}
|
||||
|
||||
resp = requests.post(
|
||||
f"{api_base}/chat/completions",
|
||||
f"{provider.api_base}/chat/completions",
|
||||
headers=headers,
|
||||
json=payload,
|
||||
timeout=DEFAULT_TIMEOUT,
|
||||
)
|
||||
|
||||
if resp.status_code == 401:
|
||||
return ToolResult.fail("Error: Invalid API key. Please check your configuration.")
|
||||
if resp.status_code == 429:
|
||||
return ToolResult.fail("Error: API rate limit reached. Please try again later.")
|
||||
if resp.status_code != 200:
|
||||
return ToolResult.fail(f"Error: Vision API returned HTTP {resp.status_code}: {resp.text[:200]}")
|
||||
raise VisionAPIError(f"HTTP {resp.status_code}: {resp.text[:200]}")
|
||||
|
||||
data = resp.json()
|
||||
|
||||
if "error" in data:
|
||||
msg = data["error"].get("message", "Unknown API error")
|
||||
return ToolResult.fail(f"Error: Vision API error - {msg}")
|
||||
raise VisionAPIError(f"API error - {msg}")
|
||||
|
||||
content = ""
|
||||
choices = data.get("choices", [])
|
||||
@@ -270,6 +501,7 @@ class Vision(BaseTool):
|
||||
usage = data.get("usage", {})
|
||||
result = {
|
||||
"model": model,
|
||||
"provider": provider.name,
|
||||
"content": content,
|
||||
"usage": {
|
||||
"prompt_tokens": usage.get("prompt_tokens", 0),
|
||||
|
||||
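Note on the error handling in `_call_api`: non-200 responses and API errors now raise `VisionAPIError` instead of returning `ToolResult.fail`, so the caller can move on to the next provider. The calling loop is not visible in this part of the diff, so the sketch below only illustrates the pattern under that assumption. `call_with_fallback` is a hypothetical name, and the `VisionAPIError` class here is a local stand-in for the project's exception.

```python
class VisionAPIError(Exception):
    """Stand-in for the project's recoverable-failure exception."""


def call_with_fallback(providers, call):
    """Try each provider in order and return the first successful result.

    `providers` is any iterable; `call` is a function taking one provider.
    Both are assumptions made for this illustration.
    """
    errors = []
    for provider in providers:
        try:
            return call(provider)
        except VisionAPIError as exc:
            # Recoverable: remember the reason and try the next provider.
            errors.append(f"{provider}: {exc}")
    # Every provider failed; surface all collected reasons at once.
    raise VisionAPIError("All providers failed: " + "; ".join(errors))
```

Exceptions other than `VisionAPIError` are deliberately not caught in this sketch, so programming errors still propagate.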
@@ -67,7 +67,7 @@ class AgentLLMModel(LLMModel):
|
||||
|
||||
_MODEL_BOT_TYPE_MAP = {
|
||||
"wenxin": const.BAIDU, "wenxin-4": const.BAIDU,
|
||||
"xunfei": const.XUNFEI, const.QWEN: const.QWEN,
|
||||
"xunfei": const.XUNFEI, const.QWEN: const.QWEN_DASHSCOPE,
|
||||
const.MODELSCOPE: const.MODELSCOPE,
|
||||
}
|
||||
_MODEL_PREFIX_MAP = [
|
||||
@@ -124,14 +124,15 @@ class AgentLLMModel(LLMModel):
|
||||
|
||||
@property
|
||||
def bot(self):
|
||||
"""Lazy load the bot, re-create when model changes"""
|
||||
"""Lazy load the bot, re-create when model or bot_type changes"""
|
||||
from models.bot_factory import create_bot
|
||||
cur_model = self.model
|
||||
if self._bot is None or self._bot_model != cur_model:
|
||||
bot_type = self._resolve_bot_type(cur_model)
|
||||
self._bot = create_bot(bot_type)
|
||||
cur_bot_type = self._resolve_bot_type(cur_model)
|
||||
if self._bot is None or self._bot_model != cur_model or getattr(self, '_bot_type', None) != cur_bot_type:
|
||||
self._bot = create_bot(cur_bot_type)
|
||||
self._bot = add_openai_compatible_support(self._bot)
|
||||
self._bot_model = cur_model
|
||||
self._bot_type = cur_bot_type
|
||||
return self._bot
|
||||
|
||||
def call(self, request: LLMRequest):
|
||||
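Note on the `bot` property change: the cached bot is now rebuilt when either the model name or the resolved bot type changes. A minimal sketch of that caching rule follows. `LazyBot`, `resolve_bot_type`, and `create_bot` as constructor arguments are hypothetical stand-ins, and the `add_openai_compatible_support` wrapping step is left out.

```python
class LazyBot:
    """Illustrative holder that rebuilds its bot when the cache key changes."""

    def __init__(self, resolve_bot_type, create_bot):
        self.model = None
        self._resolve = resolve_bot_type  # maps model name -> bot type
        self._create = create_bot         # builds a bot for a bot type
        self._bot = None
        self._key = None

    @property
    def bot(self):
        # The cache key covers both values, so a change in either one
        # invalidates the cached instance.
        key = (self.model, self._resolve(self.model))
        if self._bot is None or key != self._key:
            self._bot = self._create(key[1])
            self._key = key
        return self._bot
```

Comparing a single tuple avoids the `getattr(self, '_bot_type', None)` guard, at the cost of initialising the key in `__init__`.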
@@ -505,15 +506,15 @@ class AgentBridge:
|
||||
|
||||
def _migrate_config_to_env(self, workspace_root: str):
|
||||
"""
|
||||
Migrate API keys from config.json to .env file if not already set
|
||||
|
||||
Sync API keys from config.json to .env file.
|
||||
Adds new keys and updates changed values on each startup.
|
||||
|
||||
Args:
|
||||
workspace_root: Workspace directory path (not used, kept for compatibility)
|
||||
"""
|
||||
from config import conf
|
||||
import os
|
||||
|
||||
# Mapping from config.json keys to environment variable names
|
||||
key_mapping = {
|
||||
"open_ai_api_key": "OPENAI_API_KEY",
|
||||
"open_ai_api_base": "OPENAI_API_BASE",
|
||||
@@ -522,10 +523,9 @@ class AgentBridge:
|
||||
"linkai_api_key": "LINKAI_API_KEY",
|
||||
}
|
||||
|
||||
# Use fixed secure location for .env file
|
||||
env_file = expand_path("~/.cow/.env")
|
||||
|
||||
# Read existing env vars from .env file
|
||||
# Read existing env vars (key -> value)
|
||||
existing_env_vars = {}
|
||||
if os.path.exists(env_file):
|
||||
try:
|
||||
@@ -533,48 +533,46 @@ class AgentBridge:
|
||||
for line in f:
|
||||
line = line.strip()
|
||||
if line and not line.startswith('#') and '=' in line:
|
||||
key, _ = line.split('=', 1)
|
||||
existing_env_vars[key.strip()] = True
|
||||
key, val = line.split('=', 1)
|
||||
existing_env_vars[key.strip()] = val.strip()
|
||||
except Exception as e:
|
||||
logger.warning(f"[AgentBridge] Failed to read .env file: {e}")
|
||||
|
||||
# Check which keys need to be migrated
|
||||
keys_to_migrate = {}
|
||||
# Sync config.json values into .env (add/update/remove)
|
||||
updated = False
|
||||
for config_key, env_key in key_mapping.items():
|
||||
# Skip if already in .env file
|
||||
if env_key in existing_env_vars:
|
||||
continue
|
||||
|
||||
# Get value from config.json
|
||||
value = conf().get(config_key, "")
|
||||
if value and value.strip(): # Only migrate non-empty values
|
||||
keys_to_migrate[env_key] = value.strip()
|
||||
|
||||
# Log summary if there are keys to skip
|
||||
if existing_env_vars:
|
||||
logger.debug(f"[AgentBridge] {len(existing_env_vars)} env vars already in .env")
|
||||
|
||||
# Write new keys to .env file
|
||||
if keys_to_migrate:
|
||||
raw = conf().get(config_key, "")
|
||||
value = raw.strip() if raw else ""
|
||||
old_value = existing_env_vars.get(env_key)
|
||||
|
||||
if value:
|
||||
if old_value == value:
|
||||
continue
|
||||
existing_env_vars[env_key] = value
|
||||
os.environ[env_key] = value
|
||||
updated = True
|
||||
else:
|
||||
if old_value is None:
|
||||
continue
|
||||
existing_env_vars.pop(env_key, None)
|
||||
os.environ.pop(env_key, None)
|
||||
updated = True
|
||||
updated = True
|
||||
|
||||
if updated:
|
||||
try:
|
||||
# Ensure ~/.cow directory and .env file exist
|
||||
env_dir = os.path.dirname(env_file)
|
||||
if not os.path.exists(env_dir):
|
||||
os.makedirs(env_dir, exist_ok=True)
|
||||
if not os.path.exists(env_file):
|
||||
open(env_file, 'a').close()
|
||||
|
||||
# Append new keys
|
||||
with open(env_file, 'a', encoding='utf-8') as f:
|
||||
f.write('\n# Auto-migrated from config.json\n')
|
||||
for key, value in keys_to_migrate.items():
|
||||
os.makedirs(env_dir, exist_ok=True)
|
||||
|
||||
with open(env_file, 'w', encoding='utf-8') as f:
|
||||
f.write('# Environment variables for agent\n')
|
||||
f.write('# Auto-managed - synced from config.json on startup\n\n')
|
||||
for key, value in sorted(existing_env_vars.items()):
|
||||
f.write(f'{key}={value}\n')
|
||||
# Also set in current process
|
||||
os.environ[key] = value
|
||||
|
||||
logger.info(f"[AgentBridge] Migrated {len(keys_to_migrate)} API keys from config.json to .env: {list(keys_to_migrate.keys())}")
|
||||
|
||||
logger.info(f"[AgentBridge] Synced API keys from config.json to .env")
|
||||
except Exception as e:
|
||||
logger.warning(f"[AgentBridge] Failed to migrate API keys: {e}")
|
||||
logger.warning(f"[AgentBridge] Failed to sync API keys: {e}")
|
||||
|
||||
def _persist_messages(
|
||||
self, session_id: str, new_messages: list, channel_type: str = ""
|
||||
|
||||
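Note on `_migrate_config_to_env`: the method changes from append-only migration to a full sync that adds, updates, and removes keys. The sketch below restates that rule as two pure functions so it can be tested without touching `~/.cow/.env`. `parse_env` and `sync_env` are hypothetical names, and a plain dict stands in for `conf()`.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, val = line.split("=", 1)
            env[key.strip()] = val.strip()
    return env


def sync_env(existing: dict, config: dict, key_mapping: dict):
    """Return (merged_env, updated) after applying config values.

    Non-empty config values are added or overwritten; mapped keys whose
    config value is empty or missing are removed from the result.
    """
    merged = dict(existing)
    updated = False
    for config_key, env_key in key_mapping.items():
        raw = config.get(config_key, "")
        value = raw.strip() if raw else ""
        if value:
            if merged.get(env_key) != value:
                merged[env_key] = value
                updated = True
        elif env_key in merged:
            del merged[env_key]
            updated = True
    return merged, updated
```

Only keys listed in `key_mapping` are ever removed; unrelated entries in the file pass through unchanged.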
@@ -465,8 +465,12 @@ class AgentInitializer:
|
||||
'timezone': timezone_name
|
||||
}
|
||||
|
||||
def get_model():
|
||||
"""Get current model name dynamically from config"""
|
||||
return conf().get("model", "unknown")
|
||||
|
||||
return {
|
||||
"model": conf().get("model", "unknown"),
|
||||
"_get_model": get_model,
|
||||
"workspace": workspace_root,
|
||||
"channel": ", ".join(conf().get("channel_type")) if isinstance(conf().get("channel_type"), list) else conf().get("channel_type", "unknown"),
|
||||
"_get_current_time": get_current_time # Dynamic time function
|
||||
@@ -486,7 +490,7 @@ class AgentInitializer:
|
||||
|
||||
env_file = expand_path("~/.cow/.env")
|
||||
|
||||
# Read existing env vars
|
||||
# Read existing env vars (key -> value)
|
||||
existing_env_vars = {}
|
||||
if os.path.exists(env_file):
|
||||
try:
|
||||
@@ -494,38 +498,46 @@ class AgentInitializer:
|
||||
for line in f:
|
||||
line = line.strip()
|
||||
if line and not line.startswith('#') and '=' in line:
|
||||
key, _ = line.split('=', 1)
|
||||
existing_env_vars[key.strip()] = True
|
||||
key, val = line.split('=', 1)
|
||||
existing_env_vars[key.strip()] = val.strip()
|
||||
except Exception as e:
|
||||
logger.warning(f"[AgentInitializer] Failed to read .env file: {e}")
|
||||
|
||||
# Check which keys need migration
|
||||
keys_to_migrate = {}
|
||||
# Sync config.json values into .env (add/update/remove)
|
||||
updated = False
|
||||
for config_key, env_key in key_mapping.items():
|
||||
if env_key in existing_env_vars:
|
||||
continue
|
||||
value = conf().get(config_key, "")
|
||||
if value and value.strip():
|
||||
keys_to_migrate[env_key] = value.strip()
|
||||
|
||||
# Write new keys
|
||||
if keys_to_migrate:
|
||||
raw = conf().get(config_key, "")
|
||||
value = raw.strip() if raw else ""
|
||||
old_value = existing_env_vars.get(env_key)
|
||||
|
||||
if value:
|
||||
if old_value == value:
|
||||
continue
|
||||
existing_env_vars[env_key] = value
|
||||
os.environ[env_key] = value
|
||||
updated = True
|
||||
else:
|
||||
if old_value is None:
|
||||
continue
|
||||
existing_env_vars.pop(env_key, None)
|
||||
os.environ.pop(env_key, None)
|
||||
updated = True
|
||||
|
||||
if updated:
|
||||
try:
|
||||
env_dir = os.path.dirname(env_file)
|
||||
if not os.path.exists(env_dir):
|
||||
os.makedirs(env_dir, exist_ok=True)
|
||||
if not os.path.exists(env_file):
|
||||
open(env_file, 'a').close()
|
||||
|
||||
with open(env_file, 'a', encoding='utf-8') as f:
|
||||
f.write('\n# Auto-migrated from config.json\n')
|
||||
for key, value in keys_to_migrate.items():
|
||||
os.makedirs(env_dir, exist_ok=True)
|
||||
|
||||
# Rewrite the entire .env file to ensure consistency
|
||||
with open(env_file, 'w', encoding='utf-8') as f:
|
||||
f.write('# Environment variables for agent\n')
|
||||
f.write('# Auto-managed - synced from config.json on startup\n\n')
|
||||
for key, value in sorted(existing_env_vars.items()):
|
||||
f.write(f'{key}={value}\n')
|
||||
os.environ[key] = value
|
||||
|
||||
logger.info(f"[AgentInitializer] Migrated {len(keys_to_migrate)} API keys to .env: {list(keys_to_migrate.keys())}")
|
||||
|
||||
logger.info(f"[AgentInitializer] Synced API keys from config.json to .env")
|
||||
except Exception as e:
|
||||
logger.warning(f"[AgentInitializer] Failed to migrate API keys: {e}")
|
||||
logger.warning(f"[AgentInitializer] Failed to sync API keys: {e}")
|
||||
|
||||
def _start_daily_flush_timer(self):
|
||||
"""Start a background thread that flushes all agents' memory daily at 23:55."""
|
||||
|
||||
@@ -39,11 +39,8 @@ class Bridge(object):
|
||||
self.btype["chat"] = const.BAIDU
|
||||
if model_type in ["xunfei"]:
|
||||
self.btype["chat"] = const.XUNFEI
|
||||
if model_type in [const.QWEN]:
|
||||
self.btype["chat"] = const.QWEN
|
||||
if model_type in [const.QWEN_TURBO, const.QWEN_PLUS, const.QWEN_MAX]:
|
||||
if model_type in [const.QWEN, const.QWEN_TURBO, const.QWEN_PLUS, const.QWEN_MAX]:
|
||||
self.btype["chat"] = const.QWEN_DASHSCOPE
|
||||
# Support Qwen3 and other DashScope models
|
||||
if model_type and (model_type.startswith("qwen") or model_type.startswith("qwq") or model_type.startswith("qvq")):
|
||||
self.btype["chat"] = const.QWEN_DASHSCOPE
|
||||
if model_type and model_type.startswith("gemini"):
|
||||
|
||||
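Note on the Qwen routing in `Bridge`: after this change, `const.QWEN` and every model name starting with `qwen`, `qwq`, or `qvq` resolve to `const.QWEN_DASHSCOPE`. The prefix rule can be sketched as below. The constant value and the default bot type are placeholders rather than values taken from `common/const.py`, and `resolve_chat_bot_type` is a hypothetical name.

```python
QWEN_DASHSCOPE = "dashscope"  # placeholder, not the real const value
DASHSCOPE_PREFIXES = ("qwen", "qwq", "qvq")


def resolve_chat_bot_type(model_type, default="default-bot"):
    """Route DashScope-family model names by prefix; otherwise use default."""
    # str.startswith accepts a tuple, so one call covers all prefixes.
    if model_type and model_type.startswith(DASHSCOPE_PREFIXES):
        return QWEN_DASHSCOPE
    return default
```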
@@ -347,38 +347,30 @@ class ChatChannel(Channel):
|
||||
if media_items:
|
||||
logger.info(f"[chat_channel] Extracted {len(media_items)} media item(s) from reply")
|
||||
|
||||
# Send the text first (keep the original text unchanged)
|
||||
# Send text first (the frontend will embed video players via renderMarkdown).
|
||||
logger.info(f"[chat_channel] Sending text content before media: {reply.content[:100]}...")
|
||||
self._send(reply, context)
|
||||
logger.info(f"[chat_channel] Text sent, now sending {len(media_items)} media item(s)")
|
||||
|
||||
# Then send the media files one by one
|
||||
for i, (url, media_type) in enumerate(media_items):
|
||||
try:
|
||||
# Determine whether it is a local file or a URL
|
||||
# Determine whether it is a remote URL or a local file.
|
||||
if url.startswith(('http://', 'https://')):
|
||||
# Remote resource
|
||||
if media_type == 'video':
|
||||
# Send videos using the FILE type
|
||||
media_reply = Reply(ReplyType.FILE, url)
|
||||
media_reply.file_name = os.path.basename(url)
|
||||
else:
|
||||
# Images use the IMAGE_URL type
|
||||
media_reply = Reply(ReplyType.IMAGE_URL, url)
|
||||
elif os.path.exists(url):
|
||||
# Local file
|
||||
if media_type == 'video':
|
||||
# Videos use the FILE type, converted to a file:// URL
|
||||
media_reply = Reply(ReplyType.FILE, f"file://{url}")
|
||||
media_reply.file_name = os.path.basename(url)
|
||||
else:
|
||||
# Images use the IMAGE_URL type, converted to a file:// URL
|
||||
media_reply = Reply(ReplyType.IMAGE_URL, f"file://{url}")
|
||||
else:
|
||||
logger.warning(f"[chat_channel] Media file not found or invalid URL: {url}")
|
||||
continue
|
||||
|
||||
# Send the media file (add a small delay to avoid rate limiting)
|
||||
if i > 0:
|
||||
time.sleep(0.5)
|
||||
self._send(media_reply, context)
|
||||
|
||||
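Note on the media loop in `ChatChannel`: each extracted item is mapped to a reply type based on whether it is a remote URL or an existing local file, and on whether it is a video. A compact restatement of that decision table follows. `classify_media` is a hypothetical helper, and the strings `"FILE"` and `"IMAGE_URL"` stand in for the `ReplyType` members.

```python
import os


def classify_media(url, media_type, exists=os.path.exists):
    """Return (reply_type, target, file_name), or None if the item is unusable.

    `exists` is injectable so the rule can be tested without real files.
    """
    if url.startswith(("http://", "https://")):
        target = url                 # remote resource, sent as-is
    elif exists(url):
        target = f"file://{url}"     # local file, converted to a file:// URL
    else:
        return None                  # neither a URL nor an existing file
    if media_type == "video":
        return ("FILE", target, os.path.basename(url))
    return ("IMAGE_URL", target, None)
```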
@@ -270,8 +270,42 @@ function createMd() {
|
||||
|
||||
const md = createMd();
|
||||
|
||||
const VIDEO_EXT_RE = /\.(?:mp4|webm|mov|avi|mkv)$/i; // tested against URL without query string
|
||||
|
||||
function _buildVideoHtml(url) {
|
||||
const fileName = url.split('/').pop().split('?')[0];
|
||||
return `<div style="margin:10px 0;">` +
|
||||
`<video controls preload="metadata" ` +
|
||||
`style="max-width:100%;border-radius:10px;box-shadow:0 2px 8px rgba(0,0,0,0.15);display:block;">` +
|
||||
`<source src="${url}"></video>` +
|
||||
`<a href="${url}" target="_blank" ` +
|
||||
`style="display:inline-flex;align-items:center;gap:4px;margin-top:4px;font-size:12px;color:#8b8fa8;text-decoration:none;">` +
|
||||
`<i class="fas fa-download"></i> ${escapeHtml(fileName)}</a></div>`;
|
||||
}
|
||||
|
||||
function injectVideoPlayers(html) {
|
||||
// Step 1: replace markdown-it anchor tags whose href points to a video file.
|
||||
const step1 = html.replace(
|
||||
/<a\s+href="(https?:\/\/[^"]+)"[^>]*>[^<]*<\/a>/gi,
|
||||
(match, url) => VIDEO_EXT_RE.test(url.split('?')[0]) ? _buildVideoHtml(url) : match
|
||||
);
|
||||
// Step 2: replace any remaining bare video URLs in text nodes (not inside HTML tags).
|
||||
// Split on HTML tags to avoid touching src/href attributes already in markup.
|
||||
return step1.split(/(<[^>]+>)/).map((chunk, idx) => {
|
||||
// Even indices are text nodes; odd indices are HTML tags — leave them untouched.
|
||||
if (idx % 2 !== 0) return chunk;
|
||||
return chunk.replace(/https?:\/\/\S+/gi, (url) => {
|
||||
const bare = url.replace(/[),.\s]+$/, ''); // strip trailing punctuation
|
||||
return VIDEO_EXT_RE.test(bare.split('?')[0]) ? _buildVideoHtml(bare) : url;
|
||||
});
|
||||
}).join('');
|
||||
}
|
||||
|
||||
function renderMarkdown(text) {
|
||||
try { return md.render(text); }
|
||||
try {
|
||||
const html = md.render(text);
|
||||
return injectVideoPlayers(html);
|
||||
}
|
||||
catch (e) { return text.replace(/\n/g, '<br>'); }
|
||||
}
|
||||
|
||||
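Note on `injectVideoPlayers`: step 2 finds bare URLs, strips trailing punctuation, and tests the extension against the path with the query string removed. The same matching rule is sketched in Python below so it can be exercised outside the browser. This is a rough port of the text-node matching only, not of the HTML splitting, and `find_video_urls` is a hypothetical name.

```python
import re

VIDEO_EXT_RE = re.compile(r"\.(?:mp4|webm|mov|avi|mkv)$", re.IGNORECASE)
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)
TRAILING_RE = re.compile(r"[),.\s]+$")


def find_video_urls(text):
    """Return bare video URLs found in plain text, in order of appearance."""
    found = []
    for match in URL_RE.finditer(text):
        # Strip punctuation that commonly trails a URL in prose.
        bare = TRAILING_RE.sub("", match.group(0))
        # Test the extension on the part before the query string.
        if VIDEO_EXT_RE.search(bare.split("?")[0]):
            found.append(bare)
    return found
```

A URL whose path ends in a period or closing parenthesis would lose that character under this rule, which is the same trade-off the JavaScript version makes.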
@@ -729,41 +763,60 @@ function sendMessage() {
|
||||
}));
|
||||
}
|
||||
|
||||
fetch('/message', {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify(body)
|
||||
})
|
||||
.then(r => r.json())
|
||||
.then(data => {
|
||||
if (data.status === 'success') {
|
||||
if (data.stream) {
|
||||
startSSE(data.request_id, loadingEl, timestamp);
|
||||
const MAX_RETRIES = 2;
|
||||
const RETRY_DELAY_MS = 1000;
|
||||
|
||||
function postWithRetry(attempt) {
|
||||
fetch('/message', {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify(body)
|
||||
})
|
||||
.then(r => r.json())
|
||||
.then(data => {
|
||||
if (data.status === 'success') {
|
||||
if (data.stream) {
|
||||
startSSE(data.request_id, loadingEl, timestamp);
|
||||
} else {
|
||||
loadingContainers[data.request_id] = loadingEl;
|
||||
if (!isPolling) startPolling();
|
||||
}
|
||||
} else {
|
||||
loadingContainers[data.request_id] = loadingEl;
|
||||
if (!isPolling) startPolling();
|
||||
loadingEl.remove();
|
||||
addBotMessage(t('error_send'), new Date());
|
||||
}
|
||||
})
|
||||
.catch(err => {
|
||||
if (err.name === 'AbortError') {
|
||||
loadingEl.remove();
|
||||
addBotMessage(t('error_timeout'), new Date());
|
||||
return;
|
||||
}
|
||||
if (attempt < MAX_RETRIES) {
|
||||
console.warn(`[sendMessage] attempt ${attempt + 1} failed, retrying...`, err);
|
||||
setTimeout(() => postWithRetry(attempt + 1), RETRY_DELAY_MS * (attempt + 1));
|
||||
return;
|
||||
}
|
||||
} else {
|
||||
loadingEl.remove();
|
||||
addBotMessage(t('error_send'), new Date());
|
||||
}
|
||||
})
|
||||
.catch(err => {
|
||||
loadingEl.remove();
|
||||
addBotMessage(err.name === 'AbortError' ? t('error_timeout') : t('error_send'), new Date());
|
||||
});
|
||||
});
|
||||
}
|
||||
|
||||
postWithRetry(0);
|
||||
}
|
||||
|
||||
function startSSE(requestId, loadingEl, timestamp) {
|
||||
const es = new EventSource(`/stream?request_id=${encodeURIComponent(requestId)}`);
|
||||
activeStreams[requestId] = es;
|
||||
|
||||
let botEl = null;
|
||||
let stepsEl = null; // .agent-steps (thinking summaries + tool indicators)
|
||||
let contentEl = null; // .answer-content (final streaming answer)
|
||||
let mediaEl = null; // .media-content (images & file attachments)
|
||||
let accumulatedText = '';
|
||||
let currentToolEl = null;
|
||||
let done = false;
|
||||
|
||||
const MAX_RECONNECTS = 10;
|
||||
const RECONNECT_BASE_MS = 1000;
|
||||
let reconnectCount = 0;
|
||||
|
||||
function ensureBotEl() {
|
||||
if (botEl) return;
|
||||
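Note on the retry constants: `postWithRetry` waits `RETRY_DELAY_MS * (attempt + 1)` before each retry, and the SSE reconnect handler later in this diff waits `min(RECONNECT_BASE_MS * reconnectCount, 5000)`. The resulting schedules are worked out below as plain arithmetic. The function names are hypothetical, and the constants are copied from the hunks.

```python
MAX_RETRIES = 2
RETRY_DELAY_MS = 1000
MAX_RECONNECTS = 10
RECONNECT_BASE_MS = 1000


def post_retry_delays():
    """Delays before each POST retry; the initial attempt has no delay."""
    return [RETRY_DELAY_MS * (attempt + 1) for attempt in range(MAX_RETRIES)]


def sse_reconnect_delays(cap_ms=5000):
    """Delays before each SSE reconnect; linear growth up to the cap."""
    return [min(RECONNECT_BASE_MS * n, cap_ms) for n in range(1, MAX_RECONNECTS + 1)]
```

Under these values a failing POST is retried after 1 s and 2 s, while a dropped stream can wait up to 40 s in total across ten reconnect attempts, assuming no message resets the counter in between.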
@@ -788,162 +841,204 @@ function startSSE(requestId, loadingEl, timestamp) {
|
||||
mediaEl = botEl.querySelector('.media-content');
|
||||
}
|
||||
|
||||
es.onmessage = function(e) {
|
||||
let item;
|
||||
try { item = JSON.parse(e.data); } catch (_) { return; }
|
||||
function connect() {
|
||||
const es = new EventSource(`/stream?request_id=${encodeURIComponent(requestId)}`);
|
||||
activeStreams[requestId] = es;
|
||||
|
||||
if (item.type === 'delta') {
|
||||
ensureBotEl();
|
||||
accumulatedText += item.content;
|
||||
contentEl.innerHTML = renderMarkdown(accumulatedText);
|
||||
scrollChatToBottom();
|
||||
es.onmessage = function(e) {
|
||||
let item;
|
||||
try { item = JSON.parse(e.data); } catch (_) { return; }
|
||||
|
||||
} else if (item.type === 'tool_start') {
|
||||
ensureBotEl();
|
||||
// Successful data received, reset reconnect counter
|
||||
reconnectCount = 0;
|
||||
|
||||
// Save current thinking as a collapsible step
|
||||
if (accumulatedText.trim()) {
|
||||
const fullText = accumulatedText.trim();
|
||||
const oneLine = fullText.replace(/\n+/g, ' ');
|
||||
const needsTruncate = oneLine.length > 80;
|
||||
const stepEl = document.createElement('div');
|
||||
stepEl.className = 'agent-step agent-thinking-step' + (needsTruncate ? '' : ' no-expand');
|
||||
if (needsTruncate) {
|
||||
const truncated = oneLine.substring(0, 80) + '…';
|
||||
stepEl.innerHTML = `
|
||||
<div class="thinking-header" onclick="this.parentElement.classList.toggle('expanded')">
|
||||
<i class="fas fa-lightbulb text-amber-400 flex-shrink-0"></i>
|
||||
<span class="thinking-summary">${escapeHtml(truncated)}</span>
|
||||
<i class="fas fa-chevron-right thinking-chevron"></i>
|
||||
</div>
|
||||
<div class="thinking-full">${renderMarkdown(fullText)}</div>`;
|
||||
} else {
|
||||
stepEl.innerHTML = `
|
||||
<div class="thinking-header no-toggle">
|
||||
<i class="fas fa-lightbulb text-amber-400 flex-shrink-0"></i>
|
||||
<span>${escapeHtml(oneLine)}</span>
|
||||
</div>`;
|
||||
if (item.type === 'delta') {
|
||||
ensureBotEl();
|
||||
accumulatedText += item.content;
|
||||
contentEl.innerHTML = renderMarkdown(accumulatedText);
|
||||
scrollChatToBottom();
|
||||
|
||||
} else if (item.type === 'tool_start') {
|
||||
ensureBotEl();
|
||||
|
||||
// Save current thinking as a collapsible step
|
||||
if (accumulatedText.trim()) {
|
||||
const fullText = accumulatedText.trim();
|
||||
const oneLine = fullText.replace(/\n+/g, ' ');
|
||||
const needsTruncate = oneLine.length > 80;
|
||||
const stepEl = document.createElement('div');
|
||||
stepEl.className = 'agent-step agent-thinking-step' + (needsTruncate ? '' : ' no-expand');
|
||||
if (needsTruncate) {
|
||||
const truncated = oneLine.substring(0, 80) + '…';
|
||||
stepEl.innerHTML = `
|
||||
<div class="thinking-header" onclick="this.parentElement.classList.toggle('expanded')">
|
||||
<i class="fas fa-lightbulb text-amber-400 flex-shrink-0"></i>
|
||||
<span class="thinking-summary">${escapeHtml(truncated)}</span>
|
||||
<i class="fas fa-chevron-right thinking-chevron"></i>
|
||||
</div>
|
||||
<div class="thinking-full">${renderMarkdown(fullText)}</div>`;
|
||||
} else {
|
||||
stepEl.innerHTML = `
|
||||
<div class="thinking-header no-toggle">
|
||||
<i class="fas fa-lightbulb text-amber-400 flex-shrink-0"></i>
|
||||
<span>${escapeHtml(oneLine)}</span>
|
||||
</div>`;
|
||||
}
|
||||
stepsEl.appendChild(stepEl);
|
||||
}
|
||||
stepsEl.appendChild(stepEl);
|
||||
}
|
||||
accumulatedText = '';
|
||||
contentEl.innerHTML = '';
|
||||
accumulatedText = '';
|
||||
contentEl.innerHTML = '';
|
||||
|
||||
// Add tool execution indicator (collapsible)
|
||||
currentToolEl = document.createElement('div');
|
||||
currentToolEl.className = 'agent-step agent-tool-step';
|
||||
const argsStr = formatToolArgs(item.arguments || {});
|
||||
currentToolEl.innerHTML = `
|
||||
<div class="tool-header" onclick="this.parentElement.classList.toggle('expanded')">
|
||||
<i class="fas fa-cog fa-spin text-primary-400 flex-shrink-0 tool-icon"></i>
|
||||
<span class="tool-name">${item.tool}</span>
|
||||
<i class="fas fa-chevron-right tool-chevron"></i>
|
||||
</div>
|
||||
<div class="tool-detail">
|
||||
<div class="tool-detail-section">
|
||||
<div class="tool-detail-label">Input</div>
|
||||
<pre class="tool-detail-content">${argsStr}</pre>
|
||||
// Add tool execution indicator (collapsible)
|
||||
currentToolEl = document.createElement('div');
|
||||
currentToolEl.className = 'agent-step agent-tool-step';
|
||||
const argsStr = formatToolArgs(item.arguments || {});
|
||||
currentToolEl.innerHTML = `
|
||||
<div class="tool-header" onclick="this.parentElement.classList.toggle('expanded')">
|
||||
<i class="fas fa-cog fa-spin text-primary-400 flex-shrink-0 tool-icon"></i>
|
||||
<span class="tool-name">${item.tool}</span>
|
||||
<i class="fas fa-chevron-right tool-chevron"></i>
|
||||
</div>
|
||||
<div class="tool-detail-section tool-output-section"></div>
|
||||
</div>`;
|
||||
stepsEl.appendChild(currentToolEl);
|
||||
<div class="tool-detail">
|
||||
<div class="tool-detail-section">
|
||||
<div class="tool-detail-label">Input</div>
|
||||
<pre class="tool-detail-content">${argsStr}</pre>
|
||||
</div>
|
||||
<div class="tool-detail-section tool-output-section"></div>
|
||||
</div>`;
|
||||
stepsEl.appendChild(currentToolEl);
|
||||
|
||||
scrollChatToBottom();
|
||||
scrollChatToBottom();
|
||||
|
||||
} else if (item.type === 'tool_end') {
|
||||
if (currentToolEl) {
|
||||
const isError = item.status !== 'success';
|
||||
const icon = currentToolEl.querySelector('.tool-icon');
|
||||
icon.className = isError
|
||||
? 'fas fa-times text-red-400 flex-shrink-0 tool-icon'
|
||||
: 'fas fa-check text-primary-400 flex-shrink-0 tool-icon';
|
||||
} else if (item.type === 'tool_end') {
|
||||
if (currentToolEl) {
|
||||
const isError = item.status !== 'success';
|
||||
const icon = currentToolEl.querySelector('.tool-icon');
|
||||
icon.className = isError
|
||||
? 'fas fa-times text-red-400 flex-shrink-0 tool-icon'
|
||||
: 'fas fa-check text-primary-400 flex-shrink-0 tool-icon';
|
||||
|
||||
// Show execution time
|
||||
const nameEl = currentToolEl.querySelector('.tool-name');
|
||||
if (item.execution_time !== undefined) {
|
||||
nameEl.innerHTML += ` <span class="tool-time">${item.execution_time}s</span>`;
|
||||
// Show execution time
|
||||
const nameEl = currentToolEl.querySelector('.tool-name');
|
||||
if (item.execution_time !== undefined) {
|
||||
nameEl.innerHTML += ` <span class="tool-time">${item.execution_time}s</span>`;
|
||||
}
|
||||
|
||||
// Fill output section
|
||||
const outputSection = currentToolEl.querySelector('.tool-output-section');
|
||||
if (outputSection && item.result) {
|
||||
outputSection.innerHTML = `
|
||||
<div class="tool-detail-label">${isError ? 'Error' : 'Output'}</div>
|
||||
<pre class="tool-detail-content ${isError ? 'tool-error-text' : ''}">${escapeHtml(String(item.result))}</pre>`;
|
||||
}
|
||||
|
||||
if (isError) currentToolEl.classList.add('tool-failed');
|
||||
currentToolEl = null;
|
||||
}
|
||||
|
||||
// Fill output section
|
||||
const outputSection = currentToolEl.querySelector('.tool-output-section');
|
||||
if (outputSection && item.result) {
|
||||
outputSection.innerHTML = `
|
||||
<div class="tool-detail-label">${isError ? 'Error' : 'Output'}</div>
|
||||
<pre class="tool-detail-content ${isError ? 'tool-error-text' : ''}">${escapeHtml(String(item.result))}</pre>`;
|
||||
}
|
||||
} else if (item.type === 'image') {
|
||||
ensureBotEl();
|
||||
const imgEl = document.createElement('img');
|
||||
imgEl.src = item.content;
|
||||
imgEl.alt = 'screenshot';
|
||||
imgEl.style.cssText = 'max-width:600px;border-radius:8px;margin:8px 0;cursor:pointer;box-shadow:0 1px 4px rgba(0,0,0,0.1);';
|
||||
imgEl.onclick = () => window.open(item.content, '_blank');
|
||||
mediaEl.appendChild(imgEl);
|
||||
scrollChatToBottom();
|
||||
|
||||
if (isError) currentToolEl.classList.add('tool-failed');
|
||||
currentToolEl = null;
|
||||
} else if (item.type === 'text') {
|
||||
// Intermediate text sent before media items; display it but keep SSE open.
|
||||
ensureBotEl();
|
||||
contentEl.classList.remove('sse-streaming');
|
||||
const textContent = item.content || accumulatedText;
|
||||
if (textContent) contentEl.innerHTML = renderMarkdown(textContent);
|
||||
applyHighlighting(botEl);
|
||||
scrollChatToBottom();
|
||||
|
||||
} else if (item.type === 'video') {
|
||||
ensureBotEl();
|
||||
const wrapper = document.createElement('div');
|
||||
wrapper.innerHTML = _buildVideoHtml(item.content);
|
||||
mediaEl.appendChild(wrapper.firstElementChild || wrapper);
|
||||
scrollChatToBottom();
|
||||
|
||||
} else if (item.type === 'file') {
|
||||
ensureBotEl();
|
||||
const fileName = item.file_name || item.content.split('/').pop();
|
||||
const fileEl = document.createElement('a');
|
||||
fileEl.href = item.content;
|
||||
fileEl.download = fileName;
|
||||
fileEl.target = '_blank';
|
||||
fileEl.className = 'file-attachment';
|
||||
fileEl.style.cssText = 'display:inline-flex;align-items:center;gap:6px;padding:8px 14px;margin:8px 0;border-radius:8px;background:var(--bg-secondary,#f3f4f6);color:var(--text-primary,#374151);text-decoration:none;font-size:14px;border:1px solid var(--border-color,#e5e7eb);';
|
||||
fileEl.innerHTML = `<i class="fas fa-file-download" style="color:#6b7280;"></i> ${fileName}`;
|
||||
mediaEl.appendChild(fileEl);
|
||||
scrollChatToBottom();
|
||||
|
||||
} else if (item.type === 'phase') {
|
||||
// Coarse progress (e.g. cow install-browser); must not close SSE (unlike "done")
|
||||
ensureBotEl();
|
||||
const wrap = document.createElement('div');
|
||||
wrap.className = 'text-xs sm:text-sm text-slate-600 dark:text-slate-400 border-l-2 border-primary-400 pl-2 py-1 my-0.5';
|
||||
wrap.textContent = String(item.content || '');
|
||||
stepsEl.appendChild(wrap);
|
||||
scrollChatToBottom();
|
||||
|
||||
} else if (item.type === 'done') {
|
||||
done = true;
|
||||
es.close();
|
||||
delete activeStreams[requestId];
|
||||
|
||||
// item.content may be empty when "done" is only a stream-close signal after media.
|
||||
const finalText = item.content || accumulatedText;
|
||||
|
||||
if (!botEl && finalText) {
|
||||
if (loadingEl) { loadingEl.remove(); loadingEl = null; }
|
||||
addBotMessage(finalText, new Date((item.timestamp || Date.now() / 1000) * 1000), requestId);
|
||||
} else if (botEl) {
|
||||
contentEl.classList.remove('sse-streaming');
|
||||
// Only update text content when there is something new to show.
|
||||
if (finalText) contentEl.innerHTML = renderMarkdown(finalText);
|
||||
applyHighlighting(botEl);
|
||||
}
|
||||
scrollChatToBottom();
|
||||
|
||||
} else if (item.type === 'error') {
|
||||
done = true;
|
||||
es.close();
|
||||
delete activeStreams[requestId];
|
||||
if (loadingEl) { loadingEl.remove(); loadingEl = null; }
|
||||
addBotMessage(t('error_send'), new Date());
|
||||
}
|
||||
};
|
||||
|
||||
} else if (item.type === 'image') {
|
||||
ensureBotEl();
|
||||
const imgEl = document.createElement('img');
|
||||
imgEl.src = item.content;
|
||||
imgEl.alt = 'screenshot';
|
||||
imgEl.style.cssText = 'max-width:600px;border-radius:8px;margin:8px 0;cursor:pointer;box-shadow:0 1px 4px rgba(0,0,0,0.1);';
|
||||
imgEl.onclick = () => window.open(item.content, '_blank');
|
||||
mediaEl.appendChild(imgEl);
|
||||
scrollChatToBottom();
|
||||
|
||||
} else if (item.type === 'file') {
|
||||
ensureBotEl();
|
||||
const fileName = item.file_name || item.content.split('/').pop();
|
||||
const fileEl = document.createElement('a');
|
||||
fileEl.href = item.content;
|
||||
fileEl.download = fileName;
|
||||
fileEl.target = '_blank';
|
||||
fileEl.className = 'file-attachment';
|
||||
fileEl.style.cssText = 'display:inline-flex;align-items:center;gap:6px;padding:8px 14px;margin:8px 0;border-radius:8px;background:var(--bg-secondary,#f3f4f6);color:var(--text-primary,#374151);text-decoration:none;font-size:14px;border:1px solid var(--border-color,#e5e7eb);';
|
||||
fileEl.innerHTML = `<i class="fas fa-file-download" style="color:#6b7280;"></i> ${fileName}`;
|
||||
mediaEl.appendChild(fileEl);
|
||||
scrollChatToBottom();
|
||||
|
||||
} else if (item.type === 'phase') {
|
||||
// Coarse progress (e.g. cow install-browser); must not close SSE (unlike "done")
|
||||
ensureBotEl();
|
||||
const wrap = document.createElement('div');
|
||||
wrap.className = 'text-xs sm:text-sm text-slate-600 dark:text-slate-400 border-l-2 border-primary-400 pl-2 py-1 my-0.5';
|
||||
wrap.textContent = String(item.content || '');
|
||||
stepsEl.appendChild(wrap);
|
||||
scrollChatToBottom();
|
||||
|
||||
} else if (item.type === 'done') {
|
||||
es.onerror = function() {
|
||||
es.close();
|
||||
delete activeStreams[requestId];
|
||||
|
||||
const finalText = item.content || accumulatedText;
|
||||
if (done) return;
|
||||
|
||||
if (!botEl && finalText) {
|
||||
if (loadingEl) { loadingEl.remove(); loadingEl = null; }
|
||||
addBotMessage(finalText, new Date((item.timestamp || Date.now() / 1000) * 1000), requestId);
|
||||
} else if (botEl) {
|
||||
if (reconnectCount < MAX_RECONNECTS) {
|
||||
reconnectCount++;
|
||||
const delay = Math.min(RECONNECT_BASE_MS * reconnectCount, 5000);
|
||||
console.warn(`[SSE] connection lost for ${requestId}, reconnecting in ${delay}ms (attempt ${reconnectCount}/${MAX_RECONNECTS})`);
|
||||
setTimeout(connect, delay);
|
||||
return;
|
||||
}
|
||||
|
||||
// Exhausted retries, show whatever we have
|
||||
if (loadingEl) { loadingEl.remove(); loadingEl = null; }
|
||||
if (!botEl) {
|
||||
addBotMessage(t('error_send'), new Date());
|
||||
} else if (accumulatedText) {
|
||||
contentEl.classList.remove('sse-streaming');
|
||||
if (finalText) contentEl.innerHTML = renderMarkdown(finalText);
|
||||
contentEl.innerHTML = renderMarkdown(accumulatedText);
|
||||
applyHighlighting(botEl);
|
||||
}
|
||||
scrollChatToBottom();
|
||||
};
|
||||
}
|
||||
|
||||
} else if (item.type === 'error') {
|
||||
es.close();
|
||||
delete activeStreams[requestId];
|
||||
if (loadingEl) { loadingEl.remove(); loadingEl = null; }
|
||||
addBotMessage(t('error_send'), new Date());
|
||||
}
|
||||
};
|
||||
|
||||
es.onerror = function() {
|
||||
es.close();
|
||||
delete activeStreams[requestId];
|
||||
if (loadingEl) { loadingEl.remove(); loadingEl = null; }
|
||||
if (!botEl) {
|
||||
addBotMessage(t('error_send'), new Date());
|
||||
} else if (accumulatedText) {
|
||||
contentEl.classList.remove('sse-streaming');
|
||||
contentEl.innerHTML = renderMarkdown(accumulatedText);
|
||||
applyHighlighting(botEl);
|
||||
}
|
||||
};
|
||||
connect();
|
||||
}
|
||||
|
||||
function startPolling() {
|
||||
|
||||
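The reconnect branch above computes its delay as `Math.min(RECONNECT_BASE_MS * reconnectCount, 5000)`, which is a linear backoff capped at 5 seconds. The schedule this produces can be sketched in Python as follows. The values of `RECONNECT_BASE_MS` and `MAX_RECONNECTS` are not visible in this hunk, so the numbers used here (1000 ms, 6 attempts) and the helper name `reconnect_delays` are assumptions for illustration only.

```python
from typing import List


def reconnect_delays(base_ms: int, max_reconnects: int, cap_ms: int = 5000) -> List[int]:
    """Return the delay (ms) before each reconnect attempt.

    Hypothetical helper mirroring min(base * attempt, cap); the attempt
    counter starts at 1 because the counter is incremented before the
    delay is computed.
    """
    return [min(base_ms * attempt, cap_ms) for attempt in range(1, max_reconnects + 1)]


# Assumed values; the real constants are defined outside the shown hunk.
schedule = reconnect_delays(base_ms=1000, max_reconnects=6)
print(schedule)
```

With these assumed values the delay grows by one second per attempt and stops growing once it reaches the 5000 ms cap.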
@@ -126,6 +126,13 @@ class WebChannel(ChatChannel):
|
||||
logger.debug(f"SSE skipped duplicate file for request {request_id}")
|
||||
return
|
||||
|
||||
# Skip http-URL FILE/IMAGE_URL replies produced by chat_channel's media extraction:
|
||||
# the text reply (already sent as "done") contains the URL and the frontend will
|
||||
# render it via renderMarkdown/injectVideoPlayers, so no separate SSE event needed.
|
||||
if reply.type in (ReplyType.FILE, ReplyType.IMAGE_URL) and content.startswith(("http://", "https://")):
|
||||
logger.debug(f"SSE skipped http media reply for request {request_id}")
|
||||
return
|
||||
|
||||
self.sse_queues[request_id].put({
|
||||
"type": "done",
|
||||
"content": content,
|
||||
@@ -322,14 +329,18 @@ class WebChannel(ChatChannel):
         """
         SSE generator for a given request_id.
         Yields UTF-8 encoded bytes to avoid WSGI Latin-1 mangling.
+        Supports client reconnection: the queue is only removed after a
+        "done" event is consumed, so a new GET /stream with the same
+        request_id can resume reading remaining events.
         """
         if request_id not in self.sse_queues:
             yield b"data: {\"type\": \"error\", \"message\": \"invalid request_id\"}\n\n"
             return
 
         q = self.sse_queues[request_id]
-        timeout = 300 # 5 minutes max
-        deadline = time.time() + timeout
+        idle_timeout = 600 # 10 minutes without any real event
+        deadline = time.time() + idle_timeout
+        done = False
 
         try:
             while time.time() < deadline:
@@ -339,13 +350,18 @@ class WebChannel(ChatChannel):
                     yield b": keepalive\n\n"
                     continue
 
+                # Real event received, reset idle deadline
+                deadline = time.time() + idle_timeout
+
                 payload = json.dumps(item, ensure_ascii=False)
                 yield f"data: {payload}\n\n".encode("utf-8")
 
                 if item.get("type") == "done":
+                    done = True
                     break
         finally:
-            self.sse_queues.pop(request_id, None)
+            if done:
+                self.sse_queues.pop(request_id, None)
 
     def poll_response(self):
         """
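The idle-deadline behaviour introduced by the two hunks above (keepalives do not extend the deadline, real events do, and the loop ends on a "done" event) can be sketched in isolation. This is a minimal sketch and not the project's code: the function name `stream_events`, the short timeouts, and the sample events are assumptions chosen so the example runs quickly.

```python
import json
import queue
import time


def stream_events(q, idle_timeout=1.0, poll_interval=0.05):
    """Yield SSE-framed bytes from q until a "done" event or idle timeout.

    Hypothetical helper; names and defaults are illustrative assumptions.
    """
    deadline = time.time() + idle_timeout
    while time.time() < deadline:
        try:
            item = q.get(timeout=poll_interval)
        except queue.Empty:
            # No event yet: emit an SSE comment line as a keepalive.
            # The deadline is deliberately NOT reset here.
            yield b": keepalive\n\n"
            continue

        # A real event arrived, so push the idle deadline forward.
        deadline = time.time() + idle_timeout

        payload = json.dumps(item, ensure_ascii=False)
        yield f"data: {payload}\n\n".encode("utf-8")

        if item.get("type") == "done":
            break


events = queue.Queue()
events.put({"type": "phase", "content": "working"})
events.put({"type": "done", "content": "finished"})
frames = list(stream_events(events))
print(frames)
```

Because only real events reset the deadline, a stream that emits nothing but keepalives still terminates after `idle_timeout`, while a long-running task that keeps reporting progress is not cut off.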
@@ -447,8 +463,14 @@ class WebChannel(ChatChannel):
|
||||
func = web.httpserver.StaticMiddleware(app.wsgifunc())
|
||||
func = web.httpserver.LogMiddleware(func)
|
||||
server = web.httpserver.WSGIServer(("0.0.0.0", port), func)
|
||||
# Allow concurrent requests by not blocking on in-flight handler threads
|
||||
server.daemon_threads = True
|
||||
# Default request_queue_size(5) / timeout(10s) / numthreads(10) are
|
||||
# too small: when SSE streams occupy many threads, the backlog fills
|
||||
# and new connections get refused (ERR_CONNECTION_ABORTED).
|
||||
server.request_queue_size = 128
|
||||
server.timeout = 300
|
||||
server.requests.min = 20
|
||||
server.requests.max = 80
|
||||
self._http_server = server
|
||||
try:
|
||||
server.start()
|
||||
@@ -563,7 +585,7 @@ class ConfigHandler:
|
||||
_RECOMMENDED_MODELS = [
|
||||
const.MINIMAX_M2_7, const.MINIMAX_M2_5, const.MINIMAX_M2_1, const.MINIMAX_M2_1_LIGHTNING,
|
||||
const.GLM_5_TURBO, const.GLM_5, const.GLM_4_7,
|
||||
const.QWEN3_MAX, const.QWEN35_PLUS,
|
||||
const.QWEN36_PLUS, const.QWEN35_PLUS, const.QWEN3_MAX,
|
||||
const.KIMI_K2_5, const.KIMI_K2,
|
||||
const.DOUBAO_SEED_2_PRO, const.DOUBAO_SEED_2_CODE,
|
||||
const.CLAUDE_4_6_SONNET, const.CLAUDE_4_6_OPUS, const.CLAUDE_4_5_SONNET,
|
||||
@@ -592,7 +614,7 @@ class ConfigHandler:
|
||||
"api_key_field": "dashscope_api_key",
|
||||
"api_base_key": None,
|
||||
"api_base_default": None,
|
||||
"models": [const.QWEN3_MAX, const.QWEN35_PLUS],
|
||||
"models": [const.QWEN36_PLUS, const.QWEN35_PLUS, const.QWEN3_MAX],
|
||||
}),
|
||||
("moonshot", {
|
||||
"label": "Kimi",
|
||||
|
||||
@@ -37,11 +37,19 @@ def _random_wechat_uin() -> str:
     return base64.b64encode(str(val).encode("utf-8")).decode("utf-8")
 
 
+CHANNEL_VERSION = "2.0.0"
+# iLink-App-ClientVersion: uint32 encoded as major<<16 | minor<<8 | patch
+# 2.0.0 → 0x00020000 = 131072
+CLIENT_VERSION = "131072"
+
+
 def _build_headers(token: str = "") -> dict:
     headers = {
         "Content-Type": "application/json",
         "AuthorizationType": "ilink_bot_token",
         "X-WECHAT-UIN": _random_wechat_uin(),
+        "iLink-App-Id": "bot",
+        "iLink-App-ClientVersion": CLIENT_VERSION,
     }
     if token:
         headers["Authorization"] = f"Bearer {token}"
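The comment in the hunk above describes `CLIENT_VERSION` as a uint32 packed as `major<<16 | minor<<8 | patch`. A minimal sketch of that packing follows, assuming a plain `major.minor.patch` string as input. The helper name `encode_client_version` and the range checks are illustrative assumptions and are not part of the diff, which hard-codes the string `"131072"`.

```python
def encode_client_version(version: str) -> int:
    """Pack "major.minor.patch" as major<<16 | minor<<8 | patch.

    Hypothetical helper; the range checks are an assumption that follows
    from the bit layout (8 bits each for minor and patch).
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if not 0 <= major <= 0xFFFF:
        raise ValueError("major must fit in 16 bits")
    if not (0 <= minor <= 0xFF and 0 <= patch <= 0xFF):
        raise ValueError("minor and patch must each fit in 8 bits")
    return (major << 16) | (minor << 8) | patch


# 2.0.0 -> 0x00020000, matching the value hard-coded in the diff.
print(encode_client_version("2.0.0"))
```

Deriving the header value from `CHANNEL_VERSION` this way would keep the two constants in sync, although the diff itself keeps them as separate literals.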
@@ -64,6 +72,7 @@ class WeixinApi:
     def _post(self, endpoint: str, body: dict, timeout: int = DEFAULT_API_TIMEOUT) -> dict:
         url = _ensure_trailing_slash(self.base_url) + endpoint
         headers = _build_headers(self.token)
+        body.setdefault("base_info", {}).setdefault("channel_version", CHANNEL_VERSION)
         try:
             resp = requests.post(url, json=body, headers=headers, timeout=timeout)
             resp.raise_for_status()
@@ -210,7 +219,10 @@ class WeixinApi:
     def poll_qr_status(self, qrcode: str, timeout: int = QR_POLL_TIMEOUT) -> dict:
         url = (_ensure_trailing_slash(self.base_url) +
                f"ilink/bot/get_qrcode_status?qrcode={requests.utils.quote(qrcode)}")
-        headers = {"iLink-App-ClientVersion": "1"}
+        headers = {
+            "iLink-App-Id": "bot",
+            "iLink-App-ClientVersion": CLIENT_VERSION,
+        }
         try:
             resp = requests.get(url, headers=headers, timeout=timeout)
             resp.raise_for_status()
|
||||
@@ -166,10 +166,18 @@ class WeixinChannel(ChatChannel):
|
||||
print("=" * 60)
|
||||
try:
|
||||
import qrcode as qr_lib
|
||||
import io
|
||||
qr = qr_lib.QRCode(error_correction=qr_lib.constants.ERROR_CORRECT_L, box_size=1, border=1)
|
||||
qr.add_data(qrcode_url)
|
||||
qr.make(fit=True)
|
||||
qr.print_ascii(invert=True)
|
||||
buf = io.StringIO()
|
||||
qr.print_ascii(out=buf, invert=True)
|
||||
try:
|
||||
print(buf.getvalue())
|
||||
except UnicodeEncodeError:
|
||||
# Windows GBK terminals cannot render Unicode block characters
|
||||
print(f"\n (终端不支持显示二维码,请使用链接扫码)")
|
||||
print(f" 二维码链接: {qrcode_url}\n")
|
||||
except ImportError:
|
||||
print(f"\n 二维码链接: {qrcode_url}")
|
||||
print(" (安装 'qrcode' 包可在终端显示二维码)\n")
|
||||
|
||||
@@ -178,7 +178,10 @@ def update(ctx):
|
||||
"""Update CowAgent and restart."""
|
||||
root = get_project_root()
|
||||
|
||||
# 1. Git pull while service is still running
|
||||
# 1. Stop service first so git pull won't conflict with running code
|
||||
ctx.invoke(stop)
|
||||
|
||||
# 2. Git pull
|
||||
if os.path.isdir(os.path.join(root, ".git")):
|
||||
click.echo("Pulling latest code...")
|
||||
ret = subprocess.call(["git", "pull"], cwd=root)
|
||||
@@ -188,28 +191,61 @@ def update(ctx):
|
||||
else:
|
||||
click.echo("Not a git repository, skipping code update.")
|
||||
|
||||
# 2. Stop service
|
||||
ctx.invoke(stop)
|
||||
|
||||
# 3. Install dependencies
|
||||
python = sys.executable
|
||||
req_file = os.path.join(root, "requirements.txt")
|
||||
if os.path.exists(req_file):
|
||||
click.echo("Installing dependencies...")
|
||||
subprocess.call(
|
||||
[python, "-m", "pip", "install", "-r", "requirements.txt", "-q"],
|
||||
|
||||
if _IS_WIN:
|
||||
# On Windows, `cow.exe` (this process) locks the exe file, so
|
||||
# `pip install -e .` fails with WinError 5. Write a small .bat
|
||||
# helper that waits for cow.exe to exit, then installs & starts.
|
||||
bat = os.path.join(root, "_cow_update.bat")
|
||||
lines = [
|
||||
"@echo off",
|
||||
"chcp 65001 >nul",
|
||||
"echo Waiting for cow.exe to exit...",
|
||||
"timeout /t 3 /nobreak >nul",
|
||||
]
|
||||
if os.path.exists(req_file):
|
||||
lines.append(f'echo Installing dependencies...')
|
||||
lines.append(f'"{python}" -m pip install -r requirements.txt -q')
|
||||
lines += [
|
||||
"echo Reinstalling cow CLI...",
|
||||
f'"{python}" -m pip install -e . -q',
|
||||
"echo Starting CowAgent...",
|
||||
f'"{python}" -m cli.cli start --no-logs',
|
||||
"echo.",
|
||||
"echo Update complete. You can close this window.",
|
||||
"pause >nul",
|
||||
"del \"%~f0\"",
|
||||
]
|
||||
with open(bat, "w", encoding="utf-8") as f:
|
||||
f.write("\n".join(lines) + "\n")
|
||||
|
||||
subprocess.Popen(
|
||||
["cmd.exe", "/c", "start", "CowAgent Update", "/wait", bat],
|
||||
cwd=root,
|
||||
)
|
||||
click.echo(click.style(
|
||||
"✓ Update script launched. Please follow the new window for progress.",
|
||||
fg="green"))
|
||||
else:
|
||||
# 3. Install dependencies
|
||||
if os.path.exists(req_file):
|
||||
click.echo("Installing dependencies...")
|
||||
subprocess.call(
|
||||
[python, "-m", "pip", "install", "-r", "requirements.txt", "-q"],
|
||||
cwd=root,
|
||||
)
|
||||
click.echo("Reinstalling cow CLI...")
|
||||
subprocess.call(
|
||||
[python, "-m", "pip", "install", "-e", ".", "-q"],
|
||||
cwd=root,
|
||||
)
|
||||
click.echo("Reinstalling cow CLI...")
|
||||
subprocess.call(
|
||||
[python, "-m", "pip", "install", "-e", ".", "-q"],
|
||||
cwd=root,
|
||||
)
|
||||
|
||||
# 4. Start service and follow logs
|
||||
click.echo("")
|
||||
time.sleep(1)
|
||||
ctx.invoke(start, no_logs=False)
|
||||
# 4. Start service
|
||||
click.echo("")
|
||||
time.sleep(1)
|
||||
ctx.invoke(start, no_logs=False)
|
||||
|
||||
|
||||
@click.command()
|
||||
|
||||
@@ -47,8 +47,8 @@ CREDENTIAL_MAP = {
|
||||
|
||||
|
||||
class CloudClient(LinkAIClient):
|
||||
def __init__(self, api_key: str, channel, host: str = ""):
|
||||
super().__init__(api_key, host)
|
||||
def __init__(self, api_key: str, channel, host: str = "", port=None):
|
||||
super().__init__(api_key, host, port=port)
|
||||
self.channel = channel
|
||||
self.client_type = channel.channel_type
|
||||
self.channel_mgr = None
|
||||
@@ -733,7 +733,7 @@ def start(channel, channel_mgr=None):
|
||||
return
|
||||
|
||||
global chat_client
|
||||
chat_client = CloudClient(api_key=conf().get("linkai_api_key"), host=conf().get("cloud_host", ""), channel=channel)
|
||||
chat_client = CloudClient(api_key=conf().get("linkai_api_key"), host=conf().get("cloud_host", ""), port=conf().get("cloud_port"), channel=channel)
|
||||
chat_client.channel_mgr = channel_mgr
|
||||
chat_client.config = _build_config()
|
||||
chat_client.start()
|
||||
|
||||
@@ -7,8 +7,8 @@ XUNFEI = "xunfei"
 CHATGPTONAZURE = "chatGPTOnAzure"
 LINKAI = "linkai"
 CLAUDEAPI= "claudeAPI"
-QWEN = "qwen" # legacy Qwen integration
-QWEN_DASHSCOPE = "dashscope" # new Qwen integration (Bailian)
+QWEN = "qwen" # Qwen (kept for legacy configs; actually routed to DashscopeBot)
+QWEN_DASHSCOPE = "dashscope" # Qwen DashScope integration
 GEMINI = "gemini"
 ZHIPU_AI = "zhipu"
 MOONSHOT = "moonshot"
@@ -81,14 +81,14 @@ TTS_1_HD = "tts-1-hd"
 DEEPSEEK_CHAT = "deepseek-chat" # DeepSeek-V3 chat model
 DEEPSEEK_REASONER = "deepseek-reasoner" # DeepSeek-R1 model
 
-# Qwen (Tongyi Qianwen - Alibaba Cloud)
-QWEN = "qwen"
+# Qwen (Tongyi Qianwen - Alibaba Cloud DashScope)
 QWEN_TURBO = "qwen-turbo"
 QWEN_PLUS = "qwen-plus"
 QWEN_MAX = "qwen-max"
 QWEN_LONG = "qwen-long"
 QWEN3_MAX = "qwen3-max" # Qwen3 Max - recommended model for Agent mode
 QWEN35_PLUS = "qwen3.5-plus" # Qwen3.5 Plus - Omni model (MultiModalConversation)
+QWEN36_PLUS = "qwen3.6-plus" # Qwen3.6 Plus - Omni model (MultiModalConversation)
 QWQ_PLUS = "qwq-plus"
 
 # MiniMax
@@ -172,7 +172,7 @@ MODEL_LIST = [
|
||||
DEEPSEEK_CHAT, DEEPSEEK_REASONER,
|
||||
|
||||
# Qwen
|
||||
QWEN, QWEN_TURBO, QWEN_PLUS, QWEN_MAX, QWEN_LONG, QWEN3_MAX, QWEN35_PLUS,
|
||||
QWEN36_PLUS, QWEN35_PLUS, QWEN3_MAX, QWEN_MAX, QWEN_PLUS, QWEN_TURBO, QWEN_LONG,
|
||||
|
||||
# MiniMax
|
||||
MiniMax, MINIMAX_M2_7, MINIMAX_M2_5, MINIMAX_M2_1, MINIMAX_M2_1_LIGHTNING, MINIMAX_M2, MINIMAX_ABAB6_5,
|
||||
|
||||
@@ -189,6 +189,7 @@ available_setting = {
|
||||
"linkai_app_code": "",
|
||||
"linkai_api_base": "https://api.link-ai.tech", # linkAI服务地址
|
||||
"cloud_host": "client.link-ai.tech",
|
||||
"cloud_port": None,
|
||||
"cloud_deployment_id": "",
|
||||
"minimax_api_key": "",
|
||||
"Minimax_group_id": "",
|
||||
|
||||
@@ -171,10 +171,10 @@
|
||||
{
|
||||
"group": "命令系统",
|
||||
"pages": [
|
||||
"commands/index",
|
||||
"commands/process",
|
||||
"commands/skill",
|
||||
"commands/general"
|
||||
"cli/index",
|
||||
"cli/process",
|
||||
"cli/skill",
|
||||
"cli/general"
|
||||
]
|
||||
}
|
||||
]
|
||||
@@ -327,15 +327,15 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"tab": "Commands",
|
||||
"tab": "CLI",
|
||||
"groups": [
|
||||
{
|
||||
"group": "Command System",
|
||||
"pages": [
|
||||
"en/commands/index",
|
||||
"en/commands/process",
|
||||
"en/commands/skill",
|
||||
"en/commands/chat"
|
||||
"en/cli/index",
|
||||
"en/cli/process",
|
||||
"en/cli/skill",
|
||||
"en/cli/chat"
|
||||
]
|
||||
}
|
||||
]
|
||||
@@ -488,15 +488,15 @@
|
||||
]
|
||||
},
|
||||
{
|
||||
"tab": "コマンド",
|
||||
"tab": "CLI",
|
||||
"groups": [
|
||||
{
|
||||
"group": "コマンドシステム",
|
||||
"pages": [
|
||||
"ja/commands/index",
|
||||
"ja/commands/process",
|
||||
"ja/commands/skill",
|
||||
"ja/commands/general"
|
||||
"ja/cli/index",
|
||||
"ja/cli/process",
|
||||
"ja/cli/skill",
|
||||
"ja/cli/general"
|
||||
]
|
||||
}
|
||||
]
|
||||
|
||||
@@ -76,7 +76,7 @@ irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex
|
||||
|
||||
After running, the Web service starts by default. Access `http://localhost:9899/chat` to chat.
|
||||
|
||||
Script usage: [One-click Install](https://docs.cowagent.ai/en/guide/quick-start). After installation, you can also use `cow start`, `cow stop`, and other [CLI commands](https://docs.cowagent.ai/en/commands/index) to manage the service.
|
||||
Script usage: [One-click Install](https://docs.cowagent.ai/en/guide/quick-start). After installation, you can also use `cow start`, `cow stop`, and other [CLI commands](https://docs.cowagent.ai/en/cli/index) to manage the service.
|
||||
|
||||
### Manual Installation
|
||||
|
||||
@@ -100,7 +100,7 @@ pip3 install -r requirements-optional.txt # optional but recommended
|
||||
pip3 install -e .
|
||||
```
|
||||
|
||||
After installation, use `cow` commands to manage the service (start, stop, update, etc.) and skills. See [Command Docs](https://docs.cowagent.ai/en/commands/index).
|
||||
After installation, use `cow` commands to manage the service (start, stop, update, etc.) and skills. See [Command Docs](https://docs.cowagent.ai/en/cli/index).
|
||||
|
||||
**4. Install browser (optional)**
|
||||
|
||||
@@ -165,7 +165,7 @@ Supports mainstream model providers. Recommended models for Agent mode:
|
||||
| GLM | `glm-5-turbo` |
|
||||
| Kimi | `kimi-k2.5` |
|
||||
| Doubao | `doubao-seed-2-0-code-preview-260215` |
|
||||
| Qwen | `qwen3.5-plus` |
|
||||
| Qwen | `qwen3.6-plus` |
|
||||
| Claude | `claude-sonnet-4-6` |
|
||||
| Gemini | `gemini-3.1-pro-preview` |
|
||||
| OpenAI | `gpt-5.4` |
|
||||
|
||||
@@ -47,7 +47,7 @@ After installation, use the `cow` command to manage the service:
|
||||
| `cow update` | Update code and restart |
|
||||
| `cow install-browser` | Install browser tool dependencies |
|
||||
|
||||
See the [Commands documentation](/en/commands/index) for more details.
|
||||
See the [Commands documentation](/en/cli/index) for more details.
|
||||
|
||||
<Note>
|
||||
If the `cow` command is not available, you can use `./run.sh <command>` (Linux/macOS) or `.\scripts\run.ps1 <command>` (Windows) as a fallback. Both are functionally equivalent.
|
||||
|
||||
@@ -117,4 +117,4 @@ cow skill install pptx # Install a skill
|
||||
cow install-browser # Install browser tool
|
||||
```
|
||||
|
||||
See [Command Overview](https://docs.cowagent.ai/en/commands) for details.
|
||||
See [Command Overview](https://docs.cowagent.ai/en/cli) for details.
|
||||
|
||||
@@ -31,7 +31,7 @@ CowAgent can proactively think and plan tasks, operate computers and external re
|
||||
<Card title="Tool System" icon="wrench" href="/en/tools/index">
|
||||
Built-in tools for file I/O, terminal execution, browser automation, scheduled tasks, messaging, and more. The Agent autonomously invokes tools to accomplish complex tasks.
|
||||
</Card>
|
||||
<Card title="Command System" icon="terminal" href="/en/commands/index">
|
||||
<Card title="Command System" icon="terminal" href="/en/cli/index">
|
||||
Provides terminal CLI and in-chat commands for process management, skill installation, configuration, context inspection, and other common operations.
|
||||
</Card>
|
||||
<Card title="Multiple Model Support" icon="microchip" href="/en/models/index">
|
||||
|
||||
@@ -6,7 +6,7 @@ description: Supported models and recommended choices for CowAgent
|
||||
CowAgent supports mainstream LLMs from domestic and international providers. Model interfaces are implemented in the project's `models/` directory.
|
||||
|
||||
<Note>
|
||||
For Agent mode, the following models are recommended based on quality and cost: MiniMax-M2.7, glm-5-turbo, kimi-k2.5, qwen3.5-plus, claude-sonnet-4-6, gemini-3.1-pro-preview
|
||||
For Agent mode, the following models are recommended based on quality and cost: MiniMax-M2.7, glm-5-turbo, kimi-k2.5, qwen3.6-plus, claude-sonnet-4-6, gemini-3.1-pro-preview
|
||||
</Note>
|
||||
|
||||
## Configuration
|
||||
@@ -25,7 +25,7 @@ You can also use the [LinkAI](https://link-ai.tech) platform interface to flexib
|
||||
glm-5-turbo, glm-5 and other series models
|
||||
</Card>
|
||||
<Card title="Qwen (Tongyi Qianwen)" href="/en/models/qwen">
|
||||
qwen3.5-plus, qwen3-max and more
|
||||
qwen3.6-plus, qwen3-max and more
|
||||
</Card>
|
||||
<Card title="Kimi" href="/en/models/kimi">
|
||||
kimi-k2.5, kimi-k2 and more
|
||||
|
||||
@@ -5,14 +5,14 @@ description: Tongyi Qianwen model configuration
|
||||
|
||||
```json
|
||||
{
|
||||
"model": "qwen3.5-plus",
|
||||
"model": "qwen3.6-plus",
|
||||
"dashscope_api_key": "YOUR_API_KEY"
|
||||
}
|
||||
```
|
||||
|
||||
| Parameter | Description |
|
||||
| --- | --- |
|
||||
| `model` | Options include `qwen3.5-plus`, `qwen3-max`, `qwen-max`, `qwen-plus`, `qwen-turbo`, `qwq-plus`, etc. |
|
||||
| `model` | Options include `qwen3.6-plus`, `qwen3.5-plus`, `qwen3-max`, `qwen-max`, `qwen-plus`, `qwen-turbo`, `qwq-plus`, etc. |
|
||||
| `dashscope_api_key` | Create at [Bailian Console](https://bailian.console.aliyun.com/?tab=model#/api-key). See [official docs](https://bailian.console.aliyun.com/?tab=api#/api) |
|
||||
|
||||
OpenAI-compatible configuration is also supported:
|
||||
@@ -20,7 +20,7 @@ OpenAI-compatible configuration is also supported:
|
||||
```json
|
||||
{
|
||||
"bot_type": "openai",
|
||||
"model": "qwen3.5-plus",
|
||||
"model": "qwen3.6-plus",
|
||||
"open_ai_api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
|
||||
"open_ai_api_key": "YOUR_API_KEY"
|
||||
}
|
||||
|
||||
@@ -12,7 +12,7 @@ New CLI command system for managing CowAgent from terminal and chat:
|
||||
- **Web console**: Type `/` in the input box to open a slash command menu, with arrow-key input history
|
||||
- **Windows support**: New PowerShell script `scripts/run.ps1` with `cow` command support
|
||||
|
||||
Docs: [Command Overview](https://docs.cowagent.ai/en/commands)
|
||||
Docs: [Command Overview](https://docs.cowagent.ai/en/cli)
|
||||
|
||||
<img src="https://cdn.link-ai.tech/doc/20260401114549.png" width="750" />
|
||||
|
||||
|
||||
@@ -17,7 +17,7 @@ CowAgent offers multiple ways to acquire skills:
|
||||
- **URL** — Install from zip archives or SKILL.md links
|
||||
- **Conversational creation** — Let the Agent create skills through natural language conversation
|
||||
|
||||
See [Install Skills](/en/skills/install) and [Skill Management Commands](/en/commands/skill) for details. You can also [create skills](/en/skills/create) through conversation.
|
||||
See [Install Skills](/en/skills/install) and [Skill Management Commands](/en/cli/skill) for details. You can also [create skills](/en/skills/create) through conversation.
|
||||
|
||||
## Skill Loading Priority
|
||||
|
||||
|
||||
@@ -49,5 +49,5 @@ Supports zip archives and SKILL.md file links:
|
||||
```
|
||||
|
||||
<Tip>
|
||||
All commands above work in the terminal by replacing `/skill` with `cow skill`. See [Skill Management Commands](/en/commands/skill) for full documentation.
|
||||
All commands above work in the terminal by replacing `/skill` with `cow skill`. See [Skill Management Commands](/en/cli/skill) for full documentation.
|
||||
</Tip>
|
||||
|
||||
72
docs/en/tools/vision.mdx
Normal file
@@ -0,0 +1,72 @@
---
title: vision - Image Analysis
description: Analyze image content (recognition, description, OCR, etc.)
---

Analyze local images or image URLs using Vision API. Supports content description, text extraction (OCR), object recognition, and more.

## Model Selection

The vision tool uses a multi-level auto-selection strategy with automatic fallback, so no manual configuration is required:

1. **Main model**: uses the currently configured main model for image recognition (zero extra cost)
2. **Other configured models**: auto-discovers other models with configured API keys as alternatives
3. **OpenAI**: uses `open_ai_api_key` to call gpt-4.1-mini
4. **LinkAI**: uses `linkai_api_key` to call LinkAI vision service

When `use_linkai=true`, LinkAI is promoted to the highest priority.

If the current provider fails, the tool automatically tries the next one until it succeeds or all fail.
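
The selection order and fallback described above can be sketched as follows. This is a minimal sketch under stated assumptions and not the tool's actual implementation: the provider names, the `(name, callable)` representation, and the functions `order_providers` and `analyze_with_fallback` are all hypothetical, and the providers are stand-in functions rather than real API calls.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: (provider name, function(image, question) -> answer).
Provider = Tuple[str, Callable[[str, str], str]]


def order_providers(providers: List[Provider], use_linkai: bool) -> List[Provider]:
    """Move the provider named "linkai" to the front when use_linkai is set."""
    if not use_linkai:
        return list(providers)
    linkai = [p for p in providers if p[0] == "linkai"]
    others = [p for p in providers if p[0] != "linkai"]
    return linkai + others


def analyze_with_fallback(providers: List[Provider], image: str, question: str) -> Tuple[str, str]:
    """Try each provider in order; return (name, answer) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(image, question)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all vision providers failed: " + "; ".join(errors))


def failing(image: str, question: str) -> str:
    raise ConnectionError("unreachable")


def working(image: str, question: str) -> str:
    return f"answer to {question!r} for {image}"


chain = [("main", failing), ("openai", working), ("linkai", working)]
used, answer = analyze_with_fallback(order_providers(chain, use_linkai=False), "cat.png", "What is this?")
print(used, answer)
```

In this sketch the failing main model is skipped and the next provider answers; passing `use_linkai=True` would instead try the LinkAI stand-in first.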

### Supported Models

| Vendor | Vision Model | Notes |
| --- | --- | --- |
| OpenAI / Compatible | Main model | All OpenAI-compatible multimodal models |
| Qwen (DashScope) | Main model | Via MultiModalConversation API |
| Claude | Main model | Anthropic native image format |
| Gemini | Main model | inlineData format |
| Doubao | Main model | doubao-seed-2-0 series natively supported |
| Kimi (Moonshot) | Main model | kimi-k2.5 natively supported |
| ZhipuAI | glm-5v-turbo | Always uses dedicated vision model |
| MiniMax | MiniMax-Text-01 | Always uses dedicated vision model |

<Note>
ZhipuAI and MiniMax text models do not support image understanding, so their dedicated vision models are always used automatically.
</Note>

## Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `image` | string | Yes | Local file path or HTTP(S) image URL |
| `question` | string | Yes | Question to ask about the image |

Supported image formats: jpg, jpeg, png, gif, webp

## Custom Configuration

To specify a particular model for the vision tool, add to `config.json`:

```json
{
  "tool": {
    "vision": {
      "model": "gpt-4o"
    }
  }
}
```

In most cases no configuration is needed. The tool works automatically as long as the main model supports multimodal input or any vision-capable API key is configured.

## Use Cases

- Describe image content
- Extract text from images (OCR)
- Identify objects, colors, scenes
- Analyze screenshots and scanned documents

<Note>
Images larger than 1MB are automatically compressed (max edge 1536px). All images (including remote URLs) are converted to base64 for transmission to ensure compatibility with all model backends.
</Note>
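
The size rule in the note above (compress when larger than 1MB, longest edge at most 1536px, then base64) can be illustrated with a small standard-library sketch. Several things here are assumptions: "1MB" is read as 1024 * 1024 bytes, the base64 output is wrapped as a data URL, and the helper names are hypothetical. Actual pixel resizing would need an imaging library and is left out.

```python
import base64
from typing import Tuple

MAX_BYTES = 1024 * 1024  # assumed reading of "1MB"
MAX_EDGE = 1536          # max edge in pixels, from the note


def needs_compression(size_bytes: int, limit: int = MAX_BYTES) -> bool:
    """True when the image is larger than the size limit."""
    return size_bytes > limit


def fit_within(width: int, height: int, max_edge: int = MAX_EDGE) -> Tuple[int, int]:
    """Scale (width, height) so the longest edge is at most max_edge, keeping aspect ratio."""
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    scale = max_edge / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def to_data_url(raw: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL (format is an assumption)."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{encoded}"


print(needs_compression(2 * 1024 * 1024))
print(fit_within(3072, 2048))
print(to_data_url(b"abc"))
```

Only the target dimensions and the encoding step are shown; how the tool actually recompresses the pixels is not described in the note.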
@@ -47,7 +47,7 @@ description: 使用脚本一键安装和管理 CowAgent
|
||||
| `cow update` | 更新代码并重启 |
|
||||
| `cow install-browser` | 安装浏览器工具依赖 |
|
||||
|
||||
更多命令和用法参考 [命令文档](/commands/index)。
|
||||
更多命令和用法参考 [命令文档](/cli/index)。
|
||||
|
||||
<Note>
|
||||
如果 `cow` 命令不可用,也可以使用 `./run.sh <命令>`(Linux/macOS)或 `.\scripts\run.ps1 <命令>`(Windows)作为替代,功能等效。
|
||||
|
||||
@@ -36,7 +36,7 @@ pip3 install -e .
|
||||
更新完成后重启服务:
|
||||
|
||||
```bash
|
||||
# 使用 Cow CLI
|
||||
# 使用 Cow CLI (推荐)
|
||||
cow restart
|
||||
|
||||
# 或使用 run.sh
|
||||
|
||||
@@ -118,6 +118,6 @@ cow skill install pptx # 安装技能
|
||||
cow install-browser # 安装浏览器工具
|
||||
```
|
||||
|
||||
详细命令参考 [命令总览](https://docs.cowagent.ai/commands)。
|
||||
详细命令参考 [命令总览](https://docs.cowagent.ai/cli)。
|
||||
|
||||
<img src="https://cdn.link-ai.tech/doc/20260401114549.png" width="750" />
|
||||
|
||||
@@ -36,7 +36,7 @@ CowAgent 支持灵活切换多种模型,能处理文本、语音、图片、
|
||||
<Card title="工具系统" icon="wrench" href="/tools/index">
|
||||
内置文件读写、终端执行、浏览器操作、定时任务、消息发送等工具,Agent 可自主调用工具完成复杂任务。
|
||||
</Card>
|
||||
<Card title="命令系统" icon="terminal" href="/commands/index">
|
||||
<Card title="命令系统" icon="terminal" href="/cli/index">
|
||||
提供终端 CLI 和对话中的命令,支持进程管理、技能安装、配置修改、上下文查看等常用操作。
|
||||
</Card>
|
||||
<Card title="多模型支持" icon="microchip" href="/models/index">
|
||||
|
||||
@@ -76,7 +76,7 @@ irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex
|
||||
|
||||
実行後、デフォルトでWebサービスが起動します。`http://localhost:9899/chat` にアクセスしてチャットを開始できます。
|
||||
|
||||
スクリプトの使い方: [ワンクリックインストール](https://docs.cowagent.ai/ja/guide/quick-start)。インストール後は `cow start`、`cow stop` などの [CLI コマンド](https://docs.cowagent.ai/ja/commands/index)でサービスを管理できます。
|
||||
スクリプトの使い方: [ワンクリックインストール](https://docs.cowagent.ai/ja/guide/quick-start)。インストール後は `cow start`、`cow stop` などの [CLI コマンド](https://docs.cowagent.ai/ja/cli/index)でサービスを管理できます。
|
||||
|
||||
### 手動インストール
|
||||
|
||||
@@ -100,7 +100,7 @@ pip3 install -r requirements-optional.txt # 任意ですが推奨
|
||||
pip3 install -e .
|
||||
```
|
||||
|
||||
インストール後、`cow` コマンドでサービス管理(起動、停止、更新など)やSkill管理ができます。[コマンドドキュメント](https://docs.cowagent.ai/ja/commands/index)を参照してください。
|
||||
インストール後、`cow` コマンドでサービス管理(起動、停止、更新など)やSkill管理ができます。[コマンドドキュメント](https://docs.cowagent.ai/ja/cli/index)を参照してください。
|
||||
|
||||
**4. ブラウザのインストール(任意)**
|
||||
|
||||
@@ -165,7 +165,7 @@ sudo docker logs -f chatgpt-on-wechat
|
||||
| GLM | `glm-5-turbo` |
|
||||
| Kimi | `kimi-k2.5` |
|
||||
| Doubao | `doubao-seed-2-0-code-preview-260215` |
|
||||
| Qwen | `qwen3.5-plus` |
|
||||
| Qwen | `qwen3.6-plus` |
|
||||
| Claude | `claude-sonnet-4-6` |
|
||||
| Gemini | `gemini-3.1-pro-preview` |
|
||||
| OpenAI | `gpt-5.4` |
|
||||
|
||||
@@ -47,7 +47,7 @@ Linux、macOS、Windowsに対応しています。Python 3.7〜3.12が必要で
|
||||
| `cow update` | コードを更新して再起動 |
|
||||
| `cow install-browser` | ブラウザツールの依存をインストール |
|
||||
|
||||
詳細は[コマンドドキュメント](/ja/commands/index)を参照してください。
|
||||
詳細は[コマンドドキュメント](/ja/cli/index)を参照してください。
|
||||
|
||||
<Note>
|
||||
`cow` コマンドが利用できない場合は、`./run.sh <コマンド>`(Linux/macOS)または `.\scripts\run.ps1 <コマンド>`(Windows)で代替できます。機能は同等です。
|
||||
|
||||
@@ -117,4 +117,4 @@ cow skill install pptx # Skill をインストール
|
||||
cow install-browser # ブラウザツールをインストール
|
||||
```
|
||||
|
||||
詳細は [コマンド一覧](https://docs.cowagent.ai/ja/commands) を参照してください。
|
||||
詳細は [コマンド一覧](https://docs.cowagent.ai/ja/cli) を参照してください。
|
||||
|
||||
@@ -31,7 +31,7 @@ CowAgent は自ら思考しタスクを計画し、コンピュータや外部
|
||||
<Card title="ツールシステム" icon="wrench" href="/ja/tools/index">
|
||||
ファイル読み書き、ターミナル実行、ブラウザ操作、スケジュールタスク、メッセージ送信などの組み込みツールを提供。Agent が自律的にツールを呼び出して複雑なタスクを完了します。
|
||||
</Card>
|
||||
<Card title="コマンドシステム" icon="terminal" href="/ja/commands/index">
|
||||
<Card title="コマンドシステム" icon="terminal" href="/ja/cli/index">
|
||||
ターミナル CLI とチャット内コマンドを提供し、プロセス管理、Skill インストール、設定変更、コンテキスト確認などの一般的な操作をサポートします。
|
||||
</Card>
|
||||
<Card title="複数モデル対応" icon="microchip" href="/ja/models/index">
|
||||
|
||||
@@ -6,7 +6,7 @@ description: CowAgentがサポートするモデルとおすすめの選択肢
|
||||
CowAgentは国内外の主要なLLMをサポートしています。モデルインターフェースはプロジェクトの`models/`ディレクトリに実装されています。
|
||||
|
||||
<Note>
|
||||
Agent モードでは、品質とコストのバランスから以下のモデルをおすすめします: MiniMax-M2.7、glm-5-turbo、kimi-k2.5、qwen3.5-plus、claude-sonnet-4-6、gemini-3.1-pro-preview
|
||||
Agent モードでは、品質とコストのバランスから以下のモデルをおすすめします: MiniMax-M2.7、glm-5-turbo、kimi-k2.5、qwen3.6-plus、claude-sonnet-4-6、gemini-3.1-pro-preview
|
||||
</Note>
|
||||
|
||||
## 設定
|
||||
@@ -25,7 +25,7 @@ CowAgentは国内外の主要なLLMをサポートしています。モデルイ
|
||||
glm-5-turbo、glm-5およびその他のシリーズモデル
|
||||
</Card>
|
||||
<Card title="Qwen (通义千问)" href="/ja/models/qwen">
|
||||
qwen3.5-plus、qwen3-maxなど
|
||||
qwen3.6-plus、qwen3-maxなど
|
||||
</Card>
|
||||
<Card title="Kimi" href="/ja/models/kimi">
|
||||
kimi-k2.5、kimi-k2など
|
||||
|
||||
@@ -1,18 +1,18 @@
|
||||
---
|
||||
title: Qwen (通义千问)
|
||||
description: 通义千问モデルの設定
|
||||
title: Qwen (通義千問)
|
||||
description: 通義千問モデルの設定
|
||||
---
|
||||
|
||||
```json
|
||||
{
|
||||
"model": "qwen3.5-plus",
|
||||
"model": "qwen3.6-plus",
|
||||
"dashscope_api_key": "YOUR_API_KEY"
|
||||
}
|
||||
```
|
||||
|
||||
| パラメータ | 説明 |
|
||||
| --- | --- |
|
||||
| `model` | `qwen3.5-plus`、`qwen3-max`、`qwen-max`、`qwen-plus`、`qwen-turbo`、`qwq-plus`などから選択可能 |
|
||||
| `model` | `qwen3.6-plus`、`qwen3.5-plus`、`qwen3-max`、`qwen-max`、`qwen-plus`、`qwen-turbo`、`qwq-plus`などから選択可能 |
|
||||
| `dashscope_api_key` | [百炼 Console](https://bailian.console.aliyun.com/?tab=model#/api-key)で作成。[公式ドキュメント](https://bailian.console.aliyun.com/?tab=api#/api)を参照 |
|
||||
|
||||
OpenAI互換の設定もサポートしています:
|
||||
@@ -20,7 +20,7 @@ OpenAI互換の設定もサポートしています:
|
||||
```json
|
||||
{
|
||||
"bot_type": "openai",
|
||||
"model": "qwen3.5-plus",
|
||||
"model": "qwen3.6-plus",
|
||||
"open_ai_api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
|
||||
"open_ai_api_key": "YOUR_API_KEY"
|
||||
}
|
||||
|
||||
@@ -12,7 +12,7 @@ description: CowAgent 2.0.5 - Cow CLI、Skill Hub オープンソース、ブラ
|
||||
- **Web コンソール**:入力欄で `/` を入力するとスラッシュコマンドメニューが表示、矢印キーで入力履歴を辿れる
|
||||
- **Windows サポート**:PowerShell スクリプト `scripts/run.ps1` を追加、`cow` コマンドに対応
|
||||
|
||||
ドキュメント:[コマンド一覧](https://docs.cowagent.ai/ja/commands)
|
||||
ドキュメント:[コマンド一覧](https://docs.cowagent.ai/ja/cli)
|
||||
|
||||
<img src="https://cdn.link-ai.tech/doc/20260401114549.png" width="750" />
|
||||
|
||||
|
||||
@@ -17,7 +17,7 @@ CowAgent ではスキルを取得する複数の方法を提供しています
|
||||
- **URL** — zip アーカイブや SKILL.md リンクからインストール
|
||||
- **会話で作成** — 自然言語の会話を通じて Agent にスキルを自動作成させる
|
||||
|
||||
詳細は[スキルのインストール](/ja/skills/install)と[スキル管理コマンド](/ja/commands/skill)を参照してください。会話を通じて[スキルを作成](/ja/skills/create)することもできます。
|
||||
詳細は[スキルのインストール](/ja/skills/install)と[スキル管理コマンド](/ja/cli/skill)を参照してください。会話を通じて[スキルを作成](/ja/skills/create)することもできます。
|
||||
|
||||
## スキルの読み込み優先順位
|
||||
|
||||
|
||||
@@ -49,5 +49,5 @@ zip アーカイブと SKILL.md ファイルリンクに対応:
|
||||
```
|
||||
|
||||
<Tip>
|
||||
上記のすべてのコマンドは、ターミナルでは `/skill` を `cow skill` に置き換えて使用できます。完全なコマンドドキュメントは[スキル管理コマンド](/ja/commands/skill)を参照してください。
|
||||
上記のすべてのコマンドは、ターミナルでは `/skill` を `cow skill` に置き換えて使用できます。完全なコマンドドキュメントは[スキル管理コマンド](/ja/cli/skill)を参照してください。
|
||||
</Tip>
|
||||
|
||||
72
docs/ja/tools/vision.mdx
Normal file
@@ -0,0 +1,72 @@
|
||||
---
title: vision - Image analysis
description: Analyze image content (recognition, description, OCR, and more)
---

Uses a Vision API to analyze local images or image URLs. It supports content description, text extraction (OCR), object recognition, and similar tasks.

## Model selection

The Vision tool uses a multi-stage automatic selection and fallback strategy, so it works without manual configuration:

1. **Main model**: runs image recognition with the currently configured main model (no extra cost)
2. **Other configured models**: automatically detects other multimodal models that have an API key configured
3. **OpenAI**: calls gpt-4.1-mini using `open_ai_api_key`
4. **LinkAI**: calls the LinkAI vision service using `linkai_api_key`

When `use_linkai=true`, LinkAI takes top priority.

If the current provider fails, the next provider is tried automatically until one succeeds or all of them have failed.
|
||||

### Supported models

| Vendor | Vision model | Notes |
| --- | --- | --- |
| OpenAI / compatible protocol | Main model | Works with any OpenAI-compatible multimodal model |
| Tongyi Qianwen (DashScope) | Main model | Via the MultiModalConversation API |
| Claude | Main model | Anthropic native image format |
| Gemini | Main model | inlineData format |
| Doubao (豆包) | Main model | The doubao-seed-2-0 series supports images natively |
| Kimi (Moonshot) | Main model | kimi-k2.5 supports images natively |
| Zhipu AI (智谱) | glm-5v-turbo | Always uses the dedicated vision model |
| MiniMax | MiniMax-Text-01 | Always uses the dedicated vision model |

<Note>
The text models from Zhipu AI and MiniMax do not support image understanding, so the corresponding dedicated vision model is used automatically.
</Note>

## Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `image` | string | Yes | Local file path or HTTP(S) image URL |
| `question` | string | Yes | The question to ask about the image |

Supported image formats: jpg, jpeg, png, gif, webp

## Custom configuration

To specify the model used by the Vision tool, add the following to `config.json`:

```json
{
  "tool": {
    "vision": {
      "model": "gpt-4o"
    }
  }
}
```

In most cases no configuration is needed. The tool works automatically as long as the main model is multimodal or a vision-capable API key is configured.

## Use cases

- Describing image content
- Extracting text from images (OCR)
- Identifying objects, colors, and scenes
- Analyzing screenshots and scanned documents

<Note>
Images larger than 1MB are compressed automatically (longest edge 1536px). All images, including remote URLs, are converted to base64 before sending, which keeps them compatible with every model backend.
</Note>
|
||||
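The preprocessing mentioned in the note (compress above 1MB, cap the longest edge at 1536px, send as base64) could look roughly like the sketch below. The function names and constants are hypothetical, and the actual pixel resizing is left out because it needs an imaging library. The code only shows the size check, the target-size arithmetic, and the data-URL encoding.

```python
import base64
from typing import Tuple

MAX_BYTES = 1024 * 1024  # assumed 1MB threshold, taken from the note
MAX_EDGE = 1536          # assumed longest edge after compression, from the note


def needs_compression(raw: bytes) -> bool:
    """True when the raw image exceeds the size threshold."""
    return len(raw) > MAX_BYTES


def scaled_size(width: int, height: int, max_edge: int = MAX_EDGE) -> Tuple[int, int]:
    """Shrink (width, height) so the longest edge is at most max_edge, keeping the aspect ratio."""
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    ratio = max_edge / longest
    return max(1, round(width * ratio)), max(1, round(height * ratio))


def to_data_url(raw: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data:<mime>;base64,<data> URL."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

Sending a data URL for every image means each backend only has to handle one input form, at the cost of a larger request body.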
@@ -6,19 +6,20 @@ description: CowAgent 支持的模型及推荐选择
|
||||
CowAgent 支持国内外主流厂商的大语言模型,模型接口实现在项目的 `models/` 目录下。
|
||||
|
||||
<Note>
|
||||
Agent 模式下推荐使用以下模型,可根据效果及成本综合选择:MiniMax-M2.7、glm-5-turbo、kimi-k2.5、qwen3.5-plus、claude-sonnet-4-6、gemini-3.1-pro-preview
|
||||
Agent 模式下推荐使用以下模型,可根据效果及成本综合选择:MiniMax-M2.7、glm-5-turbo、kimi-k2.5、qwen3.6-plus、claude-sonnet-4-6、gemini-3.1-pro-preview
|
||||
|
||||
同时支持使用 [LinkAI](https://link-ai.tech) 平台接口,可灵活切换多种模型,并支持知识库、工作流、插件等 Agent 能力。
|
||||
</Note>
|
||||
|
||||
## 配置方式
|
||||
|
||||
根据所选模型,在 `config.json` 中填写对应的模型名称和 API Key 即可。每个模型也支持 OpenAI 兼容方式接入,将 `bot_type` 设为 `openai`,配置 `open_ai_api_base` 和 `open_ai_api_key`。
|
||||
|
||||
同时支持使用 [LinkAI](https://link-ai.tech) 平台接口,可灵活切换多种模型,并支持知识库、工作流、插件等 Agent 能力。
|
||||
|
||||
也可以通过 [Web 控制台](/channels/web) 在线管理模型配置,无需手动编辑配置文件:
|
||||
**方式一(推荐):** 通过 [Web 控制台](/channels/web) 在线管理模型配置,无需手动编辑配置文件:
|
||||
|
||||
<img width="850" src="https://cdn.link-ai.tech/doc/20260227173811.png" />
|
||||
|
||||
**方式二:** 手动编辑 `config.json`,根据所选模型填写对应的模型名称和 API Key。每个模型也支持 OpenAI 兼容方式接入,将 `bot_type` 设为 `openai`,配置 `open_ai_api_base` 和 `open_ai_api_key` 即可。
|
||||
|
||||
|
||||
## 支持的模型
|
||||
|
||||
<CardGroup cols={2}>
|
||||
@@ -29,7 +30,7 @@ CowAgent 支持国内外主流厂商的大语言模型,模型接口实现在
|
||||
glm-5-turbo、glm-5 等系列模型
|
||||
</Card>
|
||||
<Card title="通义千问 Qwen" href="/models/qwen">
|
||||
qwen3.5-plus、qwen3-max 等
|
||||
qwen3.6-plus、qwen3-max 等
|
||||
</Card>
|
||||
<Card title="Kimi" href="/models/kimi">
|
||||
kimi-k2.5、kimi-k2 等
|
||||
@@ -54,6 +55,7 @@ CowAgent 支持国内外主流厂商的大语言模型,模型接口实现在
|
||||
</Card>
|
||||
</CardGroup>
|
||||
|
||||
|
||||
<Tip>
|
||||
全部模型名称可参考项目 [`common/const.py`](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/common/const.py) 文件。
|
||||
</Tip>
|
||||
|
||||
@@ -5,14 +5,14 @@ description: 通义千问模型配置
|
||||
|
||||
```json
|
||||
{
|
||||
"model": "qwen3.5-plus",
|
||||
"model": "qwen3.6-plus",
|
||||
"dashscope_api_key": "YOUR_API_KEY"
|
||||
}
|
||||
```
|
||||
|
||||
| 参数 | 说明 |
|
||||
| --- | --- |
|
||||
| `model` | 可填 `qwen3.5-plus`、`qwen3-max`、`qwen-max`、`qwen-plus`、`qwen-turbo`、`qwq-plus` 等 |
|
||||
| `model` | 可填 `qwen3.6-plus`、`qwen3.5-plus`、`qwen3-max`、`qwen-max`、`qwen-plus`、`qwen-turbo`、`qwq-plus` 等 |
|
||||
| `dashscope_api_key` | 在 [百炼控制台](https://bailian.console.aliyun.com/?tab=model#/api-key) 创建,参考 [官方文档](https://bailian.console.aliyun.com/?tab=api#/api) |
|
||||
|
||||
也支持 OpenAI 兼容方式接入:
|
||||
@@ -20,7 +20,7 @@ description: 通义千问模型配置
|
||||
```json
|
||||
{
|
||||
"bot_type": "openai",
|
||||
"model": "qwen3.5-plus",
|
||||
"model": "qwen3.6-plus",
|
||||
"open_ai_api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
|
||||
"open_ai_api_key": "YOUR_API_KEY"
|
||||
}
|
||||
|
||||
@@ -12,7 +12,7 @@ description: CowAgent 2.0.5 - Cow CLI、Skill Hub 开源、浏览器工具、企
|
||||
- **web控制台**:Web 控制台输入框输入 `/` 即可弹出指令菜单,支持方向键回溯历史输入
|
||||
- **Windows 支持**:新增 PowerShell 一键安装脚本 `scripts/run.ps1`,同时支持 `cow` 命令
|
||||
|
||||
相关文档:[命令总览](https://docs.cowagent.ai/commands)
|
||||
相关文档:[命令总览](https://docs.cowagent.ai/cli)
|
||||
|
||||
<img src="https://cdn.link-ai.tech/doc/20260401114549.png" width="750" />
|
||||
|
||||
|
||||
@@ -18,7 +18,7 @@ CowAgent 提供多种方式获取技能:
|
||||
- **URL** — 从 zip 压缩包或 SKILL.md 链接安装
|
||||
- **对话创建** — 通过自然语言对话让 Agent 自动创建技能
|
||||
|
||||
详细安装方式参考 [安装技能](/skills/install) 和 [技能管理命令](/commands/skill)。也可以通过对话 [创建技能](/skills/create),或向 [Skill Hub](https://skills.cowagent.ai/submit) 贡献你的技能。
|
||||
详细安装方式参考 [安装技能](/skills/install) 和 [技能管理命令](/cli/skill)。也可以通过对话 [创建技能](/skills/create),或向 [Skill Hub](https://skills.cowagent.ai/submit) 贡献你的技能。
|
||||
|
||||
## 技能加载优先级
|
||||
|
||||
|
||||
@@ -62,5 +62,5 @@ CowAgent 支持通过统一的 `install` 命令安装来自 **[Cow 技能广场]
|
||||
```
|
||||
|
||||
<Tip>
|
||||
以上所有命令在终端中使用时,将 `/skill` 替换为 `cow skill` 即可。完整命令说明参考 [技能管理命令](/commands/skill)。
|
||||
以上所有命令在终端中使用时,将 `/skill` 替换为 `cow skill` 即可。完整命令说明参考 [技能管理命令](/cli/skill)。
|
||||
</Tip>
|
||||
|
||||
@@ -5,14 +5,49 @@ description: 分析图片内容(识别、描述、OCR 等)
|
||||
|
||||
使用 Vision API 分析本地图片或图片 URL,支持内容描述、文字提取(OCR)、物体识别等。
|
||||
|
||||
## 依赖
|
||||
## 模型选择
|
||||
|
||||
需要配置至少一个 API Key(通过 `env_config` 工具或工作空间 `.env` 文件配置):
|
||||
Vision 工具采用多级自动选择 + 自动兜底策略,无需手动配置即可使用:
|
||||
|
||||
| 后端 | 环境变量 | 优先级 |
|
||||
1. **主模型** — 优先使用当前配置的主模型进行图像识别(需要是多模态模型)
|
||||
2. **其他已配置模型** — 自动发现已配置 API Key 的其他多模态模型作为备选
|
||||
|
||||
如果当前 provider 调用失败,会自动尝试下一个,直到成功或全部失败。
|
||||
|
||||
### 支持的模型
|
||||
|
||||
| 厂商 | 视觉模型 | 说明 |
|
||||
| --- | --- | --- |
|
||||
| OpenAI | `OPENAI_API_KEY` | 优先使用 |
|
||||
| LinkAI | `LINKAI_API_KEY` | 备选 |
|
||||
| OpenAI / 兼容协议 | 使用主模型 | 支持所有 OpenAI 协议兼容的多模态模型 |
|
||||
| 通义千问 (DashScope) | 使用主模型 | 例如 qwen3.6-plus 等 |
|
||||
| Claude | 使用主模型 | Anthropic 原生图像格式 |
|
||||
| Gemini | 使用主模型 | inlineData 格式 |
|
||||
| 豆包 (Doubao) | 使用主模型 | doubao-seed-2-0 系列原生支持 |
|
||||
| Kimi (Moonshot) | 使用主模型 | kimi-k2.5 原生支持 |
|
||||
| 智谱 AI | glm-5v-turbo | 固定使用视觉专用模型 |
|
||||
| MiniMax | MiniMax-Text-01 | 固定使用视觉专用模型 |
|
||||
|
||||
<Note>
|
||||
智谱和 MiniMax 的文本模型不支持图像理解,因此始终使用对应的视觉专用模型,无需手动指定。
|
||||
</Note>
|
||||
|
||||
> When `use_linkai=true`, LinkAI's multimodal model is used for image recognition by default.
|
||||
|
||||
## 自定义配置
|
||||
|
||||
如果希望指定 Vision 使用的模型,可在 `config.json` 中配置,例如:
|
||||
|
||||
```json
|
||||
{
|
||||
"tool": {
|
||||
"vision": {
|
||||
"model": "gpt-4o"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
大多数情况下无需配置,主模型支持多模态或配置任意一个支持视觉的 API Key 即可自动工作。
|
||||
|
||||
## 参数
|
||||
|
||||
@@ -20,17 +55,18 @@ description: 分析图片内容(识别、描述、OCR 等)
|
||||
| --- | --- | --- | --- |
|
||||
| `image` | string | 是 | 本地文件路径或 HTTP(S) 图片 URL |
|
||||
| `question` | string | 是 | 对图片提出的问题 |
|
||||
| `model` | string | 否 | 模型名称(默认 gpt-4.1-mini) |
|
||||
|
||||
支持的图片格式:jpg、jpeg、png、gif、webp
|
||||
|
||||
|
||||
|
||||
## 使用场景
|
||||
|
||||
- 描述图片中的内容
|
||||
- 提取图片中的文字(OCR)
|
||||
- 识别物体、颜色、场景
|
||||
- 分析截图、文档扫描件
|
||||
- 分析截图、文档扫描图片等
|
||||
|
||||
<Note>
|
||||
超过 1MB 的图片会自动压缩后上传。如果未配置任何 Vision API Key,该工具不会被加载。
|
||||
超过 1MB 的图片会自动压缩后上传,所有图片(包括远程 URL)会统一转为 base64 传输,确保兼容所有模型后端。
|
||||
</Note>
|
||||
|
||||
@@ -1,214 +0,0 @@
|
||||
# encoding:utf-8
|
||||
|
||||
import json
|
||||
import time
|
||||
from typing import List, Tuple
|
||||
|
||||
import openai
|
||||
from models.openai.openai_compat import RateLimitError, Timeout, APIError, APIConnectionError
|
||||
import broadscope_bailian
|
||||
from broadscope_bailian import ChatQaMessage
|
||||
|
||||
from models.bot import Bot
|
||||
from models.ali.ali_qwen_session import AliQwenSession
|
||||
from models.session_manager import SessionManager
|
||||
from bridge.context import ContextType
|
||||
from bridge.reply import Reply, ReplyType
|
||||
from common.log import logger
|
||||
from common import const
|
||||
from config import conf, load_config
|
||||
|
||||
class AliQwenBot(Bot):
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
self.api_key_expired_time = self.set_api_key()
|
||||
self.sessions = SessionManager(AliQwenSession, model=conf().get("model", const.QWEN))
|
||||
|
||||
def api_key_client(self):
|
||||
return broadscope_bailian.AccessTokenClient(access_key_id=self.access_key_id(), access_key_secret=self.access_key_secret())
|
||||
|
||||
def access_key_id(self):
|
||||
return conf().get("qwen_access_key_id")
|
||||
|
||||
def access_key_secret(self):
|
||||
return conf().get("qwen_access_key_secret")
|
||||
|
||||
def agent_key(self):
|
||||
return conf().get("qwen_agent_key")
|
||||
|
||||
def app_id(self):
|
||||
return conf().get("qwen_app_id")
|
||||
|
||||
def node_id(self):
|
||||
return conf().get("qwen_node_id", "")
|
||||
|
||||
def temperature(self):
|
||||
return conf().get("temperature", 0.2 )
|
||||
|
||||
def top_p(self):
|
||||
return conf().get("top_p", 1)
|
||||
|
||||
def reply(self, query, context=None):
|
||||
# acquire reply content
|
||||
if context.type == ContextType.TEXT:
|
||||
logger.info("[QWEN] query={}".format(query))
|
||||
|
||||
session_id = context["session_id"]
|
||||
reply = None
|
||||
clear_memory_commands = conf().get("clear_memory_commands", ["#清除记忆"])
|
||||
if query in clear_memory_commands:
|
||||
self.sessions.clear_session(session_id)
|
||||
reply = Reply(ReplyType.INFO, "记忆已清除")
|
||||
elif query == "#清除所有":
|
||||
self.sessions.clear_all_session()
|
||||
reply = Reply(ReplyType.INFO, "所有人记忆已清除")
|
||||
elif query == "#更新配置":
|
||||
load_config()
|
||||
reply = Reply(ReplyType.INFO, "配置已更新")
|
||||
if reply:
|
||||
return reply
|
||||
session = self.sessions.session_query(query, session_id)
|
||||
logger.debug("[QWEN] session query={}".format(session.messages))
|
||||
|
||||
reply_content = self.reply_text(session)
|
||||
logger.debug(
|
||||
"[QWEN] new_query={}, session_id={}, reply_cont={}, completion_tokens={}".format(
|
||||
session.messages,
|
||||
session_id,
|
||||
reply_content["content"],
|
||||
reply_content["completion_tokens"],
|
||||
)
|
||||
)
|
||||
if reply_content["completion_tokens"] == 0 and len(reply_content["content"]) > 0:
|
||||
reply = Reply(ReplyType.ERROR, reply_content["content"])
|
||||
elif reply_content["completion_tokens"] > 0:
|
||||
self.sessions.session_reply(reply_content["content"], session_id, reply_content["total_tokens"])
|
||||
reply = Reply(ReplyType.TEXT, reply_content["content"])
|
||||
else:
|
||||
reply = Reply(ReplyType.ERROR, reply_content["content"])
|
||||
logger.debug("[QWEN] reply {} used 0 tokens.".format(reply_content))
|
||||
return reply
|
||||
|
||||
else:
|
||||
reply = Reply(ReplyType.ERROR, "Bot不支持处理{}类型的消息".format(context.type))
|
||||
return reply
|
||||
|
||||
def reply_text(self, session: AliQwenSession, retry_count=0) -> dict:
|
||||
"""
|
||||
call bailian's ChatCompletion to get the answer
|
||||
:param session: a conversation session
|
||||
:param retry_count: retry count
|
||||
:return: {}
|
||||
"""
|
||||
try:
|
||||
prompt, history = self.convert_messages_format(session.messages)
|
||||
self.update_api_key_if_expired()
|
||||
# NOTE 阿里百炼的call()函数未提供temperature参数,考虑到temperature和top_p参数作用相同,取两者较小的值作为top_p参数传入,详情见文档 https://help.aliyun.com/document_detail/2587502.htm
|
||||
response = broadscope_bailian.Completions().call(app_id=self.app_id(), prompt=prompt, history=history, top_p=min(self.temperature(), self.top_p()))
|
||||
completion_content = self.get_completion_content(response, self.node_id())
|
||||
completion_tokens, total_tokens = self.calc_tokens(session.messages, completion_content)
|
||||
return {
|
||||
"total_tokens": total_tokens,
|
||||
"completion_tokens": completion_tokens,
|
||||
"content": completion_content,
|
||||
}
|
||||
except Exception as e:
|
||||
need_retry = retry_count < 2
|
||||
result = {"completion_tokens": 0, "content": "我现在有点累了,等会再来吧"}
|
||||
if isinstance(e, RateLimitError):
|
||||
logger.warn("[QWEN] RateLimitError: {}".format(e))
|
||||
result["content"] = "提问太快啦,请休息一下再问我吧"
|
||||
if need_retry:
|
||||
time.sleep(20)
|
||||
elif isinstance(e, Timeout):
|
||||
logger.warn("[QWEN] Timeout: {}".format(e))
|
||||
result["content"] = "我没有收到你的消息"
|
||||
if need_retry:
|
||||
time.sleep(5)
|
||||
elif isinstance(e, APIError):
|
||||
logger.warn("[QWEN] Bad Gateway: {}".format(e))
|
||||
result["content"] = "请再问我一次"
|
||||
if need_retry:
|
||||
time.sleep(10)
|
||||
elif isinstance(e, APIConnectionError):
|
||||
logger.warn("[QWEN] APIConnectionError: {}".format(e))
|
||||
need_retry = False
|
||||
result["content"] = "我连接不到你的网络"
|
||||
else:
|
||||
logger.exception("[QWEN] Exception: {}".format(e))
|
||||
need_retry = False
|
||||
self.sessions.clear_session(session.session_id)
|
||||
|
||||
if need_retry:
|
||||
logger.warn("[QWEN] 第{}次重试".format(retry_count + 1))
|
||||
return self.reply_text(session, retry_count + 1)
|
||||
else:
|
||||
return result
|
||||
|
||||
def set_api_key(self):
|
||||
api_key, expired_time = self.api_key_client().create_token(agent_key=self.agent_key())
|
||||
broadscope_bailian.api_key = api_key
|
||||
return expired_time
|
||||
|
||||
def update_api_key_if_expired(self):
|
||||
if time.time() > self.api_key_expired_time:
|
||||
self.api_key_expired_time = self.set_api_key()
|
||||
|
||||
def convert_messages_format(self, messages) -> Tuple[str, List[ChatQaMessage]]:
|
||||
history = []
|
||||
user_content = ''
|
||||
assistant_content = ''
|
||||
system_content = ''
|
||||
for message in messages:
|
||||
role = message.get('role')
|
||||
if role == 'user':
|
||||
user_content += message.get('content')
|
||||
elif role == 'assistant':
|
||||
assistant_content = message.get('content')
|
||||
history.append(ChatQaMessage(user_content, assistant_content))
|
||||
user_content = ''
|
||||
assistant_content = ''
|
||||
elif role =='system':
|
||||
system_content += message.get('content')
|
||||
if user_content == '':
|
||||
raise Exception('no user message')
|
||||
if system_content != '':
|
||||
# NOTE 模拟系统消息,测试发现人格描述以"你需要扮演ChatGPT"开头能够起作用,而以"你是ChatGPT"开头模型会直接否认
|
||||
system_qa = ChatQaMessage(system_content, '好的,我会严格按照你的设定回答问题')
|
||||
history.insert(0, system_qa)
|
||||
logger.debug("[QWEN] converted qa messages: {}".format([item.to_dict() for item in history]))
|
||||
logger.debug("[QWEN] user content as prompt: {}".format(user_content))
|
||||
return user_content, history
|
||||
|
||||
def get_completion_content(self, response, node_id):
|
||||
if not response['Success']:
|
||||
return f"[ERROR]\n{response['Code']}:{response['Message']}"
|
||||
text = response['Data']['Text']
|
||||
if node_id == '':
|
||||
return text
|
||||
# TODO: 当使用流程编排创建大模型应用时,响应结构如下,最终结果在['finalResult'][node_id]['response']['text']中,暂时先这么写
|
||||
# {
|
||||
# 'Success': True,
|
||||
# 'Code': None,
|
||||
# 'Message': None,
|
||||
# 'Data': {
|
||||
# 'ResponseId': '9822f38dbacf4c9b8daf5ca03a2daf15',
|
||||
# 'SessionId': 'session_id',
|
||||
# 'Text': '{"finalResult":{"LLM_T7islK":{"params":{"modelId":"qwen-plus-v1","prompt":"${systemVars.query}${bizVars.Text}"},"response":{"text":"作为一个AI语言模型,我没有年龄,因为我没有生日。\n我只是一个程序,没有生命和身体。"}}}}',
|
||||
# 'Thoughts': [],
|
||||
# 'Debug': {},
|
||||
# 'DocReferences': []
|
||||
# },
|
||||
# 'RequestId': '8e11d31551ce4c3f83f49e6e0dd998b0',
|
||||
# 'Failed': None
|
||||
# }
|
||||
text_dict = json.loads(text)
|
||||
completion_content = text_dict['finalResult'][node_id]['response']['text']
|
||||
return completion_content
|
||||
|
||||
def calc_tokens(self, messages, completion_content):
|
||||
completion_tokens = len(completion_content)
|
||||
prompt_tokens = 0
|
||||
for message in messages:
|
||||
prompt_tokens += len(message["content"])
|
||||
return completion_tokens, prompt_tokens + completion_tokens
|
||||
@@ -1,62 +0,0 @@
|
||||
from models.session_manager import Session
|
||||
from common.log import logger
|
||||
|
||||
"""
|
||||
e.g.
|
||||
[
|
||||
{"role": "system", "content": "You are a helpful assistant."},
|
||||
{"role": "user", "content": "Who won the world series in 2020?"},
|
||||
{"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
|
||||
{"role": "user", "content": "Where was it played?"}
|
||||
]
|
||||
"""
|
||||
|
||||
class AliQwenSession(Session):
|
||||
def __init__(self, session_id, system_prompt=None, model="qianwen"):
|
||||
super().__init__(session_id, system_prompt)
|
||||
self.model = model
|
||||
self.reset()
|
||||
|
||||
def discard_exceeding(self, max_tokens, cur_tokens=None):
|
||||
precise = True
|
||||
try:
|
||||
cur_tokens = self.calc_tokens()
|
||||
except Exception as e:
|
||||
precise = False
|
||||
if cur_tokens is None:
|
||||
raise e
|
||||
logger.debug("Exception when counting tokens precisely for query: {}".format(e))
|
||||
while cur_tokens > max_tokens:
|
||||
if len(self.messages) > 2:
|
||||
self.messages.pop(1)
|
||||
elif len(self.messages) == 2 and self.messages[1]["role"] == "assistant":
|
||||
self.messages.pop(1)
|
||||
if precise:
|
||||
cur_tokens = self.calc_tokens()
|
||||
else:
|
||||
cur_tokens = cur_tokens - max_tokens
|
||||
break
|
||||
elif len(self.messages) == 2 and self.messages[1]["role"] == "user":
|
||||
logger.warn("user message exceed max_tokens. total_tokens={}".format(cur_tokens))
|
||||
break
|
||||
else:
|
||||
logger.debug("max_tokens={}, total_tokens={}, len(messages)={}".format(max_tokens, cur_tokens, len(self.messages)))
|
||||
break
|
||||
if precise:
|
||||
cur_tokens = self.calc_tokens()
|
||||
else:
|
||||
cur_tokens = cur_tokens - max_tokens
|
||||
return cur_tokens
|
||||
|
||||
def calc_tokens(self):
|
||||
return num_tokens_from_messages(self.messages, self.model)
|
||||
|
||||
def num_tokens_from_messages(messages, model):
|
||||
"""Returns the number of tokens used by a list of messages."""
|
||||
# 官方token计算规则:"对于中文文本来说,1个token通常对应一个汉字;对于英文文本来说,1个token通常对应3至4个字母或1个单词"
|
||||
# 详情请产看文档:https://help.aliyun.com/document_detail/2586397.html
|
||||
# 目前根据字符串长度粗略估计token数,不影响正常使用
|
||||
tokens = 0
|
||||
for msg in messages:
|
||||
tokens += len(msg["content"])
|
||||
return tokens
|
||||
@@ -2,12 +2,27 @@
|
||||
Auto-reply chat robot abstract class
|
||||
"""
|
||||
|
||||
|
||||
from bridge.context import Context
|
||||
from bridge.reply import Reply
|
||||
|
||||
|
||||
class Bot(object):
|
||||
"""
|
||||
Base class for all chat-bot implementations.
|
||||
|
||||
Subclasses may also implement:
|
||||
|
||||
call_with_tools(messages, tools=None, stream=False, **kwargs)
|
||||
-> dict | generator (OpenAI-compatible format)
|
||||
|
||||
call_vision(image_url, question, model=None, max_tokens=1000)
|
||||
-> dict with keys: model, content, usage (or error/message)
|
||||
|
||||
These are NOT defined here to avoid shadowing concrete implementations
|
||||
provided by mixin classes (e.g. OpenAICompatibleBot) in the MRO.
|
||||
Use ``hasattr(bot, 'call_vision')`` to detect support at runtime.
|
||||
"""
|
||||
|
||||
def reply(self, query, context: Context = None) -> Reply:
|
||||
"""
|
||||
bot auto-reply content
|
||||
|
||||
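The docstring's point about not defining `call_vision` on the base class can be illustrated with a small, self-contained sketch. The class names below (`VisionMixin`, `VisionBot`, `TextOnlyBot`, `BotWithStub`, `ShadowedBot`) are hypothetical stand-ins, not classes from the repository. They only demonstrate how Python's method resolution order behaves.

```python
class Bot:
    """Base class: deliberately does not define call_vision."""

    def reply(self, query, context=None):
        raise NotImplementedError


class VisionMixin:
    """Stand-in for a mixin that provides a concrete call_vision."""

    def call_vision(self, image_url, question, model=None, max_tokens=1000):
        return {"model": model or "demo", "content": "ok", "usage": {}}


class VisionBot(Bot, VisionMixin):
    """Bot listed first, mixin second: the mixin method is still found."""


class TextOnlyBot(Bot):
    """No vision support at all."""


class BotWithStub(Bot):
    """What a base class with a placeholder method would look like."""

    def call_vision(self, image_url, question, model=None, max_tokens=1000):
        raise NotImplementedError


class ShadowedBot(BotWithStub, VisionMixin):
    """The stub comes earlier in the MRO, so it hides the mixin's method."""


def supports_vision(bot) -> bool:
    # Runtime detection as suggested by the docstring.
    return hasattr(bot, "call_vision")
```

With the stub in place, `ShadowedBot().call_vision(...)` would raise `NotImplementedError` even though a working implementation exists further down the MRO, which is the situation the docstring is avoiding.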
@@ -46,10 +46,7 @@ def create_bot(bot_type):
|
||||
elif bot_type == const.CLAUDEAPI:
|
||||
from models.claudeapi.claude_api_bot import ClaudeAPIBot
|
||||
return ClaudeAPIBot()
|
||||
elif bot_type == const.QWEN:
|
||||
from models.ali.ali_qwen_bot import AliQwenBot
|
||||
return AliQwenBot()
|
||||
elif bot_type == const.QWEN_DASHSCOPE:
|
||||
elif bot_type in (const.QWEN, const.QWEN_DASHSCOPE):
|
||||
from models.dashscope.dashscope_bot import DashscopeBot
|
||||
return DashscopeBot()
|
||||
elif bot_type == const.GEMINI:
|
||||
|
||||
@@ -1,7 +1,10 @@
|
||||
# encoding:utf-8
|
||||
|
||||
import base64
|
||||
import json
|
||||
import re
|
||||
import time
|
||||
from typing import Optional
|
||||
|
||||
import requests
|
||||
|
||||
@@ -224,6 +227,79 @@ class ClaudeAPIBot(Bot, OpenAIImage):
|
||||
return 64000
|
||||
return 8192
|
||||
|
||||
@staticmethod
|
||||
def _parse_data_url(data_url: str):
|
||||
"""Parse a data:<mime>;base64,<data> URL into (media_type, base64_data)."""
|
||||
m = re.match(r"^data:([^;]+);base64,(.+)$", data_url, re.DOTALL)
|
||||
if m:
|
||||
return m.group(1), m.group(2)
|
||||
return None, None
|
||||
|
||||
def call_vision(self, image_url: str, question: str,
|
||||
model: Optional[str] = None,
|
||||
max_tokens: int = 1000) -> dict:
|
||||
"""Analyze an image using Claude Messages API (native image blocks)."""
|
||||
try:
|
||||
actual_model = model or self._model_mapping(conf().get("model"))
|
||||
|
||||
# Build Claude-native image content block
|
||||
if image_url.startswith("data:"):
|
||||
media_type, b64_data = self._parse_data_url(image_url)
|
||||
if not b64_data:
|
||||
return {"error": True, "message": "Invalid base64 data URL"}
|
||||
image_block = {
|
||||
"type": "image",
|
||||
"source": {"type": "base64",
|
||||
"media_type": media_type or "image/jpeg",
|
||||
"data": b64_data},
|
||||
}
|
||||
else:
|
||||
image_block = {
|
||||
"type": "image",
|
||||
"source": {"type": "url", "url": image_url},
|
||||
}
|
||||
|
||||
data = {
|
||||
"model": actual_model,
|
||||
"max_tokens": max_tokens,
|
||||
"messages": [{
|
||||
"role": "user",
|
||||
"content": [
|
||||
image_block,
|
||||
{"type": "text", "text": question},
|
||||
],
|
||||
}],
|
||||
}
|
||||
|
||||
headers = {
|
||||
"x-api-key": self.api_key,
|
||||
"anthropic-version": "2023-06-01",
|
||||
"content-type": "application/json",
|
||||
}
|
||||
proxies = {"http": self.proxy, "https": self.proxy} if self.proxy else None
|
||||
resp = requests.post(f"{self.api_base}/messages",
|
||||
headers=headers, json=data, proxies=proxies)
|
||||
|
||||
if resp.status_code != 200:
|
||||
return {"error": True, "message": f"HTTP {resp.status_code}: {resp.text[:300]}"}
|
||||
|
||||
body = resp.json()
|
||||
text_parts = [b.get("text", "") for b in body.get("content", [])
|
||||
if b.get("type") == "text"]
|
||||
usage = body.get("usage", {})
|
||||
return {
|
||||
"model": actual_model,
|
||||
"content": "".join(text_parts),
|
||||
"usage": {
|
||||
"prompt_tokens": usage.get("input_tokens", 0),
|
||||
"completion_tokens": usage.get("output_tokens", 0),
|
||||
"total_tokens": usage.get("input_tokens", 0) + usage.get("output_tokens", 0),
|
||||
},
|
||||
}
|
||||
except Exception as e:
|
||||
logger.error(f"[CLAUDE] call_vision error: {e}")
|
||||
return {"error": True, "message": str(e)}
|
||||
|
||||
def call_with_tools(self, messages, tools=None, stream=False, **kwargs):
|
||||
"""
|
||||
Call Claude API with tool support for agent integration
|
||||
|
||||
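For reference, the data-URL handling in the new `call_vision` can be exercised in isolation. The sketch below reuses the regular expression and block layout from the hunk above, but the standalone function names are hypothetical, and raising `ValueError` is a choice made for this example (the method in the diff returns an error dict instead).

```python
import re
from typing import Optional, Tuple


def parse_data_url(data_url: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a data:<mime>;base64,<data> URL into (media_type, base64_data)."""
    m = re.match(r"^data:([^;]+);base64,(.+)$", data_url, re.DOTALL)
    if m:
        return m.group(1), m.group(2)
    return None, None


def build_image_block(image_url: str) -> dict:
    """Build an image content block: base64 source for data URLs, url source otherwise."""
    if image_url.startswith("data:"):
        media_type, b64_data = parse_data_url(image_url)
        if not b64_data:
            raise ValueError("Invalid base64 data URL")
        return {
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": media_type or "image/jpeg",
                "data": b64_data,
            },
        }
    return {"type": "image", "source": {"type": "url", "url": image_url}}
```

Note that this pattern only matches data URLs of the form `data:<mime>;base64,...`. A URL carrying extra parameters, such as `data:image/png;charset=utf-8;base64,...`, is not matched and is reported as invalid.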
@@ -1,6 +1,8 @@
|
||||
# encoding:utf-8
|
||||
|
||||
import json
|
||||
from typing import Optional
|
||||
|
||||
from models.bot import Bot
|
||||
from models.session_manager import SessionManager
|
||||
from bridge.context import ContextType
|
||||
@@ -26,15 +28,15 @@ dashscope_models = {
|
||||
|
||||
# Model name prefixes that require MultiModalConversation API instead of Generation API.
|
||||
# Qwen3.5+ series are omni models that only support MultiModalConversation.
|
||||
MULTIMODAL_MODEL_PREFIXES = ("qwen3.5-",)
|
||||
MULTIMODAL_MODEL_PREFIXES = ("qwen3.5-", "qwen3.6-")
|
||||
|
||||
|
||||
# Qwen对话模型API
|
||||
class DashscopeBot(Bot):
|
||||
def __init__(self):
|
||||
super().__init__()
|
||||
self.sessions = SessionManager(DashscopeSession, model=conf().get("model") or "qwen-plus")
|
||||
self.model_name = conf().get("model") or "qwen-plus"
|
||||
self.sessions = SessionManager(DashscopeSession, model=conf().get("model") or "qwen3.6-plus")
|
||||
self.model_name = conf().get("model") or "qwen3.6-plus"
|
||||
self.client = dashscope.Generation
|
||||
api_key = conf().get("dashscope_api_key")
|
||||
if api_key:
|
||||
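How the prefix tuple is consumed is not visible in this hunk. A plausible minimal check, assuming it is used with `str.startswith`, would be the following. The helper name `uses_multimodal_api` is hypothetical.

```python
# Prefixes copied from the hunk above.
MULTIMODAL_MODEL_PREFIXES = ("qwen3.5-", "qwen3.6-")


def uses_multimodal_api(model_name: str) -> bool:
    """True when the model name starts with one of the multimodal prefixes."""
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return model_name.startswith(MULTIMODAL_MODEL_PREFIXES)
```

Under this assumption, `qwen3.6-plus` would be routed to the MultiModalConversation API while `qwen-plus` and `qwen3-max` would stay on the Generation API.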
@@ -153,6 +155,56 @@ class DashscopeBot(Bot):
|
||||
else:
|
||||
return result
|
||||
|
||||
def call_vision(self, image_url: str, question: str,
|
||||
model: Optional[str] = None,
|
||||
max_tokens: int = 1000) -> dict:
|
||||
"""Analyze an image using DashScope MultiModalConversation API."""
|
||||
try:
|
||||
dashscope.api_key = self.api_key
|
||||
vision_model = model or "qwen-vl-max"
|
||||
|
||||
# DashScope multimodal format: {"image": url} + {"text": question}
|
||||
messages = [{
|
||||
"role": "user",
|
||||
"content": [
|
||||
{"image": image_url},
|
||||
{"text": question},
|
||||
],
|
||||
}]
|
||||
|
||||
response = MultiModalConversation.call(
|
||||
model=vision_model,
|
||||
messages=messages,
|
||||
max_tokens=max_tokens,
|
||||
)
|
||||
|
||||
if response.status_code != HTTPStatus.OK:
|
||||
return {
|
||||
"error": True,
|
||||
"message": f"{response.code} - {response.message}",
|
||||
}
|
||||
|
||||
resp_dict = self._response_to_dict(response)
|
||||
choice = resp_dict["output"]["choices"][0]
|
||||
content = choice.get("message", {}).get("content", "")
|
||||
if isinstance(content, list):
|
||||
content = "".join(
|
||||
item.get("text", "") for item in content if isinstance(item, dict)
|
||||
)
|
||||
usage = resp_dict.get("usage", {})
|
||||
return {
|
||||
"model": vision_model,
|
||||
"content": content,
|
||||
"usage": {
|
||||
"prompt_tokens": usage.get("input_tokens", 0),
|
||||
"completion_tokens": usage.get("output_tokens", 0),
|
||||
"total_tokens": usage.get("total_tokens", 0),
|
||||
},
|
||||
}
|
||||
except Exception as e:
|
||||
logger.error(f"[DASHSCOPE] call_vision error: {e}")
|
||||
return {"error": True, "message": str(e)}
|
||||
|
||||
def call_with_tools(self, messages, tools=None, stream=False, **kwargs):
|
||||
"""
|
||||
Call DashScope API with tool support for agent integration
|
||||
|
||||
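The response normalization in the DashScope `call_vision` above (flattening list-style content and renaming usage keys) can be tried out on its own. The helper names below are hypothetical, and returning an empty string for non-string, non-list content is an assumption of this sketch rather than behavior shown in the diff.

```python
def extract_text(content) -> str:
    """Flatten message content that may be a plain string or a list of {"text": ...} items."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        # Non-dict items are skipped, mirroring the isinstance check in the diff.
        return "".join(item.get("text", "") for item in content if isinstance(item, dict))
    return ""


def normalize_usage(usage: dict) -> dict:
    """Map DashScope-style usage keys to the OpenAI-style keys returned by call_vision."""
    return {
        "prompt_tokens": usage.get("input_tokens", 0),
        "completion_tokens": usage.get("output_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }
```

Keeping the returned usage keys identical across bots lets the caller treat every provider's result the same way.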
@@ -2,6 +2,7 @@
|
||||
|
||||
import json
|
||||
import time
|
||||
from typing import Optional
|
||||
|
||||
import requests
|
||||
from models.bot import Bot
|
||||
@@ -147,6 +148,49 @@ class DoubaoBot(Bot):
|
||||
else:
|
||||
return result
|
||||
|
||||
def call_vision(self, image_url: str, question: str,
|
||||
model: Optional[str] = None,
|
||||
max_tokens: int = 1000) -> dict:
|
||||
"""Analyze an image using Doubao (Volcengine Ark) OpenAI-compatible API."""
|
||||
try:
|
||||
vision_model = model or self.args.get("model", "doubao-seed-2-0-pro-260215")
|
||||
payload = {
|
||||
"model": vision_model,
|
||||
"max_tokens": max_tokens,
|
||||
"messages": [{
|
||||
"role": "user",
|
||||
"content": [
|
||||
{"type": "text", "text": question},
|
||||
{"type": "image_url", "image_url": {"url": image_url}},
|
||||
],
|
||||
}],
|
||||
}
|
||||
headers = {
|
||||
"Authorization": f"Bearer {self.api_key}",
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
resp = requests.post(f"{self.base_url}/chat/completions",
|
||||
headers=headers, json=payload, timeout=60)
|
||||
if resp.status_code != 200:
|
||||
return {"error": True, "message": f"HTTP {resp.status_code}: {resp.text[:300]}"}
|
||||
data = resp.json()
|
||||
if "error" in data:
|
||||
return {"error": True, "message": data["error"].get("message", str(data["error"]))}
|
||||
content = data.get("choices", [{}])[0].get("message", {}).get("content", "")
|
||||
usage = data.get("usage", {})
|
||||
return {
|
||||
"model": vision_model,
|
||||
"content": content,
|
||||
"usage": {
|
||||
"prompt_tokens": usage.get("prompt_tokens", 0),
|
||||
"completion_tokens": usage.get("completion_tokens", 0),
|
||||
"total_tokens": usage.get("total_tokens", 0),
|
||||
},
|
||||
}
|
||||
except Exception as e:
|
||||
logger.error(f"[DOUBAO] call_vision error: {e}")
|
||||
return {"error": True, "message": str(e)}
|
||||
|
||||
# ==================== Agent mode support ====================
|
||||
|
||||
def call_with_tools(self, messages, tools=None, stream: bool = False, **kwargs):
|
||||
@@ -434,31 +478,37 @@ class DoubaoBot(Bot):
|
||||
continue
|
||||
|
||||
if role == "user":
|
||||
text_parts = []
|
||||
tool_results = []
|
||||
has_tool_result = any(
|
||||
isinstance(b, dict) and b.get("type") == "tool_result" for b in content
|
||||
)
|
||||
if has_tool_result:
|
||||
text_parts = []
|
||||
tool_results = []
|
||||
|
||||
for block in content:
|
||||
if not isinstance(block, dict):
|
||||
continue
|
||||
if block.get("type") == "text":
|
||||
text_parts.append(block.get("text", ""))
|
||||
elif block.get("type") == "tool_result":
|
||||
tool_call_id = block.get("tool_use_id") or ""
|
||||
result_content = block.get("content", "")
|
||||
if not isinstance(result_content, str):
|
||||
result_content = json.dumps(result_content, ensure_ascii=False)
|
||||
tool_results.append({
|
||||
"role": "tool",
|
||||
"tool_call_id": tool_call_id,
|
||||
"content": result_content
|
||||
})
|
||||
for block in content:
|
||||
if not isinstance(block, dict):
|
||||
continue
|
||||
if block.get("type") == "text":
|
||||
text_parts.append(block.get("text", ""))
|
||||
elif block.get("type") == "tool_result":
|
||||
tool_call_id = block.get("tool_use_id") or ""
|
||||
result_content = block.get("content", "")
|
||||
if not isinstance(result_content, str):
|
||||
result_content = json.dumps(result_content, ensure_ascii=False)
|
||||
tool_results.append({
|
||||
"role": "tool",
|
||||
"tool_call_id": tool_call_id,
|
||||
"content": result_content
|
||||
})
|
||||
|
||||
# Tool results first (must come right after assistant with tool_calls)
|
||||
for tr in tool_results:
|
||||
converted.append(tr)
|
||||
for tr in tool_results:
|
||||
converted.append(tr)
|
||||
|
||||
if text_parts:
|
||||
converted.append({"role": "user", "content": "\n".join(text_parts)})
|
||||
if text_parts:
|
||||
converted.append({"role": "user", "content": "\n".join(text_parts)})
|
||||
else:
|
||||
# Keep as-is for multimodal content (e.g. image_url blocks)
|
||||
converted.append(msg)
|
||||
|
||||
elif role == "assistant":
|
||||
openai_msg = {"role": "assistant"}
|
||||
|
||||
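Because the add/remove markers are lost in this view, the intended result of the hunk above is easier to see as a standalone function. The sketch below is a reading of the new behavior, with a hypothetical function name: user messages containing `tool_result` blocks are split into `tool` messages followed by an optional text message, and any other list content (for example `image_url` blocks) is passed through unchanged.

```python
import json


def convert_user_message(msg: dict) -> list:
    """Convert one Claude-style user message into OpenAI-style messages."""
    content = msg.get("content")
    if not isinstance(content, list):
        return [msg]

    has_tool_result = any(
        isinstance(b, dict) and b.get("type") == "tool_result" for b in content
    )
    if not has_tool_result:
        # Keep multimodal content (e.g. image_url blocks) as-is.
        return [msg]

    text_parts, tool_results = [], []
    for block in content:
        if not isinstance(block, dict):
            continue
        if block.get("type") == "text":
            text_parts.append(block.get("text", ""))
        elif block.get("type") == "tool_result":
            result_content = block.get("content", "")
            if not isinstance(result_content, str):
                result_content = json.dumps(result_content, ensure_ascii=False)
            tool_results.append({
                "role": "tool",
                "tool_call_id": block.get("tool_use_id") or "",
                "content": result_content,
            })

    # Tool results go first: they must directly follow the assistant tool_calls message.
    converted = list(tool_results)
    if text_parts:
        converted.append({"role": "user", "content": "\n".join(text_parts)})
    return converted
```

The pass-through branch is what allows image blocks in a user message to reach the model instead of being dropped during conversion.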
@@ -12,6 +12,8 @@ import mimetypes
|
||||
import os
|
||||
import re
|
||||
import time
|
||||
from typing import Optional
|
||||
|
||||
import requests
|
||||
from models.bot import Bot
|
||||
from models.session_manager import SessionManager
|
||||
@@ -144,7 +146,12 @@ class GoogleGeminiBot(Bot):
|
||||
return "", []
|
||||
pattern = r"\[图片:\s*([^\]]+)\]"
|
||||
image_paths = [m.strip().strip("'\"") for m in re.findall(pattern, content) if m.strip()]
|
||||
cleaned_text = re.sub(pattern, "", content)
|
||||
# Replace markers with path-only hints so the model still knows the
|
||||
# original file location (needed when it calls tools like vision).
|
||||
def _replace_with_hint(m):
|
||||
path = m.group(1).strip().strip("'\"")
|
||||
return f"[attached image: {path}]"
|
||||
cleaned_text = re.sub(pattern, _replace_with_hint, content)
|
||||
cleaned_text = re.sub(r"\n{3,}", "\n\n", cleaned_text).strip()
|
||||
return cleaned_text, image_paths
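Review note: the marker handling changed in this hunk can be tried in isolation. The following is a minimal sketch, not the project's actual method. The function name `extract_image_paths` and the empty-input guard are assumptions for illustration, since the hunk does not show the enclosing signature. The regex and the hint format are copied from the diff.

```python
import re

# Pattern copied from the hunk: matches markers such as "[图片: /tmp/a.png]".
IMAGE_MARKER = r"\[图片:\s*([^\]]+)\]"


def extract_image_paths(content: str):
    """Return (cleaned_text, image_paths). Hypothetical standalone helper."""
    if not content:  # assumed guard; the hunk only shows `return "", []`
        return "", []

    # Collect every referenced path, dropping surrounding quotes.
    image_paths = [
        m.strip().strip("'\"")
        for m in re.findall(IMAGE_MARKER, content)
        if m.strip()
    ]

    def _replace_with_hint(match):
        # Keep the path visible so the model can pass it on to tools later.
        path = match.group(1).strip().strip("'\"")
        return f"[attached image: {path}]"

    cleaned = re.sub(IMAGE_MARKER, _replace_with_hint, content)
    # Collapse runs of three or more newlines into a single blank line.
    cleaned = re.sub(r"\n{3,}", "\n\n", cleaned).strip()
    return cleaned, image_paths
```

Unlike the removed `re.sub(pattern, "", content)` line, the text returned here still names each file, which is what the new comment in the hunk asks for.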
|
||||
|
||||
@@ -225,6 +232,57 @@ class GoogleGeminiBot(Bot):
|
||||
logger.warning(f"[Gemini] Unsupported image URL format: {image_url[:120]}")
|
||||
return None
|
||||
|
||||
def call_vision(self, image_url: str, question: str,
|
||||
model: Optional[str] = None,
|
||||
max_tokens: int = 1000) -> dict:
|
||||
"""Analyze an image using Gemini REST API."""
|
||||
try:
|
||||
model_name = model or self.model or "gemini-2.0-flash"
|
||||
image_part = self._build_inline_part_from_image_url({"url": image_url})
|
||||
if not image_part:
|
||||
return {"error": True, "message": f"Cannot process image URL: {image_url[:120]}"}
|
||||
|
||||
payload = {
|
||||
"contents": [{
|
||||
"role": "user",
|
||||
"parts": [image_part, {"text": question}],
|
||||
}],
|
||||
"generationConfig": {"maxOutputTokens": max_tokens},
|
||||
"safetySettings": [
|
||||
{"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
|
||||
{"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
|
||||
{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
|
||||
{"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
|
||||
],
|
||||
}
|
||||
endpoint = f"{self.api_base}/v1beta/models/{model_name}:generateContent"
|
||||
headers = {"x-goog-api-key": self.api_key, "Content-Type": "application/json"}
|
||||
resp = requests.post(endpoint, headers=headers, json=payload, timeout=60)
|
||||
|
||||
if resp.status_code != 200:
|
||||
return {"error": True, "message": f"HTTP {resp.status_code}: {resp.text[:300]}"}
|
||||
|
||||
body = resp.json()
|
||||
candidates = body.get("candidates", [])
|
||||
text_parts = []
|
||||
for part in candidates[0].get("content", {}).get("parts", []) if candidates else []:
|
||||
if "text" in part:
|
||||
text_parts.append(part["text"])
|
||||
|
||||
usage_meta = body.get("usageMetadata", {})
|
||||
return {
|
||||
"model": model_name,
|
||||
"content": "".join(text_parts),
|
||||
"usage": {
|
||||
"prompt_tokens": usage_meta.get("promptTokenCount", 0),
|
||||
"completion_tokens": usage_meta.get("candidatesTokenCount", 0),
|
||||
"total_tokens": usage_meta.get("totalTokenCount", 0),
|
||||
},
|
||||
}
|
||||
except Exception as e:
|
||||
logger.error(f"[Gemini] call_vision error: {e}")
|
||||
return {"error": True, "message": str(e)}
|
||||
|
||||
def call_with_tools(self, messages, tools=None, stream=False, **kwargs):
|
||||
"""
|
||||
Call Gemini API with tool support using REST API (following official docs)
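Review note: the response handling inside the new Gemini `call_vision` can be checked without a network call. This is a minimal sketch under stated assumptions: the helper name `parse_gemini_vision_response` is hypothetical and is not part of the diff. The field names (`candidates`, `parts`, `usageMetadata`, `promptTokenCount`, and so on) are the ones the hunk reads.

```python
def parse_gemini_vision_response(body: dict, model_name: str) -> dict:
    """Flatten a generateContent response body into the dict shape call_vision returns.

    Hypothetical helper for illustration only.
    """
    candidates = body.get("candidates") or []
    # Only the first candidate is inspected, as in the hunk.
    parts = candidates[0].get("content", {}).get("parts", []) if candidates else []
    # Non-text parts are skipped.
    text = "".join(part["text"] for part in parts if "text" in part)

    usage_meta = body.get("usageMetadata", {})
    return {
        "model": model_name,
        "content": text,
        "usage": {
            "prompt_tokens": usage_meta.get("promptTokenCount", 0),
            "completion_tokens": usage_meta.get("candidatesTokenCount", 0),
            "total_tokens": usage_meta.get("totalTokenCount", 0),
        },
    }
```

An empty `candidates` list yields an empty `content` string rather than an error, which matches the behaviour of the loop in the hunk.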
|
||||
|
||||
@@ -2,6 +2,8 @@
|
||||
|
||||
import time
|
||||
import json
|
||||
from typing import Optional
|
||||
|
||||
import requests
|
||||
|
||||
from models.bot import Bot
|
||||
@@ -175,6 +177,51 @@ class MinimaxBot(Bot):
|
||||
else:
|
||||
return result
|
||||
|
||||
def call_vision(self, image_url: str, question: str,
|
||||
model: Optional[str] = None,
|
||||
max_tokens: int = 1000) -> dict:
|
||||
"""Analyze an image using MiniMax OpenAI-compatible API.
|
||||
Always uses MiniMax-Text-01 — other MiniMax models do not support vision.
|
||||
"""
|
||||
try:
|
||||
vision_model = "MiniMax-Text-01"
|
||||
payload = {
|
||||
"model": vision_model,
|
||||
"max_tokens": max_tokens,
|
||||
"messages": [{
|
||||
"role": "user",
|
||||
"content": [
|
||||
{"type": "text", "text": question},
|
||||
{"type": "image_url", "image_url": {"url": image_url}},
|
||||
],
|
||||
}],
|
||||
}
|
||||
headers = {
|
||||
"Authorization": f"Bearer {self.api_key}",
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
resp = requests.post(f"{self.api_base}/chat/completions",
|
||||
headers=headers, json=payload, timeout=60)
|
||||
if resp.status_code != 200:
|
||||
return {"error": True, "message": f"HTTP {resp.status_code}: {resp.text[:300]}"}
|
||||
data = resp.json()
|
||||
if "error" in data:
|
||||
return {"error": True, "message": data["error"].get("message", str(data["error"]))}
|
||||
content = data.get("choices", [{}])[0].get("message", {}).get("content", "")
|
||||
usage = data.get("usage", {})
|
||||
return {
|
||||
"model": vision_model,
|
||||
"content": content,
|
||||
"usage": {
|
||||
"prompt_tokens": usage.get("prompt_tokens", 0),
|
||||
"completion_tokens": usage.get("completion_tokens", 0),
|
||||
"total_tokens": usage.get("total_tokens", 0),
|
||||
},
|
||||
}
|
||||
except Exception as e:
|
||||
logger.error(f"[MINIMAX] call_vision error: {e}")
|
||||
return {"error": True, "message": str(e)}
|
||||
|
||||
def call_with_tools(self, messages, tools=None, stream=False, **kwargs):
|
||||
"""
|
||||
Call MiniMax API with tool support for agent integration
|
||||
@@ -273,37 +320,41 @@ class MinimaxBot(Bot):
|
||||
if role == "user":
|
||||
# Handle user message
|
||||
if isinstance(content, list):
|
||||
# Extract text from content blocks
|
||||
text_parts = []
|
||||
tool_results = []
|
||||
has_tool_result = any(
|
||||
isinstance(b, dict) and b.get("type") == "tool_result" for b in content
|
||||
)
|
||||
if has_tool_result:
|
||||
text_parts = []
|
||||
tool_results = []
|
||||
|
||||
for block in content:
|
||||
if isinstance(block, dict):
|
||||
if block.get("type") == "text":
|
||||
text_parts.append(block.get("text", ""))
|
||||
elif block.get("type") == "tool_result":
|
||||
# Tool result should be a separate message with role="tool"
|
||||
tool_call_id = block.get("tool_use_id") or ""
|
||||
if not tool_call_id:
|
||||
logger.warning(f"[MINIMAX] tool_result missing tool_use_id")
|
||||
result_content = block.get("content", "")
|
||||
if not isinstance(result_content, str):
|
||||
result_content = json.dumps(result_content, ensure_ascii=False)
|
||||
tool_results.append({
|
||||
"role": "tool",
|
||||
"tool_call_id": tool_call_id,
|
||||
"content": result_content
|
||||
})
|
||||
for block in content:
|
||||
if isinstance(block, dict):
|
||||
if block.get("type") == "text":
|
||||
text_parts.append(block.get("text", ""))
|
||||
elif block.get("type") == "tool_result":
|
||||
tool_call_id = block.get("tool_use_id") or ""
|
||||
if not tool_call_id:
|
||||
logger.warning(f"[MINIMAX] tool_result missing tool_use_id")
|
||||
result_content = block.get("content", "")
|
||||
if not isinstance(result_content, str):
|
||||
result_content = json.dumps(result_content, ensure_ascii=False)
|
||||
tool_results.append({
|
||||
"role": "tool",
|
||||
"tool_call_id": tool_call_id,
|
||||
"content": result_content
|
||||
})
|
||||
|
||||
if text_parts:
|
||||
converted.append({
|
||||
"role": "user",
|
||||
"content": "\n".join(text_parts)
|
||||
})
|
||||
if text_parts:
|
||||
converted.append({
|
||||
"role": "user",
|
||||
"content": "\n".join(text_parts)
|
||||
})
|
||||
|
||||
# Add all tool results (not just the last one)
|
||||
for tool_result in tool_results:
|
||||
converted.append(tool_result)
|
||||
for tool_result in tool_results:
|
||||
converted.append(tool_result)
|
||||
else:
|
||||
# Keep as-is for multimodal content (e.g. image_url blocks)
|
||||
converted.append(msg)
|
||||
else:
|
||||
# Simple text content
|
||||
converted.append({
|
||||
|
||||
@@ -2,6 +2,7 @@
|
||||
|
||||
import json
|
||||
import time
|
||||
from typing import Optional
|
||||
|
||||
import requests
|
||||
from models.bot import Bot
|
||||
@@ -147,6 +148,49 @@ class MoonshotBot(Bot):
|
||||
else:
|
||||
return result
|
||||
|
||||
def call_vision(self, image_url: str, question: str,
|
||||
model: Optional[str] = None,
|
||||
max_tokens: int = 1000) -> dict:
|
||||
"""Analyze an image using Moonshot (Kimi) OpenAI-compatible API."""
|
||||
try:
|
||||
vision_model = model or self.args.get("model", "kimi-k2.5")
|
||||
payload = {
|
||||
"model": vision_model,
|
||||
"max_tokens": max_tokens,
|
||||
"messages": [{
|
||||
"role": "user",
|
||||
"content": [
|
||||
{"type": "text", "text": question},
|
||||
{"type": "image_url", "image_url": {"url": image_url}},
|
||||
],
|
||||
}],
|
||||
}
|
||||
headers = {
|
||||
"Authorization": f"Bearer {self.api_key}",
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
resp = requests.post(f"{self.base_url}/chat/completions",
|
||||
headers=headers, json=payload, timeout=60)
|
||||
if resp.status_code != 200:
|
||||
return {"error": True, "message": f"HTTP {resp.status_code}: {resp.text[:300]}"}
|
||||
data = resp.json()
|
||||
if "error" in data:
|
||||
return {"error": True, "message": data["error"].get("message", str(data["error"]))}
|
||||
content = data.get("choices", [{}])[0].get("message", {}).get("content", "")
|
||||
usage = data.get("usage", {})
|
||||
return {
|
||||
"model": vision_model,
|
||||
"content": content,
|
||||
"usage": {
|
||||
"prompt_tokens": usage.get("prompt_tokens", 0),
|
||||
"completion_tokens": usage.get("completion_tokens", 0),
|
||||
"total_tokens": usage.get("total_tokens", 0),
|
||||
},
|
||||
}
|
||||
except Exception as e:
|
||||
logger.error(f"[MOONSHOT] call_vision error: {e}")
|
||||
return {"error": True, "message": str(e)}
|
||||
|
||||
# ==================== Agent mode support ====================
|
||||
|
||||
def call_with_tools(self, messages, tools=None, stream: bool = False, **kwargs):
|
||||
@@ -435,31 +479,37 @@ class MoonshotBot(Bot):
|
||||
continue
|
||||
|
||||
if role == "user":
|
||||
text_parts = []
|
||||
tool_results = []
|
||||
has_tool_result = any(
|
||||
isinstance(b, dict) and b.get("type") == "tool_result" for b in content
|
||||
)
|
||||
if has_tool_result:
|
||||
text_parts = []
|
||||
tool_results = []
|
||||
|
||||
for block in content:
|
||||
if not isinstance(block, dict):
|
||||
continue
|
||||
if block.get("type") == "text":
|
||||
text_parts.append(block.get("text", ""))
|
||||
elif block.get("type") == "tool_result":
|
||||
tool_call_id = block.get("tool_use_id") or ""
|
||||
result_content = block.get("content", "")
|
||||
if not isinstance(result_content, str):
|
||||
result_content = json.dumps(result_content, ensure_ascii=False)
|
||||
tool_results.append({
|
||||
"role": "tool",
|
||||
"tool_call_id": tool_call_id,
|
||||
"content": result_content
|
||||
})
|
||||
for block in content:
|
||||
if not isinstance(block, dict):
|
||||
continue
|
||||
if block.get("type") == "text":
|
||||
text_parts.append(block.get("text", ""))
|
||||
elif block.get("type") == "tool_result":
|
||||
tool_call_id = block.get("tool_use_id") or ""
|
||||
result_content = block.get("content", "")
|
||||
if not isinstance(result_content, str):
|
||||
result_content = json.dumps(result_content, ensure_ascii=False)
|
||||
tool_results.append({
|
||||
"role": "tool",
|
||||
"tool_call_id": tool_call_id,
|
||||
"content": result_content
|
||||
})
|
||||
|
||||
# Tool results first (must come right after assistant with tool_calls)
|
||||
for tr in tool_results:
|
||||
converted.append(tr)
|
||||
for tr in tool_results:
|
||||
converted.append(tr)
|
||||
|
||||
if text_parts:
|
||||
converted.append({"role": "user", "content": "\n".join(text_parts)})
|
||||
if text_parts:
|
||||
converted.append({"role": "user", "content": "\n".join(text_parts)})
|
||||
else:
|
||||
# Keep as-is for multimodal content (e.g. image_url blocks)
|
||||
converted.append(msg)
|
||||
|
||||
elif role == "assistant":
|
||||
openai_msg = {"role": "assistant"}
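Review note: the user-message branch changed in this hunk can be exercised on its own. The sketch below assumes a standalone function named `convert_user_message` (hypothetical, the diff inlines this logic in a loop over messages) and follows the ordering used in this hunk: tool results first, then any user text. The MiniMax hunk earlier appends the text before the tool results, so this sketch does not describe that file.

```python
import json


def convert_user_message(msg: dict) -> list:
    """Convert one Claude-style user message into OpenAI-style messages.

    Hypothetical helper for illustration only.
    """
    content = msg.get("content")
    if not isinstance(content, list):
        return [msg]

    has_tool_result = any(
        isinstance(b, dict) and b.get("type") == "tool_result" for b in content
    )
    if not has_tool_result:
        # Keep as-is for multimodal content (e.g. image_url blocks).
        return [msg]

    text_parts, tool_results = [], []
    for block in content:
        if not isinstance(block, dict):
            continue
        if block.get("type") == "text":
            text_parts.append(block.get("text", ""))
        elif block.get("type") == "tool_result":
            result_content = block.get("content", "")
            if not isinstance(result_content, str):
                result_content = json.dumps(result_content, ensure_ascii=False)
            tool_results.append({
                "role": "tool",
                "tool_call_id": block.get("tool_use_id") or "",
                "content": result_content,
            })

    # Tool results must directly follow the assistant message with tool_calls.
    converted = list(tool_results)
    if text_parts:
        converted.append({"role": "user", "content": "\n".join(text_parts)})
    return converted
```

The `has_tool_result` check is the behavioural change: a list that contains only text and `image_url` blocks is passed through untouched instead of being flattened to text.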
|
||||
|
||||
@@ -9,6 +9,8 @@ This includes: OpenAI, LinkAI, Azure OpenAI, and many third-party providers.
|
||||
|
||||
import json
|
||||
import openai
|
||||
import requests
|
||||
from typing import Optional
|
||||
from common.log import logger
|
||||
from agent.protocol.message_utils import drop_orphaned_tool_results_openai
|
||||
|
||||
@@ -306,3 +308,51 @@ class OpenAICompatibleBot:
|
||||
openai_messages.append(msg)
|
||||
|
||||
return drop_orphaned_tool_results_openai(openai_messages)
|
||||
|
||||
def call_vision(self, image_url: str, question: str,
|
||||
model: Optional[str] = None,
|
||||
max_tokens: int = 1000) -> dict:
|
||||
"""Analyze an image using the OpenAI-compatible /chat/completions endpoint."""
|
||||
try:
|
||||
api_config = self.get_api_config()
|
||||
vision_model = model or api_config.get("model", "gpt-4o")
|
||||
api_key = api_config.get("api_key", "")
|
||||
api_base = (api_config.get("api_base") or "https://api.openai.com/v1").rstrip("/")
|
||||
|
||||
payload = {
|
||||
"model": vision_model,
|
||||
"messages": [{
|
||||
"role": "user",
|
||||
"content": [
|
||||
{"type": "text", "text": question},
|
||||
{"type": "image_url", "image_url": {"url": image_url}},
|
||||
],
|
||||
}],
|
||||
}
|
||||
headers = {
|
||||
"Authorization": f"Bearer {api_key}",
|
||||
"Content-Type": "application/json",
|
||||
}
|
||||
resp = requests.post(
|
||||
f"{api_base}/chat/completions",
|
||||
headers=headers, json=payload, timeout=60,
|
||||
)
|
||||
if resp.status_code != 200:
|
||||
body = resp.text[:500]
|
||||
logger.error(f"[{self.__class__.__name__}] call_vision HTTP {resp.status_code}: {body}")
|
||||
return {"error": True, "message": f"HTTP {resp.status_code}: {body}"}
|
||||
data = resp.json()
|
||||
content = data.get("choices", [{}])[0].get("message", {}).get("content", "")
|
||||
usage = data.get("usage", {})
|
||||
return {
|
||||
"model": vision_model,
|
||||
"content": content,
|
||||
"usage": {
|
||||
"prompt_tokens": usage.get("prompt_tokens", 0),
|
||||
"completion_tokens": usage.get("completion_tokens", 0),
|
||||
"total_tokens": usage.get("total_tokens", 0),
|
||||
},
|
||||
}
|
||||
except Exception as e:
|
||||
logger.error(f"[{self.__class__.__name__}] call_vision error: {e}")
|
||||
return {"error": True, "message": str(e)}
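Review note: the request and result shapes used by the OpenAI-compatible `call_vision` can be sketched without sending anything. Both helper names below (`build_vision_payload`, `normalize_vision_response`) are hypothetical and do not appear in the diff. The optional `max_tokens` field is also an assumption: the MiniMax and Moonshot hunks send it, while the payload in this hunk omits it.

```python
from typing import Optional


def build_vision_payload(image_url: str, question: str, model: str,
                         max_tokens: Optional[int] = None) -> dict:
    """Build a /chat/completions body with one text part and one image part."""
    payload = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }
    if max_tokens is not None:  # assumption: only sent when the caller asks for it
        payload["max_tokens"] = max_tokens
    return payload


def normalize_vision_response(data: dict, model: str) -> dict:
    """Map a decoded /chat/completions response to the dict call_vision returns."""
    choices = data.get("choices") or [{}]  # tolerate a missing or empty list
    content = choices[0].get("message", {}).get("content", "") or ""
    usage = data.get("usage") or {}
    return {
        "model": model,
        "content": content,
        "usage": {
            "prompt_tokens": usage.get("prompt_tokens", 0),
            "completion_tokens": usage.get("completion_tokens", 0),
            "total_tokens": usage.get("total_tokens", 0),
        },
    }
```

Splitting payload construction from response handling keeps both pieces testable offline; the diff itself keeps them inline inside `call_vision`.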
|
||||
|
||||
@@ -2,6 +2,7 @@
|
||||
|
||||
import time
|
||||
import json
|
||||
from typing import Optional
|
||||
|
||||
from models.bot import Bot
|
||||
from models.zhipuai.zhipu_ai_session import ZhipuAISession
|
||||
@@ -149,6 +150,40 @@ class ZHIPUAIBot(Bot, ZhipuAIImage):
|
||||
else:
|
||||
return result
|
||||
|
||||
def call_vision(self, image_url: str, question: str,
|
||||
model: Optional[str] = None,
|
||||
max_tokens: int = 1000) -> dict:
|
||||
"""Analyze an image using ZhipuAI OpenAI-compatible SDK.
|
||||
Always uses glm-5v-turbo — the text models (glm-5-turbo etc.) do not support vision.
|
||||
"""
|
||||
try:
|
||||
vision_model = "glm-5v-turbo"
|
||||
response = self.client.chat.completions.create(
|
||||
model=vision_model,
|
||||
max_tokens=max_tokens,
|
||||
messages=[{
|
||||
"role": "user",
|
||||
"content": [
|
||||
{"type": "text", "text": question},
|
||||
{"type": "image_url", "image_url": {"url": image_url}},
|
||||
],
|
||||
}],
|
||||
)
|
||||
content = response.choices[0].message.content or ""
|
||||
usage = response.usage
|
||||
return {
|
||||
"model": vision_model,
|
||||
"content": content,
|
||||
"usage": {
|
||||
"prompt_tokens": getattr(usage, "prompt_tokens", 0),
|
||||
"completion_tokens": getattr(usage, "completion_tokens", 0),
|
||||
"total_tokens": getattr(usage, "total_tokens", 0),
|
||||
},
|
||||
}
|
||||
except Exception as e:
|
||||
logger.error(f"[ZHIPU_AI] call_vision error: {e}")
|
||||
return {"error": True, "message": str(e)}
|
||||
|
||||
def call_with_tools(self, messages, tools=None, stream=False, **kwargs):
|
||||
"""
|
||||
Call ZhipuAI API with tool support for agent integration
|
||||
|
||||
@@ -157,7 +157,6 @@ class CowCliPlugin(Plugin):
|
||||
" /config 查看当前配置",
|
||||
" /config <key> 查看某项配置",
|
||||
" /config <key> <val> 修改配置",
|
||||
" /install-browser 安装浏览器工具依赖",
|
||||
"",
|
||||
"💡 也可以用 cow <command> 代替 /<command>",
|
||||
]
|
||||
@@ -407,7 +406,7 @@ class CowCliPlugin(Plugin):
|
||||
from common import const
|
||||
_EXACT = {
|
||||
"wenxin": const.BAIDU, "wenxin-4": const.BAIDU,
|
||||
"xunfei": const.XUNFEI, const.QWEN: const.QWEN,
|
||||
"xunfei": const.XUNFEI, const.QWEN: const.QWEN_DASHSCOPE,
|
||||
const.MODELSCOPE: const.MODELSCOPE,
|
||||
const.MOONSHOT: const.MOONSHOT,
|
||||
"moonshot-v1-8k": const.MOONSHOT, "moonshot-v1-32k": const.MOONSHOT,
|
||||
|
||||
@@ -315,7 +315,7 @@ class Godcmd(Plugin):
|
||||
except Exception as e:
|
||||
ok, result = False, "你没有设置私有GPT模型"
|
||||
elif cmd == "reset":
|
||||
if bottype in [const.OPEN_AI, const.OPENAI, const.CHATGPT, const.CHATGPTONAZURE, const.LINKAI, const.BAIDU, const.XUNFEI, const.QWEN, const.GEMINI, const.ZHIPU_AI, const.CLAUDEAPI]:
|
||||
if bottype in [const.OPEN_AI, const.OPENAI, const.CHATGPT, const.CHATGPTONAZURE, const.LINKAI, const.BAIDU, const.XUNFEI, const.QWEN, const.QWEN_DASHSCOPE, const.GEMINI, const.ZHIPU_AI, const.CLAUDEAPI]:
|
||||
bot.sessions.clear_session(session_id)
|
||||
if Bridge().chat_bots.get(bottype):
|
||||
Bridge().chat_bots.get(bottype).sessions.clear_session(session_id)
|
||||
@@ -341,7 +341,7 @@ class Godcmd(Plugin):
|
||||
ok, result = True, "配置已重载"
|
||||
elif cmd == "resetall":
|
||||
if bottype in [const.OPEN_AI, const.OPENAI, const.CHATGPT, const.CHATGPTONAZURE, const.LINKAI,
|
||||
const.BAIDU, const.XUNFEI, const.QWEN, const.GEMINI, const.ZHIPU_AI, const.MOONSHOT,
|
||||
const.BAIDU, const.XUNFEI, const.QWEN, const.QWEN_DASHSCOPE, const.GEMINI, const.ZHIPU_AI, const.MOONSHOT,
|
||||
const.MODELSCOPE]:
|
||||
channel.cancel_all_session()
|
||||
bot.sessions.clear_all_session()
|
||||
|
||||
@@ -6,7 +6,7 @@ build-backend = "setuptools.build_meta"
|
||||
name = "cowagent"
|
||||
version = "1.0.0"
|
||||
description = "CowAgent - AI Agent on WeChat and more"
|
||||
requires-python = ">=3.9"
|
||||
requires-python = ">=3.7"
|
||||
dependencies = [
|
||||
"click>=8.0",
|
||||
"requests>=2.28.2",
|
||||
|
||||
@@ -4,8 +4,6 @@ requests>=2.28.2
|
||||
chardet>=5.1.0
|
||||
Pillow
|
||||
web.py
|
||||
linkai>=0.0.6.0
|
||||
agentmesh-sdk>=0.1.3
|
||||
python-dotenv>=1.0.0
|
||||
PyYAML>=6.0
|
||||
croniter>=2.0.0
|
||||
|
||||
4
run.sh
@@ -271,7 +271,7 @@ select_model() {
|
||||
echo -e "${YELLOW}2) Zhipu AI (glm-5-turbo, glm-5, etc.)${NC}"
|
||||
echo -e "${YELLOW}3) Kimi (kimi-k2.5, kimi-k2, etc.)${NC}"
|
||||
echo -e "${YELLOW}4) Doubao (doubao-seed-2-0-code-preview-260215, etc.)${NC}"
|
||||
echo -e "${YELLOW}5) Qwen (qwen3.5-plus, qwen3-max, qwq-plus, etc.)${NC}"
|
||||
echo -e "${YELLOW}5) Qwen (qwen3.6-plus, qwen3.5-plus, qwen3-max, qwq-plus, etc.)${NC}"
|
||||
echo -e "${YELLOW}6) Claude (claude-sonnet-4-6, claude-opus-4-6, etc.)${NC}"
|
||||
echo -e "${YELLOW}7) Gemini (gemini-3.1-flash-lite-preview, gemini-3.1-pro-preview, etc.)${NC}"
|
||||
echo -e "${YELLOW}8) OpenAI GPT (gpt-5.4, gpt-5.2, gpt-4.1, etc.)${NC}"
|
||||
@@ -318,7 +318,7 @@ configure_model() {
|
||||
2) read_model_config "Zhipu AI" "glm-5-turbo" "ZHIPU_KEY" ;;
|
||||
3) read_model_config "Kimi (Moonshot)" "kimi-k2.5" "MOONSHOT_KEY" ;;
|
||||
4) read_model_config "Doubao (Volcengine Ark)" "doubao-seed-2-0-code-preview-260215" "ARK_KEY" ;;
|
||||
5) read_model_config "Qwen (DashScope)" "qwen3.5-plus" "DASHSCOPE_KEY" ;;
|
||||
5) read_model_config "Qwen (DashScope)" "qwen3.6-plus" "DASHSCOPE_KEY" ;;
|
||||
6)
|
||||
read_model_config "Claude" "claude-sonnet-4-6" "CLAUDE_KEY"
|
||||
read_api_base "CLAUDE_BASE" "https://api.anthropic.com/v1"
|
||||
|
||||
@@ -154,7 +154,7 @@ $ModelChoices = @{
|
||||
"2" = @{ Provider = "Zhipu AI"; Default = "glm-5-turbo"; Key = "ZHIPU_KEY" }
|
||||
"3" = @{ Provider = "Kimi (Moonshot)"; Default = "kimi-k2.5"; Key = "MOONSHOT_KEY" }
|
||||
"4" = @{ Provider = "Doubao (Volcengine Ark)"; Default = "doubao-seed-2-0-code-preview-260215"; Key = "ARK_KEY" }
|
||||
"5" = @{ Provider = "Qwen (DashScope)"; Default = "qwen3.5-plus"; Key = "DASHSCOPE_KEY" }
|
||||
"5" = @{ Provider = "Qwen (DashScope)"; Default = "qwen3.6-plus"; Key = "DASHSCOPE_KEY" }
|
||||
"6" = @{ Provider = "Claude"; Default = "claude-sonnet-4-6"; Key = "CLAUDE_KEY"; Base = "https://api.anthropic.com/v1" }
|
||||
"7" = @{ Provider = "Gemini"; Default = "gemini-3.1-pro-preview"; Key = "GEMINI_KEY"; Base = "https://generativelanguage.googleapis.com" }
|
||||
"8" = @{ Provider = "OpenAI GPT"; Default = "gpt-5.4"; Key = "OPENAI_KEY"; Base = "https://api.openai.com/v1" }
|
||||
@@ -169,7 +169,7 @@ function Select-Model {
|
||||
Write-Host "2) Zhipu AI (glm-5-turbo, glm-5, etc.)"
|
||||
Write-Host "3) Kimi (kimi-k2.5, kimi-k2, etc.)"
|
||||
Write-Host "4) Doubao (doubao-seed-2-0-code-preview-260215, etc.)"
|
||||
Write-Host "5) Qwen (qwen3.5-plus, qwen3-max, qwq-plus, etc.)"
|
||||
Write-Host "5) Qwen (qwen3.6-plus, qwen3.5-plus, qwen3-max, qwq-plus, etc.)"
|
||||
Write-Host "6) Claude (claude-sonnet-4-6, claude-opus-4-6, etc.)"
|
||||
Write-Host "7) Gemini (gemini-3.1-flash-lite-preview, gemini-3.1-pro-preview, etc.)"
|
||||
Write-Host "8) OpenAI GPT (gpt-5.4, gpt-5.2, gpt-4.1, etc.)"
|
||||
@@ -453,7 +453,11 @@ function Update-Project {
|
||||
|
||||
Assert-Python
|
||||
Install-Dependencies
|
||||
Start-CowAgent
|
||||
|
||||
# Start via python -m cli.cli instead of cow.exe, because the exe may
|
||||
# still be cached/locked from the previous installation on Windows.
|
||||
Write-Cow "Starting CowAgent..."
|
||||
& $PythonCmd -m cli.cli start
|
||||
}
|
||||
|
||||
# ── main ──────────────────────────────────────────────────────────
|
||||
|
||||