feat(vision): prioritize main model for image recognition with multi-provider fallback

- Add call_vision method to all bot implementations (DashScope, Claude, Gemini, ZhipuAI, MiniMax, Doubao, Moonshot, OpenAICompatibleBot) using each vendor's native multimodal API format
- Remove call_with_tools/call_vision from Bot base class to fix MRO shadowing issue with OpenAICompatibleBot mixin
- Refactor vision tool provider resolution: MainModel → other configured models (auto-discovered) → OpenAI → LinkAI, with automatic fallback
- Return actual model name used in call_vision responses
- Sync config.json API keys to .env bidirectionally on startup
- Fix bot instance cache to detect bot_type/use_linkai config changes
- Add SSE reconnection support for web console
- Preserve image path hints in Gemini text for correct vision tool calls
- Update docs/tools/vision.mdx
feat: add port config
2026-06-02 18:17:11 +08:00 · 2026-04-11 19:46:11 +08:00 · 2026-04-09 21:29:53 +08:00 · 2026-04-09 09:55:07 +08:00 · 2026-04-08 16:54:26 +08:00 · 2026-04-08 16:50:45 +08:00
80 changed files with 1653 additions and 815 deletions
--- a/README.md
+++ b/README.md
@@ -101,7 +101,7 @@ bash <(curl -fsSL https://cdn.link-ai.tech/code/cow/run.sh)
 irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex
 ```

-脚本使用说明：[一键运行脚本](https://docs.cowagent.ai/guide/quick-start)。安装后可使用 `cow start`、`cow stop` 等 [CLI 命令](https://docs.cowagent.ai/commands/index) 管理服务。
+脚本使用说明：[一键运行脚本](https://docs.cowagent.ai/guide/quick-start)。安装后可使用 `cow start`、`cow stop` 等 [CLI 命令](https://docs.cowagent.ai/cli/index) 管理服务。


 ## 一、准备
@@ -116,7 +116,7 @@ irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex

 ### 2.环境安装

-支持 Linux、MacOS、Windows 操作系统，可在个人计算机及服务器上运行，需安装 `Python`，Python 版本需在3.7 ~ 3.12 之间，推荐使用3.9版本。
+支持 Linux、MacOS、Windows 操作系统，可在个人计算机及服务器上运行，需安装 `Python`，Python 版本需在3.7 ~ 3.12 之间。

 > 注意：Agent 模式推荐使用源码运行，若选择 Docker 部署则无需安装 python 环境和下载源码，可直接快进到下一节。

@@ -151,7 +151,7 @@ pip3 install -r requirements-optional.txt
 pip3 install -e .
 ```

-安装后可使用 `cow` 命令管理服务（启动、停止、更新等）和技能，详见 [命令文档](https://docs.cowagent.ai/commands/index)。
+安装后可使用 `cow` 命令管理服务（启动、停止、更新等）和技能，详见 [命令文档](https://docs.cowagent.ai/cli/index)。

 **(5) 安装浏览器工具 (可选)：**

@@ -218,7 +218,7 @@ cow install-browser
 <details>
 <summary>2. 其他配置</summary>

-+ `model`: 模型名称，Agent 模式下推荐使用 `MiniMax-M2.7`、`glm-5-turbo`、`kimi-k2.5`、`qwen3.5-plus`、`claude-sonnet-4-6`、`gemini-3.1-pro-preview`，全部模型名称参考[common/const.py](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/common/const.py)文件
++ `model`: 模型名称，Agent 模式下推荐使用 `MiniMax-M2.7`、`glm-5-turbo`、`kimi-k2.5`、`qwen3.6-plus`、`claude-sonnet-4-6`、`gemini-3.1-pro-preview`，全部模型名称参考[common/const.py](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/common/const.py)文件
 + `character_desc`：普通对话模式下的机器人系统提示词。在 Agent 模式下该配置不生效，由工作空间中的文件内容构成。
 + `subscribe_msg`：订阅消息，公众号和企业微信 channel 中请填写，当被订阅时会自动回复， 可使用特殊占位符。目前支持的占位符有{trigger_prefix}，在程序中它会自动替换成 bot 的触发词。
 </details>
@@ -303,7 +303,7 @@ sudo docker logs -f chatgpt-on-wechat

 ## 模型说明

-以下对所有可支持的模型的配置和使用方法进行说明，模型接口实现在项目的 `models/` 目录下。
+推荐通过 Web 控制台在线管理模型配置，无需手动编辑文件，详见 [模型文档](https://docs.cowagent.ai/models)。以下是手动修改 `config.json` 配置模型的说明：

 <details>
 <summary>OpenAI</summary>
@@ -411,18 +411,18 @@ sudo docker logs -f chatgpt-on-wechat

 ```json
 {
-    "model": "qwen3.5-plus",
+    "model": "qwen3.6-plus",
    "dashscope_api_key": "sk-qVxxxxG"
 }
 ```
- - `model`: 可填写 `qwen3.5-plus、qwen3-max、qwen-max、qwen-plus、qwen-turbo、qwen-long、qwq-plus` 等
- - `dashscope_api_key`: 通义千问的 API-KEY，参考 [官方文档](https://bailian.console.aliyun.com/?tab=api#/api) ，在 [控制台](https://bailian.console.aliyun.com/?tab=model#/api-key) 创建
+ - `model`: 可填写 `qwen3.6-plus、qwen3.5-plus、qwen3-max、qwen-max、qwen-plus、qwen-turbo、qwen-long、qwq-plus` 等
+ - `dashscope_api_key`: 通义千问的 API-KEY，参考 [官方文档](https://bailian.console.aliyun.com/?tab=api#/api) ，在 [百炼控制台](https://bailian.console.aliyun.com/?tab=model#/api-key) 创建

 方式二：OpenAI 兼容方式接入，配置如下：
 ```json
 {
  "bot_type": "openai",
-  "model": "qwen3.5-plus",
+  "model": "qwen3.6-plus",
  "open_ai_api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
  "open_ai_api_key": "sk-qVxxxxG"
 }
@@ -674,7 +674,7 @@ Coding Plan 是各厂商推出的编程包月套餐，所有厂商均可通过 O

 ## 通道说明

-以下对可接入通道的配置方式进行说明，应用通道代码在项目的 `channel/` 目录下。
+推荐通过 Web 控制台在线管理通道配置，无需手动编辑文件，详见 [通道文档](https://docs.cowagent.ai/channels/weixin)。以下为手动修改 `config.json` 配置通道的说明：

 支持同时可接入多个通道，配置时可通过逗号进行分割，例如 `"channel_type": "feishu,dingtalk"`。

--- a/agent/prompt/builder.py
+++ b/agent/prompt/builder.py
@@ -207,9 +207,9 @@ def _build_tooling_section(tools: List[Any], language: str) -> List[str]:
        "",
        "工具调用风格：",
        "",
-        "- 在多步骤任务、敏感操作或用户要求时简要解释决策过程",
-        "- 持续推进直到任务完成，完成后向用户报告结果。",
-        "- 回复中涉及密钥、令牌等敏感信息必须脱敏。",
+        "- 多步骤任务、复杂决策、敏感操作时，应简要说明当前在做什么、为什么这样做，让用户了解关键进展",
+        "- 持续推进直到任务完成，完成后向用户报告结果",
+        "- 回复中涉及密钥、令牌等敏感信息必须脱敏",
        "- URL链接直接放在回复文本中即可，系统会自动处理和渲染。无需下载后使用send工具发送",
        "",
    ]
@@ -383,7 +383,8 @@ def _build_workspace_section(workspace_dir: str, language: str) -> List[str]:
        "",
        "**💬 交流规范**:",
        "",
-        "- 对话中不要暴露内部技术细节（文件名、工具名等），用自然语言表达。例如说「我已记住」而非「已更新 MEMORY.md」",
+        "- 记忆相关操作无需暴露文件名，用自然语言表达即可。例如说「我已记住」而非「已更新 MEMORY.md」",
+        "- 任务执行过程中的关键决策和步骤应该告知用户，让用户了解你在做什么、为什么这么做",
        "- 做真正有帮助的助手，而不是表演式的客套，尽可能帮忙解决问题",
        "- 回复应结构清晰、重点突出。善用 **加粗**、列表、分段等格式让信息一目了然",
        "- 适当使用 emoji 让表达更生动自然 🎯，但不要过度堆砌",
@@ -477,7 +478,14 @@ def _build_runtime_section(runtime_info: Dict[str, Any], language: str) -> List[
    
    # Add other runtime info
    runtime_parts = []
-    if runtime_info.get("model"):
+    # Support dynamic model via callable, fallback to static value
+    if callable(runtime_info.get("_get_model")):
+        try:
+            runtime_parts.append(f"模型={runtime_info['_get_model']()}")
+        except Exception:
+            if runtime_info.get("model"):
+                runtime_parts.append(f"模型={runtime_info['model']}")
+    elif runtime_info.get("model"):
        runtime_parts.append(f"模型={runtime_info['model']}")
    if runtime_info.get("workspace"):
        runtime_parts.append(f"工作空间={runtime_info['workspace']}")
--- a/agent/prompt/workspace.py
+++ b/agent/prompt/workspace.py
@@ -231,9 +231,9 @@ _你不是一个聊天机器人，你正在成为某个人。_

 ## 🎯 核心原则

-**做真正有帮助的助手，而不是表演式的客套。** 跳过「好的！」「当然可以！」之类的套话——直接帮忙。行动胜过废话。
+**做真正有帮助的助手。** 目标是真正帮用户解决问题，在执行复杂任务时，关键的决策和过程进展要让用户知道。

-**有自己的观点。** 你可以不同意、有偏好、觉得有趣或无聊。一个没有个性的助手只是多了几步操作的搜索引擎。
+**有自己的观点和个性。** 你可以不同意、有偏好、觉得有趣或无聊。

 **先自己动手查。** 先试着搞定：读文件、查上下文、搜索一下。实在搞不定了再问。目标是带着答案回来，而不是带着问题。

--- a/agent/skills/loader.py
+++ b/agent/skills/loader.py
@@ -53,6 +53,12 @@ class SkillLoader:
        """
        Recursively load skills from a directory.
        
+        If a subdirectory contains its own SKILL.md, it is treated as a
+        self-contained skill (or skill-collection) and its children are
+        NOT scanned further. This prevents sub-skills inside a collection
+        (e.g. style-collection/style-anjing) from being listed as
+        independent top-level skills.
+        
        :param dir_path: Directory to scan
        :param source: Source identifier
        :param include_root_files: Whether to include root-level .md files
@@ -66,38 +72,41 @@ class SkillLoader:
        except Exception as e:
            diagnostics.append(f"Failed to list directory {dir_path}: {e}")
            return LoadSkillsResult(skills=skills, diagnostics=diagnostics)
+
+        # If this directory has its own SKILL.md, load it and stop recursing.
+        # The sub-directories are internal resources of this skill.
+        if not include_root_files and 'SKILL.md' in entries:
+            skill_md_path = os.path.join(dir_path, 'SKILL.md')
+            if os.path.isfile(skill_md_path):
+                skill_result = self._load_skill_from_file(skill_md_path, source)
+                if skill_result.skills:
+                    skills.extend(skill_result.skills)
+                diagnostics.extend(skill_result.diagnostics)
+                return LoadSkillsResult(skills=skills, diagnostics=diagnostics)
        
        for entry in entries:
-            # Skip hidden files and directories
            if entry.startswith('.'):
                continue
            
-            # Skip common non-skill directories
            if entry in ('node_modules', '__pycache__', 'venv', '.git'):
                continue
            
            full_path = os.path.join(dir_path, entry)
            
-            # Handle directories
            if os.path.isdir(full_path):
-                # Recursively scan subdirectories
                sub_result = self._load_skills_recursive(full_path, source, include_root_files=False)
                skills.extend(sub_result.skills)
                diagnostics.extend(sub_result.diagnostics)
                continue
            
-            # Handle files
            if not os.path.isfile(full_path):
                continue
            
-            # Check if this is a skill file
            is_root_md = include_root_files and entry.endswith('.md') and entry.upper() != 'README.MD'
-            is_skill_md = not include_root_files and entry == 'SKILL.md'
            
-            if not (is_root_md or is_skill_md):
+            if not is_root_md:
                continue
            
-            # Load the skill
            skill_result = self._load_skill_from_file(full_path, source)
            if skill_result.skills:
                skills.extend(skill_result.skills)
--- a/agent/tools/bash/bash.py
+++ b/agent/tools/bash/bash.py
@@ -18,9 +18,13 @@ from common.utils import expand_path
 class Bash(BaseTool):
    """Tool for executing bash commands"""

+    _IS_WIN = sys.platform == "win32"
+
    name: str = "bash"
    description: str = f"""Execute a bash command in the current working directory. Returns stdout and stderr. Output is truncated to last {DEFAULT_MAX_LINES} lines or {DEFAULT_MAX_BYTES // 1024}KB (whichever is hit first). If truncated, full output is saved to a temp file.
-
+{'''
+PLATFORM: Windows (cmd.exe). Do NOT use Unix-only commands like grep, head, tail, sed, awk.
+''' if _IS_WIN else ''}
 ENVIRONMENT: All API keys from env_config are auto-injected. Use $VAR_NAME directly.

 SAFETY:
@@ -103,13 +107,12 @@ SAFETY:
                logger.debug(f"[Bash] Process User: {os.environ.get('USERNAME', os.environ.get('USER', 'unknown'))}")
            
            # On Windows, convert $VAR references to %VAR% for cmd.exe
-            if sys.platform == "win32":
+            if self._IS_WIN:
                env["PYTHONIOENCODING"] = "utf-8"
                command = self._convert_env_vars_for_windows(command, dotenv_vars)
                if command and not command.strip().lower().startswith("chcp"):
                    command = f"chcp 65001 >nul 2>&1 && {command}"

-            # Execute command with inherited environment variables
            result = subprocess.run(
                command,
                shell=True,
@@ -120,7 +123,7 @@ SAFETY:
                encoding="utf-8",
                errors="replace",
                timeout=timeout,
-                env=env
+                env=env,
            )
            
            logger.debug(f"[Bash] Exit code: {result.returncode}")
--- a/agent/tools/browser/browser_service.py
+++ b/agent/tools/browser/browser_service.py
@@ -45,6 +45,11 @@ _SNAPSHOT_JS = """
    const KEEP = new Set(%s);
    const INTERACTIVE = new Set(%s);
    const SKIP = new Set(["script","style","noscript","svg","path","meta","link","br","hr"]);
+    const CLICKABLE_ROLES = new Set([
+        "button","link","tab","menuitem","menuitemcheckbox","menuitemradio",
+        "option","switch","checkbox","radio","combobox","searchbox","slider",
+        "spinbutton","textbox","treeitem"
+    ]);
    let refCounter = 0;
    const refMap = {};

@@ -56,6 +61,58 @@ _SNAPSHOT_JS = """
        return true;
    }

+    // Strong signals: these attributes alone are enough to mark as interactive
+    function hasStrongInteractiveSignal(el) {
+        const role = el.getAttribute("role");
+        if (role && CLICKABLE_ROLES.has(role)) return true;
+        if (el.hasAttribute("onclick") || el.hasAttribute("tabindex")) return true;
+        if (el.hasAttribute("data-click") || el.hasAttribute("data-action")) return true;
+        if (el.getAttribute("contenteditable") === "true") return true;
+        return false;
+    }
+
+    // Check if cursor:pointer is set directly (not just inherited from parent)
+    function hasOwnPointerCursor(el) {
+        try {
+            const st = window.getComputedStyle(el);
+            if (st.cursor !== "pointer") return false;
+            const parent = el.parentElement;
+            if (parent) {
+                const pst = window.getComputedStyle(parent);
+                if (pst.cursor === "pointer") return false;
+            }
+            return true;
+        } catch(e) {}
+        return false;
+    }
+
+    function hasTextOrContent(el) {
+        const t = el.textContent || "";
+        if (t.trim().length > 0) return true;
+        if (el.querySelector("img,video,audio,canvas")) return true;
+        const ariaLabel = el.getAttribute("aria-label");
+        if (ariaLabel && ariaLabel.trim()) return true;
+        const title = el.getAttribute("title");
+        if (title && title.trim()) return true;
+        return false;
+    }
+
+    function isImplicitInteractive(el) {
+        if (hasStrongInteractiveSignal(el)) return true;
+        if (hasOwnPointerCursor(el) && hasTextOrContent(el)) return true;
+        return false;
+    }
+
+    function getTextContent(el) {
+        let text = "";
+        for (const ch of el.childNodes) {
+            if (ch.nodeType === Node.TEXT_NODE) {
+                text += ch.textContent;
+            }
+        }
+        return text.trim();
+    }
+
    function walk(node) {
        if (node.nodeType === Node.TEXT_NODE) {
            const t = node.textContent.trim();
@@ -75,21 +132,35 @@ _SNAPSHOT_JS = """
            }
        }

-        const keep = KEEP.has(tag);
+        const nativeInteractive = INTERACTIVE.has(tag);
+        const implicitInteractive = !nativeInteractive && (node instanceof HTMLElement) && isImplicitInteractive(node);
+        const keep = KEEP.has(tag) || implicitInteractive;
+
        if (!keep) {
-            // Unwrap: promote children
            if (children.length === 0) return null;
            if (children.length === 1) return children[0];
            return children;
        }

        const obj = { tag };
-        if (INTERACTIVE.has(tag)) {
+        if (nativeInteractive || implicitInteractive) {
            refCounter++;
            obj.ref = refCounter;
            refMap[refCounter] = node;
        }

+        if (implicitInteractive) {
+            const role = node.getAttribute("role");
+            if (role) obj.role = role;
+            const directText = getTextContent(node);
+            if (!directText && children.length === 0) {
+                const ariaLabel = node.getAttribute("aria-label");
+                const title = node.getAttribute("title");
+                if (ariaLabel) obj.ariaLabel = ariaLabel;
+                else if (title) obj.ariaLabel = title;
+            }
+        }
+
        // Attributes
        if (tag === "a" && node.href) obj.href = node.getAttribute("href");
        if (tag === "img") {
@@ -113,11 +184,13 @@ _SNAPSHOT_JS = """
        }
        if (tag === "label" && node.htmlFor) obj.for = node.htmlFor;

-        // Role / aria-label
-        const role = node.getAttribute("role");
-        if (role) obj.role = role;
-        const ariaLabel = node.getAttribute("aria-label");
-        if (ariaLabel) obj.ariaLabel = ariaLabel;
+        // Role / aria-label for native interactive & semantic elements
+        if (!implicitInteractive) {
+            const role = node.getAttribute("role");
+            if (role) obj.role = role;
+            const ariaLabel = node.getAttribute("aria-label");
+            if (ariaLabel) obj.ariaLabel = ariaLabel;
+        }

        // Children
        if (children.length === 1 && typeof children[0] === "string") {
@@ -129,7 +202,6 @@ _SNAPSHOT_JS = """
        return obj;
    }

-    // Store refMap on window for later use by click/fill actions
    const result = walk(document.body);
    window.__cowRefMap = refMap;
    return { tree: result, refCount: refCounter };
--- a/agent/tools/vision/vision.py
+++ b/agent/tools/vision/vision.py
@@ -1,22 +1,30 @@
 """
-Vision tool - Analyze images using OpenAI-compatible Vision API.
+Vision tool - Analyze images using Vision API.
 Supports local files (auto base64-encoded) and HTTP URLs.
-Providers: OpenAI (preferred) > LinkAI (fallback).
+
+Provider priority (default):
+  1. Main model via bot.call_vision — zero extra cost
+  2. Other models whose API key is configured — auto-discovered
+  3. OpenAI / LinkAI raw HTTP — reliable fallback
+  When use_linkai=true, LinkAI is promoted to #1.
+  When tool.vision.model is set, the main-model provider uses that model instead of the configured main model name.
 """

 import base64
 import os
 import subprocess
 import tempfile
-from typing import Any, Dict, Optional, Tuple
+from dataclasses import dataclass, field
+from typing import Any, Dict, List, Optional

 import requests

 from agent.tools.base_tool import BaseTool, ToolResult
+from common import const
 from common.log import logger
 from config import conf

-DEFAULT_MODEL = "gpt-4.1-mini"
+DEFAULT_MODEL = const.GPT_41_MINI
 DEFAULT_TIMEOUT = 60
 MAX_TOKENS = 1000
 COMPRESS_THRESHOLD = 1_048_576  # 1 MB
@@ -29,15 +37,46 @@ SUPPORTED_EXTENSIONS = {
    "webp": "image/webp",
 }

+_MAIN_MODEL_PROVIDER_NAME = "MainModel"
+
+# (config_key_for_api_key, bot_type, default_vision_model, provider_display_name)
+# Auto-discovered as fallback vision providers when their API key is configured.
+# OpenAI and LinkAI are handled separately (raw HTTP providers), so not listed here.
+_DISCOVERABLE_MODELS = [
+    ("moonshot_api_key", const.MOONSHOT, const.KIMI_K2_5, "Moonshot"),
+    ("ark_api_key", const.DOUBAO, const.DOUBAO_SEED_2_PRO, "Doubao"),
+    ("dashscope_api_key", const.QWEN_DASHSCOPE, const.QWEN36_PLUS, "DashScope"),
+    ("claude_api_key", const.CLAUDEAPI, const.CLAUDE_4_6_SONNET, "Claude"),
+    ("gemini_api_key", const.GEMINI, const.GEMINI_31_FLASH_LITE_PRE, "Gemini"),
+    ("zhipu_ai_api_key", const.ZHIPU_AI, const.GLM_4_7, "ZhipuAI"),
+    ("minimax_api_key", const.MiniMax, const.MINIMAX_M2_7, "MiniMax"),
+]
+
+
+@dataclass
+class VisionProvider:
+    """A single Vision API provider configuration."""
+    name: str
+    api_key: str
+    api_base: str
+    extra_headers: dict = field(default_factory=dict)
+    model_override: Optional[str] = None
+    use_bot: bool = False  # When True, call via bot.call_vision instead of raw HTTP
+    fallback_bot: Any = None  # Bot instance for non-main-model providers
+
+
+class VisionAPIError(Exception):
+    """Raised when a Vision API call fails and should trigger fallback."""
+    pass
+

 class Vision(BaseTool):
-    """Analyze images using OpenAI-compatible Vision API"""
+    """Analyze images using Vision API"""

    name: str = "vision"
    description: str = (
        "Analyze a local image or image URL (jpg/jpeg/png) using Vision API. "
        "Can describe content, extract text, identify objects, colors, etc. "
-        "Requires OPENAI_API_KEY or LINKAI_API_KEY."
    )

    params: dict = {
@@ -51,13 +90,6 @@ class Vision(BaseTool):
                "type": "string",
                "description": "Question to ask about the image",
            },
-            "model": {
-                "type": "string",
-                "description": (
-                    f"Vision model to use (default: {DEFAULT_MODEL}). "
-                    "Options: gpt-4.1-mini, gpt-4.1, gpt-4o-mini, gpt-4o"
-                ),
-            },
        },
        "required": ["image", "question"],
    }
@@ -67,29 +99,26 @@ class Vision(BaseTool):

    @staticmethod
    def is_available() -> bool:
-        return bool(
-            conf().get("open_ai_api_key") or os.environ.get("OPENAI_API_KEY")
-            or conf().get("linkai_api_key") or os.environ.get("LINKAI_API_KEY")
-        )
+        return True

    def execute(self, args: Dict[str, Any]) -> ToolResult:
        image = args.get("image", "").strip()
        question = args.get("question", "").strip()
-        model = args.get("model", DEFAULT_MODEL).strip() or DEFAULT_MODEL

        if not image:
            return ToolResult.fail("Error: 'image' parameter is required")
        if not question:
            return ToolResult.fail("Error: 'question' parameter is required")

-        api_key, api_base, extra_headers = self._resolve_provider()
-        if not api_key:
+        providers = self._resolve_providers()
+        if not providers:
            return ToolResult.fail(
-                "Error: No API key configured for Vision.\n"
-                "Please configure one of the following using env_config tool:\n"
-                "  1. OPENAI_API_KEY (preferred): env_config(action=\"set\", key=\"OPENAI_API_KEY\", value=\"your-key\")\n"
-                "  2. LINKAI_API_KEY (fallback): env_config(action=\"set\", key=\"LINKAI_API_KEY\", value=\"your-key\")\n\n"
-                "Get your key at: https://platform.openai.com/api-keys or https://link-ai.tech"
+                "Error: No model available for Vision.\n"
+                "The main model does not support vision and no other API keys are configured.\n"
+                "Options:\n"
+                "  1. Switch to a multimodal model (e.g. qwen3.6-plus, claude-sonnet-4-6, gemini-2.0-flash)\n"
+                "  2. Configure OPENAI_API_KEY: env_config(action=\"set\", key=\"OPENAI_API_KEY\", value=\"your-key\")\n"
+                "  3. Configure LINKAI_API_KEY: env_config(action=\"set\", key=\"LINKAI_API_KEY\", value=\"your-key\")"
            )

        try:
@@ -97,36 +126,221 @@ class Vision(BaseTool):
        except Exception as e:
            return ToolResult.fail(f"Error: {e}")

+        return self._call_with_fallback(providers, DEFAULT_MODEL, question, image_content)
+
+    def _call_with_fallback(self, providers: List[VisionProvider], model: str,
+                            question: str, image_content: dict) -> ToolResult:
+        """Try each provider in order; fall back to the next one on failure."""
+        errors: List[str] = []
+        for i, provider in enumerate(providers):
+            use_model = provider.model_override or model
+            try:
+                logger.info(f"[Vision] Trying provider '{provider.name}' "
+                            f"with model '{use_model}' ({i + 1}/{len(providers)})")
+                if provider.use_bot:
+                    result = self._call_via_bot(use_model, question, image_content, provider)
+                else:
+                    result = self._call_api(provider, use_model, question, image_content)
+                logger.info(f"[Vision] ✅ Success via {provider.name} (model={use_model})")
+                return result
+            except VisionAPIError as e:
+                errors.append(f"[{provider.name}/{use_model}] {e}")
+                logger.warning(f"[Vision] Provider '{provider.name}' failed: {e}")
+            except requests.Timeout:
+                errors.append(f"[{provider.name}/{use_model}] Request timed out after {DEFAULT_TIMEOUT}s")
+                logger.warning(f"[Vision] Provider '{provider.name}' timed out")
+            except requests.ConnectionError:
+                errors.append(f"[{provider.name}/{use_model}] Connection failed")
+                logger.warning(f"[Vision] Provider '{provider.name}' connection failed")
+            except Exception as e:
+                errors.append(f"[{provider.name}/{use_model}] {e}")
+                logger.error(f"[Vision] Provider '{provider.name}' unexpected error: {e}", exc_info=True)
+
+        return ToolResult.fail(
+            "Error: All Vision API providers failed.\n" + "\n".join(f"  - {err}" for err in errors)
+        )
+
+    def _resolve_providers(self) -> List[VisionProvider]:
+        """
+        Build an ordered list of available providers.
+
+        Priority:
+          - use_linkai=true  → [LinkAI, MainModel, OtherModels…, OpenAI]
+          - default          → [MainModel, OtherModels…, OpenAI, LinkAI]
+
+        "OtherModels" are auto-discovered from configured API keys.
+        The main model's bot_type is excluded from OtherModels to avoid
+        duplicating the MainModel provider.
+        """
+        use_linkai = conf().get("use_linkai", False) and conf().get("linkai_api_key")
+        providers: List[VisionProvider] = []
+
+        if use_linkai:
+            self._append_provider(providers, self._build_linkai_provider)
+            self._append_provider(providers, self._build_main_model_provider)
+            self._append_other_model_providers(providers)
+            self._append_provider(providers, self._build_openai_provider)
+        else:
+            self._append_provider(providers, self._build_main_model_provider)
+            self._append_other_model_providers(providers)
+            self._append_provider(providers, self._build_openai_provider)
+            self._append_provider(providers, self._build_linkai_provider)
+
+        return providers
+
+    @staticmethod
+    def _append_provider(providers: List[VisionProvider], builder) -> None:
+        p = builder()
+        if p:
+            providers.append(p)
+
+    def _append_other_model_providers(self, providers: List[VisionProvider]) -> None:
+        """
+        Auto-discover other models whose API key is configured.
+        Skip the main model's own bot_type (already covered by MainModel provider).
+        Skip bot_types that already have a provider in the list (e.g. OpenAI).
+        """
+        # Determine main model's bot_type so we can skip it
+        main_bot_type = None
+        if self.model and hasattr(self.model, '_resolve_bot_type'):
+            main_bot_type = self.model._resolve_bot_type(conf().get("model", ""))
+
+        existing_names = {p.name for p in providers}
+
+        for config_key, bot_type, default_model, display_name in _DISCOVERABLE_MODELS:
+            if display_name in existing_names:
+                continue
+            if bot_type == main_bot_type:
+                continue
+            api_key = conf().get(config_key, "")
+            if not api_key or not api_key.strip():
+                continue
+
+            # Create a bot instance and check if it supports call_vision
+            try:
+                from models.bot_factory import create_bot
+                bot = create_bot(bot_type)
+                if not hasattr(bot, 'call_vision'):
+                    continue
+            except Exception:
+                continue
+
+            providers.append(VisionProvider(
+                name=display_name,
+                api_key="",
+                api_base="",
+                model_override=default_model,
+                use_bot=True,
+                fallback_bot=bot,
+            ))
+
+    def _resolve_vision_model(self) -> Optional[str]:
+        """
+        Determine which model to use for vision.
+
+        1. User explicit config: tool.vision.model in config.json
+        2. Fallback to the main configured model name
+        """
+        tool_conf = conf().get("tool", {})
+        user_vision_model = tool_conf.get("vision", {}).get("model") if isinstance(tool_conf, dict) else None
+        if user_vision_model:
+            return user_vision_model
+        model_name = conf().get("model", "")
+        return model_name or None
+
+    def _build_main_model_provider(self) -> Optional[VisionProvider]:
+        """
+        Use the vendor's own model for vision via bot.call_vision.
+        Only available when the bot class has call_vision.
+        """
+        if not (self.model and hasattr(self.model, 'bot')):
+            return None
        try:
-            return self._call_api(api_key, api_base, model, question, image_content, extra_headers)
-        except requests.Timeout:
-            return ToolResult.fail(f"Error: Vision API request timed out after {DEFAULT_TIMEOUT}s")
-        except requests.ConnectionError:
-            return ToolResult.fail("Error: Failed to connect to Vision API")
-        except Exception as e:
-            logger.error(f"[Vision] Unexpected error: {e}", exc_info=True)
-            return ToolResult.fail(f"Error: Vision API call failed - {e}")
+            bot = self.model.bot
+            if not hasattr(bot, 'call_vision'):
+                return None
+        except Exception:
+            return None

-    def _resolve_provider(self) -> Tuple[Optional[str], str, dict]:
-        """Resolve API key, base URL and extra headers. Priority: conf() > env vars."""
+        vision_model = self._resolve_vision_model()
+
+        return VisionProvider(
+            name=_MAIN_MODEL_PROVIDER_NAME,
+            api_key="",
+            api_base="",
+            model_override=vision_model,
+            use_bot=True,
+        )
+
+    def _build_openai_provider(self) -> Optional[VisionProvider]:
        api_key = conf().get("open_ai_api_key") or os.environ.get("OPENAI_API_KEY")
-        if api_key:
-            api_base = (conf().get("open_ai_api_base") or os.environ.get("OPENAI_API_BASE", "")).rstrip("/") \
-                or "https://api.openai.com/v1"
-            return api_key, self._ensure_v1(api_base), {}
+        if not api_key:
+            return None
+        api_base = (conf().get("open_ai_api_base") or os.environ.get("OPENAI_API_BASE", "")).rstrip("/") \
+            or "https://api.openai.com/v1"
+        return VisionProvider(name="OpenAI", api_key=api_key, api_base=self._ensure_v1(api_base))

+    def _build_linkai_provider(self) -> Optional[VisionProvider]:
        api_key = conf().get("linkai_api_key") or os.environ.get("LINKAI_API_KEY")
-        if api_key:
-            api_base = (conf().get("linkai_api_base") or os.environ.get("LINKAI_API_BASE", "")).rstrip("/") \
-                or "https://api.link-ai.tech"
-            logger.debug("[Vision] Using LinkAI API (OPENAI_API_KEY not set)")
-            from common.utils import get_cloud_headers
-            extra = get_cloud_headers(api_key)
-            extra.pop("Authorization", None)
-            extra.pop("Content-Type", None)
-            return api_key, self._ensure_v1(api_base), extra
+        if not api_key:
+            return None
+        api_base = (conf().get("linkai_api_base") or os.environ.get("LINKAI_API_BASE", "")).rstrip("/") \
+            or "https://api.link-ai.tech"
+        from common.utils import get_cloud_headers
+        extra = get_cloud_headers(api_key)
+        extra.pop("Authorization", None)
+        extra.pop("Content-Type", None)
+        return VisionProvider(name="LinkAI", api_key=api_key, api_base=self._ensure_v1(api_base),
+                              extra_headers=extra)

-        return None, "", {}
+    def _call_via_bot(self, model: str, question: str, image_content: dict,
+                      provider: Optional[VisionProvider] = None) -> ToolResult:
+        """
+        Call a model's call_vision with vendor-native API format.
+        Uses the provider's fallback_bot if set, otherwise the main model bot.
+        Raises VisionAPIError on failure so fallback can proceed.
+        """
+        try:
+            bot = (provider and provider.fallback_bot) or self.model.bot
+        except Exception as e:
+            raise VisionAPIError(f"Cannot access bot: {e}")
+
+        # Extract the raw image URL from the OpenAI-format image_content block
+        image_url = image_content.get("image_url", {}).get("url", "")
+        if not image_url:
+            raise VisionAPIError("No image URL in content block")
+
+        try:
+            response = bot.call_vision(
+                image_url=image_url,
+                question=question,
+                model=model,
+                max_tokens=MAX_TOKENS,
+            )
+        except Exception as e:
+            raise VisionAPIError(f"call_vision failed: {e}")
+
+        if response is NotImplemented:
+            raise VisionAPIError("Bot does not support vision")
+
+        if isinstance(response, dict) and response.get("error"):
+            raise VisionAPIError(f"API error - {response.get('message', 'Unknown')}")
+
+        content = response.get("content", "") if isinstance(response, dict) else ""
+        if not content:
+            raise VisionAPIError("Empty response from model")
+
+        usage_info = response.get("usage", {}) if isinstance(response, dict) else {}
+
+        # Use the actual model name from the bot response if available
+        actual_model = response.get("model", model) if isinstance(response, dict) else model
+        provider_name = provider.name if provider else _MAIN_MODEL_PROVIDER_NAME
+        return ToolResult.success({
+            "model": actual_model,
+            "provider": provider_name,
+            "content": content,
+            "usage": usage_info,
+        })

    @staticmethod
    def _ensure_v1(api_base: str) -> str:
@@ -139,9 +353,13 @@ class Vision(BaseTool):
        return api_base.rstrip("/") + "/v1"

    def _build_image_content(self, image: str) -> dict:
-        """Build the image_url content block for the API request."""
+        """
+        Build the image_url content block.
+        Both remote URLs and local files are converted to base64 data URLs
+        so every bot backend can consume them without extra downloads.
+        """
        if image.startswith(("http://", "https://")):
-            return {"type": "image_url", "image_url": {"url": image}}
+            return self._download_to_data_url(image)

        if not os.path.isfile(image):
            raise FileNotFoundError(f"Image file not found: {image}")
@@ -165,6 +383,19 @@ class Vision(BaseTool):
        data_url = f"data:{mime_type};base64,{b64}"
        return {"type": "image_url", "image_url": {"url": data_url}}

+    @staticmethod
+    def _download_to_data_url(url: str) -> dict:
+        """Download a remote image and return it as a base64 data URL."""
+        resp = requests.get(url, timeout=30)
+        if resp.status_code != 200:
+            raise VisionAPIError(f"Failed to download image: HTTP {resp.status_code}")
+        content_type = resp.headers.get("Content-Type", "image/jpeg").split(";")[0].strip()
+        if not content_type.startswith("image/"):
+            content_type = "image/jpeg"
+        b64 = base64.b64encode(resp.content).decode("ascii")
+        data_url = f"data:{content_type};base64,{b64}"
+        return {"type": "image_url", "image_url": {"url": data_url}}
+
    @staticmethod
    def _maybe_compress(path: str) -> str:
        """Compress image to under COMPRESS_THRESHOLD with max long-edge 1536px."""
@@ -220,8 +451,13 @@ class Vision(BaseTool):
        os.remove(tmp.name)
        return path

-    def _call_api(self, api_key: str, api_base: str, model: str,
-                  question: str, image_content: dict, extra_headers: dict = None) -> ToolResult:
+    def _call_api(self, provider: VisionProvider, model: str,
+                  question: str, image_content: dict) -> ToolResult:
+        """
+        Call a single provider's Vision API.
+        Raises VisionAPIError on recoverable failures so the caller can try
+        the next provider.
+        """
        payload = {
            "model": model,
            "messages": [
@@ -233,34 +469,29 @@ class Vision(BaseTool):
                    ],
                }
            ],
-            "max_tokens": MAX_TOKENS,
        }

        headers = {
-            "Authorization": f"Bearer {api_key}",
+            "Authorization": f"Bearer {provider.api_key}",
            "Content-Type": "application/json",
-            **(extra_headers or {}),
+            **provider.extra_headers,
        }

        resp = requests.post(
-            f"{api_base}/chat/completions",
+            f"{provider.api_base}/chat/completions",
            headers=headers,
            json=payload,
            timeout=DEFAULT_TIMEOUT,
        )

-        if resp.status_code == 401:
-            return ToolResult.fail("Error: Invalid API key. Please check your configuration.")
-        if resp.status_code == 429:
-            return ToolResult.fail("Error: API rate limit reached. Please try again later.")
        if resp.status_code != 200:
-            return ToolResult.fail(f"Error: Vision API returned HTTP {resp.status_code}: {resp.text[:200]}")
+            raise VisionAPIError(f"HTTP {resp.status_code}: {resp.text[:200]}")

        data = resp.json()

        if "error" in data:
            msg = data["error"].get("message", "Unknown API error")
-            return ToolResult.fail(f"Error: Vision API error - {msg}")
+            raise VisionAPIError(f"API error - {msg}")

        content = ""
        choices = data.get("choices", [])
@@ -270,6 +501,7 @@ class Vision(BaseTool):
        usage = data.get("usage", {})
        result = {
            "model": model,
+            "provider": provider.name,
            "content": content,
            "usage": {
                "prompt_tokens": usage.get("prompt_tokens", 0),
--- a/bridge/agent_bridge.py
+++ b/bridge/agent_bridge.py
@@ -67,7 +67,7 @@ class AgentLLMModel(LLMModel):

    _MODEL_BOT_TYPE_MAP = {
        "wenxin": const.BAIDU, "wenxin-4": const.BAIDU,
-        "xunfei": const.XUNFEI, const.QWEN: const.QWEN,
+        "xunfei": const.XUNFEI, const.QWEN: const.QWEN_DASHSCOPE,
        const.MODELSCOPE: const.MODELSCOPE,
    }
    _MODEL_PREFIX_MAP = [
@@ -124,14 +124,15 @@ class AgentLLMModel(LLMModel):

    @property
    def bot(self):
-        """Lazy load the bot, re-create when model changes"""
+        """Lazy load the bot, re-create when model or bot_type changes"""
        from models.bot_factory import create_bot
        cur_model = self.model
-        if self._bot is None or self._bot_model != cur_model:
-            bot_type = self._resolve_bot_type(cur_model)
-            self._bot = create_bot(bot_type)
+        cur_bot_type = self._resolve_bot_type(cur_model)
+        if self._bot is None or self._bot_model != cur_model or getattr(self, '_bot_type', None) != cur_bot_type:
+            self._bot = create_bot(cur_bot_type)
            self._bot = add_openai_compatible_support(self._bot)
            self._bot_model = cur_model
+            self._bot_type = cur_bot_type
        return self._bot

    def call(self, request: LLMRequest):
@@ -505,15 +506,15 @@ class AgentBridge:
    
    def _migrate_config_to_env(self, workspace_root: str):
        """
-        Migrate API keys from config.json to .env file if not already set
-        
+        Sync API keys from config.json to .env file.
+        Adds new keys and updates changed values on each startup.
+
        Args:
            workspace_root: Workspace directory path (not used, kept for compatibility)
        """
        from config import conf
        import os
        
-        # Mapping from config.json keys to environment variable names
        key_mapping = {
            "open_ai_api_key": "OPENAI_API_KEY",
            "open_ai_api_base": "OPENAI_API_BASE",
@@ -522,10 +523,9 @@ class AgentBridge:
            "linkai_api_key": "LINKAI_API_KEY",
        }
        
-        # Use fixed secure location for .env file
        env_file = expand_path("~/.cow/.env")
        
-        # Read existing env vars from .env file
+        # Read existing env vars (key -> value)
        existing_env_vars = {}
        if os.path.exists(env_file):
            try:
@@ -533,48 +533,46 @@ class AgentBridge:
                    for line in f:
                        line = line.strip()
                        if line and not line.startswith('#') and '=' in line:
-                            key, _ = line.split('=', 1)
-                            existing_env_vars[key.strip()] = True
+                            key, val = line.split('=', 1)
+                            existing_env_vars[key.strip()] = val.strip()
            except Exception as e:
                logger.warning(f"[AgentBridge] Failed to read .env file: {e}")
        
-        # Check which keys need to be migrated
-        keys_to_migrate = {}
+        # Sync config.json values into .env (add/update/remove)
+        updated = False
        for config_key, env_key in key_mapping.items():
-            # Skip if already in .env file
-            if env_key in existing_env_vars:
-                continue
-            
-            # Get value from config.json
-            value = conf().get(config_key, "")
-            if value and value.strip():  # Only migrate non-empty values
-                keys_to_migrate[env_key] = value.strip()
-        
-        # Log summary if there are keys to skip
-        if existing_env_vars:
-            logger.debug(f"[AgentBridge] {len(existing_env_vars)} env vars already in .env")
-        
-        # Write new keys to .env file
-        if keys_to_migrate:
+            raw = conf().get(config_key, "")
+            value = raw.strip() if raw else ""
+            old_value = existing_env_vars.get(env_key)
+
+            if value:
+                if old_value == value:
+                    continue
+                existing_env_vars[env_key] = value
+                os.environ[env_key] = value
+                updated = True
+            else:
+                if old_value is None:
+                    continue
+                existing_env_vars.pop(env_key, None)
+                os.environ.pop(env_key, None)
+                updated = True
+            updated = True
+
+        if updated:
            try:
-                # Ensure ~/.cow directory and .env file exist
                env_dir = os.path.dirname(env_file)
-                if not os.path.exists(env_dir):
-                    os.makedirs(env_dir, exist_ok=True)
-                if not os.path.exists(env_file):
-                    open(env_file, 'a').close()
-                
-                # Append new keys
-                with open(env_file, 'a', encoding='utf-8') as f:
-                    f.write('\n# Auto-migrated from config.json\n')
-                    for key, value in keys_to_migrate.items():
+                os.makedirs(env_dir, exist_ok=True)
+
+                with open(env_file, 'w', encoding='utf-8') as f:
+                    f.write('# Environment variables for agent\n')
+                    f.write('# Auto-managed - synced from config.json on startup\n\n')
+                    for key, value in sorted(existing_env_vars.items()):
                        f.write(f'{key}={value}\n')
-                        # Also set in current process
-                        os.environ[key] = value
-                
-                logger.info(f"[AgentBridge] Migrated {len(keys_to_migrate)} API keys from config.json to .env: {list(keys_to_migrate.keys())}")
+
+                logger.info("[AgentBridge] Synced API keys from config.json to .env")
            except Exception as e:
-                logger.warning(f"[AgentBridge] Failed to migrate API keys: {e}")
+                logger.warning(f"[AgentBridge] Failed to sync API keys: {e}")
    
    def _persist_messages(
        self, session_id: str, new_messages: list, channel_type: str = ""
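Note on `_migrate_config_to_env`: the new logic adds, updates, or removes mapped keys and then rewrites the whole `.env` file from the parsed dictionary. The merge step can be sketched as a pure function, which makes the three cases easier to check in isolation. `sync_env` and `render_env` are hypothetical helper names, and the sketch leaves out the `os.environ` updates and the file I/O done by the real method.

```python
from typing import Dict, Mapping, Tuple


def sync_env(existing: Mapping[str, str], config: Mapping[str, str],
             key_mapping: Mapping[str, str]) -> Tuple[Dict[str, str], bool]:
    """Return (merged env vars, whether anything changed)."""
    merged = dict(existing)
    updated = False
    for config_key, env_key in key_mapping.items():
        raw = config.get(config_key, "")
        value = raw.strip() if raw else ""
        old_value = merged.get(env_key)
        if value:
            if old_value != value:
                merged[env_key] = value  # add a new key or update a changed one
                updated = True
        elif old_value is not None:
            del merged[env_key]  # value cleared in config: drop the key
            updated = True
    return merged, updated


def render_env(env_vars: Mapping[str, str]) -> str:
    """Serialize as sorted KEY=value lines, as in the rewritten .env file."""
    return "".join(f"{k}={v}\n" for k, v in sorted(env_vars.items()))


mapping = {"open_ai_api_key": "OPENAI_API_KEY", "linkai_api_key": "LINKAI_API_KEY"}
merged, changed = sync_env(
    {"OPENAI_API_KEY": "old", "LINKAI_API_KEY": "x", "OTHER": "1"},
    {"open_ai_api_key": " new "},
    mapping,
)
```

In this example `OPENAI_API_KEY` is updated, `LINKAI_API_KEY` is removed because the config no longer carries a value for it, and the unmapped `OTHER` entry is kept. Since the file is rewritten from parsed `KEY=value` pairs, comment lines in an existing `.env` would not survive a sync.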
--- a/bridge/agent_initializer.py
+++ b/bridge/agent_initializer.py
@@ -465,8 +465,12 @@ class AgentInitializer:
                'timezone': timezone_name
            }
        
+        def get_model():
+            """Get current model name dynamically from config"""
+            return conf().get("model", "unknown")
+
        return {
-            "model": conf().get("model", "unknown"),
+            "_get_model": get_model,
            "workspace": workspace_root,
            "channel": ", ".join(conf().get("channel_type")) if isinstance(conf().get("channel_type"), list) else conf().get("channel_type", "unknown"),
            "_get_current_time": get_current_time  # Dynamic time function
@@ -486,7 +490,7 @@ class AgentInitializer:
        
        env_file = expand_path("~/.cow/.env")
        
-        # Read existing env vars
+        # Read existing env vars (key -> value)
        existing_env_vars = {}
        if os.path.exists(env_file):
            try:
@@ -494,38 +498,46 @@ class AgentInitializer:
                    for line in f:
                        line = line.strip()
                        if line and not line.startswith('#') and '=' in line:
-                            key, _ = line.split('=', 1)
-                            existing_env_vars[key.strip()] = True
+                            key, val = line.split('=', 1)
+                            existing_env_vars[key.strip()] = val.strip()
            except Exception as e:
                logger.warning(f"[AgentInitializer] Failed to read .env file: {e}")
        
-        # Check which keys need migration
-        keys_to_migrate = {}
+        # Sync config.json values into .env (add/update/remove)
+        updated = False
        for config_key, env_key in key_mapping.items():
-            if env_key in existing_env_vars:
-                continue
-            value = conf().get(config_key, "")
-            if value and value.strip():
-                keys_to_migrate[env_key] = value.strip()
-        
-        # Write new keys
-        if keys_to_migrate:
+            raw = conf().get(config_key, "")
+            value = raw.strip() if raw else ""
+            old_value = existing_env_vars.get(env_key)
+
+            if value:
+                if old_value == value:
+                    continue
+                existing_env_vars[env_key] = value
+                os.environ[env_key] = value
+                updated = True
+            else:
+                if old_value is None:
+                    continue
+                existing_env_vars.pop(env_key, None)
+                os.environ.pop(env_key, None)
+                updated = True
+
+        if updated:
            try:
                env_dir = os.path.dirname(env_file)
-                if not os.path.exists(env_dir):
-                    os.makedirs(env_dir, exist_ok=True)
-                if not os.path.exists(env_file):
-                    open(env_file, 'a').close()
-                
-                with open(env_file, 'a', encoding='utf-8') as f:
-                    f.write('\n# Auto-migrated from config.json\n')
-                    for key, value in keys_to_migrate.items():
+                os.makedirs(env_dir, exist_ok=True)
+
+                # Rewrite the entire .env file to ensure consistency
+                with open(env_file, 'w', encoding='utf-8') as f:
+                    f.write('# Environment variables for agent\n')
+                    f.write('# Auto-managed - synced from config.json on startup\n\n')
+                    for key, value in sorted(existing_env_vars.items()):
                        f.write(f'{key}={value}\n')
-                        os.environ[key] = value
-                
-                logger.info(f"[AgentInitializer] Migrated {len(keys_to_migrate)} API keys to .env: {list(keys_to_migrate.keys())}")
+
+                logger.info("[AgentInitializer] Synced API keys from config.json to .env")
            except Exception as e:
-                logger.warning(f"[AgentInitializer] Failed to migrate API keys: {e}")
+                logger.warning(f"[AgentInitializer] Failed to sync API keys: {e}")

    def _start_daily_flush_timer(self):
        """Start a background thread that flushes all agents' memory daily at 23:55."""
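Note on the runtime info change: replacing the static `"model"` value with a `_get_model` callable means the model name is read from config at the time of use rather than once at initialization. How the consumer of this dictionary resolves the `_get_*` entries is not shown in this diff. The sketch below assumes a simple convention (call every `_get_<name>` entry and expose it as `<name>`), and `resolve_runtime_info` is a hypothetical helper.

```python
from typing import Any, Dict


def resolve_runtime_info(info: Dict[str, Any]) -> Dict[str, Any]:
    """Evaluate `_get_<name>` callables so each read reflects current state."""
    resolved = {}
    for key, value in info.items():
        if key.startswith("_get_") and callable(value):
            # Assumed convention: "_get_model" becomes "model".
            resolved[key[len("_get_"):]] = value()
        else:
            resolved[key] = value
    return resolved


# A plain dict stands in for conf() here.
config = {"model": "model-a"}
info = {
    "_get_model": lambda: config.get("model", "unknown"),
    "workspace": "/tmp/ws",
}

first = resolve_runtime_info(info)
config["model"] = "model-b"  # simulate a config change at runtime
second = resolve_runtime_info(info)
```

`first["model"]` is `"model-a"` and `second["model"]` is `"model-b"`, even though `info` itself was built only once.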
--- a/bridge/bridge.py
+++ b/bridge/bridge.py
@@ -39,11 +39,8 @@ class Bridge(object):
                self.btype["chat"] = const.BAIDU
            if model_type in ["xunfei"]:
                self.btype["chat"] = const.XUNFEI
-            if model_type in [const.QWEN]:
-                self.btype["chat"] = const.QWEN
-            if model_type in [const.QWEN_TURBO, const.QWEN_PLUS, const.QWEN_MAX]:
+            if model_type in [const.QWEN, const.QWEN_TURBO, const.QWEN_PLUS, const.QWEN_MAX]:
                self.btype["chat"] = const.QWEN_DASHSCOPE
-            # Support Qwen3 and other DashScope models
            if model_type and (model_type.startswith("qwen") or model_type.startswith("qwq") or model_type.startswith("qvq")):
                self.btype["chat"] = const.QWEN_DASHSCOPE
            if model_type and model_type.startswith("gemini"):
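Note on the Qwen routing: after this change the legacy `const.QWEN` name and every `qwen*`, `qwq*` and `qvq*` model resolve to the DashScope bot. The prefix rule can be sketched as below. The constant values and the `resolve_chat_bot_type` helper are assumptions for illustration, since the actual values in `const` are not visible in this diff.

```python
from typing import Optional

# Hypothetical values; the real constants live in the project's const module.
QWEN_DASHSCOPE = "dashscope"
DEFAULT_BOT = "chatGPT"

DASHSCOPE_PREFIXES = ("qwen", "qwq", "qvq")


def resolve_chat_bot_type(model_type: Optional[str]) -> str:
    """Route Qwen-family model names to the DashScope bot by prefix."""
    # str.startswith accepts a tuple, so one call covers all three prefixes.
    if model_type and model_type.startswith(DASHSCOPE_PREFIXES):
        return QWEN_DASHSCOPE
    return DEFAULT_BOT


routes = {name: resolve_chat_bot_type(name) for name in ("qwen-max", "qwq-32b", "gpt-4o")}
```

A missing or empty model name falls through to the default, matching the `model_type and ...` guard in the patched code.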
--- a/channel/chat_channel.py
+++ b/channel/chat_channel.py
@@ -347,38 +347,30 @@ class ChatChannel(Channel):
        if media_items:
            logger.info(f"[chat_channel] Extracted {len(media_items)} media item(s) from reply")
            
-            # 先发送文本（保持原文本不变）
+            # Send text first (the frontend will embed video players via renderMarkdown).
            logger.info(f"[chat_channel] Sending text content before media: {reply.content[:100]}...")
            self._send(reply, context)
            logger.info(f"[chat_channel] Text sent, now sending {len(media_items)} media item(s)")
            
-            # 然后逐个发送媒体文件
            for i, (url, media_type) in enumerate(media_items):
                try:
-                    # 判断是本地文件还是URL
+                    # Determine whether it is a remote URL or a local file.
                    if url.startswith(('http://', 'https://')):
-                        # 网络资源
                        if media_type == 'video':
-                            # 视频使用 FILE 类型发送
                            media_reply = Reply(ReplyType.FILE, url)
                            media_reply.file_name = os.path.basename(url)
                        else:
-                            # 图片使用 IMAGE_URL 类型
                            media_reply = Reply(ReplyType.IMAGE_URL, url)
                    elif os.path.exists(url):
-                        # 本地文件
                        if media_type == 'video':
-                            # 视频使用 FILE 类型，转换为 file:// URL
                            media_reply = Reply(ReplyType.FILE, f"file://{url}")
                            media_reply.file_name = os.path.basename(url)
                        else:
-                            # 图片使用 IMAGE_URL 类型，转换为 file:// URL
                            media_reply = Reply(ReplyType.IMAGE_URL, f"file://{url}")
                    else:
                        logger.warning(f"[chat_channel] Media file not found or invalid URL: {url}")
                        continue
                    
-                    # 发送媒体文件（添加小延迟避免频率限制）
                    if i > 0:
                        time.sleep(0.5)
                    self._send(media_reply, context)
--- a/channel/web/static/js/console.js
+++ b/channel/web/static/js/console.js
@@ -270,8 +270,42 @@ function createMd() {

 const md = createMd();

+const VIDEO_EXT_RE = /\.(?:mp4|webm|mov|avi|mkv)$/i;  // tested against URL without query string
+
+function _buildVideoHtml(url) {
+    const fileName = url.split('/').pop().split('?')[0];
+    return `<div style="margin:10px 0;">` +
+        `<video controls preload="metadata" ` +
+        `style="max-width:100%;border-radius:10px;box-shadow:0 2px 8px rgba(0,0,0,0.15);display:block;">` +
+        `<source src="${url}"></video>` +
+        `<a href="${url}" target="_blank" ` +
+        `style="display:inline-flex;align-items:center;gap:4px;margin-top:4px;font-size:12px;color:#8b8fa8;text-decoration:none;">` +
+        `<i class="fas fa-download"></i> ${escapeHtml(fileName)}</a></div>`;
+}
+
+function injectVideoPlayers(html) {
+    // Step 1: replace markdown-it anchor tags whose href points to a video file.
+    const step1 = html.replace(
+        /<a\s+href="(https?:\/\/[^"]+)"[^>]*>[^<]*<\/a>/gi,
+        (match, url) => VIDEO_EXT_RE.test(url.split('?')[0]) ? _buildVideoHtml(url) : match
+    );
+    // Step 2: replace any remaining bare video URLs in text nodes (not inside HTML tags).
+    // Split on HTML tags to avoid touching src/href attributes already in markup.
+    return step1.split(/(<[^>]+>)/).map((chunk, idx) => {
+        // Even indices are text nodes; odd indices are HTML tags — leave them untouched.
+        if (idx % 2 !== 0) return chunk;
+        return chunk.replace(/https?:\/\/\S+/gi, (url) => {
+            const bare = url.replace(/[),.\s]+$/, '');  // strip trailing punctuation
+            return VIDEO_EXT_RE.test(bare.split('?')[0]) ? _buildVideoHtml(bare) : url;
+        });
+    }).join('');
+}
+
 function renderMarkdown(text) {
-    try { return md.render(text); }
+    try {
+        const html = md.render(text);
+        return injectVideoPlayers(html);
+    }
    catch (e) { return text.replace(/\n/g, '<br>'); }
 }

@@ -729,41 +763,60 @@ function sendMessage() {
        }));
    }

-    fetch('/message', {
-        method: 'POST',
-        headers: { 'Content-Type': 'application/json' },
-        body: JSON.stringify(body)
-    })
-    .then(r => r.json())
-    .then(data => {
-        if (data.status === 'success') {
-            if (data.stream) {
-                startSSE(data.request_id, loadingEl, timestamp);
+    const MAX_RETRIES = 2;
+    const RETRY_DELAY_MS = 1000;
+
+    function postWithRetry(attempt) {
+        fetch('/message', {
+            method: 'POST',
+            headers: { 'Content-Type': 'application/json' },
+            body: JSON.stringify(body)
+        })
+        .then(r => r.json())
+        .then(data => {
+            if (data.status === 'success') {
+                if (data.stream) {
+                    startSSE(data.request_id, loadingEl, timestamp);
+                } else {
+                    loadingContainers[data.request_id] = loadingEl;
+                    if (!isPolling) startPolling();
+                }
            } else {
-                loadingContainers[data.request_id] = loadingEl;
-                if (!isPolling) startPolling();
+                loadingEl.remove();
+                addBotMessage(t('error_send'), new Date());
+            }
+        })
+        .catch(err => {
+            if (err.name === 'AbortError') {
+                loadingEl.remove();
+                addBotMessage(t('error_timeout'), new Date());
+                return;
+            }
+            if (attempt < MAX_RETRIES) {
+                console.warn(`[sendMessage] attempt ${attempt + 1} failed, retrying...`, err);
+                setTimeout(() => postWithRetry(attempt + 1), RETRY_DELAY_MS * (attempt + 1));
+                return;
            }
-        } else {
            loadingEl.remove();
            addBotMessage(t('error_send'), new Date());
-        }
-    })
-    .catch(err => {
-        loadingEl.remove();
-        addBotMessage(err.name === 'AbortError' ? t('error_timeout') : t('error_send'), new Date());
-    });
+        });
+    }
+
+    postWithRetry(0);
 }

 function startSSE(requestId, loadingEl, timestamp) {
-    const es = new EventSource(`/stream?request_id=${encodeURIComponent(requestId)}`);
-    activeStreams[requestId] = es;
-
    let botEl = null;
    let stepsEl = null;    // .agent-steps  (thinking summaries + tool indicators)
    let contentEl = null;  // .answer-content (final streaming answer)
    let mediaEl = null;    // .media-content (images & file attachments)
    let accumulatedText = '';
    let currentToolEl = null;
+    let done = false;
+
+    const MAX_RECONNECTS = 10;
+    const RECONNECT_BASE_MS = 1000;
+    let reconnectCount = 0;

    function ensureBotEl() {
        if (botEl) return;
@@ -788,162 +841,204 @@ function startSSE(requestId, loadingEl, timestamp) {
        mediaEl = botEl.querySelector('.media-content');
    }

-    es.onmessage = function(e) {
-        let item;
-        try { item = JSON.parse(e.data); } catch (_) { return; }
+    function connect() {
+        const es = new EventSource(`/stream?request_id=${encodeURIComponent(requestId)}`);
+        activeStreams[requestId] = es;

-        if (item.type === 'delta') {
-            ensureBotEl();
-            accumulatedText += item.content;
-            contentEl.innerHTML = renderMarkdown(accumulatedText);
-            scrollChatToBottom();
+        es.onmessage = function(e) {
+            let item;
+            try { item = JSON.parse(e.data); } catch (_) { return; }

-        } else if (item.type === 'tool_start') {
-            ensureBotEl();
+            // Successful data received, reset reconnect counter
+            reconnectCount = 0;

-            // Save current thinking as a collapsible step
-            if (accumulatedText.trim()) {
-                const fullText = accumulatedText.trim();
-                const oneLine = fullText.replace(/\n+/g, ' ');
-                const needsTruncate = oneLine.length > 80;
-                const stepEl = document.createElement('div');
-                stepEl.className = 'agent-step agent-thinking-step' + (needsTruncate ? '' : ' no-expand');
-                if (needsTruncate) {
-                    const truncated = oneLine.substring(0, 80) + '…';
-                    stepEl.innerHTML = `
-                        <div class="thinking-header" onclick="this.parentElement.classList.toggle('expanded')">
-                            <i class="fas fa-lightbulb text-amber-400 flex-shrink-0"></i>
-                            <span class="thinking-summary">${escapeHtml(truncated)}</span>
-                            <i class="fas fa-chevron-right thinking-chevron"></i>
-                        </div>
-                        <div class="thinking-full">${renderMarkdown(fullText)}</div>`;
-                } else {
-                    stepEl.innerHTML = `
-                        <div class="thinking-header no-toggle">
-                            <i class="fas fa-lightbulb text-amber-400 flex-shrink-0"></i>
-                            <span>${escapeHtml(oneLine)}</span>
-                        </div>`;
+            if (item.type === 'delta') {
+                ensureBotEl();
+                accumulatedText += item.content;
+                contentEl.innerHTML = renderMarkdown(accumulatedText);
+                scrollChatToBottom();
+
+            } else if (item.type === 'tool_start') {
+                ensureBotEl();
+
+                // Save current thinking as a collapsible step
+                if (accumulatedText.trim()) {
+                    const fullText = accumulatedText.trim();
+                    const oneLine = fullText.replace(/\n+/g, ' ');
+                    const needsTruncate = oneLine.length > 80;
+                    const stepEl = document.createElement('div');
+                    stepEl.className = 'agent-step agent-thinking-step' + (needsTruncate ? '' : ' no-expand');
+                    if (needsTruncate) {
+                        const truncated = oneLine.substring(0, 80) + '…';
+                        stepEl.innerHTML = `
+                            <div class="thinking-header" onclick="this.parentElement.classList.toggle('expanded')">
+                                <i class="fas fa-lightbulb text-amber-400 flex-shrink-0"></i>
+                                <span class="thinking-summary">${escapeHtml(truncated)}</span>
+                                <i class="fas fa-chevron-right thinking-chevron"></i>
+                            </div>
+                            <div class="thinking-full">${renderMarkdown(fullText)}</div>`;
+                    } else {
+                        stepEl.innerHTML = `
+                            <div class="thinking-header no-toggle">
+                                <i class="fas fa-lightbulb text-amber-400 flex-shrink-0"></i>
+                                <span>${escapeHtml(oneLine)}</span>
+                            </div>`;
+                    }
+                    stepsEl.appendChild(stepEl);
                }
-                stepsEl.appendChild(stepEl);
-            }
-            accumulatedText = '';
-            contentEl.innerHTML = '';
+                accumulatedText = '';
+                contentEl.innerHTML = '';

-            // Add tool execution indicator (collapsible)
-            currentToolEl = document.createElement('div');
-            currentToolEl.className = 'agent-step agent-tool-step';
-            const argsStr = formatToolArgs(item.arguments || {});
-            currentToolEl.innerHTML = `
-                <div class="tool-header" onclick="this.parentElement.classList.toggle('expanded')">
-                    <i class="fas fa-cog fa-spin text-primary-400 flex-shrink-0 tool-icon"></i>
-                    <span class="tool-name">${item.tool}</span>
-                    <i class="fas fa-chevron-right tool-chevron"></i>
-                </div>
-                <div class="tool-detail">
-                    <div class="tool-detail-section">
-                        <div class="tool-detail-label">Input</div>
-                        <pre class="tool-detail-content">${argsStr}</pre>
+                // Add tool execution indicator (collapsible)
+                currentToolEl = document.createElement('div');
+                currentToolEl.className = 'agent-step agent-tool-step';
+                const argsStr = formatToolArgs(item.arguments || {});
+                currentToolEl.innerHTML = `
+                    <div class="tool-header" onclick="this.parentElement.classList.toggle('expanded')">
+                        <i class="fas fa-cog fa-spin text-primary-400 flex-shrink-0 tool-icon"></i>
+                        <span class="tool-name">${item.tool}</span>
+                        <i class="fas fa-chevron-right tool-chevron"></i>
                    </div>
-                    <div class="tool-detail-section tool-output-section"></div>
-                </div>`;
-            stepsEl.appendChild(currentToolEl);
+                    <div class="tool-detail">
+                        <div class="tool-detail-section">
+                            <div class="tool-detail-label">Input</div>
+                            <pre class="tool-detail-content">${argsStr}</pre>
+                        </div>
+                        <div class="tool-detail-section tool-output-section"></div>
+                    </div>`;
+                stepsEl.appendChild(currentToolEl);

-            scrollChatToBottom();
+                scrollChatToBottom();

-        } else if (item.type === 'tool_end') {
-            if (currentToolEl) {
-                const isError = item.status !== 'success';
-                const icon = currentToolEl.querySelector('.tool-icon');
-                icon.className = isError
-                    ? 'fas fa-times text-red-400 flex-shrink-0 tool-icon'
-                    : 'fas fa-check text-primary-400 flex-shrink-0 tool-icon';
+            } else if (item.type === 'tool_end') {
+                if (currentToolEl) {
+                    const isError = item.status !== 'success';
+                    const icon = currentToolEl.querySelector('.tool-icon');
+                    icon.className = isError
+                        ? 'fas fa-times text-red-400 flex-shrink-0 tool-icon'
+                        : 'fas fa-check text-primary-400 flex-shrink-0 tool-icon';

-                // Show execution time
-                const nameEl = currentToolEl.querySelector('.tool-name');
-                if (item.execution_time !== undefined) {
-                    nameEl.innerHTML += ` <span class="tool-time">${item.execution_time}s</span>`;
+                    // Show execution time
+                    const nameEl = currentToolEl.querySelector('.tool-name');
+                    if (item.execution_time !== undefined) {
+                        nameEl.innerHTML += ` <span class="tool-time">${item.execution_time}s</span>`;
+                    }
+
+                    // Fill output section
+                    const outputSection = currentToolEl.querySelector('.tool-output-section');
+                    if (outputSection && item.result) {
+                        outputSection.innerHTML = `
+                            <div class="tool-detail-label">${isError ? 'Error' : 'Output'}</div>
+                            <pre class="tool-detail-content ${isError ? 'tool-error-text' : ''}">${escapeHtml(String(item.result))}</pre>`;
+                    }
+
+                    if (isError) currentToolEl.classList.add('tool-failed');
+                    currentToolEl = null;
                }

-                // Fill output section
-                const outputSection = currentToolEl.querySelector('.tool-output-section');
-                if (outputSection && item.result) {
-                    outputSection.innerHTML = `
-                        <div class="tool-detail-label">${isError ? 'Error' : 'Output'}</div>
-                        <pre class="tool-detail-content ${isError ? 'tool-error-text' : ''}">${escapeHtml(String(item.result))}</pre>`;
-                }
+            } else if (item.type === 'image') {
+                ensureBotEl();
+                const imgEl = document.createElement('img');
+                imgEl.src = item.content;
+                imgEl.alt = 'screenshot';
+                imgEl.style.cssText = 'max-width:600px;border-radius:8px;margin:8px 0;cursor:pointer;box-shadow:0 1px 4px rgba(0,0,0,0.1);';
+                imgEl.onclick = () => window.open(item.content, '_blank');
+                mediaEl.appendChild(imgEl);
+                scrollChatToBottom();

-                if (isError) currentToolEl.classList.add('tool-failed');
-                currentToolEl = null;
+            } else if (item.type === 'text') {
+                // Intermediate text sent before media items; display it but keep SSE open.
+                ensureBotEl();
+                contentEl.classList.remove('sse-streaming');
+                const textContent = item.content || accumulatedText;
+                if (textContent) contentEl.innerHTML = renderMarkdown(textContent);
+                applyHighlighting(botEl);
+                scrollChatToBottom();
+
+            } else if (item.type === 'video') {
+                ensureBotEl();
+                const wrapper = document.createElement('div');
+                wrapper.innerHTML = _buildVideoHtml(item.content);
+                mediaEl.appendChild(wrapper.firstElementChild || wrapper);
+                scrollChatToBottom();
+
+            } else if (item.type === 'file') {
+                ensureBotEl();
+                const fileName = item.file_name || item.content.split('/').pop();
+                const fileEl = document.createElement('a');
+                fileEl.href = item.content;
+                fileEl.download = fileName;
+                fileEl.target = '_blank';
+                fileEl.className = 'file-attachment';
+                fileEl.style.cssText = 'display:inline-flex;align-items:center;gap:6px;padding:8px 14px;margin:8px 0;border-radius:8px;background:var(--bg-secondary,#f3f4f6);color:var(--text-primary,#374151);text-decoration:none;font-size:14px;border:1px solid var(--border-color,#e5e7eb);';
+                fileEl.innerHTML = `<i class="fas fa-file-download" style="color:#6b7280;"></i> ${escapeHtml(fileName)}`;
+                mediaEl.appendChild(fileEl);
+                scrollChatToBottom();
+
+            } else if (item.type === 'phase') {
+                // Coarse progress (e.g. cow install-browser); must not close SSE (unlike "done")
+                ensureBotEl();
+                const wrap = document.createElement('div');
+                wrap.className = 'text-xs sm:text-sm text-slate-600 dark:text-slate-400 border-l-2 border-primary-400 pl-2 py-1 my-0.5';
+                wrap.textContent = String(item.content || '');
+                stepsEl.appendChild(wrap);
+                scrollChatToBottom();
+
+            } else if (item.type === 'done') {
+                done = true;
+                es.close();
+                delete activeStreams[requestId];
+
+                // item.content may be empty when "done" is only a stream-close signal after media.
+                const finalText = item.content || accumulatedText;
+
+                if (!botEl && finalText) {
+                    if (loadingEl) { loadingEl.remove(); loadingEl = null; }
+                    addBotMessage(finalText, new Date((item.timestamp || Date.now() / 1000) * 1000), requestId);
+                } else if (botEl) {
+                    contentEl.classList.remove('sse-streaming');
+                    // Only update text content when there is something new to show.
+                    if (finalText) contentEl.innerHTML = renderMarkdown(finalText);
+                    applyHighlighting(botEl);
+                }
+                scrollChatToBottom();
+
+            } else if (item.type === 'error') {
+                done = true;
+                es.close();
+                delete activeStreams[requestId];
+                if (loadingEl) { loadingEl.remove(); loadingEl = null; }
+                addBotMessage(t('error_send'), new Date());
            }
+        };

-        } else if (item.type === 'image') {
-            ensureBotEl();
-            const imgEl = document.createElement('img');
-            imgEl.src = item.content;
-            imgEl.alt = 'screenshot';
-            imgEl.style.cssText = 'max-width:600px;border-radius:8px;margin:8px 0;cursor:pointer;box-shadow:0 1px 4px rgba(0,0,0,0.1);';
-            imgEl.onclick = () => window.open(item.content, '_blank');
-            mediaEl.appendChild(imgEl);
-            scrollChatToBottom();
-
-        } else if (item.type === 'file') {
-            ensureBotEl();
-            const fileName = item.file_name || item.content.split('/').pop();
-            const fileEl = document.createElement('a');
-            fileEl.href = item.content;
-            fileEl.download = fileName;
-            fileEl.target = '_blank';
-            fileEl.className = 'file-attachment';
-            fileEl.style.cssText = 'display:inline-flex;align-items:center;gap:6px;padding:8px 14px;margin:8px 0;border-radius:8px;background:var(--bg-secondary,#f3f4f6);color:var(--text-primary,#374151);text-decoration:none;font-size:14px;border:1px solid var(--border-color,#e5e7eb);';
-            fileEl.innerHTML = `<i class="fas fa-file-download" style="color:#6b7280;"></i> ${fileName}`;
-            mediaEl.appendChild(fileEl);
-            scrollChatToBottom();
-
-        } else if (item.type === 'phase') {
-            // Coarse progress (e.g. cow install-browser); must not close SSE (unlike "done")
-            ensureBotEl();
-            const wrap = document.createElement('div');
-            wrap.className = 'text-xs sm:text-sm text-slate-600 dark:text-slate-400 border-l-2 border-primary-400 pl-2 py-1 my-0.5';
-            wrap.textContent = String(item.content || '');
-            stepsEl.appendChild(wrap);
-            scrollChatToBottom();
-
-        } else if (item.type === 'done') {
+        es.onerror = function() {
            es.close();
            delete activeStreams[requestId];

-            const finalText = item.content || accumulatedText;
+            if (done) return;

-            if (!botEl && finalText) {
-                if (loadingEl) { loadingEl.remove(); loadingEl = null; }
-                addBotMessage(finalText, new Date((item.timestamp || Date.now() / 1000) * 1000), requestId);
-            } else if (botEl) {
+            if (reconnectCount < MAX_RECONNECTS) {
+                reconnectCount++;
+                const delay = Math.min(RECONNECT_BASE_MS * reconnectCount, 5000);
+                console.warn(`[SSE] connection lost for ${requestId}, reconnecting in ${delay}ms (attempt ${reconnectCount}/${MAX_RECONNECTS})`);
+                setTimeout(connect, delay);
+                return;
+            }
+
+            // Exhausted retries, show whatever we have
+            if (loadingEl) { loadingEl.remove(); loadingEl = null; }
+            if (!botEl) {
+                addBotMessage(t('error_send'), new Date());
+            } else if (accumulatedText) {
                contentEl.classList.remove('sse-streaming');
-                if (finalText) contentEl.innerHTML = renderMarkdown(finalText);
+                contentEl.innerHTML = renderMarkdown(accumulatedText);
                applyHighlighting(botEl);
            }
-            scrollChatToBottom();
+        };
+    }

-        } else if (item.type === 'error') {
-            es.close();
-            delete activeStreams[requestId];
-            if (loadingEl) { loadingEl.remove(); loadingEl = null; }
-            addBotMessage(t('error_send'), new Date());
-        }
-    };
-
-    es.onerror = function() {
-        es.close();
-        delete activeStreams[requestId];
-        if (loadingEl) { loadingEl.remove(); loadingEl = null; }
-        if (!botEl) {
-            addBotMessage(t('error_send'), new Date());
-        } else if (accumulatedText) {
-            contentEl.classList.remove('sse-streaming');
-            contentEl.innerHTML = renderMarkdown(accumulatedText);
-            applyHighlighting(botEl);
-        }
-    };
+    connect();
 }

 function startPolling() {
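Reviewer note: the retry delay in `es.onerror` grows linearly with the attempt number and is capped at 5000 ms. The schedule can be sketched in Python as below. This is only an illustration: `RECONNECT_BASE_MS` and `MAX_RECONNECTS` are defined outside this hunk, so the values passed in the usage line are assumptions, and `reconnect_delays` is a hypothetical helper name.

```python
from typing import List


def reconnect_delays(base_ms: int, max_reconnects: int, cap_ms: int = 5000) -> List[int]:
    """Return the delay (ms) used before each reconnect attempt.

    Mirrors `Math.min(RECONNECT_BASE_MS * reconnectCount, 5000)` from the
    diff: linear growth per attempt, clamped to `cap_ms`.
    """
    return [min(base_ms * attempt, cap_ms) for attempt in range(1, max_reconnects + 1)]


# Assumed values for illustration only (not taken from the source).
print(reconnect_delays(base_ms=1000, max_reconnects=6))
```

A linear schedule with a low cap keeps the worst-case wait short, which suits a chat UI where the server keeps the queue alive until "done" is delivered.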
--- a/channel/web/web_channel.py
+++ b/channel/web/web_channel.py
@@ -126,6 +126,13 @@ class WebChannel(ChatChannel):
                    logger.debug(f"SSE skipped duplicate file for request {request_id}")
                    return

+                # Skip http-URL FILE/IMAGE_URL replies produced by chat_channel's media extraction:
+                # the text reply (already sent as "done") contains the URL and the frontend will
+                # render it via renderMarkdown/injectVideoPlayers, so no separate SSE event is needed.
+                if reply.type in (ReplyType.FILE, ReplyType.IMAGE_URL) and content.startswith(("http://", "https://")):
+                    logger.debug(f"SSE skipped http media reply for request {request_id}")
+                    return
+
                self.sse_queues[request_id].put({
                    "type": "done",
                    "content": content,
@@ -322,14 +329,18 @@ class WebChannel(ChatChannel):
        """
        SSE generator for a given request_id.
        Yields UTF-8 encoded bytes to avoid WSGI Latin-1 mangling.
+        Supports client reconnection: the queue is only removed after a
+        "done" event is consumed, so a new GET /stream with the same
+        request_id can resume reading remaining events.
        """
        if request_id not in self.sse_queues:
            yield b"data: {\"type\": \"error\", \"message\": \"invalid request_id\"}\n\n"
            return

        q = self.sse_queues[request_id]
-        timeout = 300  # 5 minutes max
-        deadline = time.time() + timeout
+        idle_timeout = 600  # 10 minutes without any real event
+        deadline = time.time() + idle_timeout
+        done = False

        try:
            while time.time() < deadline:
@@ -339,13 +350,18 @@ class WebChannel(ChatChannel):
                    yield b": keepalive\n\n"
                    continue

+                # Real event received, reset idle deadline
+                deadline = time.time() + idle_timeout
+
                payload = json.dumps(item, ensure_ascii=False)
                yield f"data: {payload}\n\n".encode("utf-8")

                if item.get("type") == "done":
+                    done = True
                    break
        finally:
-            self.sse_queues.pop(request_id, None)
+            if done:
+                self.sse_queues.pop(request_id, None)

    def poll_response(self):
        """
@@ -447,8 +463,14 @@ class WebChannel(ChatChannel):
        func = web.httpserver.StaticMiddleware(app.wsgifunc())
        func = web.httpserver.LogMiddleware(func)
        server = web.httpserver.WSGIServer(("0.0.0.0", port), func)
-        # Allow concurrent requests by not blocking on in-flight handler threads
        server.daemon_threads = True
+        # Default request_queue_size(5) / timeout(10s) / numthreads(10) are
+        # too small: when SSE streams occupy many threads, the backlog fills
+        # and new connections get refused (ERR_CONNECTION_ABORTED).
+        server.request_queue_size = 128
+        server.timeout = 300
+        server.requests.min = 20
+        server.requests.max = 80
        self._http_server = server
        try:
            server.start()
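Reviewer note: the SSE generator changes earlier in this file (idle deadline reset on every real event, queue removed only after "done") can be sketched in isolation as follows. This is a simplified stand-in, not the actual method: `sse_stream`, the `queues` dict, and `poll_interval` are hypothetical names, and the real polling interval is not visible in the hunks shown.

```python
import json
import queue
import time


def sse_stream(queues, request_id, idle_timeout=600.0, poll_interval=1.0):
    """Yield SSE frames for request_id; keep the queue unless "done" was sent."""
    q = queues.get(request_id)
    if q is None:
        yield b'data: {"type": "error", "message": "invalid request_id"}\n\n'
        return

    deadline = time.time() + idle_timeout
    done = False
    try:
        while time.time() < deadline:
            try:
                item = q.get(timeout=poll_interval)
            except queue.Empty:
                # Keepalives do not count as activity, so the deadline is untouched.
                yield b": keepalive\n\n"
                continue

            # A real event arrived: push the idle deadline forward.
            deadline = time.time() + idle_timeout
            payload = json.dumps(item, ensure_ascii=False)
            yield f"data: {payload}\n\n".encode("utf-8")

            if item.get("type") == "done":
                done = True
                break
    finally:
        # A dropped connection leaves the queue in place so a reconnect can resume.
        if done:
            queues.pop(request_id, None)
```

Note that in this sketch, as in the diff, a stream that hits the idle timeout without ever seeing "done" also leaves its queue behind; whether that needs separate cleanup is a design question for the author.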
@@ -563,7 +585,7 @@ class ConfigHandler:
    _RECOMMENDED_MODELS = [
        const.MINIMAX_M2_7, const.MINIMAX_M2_5, const.MINIMAX_M2_1, const.MINIMAX_M2_1_LIGHTNING,
        const.GLM_5_TURBO, const.GLM_5, const.GLM_4_7,
-        const.QWEN3_MAX, const.QWEN35_PLUS,
+        const.QWEN36_PLUS, const.QWEN35_PLUS, const.QWEN3_MAX,
        const.KIMI_K2_5, const.KIMI_K2,
        const.DOUBAO_SEED_2_PRO, const.DOUBAO_SEED_2_CODE,
        const.CLAUDE_4_6_SONNET, const.CLAUDE_4_6_OPUS, const.CLAUDE_4_5_SONNET,
@@ -592,7 +614,7 @@ class ConfigHandler:
            "api_key_field": "dashscope_api_key",
            "api_base_key": None,
            "api_base_default": None,
-            "models": [const.QWEN3_MAX, const.QWEN35_PLUS],
+            "models": [const.QWEN36_PLUS, const.QWEN35_PLUS, const.QWEN3_MAX],
        }),
        ("moonshot", {
            "label": "Kimi",
--- a/channel/weixin/weixin_api.py
+++ b/channel/weixin/weixin_api.py
@@ -37,11 +37,19 @@ def _random_wechat_uin() -> str:
    return base64.b64encode(str(val).encode("utf-8")).decode("utf-8")


+CHANNEL_VERSION = "2.0.0"
+# iLink-App-ClientVersion: uint32 encoded as major<<16 | minor<<8 | patch
+# 2.0.0 → 0x00020000 = 131072
+CLIENT_VERSION = "131072"
+
+
 def _build_headers(token: str = "") -> dict:
    headers = {
        "Content-Type": "application/json",
        "AuthorizationType": "ilink_bot_token",
        "X-WECHAT-UIN": _random_wechat_uin(),
+        "iLink-App-Id": "bot",
+        "iLink-App-ClientVersion": CLIENT_VERSION,
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
@@ -64,6 +72,7 @@ class WeixinApi:
    def _post(self, endpoint: str, body: dict, timeout: int = DEFAULT_API_TIMEOUT) -> dict:
        url = _ensure_trailing_slash(self.base_url) + endpoint
        headers = _build_headers(self.token)
+        body.setdefault("base_info", {}).setdefault("channel_version", CHANNEL_VERSION)
        try:
            resp = requests.post(url, json=body, headers=headers, timeout=timeout)
            resp.raise_for_status()
@@ -210,7 +219,10 @@ class WeixinApi:
    def poll_qr_status(self, qrcode: str, timeout: int = QR_POLL_TIMEOUT) -> dict:
        url = (_ensure_trailing_slash(self.base_url) +
               f"ilink/bot/get_qrcode_status?qrcode={requests.utils.quote(qrcode)}")
-        headers = {"iLink-App-ClientVersion": "1"}
+        headers = {
+            "iLink-App-Id": "bot",
+            "iLink-App-ClientVersion": CLIENT_VERSION,
+        }
        try:
            resp = requests.get(url, headers=headers, timeout=timeout)
            resp.raise_for_status()
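Reviewer note: `CLIENT_VERSION` is hard-coded as "131072", which matches the `major<<16 | minor<<8 | patch` encoding described in the comment. A small helper could derive it from `CHANNEL_VERSION` instead. The sketch below is an assumption about how that might look; `encode_client_version` is a hypothetical name, and the range checks are inferred from the stated bit layout rather than from any iLink documentation.

```python
def encode_client_version(version: str) -> str:
    """Encode "major.minor.patch" as a decimal string of major<<16 | minor<<8 | patch."""
    major, minor, patch = (int(part) for part in version.split("."))
    # minor and patch each occupy 8 bits; major gets the remaining upper bits.
    if not (0 <= minor <= 0xFF and 0 <= patch <= 0xFF and 0 <= major <= 0xFFFF):
        raise ValueError(f"version component out of range: {version}")
    return str((major << 16) | (minor << 8) | patch)


print(encode_client_version("2.0.0"))  # 131072, the value hard-coded in the diff
```

Deriving the header from one constant avoids the two values drifting apart on the next version bump.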
--- a/channel/weixin/weixin_channel.py
+++ b/channel/weixin/weixin_channel.py
@@ -166,10 +166,18 @@ class WeixinChannel(ChatChannel):
        print("=" * 60)
        try:
            import qrcode as qr_lib
+            import io
            qr = qr_lib.QRCode(error_correction=qr_lib.constants.ERROR_CORRECT_L, box_size=1, border=1)
            qr.add_data(qrcode_url)
            qr.make(fit=True)
-            qr.print_ascii(invert=True)
+            buf = io.StringIO()
+            qr.print_ascii(out=buf, invert=True)
+            try:
+                print(buf.getvalue())
+            except UnicodeEncodeError:
+                # Windows GBK terminals cannot render Unicode block characters
+                print(f"\n  (终端不支持显示二维码，请使用链接扫码)")
+                print(f"  二维码链接: {qrcode_url}\n")
        except ImportError:
            print(f"\n  二维码链接: {qrcode_url}")
            print("  (安装 'qrcode' 包可在终端显示二维码)\n")
--- a/cli/commands/process.py
+++ b/cli/commands/process.py
@@ -178,7 +178,10 @@ def update(ctx):
    """Update CowAgent and restart."""
    root = get_project_root()

-    # 1. Git pull while service is still running
+    # 1. Stop service first so git pull won't conflict with running code
+    ctx.invoke(stop)
+
+    # 2. Git pull
    if os.path.isdir(os.path.join(root, ".git")):
        click.echo("Pulling latest code...")
        ret = subprocess.call(["git", "pull"], cwd=root)
@@ -188,28 +191,61 @@ def update(ctx):
    else:
        click.echo("Not a git repository, skipping code update.")

-    # 2. Stop service
-    ctx.invoke(stop)
-
-    # 3. Install dependencies
    python = sys.executable
    req_file = os.path.join(root, "requirements.txt")
-    if os.path.exists(req_file):
-        click.echo("Installing dependencies...")
-        subprocess.call(
-            [python, "-m", "pip", "install", "-r", "requirements.txt", "-q"],
+
+    if _IS_WIN:
+        # On Windows, `cow.exe` (this process) locks the exe file, so
+        # `pip install -e .` fails with WinError 5.  Write a small .bat
+        # helper that waits for cow.exe to exit, then installs & starts.
+        bat = os.path.join(root, "_cow_update.bat")
+        lines = [
+            "@echo off",
+            "chcp 65001 >nul",
+            "echo Waiting for cow.exe to exit...",
+            "timeout /t 3 /nobreak >nul",
+        ]
+        if os.path.exists(req_file):
+            lines.append("echo Installing dependencies...")
+            lines.append(f'"{python}" -m pip install -r requirements.txt -q')
+        lines += [
+            "echo Reinstalling cow CLI...",
+            f'"{python}" -m pip install -e . -q',
+            "echo Starting CowAgent...",
+            f'"{python}" -m cli.cli start --no-logs',
+            "echo.",
+            "echo Update complete. You can close this window.",
+            "pause >nul",
+            "del \"%~f0\"",
+        ]
+        with open(bat, "w", encoding="utf-8") as f:
+            f.write("\n".join(lines) + "\n")
+
+        subprocess.Popen(
+            ["cmd.exe", "/c", "start", "CowAgent Update", "/wait", bat],
+            cwd=root,
+        )
+        click.echo(click.style(
+            "✓ Update script launched. Please follow the new window for progress.",
+            fg="green"))
+    else:
+        # 3. Install dependencies
+        if os.path.exists(req_file):
+            click.echo("Installing dependencies...")
+            subprocess.call(
+                [python, "-m", "pip", "install", "-r", "requirements.txt", "-q"],
+                cwd=root,
+            )
+        click.echo("Reinstalling cow CLI...")
+        subprocess.call(
+            [python, "-m", "pip", "install", "-e", ".", "-q"],
            cwd=root,
        )
-    click.echo("Reinstalling cow CLI...")
-    subprocess.call(
-        [python, "-m", "pip", "install", "-e", ".", "-q"],
-        cwd=root,
-    )

-    # 4. Start service and follow logs
-    click.echo("")
-    time.sleep(1)
-    ctx.invoke(start, no_logs=False)
+        # 4. Start service
+        click.echo("")
+        time.sleep(1)
+        ctx.invoke(start, no_logs=False)


@click.command()
--- a/common/cloud_client.py
+++ b/common/cloud_client.py
@@ -47,8 +47,8 @@ CREDENTIAL_MAP = {


 class CloudClient(LinkAIClient):
-    def __init__(self, api_key: str, channel, host: str = ""):
-        super().__init__(api_key, host)
+    def __init__(self, api_key: str, channel, host: str = "", port=None):
+        super().__init__(api_key, host, port=port)
        self.channel = channel
        self.client_type = channel.channel_type
        self.channel_mgr = None
@@ -733,7 +733,7 @@ def start(channel, channel_mgr=None):
        return

    global chat_client
-    chat_client = CloudClient(api_key=conf().get("linkai_api_key"), host=conf().get("cloud_host", ""), channel=channel)
+    chat_client = CloudClient(api_key=conf().get("linkai_api_key"), host=conf().get("cloud_host", ""), port=conf().get("cloud_port"), channel=channel)
    chat_client.channel_mgr = channel_mgr
    chat_client.config = _build_config()
    chat_client.start()
--- a/common/const.py
+++ b/common/const.py
@@ -7,8 +7,8 @@ XUNFEI = "xunfei"
 CHATGPTONAZURE = "chatGPTOnAzure"
 LINKAI = "linkai"
 CLAUDEAPI= "claudeAPI"
-QWEN = "qwen"  # 旧版千问接入
-QWEN_DASHSCOPE = "dashscope"  # 新版千问接入(百炼)
+QWEN = "qwen"  # Qwen (kept for legacy configs; actually handled by DashscopeBot)
+QWEN_DASHSCOPE = "dashscope"  # Qwen via DashScope
 GEMINI = "gemini" 
 ZHIPU_AI = "zhipu"  
 MOONSHOT = "moonshot"
@@ -81,14 +81,14 @@ TTS_1_HD = "tts-1-hd"
 DEEPSEEK_CHAT = "deepseek-chat"  # DeepSeek-V3对话模型
 DEEPSEEK_REASONER = "deepseek-reasoner"  # DeepSeek-R1模型

-# Qwen (通义千问 - 阿里云)
-QWEN = "qwen"
+# Qwen (Tongyi Qianwen, Alibaba Cloud DashScope)
 QWEN_TURBO = "qwen-turbo"
 QWEN_PLUS = "qwen-plus"
 QWEN_MAX = "qwen-max"
 QWEN_LONG = "qwen-long"
 QWEN3_MAX = "qwen3-max"  # Qwen3 Max - Agent推荐模型
 QWEN35_PLUS = "qwen3.5-plus"  # Qwen3.5 Plus - Omni model (MultiModalConversation)
+QWEN36_PLUS = "qwen3.6-plus"  # Qwen3.6 Plus - Omni model (MultiModalConversation)
 QWQ_PLUS = "qwq-plus"

 # MiniMax
@@ -172,7 +172,7 @@ MODEL_LIST = [
              DEEPSEEK_CHAT, DEEPSEEK_REASONER,
              
              # Qwen
-              QWEN, QWEN_TURBO, QWEN_PLUS, QWEN_MAX, QWEN_LONG, QWEN3_MAX, QWEN35_PLUS,
+              QWEN36_PLUS, QWEN35_PLUS, QWEN3_MAX, QWEN_MAX, QWEN_PLUS, QWEN_TURBO, QWEN_LONG,
              
              # MiniMax
              MiniMax, MINIMAX_M2_7, MINIMAX_M2_5, MINIMAX_M2_1, MINIMAX_M2_1_LIGHTNING, MINIMAX_M2, MINIMAX_ABAB6_5,
--- a/config.py
+++ b/config.py
@@ -189,6 +189,7 @@ available_setting = {
    "linkai_app_code": "",
    "linkai_api_base": "https://api.link-ai.tech",  # linkAI服务地址
    "cloud_host": "client.link-ai.tech",
+    "cloud_port": None,
    "cloud_deployment_id": "",
    "minimax_api_key": "",
    "Minimax_group_id": "",
--- a/docs/commands/general.mdx
+++ b/docs/commands/general.mdx
--- a/docs/commands/index.mdx
+++ b/docs/commands/index.mdx
--- a/docs/commands/process.mdx
+++ b/docs/commands/process.mdx
--- a/docs/commands/skill.mdx
+++ b/docs/commands/skill.mdx
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -171,10 +171,10 @@
              {
                "group": "命令系统",
                "pages": [
-                  "commands/index",
-                  "commands/process",
-                  "commands/skill",
-                  "commands/general"
+                  "cli/index",
+                  "cli/process",
+                  "cli/skill",
+                  "cli/general"
                ]
              }
            ]
@@ -327,15 +327,15 @@
            ]
          },
          {
-            "tab": "Commands",
+            "tab": "CLI",
            "groups": [
              {
                "group": "Command System",
                "pages": [
-                  "en/commands/index",
-                  "en/commands/process",
-                  "en/commands/skill",
-                  "en/commands/chat"
+                  "en/cli/index",
+                  "en/cli/process",
+                  "en/cli/skill",
+                  "en/cli/chat"
                ]
              }
            ]
@@ -488,15 +488,15 @@
            ]
          },
          {
-            "tab": "コマンド",
+            "tab": "CLI",
            "groups": [
              {
                "group": "コマンドシステム",
                "pages": [
-                  "ja/commands/index",
-                  "ja/commands/process",
-                  "ja/commands/skill",
-                  "ja/commands/general"
+                  "ja/cli/index",
+                  "ja/cli/process",
+                  "ja/cli/skill",
+                  "ja/cli/general"
                ]
              }
            ]
--- a/docs/en/README.md
+++ b/docs/en/README.md
@@ -76,7 +76,7 @@ irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex

 After running, the Web service starts by default. Access `http://localhost:9899/chat` to chat.

-Script usage: [One-click Install](https://docs.cowagent.ai/en/guide/quick-start). After installation, you can also use `cow start`, `cow stop`, and other [CLI commands](https://docs.cowagent.ai/en/commands/index) to manage the service.
+Script usage: [One-click Install](https://docs.cowagent.ai/en/guide/quick-start). After installation, you can also use `cow start`, `cow stop`, and other [CLI commands](https://docs.cowagent.ai/en/cli/index) to manage the service.

 ### Manual Installation

@@ -100,7 +100,7 @@ pip3 install -r requirements-optional.txt   # optional but recommended
 pip3 install -e .
 ```

-After installation, use `cow` commands to manage the service (start, stop, update, etc.) and skills. See [Command Docs](https://docs.cowagent.ai/en/commands/index).
+After installation, use `cow` commands to manage the service (start, stop, update, etc.) and skills. See [Command Docs](https://docs.cowagent.ai/en/cli/index).

 **4. Install browser (optional)**

@@ -165,7 +165,7 @@ Supports mainstream model providers. Recommended models for Agent mode:
 | GLM | `glm-5-turbo` |
 | Kimi | `kimi-k2.5` |
 | Doubao | `doubao-seed-2-0-code-preview-260215` |
-| Qwen | `qwen3.5-plus` |
+| Qwen | `qwen3.6-plus` |
 | Claude | `claude-sonnet-4-6` |
 | Gemini | `gemini-3.1-pro-preview` |
 | OpenAI | `gpt-5.4` |
--- a/docs/en/commands/general.mdx
+++ b/docs/en/commands/general.mdx
--- a/docs/en/commands/index.mdx
+++ b/docs/en/commands/index.mdx
--- a/docs/en/commands/process.mdx
+++ b/docs/en/commands/process.mdx
--- a/docs/en/commands/skill.mdx
+++ b/docs/en/commands/skill.mdx
--- a/docs/en/guide/quick-start.mdx
+++ b/docs/en/guide/quick-start.mdx
@@ -47,7 +47,7 @@ After installation, use the `cow` command to manage the service:
 | `cow update` | Update code and restart |
 | `cow install-browser` | Install browser tool dependencies |

-See the [Commands documentation](/en/commands/index) for more details.
+See the [Commands documentation](/en/cli/index) for more details.

 <Note>
  If the `cow` command is not available, you can use `./run.sh <command>` (Linux/macOS) or `.\scripts\run.ps1 <command>` (Windows) as a fallback. Both are functionally equivalent.
--- a/docs/en/intro/features.mdx
+++ b/docs/en/intro/features.mdx
@@ -117,4 +117,4 @@ cow skill install pptx # Install a skill
 cow install-browser    # Install browser tool
 ```

-See [Command Overview](https://docs.cowagent.ai/en/commands) for details.
+See [Command Overview](https://docs.cowagent.ai/en/cli) for details.
--- a/docs/en/intro/index.mdx
+++ b/docs/en/intro/index.mdx
@@ -31,7 +31,7 @@ CowAgent can proactively think and plan tasks, operate computers and external re
  <Card title="Tool System" icon="wrench" href="/en/tools/index">
    Built-in tools for file I/O, terminal execution, browser automation, scheduled tasks, messaging, and more. The Agent autonomously invokes tools to accomplish complex tasks.
  </Card>
-  <Card title="Command System" icon="terminal" href="/en/commands/index">
+  <Card title="Command System" icon="terminal" href="/en/cli/index">
    Provides terminal CLI and in-chat commands for process management, skill installation, configuration, context inspection, and other common operations.
  </Card>
  <Card title="Multiple Model Support" icon="microchip" href="/en/models/index">
--- a/docs/en/models/index.mdx
+++ b/docs/en/models/index.mdx
@@ -6,7 +6,7 @@ description: Supported models and recommended choices for CowAgent
 CowAgent supports mainstream LLMs from domestic and international providers. Model interfaces are implemented in the project's `models/` directory.

 <Note>
-  For Agent mode, the following models are recommended based on quality and cost: MiniMax-M2.7, glm-5-turbo, kimi-k2.5, qwen3.5-plus, claude-sonnet-4-6, gemini-3.1-pro-preview
+  For Agent mode, the following models are recommended based on quality and cost: MiniMax-M2.7, glm-5-turbo, kimi-k2.5, qwen3.6-plus, claude-sonnet-4-6, gemini-3.1-pro-preview
 </Note>

 ## Configuration
@@ -25,7 +25,7 @@ You can also use the [LinkAI](https://link-ai.tech) platform interface to flexib
    glm-5-turbo, glm-5 and other series models
  </Card>
  <Card title="Qwen (Tongyi Qianwen)" href="/en/models/qwen">
-    qwen3.5-plus, qwen3-max and more
+    qwen3.6-plus, qwen3-max and more
  </Card>
  <Card title="Kimi" href="/en/models/kimi">
    kimi-k2.5, kimi-k2 and more
--- a/docs/en/models/qwen.mdx
+++ b/docs/en/models/qwen.mdx
@@ -5,14 +5,14 @@ description: Tongyi Qianwen model configuration

 ```json
 {
-  "model": "qwen3.5-plus",
+  "model": "qwen3.6-plus",
  "dashscope_api_key": "YOUR_API_KEY"
 }
 ```

 | Parameter | Description |
 | --- | --- |
-| `model` | Options include `qwen3.5-plus`, `qwen3-max`, `qwen-max`, `qwen-plus`, `qwen-turbo`, `qwq-plus`, etc. |
+| `model` | Options include `qwen3.6-plus`, `qwen3.5-plus`, `qwen3-max`, `qwen-max`, `qwen-plus`, `qwen-turbo`, `qwq-plus`, etc. |
 | `dashscope_api_key` | Create at [Bailian Console](https://bailian.console.aliyun.com/?tab=model#/api-key). See [official docs](https://bailian.console.aliyun.com/?tab=api#/api) |

 OpenAI-compatible configuration is also supported:
@@ -20,7 +20,7 @@ OpenAI-compatible configuration is also supported:
 ```json
 {
  "bot_type": "openai",
-  "model": "qwen3.5-plus",
+  "model": "qwen3.6-plus",
  "open_ai_api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
  "open_ai_api_key": "YOUR_API_KEY"
 }
--- a/docs/en/releases/v2.0.5.mdx
+++ b/docs/en/releases/v2.0.5.mdx
@@ -12,7 +12,7 @@ New CLI command system for managing CowAgent from terminal and chat:
 - **Web console**: Type `/` in the input box to open a slash command menu, with arrow-key input history
 - **Windows support**: New PowerShell script `scripts/run.ps1` with `cow` command support

-Docs: [Command Overview](https://docs.cowagent.ai/en/commands)
+Docs: [Command Overview](https://docs.cowagent.ai/en/cli)

 <img src="https://cdn.link-ai.tech/doc/20260401114549.png" width="750" />

--- a/docs/en/skills/index.mdx
+++ b/docs/en/skills/index.mdx
@@ -17,7 +17,7 @@ CowAgent offers multiple ways to acquire skills:
 - **URL** — Install from zip archives or SKILL.md links
 - **Conversational creation** — Let the Agent create skills through natural language conversation

-See [Install Skills](/en/skills/install) and [Skill Management Commands](/en/commands/skill) for details. You can also [create skills](/en/skills/create) through conversation.
+See [Install Skills](/en/skills/install) and [Skill Management Commands](/en/cli/skill) for details. You can also [create skills](/en/skills/create) through conversation.

 ## Skill Loading Priority

--- a/docs/en/skills/install.mdx
+++ b/docs/en/skills/install.mdx
@@ -49,5 +49,5 @@ Supports zip archives and SKILL.md file links:
 ```

 <Tip>
-  All commands above work in the terminal by replacing `/skill` with `cow skill`. See [Skill Management Commands](/en/commands/skill) for full documentation.
+  All commands above work in the terminal by replacing `/skill` with `cow skill`. See [Skill Management Commands](/en/cli/skill) for full documentation.
 </Tip>
--- a/docs/en/tools/vision.mdx
+++ b/docs/en/tools/vision.mdx
@@ -0,0 +1,72 @@
+---
+title: vision - Image Analysis
+description: Analyze image content (recognition, description, OCR, etc.)
+---
+
+Analyze local image files or image URLs with a vision-capable model. Supported tasks include content description, text extraction (OCR), object recognition, and more.
+
+## Model Selection
+
+The vision tool uses a multi-level auto-selection strategy with automatic fallback — no manual configuration required:
+
+1. **Main model** — uses the currently configured main model for image recognition (zero extra cost)
+2. **Other configured models** — auto-discovers other models with configured API keys as alternatives
+3. **OpenAI** — uses `open_ai_api_key` to call gpt-4.1-mini
+4. **LinkAI** — uses `linkai_api_key` to call LinkAI vision service
+
+When `use_linkai=true`, LinkAI is promoted to the highest priority.
+
+If the current provider fails, the tool automatically tries the next one until it succeeds or all fail.
+
+### Supported Models
+
+| Vendor | Vision Model | Notes |
+| --- | --- | --- |
+| OpenAI / Compatible | Main model | All OpenAI-compatible multimodal models |
+| Qwen (DashScope) | Main model | Via MultiModalConversation API |
+| Claude | Main model | Anthropic native image format |
+| Gemini | Main model | inlineData format |
+| Doubao | Main model | doubao-seed-2-0 series natively supported |
+| Kimi (Moonshot) | Main model | kimi-k2.5 natively supported |
+| ZhipuAI | glm-5v-turbo | Always uses dedicated vision model |
+| MiniMax | MiniMax-Text-01 | Always uses dedicated vision model |
+
+<Note>
+  ZhipuAI and MiniMax text models do not support image understanding, so their dedicated vision models are always used automatically.
+</Note>
+
+## Parameters
+
+| Parameter | Type | Required | Description |
+| --- | --- | --- | --- |
+| `image` | string | Yes | Local file path or HTTP(S) image URL |
+| `question` | string | Yes | Question to ask about the image |
+
+Supported image formats: jpg, jpeg, png, gif, webp
+
+## Custom Configuration
+
+To specify a particular model for the vision tool, add to `config.json`:
+
+```json
+{
+    "tool": {
+        "vision": {
+            "model": "gpt-4o"
+        }
+    }
+}
+```
+
+In most cases no configuration is needed. The tool works automatically as long as the main model supports multimodal input or any vision-capable API key is configured.
+
+## Use Cases
+
+- Describe image content
+- Extract text from images (OCR)
+- Identify objects, colors, scenes
+- Analyze screenshots and scanned documents
+
+<Note>
+  Images larger than 1MB are automatically compressed (max edge 1536px). All images (including remote URLs) are converted to base64 for transmission to ensure compatibility with all model backends.
+</Note>
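Review note on `docs/en/tools/vision.mdx`: the fallback order documented above (main model, other configured models, OpenAI, LinkAI) can be pictured as a loop over provider callables. The sketch below is illustrative only and is not the tool's actual code; `call_with_fallback` and the `(name, callable)` provider tuples are hypothetical names. It assumes each provider returns the dict shape that `call_vision` uses in this change: `model`/`content`/`usage` on success, `error`/`message` on failure.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical provider signature: (image_url, question) -> result dict.
VisionCall = Callable[[str, str], Dict]


def call_with_fallback(providers: List[Tuple[str, VisionCall]],
                       image_url: str, question: str) -> Dict:
    """Try each provider in priority order and return the first success."""
    errors = []
    for name, call in providers:
        try:
            result = call(image_url, question)
        except Exception as exc:  # a raising provider counts as a failure
            errors.append(f"{name}: {exc}")
            continue
        if result.get("error"):
            errors.append(f"{name}: {result.get('message', 'unknown error')}")
            continue
        return result
    # Every provider failed: report all collected reasons in one message.
    return {"error": True,
            "message": "all vision providers failed: " + "; ".join(errors)}
```

Keeping the per-provider error messages makes it easier to see why the chain fell through to a later provider.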
--- a/docs/guide/quick-start.mdx
+++ b/docs/guide/quick-start.mdx
@@ -47,7 +47,7 @@ description: 使用脚本一键安装和管理 CowAgent
 | `cow update` | 更新代码并重启 |
 | `cow install-browser` | 安装浏览器工具依赖 |

-更多命令和用法参考 [命令文档](/commands/index)。
+更多命令和用法参考 [命令文档](/cli/index)。

 <Note>
  如果 `cow` 命令不可用，也可以使用 `./run.sh <命令>`（Linux/macOS）或 `.\scripts\run.ps1 <命令>`（Windows）作为替代，功能等效。
--- a/docs/guide/upgrade.mdx
+++ b/docs/guide/upgrade.mdx
@@ -36,7 +36,7 @@ pip3 install -e .
 更新完成后重启服务：

 ```bash
-# 使用 Cow CLI
+# 使用 Cow CLI (推荐)
 cow restart

 # 或使用 run.sh
--- a/docs/intro/features.mdx
+++ b/docs/intro/features.mdx
@@ -118,6 +118,6 @@ cow skill install pptx # 安装技能
 cow install-browser    # 安装浏览器工具
 ```

-详细命令参考 [命令总览](https://docs.cowagent.ai/commands)。
+详细命令参考 [命令总览](https://docs.cowagent.ai/cli)。

 <img src="https://cdn.link-ai.tech/doc/20260401114549.png" width="750" />
--- a/docs/intro/index.mdx
+++ b/docs/intro/index.mdx
@@ -36,7 +36,7 @@ CowAgent 支持灵活切换多种模型，能处理文本、语音、图片、
  <Card title="工具系统" icon="wrench" href="/tools/index">
    内置文件读写、终端执行、浏览器操作、定时任务、消息发送等工具，Agent 可自主调用工具完成复杂任务。
  </Card>
-  <Card title="命令系统" icon="terminal" href="/commands/index">
+  <Card title="命令系统" icon="terminal" href="/cli/index">
    提供终端 CLI 和对话中的命令，支持进程管理、技能安装、配置修改、上下文查看等常用操作。
  </Card>
  <Card title="多模型支持" icon="microchip" href="/models/index">
--- a/docs/ja/README.md
+++ b/docs/ja/README.md
@@ -76,7 +76,7 @@ irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex

 実行後、デフォルトでWebサービスが起動します。`http://localhost:9899/chat` にアクセスしてチャットを開始できます。

-スクリプトの使い方: [ワンクリックインストール](https://docs.cowagent.ai/ja/guide/quick-start)。インストール後は `cow start`、`cow stop` などの [CLI コマンド](https://docs.cowagent.ai/ja/commands/index)でサービスを管理できます。
+スクリプトの使い方: [ワンクリックインストール](https://docs.cowagent.ai/ja/guide/quick-start)。インストール後は `cow start`、`cow stop` などの [CLI コマンド](https://docs.cowagent.ai/ja/cli/index)でサービスを管理できます。

 ### 手動インストール

@@ -100,7 +100,7 @@ pip3 install -r requirements-optional.txt   # 任意ですが推奨
 pip3 install -e .
 ```

-インストール後、`cow` コマンドでサービス管理（起動、停止、更新など）やSkill管理ができます。[コマンドドキュメント](https://docs.cowagent.ai/ja/commands/index)を参照してください。
+インストール後、`cow` コマンドでサービス管理（起動、停止、更新など）やSkill管理ができます。[コマンドドキュメント](https://docs.cowagent.ai/ja/cli/index)を参照してください。

 **4. ブラウザのインストール（任意）**

@@ -165,7 +165,7 @@ sudo docker logs -f chatgpt-on-wechat
 | GLM | `glm-5-turbo` |
 | Kimi | `kimi-k2.5` |
 | Doubao | `doubao-seed-2-0-code-preview-260215` |
-| Qwen | `qwen3.5-plus` |
+| Qwen | `qwen3.6-plus` |
 | Claude | `claude-sonnet-4-6` |
 | Gemini | `gemini-3.1-pro-preview` |
 | OpenAI | `gpt-5.4` |
--- a/docs/ja/commands/general.mdx
+++ b/docs/ja/commands/general.mdx
--- a/docs/ja/commands/index.mdx
+++ b/docs/ja/commands/index.mdx
--- a/docs/ja/commands/process.mdx
+++ b/docs/ja/commands/process.mdx
--- a/docs/ja/commands/skill.mdx
+++ b/docs/ja/commands/skill.mdx
--- a/docs/ja/guide/quick-start.mdx
+++ b/docs/ja/guide/quick-start.mdx
@@ -47,7 +47,7 @@ Linux、macOS、Windowsに対応しています。Python 3.7〜3.12が必要で
 | `cow update` | コードを更新して再起動 |
 | `cow install-browser` | ブラウザツールの依存をインストール |

-詳細は[コマンドドキュメント](/ja/commands/index)を参照してください。
+詳細は[コマンドドキュメント](/ja/cli/index)を参照してください。

 <Note>
  `cow` コマンドが利用できない場合は、`./run.sh <コマンド>`（Linux/macOS）または `.\scripts\run.ps1 <コマンド>`（Windows）で代替できます。機能は同等です。
--- a/docs/ja/intro/features.mdx
+++ b/docs/ja/intro/features.mdx
@@ -117,4 +117,4 @@ cow skill install pptx # Skill をインストール
 cow install-browser    # ブラウザツールをインストール
 ```

-詳細は [コマンド一覧](https://docs.cowagent.ai/ja/commands) を参照してください。
+詳細は [コマンド一覧](https://docs.cowagent.ai/ja/cli) を参照してください。
--- a/docs/ja/intro/index.mdx
+++ b/docs/ja/intro/index.mdx
@@ -31,7 +31,7 @@ CowAgent は自ら思考しタスクを計画し、コンピュータや外部
  <Card title="ツールシステム" icon="wrench" href="/ja/tools/index">
    ファイル読み書き、ターミナル実行、ブラウザ操作、スケジュールタスク、メッセージ送信などの組み込みツールを提供。Agent が自律的にツールを呼び出して複雑なタスクを完了します。
  </Card>
-  <Card title="コマンドシステム" icon="terminal" href="/ja/commands/index">
+  <Card title="コマンドシステム" icon="terminal" href="/ja/cli/index">
    ターミナル CLI とチャット内コマンドを提供し、プロセス管理、Skill インストール、設定変更、コンテキスト確認などの一般的な操作をサポートします。
  </Card>
  <Card title="複数モデル対応" icon="microchip" href="/ja/models/index">
--- a/docs/ja/models/index.mdx
+++ b/docs/ja/models/index.mdx
@@ -6,7 +6,7 @@ description: CowAgentがサポートするモデルとおすすめの選択肢
 CowAgentは国内外の主要なLLMをサポートしています。モデルインターフェースはプロジェクトの`models/`ディレクトリに実装されています。

 <Note>
-  Agent モードでは、品質とコストのバランスから以下のモデルをおすすめします: MiniMax-M2.7、glm-5-turbo、kimi-k2.5、qwen3.5-plus、claude-sonnet-4-6、gemini-3.1-pro-preview
+  Agent モードでは、品質とコストのバランスから以下のモデルをおすすめします: MiniMax-M2.7、glm-5-turbo、kimi-k2.5、qwen3.6-plus、claude-sonnet-4-6、gemini-3.1-pro-preview
 </Note>

 ## 設定
@@ -25,7 +25,7 @@ CowAgentは国内外の主要なLLMをサポートしています。モデルイ
    glm-5-turbo、glm-5およびその他のシリーズモデル
  </Card>
  <Card title="Qwen (通义千问)" href="/ja/models/qwen">
-    qwen3.5-plus、qwen3-maxなど
+    qwen3.6-plus、qwen3-maxなど
  </Card>
  <Card title="Kimi" href="/ja/models/kimi">
    kimi-k2.5、kimi-k2など
--- a/docs/ja/models/qwen.mdx
+++ b/docs/ja/models/qwen.mdx
@@ -1,18 +1,18 @@
 ---
-title: Qwen (通义千问)
-description: 通义千问モデルの設定
+title: Qwen (通義千問)
+description: 通義千問モデルの設定
 ---

 ```json
 {
-  "model": "qwen3.5-plus",
+  "model": "qwen3.6-plus",
  "dashscope_api_key": "YOUR_API_KEY"
 }
 ```

 | パラメータ | 説明 |
 | --- | --- |
-| `model` | `qwen3.5-plus`、`qwen3-max`、`qwen-max`、`qwen-plus`、`qwen-turbo`、`qwq-plus`などから選択可能 |
+| `model` | `qwen3.6-plus`、`qwen3.5-plus`、`qwen3-max`、`qwen-max`、`qwen-plus`、`qwen-turbo`、`qwq-plus`などから選択可能 |
 | `dashscope_api_key` | [百炼 Console](https://bailian.console.aliyun.com/?tab=model#/api-key)で作成。[公式ドキュメント](https://bailian.console.aliyun.com/?tab=api#/api)を参照 |

 OpenAI互換の設定もサポートしています:
@@ -20,7 +20,7 @@ OpenAI互換の設定もサポートしています:
 ```json
 {
  "bot_type": "openai",
-  "model": "qwen3.5-plus",
+  "model": "qwen3.6-plus",
  "open_ai_api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
  "open_ai_api_key": "YOUR_API_KEY"
 }
--- a/docs/ja/releases/v2.0.5.mdx
+++ b/docs/ja/releases/v2.0.5.mdx
@@ -12,7 +12,7 @@ description: CowAgent 2.0.5 - Cow CLI、Skill Hub オープンソース、ブラ
 - **Web コンソール**：入力欄で `/` を入力するとスラッシュコマンドメニューが表示、矢印キーで入力履歴を辿れる
 - **Windows サポート**：PowerShell スクリプト `scripts/run.ps1` を追加、`cow` コマンドに対応

-ドキュメント：[コマンド一覧](https://docs.cowagent.ai/ja/commands)
+ドキュメント：[コマンド一覧](https://docs.cowagent.ai/ja/cli)

 <img src="https://cdn.link-ai.tech/doc/20260401114549.png" width="750" />

--- a/docs/ja/skills/index.mdx
+++ b/docs/ja/skills/index.mdx
@@ -17,7 +17,7 @@ CowAgent ではスキルを取得する複数の方法を提供しています
 - **URL** — zip アーカイブや SKILL.md リンクからインストール
 - **会話で作成** — 自然言語の会話を通じて Agent にスキルを自動作成させる

-詳細は[スキルのインストール](/ja/skills/install)と[スキル管理コマンド](/ja/commands/skill)を参照してください。会話を通じて[スキルを作成](/ja/skills/create)することもできます。
+詳細は[スキルのインストール](/ja/skills/install)と[スキル管理コマンド](/ja/cli/skill)を参照してください。会話を通じて[スキルを作成](/ja/skills/create)することもできます。

 ## スキルの読み込み優先順位

--- a/docs/ja/skills/install.mdx
+++ b/docs/ja/skills/install.mdx
@@ -49,5 +49,5 @@ zip アーカイブと SKILL.md ファイルリンクに対応：
 ```

 <Tip>
-  上記のすべてのコマンドは、ターミナルでは `/skill` を `cow skill` に置き換えて使用できます。完全なコマンドドキュメントは[スキル管理コマンド](/ja/commands/skill)を参照してください。
+  上記のすべてのコマンドは、ターミナルでは `/skill` を `cow skill` に置き換えて使用できます。完全なコマンドドキュメントは[スキル管理コマンド](/ja/cli/skill)を参照してください。
 </Tip>
--- a/docs/ja/tools/vision.mdx
+++ b/docs/ja/tools/vision.mdx
@@ -0,0 +1,72 @@
+---
+title: vision - 画像分析
+description: 画像コンテンツの分析（認識、説明、OCR など）
+---
+
+Vision API を使用してローカル画像や画像 URL を分析します。コンテンツの説明、テキスト抽出（OCR）、オブジェクト認識などに対応しています。
+
+## モデル選択
+
+Vision ツールは多段階の自動選択＋自動フォールバック戦略を採用しており、手動設定なしで利用可能です：
+
+1. **メインモデル** — 現在設定されているメインモデルで画像認識を実行（追加コストなし）
+2. **その他の設定済みモデル** — API キーが設定されている他のマルチモーダルモデルを自動検出
+3. **OpenAI** — `open_ai_api_key` を使用して gpt-4.1-mini を呼び出し
+4. **LinkAI** — `linkai_api_key` を使用して LinkAI ビジョンサービスを呼び出し
+
+`use_linkai=true` の場合、LinkAI が最優先になります。
+
+現在のプロバイダーが失敗した場合、成功するかすべて失敗するまで自動的に次のプロバイダーを試行します。
+
+### 対応モデル
+
+| ベンダー | ビジョンモデル | 説明 |
+| --- | --- | --- |
+| OpenAI / 互換プロトコル | メインモデル | すべての OpenAI 互換マルチモーダルモデルに対応 |
+| 通義千問 (DashScope) | メインモデル | MultiModalConversation API 経由 |
+| Claude | メインモデル | Anthropic ネイティブ画像形式 |
+| Gemini | メインモデル | inlineData 形式 |
+| 豆包 (Doubao) | メインモデル | doubao-seed-2-0 シリーズがネイティブ対応 |
+| Kimi (Moonshot) | メインモデル | kimi-k2.5 がネイティブ対応 |
+| 智谱 AI | glm-5v-turbo | 常にビジョン専用モデルを使用 |
+| MiniMax | MiniMax-Text-01 | 常にビジョン専用モデルを使用 |
+
+<Note>
+  智谱 AI と MiniMax のテキストモデルは画像理解に対応していないため、対応するビジョン専用モデルが自動的に使用されます。
+</Note>
+
+## パラメータ
+
+| パラメータ | 型 | 必須 | 説明 |
+| --- | --- | --- | --- |
+| `image` | string | はい | ローカルファイルパスまたは HTTP(S) 画像 URL |
+| `question` | string | はい | 画像に対する質問 |
+
+対応画像形式：jpg、jpeg、png、gif、webp
+
+## カスタム設定
+
+Vision ツールで使用するモデルを指定するには、`config.json` に以下を追加します：
+
+```json
+{
+    "tool": {
+        "vision": {
+            "model": "gpt-4o"
+        }
+    }
+}
+```
+
+ほとんどの場合、設定は不要です。メインモデルがマルチモーダルに対応しているか、ビジョン対応の API キーが設定されていれば自動的に動作します。
+
+## ユースケース
+
+- 画像コンテンツの説明
+- 画像からのテキスト抽出（OCR）
+- オブジェクト、色、シーンの識別
+- スクリーンショットやスキャン文書の分析
+
+<Note>
+  1MB を超える画像は自動的に圧縮されます（最大辺 1536px）。すべての画像（リモート URL を含む）は base64 に変換して送信され、すべてのモデルバックエンドとの互換性を確保します。
+</Note>
--- a/docs/models/index.mdx
+++ b/docs/models/index.mdx
@@ -6,19 +6,20 @@ description: CowAgent 支持的模型及推荐选择
 CowAgent 支持国内外主流厂商的大语言模型，模型接口实现在项目的 `models/` 目录下。

 <Note>
-  Agent 模式下推荐使用以下模型，可根据效果及成本综合选择：MiniMax-M2.7、glm-5-turbo、kimi-k2.5、qwen3.5-plus、claude-sonnet-4-6、gemini-3.1-pro-preview
+  Agent 模式下推荐使用以下模型，可根据效果及成本综合选择：MiniMax-M2.7、glm-5-turbo、kimi-k2.5、qwen3.6-plus、claude-sonnet-4-6、gemini-3.1-pro-preview
+
+  同时支持使用 [LinkAI](https://link-ai.tech) 平台接口，可灵活切换多种模型，并支持知识库、工作流、插件等 Agent 能力。
 </Note>

 ## 配置方式

-根据所选模型，在 `config.json` 中填写对应的模型名称和 API Key 即可。每个模型也支持 OpenAI 兼容方式接入，将 `bot_type` 设为 `openai`，配置 `open_ai_api_base` 和 `open_ai_api_key`。
-
-同时支持使用 [LinkAI](https://link-ai.tech) 平台接口，可灵活切换多种模型，并支持知识库、工作流、插件等 Agent 能力。
-
-也可以通过 [Web 控制台](/channels/web) 在线管理模型配置，无需手动编辑配置文件：
+**方式一（推荐）：** 通过 [Web 控制台](/channels/web) 在线管理模型配置，无需手动编辑配置文件：

 <img width="850" src="https://cdn.link-ai.tech/doc/20260227173811.png" />

+**方式二：** 手动编辑 `config.json`，根据所选模型填写对应的模型名称和 API Key。每个模型也支持 OpenAI 兼容方式接入，将 `bot_type` 设为 `openai`，配置 `open_ai_api_base` 和 `open_ai_api_key` 即可。
+
+
 ## 支持的模型

 <CardGroup cols={2}>
@@ -29,7 +30,7 @@ CowAgent 支持国内外主流厂商的大语言模型，模型接口实现在
    glm-5-turbo、glm-5 等系列模型
  </Card>
  <Card title="通义千问 Qwen" href="/models/qwen">
-    qwen3.5-plus、qwen3-max 等
+    qwen3.6-plus、qwen3-max 等
  </Card>
  <Card title="Kimi" href="/models/kimi">
    kimi-k2.5、kimi-k2 等
@@ -54,6 +55,7 @@ CowAgent 支持国内外主流厂商的大语言模型，模型接口实现在
  </Card>
 </CardGroup>

+
 <Tip>
  全部模型名称可参考项目 [`common/const.py`](https://github.com/zhayujie/chatgpt-on-wechat/blob/master/common/const.py) 文件。
 </Tip>
--- a/docs/models/qwen.mdx
+++ b/docs/models/qwen.mdx
@@ -5,14 +5,14 @@ description: 通义千问模型配置

 ```json
 {
-  "model": "qwen3.5-plus",
+  "model": "qwen3.6-plus",
  "dashscope_api_key": "YOUR_API_KEY"
 }
 ```

 | 参数 | 说明 |
 | --- | --- |
-| `model` | 可填 `qwen3.5-plus`、`qwen3-max`、`qwen-max`、`qwen-plus`、`qwen-turbo`、`qwq-plus` 等 |
+| `model` | 可填 `qwen3.6-plus`、`qwen3.5-plus`、`qwen3-max`、`qwen-max`、`qwen-plus`、`qwen-turbo`、`qwq-plus` 等 |
 | `dashscope_api_key` | 在 [百炼控制台](https://bailian.console.aliyun.com/?tab=model#/api-key) 创建，参考 [官方文档](https://bailian.console.aliyun.com/?tab=api#/api) |

 也支持 OpenAI 兼容方式接入：
@@ -20,7 +20,7 @@ description: 通义千问模型配置
 ```json
 {
  "bot_type": "openai",
-  "model": "qwen3.5-plus",
+  "model": "qwen3.6-plus",
  "open_ai_api_base": "https://dashscope.aliyuncs.com/compatible-mode/v1",
  "open_ai_api_key": "YOUR_API_KEY"
 }
--- a/docs/releases/v2.0.5.mdx
+++ b/docs/releases/v2.0.5.mdx
@@ -12,7 +12,7 @@ description: CowAgent 2.0.5 - Cow CLI、Skill Hub 开源、浏览器工具、企
 - **web控制台**：Web 控制台输入框输入 `/` 即可弹出指令菜单，支持方向键回溯历史输入
 - **Windows 支持**：新增 PowerShell 一键安装脚本 `scripts/run.ps1`，同时支持 `cow` 命令

-相关文档：[命令总览](https://docs.cowagent.ai/commands)
+相关文档：[命令总览](https://docs.cowagent.ai/cli)

 <img src="https://cdn.link-ai.tech/doc/20260401114549.png" width="750" />

--- a/docs/skills/index.mdx
+++ b/docs/skills/index.mdx
@@ -18,7 +18,7 @@ CowAgent 提供多种方式获取技能：
 - **URL** — 从 zip 压缩包或 SKILL.md 链接安装
 - **对话创建** — 通过自然语言对话让 Agent 自动创建技能

-详细安装方式参考 [安装技能](/skills/install) 和 [技能管理命令](/commands/skill)。也可以通过对话 [创建技能](/skills/create)，或向 [Skill Hub](https://skills.cowagent.ai/submit) 贡献你的技能。
+详细安装方式参考 [安装技能](/skills/install) 和 [技能管理命令](/cli/skill)。也可以通过对话 [创建技能](/skills/create)，或向 [Skill Hub](https://skills.cowagent.ai/submit) 贡献你的技能。

 ## 技能加载优先级

--- a/docs/skills/install.mdx
+++ b/docs/skills/install.mdx
@@ -62,5 +62,5 @@ CowAgent 支持通过统一的 `install` 命令安装来自 **[Cow 技能广场]
 ```

 <Tip>
-  以上所有命令在终端中使用时，将 `/skill` 替换为 `cow skill` 即可。完整命令说明参考 [技能管理命令](/commands/skill)。
+  以上所有命令在终端中使用时，将 `/skill` 替换为 `cow skill` 即可。完整命令说明参考 [技能管理命令](/cli/skill)。
 </Tip>
--- a/docs/tools/vision.mdx
+++ b/docs/tools/vision.mdx
@@ -5,14 +5,49 @@ description: 分析图片内容（识别、描述、OCR 等）

 使用 Vision API 分析本地图片或图片 URL，支持内容描述、文字提取（OCR）、物体识别等。

-## 依赖
+## 模型选择

-需要配置至少一个 API Key（通过 `env_config` 工具或工作空间 `.env` 文件配置）：
+Vision 工具采用多级自动选择 + 自动兜底策略，无需手动配置即可使用：

-| 后端 | 环境变量 | 优先级 |
+1. **主模型** — 优先使用当前配置的主模型进行图像识别（需要是多模态模型）
+2. **其他已配置模型** — 自动发现已配置 API Key 的其他多模态模型作为备选
+
+如果当前 provider 调用失败，会自动尝试下一个，直到成功或全部失败。
+
+### 支持的模型
+
+| 厂商 | 视觉模型 | 说明 |
 | --- | --- | --- |
-| OpenAI | `OPENAI_API_KEY` | 优先使用 |
-| LinkAI | `LINKAI_API_KEY` | 备选 |
+| OpenAI / 兼容协议 | 使用主模型 | 支持所有 OpenAI 协议兼容的多模态模型 |
+| 通义千问 (DashScope) | 使用主模型 | 例如 qwen3.6-plus 等 |
+| Claude | 使用主模型 | Anthropic 原生图像格式 |
+| Gemini | 使用主模型 | inlineData 格式 |
+| 豆包 (Doubao) | 使用主模型 | doubao-seed-2-0 系列原生支持 |
+| Kimi (Moonshot) | 使用主模型 | kimi-k2.5 原生支持 |
+| 智谱 AI | glm-5v-turbo | 固定使用视觉专用模型 |
+| MiniMax | MiniMax-Text-01 | 固定使用视觉专用模型 |
+
+<Note>
+  智谱和 MiniMax 的文本模型不支持图像理解，因此始终使用对应的视觉专用模型，无需手动指定。
+</Note>
+
+> 当 `use_linkai=true` 时，默认使用 LinkAI 的多模态模型进行图像识别。
+
+## 自定义配置
+
+如果希望指定 Vision 使用的模型，可在 `config.json` 中配置，例如：
+
+```json
+{
+    "tool": {
+        "vision": {
+            "model": "gpt-4o"
+        }
+    }
+}
+```
+
+大多数情况下无需配置，主模型支持多模态或配置任意一个支持视觉的 API Key 即可自动工作。

 ## 参数

@@ -20,17 +55,18 @@ description: 分析图片内容（识别、描述、OCR 等）
 | --- | --- | --- | --- |
 | `image` | string | 是 | 本地文件路径或 HTTP(S) 图片 URL |
 | `question` | string | 是 | 对图片提出的问题 |
-| `model` | string | 否 | 模型名称（默认 gpt-4.1-mini） |

 支持的图片格式：jpg、jpeg、png、gif、webp

+
+
 ## 使用场景

 - 描述图片中的内容
 - 提取图片中的文字（OCR）
 - 识别物体、颜色、场景
-- 分析截图、文档扫描件
+- 分析截图、文档扫描图片等

 <Note>
-  超过 1MB 的图片会自动压缩后上传。如果未配置任何 Vision API Key，该工具不会被加载。
+  超过 1MB 的图片会自动压缩后上传，所有图片（包括远程 URL）会统一转为 base64 传输，确保兼容所有模型后端。
 </Note>
--- a/models/ali/ali_qwen_bot.py
+++ b/models/ali/ali_qwen_bot.py
@@ -1,214 +0,0 @@
-# encoding:utf-8
-
-import json
-import time
-from typing import List, Tuple
-
-import openai
-from models.openai.openai_compat import RateLimitError, Timeout, APIError, APIConnectionError
-import broadscope_bailian
-from broadscope_bailian import ChatQaMessage
-
-from models.bot import Bot
-from models.ali.ali_qwen_session import AliQwenSession
-from models.session_manager import SessionManager
-from bridge.context import ContextType
-from bridge.reply import Reply, ReplyType
-from common.log import logger
-from common import const
-from config import conf, load_config
-
-class AliQwenBot(Bot):
-    def __init__(self):
-        super().__init__()
-        self.api_key_expired_time = self.set_api_key()
-        self.sessions = SessionManager(AliQwenSession, model=conf().get("model", const.QWEN))
-
-    def api_key_client(self):
-        return broadscope_bailian.AccessTokenClient(access_key_id=self.access_key_id(), access_key_secret=self.access_key_secret())
-
-    def access_key_id(self):
-        return conf().get("qwen_access_key_id")
-
-    def access_key_secret(self):
-        return conf().get("qwen_access_key_secret")
-
-    def agent_key(self):
-        return conf().get("qwen_agent_key")
-
-    def app_id(self):
-        return conf().get("qwen_app_id")
-
-    def node_id(self):
-        return conf().get("qwen_node_id", "")
-
-    def temperature(self):
-        return conf().get("temperature", 0.2 )
-
-    def top_p(self):
-        return conf().get("top_p", 1)
-
-    def reply(self, query, context=None):
-        # acquire reply content
-        if context.type == ContextType.TEXT:
-            logger.info("[QWEN] query={}".format(query))
-
-            session_id = context["session_id"]
-            reply = None
-            clear_memory_commands = conf().get("clear_memory_commands", ["#清除记忆"])
-            if query in clear_memory_commands:
-                self.sessions.clear_session(session_id)
-                reply = Reply(ReplyType.INFO, "记忆已清除")
-            elif query == "#清除所有":
-                self.sessions.clear_all_session()
-                reply = Reply(ReplyType.INFO, "所有人记忆已清除")
-            elif query == "#更新配置":
-                load_config()
-                reply = Reply(ReplyType.INFO, "配置已更新")
-            if reply:
-                return reply
-            session = self.sessions.session_query(query, session_id)
-            logger.debug("[QWEN] session query={}".format(session.messages))
-
-            reply_content = self.reply_text(session)
-            logger.debug(
-                "[QWEN] new_query={}, session_id={}, reply_cont={}, completion_tokens={}".format(
-                    session.messages,
-                    session_id,
-                    reply_content["content"],
-                    reply_content["completion_tokens"],
-                )
-            )
-            if reply_content["completion_tokens"] == 0 and len(reply_content["content"]) > 0:
-                reply = Reply(ReplyType.ERROR, reply_content["content"])
-            elif reply_content["completion_tokens"] > 0:
-                self.sessions.session_reply(reply_content["content"], session_id, reply_content["total_tokens"])
-                reply = Reply(ReplyType.TEXT, reply_content["content"])
-            else:
-                reply = Reply(ReplyType.ERROR, reply_content["content"])
-                logger.debug("[QWEN] reply {} used 0 tokens.".format(reply_content))
-            return reply
-
-        else:
-            reply = Reply(ReplyType.ERROR, "Bot不支持处理{}类型的消息".format(context.type))
-            return reply
-
-    def reply_text(self, session: AliQwenSession, retry_count=0) -> dict:
-        """
-        call bailian's ChatCompletion to get the answer
-        :param session: a conversation session
-        :param retry_count: retry count
-        :return: {}
-        """
-        try:
-            prompt, history = self.convert_messages_format(session.messages)
-            self.update_api_key_if_expired()
-            # NOTE 阿里百炼的call()函数未提供temperature参数，考虑到temperature和top_p参数作用相同，取两者较小的值作为top_p参数传入，详情见文档 https://help.aliyun.com/document_detail/2587502.htm
-            response = broadscope_bailian.Completions().call(app_id=self.app_id(), prompt=prompt, history=history, top_p=min(self.temperature(), self.top_p()))
-            completion_content = self.get_completion_content(response, self.node_id())
-            completion_tokens, total_tokens = self.calc_tokens(session.messages, completion_content)
-            return {
-                "total_tokens": total_tokens,
-                "completion_tokens": completion_tokens,
-                "content": completion_content,
-            }
-        except Exception as e:
-            need_retry = retry_count < 2
-            result = {"completion_tokens": 0, "content": "我现在有点累了，等会再来吧"}
-            if isinstance(e, RateLimitError):
-                logger.warn("[QWEN] RateLimitError: {}".format(e))
-                result["content"] = "提问太快啦，请休息一下再问我吧"
-                if need_retry:
-                    time.sleep(20)
-            elif isinstance(e, Timeout):
-                logger.warn("[QWEN] Timeout: {}".format(e))
-                result["content"] = "我没有收到你的消息"
-                if need_retry:
-                    time.sleep(5)
-            elif isinstance(e, APIError):
-                logger.warn("[QWEN] Bad Gateway: {}".format(e))
-                result["content"] = "请再问我一次"
-                if need_retry:
-                    time.sleep(10)
-            elif isinstance(e, APIConnectionError):
-                logger.warn("[QWEN] APIConnectionError: {}".format(e))
-                need_retry = False
-                result["content"] = "我连接不到你的网络"
-            else:
-                logger.exception("[QWEN] Exception: {}".format(e))
-                need_retry = False
-                self.sessions.clear_session(session.session_id)
-
-            if need_retry:
-                logger.warn("[QWEN] 第{}次重试".format(retry_count + 1))
-                return self.reply_text(session, retry_count + 1)
-            else:
-                return result
-
-    def set_api_key(self):
-        api_key, expired_time = self.api_key_client().create_token(agent_key=self.agent_key())
-        broadscope_bailian.api_key = api_key
-        return expired_time
-
-    def update_api_key_if_expired(self):
-        if time.time() > self.api_key_expired_time:
-            self.api_key_expired_time = self.set_api_key()
-
-    def convert_messages_format(self, messages) -> Tuple[str, List[ChatQaMessage]]:
-        history = []
-        user_content = ''
-        assistant_content = ''
-        system_content = ''
-        for message in messages:
-            role = message.get('role')
-            if role == 'user':
-                user_content += message.get('content')
-            elif role == 'assistant':
-                assistant_content = message.get('content')
-                history.append(ChatQaMessage(user_content, assistant_content))
-                user_content = ''
-                assistant_content = ''
-            elif role =='system':
-                system_content += message.get('content')
-        if user_content == '':
-            raise Exception('no user message')
-        if system_content != '':
-            # NOTE 模拟系统消息，测试发现人格描述以"你需要扮演ChatGPT"开头能够起作用，而以"你是ChatGPT"开头模型会直接否认
-            system_qa = ChatQaMessage(system_content, '好的，我会严格按照你的设定回答问题')
-            history.insert(0, system_qa)
-        logger.debug("[QWEN] converted qa messages: {}".format([item.to_dict() for item in history]))
-        logger.debug("[QWEN] user content as prompt: {}".format(user_content))
-        return user_content, history
-
-    def get_completion_content(self, response, node_id):
-        if not response['Success']:
-            return f"[ERROR]\n{response['Code']}:{response['Message']}"
-        text = response['Data']['Text']
-        if node_id == '':
-            return text
-        # TODO: 当使用流程编排创建大模型应用时，响应结构如下，最终结果在['finalResult'][node_id]['response']['text']中，暂时先这么写
-        # {
-        #     'Success': True,
-        #     'Code': None,
-        #     'Message': None,
-        #     'Data': {
-        #         'ResponseId': '9822f38dbacf4c9b8daf5ca03a2daf15',
-        #         'SessionId': 'session_id',
-        #         'Text': '{"finalResult":{"LLM_T7islK":{"params":{"modelId":"qwen-plus-v1","prompt":"${systemVars.query}${bizVars.Text}"},"response":{"text":"作为一个AI语言模型，我没有年龄，因为我没有生日。\n我只是一个程序，没有生命和身体。"}}}}',
-        #         'Thoughts': [],
-        #         'Debug': {},
-        #         'DocReferences': []
-        #     },
-        #     'RequestId': '8e11d31551ce4c3f83f49e6e0dd998b0',
-        #     'Failed': None
-        # }
-        text_dict = json.loads(text)
-        completion_content =  text_dict['finalResult'][node_id]['response']['text']
-        return completion_content
-
-    def calc_tokens(self, messages, completion_content):
-        completion_tokens = len(completion_content)
-        prompt_tokens = 0
-        for message in messages:
-            prompt_tokens += len(message["content"])
-        return completion_tokens, prompt_tokens + completion_tokens
--- a/models/ali/ali_qwen_session.py
+++ b/models/ali/ali_qwen_session.py
@@ -1,62 +0,0 @@
-from models.session_manager import Session
-from common.log import logger
-
-"""
-    e.g.
-    [
-        {"role": "system", "content": "You are a helpful assistant."},
-        {"role": "user", "content": "Who won the world series in 2020?"},
-        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
-        {"role": "user", "content": "Where was it played?"}
-    ]
-"""
-
-class AliQwenSession(Session):
-    def __init__(self, session_id, system_prompt=None, model="qianwen"):
-        super().__init__(session_id, system_prompt)
-        self.model = model
-        self.reset()
-
-    def discard_exceeding(self, max_tokens, cur_tokens=None):
-        precise = True
-        try:
-            cur_tokens = self.calc_tokens()
-        except Exception as e:
-            precise = False
-            if cur_tokens is None:
-                raise e
-            logger.debug("Exception when counting tokens precisely for query: {}".format(e))
-        while cur_tokens > max_tokens:
-            if len(self.messages) > 2:
-                self.messages.pop(1)
-            elif len(self.messages) == 2 and self.messages[1]["role"] == "assistant":
-                self.messages.pop(1)
-                if precise:
-                    cur_tokens = self.calc_tokens()
-                else:
-                    cur_tokens = cur_tokens - max_tokens
-                break
-            elif len(self.messages) == 2 and self.messages[1]["role"] == "user":
-                logger.warn("user message exceed max_tokens. total_tokens={}".format(cur_tokens))
-                break
-            else:
-                logger.debug("max_tokens={}, total_tokens={}, len(messages)={}".format(max_tokens, cur_tokens, len(self.messages)))
-                break
-            if precise:
-                cur_tokens = self.calc_tokens()
-            else:
-                cur_tokens = cur_tokens - max_tokens
-        return cur_tokens
-
-    def calc_tokens(self):
-        return num_tokens_from_messages(self.messages, self.model)
-
-def num_tokens_from_messages(messages, model):
-    """Returns the number of tokens used by a list of messages."""
-    # 官方token计算规则："对于中文文本来说，1个token通常对应一个汉字；对于英文文本来说，1个token通常对应3至4个字母或1个单词"
-    # 详情请产看文档：https://help.aliyun.com/document_detail/2586397.html
-    # 目前根据字符串长度粗略估计token数，不影响正常使用
-    tokens = 0
-    for msg in messages:
-        tokens += len(msg["content"])
-    return tokens
--- a/models/bot.py
+++ b/models/bot.py
@@ -2,12 +2,27 @@
 Auto-replay chat robot abstract class
 """

-
 from bridge.context import Context
 from bridge.reply import Reply


 class Bot(object):
+    """
+    Base class for all chat-bot implementations.
+
+    Subclasses may also implement:
+
+        call_with_tools(messages, tools=None, stream=False, **kwargs)
+            -> dict | generator  (OpenAI-compatible format)
+
+        call_vision(image_url, question, model=None, max_tokens=1000)
+            -> dict with keys: model, content, usage  (or error/message)
+
+    These are NOT defined here to avoid shadowing concrete implementations
+    provided by mixin classes (e.g. OpenAICompatibleBot) in the MRO.
+    Use ``hasattr(bot, 'call_vision')`` to detect support at runtime.
+    """
+
    def reply(self, query, context: Context = None) -> Reply:
        """
        bot auto-reply content
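Review note on `models/bot.py`: the docstring's reasoning about MRO shadowing can be reproduced in isolation. The sketch below uses hypothetical stand-in classes (none of these names exist in the repository) to show why a stub on the first base class hides a mixin's implementation, and why `hasattr` works for detection once the stub is removed.

```python
class BaseWithStub:
    """Stand-in for a base class that defines a placeholder method."""
    def call_vision(self, image_url, question):
        raise NotImplementedError


class BaseWithoutStub:
    """Stand-in for a base class that leaves the method undefined."""


class VisionMixin:
    """Stand-in for a mixin that provides the concrete implementation."""
    def call_vision(self, image_url, question):
        return {"model": "demo-model", "content": "ok", "usage": {}}


class Shadowed(BaseWithStub, VisionMixin):
    pass  # MRO: Shadowed -> BaseWithStub -> VisionMixin, so the stub wins


class Working(BaseWithoutStub, VisionMixin):
    pass  # no stub on the base, so lookup reaches VisionMixin


def supports_vision(bot) -> bool:
    """Runtime capability check, as the docstring suggests."""
    return callable(getattr(bot, "call_vision", None))
```

With the base listed first in the class statement, as in `class ClaudeAPIBot(Bot, OpenAIImage)`, any method defined on the base is found before a later mixin's version, which is the situation this change avoids.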
--- a/models/bot_factory.py
+++ b/models/bot_factory.py
@@ -46,10 +46,7 @@ def create_bot(bot_type):
    elif bot_type == const.CLAUDEAPI:
        from models.claudeapi.claude_api_bot import ClaudeAPIBot
        return ClaudeAPIBot()
-    elif bot_type == const.QWEN:
-        from models.ali.ali_qwen_bot import AliQwenBot
-        return AliQwenBot()
-    elif bot_type == const.QWEN_DASHSCOPE:
+    elif bot_type in (const.QWEN, const.QWEN_DASHSCOPE):
        from models.dashscope.dashscope_bot import DashscopeBot
        return DashscopeBot()
    elif bot_type == const.GEMINI:
--- a/models/claudeapi/claude_api_bot.py
+++ b/models/claudeapi/claude_api_bot.py
@@ -1,7 +1,10 @@
 # encoding:utf-8

+import base64
 import json
+import re
 import time
+from typing import Optional

 import requests

@@ -224,6 +227,79 @@ class ClaudeAPIBot(Bot, OpenAIImage):
            return 64000
        return 8192

+    @staticmethod
+    def _parse_data_url(data_url: str):
+        """Parse a data:<mime>;base64,<data> URL into (media_type, base64_data)."""
+        m = re.match(r"^data:([^;]+);base64,(.+)$", data_url, re.DOTALL)
+        if m:
+            return m.group(1), m.group(2)
+        return None, None
+
+    def call_vision(self, image_url: str, question: str,
+                    model: Optional[str] = None,
+                    max_tokens: int = 1000) -> dict:
+        """Analyze an image using Claude Messages API (native image blocks)."""
+        try:
+            actual_model = model or self._model_mapping(conf().get("model"))
+
+            # Build Claude-native image content block
+            if image_url.startswith("data:"):
+                media_type, b64_data = self._parse_data_url(image_url)
+                if not b64_data:
+                    return {"error": True, "message": "Invalid base64 data URL"}
+                image_block = {
+                    "type": "image",
+                    "source": {"type": "base64",
+                               "media_type": media_type or "image/jpeg",
+                               "data": b64_data},
+                }
+            else:
+                image_block = {
+                    "type": "image",
+                    "source": {"type": "url", "url": image_url},
+                }
+
+            data = {
+                "model": actual_model,
+                "max_tokens": max_tokens,
+                "messages": [{
+                    "role": "user",
+                    "content": [
+                        image_block,
+                        {"type": "text", "text": question},
+                    ],
+                }],
+            }
+
+            headers = {
+                "x-api-key": self.api_key,
+                "anthropic-version": "2023-06-01",
+                "content-type": "application/json",
+            }
+            proxies = {"http": self.proxy, "https": self.proxy} if self.proxy else None
+            resp = requests.post(f"{self.api_base}/messages",
+                                 headers=headers, json=data, proxies=proxies)
+
+            if resp.status_code != 200:
+                return {"error": True, "message": f"HTTP {resp.status_code}: {resp.text[:300]}"}
+
+            body = resp.json()
+            text_parts = [b.get("text", "") for b in body.get("content", [])
+                          if b.get("type") == "text"]
+            usage = body.get("usage", {})
+            return {
+                "model": actual_model,
+                "content": "".join(text_parts),
+                "usage": {
+                    "prompt_tokens": usage.get("input_tokens", 0),
+                    "completion_tokens": usage.get("output_tokens", 0),
+                    "total_tokens": usage.get("input_tokens", 0) + usage.get("output_tokens", 0),
+                },
+            }
+        except Exception as e:
+            logger.error(f"[CLAUDE] call_vision error: {e}")
+            return {"error": True, "message": str(e)}
+
    def call_with_tools(self, messages, tools=None, stream=False, **kwargs):
        """
        Call Claude API with tool support for agent integration
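Reviewer note: the image-block construction in the Claude `call_vision` above can be exercised in isolation. The sketch below lifts that branch into two standalone functions; the names `parse_data_url` and `build_image_block` are hypothetical (in the patch this logic lives inside the bot class), and returning `None` for a malformed data URL is an assumption standing in for the error dict the method returns.

```python
import re
from typing import Optional, Tuple


def parse_data_url(data_url: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a base64 data URL into (media_type, payload); (None, None) if malformed."""
    m = re.match(r"^data:([^;]+);base64,(.+)$", data_url, re.DOTALL)
    if m:
        return m.group(1), m.group(2)
    return None, None


def build_image_block(image_url: str) -> Optional[dict]:
    """Return a Claude-style image content block, or None for a malformed data URL."""
    if image_url.startswith("data:"):
        media_type, b64_data = parse_data_url(image_url)
        if not b64_data:
            # Hypothetical: the patched method returns an error dict here instead.
            return None
        return {
            "type": "image",
            "source": {"type": "base64",
                       "media_type": media_type or "image/jpeg",
                       "data": b64_data},
        }
    # Anything that is not a data URL is passed through as a remote URL source.
    return {"type": "image", "source": {"type": "url", "url": image_url}}
```

A data URL without the `;base64,` marker fails the regex, so the caller gets a clear "invalid data URL" outcome rather than sending an unusable payload to the API.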
--- a/models/dashscope/dashscope_bot.py
+++ b/models/dashscope/dashscope_bot.py
@@ -1,6 +1,8 @@
 # encoding:utf-8

 import json
+from typing import Optional
+
 from models.bot import Bot
 from models.session_manager import SessionManager
 from bridge.context import ContextType
@@ -26,15 +28,15 @@ dashscope_models = {

 # Model name prefixes that require MultiModalConversation API instead of Generation API.
 # Qwen3.5+ series are omni models that only support MultiModalConversation.
-MULTIMODAL_MODEL_PREFIXES = ("qwen3.5-",)
+MULTIMODAL_MODEL_PREFIXES = ("qwen3.5-", "qwen3.6-")


 # Qwen chat model API
 class DashscopeBot(Bot):
    def __init__(self):
        super().__init__()
-        self.sessions = SessionManager(DashscopeSession, model=conf().get("model") or "qwen-plus")
-        self.model_name = conf().get("model") or "qwen-plus"
+        self.sessions = SessionManager(DashscopeSession, model=conf().get("model") or "qwen3.6-plus")
+        self.model_name = conf().get("model") or "qwen3.6-plus"
        self.client = dashscope.Generation
        api_key = conf().get("dashscope_api_key")
        if api_key:
@@ -153,6 +155,56 @@ class DashscopeBot(Bot):
            else:
                return result

+    def call_vision(self, image_url: str, question: str,
+                    model: Optional[str] = None,
+                    max_tokens: int = 1000) -> dict:
+        """Analyze an image using DashScope MultiModalConversation API."""
+        try:
+            dashscope.api_key = self.api_key
+            vision_model = model or "qwen-vl-max"
+
+            # DashScope multimodal format: {"image": url} + {"text": question}
+            messages = [{
+                "role": "user",
+                "content": [
+                    {"image": image_url},
+                    {"text": question},
+                ],
+            }]
+
+            response = MultiModalConversation.call(
+                model=vision_model,
+                messages=messages,
+                max_tokens=max_tokens,
+            )
+
+            if response.status_code != HTTPStatus.OK:
+                return {
+                    "error": True,
+                    "message": f"{response.code} - {response.message}",
+                }
+
+            resp_dict = self._response_to_dict(response)
+            choice = resp_dict["output"]["choices"][0]
+            content = choice.get("message", {}).get("content", "")
+            if isinstance(content, list):
+                content = "".join(
+                    item.get("text", "") for item in content if isinstance(item, dict)
+                )
+            usage = resp_dict.get("usage", {})
+            return {
+                "model": vision_model,
+                "content": content,
+                "usage": {
+                    "prompt_tokens": usage.get("input_tokens", 0),
+                    "completion_tokens": usage.get("output_tokens", 0),
+                    "total_tokens": usage.get("total_tokens", 0),
+                },
+            }
+        except Exception as e:
+            logger.error(f"[DASHSCOPE] call_vision error: {e}")
+            return {"error": True, "message": str(e)}
+
    def call_with_tools(self, messages, tools=None, stream=False, **kwargs):
        """
        Call DashScope API with tool support for agent integration
--- a/models/doubao/doubao_bot.py
+++ b/models/doubao/doubao_bot.py
@@ -2,6 +2,7 @@

 import json
 import time
+from typing import Optional

 import requests
 from models.bot import Bot
@@ -147,6 +148,49 @@ class DoubaoBot(Bot):
            else:
                return result

+    def call_vision(self, image_url: str, question: str,
+                    model: Optional[str] = None,
+                    max_tokens: int = 1000) -> dict:
+        """Analyze an image using Doubao (Volcengine Ark) OpenAI-compatible API."""
+        try:
+            vision_model = model or self.args.get("model", "doubao-seed-2-0-pro-260215")
+            payload = {
+                "model": vision_model,
+                "max_tokens": max_tokens,
+                "messages": [{
+                    "role": "user",
+                    "content": [
+                        {"type": "text", "text": question},
+                        {"type": "image_url", "image_url": {"url": image_url}},
+                    ],
+                }],
+            }
+            headers = {
+                "Authorization": f"Bearer {self.api_key}",
+                "Content-Type": "application/json",
+            }
+            resp = requests.post(f"{self.base_url}/chat/completions",
+                                 headers=headers, json=payload, timeout=60)
+            if resp.status_code != 200:
+                return {"error": True, "message": f"HTTP {resp.status_code}: {resp.text[:300]}"}
+            data = resp.json()
+            if "error" in data:
+                return {"error": True, "message": data["error"].get("message", str(data["error"]))}
+            content = data.get("choices", [{}])[0].get("message", {}).get("content", "")
+            usage = data.get("usage", {})
+            return {
+                "model": vision_model,
+                "content": content,
+                "usage": {
+                    "prompt_tokens": usage.get("prompt_tokens", 0),
+                    "completion_tokens": usage.get("completion_tokens", 0),
+                    "total_tokens": usage.get("total_tokens", 0),
+                },
+            }
+        except Exception as e:
+            logger.error(f"[DOUBAO] call_vision error: {e}")
+            return {"error": True, "message": str(e)}
+
    # ==================== Agent mode support ====================

    def call_with_tools(self, messages, tools=None, stream: bool = False, **kwargs):
@@ -434,31 +478,37 @@ class DoubaoBot(Bot):
                continue

            if role == "user":
-                text_parts = []
-                tool_results = []
+                has_tool_result = any(
+                    isinstance(b, dict) and b.get("type") == "tool_result" for b in content
+                )
+                if has_tool_result:
+                    text_parts = []
+                    tool_results = []

-                for block in content:
-                    if not isinstance(block, dict):
-                        continue
-                    if block.get("type") == "text":
-                        text_parts.append(block.get("text", ""))
-                    elif block.get("type") == "tool_result":
-                        tool_call_id = block.get("tool_use_id") or ""
-                        result_content = block.get("content", "")
-                        if not isinstance(result_content, str):
-                            result_content = json.dumps(result_content, ensure_ascii=False)
-                        tool_results.append({
-                            "role": "tool",
-                            "tool_call_id": tool_call_id,
-                            "content": result_content
-                        })
+                    for block in content:
+                        if not isinstance(block, dict):
+                            continue
+                        if block.get("type") == "text":
+                            text_parts.append(block.get("text", ""))
+                        elif block.get("type") == "tool_result":
+                            tool_call_id = block.get("tool_use_id") or ""
+                            result_content = block.get("content", "")
+                            if not isinstance(result_content, str):
+                                result_content = json.dumps(result_content, ensure_ascii=False)
+                            tool_results.append({
+                                "role": "tool",
+                                "tool_call_id": tool_call_id,
+                                "content": result_content
+                            })

-                # Tool results first (must come right after assistant with tool_calls)
-                for tr in tool_results:
-                    converted.append(tr)
+                    for tr in tool_results:
+                        converted.append(tr)

-                if text_parts:
-                    converted.append({"role": "user", "content": "\n".join(text_parts)})
+                    if text_parts:
+                        converted.append({"role": "user", "content": "\n".join(text_parts)})
+                else:
+                    # Keep as-is for multimodal content (e.g. image_url blocks)
+                    converted.append(msg)

            elif role == "assistant":
                openai_msg = {"role": "assistant"}
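Reviewer note: the behavioural change in the hunk above is that a user message is only split into `tool` and `user` messages when it actually carries a `tool_result` block; otherwise it is forwarded untouched so `image_url` blocks survive. A minimal standalone sketch of that rule follows. The function name `convert_user_message` is hypothetical, and passing non-list content through unchanged is an assumption, since that part of the converter is not shown in this hunk.

```python
import json
from typing import Any, Dict, List


def convert_user_message(msg: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Convert one Claude-style user message into OpenAI-style messages."""
    content = msg.get("content")
    if not isinstance(content, list):
        # Assumption: plain-string content needs no conversion.
        return [msg]

    has_tool_result = any(
        isinstance(b, dict) and b.get("type") == "tool_result" for b in content
    )
    if not has_tool_result:
        # Multimodal content (e.g. image_url blocks) is kept as-is.
        return [msg]

    text_parts: List[str] = []
    tool_results: List[Dict[str, Any]] = []
    for block in content:
        if not isinstance(block, dict):
            continue
        if block.get("type") == "text":
            text_parts.append(block.get("text", ""))
        elif block.get("type") == "tool_result":
            result_content = block.get("content", "")
            if not isinstance(result_content, str):
                result_content = json.dumps(result_content, ensure_ascii=False)
            tool_results.append({
                "role": "tool",
                "tool_call_id": block.get("tool_use_id") or "",
                "content": result_content,
            })

    # Tool results go first, then any accompanying user text.
    converted = list(tool_results)
    if text_parts:
        converted.append({"role": "user", "content": "\n".join(text_parts)})
    return converted
```

Before this change, a message containing only `text` and `image_url` blocks would have been flattened to its text, silently dropping the image.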
--- a/models/gemini/google_gemini_bot.py
+++ b/models/gemini/google_gemini_bot.py
@@ -12,6 +12,8 @@ import mimetypes
 import os
 import re
 import time
+from typing import Optional
+
 import requests
 from models.bot import Bot
 from models.session_manager import SessionManager
@@ -144,7 +146,12 @@ class GoogleGeminiBot(Bot):
            return "", []
        pattern = r"\[图片:\s*([^\]]+)\]"
        image_paths = [m.strip().strip("'\"") for m in re.findall(pattern, content) if m.strip()]
-        cleaned_text = re.sub(pattern, "", content)
+        # Replace markers with path-only hints so the model still knows the
+        # original file location (needed when it calls tools like vision).
+        def _replace_with_hint(m):
+            path = m.group(1).strip().strip("'\"")
+            return f"[attached image: {path}]"
+        cleaned_text = re.sub(pattern, _replace_with_hint, content)
        cleaned_text = re.sub(r"\n{3,}", "\n\n", cleaned_text).strip()
        return cleaned_text, image_paths

@@ -225,6 +232,57 @@ class GoogleGeminiBot(Bot):
        logger.warning(f"[Gemini] Unsupported image URL format: {image_url[:120]}")
        return None

+    def call_vision(self, image_url: str, question: str,
+                    model: Optional[str] = None,
+                    max_tokens: int = 1000) -> dict:
+        """Analyze an image using Gemini REST API."""
+        try:
+            model_name = model or self.model or "gemini-2.0-flash"
+            image_part = self._build_inline_part_from_image_url({"url": image_url})
+            if not image_part:
+                return {"error": True, "message": f"Cannot process image URL: {image_url[:120]}"}
+
+            payload = {
+                "contents": [{
+                    "role": "user",
+                    "parts": [image_part, {"text": question}],
+                }],
+                "generationConfig": {"maxOutputTokens": max_tokens},
+                "safetySettings": [
+                    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
+                    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
+                    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
+                    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
+                ],
+            }
+            endpoint = f"{self.api_base}/v1beta/models/{model_name}:generateContent"
+            headers = {"x-goog-api-key": self.api_key, "Content-Type": "application/json"}
+            resp = requests.post(endpoint, headers=headers, json=payload, timeout=60)
+
+            if resp.status_code != 200:
+                return {"error": True, "message": f"HTTP {resp.status_code}: {resp.text[:300]}"}
+
+            body = resp.json()
+            candidates = body.get("candidates", [])
+            text_parts = []
+            for part in candidates[0].get("content", {}).get("parts", []) if candidates else []:
+                if "text" in part:
+                    text_parts.append(part["text"])
+
+            usage_meta = body.get("usageMetadata", {})
+            return {
+                "model": model_name,
+                "content": "".join(text_parts),
+                "usage": {
+                    "prompt_tokens": usage_meta.get("promptTokenCount", 0),
+                    "completion_tokens": usage_meta.get("candidatesTokenCount", 0),
+                    "total_tokens": usage_meta.get("totalTokenCount", 0),
+                },
+            }
+        except Exception as e:
+            logger.error(f"[Gemini] call_vision error: {e}")
+            return {"error": True, "message": str(e)}
+
    def call_with_tools(self, messages, tools=None, stream=False, **kwargs):
        """
        Call Gemini API with tool support using REST API (following official docs)
--- a/models/minimax/minimax_bot.py
+++ b/models/minimax/minimax_bot.py
@@ -2,6 +2,8 @@

 import time
 import json
+from typing import Optional
+
 import requests

 from models.bot import Bot
@@ -175,6 +177,51 @@ class MinimaxBot(Bot):
            else:
                return result

+    def call_vision(self, image_url: str, question: str,
+                    model: Optional[str] = None,
+                    max_tokens: int = 1000) -> dict:
+        """Analyze an image using MiniMax OpenAI-compatible API.
+        Always uses MiniMax-Text-01 — other MiniMax models do not support vision.
+        """
+        try:
+            vision_model = "MiniMax-Text-01"
+            payload = {
+                "model": vision_model,
+                "max_tokens": max_tokens,
+                "messages": [{
+                    "role": "user",
+                    "content": [
+                        {"type": "text", "text": question},
+                        {"type": "image_url", "image_url": {"url": image_url}},
+                    ],
+                }],
+            }
+            headers = {
+                "Authorization": f"Bearer {self.api_key}",
+                "Content-Type": "application/json",
+            }
+            resp = requests.post(f"{self.api_base}/chat/completions",
+                                 headers=headers, json=payload, timeout=60)
+            if resp.status_code != 200:
+                return {"error": True, "message": f"HTTP {resp.status_code}: {resp.text[:300]}"}
+            data = resp.json()
+            if "error" in data:
+                return {"error": True, "message": data["error"].get("message", str(data["error"]))}
+            content = data.get("choices", [{}])[0].get("message", {}).get("content", "")
+            usage = data.get("usage", {})
+            return {
+                "model": vision_model,
+                "content": content,
+                "usage": {
+                    "prompt_tokens": usage.get("prompt_tokens", 0),
+                    "completion_tokens": usage.get("completion_tokens", 0),
+                    "total_tokens": usage.get("total_tokens", 0),
+                },
+            }
+        except Exception as e:
+            logger.error(f"[MINIMAX] call_vision error: {e}")
+            return {"error": True, "message": str(e)}
+
    def call_with_tools(self, messages, tools=None, stream=False, **kwargs):
        """
        Call MiniMax API with tool support for agent integration
@@ -273,37 +320,41 @@ class MinimaxBot(Bot):
            if role == "user":
                # Handle user message
                if isinstance(content, list):
-                    # Extract text from content blocks
-                    text_parts = []
-                    tool_results = []
+                    has_tool_result = any(
+                        isinstance(b, dict) and b.get("type") == "tool_result" for b in content
+                    )
+                    if has_tool_result:
+                        text_parts = []
+                        tool_results = []

-                    for block in content:
-                        if isinstance(block, dict):
-                            if block.get("type") == "text":
-                                text_parts.append(block.get("text", ""))
-                            elif block.get("type") == "tool_result":
-                                # Tool result should be a separate message with role="tool"
-                                tool_call_id = block.get("tool_use_id") or ""
-                                if not tool_call_id:
-                                    logger.warning(f"[MINIMAX] tool_result missing tool_use_id")
-                                result_content = block.get("content", "")
-                                if not isinstance(result_content, str):
-                                    result_content = json.dumps(result_content, ensure_ascii=False)
-                                tool_results.append({
-                                    "role": "tool",
-                                    "tool_call_id": tool_call_id,
-                                    "content": result_content
-                                })
+                        for block in content:
+                            if isinstance(block, dict):
+                                if block.get("type") == "text":
+                                    text_parts.append(block.get("text", ""))
+                                elif block.get("type") == "tool_result":
+                                    tool_call_id = block.get("tool_use_id") or ""
+                                    if not tool_call_id:
+                                        logger.warning("[MINIMAX] tool_result missing tool_use_id")
+                                    result_content = block.get("content", "")
+                                    if not isinstance(result_content, str):
+                                        result_content = json.dumps(result_content, ensure_ascii=False)
+                                    tool_results.append({
+                                        "role": "tool",
+                                        "tool_call_id": tool_call_id,
+                                        "content": result_content
+                                    })

-                    if text_parts:
-                        converted.append({
-                            "role": "user",
-                            "content": "\n".join(text_parts)
-                        })
+                        if text_parts:
+                            converted.append({
+                                "role": "user",
+                                "content": "\n".join(text_parts)
+                            })

-                    # Add all tool results (not just the last one)
-                    for tool_result in tool_results:
-                        converted.append(tool_result)
+                        for tool_result in tool_results:
+                            converted.append(tool_result)
+                    else:
+                        # Keep as-is for multimodal content (e.g. image_url blocks)
+                        converted.append(msg)
                else:
                    # Simple text content
                    converted.append({
--- a/models/moonshot/moonshot_bot.py
+++ b/models/moonshot/moonshot_bot.py
@@ -2,6 +2,7 @@

 import json
 import time
+from typing import Optional

 import requests
 from models.bot import Bot
@@ -147,6 +148,49 @@ class MoonshotBot(Bot):
            else:
                return result

+    def call_vision(self, image_url: str, question: str,
+                    model: Optional[str] = None,
+                    max_tokens: int = 1000) -> dict:
+        """Analyze an image using Moonshot (Kimi) OpenAI-compatible API."""
+        try:
+            vision_model = model or self.args.get("model", "kimi-k2.5")
+            payload = {
+                "model": vision_model,
+                "max_tokens": max_tokens,
+                "messages": [{
+                    "role": "user",
+                    "content": [
+                        {"type": "text", "text": question},
+                        {"type": "image_url", "image_url": {"url": image_url}},
+                    ],
+                }],
+            }
+            headers = {
+                "Authorization": f"Bearer {self.api_key}",
+                "Content-Type": "application/json",
+            }
+            resp = requests.post(f"{self.base_url}/chat/completions",
+                                 headers=headers, json=payload, timeout=60)
+            if resp.status_code != 200:
+                return {"error": True, "message": f"HTTP {resp.status_code}: {resp.text[:300]}"}
+            data = resp.json()
+            if "error" in data:
+                return {"error": True, "message": data["error"].get("message", str(data["error"]))}
+            content = data.get("choices", [{}])[0].get("message", {}).get("content", "")
+            usage = data.get("usage", {})
+            return {
+                "model": vision_model,
+                "content": content,
+                "usage": {
+                    "prompt_tokens": usage.get("prompt_tokens", 0),
+                    "completion_tokens": usage.get("completion_tokens", 0),
+                    "total_tokens": usage.get("total_tokens", 0),
+                },
+            }
+        except Exception as e:
+            logger.error(f"[MOONSHOT] call_vision error: {e}")
+            return {"error": True, "message": str(e)}
+
    # ==================== Agent mode support ====================

    def call_with_tools(self, messages, tools=None, stream: bool = False, **kwargs):
@@ -435,31 +479,37 @@ class MoonshotBot(Bot):
                continue

            if role == "user":
-                text_parts = []
-                tool_results = []
+                has_tool_result = any(
+                    isinstance(b, dict) and b.get("type") == "tool_result" for b in content
+                )
+                if has_tool_result:
+                    text_parts = []
+                    tool_results = []

-                for block in content:
-                    if not isinstance(block, dict):
-                        continue
-                    if block.get("type") == "text":
-                        text_parts.append(block.get("text", ""))
-                    elif block.get("type") == "tool_result":
-                        tool_call_id = block.get("tool_use_id") or ""
-                        result_content = block.get("content", "")
-                        if not isinstance(result_content, str):
-                            result_content = json.dumps(result_content, ensure_ascii=False)
-                        tool_results.append({
-                            "role": "tool",
-                            "tool_call_id": tool_call_id,
-                            "content": result_content
-                        })
+                    for block in content:
+                        if not isinstance(block, dict):
+                            continue
+                        if block.get("type") == "text":
+                            text_parts.append(block.get("text", ""))
+                        elif block.get("type") == "tool_result":
+                            tool_call_id = block.get("tool_use_id") or ""
+                            result_content = block.get("content", "")
+                            if not isinstance(result_content, str):
+                                result_content = json.dumps(result_content, ensure_ascii=False)
+                            tool_results.append({
+                                "role": "tool",
+                                "tool_call_id": tool_call_id,
+                                "content": result_content
+                            })

-                # Tool results first (must come right after assistant with tool_calls)
-                for tr in tool_results:
-                    converted.append(tr)
+                    for tr in tool_results:
+                        converted.append(tr)

-                if text_parts:
-                    converted.append({"role": "user", "content": "\n".join(text_parts)})
+                    if text_parts:
+                        converted.append({"role": "user", "content": "\n".join(text_parts)})
+                else:
+                    # Keep as-is for multimodal content (e.g. image_url blocks)
+                    converted.append(msg)

            elif role == "assistant":
                openai_msg = {"role": "assistant"}
--- a/models/openai_compatible_bot.py
+++ b/models/openai_compatible_bot.py
@@ -9,6 +9,8 @@ This includes: OpenAI, LinkAI, Azure OpenAI, and many third-party providers.

 import json
 import openai
+import requests
+from typing import Optional
 from common.log import logger
 from agent.protocol.message_utils import drop_orphaned_tool_results_openai

@@ -306,3 +308,51 @@ class OpenAICompatibleBot:
                openai_messages.append(msg)

        return drop_orphaned_tool_results_openai(openai_messages)
+
+    def call_vision(self, image_url: str, question: str,
+                    model: Optional[str] = None,
+                    max_tokens: int = 1000) -> dict:
+        """Analyze an image using the OpenAI-compatible /chat/completions endpoint."""
+        try:
+            api_config = self.get_api_config()
+            vision_model = model or api_config.get("model", "gpt-4o")
+            api_key = api_config.get("api_key", "")
+            api_base = (api_config.get("api_base") or "https://api.openai.com/v1").rstrip("/")
+
+            payload = {
+                "model": vision_model,
+                "messages": [{
+                    "role": "user",
+                    "content": [
+                        {"type": "text", "text": question},
+                        {"type": "image_url", "image_url": {"url": image_url}},
+                    ],
+                }],
+            }
+            headers = {
+                "Authorization": f"Bearer {api_key}",
+                "Content-Type": "application/json",
+            }
+            resp = requests.post(
+                f"{api_base}/chat/completions",
+                headers=headers, json=payload, timeout=60,
+            )
+            if resp.status_code != 200:
+                body = resp.text[:500]
+                logger.error(f"[{self.__class__.__name__}] call_vision HTTP {resp.status_code}: {body}")
+                return {"error": True, "message": f"HTTP {resp.status_code}: {body}"}
+            data = resp.json()
+            content = data.get("choices", [{}])[0].get("message", {}).get("content", "")
+            usage = data.get("usage", {})
+            return {
+                "model": vision_model,
+                "content": content,
+                "usage": {
+                    "prompt_tokens": usage.get("prompt_tokens", 0),
+                    "completion_tokens": usage.get("completion_tokens", 0),
+                    "total_tokens": usage.get("total_tokens", 0),
+                },
+            }
+        except Exception as e:
+            logger.error(f"[{self.__class__.__name__}] call_vision error: {e}")
+            return {"error": True, "message": str(e)}
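Reviewer note: every `call_vision` in this patch returns either `{"error": True, "message": ...}` or a dict with `model`, `content`, and `usage`, which makes provider fallback straightforward for a caller. The sketch below shows one way a caller might walk an ordered list of providers using that contract. The helper `first_successful_vision` and its `(name, callable)` provider list are hypothetical; the actual resolution logic in the vision tool is not part of this section.

```python
from typing import Callable, Iterable, List, Tuple

# A provider is anything with the call_vision(image_url, question) shape.
VisionCall = Callable[[str, str], dict]


def first_successful_vision(providers: Iterable[Tuple[str, VisionCall]],
                            image_url: str, question: str) -> dict:
    """Try providers in order and return the first non-error result."""
    errors: List[str] = []
    for name, call in providers:
        result = call(image_url, question)
        if not result.get("error"):
            return result
        errors.append(f"{name}: {result.get('message', 'unknown error')}")
    # Every provider failed (or none was configured): report all failures.
    return {
        "error": True,
        "message": "; ".join(errors) or "no vision provider configured",
    }
```

Because each bot already catches its own exceptions and reports the model it actually used, the caller only needs to inspect the `error` key and can surface `result["model"]` to the user.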
--- a/models/zhipuai/zhipuai_bot.py
+++ b/models/zhipuai/zhipuai_bot.py
@@ -2,6 +2,7 @@

 import time
 import json
+from typing import Optional

 from models.bot import Bot
 from models.zhipuai.zhipu_ai_session import ZhipuAISession
@@ -149,6 +150,40 @@ class ZHIPUAIBot(Bot, ZhipuAIImage):
            else:
                return result

+    def call_vision(self, image_url: str, question: str,
+                    model: Optional[str] = None,
+                    max_tokens: int = 1000) -> dict:
+        """Analyze an image using ZhipuAI OpenAI-compatible SDK.
+        Always uses glm-5v-turbo — the text models (glm-5-turbo etc.) do not support vision.
+        """
+        try:
+            vision_model = "glm-5v-turbo"
+            response = self.client.chat.completions.create(
+                model=vision_model,
+                max_tokens=max_tokens,
+                messages=[{
+                    "role": "user",
+                    "content": [
+                        {"type": "text", "text": question},
+                        {"type": "image_url", "image_url": {"url": image_url}},
+                    ],
+                }],
+            )
+            content = response.choices[0].message.content or ""
+            usage = response.usage
+            return {
+                "model": vision_model,
+                "content": content,
+                "usage": {
+                    "prompt_tokens": getattr(usage, "prompt_tokens", 0),
+                    "completion_tokens": getattr(usage, "completion_tokens", 0),
+                    "total_tokens": getattr(usage, "total_tokens", 0),
+                },
+            }
+        except Exception as e:
+            logger.error(f"[ZHIPU_AI] call_vision error: {e}")
+            return {"error": True, "message": str(e)}
+
    def call_with_tools(self, messages, tools=None, stream=False, **kwargs):
        """
        Call ZhipuAI API with tool support for agent integration
--- a/plugins/cow_cli/cow_cli.py
+++ b/plugins/cow_cli/cow_cli.py
@@ -157,7 +157,6 @@ class CowCliPlugin(Plugin):
            "  /config              查看当前配置",
            "  /config <key>        查看某项配置",
            "  /config <key> <val>  修改配置",
-            "  /install-browser  安装浏览器工具依赖",
            "",
            "💡 也可以用 cow <command> 代替 /<command>",
        ]
@@ -407,7 +406,7 @@ class CowCliPlugin(Plugin):
        from common import const
        _EXACT = {
            "wenxin": const.BAIDU, "wenxin-4": const.BAIDU,
-            "xunfei": const.XUNFEI, const.QWEN: const.QWEN,
+            "xunfei": const.XUNFEI, const.QWEN: const.QWEN_DASHSCOPE,
            const.MODELSCOPE: const.MODELSCOPE,
            const.MOONSHOT: const.MOONSHOT,
            "moonshot-v1-8k": const.MOONSHOT, "moonshot-v1-32k": const.MOONSHOT,
--- a/plugins/godcmd/godcmd.py
+++ b/plugins/godcmd/godcmd.py
@@ -315,7 +315,7 @@ class Godcmd(Plugin):
                    except Exception as e:
                        ok, result = False, "你没有设置私有GPT模型"
                elif cmd == "reset":
-                    if bottype in [const.OPEN_AI, const.OPENAI, const.CHATGPT, const.CHATGPTONAZURE, const.LINKAI, const.BAIDU, const.XUNFEI, const.QWEN, const.GEMINI, const.ZHIPU_AI, const.CLAUDEAPI]:
+                    if bottype in [const.OPEN_AI, const.OPENAI, const.CHATGPT, const.CHATGPTONAZURE, const.LINKAI, const.BAIDU, const.XUNFEI, const.QWEN, const.QWEN_DASHSCOPE, const.GEMINI, const.ZHIPU_AI, const.CLAUDEAPI]:
                        bot.sessions.clear_session(session_id)
                        if Bridge().chat_bots.get(bottype):
                            Bridge().chat_bots.get(bottype).sessions.clear_session(session_id)
@@ -341,7 +341,7 @@ class Godcmd(Plugin):
                            ok, result = True, "配置已重载"
                        elif cmd == "resetall":
                            if bottype in [const.OPEN_AI, const.OPENAI, const.CHATGPT, const.CHATGPTONAZURE, const.LINKAI,
-                                           const.BAIDU, const.XUNFEI, const.QWEN, const.GEMINI, const.ZHIPU_AI, const.MOONSHOT,
+                                           const.BAIDU, const.XUNFEI, const.QWEN, const.QWEN_DASHSCOPE, const.GEMINI, const.ZHIPU_AI, const.MOONSHOT,
                                           const.MODELSCOPE]:
                                channel.cancel_all_session()
                                bot.sessions.clear_all_session()
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -6,7 +6,7 @@ build-backend = "setuptools.build_meta"
 name = "cowagent"
 version = "1.0.0"
 description = "CowAgent - AI Agent on WeChat and more"
-requires-python = ">=3.9"
+requires-python = ">=3.7"
 dependencies = [
    "click>=8.0",
    "requests>=2.28.2",
--- a/requirements.txt
+++ b/requirements.txt
@@ -4,8 +4,6 @@ requests>=2.28.2
 chardet>=5.1.0
 Pillow
 web.py
-linkai>=0.0.6.0
-agentmesh-sdk>=0.1.3
 python-dotenv>=1.0.0
 PyYAML>=6.0
 croniter>=2.0.0
--- a/run.sh
+++ b/run.sh
@@ -271,7 +271,7 @@ select_model() {
    echo -e "${YELLOW}2) Zhipu AI (glm-5-turbo, glm-5, etc.)${NC}"
    echo -e "${YELLOW}3) Kimi (kimi-k2.5, kimi-k2, etc.)${NC}"
    echo -e "${YELLOW}4) Doubao (doubao-seed-2-0-code-preview-260215, etc.)${NC}"
-    echo -e "${YELLOW}5) Qwen (qwen3.5-plus, qwen3-max, qwq-plus, etc.)${NC}"
+    echo -e "${YELLOW}5) Qwen (qwen3.6-plus, qwen3.5-plus, qwen3-max, qwq-plus, etc.)${NC}"
    echo -e "${YELLOW}6) Claude (claude-sonnet-4-6, claude-opus-4-6, etc.)${NC}"
    echo -e "${YELLOW}7) Gemini (gemini-3.1-flash-lite-preview, gemini-3.1-pro-preview, etc.)${NC}"
    echo -e "${YELLOW}8) OpenAI GPT (gpt-5.4, gpt-5.2, gpt-4.1, etc.)${NC}"
@@ -318,7 +318,7 @@ configure_model() {
        2) read_model_config "Zhipu AI" "glm-5-turbo" "ZHIPU_KEY" ;;
        3) read_model_config "Kimi (Moonshot)" "kimi-k2.5" "MOONSHOT_KEY" ;;
        4) read_model_config "Doubao (Volcengine Ark)" "doubao-seed-2-0-code-preview-260215" "ARK_KEY" ;;
-        5) read_model_config "Qwen (DashScope)" "qwen3.5-plus" "DASHSCOPE_KEY" ;;
+        5) read_model_config "Qwen (DashScope)" "qwen3.6-plus" "DASHSCOPE_KEY" ;;
        6)
            read_model_config "Claude" "claude-sonnet-4-6" "CLAUDE_KEY"
            read_api_base "CLAUDE_BASE" "https://api.anthropic.com/v1"
--- a/scripts/run.ps1
+++ b/scripts/run.ps1
@@ -154,7 +154,7 @@ $ModelChoices = @{
    "2" = @{ Provider = "Zhipu AI";                 Default = "glm-5-turbo";                            Key = "ZHIPU_KEY" }
    "3" = @{ Provider = "Kimi (Moonshot)";          Default = "kimi-k2.5";                              Key = "MOONSHOT_KEY" }
    "4" = @{ Provider = "Doubao (Volcengine Ark)";  Default = "doubao-seed-2-0-code-preview-260215";    Key = "ARK_KEY" }
-    "5" = @{ Provider = "Qwen (DashScope)";         Default = "qwen3.5-plus";                           Key = "DASHSCOPE_KEY" }
+    "5" = @{ Provider = "Qwen (DashScope)";         Default = "qwen3.6-plus";                           Key = "DASHSCOPE_KEY" }
    "6" = @{ Provider = "Claude";                   Default = "claude-sonnet-4-6";                      Key = "CLAUDE_KEY";  Base = "https://api.anthropic.com/v1" }
    "7" = @{ Provider = "Gemini";                   Default = "gemini-3.1-pro-preview";                 Key = "GEMINI_KEY";  Base = "https://generativelanguage.googleapis.com" }
    "8" = @{ Provider = "OpenAI GPT";               Default = "gpt-5.4";                                Key = "OPENAI_KEY";  Base = "https://api.openai.com/v1" }
@@ -169,7 +169,7 @@ function Select-Model {
    Write-Host "2) Zhipu AI (glm-5-turbo, glm-5, etc.)"
    Write-Host "3) Kimi (kimi-k2.5, kimi-k2, etc.)"
    Write-Host "4) Doubao (doubao-seed-2-0-code-preview-260215, etc.)"
-    Write-Host "5) Qwen (qwen3.5-plus, qwen3-max, qwq-plus, etc.)"
+    Write-Host "5) Qwen (qwen3.6-plus, qwen3.5-plus, qwen3-max, qwq-plus, etc.)"
    Write-Host "6) Claude (claude-sonnet-4-6, claude-opus-4-6, etc.)"
    Write-Host "7) Gemini (gemini-3.1-flash-lite-preview, gemini-3.1-pro-preview, etc.)"
    Write-Host "8) OpenAI GPT (gpt-5.4, gpt-5.2, gpt-4.1, etc.)"
@@ -453,7 +453,11 @@ function Update-Project {

    Assert-Python
    Install-Dependencies
-    Start-CowAgent
+
+    # Start via python -m cli.cli instead of cow.exe, because the exe may
+    # still be cached/locked from the previous installation on Windows.
+    Write-Cow "Starting CowAgent..."
+    & $PythonCmd -m cli.cli start
 }

 # ── main ──────────────────────────────────────────────────────────
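
Note on the `_EXACT` change in `plugins/cow_cli/cow_cli.py` above: the table maps a model name to a bot type by exact match, and this change points the `const.QWEN` key at `const.QWEN_DASHSCOPE` instead of `const.QWEN`. Below is a minimal sketch of that lookup, not the plugin's actual code. The constant values and the `resolve_bot_type` helper are hypothetical stand-ins, since the real definitions in `common/const.py` are not shown in this diff.

```python
# Hypothetical stand-ins; the real values in common/const.py are not part of this diff.
QWEN = "qwen"
QWEN_DASHSCOPE = "qwen_dashscope"
MOONSHOT = "moonshot"

# Exact-match table in the shape of _EXACT after this change:
# the QWEN key now resolves to the DashScope bot type.
EXACT = {
    QWEN: QWEN_DASHSCOPE,
    MOONSHOT: MOONSHOT,
    "moonshot-v1-8k": MOONSHOT,
    "moonshot-v1-32k": MOONSHOT,
}


def resolve_bot_type(model_name, default=None):
    """Return the bot type for an exact model-name match, else `default`."""
    return EXACT.get(model_name, default)
```

Under these assumptions, `resolve_bot_type(QWEN)` returns the DashScope bot type, which would explain why the `reset` and `resetall` checks in `plugins/godcmd/godcmd.py` now also list `const.QWEN_DASHSCOPE`.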
Author	SHA1	Message	Date
zhayujie	26693acc3f	feat(vision): prioritize main model for image recognition with multi-provider fallback - Add call_vision method to all bot implementations (DashScope, Claude, Gemini, ZhipuAI, MiniMax, Doubao, Moonshot, OpenAICompatibleBot) using each vendor's native multimodal API format - Remove call_with_tools/call_vision from Bot base class to fix MRO shadowing issue with OpenAICompatibleBot mixin - Refactor vision tool provider resolution: MainModel → other configured models (auto-discovered) → OpenAI → LinkAI, with automatic fallback - Return actual model name used in call_vision responses - Sync config.json API keys to .env bidirectionally on startup - Fix bot instance cache to detect bot_type/use_linkai config changes - Add SSE reconnection support for web console - Preserve image path hints in Gemini text for correct vision tool calls - Update docs/tools/vision.mdx	2026-04-11 19:46:11 +08:00
zhayujie	3cd92ccda3	feat: add port config	2026-04-09 21:29:53 +08:00
zhayujie	d86cb4ded6	fix(weixin): update weixin channel version	2026-04-09 09:55:07 +08:00
zhayujie	4d5375f6d6	fix(win): add Windows platform hint in bash tool description	2026-04-08 16:54:26 +08:00
zhayujie	424557fedb	fix(win): use PowerShell instead of cmd.exe	2026-04-08 16:50:45 +08:00
zhayujie	89251e603f	fix(win): use PowerShell instead of cmd.exe for bash tool on Windows	2026-04-08 16:18:56 +08:00
zhayujie	a653ed07eb	fix(win): defer pip install to a helper bat after cow.exe exits	2026-04-08 15:31:03 +08:00
zhayujie	ad86deb014	fix: prioritize using a custom master model for vision	2026-04-08 15:16:59 +08:00
zhayujie	9525dc7584	fix: avoid stale cow.exe on Windows by spawning fresh process	2026-04-08 12:07:18 +08:00
zhayujie	cd31dd27fd	fix: increase web console capacity and add frontend retry	2026-04-08 11:48:27 +08:00
zhayujie	360e3670eb	feat(browser): detect implicit interactive elements	2026-04-07 01:41:14 +08:00
zhayujie	8dabe3b4c8	fix: remove install-browser cmd display in /help	2026-04-04 23:28:57 +08:00
zhayujie	443e0c2806	feat: show video in web channel	2026-04-03 17:09:38 +08:00
zhayujie	9cc173cc4d	fix: use dynamic model name in system prompt runtime info	2026-04-02 17:01:56 +08:00
zhayujie	b5f33e5ecd	feat: support qwen3.6-plus	2026-04-02 16:46:58 +08:00
zhayujie	40dfc6860f	fix: skill list showing sub-skills inside collection	2026-04-02 11:47:24 +08:00
zhayujie	1c02a04423	fix: handle error when printing QR code on Windows GBK terminals	2026-04-01 17:23:57 +08:00
zhayujie	de0e45070c	chore: remove conflicting dependency	2026-04-01 17:19:15 +08:00
zhayujie	c169cc7d74	fix: remove conflicting dependency	2026-04-01 17:12:15 +08:00
zhayujie	cd62ad76f6	fix: cow CLI support python3.7	2026-04-01 16:51:23 +08:00
zhayujie	dd25b0fb5b	feat: refine system prompt style and tone guidance	2026-04-01 16:24:41 +08:00
zhayujie	a38b22a6a2	docs: update docs	2026-04-01 15:31:41 +08:00