Compare commits

82 Commits
2.0.6 ... 2.0.8

Author SHA1 Message Date
zhayujie
55aaf60a57 feat: release 2.0.8 2026-05-06 16:19:20 +08:00
zhayujie
a5790d82f6 feat(qianfan): scope vision support to multimodal models 2026-05-06 16:11:10 +08:00
zhayujie
63f99af1e6 Merge pull request #2800 from jimmyzhuu/feat/qianfan-vision-provider
Add Qianfan support to Vision tool
2026-05-06 15:39:12 +08:00
zhayujie
4eed2568aa fix(bash): reduce safety check false positives 2026-05-06 15:36:44 +08:00
jimmyzhuu
fb7962c7f2 fix: use available qianfan vision model 2026-05-06 13:34:39 +08:00
jimmyzhuu
76e6b7b471 docs: document qianfan vision support 2026-05-06 13:28:46 +08:00
jimmyzhuu
fccb7ff9ed feat: route qianfan vision provider 2026-05-06 13:25:59 +08:00
jimmyzhuu
3b12ef2e66 feat: add qianfan vision calls 2026-05-06 13:24:41 +08:00
jimmyzhuu
f9d099be1b feat: add qianfan vision model constants 2026-05-06 13:23:04 +08:00
zhayujie
c322c0e3a5 docs(models): add ernie-5.0 2026-05-06 12:15:14 +08:00
zhayujie
530fc20596 Merge pull request #2790 from jimmyzhuu/feat/qianfan-provider
Add first-class Baidu Qianfan / ERNIE provider
2026-05-06 11:43:32 +08:00
zhayujie
a23b4ed754 Merge pull request #2797 from Zmjjeff7/feat-translate-youdao
feat(translate): add Youdao as a new translation provider
2026-05-06 11:28:50 +08:00
zhayujie
fc4f5077b0 fix: update .gitignore 2026-05-06 11:27:57 +08:00
Zmjjeff7
6a553886da feat(translate): add Youdao as a new translation provider
The translate module previously only supported Baidu translation, and the
factory raised a bare RuntimeError for any other type. This change adds
Youdao Translation as a second provider and improves the factory's error
message.

Implementation details:
- New YoudaoTranslator class in translate/youdao/youdao_translate.py
- Implements Youdao's v3 SHA-256 signature scheme, including the
  truncate-input rule for queries longer than 20 characters
- Maps ISO 639-1 language codes to Youdao-specific codes
  (zh -> zh-CHS, zh-TW -> zh-CHT, others pass through)
- Differentiates network errors, API error codes, and empty translations
- factory.create_translator now lists the supported types in its
  RuntimeError message instead of failing silently
- Default config exposes youdao_translate_app_key and
  youdao_translate_app_secret

Adds 17 unit tests covering signature correctness, language code mapping,
input truncation edge cases, the full request/response flow, and factory
dispatch. All tests pass under Python 3.11.
2026-05-05 23:58:32 +08:00
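The signature and language-code rules described in the commit above can be sketched as follows. This is a minimal illustration, not the code from the PR: the function names (`truncate_input`, `make_sign`, `map_lang`) are hypothetical, and the concatenation order and truncation rule follow Youdao's v3 scheme as it is commonly documented, so they should be verified against the official API reference.

```python
import hashlib


def truncate_input(q: str) -> str:
    """Youdao v3 'input' rule (assumed): queries longer than 20 characters
    become first 10 chars + total length + last 10 chars."""
    if len(q) <= 20:
        return q
    return q[:10] + str(len(q)) + q[-10:]


def make_sign(app_key: str, q: str, salt: str, curtime: str, app_secret: str) -> str:
    """SHA-256 over appKey + input + salt + curtime + appSecret (assumed order)."""
    raw = app_key + truncate_input(q) + salt + curtime + app_secret
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def map_lang(code: str) -> str:
    """Map ISO 639-1 codes to Youdao codes; unknown codes pass through."""
    return {"zh": "zh-CHS", "zh-TW": "zh-CHT"}.get(code, code)
```

Keeping the truncation step in its own function makes the 20-character boundary easy to unit-test, which matches the edge cases the commit message says its tests cover.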
zhayujie
1065c7e722 fix(feishu): unblock streaming via async push worker 2026-05-05 19:36:15 +08:00
zhayujie
a9c8a59f58 feat(feishu): one-click QR-scan app creation 2026-05-05 18:32:58 +08:00
zhayujie
8730f7fd27 fix(memory): exclude scheduler-injected pairs from daily memory flush 2026-05-05 16:53:01 +08:00
zhayujie
8f608223d7 perf(feishu): tune streaming render speed 2026-05-05 14:53:30 +08:00
zhayujie
a7cbd47a2f fix(feishu): default feishu_stream_reply to true 2026-05-05 14:30:34 +08:00
zhayujie
b80c3fe5a8 feat(feishu): enhance #2791 with cardkit streaming + ASR fixes
- rewrite streaming reply to official cardkit v2.0 API (default on, auto-fallback)
- fix Whisper hallucination: bump ASR sample rate to 16k, pass language=zh
- fix lock-over-IO and tmp file cleanup from #2791
- drop deprecated feishu_bot_name; quiet unknown-key warnings
- docs: cardkit permission and feishu_stream_reply usage
2026-05-05 14:15:25 +08:00
zhayujie
5080051e39 Merge pull request #2791 from ooaaooaa123/feat/feishu-voice-stream-reply
feat(feishu): support sending and receiving voice messages and streaming typewriter-style replies
2026-05-05 13:10:00 +08:00
zhayujie
23bfc8d0ba fix(feishu): update config-template.json 2026-05-05 13:05:39 +08:00
zhayujie
80e9062041 fix(vision): respect tool.vision.model and add automatic fallback #2792 2026-05-03 22:28:32 +08:00
zhayujie
67bd3420ed perf(scheduler): bound isolated session context to agent_max_context_turns/5 2026-05-03 21:49:59 +08:00
zhayujie
aea081703f fix(scheduler): inject delivered output into receiver session with sliding window
Further refinements on top of #2795:

- persist real session_id (notify_session_id) at task creation so group chats
  correctly map back to the user's actual conversation
- mark scheduler turns with [SCHEDULED] (recognise legacy "Scheduled task"
  prefix too for backward-compatible pruning)
- prune both DB and in-memory to scheduler_inject_max_per_session (default 3),
  only marker-tagged pairs are touched; regular user turns never deleted
- send_message type gated by scheduler_inject_send_message (default false) —
  fixed reminder text rarely benefits follow-up Q&A

Co-authored-by: huangrichao2020 <grdomai43881@gmail.com>
2026-05-03 21:27:24 +08:00
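The sliding-window pruning described in the commit above might look roughly like the sketch below. It assumes a session is a list of `(user_text, assistant_text)` pairs and that the marker sits at the start of the user-side text; both are assumptions for illustration, and `prune_scheduled_pairs` is a hypothetical name, not the project's function.

```python
# Markers taken from the commit message: the new tag plus the legacy prefix.
SCHEDULED_MARKERS = ("[SCHEDULED]", "Scheduled task")


def is_scheduled_pair(pair):
    """True if the pair was injected by the scheduler (marker on the user text)."""
    user_text, _assistant_text = pair
    return user_text.startswith(SCHEDULED_MARKERS)


def prune_scheduled_pairs(pairs, max_keep=3):
    """Keep only the newest `max_keep` scheduler pairs; never drop regular turns."""
    scheduled = [i for i, p in enumerate(pairs) if is_scheduled_pair(p)]
    if max_keep > 0:
        drop = set(scheduled[:-max_keep])
    else:
        drop = set(scheduled)
    return [p for i, p in enumerate(pairs) if i not in drop]
```

Only marker-tagged pairs are candidates for removal, so ordinary user turns survive regardless of how many scheduled outputs arrive.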
zhayujie
f300d2a2d5 Merge pull request #2795 from huangrichao2020/fix/scheduler-remember-v2
fix: remember scheduled task outputs with correct session mapping (v2)
2026-05-03 21:02:40 +08:00
tingchim2pro
f150d7d83a fix: remember scheduled task outputs in receiver session (v2)
Address review feedback from #2794:

1. Use notify_session_id instead of receiver for correct group chat mapping
   - Task creation should store the real session_id in action.notify_session_id
   - Falls back to receiver for backward compatibility with old tasks

2. Add injection to all four execution branches:
   - _execute_agent_task
   - _execute_send_message
   - _execute_tool_call
   - _execute_skill_call (also fixed missing channel.send)

3. Add config switch and content truncation:
   - scheduler_inject_to_session (default: true) to toggle the feature
   - 2000 char limit prevents high-frequency tasks from bloating sessions

Fixes #2793
2026-05-02 19:00:50 +08:00
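Points 1 and 3 of the commit above (session lookup with fallback, and output truncation) can be illustrated with a short sketch. The dictionary layout of `action` and both function names are assumptions made for the example; only the `notify_session_id` key, the fallback to `receiver`, and the 2000-character limit come from the commit message.

```python
MAX_INJECT_CHARS = 2000  # limit stated in the commit message


def resolve_session_id(action: dict) -> str:
    """Prefer the real session id; fall back to receiver for tasks created earlier."""
    return action.get("notify_session_id") or action["receiver"]


def truncate_output(text: str, limit: int = MAX_INJECT_CHARS) -> str:
    """Cap injected content so frequent tasks do not bloat the session."""
    return text if len(text) <= limit else text[:limit]
```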
ooaaooaa123
4d1f059c0d feat(feishu): add voice message support and streaming text reply
- Receive audio messages: map msg_type=audio to ContextType.VOICE and
  download opus file via lazy _prepare_fn for STT pipeline
- Send voice replies: upload opus audio via Feishu file API, auto-convert
  non-opus formats (e.g. mp3) using pydub before upload
- Streaming text reply: inject on_event callback into context; send a card
  placeholder on first delta, then PATCH-update it in-place at a
  configurable interval (feishu_stream_interval_ms) to achieve typewriter
  effect; set feishu_streamed=True to suppress duplicate send()
- Enable NOT_SUPPORT_REPLYTYPE=[] to unblock voice and image reply types
- Fix AudioSegment mutation bug in audio_convert.py: set_frame_rate /
  set_channels return new objects and must be reassigned
- Add -nostdin to ffmpeg invocation to prevent stdin deadlock in daemon
- Add feishu_bot_name, feishu_stream_reply, feishu_stream_interval_ms
  config keys to config-template.json
2026-04-30 16:14:57 +08:00
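The interval-based streaming update described in the commit above (push a placeholder on the first delta, then update in place no more often than `feishu_stream_interval_ms`) can be sketched as a small throttle. The class name, the `push` callback, and the injectable `clock` are hypothetical and exist only for this illustration; the real channel would call the Feishu card API where `push` is invoked.

```python
import time


class ThrottledUpdater:
    """Accumulate text deltas and push the full text at most once per interval."""

    def __init__(self, push, interval_ms=300, clock=time.monotonic):
        self._push = push                  # callback that sends/updates the card
        self._interval = interval_ms / 1000.0
        self._clock = clock                # injectable for testing
        self._parts = []
        self._last_push = None
        self._dirty = False

    def on_delta(self, delta: str) -> None:
        self._parts.append(delta)
        self._dirty = True
        now = self._clock()
        # The first delta is pushed immediately; later ones wait for the interval.
        if self._last_push is None or now - self._last_push >= self._interval:
            self._flush(now)

    def finish(self) -> None:
        """Push whatever is still pending when the stream ends."""
        if self._dirty:
            self._flush(self._clock())

    def _flush(self, now: float) -> None:
        self._push("".join(self._parts))
        self._last_push = now
        self._dirty = False
```

Pushing the whole accumulated text rather than only the delta suits an update-in-place model, where each update replaces the card content.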
jimmyzhuu
bc7f953fcc docs: add qianfan provider guide 2026-04-29 16:41:25 +08:00
jimmyzhuu
f653483eea feat: expose qianfan in configuration surfaces 2026-04-29 16:32:53 +08:00
jimmyzhuu
6b200fd36b fix: handle qianfan error responses 2026-04-29 16:24:37 +08:00
jimmyzhuu
161fc6cdf0 feat: add qianfan chat bot 2026-04-29 16:19:27 +08:00
jimmyzhuu
6f68ed6bce test: restore cow cli parent module attribute 2026-04-29 16:12:08 +08:00
jimmyzhuu
a4592ffdfe test: isolate cow cli plugin import 2026-04-29 16:08:40 +08:00
jimmyzhuu
7cd7bd1a48 fix: avoid cow cli import side effects 2026-04-29 16:04:48 +08:00
jimmyzhuu
9eeca70292 feat: register qianfan model provider 2026-04-29 15:52:32 +08:00
zhayujie
02bfe30848 fix(memory): prevent duplicate Deep Dream runs 2026-04-28 15:30:51 +08:00
zhayujie
c9c99de3d9 fix(bash): scope safety confirm to destructive deletions outside workspace 2026-04-28 10:18:47 +08:00
zhayujie
8752f0cc60 refactor(openai): drop SDK dependency and switch to native HTTP client 2026-04-27 20:21:54 +08:00
zhayujie
5c65196e44 feat(web): hint API base version path in config placeholder 2026-04-26 17:10:24 +08:00
zhayujie
f5798bfe90 fix: remove unnecessary API Base URL in run scripts 2026-04-26 16:29:08 +08:00
zhayujie
0e556b3468 feat: switch default model to deepseek-v4-flash 2026-04-26 15:54:50 +08:00
zhayujie
31820f56e7 fix(deepseek): back-fill reasoning_content for all assistant turns 2026-04-24 16:39:48 +08:00
zhayujie
fd88828abd fix(models): unify enable_thinking for deepseek-v4 2026-04-24 15:29:43 +08:00
zhayujie
ae11159918 feat(models): unify enable_thinking for deepseek-v4 and other thinking models 2026-04-24 15:22:45 +08:00
zhayujie
472a8605c0 feat(models): support deepseek-v4-pro and deepseek-v4-flash 2026-04-24 11:35:38 +08:00
zhayujie
e1760ba211 feat: release 2.0.7 version 2026-04-23 18:13:53 +08:00
zhayujie
ce4c0a0aa4 feat: release 2.0.7 2026-04-23 17:18:19 +08:00
zhayujie
64511593c4 feat: release 2.0.7 2026-04-23 17:16:17 +08:00
zhayujie
b0e00dfceb feat: support glm-5.1 2026-04-23 16:43:05 +08:00
zhayujie
fc465b463d feat: support kimi coding plan by temporary solution 2026-04-23 16:24:37 +08:00
zhayujie
68ce2e5232 feat(skill): multi-provider image generation with auto-fallback
- Add Gemini, Seedream (Volcengine Ark), Qwen (DashScope), MiniMax
  providers to image-generation skill with universal sequential
  fallback: OpenAI → Gemini → Seedream → Qwen → MiniMax → LinkAI
- Each provider filters unsupported size tiers to valid values
  (e.g. Seedream 1K→2K, Qwen 3K→2K, Gemini 3K→2K)
- Pinned model only tries its native provider; auto-routing uses
  each provider's default model
- Support skill-namespaced config (config.skill.image-generation.model
  → SKILL_IMAGE_GENERATION_MODEL env var)
- Add image lightbox (click-to-enlarge) in web console
- Add docs for built-in skills (skill-creator, knowledge-wiki,
  image-generation) under docs/skills/
2026-04-23 12:39:39 +08:00
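The sequential fallback described in the commit above can be sketched as a loop over providers in priority order. This is an illustrative outline only: `generate_with_fallback`, the `(name, callable)` provider list, and the `pinned` argument are hypothetical stand-ins, and pinning is modelled here by provider name rather than by model.

```python
def generate_with_fallback(prompt, providers, pinned=None):
    """Try providers in order and return (name, result) from the first success.

    providers: list of (name, callable) pairs in priority order.
    pinned:    if set, only that provider is tried (no fallback).
    """
    errors = []
    for name, generate in providers:
        if pinned is not None and name != pinned:
            continue
        try:
            return name, generate(prompt)
        except Exception as exc:  # collect the failure and move on
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Collecting per-provider errors keeps the final exception informative when every provider in the chain fails.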
zhayujie
81e8bb62ae feat(skill): support gpt-image-2 in image generation skill 2026-04-22 20:39:49 +08:00
zhayujie
2c13e1b923 feat(models): support kimi-k2.6 2026-04-22 12:01:40 +08:00
zhayujie
a0748c2e3b fix(web): cap reasoning content to 4KB across stream/storage/display 2026-04-21 20:31:38 +08:00
zhayujie
40599bb751 fix(web): smart auto-scroll for chat #2775 2026-04-20 21:43:21 +08:00
zhayujie
f3c64ceea7 fix: refresh skill manager on /skill 2026-04-19 19:50:16 +08:00
zhayujie
15c60de709 fix: improve skill installation to support multiple source formats and ensure target directory 2026-04-19 19:05:51 +08:00
zhayujie
6dd316547f fix(web): fix session title generation fallback and reset Bridge on config change 2026-04-19 18:43:48 +08:00
zhayujie
54c7676a44 docs: update architecture diagram 2026-04-18 23:08:36 +08:00
zhayujie
d25b8966ce fix(web): prevent duplicate image previews 2026-04-18 22:32:34 +08:00
zhayujie
14a119c48c fix(gemini): resolve tool calls not returning 2026-04-18 21:18:27 +08:00
zhayujie
c82515a927 fix(agent): don't drop tool_calls from empty-response retry 2026-04-18 20:50:40 +08:00
zhayujie
26e630c2dd feat(cli): /config support set enable_thinking 2026-04-17 16:09:43 +08:00
zhayujie
13370d2056 fix: thinking display is disabled by default 2026-04-17 15:31:59 +08:00
zhayujie
35282db9e0 feat(models): support claude-opus-4-7 2026-04-16 23:24:16 +08:00
zhayujie
426fb88ce7 fix(knowledge): exclude root-level files from knowledge stats to preserve empty state 2026-04-16 22:55:46 +08:00
zhayujie
2384bd0e10 fix: update CI workflows for repo rename and add latest tag 2026-04-16 21:57:20 +08:00
zhayujie
ba3f66d3d1 feat: show root-level files (index.md, log.md) in knowledge tree 2026-04-16 21:47:44 +08:00
zhayujie
7293a0f670 fix: modify repo name in github workflow 2026-04-16 21:38:58 +08:00
zhayujie
9e86d46267 fix: sync env vars when updating config in docker env 2026-04-16 21:32:07 +08:00
zhayujie
848430f062 feat(knowledge): support nested directories in knowledge base listing and display 2026-04-16 12:28:18 +08:00
zhayujie
abd21335c4 Merge pull request #2772 from 6vision/master
fix: bot_type change notification never shown after model switch
2026-04-16 10:43:41 +08:00
6vision
8fa95f058a fix: bot_type change notification never shown after model switch
Made-with: Cursor
2026-04-15 21:48:50 +08:00
zhayujie
d4e5ecd497 fix: compatible with Python 3.7 by deferring Literal import in truncate.py 2026-04-15 12:29:09 +08:00
zhayujie
3830f76729 feat: add custom model provider 2026-04-15 12:26:05 +08:00
zhayujie
83f778fec9 feat(dream): structured organization of dream memories 2026-04-15 11:27:46 +08:00
zhayujie
cabd24605f fix: add random jitter to daily dream schedule 2026-04-15 00:33:33 +08:00
zhayujie
ae20ba1148 Merge branch 'master' of github.com:zhayujie/chatgpt-on-wechat 2026-04-14 22:58:59 +08:00
zhayujie
3a50b64977 feat: web multi session interface 2026-04-14 22:58:25 +08:00
zhayujie
8692e74536 fix(web): hide session panel by default on mobile and support overlay dismiss 2026-04-14 21:09:01 +08:00
zhayujie
1c18bd9889 docs(memory): update long-term memory docs 2026-04-14 17:14:28 +08:00
137 changed files with 10726 additions and 1324 deletions

View File

@@ -19,7 +19,7 @@ env:
 jobs:
   build-and-push-image:
-    if: github.repository == 'zhayujie/chatgpt-on-wechat'
+    if: github.repository == 'zhayujie/CowAgent'
     runs-on: ubuntu-latest
     permissions:
       contents: read
@@ -51,7 +51,12 @@ jobs:
         uses: docker/metadata-action@v4
         with:
           images: |
-            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
+            ${{ env.REGISTRY }}/zhayujie/chatgpt-on-wechat
+            ${{ env.REGISTRY }}/zhayujie/cowagent
+          tags: |
+            type=raw,value=latest-arm64,enable={{is_default_branch}}
+            type=ref,event=branch,suffix=-arm64
+            type=ref,event=tag,suffix=-arm64
       - name: Build and push Docker image
         uses: docker/build-push-action@v3
@@ -60,7 +65,7 @@ jobs:
           push: true
           file: ./docker/Dockerfile.latest
           platforms: linux/arm64
-          tags: ${{ steps.meta.outputs.tags }}-arm64
+          tags: ${{ steps.meta.outputs.tags }}
           labels: ${{ steps.meta.outputs.labels }}
       - uses: actions/delete-package-versions@v4

View File

@@ -16,10 +16,11 @@ on:
 env:
   REGISTRY: ghcr.io
   IMAGE_NAME: ${{ github.repository }}
+  DOCKERHUB_IMAGE: zhayujie/chatgpt-on-wechat
 jobs:
   build-and-push-image:
-    if: github.repository == 'zhayujie/chatgpt-on-wechat'
+    if: github.repository == 'zhayujie/CowAgent'
     runs-on: ubuntu-latest
     permissions:
       contents: read
@@ -47,8 +48,14 @@ jobs:
         uses: docker/metadata-action@v4
         with:
           images: |
-            ${{ env.IMAGE_NAME }}
-            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
+            zhayujie/chatgpt-on-wechat
+            zhayujie/cowagent
+            ${{ env.REGISTRY }}/zhayujie/chatgpt-on-wechat
+            ${{ env.REGISTRY }}/zhayujie/cowagent
+          tags: |
+            type=raw,value=latest,enable={{is_default_branch}}
+            type=ref,event=branch
+            type=ref,event=tag
       - name: Build and push Docker image
         uses: docker/build-push-action@v3

351
README.md
View File

@@ -29,7 +29,7 @@
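- **Tool system:** Built-in tools for file read/write, terminal execution, browser operation, scheduled tasks, and more, which the Agent invokes autonomously to complete complex tasks
- **CLI system:** Provides terminal commands and in-chat commands for process management, skill installation, configuration changes, and other operations
- **Multimodal messages:** Parses, processes, generates, and sends text, image, voice, file, and other message types
- **Multi-model support:** Supports mainstream model vendors in China and abroad, including OpenAI, Claude, Gemini, DeepSeek, MiniMax, GLM, Qwen, Kimi, and Doubao
- **Multi-model support:** Supports mainstream model vendors in China and abroad, including DeepSeek, MiniMax, Claude, Gemini, OpenAI, GLM, Qwen, Doubao, and Kimi
- **Multi-channel access:** Runs on a local computer or a server and can be integrated into WeChat, Feishu, DingTalk, WeCom, QQ, WeChat Official Accounts, and web pages
## Disclaimer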
@@ -70,6 +70,10 @@
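# 🏷 Changelog
>**2026.05.06** [Version 2.0.8](https://github.com/zhayujie/CowAgent/releases/tag/2.0.8): comprehensive Feishu channel upgrade (voice, streaming output and Markdown, one-click QR-scan onboarding), new model support (DeepSeek V4, Baidu Qianfan), scheduled-task tool enhancements, and more
>**2026.04.22** [Version 2.0.7](https://github.com/zhayujie/CowAgent/releases/tag/2.0.7): built-in image generation skill (GPT Image 2, Nano Banana, etc.), new model support (Kimi K2.6, Claude Opus 4.7, GLM 5.1), knowledge base and memory enhancements, Web console improvements
>**2026.04.14** [Version 2.0.6](https://github.com/zhayujie/CowAgent/releases/tag/2.0.6): knowledge base system, Dream memory module, intelligent context compression, multi-session Web console, and several improvements.
>**2026.04.01** [Version 2.0.5](https://github.com/zhayujie/CowAgent/releases/tag/2.0.5): Cow CLI command system, open-sourced Skill Hub, browser tool, WeCom QR-scan creation, and several improvements and fixes.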
@@ -113,7 +117,7 @@ irm https://cdn.link-ai.tech/code/cow/run.ps1 | iex
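The project supports model APIs from mainstream vendors in China and abroad; for the available models and their configuration, see [Model notes](#模型说明).
> The following models are recommended in Agent mode; choose based on quality and cost: MiniMax-M2.7, glm-5-turbo, kimi-k2.5, qwen3.5-plus, claude-sonnet-4-6, gemini-3.1-pro-preview, gpt-5.4, gpt-5.4-mini
> The following models are recommended in Agent mode; choose based on quality and cost: deepseek-v4-flash, MiniMax-M2.7, glm-5.1, kimi-k2.6, qwen3.5-plus, claude-sonnet-4-6, gemini-3.1-pro-preview, gpt-5.4, gpt-5.4-mini, ernie-5.0
The **LinkAI platform** API is also supported; it covers all of the models above and adds Agent capabilities such as knowledge bases, workflows, and plugins. See the [API documentation](https://docs.link-ai.tech/platform/api).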
@@ -180,7 +184,9 @@ cow install-browser
# Example config.json contents
{
"channel_type": "weixin", # Channel type; defaults to weixin, can be changed to feishu,dingtalk,wecom_bot,qq,wechatcom_app,wechatmp_service,wechatmp,terminal
"model": "MiniMax-M2.7", # Model name
"model": "deepseek-v4-flash", # Model name
"deepseek_api_key": "", # DeepSeek API Key
"deepseek_api_base": "https://api.deepseek.com/v1", # DeepSeek API base URL
"minimax_api_key": "", # MiniMax API Key
"zhipu_ai_api_key": "", # Zhipu GLM API Key
"moonshot_api_key": "", # Kimi/Moonshot API Key
@@ -190,8 +196,6 @@ cow install-browser
"claude_api_base": "https://api.anthropic.com/v1", # Claude API base URL; change it to use a third-party proxy platform
"gemini_api_key": "", # Gemini API Key
"gemini_api_base": "https://generativelanguage.googleapis.com", # Gemini API base URL
"deepseek_api_key": "", # DeepSeek API Key
"deepseek_api_base": "https://api.deepseek.com/v1", # DeepSeek API base URL; can be changed to a third-party proxy
"open_ai_api_key": "", # OpenAI API Key
"open_ai_api_base": "https://api.openai.com/v1", # OpenAI API base URL
"linkai_api_key": "", # LinkAI API Key
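@@ -206,7 +210,7 @@ cow install-browser
"agent_max_context_tokens": 50000, # Max context tokens in Agent mode; anything beyond this is automatically compressed
"agent_max_context_turns": 20, # Max context turns remembered in Agent mode (one question plus one answer is one turn); compressed beyond this
"agent_max_steps": 20, # Max decision steps per task in Agent mode; tool calls stop beyond this
"enable_thinking": true # Whether to enable deep thinking; when on, the Web UI shows the model's reasoning, and turning it off speeds up responses
"enable_thinking": false # Whether to enable deep thinking mode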
}
```
@@ -224,7 +228,7 @@ cow install-browser
<details>
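<summary>2. Other settings</summary>
+ `model`: Model name. In Agent mode the recommended models are `MiniMax-M2.7`, `glm-5-turbo`, `kimi-k2.5`, `qwen3.6-plus`, `claude-sonnet-4-6`, and `gemini-3.1-pro-preview`; for all model names, see [common/const.py](https://github.com/zhayujie/CowAgent/blob/master/common/const.py)
+ `model`: Model name. In Agent mode the recommended models are `deepseek-v4-flash`, `MiniMax-M2.7`, `glm-5.1`, `kimi-k2.6`, `qwen3.6-plus`, `claude-sonnet-4-6`, and `gemini-3.1-pro-preview`; for all model names, see [common/const.py](https://github.com/zhayujie/CowAgent/blob/master/common/const.py)
+ `character_desc`: System prompt for the bot in normal chat mode. It has no effect in Agent mode, where the prompt is built from files in the workspace.
+ `subscribe_msg`: Subscription message; fill it in for the Official Account and WeCom channels. It is sent automatically when a user subscribes and may contain placeholders. The only placeholder currently supported is {trigger_prefix}, which the program replaces with the bot's trigger word.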
</details>
@@ -312,44 +316,36 @@ sudo docker logs -f chatgpt-on-wechat
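Managing model configuration online through the Web console is recommended, with no need to edit files by hand; see the [model docs](https://docs.cowagent.ai/models). The following explains how to configure models by editing `config.json` manually: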
<details>
<summary>OpenAI</summary>
<summary>DeepSeek</summary>
1. Create an API Key on the [OpenAI platform](https://platform.openai.com/api-keys)
1. Create an API Key on the [DeepSeek platform](https://platform.deepseek.com/api_keys)
2. Fill in the configuration
```json
{
"model": "gpt-5.4",
"open_ai_api_key": "YOUR_API_KEY",
"open_ai_api_base": "https://api.openai.com/v1",
"bot_type": "openai"
}
```
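- `model`: Matches the [model parameter](https://platform.openai.com/docs/models) of the OpenAI API; supported models include gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, the o series, and gpt-4.1. In Agent mode, `gpt-5.4` and `gpt-5.4-mini` are recommended
- `open_ai_api_base`: Change this parameter to connect through a third-party proxy API
- `bot_type`: Not needed when using OpenAI models. When accessing non-OpenAI models such as Claude through a third-party proxy API, set it to `openai`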
</details>
<details>
<summary>LinkAI</summary>
1. Create an API Key on the [LinkAI platform](https://link-ai.tech/console/interface)
2. Fill in the configuration
Option 1: official integration (recommended):
```json
{
"model": "gpt-5.4-mini",
"use_linkai": true,
"linkai_api_key": "YOUR API KEY"
"model": "deepseek-v4-flash",
"deepseek_api_key": "sk-xxxxxxxxxxx"
}
```
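- `model`: Recommended values are `deepseek-v4-flash` and `deepseek-v4-pro`
- `deepseek_api_key`: API Key from the DeepSeek platform
- `deepseek_api_base`: Optional; defaults to `https://api.deepseek.com/v1` and can be changed to a third-party proxy URL
Option 2: OpenAI-compatible integration: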
```json
{
"model": "deepseek-v4-flash",
"bot_type": "openai",
"open_ai_api_key": "sk-xxxxxxxxxxx",
"open_ai_api_base": "https://api.deepseek.com/v1"
}
```
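+ `use_linkai`: Whether to use the LinkAI API. Off by default; when set to true, you can use models on the LinkAI platform along with Agent capabilities such as knowledge bases, workflows, databases, and plugins
+ `linkai_api_key`: API Key from the LinkAI platform; create one in the [console](https://link-ai.tech/console/interface)
+ `model`: Any model in the [model list](https://link-ai.tech/console/models) can be used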
</details>
<details>
@@ -381,6 +377,56 @@ sudo docker logs -f chatgpt-on-wechat
- `open_ai_api_key`: API Key from the MiniMax platform
</details>
<details>
<summary>Claude</summary>
1. Create an API Key in the [Claude console](https://console.anthropic.com/settings/keys)
2. Fill in the configuration
```json
{
"model": "claude-sonnet-4-6",
"claude_api_key": "YOUR_API_KEY"
}
```
- `model`: See the [official model IDs](https://docs.anthropic.com/en/docs/about-claude/models/overview#model-aliases); supported values include `claude-sonnet-4-6, claude-opus-4-7, claude-opus-4-6, claude-sonnet-4-5, claude-sonnet-4-0, claude-opus-4-0, claude-3-5-sonnet-latest`
</details>
<details>
<summary>Gemini</summary>
Create an API Key in the [console](https://aistudio.google.com/app/apikey?hl=zh-cn), then configure as follows
```json
{
"model": "gemini-3.1-flash-lite-preview",
"gemini_api_key": ""
}
```
- `model`: See the [official model list](https://ai.google.dev/gemini-api/docs/models?hl=zh-cn); supported values include `gemini-3.1-flash-lite-preview, gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-3-pro-preview`
</details>
<details>
<summary>OpenAI</summary>
1. Create an API Key on the [OpenAI platform](https://platform.openai.com/api-keys)
2. Fill in the configuration
```json
{
"model": "gpt-5.4",
"open_ai_api_key": "YOUR_API_KEY",
"open_ai_api_base": "https://api.openai.com/v1",
"bot_type": "openai"
}
```
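- `model`: Matches the [model parameter](https://platform.openai.com/docs/models) of the OpenAI API; supported models include gpt-5.4, gpt-5.4-mini, gpt-5.4-nano, the o series, and gpt-4.1. In Agent mode, `gpt-5.4` and `gpt-5.4-mini` are recommended
- `open_ai_api_base`: Change this parameter to connect through a third-party proxy API
- `bot_type`: Not needed when using OpenAI models. When accessing non-OpenAI models such as Claude through a third-party proxy API, set it to `openai`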
</details>
<details>
<summary>Zhipu AI (GLM)</summary>
@@ -388,24 +434,24 @@ sudo docker logs -f chatgpt-on-wechat
```json
{
"model": "glm-5-turbo",
"model": "glm-5.1",
"zhipu_ai_api_key": ""
}
```
- `model`: Accepts `glm-5-turbo, glm-5, glm-4.7, glm-4-plus, glm-4-flash, glm-4-air, glm-4-airx, glm-4-long`, among others; see the [GLM model codes](https://bigmodel.cn/dev/api/normal-model/glm-4)
- `model`: Accepts `glm-5.1, glm-5-turbo, glm-5, glm-4.7, glm-4-plus, glm-4-flash, glm-4-air, glm-4-airx, glm-4-long`, among others; see the [GLM model codes](https://bigmodel.cn/dev/api/normal-model/glm-4)
- `zhipu_ai_api_key`: API Key from the Zhipu AI platform; create one in the [console](https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys)
Option 2: OpenAI-compatible integration, configured as follows:
```json
{
"bot_type": "openai",
"model": "glm-5-turbo",
"model": "glm-5.1",
"open_ai_api_base": "https://open.bigmodel.cn/api/paas/v4",
"open_ai_api_key": ""
}
```
- `bot_type`: OpenAI-compatible mode
- `model`: Accepts `glm-5-turbo, glm-5, glm-4.7, glm-4-plus, glm-4-flash, glm-4-air, glm-4-airx, glm-4-long`
- `model`: Accepts `glm-5.1, glm-5-turbo, glm-5, glm-4.7, glm-4-plus, glm-4-flash, glm-4-air, glm-4-airx, glm-4-long`
- `open_ai_api_base`: Base URL of the Zhipu AI platform
- `open_ai_api_key`: API Key from the Zhipu AI platform
</details>
@@ -439,35 +485,6 @@ sudo docker logs -f chatgpt-on-wechat
- `open_ai_api_key`: API Key for Tongyi Qianwen (Qwen)
</details>
<details>
<summary>Kimi (Moonshot)</summary>
Option 1: official integration, configured as follows:
```json
{
"model": "kimi-k2.5",
"moonshot_api_key": ""
}
```
- `model`: Accepts `kimi-k2.5, kimi-k2, moonshot-v1-8k, moonshot-v1-32k, moonshot-v1-128k`
- `moonshot_api_key`: Moonshot API Key; create one in the [console](https://platform.moonshot.cn/console/api-keys)
Option 2: OpenAI-compatible integration, configured as follows:
```json
{
"bot_type": "openai",
"model": "kimi-k2.5",
"open_ai_api_base": "https://api.moonshot.cn/v1",
"open_ai_api_key": ""
}
```
- `bot_type`: OpenAI-compatible mode
- `model`: Accepts `kimi-k2.5, kimi-k2, moonshot-v1-8k, moonshot-v1-32k, moonshot-v1-128k`
- `open_ai_api_base`: Moonshot base URL
- `open_ai_api_key`: Moonshot API Key
</details>
<details>
<summary>Doubao</summary>
@@ -487,67 +504,74 @@ sudo docker logs -f chatgpt-on-wechat
</details>
<details>
<summary>Claude</summary>
<summary>Kimi (Moonshot)</summary>
1. Create an API Key in the [Claude console](https://console.anthropic.com/settings/keys)
Option 1: official integration, configured as follows:
```json
{
"model": "kimi-k2.6",
"moonshot_api_key": ""
}
```
- `model`: Accepts `kimi-k2.6, kimi-k2.5, kimi-k2, moonshot-v1-8k, moonshot-v1-32k, moonshot-v1-128k`
- `moonshot_api_key`: Moonshot API Key; create one in the [console](https://platform.moonshot.cn/console/api-keys)
Option 2: OpenAI-compatible integration, configured as follows:
```json
{
"bot_type": "openai",
"model": "kimi-k2.6",
"open_ai_api_base": "https://api.moonshot.cn/v1",
"open_ai_api_key": ""
}
```
- `bot_type`: OpenAI-compatible mode
- `model`: Accepts `kimi-k2.6, kimi-k2.5, kimi-k2, moonshot-v1-8k, moonshot-v1-32k, moonshot-v1-128k`
- `open_ai_api_base`: Moonshot base URL
- `open_ai_api_key`: Moonshot API Key
</details>
<details>
<summary>ModelScope</summary>
```json
{
"bot_type": "modelscope",
"model": "Qwen/QwQ-32B",
"modelscope_api_key": "your_api_key",
"modelscope_base_url": "https://api-inference.modelscope.cn/v1/chat/completions",
"text_to_image": "MusePublic/489_ckpt_FLUX_1"
}
```
- `bot_type`: ModelScope API format
- `model`: See the [model list](https://www.modelscope.cn/models?filter=inference_type&page=1)
- `modelscope_api_key`: See the [official docs on access tokens](https://modelscope.cn/docs/accounts/token); available in the [console](https://modelscope.cn/my/myaccesstoken)
- `modelscope_base_url`: Base URL of the ModelScope platform
- `text_to_image`: Image generation model; see the [model list](https://www.modelscope.cn/models?filter=inference_type&page=1)
</details>
<details>
<summary>LinkAI</summary>
1. Create an API Key on the [LinkAI platform](https://link-ai.tech/console/interface)
2. Fill in the configuration
```json
{
"model": "claude-sonnet-4-6",
"claude_api_key": "YOUR_API_KEY"
"model": "gpt-5.4-mini",
"use_linkai": true,
"linkai_api_key": "YOUR API KEY"
}
```
- `model`: See the [official model IDs](https://docs.anthropic.com/en/docs/about-claude/models/overview#model-aliases); supported values include `claude-sonnet-4-6, claude-opus-4-6, claude-sonnet-4-5, claude-sonnet-4-0, claude-opus-4-0, claude-3-5-sonnet-latest`
+ `use_linkai`: Whether to use the LinkAI API. Off by default; when set to true, you can use models on the LinkAI platform along with Agent capabilities such as knowledge bases, workflows, databases, and plugins
+ `linkai_api_key`: API Key from the LinkAI platform; create one in the [console](https://link-ai.tech/console/interface)
+ `model`: Any model in the [model list](https://link-ai.tech/console/models) can be used
</details>
<details>
<summary>Gemini</summary>
Create an API Key in the [console](https://aistudio.google.com/app/apikey?hl=zh-cn), then configure as follows
```json
{
"model": "gemini-3.1-flash-lite-preview",
"gemini_api_key": ""
}
```
- `model`: See the [official model list](https://ai.google.dev/gemini-api/docs/models?hl=zh-cn); supported values include `gemini-3.1-flash-lite-preview, gemini-3.1-pro-preview, gemini-3-flash-preview, gemini-3-pro-preview`
</details>
<details>
<summary>DeepSeek</summary>
1. Create an API Key on the [DeepSeek platform](https://platform.deepseek.com/api_keys)
2. Fill in the configuration
Option 1: official integration (recommended):
```json
{
"model": "deepseek-chat",
"deepseek_api_key": "sk-xxxxxxxxxxx"
}
```
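- `model`: Accepts `deepseek-chat, deepseek-reasoner`, which correspond to DeepSeek-V3.2 (non-thinking mode) and DeepSeek-R1 (thinking mode) respectively
- `deepseek_api_key`: API Key from the DeepSeek platform
- `deepseek_api_base`: Optional; defaults to `https://api.deepseek.com/v1` and can be changed to a third-party proxy URL
Option 2: OpenAI-compatible integration: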
```json
{
"model": "deepseek-chat",
"bot_type": "openai",
"open_ai_api_key": "sk-xxxxxxxxxxx",
"open_ai_api_base": "https://api.deepseek.com/v1"
}
```
</details>
<details>
<summary>Azure</summary>
@@ -575,33 +599,35 @@ API Key 创建:在 [控制台](https://aistudio.google.com/app/apikey?hl=zh-cn
</details>
<details>
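<summary>Baidu Wenxin</summary>
Option 1: official SDK integration, configured as follows:
<summary>Baidu Qianfan / ERNIE</summary>
Option 1: official integration (recommended), configured as follows: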
```json
{
"model": "wenxin-4",
"baidu_wenxin_api_key": "IajztZ0bDxgnP9bEykU7lBer",
"baidu_wenxin_secret_key": "EDPZn6L24uAS9d8RWFfotK47dPvkjD6G"
"model": "ernie-5.0",
"qianfan_api_key": "",
"qianfan_api_base": "https://qianfan.baidubce.com/v2"
}
```
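- `model`: Accepts `wenxin` or `wenxin-4`, corresponding to Wenxin 3.5 and Wenxin 4.0
- `baidu_wenxin_api_key`: See the [Qianfan platform access_token authentication](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/dlv4pct3s) docs to obtain an API Key
- `baidu_wenxin_secret_key`: See the [Qianfan platform access_token authentication](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/dlv4pct3s) docs to obtain a Secret Key
- `model`: `ernie-5.0` is the recommended default (multimodal, can read images directly); `ernie-x1.1`, `ernie-4.5-turbo-128k`, and `ernie-4.5-turbo-32k` are also accepted. When the main model is a text-only ERNIE model, the Vision tool automatically falls back to `ernie-4.5-turbo-vl`
- `qianfan_api_key`: Baidu Qianfan API Key, usually starting with `bce-v3/`; create one in the Baidu AI Cloud console
- `qianfan_api_base`: Optional; defaults to `https://qianfan.baidubce.com/v2`
Option 2: OpenAI-compatible integration, configured as follows: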
```json
{
"bot_type": "openai",
"model": "ERNIE-4.0-Turbo-8K",
"model": "ernie-5.0",
"open_ai_api_base": "https://qianfan.baidubce.com/v2",
"open_ai_api_key": "bce-v3/ALTxxxxxxd2b"
"open_ai_api_key": ""
}
```
- `bot_type`: OpenAI-compatible mode
- `model`: Supports all official models; see the [model list](https://cloud.baidu.com/doc/WENXINWORKSHOP/s/Wm9cvy6rl)
- `open_ai_api_base`: Base URL of the Baidu Wenxin API
- `open_ai_api_key`: Baidu Wenxin API Key; see the [official docs](https://cloud.baidu.com/doc/qianfan-api/s/ym9chdsy5) and create an API Key in the [console](https://console.bce.baidu.com/iam/#/iam/apikey/list)
- `model`: Supports ERNIE models on the Qianfan platform
- `open_ai_api_base`: Base URL of the Baidu Qianfan OpenAI-compatible API
- `open_ai_api_key`: Baidu Qianfan API Key
</details>
@@ -640,26 +666,6 @@ API Key 创建:在 [控制台](https://aistudio.google.com/app/apikey?hl=zh-cn
- `open_ai_api_key`: The [APIPassword](https://console.xfyun.cn/services/bm3) from the iFlytek Spark platform, which varies by model
</details>
<details>
<summary>ModelScope</summary>
```json
{
"bot_type": "modelscope",
"model": "Qwen/QwQ-32B",
"modelscope_api_key": "your_api_key",
"modelscope_base_url": "https://api-inference.modelscope.cn/v1/chat/completions",
"text_to_image": "MusePublic/489_ckpt_FLUX_1"
}
```
- `bot_type`: ModelScope API format
- `model`: See the [model list](https://www.modelscope.cn/models?filter=inference_type&page=1)
- `modelscope_api_key`: See the [official docs on access tokens](https://modelscope.cn/docs/accounts/token); available in the [console](https://modelscope.cn/my/myaccesstoken)
- `modelscope_base_url`: Base URL of the ModelScope platform
- `text_to_image`: Image generation model; see the [model list](https://www.modelscope.cn/models?filter=inference_type&page=1)
</details>
<details>
<summary>Coding Plan</summary>
@@ -722,36 +728,26 @@ Coding Plan 是各厂商推出的编程包月套餐,所有厂商均可通过 O
<details>
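<summary>3. Feishu</summary>
Feishu supports two event-receiving modes: WebSocket long connection (recommended) and Webhook
Feishu uses WebSocket long-connection mode and needs no public IP. For detailed steps, see [Feishu integration](https://docs.cowagent.ai/channels/feishu)
**Option 1: WebSocket mode (recommended, no public IP required)**
**Option 1: one-click creation by QR scan (recommended)**
After starting Cow, open the Web console and go to **Channels** → **Add channel** → select **Feishu** → scan the QR code to create the app. The CLI can also print the QR code in the terminal at startup.
**Option 2: manual configuration**
Create a custom app on the Feishu Open Platform, configure its permissions, then fill the credentials into `config.json`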
```json
{
"channel_type": "feishu",
"feishu_app_id": "APP_ID",
"feishu_app_secret": "APP_SECRET",
"feishu_event_mode": "websocket"
"feishu_stream_reply": true
}
```
**Option 2: Webhook mode (requires a public IP)**
```json
{
"channel_type": "feishu",
"feishu_app_id": "APP_ID",
"feishu_app_secret": "APP_SECRET",
"feishu_token": "VERIFICATION_TOKEN",
"feishu_event_mode": "webhook",
"feishu_port": 9891
}
```
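- `feishu_event_mode`: Event-receiving mode, `websocket` (recommended) or `webhook`
- WebSocket mode requires the dependency: `pip3 install lark-oapi`
For detailed steps and parameter descriptions, see [Feishu integration](https://docs.cowagent.ai/channels/feishu)
- `feishu_stream_reply`: Whether to enable streaming typewriter-style replies; on by default (requires the `cardkit:card:write` permission and Feishu client ≥ 7.20)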
</details>
@@ -773,7 +769,15 @@ Coding Plan 是各厂商推出的编程包月套餐,所有厂商均可通过 O
<details>
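<summary>5. WeCom Bot (WeCom intelligent bot)</summary>
The WeCom Bot uses WebSocket long-connection mode and needs no public IP or domain; configuration is simple:
The WeCom Bot uses WebSocket long-connection mode and needs no public IP or domain. For detailed steps, see [WeCom Bot integration](https://docs.cowagent.ai/channels/wecom-bot).
**Option 1: one-click creation by QR scan (recommended)**
After starting Cow, open the Web console and go to **Channels** → **Add channel** → select **WeCom Bot** → scan with WeCom to create it.
**Option 2: manual configuration**
Create an intelligent bot in WeCom and select **long-connection mode**, note the Bot ID and Secret, then fill them into `config.json`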
```json
{
@@ -782,7 +786,6 @@ Coding Plan 是各厂商推出的编程包月套餐,所有厂商均可通过 O
"wecom_bot_secret": "YOUR_SECRET"
}
```
For detailed steps and parameter descriptions, see [WeCom Bot integration](https://docs.cowagent.ai/channels/wecom-bot)
</details>

View File

@@ -0,0 +1,241 @@
"""
SessionService - Manages multi-session lifecycle for both web channel and cloud client.

Provides a unified interface for listing, deleting, renaming, clearing context,
and generating AI titles for conversation sessions. Backed by ConversationStore
(SQLite) and AgentBridge (in-memory agent instances).
"""

import re
from typing import Optional

from common.log import logger


def _truncate_fallback_title(user_message: str, max_len: int = 30) -> str:
    """Pick the first non-empty line of the user message and truncate it."""
    if not user_message:
        return "New Chat"
    first_line = ""
    for line in user_message.splitlines():
        line = line.strip()
        if line:
            first_line = line
            break
    if not first_line:
        return "New Chat"
    if len(first_line) > max_len:
        first_line = first_line[:max_len].rstrip() + "..."
    return first_line


def generate_session_title(user_message: str, assistant_reply: str = "") -> str:
    """
    Generate a short session title by calling the current bot's reply_text.

    Falls back to the first line of the user message if the LLM call fails
    or returns an obvious error sentinel.
    """
    fallback = _truncate_fallback_title(user_message)
    try:
        from bridge.bridge import Bridge
        from models.session_manager import Session

        bot = Bridge().get_bot("chat")
        prompt_parts = [f"User: {user_message[:300]}"]
        if assistant_reply:
            prompt_parts.append(f"Assistant: {assistant_reply[:300]}")
        session = Session("__title_gen__", system_prompt="")
        session.messages = [
            {"role": "user", "content": (
                "Generate a very short title (max 15 characters for Chinese, max 6 words for English) "
                "summarizing this conversation. Return ONLY the title text, nothing else.\n\n"
                + "\n".join(prompt_parts)
            )}
        ]
        result = bot.reply_text(session) or {}

        # When bots fail (network error, auth error, rate limit, etc.) they
        # typically return completion_tokens=0 with a sentinel content like
        # "请再问我一次吧" / "我现在有点累了". Treat that as failure.
        completion_tokens = result.get("completion_tokens", 0) or 0
        raw = (result.get("content") or "").strip()
        if completion_tokens <= 0:
            logger.warning(
                f"[SessionService] Title generation got empty completion "
                f"(completion_tokens={completion_tokens}, content='{raw[:50]}'), "
                f"using fallback")
            return fallback

        title = re.sub(r'<think>.*?</think>', '', raw, flags=re.DOTALL).strip().strip('"\'')
        logger.info(f"[SessionService] Title generation result: '{title}' (len={len(title)})")
        if title and len(title) <= 50:
            return title
    except Exception as e:
        logger.warning(f"[SessionService] Title generation failed: {e}")
    return fallback


class SessionService:
    """
    High-level service for session lifecycle management.

    Usage:
        svc = SessionService()
        result = svc.dispatch("list", {"channel_type": "web", "page": 1})
    """

    def _get_store(self):
        from agent.memory import get_conversation_store
        return get_conversation_store()

    def _remove_agent(self, session_id: str):
        """Remove the in-memory Agent instance for a session if it exists."""
        try:
            from bridge.bridge import Bridge
            ab = Bridge().get_agent_bridge()
            if session_id in ab.agents:
                del ab.agents[session_id]
                logger.info(f"[SessionService] Removed agent instance: {session_id}")
        except Exception:
            pass

    @staticmethod
    def _normalize_sid(session_id: str) -> str:
        if session_id and not session_id.startswith("session_"):
            return f"session_{session_id}"
        return session_id

    # ------------------------------------------------------------------
    # actions
    # ------------------------------------------------------------------
    def list_sessions(self, channel_type: Optional[str] = None,
                      page: int = 1, page_size: int = 50) -> dict:
        store = self._get_store()
        return store.list_sessions(
            channel_type=channel_type,
            page=page,
            page_size=page_size,
        )

    def delete_session(self, session_id: str) -> None:
        if not session_id:
            raise ValueError("session_id required")
        session_id = self._normalize_sid(session_id)
        store = self._get_store()
        store.clear_session(session_id)
        self._remove_agent(session_id)
        logger.info(f"[SessionService] Session deleted: {session_id}")

    def rename_session(self, session_id: str, title: str) -> None:
        if not session_id:
            raise ValueError("session_id required")
        if not title:
            raise ValueError("title required")
        session_id = self._normalize_sid(session_id)
        store = self._get_store()
        found = store.rename_session(session_id, title)
        if not found:
            raise ValueError("session not found")

    def clear_context(self, session_id: str) -> int:
        """
        Set context boundary. Returns the new context_start_seq value.
        """
        if not session_id:
            raise ValueError("session_id required")
        session_id = self._normalize_sid(session_id)
        store = self._get_store()
        new_seq = store.clear_context(session_id)
        self._remove_agent(session_id)
        return new_seq

    def gen_title(self, session_id: str, user_message: str,
                  assistant_reply: str = "") -> str:
        """
        Generate an AI title and persist it. Returns the generated title.
        """
        if not session_id:
            raise ValueError("session_id required")
        if not user_message:
            raise ValueError("user_message required")
        session_id = self._normalize_sid(session_id)
        title = generate_session_title(user_message, assistant_reply)
        store = self._get_store()
        updated = store.rename_session(session_id, title)
        logger.info(f"[SessionService] Title set: sid={session_id}, "
                    f"title='{title}', db_updated={updated}")
        return title

    # ------------------------------------------------------------------
    # dispatch - single entry point for protocol messages
    # ------------------------------------------------------------------
    def dispatch(self, action: str, payload: Optional[dict] = None) -> dict:
        """
        Dispatch a session management action and return a protocol-compatible
        response dict.

        Action names use a ``*_session`` / session-prefixed convention so they
        can coexist with history actions (e.g. ``query``) on the same HISTORY
        message channel without ambiguity.

        Supported actions:
        - list_sessions: list sessions with pagination
        - delete_session: delete a session
        - rename_session: rename a session title
        - clear_context: set context boundary
        - generate_title: AI-generate a session title

        :param action: one of the above action names
        :param payload: action-specific payload
        :return: dict with action, code, message, payload
        """
        payload = payload or {}
        try:
            if action == "list_sessions":
                result = self.list_sessions(
                    channel_type=payload.get("channel_type"),
                    page=int(payload.get("page", 1)),
                    page_size=int(payload.get("page_size", 50)),
                )
                return {"action": action, "code": 200, "message": "success", "payload": result}
            elif action == "delete_session":
                self.delete_session(payload.get("session_id", ""))
                return {"action": action, "code": 200, "message": "success", "payload": None}
            elif action == "rename_session":
                self.rename_session(
                    payload.get("session_id", ""),
                    payload.get("title", "").strip(),
                )
                return {"action": action, "code": 200, "message": "success", "payload": None}
            elif action == "clear_context":
                new_seq = self.clear_context(payload.get("session_id", ""))
                return {"action": action, "code": 200, "message": "success",
                        "payload": {"context_start_seq": new_seq}}
            elif action == "generate_title":
                title = self.gen_title(
                    payload.get("session_id", ""),
                    payload.get("user_message", ""),
                    payload.get("assistant_reply", ""),
                )
                return {"action": action, "code": 200, "message": "success",
                        "payload": {"title": title}}
            else:
                return {"action": action, "code": 400,
                        "message": f"unknown action: {action}", "payload": None}
        except ValueError as e:
            return {"action": action, "code": 400, "message": str(e), "payload": None}
        except Exception as e:
            logger.error(f"[SessionService] dispatch error: action={action}, error={e}")
            return {"action": action, "code": 500, "message": str(e), "payload": None}
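The title handling in `generate_session_title` above boils down to two pure steps: pick a fallback from the first non-empty line of the user message, then clean the model output and fall back when it is unusable. The following is a minimal standalone sketch of those two steps, with no bot call. The names `first_line_fallback` and `clean_title` are hypothetical and do not appear in the commit.

```python
import re


def first_line_fallback(user_message: str, max_len: int = 30) -> str:
    """Return the first non-empty line, truncated to max_len characters."""
    for line in (user_message or "").splitlines():
        line = line.strip()
        if line:
            if len(line) > max_len:
                return line[:max_len].rstrip() + "..."
            return line
    return "New Chat"


def clean_title(raw: str, fallback: str, max_len: int = 50) -> str:
    """Drop <think> blocks and surrounding quotes; use fallback if unusable."""
    title = re.sub(r"<think>.*?</think>", "", raw or "", flags=re.DOTALL)
    title = title.strip().strip("\"'")
    if title and len(title) <= max_len:
        return title
    return fallback
```

Keeping these steps free of I/O makes the fallback behaviour easy to check without a live model.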


@@ -34,7 +34,8 @@ class KnowledgeService:
# ------------------------------------------------------------------
def list_tree(self) -> dict:
"""
Return the knowledge directory tree grouped by category.
Return the knowledge directory tree grouped by category,
supporting arbitrarily nested sub-directories.
Returns::
@@ -44,10 +45,20 @@ class KnowledgeService:
"dir": "concepts",
"files": [
{"name": "moe.md", "title": "MoE", "size": 1234},
...
],
"children": []
},
{
"dir": "platform",
"files": [],
"children": [
{
"dir": "analysis",
"files": [{"name": "perf.md", ...}],
"children": []
}
]
},
...
],
"stats": {"pages": 15, "size": 32768},
"enabled": true
@@ -56,37 +67,48 @@ class KnowledgeService:
if not os.path.isdir(self.knowledge_dir):
return {"tree": [], "stats": {"pages": 0, "size": 0}, "enabled": conf().get("knowledge", True)}
tree = []
total_files = 0
total_bytes = 0
for name in sorted(os.listdir(self.knowledge_dir)):
full = os.path.join(self.knowledge_dir, name)
if not os.path.isdir(full) or name.startswith("."):
continue
files = []
for fname in sorted(os.listdir(full)):
if fname.endswith(".md") and not fname.startswith("."):
fpath = os.path.join(full, fname)
size = os.path.getsize(fpath)
total_files += 1
total_bytes += size
title = fname.replace(".md", "")
try:
with open(fpath, "r", encoding="utf-8") as f:
first_line = f.readline().strip()
if first_line.startswith("# "):
title = first_line[2:].strip()
except Exception:
pass
files.append({"name": fname, "title": title, "size": size})
tree.append({"dir": name, "files": files})
stats = {"pages": 0, "size": 0}
root_files, tree = self._scan_dir(self.knowledge_dir, stats, is_root=True)
return {
"root_files": root_files,
"tree": tree,
"stats": {"pages": total_files, "size": total_bytes},
"stats": stats,
"enabled": conf().get("knowledge", True),
}
    def _scan_dir(self, dir_path: str, stats: dict, is_root: bool = False) -> tuple:
        """
        Recursively scan a directory.

        :return: (files, children) where files is a list of .md file dicts
                 in this directory and children is a list of sub-directory nodes.
        """
        files = []
        children = []
        for name in sorted(os.listdir(dir_path)):
            if name.startswith("."):
                continue
            full = os.path.join(dir_path, name)
            if os.path.isdir(full):
                sub_files, sub_children = self._scan_dir(full, stats)
                children.append({"dir": name, "files": sub_files, "children": sub_children})
            elif name.endswith(".md"):
                size = os.path.getsize(full)
                if not is_root:
                    stats["pages"] += 1
                    stats["size"] += size
                title = name.replace(".md", "")
                try:
                    with open(full, "r", encoding="utf-8") as f:
                        first_line = f.readline().strip()
                        if first_line.startswith("# "):
                            title = first_line[2:].strip()
                except Exception:
                    pass
                files.append({"name": name, "title": title, "size": size})
        return files, children
# ------------------------------------------------------------------
# read — single file content
# ------------------------------------------------------------------
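The recursive scan introduced in `list_tree` can be tried in isolation. The sketch below is a simplified standalone variant (title extraction omitted) that builds a small temporary directory and scans it. `scan_dir` and `build_demo_tree` are illustrative names, not part of the commit.

```python
import os
import tempfile


def scan_dir(dir_path: str, stats: dict, is_root: bool = False):
    """Return (files, children) for dir_path, updating stats in place."""
    files, children = [], []
    for name in sorted(os.listdir(dir_path)):
        if name.startswith("."):
            continue  # skip hidden files and directories
        full = os.path.join(dir_path, name)
        if os.path.isdir(full):
            sub_files, sub_children = scan_dir(full, stats)
            children.append({"dir": name, "files": sub_files, "children": sub_children})
        elif name.endswith(".md"):
            size = os.path.getsize(full)
            if not is_root:
                # As in the hunk above, root-level files are not counted in stats.
                stats["pages"] += 1
                stats["size"] += size
            files.append({"name": name, "size": size})
    return files, children


def build_demo_tree():
    """Create root/index.md and root/platform/analysis/perf.md, then scan."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "platform", "analysis"))
        for rel in ("index.md", os.path.join("platform", "analysis", "perf.md")):
            with open(os.path.join(root, rel), "w", encoding="utf-8") as f:
                f.write("# Title\n")
        stats = {"pages": 0, "size": 0}
        root_files, tree = scan_dir(root, stats, is_root=True)
        return root_files, tree, stats
```

Passing one mutable `stats` dict through the recursion avoids merging per-directory totals on the way back up.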


@@ -139,6 +139,7 @@ def _extract_tool_results(content: Any) -> Dict[str, str]:
def _group_into_display_turns(
rows: List[tuple],
include_thinking: bool = True,
) -> List[Dict[str, Any]]:
"""
Convert raw (role, content_json, created_at) DB rows into display turns.
@@ -216,6 +217,8 @@ def _group_into_display_turns(
continue
btype = block.get("type")
if btype == "thinking":
if not include_thinking:
continue
txt = block.get("thinking", "").strip()
if txt:
steps.append({"type": "thinking", "content": txt})
@@ -496,6 +499,107 @@ class ConversationStore:
finally:
conn.close()
    def prune_scheduled_messages(
        self,
        session_id: str,
        keep_last_n: int,
        markers: Optional[List[str]] = None,
    ) -> int:
        """
        Keep at most ``keep_last_n`` scheduler-injected user/assistant pairs in
        the session, deleting the older ones.

        A scheduler-injected pair is identified by a user message whose first
        text block starts with one of ``markers``; the immediately following
        assistant message (next seq) is treated as its paired output.

        Only scheduler-tagged messages are touched; regular user turns are
        never deleted. Safe to call repeatedly; no-op if nothing to prune.

        Args:
            session_id: Session to prune.
            keep_last_n: Maximum scheduler pairs to retain (must be >= 0).
            markers: Text prefixes that identify scheduler user messages.
                Defaults to ``["[SCHEDULED]", "Scheduled task"]`` so that
                pairs written by older versions are also recognised.

        Returns:
            Number of message rows deleted.
        """
        if keep_last_n < 0:
            keep_last_n = 0
        if markers is None:
            markers = ["[SCHEDULED]", "Scheduled task"]

        def _matches_marker(raw_content: str) -> bool:
            try:
                parsed = json.loads(raw_content)
            except Exception:
                parsed = raw_content
            text = _extract_display_text(parsed) if not isinstance(parsed, str) else parsed
            if not text:
                return False
            return any(text.startswith(m) for m in markers)

        with self._lock:
            conn = self._connect()
            try:
                rows = conn.execute(
                    """
                    SELECT seq, role, content
                    FROM messages
                    WHERE session_id = ?
                    ORDER BY seq ASC
                    """,
                    (session_id,),
                ).fetchall()

                # Find scheduler pairs: each is (user_seq, assistant_seq?)
                pairs: List[tuple] = []  # list of (user_seq, assistant_seq_or_None)
                for idx, (seq, role, raw_content) in enumerate(rows):
                    if role != "user" or not _matches_marker(raw_content):
                        continue
                    assistant_seq = None
                    # Pair with the very next message if it's an assistant turn.
                    if idx + 1 < len(rows):
                        next_seq, next_role, _ = rows[idx + 1]
                        if next_role == "assistant":
                            assistant_seq = next_seq
                    pairs.append((seq, assistant_seq))

                if len(pairs) <= keep_last_n:
                    return 0

                to_delete_pairs = pairs[: len(pairs) - keep_last_n]
                seqs_to_delete: List[int] = []
                for user_seq, assistant_seq in to_delete_pairs:
                    seqs_to_delete.append(user_seq)
                    if assistant_seq is not None:
                        seqs_to_delete.append(assistant_seq)
                if not seqs_to_delete:
                    return 0

                placeholders = ",".join("?" * len(seqs_to_delete))
                with conn:
                    conn.execute(
                        f"DELETE FROM messages WHERE session_id = ? AND seq IN ({placeholders})",
                        (session_id, *seqs_to_delete),
                    )
                    conn.execute(
                        """
                        UPDATE sessions
                        SET msg_count = (
                            SELECT COUNT(*) FROM messages WHERE session_id = ?
                        )
                        WHERE session_id = ?
                        """,
                        (session_id, session_id),
                    )
                return len(seqs_to_delete)
            finally:
                conn.close()
def cleanup_old_sessions(self, max_age_days: Optional[int] = None) -> int:
"""
Delete sessions that have not been active within max_age_days.
@@ -601,9 +705,17 @@ class ConversationStore:
finally:
conn.close()
# Honour the current enable_thinking switch when building display turns
# so that toggling it off hides previously-saved thinking blocks too.
try:
from config import conf
include_thinking = bool(conf().get("enable_thinking", False))
except Exception:
include_thinking = False
# Strip seq for display grouping, but record max seq per visible user group
plain_rows = [(role, content, created_at) for _seq, role, content, created_at in rows]
visible = _group_into_display_turns(plain_rows)
visible = _group_into_display_turns(plain_rows, include_thinking=include_thinking)
# Build a mapping: find the seq of each visible user message to annotate context boundary.
# Walk through rows to find visible user message seqs in order.
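The pair-selection step inside `prune_scheduled_messages` can be illustrated without a database. The sketch below works on in-memory `(seq, role, text)` rows and returns the `seq` values that would be deleted. `seqs_to_prune` is a hypothetical helper, and it assumes the message text has already been extracted from the stored JSON content.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, str, str]  # (seq, role, text)


def seqs_to_prune(
    rows: List[Row],
    keep_last_n: int,
    markers: Tuple[str, ...] = ("[SCHEDULED]", "Scheduled task"),
) -> List[int]:
    """Return seqs of the oldest scheduler pairs beyond keep_last_n."""
    keep_last_n = max(keep_last_n, 0)
    pairs: List[Tuple[int, Optional[int]]] = []
    for idx, (seq, role, text) in enumerate(rows):
        if role != "user" or not text.startswith(markers):
            continue
        assistant_seq = None
        # Pair only with the immediately following assistant message.
        if idx + 1 < len(rows) and rows[idx + 1][1] == "assistant":
            assistant_seq = rows[idx + 1][0]
        pairs.append((seq, assistant_seq))
    if len(pairs) <= keep_last_n:
        return []
    doomed: List[int] = []
    for user_seq, assistant_seq in pairs[: len(pairs) - keep_last_n]:
        doomed.append(user_seq)
        if assistant_seq is not None:
            doomed.append(assistant_seq)
    return doomed
```

Regular user turns never match a marker, so they cannot end up in the returned list.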


@@ -57,6 +57,7 @@ MEMORY.md 会注入每次对话的系统提示词中,因此必须保持精炼
- **清理无效**:删除临时性记录、空白条目、格式残留、无意义、重复内容等
- **删除冗余**:已被更精炼表述涵盖的旧条目应删除,避免信息重复
- 每条一行,用 "- " 开头,不带日期前缀
- 可用 "## 标题" 对相关条目分组,使结构更清晰
- 目标:控制在 50 条以内,每条尽量一句话概括
### Part 2: 梦境日记([DREAM]
@@ -114,7 +115,7 @@ class MemoryFlushManager:
self.last_flush_timestamp: Optional[datetime] = None
self._trim_flushed_hashes: set = set() # Content hashes of already-flushed messages
self._last_flushed_content_hash: str = "" # Content hash at last flush, for daily dedup
self._last_dream_input_hash: str = "" # Hash of dream input, for dedup
self._last_dream_input_hash: str = "" # "{date}:{daily_hash}" of last dream, for dedup
self._last_flush_thread: Optional[threading.Thread] = None
def get_today_memory_file(self, user_id: Optional[str] = None, ensure_exists: bool = False) -> Path:
@@ -174,6 +175,15 @@ class MemoryFlushManager:
injection.
"""
try:
# Strip scheduler-injected pairs before any further processing.
# These messages already serve as short-term context inside the
# receiver session; promoting them into long-term daily memory
# produces low-value flat logs (e.g. "11:28 price=1013, normal /
# 11:58 price=1013, normal / ...") and wastes summarisation tokens.
messages = self._strip_scheduler_pairs(messages)
if not messages:
return False
import hashlib
deduped = []
for m in messages:
@@ -322,13 +332,18 @@ class MemoryFlushManager:
logger.info("[DeepDream] No recent daily records, skipping to preserve existing MEMORY.md")
return False
# Dedup: skip if input materials haven't changed since last dream
# Dedup: skip if same daily content already dreamed today.
# Note: only hash daily_content (not memory_content), because deep_dream
# itself rewrites MEMORY.md as a side effect, which would otherwise
# invalidate the hash on every subsequent call within the same window.
import hashlib
input_hash = hashlib.md5((memory_content + daily_content).encode("utf-8")).hexdigest()
if not force and input_hash == self._last_dream_input_hash:
logger.debug("[DeepDream] Input unchanged since last dream, skipping")
daily_hash = hashlib.md5(daily_content.encode("utf-8")).hexdigest()
today_str = datetime.now().strftime("%Y-%m-%d")
dedup_key = f"{today_str}:{daily_hash}"
if not force and dedup_key == self._last_dream_input_hash:
logger.info("[DeepDream] Already dreamed today with same daily content, skipping")
return False
self._last_dream_input_hash = input_hash
self._last_dream_input_hash = dedup_key
logger.info(
f"[DeepDream] Materials collected: "
@@ -641,6 +656,40 @@ class MemoryFlushManager:
return "\n".join(parts)
return ""
    @classmethod
    def _strip_scheduler_pairs(cls, messages: List[Dict]) -> List[Dict]:
        """Drop scheduler-injected user/assistant pairs from a flush batch.

        A scheduler user message starts with the ``[SCHEDULED]`` marker
        (written by ``AgentBridge.remember_scheduled_output``); the message
        immediately following it (if it is an assistant turn) is its paired
        output and is dropped together. Regular user/assistant turns and
        any tool_use / tool_result blocks are preserved as-is.
        """
        if not messages:
            return messages
        SCHEDULED_PREFIX = "[SCHEDULED]"
        result = []
        skip_next_assistant = False
        for msg in messages:
            if not isinstance(msg, dict):
                result.append(msg)
                skip_next_assistant = False
                continue
            role = msg.get("role")
            if skip_next_assistant and role == "assistant":
                skip_next_assistant = False
                continue
            skip_next_assistant = False
            if role == "user":
                text = cls._extract_text_from_content(msg.get("content", ""))
                if text.lstrip().startswith(SCHEDULED_PREFIX):
                    skip_next_assistant = True
                    continue
            result.append(msg)
        return result
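The DeepDream dedup change earlier in this file replaces a hash over `memory_content + daily_content` with a `"{date}:{daily_hash}"` key. The sketch below shows that keying scheme as a small standalone class. `DreamDedup` and `should_run` are hypothetical names, and the date is passed in explicitly to keep the example deterministic.

```python
import hashlib
from datetime import date


class DreamDedup:
    """Remember the last '{date}:{md5(daily_content)}' key that was processed."""

    def __init__(self) -> None:
        self._last_key = ""

    def should_run(self, daily_content: str, today: date, force: bool = False) -> bool:
        digest = hashlib.md5(daily_content.encode("utf-8")).hexdigest()
        key = f"{today.isoformat()}:{digest}"
        if not force and key == self._last_key:
            return False  # same daily content already handled today
        self._last_key = key
        return True
```

Because the key ignores MEMORY.md, a run that rewrites MEMORY.md does not invalidate the key for the next call on the same day.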
def create_memory_files_if_needed(workspace_dir: Path, user_id: Optional[str] = None):
"""


@@ -13,6 +13,37 @@ from agent.tools.base_tool import BaseTool, ToolResult
from common.log import logger
# Maximum number of characters of model "reasoning / thinking" content to persist
# in conversation history. The full reasoning is still streamed to the UI in real
# time (subject to its own SSE / rendering limits); this bound only controls what
# is stored in DB and replayed in history. Long reasoning is not useful for later
# context (the LLM never sees thinking blocks anyway) and bloats DB.
# Keep aligned with the frontend REASONING_RENDER_CAP and the SSE
# MAX_REASONING_STREAM_CHARS so that storage / stream / display all match.
MAX_STORED_REASONING_CHARS = 4 * 1024  # 4 KB

# Marker inserted between head and tail when reasoning is truncated.
_REASONING_TRUNCATE_MARKER = "\n\n... [reasoning truncated, {omitted} chars omitted] ...\n\n"


def _truncate_reasoning_for_storage(text: str) -> str:
    """Trim long reasoning to head + tail with an omission marker.

    Keeps the first and last halves of MAX_STORED_REASONING_CHARS so both the
    initial chain-of-thought and the final conclusions are preserved for UI
    replay, without storing the entire (often very large) middle.
    """
    if not text:
        return text
    if len(text) <= MAX_STORED_REASONING_CHARS:
        return text
    half = MAX_STORED_REASONING_CHARS // 2
    head = text[:half]
    tail = text[-half:]
    omitted = len(text) - len(head) - len(tail)
    return head + _REASONING_TRUNCATE_MARKER.format(omitted=omitted) + tail


class AgentStreamExecutor:
"""
Agent Stream Executor
@@ -79,22 +110,47 @@ class AgentStreamExecutor:
logger.error(f"Event callback error: {e}")
def _is_thinking_enabled(self) -> bool:
"""Whether deep-thinking mode is on at the model layer.
Mirrors the global toggle used by ``bridge.agent_bridge`` when deciding
whether to send ``thinking={"type": "enabled"}`` to the model. Used for
logging and reasoning-update event emission across all channels.
"""
from config import conf
return bool(conf().get("enable_thinking", False))
def _should_render_thinking_inline(self) -> bool:
"""Whether ``<think>...</think>`` blocks embedded directly in ``content``
(MiniMax, some third-party proxies) should be surfaced to the channel.
Only the Web console can render them in a collapsible panel. IM channels
(WeChat/WeCom/DingTalk/Feishu) must strip them, otherwise users see raw
XML tags in their chat.
"""
from config import conf
channel_type = getattr(self.model, 'channel_type', '') or ''
return conf().get("enable_thinking", True) and channel_type == 'web'
return conf().get("enable_thinking", False) and channel_type == 'web'
def _filter_think_tags(self, text: str) -> str:
"""
Remove <think> and </think> tags but keep the content inside.
Some LLM providers (e.g., MiniMax) may return thinking process wrapped in <think> tags.
We only remove the tags themselves, keeping the actual thinking content.
Handle <think>...</think> blocks in content returned by some LLM providers
(e.g., MiniMax).
- When inline thinking rendering is allowed (Web + thinking enabled):
remove only the tags, keep the content inside.
- Otherwise (IM channels, or thinking disabled globally): remove both
the tags and the content entirely.
"""
if not text:
return text
import re
# Remove only the <think> and </think> tags, keep the content
text = re.sub(r'<think>', '', text)
text = re.sub(r'</think>', '', text)
if self._should_render_thinking_inline():
text = re.sub(r'<think>', '', text)
text = re.sub(r'</think>', '', text)
else:
text = re.sub(r'<think>[\s\S]*?</think>', '', text)
# Also strip unclosed <think> tag at the end (streaming partial)
text = re.sub(r'<think>[\s\S]*$', '', text)
return text
def _hash_args(self, args: dict) -> str:
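The two `<think>` handling modes described in the `_filter_think_tags` docstring above can be sketched as a pure function. Here the boolean `render_inline` stands in for the result of `_should_render_thinking_inline()`, and `filter_think` is a hypothetical name used only for illustration.

```python
import re


def filter_think(text: str, render_inline: bool) -> str:
    """Keep thinking text (tags removed) or drop whole <think> blocks."""
    if not text:
        return text
    if render_inline:
        # Web + thinking enabled: remove only the tags, keep the content.
        return text.replace("<think>", "").replace("</think>", "")
    # Otherwise: remove closed blocks, then any unclosed trailing block
    # (which can occur while a response is still streaming).
    text = re.sub(r"<think>[\s\S]*?</think>", "", text)
    return re.sub(r"<think>[\s\S]*$", "", text)
```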
@@ -185,8 +241,8 @@ class AgentStreamExecutor:
# Log user message with model info
thinking_enabled = self._is_thinking_enabled()
thinking_label = "💭 thinking" if thinking_enabled else "⚡ fast"
logger.info(f"🤖 {self.model.model} | {thinking_label} | 👤 {user_message}")
thinking_label = " | 💭 thinking" if thinking_enabled else ""
logger.info(f"🤖 {self.model.model}{thinking_label} | 👤 {user_message}")
# Add user message (Claude format - use content blocks for consistency)
self.messages.append({
@@ -235,6 +291,9 @@ class AgentStreamExecutor:
if turn > 1:
logger.info(f"[Agent] Requesting explicit response from LLM...")
# Remember position so we can remove the injected prompt later
prompt_insert_idx = len(self.messages)
# Add a message that explicitly asks the model to reply to the user
self.messages.append({
"role": "user",
@@ -248,8 +307,24 @@ class AgentStreamExecutor:
assistant_msg, tool_calls = self._call_llm_stream(retry_on_empty=False)
final_response = assistant_msg
# Only use the fallback if the response is still empty
if not assistant_msg and not tool_calls:
# Remove the injected prompt from history so it doesn't
# appear as a user message in persisted conversations.
# _call_llm_stream may have appended an assistant message
# after the prompt, so we locate and remove only the prompt.
if (prompt_insert_idx < len(self.messages)
and self.messages[prompt_insert_idx].get("role") == "user"):
self.messages.pop(prompt_insert_idx)
logger.debug("[Agent] Removed injected explicit-response prompt from message history")
# If LLM responded with tool_calls instead of text, fall through
# to the tool execution path below (don't break the loop).
if tool_calls:
logger.info(
f"[Agent] LLM returned tool_calls in explicit-response retry, "
f"continuing to execute tools instead of breaking"
)
elif not assistant_msg:
# Still empty (no text and no tool_calls): use fallback
logger.warning(f"[Agent] Still empty after explicit request")
final_response = (
"抱歉,我暂时无法生成回复。请尝试换一种方式描述你的需求,或稍后再试。"
@@ -264,20 +339,28 @@ class AgentStreamExecutor:
else:
logger.info(f"💭 {assistant_msg[:150]}{'...' if len(assistant_msg) > 150 else ''}")
logger.debug(f"✅ 完成 (无工具调用)")
self._emit_event("turn_end", {
"turn": turn,
"has_tool_calls": False
})
break
# If the explicit-response retry produced tool_calls, skip the break
# and continue down to the tool execution branch in this same iteration.
if not tool_calls:
logger.debug(f"✅ 完成 (无工具调用)")
self._emit_event("turn_end", {
"turn": turn,
"has_tool_calls": False
})
break
# Log tool calls with arguments
# Log tool calls with arguments (truncate long values like base64)
tool_calls_str = []
for tc in tool_calls:
# Safely handle None or missing arguments
args = tc.get('arguments') or {}
if isinstance(args, dict):
args_str = ', '.join([f"{k}={v}" for k, v in args.items()])
parts = []
for k, v in args.items():
v_str = str(v)
if len(v_str) > 200:
v_str = v_str[:200] + f"...({len(v_str)} chars)"
parts.append(f"{k}={v_str}")
args_str = ', '.join(parts)
if args_str:
tool_calls_str.append(f"{tc['name']}({args_str})")
else:
@@ -631,8 +714,11 @@ class AgentStreamExecutor:
tool_calls_buffer[index]["arguments"] += func["arguments"]
# Preserve _gemini_raw_parts for Gemini thoughtSignature round-trip
# (direct Gemini: list of parts; LinkAI proxy: base64 string of JSON parts)
if "_gemini_raw_parts" in delta:
gemini_raw_parts = delta["_gemini_raw_parts"]
elif isinstance(choice, dict) and choice.get("_gemini_raw_parts"):
gemini_raw_parts = choice["_gemini_raw_parts"]
except Exception as e:
error_str = str(e)
@@ -799,9 +885,15 @@ class AgentStreamExecutor:
assistant_msg = {"role": "assistant", "content": []}
if full_reasoning:
stored_reasoning = _truncate_reasoning_for_storage(full_reasoning)
if len(stored_reasoning) < len(full_reasoning):
logger.info(
f"[reasoning] truncated for storage: "
f"{len(full_reasoning)} -> {len(stored_reasoning)} chars"
)
assistant_msg["content"].append({
"type": "thinking",
"thinking": full_reasoning
"thinking": stored_reasoning
})
if full_content:
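The head-plus-tail truncation applied to stored reasoning in this file can be exercised on its own. The sketch below mirrors `_truncate_reasoning_for_storage` with the limit as a parameter; `truncate_head_tail` is an illustrative name, and the `max(..., 1)` guard for very small limits is an addition that is not in the commit.

```python
MAX_CHARS = 4 * 1024
MARKER = "\n\n... [reasoning truncated, {omitted} chars omitted] ...\n\n"


def truncate_head_tail(text: str, limit: int = MAX_CHARS) -> str:
    """Keep the first and last limit // 2 characters, with a marker between."""
    if not text or len(text) <= limit:
        return text
    half = max(limit // 2, 1)  # guard: text[-0:] would return the whole string
    head, tail = text[:half], text[-half:]
    omitted = len(text) - len(head) - len(tail)
    # The result is slightly longer than `limit` because of the marker itself.
    return head + MARKER.format(omitted=omitted) + tail
```

For example, a 6000-character input keeps 2048 characters from each end and reports 1904 characters omitted.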


@@ -29,7 +29,7 @@ ENVIRONMENT: All API keys from env_config are auto-injected. Use $VAR_NAME direc
SAFETY:
- Freely create/modify/delete files within the workspace
- For destructive and out-of-workspace commands, explain and confirm first"""
- For destructive commands out of workspace, explain and confirm first"""
params: dict = {
"type": "object",
@@ -169,10 +169,16 @@ SAFETY:
except Exception as retry_err:
logger.warning(f"[Bash] Retry failed: {retry_err}")
# Combine stdout and stderr
output = result.stdout
if result.stderr:
output += "\n" + result.stderr
# When command succeeds with stdout, keep output clean (stderr goes to server log only).
# When command fails or stdout is empty, include stderr so the agent can diagnose.
if result.returncode == 0 and result.stdout.strip():
output = result.stdout
if result.stderr:
logger.info(f"[Bash] stderr (not forwarded): {result.stderr[:500]}")
else:
output = result.stdout
if result.stderr:
output += "\n" + result.stderr
# Check if we need to save full output to temp file
temp_file_path = None
@@ -232,48 +238,43 @@ SAFETY:
def _get_safety_warning(self, command: str) -> str:
"""
Get safety warning for potentially dangerous commands
Only warns about extremely dangerous system-level operations
Get safety warning for absolutely catastrophic commands only.
Keep the blocklist minimal so the agent retains maximum freedom.
:param command: Command to check
:return: Warning message if dangerous, empty string if safe
"""
cmd_lower = command.lower().strip()
# Tokenize to avoid substring false positives (e.g. `rm -rf /tmp/x`
# must not match `rm -rf /`).
tokens = command.lower().split()
# Only block extremely dangerous system operations
dangerous_patterns = [
# System shutdown/reboot
("shutdown", "This command will shut down the system"),
("reboot", "This command will reboot the system"),
("halt", "This command will halt the system"),
("poweroff", "This command will power off the system"),
# `rm -rf /` or `rm -rf /*` targeting the real root.
for i, tok in enumerate(tokens):
if tok != "rm":
continue
has_rf = False
for j in range(i + 1, len(tokens)):
t = tokens[j]
if t.startswith("-") and "r" in t and "f" in t:
has_rf = True
elif t in ("--recursive", "--force"):
continue
elif t in ("/", "/*"):
if has_rf:
return "This command will delete the entire filesystem"
break
else:
break
# Critical system modifications
("rm -rf /", "This command will delete the entire filesystem"),
("rm -rf /*", "This command will delete the entire filesystem"),
("dd if=/dev/zero", "This command can destroy disk data"),
("mkfs", "This command will format a filesystem, destroying all data"),
("fdisk", "This command modifies disk partitions"),
# Disk wiping
if "if=/dev/zero" in command.lower() and "dd " in command.lower():
return "This command can destroy disk data"
# User/system management (only if targeting system users)
("userdel root", "This command will delete the root user"),
("passwd root", "This command will change the root password"),
]
# Power control - match only as a standalone word (\b enforces word boundary)
if re.search(r'\b(shutdown|reboot|halt|poweroff)\b', command.lower()):
return "This command will shut down or restart the system"
for pattern, warning in dangerous_patterns:
if pattern in cmd_lower:
return warning
# Check for recursive deletion outside workspace
if "rm" in cmd_lower and "-rf" in cmd_lower:
# Allow deletion within current workspace
if not any(path in cmd_lower for path in ["./", self.cwd.lower()]):
# Check if targeting system directories
system_dirs = ["/bin", "/usr", "/etc", "/var", "/home", "/root", "/sys", "/proc"]
if any(sysdir in cmd_lower for sysdir in system_dirs):
return "This command will recursively delete system directories"
return "" # No warning needed
return ""
@staticmethod
def _convert_env_vars_for_windows(command: str, dotenv_vars: dict) -> str:
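The token-based check in `_get_safety_warning` avoids the substring false positive where `rm -rf /tmp/x` matched `rm -rf /`. The sketch below is a variation on that idea rather than the commit's exact logic: it tracks the recursive and force flags separately, so `rm -r -f /` is also caught, and it skips unrelated long options. `safety_warning` is a hypothetical standalone function.

```python
import re


def safety_warning(command: str) -> str:
    """Return a warning for a few catastrophic commands, else an empty string."""
    tokens = command.lower().split()
    for i, tok in enumerate(tokens):
        if tok != "rm":
            continue
        recursive = force = False
        for t in tokens[i + 1:]:
            if t == "--recursive":
                recursive = True
            elif t == "--force":
                force = True
            elif t.startswith("--"):
                continue  # other long options do not affect the decision
            elif t.startswith("-"):
                recursive = recursive or "r" in t
                force = force or "f" in t
            elif t in ("/", "/*"):
                if recursive and force:
                    return "This command will delete the entire filesystem"
                break
            else:
                break  # first real path is not the root: stop scanning
    # \b matches at word boundaries, so "rebooting" does not trigger a warning.
    if re.search(r"\b(shutdown|reboot|halt|poweroff)\b", command.lower()):
        return "This command will shut down or restart the system"
    return ""
```

Only the first path argument after `rm` is inspected, which matches the behaviour of the loop in the diff.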


@@ -84,6 +84,49 @@ def get_scheduler_service():
return _scheduler_service
def _remember_delivered_output(
    agent_bridge,
    task: dict,
    channel_type: str,
    content: str,
) -> None:
    """Best-effort persistence of the message the scheduler sent to a user.

    Uses notify_session_id (the real chat session_id stored at task creation time)
    so that group chats correctly associate the output with the user's conversation.
    Falls back to receiver for backward compatibility with old tasks.

    Per-action-type behaviour:
    - agent_task / tool_call / skill_call: gated by ``scheduler_inject_to_session``
      (default True). These produce AI-generated content worth remembering.
    - send_message: additionally gated by ``scheduler_inject_send_message``
      (default False). Fixed reminder text rarely benefits follow-up Q&A and
      would just consume context tokens.
    """
    if not content:
        return
    action = task.get("action", {})
    action_type = action.get("type", "")

    # send_message defaults to NOT being injected; explicit opt-in via config.
    if action_type == "send_message":
        if not conf().get("scheduler_inject_send_message", False):
            return

    session_id = action.get("notify_session_id") or action.get("receiver")
    if not session_id:
        return

    try:
        remember = getattr(agent_bridge, "remember_scheduled_output", None)
        if remember:
            task_desc = action.get("task_description") or action.get("content", "")
            remember(session_id, str(content), channel_type=channel_type, task_description=task_desc)
    except Exception as e:
        logger.warning(
            f"[Scheduler] Failed to remember delivered output for {session_id}: {e}"
        )


def _execute_agent_task(task: dict, agent_bridge):
"""
Execute an agent_task action - let Agent handle the task
@@ -165,6 +208,7 @@ def _execute_agent_task(task: dict, agent_bridge):
# Send the reply
channel.send(reply, context)
_remember_delivered_output(agent_bridge, task, channel_type, reply.content)
logger.info(f"[Scheduler] Task {task['id']} executed successfully, result sent to {receiver}")
else:
logger.error(f"[Scheduler] Failed to create channel: {channel_type}")
@@ -255,6 +299,7 @@ def _execute_send_message(task: dict, agent_bridge):
logger.debug(f"[Scheduler] Registered request_id {request_id} -> session {receiver}")
channel.send(reply, context)
_remember_delivered_output(agent_bridge, task, channel_type, content)
logger.info(f"[Scheduler] Task {task['id']} executed: sent message to {receiver}")
else:
logger.error(f"[Scheduler] Failed to create channel: {channel_type}")
@@ -351,6 +396,7 @@ def _execute_tool_call(task: dict, agent_bridge):
logger.debug(f"[Scheduler] Registered request_id {request_id} -> session {receiver}")
channel.send(reply, context)
_remember_delivered_output(agent_bridge, task, channel_type, content)
logger.info(f"[Scheduler] Task {task['id']} executed: sent tool result to {receiver}")
else:
logger.error(f"[Scheduler] Failed to create channel: {channel_type}")
@@ -429,6 +475,24 @@ def _execute_skill_call(task: dict, agent_bridge):
if result_prefix:
content = f"{result_prefix}\n\n{content}"
# Send the result via channel
from channel.channel_factory import create_channel
try:
channel = create_channel(channel_type)
if channel:
# For web channel, register request_id
if channel_type == "web" and hasattr(channel, 'request_to_session'):
req_id = context.get("request_id")
if req_id:
channel.request_to_session[req_id] = receiver
logger.debug(f"[Scheduler] Registered request_id {req_id} -> session {receiver}")
channel.send(Reply(ReplyType.TEXT, content), context)
_remember_delivered_output(agent_bridge, task, channel_type, content)
except Exception as e:
logger.error(f"[Scheduler] Failed to send skill result: {e}")
logger.info(f"[Scheduler] Task {task['id']} executed: skill result sent to {receiver}")
else:
logger.error(f"[Scheduler] Task {task['id']}: No result from skill execution")


@@ -158,6 +158,11 @@ class SchedulerTool(BaseTool):
# Create task
task_id = str(uuid.uuid4())[:8]
# Capture the real chat session_id at task creation time so that scheduler
# can later inject the delivered output into the user's actual conversation
# (in group chats, session_id != receiver, e.g. "user_id:group_id" on feishu).
notify_session_id = context.get("session_id")
# Build action based on message or ai_task
if message:
action = {
@@ -166,7 +171,8 @@ class SchedulerTool(BaseTool):
"receiver": context.get("receiver"),
"receiver_name": self._get_receiver_name(context),
"is_group": context.get("isgroup", False),
"channel_type": self.config.get("channel_type", "unknown")
"channel_type": self.config.get("channel_type", "unknown"),
"notify_session_id": notify_session_id,
}
else: # ai_task
action = {
@@ -175,7 +181,8 @@ class SchedulerTool(BaseTool):
"receiver": context.get("receiver"),
"receiver_name": self._get_receiver_name(context),
"is_group": context.get("isgroup", False),
"channel_type": self.config.get("channel_type", "unknown")
"channel_type": self.config.get("channel_type", "unknown"),
"notify_session_id": notify_session_id,
}
# For DingTalk one-on-one chats, additionally store sender_staff_id
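
The scheduler change above stores the real chat session at task creation time and later prefers it over the receiver when recording delivered output. A minimal sketch of that lookup, assuming the action is a plain dict; `resolve_notify_session` is a hypothetical helper name, not a function in the repository:

```python
from typing import Optional


def resolve_notify_session(action: dict) -> Optional[str]:
    """Prefer the session captured at creation time; fall back to the receiver."""
    return action.get("notify_session_id") or action.get("receiver") or None


# In a group chat the session id can differ from the receiver (illustrative values).
group_action = {"receiver": "group_42", "notify_session_id": "user_7:group_42"}
# A task created before notify_session_id existed only carries the receiver.
legacy_action = {"receiver": "user_7"}

group_target = resolve_notify_session(group_action)
legacy_target = resolve_notify_session(legacy_action)
```

Tasks saved without `notify_session_id` keep working because the receiver remains the fallback.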


@@ -8,7 +8,10 @@ Truncation is based on two independent limits - whichever is hit first wins:
Never returns partial lines (except bash tail truncation edge case).
"""
from typing import Dict, Any, Optional, Literal, Tuple
from __future__ import annotations
from typing import Dict, Any, Optional, Tuple, TYPE_CHECKING
if TYPE_CHECKING:
from typing import Literal
DEFAULT_MAX_LINES = 2000


@@ -2,12 +2,18 @@
Vision tool - Analyze images using Vision API.
Supports local files (auto base64-encoded) and HTTP URLs.
Provider priority (default):
1. Main model via bot.call_vision — zero extra cost
2. Other models whose API key is configured — auto-discovered
3. OpenAI / LinkAI raw HTTP — reliable fallback
When use_linkai=true, LinkAI is promoted to #1.
When tool.vision.model is set, that model is used exclusively first.
Provider resolution:
- tool.vision.model (if set) means "prefer this model first; fall back to
other configured providers if it fails". The model name is mapped to its
native provider (e.g. doubao-* → Doubao, kimi-* → Moonshot, gpt-* →
OpenAI/LinkAI). That provider is tried first, then the standard auto
chain runs as fallback (with the preferred provider de-duplicated).
- Auto chain priority:
1. Main model via bot.call_vision — only when the main bot is known
to actually support vision (not just expose a call_vision method).
2. Other models whose API key is configured.
3. OpenAI / LinkAI raw HTTP.
When use_linkai=true, LinkAI is promoted to #1.
"""
import base64
@@ -43,15 +49,35 @@ _MAIN_MODEL_PROVIDER_NAME = "MainModel"
# Auto-discovered as fallback vision providers when their API key is configured.
# OpenAI and LinkAI are handled separately (raw HTTP providers), so not listed here.
_DISCOVERABLE_MODELS = [
("moonshot_api_key", const.MOONSHOT, const.KIMI_K2_5, "Moonshot"),
("moonshot_api_key", const.MOONSHOT, const.KIMI_K2_6, "Moonshot"),
("ark_api_key", const.DOUBAO, const.DOUBAO_SEED_2_PRO, "Doubao"),
("dashscope_api_key", const.QWEN_DASHSCOPE, const.QWEN36_PLUS, "DashScope"),
("claude_api_key", const.CLAUDEAPI, const.CLAUDE_4_6_SONNET, "Claude"),
("gemini_api_key", const.GEMINI, const.GEMINI_31_FLASH_LITE_PRE, "Gemini"),
("qianfan_api_key", const.QIANFAN, const.ERNIE_45_TURBO_VL, "Qianfan"),
("zhipu_ai_api_key", const.ZHIPU_AI, const.GLM_4_7, "ZhipuAI"),
("minimax_api_key", const.MiniMax, const.MINIMAX_M2_7, "MiniMax"),
]
# Model name prefix → discoverable provider display_name.
# Used to auto-route tool.vision.model to its native provider.
# Matched case-insensitively; longest prefix wins.
_MODEL_PREFIX_TO_PROVIDER = [
("doubao-", "Doubao"),
("kimi-", "Moonshot"),
("moonshot-", "Moonshot"),
("qwen", "DashScope"), # qwen-*, qwen3-*, qwen3.6-*, etc.
("claude-", "Claude"),
("ernie-", "Qianfan"),
("gemini-", "Gemini"),
("glm-", "ZhipuAI"),
("minimax-", "MiniMax"),
("abab", "MiniMax"),
]
# Model prefixes that natively belong to OpenAI / LinkAI (raw HTTP providers).
_OPENAI_MODEL_PREFIXES = ("gpt-", "o1-", "o3-", "o4-", "chatgpt-")
@dataclass
class VisionProvider:
@@ -116,7 +142,7 @@ class Vision(BaseTool):
"Error: No model available for Vision.\n"
"The main model does not support vision and no other API keys are configured.\n"
"Options:\n"
" 1. Switch to a multimodal model (e.g. qwen3.6-plus, claude-sonnet-4-6, gemini-2.0-flash)\n"
" 1. Switch to a multimodal model (e.g. ernie-4.5-turbo-vl, qwen3.6-plus, claude-sonnet-4-6, gemini-2.0-flash)\n"
" 2. Configure OPENAI_API_KEY: env_config(action=\"set\", key=\"OPENAI_API_KEY\", value=\"your-key\")\n"
" 3. Configure LINKAI_API_KEY: env_config(action=\"set\", key=\"LINKAI_API_KEY\", value=\"your-key\")"
)
@@ -126,6 +152,9 @@ class Vision(BaseTool):
except Exception as e:
return ToolResult.fail(f"Error: {e}")
# Default model is only used as a last-resort placeholder for providers
# whose VisionProvider.model_override is None (e.g. raw OpenAI provider
# when the user did not configure tool.vision.model).
return self._call_with_fallback(providers, DEFAULT_MODEL, question, image_content)
def _call_with_fallback(self, providers: List[VisionProvider], model: str,
@@ -162,29 +191,55 @@ class Vision(BaseTool):
def _resolve_providers(self) -> List[VisionProvider]:
"""
Build an ordered list of available providers.
Build an ordered list of providers to try.
Priority:
- use_linkai=true → [LinkAI, MainModel, OtherModels…, OpenAI]
- default → [MainModel, OtherModels…, OpenAI, LinkAI]
Semantics of `tool.vision.model`:
"Prefer this model first; fall back to other configured providers
if it fails."
"OtherModels" are auto-discovered from configured API keys.
The main model's bot_type is excluded from OtherModels to avoid
duplicating the MainModel provider.
Order:
1. The provider that natively serves `tool.vision.model` (if any
and its API key is configured) — using the user-specified model
name verbatim.
2. Auto-discovery chain as fallback:
- use_linkai=true → [LinkAI, MainModel?, OtherModels…, OpenAI]
- default → [MainModel?, OtherModels…, OpenAI, LinkAI]
MainModel is only included when the main bot is known to support
vision (see _main_bot_supports_vision).
Providers that share the same display name as the preferred provider
are de-duplicated to avoid retrying the same endpoint twice.
"""
use_linkai = conf().get("use_linkai", False) and conf().get("linkai_api_key")
user_model = self._resolve_user_vision_model()
providers: List[VisionProvider] = []
# Step 1: preferred provider derived from tool.vision.model
if user_model:
preferred = self._route_by_model_name(user_model)
if preferred:
providers.extend(preferred)
# Step 2: auto-discovery chain as fallback
existing = {p.name for p in providers}
fallback: List[VisionProvider] = []
use_linkai = conf().get("use_linkai", False) and conf().get("linkai_api_key")
if use_linkai:
self._append_provider(providers, self._build_linkai_provider)
self._append_provider(providers, self._build_main_model_provider)
self._append_other_model_providers(providers)
self._append_provider(providers, self._build_openai_provider)
self._append_provider(fallback, lambda: self._build_linkai_provider(user_model))
self._append_provider(fallback, self._build_main_model_provider)
self._append_other_model_providers(fallback, preferred_model=user_model)
self._append_provider(fallback, lambda: self._build_openai_provider(user_model))
else:
self._append_provider(providers, self._build_main_model_provider)
self._append_other_model_providers(providers)
self._append_provider(providers, self._build_openai_provider)
self._append_provider(providers, self._build_linkai_provider)
self._append_provider(fallback, self._build_main_model_provider)
self._append_other_model_providers(fallback, preferred_model=user_model)
self._append_provider(fallback, lambda: self._build_openai_provider(user_model))
self._append_provider(fallback, lambda: self._build_linkai_provider(user_model))
for p in fallback:
if p.name in existing:
continue
providers.append(p)
existing.add(p.name)
return providers
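
The ordering described in the `_resolve_providers` docstring can be sketched with provider names only. This is an illustrative simplification that assumes every listed provider has its API key configured; `order_providers` and its parameters are hypothetical names, not part of the tool:

```python
from typing import List


def order_providers(preferred: List[str], main_supports_vision: bool,
                    others: List[str], use_linkai: bool) -> List[str]:
    """Return provider names in try-order: preferred first, then the auto chain."""
    chain: List[str] = []
    if use_linkai:
        chain.append("LinkAI")          # promoted to the front of the auto chain
    if main_supports_vision:
        chain.append("MainModel")       # only when the main bot really serves vision
    chain.extend(others)
    chain.append("OpenAI")
    if not use_linkai:
        chain.append("LinkAI")

    ordered = list(preferred)
    seen = set(ordered)
    for name in chain:
        if name not in seen:            # de-duplicate against the preferred provider
            ordered.append(name)
            seen.add(name)
    return ordered


default_order = order_providers(["Doubao"], True, ["Moonshot", "Doubao"], False)
```

Here `default_order` starts with `Doubao` and lists it once, even though it also appears among the auto-discovered providers.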
@@ -194,29 +249,135 @@ class Vision(BaseTool):
if p:
providers.append(p)
def _append_other_model_providers(self, providers: List[VisionProvider]) -> None:
@staticmethod
def _resolve_user_vision_model() -> Optional[str]:
"""Read tool.vision.model from config; return None if unset/blank."""
tool_conf = conf().get("tool", {})
if not isinstance(tool_conf, dict):
return None
vision_conf = tool_conf.get("vision", {})
if not isinstance(vision_conf, dict):
return None
m = vision_conf.get("model")
if isinstance(m, str) and m.strip():
return m.strip()
return None
@staticmethod
def _infer_provider_from_model(model_name: str) -> Optional[str]:
"""
Infer the provider display name from a model name's prefix.
Returns None when no rule matches (or for OpenAI-family names, which
are handled separately by the caller).
"""
if not model_name:
return None
lower = model_name.lower()
# Sort by prefix length desc so e.g. "moonshot-" wins over hypothetical "moo-"
for prefix, display_name in sorted(_MODEL_PREFIX_TO_PROVIDER, key=lambda x: -len(x[0])):
if lower.startswith(prefix.lower()):
return display_name
return None
def _route_by_model_name(self, user_model: str) -> Optional[List[VisionProvider]]:
"""
Try to build a provider list using the user-specified model name.
Returns:
- [provider, …] : matched and the provider's key is configured
- None : no rule matches, the matched provider's key is missing, or its
bot cannot be created → caller falls through to the auto chain
"""
lower = user_model.lower()
# OpenAI / LinkAI family
if lower.startswith(_OPENAI_MODEL_PREFIXES):
providers: List[VisionProvider] = []
# Prefer LinkAI when explicitly enabled, else OpenAI first
use_linkai = conf().get("use_linkai", False) and conf().get("linkai_api_key")
if use_linkai:
self._append_provider(providers, lambda: self._build_linkai_provider(user_model))
self._append_provider(providers, lambda: self._build_openai_provider(user_model))
else:
self._append_provider(providers, lambda: self._build_openai_provider(user_model))
self._append_provider(providers, lambda: self._build_linkai_provider(user_model))
if providers:
return providers
logger.warning(f"[Vision] tool.vision.model='{user_model}' looks like an OpenAI "
f"model but neither OPENAI_API_KEY nor LINKAI_API_KEY is configured.")
return None # fall through to auto
# Discoverable native providers (Doubao, Moonshot, etc.)
target_display = self._infer_provider_from_model(user_model)
if not target_display:
return None # unknown prefix → auto
for config_key, bot_type, _default_model, display_name in _DISCOVERABLE_MODELS:
if display_name != target_display:
continue
api_key = conf().get(config_key, "")
if not api_key or not api_key.strip():
logger.warning(f"[Vision] tool.vision.model='{user_model}' routes to "
f"'{display_name}' but '{config_key}' is not configured. "
f"Falling back to auto-discovery.")
return None # fall through to auto
try:
from models.bot_factory import create_bot
bot = create_bot(bot_type)
if not hasattr(bot, 'call_vision'):
logger.warning(f"[Vision] '{display_name}' bot does not implement call_vision.")
return None
except Exception as e:
logger.warning(f"[Vision] Failed to create '{display_name}' bot: {e}")
return None
return [VisionProvider(
name=display_name,
api_key="",
api_base="",
model_override=user_model,
use_bot=True,
fallback_bot=bot,
)]
return None
def _append_other_model_providers(self, providers: List[VisionProvider],
preferred_model: Optional[str] = None) -> None:
"""
Auto-discover other models whose API key is configured.
Skip the main model's own bot_type (already covered by MainModel provider).
Skip bot_types that already have a provider in the list (e.g. OpenAI).
Skip the main model's own bot_type (already covered by MainModel
provider), unless the main model itself does not support vision —
in that case we still want the vendor's dedicated vision model
as a fallback. Also skip bot_types that already appear in the
provider list.
If preferred_model matches a provider's family, use it instead
of that provider's hard-coded default model.
"""
# Determine main model's bot_type so we can skip it
main_bot_type = None
main_bot_supports_vision = False
if self.model and hasattr(self.model, '_resolve_bot_type'):
main_bot_type = self.model._resolve_bot_type(conf().get("model", ""))
main_bot = getattr(self.model, "bot", None)
main_bot_supports_vision = self._main_bot_supports_vision(main_bot)
existing_names = {p.name for p in providers}
preferred_provider = self._infer_provider_from_model(preferred_model) if preferred_model else None
for config_key, bot_type, default_model, display_name in _DISCOVERABLE_MODELS:
if display_name in existing_names:
continue
if bot_type == main_bot_type:
# Same bot_type as the main model is normally handled by the
# MainModel provider; only skip it here if the main model
# actually supports vision. Otherwise fall through and add
# the vendor's dedicated vision model as a fallback.
if bot_type == main_bot_type and main_bot_supports_vision:
continue
api_key = conf().get(config_key, "")
if not api_key or not api_key.strip():
continue
# Create a bot instance and check if it supports call_vision
try:
from models.bot_factory import create_bot
bot = create_bot(bot_type)
@@ -225,62 +386,105 @@ class Vision(BaseTool):
except Exception:
continue
providers.append(VisionProvider(
model_for_provider = (preferred_model
if preferred_provider == display_name and preferred_model
else default_model)
provider = VisionProvider(
name=display_name,
api_key="",
api_base="",
model_override=default_model,
model_override=model_for_provider,
use_bot=True,
fallback_bot=bot,
))
)
def _resolve_vision_model(self) -> Optional[str]:
"""
Determine which model to use for vision.
# Same vendor as the main bot is the most natural fallback when
# the main model itself does not support vision — promote it to
# the front of the list instead of relying on declaration order.
if bot_type == main_bot_type:
providers.insert(0, provider)
else:
providers.append(provider)
1. User explicit config: tool.vision.model in config.json
2. Fallback to the main configured model name
def _main_bot_supports_vision(self, bot) -> bool:
"""
tool_conf = conf().get("tool", {})
user_vision_model = tool_conf.get("vision", {}).get("model") if isinstance(tool_conf, dict) else None
if user_vision_model:
return user_vision_model
model_name = conf().get("model", "")
return model_name or None
Whether the main bot is known to natively support vision.
Having a `call_vision` method is necessary but not sufficient —
some bots implement the method against an endpoint that does not
actually serve vision models, which causes silent failures when a
vendor-foreign model name is forwarded.
Resolution order:
1. If the bot explicitly declares `supports_vision`, trust it.
This lets bots opt in or out based on their own runtime
configuration (e.g. the currently selected model).
2. Otherwise, fall back to a model-name prefix heuristic: trust
call_vision when the main model looks like an OpenAI family
model or matches a known multimodal vendor prefix.
"""
if bot is None:
return False
if hasattr(bot, "supports_vision"):
return bool(getattr(bot, "supports_vision"))
main_model = (conf().get("model") or "").lower()
if not main_model:
return False
if main_model.startswith(_OPENAI_MODEL_PREFIXES):
return True
return self._infer_provider_from_model(main_model) is not None
def _build_main_model_provider(self) -> Optional[VisionProvider]:
"""
Use the vendor's own model for vision via bot.call_vision.
Only available when the bot class has call_vision.
Gated by _main_bot_supports_vision so non-vision bots (DeepSeek, etc.)
do not get routed vendor-foreign model names.
"""
if not (self.model and hasattr(self.model, 'bot')):
return None
try:
bot = self.model.bot
if not hasattr(bot, 'call_vision'):
return None
except Exception:
return None
if not hasattr(bot, 'call_vision'):
return None
if not self._main_bot_supports_vision(bot):
return None
vision_model = self._resolve_vision_model()
# Use the configured main model name; do NOT inject tool.vision.model
# here, because by the time we reach this branch the tool.vision.model
# routing has already been attempted (and either matched the main bot
# or failed to find a provider).
main_model_name = conf().get("model") or None
return VisionProvider(
name=_MAIN_MODEL_PROVIDER_NAME,
api_key="",
api_base="",
model_override=vision_model,
model_override=main_model_name,
use_bot=True,
)
def _build_openai_provider(self) -> Optional[VisionProvider]:
def _build_openai_provider(self, preferred_model: Optional[str] = None) -> Optional[VisionProvider]:
api_key = conf().get("open_ai_api_key") or os.environ.get("OPENAI_API_KEY")
if not api_key:
return None
api_base = (conf().get("open_ai_api_base") or os.environ.get("OPENAI_API_BASE", "")).rstrip("/") \
or "https://api.openai.com/v1"
return VisionProvider(name="OpenAI", api_key=api_key, api_base=self._ensure_v1(api_base))
# Only honor preferred_model when it looks like an OpenAI-family name;
# otherwise the OpenAI endpoint would 400 on a vendor-specific name.
model_override = preferred_model if (
preferred_model and preferred_model.lower().startswith(_OPENAI_MODEL_PREFIXES)
) else None
return VisionProvider(
name="OpenAI",
api_key=api_key,
api_base=self._ensure_v1(api_base),
model_override=model_override,
)
def _build_linkai_provider(self) -> Optional[VisionProvider]:
def _build_linkai_provider(self, preferred_model: Optional[str] = None) -> Optional[VisionProvider]:
api_key = conf().get("linkai_api_key") or os.environ.get("LINKAI_API_KEY")
if not api_key:
return None
@@ -290,8 +494,15 @@ class Vision(BaseTool):
extra = get_cloud_headers(api_key)
extra.pop("Authorization", None)
extra.pop("Content-Type", None)
return VisionProvider(name="LinkAI", api_key=api_key, api_base=self._ensure_v1(api_base),
extra_headers=extra)
# LinkAI is a multi-vendor proxy and accepts most model names, so we
# honor any user-configured model name here.
return VisionProvider(
name="LinkAI",
api_key=api_key,
api_base=self._ensure_v1(api_base),
extra_headers=extra,
model_override=preferred_model,
)
def _call_via_bot(self, model: str, question: str, image_content: dict,
provider: Optional[VisionProvider] = None) -> ToolResult:
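
The prefix routing used for `tool.vision.model` can be sketched as a case-insensitive, longest-prefix-first lookup. The table below is a shortened copy of the one in the diff, and `infer_provider` is a hypothetical stand-in; unlike the tool's `_infer_provider_from_model`, it returns `"OpenAI"` directly for OpenAI-family names instead of leaving that case to the caller:

```python
from typing import Optional

# Shortened, illustrative copy of the prefix table.
PREFIX_TO_PROVIDER = [
    ("doubao-", "Doubao"),
    ("kimi-", "Moonshot"),
    ("qwen", "DashScope"),
    ("ernie-", "Qianfan"),
]
OPENAI_PREFIXES = ("gpt-", "o1-", "o3-", "o4-", "chatgpt-")


def infer_provider(model_name: str) -> Optional[str]:
    """Map a model name to a provider display name, or None when nothing matches."""
    if not model_name:
        return None
    lower = model_name.lower()
    if lower.startswith(OPENAI_PREFIXES):   # str.startswith accepts a tuple of prefixes
        return "OpenAI"
    # Longest prefix first, so a more specific rule wins over a shorter one.
    for prefix, display_name in sorted(PREFIX_TO_PROVIDER, key=lambda item: -len(item[0])):
        if lower.startswith(prefix):
            return display_name
    return None


qianfan = infer_provider("ERNIE-4.5-turbo-vl")
unknown = infer_provider("deepseek-chat")
```

An unknown prefix yields `None`, which corresponds to falling through to the auto-discovery chain.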

app.py

@@ -274,6 +274,39 @@ def sigterm_handler_wrap(_signo):
signal.signal(_signo, func)
def _sync_builtin_skills():
"""Sync builtin skills from project skills/ to workspace skills/ on startup."""
import shutil
try:
workspace = conf().get("agent_workspace", "~/cow")
workspace = os.path.expanduser(workspace)
project_root = os.path.dirname(os.path.abspath(__file__))
builtin_dir = os.path.join(project_root, "skills")
custom_dir = os.path.join(workspace, "skills")
if not os.path.isdir(builtin_dir):
return
os.makedirs(custom_dir, exist_ok=True)
synced = 0
for name in os.listdir(builtin_dir):
src = os.path.join(builtin_dir, name)
if not os.path.isdir(src) or not os.path.isfile(os.path.join(src, "SKILL.md")):
continue
dst = os.path.join(custom_dir, name)
try:
if os.path.isdir(dst):
shutil.rmtree(dst)
shutil.copytree(src, dst)
synced += 1
except Exception as e:
logger.warning(f"[App] Failed to sync builtin skill '{name}': {e}")
if synced:
logger.info(f"[App] Synced {synced} builtin skill(s) to workspace")
except Exception as e:
logger.warning(f"[App] Builtin skills sync failed: {e}")
def run():
global _channel_mgr
try:
@@ -299,6 +332,9 @@ def run():
if web_console_enabled and "web" not in channel_names:
channel_names.append("web")
# Sync builtin skills to workspace before channels start
_sync_builtin_skills()
logger.info(f"[App] Starting channels: {channel_names}")
_channel_mgr = ChannelManager()
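
The startup sync in `_sync_builtin_skills` replaces each workspace skill that has a builtin counterpart. A minimal sketch of the same behavior against temporary directories; `sync_skills` and `demo` are hypothetical names and the directory layout is invented for illustration:

```python
import os
import shutil
import tempfile


def sync_skills(builtin_dir: str, workspace_skills_dir: str) -> int:
    """Copy every builtin skill that has a SKILL.md, replacing any existing copy."""
    if not os.path.isdir(builtin_dir):
        return 0
    os.makedirs(workspace_skills_dir, exist_ok=True)
    synced = 0
    for name in sorted(os.listdir(builtin_dir)):
        src = os.path.join(builtin_dir, name)
        if not os.path.isfile(os.path.join(src, "SKILL.md")):
            continue                      # not a skill directory
        dst = os.path.join(workspace_skills_dir, name)
        if os.path.isdir(dst):
            shutil.rmtree(dst)            # the old copy is discarded entirely
        shutil.copytree(src, dst)
        synced += 1
    return synced


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as root:
        builtin = os.path.join(root, "project", "skills")
        workspace = os.path.join(root, "workspace", "skills")
        os.makedirs(os.path.join(builtin, "weather"))
        with open(os.path.join(builtin, "weather", "SKILL.md"), "w", encoding="utf-8") as f:
            f.write("# weather\n")
        os.makedirs(os.path.join(builtin, "notes"))       # no SKILL.md, so skipped
        os.makedirs(os.path.join(workspace, "weather"))
        with open(os.path.join(workspace, "weather", "stale.txt"), "w", encoding="utf-8") as f:
            f.write("old")
        count = sync_skills(builtin, workspace)
        return count, sorted(os.listdir(os.path.join(workspace, "weather")))
```

Because the destination is removed before copying, files that exist only in the workspace copy of a builtin skill (here `stale.txt`) do not survive the sync.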


@@ -14,6 +14,7 @@ from bridge.reply import Reply, ReplyType
from common import const
from common.log import logger
from common.utils import expand_path
from config import conf
from models.openai_compatible_bot import OpenAICompatibleBot
@@ -68,6 +69,7 @@ class AgentLLMModel(LLMModel):
_MODEL_BOT_TYPE_MAP = {
"wenxin": const.BAIDU, "wenxin-4": const.BAIDU,
"xunfei": const.XUNFEI, const.QWEN: const.QWEN_DASHSCOPE,
const.QIANFAN: const.QIANFAN,
const.MODELSCOPE: const.MODELSCOPE,
}
_MODEL_PREFIX_MAP = [
@@ -75,10 +77,10 @@ class AgentLLMModel(LLMModel):
("gemini", const.GEMINI), ("glm", const.ZHIPU_AI), ("claude", const.CLAUDEAPI),
("moonshot", const.MOONSHOT), ("kimi", const.MOONSHOT),
("doubao", const.DOUBAO), ("deepseek", const.DEEPSEEK),
("ernie", const.QIANFAN),
]
def __init__(self, bridge: Bridge, bot_type: str = "chat"):
from config import conf
super().__init__(model=conf().get("model", const.GPT_41))
self.bridge = bridge
self.bot_type = bot_type
@@ -87,7 +89,6 @@ class AgentLLMModel(LLMModel):
@property
def model(self):
from config import conf
return conf().get("model", const.GPT_41)
@model.setter
@@ -96,8 +97,6 @@ class AgentLLMModel(LLMModel):
def _resolve_bot_type(self, model_name: str) -> str:
"""Resolve bot type from model name, matching Bridge.__init__ logic."""
from config import conf
if conf().get("use_linkai", False) and conf().get("linkai_api_key"):
return const.LINKAI
# Support custom bot type configuration
@@ -117,8 +116,9 @@ class AgentLLMModel(LLMModel):
return const.MOONSHOT
if conf().get("bot_type") == "modelscope":
return const.MODELSCOPE
lowered_model = model_name.lower()
for prefix, btype in self._MODEL_PREFIX_MAP:
if model_name.startswith(prefix):
if lowered_model.startswith(prefix):
return btype
return const.OPENAI
@@ -167,13 +167,15 @@ class AgentLLMModel(LLMModel):
if session_id:
kwargs['session_id'] = session_id
# Determine thinking: respect global config, then channel_type
# Thinking mode is a global toggle independent of the channel.
# IM channels (WeChat/WeCom/DingTalk/Feishu) won't render the
# reasoning trace, but still benefit from the higher answer
# quality the thinking pass produces.
from config import conf
global_thinking = conf().get("enable_thinking", True)
if not global_thinking:
kwargs['thinking'] = {"type": "disabled"}
else:
kwargs['thinking'] = {"type": "enabled"} if channel_type == "web" else {"type": "disabled"}
kwargs['thinking'] = (
{"type": "enabled"} if conf().get("enable_thinking", False)
else {"type": "disabled"}
)
response = self.bot.call_with_tools(**kwargs)
return self._format_response(response)
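
After this change the thinking flag depends only on `enable_thinking` (now defaulting to off) and no longer on the channel type. A minimal sketch, with a plain dict standing in for `conf()` and a hypothetical helper name:

```python
def thinking_kwargs(config: dict) -> dict:
    """Build the 'thinking' argument from the global toggle alone."""
    enabled = bool(config.get("enable_thinking", False))
    return {"type": "enabled" if enabled else "disabled"}


default_setting = thinking_kwargs({})
explicit_on = thinking_kwargs({"enable_thinking": True})
```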
@@ -220,13 +222,15 @@ class AgentLLMModel(LLMModel):
if session_id:
kwargs['session_id'] = session_id
# Determine thinking: respect global config, then channel_type
# Thinking mode is a global toggle independent of the channel.
# IM channels (WeChat/WeCom/DingTalk/Feishu) won't render the
# reasoning trace, but still benefit from the higher answer
# quality the thinking pass produces.
from config import conf
global_thinking = conf().get("enable_thinking", True)
if not global_thinking:
kwargs['thinking'] = {"type": "disabled"}
else:
kwargs['thinking'] = {"type": "enabled"} if channel_type == "web" else {"type": "disabled"}
kwargs['thinking'] = (
{"type": "enabled"} if conf().get("enable_thinking", False)
else {"type": "disabled"}
)
stream = self.bot.call_with_tools(**kwargs)
@@ -414,6 +418,18 @@ class AgentBridge:
# Store session_id on agent so executor can clear DB on fatal errors
agent._current_session_id = session_id
# Bound the in-memory context for scheduler sessions before each run.
# Scheduler sessions are stable per-task and append every trigger,
# so without trimming they would grow unbounded across runs and
# blow up prompt cost. Regular user chats are not touched here —
# the agent's own context manager handles that path.
if session_id and session_id.startswith("scheduler_"):
from config import conf
scheduler_keep_turns = max(
1, int(conf().get("agent_max_context_turns", 20)) // 5
)
self._trim_in_memory_to_turns(agent, scheduler_keep_turns)
try:
# Use agent's run_stream method with event handler
response = agent.run_stream(
@@ -446,7 +462,7 @@ class AgentBridge:
except Exception as e:
logger.warning(f"[AgentBridge] Failed to clear DB after recovery: {e}")
# Check if there are files to send (from read tool)
# Check if there are files to send (from send/read tool)
if hasattr(agent, 'stream_executor') and hasattr(agent.stream_executor, 'files_to_send'):
files_to_send = agent.stream_executor.files_to_send
if files_to_send:
@@ -608,18 +624,245 @@ class AgentBridge:
from config import conf
if not conf().get("conversation_persistence", True):
return
# When deep-thinking display is disabled, strip "thinking" content
# blocks before persisting so they don't resurface on history reload.
# The in-memory message list keeps them intact for this run's
# multi-turn LLM context.
thinking_enabled = bool(conf().get("enable_thinking", False))
except Exception:
pass
thinking_enabled = False
messages_to_store = new_messages
if not thinking_enabled:
messages_to_store = self._strip_thinking_blocks(new_messages)
try:
from agent.memory import get_conversation_store
get_conversation_store().append_messages(
session_id, new_messages, channel_type=channel_type
session_id, messages_to_store, channel_type=channel_type
)
except Exception as e:
logger.warning(
f"[AgentBridge] Failed to persist messages for session={session_id}: {e}"
)
# Marker used to identify scheduler-injected user messages so we can apply
# a sliding window without touching real user turns. The legacy prefix
# "Scheduled task" (written by the v2 PR) is also recognised when pruning,
# so old data can be aged out instead of leaking forever.
_SCHEDULED_MARKER = "[SCHEDULED]"
_SCHEDULED_LEGACY_MARKERS = ("Scheduled task",)
def remember_scheduled_output(
self,
session_id: str,
content: str,
channel_type: str = "",
task_description: str = "",
) -> None:
"""Add the visible output of a scheduled task to the receiver's session.
Scheduled task execution uses an isolated session so internal planning and
tool calls do not leak into the user's chat. The final message is still
part of the conversation from the user's point of view, so keep a small
visible turn in the receiver session for follow-up questions.
Configuration:
scheduler_inject_to_session (bool, default True):
Master switch. When False, this method is a no-op.
scheduler_inject_max_per_session (int, default 3):
Maximum scheduler-injected user/assistant pairs retained per
session. Older injections are pruned automatically.
Content is truncated to 2000 chars to prevent a single high-volume task
from bloating one entry.
"""
from config import conf
if not conf().get("scheduler_inject_to_session", True):
return
if not session_id or not content:
return
max_len = 2000
if len(content) > max_len:
content = content[:max_len] + "..."
user_text = self._SCHEDULED_MARKER
if task_description:
user_text = f"{self._SCHEDULED_MARKER} {task_description}"
messages = [
{"role": "user", "content": [{"type": "text", "text": user_text}]},
{"role": "assistant", "content": [{"type": "text", "text": content}]},
]
# Persist first so the new pair gets a stable seq, then prune old
# scheduler pairs in DB, then sync the in-memory agent.messages buffer.
self._persist_messages(session_id, messages, channel_type)
keep_last_n = max(int(conf().get("scheduler_inject_max_per_session", 3) or 0), 0)
try:
from agent.memory import get_conversation_store
deleted = get_conversation_store().prune_scheduled_messages(
session_id, keep_last_n=keep_last_n
)
if deleted:
logger.debug(
f"[AgentBridge] Pruned {deleted} old scheduler messages "
f"for session={session_id} (keep_last_n={keep_last_n})"
)
except Exception as e:
logger.warning(
f"[AgentBridge] Failed to prune scheduled messages "
f"for session={session_id}: {e}"
)
agent = self.agents.get(session_id)
if agent:
try:
with agent.messages_lock:
agent.messages.extend(messages)
self._prune_scheduled_in_memory(agent, keep_last_n)
except Exception as e:
logger.warning(
f"[AgentBridge] Failed to update in-memory scheduled output "
f"for session={session_id}: {e}"
)
@staticmethod
def _trim_in_memory_to_turns(agent, keep_turns: int) -> None:
"""Bound ``agent.messages`` to the most recent ``keep_turns`` real
user/assistant turns, dropping older history together with any
intermediate tool_use/tool_result blocks that belonged to it.
A "real" user message is any user message whose content is not solely a
tool_result block — matches the heuristic used elsewhere when filtering
history (see ``AgentInitializer._filter_text_only_messages``).
No-op when the session is already within budget. Caller does not need
to hold the lock; this method acquires it itself.
"""
if keep_turns <= 0:
return
def _is_real_user(msg) -> bool:
if not isinstance(msg, dict) or msg.get("role") != "user":
return False
content = msg.get("content")
if isinstance(content, list):
if any(
isinstance(b, dict) and b.get("type") == "tool_result"
for b in content
):
return False
return any(
isinstance(b, dict) and b.get("type") == "text" and b.get("text")
for b in content
)
if isinstance(content, str):
return bool(content.strip())
return False
with agent.messages_lock:
msgs = agent.messages
real_user_indices = [i for i, m in enumerate(msgs) if _is_real_user(m)]
if len(real_user_indices) <= keep_turns:
return
# Cut at the (k-th from the end) real user message; keep everything
# from there onwards so the surviving slice is still a valid
# user/assistant sequence.
cut_idx = real_user_indices[-keep_turns]
if cut_idx == 0:
return
kept = msgs[cut_idx:]
msgs.clear()
msgs.extend(kept)
logger.debug(
f"[AgentBridge] Trimmed in-memory messages to last "
f"{keep_turns} turns ({len(kept)} messages remain)"
)
@classmethod
def _prune_scheduled_in_memory(cls, agent, keep_last_n: int) -> None:
"""Mirror conversation_store.prune_scheduled_messages on agent.messages.
Caller must hold ``agent.messages_lock``.
"""
if keep_last_n < 0:
keep_last_n = 0
markers = (cls._SCHEDULED_MARKER,) + cls._SCHEDULED_LEGACY_MARKERS
def _is_marker_user(msg) -> bool:
if not isinstance(msg, dict) or msg.get("role") != "user":
return False
content = msg.get("content")
text = ""
if isinstance(content, str):
text = content
elif isinstance(content, list):
for block in content:
if isinstance(block, dict) and block.get("type") == "text":
text = block.get("text", "")
break
return any(text.startswith(m) for m in markers)
msgs = agent.messages
pair_indices = [] # list of (user_idx, assistant_idx_or_None)
for idx, msg in enumerate(msgs):
if not _is_marker_user(msg):
continue
assistant_idx = None
if idx + 1 < len(msgs):
nxt = msgs[idx + 1]
if isinstance(nxt, dict) and nxt.get("role") == "assistant":
assistant_idx = idx + 1
pair_indices.append((idx, assistant_idx))
if len(pair_indices) <= keep_last_n:
return
to_drop = pair_indices[: len(pair_indices) - keep_last_n]
drop_set = set()
for u_idx, a_idx in to_drop:
drop_set.add(u_idx)
if a_idx is not None:
drop_set.add(a_idx)
# Rebuild the list in place to keep external references stable.
kept = [m for i, m in enumerate(msgs) if i not in drop_set]
msgs.clear()
msgs.extend(kept)
@staticmethod
def _strip_thinking_blocks(messages: list) -> list:
"""Return a shallow copy of messages with assistant "thinking" blocks removed."""
cleaned = []
for msg in messages:
if not isinstance(msg, dict):
cleaned.append(msg)
continue
if msg.get("role") != "assistant":
cleaned.append(msg)
continue
content = msg.get("content")
if not isinstance(content, list):
cleaned.append(msg)
continue
filtered_blocks = [
b for b in content
if not (isinstance(b, dict) and b.get("type") == "thinking")
]
if len(filtered_blocks) == len(content):
cleaned.append(msg)
else:
new_msg = dict(msg)
new_msg["content"] = filtered_blocks
cleaned.append(new_msg)
return cleaned
def clear_session(self, session_id: str):
"""
Clear a specific session's agent and conversation history
@@ -705,4 +948,4 @@ class AgentBridge:
agent.tools = [t for t in agent.tools if t.name != "web_search"]
logger.info("[AgentBridge] web_search tool removed (API key no longer available)")
except Exception as e:
logger.debug(f"[AgentBridge] Failed to refresh conditional tools: {e}")
logger.debug(f"[AgentBridge] Failed to refresh conditional tools: {e}")
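The in-memory pruning done by `_prune_scheduled_in_memory` above can be sketched on its own. This is a minimal sketch, not the project's implementation: the marker value `"[SCHEDULED]"` and the function name `prune_scheduled` are hypothetical (the real `_SCHEDULED_MARKER` is not shown in this diff), and only plain-string `content` is handled, whereas the original also inspects list-style content blocks.

```python
from typing import List, Optional, Tuple

# Hypothetical marker; the real value of _SCHEDULED_MARKER is not shown in this diff.
SCHEDULED_MARKER = "[SCHEDULED]"


def prune_scheduled(msgs: List[dict], keep_last_n: int) -> None:
    """Drop all but the last keep_last_n scheduled user/assistant pairs, in place."""
    keep_last_n = max(keep_last_n, 0)
    pairs: List[Tuple[int, Optional[int]]] = []
    for idx, msg in enumerate(msgs):
        content = msg.get("content")
        if msg.get("role") != "user" or not isinstance(content, str):
            continue
        if not content.startswith(SCHEDULED_MARKER):
            continue
        # Pair the marker message with the assistant reply right after it, if any.
        nxt = msgs[idx + 1] if idx + 1 < len(msgs) else None
        a_idx = idx + 1 if nxt is not None and nxt.get("role") == "assistant" else None
        pairs.append((idx, a_idx))
    if len(pairs) <= keep_last_n:
        return
    drop = set()
    for u_idx, a_idx in pairs[: len(pairs) - keep_last_n]:
        drop.add(u_idx)
        if a_idx is not None:
            drop.add(a_idx)
    # Rebuild in place so callers holding a reference to msgs see the change.
    kept = [m for i, m in enumerate(msgs) if i not in drop]
    msgs.clear()
    msgs.extend(kept)
```

Mutating the list in place (`clear` + `extend`) rather than rebinding it matters when other objects hold a reference to the same list, which is what the "keep external references stable" comment in the diff points at.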


@@ -144,7 +144,15 @@ class AgentInitializer:
from agent.memory import get_conversation_store
store = get_conversation_store()
max_turns = conf().get("agent_max_context_turns", 20)
restore_turns = max(3, max_turns // 6)
# Scheduler tasks run on a stable isolated session per task and
# can fire many times a day; a smaller restore window keeps prompt
# cost bounded while still letting the agent see "last few" runs
# for trend / dedup style logic. Regular chat sessions keep the
# original heuristic so user dialogues feel continuous.
if session_id.startswith("scheduler_"):
restore_turns = max(1, max_turns // 5)
else:
restore_turns = max(3, max_turns // 6)
saved = store.load_messages(session_id, max_turns=restore_turns)
if saved:
filtered = self._filter_text_only_messages(saved)
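To make the two restore-window branches easier to compare, here is a small sketch that isolates the arithmetic. The function name `restore_turns_for` is hypothetical; the formulas are copied from the hunk above. One thing the numbers show: with the default `agent_max_context_turns` of 20, the scheduler branch yields 4 turns and the chat branch 3, so the scheduler window is only the smaller of the two when `max_turns` is below 15.

```python
def restore_turns_for(session_id: str, max_turns: int = 20) -> int:
    """Return how many past turns to restore for a session (formulas from the diff)."""
    if session_id.startswith("scheduler_"):
        return max(1, max_turns // 5)
    return max(3, max_turns // 6)


if __name__ == "__main__":
    for turns in (4, 10, 20, 30):
        print(turns, restore_turns_for("scheduler_daily", turns), restore_turns_for("user_42", turns))
```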
@@ -548,17 +556,23 @@ class AgentInitializer:
import threading
def _daily_flush_loop():
import random
last_run_date = None # Track last successful run date to prevent same-day re-trigger
while True:
try:
now = datetime.datetime.now()
target = now.replace(hour=23, minute=55, second=0, microsecond=0)
if target <= now:
jitter_min = random.randint(50, 55)
jitter_sec = random.randint(0, 59)
target = now.replace(hour=23, minute=jitter_min, second=jitter_sec, microsecond=0)
# Always schedule for tomorrow if we already ran today, or if target time has passed
if target <= now or (last_run_date == now.date()):
target += datetime.timedelta(days=1)
wait_seconds = (target - now).total_seconds()
logger.info(f"[DailyFlush] Next flush at {target.strftime('%Y-%m-%d %H:%M')} (in {wait_seconds/3600:.1f}h)")
logger.info(f"[DailyFlush] Next flush at {target.strftime('%Y-%m-%d %H:%M:%S')} (in {wait_seconds/3600:.1f}h)")
time.sleep(wait_seconds)
self._flush_all_agents()
last_run_date = datetime.datetime.now().date()
except Exception as e:
logger.warning(f"[DailyFlush] Error in daily flush loop: {e}")
time.sleep(3600)
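The scheduling rule in `_daily_flush_loop` (a jittered target between 23:50 and 23:55, pushed to tomorrow if it has already passed or if a flush already ran today) can be expressed as a pure function, which makes it easy to check without sleeping. This is a sketch under assumptions: `next_flush_target` and its `rng` parameter are hypothetical names introduced here for testability and are not part of the diff.

```python
import datetime
import random
from typing import Optional


def next_flush_target(
    now: datetime.datetime,
    last_run_date: Optional[datetime.date],
    rng: Optional[random.Random] = None,
) -> datetime.datetime:
    """Compute the next flush time using the same rule as the diff."""
    if rng is None:
        rng = random.Random()
    target = now.replace(
        hour=23,
        minute=rng.randint(50, 55),
        second=rng.randint(0, 59),
        microsecond=0,
    )
    # Move to tomorrow if the target already passed or a flush already ran today.
    if target <= now or last_run_date == now.date():
        target += datetime.timedelta(days=1)
    return target
```

The `last_run_date` guard is what prevents a second flush on the same day when the loop wakes up before the newly drawn jittered minute.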


@@ -61,6 +61,11 @@ class Bridge(object):
if model_type and model_type.startswith("deepseek"):
self.btype["chat"] = const.DEEPSEEK
if model_type and isinstance(model_type, str):
lowered_model_type = model_type.lower()
if lowered_model_type == const.QIANFAN or lowered_model_type.startswith("ernie"):
self.btype["chat"] = const.QIANFAN
if model_type in [const.MODELSCOPE]:
self.btype["chat"] = const.MODELSCOPE
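The new routing rule can be read as a small predicate. The sketch below assumes `const.QIANFAN` is the string `"qianfan"` (its value is not shown in this diff) and uses a hypothetical helper name, `routes_to_qianfan`.

```python
QIANFAN = "qianfan"  # assumed value of const.QIANFAN; not shown in the diff


def routes_to_qianfan(model_type) -> bool:
    """Return True when the model name should use the Qianfan chat bot."""
    if not model_type or not isinstance(model_type, str):
        return False
    lowered = model_type.lower()
    return lowered == QIANFAN or lowered.startswith("ernie")


if __name__ == "__main__":
    for name in ("ERNIE-5.0", "qianfan", "deepseek-chat", None):
        print(name, routes_to_qianfan(name))
```

Lower-casing before the comparison means a configured value such as `ERNIE-5.0` is routed the same way as `ernie-5.0`.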


@@ -297,8 +297,12 @@ class ChatChannel(Channel):
logger.debug("[chat_channel] sending reply: {}, context: {}".format(reply, context))
# For text replies, try to extract and send any images
if reply.type == ReplyType.TEXT:
# Web channel renders images/videos inline via renderMarkdown,
# so skip the extract-and-send step to avoid duplicate media.
if reply.type == ReplyType.TEXT and context.get("channel_type") != "web":
self._extract_and_send_images(reply, context)
elif reply.type == ReplyType.TEXT:
self._send(reply, context)
# For image replies that carry text content, send the text first, then the image
elif reply.type == ReplyType.IMAGE_URL and hasattr(reply, 'text_content') and reply.text_content:
# Send the text first


@@ -55,12 +55,186 @@ def _ensure_lark_imported():
return lark
def _print_qr_to_terminal(qr_url: str):
"""Render a QR code as ASCII art and emit it via logger.
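The logger is used instead of print because, when the process is started in the background
via nohup/cow, stdout is block-buffered and the delayed QR code output can look as if it was printed twice.
The logger's StreamHandler is line-buffered, so the output is visible in a foreground terminal and also lands in run.log.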
"""
qr_lines = []
try:
import qrcode as qr_lib
import io
qr = qr_lib.QRCode(error_correction=qr_lib.constants.ERROR_CORRECT_L, box_size=1, border=1)
qr.add_data(qr_url)
qr.make(fit=True)
buf = io.StringIO()
qr.print_ascii(out=buf, invert=True)
qr_lines = buf.getvalue().splitlines()
except ImportError:
qr_lines = ["(qrcode package not installed; cannot render the ASCII QR code. Install it with: pip install qrcode)"]
except Exception as e:
qr_lines = [f"(渲染二维码失败:{e})"]
header = "=" * 60
banner = [
"",
header,
" 飞书一键创建应用:请使用 飞书 App 扫描下方二维码",
" (二维码 10 分钟内有效,仅供一次扫描)",
header,
]
footer = [
f" 或点击链接创建: {qr_url}",
" 等待扫码...",
"",
]
full = banner + qr_lines + footer
logger.info("[FeiShu] One-click 飞书应用创建二维码(请用飞书 App 扫码):\n" + "\n".join(full))
def _persist_feishu_credentials(app_id: str, app_secret: str) -> bool:
"""Write feishu_app_id / feishu_app_secret + ensure feishu in channel_type into config.json.
Returns True on success, False on failure (e.g. config.json missing or unwritable).
"""
try:
config_path = os.path.join(
os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))),
"config.json",
)
if os.path.exists(config_path):
with open(config_path, "r", encoding="utf-8") as f:
file_cfg = json.load(f)
else:
file_cfg = {}
file_cfg["feishu_app_id"] = app_id
file_cfg["feishu_app_secret"] = app_secret
# Make sure channel_type includes feishu (the user may have started a single channel purely via the CLI)
ch_type = file_cfg.get("channel_type", conf().get("channel_type", "")) or ""
existing = [s.strip() for s in ch_type.split(",") if s.strip()]
if "feishu" not in existing:
existing.append("feishu")
file_cfg["channel_type"] = ",".join(existing)
with open(config_path, "w", encoding="utf-8") as f:
json.dump(file_cfg, f, indent=4, ensure_ascii=False)
# Sync to the in-memory conf() so the change takes effect for this run
conf()["feishu_app_id"] = app_id
conf()["feishu_app_secret"] = app_secret
if "channel_type" in file_cfg:
conf()["channel_type"] = file_cfg["channel_type"]
try:
os.chmod(config_path, 0o600)
except Exception:
pass
return True
except Exception as e:
logger.error(f"[FeiShu] Failed to persist credentials to config.json: {e}")
return False
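The `channel_type` handling inside `_persist_feishu_credentials` treats the setting as a comma-separated list and appends `feishu` only when it is missing. A minimal sketch of that step in isolation follows; `ensure_channel` is a hypothetical helper name used here for illustration.

```python
def ensure_channel(channel_type: str, name: str = "feishu") -> str:
    """Return channel_type with `name` appended if it is not already listed."""
    existing = [s.strip() for s in (channel_type or "").split(",") if s.strip()]
    if name not in existing:
        existing.append(name)
    return ",".join(existing)


if __name__ == "__main__":
    print(ensure_channel("web, wechat"))
    print(ensure_channel(""))
```

Normalising through `strip()` and filtering empty items means stray spaces or a trailing comma in the existing value do not produce empty channel names.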
def _register_via_qr_in_terminal() -> bool:
"""CLI-side one-click app creation via lark_oapi.register_app.
Blocks the calling thread (typically the channel startup thread) until the user
finishes scanning, the QR code expires, or registration is cancelled.
Returns True if credentials were obtained AND persisted; False otherwise.
The caller should fall back to the original "missing credentials" error in that case.
"""
if not LARK_SDK_AVAILABLE:
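logger.error(
"[FeiShu] Missing feishu_app_id / feishu_app_secret. "
"The lark-oapi SDK is not installed, so QR-code app creation cannot be started from the terminal. "
"Run pip install -U 'lark-oapi>=1.5.5' and retry, or fill in the credentials in config.json manually."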
)
return False
try:
lark_mod = _ensure_lark_imported()
except Exception as e:
logger.error(f"[FeiShu] Import lark_oapi failed: {e}")
return False
# register_app was only introduced in lark-oapi 1.5.5; calling it on older versions raises a
# hard-to-understand AttributeError. Check explicitly up front and give a clear upgrade hint.
if not hasattr(lark_mod, "register_app"):
try:
from importlib.metadata import version as _pkg_version
installed = _pkg_version("lark-oapi")
except Exception:
installed = "unknown"
logger.error(
f"[FeiShu] 当前 lark-oapi 版本 ({installed}) 不支持一键创建应用,需要 >= 1.5.5。"
"请执行 pip install -U 'lark-oapi>=1.5.5' 后重试,或手动在 config.json 中填入凭据。"
)
return False
logger.info("[FeiShu] feishu_app_id / feishu_app_secret are not configured yet; "
"requesting one-click app creation from Feishu...")
def _on_qr(info):
url = info.get("url", "")
if url:
_print_qr_to_terminal(url)
def _on_status(info):
# Filter out polling heartbeats (one every 5 seconds); keep slow_down / domain_switched and similar statuses
status = info.get("status")
if status == "polling":
return
logger.info(f"[FeiShu] register_app status: {info}")
try:
result = lark_mod.register_app(
on_qr_code=_on_qr,
on_status_change=_on_status,
source="cowagent",
)
except Exception as e:
err_cls = e.__class__.__name__
if "Expired" in err_cls:
logger.error("[FeiShu] 二维码已过期,请重启程序后重试。")
elif "Denied" in err_cls:
logger.error("[FeiShu] 已取消授权。")
else:
logger.error(f"[FeiShu] 一键创建失败:{e}")
return False
app_id = result.get("client_id", "")
app_secret = result.get("client_secret", "")
if not app_id or not app_secret:
logger.error("[FeiShu] The creation result is missing app_id/app_secret; cannot continue.")
return False
if not _persist_feishu_credentials(app_id, app_secret):
logger.error(
"[FeiShu] 应用创建成功但写入 config.json 失败,请手动复制以下值到配置文件:\n"
f" feishu_app_id = {app_id}\n"
f" feishu_app_secret = {app_secret}"
)
return False
logger.info(f"[FeiShu] 应用创建成功,凭据已写入 config.json (app_id={app_id})。")
return True
@singleton
class FeiShuChanel(ChatChannel):
feishu_app_id = conf().get('feishu_app_id')
feishu_app_secret = conf().get('feishu_app_secret')
feishu_token = conf().get('feishu_token')
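feishu_event_mode = conf().get('feishu_event_mode', 'websocket') # webhook or websocket
# Override the parent default [ReplyType.VOICE, ReplyType.IMAGE].
# Feishu natively supports sending audio (opus format, via the file upload API) and images,
# and every reply type is handled, so use an empty list to enable voice and image replies.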
NOT_SUPPORT_REPLYTYPE = []
def __init__(self):
super().__init__()
@@ -86,6 +260,20 @@ class FeiShuChanel(ChatChannel):
self.feishu_app_secret = conf().get('feishu_app_secret')
self.feishu_token = conf().get('feishu_token')
self.feishu_event_mode = conf().get('feishu_event_mode', 'websocket')
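# CLI startup: when credentials are missing, try lark.register_app to show a QR code in the terminal
# and guide the user through creating the app by scanning it. Startup from the web console also reaches this point,
# but console users have usually already created the app through /api/feishu/register and written it back to config.json.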
if not self.feishu_app_id or not self.feishu_app_secret:
if _register_via_qr_in_terminal():
self.feishu_app_id = conf().get('feishu_app_id')
self.feishu_app_secret = conf().get('feishu_app_secret')
else:
err = "[FeiShu] feishu_app_id 与 feishu_app_secret 缺失,无法启动通道"
logger.error(err)
self.report_startup_error(err)
return
self._fetch_bot_open_id()
if self.feishu_event_mode == 'websocket':
self._startup_websocket()
@@ -384,10 +572,22 @@ class FeiShuChanel(ChatChannel):
no_need_at=True
)
if context:
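# Streaming reply mode: inject an on_event callback into context; the agent calls it each time it produces a piece of text.
# The callback first sends a placeholder message to obtain a message_id, then updates the content in place through the PATCH API
# to produce a typewriter effect. When the callback finishes it sets context["feishu_streamed"]=True
# so that send() skips the duplicate delivery and the final full reply is not sent a second time.
# Streaming typewriter replies are enabled by default. They require the bot to have the cardkit:card:write permission and Feishu client 7.20+.
# A failure at any step automatically falls back to a non-streaming text reply.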
if conf().get("feishu_stream_reply", True):
context["on_event"] = self._make_feishu_stream_callback(context, feishu_msg.access_token)
self.produce(context)
logger.debug(f"[FeiShu] query={feishu_msg.content}, type={feishu_msg.ctype}")
def send(self, reply: Reply, context: Context):
# Skip the duplicate send if the text reply was already delivered via streaming
if reply.type == ReplyType.TEXT and context.get("feishu_streamed"):
logger.debug("[FeiShu] streaming already delivered text reply, skipping send()")
return
msg = context.get("msg")
is_group = context["isgroup"]
if msg:
@@ -450,6 +650,16 @@ class FeiShuChanel(ChatChannel):
msg_type = "file"
content_key = "file_key"
elif reply.type == ReplyType.VOICE:
# Voice reply: upload the audio file to Feishu, then send an audio-type message
file_key = self._upload_audio(reply.content, access_token)
if not file_key:
logger.warning("[FeiShu] upload audio failed")
return
reply_content = file_key
msg_type = "audio"
content_key = "file_key"
# Check if we can reply to an existing message (need msg_id)
can_reply = is_group and msg and hasattr(msg, 'msg_id') and msg.msg_id
@@ -481,6 +691,396 @@ class FeiShuChanel(ChatChannel):
else:
logger.error(f"[FeiShu] send message failed, code={res.get('code')}, msg={res.get('msg')}")
def _make_feishu_stream_callback(self, context, access_token):
"""
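Typewriter-style replies built on Feishu's official "streaming card update" API.
Flow:
1. First message_update → POST /cardkit/v1/cards creates a card entity with streaming_mode,
then POST /im/v1/messages (or reply) sends the card by card_id
2. Later message_update → PUT /cardkit/v1/cards/{id}/elements/{eid}/content
with the full text of the "current round"; the Feishu platform computes the delta and renders it with a typewriter effect
(streaming mode is not subject to the 10 QPS limit)
3. message_end (one round of LLM output has finished and this round triggered tool calls) → accumulate current into committed
and add a separator; the next round's message_update starts from blank again so rounds do not run together
4. agent_end → forcibly overwrite the card with final_response, then PATCH /cardkit/v1/cards/{id}/settings
to turn off streaming_mode, and set context["feishu_streamed"]=True so chat_channel skips the regular send()
Prerequisites:
- The bot has the cardkit:card:write permission
- Feishu client 7.20+
Fallback on failure:
- Creating the card entity fails (missing permission, network, etc.) → the feishu_streamed flag is not set, so chat_channel
takes the regular text reply path; the user receives the full reply without the typewriter effect, and a warning is logged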
"""
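# Shared state (protected by lock)
# In multi-round agent mode, each "interim message" is sent as its own card.
# current_text only holds the content of the card currently being streamed; it is
# finalized and reset on message_end / agent_end.
current_text = [""] # LLM output accumulated for the current card
card_id = [None] # Entity ID of the current streaming card (one per segment)
message_id = [None] # Message ID after the current card is sent (for logging only)
# The placeholder send is synchronous, but an in-flight flag stops concurrent message_update
# events from each triggering their own create+send and producing multiple cards.
init_in_flight = [False]
# Once initialization fails, stay disabled: no further streaming calls are attempted for this reply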
disabled = [False]
lock = threading.Lock()
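# ---- Async push queue ----------------------------------------------------
# A synchronous requests.put takes 100~300ms per call and would block the LLM stream thread from reading the next chunk.
# Pushes are handed to a dedicated worker thread that consumes the queue; the callback only appends in memory and returns immediately.
# The queue only holds snapshots of the "latest accumulated text"; the worker deduplicates to avoid pushing the same
# content twice (with high-frequency chunks the queue backs up, and pushing only the last snapshot is enough).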
import queue as _queue
push_queue: "_queue.Queue[str | None]" = _queue.Queue()
def _push_worker():
while True:
snapshot = push_queue.get()
if snapshot is None:
push_queue.task_done()
return
# Coalesce the snapshots already queued: push only the last one, saving PUT calls and reducing latency
merged_count = 1
stop = False
while True:
try:
nxt = push_queue.get_nowait()
except _queue.Empty:
break
merged_count += 1
if nxt is None:
stop = True
break
snapshot = nxt
try:
_stream_update_text(snapshot)
finally:
for _ in range(merged_count):
push_queue.task_done()
if stop:
return
push_thread = threading.Thread(target=_push_worker, daemon=True, name="feishu-stream-push")
push_thread.start()
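def _drain_push_queue():
"""Wait until every PUT currently in the queue has completed. message_end/agent_end must drain before finalizing;
otherwise stale snapshots queued in the worker could arrive after the final_text PUT and overwrite the final content."""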
try:
push_queue.join()
except Exception:
pass
msg = context.get("msg")
is_group = context.get("isgroup", False)
receiver = context.get("receiver")
receive_id_type = context.get("receive_id_type", "open_id")
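# Client-side typewriter rendering parameters (the actual text reveal speed in the Feishu app):
# - print_freq_ms: interval between refreshes
# - print_step: number of characters revealed per refresh
# Currently 40ms × 4 chars ≈ 100 chars/second, close to the pace of the ChatGPT/DeepSeek web UIs.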
print_freq_ms = 40
print_step = 4
print_strategy = "fast"
headers = {
"Authorization": "Bearer " + access_token,
"Content-Type": "application/json; charset=utf-8",
}
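# element_id of the rich-text component in the card; all later streaming PUT updates target this component
ELEMENT_ID = "stream_md"
# Operation sequence number; it must strictly increase on every PUT (a Feishu requirement)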
sequence = [0]
def _next_sequence():
sequence[0] += 1
return sequence[0]
def _build_card_json():
"""Card JSON 2.0 structure + streaming_mode + a single markdown component"""
return json.dumps({
"schema": "2.0",
"config": {
"streaming_mode": True,
"summary": {"content": "[正在生成回复...]"},
"streaming_config": {
"print_frequency_ms": {"default": print_freq_ms},
"print_step": {"default": print_step},
"print_strategy": print_strategy,
},
},
"body": {
"elements": [
{
"tag": "markdown",
"content": "...",
"element_id": ELEMENT_ID,
}
],
},
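# Note: JSON 2.0 does not support a custom fallback field (passing one raises an error).
# On clients < 7.20, Feishu automatically shows a "please upgrade your client" placeholder, so no configuration is needed.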
}, ensure_ascii=False)
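def _create_and_send_card():
"""Runs synchronously: create the card entity → send the message. If either step fails, set disabled=True to trigger the fallback."""
try:
# Step 1: create the card entity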
create_url = "https://open.feishu.cn/open-apis/cardkit/v1/cards"
create_body = {"type": "card_json", "data": _build_card_json()}
res = requests.post(
create_url, headers=headers, json=create_body, timeout=(5, 10)
)
res_json = res.json()
if res_json.get("code") != 0:
logger.warning(
f"[FeiShu] Stream: create card failed "
f"(code={res_json.get('code')}, msg={res_json.get('msg')}). "
f"本次回复已自动降级为普通文本回复(一次性返回完整内容)。"
f"如需开启流式打字机效果与完整 Markdown 渲染,请到飞书开放平台 "
f"https://open.feishu.cn/app 给机器人开通 cardkit:card:write 权限"
f"(创建与更新卡片)并重新发布版本,同时确保飞书客户端 >= 7.20。"
)
with lock:
disabled[0] = True
return
cid = res_json["data"]["card_id"]
with lock:
card_id[0] = cid
# Step 2: send the message by card_id (group chats prefer reply; direct chats use a plain send)
content_payload = json.dumps(
{"type": "card", "data": {"card_id": cid}}, ensure_ascii=False
)
can_reply = is_group and msg and hasattr(msg, "msg_id") and msg.msg_id
if can_reply:
send_url = (
f"https://open.feishu.cn/open-apis/im/v1/messages/"
f"{msg.msg_id}/reply"
)
send_body = {"msg_type": "interactive", "content": content_payload}
send_res = requests.post(
send_url, headers=headers, json=send_body, timeout=(5, 10)
)
else:
send_url = "https://open.feishu.cn/open-apis/im/v1/messages"
params = {"receive_id_type": receive_id_type}
send_body = {
"receive_id": receiver,
"msg_type": "interactive",
"content": content_payload,
}
send_res = requests.post(
send_url, headers=headers, params=params, json=send_body,
timeout=(5, 10),
)
send_json = send_res.json()
if send_json.get("code") != 0:
logger.warning(
f"[FeiShu] Stream: send card failed: {send_json}. 降级为普通文本。"
)
with lock:
disabled[0] = True
return
mid = send_json["data"]["message_id"]
with lock:
message_id[0] = mid
logger.info(
f"[FeiShu] Stream: card created and sent, "
f"card_id={cid}, message_id={mid}"
)
except Exception as e:
logger.warning(
f"[FeiShu] Stream: create/send card exception: {e}. 降级为普通文本。"
)
with lock:
disabled[0] = True
finally:
with lock:
init_in_flight[0] = False
def _stream_update_text(full_text):
"""Streaming PUT update of the text component. content must be the component's full current text."""
with lock:
cid = card_id[0]
if not cid:
return
url = (
f"https://open.feishu.cn/open-apis/cardkit/v1/cards/"
f"{cid}/elements/{ELEMENT_ID}/content"
)
body = {
"content": full_text,
"sequence": _next_sequence(),
}
try:
res = requests.put(url, headers=headers, json=body, timeout=(5, 10))
res_json = res.json()
if res_json.get("code") != 0:
logger.warning(
f"[FeiShu] Stream: update text failed: {res_json}"
)
except Exception as e:
logger.warning(f"[FeiShu] Stream: update text exception: {e}")
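def _close_streaming_mode(final_text: str = ""):
"""Turn off streaming mode (the card becomes a "regular" card and can be forwarded).
Also rewrite the summary to a preview of the final content through the full-card update API; otherwise the Feishu
conversation list keeps showing the placeholder summary set when the card was created ("[正在生成回复...]").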
"""
with lock:
cid = card_id[0]
if not cid:
return
# 1) Turn off streaming_mode and rewrite the summary through the full-card update API
# The config of the settings API does not accept a summary field and returns code=2200
preview_src = (final_text or "").strip().replace("\n", " ")
preview = preview_src[:30] if preview_src else ""
full_card = {
"schema": "2.0",
"config": {
"streaming_mode": False,
"summary": {"content": preview or " "},
},
"body": {
"elements": [
{
"tag": "markdown",
"content": final_text or " ",
"element_id": ELEMENT_ID,
}
],
},
}
put_url = f"https://open.feishu.cn/open-apis/cardkit/v1/cards/{cid}"
put_body = {
"card": {"type": "card_json", "data": json.dumps(full_card, ensure_ascii=False)},
"sequence": _next_sequence(),
}
try:
res = requests.put(put_url, headers=headers, json=put_body, timeout=(5, 10))
res_json = res.json()
if res_json.get("code") != 0:
logger.warning(
f"[FeiShu] Stream: finalize card (close+summary) failed: {res_json}"
)
except Exception as e:
logger.warning(
f"[FeiShu] Stream: finalize card exception: {e}"
)
def on_event(event: dict):
event_type = event.get("type")
data = event.get("data", {})
# Once degraded, perform no further streaming operations for this reply
with lock:
if disabled[0]:
return
if event_type == "message_update":
delta = data.get("delta", "")
if not delta:
return
# Part 1: decide whether initialization is needed (create card + send)
need_init = False
with lock:
if card_id[0] is None and not init_in_flight[0]:
init_in_flight[0] = True
need_init = True
if need_init:
_create_and_send_card()
# If initialization failed, disabled is already set; this call and later ones return immediately
with lock:
if disabled[0]:
return
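# Part 2: accumulate text and hand the snapshot to the push worker for async delivery.
# Calling requests.put directly here would block the LLM stream thread from reading the next chunk
# (measured with DeepSeek's high-frequency small chunks: ~150ms per PUT, which adds up to very noticeable lag)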
snapshot = ""
should_push = False
with lock:
current_text[0] += delta
if card_id[0]:
snapshot = current_text[0]
should_push = True
if should_push:
push_queue.put(snapshot)
elif event_type == "message_end":
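# One round of LLM output has finished. If this round triggered tool calls, its text is an
# "interim message" (e.g. "Let me take a look!") and should be finalized as its own card, with
# a new card created for the next round. The end user then sees:
# [Card 1: interim message 1]
# [Card 2: interim message 2]
# ...
# [Card N: final reply]
# This matches the multi-message streaming experience of wecom_bot.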
tool_calls = data.get("tool_calls", []) or []
if not tool_calls:
# No tool calls: this round is the final reply; leave it to agent_end.
return
with lock:
text_to_finalize = current_text[0].rstrip()
current_text[0] = ""
if not text_to_finalize:
return
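# Wait for the queued snapshots to be pushed so they cannot arrive after the final text and overwrite it
_drain_push_queue()
# Overwrite the current card with the final text and turn off streaming mode (freezing it into a regular card
# and changing the conversation-list summary to a preview instead of "正在生成回复...")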
_stream_update_text(text_to_finalize)
_close_streaming_mode(text_to_finalize)
# Reset the card state; the next message_update will trigger creation of a new card
with lock:
card_id[0] = None
message_id[0] = None
sequence[0] = 0
elif event_type == "agent_end":
# Final reply: overwrite the current streaming card with final_response, then turn off streaming mode.
final_response = data.get("final_response", "")
if not final_response:
return
final_text = str(final_response)
# Mark as streamed so chat_channel skips send()
context["feishu_streamed"] = True
with lock:
has_card = card_id[0] is not None
init_busy = init_in_flight[0]
# Rare case: no card has been created by the time agent_end fires (very fast return / no
# message_update), so create one now to carry final_text.
if not has_card and not init_busy:
with lock:
init_in_flight[0] = True
_create_and_send_card()
with lock:
if disabled[0]:
return
_drain_push_queue()
_stream_update_text(final_text)
_close_streaming_mode(final_text)
# Tell the push worker to exit (this reply is completely finished)
push_queue.put(None)
return on_event
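The coalescing step inside `_push_worker` is the part of this callback that is easiest to get wrong, so here is a thread-free sketch of just that step. It assumes the same conventions as the diff (the queue carries full-text snapshots and `None` is the stop sentinel); the helper name `take_latest` is hypothetical and the sketch leaves out `task_done()` bookkeeping and the HTTP call.

```python
import queue
from typing import Optional, Tuple


def take_latest(q: "queue.Queue[Optional[str]]", first: str) -> Tuple[str, int, bool]:
    """Merge everything already queued behind `first` into one snapshot.

    Returns (latest_snapshot, items_consumed, saw_stop_sentinel).
    """
    latest, consumed, stop = first, 1, False
    while True:
        try:
            nxt = q.get_nowait()
        except queue.Empty:
            break
        consumed += 1
        if nxt is None:  # sentinel: the producer is finished
            stop = True
            break
        latest = nxt
    return latest, consumed, stop


if __name__ == "__main__":
    q: "queue.Queue[Optional[str]]" = queue.Queue()
    for snap in ("He", "Hel", "Hello"):
        q.put(snap)
    print(take_latest(q, q.get()))
```

Because every snapshot is the full accumulated text, dropping the intermediate ones loses nothing; the `items_consumed` count is what a worker would use to call `task_done()` the right number of times so that `Queue.join()` can return.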
def fetch_access_token(self) -> str:
url = "https://open.feishu.cn/open-apis/auth/v3/tenant_access_token/internal/"
headers = {
@@ -687,6 +1287,66 @@ class FeiShuChanel(ChatChannel):
except Exception as e:
logger.warning(f"[FeiShu] Failed to remove temp file {temp_file}: {e}")
def _upload_audio(self, audio_path, access_token):
"""
Upload a local audio file to Feishu and return file_key.
audio_path is a plain local file path (no file:// prefix).
Feishu audio messages only support opus format; non-opus files are converted first.
"""
logger.debug(f"[FeiShu] start upload audio, path={audio_path}")
if not os.path.exists(audio_path):
logger.error(f"[FeiShu] audio file not found: {audio_path}")
return None
# Feishu only plays audio messages in opus format.
# Convert if the TTS engine produced a different format (e.g. mp3 from OpenAI TTS).
upload_path = audio_path
if not audio_path.lower().endswith('.opus'):
opus_path = os.path.splitext(audio_path)[0] + '.opus'
try:
from pydub import AudioSegment
audio = AudioSegment.from_file(audio_path)
audio.export(opus_path, format='opus')
upload_path = opus_path
logger.info(f"[FeiShu] Converted audio to opus: {opus_path}")
except Exception as e:
logger.warning(f"[FeiShu] Failed to convert audio to opus, uploading original: {e}")
upload_path = audio_path
file_name = os.path.splitext(os.path.basename(upload_path))[0] + '.opus'
upload_url = "https://open.feishu.cn/open-apis/im/v1/files"
data = {'file_type': 'opus', 'file_name': file_name}
headers = {'Authorization': f'Bearer {access_token}'}
try:
with open(upload_path, "rb") as f:
upload_response = requests.post(
upload_url,
files={"file": f},
data=data,
headers=headers,
timeout=(5, 30)
)
logger.info(
f"[FeiShu] upload audio response, status={upload_response.status_code}, res={upload_response.content}")
response_data = upload_response.json()
if response_data.get("code") == 0:
return response_data.get("data").get("file_key")
else:
logger.error(f"[FeiShu] upload audio failed: {response_data}")
return None
except Exception as e:
logger.error(f"[FeiShu] upload audio exception: {e}")
return None
finally:
# Remove the temporary opus file produced by the conversion whether or not the upload succeeded, so failures do not pile up files on disk.
if upload_path != audio_path and os.path.exists(upload_path):
try:
os.remove(upload_path)
except Exception as e:
logger.warning(f"[FeiShu] Failed to remove temp opus file {upload_path}: {e}")
def _upload_file_url(self, file_url, access_token):
"""
Upload file to Feishu


@@ -162,6 +162,38 @@ class FeishuMessage(ChatMessage):
else:
logger.info(f"[FeiShu] Failed to download file, key={file_key}, res={response.text}")
self._prepare_fn = _download_file
elif msg_type == "audio":
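# Voice messages sent by Feishu users have type "audio", and the file is opus-encoded.
# Map them to ContextType.VOICE so chat_channel's speech-to-text (STT) flow handles them.
# The file is downloaded lazily through _prepare_fn, which only runs when chat_channel calls cmsg.prepare().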
self.ctype = ContextType.VOICE
content = json.loads(msg.get("content"))
file_key = content.get("file_key")
self.content = TmpDir().path() + file_key + ".opus"
logger.info(f"[FeiShu] audio message: file_key={file_key}, save_path={self.content}")
def _download_audio():
logger.info(f"[FeiShu] downloading audio: file_key={file_key}, msg_id={self.msg_id}")
url = f"https://open.feishu.cn/open-apis/im/v1/messages/{self.msg_id}/resources/{file_key}"
headers = {
"Authorization": "Bearer " + access_token,
}
params = {
"type": "file"
}
try:
response = requests.get(url=url, headers=headers, params=params)
logger.info(f"[FeiShu] download audio response: status={response.status_code}, size={len(response.content)} bytes")
if response.status_code == 200:
with open(self.content, "wb") as f:
f.write(response.content)
logger.info(f"[FeiShu] audio saved to: {self.content}")
else:
logger.error(f"[FeiShu] Failed to download audio, key={file_key}, status={response.status_code}, res={response.text}")
except Exception as e:
logger.error(f"[FeiShu] Exception downloading audio, key={file_key}: {e}", exc_info=True)
self._prepare_fn = _download_audio
else:
raise NotImplementedError("Unsupported message type: Type:{} ".format(msg_type))


@@ -213,6 +213,9 @@
<div id="session-list" class="session-list"></div>
</aside>
<!-- Mobile overlay for session panel (click to close) -->
<div id="session-panel-overlay" class="session-panel-overlay hidden" onclick="closeSessionPanel()"></div>
<!-- ================================================================ -->
<!-- MAIN CONTENT -->
<!-- ================================================================ -->
@@ -285,7 +288,7 @@
<!-- ====================================================== -->
<!-- VIEW: Chat -->
<!-- ====================================================== -->
<div id="view-chat" class="view active">
<div id="view-chat" class="view active relative">
<!-- Messages -->
<div id="chat-messages" class="flex-1 overflow-y-auto">
<!-- Welcome Screen -->
@@ -361,6 +364,18 @@
</div>
</div>
<!-- Scroll-to-bottom FAB -->
<button id="scroll-to-bottom-btn"
class="hidden absolute right-5 bottom-[80px] z-10
w-9 h-9 rounded-full shadow-lg
bg-white dark:bg-[#2A2A2A] border border-slate-200 dark:border-white/15
text-slate-500 dark:text-slate-400 hover:text-primary-500 dark:hover:text-primary-400
flex items-center justify-center cursor-pointer transition-all duration-200
hover:shadow-xl hover:scale-105"
onclick="_autoScrollEnabled = true; scrollChatToBottom(true);">
<i class="fas fa-chevron-down text-sm"></i>
</button>
<!-- Chat Input -->
<div class="flex-shrink-0 border-t border-slate-200 dark:border-white/10 bg-white dark:bg-[#1A1A1A] px-4 py-3">
<div class="max-w-3xl mx-auto">
@@ -445,6 +460,9 @@
</div>
<div class="cfg-dropdown-menu"></div>
</div>
<div id="cfg-custom-tip" class="mt-1.5 text-xs text-slate-400 dark:text-slate-500 hidden">
<i class="fas fa-info-circle mr-1"></i><span data-i18n="config_custom_tip">接口需遵循 OpenAI API 协议</span>
</div>
</div>
<!-- Model -->
<div>
@@ -546,7 +564,7 @@
<span class="cfg-tip" data-tip-key="config_enable_thinking_hint"><i class="fas fa-circle-question"></i></span>
</label>
<label class="relative inline-flex items-center cursor-pointer">
<input id="cfg-enable-thinking" type="checkbox" class="sr-only peer" checked>
<input id="cfg-enable-thinking" type="checkbox" class="sr-only peer">
<div class="w-9 h-5 bg-slate-200 dark:bg-slate-700 peer-checked:bg-primary-400 rounded-full
after:content-[''] after:absolute after:top-[2px] after:left-[2px] after:bg-white
after:rounded-full after:h-4 after:w-4 after:transition-all peer-checked:after:translate-x-full"></div>


@@ -339,6 +339,23 @@
}
.confirm-btn-ok:hover { background: #dc2626; }
/* Session panel overlay (mobile only, click to close) */
.session-panel-overlay {
display: none;
}
@media (max-width: 768px) {
.session-panel-overlay {
display: block;
position: fixed;
inset: 0;
z-index: 44;
background: rgba(0, 0, 0, 0.3);
}
.session-panel-overlay.hidden {
display: none;
}
}
/* Mobile: session panel as overlay */
@media (max-width: 768px) {
.session-panel {
@@ -492,6 +509,22 @@
color: #b0b8c4;
margin-bottom: 0.375rem;
}
/* Streaming reasoning: render as plain pre to avoid expensive markdown
re-parsing on every chunk. Wrap long lines so the bubble width is
respected and use the same font size/color as the rendered version. */
.agent-thinking-step .thinking-stream-pre {
margin: 0;
padding: 0;
background: transparent;
border: 0;
font-family: inherit;
font-size: inherit;
line-height: 1.5;
color: inherit;
white-space: pre-wrap;
word-break: break-word;
overflow-wrap: anywhere;
}
/* Content step - real text output frozen before tool calls */
.agent-content-step {
@@ -935,13 +968,13 @@
font-size: 8px;
transition: transform 0.15s;
}
.knowledge-tree-group.open .chevron {
.knowledge-tree-group.open > .knowledge-tree-group-btn .chevron {
transform: rotate(90deg);
}
.knowledge-tree-group-items {
display: none;
}
.knowledge-tree-group.open .knowledge-tree-group-items {
.knowledge-tree-group.open > .knowledge-tree-group-items {
display: block;
}
@@ -1035,12 +1068,10 @@
}
.cfg-tip:hover { color: #64748b; }
.dark .cfg-tip:hover { color: #cbd5e1; }
.cfg-tip::after {
content: attr(data-tooltip);
position: absolute;
left: 50%;
bottom: calc(100% + 6px);
transform: translateX(-50%);
/* Floating tooltip portal — appended to <body> by JS so it isn't clipped
by overflow:hidden ancestors. */
.cfg-tip-floating {
position: fixed;
padding: 6px 10px;
border-radius: 8px;
font-size: 12px;
@@ -1053,13 +1084,13 @@
opacity: 0;
pointer-events: none;
transition: opacity 0.15s;
z-index: 50;
z-index: 9999;
}
.dark .cfg-tip::after {
.dark .cfg-tip-floating {
background: #334155;
color: #f1f5f9;
}
.cfg-tip:hover::after {
.cfg-tip-floating.show {
opacity: 1;
}

File diff suppressed because it is too large


@@ -91,39 +91,9 @@ def _get_upload_dir() -> str:
def _generate_session_title(user_message: str, assistant_reply: str = "") -> str:
"""
Generate a short session title by calling the current bot's reply_text.
"""
import re
fallback = user_message[:50].split("\n")[0].strip() or "New Chat"
try:
from bridge.bridge import Bridge
from models.session_manager import Session
bot = Bridge().get_bot("chat")
prompt_parts = [f"User: {user_message[:300]}"]
if assistant_reply:
prompt_parts.append(f"Assistant: {assistant_reply[:300]}")
session = Session("__title_gen__", system_prompt="")
session.messages = [
{"role": "user", "content": (
"Generate a very short title (max 15 characters for Chinese, max 6 words for English) "
"summarizing this conversation. Return ONLY the title text, nothing else.\n\n"
+ "\n".join(prompt_parts)
)}
]
result = bot.reply_text(session)
raw = (result.get("content") or "").strip()
# Strip <think>...</think> reasoning blocks
title = re.sub(r'<think>.*?</think>', '', raw, flags=re.DOTALL).strip().strip('"\'')
logger.info(f"[WebChannel] Title generation result: '{title}' (len={len(title)})")
if title and len(title) <= 50:
return title
except Exception as e:
logger.warning(f"[WebChannel] Title generation failed: {e}")
return fallback
"""Delegate to the shared SessionService implementation."""
from agent.chat.session_service import generate_session_title
return generate_session_title(user_message, assistant_reply)
class WebMessage(ChatMessage):
@@ -238,9 +208,24 @@ class WebChannel(ChatChannel):
# Fallback: polling mode
if session_id in self.session_queues:
content = reply.content if reply.content is not None else ""
# Skip file:// IMAGE_URL/FILE replies originating from an SSE-enabled
# request: they were already pushed via the `file_to_send` event during
# agent execution. By the time the chat_channel sends the IMAGE_URL reply,
# the SSE stream has typically closed (after the text "done") and the
# request_id is gone from sse_queues, so we'd otherwise duplicate the file
# as a polling bubble. Scheduler/push tasks have no on_event and must
# still go through polling normally.
if (
reply.type in (ReplyType.IMAGE_URL, ReplyType.FILE)
and content.startswith("file://")
and context.get("on_event") is not None
):
logger.debug(f"Polling skipped duplicate file reply for session {session_id}")
return
response_data = {
"type": str(reply.type),
"content": reply.content,
"content": content,
"timestamp": time.time(),
"request_id": request_id
}
@@ -255,6 +240,17 @@ class WebChannel(ChatChannel):
def _make_sse_callback(self, request_id: str):
"""Build an on_event callback that pushes agent stream events into the SSE queue."""
# Cap the reasoning characters pushed to the frontend per request to avoid
# browser stalls / crashes on very long chains-of-thought. Anything
# beyond the cap is dropped from the stream (DB still persists a
# truncated copy via _truncate_reasoning_for_storage).
# Keep aligned with frontend REASONING_RENDER_CAP and backend
# MAX_STORED_REASONING_CHARS.
MAX_REASONING_STREAM_CHARS = 4 * 1024 # 4096 characters
# Use a single-element list as a mutable counter accessible from closure.
reasoning_chars_sent = [0]
reasoning_capped_notified = [False]
def on_event(event: dict):
if request_id not in self.sse_queues:
return
@@ -264,8 +260,21 @@ class WebChannel(ChatChannel):
if event_type == "reasoning_update":
delta = data.get("delta", "")
if delta:
q.put({"type": "reasoning", "content": delta})
if not delta:
return
remaining = MAX_REASONING_STREAM_CHARS - reasoning_chars_sent[0]
if remaining <= 0:
if not reasoning_capped_notified[0]:
reasoning_capped_notified[0] = True
q.put({
"type": "reasoning",
"content": "\n\n... [reasoning truncated for display] ...",
})
return
if len(delta) > remaining:
delta = delta[:remaining]
reasoning_chars_sent[0] += len(delta)
q.put({"type": "reasoning", "content": delta})
elif event_type == "message_update":
delta = data.get("delta", "")
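The reasoning cap above can be tried in isolation. This is a minimal sketch, not the project's code: `make_reasoning_forwarder` and `drain` are hypothetical names, and `nonlocal` is used in place of the single-element-list counters in the diff. The behavior is intended to match: deltas are forwarded until the cap is reached, the last delta is trimmed to fit, and one truncation notice is pushed afterwards.

```python
import queue

MAX_REASONING_STREAM_CHARS = 4 * 1024  # same 4 KB cap as in the diff
TRUNCATION_NOTICE = "\n\n... [reasoning truncated for display] ..."


def make_reasoning_forwarder(q, cap=MAX_REASONING_STREAM_CHARS):
    """Return a callback that forwards reasoning deltas into q, up to cap characters."""
    sent = 0          # characters forwarded so far
    notified = False  # whether the truncation notice has been pushed

    def forward(delta):
        nonlocal sent, notified
        if not delta:
            return
        remaining = cap - sent
        if remaining <= 0:
            # Cap already reached: push the notice once, then drop everything.
            if not notified:
                notified = True
                q.put({"type": "reasoning", "content": TRUNCATION_NOTICE})
            return
        chunk = delta[:remaining]  # trim the delta that crosses the cap
        sent += len(chunk)
        q.put({"type": "reasoning", "content": chunk})

    return forward


def drain(q):
    """Collect everything currently in the queue (helper for the example)."""
    items = []
    while not q.empty():
        items.append(q.get_nowait())
    return items
```

With `cap=5`, feeding `"abc"`, `"defg"`, `"x"` forwards `"abc"`, then `"de"`, then the notice; later deltas are dropped silently.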
@@ -299,6 +308,25 @@ class WebChannel(ChatChannel):
if tool_calls:
q.put({"type": "message_end", "has_tool_calls": True})
elif event_type == "agent_end":
# Safety net: if the agent finishes with an empty final_response,
# chat_channel skips _send_reply (because reply.content is empty),
# which means no "done" event is ever emitted and the SSE stream
# would hang until the 10-min idle timeout. Push a fallback "done"
# here so the frontend always gets closure.
final_response = data.get("final_response", "")
if not final_response or not str(final_response).strip():
logger.warning(
f"[WebChannel] agent_end with empty final_response for "
f"request {request_id}, sending fallback done"
)
q.put({
"type": "done",
"content": "(模型未返回任何内容,请重试或换一种方式描述你的需求)",
"request_id": request_id,
"timestamp": time.time(),
})
elif event_type == "file_to_send":
file_path = data.get("path", "")
file_name = data.get("file_name", os.path.basename(file_path))
@@ -547,6 +575,7 @@ class WebChannel(ChatChannel):
'/config', 'ConfigHandler',
'/api/channels', 'ChannelsHandler',
'/api/weixin/qrlogin', 'WeixinQrHandler',
'/api/feishu/register', 'FeishuRegisterHandler',
'/api/tools', 'ToolsHandler',
'/api/skills', 'SkillsHandler',
'/api/memory', 'MemoryHandler',
@@ -742,65 +771,60 @@ class ChatHandler:
class ConfigHandler:
_RECOMMENDED_MODELS = [
const.MINIMAX_M2_7, const.MINIMAX_M2_5, const.MINIMAX_M2_1, const.MINIMAX_M2_1_LIGHTNING,
const.GLM_5_TURBO, const.GLM_5, const.GLM_4_7,
const.QWEN36_PLUS, const.QWEN35_PLUS, const.QWEN3_MAX,
const.KIMI_K2_5, const.KIMI_K2,
const.DOUBAO_SEED_2_PRO, const.DOUBAO_SEED_2_CODE,
const.CLAUDE_4_6_SONNET, const.CLAUDE_4_6_OPUS, const.CLAUDE_4_5_SONNET,
const.DEEPSEEK_V4_FLASH, const.DEEPSEEK_V4_PRO, const.DEEPSEEK_CHAT, const.DEEPSEEK_REASONER,
const.MINIMAX_M2_7_HIGHSPEED, const.MINIMAX_M2_7, const.MINIMAX_M2_5, const.MINIMAX_M2_1, const.MINIMAX_M2_1_LIGHTNING,
const.CLAUDE_4_6_SONNET, const.CLAUDE_4_7_OPUS, const.CLAUDE_4_6_OPUS, const.CLAUDE_4_5_SONNET,
const.GEMINI_31_FLASH_LITE_PRE, const.GEMINI_31_PRO_PRE, const.GEMINI_3_FLASH_PRE,
const.GPT_54, const.GPT_54_MINI, const.GPT_54_NANO, const.GPT_5, const.GPT_41, const.GPT_4o,
const.DEEPSEEK_CHAT, const.DEEPSEEK_REASONER,
const.GLM_5_1, const.GLM_5_TURBO, const.GLM_5, const.GLM_4_7,
const.QWEN36_PLUS, const.QWEN35_PLUS, const.QWEN3_MAX,
const.DOUBAO_SEED_2_PRO, const.DOUBAO_SEED_2_CODE,
const.KIMI_K2_6, const.KIMI_K2_5, const.KIMI_K2,
const.ERNIE_5, const.ERNIE_X1_1, const.ERNIE_45_TURBO_128K, const.ERNIE_45_TURBO_32K,
]
# Generic placeholder hints surfaced in the web console. We deliberately
# show the version-path tail (e.g. "/v1") so users are reminded to type
# the full base URL. The form is intentionally vague (`...../v1`) so it
# never looks like a real default a user might paste verbatim — and we
# never auto-rewrite anything on the server side.
_PLACEHOLDER_V1 = "https://...../v1"
_PLACEHOLDER_QIANFAN = "https://...../v2"
_PLACEHOLDER_ZHIPU = "https://...../api/paas/v4"
_PLACEHOLDER_DOUBAO = "https://...../api/v3"
_PLACEHOLDER_GEMINI = "https://....."
PROVIDER_MODELS = OrderedDict([
("deepseek", {
"label": "DeepSeek",
"api_key_field": "deepseek_api_key",
"api_base_key": "deepseek_api_base",
"api_base_default": "https://api.deepseek.com/v1",
"api_base_placeholder": _PLACEHOLDER_V1,
"models": [const.DEEPSEEK_V4_FLASH, const.DEEPSEEK_V4_PRO, const.DEEPSEEK_CHAT, const.DEEPSEEK_REASONER],
}),
("minimax", {
"label": "MiniMax",
"api_key_field": "minimax_api_key",
"api_base_key": None,
"api_base_default": None,
"api_base_placeholder": "",
"models": [const.MINIMAX_M2_7, const.MINIMAX_M2_7_HIGHSPEED, const.MINIMAX_M2_5, const.MINIMAX_M2_1, const.MINIMAX_M2_1_LIGHTNING],
}),
("zhipu", {
"label": "智谱AI",
"api_key_field": "zhipu_ai_api_key",
"api_base_key": "zhipu_ai_api_base",
"api_base_default": "https://open.bigmodel.cn/api/paas/v4",
"models": [const.GLM_5_TURBO, const.GLM_5, const.GLM_4_7],
}),
("dashscope", {
"label": "通义千问",
"api_key_field": "dashscope_api_key",
"api_base_key": None,
"api_base_default": None,
"models": [const.QWEN36_PLUS, const.QWEN35_PLUS, const.QWEN3_MAX],
}),
("moonshot", {
"label": "Kimi",
"api_key_field": "moonshot_api_key",
"api_base_key": "moonshot_base_url",
"api_base_default": "https://api.moonshot.cn/v1",
"models": [const.KIMI_K2_5, const.KIMI_K2],
}),
("doubao", {
"label": "豆包",
"api_key_field": "ark_api_key",
"api_base_key": "ark_base_url",
"api_base_default": "https://ark.cn-beijing.volces.com/api/v3",
"models": [const.DOUBAO_SEED_2_PRO, const.DOUBAO_SEED_2_CODE],
}),
("claudeAPI", {
"label": "Claude",
"api_key_field": "claude_api_key",
"api_base_key": "claude_api_base",
"api_base_default": "https://api.anthropic.com/v1",
"models": [const.CLAUDE_4_6_SONNET, const.CLAUDE_4_6_OPUS, const.CLAUDE_4_5_SONNET],
"api_base_placeholder": _PLACEHOLDER_V1,
"models": [const.CLAUDE_4_6_SONNET, const.CLAUDE_4_7_OPUS, const.CLAUDE_4_6_OPUS, const.CLAUDE_4_5_SONNET],
}),
("gemini", {
"label": "Gemini",
"api_key_field": "gemini_api_key",
"api_base_key": "gemini_api_base",
"api_base_default": "https://generativelanguage.googleapis.com",
"api_base_placeholder": _PLACEHOLDER_GEMINI,
"models": [const.GEMINI_31_FLASH_LITE_PRE, const.GEMINI_31_PRO_PRE, const.GEMINI_3_FLASH_PRE],
}),
("openai", {
@@ -808,20 +832,55 @@ class ConfigHandler:
"api_key_field": "open_ai_api_key",
"api_base_key": "open_ai_api_base",
"api_base_default": "https://api.openai.com/v1",
"api_base_placeholder": _PLACEHOLDER_V1,
"models": [const.GPT_54, const.GPT_54_MINI, const.GPT_54_NANO, const.GPT_5, const.GPT_41, const.GPT_4o],
}),
("deepseek", {
"label": "DeepSeek",
"api_key_field": "deepseek_api_key",
"api_base_key": "deepseek_api_base",
"api_base_default": "https://api.deepseek.com/v1",
"models": [const.DEEPSEEK_CHAT, const.DEEPSEEK_REASONER],
("zhipu", {
"label": "智谱AI",
"api_key_field": "zhipu_ai_api_key",
"api_base_key": "zhipu_ai_api_base",
"api_base_default": "https://open.bigmodel.cn/api/paas/v4",
"api_base_placeholder": _PLACEHOLDER_ZHIPU,
"models": [const.GLM_5_1, const.GLM_5_TURBO, const.GLM_5, const.GLM_4_7],
}),
("dashscope", {
"label": "通义千问",
"api_key_field": "dashscope_api_key",
"api_base_key": None,
"api_base_default": None,
"api_base_placeholder": "",
"models": [const.QWEN36_PLUS, const.QWEN35_PLUS, const.QWEN3_MAX],
}),
("doubao", {
"label": "豆包",
"api_key_field": "ark_api_key",
"api_base_key": "ark_base_url",
"api_base_default": "https://ark.cn-beijing.volces.com/api/v3",
"api_base_placeholder": _PLACEHOLDER_DOUBAO,
"models": [const.DOUBAO_SEED_2_PRO, const.DOUBAO_SEED_2_CODE],
}),
("moonshot", {
"label": "Kimi",
"api_key_field": "moonshot_api_key",
"api_base_key": "moonshot_base_url",
"api_base_default": "https://api.moonshot.cn/v1",
"api_base_placeholder": _PLACEHOLDER_V1,
"models": [const.KIMI_K2_6, const.KIMI_K2_5, const.KIMI_K2],
}),
("qianfan", {
"label": "百度千帆",
"api_key_field": "qianfan_api_key",
"api_base_key": "qianfan_api_base",
"api_base_default": "https://qianfan.baidubce.com/v2",
"api_base_placeholder": _PLACEHOLDER_QIANFAN,
"models": [const.ERNIE_5, const.ERNIE_X1_1, const.ERNIE_45_TURBO_128K, const.ERNIE_45_TURBO_32K],
}),
("modelscope", {
"label": "ModelScope",
"api_key_field": "modelscope_api_key",
"api_base_key": None,
"api_base_default": None,
"api_base_placeholder": "",
"models": [const.QWEN3_5_27B, const.QWEN3_235B_A22B_INSTRUCT_2507],
}),
("linkai", {
@@ -829,17 +888,26 @@ class ConfigHandler:
"api_key_field": "linkai_api_key",
"api_base_key": None,
"api_base_default": None,
"api_base_placeholder": "",
"models": _RECOMMENDED_MODELS,
}),
("custom", {
"label": "自定义",
"api_key_field": "custom_api_key",
"api_base_key": "custom_api_base",
"api_base_default": "",
"api_base_placeholder": _PLACEHOLDER_V1,
"models": [],
}),
])
EDITABLE_KEYS = {
"model", "bot_type", "use_linkai",
"open_ai_api_base", "deepseek_api_base", "claude_api_base", "gemini_api_base",
"zhipu_ai_api_base", "moonshot_base_url", "ark_base_url",
"open_ai_api_key", "deepseek_api_key", "claude_api_key", "gemini_api_key",
"open_ai_api_base", "deepseek_api_base", "qianfan_api_base", "claude_api_base", "gemini_api_base",
"zhipu_ai_api_base", "moonshot_base_url", "ark_base_url", "custom_api_base",
"open_ai_api_key", "deepseek_api_key", "qianfan_api_key", "claude_api_key", "gemini_api_key",
"zhipu_ai_api_key", "dashscope_api_key", "moonshot_api_key",
"ark_api_key", "minimax_api_key", "linkai_api_key",
"ark_api_key", "minimax_api_key", "linkai_api_key", "custom_api_key",
"agent_max_context_tokens", "agent_max_context_turns", "agent_max_steps",
"enable_thinking", "web_password",
}
@@ -877,6 +945,7 @@ class ConfigHandler:
"models": p["models"],
"api_base_key": p["api_base_key"],
"api_base_default": p["api_base_default"],
"api_base_placeholder": p.get("api_base_placeholder", ""),
"api_key_field": p.get("api_key_field"),
}
@@ -894,7 +963,7 @@ class ConfigHandler:
"agent_max_context_tokens": local_config.get("agent_max_context_tokens", 50000),
"agent_max_context_turns": local_config.get("agent_max_context_turns", 20),
"agent_max_steps": local_config.get("agent_max_steps", 20),
"enable_thinking": bool(local_config.get("enable_thinking", True)),
"enable_thinking": bool(local_config.get("enable_thinking", False)),
"api_bases": api_bases,
"api_keys": api_keys_masked,
"providers": providers,
@@ -940,6 +1009,19 @@ class ConfigHandler:
json.dump(file_cfg, f, indent=4, ensure_ascii=False)
logger.info(f"[WebChannel] Config updated: {list(applied.keys())}")
# Reset Bridge so that bot routing reflects the new config.
# Without this, Bridge keeps its cached bot instance (e.g. LinkAIBot)
# even after the user switches bot_type / use_linkai / model in UI.
bridge_routing_keys = {"bot_type", "use_linkai", "model"}
if any(k in applied for k in bridge_routing_keys):
try:
from bridge.bridge import Bridge
Bridge().reset_bot()
logger.info("[WebChannel] Bridge bot routing reset due to config change")
except Exception as reset_err:
logger.warning(f"[WebChannel] Failed to reset bridge: {reset_err}")
return json.dumps({"status": "success", "applied": applied}, ensure_ascii=False)
except Exception as e:
logger.error(f"Error updating config: {e}")
@@ -963,8 +1045,6 @@ class ChannelsHandler:
"fields": [
{"key": "feishu_app_id", "label": "App ID", "type": "text"},
{"key": "feishu_app_secret", "label": "App Secret", "type": "secret"},
{"key": "feishu_token", "label": "Verification Token", "type": "secret"},
{"key": "feishu_bot_name", "label": "Bot Name", "type": "text"},
],
}),
("dingtalk", {
@@ -1459,6 +1539,174 @@ class WeixinQrHandler:
return json.dumps({"status": "success", "qr_status": qr_status})
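class FeishuRegisterHandler:
"""One-click creation of a Feishu agent app (OAuth device authorization flow, based on the lark.register_app SDK).
GET /api/feishu/register → start registration: call the SDK to generate the QR code URL and return immediately;
a background thread keeps polling Feishu until the user scans the code and authorizes.
POST /api/feishu/register → poll the current session status (pending / done / error / expired).
On success the config is not written directly; the frontend then calls
/api/channels {action:'connect'} to go through the standard enable flow.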
"""
# Process-wide singleton state ({url, expire_in, status, app_id, app_secret, error, thread}).
# A simple local self-hosted deployment does not need per-session isolation.
_state = {}
_lock = threading.Lock()
@staticmethod
def _qr_to_data_uri(data: str) -> str:
"""Reuse WeixinQrHandler's QR code rendering."""
return WeixinQrHandler._qr_to_data_uri(data)
@classmethod
def _reset_state(cls):
with cls._lock:
cls._state = {}
@classmethod
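def _start_register_thread(cls):
"""Start a new registration session. If one is already in progress, cancel it first (via cancel_event)."""
# Cancel any previous session first, so two SDK threads never poll the same endpoint concurrently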
with cls._lock:
old_cancel = cls._state.get("cancel_event") if cls._state else None
if old_cancel is not None:
old_cancel.set()
cancel_event = threading.Event()
cls._state = {"status": "starting", "cancel_event": cancel_event}
def _worker():
try:
import lark_oapi as lark
except ImportError:
with cls._lock:
cls._state["status"] = "error"
cls._state["error"] = "lark-oapi SDK 未安装,请执行 pip install -U lark-oapi"
return
def _on_qr(info):
# The SDK calls back as soon as it has the QR code URL; write it to state so the frontend GET can pick it up right away
with cls._lock:
cls._state["url"] = info.get("url", "")
cls._state["expire_in"] = info.get("expire_in", 600)
cls._state["qr_image"] = cls._qr_to_data_uri(info.get("url", ""))
cls._state["status"] = "pending"
logger.info(f"[FeishuRegister] QR ready, expire_in={info.get('expire_in')}s")
def _on_status(info):
# Filter out polling heartbeats (one every 5 seconds, pure noise);
# keep real state transitions such as slow_down / domain_switched
status = info.get("status")
if status == "polling":
return
logger.info(f"[FeishuRegister] SDK status: {info}")
try:
result = lark.register_app(
on_qr_code=_on_qr,
on_status_change=_on_status,
source="cowagent",
cancel_event=cancel_event,
)
with cls._lock:
cls._state["status"] = "done"
cls._state["app_id"] = result.get("client_id", "")
cls._state["app_secret"] = result.get("client_secret", "")
logger.info(f"[FeishuRegister] App created: app_id={result.get('client_id')}")
except Exception as e:
err_msg = str(e)
err_cls = e.__class__.__name__
# Raised by the Feishu SDK: AppExpiredError / AppAccessDeniedError / RegisterAppError
if "Expired" in err_cls:
status = "expired"
elif "Denied" in err_cls:
status = "denied"
elif "abort" in err_msg.lower() or "cancel" in err_msg.lower():
# Preempted by a newer registration round; stay quiet
return
else:
status = "error"
with cls._lock:
# Write only if the current state still belongs to this worker, so a newer session is not overwritten
if cls._state.get("cancel_event") is cancel_event:
cls._state["status"] = status
cls._state["error"] = err_msg
logger.warning(f"[FeishuRegister] Register failed ({err_cls}): {err_msg}")
threading.Thread(target=_worker, daemon=True, name="feishu-register").start()
def GET(self):
"""Start a new registration session, overwriting any existing pending/done session."""
_require_auth()
web.header('Content-Type', 'application/json; charset=utf-8')
try:
self._start_register_thread()
# Wait up to 10s for the SDK to obtain the QR code URL. The SDK calls _on_qr right away.
import time as _t
for _ in range(100):
with self._lock:
if self._state.get("url") or self._state.get("status") in ("error", "expired", "denied"):
break
_t.sleep(0.1)
with self._lock:
if self._state.get("status") in ("error", "expired", "denied"):
return json.dumps({
"status": "error",
"message": self._state.get("error", "register failed"),
})
if not self._state.get("url"):
return json.dumps({
"status": "error",
"message": "等待飞书二维码超时,请重试",
})
return json.dumps({
"status": "success",
"qrcode_url": self._state["url"],
"qr_image": self._state.get("qr_image", ""),
"expire_in": self._state.get("expire_in", 600),
})
except Exception as e:
logger.error(f"[WebChannel] FeishuRegister GET error: {e}")
return json.dumps({"status": "error", "message": str(e)})
def POST(self):
"""Poll the registration result."""
_require_auth()
web.header('Content-Type', 'application/json; charset=utf-8')
try:
body = json.loads(web.data() or b"{}")
action = body.get("action", "poll")
if action != "poll":
return json.dumps({"status": "error", "message": f"unknown action: {action}"})
with self._lock:
status = self._state.get("status", "idle")
if status == "done":
payload = {
"status": "success",
"register_status": "done",
"app_id": self._state.get("app_id", ""),
"app_secret": self._state.get("app_secret", ""),
}
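# Clear the credentials after returning them once, so secrets do not linger in memory.
# Assign on the class: `self._state = {}` would only create an instance attribute
# and leave the shared class-level state (with the credentials) untouched.
FeishuRegisterHandler._state = {}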
return json.dumps(payload)
if status in ("error", "expired", "denied"):
return json.dumps({
"status": "success",
"register_status": status,
"message": self._state.get("error", ""),
})
# pending / starting: still waiting for the user to scan the QR code
return json.dumps({
"status": "success",
"register_status": "pending",
})
except Exception as e:
logger.error(f"[WebChannel] FeishuRegister POST error: {e}")
return json.dumps({"status": "error", "message": str(e)})
def _get_workspace_root():
"""Resolve the agent workspace directory."""
from common.utils import expand_path
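The preemption pattern used by `_start_register_thread` (cancel the previous session, then let a worker write its result only while it still owns the state) can be sketched on its own. This is a simplified illustration under stated assumptions: `Registrar` and the `work` callable are hypothetical, `work` stands in for the blocking SDK call, and the ownership check is applied to every result here, whereas the diff applies it on the error path.

```python
import threading


class Registrar:
    """Hypothetical single-session registry guarded by a class-level lock."""
    _state = {}
    _lock = threading.Lock()

    @classmethod
    def start(cls, work):
        """Start a session; `work(cancel_event)` returns the final status string."""
        with cls._lock:
            old = cls._state.get("cancel_event")
            if old is not None:
                old.set()  # ask the previous worker to stop
            cancel_event = threading.Event()
            cls._state = {"status": "starting", "cancel_event": cancel_event}

        def _worker():
            status = work(cancel_event)
            with cls._lock:
                # Only the worker that still owns the state may publish its result.
                if cls._state.get("cancel_event") is cancel_event:
                    cls._state["status"] = status

        t = threading.Thread(target=_worker, daemon=True)
        t.start()
        return t
```

Comparing `cancel_event` objects by identity gives each session a token, so a late result from a cancelled worker cannot overwrite a newer session.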

View File

@@ -1 +1 @@
2.0.6
2.0.8

View File

@@ -644,32 +644,52 @@ def _list_local():
skills_dir = get_skills_dir()
builtin_dir = get_builtin_skills_dir()
# Merge builtin skills that are on disk but missing from config
_merge_builtin_into_config(config, builtin_dir, skills_dir)
if not config:
# Fallback: scan directories directly
entries = []
for d in [builtin_dir, skills_dir]:
if not os.path.isdir(d):
continue
source = "builtin" if d == builtin_dir else "custom"
for name in sorted(os.listdir(d)):
skill_path = os.path.join(d, name)
if os.path.isdir(skill_path) and not name.startswith("."):
has_skill_md = os.path.exists(os.path.join(skill_path, "SKILL.md"))
if has_skill_md:
entries.append({"name": name, "source": source, "enabled": True, "description": ""})
if not entries:
click.echo("No skills installed.")
return
_print_skill_table(entries)
click.echo("No skills installed.")
return
entries = sorted(config.values(), key=lambda x: x.get("name", ""))
if not entries:
click.echo("No skills installed.")
return
_print_skill_table(entries)
def _merge_builtin_into_config(config: dict, builtin_dir: str, skills_dir: str):
"""Scan builtin and custom dirs, add any new skills into config dict."""
dirty = False
for d, source in [(builtin_dir, "builtin"), (skills_dir, "custom")]:
if not os.path.isdir(d):
continue
for name in os.listdir(d):
if name.startswith(".") or name in ("skills_config.json",):
continue
skill_path = os.path.join(d, name)
if not os.path.isdir(skill_path):
continue
if not os.path.isfile(os.path.join(skill_path, "SKILL.md")):
continue
if name in config:
continue
desc = _read_skill_description(skill_path)
config[name] = {
"name": name,
"description": desc,
"source": source,
"enabled": True,
"category": "skill",
}
dirty = True
if dirty:
config_path = os.path.join(skills_dir, "skills_config.json")
try:
os.makedirs(skills_dir, exist_ok=True)
with open(config_path, "w", encoding="utf-8") as f:
json.dump(config, f, indent=4, ensure_ascii=False)
except Exception:
pass
def _print_skill_table(entries):
"""Print skills as a formatted table."""
def _display_label(e):

View File

@@ -56,6 +56,7 @@ class CloudClient(LinkAIClient):
self._memory_service = None
self._knowledge_service = None
self._chat_service = None
self._session_service = None
@property
def skill_service(self):
@@ -118,6 +119,18 @@ class CloudClient(LinkAIClient):
logger.error(f"[CloudClient] Failed to init ChatService: {e}")
return self._chat_service
@property
def session_service(self):
"""Lazy-init SessionService."""
if self._session_service is None:
try:
from agent.chat.session_service import SessionService
self._session_service = SessionService()
logger.debug("[CloudClient] SessionService initialised")
except Exception as e:
logger.error(f"[CloudClient] Failed to init SessionService: {e}")
return self._session_service
# ------------------------------------------------------------------
# message push callback
# ------------------------------------------------------------------
@@ -546,12 +559,23 @@ class CloudClient(LinkAIClient):
# ------------------------------------------------------------------
# history callback
# ------------------------------------------------------------------
# Session-related actions handled via the HISTORY channel
_SESSION_ACTIONS = {
"list_sessions", "delete_session", "rename_session",
"clear_context", "generate_title",
}
def on_history(self, data: dict) -> dict:
"""
Handle HISTORY messages from the cloud console.
Returns paginated conversation history for a session.
:param data: message data with 'action' and 'payload' (session_id, page, page_size)
Supports both history query and session management actions
through a unified HISTORY message channel:
- query: paginated conversation history
- list_sessions / delete_session / rename_session /
clear_context / generate_title: session lifecycle
:param data: message data with 'action' and 'payload'
:return: response dict
"""
action = data.get("action", "query")
@@ -561,8 +585,19 @@ class CloudClient(LinkAIClient):
if action == "query":
return self._query_history(payload)
if action in self._SESSION_ACTIONS:
return self._dispatch_session(action, payload)
return {"action": action, "code": 404, "message": f"unknown action: {action}", "payload": None}
def _dispatch_session(self, action: str, payload: dict) -> dict:
"""Delegate session actions to SessionService."""
svc = self.session_service
if svc is None:
return {"action": action, "code": 500,
"message": "SessionService not available", "payload": None}
return svc.dispatch(action, payload)
def _query_history(self, payload: dict) -> dict:
"""Query paginated conversation history using ConversationStore."""
session_id = payload.get("session_id", "")
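The routing added to `on_history` can be sketched as a standalone function. This is an illustration only: `StubSessionService` is a hypothetical stand-in for `SessionService`, its response shape is assumed, and the way `payload` is extracted from `data` is also an assumption because that line is not shown in the diff.

```python
SESSION_ACTIONS = {
    "list_sessions", "delete_session", "rename_session",
    "clear_context", "generate_title",
}


class StubSessionService:
    """Hypothetical stand-in for SessionService; it only echoes the request."""

    def dispatch(self, action, payload):
        return {"action": action, "code": 200, "message": "success", "payload": payload}


def on_history(data, session_service, query_history):
    """Route a HISTORY message to the history query or a session action."""
    action = data.get("action", "query")
    payload = data.get("payload") or {}  # assumed extraction, not shown in the diff
    if action == "query":
        return query_history(payload)
    if action in SESSION_ACTIONS:
        if session_service is None:
            return {"action": action, "code": 500,
                    "message": "SessionService not available", "payload": None}
        return session_service.dispatch(action, payload)
    return {"action": action, "code": 404,
            "message": f"unknown action: {action}", "payload": None}
```

Keeping the allowed actions in one set means a new session action needs an entry there and a handler in the service, with no change to the routing itself.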

View File

@@ -3,6 +3,7 @@ OPEN_AI = "openAI"
OPENAI = "openai"
CHATGPT = "chatGPT" # legacy alias for OPENAI, kept for backward compatibility
BAIDU = "baidu"
QIANFAN = "qianfan"
XUNFEI = "xunfei"
CHATGPTONAZURE = "chatGPTOnAzure"
LINKAI = "linkai"
@@ -14,6 +15,7 @@ ZHIPU_AI = "zhipu"
MOONSHOT = "moonshot"
MiniMax = "minimax"
DEEPSEEK = "deepseek"
CUSTOM = "custom" # custom OpenAI-compatible API, bot_type won't auto-switch on model change
MODELSCOPE = "modelscope"
# Model list
@@ -27,6 +29,7 @@ CLAUDE_35_SONNET = "claude-3-5-sonnet-latest" # 带 latest 标签的模型名
CLAUDE_35_SONNET_1022 = "claude-3-5-sonnet-20241022" # Dated model name; pinned to the model released on that date
CLAUDE_35_SONNET_0620 = "claude-3-5-sonnet-20240620"
CLAUDE_4_OPUS = "claude-opus-4-0"
CLAUDE_4_7_OPUS = "claude-opus-4-7" # Claude Opus 4.7
CLAUDE_4_6_OPUS = "claude-opus-4-6" # Claude Opus 4.6 - Agent recommended model
CLAUDE_4_SONNET = "claude-sonnet-4-0" # Claude Sonnet 4.0
CLAUDE_4_5_SONNET = "claude-sonnet-4-5" # Claude Sonnet 4.5 - Agent recommended model
@@ -80,6 +83,17 @@ TTS_1_HD = "tts-1-hd"
# DeepSeek
DEEPSEEK_CHAT = "deepseek-chat" # DeepSeek-V3 chat model
DEEPSEEK_REASONER = "deepseek-reasoner" # DeepSeek-R1 model
DEEPSEEK_V4_FLASH = "deepseek-v4-flash" # DeepSeek V4 Flash - default recommendation (thinking mode + tool calls)
DEEPSEEK_V4_PRO = "deepseek-v4-pro" # DeepSeek V4 Pro - stronger on complex tasks (thinking mode + tool calls)
# Baidu Qianfan / ERNIE
ERNIE_5 = "ernie-5.0" # ERNIE 5.0 - default recommendation
ERNIE_X1_1 = "ernie-x1.1" # ERNIE X1.1 - reasoning-focused, multimodal
ERNIE_45_TURBO_128K = "ernie-4.5-turbo-128k"
ERNIE_45_TURBO_32K = "ernie-4.5-turbo-32k"
ERNIE_4_TURBO_8K = "ERNIE-4.0-Turbo-8K"
ERNIE_45_TURBO_VL = "ernie-4.5-turbo-vl"
ERNIE_45_TURBO_VL_32K = "ernie-4.5-turbo-vl-32k"
# Qwen (通义千问 - 阿里云 DashScope)
QWEN_TURBO = "qwen-turbo"
@@ -101,7 +115,8 @@ MINIMAX_M2 = "MiniMax-M2" # MiniMax M2
MINIMAX_ABAB6_5 = "abab6.5-chat" # MiniMax abab6.5
# GLM (智谱AI)
GLM_5_TURBO = "glm-5-turbo" # 智谱 GLM-5-Turbo - Latest
GLM_5_1 = "glm-5.1" # 智谱 GLM-5.1 - Agent recommended model (default)
GLM_5_TURBO = "glm-5-turbo" # 智谱 GLM-5-Turbo
GLM_5 = "glm-5" # 智谱 GLM-5
GLM_4 = "glm-4"
GLM_4_PLUS = "glm-4-plus"
@@ -117,6 +132,7 @@ GLM_4_7 = "glm-4.7" # 智谱 GLM-4.7 - Agent推荐模型
MOONSHOT = "moonshot"
KIMI_K2 = "kimi-k2"
KIMI_K2_5 = "kimi-k2.5"
KIMI_K2_6 = "kimi-k2.6" # Kimi K2.6 - Agent recommended model (default)
# Doubao (Volcengine Ark)
DOUBAO = "doubao"
@@ -150,15 +166,25 @@ MODELSCOPE_MODEL_LIST = ["deepseek-ai/DeepSeek-R1-0528", "deepseek-ai/DeepSeek-R
MODEL_LIST = [
# DeepSeek
DEEPSEEK_V4_FLASH, DEEPSEEK_V4_PRO, DEEPSEEK_CHAT, DEEPSEEK_REASONER,
# Baidu Qianfan / ERNIE
QIANFAN, ERNIE_5, ERNIE_X1_1, ERNIE_45_TURBO_128K, ERNIE_45_TURBO_32K, ERNIE_4_TURBO_8K,
ERNIE_45_TURBO_VL, ERNIE_45_TURBO_VL_32K,
# MiniMax
MiniMax, MINIMAX_M2_7, MINIMAX_M2_7_HIGHSPEED, MINIMAX_M2_5, MINIMAX_M2_1, MINIMAX_M2_1_LIGHTNING, MINIMAX_M2, MINIMAX_ABAB6_5,
# Claude
CLAUDE3, CLAUDE_4_6_SONNET, CLAUDE_4_6_OPUS, CLAUDE_4_OPUS, CLAUDE_4_5_SONNET, CLAUDE_4_SONNET, CLAUDE_3_OPUS, CLAUDE_3_OPUS_0229,
CLAUDE_35_SONNET, CLAUDE_35_SONNET_1022, CLAUDE_35_SONNET_0620, CLAUDE_3_SONNET, CLAUDE_3_HAIKU,
CLAUDE3, CLAUDE_4_6_SONNET, CLAUDE_4_7_OPUS, CLAUDE_4_6_OPUS, CLAUDE_4_OPUS, CLAUDE_4_5_SONNET, CLAUDE_4_SONNET, CLAUDE_3_OPUS, CLAUDE_3_OPUS_0229,
CLAUDE_35_SONNET, CLAUDE_35_SONNET_1022, CLAUDE_35_SONNET_0620, CLAUDE_3_SONNET, CLAUDE_3_HAIKU,
"claude", "claude-3-haiku", "claude-3-sonnet", "claude-3-opus", "claude-3.5-sonnet",
# Gemini
GEMINI_31_FLASH_LITE_PRE, GEMINI_31_PRO_PRE, GEMINI_3_PRO_PRE, GEMINI_3_FLASH_PRE, GEMINI_25_PRO_PRE, GEMINI_25_FLASH_PRE,
GEMINI_20_FLASH, GEMINI_20_flash_exp, GEMINI_15_PRO, GEMINI_15_flash, GEMINI_PRO, GEMINI,
# OpenAI
GPT35, GPT35_0125, GPT35_1106, "gpt-3.5-turbo-16k",
GPT4, GPT4_06_13, GPT4_32k, GPT4_32k_06_13,
@@ -168,31 +194,29 @@ MODEL_LIST = [
GPT_5, GPT_5_MINI, GPT_5_NANO,
GPT_54, GPT_54_MINI, GPT_54_NANO,
O1, O1_MINI,
# DeepSeek
DEEPSEEK_CHAT, DEEPSEEK_REASONER,
# Qwen
QWEN36_PLUS, QWEN35_PLUS, QWEN3_MAX, QWEN_MAX, QWEN_PLUS, QWEN_TURBO, QWEN_LONG,
# MiniMax
MiniMax, MINIMAX_M2_7, MINIMAX_M2_7_HIGHSPEED, MINIMAX_M2_5, MINIMAX_M2_1, MINIMAX_M2_1_LIGHTNING, MINIMAX_M2, MINIMAX_ABAB6_5,
# GLM
ZHIPU_AI, GLM_5_TURBO, GLM_5, GLM_4, GLM_4_PLUS, GLM_4_flash, GLM_4_LONG, GLM_4_ALLTOOLS,
# GLM (智谱AI)
ZHIPU_AI, GLM_5_1, GLM_5_TURBO, GLM_5, GLM_4, GLM_4_PLUS, GLM_4_flash, GLM_4_LONG, GLM_4_ALLTOOLS,
GLM_4_0520, GLM_4_AIR, GLM_4_AIRX, GLM_4_7,
# Kimi
MOONSHOT, "moonshot-v1-8k", "moonshot-v1-32k", "moonshot-v1-128k",
KIMI_K2, KIMI_K2_5,
# Qwen (通义千问)
QWEN36_PLUS, QWEN35_PLUS, QWEN3_MAX, QWEN_MAX, QWEN_PLUS, QWEN_TURBO, QWEN_LONG,
# Doubao
# Doubao (豆包)
DOUBAO, DOUBAO_SEED_2_CODE, DOUBAO_SEED_2_PRO, DOUBAO_SEED_2_LITE, DOUBAO_SEED_2_MINI,
# Kimi (Moonshot)
MOONSHOT, "moonshot-v1-8k", "moonshot-v1-32k", "moonshot-v1-128k",
KIMI_K2_6, KIMI_K2_5, KIMI_K2,
# ModelScope
MODELSCOPE,
# LinkAI
LINKAI_35, LINKAI_4_TURBO, LINKAI_4o,
# Other models
WEN_XIN, WEN_XIN_4, XUNFEI,
LINKAI_35, LINKAI_4_TURBO, LINKAI_4o,
MODELSCOPE
]
MODEL_LIST = MODEL_LIST + GITEE_AI_MODEL_LIST + MODELSCOPE_MODEL_LIST

View File

@@ -1,6 +1,10 @@
{
"channel_type": "weixin",
"model": "MiniMax-M2.7",
"model": "deepseek-v4-flash",
"deepseek_api_key": "",
"deepseek_api_base": "https://api.deepseek.com/v1",
"qianfan_api_key": "",
"qianfan_api_base": "https://qianfan.baidubce.com/v2",
"minimax_api_key": "",
"zhipu_ai_api_key": "",
"ark_api_key": "",
@@ -22,8 +26,9 @@
"linkai_app_code": "",
"feishu_app_id": "",
"feishu_app_secret": "",
"feishu_stream_reply": true,
"dingtalk_client_id": "",
"dingtalk_client_secret":"",
"dingtalk_client_secret": "",
"wecom_bot_id": "",
"wecom_bot_secret": "",
"web_password": "",
@@ -31,5 +36,6 @@
"agent_max_context_tokens": 50000,
"agent_max_context_turns": 20,
"agent_max_steps": 20,
"enable_thinking": false,
"knowledge": true
}

View File

@@ -17,10 +17,12 @@ available_setting = {
"open_ai_api_base": "https://api.openai.com/v1",
"claude_api_base": "https://api.anthropic.com/v1", # claude api base
"gemini_api_base": "https://generativelanguage.googleapis.com", # gemini api base
"custom_api_key": "", # custom OpenAI-compatible provider api key (used when bot_type is "custom")
"custom_api_base": "", # custom OpenAI-compatible provider api base (used when bot_type is "custom")
"proxy": "", # proxy used for OpenAI requests
# ChatGPT model; when use_azure_chatgpt is true, this is the model deployment name on Azure
"model": "gpt-3.5-turbo", # options include gpt-4o, gpt-4o-mini, gpt-4-turbo, claude-3-sonnet, wenxin, moonshot, qwen-turbo, xunfei, glm-4, minimax, gemini; see common/const.py for the full list
"bot_type": "", # optional; set to "openai" when using a third-party OpenAI-compatible service (the legacy value "chatGPT" still works). See common/const.py for bot names; if empty, it is inferred from the model name
"bot_type": "", # optional; set to "openai" or "custom" when using a third-party OpenAI-compatible service; in custom mode, switching models does not auto-switch bot_type. See common/const.py for bot names; if empty, it is inferred from the model name
"use_azure_chatgpt": False, # whether to use Azure ChatGPT
"azure_deployment_id": "", # Azure model deployment name
"azure_api_version": "", # Azure API version
@@ -74,6 +76,9 @@ available_setting = {
"baidu_wenxin_api_key": "", # Baidu api key
"baidu_wenxin_secret_key": "", # Baidu secret key
"baidu_wenxin_prompt_enabled": False, # Enable prompt if you are using ernie character model
# Baidu Qianfan / ERNIE OpenAI-compatible API
"qianfan_api_key": "", # Baidu Qianfan API key in bce-v3 format
"qianfan_api_base": "https://qianfan.baidubce.com/v2", # Qianfan OpenAI-compatible API base
# Xunfei Spark API
"xunfei_app_id": "", # Xunfei app ID
"xunfei_api_key": "", # Xunfei API key
@@ -121,10 +126,13 @@ available_setting = {
"chat_start_time": "00:00", # service start time
"chat_stop_time": "24:00", # service stop time
# Translation API
"translate": "baidu", # translation API; supports baidu
"translate": "baidu", # translation API; supports baidu, youdao
# Baidu translation API config
"baidu_translate_app_id": "", # Baidu translation API app ID
"baidu_translate_app_key": "", # Baidu translation API secret key
# Youdao translation API config
"youdao_translate_app_key": "", # Youdao translation API app ID
"youdao_translate_app_secret": "", # Youdao translation API app secret
# wechatmp config
"wechatmp_token": "", # WeChat Official Account Platform token
"wechatmp_port": 8080, # WeChat Official Account Platform port; must be forwarded to 80 or 443
@@ -140,12 +148,13 @@ available_setting = {
"wechatcomapp_agent_id": "", # WeCom app agent_id
"wechatcomapp_aes_key": "", # WeCom app aes_key
# Feishu config
"feishu_port": 80, # Feishu bot listening port
"feishu_port": 80, # Feishu bot listening port (webhook mode only)
"feishu_app_id": "", # Feishu bot app ID
"feishu_app_secret": "", # Feishu bot app secret
"feishu_token": "", # Feishu verification token
"feishu_bot_name": "", # Feishu bot name
"feishu_token": "", # Feishu verification token (webhook mode only)
"feishu_event_mode": "websocket", # Feishu event receive mode: webhook (HTTP server) or websocket (long-lived connection)
# Feishu streaming replies (based on the official cardkit streaming card API; the bot needs the cardkit:card:write permission and Feishu client 7.20+)
"feishu_stream_reply": True, # enable streaming replies (typewriter effect); on failure or with old clients, falls back automatically to non-streaming or an upgrade prompt
# DingTalk config
"dingtalk_client_id": "", # DingTalk bot Client ID
"dingtalk_client_secret": "", # DingTalk bot Client Secret
@@ -194,6 +203,8 @@ available_setting = {
"minimax_api_key": "",
"Minimax_group_id": "",
"Minimax_base_url": "",
"deepseek_api_key": "",
"deepseek_api_base": "https://api.deepseek.com/v1",
"web_port": 9899,
"web_password": "", # Web console password; empty means no authentication required
"web_session_expire_days": 30, # Auth session expiry in days
@@ -202,8 +213,12 @@ available_setting = {
"agent_max_context_tokens": 50000, # max context tokens in Agent mode
"agent_max_context_turns": 20, # max context memory turns in Agent mode
"agent_max_steps": 20, # max decision steps per run in Agent mode
"enable_thinking": True, # Whether to enable deep thinking for web channel
"enable_thinking": False, # Enable deep-thinking mode for thinking-capable models
"knowledge": True, # whether to enable the knowledge base feature
# Per-skill runtime config. Nested keys are flattened to env vars at startup
# using the rule: skill[<name>][<key>] -> SKILL_<NAME>_<KEY>
# (e.g. skill["image-generation"].model -> SKILL_IMAGE_GENERATION_MODEL).
"skill": {},
}
@@ -220,13 +235,13 @@ class Config(dict):
def __getitem__(self, key):
# Skip comment fields whose keys start with an underscore
if not key.startswith("_") and key not in available_setting:
logger.warning("[Config] key '{}' not in available_setting, may not take effect".format(key))
logger.debug("[Config] key '{}' not in available_setting, may not take effect".format(key))
return super().__getitem__(key)
def __setitem__(self, key, value):
# Skip comment fields whose keys start with an underscore
if not key.startswith("_") and key not in available_setting:
logger.warning("[Config] key '{}' not in available_setting, may not take effect".format(key))
logger.debug("[Config] key '{}' not in available_setting, may not take effect".format(key))
return super().__setitem__(key, value)
def get(self, key, default=None):
@@ -376,12 +391,18 @@ def load_config():
"gemini_api_base": "GEMINI_API_BASE",
"minimax_api_key": "MINIMAX_API_KEY",
"minimax_api_base": "MINIMAX_API_BASE",
"deepseek_api_key": "DEEPSEEK_API_KEY",
"deepseek_api_base": "DEEPSEEK_API_BASE",
"qianfan_api_key": "QIANFAN_API_KEY",
"qianfan_api_base": "QIANFAN_API_BASE",
"zhipu_ai_api_key": "ZHIPU_AI_API_KEY",
"zhipu_ai_api_base": "ZHIPU_AI_API_BASE",
"moonshot_api_key": "MOONSHOT_API_KEY",
"moonshot_api_base": "MOONSHOT_API_BASE",
"ark_api_key": "ARK_API_KEY",
"ark_api_base": "ARK_API_BASE",
"dashscope_api_key": "DASHSCOPE_API_KEY",
"dashscope_api_base": "DASHSCOPE_API_BASE",
# Channel credentials (used by skills that check env vars)
"feishu_app_id": "FEISHU_APP_ID",
"feishu_app_secret": "FEISHU_APP_SECRET",
@@ -402,12 +423,45 @@ def load_config():
if val:
os.environ[env_key] = str(val)
injected += 1
injected += _sync_skill_config_to_env(config.get("skill", {}))
if injected:
logger.info("[INIT] Synced {} config values to environment variables".format(injected))
config.load_user_datas()
def _sync_skill_config_to_env(skill_section) -> int:
"""Flatten skill-namespaced config into environment variables.
Mapping rule: ``config["skill"][<name>][<key>]`` -> ``SKILL_<NAME>_<KEY>``
(e.g. ``skill["image-generation"].model`` -> ``SKILL_IMAGE_GENERATION_MODEL``).
This lets subprocess-based skill scripts read their own settings without
importing project code. Existing env vars are NOT overwritten so the
real environment always wins.
Returns the number of variables actually injected.
"""
if not isinstance(skill_section, dict):
return 0
injected = 0
for skill_name, skill_conf in skill_section.items():
if not isinstance(skill_conf, dict):
continue
name_part = str(skill_name).replace("-", "_").upper()
for key, val in skill_conf.items():
if val is None or val == "":
continue
env_key = "SKILL_{}_{}".format(name_part, str(key).upper())
if env_key in os.environ:
continue
os.environ[env_key] = str(val)
injected += 1
return injected
def get_root():
return os.path.dirname(os.path.abspath(__file__))
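
The `SKILL_<NAME>_<KEY>` flattening rule described in `_sync_skill_config_to_env` can be tried in isolation. The sketch below is a standalone illustration rather than the project's code: the helper names `skill_env_key` and `sync_skill_config`, the `example-model` value, and the use of a plain dict in place of `os.environ` are assumptions made for the demo.

```python
def skill_env_key(skill_name, key):
    """Return the env var name for skill[<name>][<key>] under the documented rule."""
    # Only the skill name has hyphens replaced; the key is just upper-cased.
    name_part = str(skill_name).replace("-", "_").upper()
    return "SKILL_{}_{}".format(name_part, str(key).upper())


def sync_skill_config(skill_section, environ):
    """Copy non-empty skill settings into `environ` without overwriting existing keys."""
    if not isinstance(skill_section, dict):
        return 0
    injected = 0
    for skill_name, skill_conf in skill_section.items():
        if not isinstance(skill_conf, dict):
            continue
        for key, val in skill_conf.items():
            if val is None or val == "":
                continue  # empty values are never exported
            env_key = skill_env_key(skill_name, key)
            if env_key in environ:
                continue  # the real environment always wins
            environ[env_key] = str(val)
            injected += 1
    return injected


# A plain dict stands in for os.environ so the demo has no side effects.
fake_env = {"SKILL_IMAGE_GENERATION_SIZE": "512x512"}
skill_section = {
    "image-generation": {"model": "example-model", "size": "1024x1024", "style": ""},
}
injected = sync_skill_config(skill_section, fake_env)
```

Here only `model` is injected: `size` is already present in the environment and `style` is empty, so both are skipped.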

View File

@@ -9,7 +9,9 @@ services:
- "9899:9899"
environment:
CHANNEL_TYPE: 'weixin'
MODEL: 'MiniMax-M2.7'
MODEL: 'deepseek-v4-flash'
DEEPSEEK_API_KEY: ''
DEEPSEEK_API_BASE: 'https://api.deepseek.com/v1'
MINIMAX_API_KEY: ''
ZHIPU_AI_API_KEY: ''
ARK_API_KEY: ''

View File

@@ -3,67 +3,109 @@ title: Feishu
description: Integrate CowAgent into a Feishu app
---
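Integrate CowAgent into Feishu through a custom app. You must be a Feishu enterprise user with enterprise admin privileges.
> Integrate CowAgent through a Feishu custom app. Supports direct chat and group chat (@bot), uses WebSocket long-connection mode with no public IP required, and supports streaming typewriter replies and sending and receiving voice messages.
## 1. Create a Custom Enterprise App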
<Note>
You must be a Feishu enterprise user with enterprise admin privileges.
</Note>
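### 1. Create the App
## 1. Setup
Go to the [Feishu Developer Platform](https://open.feishu.cn/app/), click **Create Custom Enterprise App**, fill in the required information, and click **Create**
### Option 1: One-click QR Code Setup (Recommended)
After starting the Cow project, you can complete QR-code creation directly in the terminal. Alternatively, open the Web console (local link: http://127.0.0.1:9899), go to the **Channels** menu, click **Add Channel**, choose **Feishu**, click **One-click Create Feishu App**, and scan the QR code with the **Feishu App** to create and connect the app automatically: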
<img src="https://cdn.link-ai.tech/doc/20260505181126.png" width="800"/>
<Note>
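1. The `lark-oapi` dependency must be version >=1.5.5
2. An app created by scanning comes with all required permissions (messaging, card read/write, group chat events, etc.) and event subscriptions preconfigured, so no manual setup in the developer console is needed.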
</Note>
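### Option 2: Manual Setup
First create a custom app on the Feishu Open Platform and configure its permissions, then connect through the Web console or the config file.
**Step 1: Create the App**
1. Go to the [Feishu Developer Platform](https://open.feishu.cn/app/) and click **Create Custom Enterprise App**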
<img src="https://img-1317903499.cos.ap-guangzhou.myqcloud.com/docs/feishu-hosting-create-app.jpg" width="500"/>
### 2. Add Bot Capability
In the **Add App Capabilities** menu, add the **Bot** capability to the app:
2. In **Add App Capabilities**, add the **Bot** capability to the app
<img src="https://img-1317903499.cos.ap-guangzhou.myqcloud.com/docs/feishu-hosting-add-bot.jpg" width="800"/>
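### 3. Configure App Permissions
Click **Permission Management**, copy the permission configuration below, paste it into the input box under **Permission Configuration**, select all of the filtered permissions, click **Batch Enable**, and confirm:
3. In **Permission Management**, paste the permissions below into the input box, select all, and **Batch Enable**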
```
im:message,im:message.group_at_msg,im:message.group_at_msg:readonly,im:message.p2p_msg,im:message.p2p_msg:readonly,im:message:send_as_bot,im:resource
im:message,im:message.group_at_msg,im:message.group_at_msg:readonly,im:message.p2p_msg,im:message.p2p_msg:readonly,im:message:send_as_bot,im:resource,cardkit:card:write
```
<img src="https://cdn.link-ai.tech/doc/feishu-hosting-add-auth2.png" width="800"/>
## 2. Project Configuration
1. Get the `App ID` and `App Secret` from **Credentials & Basic Info**
4. Get the `App ID` and `App Secret` from **Credentials & Basic Info**
<img src="https://img-1317903499.cos.ap-guangzhou.myqcloud.com/docs/feishu-hosting-appid-secret.jpg" width="800"/>
2. Add the following configuration to the `config.json` file in the project root:
**Step 2: Connect to CowAgent**
```json
{
"channel_type": "feishu",
"feishu_app_id": "YOUR_APP_ID",
"feishu_app_secret": "YOUR_APP_SECRET",
"feishu_bot_name": "YOUR_BOT_NAME"
}
```
<Tabs>
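<Tab title="Web Console">
Open the Web console, go to the **Channels** menu, click **Add Channel**, choose **Feishu**, switch to the **Manual** tab, enter the App ID and App Secret, and click connect.
</Tab>
<Tab title="Config File">
Add the following configuration to `config.json`, then start the program: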
| Parameter | Description |
| --- | --- |
| `feishu_app_id` | Feishu bot App ID |
| `feishu_app_secret` | Feishu bot App Secret |
| `feishu_bot_name` | Feishu bot name (set when creating the app); required for group chat use |
```json
{
"channel_type": "feishu",
"feishu_app_id": "YOUR_APP_ID",
"feishu_app_secret": "YOUR_APP_SECRET",
"feishu_stream_reply": true
}
```
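Start the project once configuration is complete.
| Parameter | Description | Default |
| --- | --- | --- |
| `feishu_app_id` | Feishu app App ID | - |
| `feishu_app_secret` | Feishu app App Secret | - |
| `feishu_stream_reply` | Whether to enable streaming typewriter replies | `true` |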
</Tab>
</Tabs>
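## 3. Configure Event Subscription
**Step 3: Publish the App**
1. Once the project is running successfully, click **Events & Callbacks** on the Feishu Open Platform, select the **Long Connection** method, and click save:
1. After starting the Cow project, click **Events & Callbacks** on the Feishu Open Platform, select **Long Connection** mode, and save: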
<img src="https://cdn.link-ai.tech/doc/202601311731183.png" width="600"/>
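2. Click **Add Event** below, search for "Receive Message", select "**Receive Message v2.0**", and confirm
2. Click **Add Event**, search for "Receive Message", select **Receive Message v2.0**, and confirm
3. Click **Version Management & Release**, create a version and apply for **Production Release**, then check the approval message in the Feishu client and approve it:
3. Click **Version Management & Release**, create a version and apply for **Production Release**, then approve it in the Feishu client: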
<img src="https://cdn.link-ai.tech/doc/202601311807356.png" width="600"/>
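Once done, search for the bot name in Feishu to start chatting.
## 2. Features
| Feature | Status |
| --- | --- |
| Direct chat | ✅ |
| Group chat (@bot) | ✅ |
| Text messages | ✅ send/receive |
| Image messages | ✅ send/receive |
| Voice messages | ✅ send/receive |
| Streaming reply | ✅ (controlled by the `feishu_stream_reply` setting, on by default) |
<Note>
Streaming reply requires the bot to have the `cardkit:card:write` permission (enabled by default with one-click creation) and the recipient's Feishu client to be version ≥ 7.20. Older clients show an upgrade prompt; if the permission or version requirement is not met, replies automatically fall back to plain text.
</Note>
## 3. Usage
After connecting, search for the bot name in Feishu to start a direct chat.
To use it in a group chat, add the bot to the group and @-mention it in a message.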

View File

@@ -12,7 +12,7 @@ Web 控制台是 CowAgent 的默认通道,启动后会自动运行,通过浏
"channel_type": "web",
"web_port": 9899,
"web_password": "",
"enable_thinking": true
"enable_thinking": false
}
```
@@ -22,7 +22,7 @@ Web 控制台是 CowAgent 的默认通道,启动后会自动运行,通过浏
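| `web_port` | Web service listening port | `9899` |
| `web_password` | Access password; leave empty to disable password protection | `""` |
| `web_session_expire_days` | Login session validity in days | `30` |
| `enable_thinking` | Whether to enable deep thinking; when on, the Web console shows the reasoning process, and turning it off speeds up responses | `true` |
| `enable_thinking` | Whether to enable deep-thinking mode | `false` |
Once a password is set, you must enter it to log in before accessing the console. The login state is kept for 30 days by default, and restarting the service during that period does not require logging in again. The password can also be changed online on the console's "Configuration" page.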

View File

@@ -58,17 +58,18 @@ Session: 12 messages | 8 skills loaded
**Modify a config item:**
```text
/config model deepseek-chat
/config model deepseek-v4-flash
```
**Configurable items:**
| Item | Description | Example |
| --- | --- | --- |
| `model` | AI model name | `deepseek-chat` |
| `model` | AI model name | `deepseek-v4-flash` |
| `agent_max_context_tokens` | Max context tokens | `40000` |
| `agent_max_context_turns` | Max context memory turns | `30` |
| `agent_max_steps` | Max decision steps per task | `15` |
| `enable_thinking` | Whether to enable deep-thinking mode | `true` / `false` |
<Note>
When changing `model`, the system automatically matches the corresponding model API. The configuration is written to `config.json` and persisted.

View File

@@ -72,17 +72,19 @@
"group": "模型配置",
"pages": [
"models/index",
"models/deepseek",
"models/minimax",
"models/glm",
"models/qwen",
"models/kimi",
"models/doubao",
"models/claude",
"models/gemini",
"models/openai",
"models/deepseek",
"models/glm",
"models/qwen",
"models/doubao",
"models/kimi",
"models/qianfan",
"models/linkai",
"models/coding-plan"
"models/coding-plan",
"models/custom"
]
}
]
@@ -132,6 +134,14 @@
"skills/create",
"skills/hub"
]
},
{
"group": "内置技能",
"pages": [
"skills/skill-creator",
"skills/knowledge-wiki",
"skills/image-generation"
]
}
]
},
@@ -199,6 +209,8 @@
"group": "发布记录",
"pages": [
"releases/overview",
"releases/v2.0.8",
"releases/v2.0.7",
"releases/v2.0.6",
"releases/v2.0.5",
"releases/v2.0.4",
@@ -247,17 +259,19 @@
"group": "Model Configuration",
"pages": [
"en/models/index",
"en/models/deepseek",
"en/models/minimax",
"en/models/glm",
"en/models/qwen",
"en/models/kimi",
"en/models/doubao",
"en/models/claude",
"en/models/gemini",
"en/models/openai",
"en/models/deepseek",
"en/models/glm",
"en/models/qwen",
"en/models/doubao",
"en/models/kimi",
"en/models/qianfan",
"en/models/linkai",
"en/models/coding-plan"
"en/models/coding-plan",
"en/models/custom"
]
}
]
@@ -304,9 +318,16 @@
"pages": [
"en/skills/index",
"en/skills/install",
"en/skills/skill-creator",
"en/skills/hub"
]
},
{
"group": "Built-in Skills",
"pages": [
"en/skills/skill-creator",
"en/skills/knowledge-wiki",
"en/skills/image-generation"
]
}
]
},
@@ -374,9 +395,12 @@
"group": "Release Notes",
"pages": [
"en/releases/overview",
"en/releases/v2.0.8",
"en/releases/v2.0.7",
"en/releases/v2.0.6",
"en/releases/v2.0.5",
"en/releases/v2.0.4",
"en/releases/v2.0.3",
"en/releases/v2.0.2",
"en/releases/v2.0.1",
"en/releases/v2.0.0"
@@ -422,17 +446,19 @@
"group": "モデル設定",
"pages": [
"ja/models/index",
"ja/models/deepseek",
"ja/models/minimax",
"ja/models/glm",
"ja/models/qwen",
"ja/models/kimi",
"ja/models/doubao",
"ja/models/claude",
"ja/models/gemini",
"ja/models/openai",
"ja/models/deepseek",
"ja/models/glm",
"ja/models/qwen",
"ja/models/doubao",
"ja/models/kimi",
"ja/models/qianfan",
"ja/models/linkai",
"ja/models/coding-plan"
"ja/models/coding-plan",
"ja/models/custom"
]
}
]
@@ -482,6 +508,14 @@
"ja/skills/create",
"ja/skills/hub"
]
},
{
"group": "内蔵スキル",
"pages": [
"ja/skills/skill-creator",
"ja/skills/knowledge-wiki",
"ja/skills/image-generation"
]
}
]
},
@@ -549,6 +583,8 @@
"group": "リリースノート",
"pages": [
"ja/releases/overview",
"ja/releases/v2.0.8",
"ja/releases/v2.0.7",
"ja/releases/v2.0.6",
"ja/releases/v2.0.5",
"ja/releases/v2.0.4",

View File

@@ -28,7 +28,7 @@
-**Tool System**: Built-in tools for file I/O, terminal execution, browser automation, scheduled tasks, messaging, and more — autonomously invoked by the Agent.
-**CLI System**: Provides terminal commands and in-chat commands for process management, skill installation, configuration, and more.
-**Multimodal Messages**: Supports parsing, processing, generating, and sending text, images, voice, files, and other message types.
-**Multiple Model Support**: Supports OpenAI, Claude, Gemini, DeepSeek, MiniMax, GLM, Qwen, Kimi, Doubao, and other mainstream model providers.
-**Multiple Model Support**: Supports DeepSeek, MiniMax, Claude, Gemini, OpenAI, GLM, Qwen, Doubao, Kimi, and other mainstream model providers.
-**Multi-platform Deployment**: Runs on local computers or servers, integrable into WeChat, Web, Feishu, DingTalk, WeChat Official Account, and WeCom applications.
## Disclaimer
@@ -164,15 +164,15 @@ Supports mainstream model providers. Recommended models for Agent mode:
| Provider | Recommended Model |
| --- | --- |
| DeepSeek | `deepseek-v4-flash` |
| MiniMax | `MiniMax-M2.7` |
| GLM | `glm-5-turbo` |
| Kimi | `kimi-k2.5` |
| Doubao | `doubao-seed-2-0-code-preview-260215` |
| Qwen | `qwen3.6-plus` |
| Claude | `claude-sonnet-4-6` |
| Gemini | `gemini-3.1-pro-preview` |
| OpenAI | `gpt-5.4` |
| DeepSeek | `deepseek-chat` |
| GLM | `glm-5.1` |
| Qwen | `qwen3.6-plus` |
| Doubao | `doubao-seed-2-0-code-preview-260215` |
| Kimi | `kimi-k2.6` |
For detailed configuration of each model, see the [Models documentation](https://docs.cowagent.ai/en/models/index).

View File

@@ -1,69 +1,107 @@
---
title: Feishu (Lark)
description: Integrate CowAgent into Feishu application
description: Integrate CowAgent into Feishu via a custom enterprise app
---
Integrate CowAgent into Feishu by creating a custom enterprise app. You need to be a Feishu enterprise user with admin privileges.
> Integrate CowAgent into Feishu via a custom enterprise app. It supports p2p chat and group chat (@bot), connects over a WebSocket long connection (no public IP needed), and supports streaming typewriter replies and voice messages.
## 1. Create Enterprise Custom App
<Note>
You need to be a Feishu enterprise user with admin privileges.
</Note>
### 1.1 Create App
## 1. Setup
Go to [Feishu Developer Platform](https://open.feishu.cn/app/), click **Create Enterprise Custom App**, fill in the required information and click **Create**:
### Option 1: One-click Scan to Create (Recommended)
No need to manually create an app on the Feishu Developer Platform. Start the Cow project, open the web console (default `http://127.0.0.1:9899/`), go to **Channels**, click **Add Channel**, choose **Feishu**, then under the **Scan QR** tab click **One-click Create Feishu App** and scan with the **Feishu App** to complete app creation and connection automatically.
<Note>
The created app comes with all required permissions (messaging, card read/write, group events, etc.) and event subscriptions pre-configured. Currently only the Feishu mainland version is supported (Lark international not yet supported).
</Note>
When starting from the CLI without `feishu_app_id` configured, the QR code is also printed to the terminal.
### Option 2: Manual Setup
Manually create a custom app on the Feishu Developer Platform, then connect via Web Console or config file.
**Step 1: Create the App**
1. Go to [Feishu Developer Platform](https://open.feishu.cn/app/), click **Create Enterprise Custom App**:
<img src="https://img-1317903499.cos.ap-guangzhou.myqcloud.com/docs/feishu-hosting-create-app.jpg" width="500"/>
### 1.2 Add Bot Capability
In **Add App Capabilities**, add **Bot** capability to the app:
2. In **Add App Capabilities**, add the **Bot** capability:
<img src="https://img-1317903499.cos.ap-guangzhou.myqcloud.com/docs/feishu-hosting-add-bot.jpg" width="800"/>
### 1.3 Configure App Permissions
Click **Permission Management**, paste the following permission string into the input box below **Permission Configuration**, select all filtered permissions, click **Batch Enable** and confirm:
3. In **Permission Management**, paste the following permissions and **Batch Enable** all:
```
im:message,im:message.group_at_msg,im:message.group_at_msg:readonly,im:message.p2p_msg,im:message.p2p_msg:readonly,im:message:send_as_bot,im:resource
im:message,im:message.group_at_msg,im:message.group_at_msg:readonly,im:message.p2p_msg,im:message.p2p_msg:readonly,im:message:send_as_bot,im:resource,cardkit:card:write
```
<img src="https://cdn.link-ai.tech/doc/feishu-hosting-add-auth2.png" width="800"/>
## 2. Project Configuration
1. Get `App ID` and `App Secret` from **Credentials & Basic Info**:
4. Get `App ID` and `App Secret` from **Credentials & Basic Info**:
<img src="https://img-1317903499.cos.ap-guangzhou.myqcloud.com/docs/feishu-hosting-appid-secret.jpg" width="800"/>
2. Add the following configuration to `config.json` in the project root:
**Step 2: Connect to CowAgent**
```json
{
"channel_type": "feishu",
"feishu_app_id": "YOUR_APP_ID",
"feishu_app_secret": "YOUR_APP_SECRET",
"feishu_bot_name": "YOUR_BOT_NAME"
}
```
<Tabs>
<Tab title="Web Console">
Open the web console, go to **Channels**, click **Add Channel**, choose **Feishu**, switch to the **Manual** tab, enter App ID and App Secret, then click connect.
</Tab>
<Tab title="Config File">
Add the following to `config.json` and start the program:
| Parameter | Description |
| --- | --- |
| `feishu_app_id` | Feishu bot App ID |
| `feishu_app_secret` | Feishu bot App Secret |
| `feishu_bot_name` | Bot name (set when creating the app), required for group chat usage |
```json
{
"channel_type": "feishu",
"feishu_app_id": "YOUR_APP_ID",
"feishu_app_secret": "YOUR_APP_SECRET",
"feishu_stream_reply": true
}
```
Start the project after configuration is complete.
| Parameter | Description | Default |
| --- | --- | --- |
| `feishu_app_id` | Feishu app App ID | - |
| `feishu_app_secret` | Feishu app App Secret | - |
| `feishu_stream_reply` | Enable streaming typewriter reply | `true` |
</Tab>
</Tabs>
## 3. Configure Event Subscription
**Step 3: Publish the App**
1. After the project is running successfully, go to the Feishu Developer Platform, click **Events & Callbacks**, select **Long Connection** mode, and click save:
1. After Cow is running, go to **Events & Callbacks** in the Feishu Developer Platform, choose **Long Connection** mode and save:
<img src="https://cdn.link-ai.tech/doc/202601311731183.png" width="600"/>
2. Click **Add Event** below, search for "Receive Message", select "**Receive Message v2.0**", and confirm.
2. Click **Add Event**, search for "Receive Message" and choose **Receive Message v2.0**.
3. Click **Version Management & Release**, create a new version and apply for **Production Release**. Check the approval message in the Feishu client and approve:
3. Click **Version Management & Release**, create a version and apply for **Production Release**. Approve the request in the Feishu client:
<img src="https://cdn.link-ai.tech/doc/202601311807356.png" width="600"/>
Once completed, search for the bot name in Feishu to start chatting.
## 2. Features
| Feature | Status |
| --- | --- |
| P2P chat | ✅ |
| Group chat (@bot) | ✅ |
| Text messages | ✅ send/receive |
| Image messages | ✅ send/receive |
| Voice messages | ✅ send/receive |
| Streaming reply | ✅ (powered by Feishu cardkit streaming card) |
<Note>
Streaming reply requires the `cardkit:card:write` permission (already enabled by one-click creation) and Feishu client version ≥ 7.20. Older clients see an upgrade prompt; if the permission or version is not satisfied, replies fall back to plain text automatically.
</Note>
## 3. Usage
After connection, search for the bot name in Feishu to start a chat.
To use in groups, add the bot to a group and @-mention it.

View File

@@ -44,17 +44,18 @@ View or modify runtime configuration. Changes take effect immediately without re
**Modify a config item:**
```text
/config model deepseek-chat
/config model deepseek-v4-flash
```
**Configurable items:**
| Item | Description | Example |
| --- | --- | --- |
| `model` | AI model name | `deepseek-chat` |
| `model` | AI model name | `deepseek-v4-flash` |
| `agent_max_context_tokens` | Max context tokens | `40000` |
| `agent_max_context_turns` | Max context memory turns | `30` |
| `agent_max_steps` | Max decision steps per task | `15` |
| `enable_thinking` | Enable deep thinking mode | `true` / `false` |
<Note>
When changing `model`, the system automatically matches the corresponding model API. Configuration is persisted to `config.json`.

View File

@@ -121,7 +121,8 @@ sudo docker logs -f chatgpt-on-wechat
```json
{
"channel_type": "web",
"model": "MiniMax-M2.5",
"model": "deepseek-v4-flash",
"deepseek_api_key": "",
"agent": true,
"agent_workspace": "~/cow",
"agent_max_context_tokens": 40000,
@@ -133,7 +134,7 @@ sudo docker logs -f chatgpt-on-wechat
| Parameter | Description | Default |
| --- | --- | --- |
| `channel_type` | Channel type | `web` |
| `model` | Model name | `MiniMax-M2.5` |
| `model` | Model name | `deepseek-v4-flash` |
| `agent` | Enable Agent mode | `true` |
| `agent_workspace` | Agent workspace path | `~/cow` |
| `agent_max_context_tokens` | Max context tokens | `40000` |

View File

@@ -9,7 +9,7 @@ CowAgent 2.0 has evolved from a simple chatbot into a super intelligent assistan
CowAgent's architecture consists of the following core modules:
<img src="https://cdn.link-ai.tech/doc/68ef7b212c6f791e0e74314b912149f9-sz_5847990.png" alt="CowAgent Architecture" />
<img src="https://cdn.link-ai.tech/doc/cow-agent-arch-en.jpg.jpg" alt="CowAgent Architecture" />
| Module | Description |
| --- | --- |

View File

@@ -5,6 +5,8 @@ description: CowAgent long-term memory system — file persistence, automatic wr
Long-term memory is stored in workspace files, persisting across sessions. The Agent loads historical memory on demand via retrieval tools during conversation, and automatically writes conversation summaries to long-term memory when context is trimmed.
<img src="https://cdn.link-ai.tech/doc/memory-architecture-en.jpg" alt="Memory Architecture" />
## Memory Types
### Core Memory (MEMORY.md)
@@ -39,20 +41,25 @@ The memory system supports hybrid retrieval modes:
The Agent automatically triggers memory retrieval during conversation as needed, incorporating relevant historical information into context. Results are ranked by a combined score (default: 0.7 vector weight + 0.3 keyword weight). Daily memory scores decay over time (30-day half-life), while core memory does not decay.
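
The ranking described above can be written out as a small worked example. This is a minimal sketch under stated assumptions, not CowAgent's implementation: the function names are hypothetical, both input scores are assumed to lie in [0, 1], and the half-life decay is assumed to multiply the combined score exponentially.

```python
def decay_factor(age_days, half_life_days=30.0):
    """Exponential decay: the factor halves every `half_life_days` days."""
    return 0.5 ** (age_days / half_life_days)


def combined_score(vector_score, keyword_score, age_days=0.0, is_core=False,
                   vector_weight=0.7, keyword_weight=0.3):
    """Weighted sum of vector and keyword scores, decayed for daily memory only."""
    base = vector_weight * vector_score + keyword_weight * keyword_score
    if is_core:
        return base  # core memory (MEMORY.md) does not decay
    return base * decay_factor(age_days)


# Same raw scores, different memory types and ages.
core = combined_score(0.8, 0.5, age_days=90, is_core=True)  # about 0.71
fresh_daily = combined_score(0.8, 0.5, age_days=0)          # about 0.71
old_daily = combined_score(0.8, 0.5, age_days=30)           # about half of that
```

With these assumptions a 30-day-old daily entry scores roughly half of an equally relevant core entry, so older daily notes need a noticeably better match to outrank core memory.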
## First Launch
## Related Files
On first launch, the Agent will proactively ask the user for key information and save it to the workspace (default `~/cow`):
Files related to memory in the workspace (default `~/cow`):
| File | Description |
| --- | --- |
| `system.md` | Agent system prompt and behavior settings |
| `user.md` | User identity information and preferences |
| `AGENT.md` | Agent personality and behavior settings |
| `USER.md` | User identity information and preferences |
| `RULE.md` | Custom rules and constraints |
| `MEMORY.md` | Core memory (long-term) |
| `memory/YYYY-MM-DD.md` | Daily memory (created on demand) |
| `memory/dreams/YYYY-MM-DD.md` | Dream diary (auto-generated by Deep Dream) |
## Web Console
The memory management page in the Web console allows browsing memory files and dream diaries, with tab switching support:
<Frame>
<img src="https://cdn.link-ai.tech/doc/20260203000455.png" width="800" />
<img src="https://cdn.link-ai.tech/doc/20260414171014.png" width="800" />
</Frame>
## Configuration

View File

@@ -12,6 +12,6 @@ description: Claude model configuration
| Parameter | Description |
| --- | --- |
| `model` | Options include `claude-sonnet-4-6`, `claude-opus-4-6`, `claude-sonnet-4-5`, `claude-sonnet-4-0`, `claude-3-5-sonnet-latest`, etc. See [official models](https://docs.anthropic.com/en/docs/about-claude/models/overview) |
| `model` | Options include `claude-sonnet-4-6`, `claude-opus-4-7`, `claude-opus-4-6`, `claude-sonnet-4-5`, `claude-sonnet-4-0`, `claude-3-5-sonnet-latest`, etc. See [official models](https://docs.anthropic.com/en/docs/about-claude/models/overview) |
| `claude_api_key` | Create at [Claude Console](https://console.anthropic.com/settings/keys) |
| `claude_api_base` | Optional. Defaults to `https://api.anthropic.com/v1`. Change to use third-party proxy |

View File

@@ -102,18 +102,18 @@ Reference: [China Quick Start](https://docs.bigmodel.cn/cn/coding-plan/quick-sta
```json
{
"bot_type": "openai",
"bot_type": "moonshot",
"model": "kimi-for-coding",
"open_ai_api_base": "https://api.kimi.com/coding/v1",
"open_ai_api_key": "YOUR_API_KEY"
"moonshot_base_url": "https://api.kimi.com/coding/v1",
"moonshot_api_key": "YOUR_API_KEY"
}
```
| Parameter | Description |
| --- | --- |
| `model` | `kimi-for-coding` |
| `open_ai_api_base` | `https://api.kimi.com/coding/v1` |
| `open_ai_api_key` | Coding Plan specific key (not shared with pay-as-you-go) |
| `model` | Use `kimi-for-coding` for the auto-updating model, or specify a model such as `kimi-k2.6` |
| `moonshot_base_url` | `https://api.kimi.com/coding/v1` |
| `moonshot_api_key` | Coding Plan specific key (not shared with pay-as-you-go) |
Reference: [Key & Docs](https://www.kimi.com/code/docs/)

62
docs/en/models/custom.mdx Normal file
View File

@@ -0,0 +1,62 @@
---
title: Custom
description: Custom provider for third-party APIs and local models
---
For models accessed via OpenAI-compatible APIs, such as:
- **Third-party API proxies**: Use a unified API Base to call multiple models
- **Local models**: Models deployed locally via Ollama, vLLM, LocalAI, etc.
- **Private deployments**: Self-hosted model services within your organization
<Note>
Unlike the `openai` provider, switching models under the Custom provider will not auto-switch the provider type. Your custom API address is always preserved.
</Note>
## Configuration
### Third-party API Proxy
```json
{
"bot_type": "custom",
"model": "deepseek-v4-flash",
"custom_api_key": "YOUR_API_KEY",
"custom_api_base": "https://{your-proxy.com}/v1"
}
```
| Parameter | Description |
| --- | --- |
| `bot_type` | Must be set to `custom` |
| `model` | Model name, any model supported by your proxy service |
| `custom_api_key` | API key provided by your proxy service |
| `custom_api_base` | API base URL, must be OpenAI-compatible |
### Local Models
Local models typically don't require an API key — just set the API base:
```json
{
"bot_type": "custom",
"model": "qwen3.5:27b",
"custom_api_base": "http://localhost:11434/v1"
}
```
Common local deployment tools and their default addresses:
| Tool | Default API Base |
| --- | --- |
| [Ollama](https://ollama.com) | `http://localhost:11434/v1` |
| [vLLM](https://docs.vllm.ai) | `http://localhost:8000/v1` |
| [LocalAI](https://localai.io) | `http://localhost:8080/v1` |
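
Before pointing CowAgent at a local server, it can help to confirm that the endpoint really speaks the OpenAI-compatible chat API. The sketch below only builds the request with the standard library; the helper name `build_chat_request` is hypothetical, and it assumes the server exposes the conventional `/chat/completions` route under the configured API base.

```python
import json
import urllib.request


def build_chat_request(api_base, model, user_text, api_key=None):
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    url = api_base.rstrip("/") + "/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Local servers often need no key; proxies usually expect a Bearer token.
        headers["Authorization"] = "Bearer " + api_key
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_chat_request("http://localhost:11434/v1", "qwen3.5:27b", "hello")

# To actually send it (requires a running local server):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

If the commented-out call returns a completion, the same `custom_api_base` and `model` values should be usable in `config.json`.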
## Switching Models
Under the Custom provider, switching models only changes `model` without affecting `bot_type` or the API address:
```
/config model qwen3.5:27b
```

View File

@@ -7,26 +7,57 @@ Option 1: Native integration (recommended):
```json
{
"model": "deepseek-chat",
"model": "deepseek-v4-flash",
"deepseek_api_key": "YOUR_API_KEY"
}
```
| Parameter | Description |
| --- | --- |
| `model` | `deepseek-chat` (DeepSeek-V3.2, non-thinking mode), `deepseek-reasoner` (DeepSeek-R1, thinking mode) |
| `model` | Supports `deepseek-v4-flash` (default) and `deepseek-v4-pro` |
| `deepseek_api_key` | Create at [DeepSeek Platform](https://platform.deepseek.com/api_keys) |
| `deepseek_api_base` | Optional, defaults to `https://api.deepseek.com/v1`. Can be changed to a third-party proxy |
## Model Selection
| Model | Use Case |
| --- | --- |
| `deepseek-v4-flash` | Default: fast and cost-effective |
| `deepseek-v4-pro` | Stronger on complex tasks |
## Thinking Mode
The V4 series (`deepseek-v4-flash` / `deepseek-v4-pro`) supports an explicit "thinking mode": the model emits a chain-of-thought (`reasoning_content`) before the final answer to improve answer quality.
### Toggle
Controlled by the global `enable_thinking` setting:
```json
{
"enable_thinking": true
}
```
- `true`: thinking is on across all channels. The Web console renders the reasoning trace; IM channels (WeChat / WeCom / DingTalk / Feishu) don't render it but still benefit from higher answer quality.
- `false`: thinking off, faster responses with lower first-token latency.
### Notes
- **Sampling parameters**: under thinking mode, `temperature`, `top_p`, `presence_penalty`, and `frequency_penalty` are silently ignored by the server (no error). CowAgent skips sending them automatically.
- **Multi-turn tool calls**: once the history contains any tool-call turn, DeepSeek requires `reasoning_content` on every assistant message. CowAgent handles the round-trip automatically, including across mid-session toggles of the thinking switch.
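
The sampling-parameter note can be illustrated with a small payload builder. This is a hedged sketch of the idea rather than CowAgent's actual code: `build_payload` and `SAMPLING_KEYS` are hypothetical names, and the sketch deliberately leaves out how thinking itself is requested from the API.

```python
SAMPLING_KEYS = ("temperature", "top_p", "presence_penalty", "frequency_penalty")


def build_payload(model, messages, enable_thinking, **sampling):
    """Build a chat payload, omitting sampling parameters when thinking is on."""
    payload = {"model": model, "messages": messages}
    if not enable_thinking:
        # Only forward recognised sampling keys that were actually set.
        payload.update(
            {k: v for k, v in sampling.items() if k in SAMPLING_KEYS and v is not None}
        )
    return payload


messages = [{"role": "user", "content": "hello"}]
thinking_payload = build_payload(
    "deepseek-v4-flash", messages, True, temperature=0.7, top_p=0.9
)
plain_payload = build_payload(
    "deepseek-v4-flash", messages, False, temperature=0.7, top_p=0.9
)
```

Dropping the keys on the client side keeps the request consistent with what the server will honour, so a logged payload reflects the parameters that actually took effect.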
<Tip>
Start with `deepseek-v4-flash`; switch to `deepseek-v4-pro` for harder tasks; enable `enable_thinking` when you want deeper reasoning.
</Tip>
Option 2: OpenAI-compatible configuration:
```json
{
"model": "deepseek-chat",
"model": "deepseek-v4-flash",
"bot_type": "openai",
"open_ai_api_key": "YOUR_API_KEY",
"open_ai_api_base": "https://api.deepseek.com/v1"
}
```

View File

@@ -5,14 +5,14 @@ description: Zhipu AI GLM model configuration
```json
{
"model": "glm-5-turbo",
"model": "glm-5.1",
"zhipu_ai_api_key": "YOUR_API_KEY"
}
```
| Parameter | Description |
| --- | --- |
| `model` | Options include `glm-5-turbo`, `glm-5`, `glm-4.7`, `glm-4-plus`, `glm-4-flash`, `glm-4-air`, etc. See [model codes](https://bigmodel.cn/dev/api/normal-model/glm-4) |
| `model` | Options include `glm-5.1`, `glm-5-turbo`, `glm-5`, `glm-4.7`, `glm-4-plus`, `glm-4-flash`, `glm-4-air`, etc. See [model codes](https://bigmodel.cn/dev/api/normal-model/glm-4) |
| `zhipu_ai_api_key` | Create at [Zhipu AI Console](https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys) |
OpenAI-compatible configuration is also supported:
@@ -20,7 +20,7 @@ OpenAI-compatible configuration is also supported:
```json
{
"bot_type": "openai",
"model": "glm-5-turbo",
"model": "glm-5.1",
"open_ai_api_base": "https://open.bigmodel.cn/api/paas/v4",
"open_ai_api_key": "YOUR_API_KEY"
}

View File

@@ -6,7 +6,7 @@ description: Supported models and recommended choices for CowAgent
CowAgent supports mainstream LLMs from domestic and international providers. Model interfaces are implemented in the project's `models/` directory.
<Note>
For Agent mode, the following models are recommended based on quality and cost: MiniMax-M2.7, glm-5-turbo, kimi-k2.5, qwen3.6-plus, claude-sonnet-4-6, gemini-3.1-pro-preview
For Agent mode, the following models are recommended based on quality and cost: deepseek-v4-flash, MiniMax-M2.7, claude-sonnet-4-6, gemini-3.1-pro-preview, glm-5.1, qwen3.6-plus, kimi-k2.6, ernie-5.0
</Note>
## Configuration
@@ -18,21 +18,15 @@ You can also use the [LinkAI](https://link-ai.tech) platform interface to flexib
## Supported Models
<CardGroup cols={2}>
<Card title="DeepSeek" href="/en/models/deepseek">
deepseek-v4-flash, deepseek-v4-pro, and more
</Card>
<Card title="Baidu Qianfan / ERNIE" href="/en/models/qianfan">
ernie-5.0, ernie-4.5-turbo-128k, and more
</Card>
<Card title="MiniMax" href="/en/models/minimax">
MiniMax-M2.7 and other series models
</Card>
<Card title="GLM (Zhipu AI)" href="/en/models/glm">
glm-5-turbo, glm-5 and other series models
</Card>
<Card title="Qwen (Tongyi Qianwen)" href="/en/models/qwen">
qwen3.6-plus, qwen3-max and more
</Card>
<Card title="Kimi" href="/en/models/kimi">
kimi-k2.5, kimi-k2 and more
</Card>
<Card title="Doubao (ByteDance)" href="/en/models/doubao">
doubao-seed series models
</Card>
<Card title="Claude" href="/en/models/claude">
claude-sonnet-4-6 and more
</Card>
@@ -42,8 +36,17 @@ You can also use the [LinkAI](https://link-ai.tech) platform interface to flexib
<Card title="OpenAI" href="/en/models/openai">
gpt-5.4, gpt-4.1, o-series and more
</Card>
<Card title="DeepSeek" href="/en/models/deepseek">
deepseek-chat, deepseek-reasoner
<Card title="GLM (Zhipu AI)" href="/en/models/glm">
glm-5.1, glm-5-turbo, glm-5 and other series models
</Card>
<Card title="Qwen (Tongyi Qianwen)" href="/en/models/qwen">
qwen3.6-plus, qwen3-max and more
</Card>
<Card title="Doubao (ByteDance)" href="/en/models/doubao">
doubao-seed series models
</Card>
<Card title="Kimi" href="/en/models/kimi">
kimi-k2.6, kimi-k2.5, kimi-k2 and more
</Card>
<Card title="LinkAI" href="/en/models/linkai">
Unified multi-model interface + knowledge base

View File

@@ -5,14 +5,14 @@ description: Kimi (Moonshot) model configuration
```json
{
"model": "kimi-k2.5",
"model": "kimi-k2.6",
"moonshot_api_key": "YOUR_API_KEY"
}
```
| Parameter | Description |
| --- | --- |
| `model` | Options include `kimi-k2.5`, `kimi-k2`, `moonshot-v1-8k`, `moonshot-v1-32k`, `moonshot-v1-128k` |
| `model` | Options include `kimi-k2.6`, `kimi-k2.5`, `kimi-k2`, `moonshot-v1-8k`, `moonshot-v1-32k`, `moonshot-v1-128k` |
| `moonshot_api_key` | Create at [Moonshot Console](https://platform.moonshot.cn/console/api-keys) |
OpenAI-compatible configuration is also supported:
@@ -20,7 +20,7 @@ OpenAI-compatible configuration is also supported:
```json
{
"bot_type": "openai",
"model": "kimi-k2.5",
"model": "kimi-k2.6",
"open_ai_api_base": "https://api.moonshot.cn/v1",
"open_ai_api_key": "YOUR_API_KEY"
}

View File

@@ -3,7 +3,7 @@ title: LinkAI
description: Unified access to multiple models via LinkAI platform
---
The [LinkAI](https://link-ai.tech) platform lets you flexibly switch between OpenAI, Claude, Gemini, DeepSeek, Qwen, Kimi, and other models, with support for knowledge base, workflows, plugins, and other Agent capabilities.
The [LinkAI](https://link-ai.tech) platform lets you flexibly switch between OpenAI, Claude, Gemini, DeepSeek, MiniMax, Qwen, Kimi, and other models, with support for knowledge base, workflows, plugins, and other Agent capabilities.
```json
{

View File

@@ -0,0 +1,63 @@
---
title: Baidu Qianfan / ERNIE
description: Baidu Qianfan ERNIE model configuration
---
Option 1: Native integration (recommended):
```json
{
"model": "ernie-5.0",
"qianfan_api_key": "",
"qianfan_api_base": "https://qianfan.baidubce.com/v2"
}
```
| Parameter | Description |
| --- | --- |
| `model` | Default recommendation: `ernie-5.0`; also supports `ernie-x1.1`, `ernie-4.5-turbo-128k`, `ernie-4.5-turbo-32k` |
| `qianfan_api_key` | Qianfan API key, usually starting with `bce-v3/` |
| `qianfan_api_base` | Optional, defaults to `https://qianfan.baidubce.com/v2` |
## Model Selection
| Model | Use Case |
| --- | --- |
| `ernie-5.0` | Default recommendation; latest ERNIE flagship with the strongest overall capability |
| `ernie-x1.1` | Deep-thinking reasoning model with lower hallucination and stronger instruction following / tool calling |
| `ernie-4.5-turbo-128k` | Long-context and general chat |
| `ernie-4.5-turbo-32k` | General chat with a balanced context window and cost |
## Vision tool
Once `qianfan_api_key` is configured, Agent mode can auto-discover Qianfan for the Vision tool:
- When the main model itself is multimodal (e.g. `ernie-5.0`, `ernie-x1.1`, `ernie-4.5-turbo-vl`), images are handled directly by the main model with no extra setup.
- When the main model is text-only (e.g. `ernie-4.5-turbo-128k`), the Vision tool automatically falls back to `ernie-4.5-turbo-vl`.
To force a specific Vision model, set it explicitly in `config.json`:
```json
{
"tool": {
"vision": {
"model": "ernie-4.5-turbo-vl"
}
}
}
```
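The selection rules above can be sketched as a small resolver. This is a minimal illustration, not CowAgent's actual code: the function name `resolve_vision_model` and the `MULTIMODAL_MODELS` set are hypothetical, and the model list is taken only from the examples in this section.

```python
from typing import Optional

# Assumed from the examples in this section; not an exhaustive registry.
MULTIMODAL_MODELS = {"ernie-5.0", "ernie-x1.1", "ernie-4.5-turbo-vl"}
FALLBACK_VISION_MODEL = "ernie-4.5-turbo-vl"


def resolve_vision_model(config: dict) -> Optional[str]:
    """Pick the Qianfan model for an image, or None if Qianfan is not configured."""
    if not config.get("qianfan_api_key"):
        return None  # Qianfan cannot be auto-discovered without a key
    # An explicit tool.vision.model setting wins over auto-detection.
    explicit = config.get("tool", {}).get("vision", {}).get("model")
    if explicit:
        return explicit
    main_model = config.get("model", "")
    if main_model in MULTIMODAL_MODELS:
        return main_model  # the main model handles images directly
    return FALLBACK_VISION_MODEL  # text-only main model: fall back
```

For example, a config with `"model": "ernie-4.5-turbo-128k"` and a key set would resolve to `ernie-4.5-turbo-vl` under these assumptions.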
Option 2: OpenAI-compatible configuration:
```json
{
"model": "ernie-5.0",
"bot_type": "openai",
"open_ai_api_key": "",
"open_ai_api_base": "https://qianfan.baidubce.com/v2"
}
```
<Tip>
Prefer `qianfan_api_key` for new configurations. Existing `wenxin`, `wenxin-4`, `baidu_wenxin_api_key`, and `baidu_wenxin_secret_key` configurations remain supported.
</Tip>

@@ -5,6 +5,7 @@ description: CowAgent version history
| Version | Date | Description |
| --- | --- | --- |
| [2.0.7](/en/releases/v2.0.7) | 2026.04.22 | Image Generation Skill (6-provider auto-routing), new models (Kimi K2.6, Claude Opus 4.7, GLM 5.1), knowledge base and Web Console improvements |
| [2.0.6](/en/releases/v2.0.6) | 2026.04.14 | Knowledge Base, Deep Dream Memory Distillation, Smart Context Compression, Web Console upgrades |
| [2.0.5](/en/releases/v2.0.5) | 2026.04.01 | Cow CLI, Skill Hub open source, Browser tool, WeCom Bot QR scan, and more |
| [2.0.4](/en/releases/v2.0.4) | 2026.03.22 | Personal WeChat channel, new model support, Japanese docs, script refactoring and bug fixes |

@@ -0,0 +1,91 @@
---
title: v2.0.3
description: CowAgent 2.0.3 - WeCom Smart Bot and QQ channels, Web Console file handling, memory system upgrade
---
## 🔌 New Channels
### WeCom Smart Bot
Added the WeCom Smart Bot (`wecom_bot`) channel with streaming card output, support for receiving and replying to text and image messages, and full configuration through the Web Console.
Documentation: [WeCom Smart Bot](https://docs.cowagent.ai/en/channels/wecom-bot).
Related commits: [d4480b6](https://github.com/zhayujie/CowAgent/commit/d4480b6), [a42f31f](https://github.com/zhayujie/CowAgent/commit/a42f31f), [4ecd4df](https://github.com/zhayujie/CowAgent/commit/4ecd4df), [8b45d6c](https://github.com/zhayujie/CowAgent/commit/8b45d6c)
### QQ Channel
Added the QQ official bot (`qq`) channel with support for text and image messages in both private chats and group chats.
Documentation: [QQ Bot](https://docs.cowagent.ai/en/channels/qq).
Related commits: [005a0e1](https://github.com/zhayujie/CowAgent/commit/005a0e1), [a4d54f5](https://github.com/zhayujie/CowAgent/commit/a4d54f5)
## 🖥️ Web Console File Input and Processing
The Web Console chat UI now supports file and image uploads — files can be sent directly to the agent for processing. The Read tool gains parsing support for Office documents (Word, Excel, PPT).
Related commits: [30c6d9b](https://github.com/zhayujie/CowAgent/commit/30c6d9b)
## 🤖 New Models
- **GPT-5.4 Series**: Added `gpt-5.4`, `gpt-5.4-mini`, `gpt-5.4-nano` ([1623deb](https://github.com/zhayujie/CowAgent/commit/1623deb))
- **Gemini 3.1 Flash Lite Preview**: Added `gemini-3.1-flash-lite-preview` ([ba915f2](https://github.com/zhayujie/CowAgent/commit/ba915f2))
## 💰 Coding Plan Support
Added integration with vendor Coding Plan (monthly programming subscription) tiers via the unified OpenAI-compatible path. Supported vendors include Aliyun, MiniMax, Zhipu GLM, Kimi, and Volcengine.
See [Coding Plan docs](https://docs.cowagent.ai/en/models/coding-plan) for detailed configuration.
## 🧠 Memory System Upgrade
Memory flush improvements:
- Use the LLM to summarize out-of-window conversations into compact daily memory entries
- Summarization runs asynchronously on a background thread, never blocking replies
- Smarter batch trimming policy reduces flush frequency
- Daily scheduled flush as a safety net for low-activity scenarios
- Fixed context-memory loss issues
Related commits: [022c13f](https://github.com/zhayujie/CowAgent/commit/022c13f), [c116235](https://github.com/zhayujie/CowAgent/commit/c116235)
## 🔧 Tool Refactoring
- **Image Vision**: Image recognition (Vision) is refactored from a Skill into a built-in Tool with a dedicated Vision Provider configuration, improving stability and maintainability ([a50fafa](https://github.com/zhayujie/CowAgent/commit/a50fafa), [3b8b562](https://github.com/zhayujie/CowAgent/commit/3b8b562))
- **Web Fetch**: Web fetch is refactored from a Skill into a built-in Tool with support for downloading and parsing remote documents (PDF, Word, Excel, PPT) ([ccb9030](https://github.com/zhayujie/CowAgent/commit/ccb9030), [fa61744](https://github.com/zhayujie/CowAgent/commit/fa61744))
## 🐳 Docker Deployment Improvements
- **Config Template Alignment**: `docker-compose.yml` env vars aligned with `config-template.json`, covering full model API key and Agent settings
- **Web Console Port Mapping**: Added `9899` port mapping so the Web Console is reachable in browser after Docker deployment
- **Hot Config Reload**: Bot API key and API base are now read at request time — changes from the Web Console take effect without restart
- **Workspace Persistence**: Added a `./cow` volume mount so agent workspace data (memories, persona, skills, etc.) persists across container rebuilds and upgrades
## ⚡ Performance Improvements
- **Faster Startup**: The Feishu channel imports its dependencies lazily, avoiding a 4 to 10s startup delay ([924dc79](https://github.com/zhayujie/CowAgent/commit/924dc79))
- **Channel Stability**: Improved channel connection stability and added env-var support for channel configuration ([f1c04bc](https://github.com/zhayujie/CowAgent/commit/f1c04bc), [46d97fd](https://github.com/zhayujie/CowAgent/commit/46d97fd))
## 🐛 Bug Fixes
- **bot_type Propagation**: Fixed `bot_type` propagation under Agent mode ([#2691](https://github.com/zhayujie/CowAgent/pull/2691)) Thanks [@Weikjssss](https://github.com/Weikjssss)
- **bot_type Resolution Priority**: Adjusted `bot_type` resolution priority under Agent mode ([#2692](https://github.com/zhayujie/CowAgent/pull/2692)) Thanks [@6vision](https://github.com/6vision)
- **Zhipu Config**: Fixed Zhipu `bot_type` naming, Web Console persistence, and regex escaping ([#2693](https://github.com/zhayujie/CowAgent/pull/2693)) Thanks [@6vision](https://github.com/6vision)
- **OpenAI-Compat Layer**: Unified error handling via the `openai_compat` layer ([#2688](https://github.com/zhayujie/CowAgent/pull/2688)) Thanks [@JasonOA888](https://github.com/JasonOA888)
- **OpenAI-Compat Migration**: Completed the `openai_compat` migration across all model bots ([#2689](https://github.com/zhayujie/CowAgent/pull/2689))
- **Gemini Tool Calling**: Fixed tool-call matching for Gemini ([eda82ba](https://github.com/zhayujie/CowAgent/commit/eda82ba))
- **Session Concurrency**: Fixed race conditions in concurrent session scenarios ([9879878](https://github.com/zhayujie/CowAgent/commit/9879878))
- **History Recovery**: Fixed incomplete history recovery — only user/assistant text messages are restored, tool calls are stripped ([b788a3d](https://github.com/zhayujie/CowAgent/commit/b788a3d), [a33ce97](https://github.com/zhayujie/CowAgent/commit/a33ce97))
- **Feishu Group Chat**: Removed the `bot_name` dependency for Feishu group chats ([b641bff](https://github.com/zhayujie/CowAgent/commit/b641bff))
- **Safari Compatibility**: Fixed an IME Enter key issue that mistakenly sent messages on Safari ([0687916](https://github.com/zhayujie/CowAgent/commit/0687916))
- **Windows Compatibility**: Fixed bash-style `$VAR` to `%VAR%` env-var conversion on Windows ([7c67513](https://github.com/zhayujie/CowAgent/commit/7c67513))
- **MiniMax Params**: Added a `max_tokens` cap for MiniMax models ([1767413](https://github.com/zhayujie/CowAgent/commit/1767413))
- **.gitignore**: Added Python directory ignore rules ([#2683](https://github.com/zhayujie/CowAgent/pull/2683)) Thanks [@pelioo](https://github.com/pelioo)
- **AGENT.md Proactive Evolution**: Improved the system prompt guidance around AGENT.md — instead of waiting for explicit user edits, the agent now proactively detects persona/style shifts in the conversation and updates AGENT.md accordingly
## 📦 Upgrade
Run `./run.sh update` for a one-click upgrade, or manually pull the latest code and restart. See [Upgrade Guide](https://docs.cowagent.ai/en/guide/upgrade) for details.
**Release Date**: 2026.03.18 | [Full Changelog](https://github.com/zhayujie/CowAgent/compare/2.0.2...2.0.3)

@@ -0,0 +1,65 @@
---
title: v2.0.7
description: CowAgent 2.0.7 - Image Generation Skill (6-provider auto-routing), new models, knowledge base enhancements, Web Console improvements and bug fixes
---
## 🎨 Image Generation Skill
New built-in `image-generation` skill supporting text-to-image, image-to-image, and multi-image fusion across six major providers:
- **6-provider auto-routing**: OpenAI (GPT-Image-2) → Gemini (Nano Banana) → Seedream (Volcengine Ark) → Qwen (DashScope) → MiniMax → LinkAI — automatically selects from configured providers in fixed priority order, with automatic fallback on failure
- **Zero model selection**: Just configure an API key and it works — no need to manually specify a model. You can also name a specific model in conversation (e.g. "draw a cat with seedream")
- **Flexible control**: Supports `quality`, `size` (512, or 1K to 4K), and `aspect_ratio` parameters, with each provider automatically mapping to its supported values
- **Image editing**: Pass existing images for editing, style transfer, or multi-image fusion (Seedream supports up to 14 reference images)
- **Skill-level config**: Pin a default model via `skill.image-generation.model` in `config.json`
- **Image lightbox**: All images in the Web console now support click-to-enlarge preview
Docs: [Image Generation Skill](https://docs.cowagent.ai/en/skills/image-generation)
## 🤖 New Model Support
- **Kimi K2.6**: Added `kimi-k2.6` model support
- **Claude Opus 4.7**: Added `claude-opus-4-7` model support
- **GLM 5.1**: Added `glm-5.1` model support
- **Kimi Coding Plan**: Support for Kimi Coding Plan mode
- **Custom model providers**: New custom model provider configuration for easier integration with additional vendors
## 💬 Web Console Improvements
- **Smart auto-scroll**: Improved chat scroll behaviour — no longer forces scroll to bottom while the user is reading earlier messages
- **Reasoning content cap**: Deep thinking content capped at 4 KB to prevent frontend lag
- **Mobile optimisation**: Session sidebar hidden by default on mobile, with overlay dismiss support
- **Session title fix**: Fixed title auto-generation fallback logic and Bridge reset on config change
- **Image preview dedup**: Fixed duplicate image rendering within the same message
## 📚 Knowledge Base Enhancements
- **Nested directory support**: Knowledge base listing and display now support multi-level nested directories
- **Root-level file display**: Show `index.md`, `log.md` and other root-level files in the knowledge tree
- **Empty state stats fix**: Root-level files no longer interfere with empty-state detection
## 🌙 Dream Memory Improvements
- **Structured organisation**: Dream memory files are now auto-archived by date with a cleaner directory structure
- **Schedule jitter**: Daily dream trigger includes random jitter to avoid concurrency conflicts in cluster deployments
## 🛠 Skill System Improvements
- **Skill manager refresh**: `/skill` commands now automatically refresh the skill manager to keep state in sync
- **Installation sources**: Skill installation supports multiple source formats (URL, zip, local file, etc.) with automatic target directory handling
## 🐛 Other Fixes
- **Gemini fix**: Fixed Gemini tool calls not returning results
- **Agent retry**: Empty-response retries no longer drop `tool_calls`
- **Docker env sync**: Fixed environment variables not syncing after config update in Docker environments
- **Python 3.7 compat**: Deferred `Literal` import for Python 3.7 compatibility
- **Model switch notification**: Fixed bot_type change notification not showing after model switch. Thanks @6vision
- **Config command**: `/config` now supports setting `enable_thinking`
- **Thinking display**: Deep thinking display disabled by default
## 📦 Upgrade
Run `cow update` or `./run.sh update` to upgrade, or pull the latest code and restart. See [Upgrade Guide](https://docs.cowagent.ai/en/guide/upgrade).
**Release Date**: 2026.04.22 | [Full Changelog](https://github.com/zhayujie/CowAgent/compare/2.0.6...master)

@@ -0,0 +1,68 @@
---
title: v2.0.8
description: CowAgent 2.0.8 - Major Feishu channel upgrade (voice, streaming typewriter, one-click QR app creation), DeepSeek V4 / Baidu Qianfan ERNIE 5.0 support, scheduler memory enhancements and multiple fixes
---
## 🪶 Major Feishu Channel Upgrade
### 1. One-click QR-scan App Creation
You no longer need to create the app, configure permission scopes, and set up event subscriptions by hand on the Feishu Open Platform. When `feishu_app_id` is not configured, both the Web Console and the CLI startup flow now show a QR-scan entry: scan with Feishu, authorize, and the bot is created and its config is filled back automatically, so the channel works out of the box.
Documentation: [Feishu Channel](https://docs.cowagent.ai/en/channels/feishu)
### 2. Voice Messages
Receive Feishu voice messages with automatic speech-to-text, and reply in voice via TTS. Recognition accuracy for short Chinese voice messages has been improved.
### 3. Streaming Typewriter Replies
Integrated with Feishu CardKit streaming cards, **enabled by default**, matching the Web Console experience:
- Multi-turn agent flows render intermediate updates and the final reply on separate cards
- Tuned for high-throughput models like DeepSeek to keep pace with the Web Console
- Falls back to plain text replies automatically when not supported, no manual config needed
- Requires Feishu client ≥ 7.20
The voice and streaming building blocks come from community contribution #2791. Thanks [@ooaaooaa123](https://github.com/ooaaooaa123)
## 🤖 New Model Support
- **DeepSeek V4 series**: Added `deepseek-v4-pro` / `deepseek-v4-flash`, with `deepseek-v4-flash` set as the new default
- **Unified thinking-mode toggle**: DeepSeek V4, Qwen3 and other thinking-capable models now share the same `enable_thinking` switch
- **Baidu Qianfan / ERNIE first-class integration**: New `qianfan` provider supporting `ernie-5.0` (default recommendation), `ernie-x1.1`, `ernie-4.5-turbo-128k`, `ernie-4.5-turbo-32k`. Dedicated `qianfan_api_key` / `qianfan_api_base` settings keep OpenAI config clean; legacy `wenxin` / `wenxin-4` paths are fully preserved. #2790 Thanks [@jimmyzhuu](https://github.com/jimmyzhuu)
Documentation: [Baidu Qianfan / ERNIE](https://docs.cowagent.ai/en/models/qianfan)
## 🌐 Translation Provider
- **Youdao translator**: Added a Youdao provider to the `translate/` module using the v3 SHA-256 signing scheme, with automatic ISO 639-1 language-code mapping (`zh`, `zh-TW`, etc.) #2797 Thanks [@Zmjjeff7](https://github.com/Zmjjeff7)
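As a rough illustration of the signing scheme named above, the sketch below follows Youdao's publicly documented v3 rule as I understand it: the signature is the SHA-256 hex digest of `appKey + input + salt + curtime + appSecret`, where `input` is the query itself when it has at most 20 characters, otherwise its first 10 characters, its length, and its last 10 characters. The function names are hypothetical and are not taken from `YoudaoTranslator`.

```python
import hashlib

# ISO 639-1 codes that need a Youdao-specific code; others pass through.
LANG_MAP = {"zh": "zh-CHS", "zh-TW": "zh-CHT"}


def truncate_input(q: str) -> str:
    """Apply the truncate-input rule for queries longer than 20 characters."""
    if len(q) <= 20:
        return q
    return q[:10] + str(len(q)) + q[-10:]


def youdao_v3_sign(app_key: str, app_secret: str, q: str, salt: str, curtime: str) -> str:
    """Return the hex SHA-256 signature for one request (assumed field order)."""
    raw = app_key + truncate_input(q) + salt + curtime + app_secret
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def to_youdao_lang(code: str) -> str:
    """Map an ISO 639-1 code to the code Youdao expects."""
    return LANG_MAP.get(code, code)
```

In a real request, `salt` would typically be a fresh UUID and `curtime` the current Unix time in seconds as a string; both are passed in here so the function stays deterministic.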
## 🛠 OpenAI Client Refactor
- **Drop SDK dependency**: The OpenAI bot is reimplemented on a native HTTP client — leaner startup, fewer dependency conflicts
- **Web Console hint**: API base inputs in the model config UI now include version-path placeholder hints
## ⏰ Scheduler Memory Enhancements
- **Follow-up on task results**: Scheduled task results are automatically injected into the receiver's session history — the next turn can ask follow-up questions without re-stating context. Thanks [@huangrichao2020](https://github.com/huangrichao2020)
- **No long-term memory pollution**: Scheduler-injected pairs are excluded from the daily memory flush so high-frequency tasks don't drown the memory store
- **Bounded scheduler context**: The scheduler's own session context is automatically capped, so long-running periodic tasks don't accumulate state and slow down replies
## 🔧 Tools and Safety
- **Vision model selection**: `tool.vision.model` config now actually takes effect, with automatic fallback when unconfigured #2792
- **Bash safety prompt**: The destructive-deletion confirm prompt is now scoped to paths outside the workspace — routine in-workspace operations are no longer interrupted
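The workspace scoping described in the last item can be pictured with a small path check. This is only a sketch under assumptions: the helper name `needs_delete_confirmation` is hypothetical, and treating the workspace root itself as "outside" is my own choice, not something the release notes state.

```python
from pathlib import Path


def needs_delete_confirmation(target: str, workspace: str) -> bool:
    """Return True when a deletion target is not strictly inside the workspace."""
    ws = Path(workspace).resolve()
    path = Path(target)
    if not path.is_absolute():
        path = ws / path  # relative targets are interpreted from the workspace
    resolved = path.resolve()  # collapses ".." so escapes are detected
    return ws not in resolved.parents
```

With a workspace of `/srv/ws`, a target such as `build/cache` would not prompt, while `../other` or `/etc/hosts` would.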
## 🐛 Other Fixes
- Fixed Deep Dream firing duplicate runs in multi-instance setups
- Fixed missing `reasoning_content` on some history turns in DeepSeek multi-turn conversations
## 📦 Upgrade
Source-code deployments can run `cow update` or `./run.sh update` for a one-click upgrade, or pull the latest code and restart manually. See [Upgrade Guide](https://docs.cowagent.ai/en/guide/upgrade) for details.
> ⚠️ One-click Feishu app creation requires `lark-oapi>=1.5.5`. `cow update` pulls it automatically; manual deployments must update dependencies.
**Release Date**: 2026.05.05 | [Full Changelog](https://github.com/zhayujie/CowAgent/compare/2.0.7...2.0.8)

@@ -0,0 +1,158 @@
---
title: image-generation - Image Generation
description: Text-to-image / image-to-image / multi-image fusion with automatic multi-provider routing and fallback
---
A general-purpose image generation and editing skill supporting six providers: OpenAI, Gemini, Seedream (Volcengine Ark), Qwen (DashScope), MiniMax, and LinkAI. No need to choose a model manually — the script automatically selects a configured provider based on a fixed priority order.
## Model Selection
`image-generation` uses a "fixed priority + automatic fallback" strategy — just configure your keys and it works:
1. **Priority order**: `OpenAI → Gemini → Seedream → Qwen → MiniMax → LinkAI`
2. **Unconfigured providers are skipped**: only providers with an API key participate
3. **Automatic fallback on failure**: on errors like 401, model not enabled, or network issues, the next provider is tried
4. **Specified model goes first**: if a specific model name is provided, its provider is promoted to the front
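The four rules above can be sketched as follows. This is a simplified illustration rather than the skill's actual script: the provider keys, `order_providers`, and `generate_with_fallback` are hypothetical names, and each backend is modelled as a plain callable that either returns image paths or raises.

```python
from typing import Callable, Dict, List, Optional, Set

# Fixed priority order taken from rule 1.
PRIORITY = ["openai", "gemini", "seedream", "qwen", "minimax", "linkai"]


def order_providers(configured: Set[str], preferred: Optional[str] = None) -> List[str]:
    """Keep only configured providers, in priority order, with the preferred one first."""
    ordered = [p for p in PRIORITY if p in configured]
    if preferred in ordered:
        ordered.remove(preferred)
        ordered.insert(0, preferred)
    return ordered


def generate_with_fallback(
    providers: List[str],
    backends: Dict[str, Callable[[str], List[str]]],
    prompt: str,
) -> dict:
    """Try each provider in turn and return the first success, else all errors."""
    errors = []
    for name in providers:
        try:
            return {"provider": name, "images": backends[name](prompt)}
        except Exception as exc:  # e.g. 401, model not enabled, network issue
            errors.append(f"{name}: {exc}")
    return {"error": "; ".join(errors) or "no provider configured"}
```

Promoting the preferred provider instead of replacing the list keeps the remaining providers available as fallbacks.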
### Supported Models
| Provider | Models / Aliases | Notes |
| --- | --- | --- |
| OpenAI | `gpt-image-2`, `gpt-image-1` | General-purpose, high quality, supports `quality` parameter |
| Gemini Nano Banana | `nano-banana-2`, `nano-banana-pro`, `nano-banana` | Corresponds to `gemini-3.1-flash`, `gemini-3-pro`, `gemini-2.5-flash` image variants |
| Seedream (Volcengine Ark) | `seedream-5.0-lite`, `seedream-4.5` | Native 2K to 4K, up to 14 reference images for fusion |
| Qwen (DashScope) | `qwen-image-2.0`, `qwen-image-2.0-pro` | Strong with Chinese text rendering and text-image layouts |
| MiniMax | `image-01` | Fast and simple image generation |
| LinkAI | Any model | Universal proxy, used as fallback |
<Note>
By default, the Agent does not pick a model — it uses automatic routing. If you want a specific model, just say so in the conversation, e.g. "use seedream to draw a cat" or "generate a poster with gpt-image-2". You can also pin a default model via the "Custom Configuration" section below.
</Note>
## Custom Configuration
### API Key Setup
You need **at least one** provider key. Configuring multiple providers enables automatic fallback. There are three ways to set up keys:
#### Option 1: Automatic Reuse of Existing Keys
If you have already configured model keys in the web console or `config.json` (e.g. `openai_api_key`, `gemini_api_key`, etc.), these keys are **automatically synced** to the corresponding environment variables at startup. In other words, if your chat model works, image generation can use the same key with zero extra configuration.
#### Option 2: Configure in config.json
Add the key fields directly to `config.json`:
```json
{
"openai_api_key": "sk-xxx",
"openai_api_base": "https://api.openai.com/v1",
"gemini_api_key": "AIza-xxx",
"ark_api_key": "xxx",
"dashscope_api_key": "sk-xxx",
"minimax_api_key": "xxx",
"linkai_api_key": "xxx"
}
```
A restart is required after changes. Each key also has a corresponding `*_api_base` field for custom endpoints.
#### Option 3: Configure via Conversation
Send an API key in the chat and the Agent will save it to `~/cow/.env` using the `env_config` tool — **no restart needed**. For example:
```
Set OPENAI_API_KEY to sk-xxx
```
Or:
```
Configure ARK_API_KEY as xxx
```
### API Key Reference
| Environment Variable | config.json Field | Provider | Default Base URL |
| --- | --- | --- | --- |
| `OPENAI_API_KEY` | `openai_api_key` | OpenAI | `https://api.openai.com/v1` |
| `GEMINI_API_KEY` | `gemini_api_key` | Gemini | `https://generativelanguage.googleapis.com` |
| `ARK_API_KEY` | `ark_api_key` | Volcengine Ark (Seedream) | `https://ark.cn-beijing.volces.com/api/v3` |
| `DASHSCOPE_API_KEY` | `dashscope_api_key` | Alibaba DashScope (Qwen) | `https://dashscope.aliyuncs.com` |
| `MINIMAX_API_KEY` | `minimax_api_key` | MiniMax | `https://api.minimaxi.com` |
| `LINKAI_API_KEY` | `linkai_api_key` | LinkAI | `https://api.link-ai.tech` |
### Pinning a Default Model
To force all image generation through a specific provider's model, add this to `config.json`:
```json
"skill": {
"image-generation": {
"model": "seedream-5.0-lite"
}
}
```
At startup, this is automatically converted to the environment variable `SKILL_IMAGE_GENERATION_MODEL`, and the script will always use this model's provider for generation.
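A plausible naming rule for that conversion is sketched below. The rule (prefix `SKILL_`, hyphens to underscores, upper case) is inferred from the single example `SKILL_IMAGE_GENERATION_MODEL` and should be read as an assumption; the function names are hypothetical.

```python
def skill_env_name(skill: str, field: str) -> str:
    """Build an environment variable name from a skill name and a config field."""
    return f"SKILL_{skill}_{field}".replace("-", "_").upper()


def skill_config_to_env(config: dict) -> dict:
    """Flatten the "skill" section of config.json into environment variables."""
    env = {}
    for skill, fields in config.get("skill", {}).items():
        for field, value in fields.items():
            env[skill_env_name(skill, field)] = str(value)
    return env
```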
## Enabling and Disabling
`image-generation` is a built-in skill that **automatically adjusts its status based on API keys**:
- **Key configured**: the skill is active — the Agent will invoke it when asked to draw
- **Key not configured**: the skill still appears in context (marked as "needs configuration") — the Agent will guide the user to set up a key rather than failing silently
To control it manually:
```text
/skill disable image-generation # Disable (won't be invoked even if keys are present)
/skill enable image-generation # Re-enable
```
In the terminal: `cow skill disable image-generation` / `cow skill enable image-generation`.
## Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `prompt` | string | Yes | — | Image description |
| `image_url` | string / list | No | null | Input image(s) for editing — local path or URL. Pass multiple for multi-image fusion |
| `quality` | string | No | auto | `low` / `medium` / `high` — only some providers support this |
| `size` | string | No | auto | `512` / `1K` / `2K` / `3K` / `4K`, or pixel value like `1024x1024` |
| `aspect_ratio` | string | No | null | `1:1` / `3:2` / `2:3` / `16:9` / `9:16` / `21:9`; Gemini also supports `1:4` / `4:1` / `1:8` / `8:1` |
<Warning>
**Higher quality and larger size cost more and take longer.**
- For everyday conversations and quick previews, use the defaults (`auto`) or `quality=low` + `size=1K` — roughly 20 seconds
- For posters or when the user explicitly asks for high resolution, use `quality=high` + `size=2K/4K`; this may take 1 to 5 minutes depending on the model
</Warning>
## Output
On success:
```json
{
"model": "doubao-seedream-5-0-260128",
"images": [
{"url": "/path/to/output.png"}
]
}
```
On failure: `{ "error": "..." }`. After an error, **do not retry directly** — it is almost always a configuration issue (wrong key, incorrect API base, model not enabled). Have the user fix the configuration first.
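A caller could handle the two output shapes roughly as follows. This is a sketch that assumes the script prints exactly one JSON object in the formats shown above; `parse_result` is a hypothetical helper.

```python
import json
from typing import List


def parse_result(stdout: str) -> List[str]:
    """Return the generated image paths, or raise instead of retrying on error."""
    result = json.loads(stdout)
    if "error" in result:
        # Errors are usually configuration problems, so surface them to the user.
        raise RuntimeError(
            f"image generation failed: {result['error']}; fix the configuration before retrying"
        )
    return [item["url"] for item in result.get("images", [])]
```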
## Common Use Cases
- **Text-to-image**: generate illustrations, posters, icons, avatars, storyboards, etc. from a description
- **Image-to-image**: change styles, swap elements, add decorations or text on an existing image
- **Multi-image fusion**: combine multiple reference images into one (outfit swaps, character group photos, etc.)
<Note>
- Bash timeout should be set to 600 seconds. Each provider has a 300-second HTTP timeout, but the script may try multiple providers sequentially
- Input images are automatically compressed to ≤ 4 MB with the longest edge ≤ 4096 px
- Gemini / Seedream / Qwen / MiniMax do not support the `quality` parameter — passing it has no effect
- Seedream defaults to 2K; `seedream-5.0-lite` supports up to 3K; `seedream-4.5` supports up to 4K
</Note>

@@ -0,0 +1,112 @@
---
title: knowledge-wiki - Knowledge Base
description: Maintain a local structured knowledge base with automatic archiving, categorisation, and cross-referencing
---
Organises notes, insights, and reference materials from your conversations into a structured local knowledge base, automatically maintaining an index and cross-references between pages.
`knowledge-wiki` maintains a `knowledge/` directory in your workspace — essentially the Agent's "second brain". The skill is marked `always: true`, so it is **always loaded** and requires no external dependencies.
## When It Triggers
- You share an article, document, or URL that you want to keep for future reference
- A conversation produces conclusions worth retaining long-term
- You want to look up something you accumulated earlier
## Directory Structure
```
knowledge/
├── index.md # Global index (must be maintained)
├── log.md # Operation log (append-only)
└── <category>/ # Category subdirectories (grouped by content)
└── <slug>.md # Knowledge page (lowercase-hyphenated filename)
```
## Three Core Operations
### 1. Ingest
When you share some material, the Agent will:
1. Read and understand the original content, extracting key information
2. Decide which category it belongs to — check `index.md` first; create a new category if none fits
3. Generate a knowledge page at `knowledge/<category>/<slug>.md`
4. Update the index `index.md` and the log `log.md`
### 2. Synthesise
When a conversation produces new conclusions or insights:
1. Create a new knowledge page under an appropriate category
2. Add cross-links to and from related existing pages
3. Update the index and log
### 3. Query
When you ask about previously accumulated knowledge:
1. Search `index.md` for potentially relevant pages
2. Open specific pages with the `read` tool
3. Supplement with `memory_search` if needed
4. Include links to knowledge pages in the answer so you can click through to the source
## Page Format
```markdown
# Page Title
> Source: <source URL or brief description>
Body content. Link between pages using relative paths:
[Related Page](../category/related-page.md)
## Key Points
- ...
## Related Pages
- [Page A](../category/page-a.md) — why it's related
```
<Note>
- `> Source:` records where this knowledge came from. Always include it when there is a clear source
- Cross-references are important: when creating or updating a page, remember to add back-links in the related pages too
- **Only link to pages that already exist.** If a concept deserves its own page, create it first, then add the link
</Note>
## Index Format
`knowledge/index.md` uses a flat list grouped by category, one knowledge page per line:
```markdown
# Knowledge Index
## Category A
- [Page Title](category-a/page-slug.md) — one-line summary
## Category B
- [Page Title](category-b/page-slug.md) — one-line summary
```
No tables, no emojis. Category names and organisation can be adjusted freely.
## Log Format
`knowledge/log.md` is append-only — newest entries go at the bottom:
```markdown
## [YYYY-MM-DD] ingest | Page Title
## [YYYY-MM-DD] synthesize | Page Title
```
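The log format above is simple enough to generate mechanically. The helpers below are a hypothetical sketch of that formatting, not code from the skill; the two operation names are the ones shown in the example.

```python
from datetime import date

ALLOWED_OPERATIONS = {"ingest", "synthesize"}


def log_entry(operation: str, title: str, day: date) -> str:
    """Format one log heading, e.g. '## [2026-05-06] ingest | Page Title'."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    return f"## [{day.isoformat()}] {operation} | {title}"


def append_log(existing: str, entry: str) -> str:
    """Append an entry at the bottom, keeping the log append-only."""
    if not existing.strip():
        return entry + "\n"
    return existing.rstrip("\n") + "\n" + entry + "\n"
```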
## Writing Guidelines
- **Filenames**: lowercase with hyphens, e.g. `machine-learning.md`
- **One topic per page** — link related content across pages
- **Update, don't duplicate** — if a page already exists, update it rather than creating a new one
- **Always update the index** `knowledge/index.md` after any change
- **Distill, don't copy** — capture the key points, not the entire source
- **Use full paths when referencing knowledge pages in conversations**, e.g. `[Title](knowledge/<category>/<slug>.md)`. Use relative paths only for inter-page links
- **Include links when answering questions based on knowledge pages** so users can dig deeper

@@ -0,0 +1,180 @@
---
title: skill-creator - Skill Creator
description: Create, install, and update skills — standardises SKILL.md format and directory structure
---
`skill-creator` is a "meta-skill" that helps the Agent create, install, and update other skills, ensuring every skill follows a consistent `SKILL.md` format and directory layout.
## When It Triggers
- The user wants to install a skill from a URL or remote repository
- The user wants to create a brand-new skill from scratch
- An existing skill needs upgrading or restructuring
## What Is a Skill?
A skill is a reusable instruction set plus optional scripts and assets. It injects domain expertise into the Agent so it can handle specific tasks like a specialist.
A skill typically contains:
1. **Specialised workflow** — step-by-step instructions for a category of tasks
2. **Tool usage** — how to call a particular API or process a particular file format
3. **Domain knowledge** — team conventions, business rules, data schemas, etc.
4. **Attached resources** — scripts, reference docs, templates, etc.
<Note>
**Core principle: less is more.** Only write what the Agent wouldn't figure out on its own. For every line you add, ask yourself: is it worth the tokens?
</Note>
## Directory Structure
```
skill-name/
├── SKILL.md # Required: skill definition
│ ├── YAML frontmatter (name / description are mandatory)
│ └── Markdown body (instructions + examples)
└── Optional resources
├── scripts/ # Executable scripts (Python / Bash, etc.)
├── references/ # Large reference docs the Agent reads on demand
└── assets/ # Templates, icons, etc. used directly in output
```
## SKILL.md Specification
Frontmatter fields in the SKILL.md header:
| Field | Description |
| --- | --- |
| `name` | Skill name — lowercase with hyphens, must match the directory name |
| `description` | **The most important field.** Clearly state what the skill does and when to use it. The Agent reads this to decide whether to invoke it. All trigger-related descriptions go here, not in the body |
| `metadata.cowagent.requires.bins` | System CLI tools that must be installed |
| `metadata.cowagent.requires.env` | Required environment variables (all must be present) |
| `metadata.cowagent.requires.anyEnv` | Multiple API keys — at least one must be set |
| `metadata.cowagent.requires.anyBins` | Multiple tools — at least one must be installed |
| `metadata.cowagent.always` | Set to `true` to always load, skipping dependency checks |
| `metadata.cowagent.emoji` | Display emoji (optional) |
| `metadata.cowagent.os` | OS restriction, e.g. `["darwin", "linux"]` |
<Note>
The `category` field does not need to be set manually — the system automatically sets it to `skill`.
</Note>
Two ways to declare API key dependencies:
```yaml
metadata:
cowagent:
requires:
env: ["MYAPI_KEY"] # Must be present
```
```yaml
metadata:
cowagent:
requires:
anyEnv: ["OPENAI_API_KEY", "LINKAI_API_KEY"] # At least one
```
**Skills are auto-enabled/disabled based on dependencies**: they activate when all required environment variables are present and deactivate when any are missing — no need for manual `/skill enable`.
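One way to read the dependency fields in the table above is as a single eligibility check. The sketch below is an interpretation, not CowAgent's loader: `is_skill_enabled` is a hypothetical name, it takes the `metadata.cowagent` mapping as a plain dict, and it ignores the `os` field for brevity.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def is_skill_enabled(
    cowagent_meta: dict,
    env: Optional[Mapping[str, str]] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> bool:
    """Decide whether a skill's declared dependencies are satisfied."""
    if cowagent_meta.get("always"):
        return True  # always-loaded skills skip dependency checks
    env = os.environ if env is None else env
    requires = cowagent_meta.get("requires", {})
    if not all(env.get(name) for name in requires.get("env", [])):
        return False  # every listed variable must be present
    any_env = requires.get("anyEnv", [])
    if any_env and not any(env.get(name) for name in any_env):
        return False  # at least one of these keys must be set
    if not all(which(tool) for tool in requires.get("bins", [])):
        return False  # every listed CLI tool must be installed
    any_bins = requires.get("anyBins", [])
    if any_bins and not any(which(tool) for tool in any_bins):
        return False  # at least one of these tools must be installed
    return True
```

Passing `env` and `which` as parameters keeps the check easy to test without touching the real environment.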
## Resource Directories
| Directory | What goes here | What does NOT go here |
| --- | --- | --- |
| `scripts/` | Code that needs to run repeatedly, or scripts that produce deterministic results | Demo-only code snippets |
| `references/` | Documents **over 500 lines** that genuinely won't fit in SKILL.md (e.g. a full DB schema) | General API docs, tutorials, examples |
| `assets/` | Files that appear in the final output (templates, icons, boilerplate, etc.) | Explanatory documentation |
<Warning>
**In principle, everything goes in `SKILL.md`** — only split into resource directories when it truly won't fit.
Do not add `README.md`, `CHANGELOG.md`, or `INSTALLATION_GUIDE.md` to a skill — put everything in `SKILL.md`. Resource directories should only contain scripts that actually run or assets that are actually used.
</Warning>
## Installing External Skills
After installation, the skill lands in `<workspace>/skills/<name>/`.
| Source | How to install |
| --- | --- |
| URL (single file) | curl / web_fetch |
| URL (zip archive) | Download and extract |
| Local SKILL.md | Read directly |
| Local zip archive | Extract |
Installation steps:
1. Locate the `SKILL.md` (may be at the root or in a subdirectory of the archive)
2. Read the `name` from the frontmatter
3. Copy the **entire skill directory** (including `SKILL.md`, `scripts/`, `assets/`, etc.) to `<workspace>/skills/<name>/`
4. If the archive contains an `INSTALL.md` or a setup script, follow or run it, but the final result must still reside under `<workspace>/skills/<name>/`
## Creating a Skill from Scratch
Recommended order:
1. **Clarify requirements** — ask the user for a few concrete use cases (don't ask too many at once)
2. **Plan the structure** — does this skill need scripts? Reference docs? Template assets?
3. **Scaffold** — use the init script:
```bash
scripts/init_skill.py <skill-name> --path <workspace>/skills [--resources scripts,references,assets] [--examples]
```
4. **Fill in content** — write SKILL.md, add scripts and resources. Always test scripts after writing them
5. **Validate** (optional):
```bash
scripts/quick_validate.py <workspace>/skills/<skill-name>
```
6. **Iterate** — keep improving based on real-world usage feedback
## Naming Conventions
- Use only lowercase letters, digits, and hyphens. Normalise user-given names, e.g. `Plan Mode` → `plan-mode`
- Maximum 64 characters
- Keep it short, start with a verb, make it self-explanatory
- Use tool names as prefixes when appropriate, e.g. `gh-address-comments`, `linear-address-issue`
- The directory name and the `name` field must match exactly
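The normalisation rule above (lowercase letters, digits, and hyphens only, at most 64 characters) can be sketched like this. The helper name `normalize_skill_name` is hypothetical, and collapsing every run of other characters into a single hyphen is one reasonable reading of the rule rather than the project's confirmed behaviour.

```python
import re

MAX_NAME_LENGTH = 64  # limit stated in the naming conventions


def normalize_skill_name(raw: str) -> str:
    """Turn a user-given name such as 'Plan Mode' into 'plan-mode'."""
    name = raw.strip().lower()
    # Replace every run of characters outside [a-z0-9] with a single hyphen.
    name = re.sub(r"[^a-z0-9]+", "-", name).strip("-")
    if not name:
        raise ValueError("skill name is empty after normalisation")
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError("skill name exceeds 64 characters")
    return name
```

The result can be used both as the directory name and as the `name` field, which keeps the two identical.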
## Three-Level Loading
Skills are not loaded into context all at once — they use a three-level progressive loading mechanism:
1. **Metadata** (`name` + `description`) — always in context (~100 words). The Agent uses this to decide whether to invoke the skill
2. **SKILL.md body** — loaded only when the skill is activated; keep it under 500 lines
3. **Resource files** — read on demand by the Agent
For skills with multiple variants (e.g. multi-cloud deployment), organise like this:
```
cloud-deploy/
├── SKILL.md # Main workflow and provider selection logic
└── references/
    ├── aws.md
    ├── gcp.md
    └── azure.md
```
When the user picks AWS, the Agent only reads `aws.md` — no need to load all three providers.
## Common Design Patterns
**Step-by-step**: numbered steps with corresponding scripts.
```markdown
1. Analyse form structure (run analyze_form.py)
2. Generate field mappings (edit fields.json)
3. Auto-fill the form (run fill_form.py)
```
**Branching**: different flows based on user intent.
```markdown
1. Determine operation type:
   **Creating new content?** → follow the "Create" workflow
   **Editing existing content?** → follow the "Edit" workflow
```
**Template-based**: when output format has strict requirements, include a template in SKILL.md for the Agent to follow.

View File

@@ -23,11 +23,12 @@ If the current provider fails, the tool automatically tries the next one until i
| Vendor | Vision Model | Notes |
| --- | --- | --- |
| OpenAI / Compatible | Main model | All OpenAI-compatible multimodal models |
| Baidu Qianfan | Main model | Multimodal main models (e.g. `ernie-5.0`) handle images directly; falls back to `ernie-4.5-turbo-vl` for text-only main models |
| Qwen (DashScope) | Main model | Via MultiModalConversation API |
| Claude | Main model | Anthropic native image format |
| Gemini | Main model | inlineData format |
| Doubao | Main model | doubao-seed-2-0 series natively supported |
| Kimi (Moonshot) | Main model | kimi-k2.5 natively supported |
| Kimi (Moonshot) | Main model | kimi-k2.6, kimi-k2.5 natively supported |
| ZhipuAI | glm-5v-turbo | Always uses dedicated vision model |
| MiniMax | MiniMax-Text-01 | Always uses dedicated vision model |
@@ -52,7 +53,7 @@ To specify a particular model for the vision tool, add to `config.json`:
{
"tool": {
"vision": {
"model": "gpt-4o"
"model": "ernie-4.5-turbo-vl"
}
}
}

View File

@@ -139,7 +139,8 @@ sudo docker logs -f chatgpt-on-wechat
```json
{
"channel_type": "web",
"model": "MiniMax-M2.7",
"model": "deepseek-v4-flash",
"deepseek_api_key": "",
"agent": true,
"agent_workspace": "~/cow",
"agent_max_context_tokens": 40000,
@@ -152,8 +153,9 @@ sudo docker logs -f chatgpt-on-wechat
```yaml
environment:
CHANNEL_TYPE: 'web'
MODEL: 'MiniMax-M2.7'
MINIMAX_API_KEY: 'your-api-key'
MODEL: 'deepseek-v4-flash'
DEEPSEEK_API_KEY: 'your-api-key'
DEEPSEEK_API_BASE: 'https://api.deepseek.com/v1'
AGENT: 'True'
AGENT_MAX_CONTEXT_TOKENS: 40000
AGENT_MAX_CONTEXT_TURNS: 30
@@ -165,7 +167,7 @@ sudo docker logs -f chatgpt-on-wechat
| Parameter | Environment variable | Description | Default |
| --- | --- | --- | --- |
| `channel_type` | `CHANNEL_TYPE` | Channel type | `web` |
| `model` | `MODEL` | Model name | `MiniMax-M2.5` |
| `model` | `MODEL` | Model name | `deepseek-v4-flash` |
| `agent` | `AGENT` | Whether to enable Agent mode | `true` |
| `agent_workspace` | - | Agent workspace path | `~/cow` |
| `agent_max_context_tokens` | `AGENT_MAX_CONTEXT_TOKENS` | Maximum context tokens | `40000` |

View File

@@ -9,7 +9,7 @@ CowAgent 2.0 从简单的聊天机器人全面升级为超级智能助理,采
CowAgent's overall architecture consists of the following core modules:
<img src="https://cdn.link-ai.tech/doc/68ef7b212c6f791e0e74314b912149f9-sz_5847990.png" alt="CowAgent Architecture" />
<img src="https://cdn.link-ai.tech/doc/cow-agent-arch-zh.jpg" alt="CowAgent Architecture" />
| Module | Description |
| --- | --- |
@@ -70,7 +70,7 @@ Agent 的工作空间默认位于 `~/cow` 目录,用于存储系统提示词
"agent_max_context_tokens": 40000,
"agent_max_context_turns": 30,
"agent_max_steps": 15,
"enable_thinking": true
"enable_thinking": false
}
```
@@ -81,5 +81,5 @@ Agent 的工作空间默认位于 `~/cow` 目录,用于存储系统提示词
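| `agent_max_context_tokens` | Maximum number of context tokens | `50000` |
| `agent_max_context_turns` | Maximum number of remembered context turns | `20` |
| `agent_max_steps` | Maximum decision steps per task | `20` |
| `enable_thinking` | Whether to enable deep thinking; when on, the Web console shows the reasoning process, and turning it off speeds up responses | `true` |
| `enable_thinking` | Whether to enable deep thinking mode | `false` |
| `knowledge` | Whether to enable the personal knowledge base | `true` |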

View File

@@ -28,7 +28,7 @@
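- **Tool system**: Provides built-in tools for file reading and writing, terminal execution, browser control, scheduled tasks, message sending, and more. The Agent calls them autonomously to complete complex tasks.
- **CLI system**: Provides terminal commands and chat commands for process management, Skill installation, configuration changes, and similar operations.
- **Multimodal messages**: Parses, processes, generates, and sends text, image, voice, file, and other message types.
- **Multiple model support**: Supports major model providers including OpenAI, Claude, Gemini, DeepSeek, MiniMax, GLM, Qwen, Kimi, and Doubao.
- **Multiple model support**: Supports major model providers including DeepSeek, MiniMax, Claude, Gemini, OpenAI, GLM, Qwen, Doubao, and Kimi.
- **Multi-platform deployment**: Runs on a local PC or a server and integrates with WeChat, Web, Feishu, DingTalk, WeChat Official Accounts, and WeCom applications.
## Disclaimer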
@@ -164,15 +164,15 @@ sudo docker logs -f chatgpt-on-wechat
| Provider | Recommended model |
| --- | --- |
| DeepSeek | `deepseek-v4-flash` |
| MiniMax | `MiniMax-M2.7` |
| GLM | `glm-5-turbo` |
| Kimi | `kimi-k2.5` |
| Doubao | `doubao-seed-2-0-code-preview-260215` |
| Qwen | `qwen3.6-plus` |
| Claude | `claude-sonnet-4-6` |
| Gemini | `gemini-3.1-pro-preview` |
| OpenAI | `gpt-5.4` |
| DeepSeek | `deepseek-chat` |
| GLM | `glm-5.1` |
| Qwen | `qwen3.6-plus` |
| Doubao | `doubao-seed-2-0-code-preview-260215` |
| Kimi | `kimi-k2.6` |
For detailed configuration of each model, see the [model documentation](https://docs.cowagent.ai/en/models/index).

View File

@@ -1,69 +1,107 @@
---
title: Feishu (Lark)
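description: Integrate CowAgent into a Feishu application
description: Connect CowAgent to Feishu through a custom enterprise app
---
Create a custom enterprise app to integrate CowAgent into Feishu. You must be a Feishu enterprise user with administrator privileges.
> Connect CowAgent through a Feishu custom enterprise app. It supports one-on-one chats and group chats (with @mentions), uses a WebSocket long connection so no public IP is required, and also supports streaming typewriter replies and voice messages.
## 1. Create a custom enterprise app
<Note>
Connecting requires a Feishu enterprise user with administrator privileges.
</Note>
### 1.1 Create the app
## 1. Connection methods
Go to the [Feishu Developer Platform](https://open.feishu.cn/app/), click **Create custom enterprise app**, fill in the required information, and click **Create**:
### Method 1: One-click creation (recommended)
You do not need to create an app on the Feishu Developer Platform beforehand. After starting Cow, open the Web console (default `http://127.0.0.1:9899/`), go to the **Channels** menu → **Add channel** → **Feishu**, and on the **QR scan** tab click **Create Feishu app in one click**. Scan the QR code with the **Feishu app** and the app is created and connected automatically.
<Note>
The created app comes preconfigured with all required permissions (sending and receiving messages, reading and writing cards, group events, and so on) and event subscriptions. Currently only the Chinese edition of Feishu is supported; the international Lark edition is not yet supported.
</Note>
If you start from the CLI without `feishu_app_id` configured, the QR code is also shown in the terminal.
### Method 2: Manual creation
Create the app yourself on the Feishu Developer Platform, then connect it through the Web console or the configuration file.
**Step 1: Create the app**
1. Go to the [Feishu Developer Platform](https://open.feishu.cn/app/) and click **Create custom enterprise app**: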
<img src="https://img-1317903499.cos.ap-guangzhou.myqcloud.com/docs/feishu-hosting-create-app.jpg" width="500"/>
### 1.2 Add the Bot capability
Under **Add app capabilities**, add the **Bot** capability to the app:
2. Under **Add app capabilities**, add the **Bot** capability:
<img src="https://img-1317903499.cos.ap-guangzhou.myqcloud.com/docs/feishu-hosting-add-bot.jpg" width="800"/>
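### 1.3 Configure app permissions
Click **Permission management**, paste the following permission string into the input box under **Permission configuration**, select all the filtered permissions, and click **Enable in bulk** to confirm:
3. In **Permission management**, paste the following permissions, select all, and click **Enable in bulk**: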
```
im:message,im:message.group_at_msg,im:message.group_at_msg:readonly,im:message.p2p_msg,im:message.p2p_msg:readonly,im:message:send_as_bot,im:resource
im:message,im:message.group_at_msg,im:message.group_at_msg:readonly,im:message.p2p_msg,im:message.p2p_msg:readonly,im:message:send_as_bot,im:resource,cardkit:card:write
```
<img src="https://cdn.link-ai.tech/doc/feishu-hosting-add-auth2.png" width="800"/>
## 2. Project configuration
1. Get the `App ID` and `App Secret` from **Credentials and basic info**:
4. Get the `App ID` and `App Secret` from **Credentials and basic info**:
<img src="https://img-1317903499.cos.ap-guangzhou.myqcloud.com/docs/feishu-hosting-appid-secret.jpg" width="800"/>
2. Add the following configuration to `config.json` in the project root:
**Step 2: Connect to CowAgent**
```json
{
"channel_type": "feishu",
"feishu_app_id": "YOUR_APP_ID",
"feishu_app_secret": "YOUR_APP_SECRET",
"feishu_bot_name": "YOUR_BOT_NAME"
}
```
<Tabs>
<Tab title="Web コンソール">
From the Web console, go to **Channels** → **Add channel** → **Feishu**, switch to the **Manual input** tab, enter the App ID and App Secret, and connect.
</Tab>
<Tab title="設定ファイル">
Add the following to `config.json` and start:
| Parameter | Description |
| --- | --- |
| `feishu_app_id` | App ID of the Feishu Bot |
| `feishu_app_secret` | App Secret of the Feishu Bot |
| `feishu_bot_name` | Bot name (set when creating the app); required for use in group chats |
```json
{
"channel_type": "feishu",
"feishu_app_id": "YOUR_APP_ID",
"feishu_app_secret": "YOUR_APP_SECRET",
"feishu_stream_reply": true
}
```
After configuring, start the project.
| Parameter | Description | Default |
| --- | --- | --- |
| `feishu_app_id` | App ID of the Feishu app | - |
| `feishu_app_secret` | App Secret of the Feishu app | - |
| `feishu_stream_reply` | Enable streaming typewriter replies | `true` |
</Tab>
</Tabs>
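## 3. Configure event subscriptions
**Step 3: Publish the app**
1. Once the project is running normally, go to the Feishu Developer Platform, click **Events and callbacks**, select **Long connection** mode, and click Save:
1. After starting Cow, go to **Events and callbacks** on the Feishu Developer Platform, select **Long connection** mode, and save: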
<img src="https://cdn.link-ai.tech/doc/202601311731183.png" width="600"/>
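2. Click **Add event** below, search for "Receive message", select **Receive message v2.0**, and confirm.
2. Under **Add event**, search for "Receive message" and select **Receive message v2.0**.
3. Click **Version management and release**, create a new version, and apply for a **production release**. Check the approval message in the Feishu client and approve it:
3. In **Version management and release**, create a new version and apply for a **production release**, then approve it in the Feishu client: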
<img src="https://cdn.link-ai.tech/doc/202601311807356.png" width="600"/>
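Once finished, search for the Bot name in Feishu to start chatting.
## 2. Features
| Feature | Support |
| --- | --- |
| One-on-one chat | ✅ |
| Group chat (@Bot) | ✅ |
| Text messages | ✅ Send and receive |
| Image messages | ✅ Send and receive |
| Voice messages | ✅ Send and receive |
| Streaming replies | ✅ (based on Feishu cardkit streaming cards) |
<Note>
Streaming replies require the `cardkit:card:write` permission (granted automatically with one-click creation) and Feishu client 7.20 or later. Older clients show an upgrade prompt, and when the permission or version requirement is not met, replies automatically fall back to plain text.
</Note>
## 3. Usage
After connecting, search for the Bot name in Feishu to start chatting.
To use it in a group, add the Bot to the group and send it a message with an @mention.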
View File

@@ -44,17 +44,18 @@ description: ステータスの確認、設定管理、コンテキスト制御
**Change a configuration item:**
```text
/config model deepseek-chat
/config model deepseek-v4-flash
```
**Configurable items:**
| Item | Description | Example |
| --- | --- | --- |
| `model` | AI model name | `deepseek-chat` |
| `model` | AI model name | `deepseek-v4-flash` |
| `agent_max_context_tokens` | Maximum context tokens | `40000` |
| `agent_max_context_turns` | Maximum remembered context turns | `30` |
| `agent_max_steps` | Maximum decision steps per task | `15` |
| `enable_thinking` | Enable deep thinking mode | `true` / `false` |
<Note>
When you change `model`, the system automatically matches the corresponding model API. Settings are saved permanently to `config.json`.

View File

@@ -121,7 +121,8 @@ sudo docker logs -f chatgpt-on-wechat
```json
{
"channel_type": "web",
"model": "MiniMax-M2.5",
"model": "deepseek-v4-flash",
"deepseek_api_key": "",
"agent": true,
"agent_workspace": "~/cow",
"agent_max_context_tokens": 40000,
@@ -133,7 +134,7 @@ sudo docker logs -f chatgpt-on-wechat
| Parameter | Description | Default |
| --- | --- | --- |
| `channel_type` | Channel type | `web` |
| `model` | Model name | `MiniMax-M2.5` |
| `model` | Model name | `deepseek-v4-flash` |
| `agent` | Enable Agent mode | `true` |
| `agent_workspace` | Agent workspace path | `~/cow` |
| `agent_max_context_tokens` | Maximum context tokens | `40000` |

View File

@@ -9,7 +9,7 @@ CowAgent 2.0 は、シンプルなチャットボットから、自律的な思
The CowAgent architecture consists of the following core modules:
<img src="https://cdn.link-ai.tech/doc/68ef7b212c6f791e0e74314b912149f9-sz_5847990.png" alt="CowAgent Architecture" />
<img src="https://cdn.link-ai.tech/doc/cow-agent-arch-en.jpg.jpg" alt="CowAgent Architecture" />
| Module | Description |
| --- | --- |

View File

@@ -5,6 +5,8 @@ description: CowAgent の長期記憶システム — ファイル永続化、
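Long-term memory is stored in files in the workspace and persists across sessions. During a conversation the Agent loads past memories on demand through the search tool, and when the context is trimmed it automatically writes a summary of the conversation to long-term memory.
<img src="https://cdn.link-ai.tech/doc/memory-architecture-en.jpg" alt="Memory Architecture" />
## Memory types
### Core memory (MEMORY.md)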
@@ -30,20 +32,25 @@ Agent は以下のメカニズムにより、会話内容を長期記憶に自
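All memory writes run asynchronously in a background thread (LLM summarization + file write) and do not block normal conversation replies.
## First launch
## Related files
On first launch, the Agent proactively asks the user for key information and saves it to the workspace (default `~/cow`):
Memory-related files in the workspace (default `~/cow`):
| File | Description |
| --- | --- |
| `system.md` | Agent system prompt and behavior settings |
| `user.md` | User identity and preferences |
| `AGENT.md` | Agent personality and behavior settings |
| `USER.md` | User identity and preferences |
| `RULE.md` | Custom rules and constraints |
| `MEMORY.md` | Core memory (long-term) |
| `memory/YYYY-MM-DD.md` | Daily memory (created on demand) |
| `memory/dreams/YYYY-MM-DD.md` | Dream diary (generated automatically by Deep Dream) |
## Web console
On the memory management page of the Web console you can browse memory files and dream diaries, with tab switching: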
<Frame>
<img src="https://cdn.link-ai.tech/doc/20260203000455.png" width="800" />
<img src="https://cdn.link-ai.tech/doc/20260414171014.png" width="800" />
</Frame>
## Configuration

View File

@@ -12,6 +12,6 @@ description: Claudeモデルの設定
| Parameter | Description |
| --- | --- |
| `model` | Choose from `claude-sonnet-4-6`, `claude-opus-4-6`, `claude-sonnet-4-5`, `claude-sonnet-4-0`, `claude-3-5-sonnet-latest`, and others. See the [official model list](https://docs.anthropic.com/en/docs/about-claude/models/overview) |
| `model` | Choose from `claude-sonnet-4-6`, `claude-opus-4-7`, `claude-opus-4-6`, `claude-sonnet-4-5`, `claude-sonnet-4-0`, `claude-3-5-sonnet-latest`, and others. See the [official model list](https://docs.anthropic.com/en/docs/about-claude/models/overview) |
| `claude_api_key` | Create in the [Claude Console](https://console.anthropic.com/settings/keys) |
| `claude_api_base` | Optional. Defaults to `https://api.anthropic.com/v1`. Change it when using a third-party proxy |

View File

@@ -102,18 +102,18 @@ description: Coding Planモデルの設定
```json
{
"bot_type": "openai",
"bot_type": "moonshot",
"model": "kimi-for-coding",
"open_ai_api_base": "https://api.kimi.com/coding/v1",
"open_ai_api_key": "YOUR_API_KEY"
"moonshot_base_url": "https://api.kimi.com/coding/v1",
"moonshot_api_key": "YOUR_API_KEY"
}
```
| Parameter | Description |
| --- | --- |
| `model` | `kimi-for-coding` |
| `open_ai_api_base` | `https://api.kimi.com/coding/v1` |
| `open_ai_api_key` | Coding Plan dedicated key (cannot be shared with pay-as-you-go) |
| `model` | `kimi-for-coding` for the auto-updating model, or specify a model such as `kimi-k2.6` |
| `moonshot_base_url` | `https://api.kimi.com/coding/v1` |
| `moonshot_api_key` | Coding Plan dedicated key (cannot be shared with pay-as-you-go) |
Reference: [Keys & docs](https://www.kimi.com/code/docs/)

62
docs/ja/models/custom.mdx Normal file
View File

@@ -0,0 +1,62 @@
---
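title: Custom
description: Custom provider configuration for third-party APIs and local models
---
Applies to model services accessed through the OpenAI-compatible protocol:
- **Third-party API proxy**: call multiple models through a unified API base
- **Local models**: models deployed locally with Ollama, vLLM, LocalAI, and similar tools
- **Private deployment**: model services hosted inside your organization
<Note>
Difference from the `openai` provider: with the custom provider, switching models with `/config model` does not automatically switch the provider type, and the custom API address is always kept.
</Note>
## Configuration
### Third-party API proxy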
```json
{
"bot_type": "custom",
"model": "deepseek-v4-flash",
"custom_api_key": "YOUR_API_KEY",
"custom_api_base": "https://{your-proxy.com}/v1"
}
```
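| Parameter | Description |
| --- | --- |
| `bot_type` | Must be set to `custom` |
| `model` | Model name; any model name supported by the proxy service |
| `custom_api_key` | API key provided by the proxy service |
| `custom_api_base` | API address; must support the OpenAI-compatible protocol |
### Local models
Local models usually do not need an API key; only the API base is configured: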
```json
{
"bot_type": "custom",
"model": "qwen3.5:27b",
"custom_api_base": "http://localhost:11434/v1"
}
```
Common local deployment tools and their default addresses:
| Tool | Default API base |
| --- | --- |
| [Ollama](https://ollama.com) | `http://localhost:11434/v1` |
| [vLLM](https://docs.vllm.ai) | `http://localhost:8000/v1` |
| [LocalAI](https://localai.io) | `http://localhost:8080/v1` |
## Switching models
With the custom provider, switching models changes only `model`; `bot_type` and the API address stay the same:
```
/config model qwen3.5:27b
```

View File

@@ -7,22 +7,55 @@ description: DeepSeekモデルの設定
```json
{
"model": "deepseek-chat",
"model": "deepseek-v4-flash",
"deepseek_api_key": "YOUR_API_KEY"
}
```
| Parameter | Description |
| --- | --- |
| `model` | `deepseek-chat` (DeepSeek-V3.2, non-thinking mode), `deepseek-reasoner` (DeepSeek-R1, thinking mode) |
| `deepseek_api_key` | Create on the [DeepSeek Platform](https://platform.deepseek.com/api_keys) |
| `model` | Supports `deepseek-v4-flash` (default) and `deepseek-v4-pro` |
| `deepseek_api_key` | Create on the [DeepSeek Platform](https://platform.deepseek.com/api_keys) |
| `deepseek_api_base` | Optional; defaults to `https://api.deepseek.com/v1`. Can be changed to a third-party proxy |
## Choosing a model
| Model | Use case |
| --- | --- |
| `deepseek-v4-flash` | Recommended default; fast and low cost |
| `deepseek-v4-pro` | More capable on complex tasks |
## Thinking mode
The V4 series (`deepseek-v4-flash` / `deepseek-v4-pro`) supports an explicit "thinking mode". The model outputs its reasoning (`reasoning_content`) before the final answer, which improves answer quality.
### Switch
Controlled by the global `enable_thinking` setting:
```json
{
"enable_thinking": true
}
```
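- `true`: thinking mode is on for all channels. The Web console shows the reasoning process; IM channels (WeChat / WeCom / DingTalk / Feishu) do not show it but still benefit from the improved answer quality.
- `false`: thinking is off; responses are faster and time to first token is lower.
### Notes
- **Sampling parameters**: in thinking mode, `temperature`, `top_p`, `presence_penalty`, and `frequency_penalty` are ignored on the server side (no error is raised). CowAgent automatically skips sending them.
- **Multi-turn tool calls**: when the history contains tool calls, DeepSeek requires `reasoning_content` to be sent back on every assistant message. CowAgent handles this round trip automatically, so toggling the thinking switch mid-session does not cause errors.
<Tip>
Use `deepseek-v4-flash` for everyday work, switch to `deepseek-v4-pro` for difficult tasks, and enable `enable_thinking` when deep reasoning is needed.
</Tip>
Method 2: OpenAI-compatible access: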
```json
{
"model": "deepseek-chat",
"model": "deepseek-v4-flash",
"bot_type": "openai",
"open_ai_api_key": "YOUR_API_KEY",
"open_ai_api_base": "https://api.deepseek.com/v1"

View File

@@ -5,14 +5,14 @@ description: 智谱AI GLMモデルの設定
```json
{
"model": "glm-5-turbo",
"model": "glm-5.1",
"zhipu_ai_api_key": "YOUR_API_KEY"
}
```
| Parameter | Description |
| --- | --- |
| `model` | Choose from `glm-5-turbo`, `glm-5`, `glm-4.7`, `glm-4-plus`, `glm-4-flash`, `glm-4-air`, and others. See the [model codes](https://bigmodel.cn/dev/api/normal-model/glm-4) |
| `model` | Choose from `glm-5.1`, `glm-5-turbo`, `glm-5`, `glm-4.7`, `glm-4-plus`, `glm-4-flash`, `glm-4-air`, and others. See the [model codes](https://bigmodel.cn/dev/api/normal-model/glm-4) |
| `zhipu_ai_api_key` | Create in the [Zhipu AI Console](https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys) |
OpenAI-compatible configuration is also supported:
@@ -20,7 +20,7 @@ OpenAI互換の設定もサポートしています:
```json
{
"bot_type": "openai",
"model": "glm-5-turbo",
"model": "glm-5.1",
"open_ai_api_base": "https://open.bigmodel.cn/api/paas/v4",
"open_ai_api_key": "YOUR_API_KEY"
}

View File

@@ -6,7 +6,7 @@ description: CowAgentがサポートするモデルとおすすめの選択肢
CowAgent supports major LLMs from China and abroad. Model interfaces are implemented in the project's `models/` directory.
<Note>
In Agent mode, we recommend the following models for their balance of quality and cost: MiniMax-M2.7, glm-5-turbo, kimi-k2.5, qwen3.6-plus, claude-sonnet-4-6, gemini-3.1-pro-preview
In Agent mode, we recommend the following models for their balance of quality and cost: deepseek-v4-flash, MiniMax-M2.7, claude-sonnet-4-6, gemini-3.1-pro-preview, glm-5.1, qwen3.6-plus, kimi-k2.6, ernie-5.0
</Note>
## Configuration
@@ -18,21 +18,15 @@ CowAgentは国内外の主要なLLMをサポートしています。モデルイ
## Supported models
<CardGroup cols={2}>
<Card title="DeepSeek" href="/ja/models/deepseek">
deepseek-v4-flash, deepseek-v4-pro, and more
</Card>
<Card title="Baidu Qianfan / ERNIE" href="/ja/models/qianfan">
ernie-5.0, ernie-4.5-turbo-128k, and more
</Card>
<Card title="MiniMax" href="/ja/models/minimax">
MiniMax-M2.7 and other models in the series
</Card>
<Card title="GLM (智谱AI)" href="/ja/models/glm">
glm-5-turbo, glm-5, and other models in the series
</Card>
<Card title="Qwen (通义千问)" href="/ja/models/qwen">
qwen3.6-plus, qwen3-max, and more
</Card>
<Card title="Kimi" href="/ja/models/kimi">
kimi-k2.5, kimi-k2, and more
</Card>
<Card title="Doubao (ByteDance)" href="/ja/models/doubao">
doubao-seed series models
</Card>
<Card title="Claude" href="/ja/models/claude">
claude-sonnet-4-6 and more
</Card>
@@ -42,8 +36,17 @@ CowAgentは国内外の主要なLLMをサポートしています。モデルイ
<Card title="OpenAI" href="/ja/models/openai">
gpt-5.4, gpt-4.1, o-series, and more
</Card>
<Card title="DeepSeek" href="/ja/models/deepseek">
deepseek-chat, deepseek-reasoner
<Card title="GLM (智谱AI)" href="/ja/models/glm">
glm-5.1, glm-5-turbo, glm-5, and other models in the series
</Card>
<Card title="Qwen (通义千问)" href="/ja/models/qwen">
qwen3.6-plus, qwen3-max, and more
</Card>
<Card title="Doubao (ByteDance)" href="/ja/models/doubao">
doubao-seed series models
</Card>
<Card title="Kimi" href="/ja/models/kimi">
kimi-k2.6, kimi-k2.5, kimi-k2, and more
</Card>
<Card title="LinkAI" href="/ja/models/linkai">
Unified multi-model interface + knowledge base

View File

@@ -5,14 +5,14 @@ description: Kimi (Moonshot) モデルの設定
```json
{
"model": "kimi-k2.5",
"model": "kimi-k2.6",
"moonshot_api_key": "YOUR_API_KEY"
}
```
| Parameter | Description |
| --- | --- |
| `model` | Choose from `kimi-k2.5`, `kimi-k2`, `moonshot-v1-8k`, `moonshot-v1-32k`, `moonshot-v1-128k` |
| `model` | Choose from `kimi-k2.6`, `kimi-k2.5`, `kimi-k2`, `moonshot-v1-8k`, `moonshot-v1-32k`, `moonshot-v1-128k` |
| `moonshot_api_key` | Create in the [Moonshot Console](https://platform.moonshot.cn/console/api-keys) |
OpenAI-compatible configuration is also supported:
@@ -20,7 +20,7 @@ OpenAI互換の設定もサポートしています:
```json
{
"bot_type": "openai",
"model": "kimi-k2.5",
"model": "kimi-k2.6",
"open_ai_api_base": "https://api.moonshot.cn/v1",
"open_ai_api_key": "YOUR_API_KEY"
}

View File

@@ -3,7 +3,7 @@ title: LinkAI
description: Unified access to multiple models through the LinkAI platform
---
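On the [LinkAI](https://link-ai.tech) platform you can switch flexibly between models such as OpenAI, Claude, Gemini, DeepSeek, Qwen, and Kimi, and it supports knowledge bases, workflows, plugins, and other Agent features.
On the [LinkAI](https://link-ai.tech) platform you can switch flexibly between models such as OpenAI, Claude, Gemini, DeepSeek, MiniMax, Qwen, and Kimi, and it supports knowledge bases, workflows, plugins, and other Agent features.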
```json
{

View File

@@ -0,0 +1,63 @@
---
title: Baidu Qianfan / ERNIE
description: Baidu Qianfan ERNIE model configuration
---
Method 1: Official access (recommended):
```json
{
"model": "ernie-5.0",
"qianfan_api_key": "",
"qianfan_api_base": "https://qianfan.baidubce.com/v2"
}
```
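| Parameter | Description |
| --- | --- |
| `model` | The recommended default is `ernie-5.0`. `ernie-x1.1`, `ernie-4.5-turbo-128k`, and `ernie-4.5-turbo-32k` are also available |
| `qianfan_api_key` | Qianfan API Key. It usually starts with `bce-v3/` |
| `qianfan_api_base` | Optional. Defaults to `https://qianfan.baidubce.com/v2` |
## Choosing a model
| Model | Use case |
| --- | --- |
| `ernie-5.0` | Recommended default. The latest ERNIE flagship model with the strongest overall performance |
| `ernie-x1.1` | Deep reasoning model with fewer hallucinations and stronger instruction following and tool calling |
| `ernie-4.5-turbo-128k` | For long contexts and general chat |
| `ernie-4.5-turbo-32k` | For general chat with a good balance of context length and cost |
## Vision tool
Once `qianfan_api_key` is set, the Vision tool in Agent mode detects Qianfan automatically:
- When the main model is multimodal (such as `ernie-5.0`, `ernie-x1.1`, or `ernie-4.5-turbo-vl`), the main model handles images directly with no extra configuration.
- When the main model is text-only (such as `ernie-4.5-turbo-128k`), the Vision tool automatically falls back to `ernie-4.5-turbo-vl`.
To force a specific Vision model, set it explicitly in `config.json`: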
```json
{
"tool": {
"vision": {
"model": "ernie-4.5-turbo-vl"
}
}
}
```
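The selection rule described in this section can be sketched as follows. This is a minimal illustration, not CowAgent's actual routing code: the function name `pick_qianfan_vision_model` and the `MULTIMODAL_MODELS` set are hypothetical, and the set lists only the models named above.

```python
from typing import Optional

# Assumption: only the multimodal models named in this section.
MULTIMODAL_MODELS = {"ernie-5.0", "ernie-x1.1", "ernie-4.5-turbo-vl"}
FALLBACK_VISION_MODEL = "ernie-4.5-turbo-vl"


def pick_qianfan_vision_model(main_model: str, configured: Optional[str] = None) -> str:
    """Choose the model the Vision tool would use for a Qianfan setup."""
    if configured:
        # An explicit `tool.vision.model` setting takes precedence.
        return configured
    if main_model in MULTIMODAL_MODELS:
        # Multimodal main models handle images directly.
        return main_model
    # Text-only main models fall back to the dedicated vision model.
    return FALLBACK_VISION_MODEL
```

For example, `pick_qianfan_vision_model("ernie-4.5-turbo-128k")` falls back to `ernie-4.5-turbo-vl`, while `ernie-5.0` is used as is.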
Method 2: OpenAI-compatible access:
```json
{
"model": "ernie-5.0",
"bot_type": "openai",
"open_ai_api_key": "",
"open_ai_api_base": "https://qianfan.baidubce.com/v2"
}
```
<Tip>
For new configurations, `qianfan_api_key` is recommended. Existing `wenxin`, `wenxin-4`, `baidu_wenxin_api_key`, and `baidu_wenxin_secret_key` configurations continue to work.
</Tip>

View File

@@ -5,6 +5,7 @@ description: CowAgent バージョン履歴
| Version | Date | Description |
| --- | --- | --- |
| [2.0.7](/ja/releases/v2.0.7) | 2026.04.22 | Image generation skill (automatic routing across 6 providers), new models (Kimi K2.6, Claude Opus 4.7, GLM 5.1), knowledge base and Web console improvements |
| [2.0.6](/ja/releases/v2.0.6) | 2026.04.14 | Knowledge base, Deep Dream memory distillation, smart context compression, Web console upgrade |
| [2.0.5](/ja/releases/v2.0.5) | 2026.04.01 | Cow CLI, open-sourced Skill Hub, browser tool, WeCom QR-scan creation, other improvements |
| [2.0.4](/ja/releases/v2.0.4) | 2026.03.22 | Added personal WeChat channel, new model support, Japanese documentation, script refactoring and multiple fixes |

View File

@@ -0,0 +1,65 @@
---
title: v2.0.7
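description: CowAgent 2.0.7 - Image generation skill (automatic routing across 6 providers), new model support, knowledge base enhancements, Web console improvements, and bug fixes
---
## 🎨 Image generation skill
Adds a new built-in skill, `image-generation`. It supports text-to-image generation, image editing, and multi-image fusion, and covers 6 major providers:
- **Automatic routing across 6 providers**: OpenAI (GPT-Image-2) → Gemini (Nano Banana) → Seedream (Volcengine Ark) → Qwen (DashScope) → MiniMax → LinkAI. A configured provider is selected automatically in a fixed priority order, with automatic fallback to the next provider on failure
- **No model selection needed**: just configure an API Key; there is no need to specify a model manually. You can also name a specific model in conversation ("draw a cat with seedream")
- **Flexible control**: supports the `quality`, `size` (resolution, 512/1K to 4K), and `aspect_ratio` parameters; each provider maps them to valid values automatically
- **Image editing**: pass in existing images for editing, style transfer, or multi-image fusion (Seedream supports up to 14 reference images)
- **Skill-level configuration**: pin the default model with `skill.image-generation.model` in `config.json`
- **Image lightbox**: every image in the Web console can be clicked for an enlarged preview
Documentation: [Image generation skill](https://docs.cowagent.ai/ja/skills/image-generation)
## 🤖 New model support
- **Kimi K2.6**: added support for the `kimi-k2.6` model
- **Claude Opus 4.7**: added support for the `claude-opus-4-7` model
- **GLM 5.1**: added support for the `glm-5.1` model
- **Kimi Coding Plan**: supports Kimi Coding Plan mode
- **Custom model provider**: a new custom model provider configuration makes it easier to integrate additional vendors
## 💬 Web console improvements
- **Smart auto-scroll**: improved chat scrolling; the view is no longer forced to the bottom while the user is reading earlier messages
- **Reasoning content limit**: deep thinking content is capped at 4KB to prevent frontend lag
- **Mobile optimization**: the session sidebar is hidden by default on mobile and can be closed by tapping the overlay
- **Session title fix**: fixed the fallback logic for automatic title generation and the Bridge reset on configuration changes
- **Image preview deduplication**: fixed duplicate rendering of images within the same message
## 📚 Knowledge base enhancements
- **Nested directory support**: the knowledge base listing supports multi-level nested directories
- **Root-level file display**: the knowledge tree shows root-directory files such as `index.md` and `log.md`
- **Empty-state statistics fix**: root-level files no longer interfere with empty-state detection
## 🌙 Dream memory improvements
- **Structured organization**: dream memory files are archived automatically by date, giving a tidier directory structure
- **Schedule jitter**: added random jitter to the daily dream trigger to avoid concurrent-execution conflicts in clustered environments
## 🛠 Skill system improvements
- **Skill manager refresh**: the skill manager refreshes automatically after a `/skill` command to keep state in sync
- **More install sources**: skill installation supports multiple source formats (URL, zip, local file, and so on) and ensures the target directory exists automatically
## 🐛 Other fixes
- **Gemini fix**: fixed Gemini tool calls not returning results
- **Agent retry**: `tool_calls` are no longer dropped when retrying on an empty response
- **Docker environment variable sync**: fixed environment variables not syncing after configuration updates in Docker
- **Python 3.7 compatibility**: deferred the `Literal` import for Python 3.7 compatibility
- **Model switch notification**: fixed the bot_type change notification not appearing after switching models. Thanks @6vision
- **Config command**: `/config` can now set `enable_thinking`
- **Thinking display**: deep thinking display is now off by default
## 📦 Upgrade
Upgrade with `cow update` or `./run.sh update`, or pull the code manually and restart. See the [upgrade guide](https://docs.cowagent.ai/ja/guide/upgrade) for details.
**Release date**: 2026.04.22 | [Full Changelog](https://github.com/zhayujie/CowAgent/compare/2.0.6...master)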

View File

@@ -0,0 +1,68 @@
---
title: v2.0.8
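description: CowAgent 2.0.8 - Full Feishu channel upgrade (voice, streaming typewriter, one-click QR app creation), DeepSeek V4 / Baidu Qianfan ERNIE 5.0 support, scheduler memory enhancements, and multiple fixes
---
## 🪶 Full Feishu channel upgrade
### 1. One-click app creation by QR scan
You no longer need to create an app manually on the Feishu Open Platform and configure permissions and event subscriptions. When `feishu_app_id` is not set, the Web console and command-line startup automatically show a QR scan entry. After you scan and authorize in Feishu, the bot is created automatically, the configuration is written back, and you can start using it right away.
Documentation: [Feishu channel](https://docs.cowagent.ai/ja/channels/feishu)
### 2. Sending and receiving voice messages
Feishu voice messages sent by users can now be received and converted to text automatically. Replies can also be delivered as voice through TTS. Recognition accuracy for short Chinese voice messages has also improved.
### 3. Streaming typewriter replies
Integrates Feishu CardKit streaming cards, **enabled by default**, for an experience on par with the Web console:
- In multi-turn Agent scenarios, intermediate messages and the final answer are shown in separate cards
- Optimized for high-frequency output models such as DeepSeek, matching Web console speed
- Falls back automatically to plain text replies when unsupported, with no manual configuration
- Requires Feishu client ≥ 7.20
The base functionality for Feishu voice messages and the streaming typewriter comes from community contribution #2791. Thanks [@ooaaooaa123](https://github.com/ooaaooaa123)
## 🤖 New model support
- **DeepSeek V4 series**: added `deepseek-v4-pro` / `deepseek-v4-flash` and switched the default model to `deepseek-v4-flash`
- **Unified thinking switch**: the toggle for thinking-capable models such as DeepSeek V4 and Qwen3 is unified under `enable_thinking`
- **First-class Baidu Qianfan / ERNIE support**: adds a new `qianfan` provider supporting `ernie-5.0` (recommended default), `ernie-x1.1`, `ernie-4.5-turbo-128k`, and `ernie-4.5-turbo-32k`. Separate `qianfan_api_key` / `qianfan_api_base` settings avoid polluting the OpenAI configuration, and the legacy `wenxin` / `wenxin-4` paths remain fully compatible. #2790 Thanks [@jimmyzhuu](https://github.com/jimmyzhuu)
Documentation: [Baidu Qianfan / ERNIE](https://docs.cowagent.ai/ja/models/qianfan)
## 🌐 Translation providers
- **Added Youdao Translation**: adds a Youdao translation provider to the `translate/` module. It supports the v3 SHA-256 signature scheme and automatically maps ISO 639-1 language codes such as `zh` / `zh-TW`. #2797 Thanks [@Zmjjeff7](https://github.com/Zmjjeff7)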
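The signature and language-code mapping mentioned above can be sketched as follows. This is an illustration based on Youdao's publicly documented v3 scheme (SHA-256 over app key, truncated input, salt, timestamp, and app secret), not the code from the pull request; the names `truncate_input`, `sign_v3`, and `to_youdao_lang` are hypothetical.

```python
import hashlib

# Mapping described in the change: zh -> zh-CHS, zh-TW -> zh-CHT, others pass through.
LANG_MAP = {"zh": "zh-CHS", "zh-TW": "zh-CHT"}


def to_youdao_lang(code: str) -> str:
    """Map an ISO 639-1 style code to the Youdao-specific code."""
    return LANG_MAP.get(code, code)


def truncate_input(q: str) -> str:
    """Apply the truncate-input rule for queries longer than 20 characters."""
    if len(q) <= 20:
        return q
    # First 10 characters + total length + last 10 characters.
    return q[:10] + str(len(q)) + q[-10:]


def sign_v3(app_key: str, app_secret: str, q: str, salt: str, curtime: str) -> str:
    """Compute the v3 signature as a lowercase hex SHA-256 digest."""
    raw = app_key + truncate_input(q) + salt + curtime + app_secret
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

In a real request, `salt` would typically be a random string and `curtime` the current Unix timestamp in seconds, both sent alongside the signature.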
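## 🛠 OpenAI client refactoring
- **SDK dependency removed**: the OpenAI Bot now uses a native HTTP client, which makes startup lighter and reduces dependency conflicts
- **Web console hint**: the API Base input in model settings now shows a placeholder hint for the version path
## ⏰ Scheduler memory enhancements
- **Follow-up questions on task results**: results of scheduled tasks are injected automatically into the recipient's session history, so you can follow up on the next turn without re-explaining the context. Thanks [@huangrichao2020](https://github.com/huangrichao2020)
- **No long-term memory pollution**: injected scheduler conversations are excluded from the daily memory flush, which keeps high-frequency tasks from filling the memory store
- **No progressive slowdown**: the scheduler's own context length is limited automatically, so long-running recurring tasks do not accumulate context and delay responses
## 🔧 Tools and safety
- **Vision model selection**: the `tool.vision.model` setting now actually takes effect, with automatic fallback when it is not set #2792
- **Bash safety check**: the confirmation prompt for destructive deletions is limited to paths outside the workspace. Normal operations inside the workspace are not interrupted
## 🐛 Other fixes
- Fixed Deep Dream running more than once in multi-instance environments
- Fixed `reasoning_content` missing from some history turns in DeepSeek multi-turn conversations
## 📦 Upgrade
For source deployments, upgrade in one step with `cow update` or `./run.sh update`, or pull the latest code manually and restart. See the [upgrade guide](https://docs.cowagent.ai/ja/guide/upgrade) for details.
> ⚠️ One-click Feishu app creation requires `lark-oapi>=1.5.5`. `cow update` fetches it automatically. For manual deployments, make sure the dependencies are updated.
**Release date**: 2026.05.05 | [Full Changelog](https://github.com/zhayujie/CowAgent/compare/2.0.7...2.0.8)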

View File

@@ -0,0 +1,158 @@
---
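title: image-generation - Image generation
description: Text-to-image generation / image editing / multi-image fusion, with automatic routing and fallback across multiple providers
---
A general-purpose image generation and editing skill. It supports 6 providers: OpenAI, Gemini, Seedream (Volcengine Ark), Qwen (DashScope), MiniMax, and LinkAI. You do not need to pick a model manually; a configured provider is selected automatically in a fixed priority order.
## Model selection
`image-generation` uses a "fixed priority + automatic fallback" strategy. It works as soon as an API Key is configured:
1. **Priority order**: `OpenAI → Gemini → Seedream → Qwen → MiniMax → LinkAI`
2. **Unconfigured providers are skipped**: only providers with an API Key set take part
3. **Automatic fallback on failure**: on a 401, a model that is not activated, a network error, and so on, the next provider is tried
4. **A named model moves to the front**: passing a specific model name promotes its provider to the head of the list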
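The four rules above can be sketched as a small routing function. This is a hedged illustration, not the skill's actual script: `provider_order`, `generate`, the `MODEL_PREFIX` table, and the `backends` callables are hypothetical, while the environment variable names follow the API Key table in this document.

```python
from typing import Callable, Dict, List, Mapping, Optional

PRIORITY = ["openai", "gemini", "seedream", "qwen", "minimax", "linkai"]
KEY_ENV = {
    "openai": "OPENAI_API_KEY", "gemini": "GEMINI_API_KEY", "seedream": "ARK_API_KEY",
    "qwen": "DASHSCOPE_API_KEY", "minimax": "MINIMAX_API_KEY", "linkai": "LINKAI_API_KEY",
}
# Assumption: a model name is matched to its provider by prefix.
MODEL_PREFIX = {"gpt-image": "openai", "nano-banana": "gemini", "seedream": "seedream",
                "qwen-image": "qwen", "image-01": "minimax"}


def provider_order(env: Mapping[str, str], model: Optional[str] = None) -> List[str]:
    """Configured providers in priority order, with the named model's provider first."""
    order = [p for p in PRIORITY if env.get(KEY_ENV[p])]  # skip unconfigured providers
    if model:
        for prefix, provider in MODEL_PREFIX.items():
            if model.startswith(prefix) and provider in order:
                order.remove(provider)
                order.insert(0, provider)  # promote to the front
                break
    return order


def generate(prompt: str, env: Mapping[str, str],
             backends: Dict[str, Callable[[str], dict]],
             model: Optional[str] = None) -> dict:
    """Try each provider in turn and fall through to the next one on failure."""
    errors = []
    for provider in provider_order(env, model):
        try:
            return backends[provider](prompt)
        except Exception as exc:  # e.g. 401, model not activated, network error
            errors.append(provider + ": " + str(exc))
    return {"error": "; ".join(errors) or "no provider configured"}
```

With only `GEMINI_API_KEY` and `ARK_API_KEY` set, the order is Gemini then Seedream; naming `seedream-5.0-lite` moves Seedream to the front.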
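### Supported models
| Provider | Models / aliases | Highlights |
| --- | --- | --- |
| OpenAI | `gpt-image-2`, `gpt-image-1` | General text-to-image, high quality, supports the `quality` parameter |
| Gemini Nano Banana | `nano-banana-2`, `nano-banana-pro`, `nano-banana` | Image versions of `gemini-3.1-flash`, `gemini-3-pro`, and `gemini-2.5-flash` |
| Seedream (Volcengine Ark) | `seedream-5.0-lite`, `seedream-4.5` | Native 2K to 4K resolution, fuses up to 14 reference images |
| Qwen (DashScope) | `qwen-image-2.0`, `qwen-image-2.0-pro` | Strong at rendering Chinese text and text-image layouts |
| MiniMax | `image-01` | Simple, fast image generation |
| LinkAI | Any model | General proxy, used as a fallback |
<Note>
By default the Agent does not pick a model and uses automatic routing. To use a specific model, name it directly in conversation ("draw a cat with seedream", "make a poster with gpt-image-2"). You can also pin the default model in "Custom configuration" below.
</Note>
## Custom configuration
### Configuring API Keys
A Key for **at least one** provider is required. Configuring several enables automatic fallback. There are three ways to configure them:
#### Method 1: Reuse existing model Keys automatically
If you have already set a chat model Key (`openai_api_key`, `gemini_api_key`, and so on) in the Web console or `config.json`, these Keys are **synced automatically** to the corresponding environment variables at startup. In other words, if the chat model works, image generation works with the same Key and no extra configuration.
#### Method 2: Configure in config.json
Write the Key fields directly in `config.json`: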
```json
{
"openai_api_key": "sk-xxx",
"openai_api_base": "https://api.openai.com/v1",
"gemini_api_key": "AIza-xxx",
"ark_api_key": "xxx",
"dashscope_api_key": "sk-xxx",
"minimax_api_key": "xxx",
"linkai_api_key": "xxx"
}
```
A restart is required after changes. Each Key has a matching `*_api_base` field for specifying a custom endpoint.
#### Method 3: Configure directly in conversation
Send the API Key in chat and the Agent saves it to `~/cow/.env` with the `env_config` tool. It takes effect immediately, **no restart needed**. For example:
```
Set OPENAI_API_KEY to sk-xxx
```
Or:
```
Set ARK_API_KEY to xxx
```
### API Key reference
| Environment variable | config.json field | Provider | Default Base URL |
| --- | --- | --- | --- |
| `OPENAI_API_KEY` | `openai_api_key` | OpenAI | `https://api.openai.com/v1` |
| `GEMINI_API_KEY` | `gemini_api_key` | Gemini | `https://generativelanguage.googleapis.com` |
| `ARK_API_KEY` | `ark_api_key` | Volcengine Ark (Seedream) | `https://ark.cn-beijing.volces.com/api/v3` |
| `DASHSCOPE_API_KEY` | `dashscope_api_key` | Alibaba DashScope (Qwen) | `https://dashscope.aliyuncs.com` |
| `MINIMAX_API_KEY` | `minimax_api_key` | MiniMax | `https://api.minimaxi.com` |
| `LINKAI_API_KEY` | `linkai_api_key` | LinkAI | `https://api.link-ai.tech` |
### Pinning the default model
To pin all image generation to one provider's model, add the following to `config.json`:
```json
"skill": {
"image-generation": {
"model": "seedream-5.0-lite"
}
}
```
起動時にこの設定は環境変数 `SKILL_IMAGE_GENERATION_MODEL` に自動変換され、スクリプトはこのモデルのプロバイダーを常に使用します。
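As a rough illustration of that conversion, the sketch below flattens the `skill` section of a config dict into environment-style names. Only the mapping of `skill.image-generation.model` to `SKILL_IMAGE_GENERATION_MODEL` is stated in this documentation; the function name and the generalization to other skills and fields are assumptions.

```python
def skill_env_vars(config):
    """Flatten config["skill"] into SKILL_<NAME>_<FIELD> style variables.

    Assumption: every field follows the same naming pattern as the
    documented SKILL_IMAGE_GENERATION_MODEL example.
    """
    env = {}
    for skill_name, fields in config.get("skill", {}).items():
        for field, value in fields.items():
            # Hyphens become underscores and the whole name is upper-cased.
            key = f"SKILL_{skill_name}_{field}".replace("-", "_").upper()
            env[key] = str(value)
    return env
```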
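## Enabling and disabling
`image-generation` is a built-in skill whose **status adjusts automatically based on API keys**:
- **Key configured**: the skill is active, and the Agent calls it when it receives an image generation request
- **No key configured**: the skill still appears in context, marked as "needs configuration", so the Agent guides the user to set a key instead of failing the call
To control it manually:
```text
/skill disable image-generation   # disable (not called even if a key exists)
/skill enable image-generation    # re-enable
```
From a terminal, use `cow skill disable image-generation` / `cow skill enable image-generation`.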
## Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `prompt` | string | Yes | (none) | Description of the image |
| `image_url` | string / list | No | null | Input image for editing, as a local path or URL. Pass several to fuse multiple images |
| `quality` | string | No | auto | `low` / `medium` / `high`; supported by some providers only |
| `size` | string | No | auto | `512` / `1K` / `2K` / `3K` / `4K`, or a pixel value (for example `1024x1024`) |
| `aspect_ratio` | string | No | null | `1:1` / `3:2` / `2:3` / `16:9` / `9:16` / `21:9`; Gemini also supports `1:4` / `4:1` / `1:8` / `8:1` |
<Warning>
**品質が高いほど・解像度が大きいほど、コストが高く、時間がかかります。**
- 日常の会話やプレビューにはデフォルト(`auto`)、または `quality=low` + `size=1K` を使用 — 約 20 秒で生成
- ポスターやユーザーが高解像度を明示的に要求した場合は `quality=high` + `size=2K/4K` — モデルによって 1〜5 分かかる場合があります
</Warning>
## 出力
成功時:
```json
{
"model": "doubao-seedream-5-0-260128",
"images": [
{"url": "/path/to/output.png"}
]
}
```
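On failure: `{ "error": "..." }`. **Do not retry immediately** after an error. It is almost always a configuration problem (wrong key, mismatched API base URL, model not enabled, and so on). Fix the configuration first, then try again.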
## よくある使い方
- **テキスト→画像**:説明からイラスト、ポスター、アイコン、アバター、絵コンテなどを生成
- **画像→画像**:既存の画像のスタイル変更、要素の入れ替え、装飾やテキストの追加
- **複数画像の融合**:複数の参照画像を 1 枚に合成(着せ替え、キャラクター集合写真など)
<Note>
- bash タイムアウトは 600 秒に設定してください。各プロバイダーの HTTP タイムアウトは 300 秒ですが、スクリプトが複数のプロバイダーを順番に試行する場合があります
- 入力画像は自動的に 4 MB 以下・最長辺 4096 px 以下に圧縮されます
- Gemini / Seedream / Qwen / MiniMax は `quality` パラメータに対応していません(渡しても無視されます)
- Seedream のデフォルトは 2K。`seedream-5.0-lite` は 3K まで、`seedream-4.5` は 4K まで対応
</Note>


@@ -0,0 +1,112 @@
---
title: knowledge-wiki - ナレッジベース
description: ローカルの構造化ナレッジベースを管理し、自動でアーカイブ・分類・相互参照を行う
---
会話で生まれた資料、アイデア、メモをローカルの構造化ナレッジベースに整理し、インデックスとページ間の相互参照を自動で維持します。
`knowledge-wiki` はワークスペース内の `knowledge/` ディレクトリを管理します。Agent の「外部メモリ」のようなものです。`always: true` が設定されているため**常にコンテキストにロード**され、外部依存は不要です。
## いつ起動するか
- 記事、ドキュメント、URL を共有して、後で参照できるように残したいとき
- 会話の中で長期保存に値する結論が出たとき
- 以前蓄積したナレッジを調べたいとき
## ディレクトリ構成
```
knowledge/
├── index.md          # Global index (must be maintained)
├── log.md            # Operation log (append-only)
└── <category>/       # Category subdirectory (grouped by content)
    └── <slug>.md     # Knowledge page (lowercase, hyphen-separated file name)
```
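## Three core operations
### 1. Ingest
When you share material, the Agent:
1. Reads and understands the source, extracting the key information
2. Decides which category it belongs to: it checks `index.md` first and creates a new category if none fits
3. Generates a knowledge page at `knowledge/<category>/<slug>.md`
4. Updates the index `index.md` and the log `log.md`
### 2. Synthesize
When a new conclusion or insight emerges in conversation:
1. Create a new knowledge page under a suitable category
2. Add cross-links to related existing pages
3. Update the index and the log
### 3. Query
When asked about previously accumulated knowledge:
1. Look for likely relevant pages in `index.md`
2. Open the specific page with the `read` tool
3. Supplement with `memory_search` if needed
4. Include links to the knowledge pages in the answer so the user can check the source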
## ページの書き方
```markdown
# ページタイトル
> Source: <ソース URL または簡単な説明>
本文。ページ間は相対パスでリンク:
[関連ページ](../category/related-page.md)
## 要点
- ...
## 関連ページ
- [ページ A](../category/page-a.md) — 関連する理由
```
<Note>
- `> Source:` はこのナレッジの出典を記録します。明確な出典がある場合は必ず記載してください
- 相互参照は重要です:ページを作成・更新したら、関連ページにも逆リンクを追加してください
- **既に存在するページにのみリンクしてください**。ある概念が独立ページに値する場合は、先にページを作成してからリンクを追加してください
</Note>
## インデックス形式
`knowledge/index.md` はフラットリスト形式で、カテゴリごとにグループ化し、各ナレッジページを 1 行で表します:
```markdown
# Knowledge Index
## カテゴリ A
- [ページタイトル](category-a/page-slug.md) — 一行の要約
## カテゴリ B
- [ページタイトル](category-b/page-slug.md) — 一行の要約
```
テーブルや絵文字は使いません。カテゴリ名や構成は柔軟に調整できます。
## ログ形式
`knowledge/log.md` は追記のみ、最新のエントリが一番下:
```markdown
## [YYYY-MM-DD] ingest | ページタイトル
## [YYYY-MM-DD] synthesize | ページタイトル
```
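The append-only log format above could be produced by a helper like the following. This is a sketch under assumptions: the function name and the restriction to the two actions shown (`ingest`, `synthesize`) are illustrative, not taken from the skill itself.

```python
from datetime import date
from pathlib import Path


def append_log(knowledge_dir, action, title, today=None):
    """Append one entry to log.md; existing entries are never rewritten."""
    # Assumption: only the two actions shown in the documented format are logged.
    if action not in ("ingest", "synthesize"):
        raise ValueError(f"unknown action: {action}")
    day = (today or date.today()).isoformat()  # YYYY-MM-DD
    entry = f"## [{day}] {action} | {title}\n"
    log_path = Path(knowledge_dir) / "log.md"
    # Mode "a" keeps older entries on top and the newest entry at the bottom.
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(entry)
    return entry
```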
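## Writing guidelines
- **File names** use lowercase with hyphens (for example `machine-learning.md`)
- **One topic per page**: connect related content with links
- **Update existing pages instead of creating duplicates**
- **Update the index after every change** (`knowledge/index.md`)
- **Extract the key points; do not copy the full text**
- **Use the full path when referring to a knowledge page in conversation** (for example `[Title](knowledge/<category>/<slug>.md)`). Use relative paths only for links between pages
- **Include links when answering from knowledge pages**, so the user can check the details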


@@ -0,0 +1,180 @@
---
title: skill-creator - スキル作成
description: スキルの作成・インストール・更新、SKILL.md の書き方とディレクトリ構成の標準化
---
`skill-creator` は「メタスキル」です。Agent が他のスキルを作成・インストール・更新する際に呼び出され、すべてのスキルの `SKILL.md` の書き方とディレクトリ構成を統一します。
## いつ起動するか
- ユーザーが URL やリモートリポジトリからスキルをインストールしたいとき
- ユーザーが新しいスキルをゼロから作成したいとき
- 既存のスキルをアップグレード・リファクタリングする必要があるとき
## スキルとは
スキルは「再利用可能な説明書」にオプションのスクリプトやリソースを加えたものです。特定のドメインの専門知識を Agent に注入し、該当タスクをスペシャリストのように処理できるようにします。
スキルには通常、以下が含まれます:
1. **専門ワークフロー** — ある種のタスクの完全な手順
2. **ツールの使い方** — 特定の API やファイル形式の処理方法
3. **ドメイン知識** — チームの規約、ビジネスルール、データ構造など
4. **付属リソース** — スクリプト、参考ドキュメント、テンプレートなど
<Note>
**基本原則:省けるものは省く。** Agent が自力で推測できない内容だけを書きましょう。1 行追加するたびに「このトークンコストに見合うか?」と自問してください。
</Note>
## ディレクトリ構成
```
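skill-name/
├── SKILL.md              # Required: skill definition
│   ├── YAML frontmatter (name / description are required)
│   └── Markdown body (instructions + examples)
└── Optional resources
    ├── scripts/          # Executable scripts (Python / Bash, etc.)
    ├── references/       # Longer reference documents (read by the Agent when needed)
    └── assets/           # Templates, icons, etc. (used directly in output)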
```
## SKILL.md 仕様
SKILL.md ヘッダーの `frontmatter` フィールド:
| フィールド | 説明 |
| --- | --- |
| `name` | スキル名。小文字+ハイフン、ディレクトリ名と一致させる |
| `description` | **最も重要なフィールド**。「このスキルが何をするか」「いつ使うべきか」を明記する。Agent はこれを見て呼び出すかどうかを判断する。トリガーに関する記述はすべてここに書き、本文には書かない |
| `metadata.cowagent.requires.bins` | システムに必要な CLI ツール |
| `metadata.cowagent.requires.env` | 必要な環境変数(すべて揃っている必要がある) |
| `metadata.cowagent.requires.anyEnv` | 複数の API Key のうち 1 つあればよい |
| `metadata.cowagent.requires.anyBins` | 複数のツールのうち 1 つあればよい |
| `metadata.cowagent.always` | `true` にすると常にロードされ、依存チェックをスキップ |
| `metadata.cowagent.emoji` | 表示用の絵文字(任意) |
| `metadata.cowagent.os` | OS 制限、例: `["darwin", "linux"]` |
<Note>
`category` フィールドは手動で設定する必要はありません。システムが自動的に `skill` に設定します。
</Note>
API Key 依存の宣言方法は 2 通り:
```yaml
metadata:
  cowagent:
    requires:
      env: ["MYAPI_KEY"]   # required
```
```yaml
metadata:
  cowagent:
    requires:
      anyEnv: ["OPENAI_API_KEY", "LINKAI_API_KEY"]   # any one of these
```
**スキルは依存関係に基づいて自動的に有効/無効になります**:環境変数が揃えば自動有効、不足すれば自動無効。手動で `/skill enable` する必要はありません。
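The dependency rules from the frontmatter table can be read as a simple eligibility check. The sketch below is an interpretation of those rules, not CowAgent's implementation; the function name and the injectable `which` parameter are assumptions made for illustration and testing.

```python
import shutil


def is_eligible(requires, env, always=False, which=shutil.which):
    """Return True when a skill's declared dependencies are satisfied."""
    if always:
        return True  # `always: true` skips dependency checks
    # `env`: every listed variable must be set.
    if not all(env.get(name) for name in requires.get("env", [])):
        return False
    # `anyEnv`: at least one listed variable must be set.
    any_env = requires.get("anyEnv", [])
    if any_env and not any(env.get(name) for name in any_env):
        return False
    # `bins`: every listed CLI tool must be on PATH.
    if not all(which(b) for b in requires.get("bins", [])):
        return False
    # `anyBins`: at least one listed CLI tool must be on PATH.
    any_bins = requires.get("anyBins", [])
    if any_bins and not any(which(b) for b in any_bins):
        return False
    return True
```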
## リソースディレクトリの使い方
| ディレクトリ | 入れるもの | 入れないもの |
| --- | --- | --- |
| `scripts/` | 繰り返し実行するコード、確定的な結果が必要なスクリプト | デモ用のコード片 |
| `references/` | **500 行超**で SKILL.md に収まらない大きなドキュメント(完全な DB スキーマなど) | 一般的な API ドキュメント、チュートリアル |
| `assets/` | 最終出力に含まれるファイル(テンプレート、アイコン、ボイラープレートなど) | 説明用ドキュメント |
<Warning>
**原則としてすべての内容を `SKILL.md` に書きます** — リソースディレクトリに分割するのは本当に収まらない場合だけです。
`README.md`、`CHANGELOG.md`、`INSTALLATION_GUIDE.md` などをスキルに追加しないでください。すべて `SKILL.md` に入れましょう。リソースディレクトリには実際に実行するスクリプトや実際に使う素材だけを配置してください。
</Warning>
## 外部スキルのインストール
インストール後、スキルは `<workspace>/skills/<name>/` に配置されます。
| Source | How to install |
| --- | --- |
| URL (single file) | curl / web_fetch |
| URL (zip archive) | Download and extract |
| Local SKILL.md | Read directly |
| Local zip archive | Extract |
Installation steps:
1. Locate `SKILL.md` (it may be at the archive root or in a subdirectory)
2. Read `name` from the frontmatter
3. Copy the **entire skill directory** (`SKILL.md`, `scripts/`, `assets/`, and so on) to `<workspace>/skills/<name>/`
4. If the archive contains a setup script such as `INSTALL.md`, run it, but the result must still end up in `<workspace>/skills/<name>/`
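Steps 1 to 3 could look roughly like this for an already extracted archive. It is a minimal sketch, not the skill's actual installer: `install_skill` is a hypothetical name, the frontmatter is read with a simple regular expression rather than a YAML parser, and `dirs_exist_ok` needs Python 3.8 or later.

```python
import re
import shutil
from pathlib import Path


def install_skill(source_dir, workspace):
    """Copy the directory holding SKILL.md into <workspace>/skills/<name>/."""
    # Step 1: SKILL.md may sit at the root or in a subdirectory.
    candidates = sorted(Path(source_dir).rglob("SKILL.md"),
                        key=lambda p: len(p.parts))
    if not candidates:
        raise FileNotFoundError("no SKILL.md found")
    skill_md = candidates[0]  # prefer the shallowest match
    text = skill_md.read_text(encoding="utf-8")
    # Step 2: minimal frontmatter read, first `name:` line between the --- fences.
    front = text.split("---", 2)[1] if text.startswith("---") else ""
    match = re.search(r"^name:\s*(\S+)\s*$", front, flags=re.MULTILINE)
    if not match:
        raise ValueError("frontmatter has no name field")
    name = match.group(1).strip("'\"")
    # Step 3: copy the whole skill directory, not just SKILL.md.
    target = Path(workspace) / "skills" / name
    shutil.copytree(skill_md.parent, target, dirs_exist_ok=True)
    return target
```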
## スキルをゼロから作成
推奨手順:
1. **要件を明確にする** — ユーザーに具体的なユースケースをいくつか挙げてもらう(一度に多く聞きすぎない)
2. **構成を計画する** — スクリプトは必要か?参考ドキュメントは?テンプレートは?
3. **スキャフォールド** — 初期化スクリプトを使用:
```bash
scripts/init_skill.py <skill-name> --path <workspace>/skills [--resources scripts,references,assets] [--examples]
```
4. **内容を埋める** — SKILL.md を書き、スクリプトとリソースを追加。スクリプトは必ず実行テストする
5. **バリデーション**(任意):
```bash
scripts/quick_validate.py <workspace>/skills/<skill-name>
```
6. **イテレーション** — 実際の使用フィードバックに基づいて継続的に改善
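## Naming rules
- Use only lowercase letters, digits, and hyphens. Normalize user input (for example `Plan Mode` → `plan-mode`)
- At most 64 characters
- Keep it short, start with a verb, and make the purpose obvious at a glance
- Prefix with a tool name where helpful (for example `gh-address-comments`, `linear-address-issue`)
- The directory name and the `name` field must match exactly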
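A small normalizer makes the first two naming rules concrete. The function name is hypothetical, and rejecting over-long names (rather than truncating them) is an assumption.

```python
import re


def normalize_skill_name(raw):
    """Lowercase, hyphen-separate, and validate a user-supplied skill name."""
    # Collapse every run of characters outside [a-z0-9] into a single hyphen.
    name = re.sub(r"[^a-z0-9]+", "-", raw.strip().lower()).strip("-")
    if not name:
        raise ValueError("skill name has no usable characters")
    if len(name) > 64:
        raise ValueError("skill name must be at most 64 characters")
    return name
```

For example, `normalize_skill_name("Plan Mode")` yields `plan-mode`, matching the example in the rules above.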
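## Three-stage loading
Skills are not loaded into context all at once. They load on demand in three stages:
1. **Metadata** (`name` + `description`): always in context (about 100 words); the Agent uses it to decide whether to use the skill
2. **SKILL.md body**: loaded only when the skill is activated; keep it under 500 lines
3. **Resource files**: read by the Agent when needed
A skill with several variants (for example, multi-cloud deployment) can be organized like this:
```
cloud-deploy/
├── SKILL.md              # Main workflow and provider-selection logic
└── references/
    ├── aws.md
    ├── gcp.md
    └── azure.md
```
If the user picks AWS, the Agent reads only `aws.md`. There is no need to load the documentation for all three providers.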
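## Common design patterns
**Step-based**: numbered steps with matching scripts.
```markdown
1. Analyze the form structure (run analyze_form.py)
2. Generate the field mapping (edit fields.json)
3. Fill in the form automatically (run fill_form.py)
```
**Branching**: route to different flows depending on the user's intent.
```markdown
1. Determine the operation type:
   **Creating something new?** → go to the "Create flow"
   **Editing something existing?** → go to the "Edit flow"
```
**Template-based**: when the output format has strict requirements, include a template in SKILL.md and have the Agent follow it.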


@@ -23,11 +23,12 @@ Vision ツールは多段階の自動選択+自動フォールバック戦略
| ベンダー | ビジョンモデル | 説明 |
| --- | --- | --- |
| OpenAI / 互換プロトコル | メインモデル | すべての OpenAI 互換マルチモーダルモデルに対応 |
| Baidu Qianfan | メインモデル | 多モーダルの主モデル(`ernie-5.0` など)は直接画像を処理。テキスト専用主モデルの場合は `ernie-4.5-turbo-vl` に自動フォールバック |
| 通義千問 (DashScope) | メインモデル | MultiModalConversation API 経由 |
| Claude | メインモデル | Anthropic ネイティブ画像形式 |
| Gemini | メインモデル | inlineData 形式 |
| 豆包 (Doubao) | メインモデル | doubao-seed-2-0 シリーズがネイティブ対応 |
| Kimi (Moonshot) | メインモデル | kimi-k2.5 がネイティブ対応 |
| Kimi (Moonshot) | メインモデル | kimi-k2.6、kimi-k2.5 がネイティブ対応 |
| 智谱 AI | glm-5v-turbo | 常にビジョン専用モデルを使用 |
| MiniMax | MiniMax-Text-01 | 常にビジョン専用モデルを使用 |
@@ -52,7 +53,7 @@ Vision ツールで使用するモデルを指定するには、`config.json`
{
"tool": {
"vision": {
"model": "gpt-4o"
"model": "ernie-4.5-turbo-vl"
}
}
}


@@ -5,6 +5,8 @@ description: CowAgent 的长期记忆系统 — 文件持久化、自动写入
长期记忆保存在工作空间文件中跨会话持久存在。Agent 在对话中通过检索工具按需加载历史记忆,也会在上下文裁剪时自动将对话摘要写入长期记忆。
<img src="https://cdn.link-ai.tech/doc/memory-architecture-zh.jpeg" alt="Memory Architecture" />
## 记忆类型
### Core memory (MEMORY.md)
@@ -39,20 +41,25 @@ Agent 通过以下机制自动将对话内容持久化为长期记忆:
Agent 会在对话中根据需要自动触发记忆检索,将相关历史信息纳入上下文。检索结果按混合评分排序(默认向量权重 0.7、关键词权重 0.3),日级记忆会随时间衰减(半衰期 30 天),核心记忆不衰减。
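The ranking described above (hybrid score with a 0.7 vector weight and a 0.3 keyword weight, plus a 30-day half-life for daily memory) can be written out as a small formula. This is a sketch of one plausible reading; the exact way CowAgent combines the score and the decay is an assumption, as are the function and parameter names.

```python
def memory_score(vector_sim, keyword_score, age_days=0.0, is_core=False,
                 vector_weight=0.7, keyword_weight=0.3, half_life_days=30.0):
    """Blend the two relevance signals, then apply exponential time decay."""
    base = vector_weight * vector_sim + keyword_weight * keyword_score
    if is_core:
        return base  # core memory (MEMORY.md) does not decay
    # Assumed decay form: the score halves every `half_life_days` days.
    return base * 0.5 ** (age_days / half_life_days)
```

Under this reading, a daily memory entry that is 30 days old keeps half of its blended score, and one that is 60 days old keeps a quarter.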
## 首次启动
## 相关文件
首次启动 Agent 时Agent 会主动向用户询问关键信息,并记录至工作空间(默认 `~/cow`)中:
工作空间(默认 `~/cow`)中与记忆相关的文件
| 文件 | 说明 |
| --- | --- |
| `system.md` | Agent 的系统提示词和行为设定 |
| `user.md` | 用户身份信息和偏好 |
| `AGENT.md` | Agent 的人格和行为设定 |
| `USER.md` | 用户身份信息和偏好 |
| `RULE.md` | 自定义规则和约束 |
| `MEMORY.md` | 核心记忆(长期) |
| `memory/YYYY-MM-DD.md` | 日级记忆(按需创建) |
| `memory/dreams/YYYY-MM-DD.md` | Dream diary (generated automatically by Deep Dream) |
## Web 控制台
在 Web 控制台的记忆管理页面中,可浏览记忆文件和梦境日记,支持通过 Tab 切换查看:
<Frame>
<img src="https://cdn.link-ai.tech/doc/20260203000455.png" width="800" />
<img src="https://cdn.link-ai.tech/doc/20260414171014.png" width="800" />
</Frame>
## 相关配置


@@ -12,6 +12,6 @@ description: Claude 模型配置
| 参数 | 说明 |
| --- | --- |
| `model` | 支持 `claude-sonnet-4-6`、`claude-opus-4-6`、`claude-sonnet-4-5`、`claude-sonnet-4-0`、`claude-3-5-sonnet-latest` 等,参考 [官方模型](https://docs.anthropic.com/en/docs/about-claude/models/overview) |
| `model` | 支持 `claude-sonnet-4-6`、`claude-opus-4-7`、`claude-opus-4-6`、`claude-sonnet-4-5`、`claude-sonnet-4-0`、`claude-3-5-sonnet-latest` 等,参考 [官方模型](https://docs.anthropic.com/en/docs/about-claude/models/overview) |
| `claude_api_key` | 在 [Claude 控制台](https://console.anthropic.com/settings/keys) 创建 |
| `claude_api_base` | 可选,默认为 `https://api.anthropic.com/v1`,修改可接入第三方代理 |


@@ -99,27 +99,6 @@ description: Coding Plan 模式模型配置
---
## Kimi
```json
{
"bot_type": "openai",
"model": "kimi-for-coding",
"open_ai_api_base": "https://api.kimi.com/coding/v1",
"open_ai_api_key": "YOUR_API_KEY"
}
```
| 参数 | 说明 |
| --- | --- |
| `model` | `kimi-for-coding` |
| `open_ai_api_base` | `https://api.kimi.com/coding/v1` |
| `open_ai_api_key` | Key dedicated to the Coding Plan; not interchangeable with the pay-as-you-go API |
官方文档:[Key 获取](https://www.kimi.com/code/docs/)
---
## 火山引擎
```json
@@ -138,3 +117,24 @@ description: Coding Plan 模式模型配置
| `open_ai_api_key` | API Key 与普通接口通用 |
官方文档:[快速开始](https://www.volcengine.com/docs/82379/1928261?lang=zh)
---
## Kimi
```json
{
"bot_type": "moonshot",
"model": "kimi-for-coding",
"moonshot_base_url": "https://api.kimi.com/coding/v1",
"moonshot_api_key": "YOUR_API_KEY"
}
```
| 参数 | 说明 |
| --- | --- |
| `model` | 填写 `kimi-for-coding` 会自动更新模型,或指定模型例如 `kimi-k2.6` |
| `moonshot_base_url` | `https://api.kimi.com/coding/v1` |
| `moonshot_api_key` | Key dedicated to the Coding Plan; not interchangeable with the pay-as-you-go API |
官方文档:[Key 获取](https://www.kimi.com/code/docs/)

62
docs/models/custom.mdx Normal file

@@ -0,0 +1,62 @@
---
title: 自定义
description: 自定义厂商配置,适用于第三方 API 代理和本地模型
---
适用于通过 OpenAI 兼容协议接入的第三方模型服务或本地部署的模型,例如:
- **第三方 API 代理**:使用统一的 API Base 调用多种模型
- **本地模型**:通过 Ollama、vLLM、LocalAI 等工具在本地部署的模型
- **私有化部署**:企业内部部署的模型服务
<Note>
与 `openai` 厂商的区别:选择自定义厂商后,通过 `/config model` 切换模型时,不会自动切换厂商类型,始终使用自定义的 API 地址。
</Note>
## 配置方式
### 第三方 API 代理
```json
{
"bot_type": "custom",
"model": "",
"custom_api_key": "YOUR_API_KEY",
"custom_api_base": "https://{your-proxy.com}/v1"
}
```
| 参数 | 说明 |
| --- | --- |
| `bot_type` | 必须设为 `custom` |
| `model` | 模型名称,填写代理服务支持的任意模型名 |
| `custom_api_key` | API 密钥,由代理服务提供 |
| `custom_api_base` | API 地址,由代理服务提供,需兼容 OpenAI 协议 |
### 本地模型
Local models usually need no API key; filling in the API base is enough:
```json
{
"bot_type": "custom",
"model": "qwen3.5:27b",
"custom_api_base": "http://localhost:11434/v1"
}
```
常见的本地部署工具及默认地址:
| 工具 | 默认 API Base |
| --- | --- |
| [Ollama](https://ollama.com) | `http://localhost:11434/v1` |
| [vLLM](https://docs.vllm.ai) | `http://localhost:8000/v1` |
| [LocalAI](https://localai.io) | `http://localhost:8080/v1` |
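To show how such a configuration maps onto an OpenAI-compatible call, the sketch below builds (without sending) a chat completion request from the `custom_*` fields. The function name is hypothetical and this is not CowAgent's client code; `/chat/completions` is the standard path in the OpenAI-compatible protocol.

```python
import json
import urllib.request


def build_chat_request(config, user_text):
    """Build (but do not send) an OpenAI-compatible chat completion request."""
    url = config["custom_api_base"].rstrip("/") + "/chat/completions"
    headers = {"Content-Type": "application/json"}
    if config.get("custom_api_key"):  # local models usually need no key
        headers["Authorization"] = "Bearer " + config["custom_api_key"]
    body = {"model": config["model"],
            "messages": [{"role": "user", "content": user_text}]}
    return urllib.request.Request(url, data=json.dumps(body).encode("utf-8"),
                                  headers=headers, method="POST")
```

Passing the result to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without a model server.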
## 切换模型
自定义厂商下切换模型时,只会修改 `model`,不会改变 `bot_type` 和 API 地址:
```
/config model qwen3.5:27b
```


@@ -7,25 +7,57 @@ description: DeepSeek 模型配置
```json
{
"model": "deepseek-chat",
"model": "deepseek-v4-flash",
"deepseek_api_key": "YOUR_API_KEY"
}
```
| 参数 | 说明 |
| --- | --- |
| `model` | `deepseek-chat` (DeepSeek-V3.2, non-thinking mode), `deepseek-reasoner` (DeepSeek-R1, thinking mode) |
| `model` | 支持 `deepseek-v4-flash`(默认)、`deepseek-v4-pro` |
| `deepseek_api_key` | 在 [DeepSeek 平台](https://platform.deepseek.com/api_keys) 创建 |
| `deepseek_api_base` | 可选,默认为 `https://api.deepseek.com/v1`,可修改为第三方代理地址 |
## 模型选择
| 模型 | 适用场景 |
| --- | --- |
| `deepseek-v4-flash` | 默认推荐,速度快、成本低 |
| `deepseek-v4-pro` | 更智能、复杂任务效果更强 |
## 思考模式
V4 系列(`deepseek-v4-flash` / `deepseek-v4-pro`)支持显式的"思考模式":模型在输出最终回答前,先输出一段思维链(`reasoning_content`),从而提升答案质量。
### 开关
通过全局配置 `enable_thinking` 控制:
```json
{
"enable_thinking": true
}
```
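- `true`: the model thinks before answering on every channel. The web console displays the reasoning; IM channels (WeChat / WeCom / DingTalk / Feishu) do not display it but still get the better answers.
- `false`: thinking is off, so responses are faster and time to first token is lower.
### Behavior
- **Sampling parameters**: in thinking mode, `temperature`, `top_p`, `presence_penalty`, and `frequency_penalty` are ignored by the server (no error is raised), and CowAgent skips sending them.
- **Multi-turn tool calls**: when the history contains tool calls, DeepSeek requires every assistant message to send back `reasoning_content`. CowAgent handles this automatically, and toggling thinking between turns does not cause errors.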
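The sampling-parameter rule above might be applied when assembling a request body, as in this sketch. It is illustrative only: `build_payload` is a hypothetical helper, not CowAgent's code, and it deliberately leaves out how thinking itself is switched on at the API level because this document does not specify that.

```python
# Parameters the documentation says are ignored server-side in thinking mode.
SAMPLING_KEYS = ("temperature", "top_p", "presence_penalty", "frequency_penalty")


def build_payload(model, messages, enable_thinking, **params):
    """Assemble a chat request body; sampling params are dropped when thinking."""
    payload = {"model": model, "messages": messages}
    for key, value in params.items():
        if enable_thinking and key in SAMPLING_KEYS:
            continue  # would be ignored anyway, so it is not sent
        payload[key] = value
    return payload
```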
<Tip>
默认使用 `deepseek-v4-flash`;复杂任务可使用 `deepseek-v4-pro`;需要深度思考可开启 `enable_thinking`。
</Tip>
Option 2: connect through the OpenAI-compatible interface:
```json
{
"model": "deepseek-chat",
"model": "deepseek-v4-flash",
"bot_type": "openai",
"open_ai_api_key": "YOUR_API_KEY",
"open_ai_api_base": "https://api.deepseek.com/v1"
}
```


@@ -5,14 +5,14 @@ description: 智谱AI GLM 模型配置
```json
{
"model": "glm-5-turbo",
"model": "glm-5.1",
"zhipu_ai_api_key": "YOUR_API_KEY"
}
```
| 参数 | 说明 |
| --- | --- |
| `model` | 可填 `glm-5-turbo`、`glm-5`、`glm-4.7`、`glm-4-plus`、`glm-4-flash`、`glm-4-air` 等,参考 [模型编码](https://bigmodel.cn/dev/api/normal-model/glm-4) |
| `model` | 可填 `glm-5.1`、`glm-5-turbo`、`glm-5`、`glm-4.7`、`glm-4-plus`、`glm-4-flash`、`glm-4-air` 等,参考 [模型编码](https://bigmodel.cn/dev/api/normal-model/glm-4) |
| `zhipu_ai_api_key` | 在 [智谱AI 控制台](https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys) 创建 |
也支持 OpenAI 兼容方式接入:
@@ -20,7 +20,7 @@ description: 智谱AI GLM 模型配置
```json
{
"bot_type": "openai",
"model": "glm-5-turbo",
"model": "glm-5.1",
"open_ai_api_base": "https://open.bigmodel.cn/api/paas/v4",
"open_ai_api_key": "YOUR_API_KEY"
}


@@ -6,7 +6,7 @@ description: CowAgent 支持的模型及推荐选择
CowAgent 支持国内外主流厂商的大语言模型,模型接口实现在项目的 `models/` 目录下。
<Note>
Agent 模式下推荐使用以下模型,可根据效果及成本综合选择:MiniMax-M2.7、glm-5-turbo、kimi-k2.5、qwen3.6-plus、claude-sonnet-4-6、gemini-3.1-pro-preview
Agent 模式下推荐使用以下模型,可根据效果及成本综合选择:deepseek-v4-flash、MiniMax-M2.7、claude-sonnet-4-6、gemini-3.1-pro-preview、glm-5.1、qwen3.6-plus、kimi-k2.6、ernie-5.0
同时支持使用 [LinkAI](https://link-ai.tech) 平台接口,可灵活切换多种模型,并支持知识库、工作流、插件等 Agent 能力。
</Note>
@@ -23,21 +23,15 @@ CowAgent 支持国内外主流厂商的大语言模型,模型接口实现在
## 支持的模型
<CardGroup cols={2}>
<Card title="DeepSeek" href="/models/deepseek">
deepseek-v4-flash、deepseek-v4-pro 等
</Card>
<Card title="百度千帆 / ERNIE" href="/models/qianfan">
ernie-5.0、ernie-4.5-turbo-128k 等
</Card>
<Card title="MiniMax" href="/models/minimax">
MiniMax-M2.7 等系列模型
</Card>
<Card title="智谱 GLM" href="/models/glm">
glm-5-turbo、glm-5 等系列模型
</Card>
<Card title="通义千问 Qwen" href="/models/qwen">
qwen3.6-plus、qwen3-max 等
</Card>
<Card title="Kimi" href="/models/kimi">
kimi-k2.5、kimi-k2 等
</Card>
<Card title="豆包 Doubao" href="/models/doubao">
doubao-seed 系列模型
</Card>
<Card title="Claude" href="/models/claude">
claude-sonnet-4-6 等
</Card>
@@ -47,12 +41,24 @@ CowAgent 支持国内外主流厂商的大语言模型,模型接口实现在
<Card title="OpenAI" href="/models/openai">
gpt-5.4、gpt-4.1、o 系列等
</Card>
<Card title="DeepSeek" href="/models/deepseek">
deepseek-chat、deepseek-reasoner
<Card title="智谱 GLM" href="/models/glm">
glm-5.1、glm-5-turbo、glm-5 等系列模型
</Card>
<Card title="通义千问 Qwen" href="/models/qwen">
qwen3.6-plus、qwen3-max 等
</Card>
<Card title="豆包 Doubao" href="/models/doubao">
doubao-seed 系列模型
</Card>
<Card title="Kimi" href="/models/kimi">
kimi-k2.6、kimi-k2.5、kimi-k2 等
</Card>
<Card title="LinkAI" href="/models/linkai">
多模型统一接口 + 知识库
</Card>
<Card title="自定义" href="/models/custom">
第三方代理、本地模型等
</Card>
</CardGroup>


@@ -5,14 +5,14 @@ description: Kimi (Moonshot) 模型配置
```json
{
"model": "kimi-k2.5",
"model": "kimi-k2.6",
"moonshot_api_key": "YOUR_API_KEY"
}
```
| 参数 | 说明 |
| --- | --- |
| `model` | 可填 `kimi-k2.5`、`kimi-k2`、`moonshot-v1-8k`、`moonshot-v1-32k`、`moonshot-v1-128k` |
| `model` | 可填 `kimi-k2.6`、`kimi-k2.5`、`kimi-k2`、`moonshot-v1-8k`、`moonshot-v1-32k`、`moonshot-v1-128k` |
| `moonshot_api_key` | 在 [Moonshot 控制台](https://platform.moonshot.cn/console/api-keys) 创建 |
也支持 OpenAI 兼容方式接入:
@@ -20,7 +20,7 @@ description: Kimi (Moonshot) 模型配置
```json
{
"bot_type": "openai",
"model": "kimi-k2.5",
"model": "kimi-k2.6",
"open_ai_api_base": "https://api.moonshot.cn/v1",
"open_ai_api_key": "YOUR_API_KEY"
}


@@ -3,7 +3,7 @@ title: LinkAI
description: 通过 LinkAI 平台统一接入多种模型
---
通过 [LinkAI](https://link-ai.tech) 平台可灵活切换 OpenAI、Claude、Gemini、DeepSeek、Qwen、Kimi 等多种模型,并支持知识库、工作流、插件等 Agent 能力。
通过 [LinkAI](https://link-ai.tech) 平台可灵活切换 OpenAI、Claude、Gemini、DeepSeek、MiniMax、Qwen、Kimi 等多种模型,并支持知识库、工作流、插件等 Agent 能力。
```json
{

63
docs/models/qianfan.mdx Normal file

@@ -0,0 +1,63 @@
---
title: 百度千帆
description: 百度千帆 ERNIE 模型配置
---
Option 1: official integration (recommended):
```json
{
"model": "ernie-5.0",
"qianfan_api_key": "",
"qianfan_api_base": "https://qianfan.baidubce.com/v2"
}
```
| 参数 | 说明 |
| --- | --- |
| `model` | 默认推荐使用 `ernie-5.0`;也可使用 `ernie-x1.1`、`ernie-4.5-turbo-128k`、`ernie-4.5-turbo-32k` |
| `qianfan_api_key` | Qianfan API key; it usually starts with `bce-v3/` |
| `qianfan_api_base` | 可选,默认为 `https://qianfan.baidubce.com/v2` |
## 模型选择
| 模型 | 适用场景 |
| --- | --- |
| `ernie-5.0` | 默认推荐,文心新一代旗舰模型,综合能力最强 |
| `ernie-x1.1` | 深度思考推理模型,幻觉更低、指令遵循与工具调用更强 |
| `ernie-4.5-turbo-128k` | 长上下文和通用对话 |
| `ernie-4.5-turbo-32k` | 通用对话,成本和上下文更均衡 |
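## Vision tool
Once `qianfan_api_key` is configured, the Agent's Vision tool can use Qianfan vision models automatically:
- When the main model is itself multimodal (such as `ernie-5.0`, `ernie-x1.1`, or `ernie-4.5-turbo-vl`), it recognizes images directly with no extra configuration
- When the main model is text-only (such as `ernie-4.5-turbo-128k`), the Vision tool falls back to `ernie-4.5-turbo-vl` automatically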
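The selection rule in the two bullets above amounts to a short decision function. The sketch below is illustrative; the set of multimodal models is limited to the ones named here, and the precedence given to an explicitly configured `tool.vision.model` is an assumption about how the pieces fit together.

```python
# Multimodal Qianfan models named in this documentation.
MULTIMODAL_MODELS = {"ernie-5.0", "ernie-x1.1", "ernie-4.5-turbo-vl"}
FALLBACK_VISION_MODEL = "ernie-4.5-turbo-vl"


def pick_vision_model(main_model, configured=None):
    """Choose the model the Vision tool would use for a Qianfan setup."""
    if configured:                       # assumed: explicit tool.vision.model wins
        return configured
    if main_model in MULTIMODAL_MODELS:  # main model can read images itself
        return main_model
    return FALLBACK_VISION_MODEL         # text-only main model
```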
To choose the Vision model manually, set it explicitly in `config.json`:
```json
{
"tool": {
"vision": {
"model": "ernie-4.5-turbo-vl"
}
}
}
```
Option 2: connect through the OpenAI-compatible interface:
```json
{
"model": "ernie-5.0",
"bot_type": "openai",
"open_ai_api_key": "",
"open_ai_api_base": "https://qianfan.baidubce.com/v2"
}
```
<Tip>
新配置推荐使用 `qianfan_api_key`。旧的 `wenxin`、`wenxin-4`、`baidu_wenxin_api_key`、`baidu_wenxin_secret_key` 配置仍保持兼容。
</Tip>


@@ -5,6 +5,8 @@ description: CowAgent 版本更新历史
| 版本 | 日期 | 说明 |
| --- | --- | --- |
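| [2.0.8](/releases/v2.0.8) | 2026.05.06 | Full Feishu channel upgrade (voice, streaming output and Markdown, one-step QR-code onboarding), new DeepSeek V4 and Baidu models, enhanced scheduled-task tool |
| [2.0.7](/releases/v2.0.7) | 2026.04.22 | Image generation skill (automatic routing across six providers), new model support (Kimi K2.6, Claude Opus 4.7, GLM 5.1), knowledge base enhancements, web console improvements |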
| [2.0.6](/releases/v2.0.6) | 2026.04.14 | 项目更名、知识库系统、梦境记忆蒸馏、上下文智能压缩、Web 控制台多会话及多项优化 |
| [2.0.5](/releases/v2.0.5) | 2026.04.01 | Cow CLI、Skill Hub 开源、浏览器工具、企微扫码创建、多项优化和修复 |
| [2.0.4](/releases/v2.0.4) | 2026.03.22 | 新增个人微信通道、新模型支持、日文文档、脚本重构及多项修复 |


@@ -12,7 +12,7 @@ description: CowAgent 2.0.6 - 知识库系统、梦境记忆蒸馏、上下文
## 📚 知识库系统
新增个人知识库系统Agent 可自主构建和维护结构化知识,并在对话中按需检索引用
新增个人知识库系统Agent 可自主构建和维护结构化知识,并在对话中按需检索引用
- **索引驱动的自组织结构**:知识库采用 `knowledge/` 目录,按分类自动组织,每个知识页面为独立的 Markdown 文件
- **自动写入**:向 Agent 发送文件、链接等知识,或在讨论中识别到有价值的知识时,自动创建或更新知识页面
@@ -22,9 +22,10 @@ description: CowAgent 2.0.6 - 知识库系统、梦境记忆蒸馏、上下文
<img src="https://cdn.link-ai.tech/doc/20260413105435.png" width="750" />
相关文档:[知识库](https://docs.cowagent.ai/knowledge)
Inspired by Karpathy's [LLM Wiki](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f).
## 🌙 Deep Dream memory distillation
全新的记忆整理机制,每日自动将分散的对话记忆蒸馏为精炼的长期记忆:

64
docs/releases/v2.0.7.mdx Normal file

@@ -0,0 +1,64 @@
---
title: v2.0.7
description: CowAgent 2.0.7 - 图像生成技能六厂商自动路由、新模型支持、知识库增强、Web 控制台优化及多项修复
---
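## 🎨 Image generation skill
Adds a built-in image generation skill that supports text-to-image, image-to-image, and multi-image fusion, with models such as `GPT-Image-2` and `Nano Banana`:
- **Automatic routing**: switches automatically across six providers: OpenAI (GPT-Image-2) → Gemini (Nano Banana) → Seedream (Volcengine Ark) → Qwen (Bailian) → MiniMax → LinkAI
- **Works out of the box**: configure an API key and it is ready to use, with no need to pick a model by hand. You can also name a specific model in conversation
- **Flexible control**: supports parameters such as `quality`, `size` (resolution, 512 / 1K~4K), and `aspect_ratio`; valid values are adapted per provider automatically
- **Image editing**: pass in an existing image to edit it, transfer a style, or fuse several images
- **Skill-level configuration**: pin the default model through `skill.image-generation.model` in `config.json`
Related documentation: [Image generation skill](https://docs.cowagent.ai/skills/image-generation)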
## 🤖 新模型支持
- **Kimi K2.6**:新增 `kimi-k2.6` 模型支持
- **Claude Opus 4.7**:新增 `claude-opus-4-7` 模型支持
- **GLM 5.1**:新增 `glm-5.1` 模型支持
- **Kimi Coding Plan**:支持 Kimi Coding Plan 模式
- **自定义模型厂商**:新增[自定义模型](https://docs.cowagent.ai/models/custom)提供方配置,方便接入本地模型及更多厂商
## 📚 知识库增强
- **嵌套目录支持**:知识库列表和展示支持多级嵌套目录
- **根级文件展示**:知识树中显示根目录下的 `index.md`、`log.md` 等文件
- **空状态统计修复**:排除根级文件对知识库统计的干扰,正确保持空状态
## 🌙 梦境记忆优化
- **结构化组织**:梦境记忆文件按日期自动归档,目录结构更清晰
- **定时抖动**:每日定时触发增加随机抖动,避免集群场景下的并发冲突
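## 🛠 Skill system improvements
- **Skill management refresh**: the latest skills are loaded automatically after a `/skill` command runs, keeping state in sync
- **More install sources**: skill installation accepts several source formats (URL, zip, local files, and so on)
## 💬 Web console improvements
- **Smart auto-scroll**: improved chat window scrolling, so it no longer jumps to the bottom while the user is scrolling back manually. Thanks @colin2060
- **Mobile layout**: the sidebar is hidden by default and closes when the overlay is tapped
- **Image preview de-duplication**: fixes images being rendered twice within the same message
- **Reasoning content truncation**: deep-thinking content that exceeds the limit is truncated, fixing front-end lag
- **Session title fix**: fixes the fallback logic for automatic title generation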
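## 🐛 Other fixes
- **Gemini fix**: fixes Gemini tool calls not returning results
- **Agent retry**: `tool_calls` are no longer dropped when retrying an empty response
- **Docker environment variables**: fixes environment variables not syncing after a configuration update under Docker. Thanks @sunboy0523
- **Python 3.7 compatibility**: `Literal` is imported lazily for Python 3.7 compatibility
- **Model switch notification**: fixes the missing bot_type change notice after switching models. Thanks @6vision
- **Config command enhancement**: `/config` can now set `enable_thinking`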
## 📦 升级方式
源码部署可执行 `cow update` 或 `./run.sh update` 一键升级,或手动拉取代码后重启。详见 [更新升级文档](https://docs.cowagent.ai/guide/upgrade)。
**Release date**: 2026.04.22 | [Full Changelog](https://github.com/zhayujie/CowAgent/compare/2.0.6...2.0.7)

63
docs/releases/v2.0.8.mdx Normal file

@@ -0,0 +1,63 @@
---
title: v2.0.8
description: CowAgent 2.0.8 - 飞书渠道全面升级语音、流式打字机、一键扫码接入、DeepSeek V4 / 百度千帆支持、定时任务工具优化
---
## 🪶 飞书渠道全面升级
### 1. 一键扫码创建飞书应用
不再需要手动到飞书开放平台建应用、填权限和事件订阅。Web 控制台和命令行启动时若未配置 `feishu_app_id`,会自动展示扫码入口,飞书扫码授权后自动创建机器人并回填配置,开箱即用。
相关文档:[飞书渠道](https://docs.cowagent.ai/channels/feishu)
### 2. 语音消息收发
支持接收用户发送的飞书语音消息并自动转文本,回复也可走 TTS 以语音形式发出。同时优化了中文短语音的识别准确度。
### 3. 流式打字机回复
接入飞书 CardKit 流式卡片,**默认开启**,体验对齐 Web 端:
- 多轮 Agent 场景下中间过场消息与最终回复分卡呈现
- 针对 DeepSeek 等高频输出模型做了专门优化,速度与 Web 端持平
- 不支持时自动回退为普通文本回复,无需手动配置
- 要求飞书客户端 ≥ 7.20
飞书语音消息收发与流式打字机的基础能力来自社区贡献 #2791 Thanks @ooaaooaa123
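## 🤖 New model support
- **DeepSeek V4 series**: adds `deepseek-v4-pro` / `deepseek-v4-flash` and switches the default model to `deepseek-v4-flash`
- **Unified thinking switch**: the on/off behavior of thinking models such as DeepSeek V4 and Qwen3 is aligned on `enable_thinking`
- **Baidu Qianfan integration**: adds Baidu Qianfan as a provider, supporting models such as `ernie-5.0` and `ernie-4.5-turbo-128k` as well as the image recognition tool. See [Baidu Qianfan](https://docs.cowagent.ai/models/qianfan). #2790 Thanks @jimmyzhuu
- **Youdao translation**: the `translate` module adds Youdao translation support. #2797 Thanks @Zmjjeff7
## 🛠 OpenAI client refactor
- **No SDK dependency**: the OpenAI bot is now implemented with native HTTP, making startup lighter and dependency conflicts rarer
- **Web console hint**: the API Base input in model configuration now shows a placeholder hinting at the version path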
## ⏰ 定时任务记忆增强
- **任务结果可被追问**:定时任务的执行结果自动注入到接收方的会话历史中,下一轮对话可直接追问,无需重新交代上下文 Thanks @huangrichao2020
- **不污染长期记忆**:注入的调度对话不会被纳入每日梦境记忆汇总,避免高频任务把记忆刷满
- **避免越跑越慢**:调度任务自己的上下文长度自动控制在合理范围内,长期反复执行也不会越积越大、拖慢响应
## 🔧 工具与安全
- **图像识别模型**:让 `tool.vision.model` 配置真正生效,未配置时自动 fallback #2792 Thanks CNXudiandian
- **Bash 安全确认**:仅对工作区外的破坏性删除做二次确认,工作区内常规操作不再打扰
## 🐛 其他修复
- 修复 Deep Dream 在多实例场景下重复触发
- 修复 DeepSeek 多轮对话中部分历史轮次缺失 `reasoning_content`
## 📦 升级方式
源码部署可执行 `cow update` 或 `./run.sh update` 一键升级,或手动拉取代码后重启。详见 [更新升级文档](https://docs.cowagent.ai/guide/upgrade)。
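> ⚠️ One-step Feishu app creation depends on `lark-oapi>=1.5.5`. `cow update` pulls it automatically; for manual deployments, make sure dependencies are updated.
**Release date**: 2026.05.06 | [Full Changelog](https://github.com/zhayujie/CowAgent/compare/2.0.7...2.0.8)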


@@ -0,0 +1,160 @@
---
title: image-generation - 图像生成
description: 文生图 / 图生图 / 多图融合,支持多家厂商自动路由与回退
---
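A general-purpose image generation and editing skill that supports six providers: OpenAI, Gemini, Seedream (Volcengine Ark), Qwen (Bailian), MiniMax, and LinkAI. You do not need to pick a model by hand: the script selects a configured provider automatically according to a fixed priority.
## Model selection
`image-generation` uses a "fixed priority + automatic fallback" strategy and works as soon as a key is configured:
1. **Priority order**: `OpenAI → Gemini → Seedream → Qwen → MiniMax → LinkAI`
2. **Providers without a key are skipped**: only providers with an API key set take part
3. **Failures move to the next provider**: on errors such as 401, a model that is not enabled, or a network failure, the next provider is tried automatically
4. **A named model goes first**: if a model name is passed explicitly, its provider is moved to the front and tried first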
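### Supported models
| Provider | Model / alias | Notes |
| --- | --- | --- |
| OpenAI | `gpt-image-2`, `gpt-image-1` | General-purpose text-to-image, high quality and highly capable; supports the `quality` parameter |
| Gemini Nano Banana | `nano-banana-2`, `nano-banana-pro`, `nano-banana` | Image variants of `gemini-3.1-flash`, `gemini-3-pro`, and `gemini-2.5-flash` |
| Seedream (Volcengine Ark) | `seedream-5.0-lite`, `seedream-4.5` | Native 2K to 4K output; fuses up to 14 images |
| Qwen (Bailian) | `qwen-image-2.0`, `qwen-image-2.0-pro` | Strong at Chinese typography and text-image composition |
| MiniMax | `image-01` | Simple, fast image generation |
| LinkAI | Any model | General-purpose proxy, used as the last fallback |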
<Note>
默认情况下 Agent 不会主动选模型,而是走自动路由。如果你想用某个特定模型,直接在对话里说就行,比如「用 seedream 画一只猫」或「用 gpt-image-2 生成海报」。也可以通过下面的「自定义配置」固定默认模型。
</Note>
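## Custom configuration
### Configuring API keys
A key for **at least one** provider is required; configuring several gives you automatic fallback. There are three ways to configure them:
#### Option 1: reuse existing model keys automatically
If you have configured chat-model keys (such as `openai_api_key` or `gemini_api_key`) in the web console or `config.json`, they are **synced automatically** to the matching environment variables at startup. As long as your chat model works, image generation can use the same key with no extra configuration.
#### Option 2: configure in config.json
Write the key fields directly in `config.json`. The supported fields are: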
```json
{
"openai_api_key": "sk-xxx",
"openai_api_base": "https://api.openai.com/v1",
"gemini_api_key": "AIza-xxx",
"ark_api_key": "xxx",
"dashscope_api_key": "sk-xxx",
"minimax_api_key": "xxx",
"linkai_api_key": "xxx"
}
```
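A restart is required for changes to take effect. Each key also has a matching `*_api_base` field for a custom endpoint.
#### Option 3: configure in conversation
Send the API key in conversation and the Agent saves it to `~/cow/.env` through the `env_config` tool. It takes effect **without a restart**. For example:
```
Set OPENAI_API_KEY to sk-xxx for me
```
Or:
```
Set ARK_API_KEY to xxx
```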
### API key reference
| Environment variable | config.json field | Provider | Default base URL |
| --- | --- | --- | --- |
| `OPENAI_API_KEY` | `openai_api_key` | OpenAI | `https://api.openai.com/v1` |
| `GEMINI_API_KEY` | `gemini_api_key` | Gemini | `https://generativelanguage.googleapis.com` |
| `ARK_API_KEY` | `ark_api_key` | Volcengine Ark (Seedream) | `https://ark.cn-beijing.volces.com/api/v3` |
| `DASHSCOPE_API_KEY` | `dashscope_api_key` | Alibaba Bailian (Qwen) | `https://dashscope.aliyuncs.com` |
| `MINIMAX_API_KEY` | `minimax_api_key` | MiniMax | `https://api.minimaxi.com` |
| `LINKAI_API_KEY` | `linkai_api_key` | LinkAI | `https://api.link-ai.tech` |
### 指定默认模型
如果想让所有图像生成固定走某个厂商的模型,可以在 `config.json` 里加:
```json
"skill": {
"image-generation": {
"model": "seedream-5.0-lite"
}
}
```
启动时这段配置会被自动转成环境变量 `SKILL_IMAGE_GENERATION_MODEL`,脚本读到后会固定使用这个模型所在的厂商进行生成。
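## Enabling and disabling
`image-generation` is a built-in skill whose **status adjusts automatically based on API keys**:
- **Key configured**: the skill is available, and the Agent calls it directly when it receives a drawing request
- **No key configured**: the skill still appears in context, marked as "needs configuration", so the Agent guides the user to set a key instead of failing the call
To control it manually, use the commands:
```text
/skill disable image-generation   # disable manually (not called even if a key exists)
/skill enable image-generation    # re-enable
```
The terminal equivalents are `cow skill disable image-generation` / `cow skill enable image-generation`.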
## Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `prompt` | string | Yes | (none) | Description of the image |
| `image_url` | string / list | No | null | Input image for editing, as a local path or URL. Passing several means multi-image fusion |
| `quality` | string | No | auto | `low` / `medium` / `high`; supported by some providers only |
| `size` | string | No | auto | `512` / `1K` / `2K` / `3K` / `4K`, or a pixel value such as `1024x1024` |
| `aspect_ratio` | string | No | null | `1:1` / `3:2` / `2:3` / `16:9` / `9:16` / `21:9`; Gemini also supports `1:4` / `4:1` / `1:8` / `8:1` |
<Warning>
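**The higher the quality and resolution, the higher the cost and the longer the wait.**
- For everyday conversation and quick previews, use the default (`auto`) or `quality=low` + `size=1K`, which produces an image in about 20 seconds
- Use `quality=high` + `size=2K/4K` only for posters or when the user explicitly asks for high resolution; this can take 1 to 5 minutes depending on the model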
</Warning>
## 输出
成功时返回:
```json
{
"model": "doubao-seedream-5-0-260128",
"images": [
{"url": "/path/to/output.png"}
]
}
```
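On failure it returns `{ "error": "..." }`. **Do not retry directly** after an error. It is most likely a configuration problem (wrong key, wrong API address, model not enabled), so have the user fix the configuration before trying again.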
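A caller might handle the two result shapes like this. The helper name is hypothetical; the JSON shapes are the ones shown above, and raising instead of retrying reflects the advice in this section.

```python
import json


def handle_result(raw):
    """Return image paths on success; raise on error instead of retrying."""
    result = json.loads(raw)
    if "error" in result:
        # Most failures are configuration problems, so surface them to the user.
        raise RuntimeError(f"image generation failed: {result['error']}")
    return [img["url"] for img in result.get("images", [])]
```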
## 常见用法
- **文生图**:根据描述生成插画、海报、图标、头像、分镜图等
- **图生图**:在已有图片上改风格、换元素、加装饰、加文字等
- **多图融合**:把多张参考图合成一张(换装、角色合影等)
<Note>
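- Set the bash timeout to 600 seconds. The HTTP timeout per provider is 300 seconds, but the script may try several providers in turn
- Input images are compressed automatically to at most 4 MB with the longest side at most 4096 px
- Gemini / Seedream / Qwen / MiniMax do not support the `quality` parameter; passing it has no effect
- Seedream outputs 2K by default; `seedream-5.0-lite` supports up to 3K and `seedream-4.5` up to 4K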
</Note>
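The longest-side limit mentioned in the note can be illustrated with a small dimension calculation. This sketch covers only the 4096 px constraint; how the skill actually resizes images and enforces the 4 MB limit is not described here, and the function name is an assumption.

```python
def fit_longest_side(width, height, max_side=4096):
    """Scale (width, height) down proportionally so neither side exceeds max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already within the limit, leave unchanged
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))
```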


@@ -34,7 +34,7 @@ CowAgent 支持通过统一的 `install` 命令安装来自 **[Cow 技能广场]
## 从 LinkAI 安装
[LinkAI](https://link-ai.tech/console) 上的所有公开资源 (1w+个插件/应用/工作流) ,以及自己创建的资源 (应用/工作流/知识库/数据库/插件) 都可以通过命令一键安装:
[LinkAI](https://link-ai.tech/console) 上的所有公开资源 (1w+个应用/工作流/插件) ,以及自己创建的资源 (应用/工作流/知识库/数据库/插件) 都可以通过命令一键安装:
```text
/skill install linkai:<code>


@@ -0,0 +1,112 @@
---
title: knowledge-wiki - 知识库
description: 维护本地结构化知识库,自动归档、分类和交叉引用
---
帮你把对话中产生的资料、灵感和零散笔记整理成结构化的本地知识库,自动维护索引和页面之间的交叉引用。
`knowledge-wiki` 在工作空间下维护一个 `knowledge/` 目录,相当于 Agent 的「外脑」。技能设置了 `always: true`,会**常驻上下文**,不需要任何外部依赖。
## 什么时候会触发
- You shared an article, a document, or a URL and want to keep it for later
- 聊天过程中聊出了值得长期保留的结论
- 你想查一下之前积累过的知识
## 目录结构
```
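knowledge/
├── index.md          # Global index (must be maintained)
├── log.md            # Operation log (append-only)
└── <category>/       # Category subdirectory (grouped freely by content)
    └── <slug>.md     # Knowledge page (lowercase, hyphen-separated file name)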
```
## 三个核心操作
### 1. Ingest
When you share material, the Agent:
1. 读懂原文,提取关键信息
2. 按内容决定放到哪个分类下——先看 `index.md` 里有没有合适的分类,没有就新建一个
3. 生成知识页 `knowledge/<category>/<slug>.md`
4. 更新索引 `index.md` 和日志 `log.md`
### 2. Synthesize
聊天中产生了新的结论或洞见时:
1. 在合适的分类下创建新知识页
2. 给相关的已有页面加上互相指向的链接
3. 更新索引和日志
### 3. Query
你问到以前积累的知识时:
1. 先从 `index.md` 里找可能相关的页面
2. 用 `read` 工具打开具体页面
3. 需要时再用 `memory_search` 补充检索
4. 回答里会带上知识页的链接,方便你点过去看原文
## 知识页怎么写
```markdown
# 页面标题
> Source: <来源 URL 或简要说明>
正文内容。页面之间用相对路径链接:
[相关页](../category/related-page.md)
## 要点
- ...
## 相关页面
- [页面 A](../category/page-a.md) — 为什么相关
```
<Note>
- `> Source:` 用来记录这条知识的来源。有明确来源时一定要写
- 交叉引用很重要:创建或更新某页时,记得也去关联页面里补上反向链接
- **只链接已经存在的页面**。如果某个概念值得单独成页,先建好再加链接
</Note>
## 索引格式
`knowledge/index.md` 采用扁平列表,按分类分组,每个知识页占一行:
```markdown
# Knowledge Index
## 分类 A
- [页面标题](category-a/page-slug.md) — 一句话摘要
## 分类 B
- [页面标题](category-b/page-slug.md) — 一句话摘要
```
不用表格,不加 emoji。分类怎么起名、怎么组织都可以灵活调整。
## 日志格式
`knowledge/log.md` 只追加、不修改,最新的写在最下面:
```markdown
## [YYYY-MM-DD] ingest | 页面标题
## [YYYY-MM-DD] synthesize | 页面标题
```
## 写作约定
- **文件名**用小写加中划线,比如 `machine-learning.md`
- **一页只讲一件事**,需要关联的内容通过链接串起来
- **有了就更新,不要重复建页**
- **每次改完都要更新索引** `knowledge/index.md`
- **写精华别抄全文**,抓住要点就行
- **对话里引用知识页时用完整路径**,比如 `[标题](knowledge/<category>/<slug>.md)`。页面之间互相链接才用相对路径
- **基于知识页回答问题时附上链接**,方便深入查阅


@@ -0,0 +1,180 @@
---
title: skill-creator - 技能创建
description: 创建、安装、更新技能,规范 SKILL.md 写法与目录结构
---
`skill-creator` 是一个「元技能」,专门用来帮助 Agent 创建、安装和更新其他技能,确保所有技能的 `SKILL.md` 写法和目录结构保持一致。
## 什么时候会触发
- 用户想从 URL 或远程仓库安装一个技能
- 用户想从头创建一个全新的技能
- 需要升级或重构已有技能
## 技能是什么
简单来说,技能就是一份「可复用的说明书」加上可选的脚本和资源。它给 Agent 注入了某个领域的专业知识,让 Agent 在遇到对应任务时能像专家一样处理。
一个技能通常包含以下内容:
1. **专项工作流** — 某类任务的完整步骤
2. **工具用法** — 怎么调某种 API 或处理某种文件
3. **领域知识** — 团队约定、业务规则、数据结构之类
4. **附带资源** — 脚本、参考文档、模板等
<Note>
**核心原则:能省则省**。只写 Agent 自己想不到的内容,每加一行都要问自己:值不值得占这些 token
</Note>
## 目录结构
```
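skill-name/
├── SKILL.md              # Required: skill definition
│   ├── YAML frontmatter (name / description are required)
│   └── Markdown body (instructions + examples)
└── Optional resources
    ├── scripts/          # Executable scripts (Python / Bash, etc.)
    ├── references/       # Longer reference documents (read by the Agent on demand)
    └── assets/           # Templates, icons, etc., used directly in output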
```
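## SKILL.md specification
The `frontmatter` fields at the top of SKILL.md:
| Field | Description |
| --- | --- |
| `name` | Skill name, lowercase with hyphens; must match the directory name |
| `description` | **The most important field**. State clearly what the skill does and when it should be used; the Agent reads this text to decide whether to invoke the skill. Note: put every trigger-related description here, not in the body |
| `metadata.cowagent.requires.bins` | Command-line tools that must be installed on the system |
| `metadata.cowagent.requires.env` | Required environment variables (all must be set) |
| `metadata.cowagent.requires.anyEnv` | Several API keys, any one of which is enough |
| `metadata.cowagent.requires.anyBins` | Several tools, any one of which is enough |
| `metadata.cowagent.always` | When set to `true`, the skill is always loaded and dependencies are not checked |
| `metadata.cowagent.emoji` | Emoji for display (optional) |
| `metadata.cowagent.os` | Restricts the operating system, e.g. `["darwin", "linux"]` |
<Note>
The `category` field does not need to be written by hand; the system sets it to `skill` automatically.
</Note>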
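There are two ways to declare an API key dependency:
```yaml
metadata:
  cowagent:
    requires:
      env: ["MYAPI_KEY"] # required
```
```yaml
metadata:
  cowagent:
    requires:
      anyEnv: ["OPENAI_API_KEY", "LINKAI_API_KEY"] # any one is enough
```
**Skills are enabled and disabled automatically based on their dependencies**: a skill is enabled once its environment variables are all present and disabled when any is missing, with no manual `/skill enable` needed.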
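The enable/disable decision can be pictured as a small predicate over the `requires` block. The sketch below is an illustration of the rules in the table, not the project's implementation: `is_skill_eligible` is a hypothetical name, its input is assumed to be the already-parsed `metadata.cowagent` mapping, and the `os` restriction is left out.

```python
import os
import shutil


def is_skill_eligible(cowagent_meta, env=None, has_bin=None):
    """Decide whether a skill's declared dependencies are satisfied.

    env and has_bin can be injected for testing; by default the real
    environment and PATH lookup are used.
    """
    env = os.environ if env is None else env
    has_bin = has_bin or (lambda name: shutil.which(name) is not None)

    # always: true loads the skill without checking dependencies.
    if cowagent_meta.get("always"):
        return True

    requires = cowagent_meta.get("requires", {})

    # env: every listed variable must be set to a non-empty value.
    if not all(env.get(key) for key in requires.get("env", [])):
        return False
    # anyEnv: at least one listed variable must be set.
    any_env = requires.get("anyEnv", [])
    if any_env and not any(env.get(key) for key in any_env):
        return False
    # bins: every listed command-line tool must be installed.
    if not all(has_bin(name) for name in requires.get("bins", [])):
        return False
    # anyBins: at least one listed tool must be installed.
    any_bins = requires.get("anyBins", [])
    if any_bins and not any(has_bin(name) for name in any_bins):
        return False
    return True
```

Injecting `env` and `has_bin` keeps the check deterministic in tests; whether an empty-string variable counts as "set" in the real system is not stated in the source, so treating it as missing is an assumption.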
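## How to use the resource directories
| Directory | What goes in | What stays out |
| --- | --- | --- |
| `scripts/` | Code that is executed repeatedly, or scripts that must produce deterministic results | Code snippets that are purely illustrative |
| `references/` | Large documents of **more than 500 lines** that really cannot fit in SKILL.md (for example a complete database schema) | Ordinary API docs, examples, tutorials |
| `assets/` | Files that appear in the final output (templates, icons, boilerplate code, etc.) | Explanatory documents |
<Warning>
**In principle, everything goes in `SKILL.md`**; split content into the resource directories only when it truly does not fit.
Do not add files such as `README.md`, `CHANGELOG.md`, or `INSTALLATION_GUIDE.md` to a skill; put all of that in `SKILL.md`. The resource directories hold only scripts that are actually run and assets that are actually used.
</Warning>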
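## Installing external skills
An installed skill ends up in the `<workspace>/skills/<name>/` directory.
| Source | How to install |
| --- | --- |
| URL (single file) | Fetch directly with curl / web_fetch |
| URL (zip archive) | Download and extract |
| Local SKILL.md | Read directly |
| Local zip archive | Extract |
Installation steps:
1. Locate `SKILL.md` (it may be in the package root or in a subdirectory)
2. Read `name` from the frontmatter
3. Copy the **entire skill directory** (including `SKILL.md`, `scripts/`, `assets/`, etc.) to `<workspace>/skills/<name>/`
4. If the package ships an install script such as `INSTALL.md`, run through it, but the final result must still land under `<workspace>/skills/<name>/`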
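Steps 1 to 3 can be sketched for an already-extracted package as follows. This is an illustrative outline, not the skill's real installer: `read_skill_name` and `install_skill` are hypothetical names, and the frontmatter is read with a simple line scan rather than a YAML parser, so it only handles a plain top-level `name:` line.

```python
import re
import shutil
from pathlib import Path


def read_skill_name(skill_md_text):
    """Extract the top-level `name` field from SKILL.md frontmatter."""
    lines = skill_md_text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md has no frontmatter")
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        match = re.match(r"^name:\s*(.+?)\s*$", line)
        if match:
            return match.group(1).strip("\"'")
    raise ValueError("frontmatter has no name field")


def install_skill(package_dir, workspace):
    """Copy the directory containing SKILL.md to <workspace>/skills/<name>/."""
    package_dir = Path(package_dir)
    # Step 1: SKILL.md may sit in the root or a subdirectory; prefer the shallowest.
    candidates = sorted(package_dir.rglob("SKILL.md"), key=lambda p: len(p.parts))
    if not candidates:
        raise FileNotFoundError(f"no SKILL.md under {package_dir}")
    skill_md = candidates[0]
    # Step 2: the target directory name comes from the frontmatter.
    name = read_skill_name(skill_md.read_text(encoding="utf-8"))
    # Step 3: copy the whole skill directory, including scripts/ and assets/.
    target = Path(workspace) / "skills" / name
    shutil.copytree(skill_md.parent, target, dirs_exist_ok=True)
    return target
```

`dirs_exist_ok=True` (Python 3.8+) lets the same call serve as an update of an already-installed skill; whether stale files should be removed first is a design choice the source does not specify.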
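## Creating a skill from scratch
The recommended order is:
1. **Understand the need**: ask the user for a few concrete usage scenarios, and do not ask too much at once
2. **Plan the structure**: does the skill need scripts? Reference documents? Template assets?
3. **Generate the skeleton** with the init script:
```bash
scripts/init_skill.py <skill-name> --path <workspace>/skills [--resources scripts,references,assets] [--examples]
```
4. **Fill in the content**: write SKILL.md and add the scripts and resources. Always actually run a script once it is written
5. **Validate the format** (optional):
```bash
scripts/quick_validate.py <workspace>/skills/<skill-name>
```
6. **Iterate**: keep improving based on feedback from real use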
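## Naming rules
- Use only lowercase letters, digits, and hyphens. Normalize names supplied by the user, for example `Plan Mode` → `plan-mode`
- Keep the length within 64 characters
- Keep it short, start with a verb, and make the purpose obvious at a glance
- Prefix with the tool name when necessary, for example `gh-address-comments` or `linear-address-issue`
- The directory name and the `name` field must match exactly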
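The mechanical parts of these rules (character set, length, and the directory match) are easy to express in code. A minimal sketch with hypothetical helper names; how the real scripts normalize names is not shown in the source, so collapsing every run of other characters into one hyphen is an assumption.

```python
import re

MAX_NAME_LENGTH = 64
NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def normalize_skill_name(raw):
    """Turn a user-supplied name such as 'Plan Mode' into 'plan-mode'."""
    name = raw.strip().lower()
    # Assumption: any run of characters outside a-z / 0-9 becomes one hyphen.
    name = re.sub(r"[^a-z0-9]+", "-", name).strip("-")
    if not name:
        raise ValueError("name has no usable characters")
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"name is longer than {MAX_NAME_LENGTH} characters")
    return name


def is_valid_skill_name(name, directory_name):
    """Check the character set, the length limit, and the directory match."""
    return (
        NAME_RE.fullmatch(name) is not None
        and len(name) <= MAX_NAME_LENGTH
        and name == directory_name
    )
```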
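## Three-level loading
Skills are not loaded into the context all at once; they load on demand in three levels:
1. **Metadata** (`name` + `description`): always in context, about 100 words. The Agent uses it to decide whether to use the skill
2. **SKILL.md body**: loaded only once the skill is selected; keep it within 500 lines
3. **Resource files**: read only when the Agent needs them
If a skill covers several variants (for example deployment across multiple cloud vendors), organize it like this:
```
cloud-deploy/
├── SKILL.md # main flow and vendor selection logic
└── references/
    ├── aws.md
    ├── gcp.md
    └── azure.md
```
When the user picks AWS, the Agent only needs to read `aws.md` instead of loading all three vendors' documents.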
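The three levels can be modeled as three read paths that touch the disk only when called. The class below is a toy illustration under that assumption; `LazySkill` and its methods are hypothetical and do not reflect how the Agent actually manages context.

```python
from pathlib import Path


class LazySkill:
    """Toy model of three-level loading for one skill directory."""

    def __init__(self, skill_dir, name, description):
        self.skill_dir = Path(skill_dir)
        self.name = name
        self.description = description
        self.loaded = []  # relative paths of files actually read so far

    def metadata(self):
        """Level 1: always available, no file access needed."""
        return f"{self.name}: {self.description}"

    def body(self):
        """Level 2: read SKILL.md only once the skill is chosen."""
        return self._read("SKILL.md")

    def reference(self, variant):
        """Level 3: read a single reference file, e.g. references/aws.md."""
        return self._read(f"references/{variant}.md")

    def _read(self, relative_path):
        self.loaded.append(relative_path)
        return (self.skill_dir / relative_path).read_text(encoding="utf-8")
```

With the `cloud-deploy/` layout, calling `reference("aws")` reads only `references/aws.md`, leaving `gcp.md` and `azure.md` untouched.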
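## Common design patterns
**Step-based**: list the operations in numbered steps with their scripts.
```markdown
1. Analyze the form structure (run analyze_form.py)
2. Generate the field mapping (edit fields.json)
3. Fill in the form automatically (run fill_form.py)
```
**Branching**: follow different flows depending on the user's intent.
```markdown
1. Determine the operation type:
   **Creating new content?** → follow the "creation flow"
   **Editing existing content?** → follow the "editing flow"
```
**Template-based**: when the output format has strict requirements, put a sample directly in SKILL.md for the Agent to follow.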
