feat(voice): rework TTS/ASR stack and unify tool/skill config schema

2026-07-18 20:17:09 +08:00 · 2026-05-21 16:00:54 +08:00
parent 2b90f377e6
commit b8333e351c
31 changed files with 1551 additions and 335 deletions
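
Migration note (illustrative only): the doc hunks below rename the top-level `config.json` keys `tool` to `tools` and `skill` to `skills`. This excerpt does not show whether the runtime still accepts the legacy singular keys. For anyone with an existing `config.json`, here is a minimal sketch of a one-off rename. The helper name `migrate_config`, the `LEGACY_KEYS` mapping, and the merge policy are assumptions for illustration, not code from this commit.

```python
import copy

# Assumed mapping from the legacy singular keys to the unified plural keys,
# taken from the renames visible in the doc hunks of this commit.
LEGACY_KEYS = {"tool": "tools", "skill": "skills"}


def migrate_config(config: dict) -> dict:
    """Return a copy of `config` with legacy top-level keys renamed.

    Hypothetical helper: if both the legacy and the new key are present,
    entries under the new key win and legacy entries only fill in names
    that are missing (a shallow, per-tool / per-skill merge).
    """
    migrated = copy.deepcopy(config)
    for old, new in LEGACY_KEYS.items():
        if old not in migrated:
            continue
        legacy = migrated.pop(old)
        current = migrated.get(new)
        if isinstance(legacy, dict) and isinstance(current, dict):
            merged = dict(legacy)
            merged.update(current)  # new-style entries take precedence
            migrated[new] = merged
        elif new not in migrated:
            migrated[new] = legacy
        # Otherwise the new key already holds a value that cannot be
        # merged, so the legacy value is dropped.
    return migrated


if __name__ == "__main__":
    old = {
        "tool": {"vision": {"model": "ernie-4.5-turbo-vl"}},
        "skill": {"image-generation": {"model": "seedream-5.0-lite"}},
    }
    print(migrate_config(old))
```

The function returns a new dict and leaves its input untouched, so it can be run as a dry run before writing anything back to disk.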
--- a/docs/en/models/qianfan.mdx
+++ b/docs/en/models/qianfan.mdx
@@ -40,7 +40,7 @@ To force a specific Vision model, set it explicitly in `config.json`:

 ```json
 {
-  "tool": {
+  "tools": {
    "vision": {
      "model": "ernie-4.5-turbo-vl"
    }
--- a/docs/en/releases/v2.0.7.mdx
+++ b/docs/en/releases/v2.0.7.mdx
@@ -11,7 +11,7 @@ New built-in `image-generation` skill supporting text-to-image, image-to-image,
 - **Zero model selection**: Just configure an API key and it works — no need to manually specify a model. You can also name a specific model in conversation (e.g. "draw a cat with seedream")
 - **Flexible control**: Supports `quality`, `size` (512/1K–4K), and `aspect_ratio` parameters, with each provider automatically mapping to its supported values
 - **Image editing**: Pass existing images for editing, style transfer, or multi-image fusion (Seedream supports up to 14 reference images)
- **Skill-level config**: Pin a default model via `skill.image-generation.model` in `config.json`
+- **Skill-level config**: Pin a default model via `skills.image-generation.model` in `config.json`
 - **Image lightbox**: All images in the Web console now support click-to-enlarge preview

 Docs: [Image Generation Skill](https://docs.cowagent.ai/en/skills/image-generation)
--- a/docs/en/releases/v2.0.8.mdx
+++ b/docs/en/releases/v2.0.8.mdx
@@ -51,7 +51,7 @@ The voice and streaming building blocks come from a community contribution #2791

 ## 🔧 Tools and Safety

- **Vision model selection**: `tool.vision.model` config now actually takes effect, with automatic fallback when unconfigured #2792
+- **Vision model selection**: `tools.vision.model` config now actually takes effect, with automatic fallback when unconfigured #2792
 - **Bash safety prompt**: The destructive-deletion confirm prompt is now scoped to paths outside the workspace — routine in-workspace operations are no longer interrupted

 ## 🐛 Other Fixes
--- a/docs/en/skills/image-generation.mdx
+++ b/docs/en/skills/image-generation.mdx
@@ -87,7 +87,7 @@ Configure ARK_API_KEY as xxx
 To force all image generation through a specific provider's model, add this to `config.json`:

 ```json
-"skill": {
+"skills": {
  "image-generation": {
    "model": "seedream-5.0-lite"
  }
--- a/docs/en/tools/vision.mdx
+++ b/docs/en/tools/vision.mdx
@@ -51,7 +51,7 @@ To specify a particular model for the vision tool, add to `config.json`:

 ```json
 {
-    "tool": {
+    "tools": {
        "vision": {
            "model": "ernie-4.5-turbo-vl"
        }
--- a/docs/ja/models/qianfan.mdx
+++ b/docs/ja/models/qianfan.mdx
@@ -40,7 +40,7 @@ description: Baidu Qianfan ERNIE モデル設定

 ```json
 {
-  "tool": {
+  "tools": {
    "vision": {
      "model": "ernie-4.5-turbo-vl"
    }
--- a/docs/ja/releases/v2.0.7.mdx
+++ b/docs/ja/releases/v2.0.7.mdx
@@ -11,7 +11,7 @@ description: CowAgent 2.0.7 - 画像生成スキル（6プロバイダー自動
 - **モデル選択不要**：API Key を設定するだけで使用可能、モデルを手動で指定する必要なし。会話で特定モデルを指名することも可能（例：「seedream で猫を描いて」）
 - **柔軟な制御**：`quality`（画質）、`size`（解像度、512/1K〜4K）、`aspect_ratio`（アスペクト比）パラメータ対応、各プロバイダーが自動的に有効な値にマッピング
 - **画像編集**：既存の画像を渡して編集・スタイル変換・複数画像融合が可能（Seedream は最大 14 枚の参照画像をサポート）
- **スキルレベル設定**：`config.json` の `skill.image-generation.model` でデフォルトモデルを固定可能
+- **スキルレベル設定**：`config.json` の `skills.image-generation.model` でデフォルトモデルを固定可能
 - **画像ライトボックス**：Web コンソールのすべての画像がクリックで拡大プレビュー対応

 ドキュメント：[画像生成スキル](https://docs.cowagent.ai/ja/skills/image-generation)
--- a/docs/ja/releases/v2.0.8.mdx
+++ b/docs/ja/releases/v2.0.8.mdx
@@ -51,7 +51,7 @@ description: CowAgent 2.0.8 - 飛書チャネル全面アップグレード（

 ## 🔧 ツールと安全性

- **Vision モデル選択**：`tool.vision.model` 設定が実際に反映されるようになり、未設定時は自動フォールバック #2792
+- **Vision モデル選択**：`tools.vision.model` 設定が実際に反映されるようになり、未設定時は自動フォールバック #2792
 - **Bash セーフティ確認**：破壊的削除の確認プロンプトをワークスペース外のパスに限定。ワークスペース内の通常操作は中断されません

 ## 🐛 その他の修正
--- a/docs/ja/skills/image-generation.mdx
+++ b/docs/ja/skills/image-generation.mdx
@@ -87,7 +87,7 @@ ARK_API_KEY を xxx に設定して
 すべての画像生成を特定のプロバイダーのモデルで固定したい場合、`config.json` に以下を追加：

 ```json
-"skill": {
+"skills": {
  "image-generation": {
    "model": "seedream-5.0-lite"
  }
--- a/docs/ja/tools/vision.mdx
+++ b/docs/ja/tools/vision.mdx
@@ -51,7 +51,7 @@ Vision ツールで使用するモデルを指定するには、`config.json`

 ```json
 {
-    "tool": {
+    "tools": {
        "vision": {
            "model": "ernie-4.5-turbo-vl"
        }
--- a/docs/models/qianfan.mdx
+++ b/docs/models/qianfan.mdx
@@ -40,7 +40,7 @@ description: 百度千帆 ERNIE 模型配置

 ```json
 {
-  "tool": {
+  "tools": {
    "vision": {
      "model": "ernie-4.5-turbo-vl"
    }
--- a/docs/releases/v2.0.7.mdx
+++ b/docs/releases/v2.0.7.mdx
@@ -11,7 +11,7 @@ description: CowAgent 2.0.7 - 图像生成技能（六厂商自动路由）、
 - **开箱即用**：配置 API Key 即可使用，无需手动指定模型。也支持在对话中指定特定模型
 - **灵活控制**：支持 `quality`（画质）、`size`（分辨率，512/1K~4K）、`aspect_ratio`（宽高比）等参数，各厂商自动适配有效值
 - **图片编辑**：传入已有图片即可进行编辑、风格迁移、多图融合
- **Skill 级配置**：支持通过 `config.json` 中的 `skill.image-generation.model` 固定默认模型
+- **Skill 级配置**：支持通过 `config.json` 中的 `skills.image-generation.model` 固定默认模型

 相关文档：[图像生成技能](https://docs.cowagent.ai/skills/image-generation)

--- a/docs/releases/v2.0.8.mdx
+++ b/docs/releases/v2.0.8.mdx
@@ -46,7 +46,7 @@ description: CowAgent 2.0.8 - 飞书渠道全面升级（语音、流式打字

 ## 🔧 工具与安全

- **图像识别模型**：让 `tool.vision.model` 配置真正生效，未配置时自动 fallback #2792 Thanks CNXudiandian
+- **图像识别模型**：让 `tools.vision.model` 配置真正生效，未配置时自动 fallback #2792 Thanks CNXudiandian
 - **Bash 安全确认**：仅对工作区外的破坏性删除做二次确认，工作区内常规操作不再打扰

 ## 🐛 其他修复
--- a/docs/skills/image-generation.mdx
+++ b/docs/skills/image-generation.mdx
@@ -88,7 +88,7 @@ description: 文生图 / 图生图 / 多图融合，支持多家厂商自动路
 如果想让所有图像生成固定走某个厂商的模型，可以在 `config.json` 里加：

 ```json
-"skill": {
+"skills": {
  "image-generation": {
    "model": "seedream-5.0-lite"
  }
--- a/docs/tools/vision.mdx
+++ b/docs/tools/vision.mdx
@@ -40,7 +40,7 @@ Vision 工具采用多级自动选择 + 自动兜底策略，无需手动配置

 ```json
 {
-    "tool": {
+    "tools": {
        "vision": {
            "model": "gpt-4.1"
        }