feat: add skills and upgrade feishu/dingtalk channel

2026-07-17 11:07:11 +08:00 · 2026-02-02 00:42:39 +08:00
parent 77c2bfcc1e
commit a8d5309c90
32 changed files with 2931 additions and 200 deletions
--- a/skills/linkai-agent/README.md
+++ b/skills/linkai-agent/README.md
@@ -0,0 +1,297 @@
+# LinkAI Agent Skill
+
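+This skill lets you call multiple apps and workflows on the LinkAI platform, so you can integrate several agent capabilities through a simple configuration file.
+
+## Features
+
+- ✅ **Multi-app support** - Manage multiple LinkAI apps/workflows in a single config file
+- ✅ **Dynamic loading** - The app list is read from `config.json` automatically when the skill system loads the skill
+- ✅ **Automatic skill description** - Every configured app is added to the skill description automatically
+- ✅ **Model switching** - A different model can be specified for each request
+- ✅ **Knowledge base integration** - Supports the knowledge bases bound to an app
+- ✅ **Plugin capabilities** - Supports the plugins enabled for an app
+- ✅ **Workflow execution** - Runs complex multi-step workflows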
+
+## Quick Start
+
+### 1. Configure the API Key
+
+```bash
+env_config(action="set", key="LINKAI_API_KEY", value="your-linkai-api-key")
+```
+
+Get an API key at: https://link-ai.tech/console/interface
+
+### 2. Configure the App List
+
+Copy `config.json.template` to `config.json`:
+
+```bash
+cp config.json.template config.json
+```
+
+Edit `config.json` and add your apps/workflows:
+
+```json
+{
+  "apps": [
+    {
+      "app_code": "G7z6vKwp",
+      "app_name": "General Assistant",
+      "app_description": "General-purpose AI assistant that can answer all kinds of questions"
+    },
+    {
+      "app_code": "your_kb_app",
+      "app_name": "Product Docs Assistant",
+      "app_description": "Q&A assistant backed by a product documentation knowledge base"
+    },
+    {
+      "app_code": "your_workflow",
+      "app_name": "Data Analysis Workflow",
+      "app_description": "End-to-end workflow for data cleaning, analysis, and visualization"
+    }
+  ]
+}
+```
+
+**Note:** After you modify `config.json`, the agent reads the new configuration the next time it loads skills.
+
+### 3. Call an App
+
+```bash
+bash scripts/call.sh "G7z6vKwp" "What is artificial intelligence?"
+```
+
+## Usage Examples
+
+### Basic Call
+
+```bash
+# Use the app's default model
+bash scripts/call.sh "G7z6vKwp" "Explain quantum computing"
+```
+
+### Specify a Model
+
+```bash
+# Use the GPT-4.1 model
+bash scripts/call.sh "G7z6vKwp" "Write an article about AI" "LinkAI-4.1"
+
+# Use a DeepSeek model
+bash scripts/call.sh "G7z6vKwp" "Help me write some code" "deepseek-chat"
+
+# Use a Claude model
+bash scripts/call.sh "G7z6vKwp" "Analyze this text" "claude-4-sonnet"
+```
+
+### Call a Workflow
+
+```bash
+# The workflow runs its nodes in the configured order
+bash scripts/call.sh "workflow_code" "input data or question"
+```
+
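+## ⚠️ Important Notes
+
+### Timeout Configuration
+
+LinkAI apps (especially video/image generation and complex workflows) can take a long time to respond.
+
+**Built-in script timeout**:
+- Default: 120 seconds (suitable for most scenarios)
+- Override it with the 5th argument: `bash scripts/call.sh <app_code> <question> "" "false" "180"`
+
+**Recommended timeouts**:
+- **Text Q&A**: 120 seconds (default)
+- **Image generation**: 120-180 seconds
+- **Video generation**: 180-300 seconds
+
+The agent sets a suitable timeout automatically when it calls the script.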
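
The sketch below picks the 5th argument from the task type, using the upper end of each recommended range above. `pick_timeout` is a hypothetical helper and is not part of `call.sh`.

```bash
# Hypothetical helper (not part of call.sh): map a task type to the upper end
# of the recommended timeout range for that type.
pick_timeout() {
    case "${1:-text}" in
        video) echo 300 ;;
        image) echo 180 ;;
        *)     echo 120 ;;  # text Q&A and anything else: the script default
    esac
}

# Usage (not executed here): the 3rd argument "" keeps the app's default model.
# bash scripts/call.sh "workflow_code" "Generate a sunset video" "" "false" "$(pick_timeout video)"
```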
+
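+## Configuration Reference
+
+### config.json Fields
+
+| Field | Type | Description |
+|------|------|------|
+| `app_code` | string | Unique identifier of the app or workflow, obtained from the LinkAI console |
+| `app_name` | string | App name, shown in the skill description |
+| `app_description` | string | What the app does; helps the agent decide when to use it |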
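
A quick way to sanity-check `config.json` against the fields above is to count how often each required field appears. The `check_config` function below is a hypothetical sketch. It assumes one `"key": "value"` pair per line, as in `config.json.template`, and it does not actually parse JSON, so a JSON-aware tool is the safer choice for unusual layouts.

```bash
# Hypothetical sanity check for config.json: every app needs app_code, app_name,
# and app_description, so the three fields should appear equally often.
# Assumes one "key": value pair per line (grep -c counts matching lines).
check_config() {
    local config="$1" key count expected=""
    if [ ! -f "$config" ]; then
        echo "not found: $config (copy config.json.template first)" >&2
        return 1
    fi
    for key in app_code app_name app_description; do
        # grep -c prints 0 (and exits 1) when nothing matches; `|| true` keeps the 0
        count=$(grep -c "\"$key\"[[:space:]]*:" "$config" || true)
        if [ -z "$expected" ]; then
            expected="$count"
        elif [ "$count" != "$expected" ]; then
            echo "$key appears $count times but app_code appears $expected times" >&2
            return 1
        fi
    done
    if [ "$expected" -eq 0 ]; then
        echo "no apps configured in $config" >&2
        return 1
    fi
    echo "$expected apps configured"
}
```

For the three-app example in Quick Start, `check_config config.json` would print `3 apps configured`.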
+
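+### Getting the app_code
+
+1. Log in to the [LinkAI console](https://link-ai.tech/console)
+2. Open **App Management** (应用管理) or **Workflow Management** (工作流管理)
+3. Select the app/workflow you want to integrate
+4. Find the `app_code` on the app's detail page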
+
+## Supported Models
+
+LinkAI supports many mainstream AI models:
+
+**OpenAI:**
+- `LinkAI-4.1` - GPT-4.1 (1000K context)
+- `LinkAI-4.1-mini` - GPT-4.1 mini (1000K)
+- `LinkAI-4.1-nano` - GPT-4.1 nano (1000K)
+- `LinkAI-4o` - GPT-4o (128K)
+- `LinkAI-4o-mini` - GPT-4o mini (128K)
+
+**DeepSeek:**
+- `deepseek-chat` - DeepSeek-V3 chat model (64K)
+- `deepseek-reasoner` - DeepSeek-R1 reasoning model (64K)
+
+**Claude:**
+- `claude-4-sonnet` - Claude 4 Sonnet (200K)
+- `claude-3-7-sonnet` - Claude 3.7 (200K)
+- `claude-3-5-sonnet` - Claude 3.5 (200K)
+
+**Google:**
+- `gemini-2.5-pro` - Gemini 2.5 Pro (1000K)
+- `gemini-2.0-flash` - Gemini 2.0 Flash (1000K)
+
+**Chinese models:**
+- `qwen3` - Qwen3 (Tongyi Qianwen 3) (128K)
+- `wenxin-4.5` - ERNIE Bot 4.5 (Wenxin Yiyan) (8K)
+- `doubao-1.5-pro-256k` - Doubao 1.5 (256K)
+- `glm-4-plus` - Zhipu GLM-4-Plus (4K)
+
+Full model list: https://link-ai.tech/console/models
+
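+## App Types
+
+### 1. Standard Apps
+
+Standard chat apps configured with a system prompt and parameters. They can:
+- Define a role and persona
+- Bind knowledge bases
+- Enable plugins (image recognition, web search, code execution, etc.)
+
+### 2. Knowledge Base Apps
+
+Q&A apps built on a specific knowledge base, suited for:
+- Internal enterprise knowledge bases
+- Product documentation Q&A
+- Customer support
+
+### 3. Workflows
+
+Multi-step automated pipelines that can:
+- Chain multiple processing nodes
+- Branch on conditions
+- Loop over items
+- Call external APIs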
+
+## Response Format
+
+### Success Response
+
+```json
+{
+  "app_code": "G7z6vKwp",
+  "content": "Artificial intelligence (AI) is a branch of computer science...",
+  "usage": {
+    "prompt_tokens": 10,
+    "completion_tokens": 150,
+    "total_tokens": 160
+  }
+}
+```
+
+### Error Response
+
+```json
+{
+  "error": "LinkAI API error",
+  "message": "应用不存在",
+  "code": "..."
+}
+```
+
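+## Common Errors
+
+### LINKAI_API_KEY environment variable is not set
+**Cause:** The API key is not configured  
+**Fix:** Set `LINKAI_API_KEY` with the `env_config` tool
+
+### 应用不存在 / App not found (402)
+**Cause:** The app_code is wrong or the app has been deleted  
+**Fix:** Check that the app_code is correct and that the app exists
+
+### 无访问权限 / No access permission (403)
+**Cause:** You are trying to access someone else's private app  
+**Fix:** Make sure the app is public or that you are its creator
+
+### 账号积分额度不足 / Insufficient account credits (406)
+**Cause:** Your LinkAI account balance is too low  
+**Fix:** Top up your account in the console
+
+### 内容审核不通过 / Content moderation failed (409)
+**Cause:** The request or response contains sensitive content  
+**Fix:** Rephrase the input to avoid sensitive terms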
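
The error list above can be turned into a rough triage helper. `linkai_hint` is hypothetical. It matches substrings of the `call.sh` output (including the Chinese API messages), so treat its answer as a hint and not as a diagnosis.

```bash
# Hypothetical triage helper: map call.sh output to the remedies listed above.
# It matches substrings only, so treat the result as a hint.
linkai_hint() {
    local out="$1"
    case "$out" in
        *'"error"'*) ;;              # an error object: classify it below
        *) echo "ok"; return 0 ;;    # success output has no "error" key
    esac
    case "$out" in
        *LINKAI_API_KEY*)     echo "set LINKAI_API_KEY with the env_config tool" ;;
        *应用不存在*)          echo "check the app_code and that the app still exists" ;;
        *无访问权限*)          echo "use a public app or one you created" ;;
        *账号积分额度不足*)     echo "top up your LinkAI account" ;;
        *内容*审核*|*[Mm]oderation*|*content*filter*)
                              echo "rephrase the input to avoid sensitive content" ;;
        *)                    echo "unrecognized error: read the message field" ;;
    esac
}
```

Usage: `out=$(bash scripts/call.sh "G7z6vKwp" "What is AI?") || true; linkai_hint "$out"`. The `|| true` is there because `call.sh` exits with status 1 on errors.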
+
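+## Implementation Details
+
+### Automatic Skill Description Generation
+
+When the skill system loads `linkai-agent`, it automatically:
+1. Reads the app list from `config.json`
+2. Appends each app's name and description to the skill description
+3. Exposes the complete app list to the agent when it loads
+
+This special-case handling is implemented in `agent/skills/loader.py`.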
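
The real logic is Python code in `loader.py`. The shell function below is only a hypothetical approximation that reproduces the `Call LinkAI apps/workflows. name(code: description); ...` format. It assumes each app lists `app_code`, `app_name`, and `app_description` in that order, one per line.

```bash
# Hypothetical shell approximation of the description that agent/skills/loader.py
# builds from config.json (the real implementation is Python). Assumes one
# "key": "value" pair per line, ordered app_code, app_name, app_description.
build_description() {
    local config="$1" line code="" name="" joined="" entry
    local -a entries=()
    local re='"(app_code|app_name|app_description)"[[:space:]]*:[[:space:]]*"([^"]*)"'
    while IFS= read -r line || [ -n "$line" ]; do
        if [[ "$line" =~ $re ]]; then
            case "${BASH_REMATCH[1]}" in
                app_code) code="${BASH_REMATCH[2]}" ;;
                app_name) name="${BASH_REMATCH[2]}" ;;
                app_description)
                    # The description closes one app entry: emit "name(code: description)"
                    entries+=("${name}(${code}: ${BASH_REMATCH[2]})")
                    code="" name="" ;;
            esac
        fi
    done < "$config"
    # Join entries with "; " (no leading separator before the first one)
    for entry in "${entries[@]}"; do
        joined+="${joined:+; }${entry}"
    done
    printf 'Call LinkAI apps/workflows. %s\n' "$joined"
}
```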
+
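+### Flow
+
+```
+User configures config.json
+  ↓
+Agent starts / reloads skills
+  ↓
+SkillLoader detects linkai-agent
+  ↓
+Reads config.json dynamically
+  ↓
+Builds a description that includes every app
+  ↓
+Agent sees full information about all available apps
+  ↓
+A user request arrives
+  ↓
+Agent picks a suitable app based on the descriptions
+  ↓
+Calls call.sh <app_code> <question>
+  ↓
+LinkAI API processes the request and returns the result
+```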
+
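+## Best Practices
+
+1. **Clear descriptions** - Write a clear, specific description for each app so the agent understands what it is for
+2. **Clear division of labor** - Give each app its own domain and avoid overlapping functionality
+3. **No restart needed** - After you edit config.json, the agent picks up the change the next time it loads skills
+4. **Model selection** - Choose a model that matches the task's complexity
+5. **Knowledge base tuning** - Bind relevant knowledge bases to domain-specific apps
+
+## Advanced Usage
+
+### Using It in the Agent System
+
+When the agent system loads this skill, it reads the app list from `config.json` and generates a description like this:
+
+```
+Call LinkAI apps/workflows. General Assistant(G7z6vKwp: General-purpose AI assistant that can answer all kinds of questions); Product Docs Assistant(your_kb_app: Q&A assistant backed by a product documentation knowledge base); Data Analysis Workflow(your_workflow: End-to-end workflow for data cleaning, analysis, and visualization)
+```
+
+The agent then picks the most suitable app for each user question.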
+
+## Related Links
+
+- LinkAI platform: https://link-ai.tech
+- API docs: https://docs.link-ai.tech
+- Console: https://link-ai.tech/console
+- Model list: https://link-ai.tech/console/models
+- App marketplace: https://link-ai.tech/square
+
+## License
+
+Part of the chatgpt-on-wechat project.
--- a/skills/linkai-agent/SKILL.md
+++ b/skills/linkai-agent/SKILL.md
@@ -0,0 +1,165 @@
+---
+name: linkai-agent
+description: Call LinkAI applications and workflows. Use bash command to execute like 'bash <base_dir>/scripts/call.sh <app_code> <question>'.
+homepage: https://link-ai.tech
+metadata:
+  emoji: 🤖
+  requires:
+    bins: ["curl"]
+    env: ["LINKAI_API_KEY"]
+  primaryEnv: "LINKAI_API_KEY"
+---
+
+# LinkAI Agent Caller
+
+Call LinkAI applications and workflows through the LinkAI API. Supports multiple apps/workflows configured in `config.json`.
+
+The available apps are loaded dynamically from `config.json` when the skill is loaded.
+
+## Setup
+
+This skill requires a LinkAI API key. If not configured:
+
+1. Get your API key from https://link-ai.tech/console/api-keys
+2. Set the key using: `env_config(action="set", key="LINKAI_API_KEY", value="your-key")`
+
+## Configuration
+
+1. Copy `config.json.template` to `config.json`
+2. Configure your apps/workflows:
+
+```json
+{
+  "apps": [
+    {
+      "app_code": "your_app_code",
+      "app_name": "App Name",
+      "app_description": "What this app does"
+    }
+  ]
+}
+```
+
+3. The skill description will be automatically updated when the agent loads this skill
+
+## Usage
+
+**Important**: Scripts are located relative to this skill's base directory.
+
+When you see this skill in `<available_skills>`, note the `<base_dir>` path.
+
+**CRITICAL**: Always use the `bash` command to execute the script:
+
+```bash
+# General pattern (MUST start with bash):
+bash "<base_dir>/scripts/call.sh" "<app_code>" "<question>" [model] [stream] [timeout]
+
+# DO NOT execute the script directly like this (WRONG):
+# "<base_dir>/scripts/call.sh" ...
+
+# Parameters:
+# - app_code: LinkAI app or workflow code (required)
+# - question: User question (required)
+# - model: Override model (optional, uses app default if not specified)
+# - stream: Enable streaming (true/false, default: false)
+# - timeout: curl timeout in seconds (default: 120; raise it for video/image generation)
+```
+
+**IMPORTANT - Timeout Configuration**:
+- The script has a **default timeout of 120 seconds** (suitable for most cases)
+- For complex tasks (video generation, large workflows), pass a longer timeout as the 5th parameter
+- The bash tool also needs sufficient timeout - set its `timeout` parameter accordingly
+- Example: `bash(command="bash <script> <app_code> <question> '' 'false' 180", timeout=200)`
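
The rule above is that the bash tool's `timeout` must exceed the script's curl `--max-time`, otherwise the tool kills the script before it can print its JSON error. That rule is simple arithmetic. `tool_timeout` is a hypothetical helper, and its 20-second default margin is taken from the 180/200 example.

```bash
# Hypothetical helper: timeout for the outer bash tool, given the script's
# curl --max-time. The default 20-second margin follows the 180/200 example.
tool_timeout() {
    local script_timeout="${1:-120}" margin="${2:-20}"
    echo $(( script_timeout + margin ))
}
```

`tool_timeout 180` prints 200. With no argument it prints 140, which covers the script's default 120 seconds.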
+
+## Examples
+
+### Call an app (default 120s script timeout)
+```bash
+bash(command='bash "<base_dir>/scripts/call.sh" "G7z6vKwp" "What is AI?"', timeout=140)
+```
+
+### Call an app with specific model
+```bash
+bash(command='bash "<base_dir>/scripts/call.sh" "G7z6vKwp" "Explain machine learning" "LinkAI-4.1"', timeout=140)
+```
+
+### Call a workflow with custom timeout (video generation)
+```bash
+# Pass timeout as 5th parameter to script, and set bash timeout slightly longer
+bash(command='bash "<base_dir>/scripts/call.sh" "workflow_code" "Generate a sunset video" "" "false" "180"', timeout=200)
+```
+
+### Call a workflow with the default timeout
+```bash
+bash "<base_dir>/scripts/call.sh" "workflow_code" "Analyze this data: ..."
+```
+
+## Supported Models
+
+You can specify any LinkAI supported model:
+- `LinkAI-4.1` - Latest GPT-4.1 model (1000K context)
+- `LinkAI-4.1-mini` - GPT-4.1 mini (1000K context)
+- `LinkAI-4o` - GPT-4o model (128K context)
+- `LinkAI-4o-mini` - GPT-4o mini (128K context)
+- `deepseek-chat` - DeepSeek-V3 (64K context)
+- `deepseek-reasoner` - DeepSeek-R1 reasoning model
+- `claude-4-sonnet` - Claude 4 Sonnet (200K context)
+- `gemini-2.5-pro` - Gemini 2.5 Pro (1000K context)
+- And many more...
+
+Full model list: https://link-ai.tech/console/models
+
+## Response Format
+
+Success response:
+```json
+{
+  "app_code": "G7z6vKwp",
+  "content": "AI stands for Artificial Intelligence...",
+  "usage": {
+    "prompt_tokens": 10,
+    "completion_tokens": 50,
+    "total_tokens": 60
+  }
+}
+```
+
+Error response:
+```json
+{
+  "error": "Error description",
+  "message": "Detailed error message"
+}
+```
+
+## Features
+
+- ✅ **Multiple Apps**: Configure and call multiple LinkAI apps/workflows
+- ✅ **Dynamic Loading**: Apps are loaded from config.json at runtime
+- ✅ **Model Override**: Optionally specify model per request
+- ✅ **Streaming Support**: Enable streaming output
+- ✅ **Knowledge Base**: Apps can use configured knowledge bases
+- ✅ **Plugins**: Apps can use enabled plugins (image recognition, web search, etc.)
+- ✅ **Workflows**: Execute complex multi-step workflows
+
+## Notes
+
+- Each app/workflow maintains its own configuration (prompt, model, temperature, etc.)
+- Apps can have knowledge bases attached for domain-specific Q&A
+- Workflows execute from start node to end node and return final output
+- Token usage and costs depend on the model used
+- Pricing and account credits are shown in the LinkAI console: https://link-ai.tech/console/funds
+- The skill description is automatically generated from config.json when loaded
+
+## Troubleshooting
+
+**"LINKAI_API_KEY environment variable is not set"**
+- Use env_config tool to set the API key
+
+**"app_code is required"**
+- Make sure you're passing the app_code as the first parameter
+
+**"应用不存在" (App not found)**
+- Check that the app_code is correct
+- Ensure you have access to the app
+
+**"账号积分额度不足" (Insufficient credits)**
+- Top up your LinkAI account credits
--- a/skills/linkai-agent/config.json.template
+++ b/skills/linkai-agent/config.json.template
@@ -0,0 +1,14 @@
+{
+  "apps": [
+    {
+      "app_code": "your_app_code_2",
+      "app_name": "知识库助手",
+      "app_description": "基于特定领域知识库提供智能问答的知识助手"
+    },
+    {
+      "app_code": "your_workflow_code",
+      "app_name": "数据分析工作流",
+      "app_description": "用于数据分析任务的工作流程"
+    }
+  ]
+}
--- a/skills/linkai-agent/scripts/call.sh
+++ b/skills/linkai-agent/scripts/call.sh
@@ -0,0 +1,138 @@
+#!/usr/bin/env bash
+# LinkAI Agent Caller
+# Endpoint: https://api.link-ai.tech/v1/chat/completions (API docs: https://docs.link-ai.tech)
+
+set -euo pipefail
+
+app_code="${1:-}"
+question="${2:-}"
+model="${3:-}"
+stream="${4:-false}"
+timeout="${5:-120}"  # Default 120 seconds for video/image generation
+
+if [ -z "$app_code" ]; then
+    echo '{"error": "app_code is required", "usage": "bash call.sh <app_code> <question> [model] [stream] [timeout]"}'
+    exit 1
+fi
+
+if [ -z "$question" ]; then
+    echo '{"error": "question is required", "usage": "bash call.sh <app_code> <question> [model] [stream] [timeout]"}'
+    exit 1
+fi
+
+if [ -z "${LINKAI_API_KEY:-}" ]; then
+    echo '{"error": "LINKAI_API_KEY environment variable is not set", "help": "Use env_config to set LINKAI_API_KEY"}'
+    exit 1
+fi
+
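+# API endpoint
+api_url="https://api.link-ai.tech/v1/chat/completions"
+
+# Escape a string for embedding inside a JSON string literal
+# (backslashes, double quotes, newlines, carriage returns, tabs).
+json_escape() {
+    local s="$1"
+    s="${s//\\/\\\\}"
+    s="${s//\"/\\\"}"
+    s="${s//$'\n'/\\n}"
+    s="${s//$'\r'/\\r}"
+    s="${s//$'\t'/\\t}"
+    printf '%s' "$s"
+}
+
+# "stream" is sent as a bare JSON boolean, so only accept true/false
+if [ "$stream" != "true" ]; then
+    stream="false"
+fi
+
+# Optional "model" field: only sent when overriding the app's default model
+model_field=""
+if [ -n "$model" ]; then
+    model_field="\"model\": \"$(json_escape "$model")\","
+fi
+
+# Build JSON request body (user input is escaped so quotes/newlines stay valid JSON)
+request_body=$(cat <<EOF
+{
+  "app_code": "$(json_escape "$app_code")",
+  $model_field
+  "messages": [
+    {
+      "role": "user",
+      "content": "$(json_escape "$question")"
+    }
+  ],
+  "stream": $stream
+}
+EOF
+)
+
+# Call LinkAI API. Record curl's exit code explicitly: under `set -e` a bare
+# `response=$(curl ...)` would abort before the check below could report it.
+curl_exit_code=0
+response=$(curl -sS --max-time "$timeout" \
+    -X POST \
+    -H "Authorization: Bearer $LINKAI_API_KEY" \
+    -H "Content-Type: application/json" \
+    -d "$request_body" \
+    "$api_url" 2>&1) || curl_exit_code=$?
+
+if [ "$curl_exit_code" -ne 0 ]; then
+    echo "{\"error\": \"Failed to call LinkAI API\", \"details\": \"$(json_escape "$response")\"}"
+    exit 1
+fi
+
+# Simple JSON validation
+if [[ ! "$response" =~ ^[[:space:]]*[\{\[] ]]; then
+    echo "{\"error\": \"Invalid JSON response from API\", \"response\": \"$(json_escape "$response")\"}"
+    exit 1
+fi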
+
+# Check for API error (top-level error only, not content_filter_result).
+# Here-strings (<<<) avoid the SIGPIPE failures that `echo | grep -q` can hit under pipefail.
+if grep -q '^[[:space:]]*{[[:space:]]*"error"[[:space:]]*:' <<< "$response" || grep -q '"error"[[:space:]]*:[[:space:]]*{[^}]*"code"[[:space:]]*:[[:space:]]*"[^"]*"[^}]*"message"' <<< "$response"; then
+    # Make sure it's not just content_filter_result inside choices
+    if ! grep -q '"choices"[[:space:]]*:[[:space:]]*\[' <<< "$response"; then
+        # Extract error message and code; `|| true` keeps set -e/pipefail from
+        # aborting the script when a field is missing from the response
+        error_msg=$(grep -o '"message"[[:space:]]*:[[:space:]]*"[^"]*"' <<< "$response" | sed 's/"message"[[:space:]]*:[[:space:]]*"\(.*\)"/\1/' | head -1 || true)
+        error_code=$(grep -o '"code"[[:space:]]*:[[:space:]]*"[^"]*"' <<< "$response" | sed 's/"code"[[:space:]]*:[[:space:]]*"\(.*\)"/\1/' | head -1 || true)
+
+        if [ -z "$error_msg" ]; then
+            error_msg="Unknown API error"
+        fi
+
+        # Provide a friendly error message for content-filter rejections
+        if [ "$error_code" = "content_filter_error" ] || grep -qi "content.*filter" <<< "$error_msg"; then
+            echo "{\"error\": \"Content moderation\", \"message\": \"Your question or the app's response triggered LinkAI's content safety review. Rephrase the question or check the app configuration.\", \"details\": \"$error_msg\"}"
+        else
+            echo "{\"error\": \"LinkAI API error\", \"message\": \"$error_msg\", \"code\": \"$error_code\"}"
+        fi
+        exit 1
+    fi
+fi
+
+# For non-stream mode, extract and format the response
+if [ "$stream" = "false" ]; then
+    # Extract the first "content" string value. The pattern accepts escaped
+    # characters (e.g. \" or \n), so the value stays valid, still-escaped JSON.
+    # `|| true` keeps set -e/pipefail from aborting when a field is absent.
+    content=$(grep -oE '"content"[[:space:]]*:[[:space:]]*"([^"\\]|\\.)*"' <<< "$response" | head -1 | sed -E 's/^"content"[[:space:]]*:[[:space:]]*"//; s/"$//' || true)
+
+    # Extract usage information
+    prompt_tokens=$(grep -oE '"prompt_tokens"[[:space:]]*:[[:space:]]*[0-9]+' <<< "$response" | grep -oE '[0-9]+$' | head -1 || true)
+    completion_tokens=$(grep -oE '"completion_tokens"[[:space:]]*:[[:space:]]*[0-9]+' <<< "$response" | grep -oE '[0-9]+$' | head -1 || true)
+    total_tokens=$(grep -oE '"total_tokens"[[:space:]]*:[[:space:]]*[0-9]+' <<< "$response" | grep -oE '[0-9]+$' | head -1 || true)
+
+    if [ -n "$content" ]; then
+        # $content is still JSON-escaped, so it is embedded as-is; unescaping
+        # \n or \" here would produce invalid JSON output
+        cat <<EOF
+{
+  "app_code": "$app_code",
+  "content": "$content",
+  "usage": {
+    "prompt_tokens": ${prompt_tokens:-0},
+    "completion_tokens": ${completion_tokens:-0},
+    "total_tokens": ${total_tokens:-0}
+  }
+}
+EOF
+    else
+        # Return full response if we can't extract content
+        echo "$response"
+    fi
+else
+    # For stream mode, return raw response (caller needs to handle streaming)
+    echo "$response"
+fi
--- a/skills/openai-image-vision/EXAMPLE.md
+++ b/skills/openai-image-vision/EXAMPLE.md
@@ -0,0 +1,168 @@
+# OpenAI Image Vision - Usage Examples
+
+## Setup
+
+Set up your API credentials using the agent's env_config tool:
+
+```bash
+# Set your OpenAI API key
+env_config(action="set", key="OPENAI_API_KEY", value="sk-your-api-key-here")
+
+# Optional: Set custom API base URL (for proxy or compatible services)
+env_config(action="set", key="OPENAI_API_BASE", value="https://api.openai.com/v1")
+```
+
+## Example 1: Analyze a Local Image
+
+```bash
+bash scripts/vision.sh "/path/to/photo.jpg" "What's in this image?"
+```
+
+**Expected Output:**
+```json
+{
+  "model": "gpt-4.1-mini",
+  "content": "The image shows a beautiful landscape with mountains in the background and a lake in the foreground. The sky is clear with some clouds, and there are trees along the shoreline.",
+  "usage": {
+    "prompt_tokens": 1234,
+    "completion_tokens": 45,
+    "total_tokens": 1279
+  }
+}
+```
+
+## Example 2: Analyze an Image from URL
+
+```bash
+bash scripts/vision.sh "https://example.com/image.jpg" "Describe this image in detail"
+```
+
+## Example 3: Extract Text (OCR)
+
+```bash
+bash scripts/vision.sh "document.png" "Extract all text from this image"
+```
+
+**Use Case:** Extract text from screenshots, scanned documents, or photos of text.
+
+## Example 4: Identify Objects
+
+```bash
+bash scripts/vision.sh "scene.jpg" "List all objects you can identify in this image"
+```
+
+## Example 5: Analyze Colors and Composition
+
+```bash
+bash scripts/vision.sh "artwork.jpg" "Describe the color palette and composition of this image"
+```
+
+## Example 6: Count Items
+
+```bash
+bash scripts/vision.sh "crowd.jpg" "How many people are in this image?"
+```
+
+## Example 7: Use Different Models
+
+```bash
+# Use gpt-4.1-mini (default, latest mini model)
+bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4.1-mini"
+
+# Use gpt-4.1 (most capable, best for complex analysis)
+bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4.1"
+
+# Use gpt-4o-mini (previous mini model)
+bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4o-mini"
+```
+
+## Example 8: Complex Analysis
+
+```bash
+bash scripts/vision.sh "product.jpg" "Analyze this product image. Describe the product, its features, colors, and suggest what kind of marketing copy would work well for it."
+```
+
+## Example 9: Safety and Content Moderation
+
+```bash
+bash scripts/vision.sh "content.jpg" "Is there any inappropriate or unsafe content in this image?"
+```
+
+## Example 10: Technical Analysis
+
+```bash
+bash scripts/vision.sh "diagram.png" "Explain what this technical diagram represents and how it works"
+```
+
+## Integration with Agent
+
+When the agent loads this skill, it will be available in the `<available_skills>` section. The agent can use it like:
+
+```bash
+bash "<base_dir>/scripts/vision.sh" "user_uploaded_image.jpg" "What's in this image?"
+```
+
+The `<base_dir>` will be automatically provided by the skill system.
+
+## Error Handling Examples
+
+### Missing API Key
+```bash
+$ bash scripts/vision.sh "image.jpg" "What is this?"
+{"error": "OPENAI_API_KEY environment variable is not set", "help": "Visit https://platform.openai.com/api-keys to get an API key"}
+```
+
+### File Not Found
+```bash
+$ bash scripts/vision.sh "nonexistent.jpg" "What is this?"
+{"error": "Image file not found", "path": "nonexistent.jpg"}
+```
+
+### Unsupported Format
+```bash
+$ bash scripts/vision.sh "file.bmp" "What is this?"
+{"error": "Unsupported image format", "extension": "bmp", "supported": ["jpg", "jpeg", "png", "gif", "webp"]}
+```
+
+### Missing Parameters
+```bash
+$ bash scripts/vision.sh
+{"error": "Image path or URL is required", "usage": "bash vision.sh <image_path_or_url> <question> [model]"}
+```
+
+## Tips for Best Results
+
+1. **Be Specific**: Ask clear, specific questions about what you want to know
+2. **Image Quality**: Higher quality images generally produce better results
+3. **Model Selection**: 
+   - Use `gpt-4.1` for complex analysis requiring highest accuracy
+   - Use `gpt-4.1-mini` (default) for most tasks - latest mini model with good balance
+4. **Text Extraction**: For OCR tasks, ensure text is clearly visible and not too small
+5. **Multiple Aspects**: You can ask about multiple things in one question
+6. **Context**: Provide context in your question if needed (e.g., "This is a medical scan, what do you see?")
+
+## Performance Notes
+
+- **Local Files**: Automatically base64-encoded, adds ~33% size overhead
+- **URLs**: Passed directly to API, no encoding overhead
+- **Timeout**: 60 seconds for API calls
+- **Max Tokens**: 1000 tokens for responses (configurable in script)
+- **Rate Limits**: Subject to your OpenAI API plan
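
The ~33% figure follows from how base64 works: every 3 input bytes become 4 output characters, and the output is padded to a multiple of 4. The small sketch below makes this concrete. The `b64_size` name is illustrative.

```bash
# Length of unwrapped base64 output for an n-byte input: 4 * ceil(n / 3).
# (n + 2) / 3 is integer ceiling division, so no floating point is needed.
b64_size() {
    local n="$1"
    echo $(( (n + 2) / 3 * 4 ))
}
```

For a 1 MiB file, `b64_size 1048576` prints 1398104, about 1.33 times the input size.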
+
+## Supported Image Formats
+
+✅ JPEG (`.jpg`, `.jpeg`)  
+✅ PNG (`.png`)  
+✅ GIF (`.gif`)  
+✅ WebP (`.webp`)  
+
+❌ BMP, TIFF, SVG, and other formats are not supported
+
+## Cost Considerations
+
+Vision API calls cost more than text-only calls because they include image tokens. Costs vary by:
+- Model used (gpt-4.1 vs gpt-4.1-mini)
+- Image size and resolution
+- Length of response
+
+Check OpenAI's pricing page for current rates: https://openai.com/pricing
--- a/skills/openai-image-vision/README.md
+++ b/skills/openai-image-vision/README.md
@@ -0,0 +1,178 @@
+# OpenAI Image Vision Skill
+
+This skill enables image analysis using OpenAI's Vision API (GPT-4 Vision models).
+
+## Features
+
+- ✅ Analyze images from local files or URLs
+- ✅ Support for multiple image formats (JPEG, PNG, GIF, WebP)
+- ✅ Automatic base64 encoding for local files
+- ✅ Direct URL passing for remote images
+- ✅ Configurable model selection
+- ✅ Custom API base URL support
+- ✅ Pure bash/curl implementation (no Python dependencies)
+
+## Quick Start
+
+1. **Set up API credentials using env_config:**
+   ```bash
+   env_config(action="set", key="OPENAI_API_KEY", value="sk-your-api-key-here")
+   # Optional: custom API base
+   env_config(action="set", key="OPENAI_API_BASE", value="https://api.openai.com/v1")
+   ```
+
+2. **Analyze an image:**
+   ```bash
+   bash scripts/vision.sh "/path/to/photo.jpg" "What's in this image?"
+   ```
+
+3. **Analyze from URL:**
+   ```bash
+   bash scripts/vision.sh "https://example.com/image.jpg" "Describe this image"
+   ```
+
+## Usage Examples
+
+### Basic image analysis
+```bash
+bash scripts/vision.sh "photo.jpg" "What objects can you see?"
+```
+
+### Text extraction (OCR)
+```bash
+bash scripts/vision.sh "document.png" "Extract all text from this image"
+```
+
+### Detailed description
+```bash
+bash scripts/vision.sh "scene.jpg" "Describe this scene in detail, including colors, mood, and composition"
+```
+
+### Using different models
+```bash
+# Use gpt-4.1-mini (default, latest mini model)
+bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4.1-mini"
+
+# Use gpt-4.1 (most capable, latest model)
+bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4.1"
+
+# Use gpt-4o-mini (previous mini model)
+bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4o-mini"
+```
+
+## Environment Variables
+
+| Variable | Required | Default | Description |
+|----------|----------|---------|-------------|
+| `OPENAI_API_KEY` | Yes | - | Your OpenAI API key |
+| `OPENAI_API_BASE` | No | `https://api.openai.com/v1` | Custom API base URL |
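
The sketch below shows how the two variables combine into a request URL. The default value and trailing-slash removal mirror the first lines of `vision.sh`. Appending `/chat/completions` is an assumption based on OpenAI's Chat Completions endpoint, because this excerpt does not show where the script sends its request.

```bash
# First two steps mirror vision.sh: default base URL, then strip one trailing slash.
# Appending /chat/completions is an assumption (OpenAI's Chat Completions path);
# this excerpt does not show where vision.sh sends its request.
chat_endpoint() {
    local base="${OPENAI_API_BASE:-https://api.openai.com/v1}"
    base="${base%/}"
    echo "${base}/chat/completions"
}
```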
+
+## Response Format
+
+Success response:
+```json
+{
+  "model": "gpt-4.1-mini",
+  "content": "The image shows a beautiful sunset over mountains...",
+  "usage": {
+    "prompt_tokens": 1234,
+    "completion_tokens": 567,
+    "total_tokens": 1801
+  }
+}
+```
+
+Error response:
+```json
+{
+  "error": "Error description",
+  "details": "Additional information"
+}
+```
+
+## Supported Models
+
+- `gpt-4.1-mini` (default) - Latest mini model, fast and cost-effective
+- `gpt-4.1` - Latest GPT-4 variant, most capable
+- `gpt-4o-mini` - Previous generation mini model
+- `gpt-4-turbo` - Previous generation turbo model
+
+## Supported Image Formats
+
+- JPEG (`.jpg`, `.jpeg`)
+- PNG (`.png`)
+- GIF (`.gif`)
+- WebP (`.webp`)
+
+## Technical Details
+
+- **Implementation**: Pure bash script using curl and base64
+- **Timeout**: 60 seconds for API calls
+- **Max tokens**: 1000 tokens for responses
+- **Image handling**: 
+  - Local files are automatically base64-encoded
+  - URLs are passed directly to the API
+  - MIME types are auto-detected from file extensions
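
Here is a sketch of the local-file path. It assumes the script sends inline images as `data:<mime>;base64,<payload>` URLs, the form OpenAI's vision guide documents for base64 images. The part of `vision.sh` that builds this request is not shown here. `mime_for` mirrors the extension `case` in the script, and `to_data_url` is a hypothetical helper.

```bash
# Map a file extension to a MIME type (case-insensitive), as vision.sh does.
mime_for() {
    local ext
    ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
    case "$ext" in
        jpg|jpeg) echo "image/jpeg" ;;
        png)      echo "image/png" ;;
        gif)      echo "image/gif" ;;
        webp)     echo "image/webp" ;;
        *)        return 1 ;;  # unsupported format
    esac
}

# Hypothetical helper: build a data URL for a local image file.
to_data_url() {
    local file="$1" mime
    mime=$(mime_for "$file") || return 1
    # GNU base64 wraps lines at 76 columns; strip newlines so the URL is one token.
    printf 'data:%s;base64,%s' "$mime" "$(base64 < "$file" | tr -d '\n')"
}
```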
+
+## Error Handling
+
+The script handles various error cases:
+- Missing required parameters
+- Missing API key
+- File not found
+- Unsupported image formats
+- API errors
+- Network timeouts
+- Invalid JSON responses
+
+## Integration with Agent System
+
+When loaded by the agent system, this skill will appear in `<available_skills>` with a `<base_dir>` path. Use it like:
+
+```bash
+bash "<base_dir>/scripts/vision.sh" "image.jpg" "What's in this image?"
+```
+
+The agent will automatically:
+- Load environment variables from `~/.cow/.env`
+- Provide the correct `<base_dir>` path
+- Handle skill discovery and registration
+
+## Notes
+
+- Images are sent to OpenAI's servers for processing
+- Large images may be automatically resized by the API
+- Rate limits depend on your OpenAI API plan
+- Token usage includes both the image and text in the prompt
+- Base64 encoding increases the size of local images by ~33%
+
+## Troubleshooting
+
+**"OPENAI_API_KEY environment variable is not set"**
+- Set the key with the agent's env_config tool
+- Or export `OPENAI_API_KEY` in the shell that runs the script
+
+**"Image file not found"**
+- Check the file path is correct
+- Use absolute paths or paths relative to current directory
+
+**"Unsupported image format"**
+- Only JPEG, PNG, GIF, and WebP are supported
+- Check the file extension matches the actual format
+
+**"Failed to call OpenAI API"**
+- Check your internet connection
+- Verify the API key is valid
+- Check if custom API base URL is correct
+
+## License
+
+Part of the chatgpt-on-wechat project.
--- a/skills/openai-image-vision/SKILL.md
+++ b/skills/openai-image-vision/SKILL.md
@@ -0,0 +1,119 @@
+---
+name: openai-image-vision
+description: Analyze images using OpenAI's Vision API. Use bash command to execute the vision script like 'bash <base_dir>/scripts/vision.sh <image> <question>'. Can understand image content, objects, text, colors, and answer questions about images.
+homepage: https://platform.openai.com/docs/guides/vision
+metadata:
+  emoji: 👁️
+  requires:
+    bins: ["curl", "base64"]
+    env: ["OPENAI_API_KEY"]
+  primaryEnv: "OPENAI_API_KEY"
+---
+
+# OpenAI Image Vision
+
+Analyze images using OpenAI's GPT-4 Vision API. The model can understand visual elements including objects, shapes, colors, textures, and text within images.
+
+## Setup
+
+This skill requires an OpenAI API key. If not configured:
+
+1. Get your API key from https://platform.openai.com/api-keys
+2. Set the key using: `env_config(action="set", key="OPENAI_API_KEY", value="your-key")`
+
+Optional: Set custom API base URL (default: https://api.openai.com/v1):
+```bash
+env_config(action="set", key="OPENAI_API_BASE", value="your-base-url")
+```
+
+## Usage
+
+**Important**: Scripts are located relative to this skill's base directory.
+
+When you see this skill in `<available_skills>`, note the `<base_dir>` path.
+
+**CRITICAL**: Always use the `bash` command to execute the script:
+
+```bash
+# General pattern (MUST start with bash):
+bash "<base_dir>/scripts/vision.sh" "<image_path_or_url>" "<question>" [model]
+
+# DO NOT execute the script directly like this (WRONG):
+# "<base_dir>/scripts/vision.sh" ...
+
+# Parameters:
+# - image_path_or_url: Local image file path or HTTP(S) URL (required)
+# - question: Question to ask about the image (required)
+# - model: OpenAI model to use (default: gpt-4.1-mini)
+#   Options: gpt-4.1-mini, gpt-4.1, gpt-4o-mini, gpt-4-turbo
+```
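
`vision.sh` decides between a URL and a local file with a single regular-expression test. The hypothetical helper below isolates that test. It is case-sensitive, so `HTTPS://...` would be treated as a local path.

```bash
# Same test as vision.sh: inputs starting with http:// or https:// are sent as
# remote URLs; anything else is treated as a local file path.
is_remote_image() {
    [[ "$1" =~ ^https?:// ]]
}
```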
+
+## Examples
+
+### Analyze a local image
+```bash
+bash "<base_dir>/scripts/vision.sh" "/path/to/image.jpg" "What's in this image?"
+```
+
+### Analyze an image from URL
+```bash
+bash "<base_dir>/scripts/vision.sh" "https://example.com/image.jpg" "Describe this image in detail"
+```
+
+### Use specific model
+```bash
+bash "<base_dir>/scripts/vision.sh" "/path/to/photo.png" "What colors are prominent?" "gpt-4o-mini"
+```
+
+### Extract text from image
+```bash
+bash "<base_dir>/scripts/vision.sh" "/path/to/document.jpg" "Extract all text from this image"
+```
+
+### Analyze multiple aspects
+```bash
+bash "<base_dir>/scripts/vision.sh" "image.jpg" "List all objects you can see and describe the overall scene"
+```
+
+## Supported Image Formats
+
+- JPEG (.jpg, .jpeg)
+- PNG (.png)
+- GIF (.gif)
+- WebP (.webp)
+
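+**Performance Optimization**: Local files larger than 1MB are resized to at most 800px on the longest side (with `sips` on macOS or ImageMagick `convert` elsewhere, if available) to avoid the shell's argument-length limit. If neither tool is installed, the original file is used unchanged. Downscaling can lose fine detail, such as small text in OCR tasks.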
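
The size threshold can be checked on its own. The hypothetical `needs_compress` below mirrors the script's `wc -c` comparison against 1048576 bytes. The script then does the actual resize with `sips -Z 800` or ImageMagick `convert -resize 800x800\>`.

```bash
# Hypothetical helper mirroring vision.sh's size check: true (exit 0) when the
# file is larger than 1 MiB. tr strips the leading spaces BSD wc prints.
needs_compress() {
    local size
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$size" -gt 1048576 ]
}
```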
+
+## Response Format
+
+The script returns a JSON response:
+
+```json
+{
+  "model": "gpt-4.1-mini",
+  "content": "The image shows...",
+  "usage": {
+    "prompt_tokens": 1234,
+    "completion_tokens": 567,
+    "total_tokens": 1801
+  }
+}
+```
+
+Or in case of error:
+
+```json
+{
+  "error": "Error description",
+  "details": "Additional error information"
+}
+```
+
+## Notes
+
+- **Image size**: Local files over 1MB are resized to 800px (longest side) when `sips` or ImageMagick is available; the API may also downscale large images
+- **Timeout**: 60 seconds for API calls
+- **Rate limits**: Subject to your OpenAI API plan limits
+- **Privacy**: Images are sent to OpenAI's servers for processing
+- **Local files**: Automatically converted to base64 for API submission
+- **URLs**: Can be passed directly to the API without downloading
--- a/skills/openai-image-vision/scripts/vision.sh
+++ b/skills/openai-image-vision/scripts/vision.sh
@@ -0,0 +1,233 @@
+#!/usr/bin/env bash
+# OpenAI Vision API wrapper
+# API Docs: https://platform.openai.com/docs/guides/vision
+
+set -euo pipefail
+
+image_input="${1:-}"
+question="${2:-}"
+model="${3:-gpt-4.1-mini}"
+
+if [ -z "$image_input" ]; then
+    echo '{"error": "Image path or URL is required", "usage": "bash vision.sh <image_path_or_url> <question> [model]"}'
+    exit 1
+fi
+
+if [ -z "$question" ]; then
+    echo '{"error": "Question is required", "usage": "bash vision.sh <image_path_or_url> <question> [model]"}'
+    exit 1
+fi
+
+if [ -z "${OPENAI_API_KEY:-}" ]; then
+    echo '{"error": "OPENAI_API_KEY environment variable is not set", "help": "Visit https://platform.openai.com/api-keys to get an API key"}'
+    exit 1
+fi
+
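+# Set API base URL (default to OpenAI's official endpoint)
+api_base="${OPENAI_API_BASE:-https://api.openai.com/v1}"
+# Remove trailing slash if present
+api_base="${api_base%/}"
+
+# Escape the question once for the JSON request bodies built below; raw
+# backslashes, double quotes, or newlines would otherwise produce invalid JSON
+question="${question//\\/\\\\}"
+question="${question//\"/\\\"}"
+question="${question//$'\n'/\\n}"
+question="${question//$'\r'/\\r}"
+question="${question//$'\t'/\\t}"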
+
+# Determine if input is a URL or local file
+if [[ "$image_input" =~ ^https?:// ]]; then
+    # It's a URL - use it directly
+    image_url="$image_input"
+    
+    # Build JSON request body with URL
+    request_body=$(cat <<EOF
+{
+  "model": "$model",
+  "messages": [
+    {
+      "role": "user",
+      "content": [
+        {
+          "type": "text",
+          "text": "$question"
+        },
+        {
+          "type": "image_url",
+          "image_url": {
+            "url": "$image_url"
+          }
+        }
+      ]
+    }
+  ],
+  "max_tokens": 1000
+}
+EOF
+)
+else
+    # It's a local file - need to encode as base64
+    if [ ! -f "$image_input" ]; then
+        echo "{\"error\": \"Image file not found\", \"path\": \"$image_input\"}"
+        exit 1
+    fi
+    
+    # Downscale images larger than 1MB: a smaller base64 payload keeps the request small
+    # and avoids "Argument list too long" errors when the body is passed to curl as an argument
+    file_size=$(wc -c < "$image_input" | tr -d ' ')
+    max_size=1048576  # 1MB
+    
+    image_to_encode="$image_input"
+    temp_compressed=""
+    
+    if [ "$file_size" -gt "$max_size" ]; then
+        # File is too large, resize it into a temporary JPEG.
+        # BSD/macOS mktemp only randomizes trailing X's, so the .jpg suffix is added afterwards.
+        temp_compressed=$(mktemp "${TMPDIR:-/tmp}/vision_compressed_XXXXXX")
+        mv "$temp_compressed" "$temp_compressed.jpg"
+        temp_compressed="$temp_compressed.jpg"
+        
+        # Use sips (macOS) or convert (ImageMagick). The tool is tested directly in the `if`
+        # so a failure falls back to the original file instead of aborting under `set -e`.
+        if command -v sips &> /dev/null; then
+            # macOS: resize to max 800px on the longest side and re-encode as JPEG to match the .jpg suffix
+            if sips -s format jpeg -Z 800 "$image_input" --out "$temp_compressed" &> /dev/null; then
+                image_to_encode="$temp_compressed"
+                >&2 echo "[vision.sh] Compressed large image ($((file_size / 1024))KB) to reduce payload size"
+            fi
+        elif command -v convert &> /dev/null; then
+            # Linux: use ImageMagick ([0] keeps only the first frame of animated images)
+            if convert "${image_input}[0]" -resize '800x800>' "$temp_compressed" 2>/dev/null; then
+                image_to_encode="$temp_compressed"
+                >&2 echo "[vision.sh] Compressed large image ($((file_size / 1024))KB) to reduce payload size"
+            fi
+        fi
+    fi
+    
+    # Detect image format from file extension
+    extension="${image_to_encode##*.}"
+    extension_lower=$(echo "$extension" | tr '[:upper:]' '[:lower:]')
+    
+    case "$extension_lower" in
+        jpg|jpeg)
+            mime_type="image/jpeg"
+            ;;
+        png)
+            mime_type="image/png"
+            ;;
+        gif)
+            mime_type="image/gif"
+            ;;
+        webp)
+            mime_type="image/webp"
+            ;;
+        *)
+            echo "{\"error\": \"Unsupported image format\", \"extension\": \"$extension\", \"supported\": [\"jpg\", \"jpeg\", \"png\", \"gif\", \"webp\"]}"
+            # Clean up temp file if exists
+            [ -n "$temp_compressed" ] && rm -f "$temp_compressed"
+            exit 1
+            ;;
+    esac
+    
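+    # Encode image to base64 on a single line: GNU base64 wraps output at 76 columns,
+    # which would put raw newlines inside the JSON string
+    if command -v base64 &> /dev/null; then
+        # Reading from stdin works with both the macOS and GNU coreutils base64
+        base64_image=$(base64 < "$image_to_encode" | tr -d '\n') || base64_image=""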
+    else
+        echo '{"error": "base64 command not found", "help": "Please install base64 utility"}'
+        # Clean up temp file if exists
+        [ -n "$temp_compressed" ] && rm -f "$temp_compressed"
+        exit 1
+    fi
+    
+    # Clean up temp compressed file
+    [ -n "$temp_compressed" ] && rm -f "$temp_compressed"
+    
+    if [ -z "$base64_image" ]; then
+        echo "{\"error\": \"Failed to encode image to base64\", \"path\": \"$image_input\"}"
+        exit 1
+    fi
+    
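+    # Escape the question for a JSON string: backslashes first, then quotes and control characters
+    escaped_question=${question//\\/\\\\}
+    escaped_question=${escaped_question//\"/\\\"}
+    escaped_question=${escaped_question//$'\n'/\\n}
+    escaped_question=${escaped_question//$'\r'/\\r}
+    escaped_question=${escaped_question//$'\t'/\\t}
+    
+    # Build JSON request body with base64 image
+    # (base64 output only uses [A-Za-z0-9+/=], so it needs no JSON escaping)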
+    request_body=$(cat <<EOF
+{
+  "model": "$model",
+  "messages": [
+    {
+      "role": "user",
+      "content": [
+        {
+          "type": "text",
+          "text": "$escaped_question"
+        },
+        {
+          "type": "image_url",
+          "image_url": {
+            "url": "data:$mime_type;base64,$base64_image"
+          }
+        }
+      ]
+    }
+  ],
+  "max_tokens": 1000
+}
+EOF
+)
+fi
+
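+# Call OpenAI API. The body is piped through stdin (--data-binary @-) so a large base64
+# payload never hits the command-line argument limit. `|| curl_exit_code=$?` keeps
+# `set -e` from aborting the script before the failure can be reported below.
+curl_exit_code=0
+response=$(printf '%s' "$request_body" | curl -sS --max-time 60 \
+    -X POST \
+    -H "Authorization: Bearer $OPENAI_API_KEY" \
+    -H "Content-Type: application/json" \
+    --data-binary @- \
+    "$api_base/chat/completions" 2>&1) || curl_exit_code=$?
+
+if [ $curl_exit_code -ne 0 ]; then
+    # Flatten and escape the curl error text so the output stays valid JSON
+    details=$(printf '%s' "$response" | tr '\n\r\t' '   ' | sed 's/\\/\\\\/g; s/"/\\"/g')
+    echo "{\"error\": \"Failed to call OpenAI API\", \"details\": \"$details\"}"
+    exit 1
+fi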
+
+# Simple JSON validation - check if response starts with { or [
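+if [[ ! "$response" =~ ^[[:space:]]*[\{\[] ]]; then
+    # The body may be HTML or plain text: flatten and escape it before embedding it in JSON
+    raw_response=$(printf '%s' "$response" | tr '\n\r\t' '   ' | sed 's/\\/\\\\/g; s/"/\\"/g')
+    echo "{\"error\": \"Invalid JSON response from API\", \"response\": \"$raw_response\"}"
+    exit 1
+fi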
+
+# Check for API error (look for an "error" object in the response)
+if grep -q '"error"[[:space:]]*:[[:space:]]*{' <<< "$response"; then
+    # Extract error message if possible (`|| true`: a missing message must not trip `set -e`/`pipefail`)
+    error_msg=$(echo "$response" | grep -o '"message"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/"message"[[:space:]]*:[[:space:]]*"\(.*\)"/\1/' | head -1 || true)
+    if [ -z "$error_msg" ]; then
+        error_msg="Unknown API error"
+    fi
+    echo "{\"error\": \"OpenAI API error\", \"message\": \"$error_msg\", \"response\": $response}"
+    exit 1
+fi
+
+# Extract the content from the response (choices[0].message.content).
+# The regex accepts escaped characters such as \" inside the string, and `|| true` keeps
+# `set -e`/`pipefail` from aborting when a field is absent (e.g. "content": null).
+content=$(echo "$response" | grep -oE '"content"[[:space:]]*:[[:space:]]*"([^"\\]|\\.)*"' | head -1 | sed -E 's/^"content"[[:space:]]*:[[:space:]]*"//; s/"$//' || true)
+
+# Extract usage information
+prompt_tokens=$(echo "$response" | grep -o '"prompt_tokens"[[:space:]]*:[[:space:]]*[0-9]*' | grep -o '[0-9]*' | head -1 || true)
+completion_tokens=$(echo "$response" | grep -o '"completion_tokens"[[:space:]]*:[[:space:]]*[0-9]*' | grep -o '[0-9]*' | head -1 || true)
+total_tokens=$(echo "$response" | grep -o '"total_tokens"[[:space:]]*:[[:space:]]*[0-9]*' | grep -o '[0-9]*' | head -1 || true)
+
+# Build simplified response
+if [ -n "$content" ]; then
+    # $content is still JSON-escaped (\n, \", ...), so it is embedded verbatim;
+    # unescaping it here would put raw newlines and quotes into the JSON output
+    
+    cat <<EOF
+{
+  "model": "$model",
+  "content": "$content",
+  "usage": {
+    "prompt_tokens": ${prompt_tokens:-0},
+    "completion_tokens": ${completion_tokens:-0},
+    "total_tokens": ${total_tokens:-0}
+  }
+}
+EOF
+else
+    # If we can't extract content, return the full response
+    echo "$response"
+fi