feat: add skills and upgrade feishu/dingtalk channel

zhayujie
2026-02-02 00:42:39 +08:00
parent 77c2bfcc1e
commit a8d5309c90
32 changed files with 2931 additions and 200 deletions


@@ -0,0 +1,297 @@
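# LinkAI Agent Skill
This skill lets you call multiple apps and workflows on the LinkAI platform; a simple configuration file is enough to integrate the capabilities of several agents.
## Features
- **Multi-app support** - manage multiple LinkAI apps/workflows in a single configuration file
- **Dynamic loading** - the skill system reads the app list from `config.json` automatically when it loads
- **Automatic skill description** - every configured app is added to the skill description automatically
- **Model switching** - a different model can be specified for each request
- **Knowledge base integration** - supports the knowledge bases bound to an app
- **Plugin capabilities** - supports the plugins enabled for an app
- **Workflow execution** - supports running complex multi-step workflows
## Quick Start
### 1. Configure the API key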
```bash
env_config(action="set", key="LINKAI_API_KEY", value="your-linkai-api-key")
```
Get an API key: https://link-ai.tech/console/interface
### 2. Configure the app list
Copy `config.json.template` to `config.json`:
```bash
cp config.json.template config.json
```
Edit `config.json` and add your apps/workflows:
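```json
{
  "apps": [
    {
      "app_code": "G7z6vKwp",
      "app_name": "General Assistant",
      "app_description": "General-purpose AI assistant that can answer all kinds of questions"
    },
    {
      "app_code": "your_kb_app",
      "app_name": "Product Docs Assistant",
      "app_description": "Q&A assistant backed by the product documentation knowledge base"
    },
    {
      "app_code": "your_workflow",
      "app_name": "Data Analysis Workflow",
      "app_description": "End-to-end workflow that cleans, analyzes, and visualizes data"
    }
  ]
}
```
**Note:** After you modify `config.json`, the Agent reads the new configuration automatically the next time it loads the skill.
### 3. Call an app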
```bash
bash scripts/call.sh "G7z6vKwp" "What is artificial intelligence?"
```
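## Usage Examples
### Basic call
```bash
# Use the app's default model
bash scripts/call.sh "G7z6vKwp" "Explain quantum computing"
```
### Specify a model
```bash
# Use the GPT-4.1 model
bash scripts/call.sh "G7z6vKwp" "Write an article about AI" "LinkAI-4.1"
# Use a DeepSeek model
bash scripts/call.sh "G7z6vKwp" "Help me write some code" "deepseek-chat"
# Use a Claude model
bash scripts/call.sh "G7z6vKwp" "Analyze this text" "claude-4-sonnet"
```
### Call a workflow
```bash
# The workflow runs its nodes in the configured order
bash scripts/call.sh "workflow_code" "Input data or question"
```
## ⚠️ Important Notes
### Timeout configuration
LinkAI apps (especially video/image generation and complex workflows) can take a long time to process.
**Built-in script timeout:**
- Default: 120 seconds (suitable for most scenarios)
- Customizable through the 5th argument: `bash scripts/call.sh <app_code> <question> "" "false" "180"`
**Recommended timeouts:**
- **Text Q&A**: 120 seconds (default)
- **Image generation**: 120-180 seconds
- **Video generation**: 180-300 seconds
The Agent sets a suitable timeout automatically when it calls the script.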
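The recommended values can be captured in a small helper. This is a minimal sketch and not part of `call.sh`: the function names `pick_timeout` and `tool_timeout` are hypothetical, and the 20-second margin for the outer bash tool is an assumption (it only has to exceed the curl timeout).

```bash
# Hypothetical helpers (not part of call.sh).

# pick_timeout TASK -> seconds to pass as the 5th argument of call.sh
pick_timeout() {
  case "$1" in
    image) echo 180 ;;   # upper end of the 120-180 s range
    video) echo 300 ;;   # upper end of the 180-300 s range
    *)     echo 120 ;;   # text Q&A and anything else: the script default
  esac
}

# tool_timeout SCRIPT_TIMEOUT -> timeout for the outer bash tool.
# The 20 s margin is an assumption made for this sketch.
tool_timeout() {
  echo $(( $1 + 20 ))
}

t=$(pick_timeout video)
echo "script timeout: ${t}s, tool timeout: $(tool_timeout "$t")s"
```

Picking the upper end of each range trades a longer worst-case wait for fewer premature timeouts; a caller that prefers fast failure could return the lower bound instead.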
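## Configuration Reference
### config.json fields
| Field | Type | Description |
|------|------|------|
| `app_code` | string | Unique identifier of the app or workflow, obtained from the LinkAI console |
| `app_name` | string | App name, shown in the skill description |
| `app_description` | string | Description of what the app does; helps the Agent decide when to use it |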
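A quick sanity check can catch an entry that is missing one of the three fields. The sketch below is an assumption-laden heuristic, not part of the skill: `validate_config` is a hypothetical name, and it counts key occurrences per line instead of parsing JSON, so it expects a pretty-printed file with one key per line.

```bash
# Hypothetical check (not part of the skill): every app entry should carry
# app_code, app_name, and app_description. Counting keys is a heuristic.
validate_config() {
  [ -f "$1" ] || return 1
  # grep -c prints 0 and exits non-zero when nothing matches, hence `|| true`
  codes=$(grep -c '"app_code"' "$1" || true)
  names=$(grep -c '"app_name"' "$1" || true)
  descs=$(grep -c '"app_description"' "$1" || true)
  [ "$codes" -gt 0 ] && [ "$codes" -eq "$names" ] && [ "$codes" -eq "$descs" ]
}

if validate_config "config.json"; then
  echo "config.json looks complete"
else
  echo "config.json is missing or incomplete"
fi
```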
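### Getting the app_code
1. Log in to the [LinkAI console](https://link-ai.tech/console)
2. Open "App Management" or "Workflow Management"
3. Select the app/workflow you want to integrate
4. Find the `app_code` on the app's detail page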
## Supported Models
LinkAI supports a range of mainstream AI models:
**OpenAI series:**
- `LinkAI-4.1` - GPT-4.1 (1000K context)
- `LinkAI-4.1-mini` - GPT-4.1 mini (1000K)
- `LinkAI-4.1-nano` - GPT-4.1 nano (1000K)
- `LinkAI-4o` - GPT-4o (128K)
- `LinkAI-4o-mini` - GPT-4o mini (128K)
**DeepSeek series:**
- `deepseek-chat` - DeepSeek-V3 chat model (64K)
- `deepseek-reasoner` - DeepSeek-R1 reasoning model (64K)
**Claude series:**
- `claude-4-sonnet` - Claude 4 Sonnet (200K)
- `claude-3-7-sonnet` - Claude 3.7 (200K)
- `claude-3-5-sonnet` - Claude 3.5 (200K)
**Google series:**
- `gemini-2.5-pro` - Gemini 2.5 Pro (1000K)
- `gemini-2.0-flash` - Gemini 2.0 Flash (1000K)
**Chinese models:**
- `qwen3` - Qwen 3 (128K)
- `wenxin-4.5` - ERNIE Bot 4.5 (8K)
- `doubao-1.5-pro-256k` - Doubao 1.5 (256K)
- `glm-4-plus` - Zhipu GLM-4-Plus (4K)
Full model list: https://link-ai.tech/console/models
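## App Types
### 1. Standard apps
Standard chat apps configured with a system prompt and parameters. They can:
- Set a role and personality
- Bind knowledge bases
- Enable plugins (image recognition, web search, code execution, and so on)
### 2. Knowledge base apps
Q&A apps built on a specific knowledge base, suited to:
- Internal enterprise knowledge bases
- Product documentation Q&A
- Customer support
### 3. Workflows
Multi-step automated processes that can:
- Chain multiple processing nodes
- Branch on conditions
- Process in loops
- Call external APIs
## Response Format
### Success response
```json
{
  "app_code": "G7z6vKwp",
  "content": "Artificial intelligence (AI) is a branch of computer science...",
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 150,
    "total_tokens": 160
  }
}
```
### Error response
```json
{
  "error": "LinkAI API error",
  "message": "应用不存在",
  "code": "..."
}
```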
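## Common Errors
### LINKAI_API_KEY environment variable is not set
**Cause:** The API key has not been configured
**Fix:** Set LINKAI_API_KEY with the `env_config` tool
### App not found (`应用不存在`, 402)
**Cause:** The app_code is wrong or the app has been deleted
**Fix:** Check that the app_code is correct and that the app exists
### No access permission (`无访问权限`, 403)
**Cause:** An attempt to access someone else's private app
**Fix:** Make sure the app is public or that you are its creator
### Insufficient account credits (`账号积分额度不足`, 406)
**Cause:** The LinkAI account balance is too low
**Fix:** Top up in the console
### Content moderation failed (`内容审核不通过`, 409)
**Cause:** The request or response contains sensitive content
**Fix:** Revise the input and avoid sensitive terms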
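The error table lends itself to a lookup helper. The following is only an illustrative sketch: `explain_linkai_error` is a hypothetical function, and it assumes the caller already has the numeric code from the list above as a string.

```bash
# Hypothetical helper (not part of call.sh): map the documented status codes
# to a short hint for the user.
explain_linkai_error() {
  case "$1" in
    402) echo "App not found: check the app_code" ;;
    403) echo "No access: the app must be public or owned by you" ;;
    406) echo "Insufficient credits: top up in the console" ;;
    409) echo "Content moderation failed: rephrase the input" ;;
    *)   echo "Unknown error code: $1" ;;
  esac
}

explain_linkai_error 406
```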
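## Technical Implementation
### Automatic skill description generation
When the skill system loads `linkai-agent`, it automatically:
1. Reads the app list from `config.json`
2. Adds each app's name and description to the skill description dynamically
3. Exposes the complete app list to the Agent at load time
This is implemented as special handling in `agent/skills/loader.py`.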
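To make the idea concrete, here is a rough shell sketch of the same transformation. It is not the code in `agent/skills/loader.py` (which is Python and not shown here); `build_description` is a hypothetical name, and the sketch assumes a pretty-printed `config.json` with one `"key": "value"` pair per line and `app_description` as the last field of each app.

```bash
# Sketch only: builds "Call LinkAI apps/workflows. Name(code: description); ..."
# from a pretty-printed config.json. Not the loader's actual implementation.
build_description() {
  awk '
    # value(line): text between the quotes that follow the first colon
    function value(line) {
      sub(/^[^:]*:[ \t]*"/, "", line)
      sub(/",?[ \t]*$/, "", line)
      return line
    }
    /"app_code"/        { code = value($0) }
    /"app_name"/        { name = value($0) }
    /"app_description"/ {
      out = out sep name "(" code ": " value($0) ")"
      sep = "; "
    }
    END { print "Call LinkAI apps/workflows. " out }
  ' "$1"
}

if [ -f config.json ]; then
  build_description config.json
fi
```

A real JSON parser is the safer choice for anything beyond this line-oriented layout; the sketch only illustrates the shape of the generated description.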
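### Workflow
```
1. The user configures config.json
2. The Agent starts / reloads skills
3. SkillLoader detects linkai-agent
4. config.json is read dynamically
5. A description containing every app's description is generated
6. The Agent sees complete information about all available apps
7. A user request arrives
8. The Agent picks a suitable app based on the descriptions
9. call.sh <app_code> <question> is invoked
10. The LinkAI API processes the request and returns the result
```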
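## Best Practices
1. **Clear descriptions** - write a clear, specific description for each app so the Agent understands what it is for
2. **Sensible division of labor** - give each app its own domain and avoid overlapping functionality
3. **No restart needed** - after config.json changes, the Agent picks up the update the next time it loads the skill
4. **Model selection** - choose a model that matches the complexity of the task
5. **Knowledge base tuning** - bind relevant knowledge bases to apps for specialized domains
## Advanced Usage
### Using the skill in the Agent system
When the Agent system loads this skill, it reads the app list from `config.json` and generates the description automatically:
```
Call LinkAI apps/workflows. General Assistant(G7z6vKwp: General-purpose AI assistant that can answer all kinds of questions); Product Docs Assistant(your_kb_app: Q&A assistant backed by the product documentation knowledge base); Data Analysis Workflow(your_workflow: End-to-end workflow that cleans, analyzes, and visualizes data)
```
The Agent automatically picks the most suitable app for the user's question.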
## Related Links
- LinkAI platform: https://link-ai.tech
- API docs: https://docs.link-ai.tech
- Console: https://link-ai.tech/console
- Model list: https://link-ai.tech/console/models
- App marketplace: https://link-ai.tech/square
## License
Part of the chatgpt-on-wechat project.


@@ -0,0 +1,165 @@
---
name: linkai-agent
description: Call LinkAI applications and workflows. Use bash command to execute like 'bash <base_dir>/scripts/call.sh <app_code> <question>'.
homepage: https://link-ai.tech
metadata:
emoji: 🤖
requires:
bins: ["curl"]
env: ["LINKAI_API_KEY"]
primaryEnv: "LINKAI_API_KEY"
---
# LinkAI Agent Caller
Call LinkAI applications and workflows through API. Supports multiple apps/workflows configured in config.json.
The available apps are dynamically loaded from `config.json` at skill loading time.
## Setup
This skill requires a LinkAI API key. If not configured:
1. Get your API key from https://link-ai.tech/console/api-keys
2. Set the key using: `env_config(action="set", key="LINKAI_API_KEY", value="your-key")`
## Configuration
1. Copy `config.json.template` to `config.json`
2. Configure your apps/workflows:
```json
{
"apps": [
{
"app_code": "your_app_code",
"app_name": "App Name",
"app_description": "What this app does"
}
]
}
```
3. The skill description will be automatically updated when the agent loads this skill
## Usage
**Important**: Scripts are located relative to this skill's base directory.
When you see this skill in `<available_skills>`, note the `<base_dir>` path.
**CRITICAL**: Always use `bash` command to execute the script:
```bash
# General pattern (MUST start with bash):
bash "<base_dir>/scripts/call.sh" "<app_code>" "<question>" [model] [stream] [timeout]
# DO NOT execute the script directly like this (WRONG):
# "<base_dir>/scripts/call.sh" ...
# Parameters:
# - app_code: LinkAI app or workflow code (required)
# - question: User question (required)
# - model: Override model (optional, uses app default if not specified)
# - stream: Enable streaming (true/false, default: false)
# - timeout: curl timeout in seconds (default: 120, recommended for video/image generation)
```
**IMPORTANT - Timeout Configuration**:
- The script has a **default timeout of 120 seconds** (suitable for most cases)
- For complex tasks (video generation, large workflows), pass a longer timeout as the 5th parameter
- The bash tool also needs sufficient timeout - set its `timeout` parameter accordingly
- Example: `bash(command="bash <script> <app_code> <question> '' 'false' 180", timeout=200)`
## Examples
### Call an app (script default timeout of 120s)
```bash
bash(command='bash "<base_dir>/scripts/call.sh" "G7z6vKwp" "What is AI?"', timeout=130)
```
### Call an app with specific model
```bash
bash(command='bash "<base_dir>/scripts/call.sh" "G7z6vKwp" "Explain machine learning" "LinkAI-4.1"', timeout=130)
```
### Call a workflow with custom timeout (video generation)
```bash
# Pass timeout as 5th parameter to script, and set bash timeout slightly longer
bash(command='bash "<base_dir>/scripts/call.sh" "workflow_code" "Generate a sunset video" "" "false" "180"', timeout=200)
```
### Call a workflow with the default timeout
```bash
bash "<base_dir>/scripts/call.sh" "workflow_code" "Analyze this data: ..."
```
## Supported Models
You can specify any LinkAI supported model:
- `LinkAI-4.1` - Latest GPT-4.1 model (1000K context)
- `LinkAI-4.1-mini` - GPT-4.1 mini (1000K context)
- `LinkAI-4o` - GPT-4o model (128K context)
- `LinkAI-4o-mini` - GPT-4o mini (128K context)
- `deepseek-chat` - DeepSeek-V3 (64K context)
- `deepseek-reasoner` - DeepSeek-R1 reasoning model
- `claude-4-sonnet` - Claude 4 Sonnet (200K context)
- `gemini-2.5-pro` - Gemini 2.5 Pro (1000K context)
- And many more...
Full model list: https://link-ai.tech/console/models
## Response Format
Success response:
```json
{
"app_code": "G7z6vKwp",
"content": "AI stands for Artificial Intelligence...",
"usage": {
"prompt_tokens": 10,
"completion_tokens": 50,
"total_tokens": 60
}
}
```
Error response:
```json
{
"error": "Error description",
"message": "Detailed error message"
}
```
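A caller can tell the two shapes apart from the exit status and the leading key. The sketch below is illustrative only: `classify_result` is a hypothetical helper, and it assumes errors arrive as single-line JSON whose first key is `"error"`, which is how `call.sh` prints them.

```bash
# Hypothetical helper (not part of the skill): classify a call.sh result.
# $1 = captured stdout, $2 = exit status of call.sh
classify_result() {
  if [ "$2" -ne 0 ]; then
    echo "error"
  elif printf '%s\n' "$1" | grep -q '^[[:space:]]*{[[:space:]]*"error"[[:space:]]*:'; then
    echo "error"
  else
    echo "ok"
  fi
}

classify_result '{"error": "LinkAI API error", "message": "..."}' 1
```

Checking the exit status first keeps the decision robust even if the error body is formatted differently from what the pattern expects.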
## Features
- **Multiple Apps**: Configure and call multiple LinkAI apps/workflows
- **Dynamic Loading**: Apps are loaded from config.json at runtime
- **Model Override**: Optionally specify model per request
- **Streaming Support**: Enable streaming output
- **Knowledge Base**: Apps can use configured knowledge bases
- **Plugins**: Apps can use enabled plugins (image recognition, web search, etc.)
- **Workflows**: Execute complex multi-step workflows
## Notes
- Each app/workflow maintains its own configuration (prompt, model, temperature, etc.)
- Apps can have knowledge bases attached for domain-specific Q&A
- Workflows execute from start node to end node and return final output
- Token usage and costs depend on the model used
- See LinkAI documentation for pricing: https://link-ai.tech/console/funds
- The skill description is automatically generated from config.json when loaded
## Troubleshooting
**"LINKAI_API_KEY environment variable is not set"**
- Use env_config tool to set the API key
**"app_code is required"**
- Make sure you're passing the app_code as the first parameter
**"应用不存在" (App not found)**
- Check that the app_code is correct
- Ensure you have access to the app
**"账号积分额度不足" (Insufficient credits)**
- Top up your LinkAI account credits


@@ -0,0 +1,14 @@
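{
  "apps": [
    {
      "app_code": "your_app_code_2",
      "app_name": "Knowledge Base Assistant",
      "app_description": "Knowledge assistant that answers questions from a domain-specific knowledge base"
    },
    {
      "app_code": "your_workflow_code",
      "app_name": "Data Analysis Workflow",
      "app_description": "Workflow for data analysis tasks"
    }
  ]
}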


@@ -0,0 +1,138 @@
#!/usr/bin/env bash
# LinkAI Agent Caller
# API endpoint: https://api.link-ai.tech/v1/chat/completions
set -euo pipefail
app_code="${1:-}"
question="${2:-}"
model="${3:-}"
stream="${4:-false}"
timeout="${5:-120}" # Default 120 seconds for video/image generation
if [ -z "$app_code" ]; then
echo '{"error": "app_code is required", "usage": "bash call.sh <app_code> <question> [model] [stream] [timeout]"}'
exit 1
fi
if [ -z "$question" ]; then
echo '{"error": "question is required", "usage": "bash call.sh <app_code> <question> [model] [stream] [timeout]"}'
exit 1
fi
if [ -z "${LINKAI_API_KEY:-}" ]; then
echo '{"error": "LINKAI_API_KEY environment variable is not set", "help": "Use env_config to set LINKAI_API_KEY"}'
exit 1
fi
# API endpoint
api_url="https://api.link-ai.tech/v1/chat/completions"
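# Escape a string for embedding in a JSON string literal
# (backslash, double quote, and the common control characters).
json_escape() {
    local s="$1"
    s=${s//\\/\\\\}
    s=${s//\"/\\\"}
    s=${s//$'\n'/\\n}
    s=${s//$'\r'/\\r}
    s=${s//$'\t'/\\t}
    printf '%s' "$s"
}
# "stream" is emitted as a bare JSON literal, so accept only true/false
case "$stream" in
    true|false) ;;
    *) stream="false" ;;
esac
# Build JSON request body; the "model" field is included only when given
model_field=""
if [ -n "$model" ]; then
    model_field="\"model\": \"$(json_escape "$model")\","
fi
request_body=$(cat <<EOF
{
"app_code": "$(json_escape "$app_code")",
$model_field
"messages": [
{
"role": "user",
"content": "$(json_escape "$question")"
}
],
"stream": $stream
}
EOF
)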
# Call LinkAI API
# Capture the exit code explicitly: under `set -e` a failing command
# substitution would terminate the script before the check below runs.
curl_exit_code=0
response=$(curl -sS --max-time "$timeout" \
    -X POST \
    -H "Authorization: Bearer $LINKAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$request_body" \
    "$api_url" 2>&1) || curl_exit_code=$?
if [ "$curl_exit_code" -ne 0 ]; then
    echo "{\"error\": \"Failed to call LinkAI API\", \"details\": \"$response\"}"
    exit 1
fi
# Simple JSON validation
if [[ ! "$response" =~ ^[[:space:]]*[\{\[] ]]; then
echo "{\"error\": \"Invalid JSON response from API\", \"response\": \"$response\"}"
exit 1
fi
# Check for API error (top-level error only, not content_filter_result)
if echo "$response" | grep -q '^[[:space:]]*{[[:space:]]*"error"[[:space:]]*:' || echo "$response" | grep -q '"error"[[:space:]]*:[[:space:]]*{[^}]*"code"[[:space:]]*:[[:space:]]*"[^"]*"[^}]*"message"'; then
# Make sure it's not just content_filter_result inside choices
if ! echo "$response" | grep -q '"choices"[[:space:]]*:[[:space:]]*\['; then
# Extract error message and code.
# `|| true` stops `set -e` with `pipefail` from aborting when a field is absent.
error_msg=$(echo "$response" | grep -o '"message"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/"message"[[:space:]]*:[[:space:]]*"\(.*\)"/\1/' | head -1) || true
error_code=$(echo "$response" | grep -o '"code"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/"code"[[:space:]]*:[[:space:]]*"\(.*\)"/\1/' | head -1) || true
if [ -z "$error_msg" ]; then
error_msg="Unknown API error"
fi
# Provide friendly error message for content filter
if [ "$error_code" = "content_filter_error" ] || echo "$error_msg" | grep -qi "content.*filter"; then
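echo "{\"error\": \"Content safety review\", \"message\": \"Your question or the app's response triggered LinkAI's content safety review. Please rephrase the question or check the app configuration.\", \"details\": \"$error_msg\"}"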
else
echo "{\"error\": \"LinkAI API error\", \"message\": \"$error_msg\", \"code\": \"$error_code\"}"
fi
exit 1
fi
fi
# For non-stream mode, extract and format the response
if [ "$stream" = "false" ]; then
# Extract content from response. The ERE accepts escaped characters (such as \")
# inside the string; `|| true` keeps `pipefail` from aborting when nothing matches.
content=$(echo "$response" | grep -oE '"content"[[:space:]]*:[[:space:]]*"(\\.|[^"\\])*"' | sed 's/"content"[[:space:]]*:[[:space:]]*"\(.*\)"/\1/' | head -1) || true
# Extract usage information
prompt_tokens=$(echo "$response" | grep -o '"prompt_tokens"[[:space:]]*:[[:space:]]*[0-9]*' | grep -o '[0-9]*' | head -1) || true
completion_tokens=$(echo "$response" | grep -o '"completion_tokens"[[:space:]]*:[[:space:]]*[0-9]*' | grep -o '[0-9]*' | head -1) || true
total_tokens=$(echo "$response" | grep -o '"total_tokens"[[:space:]]*:[[:space:]]*[0-9]*' | grep -o '[0-9]*' | head -1) || true
if [ -n "$content" ]; then
# Keep the content JSON-escaped: it is embedded in a JSON string below,
# and unescaping \n or \" here would make the output invalid JSON.
cat <<EOF
{
"app_code": "$app_code",
"content": "$content",
"usage": {
"prompt_tokens": ${prompt_tokens:-0},
"completion_tokens": ${completion_tokens:-0},
"total_tokens": ${total_tokens:-0}
}
}
EOF
else
# Return full response if we can't extract content
echo "$response"
fi
else
# For stream mode, return raw response (caller needs to handle streaming)
echo "$response"
fi


@@ -0,0 +1,168 @@
# OpenAI Image Vision - Usage Examples
## Setup
Set up your API credentials using the agent's env_config tool:
```bash
# Set your OpenAI API key
env_config(action="set", key="OPENAI_API_KEY", value="sk-your-api-key-here")
# Optional: Set custom API base URL (for proxy or compatible services)
env_config(action="set", key="OPENAI_API_BASE", value="https://api.openai.com/v1")
```
## Example 1: Analyze a Local Image
```bash
bash scripts/vision.sh "/path/to/photo.jpg" "What's in this image?"
```
**Expected Output:**
```json
{
"model": "gpt-4.1-mini",
"content": "The image shows a beautiful landscape with mountains in the background and a lake in the foreground. The sky is clear with some clouds, and there are trees along the shoreline.",
"usage": {
"prompt_tokens": 1234,
"completion_tokens": 45,
"total_tokens": 1279
}
}
```
## Example 2: Analyze an Image from URL
```bash
bash scripts/vision.sh "https://example.com/image.jpg" "Describe this image in detail"
```
## Example 3: Extract Text (OCR)
```bash
bash scripts/vision.sh "document.png" "Extract all text from this image"
```
**Use Case:** Extract text from screenshots, scanned documents, or photos of text.
## Example 4: Identify Objects
```bash
bash scripts/vision.sh "scene.jpg" "List all objects you can identify in this image"
```
## Example 5: Analyze Colors and Composition
```bash
bash scripts/vision.sh "artwork.jpg" "Describe the color palette and composition of this image"
```
## Example 6: Count Items
```bash
bash scripts/vision.sh "crowd.jpg" "How many people are in this image?"
```
## Example 7: Use Different Models
```bash
# Use gpt-4.1-mini (default, latest mini model)
bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4.1-mini"
# Use gpt-4.1 (most capable, best for complex analysis)
bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4.1"
# Use gpt-4o-mini (previous mini model)
bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4o-mini"
```
## Example 8: Complex Analysis
```bash
bash scripts/vision.sh "product.jpg" "Analyze this product image. Describe the product, its features, colors, and suggest what kind of marketing copy would work well for it."
```
## Example 9: Safety and Content Moderation
```bash
bash scripts/vision.sh "content.jpg" "Is there any inappropriate or unsafe content in this image?"
```
## Example 10: Technical Analysis
```bash
bash scripts/vision.sh "diagram.png" "Explain what this technical diagram represents and how it works"
```
## Integration with Agent
When the agent loads this skill, it will be available in the `<available_skills>` section. The agent can use it like:
```bash
bash "<base_dir>/scripts/vision.sh" "user_uploaded_image.jpg" "What's in this image?"
```
The `<base_dir>` will be automatically provided by the skill system.
## Error Handling Examples
### Missing API Key
```bash
$ bash scripts/vision.sh "image.jpg" "What is this?"
{"error": "OPENAI_API_KEY environment variable is not set", "help": "Visit https://platform.openai.com/api-keys to get an API key"}
```
### File Not Found
```bash
$ bash scripts/vision.sh "nonexistent.jpg" "What is this?"
{"error": "Image file not found", "path": "nonexistent.jpg"}
```
### Unsupported Format
```bash
$ bash scripts/vision.sh "file.bmp" "What is this?"
{"error": "Unsupported image format", "extension": "bmp", "supported": ["jpg", "jpeg", "png", "gif", "webp"]}
```
### Missing Parameters
```bash
$ bash scripts/vision.sh
{"error": "Image path or URL is required", "usage": "bash vision.sh <image_path_or_url> <question> [model]"}
```
## Tips for Best Results
1. **Be Specific**: Ask clear, specific questions about what you want to know
2. **Image Quality**: Higher quality images generally produce better results
3. **Model Selection**:
- Use `gpt-4.1` for complex analysis requiring highest accuracy
- Use `gpt-4.1-mini` (default) for most tasks - latest mini model with good balance
4. **Text Extraction**: For OCR tasks, ensure text is clearly visible and not too small
5. **Multiple Aspects**: You can ask about multiple things in one question
6. **Context**: Provide context in your question if needed (e.g., "This is a medical scan, what do you see?")
## Performance Notes
- **Local Files**: Automatically base64-encoded, adds ~33% size overhead
- **URLs**: Passed directly to API, no encoding overhead
- **Timeout**: 60 seconds for API calls
- **Max Tokens**: 1000 tokens for responses (configurable in script)
- **Rate Limits**: Subject to your OpenAI API plan
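The ~33% figure follows from base64 mapping every 3 input bytes to 4 output characters. A minimal sketch of the arithmetic, where the function name `base64_len` is an illustrative assumption and the output is taken to be padded base64 without line breaks:

```bash
# base64_len N -> number of characters in the base64 encoding of N bytes,
# computed as 4 * ceil(N / 3) with integer arithmetic.
base64_len() {
  echo $(( ( ($1 + 2) / 3 ) * 4 ))
}

base64_len 1048576   # a 1 MiB image -> prints 1398104
```

So a 1 MiB file grows to roughly 1.33 MiB of request payload before the surrounding JSON is counted.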
## Supported Image Formats
✅ JPEG (`.jpg`, `.jpeg`)
✅ PNG (`.png`)
✅ GIF (`.gif`)
✅ WebP (`.webp`)
❌ BMP, TIFF, SVG, and other formats are not supported
## Cost Considerations
Vision API calls cost more than text-only calls because they include image tokens. Costs vary by:
- Model used (gpt-4.1 vs gpt-4.1-mini)
- Image size and resolution
- Length of response
Check OpenAI's pricing page for current rates: https://openai.com/pricing


@@ -0,0 +1,178 @@
# OpenAI Image Vision Skill
This skill enables image analysis using OpenAI's Vision API (GPT-4 Vision models).
## Features
- ✅ Analyze images from local files or URLs
- ✅ Support for multiple image formats (JPEG, PNG, GIF, WebP)
- ✅ Automatic base64 encoding for local files
- ✅ Direct URL passing for remote images
- ✅ Configurable model selection
- ✅ Custom API base URL support
- ✅ Pure bash/curl implementation (no Python dependencies)
## Quick Start
1. **Set up API credentials using env_config:**
```bash
env_config(action="set", key="OPENAI_API_KEY", value="sk-your-api-key-here")
# Optional: custom API base
env_config(action="set", key="OPENAI_API_BASE", value="https://api.openai.com/v1")
```
2. **Analyze an image:**
```bash
bash scripts/vision.sh "/path/to/photo.jpg" "What's in this image?"
```
3. **Analyze from URL:**
```bash
bash scripts/vision.sh "https://example.com/image.jpg" "Describe this image"
```
## Usage Examples
### Basic image analysis
```bash
bash scripts/vision.sh "photo.jpg" "What objects can you see?"
```
### Text extraction (OCR)
```bash
bash scripts/vision.sh "document.png" "Extract all text from this image"
```
### Detailed description
```bash
bash scripts/vision.sh "scene.jpg" "Describe this scene in detail, including colors, mood, and composition"
```
### Using different models
```bash
# Use gpt-4.1-mini (default, latest mini model)
bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4.1-mini"
# Use gpt-4.1 (most capable, latest model)
bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4.1"
# Use gpt-4o-mini (previous mini model)
bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4o-mini"
```
## Environment Variables
| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `OPENAI_API_KEY` | Yes | - | Your OpenAI API key |
| `OPENAI_API_BASE` | No | `https://api.openai.com/v1` | Custom API base URL |
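The default and the trailing-slash handling for `OPENAI_API_BASE` can be illustrated in isolation. The wrapper function `resolve_api_base` is a hypothetical name introduced for this sketch; the two expansions inside it mirror what `scripts/vision.sh` does.

```bash
# Sketch: resolve the API base the same way vision.sh does.
resolve_api_base() {
  base="${OPENAI_API_BASE:-https://api.openai.com/v1}"   # fall back to the official endpoint
  printf '%s\n' "${base%/}"                              # strip one trailing slash, if present
}

echo "$(resolve_api_base)/chat/completions"
```

Stripping the trailing slash means both `https://host/v1` and `https://host/v1/` produce the same request URL.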
## Response Format
Success response:
```json
{
"model": "gpt-4.1-mini",
"content": "The image shows a beautiful sunset over mountains...",
"usage": {
"prompt_tokens": 1234,
"completion_tokens": 567,
"total_tokens": 1801
}
}
```
Error response:
```json
{
"error": "Error description",
"details": "Additional information"
}
```
## Supported Models
- `gpt-4.1-mini` (default) - Latest mini model, fast and cost-effective
- `gpt-4.1` - Latest GPT-4 variant, most capable
- `gpt-4o-mini` - Previous generation mini model
- `gpt-4-turbo` - Previous generation turbo model
## Supported Image Formats
- JPEG (`.jpg`, `.jpeg`)
- PNG (`.png`)
- GIF (`.gif`)
- WebP (`.webp`)
## Technical Details
- **Implementation**: Pure bash script using curl and base64
- **Timeout**: 60 seconds for API calls
- **Max tokens**: 1000 tokens for responses
- **Image handling**:
- Local files are automatically base64-encoded
- URLs are passed directly to the API
- MIME types are auto-detected from file extensions
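Extension-based MIME detection can be shown as a standalone function. `mime_for` is a hypothetical name used only for this sketch; the mapping itself follows the `case` statement in `scripts/vision.sh`.

```bash
# Sketch: derive the MIME type from the file extension (case-insensitive).
mime_for() {
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    jpg|jpeg) echo "image/jpeg" ;;
    png)      echo "image/png" ;;
    gif)      echo "image/gif" ;;
    webp)     echo "image/webp" ;;
    *)        return 1 ;;   # unsupported, for example bmp or tiff
  esac
}

mime_for "photo.JPG"   # prints image/jpeg
```

Because only the extension is inspected, a file whose name does not match its real format is labelled incorrectly.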
## Error Handling
The script handles various error cases:
- Missing required parameters
- Missing API key
- File not found
- Unsupported image formats
- API errors
- Network timeouts
- Invalid JSON responses
## Integration with Agent System
When loaded by the agent system, this skill will appear in `<available_skills>` with a `<base_dir>` path. Use it like:
```bash
bash "<base_dir>/scripts/vision.sh" "image.jpg" "What's in this image?"
```
The agent will automatically:
- Load environment variables from `~/.cow/.env`
- Provide the correct `<base_dir>` path
- Handle skill discovery and registration
## Notes
- Images are sent to OpenAI's servers for processing
- Large images may be automatically resized by the API
- Rate limits depend on your OpenAI API plan
- Token usage includes both the image and text in the prompt
- Base64 encoding increases the size of local images by ~33%
## Troubleshooting
**"OPENAI_API_KEY environment variable is not set"**
- Set `OPENAI_API_KEY` with the agent's env_config tool
**"Image file not found"**
- Check the file path is correct
- Use absolute paths or paths relative to current directory
**"Unsupported image format"**
- Only JPEG, PNG, GIF, and WebP are supported
- Check the file extension matches the actual format
**"Failed to call OpenAI API"**
- Check your internet connection
- Verify the API key is valid
- Check if custom API base URL is correct
## License
Part of the chatgpt-on-wechat project.


@@ -0,0 +1,119 @@
---
name: openai-image-vision
description: Analyze images using OpenAI's Vision API. Use bash command to execute the vision script like 'bash <base_dir>/scripts/vision.sh <image> <question>'. Can understand image content, objects, text, colors, and answer questions about images.
homepage: https://platform.openai.com/docs/guides/vision
metadata:
emoji: 👁️
requires:
bins: ["curl", "base64"]
env: ["OPENAI_API_KEY"]
primaryEnv: "OPENAI_API_KEY"
---
# OpenAI Image Vision
Analyze images using OpenAI's GPT-4 Vision API. The model can understand visual elements including objects, shapes, colors, textures, and text within images.
## Setup
This skill requires an OpenAI API key. If not configured:
1. Get your API key from https://platform.openai.com/api-keys
2. Set the key using: `env_config(action="set", key="OPENAI_API_KEY", value="your-key")`
Optional: Set custom API base URL (default: https://api.openai.com/v1):
```bash
env_config(action="set", key="OPENAI_API_BASE", value="your-base-url")
```
## Usage
**Important**: Scripts are located relative to this skill's base directory.
When you see this skill in `<available_skills>`, note the `<base_dir>` path.
**CRITICAL**: Always use `bash` command to execute the script:
```bash
# General pattern (MUST start with bash):
bash "<base_dir>/scripts/vision.sh" "<image_path_or_url>" "<question>" [model]
# DO NOT execute the script directly like this (WRONG):
# "<base_dir>/scripts/vision.sh" ...
# Parameters:
# - image_path_or_url: Local image file path or HTTP(S) URL (required)
# - question: Question to ask about the image (required)
# - model: OpenAI model to use (default: gpt-4.1-mini)
# Options: gpt-4.1-mini, gpt-4.1, gpt-4o-mini, gpt-4-turbo
```
## Examples
### Analyze a local image
```bash
bash "<base_dir>/scripts/vision.sh" "/path/to/image.jpg" "What's in this image?"
```
### Analyze an image from URL
```bash
bash "<base_dir>/scripts/vision.sh" "https://example.com/image.jpg" "Describe this image in detail"
```
### Use specific model
```bash
bash "<base_dir>/scripts/vision.sh" "/path/to/photo.png" "What colors are prominent?" "gpt-4o-mini"
```
### Extract text from image
```bash
bash "<base_dir>/scripts/vision.sh" "/path/to/document.jpg" "Extract all text from this image"
```
### Analyze multiple aspects
```bash
bash "<base_dir>/scripts/vision.sh" "image.jpg" "List all objects you can see and describe the overall scene"
```
## Supported Image Formats
- JPEG (.jpg, .jpeg)
- PNG (.png)
- GIF (.gif)
- WebP (.webp)
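**Performance Optimization**: Local files larger than 1MB are resized to at most 800px on the longest side before encoding, provided `sips` (macOS) or ImageMagick's `convert` is installed; otherwise the original file is sent unchanged. Resizing keeps the request small enough to avoid command-line argument limits, but it can lose fine detail such as small text.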
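The size threshold can be checked on its own. In this sketch `needs_compression` is a hypothetical helper name; the 1048576-byte limit is the `max_size` value used in `scripts/vision.sh`.

```bash
# Sketch: succeed when a file is larger than the script's 1MB threshold.
needs_compression() {
  size=$(wc -c < "$1" | tr -d ' ')
  [ "$size" -gt 1048576 ]
}

sample=$(mktemp)
head -c 2000000 /dev/zero > "$sample"   # 2,000,000 bytes of filler data
if needs_compression "$sample"; then
  echo "would compress"
else
  echo "sent as-is"
fi
rm -f "$sample"
```

The comparison is strict, so a file of exactly 1048576 bytes is sent without resizing.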
## Response Format
The script returns a JSON response:
```json
{
"model": "gpt-4.1-mini",
"content": "The image shows...",
"usage": {
"prompt_tokens": 1234,
"completion_tokens": 567,
"total_tokens": 1801
}
}
```
Or in case of error:
```json
{
"error": "Error description",
"details": "Additional error information"
}
```
## Notes
- **Image size**: Local files over 1MB are downscaled to 800px on the longest side when `sips` or ImageMagick `convert` is available
- **Timeout**: 60 seconds for API calls
- **Rate limits**: Subject to your OpenAI API plan limits
- **Privacy**: Images are sent to OpenAI's servers for processing
- **Local files**: Automatically converted to base64 for API submission
- **URLs**: Can be passed directly to the API without downloading


@@ -0,0 +1,233 @@
#!/usr/bin/env bash
# OpenAI Vision API wrapper
# API Docs: https://platform.openai.com/docs/guides/vision
set -euo pipefail
image_input="${1:-}"
question="${2:-}"
model="${3:-gpt-4.1-mini}"
if [ -z "$image_input" ]; then
echo '{"error": "Image path or URL is required", "usage": "bash vision.sh <image_path_or_url> <question> [model]"}'
exit 1
fi
if [ -z "$question" ]; then
echo '{"error": "Question is required", "usage": "bash vision.sh <image_path_or_url> <question> [model]"}'
exit 1
fi
if [ -z "${OPENAI_API_KEY:-}" ]; then
echo '{"error": "OPENAI_API_KEY environment variable is not set", "help": "Visit https://platform.openai.com/api-keys to get an API key"}'
exit 1
fi
# Set API base URL (default to OpenAI's official endpoint)
api_base="${OPENAI_API_BASE:-https://api.openai.com/v1}"
# Remove trailing slash if present
api_base="${api_base%/}"
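# Escape the question for JSON (backslashes first, then double quotes).
# Done before the branch so the URL path gets the same escaping as the file path.
escaped_question=$(printf '%s' "$question" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
# Determine if input is a URL or local file
if [[ "$image_input" =~ ^https?:// ]]; then
# It's a URL - use it directly
image_url="$image_input"
# Build JSON request body with URL
request_body=$(cat <<EOF
{
"model": "$model",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "$escaped_question"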
},
{
"type": "image_url",
"image_url": {
"url": "$image_url"
}
}
]
}
],
"max_tokens": 1000
}
EOF
)
else
# It's a local file - need to encode as base64
if [ ! -f "$image_input" ]; then
echo "{\"error\": \"Image file not found\", \"path\": \"$image_input\"}"
exit 1
fi
# Check file size and compress if needed to avoid "Argument list too long" error
# Files larger than 1MB should be compressed
file_size=$(wc -c < "$image_input" | tr -d ' ')
max_size=1048576 # 1MB
image_to_encode="$image_input"
temp_compressed=""
if [ "$file_size" -gt "$max_size" ]; then
# File is too large, compress it
temp_compressed=$(mktemp "${TMPDIR:-/tmp}/vision_compressed_XXXXXX.jpg")
# Use sips (macOS) or convert (ImageMagick) to compress.
# Each tool runs inside `if` so that a failure falls back to the original
# file instead of aborting the script under `set -e`.
if command -v sips &> /dev/null; then
    # macOS: resize to max 800px on longest side
    if sips -Z 800 "$image_input" --out "$temp_compressed" &> /dev/null; then
        image_to_encode="$temp_compressed"
        >&2 echo "[vision.sh] Compressed large image ($(($file_size / 1024))KB) to avoid parameter limit"
    fi
elif command -v convert &> /dev/null; then
    # Linux: use ImageMagick
    if convert "$image_input" -resize 800x800\> "$temp_compressed" 2>/dev/null; then
        image_to_encode="$temp_compressed"
        >&2 echo "[vision.sh] Compressed large image ($(($file_size / 1024))KB) to avoid parameter limit"
    fi
fi
fi
# Detect image format from file extension
extension="${image_to_encode##*.}"
extension_lower=$(echo "$extension" | tr '[:upper:]' '[:lower:]')
case "$extension_lower" in
jpg|jpeg)
mime_type="image/jpeg"
;;
png)
mime_type="image/png"
;;
gif)
mime_type="image/gif"
;;
webp)
mime_type="image/webp"
;;
*)
echo "{\"error\": \"Unsupported image format\", \"extension\": \"$extension\", \"supported\": [\"jpg\", \"jpeg\", \"png\", \"gif\", \"webp\"]}"
# Clean up temp file if exists
[ -n "$temp_compressed" ] && rm -f "$temp_compressed"
exit 1
;;
esac
# Encode image to base64
if command -v base64 &> /dev/null; then
    # Read from stdin and strip newlines: GNU base64 wraps output at 76 columns,
    # and line breaks inside the data URL would make the JSON body invalid.
    base64_image=$(base64 < "$image_to_encode" 2>/dev/null | tr -d '\n') || base64_image=""
else
    echo '{"error": "base64 command not found", "help": "Please install base64 utility"}'
    # Clean up temp file if exists
    [ -n "$temp_compressed" ] && rm -f "$temp_compressed"
    exit 1
fi
# Clean up temp compressed file
[ -n "$temp_compressed" ] && rm -f "$temp_compressed"
if [ -z "$base64_image" ]; then
echo "{\"error\": \"Failed to encode image to base64\", \"path\": \"$image_input\"}"
exit 1
fi
# Escape question for JSON (backslashes first, then double quotes)
escaped_question=$(printf '%s' "$question" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
# Build JSON request body with base64 image
request_body=$(cat <<EOF
{
"model": "$model",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "$escaped_question"
},
{
"type": "image_url",
"image_url": {
"url": "data:$mime_type;base64,$base64_image"
}
}
]
}
],
"max_tokens": 1000
}
EOF
)
fi
# Call OpenAI API
# Capture the exit code explicitly: under `set -e` a failing command
# substitution would terminate the script before the check below runs.
curl_exit_code=0
response=$(curl -sS --max-time 60 \
    -X POST \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$request_body" \
    "$api_base/chat/completions" 2>&1) || curl_exit_code=$?
if [ "$curl_exit_code" -ne 0 ]; then
    echo "{\"error\": \"Failed to call OpenAI API\", \"details\": \"$response\"}"
    exit 1
fi
# Simple JSON validation - check if response starts with { or [
if [[ ! "$response" =~ ^[[:space:]]*[\{\[] ]]; then
echo "{\"error\": \"Invalid JSON response from API\", \"response\": \"$response\"}"
exit 1
fi
# Check for API error (look for "error" field in response)
if echo "$response" | grep -q '"error"[[:space:]]*:[[:space:]]*{'; then
# Extract error message if possible (`|| true` tolerates a missing field under `pipefail`)
error_msg=$(echo "$response" | grep -o '"message"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/"message"[[:space:]]*:[[:space:]]*"\(.*\)"/\1/' | head -1) || true
if [ -z "$error_msg" ]; then
error_msg="Unknown API error"
fi
echo "{\"error\": \"OpenAI API error\", \"message\": \"$error_msg\", \"response\": $response}"
exit 1
fi
# Extract the content from the response
# The response structure is: choices[0].message.content
# The ERE accepts escaped characters (such as \") inside the string;
# `|| true` keeps `pipefail` from aborting when nothing matches.
content=$(echo "$response" | grep -oE '"content"[[:space:]]*:[[:space:]]*"(\\.|[^"\\])*"' | sed 's/"content"[[:space:]]*:[[:space:]]*"\(.*\)"/\1/' | head -1) || true
# Extract usage information
prompt_tokens=$(echo "$response" | grep -o '"prompt_tokens"[[:space:]]*:[[:space:]]*[0-9]*' | grep -o '[0-9]*' | head -1) || true
completion_tokens=$(echo "$response" | grep -o '"completion_tokens"[[:space:]]*:[[:space:]]*[0-9]*' | grep -o '[0-9]*' | head -1) || true
total_tokens=$(echo "$response" | grep -o '"total_tokens"[[:space:]]*:[[:space:]]*[0-9]*' | grep -o '[0-9]*' | head -1) || true
# Build simplified response
if [ -n "$content" ]; then
# Keep the content JSON-escaped: it is embedded in a JSON string below,
# and unescaping \n or \" here would make the output invalid JSON.
cat <<EOF
{
"model": "$model",
"content": "$content",
"usage": {
"prompt_tokens": ${prompt_tokens:-0},
"completion_tokens": ${completion_tokens:-0},
"total_tokens": ${total_tokens:-0}
}
}
EOF
else
# If we can't extract content, return the full response
echo "$response"
fi