mirror of
https://github.com/zhayujie/chatgpt-on-wechat.git
synced 2026-06-02 00:57:41 +08:00
feat: add skills and upgrade feishu/dingtalk channel
297
skills/linkai-agent/README.md
Normal file
@@ -0,0 +1,297 @@
# LinkAI Agent Skill

This skill lets you call multiple applications (Apps) and workflows (Workflows) on the LinkAI platform, integrating several agent capabilities through a simple configuration file.

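## Features

- ✅ **Multi-app support** - Manage multiple LinkAI apps/workflows in a single configuration file
- ✅ **Dynamic loading** - The app list is read from `config.json` automatically when the skill system loads
- ✅ **Automatic skill description** - Every configured app is added to the skill description automatically
- ✅ **Model switching** - A different model can be specified for each request
- ✅ **Knowledge base integration** - Supports knowledge bases bound to an app
- ✅ **Plugin capabilities** - Supports the plugins enabled for an app
- ✅ **Workflow execution** - Supports running complex multi-step workflows
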
## Quick Start

### 1. Configure the API Key

```bash
env_config(action="set", key="LINKAI_API_KEY", value="your-linkai-api-key")
```

Get an API key: https://link-ai.tech/console/interface

### 2. Configure the App List

Copy `config.json.template` to `config.json`:

```bash
cp config.json.template config.json
```

Edit `config.json` and add your apps/workflows:

```json
{
  "apps": [
    {
      "app_code": "G7z6vKwp",
      "app_name": "General Assistant",
      "app_description": "General-purpose AI assistant that can answer all kinds of questions"
    },
    {
      "app_code": "your_kb_app",
      "app_name": "Product Docs Assistant",
      "app_description": "Q&A assistant backed by the product documentation knowledge base"
    },
    {
      "app_code": "your_workflow",
      "app_name": "Data Analysis Workflow",
      "app_description": "Complete workflow that performs data cleaning, analysis and visualization"
    }
  ]
}
```

**Note:** After you modify `config.json`, the Agent reads the new configuration automatically the next time it loads the skill.

### 3. Call an App

```bash
bash scripts/call.sh "G7z6vKwp" "What is artificial intelligence?"
```

## Usage Examples

### Basic Call

```bash
# Use the app's default model
bash scripts/call.sh "G7z6vKwp" "Explain quantum computing"
```

### Specify a Model

```bash
# Use the GPT-4.1 model
bash scripts/call.sh "G7z6vKwp" "Write an article about AI" "LinkAI-4.1"

# Use a DeepSeek model
bash scripts/call.sh "G7z6vKwp" "Help me write code" "deepseek-chat"

# Use a Claude model
bash scripts/call.sh "G7z6vKwp" "Analyze this text" "claude-4-sonnet"
```

### Call a Workflow

```bash
# The workflow runs its nodes in the configured order
bash scripts/call.sh "workflow_code" "Input data or question"
```

## ⚠️ Important Notes

### Timeout Configuration

LinkAI apps (especially video/image generation and complex workflows) may need a long time to process a request.

**Built-in script timeout**:
- Default: 120 seconds (suitable for most scenarios)
- Can be customized through the 5th parameter: `bash scripts/call.sh <app_code> <question> "" "false" "180"`

**Recommended timeouts**:
- **Text Q&A**: 120 seconds (default)
- **Image generation**: 120-180 seconds
- **Video generation**: 180-300 seconds

The Agent sets a suitable timeout automatically when it makes the call.

## Configuration Reference

### config.json Fields

| Field | Type | Description |
|------|------|------|
| `app_code` | string | Unique identifier of the app or workflow, obtained from the LinkAI console |
| `app_name` | string | App name, shown in the skill description |
| `app_description` | string | Description of what the app does; helps the Agent decide when to use it |

### Getting the app_code

1. Log in to the [LinkAI console](https://link-ai.tech/console)
2. Open "App Management" or "Workflow Management"
3. Select the app/workflow you want to integrate
4. Find the `app_code` on the app details page

## Supported Models

LinkAI supports many mainstream AI models:

**OpenAI series:**
- `LinkAI-4.1` - GPT-4.1 (1000K context)
- `LinkAI-4.1-mini` - GPT-4.1 mini (1000K)
- `LinkAI-4.1-nano` - GPT-4.1 nano (1000K)
- `LinkAI-4o` - GPT-4o (128K)
- `LinkAI-4o-mini` - GPT-4o mini (128K)

**DeepSeek series:**
- `deepseek-chat` - DeepSeek-V3 chat model (64K)
- `deepseek-reasoner` - DeepSeek-R1 reasoning model (64K)

**Claude series:**
- `claude-4-sonnet` - Claude 4 Sonnet (200K)
- `claude-3-7-sonnet` - Claude 3.7 (200K)
- `claude-3-5-sonnet` - Claude 3.5 (200K)

**Google series:**
- `gemini-2.5-pro` - Gemini 2.5 Pro (1000K)
- `gemini-2.0-flash` - Gemini 2.0 Flash (1000K)

**Chinese models:**
- `qwen3` - Tongyi Qianwen 3 (128K)
- `wenxin-4.5` - Wenxin Yiyan (ERNIE Bot) 4.5 (8K)
- `doubao-1.5-pro-256k` - Doubao 1.5 (256K)
- `glm-4-plus` - Zhipu GLM-4-Plus (4K)

Full model list: https://link-ai.tech/console/models

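## App Types

### 1. Standard App

A standard chat app configured with a system prompt and parameters. It can:
- Define a role and personality
- Bind a knowledge base
- Enable plugins (image recognition, web search, code execution, etc.)

### 2. Knowledge Base App

A Q&A app built on a specific knowledge base. It suits:
- Internal company knowledge bases
- Product documentation Q&A
- Customer support

### 3. Workflow

A multi-step automated process. It can:
- Chain multiple processing nodes
- Branch on conditions
- Process items in loops
- Call external APIs
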
## Response Format

### Success Response

```json
{
  "app_code": "G7z6vKwp",
  "content": "Artificial intelligence (AI) is a branch of computer science...",
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 150,
    "total_tokens": 160
  }
}
```

### Error Response

```json
{
  "error": "LinkAI API error",
  "message": "应用不存在",
  "response": { ... }
}
```

## Common Errors

### LINKAI_API_KEY environment variable is not set
**Cause:** The API key has not been configured
**Fix:** Set LINKAI_API_KEY with the `env_config` tool

### App not found (应用不存在, 402)
**Cause:** The app_code is wrong or the app has been deleted
**Fix:** Check that the app_code is correct and that the app exists

### No access permission (无访问权限, 403)
**Cause:** You tried to access someone else's private app
**Fix:** Make sure the app is public or that you are its creator

### Insufficient account credits (账号积分额度不足, 406)
**Cause:** The LinkAI account balance is too low
**Fix:** Top up in the console

### Content moderation failed (内容审核不通过, 409)
**Cause:** The request or the response contains sensitive content
**Fix:** Rephrase the input and avoid sensitive terms

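The error codes listed above can be turned into quick hints on the caller's side. The sketch below is only an illustration: the function name `linkai_error_hint` is hypothetical, and it assumes the caller has already extracted the numeric code from the API response.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a LinkAI error code (as listed in this README)
# to a short troubleshooting hint. Not part of call.sh.
linkai_error_hint() {
  case "$1" in
    402) echo "App not found: check the app_code" ;;
    403) echo "No access: the app must be public or owned by you" ;;
    406) echo "Insufficient credits: top up in the console" ;;
    409) echo "Content moderation failed: rephrase the input" ;;
    *)   echo "Unrecognized error code: $1" ;;
  esac
}
```

For example, `linkai_error_hint 406` prints the top-up hint, and any code not listed in this README falls through to the generic message.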
## Technical Implementation

### Automatic Skill Description Generation

When the skill system loads `linkai-agent`, it automatically:
1. Reads the app list from `config.json`
2. Appends each app's name and description to the skill description
3. Exposes the complete app list to the Agent at load time

This special handling is implemented in `agent/skills/loader.py`.

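The three steps above can be sketched in a few lines of shell. This is not the logic in `agent/skills/loader.py` (which is Python and is not shown here); it is a minimal illustration that assumes each app object lists `app_code`, `app_name` and `app_description` on separate lines in that order, as in the sample `config.json`. The function name `build_description` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: build a skill description string from config.json.
# Line-based parsing is fragile; it only works for the layout shown in the
# sample config (one key per line, app_description last in each object).
build_description() {
  local config_file="$1"
  local re='"(app_code|app_name|app_description)"[[:space:]]*:[[:space:]]*"([^"]*)"'
  local line key value code="" name="" out=""
  while IFS= read -r line || [ -n "$line" ]; do
    if [[ $line =~ $re ]]; then
      key="${BASH_REMATCH[1]}"
      value="${BASH_REMATCH[2]}"
      case "$key" in
        app_code) code="$value" ;;
        app_name) name="$value" ;;
        app_description)
          # Separate entries with "; " once the first one has been added
          out+="${out:+; }${name}(${code}: ${value})"
          ;;
      esac
    fi
  done < "$config_file"
  printf 'Call LinkAI apps/workflows. %s\n' "$out"
}
```

Calling `build_description config.json` prints one line in the `name(app_code: description); ...` shape. A real loader would use a proper JSON parser instead of matching lines.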
### Processing Flow

```
User configures config.json
    ↓
Agent starts / reloads skills
    ↓
SkillLoader detects linkai-agent
    ↓
config.json is read dynamically
    ↓
A description containing every app is generated
    ↓
Agent sees complete information about all available apps
    ↓
A user request arrives
    ↓
Agent picks a suitable app based on the descriptions
    ↓
call.sh <app_code> <question> is invoked
    ↓
The LinkAI API processes the request and returns the result
```

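## Best Practices

1. **Clear descriptions** - Write a clear, specific description for each app so the Agent understands what it is for
2. **Sensible division of work** - Give each app its own domain and avoid overlapping functionality
3. **No restart needed** - After `config.json` changes, the Agent picks up the update the next time it loads the skill
4. **Model selection** - Choose a model that matches the complexity of the task
5. **Knowledge base tuning** - Bind relevant knowledge bases to apps that serve specialized domains
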
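## Advanced Usage

### Using the Skill in the Agent System

When the Agent system loads this skill, it reads the app list from `config.json` and generates a description such as:

```
Call LinkAI apps/workflows. General Assistant(G7z6vKwp: General-purpose AI assistant that can answer all kinds of questions); Product Docs Assistant(your_kb_app: Q&A assistant backed by the product documentation knowledge base); Data Analysis Workflow(your_workflow: Complete workflow that performs data cleaning, analysis and visualization)
```

The Agent then picks the most suitable app for each user question.
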
## Related Links

- LinkAI platform: https://link-ai.tech
- API documentation: https://docs.link-ai.tech
- Console: https://link-ai.tech/console
- Model list: https://link-ai.tech/console/models
- App marketplace: https://link-ai.tech/square

## License

Part of the chatgpt-on-wechat project.
165
skills/linkai-agent/SKILL.md
Normal file
@@ -0,0 +1,165 @@
---
name: linkai-agent
description: Call LinkAI applications and workflows. Use bash command to execute like 'bash <base_dir>/scripts/call.sh <app_code> <question>'.
homepage: https://link-ai.tech
metadata:
  emoji: 🤖
  requires:
    bins: ["curl"]
    env: ["LINKAI_API_KEY"]
  primaryEnv: "LINKAI_API_KEY"
---

# LinkAI Agent Caller

Call LinkAI applications and workflows through API. Supports multiple apps/workflows configured in config.json.

The available apps are dynamically loaded from `config.json` at skill loading time.

## Setup

This skill requires a LinkAI API key. If not configured:

1. Get your API key from https://link-ai.tech/console/api-keys
2. Set the key using: `env_config(action="set", key="LINKAI_API_KEY", value="your-key")`

## Configuration

1. Copy `config.json.template` to `config.json`
2. Configure your apps/workflows:

```json
{
  "apps": [
    {
      "app_code": "your_app_code",
      "app_name": "App Name",
      "app_description": "What this app does"
    }
  ]
}
```

3. The skill description will be automatically updated when the agent loads this skill

## Usage

**Important**: Scripts are located relative to this skill's base directory.

When you see this skill in `<available_skills>`, note the `<base_dir>` path.

**CRITICAL**: Always use `bash` command to execute the script:

```bash
# General pattern (MUST start with bash):
bash "<base_dir>/scripts/call.sh" "<app_code>" "<question>" [model] [stream] [timeout]

# DO NOT execute the script directly like this (WRONG):
# "<base_dir>/scripts/call.sh" ...

# Parameters:
# - app_code: LinkAI app or workflow code (required)
# - question: User question (required)
# - model: Override model (optional, uses app default if not specified)
# - stream: Enable streaming (true/false, default: false)
# - timeout: curl timeout in seconds (default: 120; raise it for video/image generation)
```

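Because `call.sh` places the question directly inside a JSON string, a question that contains double quotes, backslashes or line breaks may produce an invalid request body. One possible workaround, sketched below, is to escape the question before passing it. The helper name `json_escape` is hypothetical and is not part of the skill.

```bash
#!/usr/bin/env bash
# Hypothetical helper: escape a string so it can sit inside a JSON string
# literal. Covers backslash, double quote, newline, tab and carriage return;
# other control characters are not handled in this sketch.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}      # double quote
  s=${s//$'\n'/\\n}    # newline
  s=${s//$'\t'/\\t}    # tab
  s=${s//$'\r'/\\r}    # carriage return
  printf '%s' "$s"
}

# Usage (not executed here):
# bash "<base_dir>/scripts/call.sh" "<app_code>" "$(json_escape "$question")"
```

Escaping on the caller's side leaves the script unchanged. If `jq` happens to be available, building the request body with it would be more robust.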
**IMPORTANT - Timeout Configuration**:
- The script has a **default timeout of 120 seconds** (suitable for most cases)
- For complex tasks (video generation, large workflows), pass a longer timeout as the 5th parameter
- The bash tool also needs sufficient timeout - set its `timeout` parameter accordingly
- Example: `bash(command="bash <script> <app_code> <question> '' 'false' 180", timeout=200)`

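The example above pairs a 180 second script timeout with a 200 second bash tool timeout. A small sketch of that relationship follows; the 20 second margin is an assumption taken from that single example, and `outer_timeout` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: derive the bash tool timeout from the script timeout.
# $1 = script timeout in seconds (default 120, matching call.sh)
# $2 = safety margin in seconds (default 20, an assumed value)
outer_timeout() {
  local script_timeout="${1:-120}"
  local margin="${2:-20}"
  echo $(( script_timeout + margin ))
}
```

With the defaults this yields 140 seconds for a plain call, and `outer_timeout 180` yields 200, matching the example.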
## Examples

### Call an app (uses default 120s timeout)
```bash
bash(command='bash "<base_dir>/scripts/call.sh" "G7z6vKwp" "What is AI?"', timeout=140)
```

### Call an app with specific model
```bash
bash(command='bash "<base_dir>/scripts/call.sh" "G7z6vKwp" "Explain machine learning" "LinkAI-4.1"', timeout=140)
```

### Call a workflow with custom timeout (video generation)
```bash
# Pass timeout as 5th parameter to script, and set bash timeout slightly longer
bash(command='bash "<base_dir>/scripts/call.sh" "workflow_code" "Generate a sunset video" "" "false" "180"', timeout=200)
```

### Call a workflow (default timeout)
```bash
bash "<base_dir>/scripts/call.sh" "workflow_code" "Analyze this data: ..."
```

## Supported Models

You can specify any LinkAI supported model:
- `LinkAI-4.1` - Latest GPT-4.1 model (1000K context)
- `LinkAI-4.1-mini` - GPT-4.1 mini (1000K context)
- `LinkAI-4o` - GPT-4o model (128K context)
- `LinkAI-4o-mini` - GPT-4o mini (128K context)
- `deepseek-chat` - DeepSeek-V3 (64K context)
- `deepseek-reasoner` - DeepSeek-R1 reasoning model
- `claude-4-sonnet` - Claude 4 Sonnet (200K context)
- `gemini-2.5-pro` - Gemini 2.5 Pro (1000K context)
- And many more...

Full model list: https://link-ai.tech/console/models

## Response Format

Success response:
```json
{
  "app_code": "G7z6vKwp",
  "content": "AI stands for Artificial Intelligence...",
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 50,
    "total_tokens": 60
  }
}
```

Error response:
```json
{
  "error": "Error description",
  "message": "Detailed error message"
}
```

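A caller can tell the two shapes apart by looking for a top-level `"error"` key. The sketch below assumes the output starts with `{"error"` in the error case, as in the sample above; `is_error_response` is a hypothetical name. Since `call.sh` exits with status 1 on the errors it detects, checking the exit status is another option.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (status 0) when the given call.sh output
# starts with a JSON object whose first key is "error".
is_error_response() {
  local re='^[[:space:]]*\{[[:space:]]*"error"[[:space:]]*:'
  [[ "$1" =~ $re ]]
}

# Usage (not executed here):
# output=$(bash "<base_dir>/scripts/call.sh" "<app_code>" "<question>") || true
# if is_error_response "$output"; then echo "call failed: $output" >&2; fi
```

This prefix check is only a heuristic: it does not parse the JSON, and it would miss an error object whose first key is something other than `"error"`.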
## Features

- ✅ **Multiple Apps**: Configure and call multiple LinkAI apps/workflows
- ✅ **Dynamic Loading**: Apps are loaded from config.json at runtime
- ✅ **Model Override**: Optionally specify model per request
- ✅ **Streaming Support**: Enable streaming output
- ✅ **Knowledge Base**: Apps can use configured knowledge bases
- ✅ **Plugins**: Apps can use enabled plugins (image recognition, web search, etc.)
- ✅ **Workflows**: Execute complex multi-step workflows

## Notes

- Each app/workflow maintains its own configuration (prompt, model, temperature, etc.)
- Apps can have knowledge bases attached for domain-specific Q&A
- Workflows execute from start node to end node and return final output
- Token usage and costs depend on the model used
- See LinkAI documentation for pricing: https://link-ai.tech/console/funds
- The skill description is automatically generated from config.json when loaded

## Troubleshooting

**"LINKAI_API_KEY environment variable is not set"**
- Use env_config tool to set the API key

**"app_code is required"**
- Make sure you're passing the app_code as the first parameter

**"应用不存在" (App not found)**
- Check that the app_code is correct
- Ensure you have access to the app

**"账号积分额度不足" (Insufficient credits)**
- Top up your LinkAI account credits
14
skills/linkai-agent/config.json.template
Normal file
@@ -0,0 +1,14 @@
{
  "apps": [
    {
      "app_code": "your_app_code_2",
      "app_name": "Knowledge Base Assistant",
      "app_description": "Knowledge assistant that answers questions from a domain-specific knowledge base"
    },
    {
      "app_code": "your_workflow_code",
      "app_name": "Data Analysis Workflow",
      "app_description": "Workflow for data analysis tasks"
    }
  ]
}
138
skills/linkai-agent/scripts/call.sh
Executable file
@@ -0,0 +1,138 @@
#!/usr/bin/env bash
# LinkAI Agent Caller
# API Docs: https://api.link-ai.tech/v1/chat/completions

set -euo pipefail

app_code="${1:-}"
question="${2:-}"
model="${3:-}"
stream="${4:-false}"
timeout="${5:-120}"  # Default 120 seconds for video/image generation

if [ -z "$app_code" ]; then
    echo '{"error": "app_code is required", "usage": "bash call.sh <app_code> <question> [model] [stream] [timeout]"}'
    exit 1
fi

if [ -z "$question" ]; then
    echo '{"error": "question is required", "usage": "bash call.sh <app_code> <question> [model] [stream] [timeout]"}'
    exit 1
fi

if [ -z "${LINKAI_API_KEY:-}" ]; then
    echo '{"error": "LINKAI_API_KEY environment variable is not set", "help": "Use env_config to set LINKAI_API_KEY"}'
    exit 1
fi

# API endpoint
api_url="https://api.link-ai.tech/v1/chat/completions"

# Build JSON request body
if [ -n "$model" ]; then
    request_body=$(cat <<EOF
{
  "app_code": "$app_code",
  "model": "$model",
  "messages": [
    {
      "role": "user",
      "content": "$question"
    }
  ],
  "stream": $stream
}
EOF
)
else
    request_body=$(cat <<EOF
{
  "app_code": "$app_code",
  "messages": [
    {
      "role": "user",
      "content": "$question"
    }
  ],
  "stream": $stream
}
EOF
)
fi

# Call LinkAI API
# Under `set -e`, a failing command substitution would abort the script before
# the exit code could be checked, so capture it with `|| curl_exit_code=$?`.
curl_exit_code=0
response=$(curl -sS --max-time "$timeout" \
    -X POST \
    -H "Authorization: Bearer $LINKAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$request_body" \
    "$api_url" 2>&1) || curl_exit_code=$?

if [ $curl_exit_code -ne 0 ]; then
    echo "{\"error\": \"Failed to call LinkAI API\", \"details\": \"$response\"}"
    exit 1
fi

# Simple JSON validation
if [[ ! "$response" =~ ^[[:space:]]*[\{\[] ]]; then
    echo "{\"error\": \"Invalid JSON response from API\", \"response\": \"$response\"}"
    exit 1
fi

# Check for API error (top-level error only, not content_filter_result)
if echo "$response" | grep -q '^[[:space:]]*{[[:space:]]*"error"[[:space:]]*:' || echo "$response" | grep -q '"error"[[:space:]]*:[[:space:]]*{[^}]*"code"[[:space:]]*:[[:space:]]*"[^"]*"[^}]*"message"'; then
    # Make sure it's not just content_filter_result inside choices
    if ! echo "$response" | grep -q '"choices"[[:space:]]*:[[:space:]]*\['; then
        # Extract error message
        # (`|| true` keeps `set -e` with `pipefail` from aborting when grep finds no match)
        error_msg=$(echo "$response" | grep -o '"message"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/"message"[[:space:]]*:[[:space:]]*"\(.*\)"/\1/' | head -1) || true
        error_code=$(echo "$response" | grep -o '"code"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/"code"[[:space:]]*:[[:space:]]*"\(.*\)"/\1/' | head -1) || true

        if [ -z "$error_msg" ]; then
            error_msg="Unknown API error"
        fi

        # Provide friendly error message for content filter
        if [ "$error_code" = "content_filter_error" ] || echo "$error_msg" | grep -qi "content.*filter"; then
            echo "{\"error\": \"内容安全审核\", \"message\": \"您的问题或应用返回的内容触发了LinkAI的安全审核机制,请换一种方式提问或检查应用配置\", \"details\": \"$error_msg\"}"
        else
            echo "{\"error\": \"LinkAI API error\", \"message\": \"$error_msg\", \"code\": \"$error_code\"}"
        fi
        exit 1
    fi
fi

# For non-stream mode, extract and format the response
if [ "$stream" = "false" ]; then
    # Extract content from response
    # (`|| true` lets the script fall through to the raw-response branch
    # instead of exiting under `set -e` when a field is missing)
    content=$(echo "$response" | grep -o '"content"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/"content"[[:space:]]*:[[:space:]]*"\(.*\)"/\1/' | head -1) || true

    # Extract usage information
    prompt_tokens=$(echo "$response" | grep -o '"prompt_tokens"[[:space:]]*:[[:space:]]*[0-9]*' | grep -o '[0-9]*' | head -1) || true
    completion_tokens=$(echo "$response" | grep -o '"completion_tokens"[[:space:]]*:[[:space:]]*[0-9]*' | grep -o '[0-9]*' | head -1) || true
    total_tokens=$(echo "$response" | grep -o '"total_tokens"[[:space:]]*:[[:space:]]*[0-9]*' | grep -o '[0-9]*' | head -1) || true

    if [ -n "$content" ]; then
        # Unescape JSON content
        content=$(echo "$content" | sed 's/\\n/\n/g' | sed 's/\\"/"/g')

        cat <<EOF
{
  "app_code": "$app_code",
  "content": "$content",
  "usage": {
    "prompt_tokens": ${prompt_tokens:-0},
    "completion_tokens": ${completion_tokens:-0},
    "total_tokens": ${total_tokens:-0}
  }
}
EOF
    else
        # Return full response if we can't extract content
        echo "$response"
    fi
else
    # For stream mode, return raw response (caller needs to handle streaming)
    echo "$response"
fi
168
skills/openai-image-vision/EXAMPLE.md
Normal file
@@ -0,0 +1,168 @@
# OpenAI Image Vision - Usage Examples

## Setup

Set up your API credentials using the agent's env_config tool:

```bash
# Set your OpenAI API key
env_config(action="set", key="OPENAI_API_KEY", value="sk-your-api-key-here")

# Optional: Set custom API base URL (for proxy or compatible services)
env_config(action="set", key="OPENAI_API_BASE", value="https://api.openai.com/v1")
```

## Example 1: Analyze a Local Image

```bash
bash scripts/vision.sh "/path/to/photo.jpg" "What's in this image?"
```

**Expected Output:**
```json
{
  "model": "gpt-4.1-mini",
  "content": "The image shows a beautiful landscape with mountains in the background and a lake in the foreground. The sky is clear with some clouds, and there are trees along the shoreline.",
  "usage": {
    "prompt_tokens": 1234,
    "completion_tokens": 45,
    "total_tokens": 1279
  }
}
```

## Example 2: Analyze an Image from URL

```bash
bash scripts/vision.sh "https://example.com/image.jpg" "Describe this image in detail"
```

## Example 3: Extract Text (OCR)

```bash
bash scripts/vision.sh "document.png" "Extract all text from this image"
```

**Use Case:** Extract text from screenshots, scanned documents, or photos of text.

## Example 4: Identify Objects

```bash
bash scripts/vision.sh "scene.jpg" "List all objects you can identify in this image"
```

## Example 5: Analyze Colors and Composition

```bash
bash scripts/vision.sh "artwork.jpg" "Describe the color palette and composition of this image"
```

## Example 6: Count Items

```bash
bash scripts/vision.sh "crowd.jpg" "How many people are in this image?"
```

## Example 7: Use Different Models

```bash
# Use gpt-4.1-mini (default, latest mini model)
bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4.1-mini"

# Use gpt-4.1 (most capable, best for complex analysis)
bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4.1"

# Use gpt-4o-mini (previous mini model)
bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4o-mini"
```

## Example 8: Complex Analysis

```bash
bash scripts/vision.sh "product.jpg" "Analyze this product image. Describe the product, its features, colors, and suggest what kind of marketing copy would work well for it."
```

## Example 9: Safety and Content Moderation

```bash
bash scripts/vision.sh "content.jpg" "Is there any inappropriate or unsafe content in this image?"
```

## Example 10: Technical Analysis

```bash
bash scripts/vision.sh "diagram.png" "Explain what this technical diagram represents and how it works"
```

## Integration with Agent

When the agent loads this skill, it will be available in the `<available_skills>` section. The agent can use it like:

```bash
bash "<base_dir>/scripts/vision.sh" "user_uploaded_image.jpg" "What's in this image?"
```

The `<base_dir>` will be automatically provided by the skill system.

## Error Handling Examples

### Missing API Key
```bash
$ bash scripts/vision.sh "image.jpg" "What is this?"
{"error": "OPENAI_API_KEY environment variable is not set", "help": "Visit https://platform.openai.com/api-keys to get an API key"}
```

### File Not Found
```bash
$ bash scripts/vision.sh "nonexistent.jpg" "What is this?"
{"error": "Image file not found", "path": "nonexistent.jpg"}
```

### Unsupported Format
```bash
$ bash scripts/vision.sh "file.bmp" "What is this?"
{"error": "Unsupported image format", "extension": "bmp", "supported": ["jpg", "jpeg", "png", "gif", "webp"]}
```

### Missing Parameters
```bash
$ bash scripts/vision.sh
{"error": "Image path or URL is required", "usage": "bash vision.sh <image_path_or_url> <question> [model]"}
```

## Tips for Best Results

1. **Be Specific**: Ask clear, specific questions about what you want to know
2. **Image Quality**: Higher quality images generally produce better results
3. **Model Selection**:
   - Use `gpt-4.1` for complex analysis requiring highest accuracy
   - Use `gpt-4.1-mini` (default) for most tasks - latest mini model with good balance
4. **Text Extraction**: For OCR tasks, ensure text is clearly visible and not too small
5. **Multiple Aspects**: You can ask about multiple things in one question
6. **Context**: Provide context in your question if needed (e.g., "This is a medical scan, what do you see?")

## Performance Notes

- **Local Files**: Automatically base64-encoded, adds ~33% size overhead
- **URLs**: Passed directly to API, no encoding overhead
- **Timeout**: 60 seconds for API calls
- **Max Tokens**: 1000 tokens for responses (configurable in script)
- **Rate Limits**: Subject to your OpenAI API plan

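The ~33% figure follows from how base64 works: every 3 input bytes become 4 output characters, with the last group padded. The sketch below estimates the encoded size. The function name `base64_size` is hypothetical, and the estimate ignores any line breaks an encoder may insert.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate the base64-encoded size of N raw bytes.
# Formula: 4 * ceil(N / 3), computed with integer arithmetic.
base64_size() {
  local n="$1"
  echo $(( (n + 2) / 3 * 4 ))
}
```

For a 1,000,000-byte image, `base64_size 1000000` gives 1333336 characters, roughly one third more than the original file.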
## Supported Image Formats

✅ JPEG (`.jpg`, `.jpeg`)
✅ PNG (`.png`)
✅ GIF (`.gif`)
✅ WebP (`.webp`)

❌ BMP, TIFF, SVG, and other formats are not supported

## Cost Considerations

Vision API calls cost more than text-only calls because they include image tokens. Costs vary by:
- Model used (gpt-4.1 vs gpt-4.1-mini)
- Image size and resolution
- Length of response

Check OpenAI's pricing page for current rates: https://openai.com/pricing
178
skills/openai-image-vision/README.md
Normal file
@@ -0,0 +1,178 @@
# OpenAI Image Vision Skill

This skill enables image analysis using OpenAI's Vision API (GPT-4 Vision models).

## Features

- ✅ Analyze images from local files or URLs
- ✅ Support for multiple image formats (JPEG, PNG, GIF, WebP)
- ✅ Automatic base64 encoding for local files
- ✅ Direct URL passing for remote images
- ✅ Configurable model selection
- ✅ Custom API base URL support
- ✅ Pure bash/curl implementation (no Python dependencies)

## Quick Start

1. **Set up API credentials using env_config:**
   ```bash
   env_config(action="set", key="OPENAI_API_KEY", value="sk-your-api-key-here")
   # Optional: custom API base
   env_config(action="set", key="OPENAI_API_BASE", value="https://api.openai.com/v1")
   ```

2. **Analyze an image:**
   ```bash
   bash scripts/vision.sh "/path/to/photo.jpg" "What's in this image?"
   ```

3. **Analyze from URL:**
   ```bash
   bash scripts/vision.sh "https://example.com/image.jpg" "Describe this image"
   ```

## Usage Examples

### Basic image analysis
```bash
bash scripts/vision.sh "photo.jpg" "What objects can you see?"
```

### Text extraction (OCR)
```bash
bash scripts/vision.sh "document.png" "Extract all text from this image"
```

### Detailed description
```bash
bash scripts/vision.sh "scene.jpg" "Describe this scene in detail, including colors, mood, and composition"
```

### Using different models
```bash
# Use gpt-4.1-mini (default, latest mini model)
bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4.1-mini"

# Use gpt-4.1 (most capable, latest model)
bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4.1"

# Use gpt-4o-mini (previous mini model)
bash scripts/vision.sh "image.jpg" "Analyze this" "gpt-4o-mini"
```

## Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `OPENAI_API_KEY` | Yes | - | Your OpenAI API key |
| `OPENAI_API_BASE` | No | `https://api.openai.com/v1` | Custom API base URL |

## Response Format

Success response:
```json
{
  "model": "gpt-4.1-mini",
  "content": "The image shows a beautiful sunset over mountains...",
  "usage": {
    "prompt_tokens": 1234,
    "completion_tokens": 567,
    "total_tokens": 1801
  }
}
```

Error response:
```json
{
  "error": "Error description",
  "details": "Additional information"
}
```

## Supported Models

- `gpt-4.1-mini` (default) - Latest mini model, fast and cost-effective
- `gpt-4.1` - Latest GPT-4 variant, most capable
- `gpt-4o-mini` - Previous generation mini model
- `gpt-4-turbo` - Previous generation turbo model

## Supported Image Formats

- JPEG (`.jpg`, `.jpeg`)
- PNG (`.png`)
- GIF (`.gif`)
- WebP (`.webp`)

## Technical Details

- **Implementation**: Pure bash script using curl and base64
- **Timeout**: 60 seconds for API calls
- **Max tokens**: 1000 tokens for responses
- **Image handling**:
  - Local files are automatically base64-encoded
  - URLs are passed directly to the API
  - MIME types are auto-detected from file extensions

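Extension-based MIME detection, as described in the last bullet, can be sketched with a `case` statement. `vision.sh` itself is not shown here, so this is only an illustration of the idea: the function name `mime_type_for` is hypothetical and the mapping covers just the four supported formats.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a file name to an image MIME type based on its
# extension (case-insensitive). Returns status 1 for unsupported formats.
mime_type_for() {
  local ext="${1##*.}"                                   # text after the last dot
  ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]') # normalize case
  case "$ext" in
    jpg|jpeg) echo "image/jpeg" ;;
    png)      echo "image/png" ;;
    gif)      echo "image/gif" ;;
    webp)     echo "image/webp" ;;
    *)        return 1 ;;
  esac
}
```

A caller might build a data URL from the result, for example `data:$(mime_type_for photo.jpg);base64,...`. Checking the extension alone does not verify that the file contents match the format.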
## Error Handling

The script handles various error cases:
- Missing required parameters
- Missing API key
- File not found
- Unsupported image formats
- API errors
- Network timeouts
- Invalid JSON responses

## Integration with Agent System

When loaded by the agent system, this skill will appear in `<available_skills>` with a `<base_dir>` path. Use it like:

```bash
bash "<base_dir>/scripts/vision.sh" "image.jpg" "What's in this image?"
```

The agent will automatically:
- Load environment variables from `~/.cow/.env`
- Provide the correct `<base_dir>` path
- Handle skill discovery and registration

## Notes

- Images are sent to OpenAI's servers for processing
- Large images may be automatically resized by the API
- Rate limits depend on your OpenAI API plan
- Token usage includes both the image and text in the prompt
- Base64 encoding increases the size of local images by ~33%

## Troubleshooting

**"OPENAI_API_KEY environment variable is not set"**
- Set the key with the agent's env_config tool

**"Image file not found"**
- Check the file path is correct
- Use absolute paths or paths relative to current directory

**"Unsupported image format"**
- Only JPEG, PNG, GIF, and WebP are supported
- Check the file extension matches the actual format

**"Failed to call OpenAI API"**
- Check your internet connection
- Verify the API key is valid
- Check if custom API base URL is correct

## License

Part of the chatgpt-on-wechat project.
119
skills/openai-image-vision/SKILL.md
Normal file
@@ -0,0 +1,119 @@
---
name: openai-image-vision
description: Analyze images using OpenAI's Vision API. Use bash command to execute the vision script like 'bash <base_dir>/scripts/vision.sh <image> <question>'. Can understand image content, objects, text, colors, and answer questions about images.
homepage: https://platform.openai.com/docs/guides/vision
metadata:
  emoji: 👁️
  requires:
    bins: ["curl", "base64"]
    env: ["OPENAI_API_KEY"]
  primaryEnv: "OPENAI_API_KEY"
---

# OpenAI Image Vision

Analyze images using OpenAI's GPT-4 Vision API. The model can understand visual elements including objects, shapes, colors, textures, and text within images.

## Setup
|
||||
|
||||
This skill requires an OpenAI API key. If not configured:
|
||||
|
||||
1. Get your API key from https://platform.openai.com/api-keys
|
||||
2. Set the key using: `env_config(action="set", key="OPENAI_API_KEY", value="your-key")`
|
||||
|
||||
Optional: Set custom API base URL (default: https://api.openai.com/v1):
|
||||
```bash
|
||||
env_config(action="set", key="OPENAI_API_BASE", value="your-base-url")
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
**Important**: Scripts are located relative to this skill's base directory.
|
||||
|
||||
When you see this skill in `<available_skills>`, note the `<base_dir>` path.
|
||||
|
||||
**CRITICAL**: Always use `bash` command to execute the script:
|
||||
|
||||
```bash
|
||||
# General pattern (MUST start with bash):
|
||||
bash "<base_dir>/scripts/vision.sh" "<image_path_or_url>" "<question>" [model]
|
||||
|
||||
# DO NOT execute the script directly like this (WRONG):
|
||||
# "<base_dir>/scripts/vision.sh" ...
|
||||
|
||||
# Parameters:
|
||||
# - image_path_or_url: Local image file path or HTTP(S) URL (required)
|
||||
# - question: Question to ask about the image (required)
|
||||
# - model: OpenAI model to use (default: gpt-4.1-mini)
|
||||
# Options: gpt-4.1-mini, gpt-4.1, gpt-4o-mini, gpt-4-turbo
|
||||
```
|
||||
|
||||
## Examples
|
||||
|
||||
### Analyze a local image
|
||||
```bash
|
||||
bash "<base_dir>/scripts/vision.sh" "/path/to/image.jpg" "What's in this image?"
|
||||
```
|
||||
|
||||
### Analyze an image from URL
|
||||
```bash
|
||||
bash "<base_dir>/scripts/vision.sh" "https://example.com/image.jpg" "Describe this image in detail"
|
||||
```
|
||||
|
||||
### Use specific model
|
||||
```bash
|
||||
bash "<base_dir>/scripts/vision.sh" "/path/to/photo.png" "What colors are prominent?" "gpt-4o-mini"
|
||||
```
|
||||
|
||||
### Extract text from image
|
||||
```bash
|
||||
bash "<base_dir>/scripts/vision.sh" "/path/to/document.jpg" "Extract all text from this image"
|
||||
```
|
||||
|
||||
### Analyze multiple aspects
|
||||
```bash
|
||||
bash "<base_dir>/scripts/vision.sh" "image.jpg" "List all objects you can see and describe the overall scene"
|
||||
```
|
||||
|
||||
## Supported Image Formats
|
||||
|
||||
- JPEG (.jpg, .jpeg)
|
||||
- PNG (.png)
|
||||
- GIF (.gif)
|
||||
- WebP (.webp)
|
||||
|
||||
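**Performance Optimization**: Local files larger than 1MB are resized to at most 800px on the longest side before encoding, provided `sips` (macOS) or ImageMagick's `convert` is installed; otherwise the original file is sent unchanged. Resizing keeps the request small enough to avoid command-line argument limits, though very fine detail such as small text may be lost.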
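
The size check can be reproduced outside the script to predict whether a given file will be resized. This is a minimal sketch under the 1MB threshold stated above; `needs_compression` and `have_resizer` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed when a file is larger than the threshold
# (default 1048576 bytes = 1MB).
needs_compression() {
  local file="$1" max_size="${2:-1048576}" size
  size=$(wc -c < "$file" | tr -d ' ')   # tr strips the padding some wc builds add
  [ "$size" -gt "$max_size" ]
}

# Hypothetical helper: succeed when a resizing tool is on PATH.
have_resizer() {
  command -v sips > /dev/null 2>&1 || command -v convert > /dev/null 2>&1
}
```

For example, `needs_compression photo.jpg && ! have_resizer` identifies a large file that would be sent at full size.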
|
||||
|
||||
## Response Format
|
||||
|
||||
The script returns a JSON response:
|
||||
|
||||
```json
|
||||
{
|
||||
"model": "gpt-4.1-mini",
|
||||
"content": "The image shows...",
|
||||
"usage": {
|
||||
"prompt_tokens": 1234,
|
||||
"completion_tokens": 567,
|
||||
"total_tokens": 1801
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Or in case of error:
|
||||
|
||||
```json
|
||||
{
|
||||
"error": "Error description",
|
||||
"details": "Additional error information"
|
||||
}
|
||||
```
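
A caller can tell the two shapes apart by checking whether the first key is `error`. The sketch below is a crude text check that assumes the two formats shown above; `vision_failed` is a hypothetical name, and a real JSON parser would be more robust.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read the script's JSON output on stdin and succeed
# (exit 0) when the object's first key is "error".
vision_failed() {
  # Join lines first so the check also works on pretty-printed output.
  tr -d '\n' | grep -q '^[[:space:]]*{[[:space:]]*"error"[[:space:]]*:'
}
```

The word "error" appearing inside the `content` text does not trigger the check, because only the leading key is inspected.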
|
||||
|
||||
## Notes
|
||||
|
||||
- **Image size**: Local files over 1MB are resized before upload when `sips` or ImageMagick is available
|
||||
- **Timeout**: 60 seconds for API calls
|
||||
- **Rate limits**: Subject to your OpenAI API plan limits
|
||||
- **Privacy**: Images are sent to OpenAI's servers for processing
|
||||
- **Local files**: Automatically converted to base64 for API submission
|
||||
- **URLs**: Can be passed directly to the API without downloading
|
||||
233
skills/openai-image-vision/scripts/vision.sh
Executable file
@@ -0,0 +1,233 @@
|
||||
#!/usr/bin/env bash
|
||||
# OpenAI Vision API wrapper
|
||||
# API Docs: https://platform.openai.com/docs/guides/vision
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
image_input="${1:-}"
|
||||
question="${2:-}"
|
||||
model="${3:-gpt-4.1-mini}"
|
||||
|
||||
if [ -z "$image_input" ]; then
|
||||
echo '{"error": "Image path or URL is required", "usage": "bash vision.sh <image_path_or_url> <question> [model]"}'
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [ -z "$question" ]; then
|
||||
echo '{"error": "Question is required", "usage": "bash vision.sh <image_path_or_url> <question> [model]"}'
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [ -z "${OPENAI_API_KEY:-}" ]; then
|
||||
echo '{"error": "OPENAI_API_KEY environment variable is not set", "help": "Visit https://platform.openai.com/api-keys to get an API key"}'
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Set API base URL (default to OpenAI's official endpoint)
|
||||
api_base="${OPENAI_API_BASE:-https://api.openai.com/v1}"
|
||||
# Remove trailing slash if present
|
||||
api_base="${api_base%/}"
|
||||
|
||||
# Determine if input is a URL or local file
|
||||
if [[ "$image_input" =~ ^https?:// ]]; then
|
||||
# It's a URL - use it directly
|
||||
image_url="$image_input"
|
||||
|
||||
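# Escape backslashes and double quotes so the question is a valid JSON string
question=$(printf '%s' "$question" | sed 's/\\/\\\\/g; s/"/\\"/g')
# Build JSON request body with URL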
|
||||
request_body=$(cat <<EOF
|
||||
{
|
||||
"model": "$model",
|
||||
"messages": [
|
||||
{
|
||||
"role": "user",
|
||||
"content": [
|
||||
{
|
||||
"type": "text",
|
||||
"text": "$question"
|
||||
},
|
||||
{
|
||||
"type": "image_url",
|
||||
"image_url": {
|
||||
"url": "$image_url"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"max_tokens": 1000
|
||||
}
|
||||
EOF
|
||||
)
|
||||
else
|
||||
# It's a local file - need to encode as base64
|
||||
if [ ! -f "$image_input" ]; then
|
||||
echo "{\"error\": \"Image file not found\", \"path\": \"$image_input\"}"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Check file size and compress if needed to avoid "Argument list too long" error
|
||||
# Files larger than 1MB should be compressed
|
||||
file_size=$(wc -c < "$image_input" | tr -d ' ')
|
||||
max_size=1048576 # 1MB
|
||||
|
||||
image_to_encode="$image_input"
|
||||
temp_compressed=""
|
||||
|
||||
if [ "$file_size" -gt "$max_size" ]; then
|
||||
# File is too large, compress it
|
||||
temp_compressed=$(mktemp "${TMPDIR:-/tmp}/vision_compressed_XXXXXX.jpg")
|
||||
|
||||
# Use sips (macOS) or convert (ImageMagick) to compress
|
||||
if command -v sips &> /dev/null; then
|
||||
# macOS: resize to max 800px on longest side
|
||||
# Test the command directly: under set -e a separate $? check never runs on failure
if sips -Z 800 "$image_input" --out "$temp_compressed" &> /dev/null; then
|
||||
image_to_encode="$temp_compressed"
|
||||
>&2 echo "[vision.sh] Compressed large image ($(($file_size / 1024))KB) to avoid parameter limit"
|
||||
fi
|
||||
elif command -v convert &> /dev/null; then
|
||||
# Linux: use ImageMagick
|
||||
# Test the command directly: under set -e a separate $? check never runs on failure
if convert "$image_input" -resize 800x800\> "$temp_compressed" 2>/dev/null; then
|
||||
image_to_encode="$temp_compressed"
|
||||
>&2 echo "[vision.sh] Compressed large image ($(($file_size / 1024))KB) to avoid parameter limit"
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
|
||||
# Detect image format from file extension
|
||||
extension="${image_to_encode##*.}"
|
||||
extension_lower=$(echo "$extension" | tr '[:upper:]' '[:lower:]')
|
||||
|
||||
case "$extension_lower" in
|
||||
jpg|jpeg)
|
||||
mime_type="image/jpeg"
|
||||
;;
|
||||
png)
|
||||
mime_type="image/png"
|
||||
;;
|
||||
gif)
|
||||
mime_type="image/gif"
|
||||
;;
|
||||
webp)
|
||||
mime_type="image/webp"
|
||||
;;
|
||||
*)
|
||||
echo "{\"error\": \"Unsupported image format\", \"extension\": \"$extension\", \"supported\": [\"jpg\", \"jpeg\", \"png\", \"gif\", \"webp\"]}"
|
||||
# Clean up temp file if exists
|
||||
[ -n "$temp_compressed" ] && rm -f "$temp_compressed"
|
||||
exit 1
|
||||
;;
|
||||
esac
|
||||
|
||||
# Encode image to base64
|
||||
if command -v base64 &> /dev/null; then
|
||||
# macOS and most Linux systems
|
||||
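# Read from stdin (works with both BSD and GNU base64) and strip the line
# wrapping GNU base64 adds, which would otherwise break the JSON string
base64_image=$(base64 < "$image_to_encode" | tr -d '\n')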
|
||||
else
|
||||
echo '{"error": "base64 command not found", "help": "Please install base64 utility"}'
|
||||
# Clean up temp file if exists
|
||||
[ -n "$temp_compressed" ] && rm -f "$temp_compressed"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Clean up temp compressed file
|
||||
[ -n "$temp_compressed" ] && rm -f "$temp_compressed"
|
||||
|
||||
if [ -z "$base64_image" ]; then
|
||||
echo "{\"error\": \"Failed to encode image to base64\", \"path\": \"$image_input\"}"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Escape question for JSON (backslashes first, then double quotes)
escaped_question=$(printf '%s' "$question" | sed 's/\\/\\\\/g; s/"/\\"/g')
|
||||
|
||||
# Build JSON request body with base64 image
|
||||
# Note: the heredoc expands the shell variables below, so they must already be JSON-safe
|
||||
request_body=$(cat <<EOF
|
||||
{
|
||||
"model": "$model",
|
||||
"messages": [
|
||||
{
|
||||
"role": "user",
|
||||
"content": [
|
||||
{
|
||||
"type": "text",
|
||||
"text": "$escaped_question"
|
||||
},
|
||||
{
|
||||
"type": "image_url",
|
||||
"image_url": {
|
||||
"url": "data:$mime_type;base64,$base64_image"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"max_tokens": 1000
|
||||
}
|
||||
EOF
|
||||
)
|
||||
fi
|
||||
|
||||
# Call OpenAI API
|
||||
response=$(curl -sS --max-time 60 \
|
||||
-X POST \
|
||||
-H "Authorization: Bearer $OPENAI_API_KEY" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d "$request_body" \
"$api_base/chat/completions" 2>&1) && curl_exit_code=0 || curl_exit_code=$?
# The "&& ... || ..." above records curl's exit status without tripping set -e
|
||||
|
||||
if [ $curl_exit_code -ne 0 ]; then
|
||||
echo "{\"error\": \"Failed to call OpenAI API\", \"details\": \"$response\"}"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Simple JSON validation - check if response starts with { or [
|
||||
if [[ ! "$response" =~ ^[[:space:]]*[\{\[] ]]; then
|
||||
echo "{\"error\": \"Invalid JSON response from API\", \"response\": \"$response\"}"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Check for API error (look for "error" field in response)
|
||||
if echo "$response" | grep -q '"error"[[:space:]]*:[[:space:]]*{'; then
|
||||
# Extract error message if possible
|
||||
error_msg=$(echo "$response" | grep -o '"message"[[:space:]]*:[[:space:]]*"[^"]*"' | sed 's/"message"[[:space:]]*:[[:space:]]*"\(.*\)"/\1/' | head -1)
|
||||
if [ -z "$error_msg" ]; then
|
||||
error_msg="Unknown API error"
|
||||
fi
|
||||
echo "{\"error\": \"OpenAI API error\", \"message\": \"$error_msg\", \"response\": $response}"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
# Extract the content from the response
|
||||
# The response structure is: choices[0].message.content
|
||||
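# Use an ERE that allows escaped characters (such as \") inside the string,
# so the match does not stop at the first escaped quote
content=$(echo "$response" | grep -oE '"content"[[:space:]]*:[[:space:]]*"(\\.|[^"\\])*"' | sed 's/"content"[[:space:]]*:[[:space:]]*"\(.*\)"/\1/' | head -1)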
|
||||
|
||||
# Extract usage information
|
||||
prompt_tokens=$(echo "$response" | grep -o '"prompt_tokens"[[:space:]]*:[[:space:]]*[0-9]*' | grep -o '[0-9]*' | head -1)
|
||||
completion_tokens=$(echo "$response" | grep -o '"completion_tokens"[[:space:]]*:[[:space:]]*[0-9]*' | grep -o '[0-9]*' | head -1)
|
||||
total_tokens=$(echo "$response" | grep -o '"total_tokens"[[:space:]]*:[[:space:]]*[0-9]*' | grep -o '[0-9]*' | head -1)
|
||||
|
||||
# Build simplified response
|
||||
if [ -n "$content" ]; then
|
||||
# Keep the content JSON-escaped: it is embedded in a JSON string below,
# so unescaping \n or \" here would make the output invalid JSON
|
||||
|
||||
cat <<EOF
|
||||
{
|
||||
"model": "$model",
|
||||
"content": "$content",
|
||||
"usage": {
|
||||
"prompt_tokens": ${prompt_tokens:-0},
|
||||
"completion_tokens": ${completion_tokens:-0},
|
||||
"total_tokens": ${total_tokens:-0}
|
||||
}
|
||||
}
|
||||
EOF
|
||||
else
|
||||
# If we can't extract content, return the full response
|
||||
echo "$response"
|
||||
fi
|
||||