mirror of
https://github.com/zhayujie/chatgpt-on-wechat.git
synced 2026-06-02 00:57:41 +08:00
feat(vision): prioritize main model for image recognition with multi-provider fallback
- Add call_vision method to all bot implementations (DashScope, Claude, Gemini, ZhipuAI, MiniMax, Doubao, Moonshot, OpenAICompatibleBot) using each vendor's native multimodal API format - Remove call_with_tools/call_vision from Bot base class to fix MRO shadowing issue with OpenAICompatibleBot mixin - Refactor vision tool provider resolution: MainModel → other configured models (auto-discovered) → OpenAI → LinkAI, with automatic fallback - Return actual model name used in call_vision responses - Sync config.json API keys to .env bidirectionally on startup - Fix bot instance cache to detect bot_type/use_linkai config changes - Add SSE reconnection support for web console - Preserve image path hints in Gemini text for correct vision tool calls - Update docs/tools/vision.mdx
This commit is contained in:
72
docs/en/tools/vision.mdx
Normal file
72
docs/en/tools/vision.mdx
Normal file
@@ -0,0 +1,72 @@
|
||||
---
|
||||
title: vision - Image Analysis
|
||||
description: Analyze image content (recognition, description, OCR, etc.)
|
||||
---
|
||||
|
||||
Analyze local images or image URLs using Vision API. Supports content description, text extraction (OCR), object recognition, and more.
|
||||
|
||||
## Model Selection
|
||||
|
||||
The vision tool uses a multi-level auto-selection strategy with automatic fallback — no manual configuration required:
|
||||
|
||||
1. **Main model** — uses the currently configured main model for image recognition (zero extra cost)
|
||||
2. **Other configured models** — auto-discovers other models with configured API keys as alternatives
|
||||
3. **OpenAI** — uses `open_ai_api_key` to call gpt-4.1-mini
|
||||
4. **LinkAI** — uses `linkai_api_key` to call LinkAI vision service
|
||||
|
||||
When `use_linkai=true`, LinkAI is promoted to the highest priority.
|
||||
|
||||
If the current provider fails, the tool automatically tries the next one until it succeeds or all fail.
|
||||
|
||||
### Supported Models
|
||||
|
||||
| Vendor | Vision Model | Notes |
|
||||
| --- | --- | --- |
|
||||
| OpenAI / Compatible | Main model | All OpenAI-compatible multimodal models |
|
||||
| Qwen (DashScope) | Main model | Via MultiModalConversation API |
|
||||
| Claude | Main model | Anthropic native image format |
|
||||
| Gemini | Main model | inlineData format |
|
||||
| Doubao | Main model | doubao-seed-2-0 series natively supported |
|
||||
| Kimi (Moonshot) | Main model | kimi-k2.5 natively supported |
|
||||
| ZhipuAI | glm-5v-turbo | Always uses dedicated vision model |
|
||||
| MiniMax | MiniMax-Text-01 | Always uses dedicated vision model |
|
||||
|
||||
<Note>
|
||||
ZhipuAI and MiniMax text models do not support image understanding, so their dedicated vision models are always used automatically.
|
||||
</Note>
|
||||
|
||||
## Parameters
|
||||
|
||||
| Parameter | Type | Required | Description |
|
||||
| --- | --- | --- | --- |
|
||||
| `image` | string | Yes | Local file path or HTTP(S) image URL |
|
||||
| `question` | string | Yes | Question to ask about the image |
|
||||
|
||||
Supported image formats: jpg, jpeg, png, gif, webp
|
||||
|
||||
## Custom Configuration
|
||||
|
||||
To specify a particular model for the vision tool, add to `config.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
"tool": {
|
||||
"vision": {
|
||||
"model": "gpt-4o"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
In most cases no configuration is needed. The tool works automatically as long as the main model supports multimodal input or any vision-capable API key is configured.
|
||||
|
||||
## Use Cases
|
||||
|
||||
- Describe image content
|
||||
- Extract text from images (OCR)
|
||||
- Identify objects, colors, scenes
|
||||
- Analyze screenshots and scanned documents
|
||||
|
||||
<Note>
|
||||
Images larger than 1MB are automatically compressed (max edge 1536px). All images (including remote URLs) are converted to base64 for transmission to ensure compatibility with all model backends.
|
||||
</Note>
|
||||
Reference in New Issue
Block a user