mirror of
https://github.com/zhayujie/chatgpt-on-wechat.git
synced 2026-06-02 00:57:41 +08:00

---
title: image-generation
description: Text-to-image / image-to-image / multi-image fusion with automatic multi-provider routing and fallback
---

A general-purpose image generation and editing skill supporting six providers: OpenAI, Gemini, Seedream (Volcengine Ark), Qwen (DashScope), MiniMax, and LinkAI. Configure any one provider's key to start using it; configure several to enable automatic fallback.

## Supported Models

| Provider | Models / Aliases | Notes |
| --- | --- | --- |
| OpenAI | `gpt-image-2`, `gpt-image-1` | General-purpose, high quality, supports the `quality` parameter |
| Gemini Nano Banana | `nano-banana-2`, `nano-banana-pro`, `nano-banana` | Image variants of `gemini-3.1-flash`, `gemini-3-pro`, and `gemini-2.5-flash`, respectively |
| Seedream (Volcengine Ark) | `seedream-5.0-lite`, `seedream-4.5` | Native 2K–4K, up to 14 reference images for fusion |
| Qwen (DashScope) | `qwen-image-2.0`, `qwen-image-2.0-pro` | Strong at rendering Chinese text and text-image layouts |
| MiniMax | `image-01` | Fast and simple |
| LinkAI | Any model | Universal gateway, used as fallback |

## Model Selection

By default, the skill uses automatic routing with automatic fallback:

1. Pick the first configured provider in the order `OpenAI → Gemini → Seedream → Qwen → MiniMax → LinkAI`.
2. On errors such as HTTP 401, a model that has not been enabled, or network issues, automatically switch to the next provider.
3. If the user names a model in the conversation (e.g. "use seedream to draw a cat"), the corresponding provider is moved to the front of the order.

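The three routing rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the skill's implementation: `ProviderError`, `route`, and `generate_with_fallback` are hypothetical names; a provider counts as configured when its credential field (from the table under "Configuring API Keys") is non-empty; and every failure that triggers fallback (401, model not enabled, network error) is modeled as a single exception type.

```python
from typing import Callable, Dict, List, Optional

# Default priority order from the routing rules.
DEFAULT_ORDER = ["openai", "gemini", "seedream", "qwen", "minimax", "linkai"]

# Credential field checked for each provider.
KEY_FIELDS = {
    "openai": "openai_api_key",
    "gemini": "gemini_api_key",
    "seedream": "ark_api_key",
    "qwen": "dashscope_api_key",
    "minimax": "minimax_api_key",
    "linkai": "linkai_api_key",
}


class ProviderError(Exception):
    """Hypothetical stand-in for a 401, model-not-enabled, or network error."""


def route(config: Dict[str, str], preferred: Optional[str] = None) -> List[str]:
    """Return configured providers in try order.

    A provider the user asked for is moved to the front, but only if it
    is configured (assumption: an unconfigured provider cannot be called).
    """
    order = [p for p in DEFAULT_ORDER if config.get(KEY_FIELDS[p])]
    if preferred in order:
        order.remove(preferred)
        order.insert(0, preferred)
    return order


def generate_with_fallback(
    config: Dict[str, str],
    call: Callable[[str], str],
    preferred: Optional[str] = None,
) -> str:
    """Try each provider in route order until one call succeeds."""
    errors = []
    for provider in route(config, preferred):
        try:
            return call(provider)
        except ProviderError as exc:
            # Record the failure and fall through to the next provider.
            errors.append(f"{provider}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

For example, with only Gemini and Seedream keys configured and a user request to "use seedream", `route` yields `["seedream", "gemini"]`; if the Seedream call raises `ProviderError`, Gemini is tried next.
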

To pin a specific model:

```json
{
  "skills": {
    "image-generation": {
      "model": "seedream-5.0-lite"
    }
  }
}
```

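A pinned alias ultimately has to be mapped back to its provider. The sketch below reads the `skills` → `image-generation` → `model` path shown in the JSON above and looks the alias up in a table transcribed from "Supported Models". `MODEL_PROVIDERS`, `pinned_model`, and `provider_for_model` are hypothetical names, and sending unknown names to LinkAI is an assumption based on its "Any model" row.

```python
import json
from typing import Any, Dict, Optional

# Hypothetical alias -> provider lookup, transcribed from the Supported Models table.
MODEL_PROVIDERS = {
    "gpt-image-2": "openai",
    "gpt-image-1": "openai",
    "nano-banana-2": "gemini",
    "nano-banana-pro": "gemini",
    "nano-banana": "gemini",
    "seedream-5.0-lite": "seedream",
    "seedream-4.5": "seedream",
    "qwen-image-2.0": "qwen",
    "qwen-image-2.0-pro": "qwen",
    "image-01": "minimax",
}


def pinned_model(config: Dict[str, Any]) -> Optional[str]:
    """Return skills.image-generation.model from a parsed config, or None if unset."""
    return config.get("skills", {}).get("image-generation", {}).get("model")


def provider_for_model(model: str) -> str:
    """Map a model alias to its provider.

    Assumption: names not in the table go to LinkAI, which the table
    describes as a universal gateway that accepts any model.
    """
    return MODEL_PROVIDERS.get(model.strip().lower(), "linkai")


if __name__ == "__main__":
    config = json.loads('{"skills": {"image-generation": {"model": "seedream-5.0-lite"}}}')
    model = pinned_model(config)
    print(model, "->", provider_for_model(model))  # seedream-5.0-lite -> seedream
```
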

## Configuring API Keys

<Tip>
It is recommended to configure providers on the "Model Management" page of the [Web console](/channels/web). Chat model keys configured there are automatically reused by the image generation skill, so there is no need to set them twice. You can also edit the configuration file manually, or set keys temporarily in a conversation using the `env_config` tool.
</Tip>

Credentials are shared with the main model providers:

| Field | Provider |
| --- | --- |
| `openai_api_key` | OpenAI |
| `gemini_api_key` | Gemini |
| `ark_api_key` | Volcengine Ark (Seedream) |
| `dashscope_api_key` | Alibaba DashScope (Qwen) |
| `minimax_api_key` | MiniMax |
| `linkai_api_key` | LinkAI |

## Enabling and Disabling

The skill adjusts its status automatically based on which API keys are configured:

- **Key configured**: the Agent calls the skill directly when it receives a drawing request.
- **Key not configured**: the skill still appears in the Agent's context, marked as "needs configuration", and the Agent guides the user through setting up a key.

To control it manually:

```text
/skill disable image-generation   # Disable
/skill enable image-generation    # Re-enable
```

Equivalent terminal commands: `cow skill disable image-generation` / `cow skill enable image-generation`.

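Taken together, the automatic and manual rules behave roughly like the decision function below. This is a hypothetical sketch: `skill_status`, the `disabled` flag, and the `"ready"` label are illustrative names, and letting a manual disable override configured keys is an assumption rather than documented behavior.

```python
from typing import Dict

# Credential fields from the "Configuring API Keys" table.
KEY_FIELDS = (
    "openai_api_key",
    "gemini_api_key",
    "ark_api_key",
    "dashscope_api_key",
    "minimax_api_key",
    "linkai_api_key",
)


def skill_status(config: Dict[str, str], disabled: bool = False) -> str:
    """Return an illustrative status label for the image-generation skill."""
    if disabled:
        # Turned off manually, e.g. via /skill disable image-generation (assumed to win).
        return "disabled"
    if any(config.get(field) for field in KEY_FIELDS):
        # At least one non-empty provider key: the Agent can call the skill directly.
        return "ready"
    # No usable key: the skill stays visible so the Agent can guide setup.
    return "needs configuration"
```
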

## Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `prompt` | string | Yes | n/a | Image description |
| `image_url` | string / list | No | null | Input image(s) for editing: a local path or URL; pass a list for multi-image fusion |
| `quality` | string | No | auto | `low` / `medium` / `high`; supported only by some providers |
| `size` | string | No | auto | `512` / `1K` / `2K` / `3K` / `4K`, or a pixel value such as `1024x1024` |
| `aspect_ratio` | string | No | null | `1:1` / `3:2` / `2:3` / `16:9` / `9:16` / `21:9`; Gemini also supports `1:4` / `4:1` / `1:8` / `8:1` |

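A minimal client-side validation sketch for these parameters, assuming the values in the table are the complete accepted sets. `validate_params` is a hypothetical helper, and the `WIDTHxHEIGHT` pattern for pixel sizes is an assumption generalized from the `1024x1024` example.

```python
import re
from typing import Any, Dict, List

QUALITIES = {"auto", "low", "medium", "high"}
SIZE_PRESETS = {"auto", "512", "1K", "2K", "3K", "4K"}
RATIOS = {"1:1", "3:2", "2:3", "16:9", "9:16", "21:9"}
GEMINI_EXTRA_RATIOS = {"1:4", "4:1", "1:8", "8:1"}
PIXEL_SIZE = re.compile(r"^\d+x\d+$")  # assumed format, e.g. "1024x1024"


def validate_params(params: Dict[str, Any], provider: str) -> List[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    problems = []
    if not params.get("prompt"):
        problems.append("prompt is required")
    image_url = params.get("image_url")
    if image_url is not None and not isinstance(image_url, (str, list)):
        problems.append("image_url must be a string or a list of strings")
    quality = params.get("quality", "auto")
    if quality not in QUALITIES:
        problems.append(f"unknown quality: {quality}")
    size = str(params.get("size", "auto"))
    if size not in SIZE_PRESETS and not PIXEL_SIZE.match(size):
        problems.append(f"unknown size: {size}")
    ratio = params.get("aspect_ratio")
    # Only Gemini accepts the extra tall/wide ratios.
    allowed = RATIOS | (GEMINI_EXTRA_RATIOS if provider == "gemini" else set())
    if ratio is not None and ratio not in allowed:
        problems.append(f"aspect_ratio {ratio} not supported by {provider}")
    return problems
```
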

<Warning>
**Higher quality and larger sizes cost more and take longer.** For everyday conversations, use the defaults (`auto`) or `quality=low` + `size=1K`, which takes about 20 seconds per image. For posters, or when the user explicitly asks for high resolution, use `quality=high` + `size=2K` or `size=4K`, which may take 1–5 minutes.
</Warning>

## Common Use Cases

- **Text-to-image**: generate illustrations, posters, icons, avatars, storyboards, etc. from a description
- **Image-to-image**: change styles, swap elements, or add decorations or text to an existing image
- **Multi-image fusion**: combine multiple reference images into one (outfit swaps, character group photos, etc.)

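The three use cases differ mainly in the shape of `image_url`. The parameter dictionaries below are illustrative only: the prompts, file path, and URLs are made up, and `describe_mode` is a hypothetical helper, not part of the skill. Treating a one-element list as image-to-image is an assumption.

```python
from typing import Any, Dict

# Text-to-image: prompt only.
text_to_image = {"prompt": "Flat-style app icon of a paper plane", "size": "1K"}

# Image-to-image: a single input image (local path or URL).
image_to_image = {
    "prompt": "Turn this photo into a watercolor painting",
    "image_url": "/tmp/photo.jpg",
}

# Multi-image fusion: a list of reference images.
fusion = {
    "prompt": "Dress the person from image 1 in the outfit from image 2",
    "image_url": ["https://example.com/person.png", "https://example.com/outfit.png"],
}


def describe_mode(params: Dict[str, Any]) -> str:
    """Classify a call by the shape of its image_url parameter."""
    images = params.get("image_url")
    if images is None:
        return "text-to-image"
    if isinstance(images, str):
        return "image-to-image"
    # A list with one entry is treated as plain editing (assumption).
    return "multi-image fusion" if len(images) > 1 else "image-to-image"
```
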

<Note>
- Set the Bash timeout to 600 seconds: each provider has a 300-second HTTP timeout, and the script may try multiple providers in sequence.
- Input images are automatically compressed to ≤ 4 MB, with the longest edge ≤ 4096 px.
- Gemini, Seedream, Qwen, and MiniMax do not support the `quality` parameter.
- Seedream defaults to 2K; `seedream-5.0-lite` supports up to 3K and `seedream-4.5` up to 4K.
</Note>

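The size limits in this note reduce to simple arithmetic. This sketch only computes target dimensions and size presets; it does not encode or compress images, so the ≤ 4 MB byte limit (which depends on the encoder) is not modeled. `fit_longest_edge` and `seedream_size` are hypothetical helpers, and clamping an over-large Seedream request to the model's maximum is an assumed behavior, not something this page states.

```python
from typing import Tuple

MAX_EDGE = 4096  # longest-edge limit for input images, in pixels

# Seedream presets ("Native 2K–4K") and per-model caps from this page.
SEEDREAM_PRESETS = ["2K", "3K", "4K"]
SEEDREAM_MAX = {"seedream-5.0-lite": "3K", "seedream-4.5": "4K"}


def fit_longest_edge(width: int, height: int, max_edge: int = MAX_EDGE) -> Tuple[int, int]:
    """Scale (width, height) down so the longest edge is at most max_edge.

    The aspect ratio is preserved up to integer rounding.
    """
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    # Floor division never overshoots max_edge.
    return max(1, width * max_edge // longest), max(1, height * max_edge // longest)


def seedream_size(model: str, requested: str = "2K") -> str:
    """Clamp a requested Seedream preset to the model's maximum (assumed behavior).

    Raises ValueError for presets outside 2K/3K/4K.
    """
    cap = SEEDREAM_MAX[model]
    if SEEDREAM_PRESETS.index(requested) > SEEDREAM_PRESETS.index(cap):
        return cap
    return requested
```

For instance, an 8000 × 6000 input would be scaled to 4096 × 3072 before upload.
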