Files
chatgpt-on-wechat/docs/skills/image-generation.mdx

99 lines
4.6 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
title: image-generation
description: Text-to-image / image-to-image / multi-image fusion with automatic multi-provider routing and fallback
---
A general-purpose image generation and editing skill supporting six providers: OpenAI, Gemini, Seedream (Volcengine Ark), Qwen (DashScope), MiniMax, and LinkAI. Configure any one provider's key to start using it; configure multiple to enable automatic fallback.
## Supported Models
| Provider | Models / Aliases | Notes |
| --- | --- | --- |
| OpenAI | `gpt-image-2`, `gpt-image-1` | General-purpose, high quality, supports `quality` parameter |
| Gemini Nano Banana | `nano-banana-2`, `nano-banana-pro`, `nano-banana` | Corresponds to the image variants of `gemini-3.1-flash`, `gemini-3-pro`, `gemini-2.5-flash` |
| Seedream (Volcengine Ark) | `seedream-5.0-lite`, `seedream-4.5` | Native 2K4K, up to 14 reference images for fusion |
| Qwen (DashScope) | `qwen-image-2.0`, `qwen-image-2.0-pro` | Strong with Chinese text rendering and text-image layouts |
| MiniMax | `image-01` | Fast and simple |
| LinkAI | Any model | Universal gateway, used as fallback |
## Model Selection
By default, "auto routing + automatic fallback" is used:
1. Pick the first configured provider in the order `OpenAI → Gemini → Seedream → Qwen → MiniMax → LinkAI`
2. On errors such as 401, model not enabled, or network issues, automatically switch to the next provider
3. If the user specifies a model in the conversation (e.g. "use seedream to draw a cat"), the corresponding provider is promoted to the front
To pin a specific model:
```json
{
"skills": {
"image-generation": {
"model": "seedream-5.0-lite"
}
}
}
```
## Configuring API Keys
<Tip>
It is recommended to configure providers from the "Model Management" page in the [Web console](/channels/web). Chat model keys configured there are automatically reused by the image generation skill — no need to set them twice. You can also edit the configuration file manually or temporarily set keys in a conversation using the `env_config` tool.
</Tip>
Credentials are shared with the main model providers:
| Field | Provider |
| --- | --- |
| `openai_api_key` | OpenAI |
| `gemini_api_key` | Gemini |
| `ark_api_key` | Volcengine Ark (Seedream) |
| `dashscope_api_key` | Alibaba DashScope (Qwen) |
| `minimax_api_key` | MiniMax |
| `linkai_api_key` | LinkAI |
## Enabling and Disabling
The skill automatically adjusts its status based on API keys:
- **Key configured**: the Agent calls the skill directly when it receives a drawing request
- **Key not configured**: the skill still appears in context (marked as "needs configuration") — the Agent will guide the user to set up a key
To control it manually:
```text
/skill disable image-generation # Disable
/skill enable image-generation # Re-enable
```
Equivalent terminal commands: `cow skill disable image-generation` / `cow skill enable image-generation`.
## Parameters
| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `prompt` | string | Yes | — | Image description |
| `image_url` | string / list | No | null | Input image for editing — local path or URL; pass a list for multi-image fusion |
| `quality` | string | No | auto | `low` / `medium` / `high`, supported only by some providers |
| `size` | string | No | auto | `512` / `1K` / `2K` / `3K` / `4K`, or pixel value like `1024x1024` |
| `aspect_ratio` | string | No | null | `1:1` / `3:2` / `2:3` / `16:9` / `9:16` / `21:9`; Gemini also supports `1:4` / `4:1` / `1:8` / `8:1` |
<Warning>
**Higher quality and larger size cost more and take longer.** For everyday conversations, use the defaults (`auto`) or `quality=low` + `size=1K` — about 20 seconds per image. For posters or when high resolution is explicitly requested, use `quality=high` + `size=2K/4K` — may take 15 minutes.
</Warning>
## Common Use Cases
- **Text-to-image**: generate illustrations, posters, icons, avatars, storyboards, etc. from a description
- **Image-to-image**: change styles, swap elements, add decorations or text on an existing image
- **Multi-image fusion**: combine multiple reference images into one (outfit swaps, character group photos, etc.)
<Note>
- Bash timeout should be set to 600 seconds: each provider has a 300-second HTTP timeout, and the script may try multiple providers sequentially
- Input images are automatically compressed to ≤ 4 MB with the longest edge ≤ 4096 px
- Gemini / Seedream / Qwen / MiniMax do not support the `quality` parameter
- Seedream defaults to 2K; `seedream-5.0-lite` supports up to 3K; `seedream-4.5` supports up to 4K
</Note>