chatgpt-on-wechat/docs/skills/image-generation.mdx

---
title: image-generation
description: Text-to-image / image-to-image / multi-image fusion with automatic multi-provider routing and fallback
---

A general-purpose image generation and editing skill supporting six providers: OpenAI, Gemini, Seedream (Volcengine Ark), Qwen (DashScope), MiniMax, and LinkAI. Configure any one provider's key to start using it; configure multiple to enable automatic fallback.

## Supported Models

| Provider | Models / Aliases | Notes |
| --- | --- | --- |
| OpenAI | `gpt-image-2`, `gpt-image-1` | General-purpose, high quality, supports `quality` parameter |
| Gemini Nano Banana | `nano-banana-2`, `nano-banana-pro`, `nano-banana` | Corresponds to the image variants of `gemini-3.1-flash`, `gemini-3-pro`, `gemini-2.5-flash` |
| Seedream (Volcengine Ark) | `seedream-5.0-lite`, `seedream-4.5` | Native 2K–4K, up to 14 reference images for fusion |
| Qwen (DashScope) | `qwen-image-2.0`, `qwen-image-2.0-pro` | Strong with Chinese text rendering and text-image layouts |
| MiniMax | `image-01` | Fast and simple |
| LinkAI | Any model | Universal gateway, used as fallback |

## Model Selection

By default, the skill uses automatic routing with automatic fallback:

1. Pick the first configured provider in the order `OpenAI → Gemini → Seedream → Qwen → MiniMax → LinkAI`
2. On errors such as HTTP 401, a model that is not enabled, or network failures, automatically switch to the next configured provider
3. If the user specifies a model in the conversation (e.g. "use seedream to draw a cat"), the corresponding provider is promoted to the front
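
The three rules above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the provider identifiers, `routing_order`, `generate_with_fallback`, and the `call_provider` callback are all hypothetical names, and real code would distinguish retryable errors from fatal ones.

```python
from __future__ import annotations

from typing import Callable, Optional

# Priority order from the list above; the lowercase identifiers are assumed.
PROVIDER_ORDER = ["openai", "gemini", "seedream", "qwen", "minimax", "linkai"]


def routing_order(configured: set[str], preferred: Optional[str] = None) -> list[str]:
    """Return configured providers in priority order, promoting `preferred` if set."""
    order = [p for p in PROVIDER_ORDER if p in configured]
    if preferred in order:
        order.remove(preferred)
        order.insert(0, preferred)
    return order


def generate_with_fallback(
    prompt: str,
    configured: set[str],
    call_provider: Callable[[str, str], str],
    preferred: Optional[str] = None,
) -> tuple[str, str]:
    """Try each provider in turn; return (provider, result) from the first success."""
    errors: dict[str, str] = {}
    for provider in routing_order(configured, preferred):
        try:
            return provider, call_provider(provider, prompt)
        except Exception as exc:  # e.g. HTTP 401, model not enabled, network failure
            errors[provider] = str(exc)
    raise RuntimeError(f"all providers failed: {errors}")
```

Keeping the ordering logic separate from the retry loop makes the "promote the requested provider" rule easy to check on its own.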

To pin a specific model:

```json
{
  "skills": {
    "image-generation": {
      "model": "seedream-5.0-lite"
    }
  }
}
```

## Configuring API Keys

<Tip>
  We recommend configuring providers on the "Model Management" page of the [Web console](/channels/web). Chat model keys configured there are reused automatically by the image generation skill, so you do not need to set them twice. You can also edit the configuration file manually, or set keys temporarily in a conversation with the `env_config` tool.
</Tip>

The skill reads the same credential fields as the main model providers:

| Field | Provider |
| --- | --- |
| `openai_api_key` | OpenAI |
| `gemini_api_key` | Gemini |
| `ark_api_key` | Volcengine Ark (Seedream) |
| `dashscope_api_key` | Alibaba DashScope (Qwen) |
| `minimax_api_key` | MiniMax |
| `linkai_api_key` | LinkAI |
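
As a rough illustration of how the table maps onto provider availability, the sketch below checks which providers have a non-empty key in a configuration dictionary. The field names come from the table; the `configured_providers` helper and the lowercase provider identifiers are assumptions made for this example, not part of the project's API.

```python
# Field names are from the table above; provider identifiers are assumed.
KEY_FIELDS = {
    "openai_api_key": "openai",
    "gemini_api_key": "gemini",
    "ark_api_key": "seedream",
    "dashscope_api_key": "qwen",
    "minimax_api_key": "minimax",
    "linkai_api_key": "linkai",
}


def configured_providers(config: dict) -> list[str]:
    """List providers whose key field is present and non-blank, in table order."""
    return [
        provider
        for field, provider in KEY_FIELDS.items()
        if str(config.get(field) or "").strip()
    ]


if __name__ == "__main__":
    sample = {"ark_api_key": "example-key", "openai_api_key": ""}
    print(configured_providers(sample))
```

An empty result corresponds to the "needs configuration" state, in which no provider can be called.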

## Enabling and Disabling

The skill automatically adjusts its status based on API keys:

- **Key configured**: the Agent calls the skill directly when it receives a drawing request
- **Key not configured**: the skill still appears in context (marked as "needs configuration"), and the Agent guides the user through setting up a key

To control it manually:

```text
/skill disable image-generation    # Disable
/skill enable image-generation     # Re-enable
```

Equivalent terminal commands: `cow skill disable image-generation` / `cow skill enable image-generation`.

## Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `prompt` | string | Yes | — | Image description |
| `image_url` | string / list | No | null | Input image for editing — local path or URL; pass a list for multi-image fusion |
| `quality` | string | No | auto | `low` / `medium` / `high`; not supported by Gemini, Seedream, Qwen, or MiniMax |
| `size` | string | No | auto | `512` / `1K` / `2K` / `3K` / `4K`, or a pixel size such as `1024x1024` |
| `aspect_ratio` | string | No | null | `1:1` / `3:2` / `2:3` / `16:9` / `9:16` / `21:9`; Gemini also supports `1:4` / `4:1` / `1:8` / `8:1` |

<Warning>
  **Higher quality and larger size cost more and take longer.** For everyday conversations, use the defaults (`auto`) or `quality=low` + `size=1K`, which takes about 20 seconds per image. For posters, or when high resolution is explicitly requested, use `quality=high` + `size=2K/4K`, which may take 1–5 minutes.
</Warning>
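
The accepted values in the parameter table can be checked before a request is sent. The following is a hedged sketch only: `validate_params`, the `provider` argument, and the error messages are hypothetical, and the skill itself may validate differently or defer to each provider.

```python
import re

# Allowed values copied from the parameter table above.
QUALITIES = {"low", "medium", "high"}
SIZE_TIERS = {"512", "1K", "2K", "3K", "4K"}
ASPECT_RATIOS = {"1:1", "3:2", "2:3", "16:9", "9:16", "21:9"}
GEMINI_EXTRA_RATIOS = {"1:4", "4:1", "1:8", "8:1"}
PIXEL_SIZE = re.compile(r"^\d+x\d+$")  # e.g. 1024x1024


def validate_params(params: dict, provider: str = "") -> list[str]:
    """Return a list of problems; an empty list means the parameters look valid."""
    problems = []

    prompt = params.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("prompt is required")

    quality = params.get("quality", "auto")
    if quality != "auto" and quality not in QUALITIES:
        problems.append(f"unsupported quality: {quality}")

    size = str(params.get("size", "auto"))
    if size != "auto" and size not in SIZE_TIERS and not PIXEL_SIZE.match(size):
        problems.append(f"unsupported size: {size}")

    ratio = params.get("aspect_ratio")
    allowed = set(ASPECT_RATIOS)
    if provider == "gemini":  # assumed provider identifier
        allowed |= GEMINI_EXTRA_RATIOS
    if ratio is not None and ratio not in allowed:
        problems.append(f"unsupported aspect_ratio: {ratio}")

    return problems
```

For example, `validate_params({"prompt": "a cat", "size": "1K", "quality": "low"})` returns an empty list, while an `aspect_ratio` of `1:8` is accepted only when `provider` is `"gemini"`.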

## Common Use Cases

- **Text-to-image**: generate illustrations, posters, icons, avatars, storyboards, etc. from a description
- **Image-to-image**: change styles, swap elements, add decorations or text on an existing image
- **Multi-image fusion**: combine multiple reference images into one (outfit swaps, character group photos, etc.)

<Note>
- Set the Bash timeout to 600 seconds: each provider call has a 300-second HTTP timeout, and the script may try several providers in sequence
- Input images are automatically compressed to ≤ 4 MB with the longest edge ≤ 4096 px
- Gemini / Seedream / Qwen / MiniMax do not support the `quality` parameter
- Seedream defaults to 2K; `seedream-5.0-lite` supports up to 3K; `seedream-4.5` supports up to 4K
</Note>
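
Two of the limits in the note reduce to simple arithmetic, sketched below. The constants come from the note; the helper names are hypothetical, and the sketch covers only the pixel-dimension limit, not the 4 MB file-size compression.

```python
MAX_EDGE_PX = 4096        # longest-edge limit from the note
HTTP_TIMEOUT_S = 300      # per-provider HTTP timeout from the note
BASH_TIMEOUT_S = 600      # recommended Bash timeout from the note


def fit_longest_edge(width: int, height: int, max_edge: int = MAX_EDGE_PX) -> tuple[int, int]:
    """Scale dimensions down proportionally so the longest edge is at most max_edge."""
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    return (
        max(1, round(width * max_edge / longest)),
        max(1, round(height * max_edge / longest)),
    )


def full_timeouts_within_budget(budget_s: int = BASH_TIMEOUT_S,
                                per_call_s: int = HTTP_TIMEOUT_S) -> int:
    """How many provider calls can each run to their full timeout within the budget."""
    return budget_s // per_call_s


if __name__ == "__main__":
    print(fit_longest_edge(8192, 4096))   # (4096, 2048)
    print(full_timeouts_within_budget())  # 2
```

With these numbers, a 600-second Bash timeout leaves room for two provider attempts that each run to the full 300-second HTTP timeout; attempts that fail quickly, such as an HTTP 401, use far less of the budget.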