feat(skill): multi-provider image generation with auto-fallback

- Add Gemini, Seedream (Volcengine Ark), Qwen (DashScope) and MiniMax providers to the image-generation skill, with universal sequential fallback: OpenAI → Gemini → Seedream → Qwen → MiniMax → LinkAI
- Each provider maps unsupported size tiers to valid ones (e.g. Seedream 1K→2K, Qwen 3K→2K, Gemini 3K→2K)
- A pinned model only tries its native provider; auto-routing uses each provider's default model
- Support skill-namespaced config (config.skill.image-generation.model → SKILL_IMAGE_GENERATION_MODEL env var)
- Add an image lightbox (click-to-enlarge) to the web console
- Add docs for built-in skills (skill-creator, knowledge-wiki, image-generation) under docs/skills/
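The routing and size rules above can be sketched in Python. This is a minimal, hypothetical illustration, not the code in `generate.py`: the helper names (`build_queue`, `clamp_tier`, `skill_env_var`), the model-prefix table, the general env-var naming rule and the choice to send unknown pinned models to LinkAI only are all assumptions. The provider order, API key names and tier fallbacks are taken from this change. Note that the `generate.py` docstring describes the native provider as being promoted to the front of the queue rather than being the only one tried; the sketch follows the commit message.

```python
from __future__ import annotations

import re

# Priority order and API key names as listed in this change.
PROVIDER_ORDER = [
    ("openai", "OPENAI_API_KEY"),
    ("gemini", "GEMINI_API_KEY"),
    ("seedream", "ARK_API_KEY"),
    ("qwen", "DASHSCOPE_API_KEY"),
    ("minimax", "MINIMAX_API_KEY"),
    ("linkai", "LINKAI_API_KEY"),
]

# Hypothetical model-name prefix -> native provider table (loosely follows the
# model families listed in the generate.py docstring).
NATIVE_PREFIXES = [
    ("gpt-image", "openai"),
    ("nano-banana", "gemini"),
    ("gemini-", "gemini"),
    ("doubao-seedream", "seedream"),
    ("seedream", "seedream"),
    ("qwen-image", "qwen"),
    ("image-01", "minimax"),
    ("minimax", "minimax"),
]

# Valid size tiers and fallbacks, mirroring the Gemini/Seedream/Qwen providers.
VALID_TIERS = {
    "gemini": {"512", "1K", "2K", "4K"},
    "seedream": {"2K", "3K", "4K"},
    "qwen": {"1K", "2K"},
}
TIER_FALLBACK = {
    "gemini": {"3K": "2K"},
    "seedream": {"512": "2K", "1K": "2K"},
    "qwen": {"512": "1K", "3K": "2K", "4K": "2K"},
}


def native_provider(model: str) -> str | None:
    """Return the provider that natively serves `model`, or None if unknown."""
    name = model.lower()
    for prefix, provider in NATIVE_PREFIXES:
        if name.startswith(prefix):
            return provider
    return None


def build_queue(env: dict, model: str | None = None) -> list[str]:
    """Providers to try, in order; providers without an API key are skipped."""
    configured = [name for name, key in PROVIDER_ORDER if env.get(key)]
    if not model:
        return configured  # auto-routing: each provider uses its default model
    # Assumption: unknown model names go to LinkAI, the universal proxy.
    native = native_provider(model) or "linkai"
    # A pinned model only tries its native provider (per the commit message).
    return [native] if native in configured else []


def clamp_tier(provider: str, tier: str) -> str | None:
    """Map a tier to one the provider accepts; None means still unsupported."""
    tier = tier.upper()
    if provider not in VALID_TIERS:
        return tier  # OpenAI / LinkAI / MiniMax are not tier-clamped in this sketch
    tier = TIER_FALLBACK[provider].get(tier, tier)
    return tier if tier in VALID_TIERS[provider] else None


def skill_env_var(skill: str, key: str) -> str:
    """Hypothetical rule: 'image-generation', 'model' -> 'SKILL_IMAGE_GENERATION_MODEL'."""
    return re.sub(r"[^A-Za-z0-9]", "_", f"skill_{skill}_{key}").upper()


env = {"ARK_API_KEY": "x", "LINKAI_API_KEY": "y"}
print(build_queue(env))                            # ['seedream', 'linkai']
print(build_queue(env, "seedream-4.5"))            # ['seedream']
print(clamp_tier("qwen", "3K"))                    # 2K
print(skill_env_var("image-generation", "model"))  # SKILL_IMAGE_GENERATION_MODEL
```

Returning `None` from `clamp_tier` only marks a tier as still unsupported; the real providers differ in what they do next (Gemini omits `imageSize`, Seedream falls back to 2K, Qwen to 1K).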
2026-07-17 11:07:11 +08:00 · 2026-04-23 12:39:39 +08:00
parent 81e8bb62ae
commit 68ce2e5232
16 changed files with 2189 additions and 84 deletions
--- a/skills/image-generation/SKILL.md
+++ b/skills/image-generation/SKILL.md
@@ -6,12 +6,24 @@ metadata:
    requires:
      anyEnv:
        - OPENAI_API_KEY
+        - GEMINI_API_KEY
+        - ARK_API_KEY
+        - DASHSCOPE_API_KEY
+        - MINIMAX_API_KEY
        - LINKAI_API_KEY
 ---

 # Image Generation

-Generate and edit images using AI models (GPT-Image-2, GPT-Image-1, etc.).
+Generate and edit images using AI models. The script automatically picks a backend based on which API keys are configured — **you don't need to specify a model unless the user explicitly names one**.
+
+Supported models (passed via `model` only when the user asks for a specific one):
+
+- **OpenAI** — `gpt-image-2`, `gpt-image-1`
+- **Gemini Nano Banana** — `nano-banana-2`, `nano-banana-pro`, `nano-banana`
+- **Seedream (Volcengine Ark)** — `seedream-5.0-lite`, `seedream-4.5`
+- **Qwen (DashScope)** — `qwen-image-2.0`, `qwen-image-2.0-pro`
+- **MiniMax** — `image-01`

 ## Usage

@@ -21,18 +33,19 @@ Run `scripts/generate.py` with a JSON argument. The path is relative to this ski
 python <base_dir>/scripts/generate.py '<json_args>'
 ```

-**Set bash timeout to at least 300 seconds**, as image generation can take 30–200s depending on quality/size.
+**Set bash timeout to at least 600 seconds**, as image generation can take 30–200s per provider, and the script may try multiple providers sequentially.

 ### Parameters

 | Parameter | Type | Required | Default | Description |
 |-----------|------|----------|---------|-------------|
 | `prompt` | string | yes | — | Image description |
-| `model` | string | no | `gpt-image-2` | Model name (`gpt-image-2`, `gpt-image-1`) |
-| `image_url` | string / list | no | null | Input image(s) for editing: local file path or URL |
-| `quality` | string | no | auto | `low` / `medium` / `high`; omit to let the model choose |
-| `size` | string | no | auto | `1K`/`2K`/`4K`, pixel value (`1024x1024`), or omit to let the model choose |
-| `aspect_ratio` | string | no | null | `1:1` / `3:2` / `2:3` / `16:9` / `9:16` |
+| `image_url` | string / list | no | null | Input image(s) for editing: local file path or URL. Multi-image fusion is supported (pass a list) |
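+| `quality` | string | no | auto | `low` / `medium` / `high`; only the OpenAI and LinkAI backends forward it |
+| `size` | string | no | auto | `512` / `1K` / `2K` / `3K` / `4K`, or pixel value (`1024x1024`); a tier a backend does not support is mapped to a supported one (e.g. Seedream `1K` → `2K`) |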
+| `aspect_ratio` | string | no | null | `1:1` / `3:2` / `2:3` / `16:9` / `9:16` / `21:9` (some backends also support extreme ratios like `1:4` / `8:1`) |
+
+**Higher `quality` and larger `size` cost more and run slower.** Default to omitting both (`auto`) so the model picks a balanced setting. Only raise them when the user explicitly asks for high quality / a poster / print-ready output. For quick previews or chat scenarios prefer `quality=low` + `size=1K`.

 ### Example — generate

@@ -40,28 +53,26 @@ python <base_dir>/scripts/generate.py '<json_args>'
 python <base_dir>/scripts/generate.py '{"prompt": "A corgi astronaut floating in space"}'
 ```

-With explicit quality/size:
+With aspect ratio:

 ```bash
-python <base_dir>/scripts/generate.py '{"prompt": "A corgi astronaut", "quality": "low", "size": "1K", "aspect_ratio": "1:1"}'
+python <base_dir>/scripts/generate.py '{"prompt": "Isometric miniature city of Shanghai at sunset", "size": "2K", "aspect_ratio": "16:9"}'
 ```

 ### Important: Editing vs Generating

-When the user asks to **edit, modify, or improve an existing image**, you need to pass the original image via `image_url`. Prefer passing **local file paths** directly — the script handles file reading internally. Without `image_url`, the script generates a brand-new image instead of editing.
+When the user asks to **edit, modify, or improve an existing image**, pass the original image via `image_url`. Prefer **local file paths** directly — the script handles file reading internally. Without `image_url`, the script generates a brand-new image instead of editing.

 ### Example — edit (image-to-image)

-Local file (preferred):
-
 ```bash
 python <base_dir>/scripts/generate.py '{"prompt": "Add a Santa hat to the dog", "image_url": "/path/to/dog.png"}'
 ```

-URL:
+Multi-image fusion — pass a list:

 ```bash
-python <base_dir>/scripts/generate.py '{"prompt": "Make the background blue", "image_url": "https://example.com/photo.png"}'
+python <base_dir>/scripts/generate.py '{"prompt": "Combine these characters into a group photo", "image_url": ["/path/a.png", "/path/b.png"]}'
 ```

 ### Output
@@ -70,6 +81,7 @@ Prints JSON to stdout:

 ```json
 {
+  "model": "doubao-seedream-5-0-260128",
  "images": [
    {"url": "/path/to/output.png"}
  ]
@@ -86,39 +98,20 @@ On error:
 }
 ```

-### Environment Variables
+### Setup

-| Variable | Required | Description |
-|----------|----------|-------------|
-| `OPENAI_API_KEY` | yes (unless using LinkAI) | OpenAI API key |
-| `OPENAI_API_BASE` | no | Custom API base URL (default: `https://api.openai.com/v1`) |
-| `LINKAI_API_KEY` | alt | LinkAI API key (used when `OPENAI_API_KEY` is absent) |
-| `LINKAI_API_BASE` | no | LinkAI API base URL |
+The script needs **at least one** of these API keys (set via `env_config` or `config.json`):

-### Size + Aspect Ratio Resolution
+`OPENAI_API_KEY` / `GEMINI_API_KEY` / `ARK_API_KEY` / `DASHSCOPE_API_KEY` / `MINIMAX_API_KEY` / `LINKAI_API_KEY`

-`size` and `aspect_ratio` are combined to determine the actual pixel dimensions:
-
-| size | aspect_ratio | pixels |
-|------|-------------|--------|
-| `1K` | `1:1` | 1024×1024 |
-| `1K` | `3:2` | 1536×1024 |
-| `1K` | `2:3` | 1024×1536 |
-| `2K` | `1:1` | 2048×2048 |
-| `2K` | `16:9` | 2048×1152 |
-| `2K` | `9:16` | 1152×2048 |
-| `4K` | `16:9` | 3840×2160 |
-| `4K` | `9:16` | 2160×3840 |
-
-When an exact match isn't found, the script tries: exact match → upgrade to higher tier with same ratio → cross-tier match by ratio → tier default.
+Each key has a matching optional `*_API_BASE` variable (e.g. `OPENAI_API_BASE`) for custom endpoints. The script automatically picks the first configured backend and falls back to the next one if it fails, so there is no need to specify a model.

 ### Error Handling

-The script internally tries all available providers (OpenAI → LinkAI) in sequence. If it returns an error, **do NOT retry with the same or similar parameters** — the failure is a configuration issue (wrong API key, unsupported API base, etc.), not a transient error. Instead, inform the user about the configuration problem and ask them to fix it (e.g. set the correct `OPENAI_API_KEY` / `OPENAI_API_BASE` via `env_config`), then retry after the configuration is updated.
+If the script returns an error after trying all configured backends, **do NOT retry with the same parameters** — the failure is almost always a configuration issue (wrong API key, unsupported API base). Tell the user to fix it via `env_config`, then retry.

 ### Notes

-- HTTP timeout is 300s — high-resolution + high-quality generation can take over 200s.
-- When `quality` and `size` are omitted, the API uses `auto` — the model picks the best quality/size based on the prompt.
-- `quality=low` + `size=1K` is the fastest combination (~20s). Use when speed matters more than fidelity.
+- HTTP timeout is 300s per provider request; high-resolution generation can take over 200s.
+- Omit `quality` / `size` to let the model pick automatically (`auto`).
 - Input images for editing are auto-compressed to ≤ 4MB / longest edge ≤ 4096px.
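When the skill is driven from Python rather than through a bash tool, the Usage, Output and Error Handling sections of SKILL.md map onto roughly the following. It is a sketch under assumptions: `run_generate` and `image_paths` are hypothetical helper names, stdout is assumed to contain only the JSON result, the 600-second timeout mirrors the recommended bash timeout, and because the error JSON shape is not reproduced here, any result without an `images` list is treated as a failure.

```python
import json
import subprocess
import sys


def run_generate(script_path: str, prompt: str, timeout: int = 600, **params) -> dict:
    """Run generate.py with a single JSON argument and parse the JSON it prints."""
    # Drop unset options so the script can fall back to `auto`.
    args = {"prompt": prompt, **{k: v for k, v in params.items() if v is not None}}
    proc = subprocess.run(
        [sys.executable, script_path, json.dumps(args)],
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired when exceeded
    )
    if not proc.stdout.strip():
        raise RuntimeError(f"no output from generate.py: {proc.stderr.strip()}")
    return json.loads(proc.stdout)


def image_paths(result: dict) -> list[str]:
    """Return the local image paths; any result without images counts as an error."""
    images = result.get("images")
    if not images:
        raise RuntimeError(f"image generation failed: {result}")
    return [img["url"] for img in images]


# Usage (<base_dir> is a placeholder for the skill's base directory):
# result = run_generate("<base_dir>/scripts/generate.py", "A corgi astronaut",
#                       size="2K", aspect_ratio="16:9")
# print(result.get("model"), image_paths(result))
```

Per the Error Handling guidance, a `RuntimeError` from `image_paths` should be reported to the user as a configuration problem rather than retried with the same parameters.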
--- a/skills/image-generation/scripts/generate.py
+++ b/skills/image-generation/scripts/generate.py
@@ -5,8 +5,17 @@ Unified image generation script.
 Usage:
    python generate.py '<json_args>'

-Supports GPT-Image-2 / GPT-Image-1 via the OpenAI-compatible Images API.
-Designed for easy extension to other providers (Gemini, etc.).
+Supported model families (each provider is tried in priority order:
+OpenAI → Gemini → Seedream → Qwen → MiniMax → LinkAI; missing API keys
+are skipped, and the provider that natively owns the requested model is
+promoted to the front of the queue):
+
+    - gpt-image-2 / gpt-image-1                    → OpenAI
+    - nano-banana / gemini-*-image-*               → Gemini
+    - doubao-seedream-* / seedream-*               → Seedream (Volcengine Ark)
+    - qwen-image-2.0 / qwen-image-2.0-pro / etc.   → Qwen (DashScope)
+    - image-01 / minimax-image                     → MiniMax
+    - any model                                    → LinkAI (universal proxy)

 Dependencies: requests (stdlib: json, sys, os, base64, io, abc, uuid, pathlib, urllib)
 """
@@ -16,6 +25,7 @@ import sys
 import os
 import base64
 import io
+import time
 import uuid
 import re
 from abc import ABC, abstractmethod
@@ -192,9 +202,14 @@ class ImageProvider(ABC):
        image_url: str | list | None = None,
        quality: str | None = None,
        size: str | None = None,
+        aspect_ratio: str | None = None,
        output_dir: str = ".",
    ) -> list[str]:
-        """Generate image(s) and return list of local file paths."""
+        """Generate image(s) and return list of local file paths.
+
+        `size` may be a tier ("1K" / "2K" / "4K" / "512") or pixels ("WxH").
+        Providers that need pixel sizes should call `resolve_size(size, aspect_ratio)`.
+        """
        ...


@@ -205,10 +220,12 @@ class ImageProvider(ABC):
 class OpenAIProvider(ImageProvider):
    """Provider for OpenAI Image API (generations + edits)."""

+    DEFAULT_MODEL = "gpt-image-2"
+
    def __init__(self, api_key: str, api_base: str, model: str):
        self.api_key = api_key
        self.api_base = api_base.rstrip("/")
-        self.model = model
+        self.model = model or self.DEFAULT_MODEL

    def _headers(self) -> dict:
        return {
@@ -267,11 +284,14 @@ class OpenAIProvider(ImageProvider):
        image_url=None,
        quality: str | None = None,
        size: str | None = None,
+        aspect_ratio: str | None = None,
        output_dir: str = ".",
    ) -> list[str]:
+        # OpenAI Images API expects pixel size like 1024x1024.
+        resolved = resolve_size(size, aspect_ratio) if (size or aspect_ratio) else None
        if image_url:
-            return self._edit(prompt, image_url=image_url, quality=quality, size=size, output_dir=output_dir)
-        return self._create(prompt, quality=quality, size=size, output_dir=output_dir)
+            return self._edit(prompt, image_url=image_url, quality=quality, size=resolved, output_dir=output_dir)
+        return self._create(prompt, quality=quality, size=resolved, output_dir=output_dir)

    def _create(self, prompt: str, *, quality: str | None, size: str | None, output_dir: str) -> list[str]:
        url = f"{self.api_base}/images/generations"
@@ -337,10 +357,12 @@ class OpenAIProvider(ImageProvider):
 class LinkAIProvider(ImageProvider):
    """Provider for LinkAI unified image generation API."""

+    DEFAULT_MODEL = "gpt-image-2"
+
    def __init__(self, api_key: str, api_base: str, model: str):
        self.api_key = api_key
        self.api_base = api_base.rstrip("/")
-        self.model = model
+        self.model = model or self.DEFAULT_MODEL

    def generate(
        self,
@@ -349,6 +371,7 @@ class LinkAIProvider(ImageProvider):
        image_url=None,
        quality: str | None = None,
        size: str | None = None,
+        aspect_ratio: str | None = None,
        output_dir: str = ".",
    ) -> list[str]:
        url = f"{self.api_base}/v1/images/generations"
@@ -358,8 +381,12 @@ class LinkAIProvider(ImageProvider):
        }
        if quality:
            payload["quality"] = quality
+        # LinkAI accepts both pixel sizes (1024x1024) and tier shorthand (1K/2K/4K).
+        # Pass through whatever the caller gave us; also forward aspect_ratio.
        if size:
            payload["size"] = size
+        if aspect_ratio:
+            payload["aspect_ratio"] = aspect_ratio
        if image_url:
            urls = image_url if isinstance(image_url, list) else [image_url]
            resolved = []
@@ -408,23 +435,654 @@ class LinkAIProvider(ImageProvider):
        return paths


+# ---------------------------------------------------------------------------
+# Gemini provider (Nano Banana family — gemini-*-image-*)
+# ---------------------------------------------------------------------------
+
+# Friendly aliases → real Gemini model id
+_GEMINI_MODEL_ALIASES = {
+    "nano-banana": "gemini-2.5-flash-image",
+    "nano-banana-2": "gemini-3.1-flash-image-preview",
+    "nano-banana-pro": "gemini-3-pro-image-preview",
+}
+
+
+class GeminiProvider(ImageProvider):
+    """Provider for Google Gemini native image generation (Nano Banana family)."""
+
+    DEFAULT_MODEL = "gemini-3.1-flash-image-preview"  # nano-banana-2
+
+    def __init__(self, api_key: str, api_base: str, model: str):
+        self.api_key = api_key
+        self.api_base = api_base.rstrip("/")
+        self.model = _GEMINI_MODEL_ALIASES.get(model, model or self.DEFAULT_MODEL)
+
+    def generate(
+        self,
+        prompt: str,
+        *,
+        image_url=None,
+        quality: str | None = None,  # not used; Gemini has no `quality` param
+        size: str | None = None,
+        aspect_ratio: str | None = None,
+        output_dir: str = ".",
+    ) -> list[str]:
+        # Build request parts: prompt text + optional inline images
+        parts: list[dict] = [{"text": prompt}]
+        if image_url:
+            urls = image_url if isinstance(image_url, list) else [image_url]
+            for u in urls:
+                data = _compress_image(_load_image(u))
+                mime = _guess_mime(data)
+                parts.append({
+                    "inline_data": {
+                        "mime_type": mime,
+                        "data": base64.b64encode(data).decode(),
+                    }
+                })
+
+        payload: dict = {
+            "contents": [{"parts": parts}],
+            "generationConfig": {"responseModalities": ["IMAGE"]},
+        }
+
+        # Gemini natively supports aspectRatio + imageSize tiers (512/1K/2K/4K).
+        _GEMINI_VALID_TIERS = {"512", "1K", "2K", "4K"}
+        _GEMINI_TIER_FALLBACK = {"3K": "2K"}
+        image_config: dict = {}
+        if size:
+            if "x" in size.lower():
+                tier = _pixels_to_tier(size)
+            else:
+                tier = size.upper()
+            tier = _GEMINI_TIER_FALLBACK.get(tier, tier)
+            if tier in _GEMINI_VALID_TIERS:
+                image_config["imageSize"] = tier
+        if aspect_ratio:
+            image_config["aspectRatio"] = aspect_ratio
+        elif size and "x" in size.lower():
+            ratio = _pixels_to_ratio(size)
+            if ratio:
+                image_config["aspectRatio"] = ratio
+        if image_config:
+            payload["generationConfig"]["imageConfig"] = image_config
+
+        url = f"{self.api_base}/v1beta/models/{self.model}:generateContent"
+        headers = {
+            "x-goog-api-key": self.api_key,
+            "Content-Type": "application/json",
+        }
+
+        if _HAS_REQUESTS:
+            resp = requests.post(url, headers=headers, json=payload, timeout=300)
+            if resp.status_code >= 400:
+                try:
+                    body = resp.json()
+                    msg = body.get("error", {}).get("message") or resp.text
+                except Exception:
+                    msg = resp.text or resp.reason
+                raise RuntimeError(f"API {resp.status_code}: {msg}")
+            result = resp.json()
+        else:
+            data = json.dumps(payload).encode()
+            req = Request(url, data=data, headers=headers, method="POST")
+            with urlopen(req, timeout=300) as r:
+                result = json.loads(r.read())
+
+        return self._extract_images(result, output_dir)
+
+    @staticmethod
+    def _extract_images(result: dict, output_dir: str) -> list[str]:
+        paths: list[str] = []
+        for cand in result.get("candidates", []):
+            for part in cand.get("content", {}).get("parts", []):
+                if part.get("thought"):
+                    continue  # skip thinking-stage interim images
+                inline = part.get("inlineData") or part.get("inline_data")
+                if inline and inline.get("data"):
+                    raw = base64.b64decode(inline["data"])
+                    paths.append(_save_image(raw, output_dir))
+        if not paths:
+            # Surface the model's text reply (often a refusal explanation)
+            for cand in result.get("candidates", []):
+                for part in cand.get("content", {}).get("parts", []):
+                    if part.get("text"):
+                        raise RuntimeError(f"Gemini returned no image: {part['text'][:200]}")
+            raise RuntimeError("Gemini returned no image (empty response)")
+        return paths
+
+
+def _guess_mime(data: bytes) -> str:
+    if data[:3] == b"\xff\xd8\xff":
+        return "image/jpeg"
+    if data[:4] == b"RIFF":
+        return "image/webp"
+    if data[:8] == b"\x89PNG\r\n\x1a\n":
+        return "image/png"
+    return "image/png"
+
+
+def _pixels_to_tier(pixel_str: str) -> str:
+    """Map 'WxH' to nearest Gemini tier (512 / 1K / 2K / 4K)."""
+    try:
+        w, h = (int(x) for x in pixel_str.lower().split("x"))
+        long_edge = max(w, h)
+    except Exception:
+        return "1K"
+    if long_edge <= 768:
+        return "512"
+    if long_edge <= 1536:
+        return "1K"
+    if long_edge <= 3072:
+        return "2K"
+    return "4K"
+
+
+def _pixels_to_ratio(pixel_str: str) -> str | None:
+    """Map 'WxH' to a Gemini-supported aspect ratio string when possible."""
+    try:
+        w, h = (int(x) for x in pixel_str.lower().split("x"))
+    except Exception:
+        return None
+    # Reduce W:H to lowest terms (e.g. 1920x1080 -> 16:9)
+    from math import gcd
+    g = gcd(w, h)
+    rw, rh = w // g, h // g
+    candidate = f"{rw}:{rh}"
+    supported = {"1:1", "1:4", "1:8", "2:3", "3:2", "3:4", "4:1", "4:3",
+                 "4:5", "5:4", "8:1", "9:16", "16:9", "21:9"}
+    return candidate if candidate in supported else None
+
+
+# ---------------------------------------------------------------------------
+# Seedream provider (Volcengine Ark, OpenAI-compatible /images/generations)
+# ---------------------------------------------------------------------------
+
+# Friendly aliases → real Seedream model id (Ark Model IDs).
+_SEEDREAM_MODEL_ALIASES = {
+    "seedream": "doubao-seedream-5-0-260128",
+    "seedream-lite": "doubao-seedream-5-0-260128",
+    "seedream-5.0": "doubao-seedream-5-0-260128",
+    "seedream-5.0-lite": "doubao-seedream-5-0-260128",
+    "seedream-5-0-lite": "doubao-seedream-5-0-260128",
+    "doubao-seedream-5-0": "doubao-seedream-5-0-260128",
+    "doubao-seedream-5-0-lite": "doubao-seedream-5-0-260128",
+    "seedream-4.5": "doubao-seedream-4-5-251128",
+    "seedream-4-5": "doubao-seedream-4-5-251128",
+    "doubao-seedream-4-5": "doubao-seedream-4-5-251128",
+}
+
+# Seedream accepts either a coarse tier ("2K"/"3K"/"4K") or an explicit "WxH".
+# When (tier, aspect_ratio) is in the table below we send the recommended pixel
+# size from the Ark docs; otherwise we send the tier itself.
+# Valid size tiers: 5.0 lite supports 2K/3K, 4.5 supports 2K/4K.
+# Unsupported tiers (512, 1K) are bumped up to 2K.
+_SEEDREAM_VALID_TIERS = {"2K", "3K", "4K"}
+_SEEDREAM_TIER_FALLBACK = {"512": "2K", "1K": "2K"}
+_SEEDREAM_SIZE_TABLE = {
+    # (tier, ratio) -> "WxH" recommended pixel sizes (Seedream 5.0 lite + 4.5 share most)
+    ("2K", "1:1"): "2048x2048",
+    ("2K", "3:4"): "1728x2304",
+    ("2K", "4:3"): "2304x1728",
+    ("2K", "16:9"): "2848x1600",
+    ("2K", "9:16"): "1600x2848",
+    ("2K", "3:2"): "2496x1664",
+    ("2K", "2:3"): "1664x2496",
+    ("2K", "21:9"): "3136x1344",
+    ("3K", "1:1"): "3072x3072",
+    ("3K", "3:4"): "2592x3456",
+    ("3K", "4:3"): "3456x2592",
+    ("3K", "16:9"): "4096x2304",
+    ("3K", "9:16"): "2304x4096",
+    ("3K", "2:3"): "2496x3744",
+    ("3K", "3:2"): "3744x2496",
+    ("3K", "21:9"): "4704x2016",
+    ("4K", "1:1"): "4096x4096",
+    ("4K", "3:4"): "3520x4704",
+    ("4K", "4:3"): "4704x3520",
+    ("4K", "16:9"): "5504x3040",
+    ("4K", "9:16"): "3040x5504",
+    ("4K", "2:3"): "3328x4992",
+    ("4K", "3:2"): "4992x3328",
+    ("4K", "21:9"): "6240x2656",
+}
+
+
+class SeedreamProvider(ImageProvider):
+    """Provider for Volcengine Ark Seedream image generation API.
+
+    The endpoint is OpenAI-compatible (POST {base}/images/generations) but
+    accepts an extra `image` field (string or list) for image-to-image and
+    multi-image fusion, plus `sequential_image_generation` / `watermark` flags.
+    Reference docs accept both `2K` shorthand and explicit `WxH` for `size`.
+    """
+
+    DEFAULT_MODEL = "doubao-seedream-5-0-260128"  # seedream 5.0 lite
+
+    def __init__(self, api_key: str, api_base: str, model: str):
+        self.api_key = api_key
+        self.api_base = api_base.rstrip("/")
+        self.model = _SEEDREAM_MODEL_ALIASES.get((model or "").lower(), model or self.DEFAULT_MODEL)
+
+    def generate(
+        self,
+        prompt: str,
+        *,
+        image_url=None,
+        quality: str | None = None,  # not honoured by Seedream
+        size: str | None = None,
+        aspect_ratio: str | None = None,
+        output_dir: str = ".",
+    ) -> list[str]:
+        url = f"{self.api_base}/images/generations"
+
+        payload: dict = {
+            "model": self.model,
+            "prompt": prompt,
+            "response_format": "url",
+            "watermark": False,
+        }
+
+        # Default to 2K (Seedream 5.0 lite minimum tier), unless caller picks one.
+        seedream_size = self._resolve_seedream_size(size, aspect_ratio)
+        if seedream_size:
+            payload["size"] = seedream_size
+
+        # Image-to-image / multi-image fusion (up to 14 reference images).
+        if image_url:
+            urls = image_url if isinstance(image_url, list) else [image_url]
+            prepared: list[str] = []
+            for u in urls[:14]:
+                if os.path.isfile(u):
+                    data = _compress_image(_load_image(u))
+                    mime = _guess_mime(data)
+                    prepared.append(f"data:{mime};base64,{base64.b64encode(data).decode()}")
+                else:
+                    prepared.append(u)
+            payload["image"] = prepared if len(prepared) > 1 else prepared[0]
+
+        headers = {
+            "Authorization": f"Bearer {self.api_key}",
+            "Content-Type": "application/json",
+        }
+
+        if _HAS_REQUESTS:
+            resp = requests.post(url, headers=headers, json=payload, timeout=300)
+            if resp.status_code >= 400:
+                try:
+                    body = resp.json()
+                    err = body.get("error") or {}
+                    msg = err.get("message") or body.get("message") or resp.text
+                except Exception:
+                    msg = resp.text or resp.reason
+                raise RuntimeError(f"API {resp.status_code}: {msg}")
+            result = resp.json()
+        else:
+            data = json.dumps(payload).encode()
+            req = Request(url, data=data, headers=headers, method="POST")
+            with urlopen(req, timeout=300) as r:
+                result = json.loads(r.read())
+
+        if result.get("error"):
+            err = result["error"]
+            raise RuntimeError(f"Seedream {err.get('code')}: {err.get('message')}")
+
+        paths: list[str] = []
+        for item in result.get("data") or []:
+            u = item.get("url")
+            b64 = item.get("b64_json")
+            if u:
+                paths.append(_save_image(_load_image(u), output_dir))
+            elif b64:
+                paths.append(_save_image(base64.b64decode(b64), output_dir))
+        if not paths:
+            raise RuntimeError(f"Seedream returned no image: {result}")
+        return paths
+
+    @staticmethod
+    def _resolve_seedream_size(size: str | None, aspect_ratio: str | None) -> str | None:
+        if not size and not aspect_ratio:
+            return "2K"
+        # Explicit pixel values: pass through (normalise separator)
+        if size and "x" in size.lower() and "*" not in size:
+            return size.lower()
+        if size and "*" in size:
+            return size.replace("*", "x")
+        tier = (size or "2K").upper()
+        # Map unsupported tiers (512, 1K) to the nearest valid one
+        tier = _SEEDREAM_TIER_FALLBACK.get(tier, tier)
+        if tier not in _SEEDREAM_VALID_TIERS:
+            tier = "2K"
+        ratio = aspect_ratio or "1:1"
+        if (tier, ratio) in _SEEDREAM_SIZE_TABLE:
+            return _SEEDREAM_SIZE_TABLE[(tier, ratio)]
+        return tier
+
+
+# ---------------------------------------------------------------------------
+# Qwen provider (DashScope multimodal-generation: qwen-image-* family)
+# ---------------------------------------------------------------------------
+
+# Friendly aliases → real Qwen model id
+_QWEN_MODEL_ALIASES = {
+    "qwen": "qwen-image-2.0-pro",
+    "qwen-image": "qwen-image-2.0-pro",
+    "qwen-image-pro": "qwen-image-2.0-pro",
+}
+
+# Qwen pixel-size table (closest match by tier+ratio).
+# qwen-image-2.0(*) supports any WxH between 512*512 and 2048*2048.
+_QWEN_SIZE_TABLE = {
+    # (tier, ratio) -> "W*H"
+    ("1K", "1:1"): "1024*1024",
+    ("1K", "16:9"): "1280*720",
+    ("1K", "9:16"): "720*1280",
+    ("1K", "4:3"): "1184*888",
+    ("1K", "3:4"): "888*1184",
+    ("1K", "3:2"): "1248*832",
+    ("1K", "2:3"): "832*1248",
+    ("2K", "1:1"): "2048*2048",
+    ("2K", "16:9"): "2688*1536",  # long edge exceeds 2048; sent as-is, _resolve_qwen_size does not clamp
+    ("2K", "9:16"): "1536*2688",
+    ("2K", "4:3"): "2368*1728",
+    ("2K", "3:4"): "1728*2368",
+}
+
+
+class QwenProvider(ImageProvider):
+    """Provider for Alibaba DashScope Qwen image API (qwen-image-2.0[-pro])."""
+
+    DEFAULT_MODEL = "qwen-image-2.0"
+
+    def __init__(self, api_key: str, api_base: str, model: str):
+        self.api_key = api_key
+        self.api_base = api_base.rstrip("/")
+        self.model = _QWEN_MODEL_ALIASES.get((model or "").lower(), model or self.DEFAULT_MODEL)
+
+    def generate(
+        self,
+        prompt: str,
+        *,
+        image_url=None,
+        quality: str | None = None,  # not supported by Qwen image API
+        size: str | None = None,
+        aspect_ratio: str | None = None,
+        output_dir: str = ".",
+    ) -> list[str]:
+        url = f"{self.api_base}/api/v1/services/aigc/multimodal-generation/generation"
+
+        # Build content array: 0..3 images then a single text part.
+        content: list[dict] = []
+        if image_url:
+            urls = image_url if isinstance(image_url, list) else [image_url]
+            for u in urls[:3]:  # API caps at 3 reference images
+                if os.path.isfile(u):
+                    data = _compress_image(_load_image(u))
+                    mime = _guess_mime(data)
+                    image_field = f"data:{mime};base64,{base64.b64encode(data).decode()}"
+                else:
+                    image_field = u
+                content.append({"image": image_field})
+        content.append({"text": prompt})
+
+        payload: dict = {
+            "model": self.model,
+            "input": {"messages": [{"role": "user", "content": content}]},
+        }
+
+        # Map (size, aspect_ratio) → Qwen "W*H"
+        qwen_size = self._resolve_qwen_size(size, aspect_ratio)
+        if qwen_size:
+            payload["parameters"] = {"size": qwen_size}
+
+        headers = {
+            "Authorization": f"Bearer {self.api_key}",
+            "Content-Type": "application/json",
+        }
+
+        if _HAS_REQUESTS:
+            resp = requests.post(url, headers=headers, json=payload, timeout=300)
+            if resp.status_code >= 400:
+                try:
+                    body = resp.json()
+                    msg = body.get("message") or body.get("error", {}).get("message") or resp.text
+                except Exception:
+                    msg = resp.text or resp.reason
+                raise RuntimeError(f"API {resp.status_code}: {msg}")
+            result = resp.json()
+        else:
+            data = json.dumps(payload).encode()
+            req = Request(url, data=data, headers=headers, method="POST")
+            with urlopen(req, timeout=300) as r:
+                result = json.loads(r.read())
+
+        # Business-level errors arrive on HTTP 200 with a `code` field.
+        if result.get("code"):
+            raise RuntimeError(f"Qwen {result.get('code')}: {result.get('message')}")
+
+        paths: list[str] = []
+        choices = (result.get("output") or {}).get("choices") or []
+        for ch in choices:
+            for part in ((ch.get("message") or {}).get("content") or []):
+                u = part.get("image")
+                if u:
+                    paths.append(_save_image(_load_image(u), output_dir))
+        if not paths:
+            raise RuntimeError(f"Qwen returned no image: {result}")
+        return paths
+
+    @staticmethod
+    def _resolve_qwen_size(size: str | None, aspect_ratio: str | None) -> str | None:
+        if not size and not aspect_ratio:
+            return None
+        if size and "x" in size.lower() and "*" not in size:
+            return size.lower().replace("x", "*")
+        if size and "*" in size:
+            return size
+        tier = (size or "1K").upper()
+        # Qwen supports 1K and 2K; clamp others
+        _QWEN_TIER_MAP = {"512": "1K", "3K": "2K", "4K": "2K"}
+        tier = _QWEN_TIER_MAP.get(tier, tier)
+        if tier not in ("1K", "2K"):
+            tier = "1K"
+        ratio = aspect_ratio or "1:1"
+        if (tier, ratio) in _QWEN_SIZE_TABLE:
+            return _QWEN_SIZE_TABLE[(tier, ratio)]
+        return _QWEN_SIZE_TABLE.get((tier, "1:1"))
+
+
+# ---------------------------------------------------------------------------
+# MiniMax provider (image-01 family)
+# ---------------------------------------------------------------------------
+
+# Friendly aliases → real MiniMax model id
+_MINIMAX_MODEL_ALIASES = {
+    "minimax": "image-01",
+    "minimax-image": "image-01",
+    "minimax-image-01": "image-01",
+}
+
+_MINIMAX_SUPPORTED_RATIOS = {"1:1", "16:9", "4:3", "3:2", "2:3", "3:4", "9:16", "21:9"}
+
+
+class MinimaxProvider(ImageProvider):
+    """Provider for MiniMax image generation API (image-01)."""
+
+    DEFAULT_MODEL = "image-01"
+
+    def __init__(self, api_key: str, api_base: str, model: str):
+        self.api_key = api_key
+        self.api_base = api_base.rstrip("/")
+        self.model = _MINIMAX_MODEL_ALIASES.get((model or "").lower(), model or self.DEFAULT_MODEL)
+
+    def generate(
+        self,
+        prompt: str,
+        *,
+        image_url=None,
+        quality: str | None = None,  # not supported by MiniMax
+        size: str | None = None,
+        aspect_ratio: str | None = None,
+        output_dir: str = ".",
+    ) -> list[str]:
+        url = f"{self.api_base}/v1/image_generation"
+        payload: dict = {
+            "model": self.model,
+            "prompt": prompt,
+            "response_format": "base64",
+        }
+
+        # MiniMax accepts aspect_ratio directly; derive from pixels if needed.
+        ratio = aspect_ratio
+        if not ratio and size and "x" in size.lower():
+            ratio = _pixels_to_ratio(size)
+        if ratio and ratio in _MINIMAX_SUPPORTED_RATIOS:
+            payload["aspect_ratio"] = ratio
+
+        # Image-to-image uses subject_reference; accept URL or local file (→ base64).
+        if image_url:
+            urls = image_url if isinstance(image_url, list) else [image_url]
+            refs = []
+            for u in urls:
+                if os.path.isfile(u):
+                    data = _compress_image(_load_image(u))
+                    mime = _guess_mime(data)
+                    image_file = f"data:{mime};base64,{base64.b64encode(data).decode()}"
+                else:
+                    image_file = u
+                refs.append({"type": "character", "image_file": image_file})
+            payload["subject_reference"] = refs
+
+        headers = {
+            "Authorization": f"Bearer {self.api_key}",
+            "Content-Type": "application/json",
+        }
+
+        if _HAS_REQUESTS:
+            resp = requests.post(url, headers=headers, json=payload, timeout=300)
+            if resp.status_code >= 400:
+                try:
+                    body = resp.json()
+                    msg = body.get("base_resp", {}).get("status_msg") or body.get("error", {}).get("message") or resp.text
+                except Exception:
+                    msg = resp.text or resp.reason
+                raise RuntimeError(f"API {resp.status_code}: {msg}")
+            result = resp.json()
+        else:
+            data = json.dumps(payload).encode()
+            req = Request(url, data=data, headers=headers, method="POST")
+            with urlopen(req, timeout=300) as r:
+                result = json.loads(r.read())
+
+        # MiniMax returns business errors inside base_resp even on HTTP 200.
+        base_resp = result.get("base_resp") or {}
+        if base_resp.get("status_code") not in (None, 0):
+            raise RuntimeError(f"MiniMax {base_resp.get('status_code')}: {base_resp.get('status_msg')}")
+
+        data_obj = result.get("data") or {}
+        b64_list = data_obj.get("image_base64") or []
+        urls_list = data_obj.get("image_urls") or []
+
+        paths: list[str] = []
+        for b64 in b64_list:
+            paths.append(_save_image(base64.b64decode(b64), output_dir))
+        for u in urls_list:
+            paths.append(_save_image(_load_image(u), output_dir))
+        if not paths:
+            raise RuntimeError(f"MiniMax returned no image: {result}")
+        return paths
+
+
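MiniMax reports business errors in `base_resp` even when the HTTP status is 200, so the status check has to run before the payload is read. A minimal sketch of that parsing order, assuming the response shape used above (`base_resp.status_code`, `data.image_base64`). `extract_minimax_images` is a hypothetical name, and it skips the `image_urls` download path.

```python
import base64
from typing import List


def extract_minimax_images(result: dict) -> List[bytes]:
    """Decode base64 images from a MiniMax-style response (hypothetical helper)."""
    # A non-zero base_resp.status_code means failure, even on HTTP 200.
    base_resp = result.get("base_resp") or {}
    code = base_resp.get("status_code")
    if code not in (None, 0):
        raise RuntimeError(f"MiniMax {code}: {base_resp.get('status_msg')}")
    data = result.get("data") or {}
    images = [base64.b64decode(b64) for b64 in data.get("image_base64") or []]
    if not images:
        raise RuntimeError(f"MiniMax returned no image: {result}")
    return images


# Illustrative payload only; the bytes are a placeholder, not a real image.
ok = {"base_resp": {"status_code": 0}, "data": {"image_base64": [base64.b64encode(b"\x89PNG").decode()]}}
print(extract_minimax_images(ok))
```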
 # ---------------------------------------------------------------------------
 # Provider factory
 # ---------------------------------------------------------------------------

-def _build_providers(model: str) -> list[tuple[str, ImageProvider]]:
-    """Build an ordered list of (label, provider) to try."""
-    openai_key = os.environ.get("OPENAI_API_KEY", "")
-    openai_base = os.environ.get("OPENAI_API_BASE", "https://api.openai.com/v1")
-    linkai_key = os.environ.get("LINKAI_API_KEY", "")
-    linkai_base = os.environ.get("LINKAI_API_BASE", "https://api.link-ai.tech")
+# Model-prefix → preferred provider label.
+# When the requested model matches a prefix and that provider has an API key,
+# only that provider is tried: the other backends would not recognise the
+# model id. Otherwise routing is automatic (see _build_providers).
+_MODEL_PREFERRED_PROVIDER: list[tuple[tuple[str, ...], str]] = [
+    (("gpt-image",), "OpenAI"),
+    (("nano-banana", "gemini-"), "Gemini"),
+    (("seedream", "doubao-seedream"), "Seedream"),
+    (("qwen-image", "qwen"), "Qwen"),
+    (("minimax", "image-01"), "MiniMax"),
+]

-    providers = []
-    if openai_key:
-        providers.append(("OpenAI", OpenAIProvider(api_key=openai_key, api_base=openai_base, model=model)))
-    if linkai_key:
-        providers.append(("LinkAI", LinkAIProvider(api_key=linkai_key, api_base=linkai_base, model=model)))
-    return providers
+# Default global priority when the model has no preferred provider.
+_DEFAULT_PROVIDER_ORDER = ["OpenAI", "Gemini", "Seedream", "Qwen", "MiniMax", "LinkAI"]
+
+
+def _preferred_provider(model: str) -> str | None:
+    m = (model or "").lower()
+    for prefixes, label in _MODEL_PREFERRED_PROVIDER:
+        if m.startswith(prefixes):
+            return label
+    return None
+
+
+def _build_providers(model: str) -> list[tuple[str, ImageProvider]]:
+    """Build an ordered list of (label, provider) to try.
+
+    Behaviour:
+      1. Every provider with a configured API key is instantiated.
+      2. If `model` natively belongs to one of the providers AND that provider
+         is configured, only that provider is returned. No other backend
+         recognises the model id, so falling back to them is pointless.
+      3. If the preferred provider is NOT configured (no API key), every other
+         backend would reject the model id, so the explicit model is dropped
+         and routing becomes automatic: each provider then uses its own
+         DEFAULT_MODEL.
+      4. In automatic routing, the configured providers are tried in global
+         priority order: OpenAI → Gemini → Seedream → Qwen → MiniMax → LinkAI.
+         A model id that matches no known prefix is passed unchanged to each.
+    """
+    keys = {
+        "OpenAI": os.environ.get("OPENAI_API_KEY", ""),
+        "Gemini": os.environ.get("GEMINI_API_KEY", ""),
+        "Seedream": os.environ.get("ARK_API_KEY", ""),
+        "Qwen": os.environ.get("DASHSCOPE_API_KEY", ""),
+        "MiniMax": os.environ.get("MINIMAX_API_KEY", ""),
+        "LinkAI": os.environ.get("LINKAI_API_KEY", ""),
+    }
+    bases = {
+        "OpenAI": os.environ.get("OPENAI_API_BASE", "https://api.openai.com/v1"),
+        "Gemini": os.environ.get("GEMINI_API_BASE", "https://generativelanguage.googleapis.com"),
+        "Seedream": os.environ.get("ARK_API_BASE", "https://ark.cn-beijing.volces.com/api/v3"),
+        "Qwen": os.environ.get("DASHSCOPE_API_BASE", "https://dashscope.aliyuncs.com"),
+        "MiniMax": os.environ.get("MINIMAX_API_BASE", "https://api.minimaxi.com"),
+        "LinkAI": os.environ.get("LINKAI_API_BASE", "https://api.link-ai.tech"),
+    }
+
+    pref = _preferred_provider(model)
+
+    # If a specific model is requested and its native provider has no key,
+    # other backends won't recognise the id → reset to auto routing.
+    if pref and not keys.get(pref):
+        model = ""
+        pref = None
+
+    factories = {
+        "OpenAI": OpenAIProvider,
+        "Gemini": GeminiProvider,
+        "Seedream": SeedreamProvider,
+        "Qwen": QwenProvider,
+        "MiniMax": MinimaxProvider,
+        "LinkAI": LinkAIProvider,
+    }
+    available: dict[str, ImageProvider] = {}
+    for label, key in keys.items():
+        if key:
+            available[label] = factories[label](api_key=key, api_base=bases[label], model=model)
+
+    # When a specific model is pinned, only try its native provider — other
+    # backends won't recognise the model id so retrying them is pointless.
+    if pref and pref in available:
+        return [(pref, available[pref])]
+
+    # Auto routing: try every configured provider in priority order.
+    ordered: list[str] = []
+    for label in _DEFAULT_PROVIDER_ORDER:
+        if label in available:
+            ordered.append(label)
+    return [(label, available[label]) for label in ordered]
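The routing rules in `_build_providers` reduce to a pure function over the configured keys. This sketch uses a hypothetical `plan_route` that works on labels only, with no provider objects. It returns the model id that would be sent and the order in which providers would be tried. The redundant `qwen-image` prefix is folded into `qwen`.

```python
from typing import Dict, List, Tuple

# Same priority order and prefixes as _DEFAULT_PROVIDER_ORDER / _MODEL_PREFERRED_PROVIDER.
ORDER = ["OpenAI", "Gemini", "Seedream", "Qwen", "MiniMax", "LinkAI"]
PREFIXES = [
    (("gpt-image",), "OpenAI"),
    (("nano-banana", "gemini-"), "Gemini"),
    (("seedream", "doubao-seedream"), "Seedream"),
    (("qwen",), "Qwen"),
    (("minimax", "image-01"), "MiniMax"),
]


def plan_route(model: str, keys: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return (model_to_send, provider labels to try in order)."""
    m = model.lower()
    # str.startswith accepts a tuple; the first matching entry wins.
    pref = next((label for p, label in PREFIXES if m.startswith(p)), None)
    configured = [label for label in ORDER if keys.get(label)]
    if pref and pref in configured:
        return model, [pref]      # pinned: native provider only
    if pref:
        model = ""                # native provider has no key -> auto routing
    return model, configured      # auto: every configured provider in order


keys = {"Gemini": "g-key", "Qwen": "q-key", "LinkAI": "l-key"}
print(plan_route("nano-banana-pro", keys))
print(plan_route("gpt-image-2", keys))
print(plan_route("some-other-model", keys))
```

The last call shows a consequence of the current rules. A model id that matches no prefix (a typo, for example) is not reset, so it is forwarded unchanged to every configured backend.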


 # ---------------------------------------------------------------------------
@@ -447,40 +1105,59 @@ def main():
        print(json.dumps({"error": "Missing required parameter: prompt"}))
        sys.exit(1)

-    model = args.get("model", "gpt-image-2")
+    # Model resolution priority:
+    #   1. Explicit `model` in the call args (agent / user override)
+    #   2. SKILL_IMAGE_GENERATION_MODEL env var (synced from
+    #      config["skill"]["image-generation"]["model"] at startup)
+    #   3. None → fall back to automatic provider routing (try every
+    #      provider with a configured API key in global priority order)
+    model = args.get("model") or os.environ.get("SKILL_IMAGE_GENERATION_MODEL") or ""
    quality = args.get("quality")
-    raw_size = args.get("size")
+    size = args.get("size")
    aspect_ratio = args.get("aspect_ratio")
    image_url = args.get("image_url")

-    resolved_size = resolve_size(raw_size, aspect_ratio)
-
    output_dir = os.environ.get("IMAGE_OUTPUT_DIR", os.path.join(os.getcwd(), "images"))

    providers = _build_providers(model)
    if not providers:
+        target = f"model '{model}'" if model else "image generation"
        print(json.dumps({
-            "error": "No API key configured. Please set OPENAI_API_KEY or LINKAI_API_KEY via env_config tool, then try again."
+            "error": (
+                f"No API key configured for {target}. "
+                "Set at least one of OPENAI_API_KEY / GEMINI_API_KEY / "
+                "ARK_API_KEY / DASHSCOPE_API_KEY / MINIMAX_API_KEY / "
+                "LINKAI_API_KEY via the env_config tool, then try again."
+            )
        }, ensure_ascii=False))
        sys.exit(1)

-    import time
-
    errors = []
    for label, provider in providers:
        try:
-            print(f"[image-generation] Trying {label} (model={model})...", file=sys.stderr)
+            attempt_model = getattr(provider, "model", model) or "auto"
+            print(f"[image-generation] Trying {label} (model={attempt_model})...", file=sys.stderr)
            t0 = time.time()
            paths = provider.generate(
                prompt,
                image_url=image_url,
                quality=quality,
-                size=resolved_size,
+                size=size,
+                aspect_ratio=aspect_ratio,
                output_dir=output_dir,
            )
            elapsed = time.time() - t0
-            print(f"[image-generation] ✅ {label} succeeded in {elapsed:.1f}s", file=sys.stderr)
-            result = {"images": [{"url": p} for p in paths]}
+            # Resolved model id (after alias expansion) actually sent to the API
+            actual_model = getattr(provider, "model", model)
+            print(
+                f"[image-generation] ✅ {label} succeeded in {elapsed:.1f}s "
+                f"(model={actual_model})",
+                file=sys.stderr,
+            )
+            result = {
+                "model": actual_model,
+                "images": [{"url": p} for p in paths],
+            }
            print(json.dumps(result, ensure_ascii=False))
            return
        except Exception as e:
@@ -493,8 +1170,10 @@ def main():
        "error": f"All providers failed — {hint}. "
                 "This is likely an API key or base URL configuration issue. "
                 "Do NOT retry with the same parameters. "
-                 "Ask the user to verify their OPENAI_API_KEY / OPENAI_API_BASE "
-                 "(or LINKAI_API_KEY / LINKAI_API_BASE) settings via env_config."
+                 "Ask the user to verify their API key / base URL "
+                 "(OPENAI_API_KEY, GEMINI_API_KEY, ARK_API_KEY, "
+                 "DASHSCOPE_API_KEY, MINIMAX_API_KEY, or LINKAI_API_KEY) "
+                 "via env_config."
    }, ensure_ascii=False))
    sys.exit(1)