feat(skill): multi-provider image generation with auto-fallback

- Add Gemini, Seedream (Volcengine Ark), Qwen (DashScope), MiniMax
  providers to image-generation skill with universal sequential
  fallback: OpenAI → Gemini → Seedream → Qwen → MiniMax → LinkAI
- Each provider filters unsupported size tiers to valid values
  (e.g. Seedream 1K→2K, Qwen 3K→2K, Gemini 3K→2K)
- Pinned model only tries its native provider; auto-routing uses
  each provider's default model
- Support skill-namespaced config (config.skill.image-generation.model
  → SKILL_IMAGE_GENERATION_MODEL env var)
- Add image lightbox (click-to-enlarge) in web console
- Add docs for built-in skills (skill-creator, knowledge-wiki,
  image-generation) under docs/skills/
zhayujie
2026-04-23 12:39:39 +08:00
parent 81e8bb62ae
commit 68ce2e5232
16 changed files with 2189 additions and 84 deletions


@@ -6,12 +6,24 @@ metadata:
requires:
anyEnv:
- OPENAI_API_KEY
- GEMINI_API_KEY
- ARK_API_KEY
- DASHSCOPE_API_KEY
- MINIMAX_API_KEY
- LINKAI_API_KEY
---
# Image Generation
Generate and edit images using AI models (GPT-Image-2, GPT-Image-1, etc.).
Generate and edit images using AI models. The script automatically picks a backend based on which API keys are configured — **you don't need to specify a model unless the user explicitly names one**.
Supported models (passed via `model` only when the user asks for a specific one):
- **OpenAI** — `gpt-image-2`, `gpt-image-1`
- **Gemini Nano Banana** — `nano-banana-2`, `nano-banana-pro`, `nano-banana`
- **Seedream (Volcengine Ark)** — `seedream-5.0-lite`, `seedream-4.5`
- **Qwen (DashScope)** — `qwen-image-2.0`, `qwen-image-2.0-pro`
- **MiniMax** — `image-01`
## Usage
@@ -21,18 +33,19 @@ Run `scripts/generate.py` with a JSON argument. The path is relative to this ski
python <base_dir>/scripts/generate.py '<json_args>'
```
**Set bash timeout to at least 300 seconds**, as image generation can take 30-200s depending on quality/size.
**Set bash timeout to at least 600 seconds**, as image generation can take 30-200s per provider, and the script may try multiple providers sequentially.
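For callers that drive the script from Python rather than bash, the same timeout can be applied through `subprocess`. This is a minimal sketch, not part of the skill: the helper names `build_command` and `run_generate` are hypothetical, and it assumes the script prints a single JSON object to stdout as described under Output.

```python
import json
import subprocess
import sys


def build_command(script_path: str, args: dict) -> list[str]:
    """Return argv: the current interpreter, the script, and one JSON argument."""
    return [sys.executable, script_path, json.dumps(args, ensure_ascii=False)]


def run_generate(script_path: str, args: dict, timeout: int = 600) -> dict:
    """Run generate.py and parse the JSON it prints (hypothetical helper).

    subprocess.TimeoutExpired is raised if the run exceeds `timeout` seconds;
    600 mirrors the bash timeout recommended above.
    """
    proc = subprocess.run(
        build_command(script_path, args),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return json.loads(proc.stdout)
```

Passing the JSON as a single argv element avoids shell quoting problems when the prompt itself contains quotes.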
### Parameters
| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `prompt` | string | yes | — | Image description |
| `model` | string | no | `gpt-image-2` | Model name (`gpt-image-2`, `gpt-image-1`) |
| `image_url` | string / list | no | null | Input image(s) for editing: local file path or URL |
| `quality` | string | no | auto | `low` / `medium` / `high`; omit to let the model choose |
| `size` | string | no | auto | `1K`/`2K`/`4K`, pixel value (`1024x1024`), or omit to let the model choose |
| `aspect_ratio` | string | no | null | `1:1` / `3:2` / `2:3` / `16:9` / `9:16` |
| `image_url` | string / list | no | null | Input image(s) for editing: local file path or URL. Multi-image fusion is supported (pass a list) |
| `quality` | string | no | auto | `low` / `medium` / `high` (only some backends honour this) |
| `size` | string | no | auto | `512` / `1K` / `2K` / `3K` / `4K`, or pixel value (`1024x1024`) |
| `aspect_ratio` | string | no | null | `1:1` / `3:2` / `2:3` / `16:9` / `9:16` / `21:9` (some backends also support extreme ratios like `1:4` / `8:1`) |
**Higher `quality` and larger `size` cost more and run slower.** Default to omitting both (`auto`) so the model picks a balanced setting. Only raise them when the user explicitly asks for high quality / a poster / print-ready output. For quick previews or chat scenarios prefer `quality=low` + `size=1K`.
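One way to follow this guidance from Python is to build the JSON argument from optional values and leave out anything that was not requested. The helper `build_args` below is a hypothetical sketch; only the parameter names come from the table above.

```python
import json


def build_args(prompt, model=None, image_url=None, quality=None,
               size=None, aspect_ratio=None) -> dict:
    """Return the argument dict, omitting every option left as None."""
    candidates = {
        "prompt": prompt,
        "model": model,
        "image_url": image_url,
        "quality": quality,
        "size": size,
        "aspect_ratio": aspect_ratio,
    }
    return {key: value for key, value in candidates.items() if value is not None}


# Default call: quality and size are omitted, so the backend decides.
default_args = json.dumps(build_args("A corgi astronaut floating in space"))

# Quick preview: the cheapest combination suggested above.
preview_args = json.dumps(build_args("A corgi astronaut", quality="low", size="1K"))
```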
### Example — generate
@@ -40,28 +53,26 @@ python <base_dir>/scripts/generate.py '<json_args>'
python <base_dir>/scripts/generate.py '{"prompt": "A corgi astronaut floating in space"}'
```
With explicit quality/size:
With aspect ratio:
```bash
python <base_dir>/scripts/generate.py '{"prompt": "A corgi astronaut", "quality": "low", "size": "1K", "aspect_ratio": "1:1"}'
python <base_dir>/scripts/generate.py '{"prompt": "Isometric miniature city of Shanghai at sunset", "size": "2K", "aspect_ratio": "16:9"}'
```
### Important: Editing vs Generating
When the user asks to **edit, modify, or improve an existing image**, you need to pass the original image via `image_url`. Prefer passing **local file paths** directly — the script handles file reading internally. Without `image_url`, the script generates a brand-new image instead of editing.
When the user asks to **edit, modify, or improve an existing image**, pass the original image via `image_url`. Prefer **local file paths** directly — the script handles file reading internally. Without `image_url`, the script generates a brand-new image instead of editing.
### Example — edit (image-to-image)
Local file (preferred):
```bash
python <base_dir>/scripts/generate.py '{"prompt": "Add a Santa hat to the dog", "image_url": "/path/to/dog.png"}'
```
URL:
Multi-image fusion — pass a list:
```bash
python <base_dir>/scripts/generate.py '{"prompt": "Make the background blue", "image_url": "https://example.com/photo.png"}'
python <base_dir>/scripts/generate.py '{"prompt": "Combine these characters into a group photo", "image_url": ["/path/a.png", "/path/b.png"]}'
```
### Output
@@ -70,6 +81,7 @@ Prints JSON to stdout:
```json
{
"model": "doubao-seedream-5-0-260128",
"images": [
{"url": "/path/to/output.png"}
]
@@ -86,39 +98,20 @@ On error:
}
```
### Environment Variables
### Setup
| Variable | Required | Description |
|----------|----------|-------------|
| `OPENAI_API_KEY` | yes (unless using LinkAI) | OpenAI API key |
| `OPENAI_API_BASE` | no | Custom API base URL (default: `https://api.openai.com/v1`) |
| `LINKAI_API_KEY` | alt | LinkAI API key (used when `OPENAI_API_KEY` is absent) |
| `LINKAI_API_BASE` | no | LinkAI API base URL |
The script needs **at least one** of these API keys (set via `env_config` or `config.json`):
### Size + Aspect Ratio Resolution
`OPENAI_API_KEY` / `GEMINI_API_KEY` / `ARK_API_KEY` / `DASHSCOPE_API_KEY` / `MINIMAX_API_KEY` / `LINKAI_API_KEY`
`size` and `aspect_ratio` are combined to determine the actual pixel dimensions:
| size | aspect_ratio | pixels |
|------|-------------|--------|
| `1K` | `1:1` | 1024×1024 |
| `1K` | `3:2` | 1536×1024 |
| `1K` | `2:3` | 1024×1536 |
| `2K` | `1:1` | 2048×2048 |
| `2K` | `16:9` | 2048×1152 |
| `2K` | `9:16` | 1152×2048 |
| `4K` | `16:9` | 3840×2160 |
| `4K` | `9:16` | 2160×3840 |
When an exact match isn't found, the script tries: exact match → upgrade to higher tier with same ratio → cross-tier match by ratio → tier default.
Each also has an optional `*_API_BASE` for custom endpoints. The script automatically picks the first configured backend and falls back to the next if it fails — no need to specify a model.
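The pick-then-fall-back behaviour can be sketched roughly as follows. This is an illustration under assumptions, not the script's actual code: `configured_providers`, `generate_with_fallback` and the `backends` mapping of callables are hypothetical names, while the provider order and environment variable names are the ones used in this commit.

```python
from typing import Callable

PROVIDER_ORDER = ["OpenAI", "Gemini", "Seedream", "Qwen", "MiniMax", "LinkAI"]
KEY_VARS = {
    "OpenAI": "OPENAI_API_KEY",
    "Gemini": "GEMINI_API_KEY",
    "Seedream": "ARK_API_KEY",
    "Qwen": "DASHSCOPE_API_KEY",
    "MiniMax": "MINIMAX_API_KEY",
    "LinkAI": "LINKAI_API_KEY",
}


def configured_providers(env: dict) -> list[str]:
    """Providers whose API key is set, in priority order."""
    return [name for name in PROVIDER_ORDER if env.get(KEY_VARS[name])]


def generate_with_fallback(
    prompt: str,
    env: dict,
    backends: dict[str, Callable[[str], list[str]]],
) -> tuple[str, list[str]]:
    """Try each configured backend in turn; return the first success."""
    errors = []
    for name in configured_providers(env):
        try:
            return name, backends[name](prompt)
        except Exception as exc:  # any failure moves on to the next backend
            errors.append(f"{name}: {exc}")
    if not errors:
        raise RuntimeError("No API key configured")
    raise RuntimeError("All providers failed: " + "; ".join(errors))
```

In real use `env` would be `os.environ`; taking it as a parameter keeps the sketch easy to exercise without touching the process environment.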
### Error Handling
The script internally tries all available providers (OpenAI → LinkAI) in sequence. If it returns an error, **do NOT retry with the same or similar parameters** — the failure is a configuration issue (wrong API key, unsupported API base, etc.), not a transient error. Instead, inform the user about the configuration problem and ask them to fix it (e.g. set the correct `OPENAI_API_KEY` / `OPENAI_API_BASE` via `env_config`), then retry after the configuration is updated.
If the script returns an error after trying all configured backends, **do NOT retry with the same parameters** — the failure is almost always a configuration issue (wrong API key, unsupported API base). Tell the user to fix it via `env_config`, then retry.
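A caller can enforce the no-retry rule by turning an error result into a distinct exception instead of looping. The sketch below is hypothetical (`ConfigurationError` and `parse_result` are illustrative names) and assumes the error object carries an `error` key, as the script's missing-prompt branch does.

```python
import json


class ConfigurationError(RuntimeError):
    """Raised when the script reports an error; retrying unchanged will not help."""


def parse_result(stdout: str) -> list[str]:
    """Return output image paths, or raise ConfigurationError on an error result."""
    result = json.loads(stdout)
    if "error" in result:
        # Assumption: errors arrive as {"error": "..."} on stdout.
        raise ConfigurationError(
            f"Image generation failed: {result['error']}. "
            "Check the API key / API base in env_config before retrying."
        )
    return [item["url"] for item in result.get("images", [])]
```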
### Notes
- HTTP timeout is 300s — high-resolution + high-quality generation can take over 200s.
- When `quality` and `size` are omitted, the API uses `auto` — the model picks the best quality/size based on the prompt.
- `quality=low` + `size=1K` is the fastest combination (~20s). Use when speed matters more than fidelity.
- HTTP timeout is 300s — high-resolution generation can take over 200s.
- Omit `quality` / `size` to let the model pick automatically (`auto`).
- Input images for editing are auto-compressed to ≤ 4MB / longest edge ≤ 4096px.


@@ -5,8 +5,17 @@ Unified image generation script.
Usage:
python generate.py '<json_args>'
Supports GPT-Image-2 / GPT-Image-1 via the OpenAI-compatible Images API.
Designed for easy extension to other providers (Gemini, etc.).
Supported model families (in auto-routing mode each provider is tried in
priority order: OpenAI → Gemini → Seedream → Qwen → MiniMax → LinkAI; providers
without an API key are skipped. When a specific model is requested and its
native provider is configured, only that provider is tried):
- gpt-image-2 / gpt-image-1 → OpenAI
- nano-banana / gemini-*-image-* → Gemini
- doubao-seedream-* / seedream-* → Seedream (Volcengine Ark)
- qwen-image-2.0 / qwen-image-2.0-pro / etc. → Qwen (DashScope)
- image-01 / minimax-image → MiniMax
- any model → LinkAI (universal proxy)
Dependencies: requests (stdlib: json, sys, os, base64, io, abc, uuid, pathlib, urllib)
"""
@@ -16,6 +25,7 @@ import sys
import os
import base64
import io
import time
import uuid
import re
from abc import ABC, abstractmethod
@@ -192,9 +202,14 @@ class ImageProvider(ABC):
image_url: str | list | None = None,
quality: str | None = None,
size: str | None = None,
aspect_ratio: str | None = None,
output_dir: str = ".",
) -> list[str]:
"""Generate image(s) and return list of local file paths."""
"""Generate image(s) and return list of local file paths.
`size` may be a tier ("1K" / "2K" / "4K" / "512") or pixels ("WxH").
Providers that need pixel sizes should call `resolve_size(size, aspect_ratio)`.
"""
...
@@ -205,10 +220,12 @@ class ImageProvider(ABC):
class OpenAIProvider(ImageProvider):
"""Provider for OpenAI Image API (generations + edits)."""
DEFAULT_MODEL = "gpt-image-2"
def __init__(self, api_key: str, api_base: str, model: str):
self.api_key = api_key
self.api_base = api_base.rstrip("/")
self.model = model
self.model = model or self.DEFAULT_MODEL
def _headers(self) -> dict:
return {
@@ -267,11 +284,14 @@ class OpenAIProvider(ImageProvider):
image_url=None,
quality: str | None = None,
size: str | None = None,
aspect_ratio: str | None = None,
output_dir: str = ".",
) -> list[str]:
# OpenAI Images API expects pixel size like 1024x1024.
resolved = resolve_size(size, aspect_ratio) if (size or aspect_ratio) else None
if image_url:
return self._edit(prompt, image_url=image_url, quality=quality, size=size, output_dir=output_dir)
return self._create(prompt, quality=quality, size=size, output_dir=output_dir)
return self._edit(prompt, image_url=image_url, quality=quality, size=resolved, output_dir=output_dir)
return self._create(prompt, quality=quality, size=resolved, output_dir=output_dir)
def _create(self, prompt: str, *, quality: str | None, size: str | None, output_dir: str) -> list[str]:
url = f"{self.api_base}/images/generations"
@@ -337,10 +357,12 @@ class OpenAIProvider(ImageProvider):
class LinkAIProvider(ImageProvider):
"""Provider for LinkAI unified image generation API."""
DEFAULT_MODEL = "gpt-image-2"
def __init__(self, api_key: str, api_base: str, model: str):
self.api_key = api_key
self.api_base = api_base.rstrip("/")
self.model = model
self.model = model or self.DEFAULT_MODEL
def generate(
self,
@@ -349,6 +371,7 @@ class LinkAIProvider(ImageProvider):
image_url=None,
quality: str | None = None,
size: str | None = None,
aspect_ratio: str | None = None,
output_dir: str = ".",
) -> list[str]:
url = f"{self.api_base}/v1/images/generations"
@@ -358,8 +381,12 @@ class LinkAIProvider(ImageProvider):
}
if quality:
payload["quality"] = quality
# LinkAI accepts both pixel sizes (1024x1024) and tier shorthand (1K/2K/4K).
# Pass through whatever the caller gave us; also forward aspect_ratio.
if size:
payload["size"] = size
if aspect_ratio:
payload["aspect_ratio"] = aspect_ratio
if image_url:
urls = image_url if isinstance(image_url, list) else [image_url]
resolved = []
@@ -408,23 +435,654 @@ class LinkAIProvider(ImageProvider):
return paths
# ---------------------------------------------------------------------------
# Gemini provider (Nano Banana family — gemini-*-image-*)
# ---------------------------------------------------------------------------
# Friendly aliases → real Gemini model id
_GEMINI_MODEL_ALIASES = {
"nano-banana": "gemini-2.5-flash-image",
"nano-banana-2": "gemini-3.1-flash-image-preview",
"nano-banana-pro": "gemini-3-pro-image-preview",
}
class GeminiProvider(ImageProvider):
"""Provider for Google Gemini native image generation (Nano Banana family)."""
DEFAULT_MODEL = "gemini-3.1-flash-image-preview" # nano-banana-2
def __init__(self, api_key: str, api_base: str, model: str):
self.api_key = api_key
self.api_base = api_base.rstrip("/")
self.model = _GEMINI_MODEL_ALIASES.get(model, model or self.DEFAULT_MODEL)
def generate(
self,
prompt: str,
*,
image_url=None,
quality: str | None = None, # not used; Gemini has no `quality` param
size: str | None = None,
aspect_ratio: str | None = None,
output_dir: str = ".",
) -> list[str]:
# Build request parts: prompt text + optional inline images
parts: list[dict] = [{"text": prompt}]
if image_url:
urls = image_url if isinstance(image_url, list) else [image_url]
for u in urls:
data = _compress_image(_load_image(u))
mime = _guess_mime(data)
parts.append({
"inline_data": {
"mime_type": mime,
"data": base64.b64encode(data).decode(),
}
})
payload: dict = {
"contents": [{"parts": parts}],
"generationConfig": {"responseModalities": ["IMAGE"]},
}
# Gemini natively supports aspectRatio + imageSize tiers (512/1K/2K/4K).
_GEMINI_VALID_TIERS = {"512", "1K", "2K", "4K"}
_GEMINI_TIER_FALLBACK = {"3K": "2K"}
image_config: dict = {}
if size:
if "x" in size.lower():
tier = _pixels_to_tier(size)
else:
tier = size.upper()
tier = _GEMINI_TIER_FALLBACK.get(tier, tier)
if tier in _GEMINI_VALID_TIERS:
image_config["imageSize"] = tier
if aspect_ratio:
image_config["aspectRatio"] = aspect_ratio
elif size and "x" in size.lower():
ratio = _pixels_to_ratio(size)
if ratio:
image_config["aspectRatio"] = ratio
if image_config:
payload["generationConfig"]["imageConfig"] = image_config
url = f"{self.api_base}/v1beta/models/{self.model}:generateContent"
headers = {
"x-goog-api-key": self.api_key,
"Content-Type": "application/json",
}
if _HAS_REQUESTS:
resp = requests.post(url, headers=headers, json=payload, timeout=300)
if resp.status_code >= 400:
try:
body = resp.json()
msg = body.get("error", {}).get("message") or resp.text
except Exception:
msg = resp.text or resp.reason
raise RuntimeError(f"API {resp.status_code}: {msg}")
result = resp.json()
else:
data = json.dumps(payload).encode()
req = Request(url, data=data, headers=headers, method="POST")
with urlopen(req, timeout=300) as r:
result = json.loads(r.read())
return self._extract_images(result, output_dir)
@staticmethod
def _extract_images(result: dict, output_dir: str) -> list[str]:
paths: list[str] = []
for cand in result.get("candidates", []):
for part in cand.get("content", {}).get("parts", []):
if part.get("thought"):
continue # skip thinking-stage interim images
inline = part.get("inlineData") or part.get("inline_data")
if inline and inline.get("data"):
raw = base64.b64decode(inline["data"])
paths.append(_save_image(raw, output_dir))
if not paths:
# Surface the model's text reply (often a refusal explanation)
for cand in result.get("candidates", []):
for part in cand.get("content", {}).get("parts", []):
if part.get("text"):
raise RuntimeError(f"Gemini returned no image: {part['text'][:200]}")
raise RuntimeError("Gemini returned no image (empty response)")
return paths
def _guess_mime(data: bytes) -> str:
if data[:3] == b"\xff\xd8\xff":
return "image/jpeg"
if data[:4] == b"RIFF":
return "image/webp"
if data[:8] == b"\x89PNG\r\n\x1a\n":
return "image/png"
return "image/png"
def _pixels_to_tier(pixel_str: str) -> str:
"""Map 'WxH' to nearest Gemini tier (512 / 1K / 2K / 4K)."""
try:
w, h = (int(x) for x in pixel_str.lower().split("x"))
long_edge = max(w, h)
except Exception:
return "1K"
if long_edge <= 768:
return "512"
if long_edge <= 1536:
return "1K"
if long_edge <= 3072:
return "2K"
return "4K"
def _pixels_to_ratio(pixel_str: str) -> str | None:
"""Map 'WxH' to a Gemini-supported aspect ratio string when possible."""
try:
w, h = (int(x) for x in pixel_str.lower().split("x"))
except Exception:
return None
# Reduce to a small ratio
from math import gcd
g = gcd(w, h)
rw, rh = w // g, h // g
candidate = f"{rw}:{rh}"
supported = {"1:1", "1:4", "1:8", "2:3", "3:2", "3:4", "4:1", "4:3",
"4:5", "5:4", "8:1", "9:16", "16:9", "21:9"}
return candidate if candidate in supported else None
# ---------------------------------------------------------------------------
# Seedream provider (Volcengine Ark, OpenAI-compatible /images/generations)
# ---------------------------------------------------------------------------
# Friendly aliases → real Seedream model id (Ark Model IDs).
_SEEDREAM_MODEL_ALIASES = {
"seedream": "doubao-seedream-5-0-260128",
"seedream-lite": "doubao-seedream-5-0-260128",
"seedream-5.0": "doubao-seedream-5-0-260128",
"seedream-5.0-lite": "doubao-seedream-5-0-260128",
"seedream-5-0-lite": "doubao-seedream-5-0-260128",
"doubao-seedream-5-0": "doubao-seedream-5-0-260128",
"doubao-seedream-5-0-lite": "doubao-seedream-5-0-260128",
"seedream-4.5": "doubao-seedream-4-5-251128",
"seedream-4-5": "doubao-seedream-4-5-251128",
"doubao-seedream-4-5": "doubao-seedream-4-5-251128",
}
# Seedream supports either a coarse tier ("2K"/"3K"/"4K") or explicit "WxH".
# We pass the user's tier through as-is when valid; otherwise translate ratio
# hints into the recommended pixel sizes from the Ark docs.
# Valid size tiers for Seedream (5.0 lite: 2K/3K, 4.5: 2K/4K).
# Unsupported tiers are mapped to the nearest valid one.
_SEEDREAM_VALID_TIERS = {"2K", "3K", "4K"}
_SEEDREAM_TIER_FALLBACK = {"512": "2K", "1K": "2K"}
_SEEDREAM_SIZE_TABLE = {
# (tier, ratio) -> "WxH" recommended pixel sizes (Seedream 5.0 lite + 4.5 share most)
("2K", "1:1"): "2048x2048",
("2K", "3:4"): "1728x2304",
("2K", "4:3"): "2304x1728",
("2K", "16:9"): "2848x1600",
("2K", "9:16"): "1600x2848",
("2K", "3:2"): "2496x1664",
("2K", "2:3"): "1664x2496",
("2K", "21:9"): "3136x1344",
("3K", "1:1"): "3072x3072",
("3K", "3:4"): "2592x3456",
("3K", "4:3"): "3456x2592",
("3K", "16:9"): "4096x2304",
("3K", "9:16"): "2304x4096",
("3K", "2:3"): "2496x3744",
("3K", "3:2"): "3744x2496",
("3K", "21:9"): "4704x2016",
("4K", "1:1"): "4096x4096",
("4K", "3:4"): "3520x4704",
("4K", "4:3"): "4704x3520",
("4K", "16:9"): "5504x3040",
("4K", "9:16"): "3040x5504",
("4K", "2:3"): "3328x4992",
("4K", "3:2"): "4992x3328",
("4K", "21:9"): "6240x2656",
}
class SeedreamProvider(ImageProvider):
"""Provider for Volcengine Ark Seedream image generation API.
The endpoint is OpenAI-compatible (POST {base}/images/generations) but
accepts an extra `image` field (string or list) for image-to-image and
multi-image fusion, plus `sequential_image_generation` / `watermark` flags.
Reference docs accept both `2K` shorthand and explicit `WxH` for `size`.
"""
DEFAULT_MODEL = "doubao-seedream-5-0-260128" # seedream 5.0 lite
def __init__(self, api_key: str, api_base: str, model: str):
self.api_key = api_key
self.api_base = api_base.rstrip("/")
self.model = _SEEDREAM_MODEL_ALIASES.get((model or "").lower(), model or self.DEFAULT_MODEL)
def generate(
self,
prompt: str,
*,
image_url=None,
quality: str | None = None, # not honoured by Seedream
size: str | None = None,
aspect_ratio: str | None = None,
output_dir: str = ".",
) -> list[str]:
url = f"{self.api_base}/images/generations"
payload: dict = {
"model": self.model,
"prompt": prompt,
"response_format": "url",
"watermark": False,
}
# Default to 2K (Seedream 5.0 lite minimum tier), unless caller picks one.
seedream_size = self._resolve_seedream_size(size, aspect_ratio)
if seedream_size:
payload["size"] = seedream_size
# Image-to-image / multi-image fusion (up to 14 reference images).
if image_url:
urls = image_url if isinstance(image_url, list) else [image_url]
prepared: list[str] = []
for u in urls[:14]:
if os.path.isfile(u):
data = _compress_image(_load_image(u))
mime = _guess_mime(data)
prepared.append(f"data:{mime};base64,{base64.b64encode(data).decode()}")
else:
prepared.append(u)
payload["image"] = prepared if len(prepared) > 1 else prepared[0]
headers = {
"Authorization": f"Bearer {self.api_key}",
"Content-Type": "application/json",
}
if _HAS_REQUESTS:
resp = requests.post(url, headers=headers, json=payload, timeout=300)
if resp.status_code >= 400:
try:
body = resp.json()
err = body.get("error") or {}
msg = err.get("message") or body.get("message") or resp.text
except Exception:
msg = resp.text or resp.reason
raise RuntimeError(f"API {resp.status_code}: {msg}")
result = resp.json()
else:
data = json.dumps(payload).encode()
req = Request(url, data=data, headers=headers, method="POST")
with urlopen(req, timeout=300) as r:
result = json.loads(r.read())
if result.get("error"):
err = result["error"]
raise RuntimeError(f"Seedream {err.get('code')}: {err.get('message')}")
paths: list[str] = []
for item in result.get("data") or []:
u = item.get("url")
b64 = item.get("b64_json")
if u:
paths.append(_save_image(_load_image(u), output_dir))
elif b64:
paths.append(_save_image(base64.b64decode(b64), output_dir))
if not paths:
raise RuntimeError(f"Seedream returned no image: {result}")
return paths
@staticmethod
def _resolve_seedream_size(size: str | None, aspect_ratio: str | None) -> str | None:
if not size and not aspect_ratio:
return "2K"
# Explicit pixel values: pass through (normalise separator)
if size and "x" in size.lower() and "*" not in size:
return size.lower()
if size and "*" in size:
return size.replace("*", "x")
tier = (size or "2K").upper()
# Map unsupported tiers (512, 1K) to the nearest valid one
tier = _SEEDREAM_TIER_FALLBACK.get(tier, tier)
if tier not in _SEEDREAM_VALID_TIERS:
tier = "2K"
ratio = aspect_ratio or "1:1"
if (tier, ratio) in _SEEDREAM_SIZE_TABLE:
return _SEEDREAM_SIZE_TABLE[(tier, ratio)]
return tier
# ---------------------------------------------------------------------------
# Qwen provider (DashScope multimodal-generation: qwen-image-* family)
# ---------------------------------------------------------------------------
# Friendly aliases → real Qwen model id
_QWEN_MODEL_ALIASES = {
"qwen": "qwen-image-2.0-pro",
"qwen-image": "qwen-image-2.0-pro",
"qwen-image-pro": "qwen-image-2.0-pro",
}
# Qwen pixel-size table (closest match by tier+ratio).
# qwen-image-2.0(*) supports any WxH between 512*512 and 2048*2048.
_QWEN_SIZE_TABLE = {
# (tier, ratio) -> "W*H"
("1K", "1:1"): "1024*1024",
("1K", "16:9"): "1280*720",
("1K", "9:16"): "720*1280",
("1K", "4:3"): "1184*888",
("1K", "3:4"): "888*1184",
("1K", "3:2"): "1248*832",
("1K", "2:3"): "832*1248",
("2K", "1:1"): "2048*2048",
("2K", "16:9"): "2688*1536", # exceeds 2048 cap → clamped at runtime if needed
("2K", "9:16"): "1536*2688",
("2K", "4:3"): "2368*1728",
("2K", "3:4"): "1728*2368",
}
class QwenProvider(ImageProvider):
"""Provider for Alibaba DashScope Qwen image API (qwen-image-2.0[-pro])."""
DEFAULT_MODEL = "qwen-image-2.0"
def __init__(self, api_key: str, api_base: str, model: str):
self.api_key = api_key
self.api_base = api_base.rstrip("/")
self.model = _QWEN_MODEL_ALIASES.get((model or "").lower(), model or self.DEFAULT_MODEL)
def generate(
self,
prompt: str,
*,
image_url=None,
quality: str | None = None, # not supported by Qwen image API
size: str | None = None,
aspect_ratio: str | None = None,
output_dir: str = ".",
) -> list[str]:
url = f"{self.api_base}/api/v1/services/aigc/multimodal-generation/generation"
# Build content array: 0..3 images then a single text part.
content: list[dict] = []
if image_url:
urls = image_url if isinstance(image_url, list) else [image_url]
for u in urls[:3]: # API caps at 3 reference images
if os.path.isfile(u):
data = _compress_image(_load_image(u))
mime = _guess_mime(data)
image_field = f"data:{mime};base64,{base64.b64encode(data).decode()}"
else:
image_field = u
content.append({"image": image_field})
content.append({"text": prompt})
payload: dict = {
"model": self.model,
"input": {"messages": [{"role": "user", "content": content}]},
}
# Map (size, aspect_ratio) → Qwen "W*H"
qwen_size = self._resolve_qwen_size(size, aspect_ratio)
if qwen_size:
payload["parameters"] = {"size": qwen_size}
headers = {
"Authorization": f"Bearer {self.api_key}",
"Content-Type": "application/json",
}
if _HAS_REQUESTS:
resp = requests.post(url, headers=headers, json=payload, timeout=300)
if resp.status_code >= 400:
try:
body = resp.json()
msg = body.get("message") or body.get("error", {}).get("message") or resp.text
except Exception:
msg = resp.text or resp.reason
raise RuntimeError(f"API {resp.status_code}: {msg}")
result = resp.json()
else:
data = json.dumps(payload).encode()
req = Request(url, data=data, headers=headers, method="POST")
with urlopen(req, timeout=300) as r:
result = json.loads(r.read())
# Business-level errors arrive on HTTP 200 with a `code` field.
if result.get("code"):
raise RuntimeError(f"Qwen {result.get('code')}: {result.get('message')}")
paths: list[str] = []
choices = (result.get("output") or {}).get("choices") or []
for ch in choices:
for part in ((ch.get("message") or {}).get("content") or []):
u = part.get("image")
if u:
paths.append(_save_image(_load_image(u), output_dir))
if not paths:
raise RuntimeError(f"Qwen returned no image: {result}")
return paths
@staticmethod
def _resolve_qwen_size(size: str | None, aspect_ratio: str | None) -> str | None:
if not size and not aspect_ratio:
return None
if size and "x" in size.lower() and "*" not in size:
return size.lower().replace("x", "*")
if size and "*" in size:
return size
tier = (size or "1K").upper()
# Qwen supports 1K and 2K; clamp others
_QWEN_TIER_MAP = {"512": "1K", "3K": "2K", "4K": "2K"}
tier = _QWEN_TIER_MAP.get(tier, tier)
if tier not in ("1K", "2K"):
tier = "1K"
ratio = aspect_ratio or "1:1"
if (tier, ratio) in _QWEN_SIZE_TABLE:
return _QWEN_SIZE_TABLE[(tier, ratio)]
return _QWEN_SIZE_TABLE.get((tier, "1:1"))
# ---------------------------------------------------------------------------
# MiniMax provider (image-01 family)
# ---------------------------------------------------------------------------
# Friendly aliases → real MiniMax model id
_MINIMAX_MODEL_ALIASES = {
"minimax": "image-01",
"minimax-image": "image-01",
"minimax-image-01": "image-01",
}
_MINIMAX_SUPPORTED_RATIOS = {"1:1", "16:9", "4:3", "3:2", "2:3", "3:4", "9:16", "21:9"}
class MinimaxProvider(ImageProvider):
"""Provider for MiniMax image generation API (image-01)."""
DEFAULT_MODEL = "image-01"
def __init__(self, api_key: str, api_base: str, model: str):
self.api_key = api_key
self.api_base = api_base.rstrip("/")
self.model = _MINIMAX_MODEL_ALIASES.get((model or "").lower(), model or self.DEFAULT_MODEL)
def generate(
self,
prompt: str,
*,
image_url=None,
quality: str | None = None, # not supported by MiniMax
size: str | None = None,
aspect_ratio: str | None = None,
output_dir: str = ".",
) -> list[str]:
url = f"{self.api_base}/v1/image_generation"
payload: dict = {
"model": self.model,
"prompt": prompt,
"response_format": "base64",
}
# MiniMax accepts aspect_ratio directly; derive from pixels if needed.
ratio = aspect_ratio
if not ratio and size and "x" in size.lower():
ratio = _pixels_to_ratio(size)
if ratio and ratio in _MINIMAX_SUPPORTED_RATIOS:
payload["aspect_ratio"] = ratio
# Image-to-image uses subject_reference; accept URL or local file (→ base64).
if image_url:
urls = image_url if isinstance(image_url, list) else [image_url]
refs = []
for u in urls:
if os.path.isfile(u):
data = _compress_image(_load_image(u))
mime = _guess_mime(data)
image_file = f"data:{mime};base64,{base64.b64encode(data).decode()}"
else:
image_file = u
refs.append({"type": "character", "image_file": image_file})
payload["subject_reference"] = refs
headers = {
"Authorization": f"Bearer {self.api_key}",
"Content-Type": "application/json",
}
if _HAS_REQUESTS:
resp = requests.post(url, headers=headers, json=payload, timeout=300)
if resp.status_code >= 400:
try:
body = resp.json()
msg = body.get("base_resp", {}).get("status_msg") or body.get("error", {}).get("message") or resp.text
except Exception:
msg = resp.text or resp.reason
raise RuntimeError(f"API {resp.status_code}: {msg}")
result = resp.json()
else:
data = json.dumps(payload).encode()
req = Request(url, data=data, headers=headers, method="POST")
with urlopen(req, timeout=300) as r:
result = json.loads(r.read())
# MiniMax returns business errors inside base_resp even on HTTP 200.
base_resp = result.get("base_resp") or {}
if base_resp.get("status_code") not in (None, 0):
raise RuntimeError(f"MiniMax {base_resp.get('status_code')}: {base_resp.get('status_msg')}")
data_obj = result.get("data") or {}
b64_list = data_obj.get("image_base64") or []
urls_list = data_obj.get("image_urls") or []
paths: list[str] = []
for b64 in b64_list:
paths.append(_save_image(base64.b64decode(b64), output_dir))
for u in urls_list:
paths.append(_save_image(_load_image(u), output_dir))
if not paths:
raise RuntimeError(f"MiniMax returned no image: {result}")
return paths
# ---------------------------------------------------------------------------
# Provider factory
# ---------------------------------------------------------------------------
def _build_providers(model: str) -> list[tuple[str, ImageProvider]]:
"""Build an ordered list of (label, provider) to try."""
openai_key = os.environ.get("OPENAI_API_KEY", "")
openai_base = os.environ.get("OPENAI_API_BASE", "https://api.openai.com/v1")
linkai_key = os.environ.get("LINKAI_API_KEY", "")
linkai_base = os.environ.get("LINKAI_API_BASE", "https://api.link-ai.tech")
# Model-prefix → preferred provider label.
# When the requested model matches a prefix and that provider is configured,
# it is the only provider tried (see _build_providers below).
_MODEL_PREFERRED_PROVIDER: list[tuple[tuple[str, ...], str]] = [
(("gpt-image",), "OpenAI"),
(("nano-banana", "gemini-"), "Gemini"),
(("seedream", "doubao-seedream"), "Seedream"),
(("qwen-image", "qwen"), "Qwen"),
(("minimax", "image-01"), "MiniMax"),
]
providers = []
if openai_key:
providers.append(("OpenAI", OpenAIProvider(api_key=openai_key, api_base=openai_base, model=model)))
if linkai_key:
providers.append(("LinkAI", LinkAIProvider(api_key=linkai_key, api_base=linkai_base, model=model)))
return providers
# Default global priority when the model has no preferred provider.
_DEFAULT_PROVIDER_ORDER = ["OpenAI", "Gemini", "Seedream", "Qwen", "MiniMax", "LinkAI"]
def _preferred_provider(model: str) -> str | None:
m = (model or "").lower()
for prefixes, label in _MODEL_PREFERRED_PROVIDER:
if m.startswith(prefixes):
return label
return None
def _build_providers(model: str) -> list[tuple[str, ImageProvider]]:
"""Build an ordered list of (label, provider) to try.
Behaviour:
1. All providers with a configured API key are added in the global
priority order: OpenAI → Gemini → Seedream → Qwen → MiniMax → LinkAI.
2. If `model` natively belongs to one of the providers AND that provider
is configured, only that provider is tried, with the requested model
id; other backends would not recognise the id.
3. If the preferred provider is NOT configured (no API key), the model
id would 100% fail on every other backend, so we drop the explicit
model and fall back to automatic routing — every provider then uses
its own DEFAULT_MODEL.
"""
keys = {
"OpenAI": os.environ.get("OPENAI_API_KEY", ""),
"Gemini": os.environ.get("GEMINI_API_KEY", ""),
"Seedream": os.environ.get("ARK_API_KEY", ""),
"Qwen": os.environ.get("DASHSCOPE_API_KEY", ""),
"MiniMax": os.environ.get("MINIMAX_API_KEY", ""),
"LinkAI": os.environ.get("LINKAI_API_KEY", ""),
}
bases = {
"OpenAI": os.environ.get("OPENAI_API_BASE", "https://api.openai.com/v1"),
"Gemini": os.environ.get("GEMINI_API_BASE", "https://generativelanguage.googleapis.com"),
"Seedream": os.environ.get("ARK_API_BASE", "https://ark.cn-beijing.volces.com/api/v3"),
"Qwen": os.environ.get("DASHSCOPE_API_BASE", "https://dashscope.aliyuncs.com"),
"MiniMax": os.environ.get("MINIMAX_API_BASE", "https://api.minimaxi.com"),
"LinkAI": os.environ.get("LINKAI_API_BASE", "https://api.link-ai.tech"),
}
pref = _preferred_provider(model)
# If a specific model is requested and its native provider has no key,
# other backends won't recognise the id → reset to auto routing.
if pref and not keys.get(pref):
model = ""
pref = None
factories = {
"OpenAI": OpenAIProvider,
"Gemini": GeminiProvider,
"Seedream": SeedreamProvider,
"Qwen": QwenProvider,
"MiniMax": MinimaxProvider,
"LinkAI": LinkAIProvider,
}
available: dict[str, ImageProvider] = {}
for label, key in keys.items():
if key:
available[label] = factories[label](api_key=key, api_base=bases[label], model=model)
# When a specific model is pinned, only try its native provider — other
# backends won't recognise the model id so retrying them is pointless.
if pref and pref in available:
return [(pref, available[pref])]
# Auto routing: try every configured provider in priority order.
ordered: list[str] = []
for label in _DEFAULT_PROVIDER_ORDER:
if label in available:
ordered.append(label)
return [(label, available[label]) for label in ordered]
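The routing rules in `_build_providers` can be sketched without the provider classes, which makes the three cases (pinned and configured, pinned but unconfigured, auto) easier to check. This is a minimal sketch, not the script's code: `route`, `PREFERRED`, and `ORDER` are hypothetical names, the key map is passed in rather than read from the environment, and only the MiniMax row of the prefix table is visible in this diff, so the other rows are assumptions based on the model names listed in SKILL.md.

```python
from __future__ import annotations

# Illustrative prefix table. Only the MiniMax row appears in this diff;
# the remaining rows are assumptions derived from the documented model names.
PREFERRED = [
    (("gpt-image",), "OpenAI"),
    (("nano-banana",), "Gemini"),
    (("seedream",), "Seedream"),
    (("qwen-image",), "Qwen"),
    (("minimax", "image-01"), "MiniMax"),
]
ORDER = ["OpenAI", "Gemini", "Seedream", "Qwen", "MiniMax", "LinkAI"]


def preferred_provider(model: str) -> str | None:
    """Return the native provider label for a model id, if any."""
    m = (model or "").lower()
    for prefixes, label in PREFERRED:
        # str.startswith accepts a tuple of candidate prefixes.
        if m.startswith(prefixes):
            return label
    return None


def route(model: str, keys: dict[str, str]) -> tuple[str, list[str]]:
    """Return (effective_model, provider labels to try, in order)."""
    pref = preferred_provider(model)
    # Pinned model whose native provider has no key: reset to auto routing.
    if pref and not keys.get(pref):
        model, pref = "", None
    # Pinned model with a configured native provider: try only that one.
    if pref:
        return model, [pref]
    # Auto routing: every configured provider, in global priority order.
    return model, [label for label in ORDER if keys.get(label)]
```

Calling `route("seedream-4.5", {"OpenAI": "k", "Seedream": "k"})` yields only the Seedream provider, while the same model with no `Seedream` key drops the model id and falls back to every configured provider in priority order.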
# ---------------------------------------------------------------------------
@@ -447,40 +1105,59 @@ def main():
print(json.dumps({"error": "Missing required parameter: prompt"}))
sys.exit(1)
model = args.get("model", "gpt-image-2")
# Model resolution priority:
# 1. Explicit `model` in the call args (agent / user override)
# 2. SKILL_IMAGE_GENERATION_MODEL env var (synced from
# config["skill"]["image-generation"]["model"] at startup)
# 3. None → fall back to automatic provider routing (try every
# provider with a configured API key in global priority order)
model = args.get("model") or os.environ.get("SKILL_IMAGE_GENERATION_MODEL") or ""
quality = args.get("quality")
raw_size = args.get("size")
size = args.get("size")
aspect_ratio = args.get("aspect_ratio")
image_url = args.get("image_url")
resolved_size = resolve_size(raw_size, aspect_ratio)
output_dir = os.environ.get("IMAGE_OUTPUT_DIR", os.path.join(os.getcwd(), "images"))
providers = _build_providers(model)
if not providers:
target = f"model '{model}'" if model else "image generation"
print(json.dumps({
"error": "No API key configured. Please set OPENAI_API_KEY or LINKAI_API_KEY via env_config tool, then try again."
"error": (
f"No API key configured for {target}. "
"Set at least one of OPENAI_API_KEY / GEMINI_API_KEY / "
"ARK_API_KEY / DASHSCOPE_API_KEY / MINIMAX_API_KEY / "
"LINKAI_API_KEY via the env_config tool, then try again."
)
}, ensure_ascii=False))
sys.exit(1)
import time
errors = []
for label, provider in providers:
try:
print(f"[image-generation] Trying {label} (model={model})...", file=sys.stderr)
attempt_model = getattr(provider, "model", model) or "auto"
print(f"[image-generation] Trying {label} (model={attempt_model})...", file=sys.stderr)
t0 = time.time()
paths = provider.generate(
prompt,
image_url=image_url,
quality=quality,
size=resolved_size,
size=size,
aspect_ratio=aspect_ratio,
output_dir=output_dir,
)
elapsed = time.time() - t0
print(f"[image-generation] ✅ {label} succeeded in {elapsed:.1f}s", file=sys.stderr)
result = {"images": [{"url": p} for p in paths]}
# Resolved model id (after alias expansion) actually sent to the API
actual_model = getattr(provider, "model", model)
print(
f"[image-generation] ✅ {label} succeeded in {elapsed:.1f}s "
f"(model={actual_model})",
file=sys.stderr,
)
result = {
"model": actual_model,
"images": [{"url": p} for p in paths],
}
print(json.dumps(result, ensure_ascii=False))
return
except Exception as e:
@@ -493,8 +1170,10 @@ def main():
"error": f"All providers failed — {hint}. "
"This is likely an API key or base URL configuration issue. "
"Do NOT retry with the same parameters. "
"Ask the user to verify their OPENAI_API_KEY / OPENAI_API_BASE "
"(or LINKAI_API_KEY / LINKAI_API_BASE) settings via env_config."
"Ask the user to verify their API key / base URL "
"(OPENAI_API_KEY, GEMINI_API_KEY, ARK_API_KEY, "
"DASHSCOPE_API_KEY, MINIMAX_API_KEY, or LINKAI_API_KEY) "
"via env_config."
}, ensure_ascii=False))
sys.exit(1)
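The model resolution priority used in `main()` (call argument, then `SKILL_IMAGE_GENERATION_MODEL`, then automatic routing) can be isolated as a small helper. This is a sketch under assumptions: `resolve_model` is a hypothetical name and the environment is passed in as a plain mapping so the behaviour can be checked without touching `os.environ`.

```python
import os
from typing import Mapping


def resolve_model(args: dict, env: Mapping[str, str] = os.environ) -> str:
    """Pick the model id: explicit arg, then env var, then "" for auto routing."""
    # `or` skips empty strings and None, so an empty "model" argument
    # falls through to the environment variable.
    return args.get("model") or env.get("SKILL_IMAGE_GENERATION_MODEL") or ""
```

An empty return value means no model is pinned, so each configured provider uses its own default model.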