feat(skill): support gpt-image-2 in image generation skill

2026-07-17 11:07:11 +08:00 · 2026-04-22 20:39:49 +08:00
parent 2c13e1b923
commit 81e8bb62ae
9 changed files with 794 additions and 33 deletions
--- a/skills/image-generation/SKILL.md
+++ b/skills/image-generation/SKILL.md
@@ -0,0 +1,124 @@
+---
+name: image-generation
+description: Generate or edit images from text prompts. Use when the user asks to create, draw, design, or edit an image, illustration, photo, icon, poster, or any visual content.
+metadata:
+  cowagent:
+    requires:
+      anyEnv:
+        - OPENAI_API_KEY
+        - LINKAI_API_KEY
+---
+
+# Image Generation
+
+Generate and edit images using AI models (GPT-Image-2, GPT-Image-1, etc.).
+
+## Usage
+
+Run `scripts/generate.py` with a JSON argument. The path is relative to this skill's `base_dir`.
+
+```bash
+python <base_dir>/scripts/generate.py '<json_args>'
+```
+
+**Set bash timeout to at least 300 seconds**, as image generation can take 30–200s depending on quality/size.
+
+### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `prompt` | string | yes | — | Image description |
+| `model` | string | no | `gpt-image-2` | Model name (`gpt-image-2`, `gpt-image-1`) |
+| `image_url` | string / list | no | null | Input image(s) for editing: local file path or URL |
+| `quality` | string | no | auto | `low` / `medium` / `high`; omit to let the model choose |
+| `size` | string | no | auto | `1K`/`2K`/`4K`, pixel value (`1024x1024`), or omit to let the model choose |
+| `aspect_ratio` | string | no | null | `1:1` / `3:2` / `2:3` / `16:9` / `9:16` |
+
+### Example — generate
+
+```bash
+python <base_dir>/scripts/generate.py '{"prompt": "A corgi astronaut floating in space"}'
+```
+
+With explicit quality/size:
+
+```bash
+python <base_dir>/scripts/generate.py '{"prompt": "A corgi astronaut", "quality": "low", "size": "1K", "aspect_ratio": "1:1"}'
+```
+
+### Important: Editing vs Generating
+
+When the user asks to **edit, modify, or improve an existing image**, you need to pass the original image via `image_url`. Prefer passing **local file paths** directly — the script handles file reading internally. Without `image_url`, the script generates a brand-new image instead of editing.
+
+### Example — edit (image-to-image)
+
+Local file (preferred):
+
+```bash
+python <base_dir>/scripts/generate.py '{"prompt": "Add a Santa hat to the dog", "image_url": "/path/to/dog.png"}'
+```
+
+URL:
+
+```bash
+python <base_dir>/scripts/generate.py '{"prompt": "Make the background blue", "image_url": "https://example.com/photo.png"}'
+```
+
+### Output
+
+Prints JSON to stdout:
+
+```json
+{
+  "images": [
+    {"url": "/path/to/output.png"}
+  ]
+}
+```
+
+After success, display the image to the user. You can either embed it in markdown (`![description](/path/to/output.png)`) or use the `send` tool.
+
+On error:
+
+```json
+{
+  "error": "error message"
+}
+```
+
+### Environment Variables
+
+| Variable | Required | Description |
+|----------|----------|-------------|
+| `OPENAI_API_KEY` | yes (unless using LinkAI) | OpenAI API key |
+| `OPENAI_API_BASE` | no | Custom API base URL (default: `https://api.openai.com/v1`) |
+| `LINKAI_API_KEY` | alt | LinkAI API key (used when `OPENAI_API_KEY` is absent) |
+| `LINKAI_API_BASE` | no | LinkAI API base URL |
+
+### Size + Aspect Ratio Resolution
+
+`size` and `aspect_ratio` are combined to determine the actual pixel dimensions:
+
+| size | aspect_ratio | pixels |
+|------|-------------|--------|
+| `1K` | `1:1` | 1024×1024 |
+| `1K` | `3:2` | 1536×1024 |
+| `1K` | `2:3` | 1024×1536 |
+| `2K` | `1:1` | 2048×2048 |
+| `2K` | `16:9` | 2048×1152 |
+| `2K` | `9:16` | 1152×2048 |
+| `4K` | `16:9` | 3840×2160 |
+| `4K` | `9:16` | 2160×3840 |
+
+When an exact match isn't found, the script tries: exact match → upgrade to higher tier with same ratio → cross-tier match by ratio → tier default.
+
+### Error Handling
+
+The script internally tries all available providers (OpenAI → LinkAI) in sequence. If it returns an error, **do NOT retry with the same or similar parameters** — the failure is a configuration issue (wrong API key, unsupported API base, etc.), not a transient error. Instead, inform the user about the configuration problem and ask them to fix it (e.g. set the correct `OPENAI_API_KEY` / `OPENAI_API_BASE` via `env_config`), then retry after the configuration is updated.
+
+### Notes
+
+- HTTP timeout is 300s — high-resolution + high-quality generation can take over 200s.
+- When `quality` and `size` are omitted, the API uses `auto` — the model picks the best quality/size based on the prompt.
+- `quality=low` + `size=1K` is the fastest combination (~20s). Use when speed matters more than fidelity.
+- Input images for editing are auto-compressed to ≤ 4MB / longest edge ≤ 4096px.
--- a/skills/image-generation/scripts/generate.py
+++ b/skills/image-generation/scripts/generate.py
@@ -0,0 +1,503 @@
+#!/usr/bin/env python3
+"""
+Unified image generation script.
+
+Usage:
+    python generate.py '<json_args>'
+
+Supports GPT-Image-2 / GPT-Image-1 via the OpenAI-compatible Images API.
+Designed for easy extension to other providers (Gemini, etc.).
+
+Dependencies: requests (stdlib: json, sys, os, base64, io, abc, uuid, pathlib, urllib)
+"""
+
+import json
+import sys
+import os
+import base64
+import io
+import uuid
+import re
+from abc import ABC, abstractmethod
+from pathlib import Path
+from urllib.request import urlopen, Request
+from urllib.parse import urlparse
+from urllib.error import URLError
+
+try:
+    import requests
+
+    _HAS_REQUESTS = True
+except ImportError:
+    _HAS_REQUESTS = False
+
+
+# ---------------------------------------------------------------------------
+# Size / aspect-ratio resolution
+# ---------------------------------------------------------------------------
+
+_SIZE_TABLE = {
+    # (tier, ratio) -> "WxH"
+    ("1K", "1:1"): "1024x1024",
+    ("1K", "3:2"): "1536x1024",
+    ("1K", "2:3"): "1024x1536",
+    ("2K", "1:1"): "2048x2048",
+    ("2K", "16:9"): "2048x1152",
+    ("2K", "9:16"): "1152x2048",
+    ("4K", "16:9"): "3840x2160",
+    ("4K", "9:16"): "2160x3840",
+}
+
+_TIER_ORDER = ["1K", "2K", "4K"]
+_RATIO_DEFAULT = {"1K": "1:1", "2K": "1:1", "4K": "16:9"}
+
+_PIXEL_RE = re.compile(r"^\d+x\d+$")
+
+
+def resolve_size(size: str | None, aspect_ratio: str | None) -> str | None:
+    """Resolve (size, aspect_ratio) to a concrete 'WxH' string or None."""
+    if size and _PIXEL_RE.match(size):
+        return size
+    if size and size.lower() == "auto":
+        size = None
+    if not size and not aspect_ratio:
+        return None
+
+    tier = size.upper() if size else None
+    ratio = aspect_ratio
+
+    if tier and ratio:
+        key = (tier, ratio)
+        if key in _SIZE_TABLE:
+            return _SIZE_TABLE[key]
+        # Upgrade: try higher tiers with same ratio
+        start = _TIER_ORDER.index(tier) + 1 if tier in _TIER_ORDER else 0
+        for t in _TIER_ORDER[start:]:
+            if (t, ratio) in _SIZE_TABLE:
+                return _SIZE_TABLE[(t, ratio)]
+        # Cross-tier: any tier with this ratio
+        for t in _TIER_ORDER:
+            if (t, ratio) in _SIZE_TABLE:
+                return _SIZE_TABLE[(t, ratio)]
+        # Tier default
+        if tier in _RATIO_DEFAULT:
+            return _SIZE_TABLE.get((tier, _RATIO_DEFAULT[tier]))
+
+    if tier and not ratio:
+        default_ratio = _RATIO_DEFAULT.get(tier)
+        if default_ratio:
+            return _SIZE_TABLE.get((tier, default_ratio))
+
+    if ratio and not tier:
+        for t in _TIER_ORDER:
+            if (t, ratio) in _SIZE_TABLE:
+                return _SIZE_TABLE[(t, ratio)]
+
+    return None
+
+
+# ---------------------------------------------------------------------------
+# Image helpers
+# ---------------------------------------------------------------------------
+
+def _load_image(source: str) -> bytes:
+    """Load image from a local file path or URL."""
+    if os.path.isfile(source):
+        with open(source, "rb") as f:
+            return f.read()
+    if _HAS_REQUESTS:
+        resp = requests.get(source, timeout=60)
+        resp.raise_for_status()
+        return resp.content
+    req = Request(source)
+    with urlopen(req, timeout=60) as resp:
+        return resp.read()
+
+
+def _compress_image(data: bytes, max_bytes: int = 4 * 1024 * 1024, max_edge: int = 4096) -> bytes:
+    """Compress image to fit size/dimension limits. Requires Pillow only when needed."""
+    if len(data) <= max_bytes:
+        try:
+            from PIL import Image
+
+            img = Image.open(io.BytesIO(data))
+            w, h = img.size
+            if max(w, h) <= max_edge:
+                return data
+        except ImportError:
+            return data
+        except Exception:
+            return data
+
+    try:
+        from PIL import Image
+    except ImportError:
+        return data
+
+    img = Image.open(io.BytesIO(data))
+    w, h = img.size
+
+    if max(w, h) > max_edge:
+        ratio = max_edge / max(w, h)
+        w, h = int(w * ratio), int(h * ratio)
+        img = img.resize((w, h), Image.LANCZOS)
+
+    buf = io.BytesIO()
+    fmt = img.format or "PNG"
+    if fmt.upper() == "JPEG":
+        quality = 85
+        while True:
+            buf.seek(0)
+            buf.truncate()
+            img.save(buf, format="JPEG", quality=quality)
+            if buf.tell() <= max_bytes or quality <= 20:
+                break
+            quality -= 10
+    else:
+        img.save(buf, format=fmt)
+        if buf.tell() > max_bytes:
+            buf.seek(0)
+            buf.truncate()
+            img.save(buf, format="JPEG", quality=75)
+    return buf.getvalue()
+
+
+def _save_image(data: bytes, output_dir: str) -> str:
+    """Save image bytes to output_dir and return the path."""
+    os.makedirs(output_dir, exist_ok=True)
+    ext = "png"
+    if data[:3] == b"\xff\xd8\xff":
+        ext = "jpg"
+    elif data[:4] == b"RIFF":
+        ext = "webp"
+    filename = f"{uuid.uuid4().hex[:12]}.{ext}"
+    path = os.path.join(output_dir, filename)
+    with open(path, "wb") as f:
+        f.write(data)
+    return path
+
+
+# ---------------------------------------------------------------------------
+# Provider interface
+# ---------------------------------------------------------------------------
+
+class ImageProvider(ABC):
+    """Abstract base class for image generation providers."""
+
+    @abstractmethod
+    def generate(
+        self,
+        prompt: str,
+        *,
+        image_url: str | list | None = None,
+        quality: str | None = None,
+        size: str | None = None,
+        output_dir: str = ".",
+    ) -> list[str]:
+        """Generate image(s) and return list of local file paths."""
+        ...
+
+
+# ---------------------------------------------------------------------------
+# OpenAI-compatible provider (gpt-image-2, gpt-image-1)
+# ---------------------------------------------------------------------------
+
+class OpenAIProvider(ImageProvider):
+    """Provider for OpenAI Image API (generations + edits)."""
+
+    def __init__(self, api_key: str, api_base: str, model: str):
+        self.api_key = api_key
+        self.api_base = api_base.rstrip("/")
+        self.model = model
+
+    def _headers(self) -> dict:
+        return {
+            "Authorization": f"Bearer {self.api_key}",
+        }
+
+    @staticmethod
+    def _raise_for_api_error(resp):
+        """Raise with server error details instead of bare HTTP status."""
+        if resp.status_code >= 400:
+            try:
+                body = resp.json()
+                msg = body.get("error", {}).get("message") or body.get("message") or resp.text
+            except Exception:
+                msg = resp.text or resp.reason
+            raise RuntimeError(f"API {resp.status_code}: {msg} (url: {resp.url})")
+
+    def _post_json(self, url: str, payload: dict) -> dict:
+        headers = {**self._headers(), "Content-Type": "application/json"}
+        if _HAS_REQUESTS:
+            resp = requests.post(url, headers=headers, json=payload, timeout=300)
+            self._raise_for_api_error(resp)
+            return resp.json()
+        data = json.dumps(payload).encode()
+        req = Request(url, data=data, headers=headers, method="POST")
+        with urlopen(req, timeout=300) as r:
+            return json.loads(r.read())
+
+    def _post_multipart(self, url: str, fields: dict, files: list[tuple]) -> dict:
+        """POST multipart/form-data using requests (or fall back to urllib)."""
+        headers = self._headers()
+        if _HAS_REQUESTS:
+            resp = requests.post(url, headers=headers, data=fields, files=files, timeout=300)
+            self._raise_for_api_error(resp)
+            return resp.json()
+        boundary = uuid.uuid4().hex
+        body = b""
+        for key, val in fields.items():
+            body += f"--{boundary}\r\nContent-Disposition: form-data; name=\"{key}\"\r\n\r\n{val}\r\n".encode()
+        for field_name, (filename, filedata, content_type) in files:
+            body += (
+                f"--{boundary}\r\n"
+                f"Content-Disposition: form-data; name=\"{field_name}\"; filename=\"{filename}\"\r\n"
+                f"Content-Type: {content_type}\r\n\r\n"
+            ).encode() + filedata + b"\r\n"
+        body += f"--{boundary}--\r\n".encode()
+        headers["Content-Type"] = f"multipart/form-data; boundary={boundary}"
+        req = Request(url, data=body, headers=headers, method="POST")
+        with urlopen(req, timeout=300) as r:
+            return json.loads(r.read())
+
+    def generate(
+        self,
+        prompt: str,
+        *,
+        image_url=None,
+        quality: str | None = None,
+        size: str | None = None,
+        output_dir: str = ".",
+    ) -> list[str]:
+        if image_url:
+            return self._edit(prompt, image_url=image_url, quality=quality, size=size, output_dir=output_dir)
+        return self._create(prompt, quality=quality, size=size, output_dir=output_dir)
+
+    def _create(self, prompt: str, *, quality: str | None, size: str | None, output_dir: str) -> list[str]:
+        url = f"{self.api_base}/images/generations"
+        payload: dict = {
+            "model": self.model,
+            "prompt": prompt,
+        }
+        if quality:
+            payload["quality"] = quality
+        if size:
+            payload["size"] = size
+        result = self._post_json(url, payload)
+        return self._save_results(result, output_dir)
+
+    def _edit(
+        self,
+        prompt: str,
+        *,
+        image_url,
+        quality: str | None,
+        size: str | None,
+        output_dir: str,
+    ) -> list[str]:
+        urls = image_url if isinstance(image_url, list) else [image_url]
+        image_data_list = [_compress_image(_load_image(u)) for u in urls]
+
+        url = f"{self.api_base}/images/edits"
+
+        fields = {"model": self.model, "prompt": prompt}
+        if quality:
+            fields["quality"] = quality
+        if size:
+            fields["size"] = size
+
+        files = []
+        for i, img_bytes in enumerate(image_data_list):
+            ext = "png"
+            if img_bytes[:3] == b"\xff\xd8\xff":
+                ext = "jpg"
+            field_name = "image[]" if len(image_data_list) > 1 else "image"
+            files.append((field_name, (f"image_{i}.{ext}", img_bytes, f"image/{ext}")))
+
+        result = self._post_multipart(url, fields, files)
+        return self._save_results(result, output_dir)
+
+    @staticmethod
+    def _save_results(result: dict, output_dir: str) -> list[str]:
+        paths = []
+        for item in result.get("data", []):
+            if "b64_json" in item:
+                raw = base64.b64decode(item["b64_json"])
+                paths.append(_save_image(raw, output_dir))
+            elif "url" in item:
+                raw = _load_image(item["url"])
+                paths.append(_save_image(raw, output_dir))
+        return paths
+
+
+# ---------------------------------------------------------------------------
+# LinkAI provider (uses unified /v1/images/generations)
+# ---------------------------------------------------------------------------
+
+class LinkAIProvider(ImageProvider):
+    """Provider for LinkAI unified image generation API."""
+
+    def __init__(self, api_key: str, api_base: str, model: str):
+        self.api_key = api_key
+        self.api_base = api_base.rstrip("/")
+        self.model = model
+
+    def generate(
+        self,
+        prompt: str,
+        *,
+        image_url=None,
+        quality: str | None = None,
+        size: str | None = None,
+        output_dir: str = ".",
+    ) -> list[str]:
+        url = f"{self.api_base}/v1/images/generations"
+        payload: dict = {
+            "model": self.model,
+            "prompt": prompt,
+        }
+        if quality:
+            payload["quality"] = quality
+        if size:
+            payload["size"] = size
+        if image_url:
+            urls = image_url if isinstance(image_url, list) else [image_url]
+            resolved = []
+            for u in urls:
+                if os.path.isfile(u):
+                    data = _load_image(u)
+                    ext = u.rsplit(".", 1)[-1].lower() if "." in u else "png"
+                    mime = {"jpg": "image/jpeg", "jpeg": "image/jpeg", "webp": "image/webp"}.get(ext, "image/png")
+                    resolved.append(f"data:{mime};base64,{base64.b64encode(data).decode()}")
+                else:
+                    resolved.append(u)
+            payload["image_url"] = resolved if len(resolved) > 1 else resolved[0]
+
+        headers = {
+            "Authorization": f"Bearer {self.api_key}",
+            "Content-Type": "application/json",
+        }
+
+        if _HAS_REQUESTS:
+            resp = requests.post(url, headers=headers, json=payload, timeout=300)
+            if resp.status_code >= 400:
+                try:
+                    body = resp.json()
+                    msg = body.get("error", {}).get("message") or body.get("message") or resp.text
+                except Exception:
+                    msg = resp.text or resp.reason
+                raise RuntimeError(f"API {resp.status_code}: {msg}")
+            result = resp.json()
+        else:
+            data = json.dumps(payload).encode()
+            req = Request(url, data=data, headers=headers, method="POST")
+            with urlopen(req, timeout=300) as r:
+                result = json.loads(r.read())
+
+        if "error" in result:
+            raise RuntimeError(result["error"].get("message", str(result["error"])))
+
+        paths = []
+        for item in result.get("data", []):
+            if "url" in item:
+                raw = _load_image(item["url"])
+                paths.append(_save_image(raw, output_dir))
+            elif "b64_json" in item:
+                raw = base64.b64decode(item["b64_json"])
+                paths.append(_save_image(raw, output_dir))
+        return paths
+
+
+# ---------------------------------------------------------------------------
+# Provider factory
+# ---------------------------------------------------------------------------
+
+def _build_providers(model: str) -> list[tuple[str, ImageProvider]]:
+    """Build an ordered list of (label, provider) to try."""
+    openai_key = os.environ.get("OPENAI_API_KEY", "")
+    openai_base = os.environ.get("OPENAI_API_BASE", "https://api.openai.com/v1")
+    linkai_key = os.environ.get("LINKAI_API_KEY", "")
+    linkai_base = os.environ.get("LINKAI_API_BASE", "https://api.link-ai.tech")
+
+    providers = []
+    if openai_key:
+        providers.append(("OpenAI", OpenAIProvider(api_key=openai_key, api_base=openai_base, model=model)))
+    if linkai_key:
+        providers.append(("LinkAI", LinkAIProvider(api_key=linkai_key, api_base=linkai_base, model=model)))
+    return providers
+
+
+# ---------------------------------------------------------------------------
+# Main
+# ---------------------------------------------------------------------------
+
+def main():
+    if len(sys.argv) < 2:
+        print(json.dumps({"error": "Usage: python generate.py '<json_args>'"}))
+        sys.exit(1)
+
+    try:
+        args = json.loads(sys.argv[1])
+    except json.JSONDecodeError as e:
+        print(json.dumps({"error": f"Invalid JSON: {e}"}))
+        sys.exit(1)
+
+    prompt = args.get("prompt")
+    if not prompt:
+        print(json.dumps({"error": "Missing required parameter: prompt"}))
+        sys.exit(1)
+
+    model = args.get("model", "gpt-image-2")
+    quality = args.get("quality")
+    raw_size = args.get("size")
+    aspect_ratio = args.get("aspect_ratio")
+    image_url = args.get("image_url")
+
+    resolved_size = resolve_size(raw_size, aspect_ratio)
+
+    output_dir = os.environ.get("IMAGE_OUTPUT_DIR", os.path.join(os.getcwd(), "images"))
+
+    providers = _build_providers(model)
+    if not providers:
+        print(json.dumps({
+            "error": "No API key configured. Please set OPENAI_API_KEY or LINKAI_API_KEY via env_config tool, then try again."
+        }, ensure_ascii=False))
+        sys.exit(1)
+
+    import time
+
+    errors = []
+    for label, provider in providers:
+        try:
+            print(f"[image-generation] Trying {label} (model={model})...", file=sys.stderr)
+            t0 = time.time()
+            paths = provider.generate(
+                prompt,
+                image_url=image_url,
+                quality=quality,
+                size=resolved_size,
+                output_dir=output_dir,
+            )
+            elapsed = time.time() - t0
+            print(f"[image-generation] ✅ {label} succeeded in {elapsed:.1f}s", file=sys.stderr)
+            result = {"images": [{"url": p} for p in paths]}
+            print(json.dumps(result, ensure_ascii=False))
+            return
+        except Exception as e:
+            elapsed = time.time() - t0
+            print(f"[image-generation] ❌ {label} failed in {elapsed:.1f}s: {e}", file=sys.stderr)
+            errors.append(f"{label}: {e}")
+
+    hint = " | ".join(errors)
+    print(json.dumps({
+        "error": f"All providers failed — {hint}. "
+                 "This is likely an API key or base URL configuration issue. "
+                 "Do NOT retry with the same parameters. "
+                 "Ask the user to verify their OPENAI_API_KEY / OPENAI_API_BASE "
+                 "(or LINKAI_API_KEY / LINKAI_API_BASE) settings via env_config."
+    }, ensure_ascii=False))
+    sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()