chatgpt-on-wechat/docs/models/openai.mdx

---
title: OpenAI
description: OpenAI model configuration (Text / Vision / Image / Speech / Embedding)
---

OpenAI offers the most complete coverage and can simultaneously serve text chat, vision understanding, image generation, speech-to-text (ASR), text-to-speech (TTS), and embedding. A single `open_ai_api_key` lets the Agent use all of these capabilities.

<Tip>
  All capabilities below can be configured in one place via the "Model Management" page in the Web Console, with no need to manually edit the configuration file.
</Tip>


## Text Chat

```json
{
  "model": "gpt-5.5",
  "open_ai_api_key": "YOUR_API_KEY",
  "open_ai_api_base": "https://api.openai.com/v1"
}
```

| Parameter | Description |
| --- | --- |
| `model` | Same as OpenAI's [model parameter](https://platform.openai.com/docs/models); supports `gpt-5.5`, `gpt-5.4`, `gpt-5.4-mini`, `gpt-5.4-nano`, the `gpt-5` series, `gpt-4.1`, the o-series, etc. Agent mode defaults to `gpt-5.5`; use `gpt-5.4` for better cost-efficiency |
| `open_ai_api_key` | Create one on the [OpenAI Platform](https://platform.openai.com/api-keys) |
| `open_ai_api_base` | Optional; change it to access a third-party proxy |
| `bot_type` | Not required when using OpenAI's official models; set to `openai` when accessing other vendors via the compatible protocol |

## Image Understanding

OpenAI models like `gpt-5.5`, `gpt-5.4`, `gpt-4o`, and `gpt-4.1` natively support vision. Once `open_ai_api_key` is configured, the Agent's Vision tool automatically uses the main model to recognize images. If the main model does not support vision or you want to specify it explicitly, set it in the configuration file:

```json
{
  "tools": {
    "vision": {
      "model": "gpt-5.4-mini"
    }
  }
}
```

Supported Vision models: `gpt-5.5`, `gpt-5.4`, `gpt-5.4-mini`, `gpt-5.4-nano`, `gpt-5`, `gpt-4.1`, `gpt-4.1-mini`, `gpt-4o`.

## Image Generation

Specify the image generation model in the configuration file; the Agent automatically routes image generation skill calls to OpenAI:

```json
{
  "skills": {
    "image-generation": {
      "model": "gpt-image-2"
    }
  }
}
```

Supported image generation models: `gpt-image-2`, `gpt-image-1`.

## Speech-to-Text (ASR)

```json
{
  "voice_to_text": "openai",
  "voice_to_text_model": "gpt-4o-mini-transcribe"
}
```

| Parameter | Description |
| --- | --- |
| `voice_to_text` | Set to `openai` to enable OpenAI speech-to-text |
| `voice_to_text_model` | Optional, defaults to `gpt-4o-mini-transcribe`; can also be `gpt-4o-transcribe`, `whisper-1` |

Credentials are automatically reused from `open_ai_api_key`.

## Text-to-Speech (TTS)

```json
{
  "text_to_voice": "openai",
  "text_to_voice_model": "tts-1",
  "tts_voice_id": "alloy"
}
```

| Parameter | Description |
| --- | --- |
| `text_to_voice_model` | `tts-1`, `tts-1-hd`, `gpt-4o-mini-tts` |
| `tts_voice_id` | Voices: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`, `ash`, `ballad`, `coral`, `sage`, `verse` |

## Embedding

```json
{
  "embedding_provider": "openai",
  "embedding_model": "text-embedding-3-small"
}
```

Available models: `text-embedding-3-small`, `text-embedding-3-large`, `text-embedding-ada-002`. After changing the embedding, run `/memory rebuild-index` to rebuild the index.