---
title: GLM
description: Zhipu AI GLM model configuration (Text / Image Understanding / Speech-to-Text / Embedding)
---

Zhipu AI supports text chat, image understanding, speech-to-text (ASR), and embedding. A single `zhipu_ai_api_key` enables all capabilities.

<Tip>
  All capabilities below can be configured in one place via the "Model Management" page in the Web Console, with no need to manually edit the configuration file.
</Tip>

## Text Chat

```json
{
  "model": "glm-5.1",
  "zhipu_ai_api_key": "YOUR_API_KEY"
}
```

| Parameter | Description |
| --- | --- |
| `model` | Model code, e.g. `glm-5.1`, `glm-5-turbo`, `glm-5`, `glm-4.7`, `glm-4-plus`, `glm-4-flash`, or `glm-4-air`. See [model codes](https://bigmodel.cn/dev/api/normal-model/glm-4) |
| `zhipu_ai_api_key` | Create one in the [Zhipu AI Console](https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys) |
| `zhipu_ai_api_base` | Optional, defaults to `https://open.bigmodel.cn/api/paas/v4` |
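
If requests need to go through a different endpoint, the optional `zhipu_ai_api_base` can be set alongside the other two parameters. The fragment below is only a sketch: it spells out the documented default, which you would replace with your own endpoint.

```json
{
  "model": "glm-5.1",
  "zhipu_ai_api_key": "YOUR_API_KEY",
  "zhipu_ai_api_base": "https://open.bigmodel.cn/api/paas/v4"
}
```

Omitting `zhipu_ai_api_base` has the same effect as the value shown here.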

## Image Understanding

Zhipu's chat models (`glm-5.1`, `glm-5-turbo`, etc.) do not support vision, so all vision calls are routed to `glm-5v-turbo`. Once `zhipu_ai_api_key` is configured, the Agent's Vision tool uses this model automatically; it does not need to be specified in the configuration file.

## Speech-to-Text (ASR)

```json
{
  "voice_to_text": "zhipu",
  "voice_to_text_model": "glm-asr-2512"
}
```

| Parameter | Description |
| --- | --- |
| `voice_to_text` | Set to `zhipu` to enable Zhipu ASR |
| `voice_to_text_model` | Optional, defaults to `glm-asr-2512` |

ASR automatically reuses the credentials from `zhipu_ai_api_key`. Keep audio files under 25 MB; the server may reject larger files.

## Embedding

```json
{
  "embedding_provider": "zhipu",
  "embedding_model": "embedding-3"
}
```

Available models: `embedding-3`, `embedding-2`. After changing the embedding model, run `/memory rebuild-index` to rebuild the index.
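
Because a single `zhipu_ai_api_key` covers every capability, the fragments on this page can be merged into one configuration. The example below is a sketch that assumes all keys sit at the top level of the same configuration file, as the individual snippets suggest; adjust it if your file is structured differently.

```json
{
  "model": "glm-5.1",
  "zhipu_ai_api_key": "YOUR_API_KEY",
  "voice_to_text": "zhipu",
  "voice_to_text_model": "glm-asr-2512",
  "embedding_provider": "zhipu",
  "embedding_model": "embedding-3"
}
```

Image understanding needs no entry of its own, since the Vision tool selects `glm-5v-turbo` once the key is present.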