---
title: GLM
description: Zhipu AI GLM model configuration (Text / Image Understanding / Speech-to-Text / Embedding)
---

Zhipu AI supports text chat, image understanding, speech-to-text (ASR), and embedding. A single `zhipu_ai_api_key` enables all capabilities.

All capabilities below can be configured in one place via the "Model Management" page in the Web Console, with no need to manually edit the configuration file.

## Text Chat

```json
{
  "model": "glm-5.1",
  "zhipu_ai_api_key": "YOUR_API_KEY"
}
```

| Parameter | Description |
| --- | --- |
| `model` | Can be `glm-5.1`, `glm-5-turbo`, `glm-5`, `glm-4.7`, `glm-4-plus`, `glm-4-flash`, `glm-4-air`, etc. See [model codes](https://bigmodel.cn/dev/api/normal-model/glm-4) |
| `zhipu_ai_api_key` | Create one in the [Zhipu AI Console](https://www.bigmodel.cn/usercenter/proj-mgmt/apikeys) |
| `zhipu_ai_api_base` | Optional, defaults to `https://open.bigmodel.cn/api/paas/v4` |

## Image Understanding

Zhipu's chat models (`glm-5.1`, `glm-5-turbo`, etc.) do not support vision; vision calls are uniformly routed to `glm-5v-turbo`. Once `zhipu_ai_api_key` is configured, the Agent's Vision tool automatically uses this model, with no need to specify it explicitly in the configuration file.

## Speech-to-Text (ASR)

```json
{
  "voice_to_text": "zhipu",
  "voice_to_text_model": "glm-asr-2512"
}
```

| Parameter | Description |
| --- | --- |
| `voice_to_text` | Set to `zhipu` to enable Zhipu ASR |
| `voice_to_text_model` | Optional, defaults to `glm-asr-2512` |

Credentials are automatically reused from `zhipu_ai_api_key`. Audio files should be smaller than 25 MB; oversized files may be rejected by the server.

## Embedding

```json
{
  "embedding_provider": "zhipu",
  "embedding_model": "embedding-3"
}
```

Available models: `embedding-3`, `embedding-2`. After changing the embedding, run `/memory rebuild-index` to rebuild the index.
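
Because a single `zhipu_ai_api_key` covers every capability, the fragments shown for text chat, ASR, and embedding can plausibly be merged into one configuration file. The following is a minimal sketch, assuming all of these keys sit at the top level of the same JSON file (this page shows them only as separate fragments). `YOUR_API_KEY` is a placeholder, and `zhipu_ai_api_base` is omitted so that the default endpoint applies.

```json
{
  "model": "glm-5.1",
  "zhipu_ai_api_key": "YOUR_API_KEY",
  "voice_to_text": "zhipu",
  "voice_to_text_model": "glm-asr-2512",
  "embedding_provider": "zhipu",
  "embedding_model": "embedding-3"
}
```

No vision-specific key appears here because vision calls are routed to `glm-5v-turbo` automatically once the API key is set. If `embedding_model` differs from a previously used value, run `/memory rebuild-index` after saving the file.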