mirror of
https://github.com/zhayujie/chatgpt-on-wechat.git
synced 2026-06-02 00:57:41 +08:00
113 lines
3.2 KiB
Plaintext
---
title: Qwen
description: Qwen model configuration (Text / Image Understanding / Image Generation / Speech-to-Text / Text-to-Speech / Embedding)
---

Qwen (Alibaba DashScope / Bailian) is one of the most fully-featured vendors. Text, image understanding, image generation, speech-to-text, text-to-speech, and embedding can all be enabled with a single `dashscope_api_key`.

<Tip>
All capabilities below can be configured in one place via the "Model Management" page in the Web Console, with no need to manually edit the configuration file.
</Tip>

## Text Chat

```json
{
  "model": "qwen3.6-plus",
  "dashscope_api_key": "YOUR_API_KEY"
}
```

| Parameter | Description |
| --- | --- |
| `model` | Can be `qwen3.6-plus`, `qwen3.7-max`, `qwen3.5-plus`, `qwen3-max`, `qwen-max`, `qwen-plus`, `qwen-turbo`, `qwq-plus`, etc. |
| `dashscope_api_key` | Create one in the [Bailian Console](https://bailian.console.aliyun.com/?tab=model#/api-key); see the [official docs](https://bailian.console.aliyun.com/?tab=api#/api) |

## Image Understanding

Once `dashscope_api_key` is configured, the Agent's Vision tool automatically calls Qwen's vision models to recognize images. Models such as `qwen3-max`, `qwen3.5-plus`, and `qwen3.6-plus` are already multimodal; if the main model is text-only (e.g. `qwen-turbo`), the tool automatically falls back to `qwen-vl-max`.

To manually specify a Vision model:

```json
{
  "tools": {
    "vision": {
      "model": "qwen3.6-plus"
    }
  }
}
```

Supported models: `qwen3.6-plus`, `qwen3.5-plus`, `qwen3-max`.

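
The selection rule described in this section can be sketched in a few lines of Python. This is only an illustration, not the project's actual code: the helper name `resolve_vision_model` is hypothetical, and the assumption that an explicit `tools.vision.model` setting takes precedence over the main model is inferred from the text rather than confirmed by it.

```python
# Models this section describes as already multimodal.
MULTIMODAL_MODELS = {"qwen3-max", "qwen3.5-plus", "qwen3.6-plus"}
# Fallback this section names for text-only main models.
FALLBACK_VISION_MODEL = "qwen-vl-max"


def resolve_vision_model(config: dict) -> str:
    """Hypothetical helper: pick the model the Vision tool would call."""
    # 1. Assumption: an explicit tools.vision.model setting wins.
    explicit = config.get("tools", {}).get("vision", {}).get("model")
    if explicit:
        return explicit
    # 2. A multimodal main model can handle images itself.
    main_model = config.get("model")
    if main_model in MULTIMODAL_MODELS:
        return main_model
    # 3. Otherwise fall back to the dedicated vision model.
    return FALLBACK_VISION_MODEL


print(resolve_vision_model({"model": "qwen-turbo"}))    # qwen-vl-max
print(resolve_vision_model({"model": "qwen3.6-plus"}))  # qwen3.6-plus
```

Under these assumptions, a text-only main model needs no extra configuration unless a vision model other than `qwen-vl-max` is wanted.
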
## Image Generation

```json
{
  "skills": {
    "image-generation": {
      "model": "qwen-image-2.0"
    }
  }
}
```

Available models: `qwen-image-2.0`, `qwen-image-2.0-pro`.

## Speech-to-Text (ASR)

```json
{
  "voice_to_text": "dashscope",
  "voice_to_text_model": "qwen3-asr-flash"
}
```

| Parameter | Description |
| --- | --- |
| `voice_to_text` | Set to `dashscope` to enable Qwen ASR |
| `voice_to_text_model` | Optional, defaults to `qwen3-asr-flash` |

Credentials are reused automatically from `dashscope_api_key`. A single audio segment should be smaller than 10 MB and no longer than 300 seconds.

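
A caller may want to check a segment against these limits before sending it for recognition. The following is a minimal sketch, not part of the project: the function name `check_audio_segment` is hypothetical, and reading "10 MB" as 10 × 1024 × 1024 bytes is an assumption.

```python
from typing import List

MAX_AUDIO_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB
MAX_AUDIO_SECONDS = 300             # limit stated in this section


def check_audio_segment(size_bytes: int, duration_seconds: float) -> List[str]:
    """Hypothetical pre-check: return the violated limits (empty list if OK)."""
    problems = []
    # The segment should be strictly smaller than the size limit.
    if size_bytes >= MAX_AUDIO_BYTES:
        problems.append(f"size {size_bytes} bytes is not below {MAX_AUDIO_BYTES}")
    # The segment should be no longer than the duration limit.
    if duration_seconds > MAX_AUDIO_SECONDS:
        problems.append(f"duration {duration_seconds}s exceeds {MAX_AUDIO_SECONDS}s")
    return problems


print(check_audio_segment(2_000_000, 45.0))  # []
```

A segment that fails the check would need to be split or re-encoded before it is submitted.
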
## Text-to-Speech (TTS)

```json
{
  "text_to_voice": "dashscope",
  "text_to_voice_model": "qwen3-tts-flash",
  "tts_voice_id": "Cherry"
}
```

| Parameter | Description |
| --- | --- |
| `text_to_voice_model` | Optional, defaults to `qwen3-tts-flash`; covers Mandarin, dialects, and major foreign languages |
| `tts_voice_id` | Voice ID; see the common list below |

Common voice examples:

| Voice ID | Description |
| --- | --- |
| `Cherry` | Qianyue · Sunny Female Voice |
| `Serena` | Suyao · Gentle Female Voice |
| `Ethan` | Chenxu · Sunny Male Voice |
| `Chelsie` | Qianxue · Anime Girl |
| `Dylan` | Beijing Dialect · Xiaodong |
| `Rocky` | Cantonese · Aqiang |
| `Sunny` | Sichuan Dialect · Qing'er |

The full voice list (Mandarin / regional dialects / bilingual, etc.) can be selected visually in the Web Console under "Model Management → Text-to-Speech".

## Embedding

```json
{
  "embedding_provider": "dashscope",
  "embedding_model": "text-embedding-v4"
}
```

The default model is `text-embedding-v4`. After changing the embedding model, run `/memory rebuild-index` to rebuild the index.
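
Since every capability shares one `dashscope_api_key`, the fragments from the sections above can presumably live side by side in one configuration file. The sketch below merges them and prints the result as JSON. It assumes all of these keys sit at the top level of the same file, and `merge_fragments` is a hypothetical helper, not something the project provides.

```python
import json

# Fragments copied from the sections above; the key is declared once.
FRAGMENTS = [
    {"model": "qwen3.6-plus", "dashscope_api_key": "YOUR_API_KEY"},
    {"tools": {"vision": {"model": "qwen3.6-plus"}}},
    {"skills": {"image-generation": {"model": "qwen-image-2.0"}}},
    {"voice_to_text": "dashscope", "voice_to_text_model": "qwen3-asr-flash"},
    {"text_to_voice": "dashscope", "text_to_voice_model": "qwen3-tts-flash",
     "tts_voice_id": "Cherry"},
    {"embedding_provider": "dashscope", "embedding_model": "text-embedding-v4"},
]


def merge_fragments(fragments: list) -> dict:
    """Hypothetical helper: combine top-level keys, rejecting conflicts."""
    merged = {}
    for fragment in fragments:
        for key, value in fragment.items():
            if key in merged and merged[key] != value:
                raise ValueError(f"conflicting values for {key!r}")
            merged[key] = value
    return merged


config = merge_fragments(FRAGMENTS)
print(json.dumps(config, indent=2))
```

The printed object contains a single `dashscope_api_key` entry alongside the per-capability settings, which matches the single-credential setup described at the top of this page.
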