In subscription account passive reply mode, WeChat allows only one reply per request. Multi-turn agent output was cached as separate entries, forcing the user to send an extra message to fetch each one. Now drain and merge all consecutive cached text segments into a single reply; media still returns one at a time.
Co-authored-by: Cursor <cursoragent@cursor.com>
Search tool now supports 4 backends with unified output (bocha,
qianfan, zhipu, linkai) and a routing layer:
- strategy 'auto' (default): pick first configured in canonical order
bocha > qianfan > zhipu > linkai
- strategy 'fixed': pin a specific provider
- agent may pass `provider` to override per-call (only exposed when
≥2 providers configured + auto strategy)
- Clear NOT_SUPPORT_REPLYTYPE on weixin, wecom_bot, dingtalk so TTS replies
are actually synthesized for these channels.
- Wire desire_rtype=VOICE in weixin and wecom_bot _compose_context so the
always_reply_voice / voice_reply_voice toggles take effect.
- DingTalk: send native sampleAudio (mediaId + duration). The media API
only accepts ogg/amr, so convert TTS mp3/wav to amr on the fly.
- WeCom Bot: send native voice msgtype via ws (respond + active push),
converting TTS audio to amr before upload.
- Weixin (ilink): no outbound voice item, deliver TTS as a file attachment.
- chat_channel: when a TEXT reply is converted to VOICE, stash original
text in context["voice_reply_text"] and send a text bubble before the
voice reply. Skipped for feishu_streamed and wechatcom_app, which
already render text alongside the voice.
When reloading a conversation, failed tool calls incorrectly showed checkmark instead of X because the is_error field was lost in the history rendering pipeline. Propagate is_error from DB extraction through to the frontend rendering to match the live SSE behavior.
Boot MCP servers (npx/uvx) on a background thread instead of blocking
agent init. Built-in tools serve traffic immediately while MCP comes
online; each new agent reads whatever is ready at creation time.
Idempotent via _mcp_loaded flag — concurrent sessions never re-fork
subprocesses. Per-server failures are isolated and warmup is triggered
in app.py so loading overlaps with channel startup.
- rewrite streaming reply to official cardkit v2.0 API (default on, auto-fallback)
- fix Whisper hallucination: bump ASR sample rate to 16k, pass language=zh
- fix lock-over-IO and tmp file cleanup from #2791
- drop deprecated feishu_bot_name; quiet unknown-key warnings
- docs: cardkit permission and feishu_stream_reply usage