Agent-generated images are sent as IMAGE_URL with a file:// path, but the wechatmp channel always called requests.get, which raises InvalidSchema on file:// URLs. Now read local files directly (file:// URL or plain local path) and fall back to an HTTP download for remote URLs, in both passive and active reply modes.
Co-authored-by: Cursor <cursoragent@cursor.com>
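
A minimal sketch of the local-first read described above. The helper name `read_image_bytes` and its `timeout` parameter are hypothetical, and the standard library's `urlopen` stands in for the channel's `requests.get` so the example stays self-contained; the actual wechatmp code may be structured differently.

```python
from urllib.parse import urlparse, unquote
from urllib.request import urlopen


def read_image_bytes(image_ref: str, timeout: float = 10.0) -> bytes:
    """Return image bytes for a file:// URL, a plain local path, or a remote URL.

    Hypothetical helper: the name and signature are illustrative only.
    """
    parsed = urlparse(image_ref)
    if parsed.scheme in ("http", "https"):
        # Remote URL: fall back to an HTTP download.
        with urlopen(image_ref, timeout=timeout) as resp:
            return resp.read()
    if parsed.scheme == "file":
        # file:///tmp/a.png -> /tmp/a.png, with percent-escapes decoded.
        path = unquote(parsed.path)
    else:
        # No recognised scheme: treat the value as a plain local path.
        path = image_ref
    with open(path, "rb") as f:
        return f.read()
```

Checking the scheme before touching the network keeps local files away from the HTTP client, which is where the InvalidSchema error came from.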
Previously the passive reply only drained the cache after the agent task fully finished, so for long multi-turn tasks the user could not retrieve already-cached intermediate segments. Now return cached segments as soon as they are available, even while the task is still running; the next user message fetches the rest.
Co-authored-by: Cursor <cursoragent@cursor.com>
In subscription account passive reply mode, WeChat allows only one reply per request. Multi-turn agent output was cached as separate entries, forcing the user to send an extra message to fetch each one. Now drain and merge all consecutive cached text segments into a single reply; media segments are still returned one at a time.
Co-authored-by: Cursor <cursoragent@cursor.com>
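
The drain-and-merge behaviour can be sketched roughly as follows. The `(kind, payload)` tuple layout, the `drain_next_reply` name and the blank-line separator are assumptions made for illustration; the real cache structure is not shown in this message.

```python
from collections import deque
from typing import Deque, Optional, Tuple

# Assumed cache entry shape: (kind, payload), where kind is "text" or a
# media kind such as "image". This layout is hypothetical.
Segment = Tuple[str, str]


def drain_next_reply(cache: Deque[Segment], sep: str = "\n\n") -> Optional[Segment]:
    """Pop the next reply: a run of text segments merged, or one media item."""
    if not cache:
        return None
    kind, payload = cache.popleft()
    if kind != "text":
        # Media is returned one item per request.
        return kind, payload
    parts = [payload]
    # Keep consuming while the head of the cache is still text.
    while cache and cache[0][0] == "text":
        parts.append(cache.popleft()[1])
    return "text", sep.join(parts)
```

Because the function only looks at what is already in the cache, it can be called while the agent task is still running; segments cached later are picked up by the next user message.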
Search tool now supports 4 backends with unified output (bocha,
qianfan, zhipu, linkai) and a routing layer:
- strategy 'auto' (default): pick the first configured provider in
canonical order: bocha > qianfan > zhipu > linkai
- strategy 'fixed': pin a specific provider
- agent may pass `provider` to override per call (only exposed when
at least 2 providers are configured and the strategy is 'auto')
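
One way the routing rules above could look in code. `resolve_provider` and its parameters are hypothetical names, and the fallback to the automatic pick when the override names an unconfigured provider is an assumption, not something this message states.

```python
from typing import Iterable, List, Optional

# Canonical order as listed in this commit message.
CANONICAL_ORDER = ["bocha", "qianfan", "zhipu", "linkai"]


def configured_in_order(configured: Iterable[str]) -> List[str]:
    """Keep only known providers, sorted into canonical order."""
    present = set(configured)
    return [name for name in CANONICAL_ORDER if name in present]


def resolve_provider(
    configured: Iterable[str],
    strategy: str = "auto",
    fixed: Optional[str] = None,
    override: Optional[str] = None,
) -> str:
    """Pick a search backend (illustrative sketch, not the real router)."""
    available = configured_in_order(configured)
    if not available:
        raise ValueError("no search provider configured")
    if strategy == "fixed":
        if fixed not in available:
            raise ValueError(f"fixed provider {fixed!r} is not configured")
        return fixed  # a pinned provider ignores any per-call override
    if strategy != "auto":
        raise ValueError(f"unknown strategy {strategy!r}")
    # The per-call override only applies with 2+ providers under 'auto'.
    if override in available and len(available) >= 2:
        return override
    return available[0]
```

For example, with zhipu and qianfan configured, `resolve_provider(["zhipu", "qianfan"])` picks qianfan because it comes first in canonical order.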
- Clear NOT_SUPPORT_REPLYTYPE on weixin, wecom_bot, dingtalk so TTS replies
are actually synthesized for these channels.
- Wire desire_rtype=VOICE in weixin and wecom_bot _compose_context so the
always_reply_voice / voice_reply_voice toggles take effect.
- DingTalk: send native sampleAudio (mediaId + duration). The media API
only accepts ogg/amr, so convert TTS mp3/wav to amr on the fly.
- WeCom Bot: send native voice msgtype via ws (respond + active push),
converting TTS audio to amr before upload.
- Weixin (ilink): there is no outbound voice item, so deliver TTS as a file attachment.
- chat_channel: when a TEXT reply is converted to VOICE, stash original
text in context["voice_reply_text"] and send a text bubble before the
voice reply. Skipped for feishu_streamed and wechatcom_app, which
already render text alongside the voice.
When reloading a conversation, failed tool calls incorrectly showed a checkmark instead of an X because the is_error field was lost in the history rendering pipeline. Propagate is_error from DB extraction through to frontend rendering so that reloaded history matches the live SSE behavior.
Boot MCP servers (npx/uvx) on a background thread instead of blocking
agent init. Built-in tools serve traffic immediately while MCP comes
online; each new agent reads whatever is ready at creation time.
Loading is idempotent via the _mcp_loaded flag, so concurrent sessions
never re-fork subprocesses. Per-server failures are isolated, and warmup
is triggered in app.py so loading overlaps with channel startup.
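
A rough sketch of the background warmup pattern described above. Only the `_mcp_loaded` flag name comes from this message; the `McpWarmup` class, its `starters` mapping and its method names are hypothetical, and real servers would be npx/uvx subprocesses rather than plain callables.

```python
import threading
from typing import Callable, Dict, Optional


class McpWarmup:
    """Start MCP servers once, on a background thread (illustrative only)."""

    def __init__(self, starters: Dict[str, Callable[[], object]]):
        self._starters = starters          # server name -> start function
        self._lock = threading.Lock()
        self._mcp_loaded = False           # set on the first warmup() call
        self._ready: Dict[str, object] = {}
        self._thread: Optional[threading.Thread] = None

    def warmup(self) -> None:
        """Kick off loading; later calls are no-ops."""
        with self._lock:
            if self._mcp_loaded:
                return
            self._mcp_loaded = True
            self._thread = threading.Thread(target=self._load_all, daemon=True)
            self._thread.start()

    def _load_all(self) -> None:
        for name, start in self._starters.items():
            try:
                tool = start()
            except Exception:
                # One failing server must not block the others.
                continue
            with self._lock:
                self._ready[name] = tool

    def ready_tools(self) -> Dict[str, object]:
        """Snapshot of whatever has finished loading so far."""
        with self._lock:
            return dict(self._ready)

    def join(self, timeout: Optional[float] = None) -> None:
        """Wait for loading to finish (useful mainly in tests)."""
        if self._thread is not None:
            self._thread.join(timeout)
```

A new agent would call `ready_tools()` at creation time and simply work with the built-in tools if the snapshot is still empty.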