feat(memory): async LLM context summary injection on trim

- Unified flush + context injection into a single async LLM call (flush_from_messages accepts context_summary_callback) - Fixed response parsing bug: handle generator returns and Claude-format dicts from bot.call_with_tools, which previously caused all LLM summaries to silently fail (falling back to rule-based extraction) - Removed standalone context summary prompts and methods; reuse the existing [DAILY]/[MEMORY] summarization pipeline - Updated docs (zh/en/ja) to reflect the new injection behavior
2026-07-19 04:37:28 +08:00 · 2026-04-13 20:13:05 +08:00
parent da97e948ca
commit 33cf1bc4c3
9 changed files with 187 additions and 67 deletions
--- a/docs/memory/context.mdx
+++ b/docs/memory/context.mdx
@@ -39,14 +39,15 @@ description: 对话上下文 — 消息管理、压缩策略和上下文操作

 - 裁剪 **最早一半** 的完整轮次（保证工具调用链的完整性）
 - 被裁剪的消息会通过 LLM 总结后**写入当天的日级记忆文件**
- 剩余轮次保持不变
+- LLM 摘要完成后，同时将摘要**注入到保留消息的第一条用户消息开头**，帮助模型在后续对话中保持上下文连贯性
+- 摘要注入在后台异步完成，不阻塞当前回复；注入的摘要在下一轮对话时生效

 ### 3. Token 预算裁剪

 裁剪轮次后，如果 token 数仍超出预算：

 - **轮次 < 5 时**：对所有轮次进行**文本压缩** — 每轮只保留第一条用户文本和最后一条 Agent 回复，去掉中间的工具调用链
- **轮次 ≥ 5 时**：再次裁剪**前半轮次**，被丢弃内容同样写入记忆
+- **轮次 ≥ 5 时**：再次裁剪**前半轮次**，被丢弃内容同样写入记忆并注入上下文摘要

 ### 4. 溢出应急处理