feat(memory): async LLM context summary injection on trim

- Unified flush + context injection into a single async LLM call (flush_from_messages accepts context_summary_callback)
- Fixed response parsing bug: handle generator returns and Claude-format dicts from bot.call_with_tools, which previously caused all LLM summaries to silently fail (falling back to rule-based extraction)
- Removed standalone context summary prompts and methods; reuse the existing [DAILY]/[MEMORY] summarization pipeline
- Updated docs (zh/en/ja) to reflect the new injection behavior
2026-07-18 12:07:15 +08:00 · 2026-04-13 20:13:05 +08:00
parent da97e948ca
commit 33cf1bc4c3
9 changed files with 187 additions and 67 deletions
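
The response-parsing fix named in the commit message could look roughly like the sketch below. It is not the code from this commit: `extract_text` is a hypothetical helper, and the shapes it handles (a plain string, a generator of chunks, and a Claude-style dict whose `content` is either a string or a list of typed blocks) are assumptions about what `bot.call_with_tools` may return.

```python
import inspect
from typing import Any


def extract_text(response: Any) -> str:
    """Best-effort text extraction from several assumed response shapes."""
    # Generator return: drain it and join the text of every chunk.
    if inspect.isgenerator(response):
        return "".join(extract_text(chunk) for chunk in response)
    # Plain string: nothing to unwrap.
    if isinstance(response, str):
        return response
    # Claude-style dict: "content" is a string or a list of typed blocks.
    if isinstance(response, dict):
        content = response.get("content")
        if isinstance(content, str):
            return content
        if isinstance(content, list):
            return "".join(
                block.get("text", "")
                for block in content
                if isinstance(block, dict) and block.get("type") == "text"
            )
    # Unknown shape: return an empty string so the caller can fall back
    # (for example to rule-based extraction) explicitly rather than silently.
    return ""
```

Returning an empty string for unknown shapes keeps the fallback decision in the caller, which makes a parsing failure easier to notice than an exception swallowed inside the summarizer.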
--- a/docs/memory/context.mdx
+++ b/docs/memory/context.mdx
@@ -39,14 +39,15 @@ description: Conversation context: message management, compression strategies, and context operations

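 - Trim the **earliest half** of the complete turns (keeping tool call chains intact)
 - Trimmed messages are summarized by the LLM and **written to the current day's daily memory file**
- The remaining turns are left unchanged
+- Once the LLM summary completes, the summary is also **injected at the start of the first user message among the retained messages**, helping the model keep the conversation coherent in later turns
+- Summary injection completes asynchronously in the background and does not block the current reply; the injected summary takes effect on the next conversation turn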

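 ### 3. Token budget trimming

 After turns are trimmed, if the token count still exceeds the budget:

 - **Fewer than 5 turns**: apply **text compression** to all turns, keeping only the first user text and the last Agent reply in each turn and dropping the tool call chain in between
- **5 or more turns**: trim the **first half of the turns** again; the discarded content is likewise written to memory
+- **5 or more turns**: trim the **first half of the turns** again; the discarded content is likewise written to memory and injected as a context summary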

 ### 4. Overflow emergency handling

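The trimming rules in the hunk above can be sketched as follows. This is an illustration under assumptions, not the project's implementation: a turn is modeled as a list of message dicts with hypothetical roles `user`, `assistant`, and `tool`, the "earliest half" is taken as `len(turns) // 2` (rounding for odd counts is a guess), and all function names are invented for the example.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]
Turn = List[Message]


def trim_oldest_half(turns: List[Turn]) -> Tuple[List[Turn], List[Turn]]:
    """Split whole turns into (dropped, kept); tool chains stay inside a turn."""
    cut = len(turns) // 2  # assumption: round down for odd turn counts
    return turns[:cut], turns[cut:]


def compress_turn(turn: Turn) -> Turn:
    """Keep only the first user message and the last assistant reply."""
    first_user = next((m for m in turn if m["role"] == "user"), None)
    last_reply = next((m for m in reversed(turn) if m["role"] == "assistant"), None)
    return [m for m in (first_user, last_reply) if m is not None]


def enforce_token_budget(
    turns: List[Turn], over_budget: bool
) -> Tuple[List[Turn], List[Turn]]:
    """Return (dropped, kept) after applying the token-budget rule."""
    if not over_budget:
        return [], turns
    if len(turns) < 5:
        # Few turns: compress every turn in place, drop nothing.
        return [], [compress_turn(t) for t in turns]
    # Many turns: drop the oldest half again; the caller is expected to
    # summarize the dropped turns into memory and inject the summary.
    return trim_oldest_half(turns)
```

Splitting on turn boundaries rather than on individual messages is what keeps each tool call paired with its result in the retained context.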
--- a/docs/memory/index.mdx
+++ b/docs/memory/index.mdx
@@ -19,7 +19,7 @@ description: CowAgent's long-term memory system: file persistence, automatic writing

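 The Agent automatically persists conversation content as long-term memory through the following mechanisms:

- **On context trimming**: when the number of conversation turns or tokens exceeds the configured limit, the earliest half of the context is trimmed, and the LLM summarizes the trimmed content into key information that is written to the current day's memory file
+- **On context trimming**: when the number of conversation turns or tokens exceeds the configured limit, the earliest half of the context is trimmed, and the LLM summarizes the trimmed content into key information that is written to the current day's memory file; the summary is also injected asynchronously into the retained context to help the model keep the conversation coherent
 - **Daily scheduled summary**: a full summary is triggered automatically at 23:55 every day so that low-activity days still leave a memory record (skipped automatically when the content has not changed)
 - **On API context overflow**: when the model API returns a context overflow error, a summary of the current conversation is saved as an emergency measure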
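
A minimal sketch of the trim-time flow described above: one background summarization whose result is both written to memory and injected into the retained context. It uses a plain thread for illustration; whether the project uses threads or another async mechanism is not stated here. `flush_async`, `inject_summary`, the `on_summary` parameter (standing in for the commit's `context_summary_callback`), and the `[Earlier context summary]` marker are all hypothetical.

```python
import threading
from typing import Callable, Dict, List, Optional

Message = Dict[str, str]


def inject_summary(kept: List[Message], summary: str) -> bool:
    """Prepend the summary to the first user message in the retained context."""
    for msg in kept:
        if msg.get("role") == "user":
            msg["content"] = f"[Earlier context summary]\n{summary}\n\n{msg['content']}"
            return True
    return False  # no user message to attach the summary to


def flush_async(
    dropped: List[Message],
    summarize: Callable[[List[Message]], str],
    write_memory: Callable[[str], None],
    on_summary: Optional[Callable[[str], None]] = None,
) -> threading.Thread:
    """Summarize dropped messages off the reply path; one LLM call feeds both sinks."""

    def worker() -> None:
        summary = summarize(dropped)  # the single (assumed) LLM call
        if not summary:
            return
        write_memory(summary)  # persist to the daily memory file
        if on_summary is not None:
            on_summary(summary)  # inject into the retained context

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

Because the worker runs after the current reply has already been built from `kept`, the injected summary only becomes visible on the next turn, which matches the documented behavior. A real implementation would likely need a lock around the message list if the main thread can mutate it concurrently.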