mirror of
https://github.com/zhayujie/chatgpt-on-wechat.git
synced 2026-06-03 19:17:10 +08:00
feat(memory): async LLM context summary injection on trim
- Unified flush + context injection into a single async LLM call (flush_from_messages accepts context_summary_callback) - Fixed response parsing bug: handle generator returns and Claude-format dicts from bot.call_with_tools, which previously caused all LLM summaries to silently fail (falling back to rule-based extraction) - Removed standalone context summary prompts and methods; reuse the existing [DAILY]/[MEMORY] summarization pipeline - Updated docs (zh/en/ja) to reflect the new injection behavior
This commit is contained in:
@@ -39,14 +39,15 @@ description: 对话上下文 — 消息管理、压缩策略和上下文操作
|
||||
|
||||
- 裁剪 **最早一半** 的完整轮次(保证工具调用链的完整性)
|
||||
- 被裁剪的消息会通过 LLM 总结后**写入当天的日级记忆文件**
|
||||
- 剩余轮次保持不变
|
||||
- LLM 摘要完成后,同时将摘要**注入到保留消息的第一条用户消息开头**,帮助模型在后续对话中保持上下文连贯性
|
||||
- 摘要注入在后台异步完成,不阻塞当前回复;注入的摘要在下一轮对话时生效
|
||||
|
||||
### 3. Token 预算裁剪
|
||||
|
||||
裁剪轮次后,如果 token 数仍超出预算:
|
||||
|
||||
- **轮次 < 5 时**:对所有轮次进行**文本压缩** — 每轮只保留第一条用户文本和最后一条 Agent 回复,去掉中间的工具调用链
|
||||
- **轮次 ≥ 5 时**:再次裁剪**前半轮次**,被丢弃内容同样写入记忆
|
||||
- **轮次 ≥ 5 时**:再次裁剪**前半轮次**,被丢弃内容同样写入记忆并注入上下文摘要
|
||||
|
||||
### 4. 溢出应急处理
|
||||
|
||||
|
||||
Reference in New Issue
Block a user