feat(memory): async LLM context summary injection on trim

- Unified flush + context injection into a single async LLM call (flush_from_messages accepts context_summary_callback)
- Fixed response parsing bug: handle generator returns and Claude-format dicts from bot.call_with_tools, which previously caused all LLM summaries to silently fail (falling back to rule-based extraction)
- Removed standalone context summary prompts and methods; reuse the existing [DAILY]/[MEMORY] summarization pipeline
- Updated docs (zh/en/ja) to reflect the new injection behavior
2026-07-18 20:17:09 +08:00 · 2026-04-13 20:13:05 +08:00
parent da97e948ca
commit 33cf1bc4c3
9 changed files with 187 additions and 67 deletions
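
The response-parsing fix named in the commit message can be illustrated with a minimal sketch. The actual return shapes of `bot.call_with_tools` are not shown on this page, so the helper below is hypothetical: `extract_text` is an illustrative name, and it assumes a response may be a plain string, a dict whose `content` is either a string or a list of Claude-style blocks (`{"type": "text", "text": ...}`), or a generator yielding any of these.

```python
from typing import Any


def extract_text(response: Any) -> str:
    """Collapse a model response into plain text.

    Hypothetical helper, not the project's code. Assumed shapes:
    str, dict with a "content" field, or an iterable of such chunks.
    """
    if response is None:
        return ""
    if isinstance(response, str):
        return response
    if isinstance(response, dict):
        content = response.get("content", "")
        if isinstance(content, str):
            return content
        if isinstance(content, list):
            # Claude-style content blocks: keep only the text blocks
            return "".join(
                block.get("text", "")
                for block in content
                if isinstance(block, dict) and block.get("type") == "text"
            )
        return ""
    if hasattr(response, "__iter__"):
        # Generators (and other iterables): concatenate every chunk's text
        return "".join(extract_text(chunk) for chunk in response)
    return ""
```

With a normalizer of this kind, a caller can treat an empty result as the signal to fall back to rule-based extraction, instead of failing silently on an unexpected return type.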
--- a/docs/en/memory/context.mdx
+++ b/docs/en/memory/context.mdx
@@ -39,14 +39,15 @@ When conversation turns exceed `agent_max_context_turns`:

 - The **oldest half** of complete turns is trimmed (preserving tool call chain integrity)
 - Trimmed messages are summarized by LLM and **written to the daily memory file**
- Remaining turns stay intact
+- Once the LLM summary is ready, it is also **injected into the first user message** of the retained context, helping the model maintain conversational continuity
+- Summary injection runs asynchronously in the background and takes effect from the next turn onward

 ### 3. Token Budget Trimming

 After turn trimming, if tokens still exceed the budget:

 - **Fewer than 5 turns**: All turns undergo **text compression** — each turn keeps only the first user text and last Agent reply, removing intermediate tool call chains
- **5 or more turns**: The **first half** of turns is trimmed again, with discarded content also written to memory
+- **5 or more turns**: The **first half** of turns is trimmed again, with discarded content written to memory and a context summary injected

 ### 4. Overflow Emergency Handling

--- a/docs/en/memory/index.mdx
+++ b/docs/en/memory/index.mdx
@@ -19,7 +19,7 @@ Stored in `~/cow/memory/` directory, named by date (e.g., `2026-03-08.md`), reco

 The Agent automatically persists conversation content to long-term memory through the following mechanisms:

- **On context trimming** — When conversation turns or tokens exceed the configured limit, the oldest half of the context is trimmed, and the discarded content is summarized by LLM into key information and written to the daily memory file
+- **On context trimming** — When conversation turns or tokens exceed the configured limit, the oldest half of the context is trimmed, and the discarded content is summarized by LLM into key information and written to the daily memory file. The summary is also asynchronously injected into the retained context for conversational continuity
 - **Daily scheduled summary** — A full summary is automatically triggered at 23:55 every day, ensuring memory is preserved even on low-activity days (skipped if content hasn't changed)
 - **On API context overflow** — When the model API returns a context overflow error, the current conversation summary is saved as an emergency measure
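
The trimming behavior documented in the hunks above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the function names, the turn/message layout, the summary prefix text, the use of a background thread, and the `len // 2` split for odd turn counts are all hypothetical.

```python
import threading
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]
Turn = List[Message]  # assumed layout: one turn is a list of role/content messages


def trim_oldest_half(turns: List[Turn]) -> Tuple[List[Turn], List[Turn]]:
    """Split turns into (discarded oldest half, retained remainder)."""
    cut = len(turns) // 2
    return turns[:cut], turns[cut:]


def inject_summary(kept: List[Turn], summary: str) -> bool:
    """Prepend the summary to the first user message of the retained turns."""
    for turn in kept:
        for msg in turn:
            if msg["role"] == "user":
                msg["content"] = (
                    f"[Earlier context summary]\n{summary}\n\n{msg['content']}"
                )
                return True
    return False


def trim_and_inject_async(
    turns: List[Turn],
    summarize: Callable[[List[Turn]], str],
) -> Tuple[List[Turn], threading.Thread]:
    """Trim immediately; summarize and inject in a background thread."""
    discarded, kept = trim_oldest_half(turns)

    def worker() -> None:
        summary = summarize(discarded)  # stands in for the LLM call
        if summary:
            inject_summary(kept, summary)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return kept, thread
```

Because the retained turns are returned before the worker finishes, the current turn proceeds without the summary, and the injected text is only visible to later turns, which matches the "takes effect from the next turn onward" wording in the docs.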