Files
chatgpt-on-wechat/docs/memory/context.mdx

82 lines
3.6 KiB
Plaintext

---
title: Short-term Memory
description: Conversation context — message management, compression strategies, and context operations
---
Conversation context is the Agent's short-term memory, containing all messages in the current session (user input, Agent replies, tool calls and results). Proper context management is critical for the Agent's reasoning quality and cost control.
## Context Structure
Each conversation turn consists of:
```
User message → Agent thinking → Tool call → Tool result → ... → Agent final reply
```
A single turn may include multiple tool calls (controlled by `agent_max_steps`). All tool calls and results are retained in context until compressed or trimmed.
## Key Configuration
| Parameter | Description | Default |
| --- | --- | --- |
| `agent_max_context_tokens` | Maximum context token budget | `50000` |
| `agent_max_context_turns` | Maximum conversation turns in context | `20` |
| `agent_max_steps` | Maximum decision steps per turn (tool call count) | `15` |
Configurable via `config.json` or the `/config` chat command.
## Compression Strategy
When context exceeds limits, the system automatically compresses to free space. The process has multiple stages:
### 1. Tool Result Truncation
Before each decision loop, the system checks tool call results in historical turns. Results exceeding **20,000 characters** are truncated, keeping only the beginning and end with a truncation notice. Current turn results are not affected.
### 2. Turn Trimming
When conversation turns exceed `agent_max_context_turns`:
- The **oldest half** of complete turns is trimmed (preserving tool call chain integrity)
- Trimmed messages are summarized by LLM and **written to the daily memory file**
- Once the LLM summary is ready, it is also **injected into the first user message** of the retained context, helping the model maintain conversational continuity
- Summary injection runs asynchronously in the background and takes effect from the next turn onward
### 3. Token Budget Trimming
After turn trimming, if tokens still exceed the budget:
- **Fewer than 5 turns**: All turns undergo **text compression** — each turn keeps only the first user text and last Agent reply, removing intermediate tool call chains
- **5 or more turns**: The **first half** of turns is trimmed again, with discarded content written to memory and a context summary injected
### 4. Overflow Emergency Handling
When the model API returns a context overflow error:
1. All current messages are summarized and written to memory
2. Aggressive trimming is applied (tool results limited to 10K chars, user text to 10K, max 5 turns)
3. If still overflowing, the entire conversation context is cleared
## Session Persistence
Conversation messages are persisted to a local database, automatically restored after service restart. Restore strategy:
- Restores the most recent **`max(3, max_context_turns / 6)`** turns
- Only retains each turn's **user text and Agent final reply**, not intermediate tool call chains
- Sessions older than **30 days** are automatically cleaned up
## Commands
Use these commands in chat to manage context:
| Command | Description |
| --- | --- |
| `/context` | View current context statistics (message count, role distribution, total characters) |
| `/context clear` | Clear current session context |
| `/config agent_max_context_tokens 80000` | Adjust context token budget |
| `/config agent_max_context_turns 30` | Adjust context turn limit |
<Tip>
After clearing context, the Agent "forgets" previous conversation content. Content that was already written to long-term memory can still be retrieved via memory search.
</Tip>