---
title: Short-term Memory
description: "Conversation context: message management, compression strategies, and context operations"
---

Conversation context is the Agent's short-term memory, containing all messages in the current session (user input, Agent replies, tool calls and results). Proper context management is critical for the Agent's reasoning quality and cost control.

## Context Structure

Each conversation turn consists of:

```
User message → Agent thinking → Tool call → Tool result → ... → Agent final reply
```

A single turn may include multiple tool calls (controlled by `agent_max_steps`). All tool calls and results are retained in context until compressed or trimmed.

## Key Configuration

| Parameter | Description | Default |
| --- | --- | --- |
| `agent_max_context_tokens` | Maximum context token budget | `50000` |
| `agent_max_context_turns` | Maximum conversation turns in context | `20` |
| `agent_max_steps` | Maximum decision steps per turn (tool call count) | `15` |

Configurable via `config.json` or the `/config` chat command.

## Compression Strategy

When context exceeds limits, the system automatically compresses to free space. The process has multiple stages:

### 1. Tool Result Truncation

Before each decision loop, the system checks tool call results in historical turns. Results exceeding **20,000 characters** are truncated, keeping only the beginning and end with a truncation notice. Current turn results are not affected.

### 2. Turn Trimming

When conversation turns exceed `agent_max_context_turns`:

- The **oldest half** of complete turns is trimmed (preserving tool call chain integrity)
- Trimmed messages are summarized by LLM and **written to the daily memory file**
- Once the LLM summary is ready, it is also **injected into the first user message** of the retained context, helping the model maintain conversational continuity
- Summary injection runs asynchronously in the background and takes effect from the next turn onward

### 3. Token Budget Trimming

After turn trimming, if tokens still exceed the budget:

- **Fewer than 5 turns**: All turns undergo **text compression**: each turn keeps only the first user text and last Agent reply, removing intermediate tool call chains
- **5 or more turns**: The **first half** of turns is trimmed again, with discarded content written to memory and a context summary injected

### 4. Overflow Emergency Handling

When the model API returns a context overflow error:

1. All current messages are summarized and written to memory
2. Aggressive trimming is applied (tool results limited to 10K chars, user text to 10K, max 5 turns)
3. If still overflowing, the entire conversation context is cleared

## Session Persistence

Conversation messages are persisted to a local database, automatically restored after service restart. Restore strategy:

- Restores the most recent **`max(3, max_context_turns / 6)`** turns
- Only retains each turn's **user text and Agent final reply**, not intermediate tool call chains
- Sessions older than **30 days** are automatically cleaned up

## Commands

Use these commands in chat to manage context:

| Command | Description |
| --- | --- |
| `/context` | View current context statistics (message count, role distribution, total characters) |
| `/context clear` | Clear current session context |
| `/config agent_max_context_tokens 80000` | Adjust context token budget |
| `/config agent_max_context_turns 30` | Adjust context turn limit |

After clearing context, the Agent "forgets" previous conversation content. Content that was already written to long-term memory can still be retrieved via memory search.
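
## Appendix: Sketch of the Trimming Arithmetic

To make the numbers in the Compression Strategy and Session Persistence sections concrete, here is a minimal Python sketch of three of the rules: truncating oversized tool results, trimming the oldest half of turns, and computing how many turns are restored after a restart. It is an illustration under stated assumptions, not the actual implementation. The function names, the truncation notice wording, the equal head/tail split, and the use of integer division are all hypothetical; only the thresholds and the `max(3, max_context_turns / 6)` formula come from the text above.

```python
from typing import List, Tuple

TOOL_RESULT_LIMIT = 20_000  # character threshold stated in "Tool Result Truncation"


def truncate_tool_result(text: str, limit: int = TOOL_RESULT_LIMIT) -> str:
    """Keep the beginning and end of an oversized tool result.

    Assumption: head and tail each get half of the limit; the real split
    and the notice wording are not specified in the documentation.
    """
    if len(text) <= limit:
        return text
    half = limit // 2
    omitted = len(text) - 2 * half
    notice = f"\n...[truncated {omitted} characters]...\n"  # hypothetical wording
    return text[:half] + notice + text[len(text) - half:]


def trim_oldest_half(turns: List[str], max_turns: int) -> Tuple[List[str], List[str]]:
    """Return (trimmed, retained) once the turn limit is exceeded.

    Assumption: for an odd number of turns the cut point rounds down.
    """
    if len(turns) <= max_turns:
        return [], list(turns)
    cut = len(turns) // 2
    return list(turns[:cut]), list(turns[cut:])


def restore_turn_count(max_context_turns: int) -> int:
    """Evaluate max(3, max_context_turns / 6), assuming integer division."""
    return max(3, max_context_turns // 6)
```

Under these assumptions, the default `agent_max_context_turns` of `20` restores 3 turns after a restart, while raising it to `30` restores 5. In the turn-trimming sketch, a session that reaches 21 turns with a limit of 20 would hand its 10 oldest turns to summarization and keep the 11 most recent.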